1
0
Fork 0
Commit Graph

78 Commits

Author SHA1 Message Date
Lars Maier 9a33122c5d [3.5] Added precondition to ensure that server is still as seen before. (#10477)
* Added precondition to ensure that server is still as seen before.

* Removed merge conflicts.
2019-11-20 13:39:57 +01:00
Max Neunhöffer 328f46e3d6 This merges hotbackup and atomic-db-creation into 3.5. (#9968)
* Squashed commit of feature-3.5/hotbackup_devel.

This puts hotbackup into 3.5.

* Port atomic-database-creation-2 to 3.5.

* Remove some wrongly ported code.

* Fix compilation.

* Fix a manual merge error.

* Remove a feature from the mocks which does not exist in 3.5.

* Add some code which was forgotten in manual merge.

* Fix a problem introduced in a manual merge.

* reuse function

* Address some whitespace issues that came up in review

* aardvark should not create the frontend collection

* create _frontend collection from c++

* recheckAndUpdate Callback in CollectionWatcher

* Wrong author ;)

* rm outdated todo

* Update lib/Basics/VelocyPackHelper.h

Co-Authored-By: Michael Hackstein <michael@arangodb.com>

* use logger unique id, use startup logger

* not needed

* optimized vector shardid method

* do not create _modules collection lazy anymre

* Formatting.

* Assert instead of if/TRI_ASSERT(false)

* Don't use exceptions as control structure

* Re-add READ_LOCKER that got lost in translation

* Fix audit log in case database creation fails early.

* legacy sharding

* Add CHANGELOG entry.

* Retry database cancellation indefinitely

* Do not use exceptions in UpgradeTask

* DropCollection is a FAST_LANE action and should not need much time or else retry.

* Remove superflous addition of LdapFeature

Proudly brought to you by ASAN tests

* Fixed check for distributShardsLike sharding on _system database

* Fixed compile issue on tests

* Removed assertion that seems to be not correct yet on devel.

* Sort out google cloud storage as remote. (#9918)

* Add successful method to ClusterCommResult.
* Improve error forwarding for cluster internal communication.

* Feature/hotbackup list retries (#9924)

* retry hot backup listing for 2 minutes in cluster before giving up

* Enable api by default.

* fix broken list of non existing id (#9957)

* Fix compilation after manual merge.

* Fix another compilation problem.

* Yet more fixes for compilation.

* More compilation fixes.
2019-09-11 13:13:54 +03:00
Max Neunhöffer 80bfb85695
Port agency performance tuning for many shards to devel. (#8647)
* Port agency performance tuning for many shards to devel.
* Add more IDs to LOG_TOPIC calls.
* Even more IDs for LOG_TOPIC.
* Fix a duplicate LOG_TOPIC ID.
* Fix an old merging bug in devel.
* Don't hesitate between phases one and two for small clusters.
2019-04-11 11:14:56 +02:00
Max Neunhöffer 02281d3be4
Handle InitDone correctly. (#8552)
* precondition plan / version in compaction / store TTL removal independent of local _ttl set
* Agency init loops break when shutting down.
* assertion failures in store on restarting following agents
* Minor porting fixes from 3.4
2019-04-01 17:01:05 +02:00
Kaveh Vahedipour 68178ba165 [devel] supervision bug fix backports (#8314)
* back ports for supervision fixes from 3.4 part 1

* back ports for supervision fixes from 3.4 part 2
2019-03-04 19:27:24 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
Kaveh Vahedipour 9ec6619b84 Bug fix/index readiness (#6541)
* indexes are marked  while still missing in Current
* index handling getCollection
* supervision gets indexes from isbuilding, when coordinator is gone before finishing
* seems right now
* fixed broken views
* remove junk comments
* cleanup
* node / supervision adjustements
* supervision fixes
* neunhoef remarks part i
* neunhoef remarks part ii
* neunhoef remarks part ii
* neunhoef remarks part iiI
* collection's current version please
* no need to wait for current once again
* no longer necessary code
* clear comments
* delete left overs
* dead code revived
2018-11-21 14:42:58 +01:00
Lars Maier 3dbb0558f3 Clean lost collections in supervision (#6592)
* Working draft: clean lost collections in supervision.
* Added early exit as in spec.
* Finished test. Fixed logging.
2018-09-26 16:54:29 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
 - Fix index locking bug in mmfiles
 - Fix a bug in mmfiles with silent option and repsert
 - Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00
Matthew Von-Maszewski a84f7805ad Feature/mv thread death logging (#5111)
* Initial low level interface for thread crash reporting (and management).
* Add a member version of isClusterRole()
* isolate heartbeat thread creation to new StartHeartbeatThread().  create heartbeat thread even if not a cluster or if an agent.
* update runDBServer() and runCoordinator() to shutdown more quickly by polling isStopping() at additional locations.
* copying updates from different branch / PR
* basic thread crash logging.  Not yet tied into Agency arangod or have any specific threads posting crashes
* make Supervision thread a CriticalThread
* sandwich CriticalThread between Thread and other classes to create long term, repeating thread crash reporting.
* restore code lost upon branch update relating to new startHeartbeatThread() function
* add CriticalThread.cpp to build
* add new runAgentServer() function to loop for Agents.  Make Heartbeat thread derive from CriticalThread.
* remove debug line
2018-04-23 15:50:14 +02:00
Kaveh Vahedipour 3d043b35a3 Feature/supervsion maintenance mode (#5108)
* Supervision goes to Maintenance mode, when /arango/Supervision/Maintenance exists
* coordinator route stands
* stop updates in transient, when supervision off
2018-04-20 13:23:22 +02:00
Kaveh Vahedipour 7f9786eb27 builder fixed for agency transaction. worked only for a single server. (#4436) 2018-02-06 23:14:53 +01:00
Kaveh Vahedipour 255d90d26a cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency (#3709)
This is the HealthRecord upgrade patch.
2017-11-17 10:14:14 +01:00
Jan bef52d7dc3
Bug fix/cleanup after cppcheck (#3639) 2017-11-10 13:53:28 +01:00
Kaveh Vahedipour 627f344266 fixed a bug, where when servers failed, when also agency leadership c… (#3189)
* fixed a bug, where when servers failed, when also agency leadership changes

* redid entire design of checkDBServers/checkCoordinators.

* comparison in supervision must be between oldPersisted and newHealth

* UI stuff

* UI stuff

* FailedServer test needed adjustment

* Hopefully final round

* fixed supervision failure detection

* FailedServer tests back to origin devel

* oldNot documented among preconditions in Agency HTTP API docs

* changed only look for status updated

* non action line in api-cluster
2017-09-07 16:10:23 +02:00
Kaveh Vahedipour 00650e6a3f Bug fix/agency mt fixes (#3158)
* added debugging methods

* try to fix invalid access in case of error

* remove unused members

* bugfixes and comments

* all agency fixes in

* merge bug

* partially unguarded Agent::lead fixed

* all agency fixes in

* added nrBlocked to thread startup eval

* added nrBlocked to thread startup eval

* recombination of cases in State::get

* some maps replaced with unordered_maps

* optimized maps some
2017-08-30 10:43:51 +02:00
Andreas Streichardt fe59502848 Fix server health 2017-05-11 12:20:15 +02:00
Kaveh Vahedipour 68efba18e8 keep agencyPrefix, when non set 2017-04-26 15:32:26 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour 8d66d69f83 supervision handles coordinator demise correctly 2017-02-07 11:29:37 +01:00
Kaveh Vahedipour aaee2f9e61 transient heartbeats 2017-01-18 13:43:33 +01:00
Kaveh Vahedipour 55985ed5de missing prototypes 2017-01-09 10:38:34 +01:00
jsteemann 7359ac44b2 more style cleanup 2017-01-05 10:52:03 +01:00
Kaveh Vahedipour 12e54902df agency's supervision must wait grace period after becoming leader before acting on db server failure 2016-12-21 11:17:41 +01:00
Max Neunhoeffer 985ccaeb70 Get rid of Supervision::wakeUp(). 2016-12-20 10:19:24 +01:00
Kaveh Vahedipour 51b279346b redirects to myelf should be hinstory 2016-12-06 17:10:15 +01:00
Andreas Streichardt 63a173f002 Delete all shard move jobs when server is healthy again 2016-11-22 14:13:09 +01:00
Kaveh Vahedipour 9a6f605f2f fixed small double / long conversion 2016-10-31 17:00:55 +01:00
Kaveh Vahedipour f8235b9c63 agency locks code review 2016-10-25 15:07:57 +02:00
Max Neunhoeffer 3a76784af4 Protect memory accesses to _snapshot in Supervision. 2016-10-12 10:23:21 +00:00
Kaveh Vahedipour 1f4abf3c36 upgrade 3.0 agency to 3.1 2016-10-06 17:04:29 +02:00
jsteemann f5a595f464 Merge branch 'devel' of https://github.com/arangodb/arangodb into generic-col-types 2016-09-07 08:52:07 +02:00
Andreas Streichardt 6396ac4dc7 Implement removeServer job 2016-09-06 16:49:25 +02:00
jsteemann 6ddf8bab54 Merge branch 'devel' of https://github.com/arangodb/arangodb into generic-col-types 2016-09-06 11:22:14 +02:00
Kaveh Vahedipour 85ea1d5ff9 clang-format 2016-09-06 10:01:33 +02:00
Andreas Streichardt f9fea70c3e readd method 2016-09-05 15:50:41 +02:00
Kaveh Vahedipour 9808a55a33 some cleaning up 2016-09-05 15:12:46 +02:00
jsteemann c6efe26198 cppcheck 2016-08-25 14:04:23 +02:00
Andreas Streichardt 89ebeefbb9 Proper shutdown 2016-08-24 13:51:23 +02:00
Andreas Streichardt 47a0f8602a Better shutdown handling 2016-08-23 12:51:38 +02:00
Andreas Streichardt 03b9d97e2f Implement proper cluster shutdown 2016-08-18 11:23:23 +02:00
Andreas Streichardt 3f412debf0 Revert futile attempts to implement client resilience tests 2016-08-17 18:12:40 +02:00
Andreas Streichardt 70af1e3647 Implement proper cluster shutdown 2016-08-17 17:25:39 +02:00
Andreas Streichardt 526c8f42c2 Fix foxx issues in cluster
Bootstrap will now be done on the bootstrap coordinator.

queues will now be executed by the "foxxmaster"
2016-07-29 16:06:31 +02:00
jsteemann f21561b25f use nullptr, don't include Thread.h when unnecessary 2016-06-15 19:21:53 +02:00
Kaveh Vahedipour beba4887a3 shrink cluster in supervision 2016-06-10 18:10:37 +02:00
Kaveh Vahedipour 00d6111a3e server health for aardvark 2016-06-03 14:27:04 +02:00
Kaveh Vahedipour 427453bcc7 server health for aardvark 2016-06-03 12:19:39 +02:00
Kaveh Vahedipour 9957270df6 hunting down exceptions in agency supervision 2016-05-31 21:42:41 +02:00
Max Neunhoeffer b600ddbeb4 Fix getUniqueIds and updateAgencyPrefix in Supervision.
This prevents some race conditions at cluster startup that crashed the
agency.
2016-05-31 12:38:17 -06:00