70 Commits

Author SHA1 Message Date
Lars Maier 642c5fd994 Bug fix 3.3/cleanup lost collections (#6721)
* Working draft: clean lost collections in supervision.
* Added early exit as in spec.
* Finished test. Fixed logging.
* Increase plan version when cleaning out a lost collection.
* Increase the current version rather than the plan version.
* Fixed test for 3.3
2018-10-08 16:35:18 +02:00
Matthew Von-Maszewski ec5a2f62b8 3.3: Bring two key Agency bug fixes, plus some secondary stuff back to 3.3 (#6009) 2018-08-08 10:33:17 +02:00
Kaveh Vahedipour 507418d9a4 stop supervision on demand (#5109)
* stop supervision on demand
* adding tests
* Correct an error message.
2018-04-20 11:58:47 +02:00
Kaveh Vahedipour cce5b2decb Bug fix 3.3/supervision to delete removed nodes from health (#4455) 2018-02-13 15:55:42 +01:00
Kaveh Vahedipour 255d90d26a cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency (#3709)
This is the HealthRecord upgrade patch.
2017-11-17 10:14:14 +01:00
Jan bef52d7dc3 Bug fix/cleanup after cppcheck (#3639) 2017-11-10 13:53:28 +01:00
Kaveh Vahedipour 627f344266 fixed a bug, where when servers failed, when also agency leadership c… (#3189)
* fixed a bug, where when servers failed, when also agency leadership changes

* redid entire design of checkDBServers/checkCoordinators.

* comparison in supervision must be between oldPersisted and newHealth

* UI stuff

* UI stuff

* FailedServer test needed adjustment

* Hopefully final round

* fixed supervision failure detection

* FailedServer tests back to origin devel

* oldNot documented among preconditions in Agency HTTP API docs

* changed only look for status updated

* non action line in api-cluster
2017-09-07 16:10:23 +02:00
Kaveh Vahedipour 00650e6a3f Bug fix/agency mt fixes (#3158)
* added debugging methods

* try to fix invalid access in case of error

* remove unused members

* bugfixes and comments

* all agency fixes in

* merge bug

* partially unguarded Agent::lead fixed

* all agency fixes in

* added nrBlocked to thread startup eval

* added nrBlocked to thread startup eval

* recombination of cases in State::get

* some maps replaced with unordered_maps

* optimized maps some
2017-08-30 10:43:51 +02:00
Andreas Streichardt fe59502848 Fix server health 2017-05-11 12:20:15 +02:00
Kaveh Vahedipour 68efba18e8 keep agencyPrefix, when none set 2017-04-26 15:32:26 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour 8d66d69f83 supervision handles coordinator demise correctly 2017-02-07 11:29:37 +01:00
Kaveh Vahedipour aaee2f9e61 transient heartbeats 2017-01-18 13:43:33 +01:00
Kaveh Vahedipour 55985ed5de missing prototypes 2017-01-09 10:38:34 +01:00
jsteemann 7359ac44b2 more style cleanup 2017-01-05 10:52:03 +01:00
Kaveh Vahedipour 12e54902df agency's supervision must wait grace period after becoming leader before acting on db server failure 2016-12-21 11:17:41 +01:00
Max Neunhoeffer 985ccaeb70 Get rid of Supervision::wakeUp(). 2016-12-20 10:19:24 +01:00
Kaveh Vahedipour 51b279346b redirects to myself should be history 2016-12-06 17:10:15 +01:00
Andreas Streichardt 63a173f002 Delete all shard move jobs when server is healthy again 2016-11-22 14:13:09 +01:00
Kaveh Vahedipour 9a6f605f2f fixed small double / long conversion 2016-10-31 17:00:55 +01:00
Kaveh Vahedipour f8235b9c63 agency locks code review 2016-10-25 15:07:57 +02:00
Max Neunhoeffer 3a76784af4 Protect memory accesses to _snapshot in Supervision. 2016-10-12 10:23:21 +00:00
Kaveh Vahedipour 1f4abf3c36 upgrade 3.0 agency to 3.1 2016-10-06 17:04:29 +02:00
jsteemann f5a595f464 Merge branch 'devel' of https://github.com/arangodb/arangodb into generic-col-types 2016-09-07 08:52:07 +02:00
Andreas Streichardt 6396ac4dc7 Implement removeServer job 2016-09-06 16:49:25 +02:00
jsteemann 6ddf8bab54 Merge branch 'devel' of https://github.com/arangodb/arangodb into generic-col-types 2016-09-06 11:22:14 +02:00
Kaveh Vahedipour 85ea1d5ff9 clang-format 2016-09-06 10:01:33 +02:00
Andreas Streichardt f9fea70c3e readd method 2016-09-05 15:50:41 +02:00
Kaveh Vahedipour 9808a55a33 some cleaning up 2016-09-05 15:12:46 +02:00
jsteemann c6efe26198 cppcheck 2016-08-25 14:04:23 +02:00
Andreas Streichardt 89ebeefbb9 Proper shutdown 2016-08-24 13:51:23 +02:00
Andreas Streichardt 47a0f8602a Better shutdown handling 2016-08-23 12:51:38 +02:00
Andreas Streichardt 03b9d97e2f Implement proper cluster shutdown 2016-08-18 11:23:23 +02:00
Andreas Streichardt 3f412debf0 Revert futile attempts to implement client resilience tests 2016-08-17 18:12:40 +02:00
Andreas Streichardt 70af1e3647 Implement proper cluster shutdown 2016-08-17 17:25:39 +02:00
Andreas Streichardt 526c8f42c2 Fix foxx issues in cluster
Bootstrap will now be done on the bootstrap coordinator.

queues will now be executed by the "foxxmaster"
2016-07-29 16:06:31 +02:00
jsteemann f21561b25f use nullptr, don't include Thread.h when unnecessary 2016-06-15 19:21:53 +02:00
Kaveh Vahedipour beba4887a3 shrink cluster in supervision 2016-06-10 18:10:37 +02:00
Kaveh Vahedipour 00d6111a3e server health for aardvark 2016-06-03 14:27:04 +02:00
Kaveh Vahedipour 427453bcc7 server health for aardvark 2016-06-03 12:19:39 +02:00
Kaveh Vahedipour 9957270df6 hunting down exceptions in agency supervision 2016-05-31 21:42:41 +02:00
Max Neunhoeffer b600ddbeb4 Fix getUniqueIds and updateAgencyPrefix in Supervision.
This prevents some race conditions at cluster startup that crashed the
agency.
2016-05-31 12:38:17 -06:00
Kaveh Vahedipour 7b440f94dc Moving Job classes out of Supervision 2016-05-31 16:28:54 +02:00
Kaveh Vahedipour bad7a6a35a leader fail seems good 2016-05-31 15:21:42 +02:00
Kaveh Vahedipour 68478f530d visual studio warning 2016-05-30 15:47:08 +02:00
Kaveh Vahedipour 318a073068 finish cleans up blocks 2016-05-27 16:27:38 +02:00
Kaveh Vahedipour 1846a3c4f7 finished jobs. clean out server, failed leader, move shard 2016-05-25 17:45:28 +02:00
Kaveh Vahedipour 00d3587e9a Supervision moves shards 2016-05-24 15:57:08 +02:00
Kaveh Vahedipour 3d0ebeab13 Some spurious warnings. 2016-05-23 17:34:52 +02:00
Kaveh Vahedipour 6110773fdb Redone job design in supervision to simpler interface. 2016-05-23 17:07:35 +02:00