185 Commits

Author SHA1 Message Date
Kaveh Vahedipour 7b80deb5cc Fixed object assignment operator for agency's key value store (#3701)
* Fixed object assignment operator for agency's key value store
* Node's toJson is now actually toJson. getString should be used for string extractions
* adjust agency's documentation (clarify precondition)
2017-11-17 15:49:40 +01:00
Kaveh Vahedipour 255d90d26a cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency (#3709)
This is the HealthRecord upgrade patch.
2017-11-17 10:14:14 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* auth = off no longer makes everyone superuser

* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
Michael Hackstein 15d9a4be5f Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510) 2017-10-25 18:03:24 +02:00
Simon Grätzer 7c31960cf2 Feature/async failover (#3451) 2017-10-18 23:59:29 +02:00
Max Neunhöffer 9a2385b941 Add host id detection and show in /_admin/cluster/Health. (#3389) 2017-10-11 12:42:44 +02:00
Kaveh Vahedipour 627f344266 fixed a bug, where when servers failed, when also agency leadership c… (#3189)
* fixed a bug that occurred when servers failed while agency leadership also changed

* redid entire design of checkDBServers/checkCoordinators.

* comparison in supervision must be between oldPersisted and newHealth

* UI stuff

* UI stuff

* FailedServer test needed adjustment

* Hopefully final round

* fixed supervision failure detection

* FailedServer tests back to origin devel

* oldNot documented among preconditions in Agency HTTP API docs

* changed only look for status updated

* non action line in api-cluster
2017-09-07 16:10:23 +02:00
Kaveh Vahedipour 00650e6a3f Bug fix/agency mt fixes (#3158)
* added debugging methods

* try to fix invalid access in case of error

* remove unused members

* bugfixes and comments

* all agency fixes in

* merge bug

* partially unguarded Agent::lead fixed

* all agency fixes in

* added nrBlocked to thread startup eval

* added nrBlocked to thread startup eval

* recombination of cases in State::get

* some maps replaced with unordered_maps

* optimized maps some
2017-08-30 10:43:51 +02:00
Andreas Streichardt 8e15412e06 Wait for supervision node to prevent races 2017-06-09 15:52:29 +02:00
jsteemann 2930ab6b57 cppcheck 2017-05-15 22:39:16 +02:00
Andreas Streichardt fe59502848 Fix server health 2017-05-11 12:20:15 +02:00
Kaveh Vahedipour de77b5ec7a getting rid of exceptions in supervision 2017-05-10 17:50:31 +02:00
Kaveh Vahedipour b0e7ce40f0 avoid exceptions in supervision main thread when running without cluster 2017-05-04 14:37:03 +02:00
Kaveh Vahedipour 68efba18e8 keep agencyPrefix, when non set 2017-04-26 15:32:26 +02:00
jsteemann 4289105eb3 fix shutdown issue 2017-04-25 16:09:01 +02:00
Kaveh Vahedipour 09a6888d14 attempt at fixing shutdown bug on mac os x 2017-04-24 10:45:54 +02:00
jsteemann ea8496f1a5 cppcheck 2017-04-21 20:19:36 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour 4cc830b0df merge from 3.1 2017-02-20 20:05:52 +01:00
jsteemann b3ac54d065 remove global namespace include 2017-02-13 13:03:33 +01:00
jsteemann d024a6d00a remove logging for non-topics 2017-02-10 09:32:50 +01:00
Andreas Streichardt 8349f56e40 Properly check return value 2017-02-07 15:15:56 +01:00
Kaveh Vahedipour 8d66d69f83 supervision handles coordinator demise correctly 2017-02-07 11:29:37 +01:00
Kaveh Vahedipour f3cb1307a5 3.1 fixes backported to devel 2017-02-03 10:48:25 +01:00
jsteemann fa917937c4 do not use namespaces in header files 2017-02-01 13:41:31 +01:00
Kaveh Vahedipour 3f3633bd2c supervision to proper preconditioning of jobs on plan 2017-01-27 15:29:22 +01:00
Kaveh Vahedipour c4bff477a6 wrong persistence of status 2017-01-24 12:52:31 +01:00
Kaveh Vahedipour cfbdaff0a8 Back in add follower 2017-01-23 09:39:32 +01:00
Kaveh Vahedipour 163e0158dc before cppcheck enthusiasts start slacking :) 2017-01-20 15:22:30 +01:00
Kaveh Vahedipour d2760f4ef1 pushing avoidServers property 2017-01-20 15:15:03 +01:00
Kaveh Vahedipour bbb45ca397 Correct depiction of servers health status 2017-01-20 09:17:04 +01:00
Kaveh Vahedipour eb661f95f2 Merge branch 'devel' of https://github.com/arangodb/arangodb into devel 2017-01-18 17:26:54 +01:00
Kaveh Vahedipour f47b3b3c9d transient heartbeats 2017-01-18 17:26:45 +01:00
jsteemann 73da10a7e7 remove unused variable 2017-01-18 13:50:07 +01:00
Kaveh Vahedipour aaee2f9e61 transient heartbeats 2017-01-18 13:43:33 +01:00
Kaveh Vahedipour 879102117d more replicationTest 2017-01-16 15:43:32 +01:00
Kaveh Vahedipour a75b3624de resilience move ok again? 2017-01-16 12:09:21 +01:00
Kaveh Vahedipour d30458b011 Supervision should not exit of empty plan collection 2017-01-10 16:53:24 +01:00
Kaveh Vahedipour 331d074ebe more information from ClusterInfo's dropCollectionCoordinator 2017-01-10 16:25:00 +01:00
Kaveh Vahedipour 90c18e4914 waitFor will report more paranoid 2017-01-10 13:53:31 +01:00
Kaveh Vahedipour 55985ed5de missing prototypes 2017-01-09 10:38:34 +01:00
Kaveh Vahedipour ab6678eb1f need to fix tests first 2016-12-29 16:25:30 +01:00
Kaveh Vahedipour ce687562f2 less rigid expectation on smooth operations through agency comm under worst case scenarios. 2016-12-28 10:32:20 +01:00
Kaveh Vahedipour fcdc7601f3 Merge branch 'devel' of https://github.com/arangodb/arangodb into devel 2016-12-23 14:06:34 +01:00
Max Neunhoeffer b6ad88d7f8 Do not getUniqueIds when not leading. 2016-12-23 14:02:44 +01:00
Kaveh Vahedipour 770a49d8ca supervision should only collect unique ids in leadership 2016-12-23 13:27:59 +01:00
Kaveh Vahedipour facd58b8b5 replication factor is enforced after failedleader to add new follower 2016-12-22 13:12:16 +01:00
Kaveh Vahedipour 8924cf7852 let's not count failed db servers in replication factor fix 2016-12-21 15:49:56 +01:00
Kaveh Vahedipour 12e54902df agency's supervision must wait grace period after becoming leader before acting on db server failure 2016-12-21 11:17:41 +01:00
Max Neunhoeffer 985ccaeb70 Get rid of Supervision::wakeUp(). 2016-12-20 10:19:24 +01:00