Commit Graph

198 Commits

Author SHA1 Message Date
Lars Maier 642c5fd994 Bug fix 3.3/cleanup lost collections (#6721)
* Working draft: clean lost collections in supervision.
* Added early exit as in spec.
* Finished test. Fixed logging.
* Increase plan version when cleaning out a lost collection.
* Increase the current version rather than the plan version.
* Fixed test for 3.3
2018-10-08 16:35:18 +02:00
Matthew Von-Maszewski ec5a2f62b8 3.3: Bring two key Agency bug fixes, plus some secondary stuff back to 3.3 (#6009) 2018-08-08 10:33:17 +02:00
Simon c954841a4f Backport supervision for active failover job + testsuite (#5181) 2018-04-23 16:38:11 +02:00
Kaveh Vahedipour 507418d9a4 stop supervision on demand (#5109)
* stop supervision on demand
* adding tests
* Correct an error message.
2018-04-20 11:58:47 +02:00
Kaveh Vahedipour c07a706948 supervision fix for internal issue #2215 backport to 3.3 (#5063)
* supervision fix for internal issue #2215 backport to 3.3
2018-04-10 15:29:27 +02:00
Kaveh Vahedipour cce5b2decb Bug fix 3.3/supervision to delete removed nodes from health (#4455) 2018-02-13 15:55:42 +01:00
Kaveh Vahedipour a14c4bd02f constituent correctly persisiting _votedFor and _term (#4248) (#4320) 2018-01-17 10:37:16 +01:00
Kaveh Vahedipour 56a9ad69b1 Bug fix 3.3/supervision no longer fails to remove server from failed when back to good (#4210)
* let's not miss failedserver removal
* remove resetting of FailedServers in test code
* Only call abortRequestsToFailedServers at most every 3 seconds.
2018-01-03 21:55:01 +01:00
Matthew Von-Maszewski 41d1bfce23 create independent executeLockedRead and executeLockedWrite to speed performance (#4178) 2017-12-29 13:36:48 +01:00
Matthew Von-Maszewski 392ddde251 Bug fix 3.3: Fix supervisor thread crash (#4165)
* port devel branch to 3.3 of supervisor thread death fix
2017-12-27 22:34:29 +01:00
Max Neunhöffer ef8fcd101c Port to 3.3 of various fixes around leadership preparation in agency. (#4150)
* Add logging for _earliestPackage in Agent.
* Really enforce the hidden option --server.maximal-threads if given.
* Switch off --log.force-direct in scripts/startStandAloneAgency.sh
* Lower the timeout for sending AppendEntriesRPC to 150s.
* Erase _earliestPackage when becoming a leader.
* Challenge leadership in agent main loop.
* Use steady_clock for _earliestPackage.
* Change _lastAcked and _leaderSince to steady_clock as well.
* time difference calculations based on old readSystemClock to steadyClockToDouble
* All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps
* Inception system_clock to steady_clock
2017-12-27 16:47:16 +01:00
Jan 7af86685e3 when upgrading from 3.1 LastHeartBeatAcked could also have been missing, when the 3.1 cluster had not run for long enough (#3974) 2017-12-08 17:33:37 +01:00
Kaveh Vahedipour c300eee5f0 minor (#3813) 2017-11-27 18:22:13 +01:00
Kaveh Vahedipour 7b80deb5cc Fixed object assignment operator for agency's key value store (#3701)
* Fixed object assignment operator for agency's key value store
* Node's toJson is now actually toJson. getString should be used for string extractions
* adjust agency's documentation (clarify precondition)
2017-11-17 15:49:40 +01:00
Kaveh Vahedipour 255d90d26a cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency (#3709)
This is the HealthRecord upgrade patch.
2017-11-17 10:14:14 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism
* Allowing canceling of ongoing actions
* refactored asyncjobmanager
* refactoring some code
* adding read-only flag
* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode
* fixing "createsANewDatabaseWithAnInvalidUser"
* auth = off does not longer make everyone superuser
* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
Michael Hackstein 15d9a4be5f Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510) 2017-10-25 18:03:24 +02:00
Simon Grätzer 7c31960cf2 Feature/async failover (#3451) 2017-10-18 23:59:29 +02:00
Max Neunhöffer 9a2385b941 Add host id detection and show in /_admin/cluster/Health. (#3389) 2017-10-11 12:42:44 +02:00
Kaveh Vahedipour 627f344266 fixed a bug, where when servers failed, when also agency leadership c… (#3189)
* fixed a bug, where when servers failed, when also agency leadership changes
* redid entire design of checkDBServers/checkCoordinators.
* comparison in supervision must be between oldPersisted and newHealth
* UI stuff
* UI stuff
* FailedServer test needed adjustment
* Hopefully final round
* fixed supervision failure detection
* FailedServer tests back to origin devel
* oldNot documented among preconditions in Agency HTTP API docs
* changed only look for status updated
* non action line in api-cluster
2017-09-07 16:10:23 +02:00
Kaveh Vahedipour 00650e6a3f Bug fix/agency mt fixes (#3158)
* added debugging methods
* try to fix invalid access in case of error
* remove unused members
* bugfixes and comments
* all agency fixes in
* merge bug
* partially unguarded Agent::lead fixed
* all agency fixes in
* added nrBlocked to thread startup eval
* added nrBlocked to thread startup eval
* recombination of cases in State::get
* some maps replaced with unordered_maps
* optimized maps some
2017-08-30 10:43:51 +02:00
Andreas Streichardt 8e15412e06 Wait for supervision node to prevent races 2017-06-09 15:52:29 +02:00
jsteemann 2930ab6b57 cppcheck 2017-05-15 22:39:16 +02:00
Andreas Streichardt fe59502848 Fix server health 2017-05-11 12:20:15 +02:00
Kaveh Vahedipour de77b5ec7a getting rid of exceptions in supervision 2017-05-10 17:50:31 +02:00
Kaveh Vahedipour b0e7ce40f0 avoid exceptions in supervision main thread when running without cluster 2017-05-04 14:37:03 +02:00
Kaveh Vahedipour 68efba18e8 keep agencyPrefix, when non set 2017-04-26 15:32:26 +02:00
jsteemann 4289105eb3 fix shutdown issue 2017-04-25 16:09:01 +02:00
Kaveh Vahedipour 09a6888d14 attempt at fixing shutdown bug on mac os x 2017-04-24 10:45:54 +02:00
jsteemann ea8496f1a5 cppcheck 2017-04-21 20:19:36 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour 4cc830b0df merge from 3.1 2017-02-20 20:05:52 +01:00
jsteemann b3ac54d065 remove global namespace include 2017-02-13 13:03:33 +01:00
jsteemann d024a6d00a remove logging for non-topics 2017-02-10 09:32:50 +01:00
Andreas Streichardt 8349f56e40 Properly check return valiue 2017-02-07 15:15:56 +01:00
Kaveh Vahedipour 8d66d69f83 supervision handles coordinator demise correctly 2017-02-07 11:29:37 +01:00
Kaveh Vahedipour f3cb1307a5 3.1 fixes backported to devel 2017-02-03 10:48:25 +01:00
jsteemann fa917937c4 do not use namespaces in header files 2017-02-01 13:41:31 +01:00
Kaveh Vahedipour 3f3633bd2c supervision to proper preconditioning of jobs on plan 2017-01-27 15:29:22 +01:00
Kaveh Vahedipour c4bff477a6 wrong persistence of status 2017-01-24 12:52:31 +01:00
Kaveh Vahedipour cfbdaff0a8 Back in add follower 2017-01-23 09:39:32 +01:00
Kaveh Vahedipour 163e0158dc before cppcheck enthusiasts start slacking :) 2017-01-20 15:22:30 +01:00
Kaveh Vahedipour d2760f4ef1 pushing avoidServers property 2017-01-20 15:15:03 +01:00
Kaveh Vahedipour bbb45ca397 Correct depiction of servers health status 2017-01-20 09:17:04 +01:00
Kaveh Vahedipour eb661f95f2 Merge branch 'devel' of https://github.com/arangodb/arangodb into devel 2017-01-18 17:26:54 +01:00
Kaveh Vahedipour f47b3b3c9d transient heartbeats 2017-01-18 17:26:45 +01:00
jsteemann 73da10a7e7 remove unused variable 2017-01-18 13:50:07 +01:00
Kaveh Vahedipour aaee2f9e61 transient heartbeats 2017-01-18 13:43:33 +01:00
Kaveh Vahedipour 879102117d more replicationTest 2017-01-16 15:43:32 +01:00
Kaveh Vahedipour a75b3624de resilience move ok again? 2017-01-16 12:09:21 +01:00