Kaveh Vahedipour
7b80deb5cc
Fixed object assignment operator for agency's key value store ( #3701 )
...
* Fixed object assignment operator for agency's key value store
* Node's toJson is now actually toJson. getString should be used for string extractions
* adjust agency's documentation (clarify precondition)
2017-11-17 15:49:40 +01:00
Kaveh Vahedipour
255d90d26a
cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency ( #3709 )
...
This is the HealthRecord upgrade patch.
2017-11-17 10:14:14 +01:00
Simon Grätzer
ee8209943f
Missing things for active / passive ( #3578 )
...
* Switching from ttl to supervision based failover mechanism
* Allowing canceling of ongoing actions
* refactored asyncjobmanager
* refactoring some code
* adding read-only flag
* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode
* fixing "createsANewDatabaseWithAnInvalidUser"
* auth = off no longer makes everyone superuser
* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
Michael Hackstein
15d9a4be5f
Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies ( #3510 )
2017-10-25 18:03:24 +02:00
Simon Grätzer
7c31960cf2
Feature/async failover ( #3451 )
2017-10-18 23:59:29 +02:00
Max Neunhöffer
9a2385b941
Add host id detection and show in /_admin/cluster/Health. ( #3389 )
2017-10-11 12:42:44 +02:00
Kaveh Vahedipour
627f344266
fixed a bug, where when servers failed, when also agency leadership c… ( #3189 )
...
* fixed a bug that occurred when servers failed while agency leadership also changed
* redid entire design of checkDBServers/checkCoordinators.
* comparison in supervision must be between oldPersisted and newHealth
* UI stuff
* UI stuff
* FailedServer test needed adjustment
* Hopefully final round
* fixed supervision failure detection
* FailedServer tests back to origin devel
* oldNot documented among preconditions in Agency HTTP API docs
* changed only look for status updated
* non action line in api-cluster
2017-09-07 16:10:23 +02:00
Kaveh Vahedipour
00650e6a3f
Bug fix/agency mt fixes ( #3158 )
...
* added debugging methods
* try to fix invalid access in case of error
* remove unused members
* bugfixes and comments
* all agency fixes in
* merge bug
* partially unguarded Agent::lead fixed
* all agency fixes in
* added nrBlocked to thread startup eval
* added nrBlocked to thread startup eval
* recombination of cases in State::get
* some maps replaced with unordered_maps
* optimized maps some
2017-08-30 10:43:51 +02:00
Andreas Streichardt
8e15412e06
Wait for supervision node to prevent races
2017-06-09 15:52:29 +02:00
jsteemann
2930ab6b57
cppcheck
2017-05-15 22:39:16 +02:00
Andreas Streichardt
fe59502848
Fix server health
2017-05-11 12:20:15 +02:00
Kaveh Vahedipour
de77b5ec7a
getting rid of exceptions in supervision
2017-05-10 17:50:31 +02:00
Kaveh Vahedipour
b0e7ce40f0
avoid exceptions in supervision main thread when running without cluster
2017-05-04 14:37:03 +02:00
Kaveh Vahedipour
68efba18e8
keep agencyPrefix, when non set
2017-04-26 15:32:26 +02:00
jsteemann
4289105eb3
fix shutdown issue
2017-04-25 16:09:01 +02:00
Kaveh Vahedipour
09a6888d14
attempt at fixing shutdown bug on mac os x
2017-04-24 10:45:54 +02:00
jsteemann
ea8496f1a5
cppcheck
2017-04-21 20:19:36 +02:00
Kaveh Vahedipour
1f81ce28b0
merge in cpp & js from 3.1.18 yet to do tests
2017-04-21 15:41:05 +02:00
Kaveh Vahedipour
4cc830b0df
merge from 3.1
2017-02-20 20:05:52 +01:00
jsteemann
b3ac54d065
remove global namespace include
2017-02-13 13:03:33 +01:00
jsteemann
d024a6d00a
remove logging for non-topics
2017-02-10 09:32:50 +01:00
Andreas Streichardt
8349f56e40
Properly check return value
2017-02-07 15:15:56 +01:00
Kaveh Vahedipour
8d66d69f83
supervision handles coordinator demise correctly
2017-02-07 11:29:37 +01:00
Kaveh Vahedipour
f3cb1307a5
3.1 fixes backported to devel
2017-02-03 10:48:25 +01:00
jsteemann
fa917937c4
do not use namespaces in header files
2017-02-01 13:41:31 +01:00
Kaveh Vahedipour
3f3633bd2c
supervision to proper preconditioning of jobs on plan
2017-01-27 15:29:22 +01:00
Kaveh Vahedipour
c4bff477a6
wrong persistence of status
2017-01-24 12:52:31 +01:00
Kaveh Vahedipour
cfbdaff0a8
Back in add follower
2017-01-23 09:39:32 +01:00
Kaveh Vahedipour
163e0158dc
before cppcheck enthusiasts start slacking :)
2017-01-20 15:22:30 +01:00
Kaveh Vahedipour
d2760f4ef1
pushing avoidServers property
2017-01-20 15:15:03 +01:00
Kaveh Vahedipour
bbb45ca397
Correct depiction of servers health status
2017-01-20 09:17:04 +01:00
Kaveh Vahedipour
eb661f95f2
Merge branch 'devel' of https://github.com/arangodb/arangodb into devel
2017-01-18 17:26:54 +01:00
Kaveh Vahedipour
f47b3b3c9d
transient heartbeats
2017-01-18 17:26:45 +01:00
jsteemann
73da10a7e7
remove unused variable
2017-01-18 13:50:07 +01:00
Kaveh Vahedipour
aaee2f9e61
transient heartbeats
2017-01-18 13:43:33 +01:00
Kaveh Vahedipour
879102117d
more replicationTest
2017-01-16 15:43:32 +01:00
Kaveh Vahedipour
a75b3624de
resilience move ok again?
2017-01-16 12:09:21 +01:00
Kaveh Vahedipour
d30458b011
Supervision should not exit on empty plan collection
2017-01-10 16:53:24 +01:00
Kaveh Vahedipour
331d074ebe
more information from ClusterInfo's dropCollectionCoordinator
2017-01-10 16:25:00 +01:00
Kaveh Vahedipour
90c18e4914
waitFor will report more paranoid
2017-01-10 13:53:31 +01:00
Kaveh Vahedipour
55985ed5de
missing prototypes
2017-01-09 10:38:34 +01:00
Kaveh Vahedipour
ab6678eb1f
need to fix tests first
2016-12-29 16:25:30 +01:00
Kaveh Vahedipour
ce687562f2
less rigid expectation on smooth operations through agency comm under worst case scenarios.
2016-12-28 10:32:20 +01:00
Kaveh Vahedipour
fcdc7601f3
Merge branch 'devel' of https://github.com/arangodb/arangodb into devel
2016-12-23 14:06:34 +01:00
Max Neunhoeffer
b6ad88d7f8
Do not getUniqueIds when not leading.
2016-12-23 14:02:44 +01:00
Kaveh Vahedipour
770a49d8ca
supervision should only collect unique ids in leadership
2016-12-23 13:27:59 +01:00
Kaveh Vahedipour
facd58b8b5
replication factor is enforced after failedleader to add new follower
2016-12-22 13:12:16 +01:00
Kaveh Vahedipour
8924cf7852
let's not count failed db servers in replication factor fix
2016-12-21 15:49:56 +01:00
Kaveh Vahedipour
12e54902df
agency's supervision must wait grace period after becoming leader before acting on db server failure
2016-12-21 11:17:41 +01:00
Max Neunhoeffer
985ccaeb70
Get rid of Supervision::wakeUp().
2016-12-20 10:19:24 +01:00