
48 Commits

Author SHA1 Message Date
Max Neunhöffer e53966c843 Try to fix agency problems with snapshots. (#8947)
* Try to fix agency problems with snapshots.
* Abort MoveShards jobs that have the failed server as fromServer.
* Report aborts.
2019-05-16 14:41:39 +02:00
Lars Maier a1bae63cf1 [3.4] Verbose Abort Reason (#8878)
* Added reason to job abort method.

* Additional abort that is not in devel.
2019-05-01 13:54:47 +02:00
Max Neunhöffer 54f84cab92 Performance tuning for many shards. (#8577) 2019-03-29 21:34:45 +01:00
Kaveh Vahedipour ab3206486d [3.4] job must not copy snapshots (#8406)
* job must not copy snapshots
* Node correct empty children
* checked all hasAsChildren sites
* No copy in operator() for node.
* Don't spam log.
* const operator too
* full path to missing key in agency
* the key is missing
* Another info level to DEBUG from INFO.
* Increase timeouts of MoveShard and CleanOutServer agency jobs.
* CHANGELOG.
2019-03-20 17:03:19 +01:00
Max Neunhöffer 46e479376d Further supervision fixes. (#8259)
* Do not schedule Coordinators in Plan.

* Finish failed server when server is no longer in health.

* Fix removeServer checks.

Check that server is no longer in use before removing it. Give 60s
waiting time for condition to be met. Also observe agency lock.

* Finish FailedFollower job if server no longer follower.

This can happen because RemoveFollower was faster.

* Only use GOOD servers as replacement followers.

* Fix AddFollower for satellite collections.

* Fix RemoveServer for satellite collections.

* MoveShard handles moves from leader to followers

* Prepare CleanoutServer and FailedServer for satellite collections.

* More sorting out of AddFollower and RemoveFollower.

* Fix RemoveFollower job w.r.t. choice of follower to remove.

* Fix message.

* kill your own sub jobs, please

* Added preconditions to payloads for supervision's job finishers

* Improve logging.

* Add agency diagnostics to failed move shard test, start.

* Add coordinator agency diagnostics.

* Remove warning.

* Add changelog entry.

* Add agency diagnostics if things go sour with move shard.

* Add agency diags when things go wrong 2.

* API /_api/agency/state: back to old format.

* Fix Windows compilation.

* handle aborts in supervision and wait for the last Raft log to be committed

* tests compiling, 2 failing for valid reasons

* Correctly report TRI_ERROR_CLUSTER_CONNECTION_LOST as 503.

* FailedLeader/FailedFollower cannot continue when aborting blocks
2019-03-04 11:43:35 +01:00
Kaveh Vahedipour e8d39666fd fixing failedserver/leader/follower chain for mishap (#8089)
* fixing failedserver/leader/follower chain for mishap
* change log mention
2019-02-05 13:55:19 +01:00
Frank Celler 9477af198b big reformat 2018-12-26 00:57:05 +01:00
Jan 8b7400a36a cppcheck (#6856) 2018-10-12 12:40:31 +02:00
Matthew Von-Maszewski c0c149cf5b Create non-throwing wrappers for Node access in Agency (#4598)
* safety checkin of Node throw reduction.
* final round of Node throw protection. Common accessors now protected to force code to hasAsXXX() functions.
2018-04-17 10:21:14 +02:00
Simon 68442dae5a Fixing agency prefix in Agency/Job.cpp (#5039)
* Fixing some test issues and fixing the agency prefix in Agency/Job.cpp
* Making logic consistent in failed-leader/follower job
* reverting condition back to == GOOD
2018-04-09 16:21:24 +02:00
Andreas Streichardt dd89653798 typo in error message 2017-06-09 12:20:35 +02:00
Andreas Streichardt 439203dc3b Better logging 2017-05-11 12:20:15 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour 4cc830b0df merge from 3.1 2017-02-20 20:05:52 +01:00
jsteemann b3ac54d065 remove global namespace include 2017-02-13 13:03:33 +01:00
Kaveh Vahedipour f47b3b3c9d transient heartbeats 2017-01-18 17:26:45 +01:00
Kaveh Vahedipour 0df8e4e2cd isWatch no longer needed after move to arangodb agency 2016-12-16 12:26:27 +01:00
jsteemann 404e04baa4 fix art 2016-12-08 17:36:42 +01:00
Kaveh Vahedipour b930b23fc2 AddFollower jobs for newly arrived db server to satisfy replication factors 2016-12-07 16:20:47 +01:00
Kaveh Vahedipour 3a1a9c898c correct handling of distributeShardsLike in FailedFollower 2016-12-05 15:44:53 +01:00
Kaveh Vahedipour 77c8c51865 FailedFollower and Windows build problems 2016-11-30 15:39:10 +01:00
Kaveh Vahedipour 575a671fac AddFollower added to supervision 2016-11-29 17:29:44 +01:00
Kaveh Vahedipour 5c3f5f8013 AddFollower added to supervision 2016-11-29 17:28:17 +01:00
Kaveh Vahedipour f0e1168e5a Merge remote-tracking branch 'origin/devel' into FMH 2016-11-22 17:48:36 +01:00
Andreas Streichardt 63a173f002 Delete all shard move jobs when server is healthy again 2016-11-22 14:13:09 +01:00
Frank Celler e4ba82e8e9 rewrite of AgencyComm 2016-10-23 00:46:30 +02:00
Kaveh Vahedipour cf09546d93 fixed erroneous break of supervision agency updates 2016-10-07 11:01:45 +02:00
Kaveh Vahedipour c793c3ac44 FailedServer jobs can report when last FailedLeader has been processed 2016-09-22 17:23:56 +02:00
Kaveh Vahedipour 16a35ee15a multi-host agency in tests 2016-09-09 14:46:54 +02:00
Kaveh Vahedipour 85ea1d5ff9 clang-format 2016-09-06 10:01:33 +02:00
Kaveh Vahedipour f066ff9920 looks good for dangling creation of shards 2016-09-05 11:03:37 +02:00
Kaveh Vahedipour f0023d70e1 half way through unassumed leadership 2016-09-02 17:38:49 +02:00
Kaveh Vahedipour e669de0f70 removed bug into failedserver 2016-09-02 14:39:15 +02:00
Kaveh Vahedipour b3b7d7c907 failed servers are excluded from new shard creation 2016-09-02 12:37:53 +02:00
Kaveh Vahedipour 3603a6d63d failed server entry in target and plan increase until resolution 2016-09-02 09:28:08 +02:00
Kaveh Vahedipour e6ec1864c5 move-shard slightly changed order of actions 2016-06-09 12:01:44 +02:00
Max Neunhoeffer 5668e6e524 Agency changes. 2016-06-09 10:51:46 +02:00
Kaveh Vahedipour 382ac052d4 resilience green 2016-06-08 18:27:59 +02:00
Kaveh Vahedipour 3090710b31 bug in subjobs iteration 2016-06-06 17:08:22 +02:00
Kaveh Vahedipour a2af8e1176 move shards are planned correctly 2016-06-06 15:04:10 +02:00
Kaveh Vahedipour 6f62f5baa3 checking before range loops for emptiness 2016-06-02 17:22:12 +02:00
Kaveh Vahedipour f56d36d168 Merge branch 'devel' of https://github.com/arangodb/arangodb into devel 2016-06-02 12:17:40 +02:00
Kaveh Vahedipour 2e87f59218 Testing supervision 2016-06-02 12:17:35 +02:00
Jan Steemann 192caed889 fix Visual Studio compile warnings 2016-06-01 17:09:43 +02:00
Kaveh Vahedipour cc23d0df99 Cleaning out server 2016-06-01 13:44:27 +02:00
Kaveh Vahedipour f4591e3a6f hunting down exceptions in agency supervision 2016-05-31 22:28:02 +02:00
Kaveh Vahedipour 402ed3c2a3 hunting down the exception in agency 2016-05-31 21:35:07 +02:00
Kaveh Vahedipour b6e15313c3 Moving Job classes out of Supervision 2016-05-31 16:45:23 +02:00