arangodb

Commit Graph

Author	SHA1	Message	Date
Max Neunhöffer	46e479376d	Further supervision fixes. (#8259) * Do not schedule Coordinators in Plan. * Finish failed server when server is no longer in health. * Fix removeServer checks. Check that server is no longer in use before removing it. Give 60s waiting time for condition to be met. Also observer agency lock. * Finish FailedFollower job if server no longer follower. This can happen because RemoveFollower was faster. * Only use GOOD servers as replacement followers. * Fix AddFollower for satellite collections. * Fix RemoveServer for satellite collections. * MoveShard handles moves from leader to followers * Prepare CleanoutServer and FailedServer for satellite collections. * More sorting out of AddFollower and RemoveFollower. * Fix RemoveFollower job w.r.t. choice of follower to remove. * Fix message. * kill you own sub jobs, please * Added preconditions to payloads for supervision's job finishers * Improve logging. * Add agency diagnostics to failed move shard test, start. * Add coordinator agency diagnostics. * Remove warning. * Add changelog entry. * Add agency diagnostics if things go sour with move shard. * Add agency diags when things go wrong 2. * API /_api/agency/state: back to old format. * Fix Windows compilation. * handle aborts in supervision and wait for the last Raft log to be committed * tests compiling, 2 failing for valid reasons * Correctly report TRI_ERROR_CLUSTER_CONNECTION_LOST as 503. * FailedLeader /FailedFollower cannot continue, when aborting blocks	2019-03-04 11:43:35 +01:00
Kaveh Vahedipour	87e7185dd7	agency updating endpoints properly (#6643) * agency updating endpoints properly	2018-09-28 15:12:40 +02:00
Simon	45fbed497b	Supervision Job for Active Failover (#5066)	2018-04-23 12:49:41 +02:00

3 Commits