* Avoid delays caused by invalidateCurrent and invalidatePlan.
* Do loadPlan and loadCurrent asynchronously in the background.
* Reload server mappings on Plan Version change.
* Fail a MoveShard job to a FAILED server.
* Better logic for AddFollower/RemoveFollower scheduling.
* Abort MoveShard (leader) in case of a FAILED server in Plan.
* Wait for statistics collections before doing stuff in tests.
cleanOutServer, moveShard, failover and the like.
* Abort MoveShard for follower if FAILED server in Plan.
* Take resigned servers into account when checking for health.
* CHANGELOG.
running compact() in the same transaction will only increase the data size on disk due to
RocksDB not being able to remove any documents physically due to the snapshot that is
taken at transaction start.
This change also exposes db.<collection>.compact() in arangosh, in order to manually
run a compaction on the data range of a collection should it be needed for maintenance.
* Ignore satellite collections in shrinkCluster in agency.
* Abort RemoveFollower job if not enough in-sync followers or leader failure.
* Break quick wait loop in supervision if leadership is lost.
* In case of resigned leader, set isReady=false in clusterInventory.
* Fix catch tests.
* Do not schedule Coordinators in Plan.
* Finish failed server when server is no longer in health.
* Fix removeServer checks.
Check that server is no longer in use before removing it. Give 60s
waiting time for condition to be met. Also observe agency lock.
* Finish FailedFollower job if server no longer follower.
This can happen because RemoveFollower was faster.
* Only use GOOD servers as replacement followers.
* Fix AddFollower for satellite collections.
* Fix RemoveServer for satellite collections.
* MoveShard handles moves from leader to followers
* Prepare CleanoutServer and FailedServer for satellite collections.
* More sorting out of AddFollower and RemoveFollower.
* Fix RemoveFollower job w.r.t. choice of follower to remove.
* Fix message.
* kill your own sub jobs, please
* Added preconditions to payloads for supervision's job finishers
* Improve logging.
* Add agency diagnostics to failed move shard test, start.
* Add coordinator agency diagnostics.
* Remove warning.
* Add changelog entry.
* Add agency diagnostics if things go sour with move shard.
* Add agency diags when things go wrong 2.
* API /_api/agency/state: back to old format.
* Fix Windows compilation.
* handle aborts in supervision and wait for the last Raft log to be committed
* tests compiling, 2 failing for valid reasons
* Correctly report TRI_ERROR_CLUSTER_CONNECTION_LOST as 503.
* FailedLeader/FailedFollower cannot continue when aborting blocks
* Updated CleanoutServerTests. Exclude servers in ToBeCleanedServers. Allow bad servers as new follower.
* Prefer good servers.
* Removed copy, sort and binary_search for a list of ~10 elements.
* Fix move shard bug with compare.
* MoveShard fixes, expansion of doForAllShards
* Count only GOOD servers in actualReplicationFactor.
* Make RemoveFollower remove broken servers.
* Precondition on Plan Version for updating Current as leader.
* CleanupServer to evict server from ToBeCleaned, when aborting
* cleanoutserver with payload in finish
* Use static string for ToBeCleanedOut.
* Fixed typo in log message.
* Change warning level. If a MoveShard job is aborted and we can no longer roll back, then we issue a WARNING rather than a DEBUG log message.
* Another typo and log level.
* Start to fix unit tests.
* Does not make sense for AddFollowerTest to have a FAILED leader.
* Only count GOOD followers in AddFollower.
* Fix AddFollowerTest.
* Report precondition failed in MoveShard follower case.
* Add CHANGELOG.
* Begin work on repair-dsl suite to run with data, too
* Use and check data in all tests
* Fixed jslint errors
* Added data to moving-shards-cluster test
* Added additional asserts during createBrokenClusterState()
* Improved failure messages
* Minor cleanup
* Greylist affected tests
* Un-greylist resilience tests, as the fix for moving leaders is now merged
* Prevent "Duplicate testsuite" error
* Added missing require
* Backport active-failover fix for Windows into 3.4
* Backport stop/resume for Windows from devel
* Backport changes from devel into tests also
* Fix tests
* Remove forgotten whitespace
* Fixed bug where the Foxxmaster doesn't reset jobs after a crash when it should, or a non-master coordinator removes jobs in progress during startup
* Added a regression test
* Removed foxxmaster test from greylist
* Updated CHANGELOG
* Fixed non-maintainer compile
* added check for empty scheduler
* removed log, old is 1 not 0
* require running in this thread
* test
* added isDirect to callback
* signature fixed
* added drain
* added allowDirectHandling
* disabled for testing
* Add ExecContextScope object to direct call.
* try alternate initialization of ExecContextScope
* remove ExecContextScope, no help. try _fifoSize as part of direct decision.
* strand management to minimize reuse of same strand per listen socket
* blind attempt to address Jenkins shutdown lock up. may remove quickly.
* add filename and line to existing error log message
* Adjust queueOperation() to stop accepting items once isStopping() becomes true.
* revert previous check-in to MMFilesCollectorThread.cpp
* big reformat
* fixed merge conflicts
* Add CHANGELOG entry.