* Log when queues become filled up or are completely filled.
* Added latency for pushing.
* lowered log level.
* clever logging. and mutex. :(
* Reset local clock.
* Try to fix mac compile.
* Improve logging logic for half full and full queue.
* CHANGELOG.
* Let only one shard per collection, not per DBServer, be responsible for initializing the cursor (and for shutdown)
Reverted assertion
Changed parameter to const&
* Fixed erroneous merge
* Style correction
* probably fixed a bug where DB servers reported a None slice in the agency for newly created databases
* exempt none case from user experience
* also harden local collection lookups
* Avoid delays caused by invalidateCurrent and invalidatePlan.
* Do loadPlan and loadCurrent asynchronously in the background.
* Reload server mappings on Plan Version change.
* Only hesitate between phase 1 and 2 if phase 1 took long.
* Get rid of warning in agency (_id attribute filtered out).
* Resolve externals.
* CHANGELOG.
* Fail a MoveShard job to a FAILED server.
* Better logic for AddFollower/RemoveFollower scheduling.
* Abort MoveShard (leader) in case of a FAILED server in Plan.
* Wait for statistics collections before doing stuff in tests.
cleanOutServer, moveShard, failover and the like.
* Abort MoveShard for follower if FAILED server in Plan.
* Take resigned servers into account when checking for health.
* CHANGELOG.
* adding agent functionality
* agency route working
* coordinator route for acquiring agency dumps
* remove logging remains and add superuser requirements to API
* Update State.cpp
* guard route
* change we can believe in
running compact() in the same transaction will only increase the data size on disk due to
RocksDB not being able to remove any documents physically due to the snapshot that is
taken at transaction start.
This change also exposes db.<collection>.compact() in the arangosh, in order to manually
run a compaction on the data range of a collection should it be needed for maintenance.
* job must not copy snapshots
* Node correct empty children
* checked all hasAsChildren sites
* No copy in operator() for node.
* Don't spam log.
* const operator too
* full path to missing key in agency
* the key is missing
* Another info level to DEBUG from INFO.
* Increase timeouts of MoveShard and CleanOutServer agency jobs.
* CHANGELOG.
truncating instead of deleting introduced the possibility of the collection's indexes continuing to exist with different ids on the slave than on the master, leading to potential follow-up problems
* Ignore satellite collections in shrinkCluster in agency.
* Abort RemoveFollower job if not enough in-sync followers or leader failure.
* Break quick wait loop in supervision if leadership is lost.
* In case of resigned leader, set isReady=false in clusterInventory.
* Fix catch tests.
* Do not schedule Coordinators in Plan.
* Finish failed server when server is no longer in health.
* Fix removeServer checks.
Check that server is no longer in use before removing it. Give 60s
waiting time for condition to be met. Also observe agency lock.
* Finish FailedFollower job if server no longer follower.
This can happen because RemoveFollower was faster.
* Only use GOOD servers as replacement followers.
* Fix AddFollower for satellite collections.
* Fix RemoveServer for satellite collections.
* MoveShard handles moves from leader to followers
* Prepare CleanoutServer and FailedServer for satellite collections.
* More sorting out of AddFollower and RemoveFollower.
* Fix RemoveFollower job w.r.t. choice of follower to remove.
* Fix message.
* kill your own sub jobs, please
* Added preconditions to payloads for supervision's job finishers
* Improve logging.
* Add agency diagnostics to failed move shard test, start.
* Add coordinator agency diagnostics.
* Remove warning.
* Add changelog entry.
* Add agency diagnostics if things go sour with move shard.
* Add agency diags when things go wrong 2.
* API /_api/agency/state: back to old format.
* Fix Windows compilation.
* handle aborts in supervision and wait for the last Raft log to be committed
* tests compiling, 2 failing for valid reasons
* Correctly report TRI_ERROR_CLUSTER_CONNECTION_LOST as 503.
* FailedLeader/FailedFollower cannot continue when aborting blocks
* Updated CleanoutServerTests. Exclude servers in ToBeCleanedServers. Allow bad servers as new follower.
* Prefer good servers.
* Removed copy, sort and binary_search for a list of ~10 elements.
* Fix move shard bug with compare.
* MoveShard fixes, expansion of doForAllShards
* Count only GOOD servers in actualReplicationFactor.
* Make RemoveFollower remove broken servers.
* Precondition on Plan Version for updating Current as leader.
* CleanupServer to evict server from ToBeCleaned, when aborting
* cleanoutserver with payload in finish
* Use static string for ToBeCleanedOut.
* Fixed typo in log message.
* Change warning level. If a MoveShard job is aborted and we can no longer roll back, then we issue a WARNING rather than a DEBUG log message.
* Another typo and log level.
* Start to fix unit tests.
* Does not make sense for AddFollowerTest to have a FAILED leader.
* Only count GOOD followers in AddFollower.
* Fix AddFollowerTest.
* Report precondition failed in MoveShard follower case.
* Add CHANGELOG.
* added missing return statements
* only spend up to 10 seconds for initially fetching the list of collections in arangosh
fetching the list of collections is a blocking operation, and the default timeout for this is very high.
If the server is blocked for whatever reason, the shell is unusable until the collections list request returns.
To avoid this, the initial request is limited to 10 seconds, so the shell can be used afterwards.
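The idea of bounding a blocking startup request can be sketched as follows. This is an illustration only, not how arangosh implements it: `fetch_with_deadline` and `slow_fetch` are hypothetical names, and the thread-based waiting is an assumption chosen to keep the example self-contained.

```python
import concurrent.futures
import time


def fetch_with_deadline(fetch, timeout_seconds=10.0):
    """Run a blocking fetch, but give up waiting after timeout_seconds.

    Returns the fetch result, or None if the deadline passed first.
    Hypothetical helper: the 10 second default mirrors the limit named
    in the commit message, everything else is an assumption.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fetch)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # The caller stays usable; the background call is not cancelled
        # and simply finishes (or fails) on its own later.
        return None
    finally:
        # Do not block here waiting for a slow fetch to complete.
        executor.shutdown(wait=False)


def slow_fetch(delay_seconds):
    """Stand-in for a server that answers late (hypothetical)."""
    time.sleep(delay_seconds)
    return ["late"]
```

A caller that gets `None` back can start with an empty collection list and fetch it again later, which matches the intent of keeping the shell usable.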
* if an index cannot be used for sorting, its sort
cost was previously returned as 0. this in fact favored
indexes that can be used for filtering but not for sorting
over indexes that can be used for both.
this change reports the sort cost for indexes that
cannot be used for sorting as n * log(n), where n is the
number of documents that the optimizer expects to come out of the
index after filtering
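A minimal sketch of the sort cost rule described above. The function name `sort_cost`, the `sorted_index_cost` parameter, and the choice of a base-2 logarithm are assumptions for illustration; the commit message only states n * log(n) and does not say what an index that supports sorting is charged.

```python
import math


def sort_cost(can_sort, n, sorted_index_cost=0.0):
    """Estimated cost of producing sorted output from an index.

    can_sort: whether the index can deliver documents in sort order.
    n: number of documents expected to come out of the index after
       filtering.
    sorted_index_cost: assumed cost when the index already sorts
       (hypothetical parameter, not taken from the source).
    """
    if can_sort:
        return sorted_index_cost
    if n <= 1:
        # Nothing to sort for zero or one document.
        return 0.0
    # A separate sort step is needed: charge n * log(n).
    # The base of the logarithm is an assumption.
    return n * math.log2(n)
```

With a cost of 0 for non-sorting indexes, a filter-only index looked at least as cheap as one that filters and sorts. Charging n * log(n) makes the extra sort step visible when the two are compared.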
* Backport active-failover fix for Windows into 3.4
* Backport stop/resume for Windows from devel
* Backport changes from devel into tests also
* Fix tests
* Remove forgotten whitespace
* Fixed bug where the Foxxmaster doesn't reset jobs after a crash when it should, or a non-master coordinator removes jobs in progress during startup
* Added a regression test
* Removed foxxmaster test from greylist
* Updated CHANGELOG
* Fixed non-maintainer compile
* added check for empty scheduler
* removed log, old is 1 not 0
* require running in this thread
* test
* added isDirect to callback
* signature fixed
* added drain
* added allowDirectHandling
* disabled for testing
* Add ExecContextScope object to direct call.
* try alternate initialization of ExecContextScope
* remove ExecContextScope, no help. try _fifoSize as part of direct decision.
* strand management to minimize reuse of same strand per listen socket
* blind attempt to address Jenkins shutdown lock up. may remove quickly.
* add filename and line to existing error log message
* Adjust queueOperation() to stop accepting items once isStopping() becomes true.
* revert previous check-in to MMFilesCollectorThread.cpp
* big reformat
* fixed merge conflicts
* Add CHANGELOG entry.
* should not neglect the initial async request for read lock acquisition
* fixed nullptr
* correct timeout
* corrected error handling in getReadLock
* reverted "test fix"
* should remove async request from ClusterCom