* adding agent functionality
* agency route working
* coordinator route for acquiring agency dumps
* remove logging remains and add superuser requirements to API
* Update State.cpp
* guard route
* change we can believe in
running compact() in the same transaction will only increase the data size on disk, because
RocksDB cannot physically remove any documents while the snapshot taken at transaction
start is still open.
This change also exposes db.<collection>.compact() in arangosh, so that a compaction of a
collection's data range can be run manually should it be needed for maintenance (example below).
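A minimal arangosh sketch of the new call; the collection name "test" is just an example:

    // arangosh (RocksDB engine): create an example collection, insert a
    // document, then manually compact the collection's data range
    db._create("test");
    db.test.save({ value: 1 });
    db.test.compact();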
* job must not copy snapshots
* Node: correct handling of empty children
* checked all hasAsChildren sites
* No copy in operator() for node.
* Don't spam log.
* no copy in the const operator() too
* full path to missing key in agency
* the key is missing
* Lower another log message from INFO to DEBUG.
* Increase timeouts of MoveShard and CleanOutServer agency jobs.
* CHANGELOG.
Truncating instead of deleting made it possible for the collection's indexes to continue to exist with different ids on the slave than on the master, leading to potential follow-up problems.
* Ignore satellite collections in shrinkCluster in agency.
* Abort RemoveFollower job if not enough in-sync followers or leader failure.
* Break quick wait loop in supervision if leadership is lost.
* In case of resigned leader, set isReady=false in clusterInventory.
* Fix catch tests.
* Do not schedule Coordinators in Plan.
* Finish failed server when server is no longer in health.
* Fix removeServer checks.
Check that the server is no longer in use before removing it. Give 60s
waiting time for the condition to be met. Also observe the agency lock.
* Finish FailedFollower job if server no longer follower.
This can happen because RemoveFollower was faster.
* Only use GOOD servers as replacement followers.
* Fix AddFollower for satellite collections.
* Fix RemoveServer for satellite collections.
* MoveShard handles moves from leader to followers
* Prepare CleanoutServer and FailedServer for satellite collections.
* More sorting out of AddFollower and RemoveFollower.
* Fix RemoveFollower job w.r.t. choice of follower to remove.
* Fix message.
* kill your own sub-jobs, please
* Added preconditions to payloads for supervision's job finishers
* Improve logging.
* Add agency diagnostics to failed move shard test, start.
* Add coordinator agency diagnostics.
* Remove warning.
* Add changelog entry.
* Add agency diagnostics if things go sour with move shard.
* Add agency diagnostics when things go wrong, part 2.
* API /_api/agency/state: back to old format.
* Fix Windows compilation.
* handle aborts in supervision and wait for the last Raft log to be committed
* tests compiling, 2 failing for valid reasons
* Correctly report TRI_ERROR_CLUSTER_CONNECTION_LOST as 503.
* FailedLeader/FailedFollower cannot continue when aborting blocks.
* Updated CleanoutServerTests. Exclude servers in ToBeCleanedServers. Allow bad servers as new follower.
* Prefer good servers.
* Removed copy, sort and binary_search for a list of ~10 elements.
* Fix move shard bug with compare.
* MoveShard fixes, expansion of doForAllShards
* Count only GOOD servers in actualReplicationFactor.
* Make RemoveFollower remove broken servers.
* Precondition on Plan Version for updating Current as leader.
* CleanOutServer to evict server from ToBeCleanedServers when aborting
* CleanOutServer with payload in finish
* Use static string for ToBeCleanedOut.
* Fixed typo in log message.
* Change warning level. If a MoveShard job is aborted and we can no longer roll back, then we issue a WARNING rather than a DEBUG log message.
* Another typo and log level.
* Start to fix unit tests.
* Does not make sense for AddFollowerTest to have a FAILED leader.
* Only count GOOD followers in AddFollower.
* Fix AddFollowerTest.
* Report precondition failed in MoveShard follower case.
* Add CHANGELOG.
* added missing return statements
* only spend up to 10 seconds for initially fetching the list of collections in arangosh
fetching the list of collections is a blocking operation, and the default timeout for it is very high.
If the server is blocked for whatever reason, the shell is unusable until the collections list request returns.
To avoid this, the initial request is now limited to 10 seconds, so the shell can be used afterwards (see the sketch below).
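A generic JavaScript sketch of the pattern, not the actual client code; fetchCollections is a hypothetical stand-in for the initial collections request:

    // bound a potentially blocking request so the caller stays responsive
    function withTimeout(promise, ms) {
      const timeout = new Promise((_, reject) =>
        setTimeout(() => reject(new Error("timed out")), ms));
      return Promise.race([promise, timeout]);
    }

    // hypothetical request that never answers (simulates a blocked server)
    const fetchCollections = () => new Promise(() => {});

    withTimeout(fetchCollections(), 10000)
      .catch(() => console.log("no collection list yet, shell stays usable"));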
* if an index cannot be used for sorting, its sort
cost was previously returned as 0. this in fact favors
indexes that can be used for filtering but not for sorting
over indexes that can be used for both.
this change reports the sort cost of indexes that
cannot be used for sorting as n * log(n), where n is the
number of documents the optimizer expects to come out of the
index after filtering (see the sketch below)
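A hedged sketch of the new cost rule; the function name is made up and the log base is a guess, not the optimizer's actual code:

    // n = number of documents expected to come out of the index after filtering
    function sortCost(indexSupportsSort, n) {
      if (indexSupportsSort || n <= 1) {
        return 0; // results already arrive in sorted order, or nothing to sort
      }
      return n * Math.log2(n); // a downstream sort step is still required
    }

    console.log(sortCost(true, 1000));  // 0
    console.log(sortCost(false, 1000)); // ~9966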
* Backport active-failover fix for Windows into 3.4
* Backport stop/resume for Windows from devel
* Backport changes from devel into tests also
* Fix tests
* Remove forgotten whitespaces
* Fixed a bug where the Foxxmaster would not reset jobs after a crash when it should, or where a non-master coordinator would remove in-progress jobs during startup
* Added a regression test
* Removed foxxmaster test from greylist
* Updated CHANGELOG
* Fixed non-maintainer compile
* added check for empty scheduler
* removed log, old is 1 not 0
* require running in this thread
* test
* added isDirect to callback
* signature fixed
* added drain
* added allowDirectHandling
* disabled for testing
* Add ExecContextScope object to direct call.
* try alternate initialization of ExecContextScope
* remove ExecContextScope, no help. try _fifoSize as part of direct decision.
* strand management to minimize reuse of same strand per listen socket
* blind attempt to address Jenkins shutdown lock up. may remove quickly.
* add filename and line to existing error log message
* Adjust queueOperation() to stop accepting items once isStopping() becomes true.
* revert previous check-in to MMFilesCollectorThread.cpp
* big reformat
* fixed merge conflicts
* Add CHANGELOG entry.
* should not neglect the initial async request for read lock acquisition
* fixed nullptr
* correct timeout
* corrected error handling in getReadLock
* reverted "test fix"
* should remove async request from ClusterComm
* Do nothing in phaseTwo if leader has not been touched.
* Drop follower if it refuses to cooperate.
This is important since a DBserver that is a follower for a shard will,
after a reboot, think that it is a leader, at least for a short amount
of time. If it came back quickly enough, the leader might not have
noticed that it was away.
* Initialize theLeader non-empty, thus not assuming leadership.
* Correct ClusterInfo to look into Target/CleanedServers.
* Prevent usage of to be cleaned out servers in new collections.
* After a restart, do not assume to be leader for a shard.
* agents' configuration is obtained from the leader's configuration
* corrections in Supervision for advertised endpoints
* change log
* Updated Documentation for cluster/health.
* Unified naming convention.
* Fixed missing update of volatile fields.
* Set version in right order.
* Removed debug output.
* Fixed jslint - missing ;
* Fix index creation in cluster.
Simplify and correct error handling logic in ensureIndexCoordinator.
* After index creation, wait until index appears.
We wait until the Supervision has removed the isBuilding flag and
the coordinator has reloaded the Plan.
* More index handling fixes.
* Explicitly remove isBuilding flag in coordinator (again).
* Fix order of arguments in REPLACE call.
* Take out debugging output again.
* Fix catch tests by holding mutex shorter.
* Better mutex handling in ClusterInfo.
* issue 506.3: backport 3.4: use camel-case configuration parameter names consistently, add a configuration version property to iresearch view meta
* backport: ensure meta version is supported
* backport: hide 'version' property from non-persistence json
* issue 506.2: backport 3.4: add optimization to not reexecute a primary-key filter if a match was already found
* backport: explicitly check type of instance of the primary-key filter
* backport: return non-null prepared filter and convert check to assert
* Ungreylist move shard test.
* Move leader shard: wait until all but the old leader are in sync.
* Increase moveShard timeout to 10000 seconds.
* Add CHANGELOG.
* Fix compilation.
* Fix a misleading comment.
* issue 153: ensure views are dropped in Agency when database is dropped in cluster, minor fixes
* backport: add test to ensure views are dropped when database is dropped from plan, fix some issues in ClusterInfo
* optimize primary key lookups in ArangoSearch
* fix test
* Add JS tests
* temporarily comment out optimizations
* Added some DEBUG output for replication rest handler
* Some more debug logging.
* Increased the priority of the ReplicationHandler. This way we will not get stuck with locks that cannot be canceled. Also cancel the lock on the correct database.
* Added extensive log output for replication things
* Added tombstones to RestReplicationHandler. In a very unlikely case, the cancel of a lock can be executed BEFORE the code that actually registers the lock; in this case we now write a tombstone and do not lock (see the sketch below).
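A generic sketch of the tombstone idea in JavaScript; the real implementation lives in the C++ RestReplicationHandler, and these names are illustrative:

    const locks = new Map();      // lock id -> state
    const tombstones = new Set(); // ids whose cancel arrived before registration

    function cancelLock(id) {
      if (!locks.delete(id)) {
        tombstones.add(id);       // cancel raced ahead of the registration
      }
    }

    function registerLock(id) {
      if (tombstones.has(id)) {   // already cancelled: refuse to lock
        tombstones.delete(id);
        return false;
      }
      locks.set(id, "locked");
      return true;
    }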
* Revert "Added extensive log output for replication thins"
This reverts commit 6d4e37ea1e59e3b3457336019cc7dbc4c979504d.
* Added extensive log output for replication things, now at ERR level instead of MAINTAINER only
* Now actually use hours for synchronization
* React to errors under soft lock if they show up.
* Added a retry loop to increase the read-lock timer.
* Added more timing output in RocksDB collection internals to figure out why the followers are dropped
* Tweaked RocksDB options
* Revert "Tweaked RocksDB options"
This reverts commit 2bf9c43280beda4792c47d079387fe5154cdd896.
* Removed debug output
* Applied all changes requested by goedderz
* Deleted unused variable
* backport of test data generation for maintenance from devel
* 3.4 working
* fix use of indexes in cluster while they are still being built
* fixed broken views
* correct 200 for ensureIndex
* merge with 3.4
* AgencyComm to handle replace in array
* supervision changes
* ClusterInfo's ensureIndex
* 3.4 ready
* timeout
* missing files from origin
* addressed neunhoef's review comments
* bogus entry
* no need to wait for current once again
* no longer necessary. done in IndexFactory now
* correct comments
* leftovers
* dead code revived
* Move CHANGELOG entry to the right place.
* Fix resign order
* Fixed a typo
* Get followers later, add TODOs
* Added a callback parameter to collection insert methods
* Get followers under the lock if necessary
* Extracted the replication of inserts into a separate method
* Move shortcut into replicate method
* Added callbacks for remove, replace and update
* Added missing overrides
* Extracted replication code from modifyLocal and removeLocal
* Update followers under lock also during replace, update, remove
* Fix changes from the last commit for update/replace
* Update comments, add asserts
* Remove changes for document-level locks that will be done in another PR
* Unify replication
* Adapt log messages to the devel ones
* Move common methods from the descendants of TransactionCollection into TransactionCollection, fix Mock on the way
* More IResearch test / mock fixes
* Relax asserts for nested transactions
* Reformat
* Fix non-babies remove and modify replication
* defense against the dark arts (nullptr in _ioContext)
* move incQueued() so that we can infer the race state of _ioContext.
* adjust to meet Jan's expectations
* jsteemann noticed that the queue count is not considered before shutdown ... bad
* add JobGuard object to manage working count. should hold shutdown a tad longer.
* TEMPORARY HACK: need to validate problem that is randomly occurring in Jenkins automation
* TEMPORARY HACK 2: trying to isolate an acceptable sequence.
* TEMPORARY HACK 3: trying to isolate an acceptable sequence.
* TEMPORARY HACK 4: so close ... seem to have all the moving parts isolated. Come on, Jenkins!
* shutdown now finishes, in an orderly fashion, everything already in the FIFO queues or active on threads, then forces any late requests to execute on the caller's thread.
* refactor arangosearch pks
* minor refactoring
* store PK as big-endian since it leads to a more compact index representation (see the sketch below)
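A small Node.js sketch of why big-endian helps, not the IResearch code itself: big-endian bytes compare like the numbers they encode, so neighboring keys share long common prefixes, which prefix-compresses well:

    // encode a BigInt primary key as 8 big-endian bytes
    function pkToBigEndian(pk) {
      const buf = Buffer.alloc(8);
      buf.writeBigUInt64BE(pk);
      return buf;
    }

    console.log(pkToBigEndian(258n)); // <Buffer 00 00 00 00 00 00 01 02>
    console.log(pkToBigEndian(259n)); // <Buffer 00 00 00 00 00 00 01 03>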
* force iresearch not to use libbfd
* fix tests
* Fix loophole.
* Fix inquiry case of id not found: 404.
* Also handle correctly in AgencyComm.
* Fix agency tests.
* Fix error handling in dropCollectionOnCoordinator.