* Bug fix 3.4/bad leader report current (#7574)
* Initialize theLeader non-empty, thus not assuming leadership.
* Correct ClusterInfo to look into Target/CleanedServers.
* Prevent use of to-be-cleaned-out servers in new collections.
* After a restart, do not assume to be leader for a shard.
* Do nothing in phaseTwo if leader has not been touched. (#7579)
* Drop follower if it refuses to cooperate.
This is important since a dbserver that is follower for a shard will
after a reboot think that it is a leader, at least for a short amount
of time. If it came back quickly enough, the leader might not have
noticed that it was away.
* Added some DEBUG output for replication rest handler
* Some more debug logging.
* Increased the priority of the ReplicationHandler. This way we will not get stuck with locks that cannot be canceled. Also cancel the lock on the correct database.
* Added extensive log output for replication things
* Added tombstones to RestReplicationHandler. In a very unlikely case, the cancel of a lock can be executed BEFORE the code that actually registers the lock; in this case we now write a tombstone and do not lock.
* Revert "Added extensive log output for replication things"
This reverts commit 6d4e37ea1e59e3b3457336019cc7dbc4c979504d.
* Added extensive log output for replication things, now in ERR level instead of MAINTAINER only
* Now actually use hours for synchronization
* React to errors under soft lock if they show up.
* Added a retry loop to increase the read-lock timer.
* Added more timing output in RocksDB collection internals to figure out why the followers are dropped
* Tweaked RocksDB options
* Revert "Tweaked RocksDB options"
This reverts commit 2bf9c43280beda4792c47d079387fe5154cdd896.
* Removed debug output
* Applied all requested changes by goedderz
* Deleted unused variable
* agents' is obtained from leader's configuration
* corrections in Supervision for advertised endpoints
* change log
* Updated Documentation for cluster/health.
* Unified naming convention.
* Fixed missing update of volatile fields.
* Set version in right order.
* Removed debug output.
* Fixed jslint - missing ;
* Fix index creation in cluster.
Simplify and correct error handling logic in ensureIndexCoordinator.
* After index creation, wait until index appears.
We wait until the Supervision has removed the isBuilding flag and
the coordinator has reloaded the Plan.
* More index handling fixes.
* Directly remove isBuilding in ensureIndexCoordinator (again).
* Fix catch tests by holding mutex shorter.
* Better mutex handling in ClusterInfo.
* merged fixes from 3.4
* odd fix
* Bug fix 3.4/sync repl release thread (#6784)
* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock
* Fixed insertion of query into registry in rest replication handler.
* Removed unnecessary / false asserts as suggested in review. Fixed code comments.
* Replaced auto with a correct type as suggested in review
* Added a helper function to validate if a query is in use in the registry
* Fixed logic bug in usage of query registry
* Fixed compile issue
* Automatically transforming int -> bool in an initializer list sucks...
* Inverted boolean logic bug hidden due to int->bool being logically inverted.
* Today it seems that bools are too complicated for my brain.
* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future
* Applied changes required by @goedderz in review
* Bug fix 3.4/shorter foot in door (#7084)
* Implement `syncCollectionCatchup` in DatabaseTailingSyncer.
First stab, might not even compile.
* Fixed a typo.
* Fix a typo and a compilation problem.
* Further compilation fix.
* Implement two stage catchup.
* Two small corrections.
* Unified error messages in Synchronize shard job.
* Improved a code comment.
* Fixed autocasting bool->double and double->bool issue. That is truly one of the best features ever invented... </irony>
* Renamed doHardLock => toSoftLockOnly and inverted default value
* Merged soft/hard foot logic with Transaction splits
* Use scopeguards to cancel readlocks
* Bug fix 3.4/sync replication allow soft and hard lock (#6864)
* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock
* Fixed insertion of query into registry in rest replication handler.
* Removed unnecessary / false asserts as suggested in review. Fixed code comments.
* Replaced auto with a correct type as suggested in review
* Added a helper function to validate if a query is in use in the registry
* Fixed logic bug in usage of query registry
* Fixed compile issue
* Implemented optional 'doHardLock' parameter in the replication acquire read-lock call. A hard-lock guarantees to stop all writes, a soft-lock may not.
* Fixed compile issue
* Automatically transforming int -> bool in an initializer list sucks...
* Inverted boolean logic bug hidden due to int->bool being logically inverted.
* Today it seems that bools are too complicated for my brain.
* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future
* Applied changes required by @goedderz in review
* Renamed doHardLock => toSoftLockOnly and inverted default value
* issue 153: ensure views are dropped in Agency when database is dropped in cluster, minor fixes
* backport: add test to ensure views are dropped when database is dropped from plan, fix some issues in ClusterInfo
* optimize primary key lookups in ArangoSearch
* fix test
* Add JS tests
* temporary comment optimizations
* indexes are marked while still missing in Current
* index handling getCollection
* supervision gets indexes from isbuilding, when coordinator is gone before finishing
* seems right now
* fixed broken views
* remove junk comments
* cleanup
* node / supervision adjustments
* supervision fixes
* neunhoef remarks part i
* neunhoef remarks part ii
* neunhoef remarks part ii
* neunhoef remarks part iii
* collection's current version please
* no need to wait for current once again
* no longer necessary code
* clear comments
* delete left overs
* dead code revived
* Forward-port of bug-fix-3.4/mmfiles-replication-get-followers-under-lock
Fix resign order
Fixed a typo
Get followers later, add TODOs
Added a callback parameter to collection insert methods
Get followers under the lock if necessary
Extracted the replication of inserts into a separate method
Move shortcut into replicate method
Added callbacks for remove, replace and update
Added missing overrides
Extracted replication code from modifyLocal and removeLocal
Update followers under lock also during replace, update, remove
Fix changes from the last commit for update/replace
Update comments, add asserts
Remove changes for document-level locks that will be done in another PR
Unify replication
Adapt log messages to the devel ones
Move common methods from its descendants to TransactionCollection, fix Mock on the way
More IResearch test / mock fixes
Relax asserts for nested transactions
Reformat
Fix non-babies remove and modify replication
* Remove some changes introduced by the merge
* Fixed compile errors introduced by merge
* Fix loophole in error handling.
* Fix inquiry case of id not found: 404.
* Also handle correctly in AgencyComm.
* Fix agency tests.
* Fix error handling in dropCollectionOnCoordinator.
* Start ClusterComm threads in `ClusterFeature::start`. Stop ClusterComm threads in `ClusterFeature::stop`.
* Do not free objects in `Scheduler::shutdown`. Let the `unique_ptr`s do their job. Stop ClusterComm threads in `ClusterFeature::stop`, but free instance in `ClusterFeature::unprepare`.
* `io_context` may contain lambdas that hold `shared_ptr`s to `Tasks` that require a functional `VocBase` in their destructor.
* Improve logging on coordinator when doing `arangorestore`.
* Return more error information in `mergeResults`.
* Longer timeout for communication coordinator -> leader for writes.
This is taking into account possible write stops from followers needed
to get in sync.
* Fix compilation.
* Get rid of numbers in exception log messages.
* Fix a typo.
* Fix compilation.
* issue 496.1: switch scope of responsibility between a TRI_vocbase_t and a LogicalView in respect to view creation/deletion
* backport: address test failures
* backport: ensure arangosearch links get exported in the dump
* backport: ensure view is created during restore on the coordinator
* Updates for ArangoSearch DDL tests, IResearchView unregistration and known issues
* Add fix for internal issue 483
* update iresearch to codebase as of 20181011
* backport: address cluster test failures
* backport: address dump test failures
* backport: address discrepancy in view creation between single-server and cluster
* backport: address test failure on cluster (revert change)
* backport: address test failures
* backport: address MSVC build issues
* backport: address issue with LogicalDatasource destructing after TRI_vocbase_t
* Revert "backport: address issue with LogicalDatasource destructing after TRI_vocbase_t"
This reverts commit 4f9880bbaa22194dfbb604b5a54658de1d447ac1.