* Fix dump_authentication suite
* Fix typos
* use the correct attribute name
* properly reload user permissions after _users collection restore
* fixed foxx restore test
* changelog
* changed the order of index creation during restore for _users collection
* Squashed commit of feature-3.5/hotbackup_devel.
This puts hotbackup into 3.5.
* Port atomic-database-creation-2 to 3.5.
* Remove some wrongly ported code.
* Fix compilation.
* Fix a manual merge error.
* Remove a feature from the mocks which does not exist in 3.5.
* Add some code which was forgotten in manual merge.
* Fix a problem introduced in a manual merge.
* reuse function
* Address some whitespace issues that came up in review
* aardvark should not create the frontend collection
* create _frontend collection from c++
* recheckAndUpdate Callback in CollectionWatcher
* Wrong author ;)
* rm outdated todo
* Update lib/Basics/VelocyPackHelper.h
Co-Authored-By: Michael Hackstein <michael@arangodb.com>
* use logger unique id, use startup logger
* not needed
* optimized vector shardid method
* do not create _modules collection lazily anymore
* Formatting.
* Assert instead of if/TRI_ASSERT(false)
* Don't use exceptions as control structure
* Re-add READ_LOCKER that got lost in translation
* Fix audit log in case database creation fails early.
* legacy sharding
* Add CHANGELOG entry.
* Retry database cancellation indefinitely
* Do not use exceptions in UpgradeTask
* DropCollection is a FAST_LANE action and should not need much time or else retry.
* Remove superfluous addition of LdapFeature
Proudly brought to you by ASAN tests
* Fixed check for distributeShardsLike sharding on _system database
* Fixed compile issue on tests
* Removed assertion that seems to be not correct yet on devel.
* Sort out google cloud storage as remote. (#9918)
* Add successful method to ClusterCommResult.
* Improve error forwarding for cluster internal communication.
* Feature/hotbackup list retries (#9924)
* retry hot backup listing for 2 minutes in cluster before giving up
* Enable api by default.
* fix broken list of non existing id (#9957)
* Fix compilation after manual merge.
* Fix another compilation problem.
* Yet more fixes for compilation.
* More compilation fixes.
* Bug fix 3.5/make arangosh reconnect (#9615)
* make arangosh reconnect
* added CHANGELOG entry
* fix lagging AgencyCallbacks (#9620)
* fix lagging AgencyCallbacks
* optimizations, discussed with @mchacki
* fix wording
* updated CHANGELOG
* fix yet another undefined behavior (#9629)
* [3.5.1] Fail the FailedLeader Job if the new leader fails. (#9628)
* Fail the FailedLeader Job if the new leader fails.
* Updated changelog.
* In case of timeout do not rollback.
* Fixed catch tests.
* Changed wording.
* DELETED rollback.
* reduce wait timeouts as a mitigation for notifying waiters without ho… (#9619)
* reduce wait timeouts as a mitigation for notifying waiters without holding the required mutex
this is a quick mitigation only, which reduces maximum wait time from 1
second to 100 milliseconds without changing other behavior.
the main problem of notifying pending writers without successfully
acquiring the required mutex still needs proper addressing.
* adjust timing-dependent test
* [3.5.1] Fast Controlled Leaderchange (#9634)
* First draft of keeping in sync during controlled leader change.
* Test if server is actually the leader in plan.
* Updated changelog.
* Added oldLeader check for set-the-leader request.
* Small fixes.
* Removed LOG_DEVEL.
* less copying, more moving! 🚚 (#9645)
* attempt to fix load_balancing tests in slow test environments (#9626)
* Bug fix/fix swagger datatype (#9045) (#9602)
* Bug fix/fix swagger datatype (#9045)
* remove http so https arangos will work
* verify that query parameters are proper swagger data types, fix offending documentation files
* return the actual type - not the list of available ones
* check formats
* there is no uint64 in swagger
* Fresh Swagger
* Port TakeoverShardLeadership from devel to 3.5.1 (#9659)
* Create TakeoverShardLeader job.
* Add TakeoverShardLeadership to Action factory.
* Add log message at level debug.
* Sort out LOG_TOPIC ids.
* Fix unit tests.
* CHANGELOG.
* Bug fix 3.5/hide mmfiles specific info in web ui (#9668)
* hide MMFiles-specific information when we don't need it
* Ported ResignLeadership to 3.5 (#9656)
* Ported ResignLeadership to 3.5
* Add the actual http route.
* Aardvark: Add k Shortest Paths example graph to UI (#9491) (#9661)
* Aardvark: Add k Shortest Paths example graph to UI (#9491)
* Add example graph to UI
* Add kShortestPathsGraph to examples.js
* Update example-graph.js
* Update aardvark.js
* Regenerate UI
* add the ability to have cluster special examples (#9613) (#9663)
* add the ability to have cluster special examples
* Update get_cluster_health.md
* fix abort condition, fix negative filtering for cluster tests
* Test if job fails with unmet assertion
* Remove cluster test example
* germanize
* better skip reasons
* removing superfluous semicolons
* Revert skip reasons, too noisy
* various replication improvements: (#9675)
* various replication improvements:
- better debuggability (more log details)
- shorter minimum wait delay in active failover
- fixed too early pruning of WAL files on leaders
* Bug fix 3.5/fix rocksdb return code (#9692)
* fix return codes for concurrent writes to same documents
* [3.5] Feature/rebootid notice changes, backport of #9523 (#9684)
* Feature/rebootid notice changes, backport of #9523
* Fixed error code to not re-use an old one
* Bug fix 3.5/issue 9679 (#9682)
* fixed issue #9679
* bug-fix/issue-#9660 (#9704) (#9707)
* bug-fix/issue-#9660 (#9704)
* fix issue
* Update tests/js/common/aql/aql-view-arangosearch-cluster.inc
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update tests/js/common/aql/aql-view-arangosearch-noncluster.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* fix cluster tests
* Update CHANGELOG
* [3.5] agency node fixes (#9698)
* node fixes port from 3.4
* fixed change log
* update rocksdb statistics to deliver sums from column family instead of single value from default family. (#9706)
* Feature 3.5/geo functions (#9710)
* Add support for WGS84 on distances (#9672)
* Add area calculations (#9693)
* Update CHANGELOG
* Cherry-pick minReplicationFactor
* Bug fix/failover with min replication factor (#9486)
* Improve collection time of IResearchQueryOptimizationTest
* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and no one cares for it
* Added some assertions on minReplicationFactor
* Transaction API will now reject writes as soon as minimal replication factor is NOT fulfilled
* added minReplicationFactor to the user interface, preparation for the collection api changes
* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo
* added minReplicationFactor usage to tests
* TODO TEMPORARY COMMIT FOR TESTING PLEASE REVERT ME
* minReplicationFactor now able to change via collection properties route
* fixed wrong assert
* added minReplicationFactor to the graph management ui
* added minReplicationFactor to the gharial api
* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.
* adjusted description of minReplicationFactor
* FollowerInfo Refactoring
* added gharial api graph creation tests with minimal replication factor
* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests
* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor
* Debug logging
* MORE Debug logging
* Included replication fast lane
* Use correct minReplicationFactor
* modified debug logging
* Fixed compile issues
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* Revert "MORE Debug logging"
This reverts commit dab5af28c0.
* Revert "MORE Debug logging"
This reverts commit 6134b664bd.
* Revert "MORE Debug logging"
This reverts commit 80160bdf3b.
* Revert "MORE Debug logging"
This reverts commit 06aabcdfe1.
* Removed debug output
* Added replication fast lane. Also refactored the commands as I cannot take it any more...
* Put some requests of RocksDBReplication onto CATCHUP Lane.
* Put some requests of MMFilesReplication onto CATCHUP Lane.
* Adjusted Fast and MED lane usage in Supervised scheduler
* Added changelog entry
* Added new features entry
* A new leader will now keep old followers in case of failover
* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Fixed JSLINT
* Unified lane handling of replication handlers
* Sorry forgotten in last commit
* replaced strings with static strings
* more use of static strings
* optimized min repl description in the ui
* decr initial loop variable
* clean up of the createWithId test
* more use of static strings
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Added some comments on condition, renamed variable as suggested in review
* Added check for min replicationFactor to be non-zero
* Added assertion
* Added function to modify min and max replication factor in one go
* added missing semicolon
* rm log devel
* Added a second piece of information to follower info that keeps track of followers that were in sync before a failover took place
* Maintenance now reports the previous version to follower info instead of lying by itself. The follower info now gets a failover-safe mode to report in-sync followers
* check replFactor against nr dbservers
* Add lie reporting in CURRENT
* Reverted most of my recent commits about Failover situation. The intended plan simply does not work out
* move replication checks from logical collection to rest collection handler
* added more replication tests
* Include assert only if we are not in gtest
* jslint
* set min repl factor to zero if satellite collection
* check replication attributes in v8 collection
* Initial commit, old plan, does not yet work
* fixed ires tests
* Included FailoverCandidates key. Not fully implemented
* fixed wrong assert
* unified in sync follower reporting
* fixed compiler errors
* Cleanup locking, and fixed potential deadlocks
* Comments about locking order in FollowerInfo.
* properly check uint
* Keep old leader as potential failover candidate
* Transaction methods now use followerInfo to check if the leader can write; this might have the side effect that 'failoverCandidates' are updated
* Let agency check failoverCandidates if possible
* Initialize member variables
* Use unified follower reporting in DBServerAgencySync
* Removed obsolete variable, collecting it somewhere else
* repl factor attr check
* Reimplemented previous followers, second attempt now. PhaseOne and PhaseTwo can now synchronize on current.
* Fixed assertion, forgot an off-by-one
* adjusted test to be more precise now
* Fixed failover candidates list
* Disable write on dropping too many followers
* Allow running updateFailoverCandidates multiple times with the same leader.
* Final fixes, resilience tests now green, crossing fingers for jenkins
* Fixed race on atomics comparison
* Fixed invalid number type
* added nullptr handling
* added nullptr handling
* Removed invalid assert
* Make takeover of leadership an atomic operation
* Update tests/js/common/shell/shell-cluster-collection.js
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Review fixes
* Fixed creation code to use takeoverLeadership
* Update arangod/Cluster/FollowerInfo.h
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Applied review fixes
* There is no timeout
* Moved AQL + Pregel to INTERNAL_AQL lane, which is medium priority, to avoid deadlocks with Sync replication
* More review fixes
* Use difference if you want to compare two vectors...
* Use std::string ...
* Now check if we are in recovery mode
* Added documentation for minReplicationFactor
* Added readme update as well in documentation
* Removed merge conflict leftovers 0o, I should not trust the IDE
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/Architecture/Replication/README.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update CHANGELOG
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/DataModeling/Collections/DatabaseMethods.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/ReleaseNotes/NewFeatures35.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/DocuBlocks/Rest/Collections/1_structs.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/DocuBlocks/Rest/Graph/1_structs.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Apply suggestions from code review
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Adapted review requests, thanks for finding!
* Removed unnecessary const
* Apply suggestions from code review
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Moved initialization of variable further down
* Apply lock before notify_all()
* Remove documentation except DocuBlocks, covered by PR in docs repo
* Remove accidental indent
* Use int type for server id
Change serverId to an int
Pass syncerId only for synchronous replication
Added UrlBuilder
structs to classes, reordering
Added Location class, cleanup
Fixed initialization order
Use Location class
Use string for large ints
Documentation
Added clientInfo to ReplicationClientProgressTracker and corresponding rest handlers
Pass clientInfo string in sync replication
Pass clientInfo in addFollower, too
Updated docu
Renamed UrlBuilder to UrlHelper
Updated docu
Try to fix compile error on windows
Fixed a bug and a test
* Implemented @jsteemann's comments
* Bug fix 3.4/collection babies (#9033)
* Prepare API to create multiple collections in a single request to ClusterMethods to improve speedup
* Added counter on how many collections are successfully created
* Allow multi collection creation one level higher
* CollectionMethods now allow batch creation of Collections
* Improved array size assertions
* Now a graph is created within a single roundtrip in the agency.
* Added new header files
* Insert collections in the AGENCY with TTL and an isBuilding flag; collections with this flag should not be visible in the coordinator
* Added forgotten C++ file
* Fixed a rare race condition, and the failing IResearch Tests
* readded callback on DONE, otherwise lists are out of sync
* Fixed assertions to let mocked tests pass...
* Fixed community cluster
* Started fixing IResearch analyzer test, catch-tests are failing ;(
* Solved missed merge-conflict
* Added helper functions in AnalyzerFeature-test
* Refactoring AnalyzerTest Section-Auth
* Refactoring AnalyzerTest Section-Emplace-Duplicates
* Refactoring AnalyzerTest Section-Emplace-Error-Cases. Recovery-Test is now red, it seemed to be green because of invalid test case before.
* Refactoring AnalyzerTest, split GET test into multiple parts, still left 'cluster simulation'.
* Attempt to extract Coordinator / DBServer tests a little bit. This commit starts to break all Coordinator tests. However, I am convinced that the earlier version did NOT test a cluster situation at all, but some hybrid of SingleServer with full local storage that got told to be a Coordinator from now on, but without any Coordinator setup...
* Temporarily disabled some tests in AnalyzerFeature, as discussed with @gnusi.
* Fixed include guard.
* Temporarily deactivated failing tests
* You shall save your files before you commit...
* Fixed test asserting on plan version, which is now higher than before
* Ignore satellite collections in shrinkCluster in agency.
* Abort RemoveFollower job if not enough in-sync followers or leader failure.
* Break quick wait loop in supervision if leadership is lost.
* In case of resigned leader, set isReady=false in clusterInventory.
* Fix catch tests.
* Decoupled IO handling from Scheduler.
* Fixed SSL start up bug.
* Replaced Scheduler with new worker farm implementation.
* Added minimal statistics and info string for Scheduler.
* Added support for timed submissions.
* Updated delayed submission api. Updated code that used timers.
* Extracted new Scheduler into a virtual parent class. The implementation can now depend on the usecase.
* Signal handler now working.
* Changed threads names, `_stop` is atomic, check for failure during thread start + exception handling like old scheduler did.
* Commented on source code and added TODOs.
* Played around with start-stop-conditions
* Play around with start stop condition.
* start stop cond
* Start Stop Conditions
* Removed bad cv_status check.
* Bug fix: now compare the actual objects instead of pointer values. Setup t1 and t2 depending on the thread id.
* Moved most of the stuff now unrelated to the Scheduler to GeneralServer. Got rid of JobGuard.
* Instead of waiting for a thread to terminate, put it on a clean up list and check for its termination in each supervisor run.
* Allow detaching long running threads.
* Fixed test mock.
* Updated the WorkHandle logic. Removed post functions.
* Fixed crash when obtaining shared_ptr from this in destructor.
* Added lost mutex.
* Fixed memory leak.
* Fixed merge bug.
* Changed a lot of code to optimize the scheduler.
* Fixed bug of invalidated iterator. Don't remove task on shutdown at different places. Let scheduler threads run until queue is empty.
* Only by value calls to queue.
* Added options again.
* Clean up of code.
* UI Request Lane added.
* Bug fixes in Scheduler.
* Applied reformat.
* Use sigaction.
* Added some DEBUG output for replication rest handler
* Some more debug logging.
* Increased the priority of the ReplicationHandler. This way we will not get stuck with locks that cannot be canceled. Also cancel the lock on the correct database.
* Added extensive log output for replication things
* Added tombstones to RestReplicationHandler. In a very unlikely case the cancel of a lock can be executed BEFORE the code that actually registers the lock; in this case we now write a tombstone and do not lock.
* Revert "Added extensive log output for replication things"
This reverts commit 6d4e37ea1e59e3b3457336019cc7dbc4c979504d.
* Added extensive log output for replication things, now in ERR level instead of MAINTAINER only
* Now actually use hours for synchronization
* React to errors under soft lock if they show up.
* Added a retry loop to increase the read-lock timer.
* Added more timing output in RocksDB collection internals to figure out why the followers are dropped
* Tweaked RocksDB options
* Revert "Tweaked RocksDB options"
This reverts commit 2bf9c43280beda4792c47d079387fe5154cdd896.
* Removed debug output
* Applied all requested changes by goedderz
* Deleted unused variable
* merged fixes from 3.4
* odd fix
* Bug fix 3.4/sync repl release thread (#6784)
* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock
* Fixed insertion of query into registry in rest replication handler.
* Removed unnecessary / false asserts as suggested in review. Fixed code comments.
* Replaced auto with a correct type as suggested in review
* Added a helper function to validate if a query is in use in the registry
* Fixed logic bug in usage of query registry
* Fixed compile issue
* Automatically transforming int -> bool in initializer list sucks...
* Inverted boolean logic bug hidden due to int->bool being logically inverted.
* Today it seems that bools are too complicated for my brain.
* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future
* Applied changes required by @goedderz in review
* Bug fix 3.4/shorter foot in door (#7084)
* Implement `syncCollectionCatchup` in DatabaseTailingSyncer.
First stab, might not even compile.
* Fixed a typo.
* Fix a typo and a compilation problem.
* Further compilation fix.
* Implement two stage catchup.
* Two small corrections.
* Unified error messages in Synchronize shard job.
* Improved a code comment.
* Fixed autocasting bool->double and double->bool issue. That is truly one of the best features ever invented... </irony>
* Renamed doHardLock => toSoftLockOnly and inverted default value
* Merged soft/hard foot logic with Transaction splits
* Use scopeguards to cancel readlocks
* Bug fix 3.4/sync replication allow soft and hard lock (#6864)
* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock
* Fixed insertion of query into registry in rest replication handler.
* Removed unnecessary / false asserts as suggested in review. Fixed code comments.
* Replaced auto with a correct type as suggested in review
* Added a helper function to validate if a query is in use in the registry
* Fixed logic bug in usage of query registry
* Fixed compile issue
* Implemented optional 'doHardLock' parameter in the replication acquire read-lock call. A hard-lock guarantees to stop all writes, a soft-lock may not.
* Fixed compile issue
* Automatically transforming int -> bool in initializer list sucks...
* Inverted boolean logic bug hidden due to int->bool being logically inverted.
* Today it seems that bools are too complicated for my brain.
* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future
* Applied changes required by @goedderz in review
* Renamed doHardLock => toSoftLockOnly and inverted default value
* issue 496.3: move more coordinator-related logic out of TRI_vocbase_t, rename some arangosearch view configuration parameters, remove some consolidation policies, update iresearch to revision 6fd9760d81b136f769e277ea5b8f53996ed7a1ca
* address potential deadlock between link creation and FlushThread
* remove code causing nullptr access
* add back lock around reader reopen
* revert: address potential deadlock between link creation and FlushThread
* invalidate payload for each field in FieldIterator before setting a value
* Improve logging on coordinator when doing `arangorestore`.
* Return more error information in `mergeResults`.
* Longer timeout for communication coordinator -> leader for writes.
This is taking into account possible write stops from followers needed
to get in sync.
* Fix compilation.
* Get rid of numbers in exception log messages.
* Fix a typo.
* Fix compilation.
* issue 496.1: switch scope of responsibility between a TRI_vocbase_t and a LogicalView in respect to view creation/deletion
* backport: address test failures
* backport: ensure arangosearch links get exported in the dump
* backport: ensure view is created during restore on the coordinator
* Updates for ArangoSearch DDL tests, IResearchView unregistration and known issues
* Add fix for internal issue 483