* Squashed commit of feature-3.5/hotbackup_devel.
This puts hotbackup into 3.5.
* Port atomic-database-creation-2 to 3.5.
* Remove some wrongly ported code.
* Fix compilation.
* Fix a manual merge error.
* Remove a feature from the mocks which does not exist in 3.5.
* Add some code which was forgotten in manual merge.
* Fix a problem introduced in a manual merge.
* reuse function
* Address some whitespace issues that came up in review
* aardvark should not create the frontend collection
* create _frontend collection from c++
* recheckAndUpdate Callback in CollectionWatcher
* Wrong author ;)
* rm outdated todo
* Update lib/Basics/VelocyPackHelper.h
Co-Authored-By: Michael Hackstein <michael@arangodb.com>
* use logger unique id, use startup logger
* not needed
* optimized vector shardid method
* do not create _modules collection lazily anymore
* Formatting.
* Assert instead of if/TRI_ASSERT(false)
* Don't use exceptions as control structure
* Re-add READ_LOCKER that got lost in translation
* Fix audit log in case database creation fails early.
* legacy sharding
* Add CHANGELOG entry.
* Retry database cancellation indefinitely
* Do not use exceptions in UpgradeTask
* DropCollection is a FAST_LANE action and should not need much time or else retry.
* Remove superfluous addition of LdapFeature
Proudly brought to you by ASAN tests
* Fixed check for distributeShardsLike sharding on _system database
* Fixed compile issue on tests
* Removed assertion that seems to be not correct yet on devel.
* Sort out google cloud storage as remote. (#9918)
* Add successful method to ClusterCommResult.
* Improve error forwarding for cluster internal communication.
* Feature/hotbackup list retries (#9924)
* retry hot backup listing for 2 minutes in cluster before giving up
* Enable api by default.
* fix broken list of non existing id (#9957)
* Fix compilation after manual merge.
* Fix another compilation problem.
* Yet more fixes for compilation.
* More compilation fixes.
* Bug fix 3.5/make arangosh reconnect (#9615)
* make arangosh reconnect
* added CHANGELOG entry
* fix lagging AgencyCallbacks (#9620)
* fix lagging AgencyCallbacks
* optimizations, discussed with @mchacki
* fix wording
* updated CHANGELOG
* fix yet another undefined behavior (#9629)
* [3.5.1] Fail the FailedLeader Job if the new leader fails. (#9628)
* Fail the FailedLeader Job if the new leader fails.
* Updated changelog.
* In case of timeout do not rollback.
* Fixed catch tests.
* Changed wording.
* DELETED rollback.
* reduce wait timeouts as a mitigation for notifying waiters without ho… (#9619)
* reduce wait timeouts as a mitigation for notifying waiters without holding the required mutex
this is a quick mitigation only, which reduces maximum wait time from 1
second to 100 milliseconds without changing other behavior.
the main problem of notifying pending writers without successfully
acquiring the required mutex still needs proper addressing.
* adjust timing-dependent test
* [3.5.1] Fast Controlled Leaderchange (#9634)
* First draft of keeping in sync during controlled leader change.
* Test if server is actually the leader in plan.
* Updated changelog.
* Added oldLeader check for set-the-leader request.
* Small fixes.
* Removed LOG_DEVEL.
* less copying, more moving! 🚚 (#9645)
* attempt to fix load_balancing tests in slow test environments (#9626)
* Bug fix/fix swagger datatype (#9045) (#9602)
* Bug fix/fix swagger datatype (#9045)
* remove http so https arangos will work
* verify that query parameters are proper swagger data types, fix offending documentation files
* return the actual type - not the list of available ones
* check formats
* there is no uint64 in swagger
* Fresh Swagger
* Port TakeoverShardLeadership from devel to 3.5.1 (#9659)
* Create TakeoverShardLeader job.
* Add TakeoverShardLeadership to Action factory.
* Add log message at level debug.
* Sort out LOG_TOPIC ids.
* Fix unit tests.
* CHANGELOG.
* Bug fix 3.5/hide mmfiles specific info in web ui (#9668)
* hide MMFiles-specific information when we don't need it
* Ported ResignLeadership to 3.5 (#9656)
* Ported ResignLeadership to 3.5
* Add the actual http route.
* Aardvark: Add k Shortest Paths example graph to UI (#9491) (#9661)
* Aardvark: Add k Shortest Paths example graph to UI (#9491)
* Add example graph to UI
* Add kShortestPathsGraph to examples.js
* Update example-graph.js
* Update aardvark.js
* Regenerate UI
* add the ability to have cluster special examples (#9613) (#9663)
* add the ability to have cluster special examples
* Update get_cluster_health.md
* fix abort condition, fix negative filtering for cluster tests
* Test if job fails with unmet assertion
* Remove cluster test example
* germanize
* better skip reasons
* removing superfluous semicolons
* Revert skip reasons, too noisy
* various replication improvements: (#9675)
* various replication improvements:
- better debuggability (more log details)
- shorter minimum wait delay in active failover
- fixed too early pruning of WAL files on leaders
* Bug fix 3.5/fix rocksdb return code (#9692)
* fix return codes for concurrent writes to same documents
* [3.5] Feature/rebootid notice changes, backport of #9523 (#9684)
* Feature/rebootid notice changes, backport of #9523
* Fixed error code to not re-use an old one
* Bug fix 3.5/issue 9679 (#9682)
* fixed issue #9679
* bug-fix/issue-#9660 (#9704) (#9707)
* bug-fix/issue-#9660 (#9704)
* fix issue
* Update tests/js/common/aql/aql-view-arangosearch-cluster.inc
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update tests/js/common/aql/aql-view-arangosearch-noncluster.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* fix cluster tests
* Update CHANGELOG
* [3.5] agency node fixes (#9698)
* node fixes port from 3.4
* fixed change log
* update rocksdb statistics to deliver sums from column family instead of single value from default family. (#9706)
* Feature 3.5/geo functions (#9710)
* Add support for WGS84 on distances (#9672)
* Add area calculations (#9693)
* Update CHANGELOG
* Cherry-pick minReplicationFactor
* Bug fix/failover with min replication factor (#9486)
* Improve collection time of IResearchQueryOptimizationTest
* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and no one cares for it
* Added some assertions on minReplicationFactor
* Transaction API will now reject writes as soon as minimal replication factor is NOT fulfilled
* added minReplicationFactor to the user interface, preparation for the collection api changes
* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo
* added minReplicationFactor usage to tests
* TODO TEMPORARY COMMIT FOR TESTING PLEASE REVERT ME
* minReplicationFactor now able to change via collection properties route
* fixed wrong assert
* added minReplicationFactor to the graph management ui
* added minReplicationFactor to the gharial api
* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.
* adjusted description of minReplicationFactor
* FollowerInfo Refactoring
* added gharial api graph creation tests with minimal replication factor
* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests
* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor
* Debug logging
* MORE Debug logging
* Included replication fast lane
* Use correct minreplicationfactor
* modified debug logging
* Fixed compile issues
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* Revert "MORE Debug logging"
This reverts commit dab5af28c0.
* Revert "MORE Debug logging"
This reverts commit 6134b664bd.
* Revert "MORE Debug logging"
This reverts commit 80160bdf3b.
* Revert "MORE Debug logging"
This reverts commit 06aabcdfe1.
* Removed debug output
* Added replication fast lane. Also refactored the commands as I cannot take it any more...
* Put some requests of RocksDBReplication onto CATCHUP Lane.
* Put some requests of MMFilesReplication onto CATCHUP Lane.
* Adjusted Fast and MED lane usage in Supervised scheduler
* Added changelog entry
* Added new features entry
* A new leader will now keep old followers in case of failover
* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Fixed JSLINT
* Unified lane handling of replication handlers
* Sorry forgotten in last commit
* replaced strings with static strings
* more use of static strings
* optimized min repl description in the ui
* decr initial loop variable
* clean up of the createWithId test
* more use of static strings
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Added some comments on condition, renamed variable as suggested in review
* Added check for min replicationFactor to be non-zero
* Added assertion
* Added function to modify min and max replication factor in one go
* added missing semicolon
* rm log devel
* Added a second information to follower info that can keep track of followers that have been in sync before a failover has taken place
* Maintenance now reports the previous version to follower info instead of lying by itself. The Follower Info now gets a failover safe mode to report insync followers
* check replFactor against nr dbservers
* Add lie reporting in CURRENT
* Reverted most of my recent commits about Failover situation. The intended plan simply does not work out
* move replication checks from logical collection to rest collection handler
* added more replication tests
* Include assert only if we are not in gtest
* jslint
* set min repl factor to zero if satellite collection
* check replication attributes in v8 collection
* Initial commit, old plan, does not yet work
* fixed ires tests
* Included FailoverCandidates key. Not fully implemented
* fixed wrong assert
* unified in sync follower reporting
* fixed compiler errors
* Cleanup locking, and fixed potential deadlocks
* Comments about locking order in FollowerInfo.
* properly check uint
* Keep old leader as potential failover candidate
* Transaction methods now use followerInfo to check if the leader can write, this might have the side effect that 'failoverCandidates' are updated
* Let agency check failoverCandidates if possible
* Initialize member variables
* Use unified follower reporting in DBServerAgencySync
* Removed obsolete variable, collecting it somewhere else
* repl factor attr check
* Reimplemented previous followers, second attempt now. PhaseOne and PhaseTwo can now synchronize on current.
* Fixed assertion, forgot an off-by-one
* adjusted test to be more precise now
* Fixed failover candidates list
* Disable write on dropping too many followers
* Allow to run updateFailoverCandidates multiple times with same leader.
* Final fixes, resilience tests now green, crossing fingers for jenkins
* Fixed race on atomics comparison
* Fixed invalid number type
* added nullptr handling
* added nullptr handling
* Removed invalid assert
* Make takeover of leadership an atomic operation
* Update tests/js/common/shell/shell-cluster-collection.js
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Review fixes
* Fixed creation code to use takeoverLeadership
* Update arangod/Cluster/FollowerInfo.h
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Applied review fixes
* There is no timeout
* Moved AQL + Pregel to INTERNAL_AQL lane, which is medium priority, to avoid deadlocks with Sync replication
* More review fixes
* Use difference if you want to compare two vectors...
* Use std::string ...
* Now check if we are in recovery mode
* Added documentation for minReplicationFactor
* Added readme update as well in documentation
* Removed merge conflict leftovers 0o, I should not trust the IDE
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/Architecture/Replication/README.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update CHANGELOG
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/DataModeling/Collections/DatabaseMethods.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/ReleaseNotes/NewFeatures35.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/DocuBlocks/Rest/Collections/1_structs.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/DocuBlocks/Rest/Graph/1_structs.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Apply suggestions from code review
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Adapted review requests, thanks for finding!
* Removed unnecessary const
* Apply suggestions from code review
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Moved initialization of variable more downwards
* Apply lock before notify_all()
* Remove documentation except DocuBlocks, covered by PR in docs repo
* Remove accidental indent
* fix races in cluster collection creation, fix return codes of collection deletion
* honor review comments (partially)
* produce agency dumps only in maintainer mode
* fix unit test failures
* Fixed case where a SmartVertex collection could be available too early. Only possible if a SmartGraph is created only with this one collection.
* Now the TTL remove operation will properly check preconditions again.
* Second attempt, we only say collection creation was a success iff the plan for the collection has not been modified during create.
* Disabled assertion in favor of tests.
* Removed debug output
When we try dropping a collection `A` for which there are other collections
that have distributeShardsLike set to `A`, mention this in the error message.
* Add drop-check for index creation in cluster.
* Move check from callback to regular read.
* Add changelog entry.
* Incorporate review suggestion
Co-Authored-By: Simon <simon@graetzer.org>
* Convert to VPackArrayIterator.
* Bug fix 3.4/collection babies (#9033)
* Prepare API to create multiple collections in a single request to ClusterMethods to improve speedup
* Added counter on how many collections are successfully created
* Allow multi collection creation one level higher
* CollectionMethods now allow batch creation of Collections
* Improved array size assertions
* Now a graph is created within a single roundtrip in the agency.
* Added new header files
* Insert collections in the AGENCY with TTL and an isBuilding flag, collections with this flag should not be visible in the coordinator
* Added forgotten C++ file
* Fixed a rare race condition, and the failing IResearch Tests
* readded callback on DONE, otherwise lists are out of sync
* Fixed assertions to let mocked tests pass...
* Fixed community cluster
* Started fixing IResearch analyzer test, catch-tests are failing ;(
* Solved missed merge-conflict
* Added helper functions in AnalyzerFeature-test
* Refactoring AnalyzerTest Section-Auth
* Refactoring AnalyzerTest Section-Emplace-Duplicates
* Refactoring AnalyzerTest Section-Emplace-Error-Cases. Recovery-Test is now red, it seemed to be green because of invalid test case before.
* Refactoring AnalyzerTest, split GET test into multiple parts, still left 'cluster simulation'.
* Attempt to extract Coordinator / DBServer tests a little bit. This commit starts to break all Coordinator tests. However I am convinced that the earlier version did NOT test a cluster situation at all, but some hybrid of SingleServer with full local storage that got told to be a Coordinator from now on, but without any Coordinator setup...
* Temporarily disabled some tests in AnalyzerFeature, as discussed with @gnusi.
* Fixed include guard.
* Temporarily deactivated failing tests
* You shall save your files before you commit...
* Fixed test asserting on plan version, which is now higher than before
* Port agency performance tuning for many shards to devel.
* Add more IDs to LOG_TOPIC calls.
* Even more IDs for LOG_TOPIC.
* Fix a duplicate LOG_TOPIC ID.
* Fix an old merging bug in devel.
* Don't hesitate between phases one and two for small clusters.
* Decoupled IO handling from Scheduler.
* Fixed SSL start up bug.
* Replaced Scheduler with new worker farm implementation.
* Added minimal statistics and info string for Scheduler.
* Added support for timed submissions.
* Updated delayed submission api. Updated code that used timers.
* Extracted new Scheduler into a virtual parent class. The implementation can now depend on the use case.
* Signal handler now working.
* Changed threads names, `_stop` is atomic, check for failure during thread start + exception handling like old scheduler did.
* Commented on source code and added TODOs.
* Played around with start-stop-conditions
* Play around with start stop condition.
* start stop cond
* Start Stop Conditions
* Removed bad cv_status check.
* Bug fix: now compare the actual objects instead of pointer values. Setup t1 and t2 depending on the thread id.
* Moved most of the stuff now unrelated to the Scheduler to GeneralServer. Got rid of JobGuard.
* Instead of waiting for a thread to terminate, put it on a clean up list and check for its termination in each supervisor run.
* Allow detaching long running threads.
* Fixed test mock.
* Updated the WorkHandle logic. Removed post functions.
* Fixed crash when obtaining shared_ptr from this in destructor.
* Added lost mutex.
* Fixed memory leak.
* Fixed merge bug.
* Changed a lot of code to optimize the scheduler.
* Fixed bug of invalidated iterator. Don't remove task on shutdown at different places. Let scheduler threads run until queue is empty.
* Only by value calls to queue.
* Added options again.
* Clean up of code.
* UI Request Lane added.
* Bug fixes in Scheduler.
* Applied reformat.
* Use sigaction.
* Bug fix 3.4/bad leader report current (#7574)
* Initialize theLeader non-empty, thus not assuming leadership.
* Correct ClusterInfo to look into Target/CleanedServers.
* Prevent usage of to be cleaned out servers in new collections.
* After a restart, do not assume to be leader for a shard.
* Do nothing in phaseTwo if leader has not been touched. (#7579)
* Drop follower if it refuses to cooperate.
This is important since a dbserver that is follower for a shard will
after a reboot think that it is a leader, at least for a short amount
of time. If it came back quickly enough, the leader might not have
noticed that it was away.
* Fix index creation in cluster.
Simplify and correct error handling logic in ensureIndexCoordinator.
* After index creation, wait until index appears.
We wait until the Supervision has removed the isBuilding flag and
the coordinator has reloaded the Plan.
* More index handling fixes.
* Directly remove isBuilding in ensureIndexCoordinator (again).
* Fix catch tests by holding mutex shorter.
* Better mutex handling in ClusterInfo.
* issue 153: ensure views are dropped in Agency when database is dropped in cluster, minor fixes
* backport: add test to ensure views are dropped when database is dropped from plan, fix some issues in ClusterInfo
* optimize primary key lookups in ArangoSearch
* fix test
* Add JS tests
* temporary comment optimizations
* indexes are marked while still missing in Current
* index handling getCollection
* supervision gets indexes from isbuilding, when coordinator is gone before finishing
* seems right now
* fixed broken views
* remove junk comments
* cleanup
* node / supervision adjustments
* supervision fixes
* neunhoef remarks part i
* neunhoef remarks part ii
* neunhoef remarks part ii
* neunhoef remarks part iii
* collection's current version please
* no need to wait for current once again
* no longer necessary code
* clear comments
* delete left overs
* dead code revived
* Fix loophole in error handling.
* Fix inquiry case of id not found: 404.
* Also handle correctly in AgencyComm.
* Fix agency tests.
* Fix error handling in dropCollectionOnCoordinator.
* issue 496.1: switch scope of responsibility between a TRI_vocbase_t and a LogicalView in respect to view creation/deletion
* backport: address test failures
* backport: ensure arangosearch links get exported in the dump
* backport: ensure view is created during restore on the coordinator
* Updates for ArangoSearch DDL tests, IResearchView unregistration and known issues
* Add fix for internal issue 483