* add missing whitespace to make error message readable
* try to continue running scheduler threads even when there are exceptions
* give up trying to persist follower info in agency for already dropped collections.
* updated CHANGELOG
* Bug fix 3.5/make arangosh reconnect (#9615)
* make arangosh reconnect
* added CHANGELOG entry
* fix lagging AgencyCallbacks (#9620)
* fix lagging AgencyCallbacks
* optimizations, discussed with @mchacki
* fix wording
* updated CHANGELOG
* fix yet another undefined behavior (#9629)
* [3.5.1] Fail the FailedLeader Job if the new leader fails. (#9628)
* Fail the FailedLeader Job if the new leader fails.
* Updated changelog.
* In case of timeout do not rollback.
* Fixed catch tests.
* Changed wording.
* DELETED rollback.
* reduce wait timeouts as a mitigation for notifying waiters without ho… (#9619)
* reduce wait timeouts as a mitigation for notifying waiters without holding the required mutex
this is a quick mitigation only, which reduces maximum wait time from 1
second to 100 milliseconds without changing other behavior.
the main problem of notifying pending writers without successfully
acquiring the required mutex still needs proper addressing.
* adjust timing-dependent test
* [3.5.1] Fast Controlled Leaderchange (#9634)
* First draft of keeping in sync during controlled leader change.
* Test if server is actually the leader in plan.
* Updated changelog.
* Added oldLeader check for set-the-leader request.
* Small fixes.
* Removed LOG_DEVEL.
* less copying, more moving! 🚚 (#9645)
* attempt to fix load_balancing tests in slow test environments (#9626)
* Bug fix/fix swagger datatype (#9045) (#9602)
* Bug fix/fix swagger datatype (#9045)
* remove http so https arangos will work
* verify that query parameters are proper swagger data types, fix offending documentation files
* return the actual type - not the list of available ones
* check formats
* there is no uint64 in swagger
* Fresh Swagger
* Port TakeoverShardLeadership from devel to 3.5.1 (#9659)
* Create TakeoverShardLeader job.
* Add TakeoverShardLeadership to Action factory.
* Add log message at level debug.
* Sort out LOG_TOPIC ids.
* Fix unit tests.
* CHANGELOG.
* Bug fix 3.5/hide mmfiles specific info in web ui (#9668)
* hide MMFiles-specific information when we don't need it
* Ported ResignLeadership to 3.5 (#9656)
* Ported ResignLeadership to 3.5
* Add the actual http route.
* Aardvark: Add k Shortest Paths example graph to UI (#9491) (#9661)
* Aardvark: Add k Shortest Paths example graph to UI (#9491)
* Add example graph to UI
* Add kShortestPathsGraph to examples.js
* Update example-graph.js
* Update aardvark.js
* Regenerate UI
* add the ability to have cluster special examples (#9613) (#9663)
* add the ability to have cluster special examples
* Update get_cluster_health.md
* fix abort condition, fix negative filtering for cluster tests
* Test if job fails with unmet assertion
* Remove cluster test example
* germanize
* better skip reasons
* removing superfluous semicolons
* Revert skip reasons, too noisy
* various replication improvements: (#9675)
* various replication improvements:
- better debuggability (more log details)
- shorter minimum wait delay in active failover
- fixed too early pruning of WAL files on leaders
* Bug fix 3.5/fix rocksdb return code (#9692)
* fix return codes for concurrent writes to same documents
* [3.5] Feature/rebootid notice changes, backport of #9523 (#9684)
* Feature/rebootid notice changes, backport of #9523
* Fixed error code to not re-use an old one
* Bug fix 3.5/issue 9679 (#9682)
* fixed issue #9679
* bug-fix/issue-#9660 (#9704) (#9707)
* bug-fix/issue-#9660 (#9704)
* fix issue
* Update tests/js/common/aql/aql-view-arangosearch-cluster.inc
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update tests/js/common/aql/aql-view-arangosearch-noncluster.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* fix cluster tests
* Update CHANGELOG
* [3.5] agency node fixes (#9698)
* node fixes port from 3.4
* fixed change log
* update rocksdb statistics to deliver sums from column family instead of single value from default family. (#9706)
* Feature 3.5/geo functions (#9710)
* Add support for WGS84 on distances (#9672)
* Add area calculations (#9693)
* Update CHANGELOG
* Cherry-pick minReplicationFactor
* Bug fix/failover with min replication factor (#9486)
* Improve collection time of IResearchQueryOptimizationTest
* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and no one cares for it
* Added some assertions on minReplicationFactor
* Transaction API will now reject writes as soon as minimal replication factor is NOT fulfilled
* added minReplicationFactor to the user interface, preparation for the collection api changes
* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo
* added minReplicationFactor usage to tests
* TODO TEMPORARY COMMIT FOR TESTING PLEASE REVERT ME
* minReplicationFactor now able to change via collection properties route
* fixed wrong assert
* added minReplicationFactor to the graph management ui
* added minReplicationFactor to the gharial api
* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.
* adjusted description of minReplicationFactor
* FollowerInfo Refactoring
* added gharial api graph creation tests with minimal replication factor
* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests
* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor
* Debug logging
* MORE Debug logging
* Included replication fast lane
* Use correct minreplicationfactor
* modified debug logging
* Fixed compile issues
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* Revert "MORE Debug logging"
This reverts commit dab5af28c0.
* Revert "MORE Debug logging"
This reverts commit 6134b664bd.
* Revert "MORE Debug logging"
This reverts commit 80160bdf3b.
* Revert "MORE Debug logging"
This reverts commit 06aabcdfe1.
* Removed debug output
* Added replication fast lane. Also refactored the commands as I cannot take it any more...
* Put some requests of RocksDBReplication onto CATCHUP Lane.
* Put some requests of MMFilesReplication onto CATCHUP Lane.
* Adjusted Fast and MED lane usage in Supervised scheduler
* Added changelog entry
* Added new features entry
* A new leader will now keep old followers in case of failover
* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Fixed JSLINT
* Unified lane handling of replication handlers
* Sorry forgotten in last commit
* replaced strings with static strings
* more use of static strings
* optimized min repl description in the ui
* decr initial loop variable
* clean up of the createWithId test
* more use of static strings
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Added some comments on condition, renamed variable as suggested in review
* Added check for min replicationFactor to be non-zero
* Added assertion
* Added function to modify min and max replication factor in one go
* added missing semicolon
* rm log devel
* Added a second piece of information to follower info that can keep track of followers that have been in sync before a failover has taken place
* Maintenance now reports the previous version to follower info instead of lying by itself. The Follower Info now gets a failover-safe mode to report in-sync followers
* check replFactor against nr dbservers
* Add lie reporting in CURRENT
* Reverted most of my recent commits about Failover situation. The intended plan simply does not work out
* move replication checks from logical collection to rest collection handler
* added more replication tests
* Include assert only if we are not in gtest
* jslint
* set min repl factor to zero if satellite collection
* check replication attributes in v8 collection
* Initial commit, old plan, does not yet work
* fixed ires tests
* Included FailoverCandidates key. Not fully implemented
* fixed wrong assert
* unified in sync follower reporting
* fixed compiler errors
* Cleanup locking, and fixed potential deadlocks
* Comments about locking order in FollowerInfo.
* properly check uint
* Keep old leader as potential failover candidate
* Transaction methods now use followerInfo to check if the leader can write; this might have the side effect that 'failoverCandidates' are updated
* Let agency check failoverCandidates if possible
* Initialize member variables
* Use unified follower reporting in DBServerAgencySync
* Removed obsolete variable, collecting it somewhere else
* repl factor attr check
* Reimplemented previous followers, second attempt now. PhaseOne and PhaseTwo can now synchronize on current.
* Fixed assertion, forgot an off-by-one
* adjusted test to be more precise now
* Fixed failover candidates list
* Disable write on dropping too many followers
* Allow running updateFailoverCandidates multiple times with the same leader.
* Final fixes, resilience tests now green, crossing fingers for jenkins
* Fixed race on atomics comparison
* Fixed invalid number type
* added nullptr handling
* added nullptr handling
* Removed invalid assert
* Make takeover of leadership an atomic operation
* Update tests/js/common/shell/shell-cluster-collection.js
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Review fixes
* Fixed creation code to use takeoverLeadership
* Update arangod/Cluster/FollowerInfo.h
Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>
* Applied review fixes
* There is no timeout
* Moved AQL + Pregel to INTERNAL_AQL lane, which is medium priority, to avoid deadlocks with Sync replication
* More review fixes
* Use difference if you want to compare two vectors...
* Use std::string ...
* Now check if we are in recovery mode
* Added documentation for minReplicationFactor
* Added readme update as well in documentation
* Removed merge conflict leftovers 0o, I should not trust the IDE
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/Architecture/Replication/README.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update CHANGELOG
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/DataModeling/Collections/DatabaseMethods.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/Books/Manual/ReleaseNotes/NewFeatures35.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/DocuBlocks/Rest/Collections/1_structs.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Update Documentation/DocuBlocks/Rest/Graph/1_structs.md
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Apply suggestions from code review
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Adapted review requests, thanks for finding!
* Removed unnecessary const
* Apply suggestions from code review
Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
* Moved initialization of variable further down
* Apply lock before notify_all()
* Remove documentation except DocuBlocks, covered by PR in docs repo
* Remove accidental indent
* Decoupled IO handling from Scheduler.
* Fixed SSL start up bug.
* Replaced Scheduler with new worker farm implementation.
* Added minimal statistics and info string for Scheduler.
* Added support for timed submissions.
* Updated delayed submission api. Updated code that used timers.
* Extracted new Scheduler into a virtual parent class. The implementation can now depend on the use case.
* Signal handler now working.
* Changed threads names, `_stop` is atomic, check for failure during thread start + exception handling like old scheduler did.
* Commented on source code and added TODOs.
* Played around with start-stop-conditions
* Play around with start stop condition.
* start stop cond
* Start Stop Conditions
* Removed bad cv_status check.
* Bug fix: now compare the actual objects instead of pointer values. Setup t1 and t2 depending on the thread id.
* Moved most of the stuff now unrelated to the Scheduler to GeneralServer. Got rid of JobGuard.
* Instead of waiting for a thread to terminate, put it on a clean up list and check for its termination in each supervisor run.
* Allow detaching long running threads.
* Fixed test mock.
* Updated the WorkHandle logic. Removed post functions.
* Fixed crash when obtaining shared_ptr from this in destructor.
* Added lost mutex.
* Fixed memory leak.
* Fixed merge bug.
* Changed a lot of code to optimize the scheduler.
* Fixed bug of invalidated iterator. Don't remove tasks on shutdown at different places. Let scheduler threads run until queue is empty.
* Only by value calls to queue.
* Added options again.
* Clean up of code.
* UI Request Lane added.
* Bug fixes in Scheduler.
* Applied reformat.
* Use sigaction.