Commit Graph

47 Commits

Author SHA1 Message Date
Max Neunhöffer 328f46e3d6 This merges hotbackup and atomic-db-creation into 3.5. (#9968)
* Squashed commit of feature-3.5/hotbackup_devel.

This puts hotbackup into 3.5.

* Port atomic-database-creation-2 to 3.5.

* Remove some wrongly ported code.

* Fix compilation.

* Fix a manual merge error.

* Remove a feature from the mocks which does not exist in 3.5.

* Add some code which was forgotten in manual merge.

* Fix a problem introduced in a manual merge.

* reuse function

* Address some whitespace issues that came up in review

* aardvark should not create the frontend collection

* create _frontend collection from c++

* recheckAndUpdate Callback in CollectionWatcher

* Wrong author ;)

* rm outdated todo

* Update lib/Basics/VelocyPackHelper.h

Co-Authored-By: Michael Hackstein <michael@arangodb.com>

* use logger unique id, use startup logger

* not needed

* optimized vector shardid method

* do not create _modules collection lazily anymore

* Formatting.

* Assert instead of if/TRI_ASSERT(false)

* Don't use exceptions as control structure

* Re-add READ_LOCKER that got lost in translation

* Fix audit log in case database creation fails early.

* legacy sharding

* Add CHANGELOG entry.

* Retry database cancellation indefinitely

* Do not use exceptions in UpgradeTask

* DropCollection is a FAST_LANE action and should not need much time or else retry.

* Remove superfluous addition of LdapFeature

Proudly brought to you by ASAN tests

* Fixed check for distributeShardsLike sharding on _system database

* Fixed compile issue on tests

* Removed assertion that seems to be not correct yet on devel.

* Sort out google cloud storage as remote. (#9918)

* Add successful method to ClusterCommResult.
* Improve error forwarding for cluster internal communication.

* Feature/hotbackup list retries (#9924)

* retry hot backup listing for 2 minutes in cluster before giving up

* Enable api by default.

* fix broken list of non existing id (#9957)

* Fix compilation after manual merge.

* Fix another compilation problem.

* Yet more fixes for compilation.

* More compilation fixes.
2019-09-11 13:13:54 +03:00
KVS85 e64080e207 Merge 3.5.1 back to 3.5 (#9713)
* Bug fix 3.5/make arangosh reconnect (#9615)

* make arangosh reconnect

* added CHANGELOG entry

* fix lagging AgencyCallbacks (#9620)

* fix lagging AgencyCallbacks

* optimizations, discussed with @mchacki

* fix wording

* updated CHANGELOG

* fix yet another undefined behavior (#9629)

* [3.5.1] Fail the FailedLeader Job if the new leader fails. (#9628)

* Fail the FailedLeader Job if the new leader fails.

* Updated changelog.

* In case of timeout do not rollback.

* Fixed catch tests.

* Changed wording.

* DELETED rollback.

* reduce wait timeouts as a mitigation for notifying waiters without ho… (#9619)

* reduce wait timeouts as a mitigation for notifying waiters without holding the required mutex

this is a quick mitigation only, which reduces maximum wait time from 1
second to 100 milliseconds without changing other behavior.

the main problem of notifying pending writers without successfully
acquiring the required mutex still needs proper addressing.

* adjust timing-dependent test

* [3.5.1] Fast Controlled Leaderchange (#9634)

* First draft of keeping in sync during controlled leader change.

* Test if server is actually the leader in plan.

* Updated changelog.

* Added oldLeader check for set-the-leader request.

* Small fixes.

* Removed LOG_DEVEL.

* less copying, more moving! 🚚 (#9645)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* Port TakeoverShardLeadership from devel to 3.5.1 (#9659)

* Create TakeoverShardLeader job.
* Add TakeoverShardLeadership to Action factory.
* Add log message at level debug.
* Sort out LOG_TOPIC ids.
* Fix unit tests.
* CHANGELOG.

* Bug fix 3.5/hide mmfiles specific info in web ui (#9668)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* hide MMFiles-specific information when we don't need it

* Ported ResignLeadership to 3.5 (#9656)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* Ported ResignLeadership to 3.5

* Add the actual http route.

* Aardvark: Add k Shortest Paths example graph to UI (#9491) (#9661)

* Aardvark: Add k Shortest Paths example graph to UI (#9491)

* Add example graph to UI

* Add kShortestPathsGraph to examples.js

* Update example-graph.js

* Update aardvark.js

* Regenerate UI

* add the ability to have cluster special examples (#9613) (#9663)

* add the ability to have cluster special examples

* Update get_cluster_health.md

* fix abort condition, fix negative filtering for cluster tests

* Test if job fails with unmet assertion

* Remove cluster test example

* germanize

* better skip reasons

* removing superfluous semicolons

* Revert skip reasons, too noisy

* various replication improvements: (#9675)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* various replication improvements:

- better debuggability (more log details)
- shorter minimum wait delay in active failover
- fixed too early pruning of WAL files on leaders

* Bug fix 3.5/fix rocksdb return code (#9692)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* fix return codes for concurrent writes to same documents

* [3.5] Feature/rebootid notice changes, backport of #9523 (#9684)

* Feature/rebootid notice changes, backport of #9523

* Fixed error code to not re-use an old one

* Bug fix 3.5/issue 9679 (#9682)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* fixed issue #9679

* bug-fix/issue-#9660 (#9704) (#9707)

* bug-fix/issue-#9660 (#9704)

* fix issue

* Update tests/js/common/aql/aql-view-arangosearch-cluster.inc

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update tests/js/common/aql/aql-view-arangosearch-noncluster.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* fix cluster tests

* Update CHANGELOG

* [3.5] agency node fixes (#9698)

* node fixes port from 3.4
* fixed change log

* update rocksdb statistics to deliver sums from column family instead of single value from default family. (#9706)

* Feature 3.5/geo functions (#9710)

* Add support for WGS84 on distances (#9672)

* Add area calculations (#9693)

* Update CHANGELOG
2019-08-14 20:24:47 +03:00
Michael Hackstein d5840c125a Bug fix 3.5/min replication factor (#9524)
* Cherry-pick minReplicationFactor

* Bug fix/failover with min replication factor (#9486)

* Improve collection time of IResearchQueryOptimizationTest

* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and no one cares for it

* Added some assertions on minReplicationFactor

* Transaction API will now reject writes as soon as minimal replication factor is NOT fulfilled

* added minReplicationFactor to the user interface, preparation for the collection api changes

* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo

* added minReplicationFactor usage to tests

* TODO TEMPORARY COMMIT FOR TESTING PLEASE REVERT ME

* minReplicationFactor can now be changed via the collection properties route

* fixed wrong assert

* added minReplicationFactor to the graph management ui

* added minReplicationFactor to the gharial api

* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.

* adjusted description of minReplicationFactor

* FollowerInfo Refactoring

* added gharial api graph creation tests with minimal replication factor

* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests

* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor

* Debug logging

* MORE Debug logging

* Included replication fast lane

* Use correct minreplicationfactor

* modified debug logging

* Fixed compile issues

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* Revert "MORE Debug logging"

This reverts commit dab5af28c0.

* Revert "MORE Debug logging"

This reverts commit 6134b664bd.

* Revert "MORE Debug logging"

This reverts commit 80160bdf3b.

* Revert "MORE Debug logging"

This reverts commit 06aabcdfe1.

* Removed debug output

* Added replication fast lane. Also refactored the commands as i cannot take it any more...

* Put some requests of RocksDBReplication onto CATCHUP Lane.

* Put some requests of MMFilesReplication onto CATCHUP Lane.

* Adjusted Fast and MED lane usage in Supervised scheduler

* Added changelog entry

* Added new features entry

* A new leader will now keep old followers in case of failover

* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Fixed JSLINT

* Unified lane handling of replication handlers

* Sorry forgotten in last commit

* replaced strings with static strings

* more use of static strings

* optimized min repl description in the ui

* decr initial loop variable

* clean up of the createWithId test

* more use of static strings

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Added some comments on condition, renamed variable as suggested in review

* Added check for min replicationFactor to be non-zero

* Added assertion

* Added function to modify min and max replication factor in one go

* added missing semicolon

* rm log devel

* Added a second information to follower info that can keep track of followers that have been in sync before a failover has taken place

* Maintenance now reports the previous version to follower info instead of lying by itself. The Follower Info now gets a failover-safe mode to report in-sync followers

* check replFactor against nr dbservers

* Add lie reporting in CURRENT

* Reverted most of my recent commits about Failover situation. The intended plan simply does not work out

* move replication checks from logical collection to rest collection handler

* added more replication tests

* Include assert only if we are not in gtest

* jslint

* set min repl factor to zero if satellite collection

* check replication attributes in v8 collection

* Initial commit, old plan, does not yet work

* fixed ires tests

* Included FailoverCandidates key. Not fully implemented

* fixed wrong assert

* unified in sync follower reporting

* fixed compiler errors

* Cleanup locking, and fixed potential deadlocks

* Comments about locking order in FollowerInfo.

* properly check uint

* Keep old leader as potential failover candidate

* Transaction methods now use followerInfo to check if the leader can write; this might have the side effect that 'failoverCandidates' are updated

* Let agency check failoverCandidates if possible

* Initialize member variables

* Use unified follower reporting in DBServerAgencySync

* Removed obsolete variable, collecting it somewhere else

* repl factor attr check

* Reimplemented previous followers, second attempt now. PhaseOne and PhaseTwo can now synchronize on current.

* Fixed assertion, forgot an off-by-one

* adjusted test to be more precise now

* Fixed failover candidates list

* Disable write on dropping too many followers

* Allow to run updateFailoverCandidates multiple times with same leader.

* Final fixes, resilience tests now green, crossing fingers for jenkins

* Fixed race on atomics comparison

* Fixed invalid number type

* added nullptr handling

* added nullptr handling

* Removed invalid assert

* Make takeover of leadership an atomic operation

* Update tests/js/common/shell/shell-cluster-collection.js

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Review fixes

* Fixed creation code to use takeoverLeadership

* Update arangod/Cluster/FollowerInfo.h

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Applied review fixes

* There is no timeout

* Moved AQL + Pregel to INTERNAL_AQL lane, which is medium priority, to avoid deadlocks with Sync replication

* More review fixes

* Use difference if you want to compare two vectors...

* Use std::string ...

* Now check if we are in recovery mode

* Added documentation for minReplicationFactor

* Added readme update as well in documentation

* Removed merge conflict leftovers 0o, i should not trust the IDE

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/Books/Manual/Architecture/Replication/README.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update CHANGELOG

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/Books/Manual/DataModeling/Collections/DatabaseMethods.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/Books/Manual/ReleaseNotes/NewFeatures35.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/DocuBlocks/Rest/Collections/1_structs.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/DocuBlocks/Rest/Graph/1_structs.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Apply suggestions from code review

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Adapted review requests, thanks for finding!

* Removed unnecessary const

* Apply suggestions from code review

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Moved initialization of variable further down

* Apply lock before notify_all()

* Remove documentation except DocuBlocks, covered by PR in docs repo

* Remove accidental indent
2019-07-22 17:48:34 +03:00
Jan 9cb08ded92 make the comparison functions unambiguous (#9349)
* make the comparison functions unambiguous

* added @kaveh's suggestion
2019-07-01 16:35:28 +02:00
Kaveh Vahedipour 773f3c8422 [devel] fix state clientlookuptable (#9066) 2019-05-30 04:24:46 +02:00
Jan 449ab1ed8e Bug fix/cppcheck 13042019 (#8752) 2019-04-15 10:13:56 +02:00
Max Neunhöffer 80bfb85695 Port agency performance tuning for many shards to devel. (#8647)
* Port agency performance tuning for many shards to devel.
* Add more IDs to LOG_TOPIC calls.
* Even more IDs for LOG_TOPIC.
* Fix a duplicate LOG_TOPIC ID.
* Fix an old merging bug in devel.
* Don't hesitate between phases one and two for small clusters.
2019-04-11 11:14:56 +02:00
Jan d6d3e3daa4 initialize some member variables, added TODOs (#8545) 2019-03-26 12:57:32 +01:00
Jan Christoph Uhde c3f7961b88 apply unique log ids (#8561) 2019-03-25 20:26:51 +01:00
Max Neunhöffer 55706e3c74 Make addfollower jobs less aggressive. (#8490)
* Make addfollower jobs less aggressive.
* CHANGELOG.
2019-03-21 15:24:31 +01:00
Kaveh Vahedipour 5038dfe685 supervision must not copy snapshots into jobs (#8425)
* supervision must not copy snapshots into jobs
* CHANGELOG.
2019-03-20 17:07:54 +01:00
Kaveh Vahedipour 68178ba165 [devel] supervision bug fix backports (#8314)
* back ports for supervision fixes from 3.4 part 1

* back ports for supervision fixes from 3.4 part 2
2019-03-04 19:27:24 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
jsteemann 44c7b1b476 remove tabstops 2018-07-16 15:00:12 +02:00
Simon 45fbed497b Supervision Job for Active Failover (#5066) 2018-04-23 12:49:41 +02:00
Matthew Von-Maszewski c0c149cf5b Create non-throwing wrappers for Node access in Agency (#4598)
* safety checkin of Node throw reduction.
* final round of Node throw protection.  Common accessors now protected to force code to hasAsXXX() functions.
2018-04-17 10:21:14 +02:00
Simon 68442dae5a Fixing agency prefix in Agency/Job.cpp (#5039)
* Fixing some test issues and fixing the agency prefix in Agency/Job.cpp
* Making logic consistent in failed-leader / failed-follower job
* reverting condition back to == GOOD
2018-04-09 16:21:24 +02:00
Tobias Gödderz 4f6847b1b8 Bug fix/supervision bug distributeshardslike and virtual collections (#4759) 2018-03-07 09:54:39 +01:00
Michael Hackstein 76e7461aa9 Revert "bug fix for jobs looking at distributeShardsLike and virtual collections (#4665)" (#4758)
This reverts commit 3c35cd32dd.
2018-03-05 17:48:29 +01:00
Kaveh Vahedipour 3c35cd32dd bug fix for jobs looking at distributeShardsLike and virtual collections (#4665) 2018-03-05 17:37:07 +01:00
Matthew Von-Maszewski e566150b2e There is a start-up race condition where collection could be in plan but not current. A server shutdown during this period locks system. (#4478) 2018-02-19 09:14:24 +01:00
Kaveh Vahedipour 42f543fd10 constituent correctly persisting _votedFor and _term (#4248) 2018-01-16 09:47:25 +01:00
Kaveh Vahedipour 7b80deb5cc Fixed object assignment operator for agency's key value store (#3701)
* Fixed object assignment operator for agency's key value store
* Node's toJson is now actually toJson. getString should be used for string extractions
* adjust agency's documentation (clarify precondition)
2017-11-17 15:49:40 +01:00
Kaveh Vahedipour 00650e6a3f Bug fix/agency mt fixes (#3158)
* added debugging methods

* try to fix invalid access in case of error

* remove unused members

* bugfixes and comments

* all agency fixes in

* merge bug

* partially unguarded Agent::lead fixed

* all agency fixes in

* added nrBlocked to thread startup eval

* added nrBlocked to thread startup eval

* recombination of cases in State::get

* some maps replaced with unordered_maps

* optimized maps some
2017-08-30 10:43:51 +02:00
Jan 47e29e6e1f Bug fix/issues 1806 (#3069)
* fix buffer overruns in linenoise for long input lines

* don't make historian repeatedly print the same error messages that nothing can be done about

* make the implementations of the logging operator<<s not throw exceptions, so that logging does not throw exceptions as an unintended side effect

* update CHANGELOG

* improve error message

* don't copy strings, but pass them by const reference
2017-08-18 22:58:09 +02:00
Kaveh Vahedipour fd90318fd8 correct-funny-fail-rotation-after-compaction-bugfix (#2774) 2017-07-12 22:39:23 +02:00
Andreas Streichardt f2670f8040 Extract compareServerList and make it reusable 2017-05-24 14:13:51 +02:00
Andreas Streichardt 8558cb85c9 warning on windows 2017-05-11 13:41:20 +02:00
Kaveh Vahedipour e7797d292e fixed shard ordering in Job::clones with consequences for unit tests 2017-04-27 13:37:47 +02:00
Kaveh Vahedipour 262bb4faac avoid warnings for time being 2017-04-24 16:49:26 +02:00
Kaveh Vahedipour ccc388a940 more distributeShardsLike code merged from 3.1 2017-04-24 15:13:40 +02:00
Kaveh Vahedipour c099c6daa9 more distributeShardsLike code merged from 3.1 2017-04-24 15:12:38 +02:00
Andreas Streichardt 7322e3bff3 Allow seeding of randomgenerator for tests 2017-04-21 18:08:49 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour f3cb1307a5 3.1 fixes backported to devel 2017-02-03 10:48:25 +01:00
jsteemann fa917937c4 do not use namespaces in header files 2017-02-01 13:41:31 +01:00
Max Neunhoeffer 7e4f45ec5c Fix server list comparison. 2017-01-19 14:20:00 +01:00
Kaveh Vahedipour 8251cd46e1 cannot depend on Slice.getDouble 2016-12-15 15:23:45 +01:00
Kaveh Vahedipour 2b9c018817 fixed resilience 2016-12-09 16:35:32 +01:00
Kaveh Vahedipour eddecc0a4c clones method in Jobs more useful 2016-12-09 09:29:00 +01:00
Kaveh Vahedipour b930b23fc2 AddFollower jobs for newly arrived db server to satisfy replication factors 2016-12-07 16:20:47 +01:00
Kaveh Vahedipour 3a1a9c898c correct handling of distributeShardsLike in FailedFollower 2016-12-05 15:44:53 +01:00
jsteemann 9d9b4871ba fixes for Visual Studio 2016-10-31 12:16:39 +01:00
Kaveh Vahedipour 72bf15c118 Fixed moveShard to do distributeShardsLike in start instead of create 2016-10-06 15:32:41 +02:00
Kaveh Vahedipour ce8c1a0cac revisiting all supervision jobs 2016-10-05 17:16:02 +02:00
Kaveh Vahedipour e419a52369 Implementations out of Job header 2016-10-05 15:28:26 +02:00
Kaveh Vahedipour 138d3f304e Implementations out of Job header 2016-10-05 15:26:57 +02:00