1
0
Fork 0
Commit Graph

2337 Commits

Author SHA1 Message Date
Simon cb7bf0314b Use fuerte in RemoteExecutor (#10077) 2019-09-27 16:20:38 +02:00
Dan Larkin-York dc23896a01 Make count, figures, revision, and index warmup use non-blocking communication (#10048) 2019-09-27 09:54:01 +02:00
Jan 9533d1b71c
fix skipInaccessible (#10092) 2019-09-26 14:45:58 +02:00
Dan Larkin-York a83c2323c9 Refactor ApplicationServer stack (#9965) 2019-09-25 17:31:59 +02:00
Jan 8a56ed9a2c
fix sporadically failing one shard test (#10074) 2019-09-25 12:52:31 +02:00
Simon ac2158ee22 Async el cheapo (#10061) 2019-09-24 12:00:13 +02:00
Simon 2d1c76a55a Refactor Batch API docs (#10015) 2019-09-23 19:56:04 +02:00
Jan Christoph Uhde 0b8c75c7b7 one shard db - devel (#9395) 2019-09-23 15:48:37 +02:00
Kaveh Vahedipour dd10909dfc rebootIds instead of boot stamps (#10050)
* rebootIds instead of boot stamps
* noexcept wrong as copies are done
2019-09-20 10:26:35 +02:00
Kaveh Vahedipour 49a01e14ff Bugfix/agency lock left behind (#10021)
* fix potentially left behind agency lock
* typo
* do not silently patronise the customer
2019-09-20 10:09:21 +02:00
Jan 41b0717bc9
remove unused code (#10043) 2019-09-19 12:25:25 +02:00
Tobias Gödderz 7d091523e5 Fixed some problems found with IPO enabled (#10020) 2019-09-16 17:10:13 +02:00
Dan Larkin-York 8c573549b3 Make truncate use non-blocking communication. (#9980) 2019-09-16 10:46:49 +02:00
Simon 2e91f4fe67 Non block delete (#10005) 2019-09-12 21:44:35 +02:00
Jan a93733da9d
reset _activeTrx as early as possible to release locks (#10004)
* reset _activeTrx as early as possible to release locks

* moved code into if condition
2019-09-12 19:32:17 +02:00
Jan 84ad504a6c
corrected several wrong macro names (#9971) 2019-09-11 16:45:32 +02:00
Simon 3d2952b23a Non block modify (#9963) 2019-09-11 15:37:02 +02:00
Jan aada04e75b
don't assert/crash when using an unknown collection/shard (#9959) 2019-09-11 12:03:13 +02:00
Michael Hackstein d251c3316d
Fixed and enabled an accidentially disabled assertion (#9969)
* Fixed and enabled an accidentially disabled assertion

* Removed debug include
2019-09-11 09:04:31 +02:00
Kaveh Vahedipour 3011846025 broken hotbackup list with (#9956)
* fix broken list of non existing id
2019-09-10 10:17:24 +02:00
Markus Pfeiffer d25ea0e377 Cleanup ServerState.cpp (#9923)
* Remove unused function mkdir()

* Remove some outdated comments

* Add include guards to AgencyStrings.h

* Move check for initialized agency out of registerAtAgencyPhase1

* Whitespace cleanup

* Address two minor comments from review
2019-09-10 10:06:16 +02:00
Jan 3a59abd1dc
various issues reported by cppcheck (#9962) 2019-09-09 20:32:04 +02:00
Tobias Gödderz 9e9fa3a4f1 Bug fix/allow agency ops in active failover (#9881)
* allow agency operations in active failover too

* Added regression test

* Allowed more calls in active failover for the health endpoint to work

* Updated CHANGELOG
2019-09-09 16:53:57 +02:00
Tobias Gödderz 1d65a37cc8 Feature/agency paths framework (#9933)
* Added a skeleton framework for agency paths

* Added some basic tests

* Added missing header

* Move to shared_ptrs to parents

* Added a virtual base class

* Sprinkle some final specifiers

* Moved some code into class Path and simplified tests

* Added root() function

* Added assertions

* Added /arango/Supervision

* Replaced PathComponent by StaticComponent and added DynamicComponent

* Added /arango/Target

* Added /arango/Current

* Added /arango/Plan

* Added a TODO note

* Added the last missing top-level paths in /arango/

* Added aliases, cleaned up comment

* Fixed some specifiers

* s/typeof/decltype/
2019-09-09 14:04:12 +02:00
Jan 9f078f363d
Bug fix/harden database creation against duplicate name (#9951) 2019-09-09 11:00:33 +02:00
Simon 6b7fb0994e Non-Blocking reads (#9883) 2019-09-05 19:26:01 +02:00
Kaveh Vahedipour ad36adc354 Feature/hotbackup list retries (#9924)
* retry hot backup listing for 2 minutes in cluster before giving up
2019-09-05 16:44:43 +02:00
Max Neunhöffer d45757db51
Sort out google cloud storage as remote. (#9918)
* Add successful method to ClusterCommResult.
* Improve error forwarding for cluster internal communication.
2019-09-05 13:36:50 +02:00
Markus Pfeiffer 753ff4aa67 Feature/atomic database creation 2 (#9826) 2019-09-05 12:38:07 +02:00
Tobias Gödderz b7f8f01a22 Feature/split libarangoserver (#9670)
* Made usage of named variables for targets a little more consistent

* Bumped minimum cmake version

- removed policies superseded by minimum cmake version
- fixed static linking due to new policy CMP0060

* Fixed library suffixes for windows

* Extracted libs arango_mmfiles and _rocksdb from arangoserver

* Replaced target variables by constants

* Extracted arango_cluster_engine from arangoserver

* Extracted llhttp from arangoserver

* First successful split of arangoserver

* Moved enterprise files to enterprise

* Again only optionally include RestTestHandler and AcceptorUnixDomain

* Cleaned source files from other libraries

* Removed old commented sources

* Split off a small third library

* Fixed boost dependency for cluster engine

* Added some missing dependencies

* Added arango_geo dep, and -J on windows

* Began to split off an arango_graph lib

* Moved more files to arango_graph

* Do not set /J globally for ATL

* Moved more files to arango_graph

* Moved graph-RestHandlers to arango_graph

* Added arango_geo dependency to _mmfiles and _rocksdb

* Updated graph dependencies

* Split off arango_pregel

* Split off arango_aql

* Added missing boost_system dependency to pregel

* Split off arango_vocbase

* Cleanup

* Added missing boost_system dependency for arango_vocbase

* Split off arango_v8server

* Split off arango_utils

* Minor cleanup

* Split of arango_storage_engine

* Split off arango_indexes and arango_cache

* Fixed some dependencies

* Split off arango_replication

* Resolved two todos

* Split off arango_agency

* Reordered some statements

* Ordered dependency definitions alphabetically

* Cleaned some deps

* Break one cycle, comment on another

* Merge the remaining arangoserver_part[123] sources

* Moved some utils to vocbase to break cycles

* Added missing backtrace dependency to iresearch-s

* Added missing boost dependency

* Added dependency arango_indexes -> arango_geo

* Added deps to arango_cluster_engine, cleaned duplicate deps

* Broke remaining dependency cycles

* Actually, missed one cycle...

* Re-added include for Mac

* arango_cache needs SharedPRNG
2019-09-05 09:37:12 +02:00
Jan 5572675106
Bug fix/remove base directory from include path (#9885) 2019-09-04 17:39:01 +02:00
Kaveh Vahedipour e6cb5d0f16 DropCollection is a FAST_LANE action and should not need much time or else retry. (#9893) 2019-09-04 15:34:02 +02:00
Kaveh Vahedipour cced54cbc7 forceBackup is now allowInconsistent (#9850)
* forceBackup is now allowInconsistent
2019-09-02 10:32:58 +02:00
Simon 2f9d1f8c51 Baby operations in cluster fix (#9870) 2019-09-02 09:36:37 +02:00
Jan d42490aa42
honor return values of important methods (#9859) 2019-08-30 23:11:19 +02:00
Jan 13e1327723
fix return value when reboot id cannot be retrieved (#9857) 2019-08-30 10:34:47 +02:00
Jan ec3043dd8f
check for duplicate server endpoints on cluster startup (#9860) 2019-08-30 10:29:46 +02:00
Simon 0ee0cebb11 Non-Blocking inserts (#9823) 2019-08-30 09:17:58 +02:00
Jan 30b36a2a42
fix return value checks (#9852) 2019-08-29 20:38:53 +02:00
Jan f01385e969
Bug fix/multi bugs (#9789) 2019-08-26 13:11:59 +02:00
Dan Larkin-York 1cc31e1085 Minimize unnecessary dropping of followers due to poorly set synchronous replication timeouts (#9798) 2019-08-26 11:20:02 +02:00
Jan 00bcc4954c
AQL date functions improvements (#9714) 2019-08-22 12:50:08 +02:00
Lars Maier 2ec2e1c1bc Background Get Ids (#9474)
* Obtain more ids via a background thread.
* Wait for thread to stop on shutdown.
* Added scope guard.
* Atomic weapons.
* Fix log level.
* One big lock!
* Added mutex for cleanup.
* Fixed unused variable.
2019-08-16 12:42:22 +02:00
Max Neunhöffer dc095f70c7
Fix agency bugs. (#9718)
* More logging in agency client inquiry.
* Fix first bug, increase logging for second.
* Handle timeout better in agency tests.
* Fix wrong term bug.
* Fix transact similarly.
* CHANGELOG.
2019-08-16 11:16:28 +02:00
Simon dde1a82fce Network Code V2 (#7962) 2019-08-14 17:25:03 +02:00
Frank Celler 68ea717af4
Feature/set environment (#9688) 2019-08-13 09:18:16 +02:00
Frank Celler aa3d3f8e40
Feature/cleanup ccpcheck (#9665) 2019-08-12 11:11:49 +02:00
Jan 6ad0a995b8
various replication improvements: (#9676)
- better debuggability (more log details)
- shorter minimum wait delay in active failover
- fixed too early pruning of WAL files on leaders
2019-08-12 10:53:20 +02:00
Tobias Gödderz 9cd332b958 Feature/rebootid notice changes (#9523)
* Consolidated _servers and _serverAdvertisedEndpoints, added rebootId, prepared change notifications

* Cleanup

* Added a RebootId type

* Began implementing RebootTracker (still WIP)

* Moved RebootId operators into the class

* Removed RebootId operator<< again

* Added tests, added CallbackGuard, removed/commented old RebootTracker code

* Fix: do not try to call unset callbacks

* Split one test, added another

* Added more tests

* Renamed tests, added more tests

* Fixed missing variable declarations

* Let MockServer appear to be started

* Reorded test, fixed naming

* Implemented callMeOnChange()

* Re-implemented RebootTracker (not yet working)

* Resolved a TODO, updated a test, added comments

* Call old callbacks immediately

* Fixed tests

* Use EXPECT_* instead of ASSERT_*

* Suppress a log message

* Resolved TODOs

* Reverted changes on reading ServersRegistered

* Update RebootTracker

* Introduce `rebootId` into ServerState for Cluster

 * A server *boots* if it is started on a previously non-existing data
   directory and hence does not have a UUID yet.
 * A server *reboots* if it is started on a pre-existing data directory

We keep the rebootId in the cluster's agency under
Current/ServersKnown/$uuid/rebootId.

When rebooting (and subsequently re-joining a cluster), the server increments
its rebootId in Phase 2 of registration. This way it can be detected within the
cluster whether a server was restarted.

This information will later be used to handle cases where server restarts can
lead to problems, for example with transactions or in-progress queries.

* Move rebootId into Current/ServersKnown/

* Fixed typo

* Fixed log ids

* Add deletion of ServersKnown/UUID from agency

* Add deletion of Current/ServersKnown/UUID to removeServer

* Clean up readRebootIdFromAgency and add retry loop around it

* Bugfix

* Added nolint comments

* Fixed initialization order

* Fixed ClusterInfo-test

* Added log messages

* Revert "Fixed ClusterInfo-test"

This reverts commit d983596979.

* Disabled assertion for google tests

* Ignore windows compile warning

* Always call loadServers in loadCurrent

* Fix really subtle bug when not returning a value

* Introduce `rebootId` into ServerState for Cluster

 * A server *boots* if it is started on a previously non-existing data
   directory and hence does not have a UUID yet.
 * A server *reboots* if it is started on a pre-existing data directory

We keep the rebootId in the cluster's agency under
Current/ServersKnown/$uuid/rebootId.

When rebooting (and subsequently re-joining a cluster), the server increments
its rebootId in Phase 2 of registration. This way it can be detected within the
cluster whether a server was restarted.

This information will later be used to handle cases where server restarts can
lead to problems, for example with transactions or in-progress queries.

* Move rebootId into Current/ServersKnown/

* Add deletion of ServersKnown/UUID from agency

* Add deletion of Current/ServersKnown/UUID to removeServer

* Clean up readRebootIdFromAgency and add retry loop around it

* Fixed compile error due to forbidden implicit cast

* Fixed compile error on windows

* Fixed compile error due to devel merge

* Removed dead comment

* Removed TODO note

* Extended comment

* Removed TODO note

* Fixed using an invalidated iterator

* Copy string only if necessary

* Fixed compile error
2019-08-12 09:33:22 +02:00
Frank Celler cf26b3a39e removed unused variable 2019-08-09 10:01:14 +02:00
Max Neunhöffer b7dd51229d
Create TakeoverShardLeader job. (#9653)
* Create TakeoverShardLeader job.
* Add TakeoverShardLeadership to Action factory.
* Add log message at level debug.
* Sort out LOG_TOPIC ids.
* Fix unit tests.
2019-08-07 16:49:08 +02:00
Lars Maier 492057d4f4 [devel] Resign Leadership (#9427)
* First version of ResignLeadership Job.
* Port some performance optimizations from CleanOutServerJob.
* Draft of resigning leadership on shutdown.
* Moved code into Maintenance Feature. Fixed beginShutdown.
2019-08-07 15:02:17 +02:00
Jan 5452b01990
Bug fix/optimizations 06 08 2019 (#9641) 2019-08-06 16:04:39 +02:00
jsteemann 963d8eed57 fix compile error 2019-08-06 15:54:53 +02:00
Dan Larkin-York 3d0246cb18 Decentralize includes (#9623) 2019-08-06 15:32:09 +02:00
Lars Maier 715a3b19b0 Fast Controlled Leaderchange (#9608)
* First draft of keeping in sync during controlled leader change.
* Test if server is actually the leader in plan.
* Update changelog.
* Added oldLeader check for set-the-leader request.
* Small fixes.
2019-08-05 12:08:21 +02:00
Jan 5282fa23c5
fix lagging AgencyCallbacks (#9617) 2019-08-02 11:44:01 +02:00
Lars Maier ed496fe5dd Feature/hotbackup devel (#9495)
Hotbackup
2019-08-02 11:39:46 +02:00
Jan 7d829de89e added internal function getResponsibleServers() (#9604)
* added internal function getResponsibleServers()

* forgot to commit

* honor review comments

* Update arangod/Cluster/ClusterInfo.cpp

Potentially Fixed Unique logID usage. (let Jenkins test it)
2019-07-31 10:18:37 +02:00
Jan 50f41cec59
added missing function db._transactions(), and equivalent REST API route GET /_api/transaction (#9571) 2019-07-26 16:20:28 +02:00
Jan 4ea95bc109
ascii art (#9565) 2019-07-25 13:03:30 +02:00
Michael Hackstein 987ad41364
Forward Port of changes in 3.5 review (#9544)
* Bug fix 3.5/min replication factor (#9524)

* Cherry-pick minReplicationFactor

* Bug fix/failover with min replication factor (#9486)

* Improve collection time of IResearchQueryOptimizationTest

* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and noone cares for it

* Added some assertion son minReplicationFactor

* Transaction API will now reject writes as soon as minimal replication factor is NOT fulfilled

* added minReplicationFactor to the user interface, preparation for the collection api changes

* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo

* added minReplicationFactor usage to tests

* TODO TEMOPORARY COMMIT FOR TESTING PLEASE REVERT ME

* minReplicationFactor now able to change via collection  properties route

* fixed wrongly assert

* added minReplicationFactor to the graph management ui

* added minReplicationFactor to the gharial api

* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.

* adjusted description of minReplicationFactor

* FollowerInfo Refactoring

* added gharial api graph creation tests with minimal replication factor

* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests

* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor

* Debug logging

* MORE Debug logging

* Included replication fast lane

* Use correct minreplicationfactor

* modified debug logging

* Fixed compileissues

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* Revert "MORE Debug logging"

This reverts commit dab5af28c0.

* Revert "MORE Debug logging"

This reverts commit 6134b664bd.

* Revert "MORE Debug logging"

This reverts commit 80160bdf3b.

* Revert "MORE Debug logging"

This reverts commit 06aabcdfe1.

* Removed debug output

* Added replication fast lane. Also refactored the commands as i cannot take it any more...

* Put some requests of RocksDBReplication onto CATCHUP Lane.

* Put some requests of MMFilesReplication onto CATCHUP Lane.

* Adjusted Fast and MED lane usage in Supervised scheduler

* Added changelog entry

* Added new features entry

* A new leader will now keep old followers in case of failover

* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Fixed JSLINT

* Unified lane handling of replication handlers

* Sorry forgotten in last commit

* replaced strings with static strings

* more use of static strings

* optimized min repl description in the ui

* decr initial loop variable

* clean up of the createWithId test

* more use of static strings

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Added some comments on condition, renamed variable as suggested in review

* Added check for min replicationFactor to be non-zero

* Added assertion

* Added function to modify min and max replication factor in one go

* added missing semicolon

* rm log devel

* Added a second information to follower info that can keep track of followers that have been in sync before a failover has taken place

* Maintenance reports previous version now to follower info. instead of lying by itself. The Follower Info now gets a failover save mode to report insync followers

* check replFactor against nr dbservers

* Add lie reporting in CURRENT

* Reverted most of my recent commits about Failover situation. The intended plan simply does not work out

* move replication checks from logical collection to rest collection handler

* added more replication tests

* Include assert only if we are not in gtest

* jslint

* set min repl factor to zero if satellite collection

* check replication attributes in v8 collection

* Initial commit, old plan, does not yet work

* fixed ires tests

* Included FailoverCandidates key. Not fully implemented

* fixed wrong assert

* unified in sync follower reporting

* fixed compiler errors

* Cleanup locking, and fixed potential deadlocks

* Comments about locking order in FollowerInfo.

* properly check uint

* Keep old leader as potential failover candidate

* Transaction methods now use followerInfo to check if the leader can write, this might have the sideeffect that 'failoverCandidates' are updated

* Let agency check failoverCandidates if possible

* Initialize member variables

* Use unified follower reporting in DBServerAgencySync

* Removed obsolete variable, collecting it somewhere else

* repl factor attr check

* Reimplemented previous followers, second attempt now. PhaseOne and PhaseTwo can now synchronize on current.

* Fixed assertion, forgot an off-by-one

* adjusted test to be more preciese now

* Fixed failove candidates list

* Disable write on dropping too many followers

* Allow to run updateFailoerCandidates multiple times with same leader.

* Final fixes, resilience tests now green, crossing fingers for jenkins

* Fixed race on atomics comparison

* Fixed invalid number type

* added nullptr handling

* added nullptr handling

* Removed invalid assert

* Make takeover of leadership an atomic operation

* Update tests/js/common/shell/shell-cluster-collection.js

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Review fixes

* Fixed creation code to use takeoverLeadership

* Update arangod/Cluster/FollowerInfo.h

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Applied review fixes

* There is no timeout

* Moved AQL + Pregel to INTERNAL_AQL lane, which is medium priority, to avoid deadlocks with Sync replication

* More review fixes

* Use difference if you want to compare two vectors...

* Use std::string ...

* Now check if we are in recovery mode

* Added documentation for minReplicationFactor

* Added readme update as well in documenation

* Removed merge conflict leftovers 0o, i should not trust the IDE

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/Books/Manual/Architecture/Replication/README.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update CHANGELOG

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/Books/Manual/DataModeling/Collections/DatabaseMethods.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/Books/Manual/ReleaseNotes/NewFeatures35.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/DocuBlocks/Rest/Collections/1_structs.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/graphManagementView.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update Documentation/DocuBlocks/Rest/Graph/1_structs.md

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Apply suggestions from code review

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Adepted review requests, thanks for finding!

* Removed unnecessary const

* Apply suggestions from code review

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Moved initilization of variable more downwards

* Apply lock before notify_all()

* Remove documentation except DocuBlocks, covered by PR in docs repo

* Remove accidental indent

* Removed leftover merge conflict in documentation block
2019-07-23 13:14:38 +02:00
Michael Hackstein 19c25d1e3b Fixed merge conflict marker 2019-07-19 17:26:45 +02:00
Tobias Gödderz 7e98f56cf5 Bug fix/clean replication api wal tracking (#9473) 2019-07-19 15:44:14 +02:00
Jan cdbe63fa6e
Bug fix/fix races in collection creation (#9506) 2019-07-19 15:11:08 +02:00
Michael Hackstein 36b1d290a9
Bug fix/failover with min replication factor (#9486)
* Improve collection time of IResearchQueryOptimizationTest

* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and noone cares for it

* Added some assertion son minReplicationFactor

* Transaction API will now reject writes as soon as minimal replication factor is NOT fulfilled

* added minReplicationFactor to the user interface, preparation for the collection api changes

* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo

* added minReplicationFactor usage to tests

* TODO TEMOPORARY COMMIT FOR TESTING PLEASE REVERT ME

* minReplicationFactor now able to change via collection  properties route

* fixed wrongly assert

* added minReplicationFactor to the graph management ui

* added minReplicationFactor to the gharial api

* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.

* adjusted description of minReplicationFactor

* FollowerInfo Refactoring

* added gharial api graph creation tests with minimal replication factor

* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests

* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor

* Debug logging

* MORE Debug logging

* Included replication fast lane

* Use correct minreplicationfactor

* modified debug logging

* Fixed compileissues

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* Revert "MORE Debug logging"

This reverts commit dab5af28c0.

* Revert "MORE Debug logging"

This reverts commit 6134b664bd.

* Revert "MORE Debug logging"

This reverts commit 80160bdf3b.

* Revert "MORE Debug logging"

This reverts commit 06aabcdfe1.

* Removed debug output

* Added replication fast lane. Also refactored the commands as i cannot take it any more...

* Put some requests of RocksDBReplication onto CATCHUP Lane.

* Put some requests of MMFilesReplication onto CATCHUP Lane.

* Adjusted Fast and MED lane usage in Supervised scheduler

* Added changelog entry

* Added new features entry

* A new leader will now keep old followers in case of failover

* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Fixed JSLINT

* Unified lane handling of replication handlers

* Sorry forgotten in last commit

* replaced strings with static strings

* more use of static strings

* optimized min repl description in the ui

* decr initial loop variable

* clean up of the createWithId test

* more use of static strings

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Added some comments on condition, renamed variable as suggested in review

* Added check for min replicationFactor to be non-zero

* Added assertion

* Added function to modify min and max replication factor in one go

* added missing semicolon

* rm log devel

* Added a second information to follower info that can keep track of followers that have been in sync before a failover has taken place

* Maintenance reports previous version now to follower info. instead of lying by itself. The Follower Info now gets a failover save mode to report insync followers

* check replFactor against nr dbservers

* Add lie reporting in CURRENT

* Reverted most of my recent commits about Failover situation. The intended plan simply does not work out

* move replication checks from logical collection to rest collection handler

* added more replication tests

* Include assert only if we are not in gtest

* jslint

* set min repl factor to zero if satellite collection

* check replication attributes in v8 collection

* Initial commit, old plan, does not yet work

* fixed ires tests

* Included FailoverCandidates key. Not fully implemented

* fixed wrong assert

* unified in sync follower reporting

* fixed compiler errors

* Cleanup locking, and fixed potential deadlocks

* Comments about locking order in FollowerInfo.

* properly check uint

* Keep old leader as potential failover candidate

* Transaction methods now use followerInfo to check if the leader can write, this might have the sideeffect that 'failoverCandidates' are updated

* Let agency check failoverCandidates if possible

* Initialize member variables

* Use unified follower reporting in DBServerAgencySync

* Removed obsolete variable, collecting it somewhere else

* repl factor attr check

* Reimplemented previous followers, second attempt now. PhaseOne and PhaseTwo can now synchronize on current.

* Fixed assertion, forgot an off-by-one

* adjusted test to be more preciese now

* Fixed failove candidates list

* Disable write on dropping too many followers

* Allow to run updateFailoerCandidates multiple times with same leader.

* Final fixes, resilience tests now green, crossing fingers for jenkins

* Fixed race on atomics comparison

* Fixed invalid number type

* added nullptr handling

* added nullptr handling

* Removed invalid assert

* Make takeover of leadership an atomic operation

* Update tests/js/common/shell/shell-cluster-collection.js

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Review fixes

* Fixed creation code to use takeoverLeadership

* Update arangod/Cluster/FollowerInfo.h

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Applied review fixes

* There is no timeout

* Moved AQL + Pregel to INTERNAL_AQL lane, which is medium priority, to avoid deadlocks with Sync replication

* More review fixes

* Use difference if you want to compare two vectors...

* Use std::string ...

* Now check if we are in recovery mode

* Added documentation for minReplicationFactor

* Added readme update as well in documenation
2019-07-19 15:00:30 +02:00
Wilfried Goesgens c922c5f133 move deleting of ClusterComm threads to the unprepare of the ClusterFeature (#9369) 2019-07-19 13:53:00 +02:00
Wilfried Goesgens ca0f2b8b86 All hail to the SI (#9445) 2019-07-19 13:52:12 +02:00
Michael Hackstein cbcf561450
Feature/min replication factor (#9433)
* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and noone cares for it

* Added some assertion son minReplicationFactor

* Transaction API will now reject writes as soon as minimal replication factor is NOT fulfilled

* added minReplicationFactor to the user interface, preparation for the collection api changes

* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo

* added minReplicationFactor usage to tests

* TODO TEMOPORARY COMMIT FOR TESTING PLEASE REVERT ME

* minReplicationFactor now able to change via collection  properties route

* fixed wrongly assert

* added minReplicationFactor to the graph management ui

* added minReplicationFactor to the gharial api

* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.

* adjusted description of minReplicationFactor

* FollowerInfo Refactoring

* added gharial api graph creation tests with minimal replication factor

* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests

* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor

* Debug logging

* MORE Debug logging

* Included replication fast lane

* Use correct minreplicationfactor

* modified debug logging

* Fixed compileissues

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* MORE Debug logging

* Revert "MORE Debug logging"

This reverts commit dab5af28c0.

* Revert "MORE Debug logging"

This reverts commit 6134b664bd.

* Revert "MORE Debug logging"

This reverts commit 80160bdf3b.

* Revert "MORE Debug logging"

This reverts commit 06aabcdfe1.

* Removed debug output

* Added replication fast lane. Also refactored the commands as i cannot take it any more...

* Put some requests of RocksDBReplication onto CATCHUP Lane.

* Put some requests of MMFilesReplication onto CATCHUP Lane.

* Adjusted Fast and MED lane usage in Supervised scheduler

* Added changelog entry

* Added new features entry

* A new leader will now keep old followers in case of failover

* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Fixed JSLINT

* Unified lane handling of replication handlers

* Sorry forgotten in last commit

* replaced strings with static strings

* more use of static strings

* optimized min repl description in the ui

* decr initial loop variable

* clean up of the createWithId test

* more use of static strings

* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Added some comments on condition, renamed variable as suggested in review

* Added check for min replicationFactor to be non-zero

* Added assertion

* Added function to modify min and max replication factor in one go

* added missing semicolon

* rm log devel

* Added a second information to follower info that can keep track of followers that have been in sync before a failover has taken place

* Maintenance reports previous version now to follower info. instead of lying by itself. The Follower Info now gets a failover save mode to report insync followers

* check replFactor against nr dbservers

* Add lie reporting in CURRENT

* Reverted most of my recent commits about Failover situation. The intended plan simply does not work out

* move replication checks from logical collection to rest collection handler

* added more replication tests

* Include assert only if we are not in gtest

* jslint

* set min repl factor to zero if satellite collection

* check replication attributes in v8 collection

* fixed ires tests

* fixed wrong assert

* properly check uint

* repl factor attr check

* adjusted test to be more preciese now

* Fixed race on atomics comparison

* Fixed invalid number type

* Update tests/js/common/shell/shell-cluster-collection.js

Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>

* Review fixes

* More review fixes
2019-07-19 13:02:28 +02:00
Jan 300b8e58f4
Bug fix/fix duplicate actions (#9452) 2019-07-19 09:17:19 +02:00
Jan 16405482c8
micro optimizations (#9487) 2019-07-17 14:10:18 +02:00
Simon e5507d840f Feature/comm task refactor (#9426) 2019-07-16 09:43:25 +02:00
Matthew Von-Maszewski f26403c4e3 BugFix: Some error results have messages that are not reporting (#9454)
* some error results have messages that are not reporting

* update CHANGELOG for rocksdb reporting fix

* Add mutex protection to errMsg usage per Jans code review
2019-07-11 18:29:39 +03:00
Jan c52f2a8315
refactoring (#9411) 2019-07-09 11:15:52 +02:00
Jan 1d15b50d22
Bug fix/applicationserver stop (#9414) 2019-07-08 20:30:05 +02:00
Jan 66d8c01ad6
Bug fix/fixes 08 07 2019 (#9423) 2019-07-08 20:29:29 +02:00
Tobias Gödderz f501e00e9d Bug fix/add shard id to replication client identifier (#9366) 2019-07-08 14:03:42 +02:00
Max Neunhöffer 7aa0c19026
Leader updates Current precondition fixes. (#9410)
* Better logging and error reporting.
* Preconditions for FollowerInfo.
* Preconditions when updating Current as leader.
* Change a log level.
* Fix unit tests.
* CHANGELOG.
* LOG_TOPIC ids.
* Fix a log id.
* Fix Windows compilation.
2019-07-05 13:35:13 +02:00
Jan 1a58cc2213
add VelocyPackHelper::equal method (#9389) 2019-07-03 12:15:11 +02:00
Jan 11f5f33659
make sure all error code names are prefixed with ERROR_ @fceller @kvs85 (#9384) 2019-07-02 18:07:33 +02:00
Jan 9cb08ded92
make the comparison functions unambiguous (#9349)
* make the comparison functions unambiguous

* added @kaveh's suggestion
2019-07-01 16:35:28 +02:00
Jan d842b877a2
actually honor the return value of FollowerInfo::addFollower (#9358) 2019-06-28 18:31:15 +02:00
Jan 1653e9698a
remove unused functionality (#9360) 2019-06-28 18:28:11 +02:00
Kaveh Vahedipour 2fb159a8e2 compilation fix (#9340) 2019-06-26 17:00:33 +02:00
Kaveh Vahedipour 18fa84d619 if collection is gone in meantine ... (#9332) 2019-06-26 15:11:51 +02:00
Simon 505138b7d1 Fix issues with parallel database creation/dropping (#9272) 2019-06-25 11:09:26 +02:00
Simon cf7cf0131b Try to fix corruption error (#9258) 2019-06-25 10:18:26 +02:00
Michael Hackstein e61fb5a34e
Bug fix/create collections better preconditions (#9296)
* Fixed case where a SmartVertex collection could be available too early. Only possible if a SmartGraph is created only with this one collection.

* Now the TTL remove operation will properly check preconditions again.

* Second attempt, we only  say collection creation was success iff the plan for the collection has not been mdified during create.

* Disabled assertion in favor of tests.

* Removed debug output
2019-06-21 15:17:57 +02:00
Lars Maier 49fde75427 Added special lock for local data. Use read and write locking. Do not hold read lock during agency transactions. (#9277) 2019-06-21 14:27:20 +02:00
Markus Pfeiffer a6bf8028c7 Try improving error message (#9303)
When we try dropping a collection `A` for which there are other collections
that have distributeShardsLike set to `A`, mention this in the error message.
2019-06-21 14:23:42 +02:00
Wilfried Goesgens 6fd3dd024c windows warning as errors, masquerade remaining unfixeable errors (#9249) 2019-06-17 16:21:29 +02:00
Matthew Von-Maszewski 80157a1267 update syncRequest() so that it aborts pending messages on shutdown. (#9260) 2019-06-13 21:47:45 +02:00
Michael Hackstein 2c78e2471b Bug fix/collection babies race timeout (#9185)
* Fixed include guard.

* Forward port of 3.4 bug-fix

* Removed lockers alltogether we are secured mutex already

* Fixed recursive lock gathering
2019-06-13 19:11:24 +02:00
Dan Larkin-York 1b5475a0c2 Revert change to shared_ptr in ClusterInfo. (#9256) 2019-06-12 16:19:18 +02:00
Dan Larkin-York 938e5fd36c Add drop-check for index creation in cluster (#9219)
* Add drop-check for index creation in cluster.

* Move check from callback to regular read.

* Add changelog entry.

* Incorporate review suggestion

Co-Authored-By: Simon <simon@graetzer.org>

* Convert to VPackArrayIterator.
2019-06-10 10:11:17 +02:00
Jan 63eadb1d79
fix issues (#9236) 2019-06-08 19:40:42 +02:00
Jan Christoph Uhde 3f603f024f remove some containers from common.h (#9223)
* remove some containers from Common.h

* enterprise fixes
2019-06-07 13:27:24 +02:00
Tobias Gödderz b632d58c80 Fix shutdown deadlock regarding comm tasks (#9204)
* Wait for _commTasks in unprepare, that is after Cluster::stop

* Chose better method names

* Revert "Chose better method names"

This reverts commit 91e821348740c655f47207af7e570075f2241895.

* Revert "Wait for _commTasks in unprepare, that is after Cluster::stop"

This reverts commit 6551ae90d74fc046369fdb97cc5872706ce1a184.

* Next try, stop ClusterComm threads earlier
2019-06-07 13:23:33 +02:00
Jan 6a07476c41
don't include the Logger in header files if it's not necessary (#9216) 2019-06-07 10:08:03 +02:00
Michael Hackstein d135d55d55
Bug fix/collection babies (#9124)
* Bug fix 3.4/collection babies (#9033)

* Prepare API to create multiple collections in a single request to ClusterMethods to improve speedup

* Added counter on how many collections are successfully created

* Allow multi collection creation one level higher

* CollectionMethods now allow batch createion of Collections

* Improved array size assertions

* Now a graph is createad within a single roundtrip in the agency.

* Added new header files

* Insert collections in the AGENCY with TTL and a isBuilding flag, collections with this flag should not be visisible in the coordinator

* Added forgotten C++ file

* Fixed a rare race condition, and the failing IResearch Tests

* readded callback on DONE, otherwise lists are out of sync

* Fixed assertions to let mocked tests pass...

* Fixed community cluster

* Started fixing IResearch analyzer test, catch-tests are failing ;(

* Solved missed merge-conflict

* Added helper functions in AnalyzerFeature-test

* Refactoring AnalyzerTest Section-Auth

* Refactoring AnalyzerTest Section-Emplace-Duplicates

* Refactoring AnalyzerTest Section-Emplace-Error-Cases. Recovery-Test is now red, it seemed to be green because of invalid test case before.

* Refactoring AnalyzerTest, split GET test into multiple parts, still left 'cluster simulation'.

* Attempt to extract Coordinator / DBServer tests a little bit. This commit starts to break all Coordinator tests. However i am convinced that earlier version did NOT test a cluster situation at all, but some hybrid of SingleServer with full local storage that got told to be a Coordinator from now on, but without any Coordinator setup...

* Temporarly disabled some tests in AnalyzerFeature, as discussed with @gnusi.

* Fixed include guard.

* Temporarily deactivated failing tests

* You shall save your files before you commit...

* Fixed test asserting on plan version, which is now higher than before
2019-06-03 17:11:22 +02:00