
2102 Commits

Author SHA1 Message Date
Lars Maier 4bf2302150 Do nothing in phaseTwo if leader has not been touched. (#7579)
* Do nothing in phaseTwo if leader has not been touched.

* Drop follower if it refuses to cooperate.

This is important since a dbserver that is follower for a shard will
after a reboot think that it is a leader, at least for a short amount
of time. If it came back quickly enough, the leader might not have
noticed that it was away.
2018-12-02 13:14:46 +01:00
Frank Celler a86fd3dd67 fixed init 2018-11-30 21:17:38 +01:00
Frank Celler 067606da3a Bug fix 3.4/bad leader report current (#7574)
* Initialize theLeader non-empty, thus not assuming leadership.

* Correct ClusterInfo to look into Target/CleanedServers.

* Prevent usage of to be cleaned out servers in new collections.

* After a restart, do not assume to be leader for a shard.
2018-11-30 21:11:48 +01:00
Jan 836954b8e3 allow using UTF8 filenames for UUID directory (#7569) 2018-11-30 17:25:50 +01:00
Andrey Abramov 2c36657a9e improve logging in ClusterInfo::loadPlan (#7511) (#7532) 2018-11-29 20:08:11 +01:00
Tobias Gödderz f61ccd4047 Reload Foxx routes during startup (#7531) 2018-11-29 15:31:40 +01:00
Andrey Abramov e67c2cac06 avoid calling cluster related functions while instantiating views on … (#7509) (#7528)
* avoid calling cluster related functions while instantiating views on a db server

* minor cleanup
2018-11-29 17:18:34 +03:00
Kaveh Vahedipour 3225a7b16d [3.4] Feature/engine version added to agent configuration (#7481)
* agents' is obtained from leader's configuration
* corrections in Supervision for advertised endpoints
* change log
* Updated Documentation for cluster/health.
* Unified naming convention.
* Fixed missing update of volatile fields.
* Set version in right order.
* Removed debug output.
* Fixed jslint - missing ;
2018-11-29 12:00:47 +01:00
Max Neunhöffer 804ac13db2 SynchronizeShard's potentially long running while loops yield for shutdown (#7523) 2018-11-29 11:47:16 +01:00
Max Neunhöffer b74358a3dd Improve log messages. (#7520) 2018-11-29 11:30:43 +01:00
Max Neunhöffer 10b6813f01 Fix index creation (port from devel). (#7443)
* Fix index creation in cluster.

Simplify and correct error handling logic in ensureIndexCoordinator.

* After index creation, wait until index appears.

We wait until the Supervision has removed the isBuilding flag and
the coordinator has reloaded the Plan.

* More index handling fixes.

* Explicitly remove isBuilding flag in coordinator (again).

* Fix order of arguments in REPLACE call.

* Take out debugging output again.

* Fix catch tests by holding mutex shorter.

* Better mutex handling in ClusterInfo.
2018-11-28 16:58:27 +01:00
Lars Maier 154d449061 Export Version and Engine in Cluster Health. Additionally export `versionString` in registered Servers. (#7463) 2018-11-27 09:15:38 +01:00
Jan ffc823e1c8 Bug fix 3.4/backport optimizations (#7434) 2018-11-26 19:16:05 +01:00
Tobias Gödderz a83300dc29 Fix error handling in case ClusterCommResult.result == nullptr (#7355) 2018-11-26 16:22:43 +01:00
Andrey Abramov 822e15e770 issue 153: ensure views are dropped in Agency when database is dropped in cluster, minor fixes (#7370) (#7451)
* issue 153: ensure views are dropped in Agency when database is dropped in cluster, minor fixes

* backport: add test to ensure views are dropped when database is dropped from plan, fix some issues in ClusterInfo

* optimize primary key lookups in ArangoSearch

* fix test

* Add JS tests

* temporary comment optimizations

# Conflicts:
#	arangod/Cluster/ClusterInfo.cpp
2018-11-26 00:33:58 +03:00
Michael Hackstein 8098bb4eed Bug fix 3.4/syncing of followers (#7377)
* Added some DEBUG output for replication rest handler

* Some more debug logging.

* Increased the priority of the ReplicationHandler. This way we will not get stuck with locks that cannot be canceled. Also cancel the lock on the correct database.

* Added extensive log output for replication things

* Added tombstones to RestReplicationHandler. In a very unlikely case, the cancel of a lock can be executed BEFORE the code that actually registers the lock; in this case we now write a tombstone and do not lock.

* Revert "Added extensive log output for replication things"

This reverts commit 6d4e37ea1e59e3b3457336019cc7dbc4c979504d.

* Added extensive log output for replication things, now in ERR level instead of MAINTAINER only

* Now actually use hours for synchronization

* React to errors under soft lock if they show up.

* Added a retry loop to increase the read-lock timer.

* Added more timing output in RocksDB collection internals to figure out why the followers are dropped

* Tweaked RocksDB options

* Revert "Tweaked RocksDB options"

This reverts commit 2bf9c43280beda4792c47d079387fe5154cdd896.

* Removed debug output

* Applied all requested changes by goedderz

* Deleted unused variable
2018-11-23 16:08:27 +01:00
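The tombstone approach described in the "Added tombstones to RestReplicationHandler" item above can be illustrated with a small sketch. This is not the RestReplicationHandler code: the class `LockRegistry` and its methods `registerLock`, `cancelLock` and `isLocked` are hypothetical, and a single mutex is assumed to guard both the set of active locks and the tombstones.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <unordered_set>

// Hypothetical sketch: a cancel that arrives before the lock is registered
// leaves a tombstone, and the later registration sees it and does not lock.
class LockRegistry {
 public:
  // Returns true if the lock was taken, false if a tombstone showed that the
  // lock had already been cancelled (the tombstone is consumed).
  bool registerLock(uint64_t id) {
    std::lock_guard<std::mutex> guard(_mutex);
    if (_tombstones.erase(id) > 0) {
      return false;  // cancelled before we got here: do not lock
    }
    _active.insert(id);
    return true;
  }

  // Releases an active lock; if none is registered yet, writes a tombstone.
  void cancelLock(uint64_t id) {
    std::lock_guard<std::mutex> guard(_mutex);
    if (_active.erase(id) == 0) {
      _tombstones.insert(id);
    }
  }

  bool isLocked(uint64_t id) const {
    std::lock_guard<std::mutex> guard(_mutex);
    return _active.count(id) > 0;
  }

 private:
  mutable std::mutex _mutex;
  std::unordered_set<uint64_t> _active;
  std::unordered_set<uint64_t> _tombstones;
};
```

Keeping both sets under one mutex makes the "check for tombstone, then lock" step atomic with respect to a concurrent cancel, which is the race the commit describes.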
Wilfried Goesgens 4bbd6a02bb Bug fix/less exceptions (#7385) (#7415) 2018-11-23 11:15:36 +01:00
Wilfried Goesgens c50d346453 add alternative to ClusterInfo::getCollection() that doesn't throw (#7413)
* add alternative to ClusterInfo::getCollection() that doesn't throw (#7339)

* handle more potential nullptrs, fix try/catch scope
2018-11-23 11:15:25 +01:00
Wilfried Goesgens d4af8fe287 remove enterprise-gotos (#7375) (#7414) 2018-11-23 11:14:21 +01:00
Kaveh Vahedipour 860fa21219 Bug fix 3.4/index readiness (#6716)
* backport of test data generation for maintenance from devel
* 3.4 working
* fixing index use in cluster while still being built
* fixed broken views
* correct 200 for ensureIndex
* merge with 3.4
* agency comm to handle replace in array
* supervision changes
* cluster info's ensureIndex
* 3.4 ready
* timeout
* missing files from origin
* neunhoef complaints
* bogus entry
* no need to wait for current once again
* no longer necessary. done in IndexFactory now
* correct comments
* left overs
* dead code revived
* Move CHANGELOG entry to the right place.
2018-11-21 14:41:36 +01:00
Simon 5124633e6a Faster index creation (#7348) 2018-11-20 13:41:01 +01:00
Tobias Gödderz 3d1c643e23 [3.4] MMFiles replication: get followers under lock (#7298)
* Fix resign order

* Fixed a typo

* Get followers later, add TODOs

* Added a callback parameter to collection insert methods

* Get followers under the lock if necessary

* Extracted the replication of inserts into a separate method

* Move shortcut into replicate method

* Added callbacks for remove, replace and update

* Added missing overrides

* Extracted replication code from modifyLocal and removeLocal

* Update followers under lock also during replace, update, remove

* Fix changes from the last commit for update/replace

* Update comments, add asserts

* Remove changes for document-level locks that will be done in another PR

* Unify replication

* Adapt log messages to the devel ones

* Move common methods from its descendants to TransactionCollection, fix Mock on the way

* More IResearch test / mock fixes

* Relax asserts for nested transactions

* Reformat

* Fix non-babies remove and modify replication
2018-11-19 13:03:07 +01:00
Max Neunhöffer c005e0b0f0 Improve error reporting in maintenance. (#7340)
* Improve error reporting from maintenance.
* Fix compilation.
* Tiny polishing fix.
2018-11-16 10:25:55 +01:00
Max Neunhöffer 805f7a7621 Fix timeout in cluster operation in create and drop collections. (#7300)
* Fix loophole.
* Fix inquiry case of id not found: 404.
* Also handle correctly in AgencyComm.
* Fix agency tests.
* Fix error handling in dropCollectionOnCoordinator.
2018-11-14 10:02:26 +01:00
jsteemann bce1f51b8c simplify conditions 2018-11-12 11:14:19 +01:00
Dan Larkin-York 8bd754b9ad [3.4] Fix nullptr dereference in SynchronizeShard. (#7267) 2018-11-08 14:12:33 +01:00
Simon f4a1f15964 Simplify dropDatabaseCoordinator & fix some bugs (#7211) (#7243) 2018-11-07 10:41:02 +01:00
Matthew Von-Maszewski d927e8ebeb Bugfix 3.4: revert recently added condition variable in ClusterCommThread stop (#7239)
* remove recent _activeThreadCondition. it made things worse. moved all ClusterCommThread methods to end of file to ease review.

* attempt at avoiding Scheduler io_context being nullptr in late shutdown steps

* manually revert last change since bug is really about devel branch not 3.4 branch
2018-11-06 13:43:51 -06:00
Matthew Von-Maszewski d4c8b43024 test to verify communication thread has fully exited before saying ClusterComm is stopped. (#7232) 2018-11-05 16:31:33 -06:00
Vasiliy d644561f1f issue 496.4.1: backport 3.4: move StorageEngine-specific flag out of the generic API and closer to the storage engine (#7213)
* issue 496.4.1: backport 3.4: move StorageEngine-specific flag out of the generic API and closer to the storage engine

* address merge issue
2018-11-04 16:52:54 +03:00
Simon cf86d9bbc8 Fix a crash in DBServerAgencySync (#7204) 2018-11-03 20:19:04 +01:00
Max Neunhöffer 42fd0825ab Fix timeouts for write operations from coordinator to leader. (#7081)
* Improve logging on coordinator when doing `arangorestore`.

* Return more error information in `mergeResults`.

* Longer timeout for communication coordinator -> leader for writes.

This is taking into account possible write stops from followers needed
to get in sync.

* Fix compilation.

* Get rid of numbers in exception log messages.

* Fix compilation.

* Fix indentation.
2018-10-31 14:39:48 +01:00
Michael Hackstein b280142efa Revert "fixes some misbehaviour within the coordinator agency callbacks (#7104)" (#7150)
This reverts commit 9ee7a0e955.
2018-10-30 16:48:56 +01:00
Heiko 9ee7a0e955 fixes some misbehaviour within the coordinator agency callbacks (#7104)
* fixes some misbehaviour within the coordinator agency callbacks

* changelog
2018-10-30 16:47:37 +01:00
Simon c073b9dbbe Make ensureIndexOnCoordinator more robust (#7110) (#7130) 2018-10-30 11:25:06 +01:00
Simon 9271a11441 RocksDB replication thread safety (#7088) (#7131) 2018-10-30 11:24:17 +01:00
Vasiliy e6a6025818 backport: switch scope of responsibility between a TRI_vocbase_t and a LogicalView in respect to view creation/deletion (#7106)
* backport: switch scope of responsibility between a TRI_vocbase_t and a LogicalView in respect to view creation/deletion

* backport: ensure arangosearch links get exported in the dump

* backport: ensure view is created during restore on the coordinator

* Updates for ArangoSearch DDL tests, IResearchView unregistration and known issues

* Add fix for internal issue 483
2018-10-30 12:50:29 +03:00
Tobias Gödderz e9388ab710 [3.4] Stop curl from trying to POST stdin (#7097)
* Stop libcurl from trying to POST stdin

* Stop relocking every iteration in wait

* Remove unimplemented function

* Restrict setting of empty POSTFIELDS to POST requests

* Revert locking change
2018-10-29 14:41:23 +01:00
Michael Hackstein e05880895a Bug fix 3.4/shorter foot in door (#7084)
* Implement `syncCollectionCatchup` in DatabaseTailingSyncer.

First stab, might not even compile.

* Fixed a typo.

* Fix a typo and a compilation problem.

* Further compilation fix.

* Implement two stage catchup.

* Two small corrections.

* Unified error messages in Synchronize shard job.

* Improved a code comment.

* Fixed autocasting bool->double and double->bool issue. That is truly one of the best features ever invented... </irony>

* Renamed doHardLock => toSoftLockOnly and inverted default value

* Merged soft/hard foot logic with Transaction splits

* Use scopeguards to cancel readlocks
2018-10-26 16:16:52 +02:00
Max Neunhoeffer 015275a724 Emergency fix to compile on gcc 8. 2018-10-26 11:13:56 +02:00
Max Neunhöffer 8564a08bbb Try to fix timeout in drop collection. (#7058)
* Try to fix timeout in drop collection.
* Fix compilation.
2018-10-25 16:51:16 +02:00
Jan b903f1f8ff Bug fix 3.4/fix catch test issues (#7045) 2018-10-25 12:49:00 +02:00
Simon e87b42a0c3 Silence tsan warnings (#7051) 2018-10-24 23:58:47 +02:00
Simon 6eb9e38b08 Better agency pool update (#7036) 2018-10-24 16:23:10 +02:00
Vasiliy 52e2c97693 backport missed changes (#7016) 2018-10-24 15:43:45 +03:00
Simon 8b19d40136 Properly compare velocypack objects in Agency operations (#6922) 2018-10-23 11:52:22 +02:00
Matthew Von-Maszewski 43016cf04f Bugfix 3.4: address concerns from prior scheduler PR (#7005) 2018-10-23 11:30:45 +02:00
Simon c0455e9c60 Add engine specific collection APIs (#6962) 2018-10-19 15:23:55 +02:00
Lars Maier d7863b4583 Bug fix 3.4/cluster comm threads start stop (#6939)
* Start ClusterComm threads in `ClusterFeature::start`. Stop ClusterComm threads in `ClusterFeature::stop`.

* Do not free objects in `Scheduler::shutdown`. Let the `unique_ptr` do their job. Stop ClusterComm threads in `ClusterFeature::stop`, but free instance in `ClusterFeature::unprepare`.

* `io_context` may contain lambdas that hold `shared_ptr`s to `Tasks` that require a functional `VocBase` in their destructor.

* Clean up.
2018-10-19 13:12:51 +02:00
Jan 19e2dd87bd Replace engine equality feature (#6931) (#6950) 2018-10-17 20:34:19 +02:00