1
0
Fork 0
Commit Graph

2115 Commits

Author SHA1 Message Date
Wilfried Goesgens c0f9e8125a Bug fix/allow tcp connection to finish (#7635) 2018-12-10 10:38:34 +01:00
Jan 5bae3742e5
Feature/internal 3306 (#7683) 2018-12-06 16:19:28 +01:00
Jan 8305524250
nicer log messages with stringified status value (#7660)
instead of
```
2018-12-05T15:43:16Z [14614] P ERROR {maintenance} CancelBarrier: failed to send message to leader : status \x06
```
2018-12-05 18:05:39 +01:00
Lars Maier dd07d74d69 [devel] Bug fix/bad leader report current (#7585)
* Bug fix 3.4/bad leader report current (#7574)
* Initialize theLeader non-empty, thus not assuming leadership.
* Correct ClusterInfo to look into Target/CleanedServers.
* Prevent usage of to be cleaned out servers in new collections.
* After a restart, do not assume to be leader for a shard.
* Do nothing in phaseTwo if leader has not been touched. (#7579)
* Drop follower if it refuses to cooperate.

This is important since a dbserver that is follower for a shard will
after a reboot think that it is a leader, at least for a short amount
of time. If it came back quickly enough, the leader might not have
noticed that it was away.
2018-12-03 10:20:30 +01:00
Jan 14c598c194
allow using UTF8 filenames for UUID directory (#7568) 2018-11-30 16:44:04 +01:00
Michael Hackstein 2d73f04008
Bug fix 3.4/syncing of followers (#7377) (#7535)
* Added some DEBUG output for replication rest handler

* Some more debug logging.

* Increased the priority of the ReplicationHandler. This way we will not get stuck with locks that cannot be canceled. Also cancel the lock on the correct database.

* Added extensive log output for replication thins

* Added tombstones to RestReplicationHandler. In a very unlikely case the cancel of a lock can be executed BEFORE the code that actually registers the lock, in this case we will now write a tombstone and do not lock.

* Revert "Added extensive log output for replication thins"

This reverts commit 6d4e37ea1e59e3b3457336019cc7dbc4c979504d.

* Added extensive log output for replication things, now in ERR level instead of MAINTAINER only

* Now actually use hours for synchronization

* React to errors under soft lock if they show up.

* Added a retry loop to increase the read-lock timer.

* Added more timeing output in RocksDB collection internals to figure out why the followers are dropped

* Tweaked RocksDB options

* Revert "Tweaked RocksDB options"

This reverts commit 2bf9c43280beda4792c47d079387fe5154cdd896.

* Removed debug output

* Applied all requested changes by goedderz

* Deleted unused variable
2018-11-30 14:43:04 +01:00
Lars Maier 52cff7ad55 Feature/engine version added to agent configuration (#7481) (#7524)
* agents' is obtained from leader's configuration
* corrections in Supervision for advertised endpoints
* change log
* Updated Documentation for cluster/health.
* Unified naming convention.
* Fixed missing update of volatile fields.
* Set version in right order.
* Removed debug output.
* Fixed jslint - missing ;
2018-11-29 14:25:40 +01:00
Andrey Abramov 2a0fa4946e
improve logging in ClusterInfo::loadPlan (#7511)
* improve logging in ClusterInfo::loadPlan

* address review comments
2018-11-29 15:56:51 +03:00
Andrey Abramov 6674a4282d
avoid calling cluster related functions while instantiating views on … (#7509)
* avoid calling cluster related functions while instantiating views on a db server

* minor cleanup
2018-11-29 15:43:53 +03:00
Max Neunhöffer a16fbf5df3
Improve log messages. (#7521) 2018-11-29 11:30:52 +01:00
Jan b2924057e7
cleanup (#7507) 2018-11-28 19:42:37 +01:00
Max Neunhöffer ae29e5d2ba
Fix index creation in cluster. (#7440)
* Fix index creation in cluster.

Simplify and correct error handling logic in ensureIndexCoordinator.

* After index creation, wait until index appears.

We wait until the Supervision has removed the isBuilding flag and
the coordinator has reloaded the Plan.

* More index handling fixes.

* Directly remove isBuilding in ensureIndexCoordinator (again).

* Fix catch tests by holding mutex shorter.

* Better mutex handling in ClusterInfo.
2018-11-28 16:58:05 +01:00
Lars Maier f3ade0f860 Version/Engine Cluster Health (#7474)
* Export Version and Engine in Cluster Health. Additionally export `versionString` in registered Servers.

* Updated Changelog.
2018-11-27 14:56:00 +01:00
Tobias Gödderz 0d5f85e684 Fix error handling in case ClusterCommResult.result == nullptr (#7356) 2018-11-26 16:23:44 +01:00
Michael Hackstein 16d0874da5
Bug fix/synchronous replication catchup (#7146)
* merged fixes from 3.4

* odd fix

* Bug fix 3.4/sync repl release thread (#6784)

* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock

* Fixed insertion of query into registry in rest replication handler.

* Removed unnecessary / false asserts as suggested in review. Fixed code comments.

* Replaced auto with a correct type as suggested in review

* Added a helper function to validate if a query is in use in the registry

* Fixed logic bug in usage of query registry

* Fixed compile issue

* Automaticly transfrom int -> bool in initializerlist sucks...

* Inverted boolen logic bug hidden due to int->bool beeing logically inverted.

* Today it seems that bools are too complicated for my brain.

* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future

* Applied chenges required by @goedderz in review

* Bug fix 3.4/shorter foot in door (#7084)

* Implement `syncCollectionCatchup` in DatabaseTailingSyncer.

First stab, might not even compile.

* Fixed a typo.

* Fix a typo and a compilation problem.

* Further compilation fix.

* Implement two stage catchup.

* Two small corrections.

* Unified error messages in Synchronize shard job.

* Improved a code comment.

* Fixed autocasting bool->double and double->bool issue. That is truely one of the best features ever invented... </irony>

* Renamed doHardLock => toSoftLockOnly and inverted default value

* Merged soft/hard foot logic with Transaction splits

* Use scopeguards to cancel readlocks

* Bug fix 3.4/sync replication allow soft and hard lock (#6864)

* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock

* Fixed insertion of query into registry in rest replication handler.

* Removed unnecessary / false asserts as suggested in review. Fixed code comments.

* Replaced auto with a correct type as suggested in review

* Added a helper function to validate if a query is in use in the registry

* Fixed logic bug in usage of query registry

* Fixed compile issue

* Implemented optional 'doHardLock' parameter in the replication acquire read-lock call. A hard-lock guarntees to stop all writes, a soft-lock may not.

* Fixed compile issue

* Automaticly transfrom int -> bool in initializerlist sucks...

* Inverted boolen logic bug hidden due to int->bool beeing logically inverted.

* Today it seems that bools are too complicated for my brain.

* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future

* Applied chenges required by @goedderz in review

* Renamed doHardLock => toSoftLockOnly and inverted default value
2018-11-23 16:16:34 +01:00
Simon d5cb94d2d0 Minor refactoring (#7408) 2018-11-22 16:16:05 +01:00
Vasiliy 1a0b9b9261 issue 153: ensure views are dropped in Agency when database is dropped in cluster, minor fixes (#7370)
* issue 153: ensure views are dropped in Agency when database is dropped in cluster, minor fixes

* backport: add test to ensure views are dropped when database is dropped from plan, fix some issues in ClusterInfo

* optimize primary key lookups in ArangoSearch

* fix test

* Add JS tests

* temporary comment optimizations
2018-11-21 19:18:34 +03:00
Kaveh Vahedipour 9ec6619b84 Bug fix/index readiness (#6541)
* indexes are marked  while still missing in Current
* index handling getCollection
* supervision gets indexes from isbuilding, when coordinator is gone before finishing
* seems right now
* fixed broken views
* remove junk comments
* cleanup
* node / supervision adjustements
* supervision fixes
* neunhoef remarks part i
* neunhoef remarks part ii
* neunhoef remarks part ii
* neunhoef remarks part iiI
* collection's current version please
* no need to wait for current once again
* no longer necessary code
* clear comments
* delete left overs
* dead code revived
2018-11-21 14:42:58 +01:00
Wilfried Goesgens 0a7c7446af Bug fix/less exceptions (#7385) 2018-11-21 12:00:14 +01:00
Simon cc55ef9f82 Faster index creation (#7348) (#7383) 2018-11-21 09:53:14 +01:00
Wilfried Goesgens 56289dcdbb remove enterprise-gotos (#7375) 2018-11-20 16:06:26 +01:00
Wilfried Goesgens 05a7d4e96e add alternative to ClusterInfo::getCollection() that doesn't throw (#7339) 2018-11-20 16:05:57 +01:00
Tobias Gödderz c61ed1d77a MMFiles-replication-get-followers-under-lock (forward-port) (#7343)
* Forward-port of bug-fix-3.4/mmfiles-replication-get-followers-under-lock

Fix resign order

Fixed a typo

Get followers later, add TODOs

Added a callback parameter to collection insert methods

Get followers under the lock if necessary

Extracted the replication of inserts into a separate method

Move shortcut into replicate method

Added callbacks for remove, replace and update

Added missing overrides

Extracted replication code from modifyLocal and removeLocal

Update followers under lock also during replace, update, remove

Fix changes from the last commit for update/replace

Update comments, add asserts

Remove changes for document-level locks that will be done in another PR

Unify replication

Adapt log messages to the devel ones

Move common methods from its descendants to TransactionCollection, fix Mock on the way

More IResearch test / mock fixes

Relax asserts for nested transactions

Reformat

Fix non-babies remove and modify replication

* Remove some changes introduced by the merge

* Fixed compile errors introduced by merge
2018-11-20 09:43:26 +01:00
Max Neunhöffer 476f941161
Improve error reporting in maintenance. (#7341)
* Improve error reporting from maintenance.
* Fix compilation.
* Tiny polishing fix.
2018-11-16 10:25:30 +01:00
Markus Pfeiffer 39bdebf851 Port bug-fix-3.4/timeout-create-coll to devel (#7307)
* Fix loophole in error handling.
* Fix inquiry case of id not found: 404.
* Also handle correctly in AgencyComm.
* Fix agency tests.
* Fix error handling in dropCollectionOnCoordinator.
2018-11-14 10:03:55 +01:00
Jan a5db298c92
fix buffer overrun, remove unused variable (#7302) 2018-11-13 14:18:50 +01:00
Jan dbf8d582d5
added missing change to clusterinfo (#7294) 2018-11-13 11:37:48 +01:00
Dan Larkin-York 48c3fd3b7f Fix nullptr dereference in SynchronizeShard. (#7268) 2018-11-08 14:13:00 +01:00
Max Neunhöffer a74330250f
Port bug fix 3.4/cluster comm threads start stop (#6939) to devel. (#7253)
* Start ClusterComm threads in `ClusterFeature::start`. Stop ClusterComm threads in `ClusterFeature::stop`.

* Do not free objects in `Scheduler::shutdown`. Let the `unique_ptr` do their job. Stop ClusterComm threads in `ClusterFeature::stop`, but free instance in `ClusterFeature::unprepare`.

* `io_context` may contains lambdas that hold `shared_ptr`s to `Tasks` the required a functional `VocBase` in their destructor.
2018-11-07 21:42:34 +01:00
Simon 386fc0e9ad Simplify dropDatabaseCoordinator & fix some bugs (#7211) 2018-11-06 15:26:33 +01:00
Vasiliy 68953ae33a issue 496.4.1: move StorageEngine-specific flag out of the genric API and closer to the storage engine (#7212) 2018-11-04 16:52:28 +03:00
Simon 40f54aebc8 Fix a crash in DBServerAgencySync (#7207)
(cherry picked from commit 99ba608be98a44b8ce1d0e681107271a22f42761)
2018-11-03 20:18:15 +01:00
Jan 1973022d00
Bug fix/refactor find emplace (#7197) 2018-11-02 17:18:47 +01:00
Max Neunhöffer 37359821cb
Fix arangorestore by adjusting timeouts in write ops. (#7083)
* Improve logging on coordinator when doing `arangorestore`.

* Return more error information in `mergeResults`.

* Longer timeout for communication coordinator -> leader for writes.

This is taking into account possible write stops from followers needed
to get in sync.

* Fix compilation.

* Get rid of numbers in exception log messages.

* Fix a typo.

* Fix compilation.
2018-10-31 14:39:58 +01:00
Vasiliy 8f44afb6cf issue 496.1: switch scope of responsibility between a TRI_vocbase_t and a LogicalView in respect to view creation/deletion (#7101)
* issue 496.1: switch scope of responsibility between a TRI_vocbase_t and a LogicalView in respect to view creation/deletion

* backport: address test failures

* backport: ensure arangosearch links get exported in the dump

* backport: ensure view is created during restore on the coordinator

* Updates for ArangoSearch DDL tests, IResearchView unregistration and known issues

* Add fix for internal issue 483
2018-10-30 12:50:35 +03:00
Simon 5b71dff64f RocksDB replication thread safety (#7088) 2018-10-29 18:09:46 +01:00
Simon c72818a9dc Make ensureIndexOnCoordinator more robust (#7110) 2018-10-29 17:45:46 +01:00
Jan 5719a7fc99
remove unneeded nullptr checks (#7121) 2018-10-29 11:51:27 +01:00
Matthew Von-Maszewski 97ba8ca2be Bugfix: More 3.4 scheduler changes backported (#7091) 2018-10-26 17:09:20 +02:00
Simon 10dc287eb3 Silence Tsan warnings (#7075) 2018-10-25 15:50:39 +02:00
Jan 221d036d5d
Bug fix/fix catch test issues (#7044) 2018-10-25 11:39:55 +02:00
Simon d23aaa2198 Better agency pool update (#7040) 2018-10-24 16:23:21 +02:00
Simon 4c1e8819c2 Add engine specific collection APIs (#6977) 2018-10-19 17:46:33 +02:00
Simon 8b7a4099b8 Properly compare velocypack objects in Agency operations (#6921)
* Properly compare velocypack objects in Agency operations

* Add changelog

* added option for VPackDumper
2018-10-17 20:03:53 +02:00
Simon cb4c07e0ed Replace engine equality feature (#6931)
* replace engine equality feature

* remove pointless code
2018-10-17 14:41:47 +02:00
Vasiliy 78567bef09 update iresearch to codebase as of 20181011 (#6858)
* update iresearch to codebase as of 20181011

* backport: address cluster test failures

* backport: address dump test failures

* backport: address discrepency in view creation between single-server and cluster

* backport: address test failure on cluster (revert change)

* backport: address test failures

* backport: address MSVC build issues

* backport: address issue with LogicalDatasource destructing after TRI_vocbase_t

* Revert "backport: address issue with LogicalDatasource destructing after TRI_vocbase_t"

This reverts commit 4f9880bbaa22194dfbb604b5a54658de1d447ac1.
2018-10-12 21:07:12 +03:00
Jan 2dc05429fe
Bug fix/fixes 110918 2 (#6848) 2018-10-12 12:59:10 +02:00
Jan c7cd0262aa
suppress some of these dreaded error messages (#6786) 2018-10-11 10:46:12 +02:00
Dan Larkin-York 4644d2b023 Fix issue with colleciton/view name conflict checking in cluster. (#6796) 2018-10-11 10:45:29 +02:00
Tobias Gödderz 102d17de89 Rework move shards with view test (#6773)
* Fixed testSetup(). Reduced redundant code.

* Reworked assertions in moving-shards-with-arangosearch-view-cluster.js

* Added changes from review

* Removed debug output / fixed jslint error
2018-10-11 10:25:22 +02:00