35 Commits

Author SHA1 Message Date
Jan 4a749a66b3 Bug fix 3.5/multi bugs (#9792)
* add missing whitespace to make error message readable

* try to continue running scheduler threads even when there are exceptions

* give up trying to persist follower info in agency for already dropped collections.

* updated CHANGELOG
2019-08-26 15:51:14 +03:00
Tobias Gödderz 87e5fe7dd2 Bug fix 3.5/clean replication api wal tracking (#9503)
* Use int type for server id

Change serverId to an int

Pass syncerId only for synchronous replication

Added UrlBuilder

structs to classes, reordering

Added Location class, cleanup

Fixed initialization order

Use Location class

Use string for large ints

Documentation

Added clientInfo to ReplicationClientProgressTracker and corresponding rest handlers

Pass clientInfo string in sync replication

Pass clientInfo in addFollower, too

Updated docu

Renamed UrlBuilder to UrlHelper

Updated docu

Try to fix compile error on windows

Fixed a bug and a test

* Implemented @jsteeman's comments
2019-07-18 19:38:31 +03:00
Jan c52f2a8315 refactoring (#9411) 2019-07-09 11:15:52 +02:00
Tobias Gödderz f501e00e9d Bug fix/add shard id to replication client identifier (#9366) 2019-07-08 14:03:42 +02:00
Tobias Gödderz 79cd45f89c Wait for replication before inserting documents (#9151)
* Wait for replication before inserting documents

Also, increased some timeouts and fixed a log message

* Fixed some log levels and a log message

* Removed repair-distribute-shards-like-spec from greylisted tests
2019-05-31 16:09:20 +03:00
Jan 616ea94f24 Bug fix/cleanup 31032019 (#8632) 2019-04-01 17:14:11 +02:00
Jan Christoph Uhde c3f7961b88 apply unique log ids (#8561) 2019-03-25 20:26:51 +01:00
Dan Larkin-York f4c2347fbd Make Result final (#8157) 2019-02-15 20:05:30 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
Kaveh Vahedipour b9409f631a equalising devel and 3.4 in agency/cluster (#7755)
* equalising devel and 3.4 in agency/cluster
* need to move code higher up
* Correct two error codes (shutting down).
2018-12-17 09:50:26 +01:00
Jan 8305524250 nicer log messages with stringified status value (#7660)
instead of
```
2018-12-05T15:43:16Z [14614] P ERROR {maintenance} CancelBarrier: failed to send message to leader : status \x06
```
2018-12-05 18:05:39 +01:00
Michael Hackstein 2d73f04008 Bug fix 3.4/syncing of followers (#7377) (#7535)
* Added some DEBUG output for replication rest handler

* Some more debug logging.

* Increased the priority of the ReplicationHandler. This way we will not get stuck with locks that cannot be canceled. Also cancel the lock on the correct database.

* Added extensive log output for replication thins

* Added tombstones to RestReplicationHandler. In a very unlikely case the cancel of a lock can be executed BEFORE the code that actually registers the lock; in this case we now write a tombstone and do not lock.

* Revert "Added extensive log output for replication thins"

This reverts commit 6d4e37ea1e59e3b3457336019cc7dbc4c979504d.

* Added extensive log output for replication things, now in ERR level instead of MAINTAINER only

* Now actually use hours for synchronization

* React to errors under soft lock if they show up.

* Added a retry loop to increase the read-lock timer.

* Added more timing output in RocksDB collection internals to figure out why the followers are dropped

* Tweaked RocksDB options

* Revert "Tweaked RocksDB options"

This reverts commit 2bf9c43280beda4792c47d079387fe5154cdd896.

* Removed debug output

* Applied all requested changes by goedderz

* Deleted unused variable
2018-11-30 14:43:04 +01:00
Max Neunhöffer a16fbf5df3 Improve log messages. (#7521) 2018-11-29 11:30:52 +01:00
Jan b2924057e7 cleanup (#7507) 2018-11-28 19:42:37 +01:00
Tobias Gödderz 0d5f85e684 Fix error handling in case ClusterCommResult.result == nullptr (#7356) 2018-11-26 16:23:44 +01:00
Michael Hackstein 16d0874da5 Bug fix/synchronous replication catchup (#7146)
* merged fixes from 3.4

* odd fix

* Bug fix 3.4/sync repl release thread (#6784)

* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock

* Fixed insertion of query into registry in rest replication handler.

* Removed unnecessary / false asserts as suggested in review. Fixed code comments.

* Replaced auto with a correct type as suggested in review

* Added a helper function to validate if a query is in use in the registry

* Fixed logic bug in usage of query registry

* Fixed compile issue

* Automatically transforming int -> bool in an initializer list sucks...

* Inverted boolean logic bug hidden due to int->bool being logically inverted.

* Today it seems that bools are too complicated for my brain.

* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future

* Applied changes required by @goedderz in review

* Bug fix 3.4/shorter foot in door (#7084)

* Implement `syncCollectionCatchup` in DatabaseTailingSyncer.

First stab, might not even compile.

* Fixed a typo.

* Fix a typo and a compilation problem.

* Further compilation fix.

* Implement two stage catchup.

* Two small corrections.

* Unified error messages in Synchronize shard job.

* Improved a code comment.

* Fixed autocasting bool->double and double->bool issue. That is truly one of the best features ever invented... </irony>

* Renamed doHardLock => toSoftLockOnly and inverted default value

* Merged soft/hard foot logic with Transaction splits

* Use scopeguards to cancel readlocks

* Bug fix 3.4/sync replication allow soft and hard lock (#6864)

* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock

* Fixed insertion of query into registry in rest replication handler.

* Removed unnecessary / false asserts as suggested in review. Fixed code comments.

* Replaced auto with a correct type as suggested in review

* Added a helper function to validate if a query is in use in the registry

* Fixed logic bug in usage of query registry

* Fixed compile issue

* Implemented optional 'doHardLock' parameter in the replication acquire read-lock call. A hard-lock guarantees to stop all writes, a soft-lock may not.

* Fixed compile issue

* Automatically transforming int -> bool in an initializer list sucks...

* Inverted boolean logic bug hidden due to int->bool being logically inverted.

* Today it seems that bools are too complicated for my brain.

* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future

* Applied changes required by @goedderz in review

* Renamed doHardLock => toSoftLockOnly and inverted default value
2018-11-23 16:16:34 +01:00
Simon cc55ef9f82 Faster index creation (#7348) (#7383) 2018-11-21 09:53:14 +01:00
Wilfried Goesgens 05a7d4e96e add alternative to ClusterInfo::getCollection() that doesn't throw (#7339) 2018-11-20 16:05:57 +01:00
Dan Larkin-York 48c3fd3b7f Fix nullptr dereference in SynchronizeShard. (#7268) 2018-11-08 14:13:00 +01:00
Simon 5b71dff64f RocksDB replication thread safety (#7088) 2018-10-29 18:09:46 +01:00
Simon 10dc287eb3 Silence Tsan warnings (#7075) 2018-10-25 15:50:39 +02:00
Jan 221d036d5d Bug fix/fix catch test issues (#7044) 2018-10-25 11:39:55 +02:00
Tobias Gödderz 102d17de89 Rework move shards with view test (#6773)
* Fixed testSetup(). Reduced redundant code.

* Reworked assertions in moving-shards-with-arangosearch-view-cluster.js

* Added changes from review

* Removed debug output / fixed jslint error
2018-10-11 10:25:22 +02:00
Max Neunhöffer 282a1a7193 Fix a bug when getting in sync and old requests are still lingering. (#6788) 2018-10-10 16:30:05 +02:00
Max Neunhöffer 79bade7e6b This is porting from 3.4 a cleanup in Current (follower removed from plan). (#6718)
* Fix cleanup of Current entry in case a follower is removed from Plan. (#6623)
* Properly remove unplanned followers in leader and Current.
* Add a catch test.
* Fix tests.
* Fix a bug with a temporary object.
* Protect against exception from getCollection not found.
* New Maintenance test data.
2018-10-09 15:29:42 +02:00
Dan Larkin-York 1f63f16396 Move some logging off of general topic. 2018-10-01 13:28:11 -04:00
Simon 0fa7f01c66 Resilience test failure points (#6539) 2018-09-20 01:05:10 +02:00
Kaveh Vahedipour 8bd834bcf7 Maintenance delayed by incomplete hashing maintenance actions (#6448) 2018-09-14 17:44:32 +02:00
Simon 22b9c31c13 Removing ClusterComm ClientTransactionID (#6294) 2018-09-12 22:15:16 +02:00
Kaveh Vahedipour 6b2733625c Feature/static const strings cleanup (#6352)
* AgentConfiguration cleanup
* static strings in maintenance / agency
* more strings unified
* fix windows build
2018-09-11 13:40:03 +02:00
Jan 07abfca588 Bug fix/cleanup 020918 (#6338) 2018-09-03 12:56:41 +02:00
Jan 5022ccc24d Bug fix/fixes 2508 (#6254) 2018-08-27 21:36:39 +02:00
Lars Maier 5555bd2fad Schmutz++ Improved (#6259)
* Fixed startup order. Don't start maintenance threads in single-server or agent.
Added range check for `--server.maintenance-threads`.
Fixed invalid array access, when shard exists locally but not in plan.
* Removed unused header imports.
* Added CHANGELOG entry
* Fixed shutdown bug. Startup fixed.
* Fixed catch test.
* Add Maintenance improvements to NewFeature34.md.
2018-08-27 20:25:09 +02:00
jsteemann 08ee458608 blind attempt to fix MacOS compile error 2018-08-24 13:57:33 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
- Fix index locking bug in mmfiles
- Fix a bug in mmfiles with silent option and repsert
- Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00