1
0
Fork 0
Commit Graph

43 Commits

Author SHA1 Message Date
Kaveh Vahedipour d11c0954e8 Missing post failure on readlock in SynchronizeShard (#7860)
* when post cannot be delivered for readlock waiting for put meningless

* devel pull

* we can rely on POST alone

* clean up

* wrong errorCode tested

* fixed nullptr

* correct timeout

* fixed nullptr

* merge devel

* log level

* maintenace default log level raised to INFO, shard synchronisation only logs start/success all WARN/ERR demoted to DEBUG

* message unification

* devel merge

* fixed query registry

* tobias remarks

* mchacki part one

* static string for rebootid

* working tests

* use the reboot tracker from clusterinfo

* use the reboot tracker from clusterinfo

* rebootid be rebootid

* fixed serverid and all working

* static strings

* callguard out of reboottracker

* call callbacks, when query is ditched

* clear priv

* change log

* my oh my

* simon attention

* merge seems fine

* typo

* fix headers
2019-11-26 17:46:26 +01:00
Jan 98880f3937
clean up a bit (#10391) 2019-11-11 09:28:18 +01:00
Simon 4ed945a04b Improve Connection pool robustness (#10268) 2019-10-30 17:30:50 +01:00
Simon 4d8c909666 A mixed bag of goods (#10220) 2019-10-14 17:02:36 +02:00
Dan Larkin-York 13e24b2db9 Convert many uses of ClusterComm to Fuerte (#10154) 2019-10-10 14:03:33 +02:00
Tobias Gödderz e2c84acfaf Use explicit default destructors where possible (#10166) 2019-10-04 15:58:30 +02:00
Dan Larkin-York a83c2323c9 Refactor ApplicationServer stack (#9965) 2019-09-25 17:31:59 +02:00
Jan f01385e969
Bug fix/multi bugs (#9789) 2019-08-26 13:11:59 +02:00
Jan 4ea95bc109
ascii art (#9565) 2019-07-25 13:03:30 +02:00
Tobias Gödderz 7e98f56cf5 Bug fix/clean replication api wal tracking (#9473) 2019-07-19 15:44:14 +02:00
Jan c52f2a8315
refactoring (#9411) 2019-07-09 11:15:52 +02:00
Tobias Gödderz f501e00e9d Bug fix/add shard id to replication client identifier (#9366) 2019-07-08 14:03:42 +02:00
Tobias Gödderz 79cd45f89c Wait for replication before inserting documents (#9151)
* Wait for replication before inserting documents

Also, increased some timeouts and fixed a log message

* Fixed some log levels and a log message

* Removed repair-distribute-shards-like-spec from greylisted tests
2019-05-31 16:09:20 +03:00
Jan 616ea94f24
Bug fix/cleanup 31032019 (#8632) 2019-04-01 17:14:11 +02:00
Jan Christoph Uhde c3f7961b88 apply unique log ids (#8561) 2019-03-25 20:26:51 +01:00
Dan Larkin-York f4c2347fbd Make Result final (#8157) 2019-02-15 20:05:30 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
Kaveh Vahedipour b9409f631a equalising devel and 3.4 in agency/cluster (#7755)
* equalising devel and 3.4 in agency/cluster
* need to move code higher up
* Correct two error codes (shutting down).
2018-12-17 09:50:26 +01:00
Jan 8305524250
nicer log messages with stringified status value (#7660)
instead of
```
2018-12-05T15:43:16Z [14614] P ERROR {maintenance} CancelBarrier: failed to send message to leader : status \x06
```
2018-12-05 18:05:39 +01:00
Michael Hackstein 2d73f04008
Bug fix 3.4/syncing of followers (#7377) (#7535)
* Added some DEBUG output for replication rest handler

* Some more debug logging.

* Increased the priority of the ReplicationHandler. This way we will not get stuck with locks that cannot be canceled. Also cancel the lock on the correct database.

* Added extensive log output for replication thins

* Added tombstones to RestReplicationHandler. In a very unlikely case the cancel of a lock can be executed BEFORE the code that actually registers the lock, in this case we will now write a tombstone and do not lock.

* Revert "Added extensive log output for replication thins"

This reverts commit 6d4e37ea1e59e3b3457336019cc7dbc4c979504d.

* Added extensive log output for replication things, now in ERR level instead of MAINTAINER only

* Now actually use hours for synchronization

* React to errors under soft lock if they show up.

* Added a retry loop to increase the read-lock timer.

* Added more timeing output in RocksDB collection internals to figure out why the followers are dropped

* Tweaked RocksDB options

* Revert "Tweaked RocksDB options"

This reverts commit 2bf9c43280beda4792c47d079387fe5154cdd896.

* Removed debug output

* Applied all requested changes by goedderz

* Deleted unused variable
2018-11-30 14:43:04 +01:00
Max Neunhöffer a16fbf5df3
Improve log messages. (#7521) 2018-11-29 11:30:52 +01:00
Jan b2924057e7
cleanup (#7507) 2018-11-28 19:42:37 +01:00
Tobias Gödderz 0d5f85e684 Fix error handling in case ClusterCommResult.result == nullptr (#7356) 2018-11-26 16:23:44 +01:00
Michael Hackstein 16d0874da5
Bug fix/synchronous replication catchup (#7146)
* merged fixes from 3.4

* odd fix

* Bug fix 3.4/sync repl release thread (#6784)

* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock

* Fixed insertion of query into registry in rest replication handler.

* Removed unnecessary / false asserts as suggested in review. Fixed code comments.

* Replaced auto with a correct type as suggested in review

* Added a helper function to validate if a query is in use in the registry

* Fixed logic bug in usage of query registry

* Fixed compile issue

* Automaticly transfrom int -> bool in initializerlist sucks...

* Inverted boolen logic bug hidden due to int->bool beeing logically inverted.

* Today it seems that bools are too complicated for my brain.

* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future

* Applied chenges required by @goedderz in review

* Bug fix 3.4/shorter foot in door (#7084)

* Implement `syncCollectionCatchup` in DatabaseTailingSyncer.

First stab, might not even compile.

* Fixed a typo.

* Fix a typo and a compilation problem.

* Further compilation fix.

* Implement two stage catchup.

* Two small corrections.

* Unified error messages in Synchronize shard job.

* Improved a code comment.

* Fixed autocasting bool->double and double->bool issue. That is truely one of the best features ever invented... </irony>

* Renamed doHardLock => toSoftLockOnly and inverted default value

* Merged soft/hard foot logic with Transaction splits

* Use scopeguards to cancel readlocks

* Bug fix 3.4/sync replication allow soft and hard lock (#6864)

* First attempt to not block the thread that requires the EXCLUSIVE sync-up lock

* Fixed insertion of query into registry in rest replication handler.

* Removed unnecessary / false asserts as suggested in review. Fixed code comments.

* Replaced auto with a correct type as suggested in review

* Added a helper function to validate if a query is in use in the registry

* Fixed logic bug in usage of query registry

* Fixed compile issue

* Implemented optional 'doHardLock' parameter in the replication acquire read-lock call. A hard-lock guarntees to stop all writes, a soft-lock may not.

* Fixed compile issue

* Automaticly transfrom int -> bool in initializerlist sucks...

* Inverted boolen logic bug hidden due to int->bool beeing logically inverted.

* Today it seems that bools are too complicated for my brain.

* Removed failure point, didn't write a test for it, and it is hard to write it in the current test environment. Need to find a better solution in future

* Applied chenges required by @goedderz in review

* Renamed doHardLock => toSoftLockOnly and inverted default value
2018-11-23 16:16:34 +01:00
Simon cc55ef9f82 Faster index creation (#7348) (#7383) 2018-11-21 09:53:14 +01:00
Wilfried Goesgens 05a7d4e96e add alternative to ClusterInfo::getCollection() that doesn't throw (#7339) 2018-11-20 16:05:57 +01:00
Dan Larkin-York 48c3fd3b7f Fix nullptr dereference in SynchronizeShard. (#7268) 2018-11-08 14:13:00 +01:00
Simon 5b71dff64f RocksDB replication thread safety (#7088) 2018-10-29 18:09:46 +01:00
Simon 10dc287eb3 Silence Tsan warnings (#7075) 2018-10-25 15:50:39 +02:00
Jan 221d036d5d
Bug fix/fix catch test issues (#7044) 2018-10-25 11:39:55 +02:00
Tobias Gödderz 102d17de89 Rework move shards with view test (#6773)
* Fixed testSetup(). Reduced redundant code.

* Reworked assertions in moving-shards-with-arangosearch-view-cluster.js

* Added changes from review

* Removed debug output / fixed jslint error
2018-10-11 10:25:22 +02:00
Max Neunhöffer 282a1a7193
Fix a bug when getting in sync and old requests are still lingering. (#6788) 2018-10-10 16:30:05 +02:00
Max Neunhöffer 79bade7e6b
This is porting from 3.4 a cleanup in Current (follower removed from plan). (#6718)
* Fix cleanup of Current entry in case a follower is removed from Plan. (#6623)
* Properly remove unplanned followers in leader and Current.
* Add a catch test.
* Fix tests.
* Fix a bug with a temporary object.
* Protect against exception from getCollection not found.
* New Maintenance test data.
2018-10-09 15:29:42 +02:00
Dan Larkin-York 1f63f16396 Move some logging off of general topic. 2018-10-01 13:28:11 -04:00
Simon 0fa7f01c66 Resilience test failure points (#6539) 2018-09-20 01:05:10 +02:00
Kaveh Vahedipour 8bd834bcf7 Maintenance delayed by incomplete hashing maintenance actions (#6448) 2018-09-14 17:44:32 +02:00
Simon 22b9c31c13 Removing ClusterComm ClientTransactionID (#6294) 2018-09-12 22:15:16 +02:00
Kaveh Vahedipour 6b2733625c Feature/static const strings cleanup (#6352)
* AgentConfiguration cleanup
* static strings in maintenance / agency
* more strings unified
* fix windows build
2018-09-11 13:40:03 +02:00
Jan 07abfca588
Bug fix/cleanup 020918 (#6338) 2018-09-03 12:56:41 +02:00
Jan 5022ccc24d
Bug fix/fixes 2508 (#6254) 2018-08-27 21:36:39 +02:00
Lars Maier 5555bd2fad Schmutz++ Improved (#6259)
* Fixed startup order. Don't start maintenance threads in single-server or agent.
Added range check for `--server.maintenance-threads`.
Fixed invalid array access, when shard exists locally but not in plan.
* Removed unused header imports.
* Added CHANGELOG entry
* Fixed shutdown bug. Startup fixed.
* Fixed catch test.
* Add Maintenance improvements to NewFeature34.md.
2018-08-27 20:25:09 +02:00
jsteemann 08ee458608 blind attempt to fix MacOS compile error 2018-08-24 13:57:33 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
 - Fix index locking bug in mmfiles
 - Fix a bug in mmfiles with silent option and repsert
 - Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00