583 Commits

Author SHA1 Message Date
Jan 9b5d75071d make replication timeouts configurable via startup options (#10472)
* make replication timeouts configurable via startup options

The following options are available (for active failover
and master-slave replication):

    --replication.connect-timeout
    --replication.request-timeout

Values can be specified in seconds. If these options are used, they will
be used for replication requests, overriding any hard-coded defaults or
explicitly configured timeouts.

Additionally, this change increases the default request timeout
for replication from 10 minutes to 20 minutes.

* do *not* change timeouts

* make tests work again
2019-11-19 19:45:40 +03:00
Jan 34481860a9 cover more cases of "unique constraint violated" issues during replication (#9828)
* release version 3.4.8

* cover more cases of "unique constraint violated" issues during
replication

* add more testing
2019-09-11 12:48:08 +03:00
Tobias Gödderz 4ada35f20c [3.4] Bug fix 3.4/add shard id to replication client identifier (#9466)
* Bug fix/add shard id to replication client identifier (#9366)

* Fixed compile (but not linker) errors

* Backported ReplicationClientProgressTracker

* Fixed compile errors, fixed hash function

* No longer use SyncerId for real asynchronous replication

* Updated documentation

* Try to fix compile error on windows

* Fixed a bug

* Removed old code

* Fixed CHANGELOG entry
2019-09-11 12:45:57 +03:00
Jan fdbf2eb1e8 various replication improvements: (#9674)
- better debuggability (more log details)
- shorter minimum wait delay in active failover
- fixed too early pruning of WAL files on leaders
2019-08-12 10:44:50 +02:00
Jan 0133f3c48d increase the timeout for more reliable test results (#8735) 2019-04-15 10:16:42 +02:00
Jan d29d65face Bug fix 3.4/cleanup 31032019 (#8633) 2019-04-01 17:02:00 +02:00
Jan 3631d55146 prevent PregelFeature from shutting down while its workers are running (#8627) 2019-03-29 18:22:37 +01:00
jsteemann ccf2a1e59d fix two minor issues reported by cppcheck 2019-03-21 19:42:40 +01:00
Jan f621fe6076 revert a previous change that caused existing system collections on a slave to be truncated instead of being deleted (#8443)
truncating instead of deleting introduced the possibility of the collection's indexes continuing to exist with different ids on the slave than on the master, leading to potential follow-up problems
2019-03-18 20:56:50 +01:00
Jan 30ddb98659 try an incremental sync when restarting a follower in active failover mode (#8364) 2019-03-12 15:28:00 +01:00
Jan cd5c9edce1 various replication improvements (#8300) 2019-03-11 13:07:43 +01:00
Jan ca53f5b503 abort ongoing transactions in all cases (#8290) 2019-02-28 14:41:22 +01:00
Jan 8e3fb5dfc7 Feature 3.4/improve replication speed (#8268) 2019-02-28 14:37:40 +01:00
Simon a52e6fa3d3 Sync Foxx Queues (#8254) 2019-02-25 17:16:26 +01:00
Jan 4ee7ff1932 yet some more replication tests adjustments (#8101) 2019-02-04 16:53:15 +01:00
Jan 8a16a4b3ae update velocypack (#8075) 2019-01-31 17:31:54 +01:00
Jan 675bb78552 more debugging for replication (#8062) 2019-01-30 21:23:00 +01:00
Jan 15852cb491 Bug fix 3.4/address jenkins fails (#7985) 2019-01-22 12:32:17 +01:00
KVS85 dfad8906d9 Bug fix/active failover fix windows 3.4 (#7959)
* Backport active-failover fix for Windows into 3.4

* Backport stop/resume for Windows from devel

* Backport changes from devel into tests also

* Fix tests

* Remove forgotten whitespaces
2019-01-16 11:08:48 +01:00
Matthew Von-Maszewski 474f0cde31 Bug fix 3.4/scheduler empty reformat (#7872)
* added check for empty scheduler

* removed log, old is 1 not 0

* require running in this thread

* test

* added isDirect to callback

* signature fixed

* added drain

* added allowDirectHandling

* disabled for testing

* Add ExecContextScope object to direct call.

* try alternate initialization of ExecContextScope

* remove ExecContextScope, no help.  try _fifoSize as part of direct decision.

* strand management to minimize reuse of same strand per listen socket

* blind attempt to address Jenkins shutdown lock up.  may remove quickly.

* add filename and line to existing error log message

* Adjust queueOperation() to stop accepting items once isStopping() becomes true.

* revert previous check-in to MMFilesCollectorThread.cpp

* big reformat

* fixed merge conflicts

* Add CHANGELOG entry.
2019-01-08 20:39:42 +01:00
Jan 9c099ba5da multiplex REPLICATION-APPLIER-STATE files for RocksDB engine (#7897) 2019-01-08 14:26:09 +01:00
Frank Celler 9477af198b big reformat 2018-12-26 00:57:05 +01:00
Simon 1498c08084 fix restrictCollections parameter on database level replication (#7808) 2018-12-19 18:00:10 +01:00
Jan 677522991e Feature/internal 3306 (#7683) (#7688) 2018-12-06 17:46:58 +01:00
Simon 933ca8a775 Bug fix/restore index refactor (#7470) (#7491)
(cherry picked from commit d0efd95a37)
2018-11-29 14:08:29 +01:00
jsteemann f907dcebbd increase shutdown time 2018-11-27 13:44:18 +01:00
Simon 96346a12d0 switch default message for requireFromPresent (#7439) (#7450)
(cherry picked from commit f90b48f792)
2018-11-26 09:16:48 +01:00
Michael Hackstein 8098bb4eed Bug fix 3.4/syncing of followers (#7377)
* Added some DEBUG output for replication rest handler

* Some more debug logging.

* Increased the priority of the ReplicationHandler. This way we will not get stuck with locks that cannot be canceled. Also cancel the lock on the correct database.

* Added extensive log output for replication things

* Added tombstones to RestReplicationHandler. In a very unlikely case, the cancel of a lock can be executed BEFORE the code that actually registers the lock. In this case we now write a tombstone and do not lock.

* Revert "Added extensive log output for replication things"

This reverts commit 6d4e37ea1e59e3b3457336019cc7dbc4c979504d.

* Added extensive log output for replication things, now in ERR level instead of MAINTAINER only

* Now actually use hours for synchronization

* React to errors under soft lock if they show up.

* Added a retry loop to increase the read-lock timer.

* Added more timing output in RocksDB collection internals to figure out why the followers are dropped

* Tweaked RocksDB options

* Revert "Tweaked RocksDB options"

This reverts commit 2bf9c43280beda4792c47d079387fe5154cdd896.

* Removed debug output

* Applied all requested changes by goedderz

* Deleted unused variable
2018-11-23 16:08:27 +01:00
Simon ebad3c3c83 Fix restore of views, add --view option (#7425) (#7427)
(cherry picked from commit c584527d79)
2018-11-23 09:11:33 +01:00
Simon ef239cbe4e Make recovery more reliable (#7297) (#7367) 2018-11-21 16:51:38 +01:00
Jan a5f4fe4a22 don't update lastProcessedTick too early (#7381) 2018-11-20 17:54:30 +01:00
Simon 5124633e6a Faster index creation (#7348) 2018-11-20 13:41:01 +01:00
Simon 0d955554f2 Use shared_ptr for LogicalCollection (#7220) (#7244) 2018-11-07 10:43:08 +01:00
Vasiliy 1ba23cd39b issue 496.5: backport 3.4: minor API cleanup and error reporting enhancements, update iresearch to commit d69f7bd184e009da7bf0a478efd34a0c85b74291 (#7217)
* issue 496.5: backport 3.4: minor API cleanup and error reporting enhancements, update iresearch to commit d69f7bd184e009da7bf0a478efd34a0c85b74291

* add workaround for shell-collection-rocksdb-noncluster.js::testSystemSpecial test failure

* fix typo
2018-11-05 16:17:41 +03:00
Vasiliy d644561f1f issue 496.4.1: backport 3.4: move StorageEngine-specific flag out of the generic API and closer to the storage engine (#7213)
* issue 496.4.1: backport 3.4: move StorageEngine-specific flag out of the generic API and closer to the storage engine

* address merge issue
2018-11-04 16:52:54 +03:00
Simon 9c53d045be Server stream cursor (#7186) (#7210) 2018-11-03 20:17:52 +01:00
Jan eb3deb578f smaller changes for replication (#7201) 2018-11-02 15:47:49 +01:00
Vasiliy 850919178f issue 496.3: backport 3.4: move more coordinator-related logic out of TRI_vocbase_t, rename some arangosearch view configuration parameters, remove some consolidation policies, update iresearch to revision 6fd9760d81b136f769e277ea5b8f53996ed7a1ca (#7167)
* issue 496.3: backport 3.4: move more coordinator-related logic out of TRI_vocbase_t, rename some arangosearch view configuration parameters, remove some consolidation policies, update iresearch to revision 6fd9760d81b136f769e277ea5b8f53996ed7a1ca

* address merge issue

* backport: remove code causing nullptr access

* invalidate payload for each field in FieldIterator before setting a value

* address compilation issues
2018-11-01 23:12:39 +03:00
Simon 9271a11441 RocksDB replication thread safety (#7088) (#7131) 2018-10-30 11:24:17 +01:00
Michael Hackstein e05880895a Bug fix 3.4/shorter foot in door (#7084)
* Implement `syncCollectionCatchup` in DatabaseTailingSyncer.

First stab, might not even compile.

* Fixed a typo.

* Fix a typo and a compilation problem.

* Further compilation fix.

* Implement two stage catchup.

* Two small corrections.

* Unified error messages in Synchronize shard job.

* Improved a code comment.

* Fixed autocasting bool->double and double->bool issue. That is truly one of the best features ever invented... </irony>

* Renamed doHardLock => toSoftLockOnly and inverted default value

* Merged soft/hard foot logic with Transaction splits

* Use scopeguards to cancel readlocks
2018-10-26 16:16:52 +02:00
Michael Hackstein 94a793fe61 Removed incorrect skipping of Batches in RocksDB Tailing syncer (#7021)
* Removed incorrect skipping of Batches in RocksDB Tailing syncer. This caused issues whenever a transaction was split.

* Added a test for Splitting a large transaction in RocksDB

* Reactivated skipping in RocksDB Wal Tailing (reverts initial fix)

* Actually include lastScannedTick in CollectionFinalize. Proper fix, kudos to @jsteemann.

* Fixed healFollower task in split-large-transaction test
2018-10-25 14:13:40 +02:00
Jan 6eeb9eab86 backport some replication debugging from devel (#7069) 2018-10-25 14:08:26 +02:00
Vasiliy 52e2c97693 backport missed changes (#7016) 2018-10-24 15:43:45 +03:00
Jan 1002928b5f fix https://github.com/arangodb/release-3.4/issues/99 (#6951) 2018-10-19 15:26:23 +02:00
Jan fa235feeb2 attempt to fix https://github.com/arangodb/release-3.4/issues/96 (#6953) 2018-10-18 12:41:13 +02:00
Matthew Von-Maszewski a9ce39f85c Bugfix 3.4: Merge scheduler changes by Michael & Frank into recent overlapping code changes (#6928)
* manual recreation of bug-fix-3.4/scheduler-high-low within recent Scheduler changes.

* restore Documentation that was unintentionally deleted
2018-10-16 22:51:00 +02:00
Simon 7eda6768ab Refactor stuff, add async batch extension task (#6875)
* Refactor stuff, add async batch extension task

* fix compilation
2018-10-15 11:43:45 +02:00
Simon 010b54c81e Allow WAL logger to split up transactions (#6800) 2018-10-12 15:15:55 +02:00
Jan 815adaa56f Bug fix 3.4/fixes 110918 2 (#6845) 2018-10-12 12:48:41 +02:00
Jan 7290380dc7 Bug fix 3.4/increase replication timeouts (#6741) 2018-10-08 09:40:58 +02:00