
163 Commits

Author SHA1 Message Date
Kaveh Vahedipour 09d2745625 [3.5] clean up your crap, dbservers. alright, i'll do it. (#9722)
* clean up your crap, dbservers. alright, i'll do it.

* ts ts ts

* body is shared_ptr

* Update CHANGELOG

* revert callback bodies to API specification

* array needs to be inside so that multiple unobserves to the same key are possible
2019-08-22 14:26:44 +03:00
Max Neunhöffer 02281d3be4 Handle InitDone correctly. (#8552)
* precondition plan / version in compaction / store TTL removal independent of local _ttl set
* Agency init loops break when shutting down.
* assertion failures in store on restarting following agents
* Minor porting fixes from 3.4
2019-04-01 17:01:05 +02:00
Jan 80a6e621ee don't allocate memory so often in ClusterComm requests (#8550) 2019-03-26 00:31:56 +01:00
Kaveh Vahedipour 68178ba165 [devel] supervision bug fix backports (#8314)
* back ports for supervision fixes from 3.4 part 1

* back ports for supervision fixes from 3.4 part 2
2019-03-04 19:27:24 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
Kaveh Vahedipour a73023e512 Bug fix/agency update endpoints (#6519)
* update endpoints in agency done the RAFT way
* fix mock interface
* tests functioning with new agent interface
* handling non-leader
2018-09-28 15:14:48 +02:00
Kaveh Vahedipour fe9b2fecdc notifyInactive has been lying around in the agent without being used. It is a relic of the time when we thought we would have a pool of agents from which we'd draw if an agent failed (#6290) 2018-08-31 10:48:39 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
- Fix index locking bug in mmfiles
- Fix a bug in mmfiles with silent option and repsert
- Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00
Kaveh Vahedipour 5b307db85d Better log compaction 2018-07-16 12:09:58 +02:00
Kaveh Vahedipour 7df40fa905 backport agency fixes for replacing agent with total data loss (#5823) 2018-07-11 11:23:48 +02:00
Kaveh Vahedipour f4edcc7ba8 Bug fix/supervision engine starting early on leadership change (#5062)
* supervision must not work as long as agent is still preparing
* leadersince atomic and pushed to end of leader preparation
* More consistent use of integer types.
* Slightly change order of events in Supervision loop.
2018-04-10 15:28:26 +02:00
Simon 68442dae5a Fixing agency prefix in Agency/Job.cpp (#5039)
* Fixing some test issues and fixing the agency prefix in Agency/Job.cpp
* Making logic consistent in failed-leader / follower job
* reverting condition back to == GOOD
2018-04-09 16:21:24 +02:00
Kaveh Vahedipour 2e2d947c1c devel: fixed the missed changes to plan after agency callback is registered f… (#4775)
* fixed the missed changes to plan after agency callback is registered for create collection
* Force check in timeout case.
* Sort out RestAgencyHandler behaviour for inquire.
* Take "ongoing" stuff out of AgencyComm.
2018-03-14 12:01:17 +01:00
Matthew Von-Maszewski ae77ff80c2 create independent executeLockedRead and executeLockedWrite to speed performance (#4177) 2017-12-29 12:02:27 +01:00
Max Neunhöffer 927027695d Sort out locking agency to separate reads and writes. (#4174)
* disentangle writes and reads in agency
* renamed _oLock to _outputLock.  Documented read and write rules for _readDB and _commitIndex using _outputLock and _waitForCV.  Adjusted code to match rules.
* update executeLocked() knowing some callers use _readDB via readDB().  readDB() is currently read only, but uses write locks to be absolutely safe.
* Lay out clear rules against deadlock in agency.
* Avoid unprotected access to _commitIndex.
2017-12-28 11:27:20 +01:00
Max Neunhöffer 7bae6980e8 Bug fix/agent lead hanger (#4147)
* Really enforce the hidden option --server.maximal-threads if given.
* Switch off --log.force-direct in scripts/startStandAloneAgency.sh
* Lower the timeout for sending AppendEntriesRPC to 150s.
* Erase _earliestPackage when becoming a leader.
* Challenge leadership in agent main loop.
* Use steady_clock for _earliestPackage.
* Change _lastAcked and _leaderSince to steady_clock as well.
* time difference calculations based on old readSystemClock to steadyClockToDouble
* All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps
* Inception system_clock to steady_clock
2017-12-27 16:45:39 +01:00
Kaveh Vahedipour 2beaef41ff Bug fix/agencycomm validate methods broken (#3784) 2017-11-24 10:31:07 +01:00
Max Neunhöffer 766ab7c8cf Fix agency shutdown bug. (#3683)
* Fix agency shutdown bug.
* Remove precondition that was not needed in AgencyComm::removeValues.
* Fail fatally if threads do not shut down.
2017-11-14 16:33:46 +01:00
Kaveh Vahedipour 7e816db51e Bug fix/agency restart enhancements (#3619)
* Removed unused active(...) method in Agent
* Inception's restart from persistence allows peer with empty active RAFT list to join
* Agency's UUID is persisted outside of the database comparable to coordinator and db server action.
* Publicized Methods to UUID stuff in ServerState
* Inception method documentation
* added --agency.disaster-recovery-id to allow for specification of known former agency id. this is a very dangerous option potentially.
* Delete unused methods.
* separate _id and _recoveryId
* populating active list with entire pool
* Improve logging.
* reject gossip from unknown agent, if pool is complete
2017-11-10 23:40:26 +01:00
Max Neunhöffer 3c0ee6908b Bug fix/lead to agent (#3541) 2017-11-09 11:10:09 +01:00
Max Neunhöffer ee96c37237 Fix agency restart problems. (#3493)
* Fix agency restart problems (port from a 3.2 fix).

* Further fixes after Craneware rescue.
2017-10-25 18:05:58 +02:00
Kaveh Vahedipour 46333a762f Bug fix/agency restart after compaction and holes in log (#3413)
* State fixes holes in RAFT index range
* Avoid application of entries older than compaction index _cur and guard for unsigned overflow
2017-10-13 16:01:41 +02:00
Max Neunhöffer d86f27bd19 Bug fix/agency leader timeouts (#3373)
* Send out empty heartbeats regardless of non-empty AppendEntriesRPC.
* Also improve logging:
  Note if a log in the empty heartbeat sending takes > 0.01 s.
  Clearly mark places where a leader resigns in logging.
  Log if no empty heartbeat is sent out.
* Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses.
* Add debug logging for _lastAcked and challengeLeadership.
* Remove some unused code. Do not count ourselves in challengeLeadership.
* Removal of entire activation/deactivation mechanisms in agency
* TRI_microtime up to c++11
* added term to response to sendAppendEntries.
2017-10-06 10:11:51 +02:00
Max Neunhöffer 22e46978a6 Bug fix/sort out agency locks (#3306)
New locking concept in Agency. Ensure empty heartbeats can be sent, answered and processed without long locks. Adjust logging. Fix compaction bugs.
2017-09-27 15:22:30 +02:00
Kaveh Vahedipour 00650e6a3f Bug fix/agency mt fixes (#3158)
* added debugging methods

* try to fix invalid access in case of error

* remove unused members

* bugfixes and comments

* all agency fixes in

* merge bug

* partially unguarded Agent::lead fixed

* all agency fixes in

* added nrBlocked to thread startup eval

* added nrBlocked to thread startup eval

* recombination of cases in State::get

* some maps replaced with unordered_maps

* optimized maps some
2017-08-30 10:43:51 +02:00
Jan 49d2313c2c fix ub in agency compaction (#2736) 2017-07-09 20:45:00 +02:00
Frank Celler 545e861829 Bug fix/agency prepare leading bug (#2752) 2017-07-08 17:08:30 +02:00
Max Neunhöffer 3d8e590bee Adapt Raft timeouts dynamically and fix create collection timeout race
Various fixes.
2017-07-06 12:51:51 +02:00
Kaveh Vahedipour 72721f5b2b _preparing is enough for both persistent restart and leadership change in agency 2017-06-09 14:03:47 +02:00
Kaveh Vahedipour da0cc3490c Squashed commit of the following:
commit 3d9cf792912db1974b9ac5e00ca2b4c9245b7d34
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Fri Jun 2 15:32:43 2017 +0200

    optimise for single writes in agency log

commit 65056ab9026f9b4b211dda0f17c75602b978f2bf
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Fri Jun 2 15:01:15 2017 +0200

    More tests, taking agency log compaction interval into account.

commit 6600d707784e8fd5b62c0c75fd1826af09b8e13f
Merge: cf46882 02f00cc
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Fri Jun 2 14:50:38 2017 +0200

    Merge branch 'agency-log-compaction-overhaul' of ssh://github.com/ArangoDB/ArangoDB into agency-log-compaction-overhaul

commit 02f00cc271d027f02b0625afb76745bfa76bf833
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Fri Jun 2 14:34:41 2017 +0200

    compaction step and keep size defaults for 3.2

commit 03fc8fbff8f0ac701f7d7f94521c0c3152dd6f92
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Fri Jun 2 14:32:46 2017 +0200

    Constituent fatally exits if election ballot cannot be persisted

commit cf4688226fc897e74bb2d9ebdfca3ce4578c3b70
Merge: c727fc4 724bd1e
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Fri Jun 2 13:08:15 2017 +0200

    Merge branch 'agency-log-compaction-overhaul' of ssh://github.com/ArangoDB/ArangoDB into agency-log-compaction-overhaul

commit 724bd1efe19e2e9dbfc14cd819f180816b6d62d0
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Fri Jun 2 13:02:51 2017 +0200

    persistence success in agency state is properly evaluated

commit c727fc48bb93e7b135b3ca929c03221c7bcaddb9
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Fri Jun 2 12:04:55 2017 +0200

    Set default compaction step size in agency to 20000 and 10000 keep size.

commit ded16ae6945e9c1479e99bc2e7ccb4d6feca19a6
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Fri Jun 2 11:11:12 2017 +0200

    Fix help page in startStandAloneAgency.sh for --use-persistence.

commit 13ae9f40f649a8f92eeca4b16bbb5647b540722d
Merge: 834c7c9 aa3e8c1
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Thu Jun 1 23:41:34 2017 +0200

    Merge remote-tracking branch 'origin/devel' into agency-log-compaction-overhaul

commit 834c7c920d36db3579def66c38fb04870936f8bd
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Thu Jun 1 16:56:03 2017 +0200

    Handle error in recvAppendEntriesRPC properly.

commit bd9c8d03b76ad25d4078740b5bf994fdba3845d0
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Thu Jun 1 16:55:35 2017 +0200

    Handle errors in persist() and log() properly.

commit 5b4d2c3d9af078d6a1b5626af20dc9abf2546baa
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Thu Jun 1 16:25:22 2017 +0200

    Improve error reporting.

commit d60697c5f26d6592eecefc9b9a43e9b699d1773d
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Thu Jun 1 12:16:39 2017 +0200

    Agency must not respond to any requests after startup until leader has RAFT committed up to pre-shutdown state

commit 92b8ede5fa022ace1596607abcf8fad1130504c8
Merge: 9340e74 d24455c
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Wed May 31 16:54:45 2017 +0200

    Merge remote-tracking branch 'origin/devel' into agency-log-compaction-overhaul

commit 9340e7461130a4783c09ad8d91e5a07f9500a045
Merge: 7b7ce9d 63a9d60
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Wed May 31 12:13:55 2017 +0200

    agency tests to cover compaction

commit 63a9d604c474eda4302032629dff1f0f69fa0813
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 31 11:59:11 2017 +0200

    Set agency.compaction-keep-size to at least 0.

commit ef842260968a4769d9502a701b7251da32647e52
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 31 11:49:34 2017 +0200

    Fix agent size 1 case (thread already gone).

commit 7b7ce9d79f6e8208c13f153b1b9a395b780d6ce1
Merge: 24e2e7e ff306bf
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Wed May 31 11:39:58 2017 +0200

    Merge branch 'agency-log-compaction-overhaul' of https://github.com/arangodb/arangodb into agency-log-compaction-overhaul

commit ff306bf547bc4f528c9b66e222271ac143029508
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 31 11:11:06 2017 +0200

    Move compaction into the future when we take a snapshot from leader.

commit 24e2e7e00f960928a79ce4008b8031d6b9b07fd9
Merge: 84034ac b3ea17a
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Wed May 31 11:01:13 2017 +0200

    Merge branch 'agency-log-compaction-overhaul' of https://github.com/arangodb/arangodb into agency-log-compaction-overhaul

commit b3ea17a219baa2abd5892819012fb59f440cdeb8
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 31 09:42:59 2017 +0200

    Get rid of double nonsense.

commit 035c8d1b34e1b73a381d5468422adf13b2ebc36a
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 31 09:25:28 2017 +0200

    Sort out Agent::load sequence.

commit 84034ac2809a77145d6b1d23bf44857b3a0c4651
Merge: eb34a2e 3180a9d
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Tue May 30 17:07:20 2017 +0200

    merging in

commit eb34a2e64e6ac8dc6571b92cb853c38b7022c833
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Tue May 30 16:58:05 2017 +0200

    keep persistence for restarts in standalone agency

commit 3180a9d9ce4a4401a55ef02606b020316d43cbe5
Merge: 5d60524 28b9580
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Tue May 30 16:56:56 2017 +0200

    Merge branch 'agency-log-compaction-overhaul' of ssh://github.com/ArangoDB/ArangoDB into agency-log-compaction-overhaul

commit 5d60524429d8ddda4491beecb931c3b9e3cc1d8a
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Tue May 30 16:56:36 2017 +0200

    Implement snapshot sending in AppendEntriesRCV.

commit 28b958054f51c9cb36706df4e4345aa0f726ed15
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Tue May 30 14:20:13 2017 +0200

    state machine should not advance _committed if empty

commit df18f326acea7f5bc2660a37e22f1503952e4b41
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 23:39:47 2017 +0200

    Store term with compaction snapshot, recover again.

commit 2551a48b6fb513c9ea934bce755f8c364dae2f05
Author: Kaveh Vahedipour <kaveh@vahedipour.de>
Date:   Mon May 29 17:45:26 2017 +0200

    indices renamed to closely match RAFT documentation

commit e62dcdecf6e8650cfa5725d91b809d05591b48a4
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 16:37:43 2017 +0200

    More cleanup.

commit 9f4787c46621375f0361138a8961431eb21ce5c0
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 14:50:25 2017 +0200

    Revise loadLastCompactedSnapshot to return 0 without a snapshot.

commit 13285e1d70c8a4ac8c79a08de6f8fbc0f8d242bf
Merge: 3393c43 6c5f23e
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 12:06:20 2017 +0200

    Merge branch 'agency-log-compaction-overhaul' of ssh://github.com/ArangoDB/ArangoDB into agency-log-compaction-overhaul

commit 3393c43c75520c74d20df09c74fbbbd8b1af5976
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 12:03:47 2017 +0200

    More cleanup around Store::apply and friends.

commit 4ccb41d1839748c98e11403fa04f6a7d6af5e95b
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 09:45:41 2017 +0200

    Document apply methods in Store with comments.

commit ea05c4880fedb6fe535e24761ac5cb3c26ccfc20
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 09:45:18 2017 +0200

    Intentionally keep one log entry more to prevent empty log.

commit 67fb62f2259cc3c6368319917c7257ebcc177d3f
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 09:44:42 2017 +0200

    Improve plausibility checks at compaction time.

commit 0bafc368785b15a94f8783c4c929f4208f87d09c
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Thu May 25 00:51:51 2017 +0200

    Sort out (re-)building of agency K/V store(s) from compaction snapshots.

    This is in case of (re-)start, becoming a leader and when serving
    /_api/agency/store.

commit 46b0750bc6c597ec388aac0cdca32082c0cc54b8
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Thu May 25 00:50:54 2017 +0200

    Set compaction interval to 200 and keep to 0.

    This is way too small but tests should run with it.
    Will later increase numbers again.

commit 024dc0846ae30248b464dd481a8bbc1134f56983
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 24 23:29:57 2017 +0200

    Add a trivial test for agency log compaction.

commit e12fd3b46833419d7b436eeadd7246304324b891
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 24 23:26:03 2017 +0200

    First part of cleanup of agency log compaction.

    Now the right compaction snapshots are taken and persisted.
    Furthermore, the right log entries and old snapshots are removed
    after compaction, both the volatile and the persisted ones.
    The readDB and spearHead stay unchanged at compaction time as it
    should be.

commit d59901aea0c3ca31ef253299d2adc3353b79e664
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 24 12:18:26 2017 +0200

    Remove unused member variable.

commit 6c5f23eb7b42d9f20d4dadb2932a63add99f9c76
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 09:45:41 2017 +0200

    Document apply methods in Store with comments.

commit 670899f72d215e0fcc0ca0389cea9250a291e83b
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 09:45:18 2017 +0200

    Intentionally keep one log entry more to prevent empty log.

commit 660f61029917bbc2ce1fae3e4fc903095b023297
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Mon May 29 09:44:42 2017 +0200

    Improve plausibility checks at compaction time.

commit e2802e4b36d1f67d8361c1d8b0c92fbff696f439
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Thu May 25 00:51:51 2017 +0200

    Sort out (re-)building of agency K/V store(s) from compaction snapshots.

    This is in case of (re-)start, becoming a leader and when serving
    /_api/agency/store.

commit 12b43f1b91284a1185390d6dcfbd1e838522d392
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Thu May 25 00:50:54 2017 +0200

    Set compaction interval to 200 and keep to 0.

    This is way too small but tests should run with it.
    Will later increase numbers again.

commit c8b9a37a690b8e7e8bfa1276a3f9ba4b6b5a9c27
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 24 23:29:57 2017 +0200

    Add a trivial test for agency log compaction.

commit cf0c8c1fff666f76411082f87efe685a412ecebb
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 24 23:26:03 2017 +0200

    First part of cleanup of agency log compaction.

    Now the right compaction snapshots are taken and persisted.
    Furthermore, the right log entries and old snapshots are removed
    after compaction, both the volatile and the persisted ones.
    The readDB and spearHead stay unchanged at compaction time as it
    should be.

commit 0a4255359a57b8686133e6014e2b82b8079f36fa
Author: Max Neunhoeffer <max@arangodb.com>
Date:   Wed May 24 12:18:26 2017 +0200

    Remove unused member variable.
2017-06-02 16:13:03 +02:00
Max Neunhoeffer ff12815b2e Correct a typo in a variable name, improve comments. 2017-05-24 09:53:40 +02:00
Kaveh Vahedipour 8f8ebbcb03 agents wait for inception thread to finish before unprepare 2017-05-18 09:13:01 +02:00
Max Neunhoeffer 1684676924 Agency improvement: track ongoing transactions for inquire. 2017-04-28 17:24:30 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour a87fb6d71e restructured the leadership takeover 2017-03-17 15:44:58 +01:00
Kaveh Vahedipour b1299ec3b9 fixes from 3.1.11 in agency/state 2017-02-21 17:44:27 +01:00
Kaveh Vahedipour 4cc830b0df merge from 3.1 2017-02-20 20:05:52 +01:00
Kaveh Vahedipour 29d73b2e9c sendAppendEntries does reasonable estimation of follower time needs leading to less frequent spamming of followers 2017-02-10 11:25:55 +01:00
Kaveh Vahedipour 3ee7a8d595 compaction thread tested and functional 2017-02-08 14:18:46 +01:00
Kaveh Vahedipour b931aa967a new compaction thread for agency 2017-02-07 14:16:22 +01:00
Kaveh Vahedipour 54ccffc0ee agencycommresult with clientids 2017-01-19 14:11:09 +01:00
Kaveh Vahedipour 3639e2ad5b inquire in agency interface adjusted 2017-01-19 11:33:01 +01:00
Kaveh Vahedipour aaee2f9e61 transient heartbeats 2017-01-18 13:43:33 +01:00
Kaveh Vahedipour 54dbf0a814 inquire interface and clientids 2017-01-17 17:33:12 +01:00
Kaveh Vahedipour 272324c506 towards clientids in agency transactions 2017-01-16 09:54:55 +01:00
Kaveh Vahedipour fffba306a1 waitFor will report more paranoid 2017-01-10 13:51:31 +01:00
Kaveh Vahedipour 5b3d95298b agent restart from persistence with complete set of new endpoints 2017-01-03 15:39:52 +01:00
Kaveh Vahedipour 67b53bb91b no need considering reportIn for updating endpoints 2017-01-03 09:40:03 +01:00
Kaveh Vahedipour f380ebae31 remove deceased agents from AgencyComm 2017-01-02 18:50:26 +01:00
Kaveh Vahedipour 9d5a5537ce remove deceased agents from AgencyComm 2017-01-02 17:12:00 +01:00