126 Commits

Author SHA1 Message Date
Max Neunhöffer 02281d3be4 Handle InitDone correctly. (#8552)
* precondition plan / version in compaction / store TTL removal independent of local _ttl set
* Agency init loops break when shutting down.
* assertion failures in store on restarting following agents
* Minor porting fixes from 3.4
2019-04-01 17:01:05 +02:00
Jan Christoph Uhde c3f7961b88 apply unique log ids (#8561) 2019-03-25 20:26:51 +01:00
Manuel Pöter ecf4d9d62a Fix race conditions in thread management. (#8032) 2019-01-28 15:44:46 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
Kaveh Vahedipour a73023e512 Bug fix/agency update endpoints (#6519)
* update endpoints in agency done the RAFT way
* fix mock interface
* tests functioning with new agent interface
* handling non-leader
2018-09-28 15:14:48 +02:00
Simon 22b9c31c13 Removing ClusterComm ClientTransactionID (#6294) 2018-09-12 22:15:16 +02:00
Lars Maier 63d9cfa081 Maintenance Fixes (#6284)
* Clean up for `FIXMEMAINTENANCE` comments: removed race condition, added errors and `notify()`s.
* Removed duplicated code.
* Added requested changes. Added error reporting for `UpdateCollection`.
* Make it compile. Add missing `notify()`.
* `CreateCollection` generates errors in all code paths.
* Fixed catch test.
2018-08-31 15:24:29 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
 - Fix index locking bug in mmfiles
 - Fix a bug in mmfiles with silent option and repsert
 - Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00
Simon 545561e9a9 Read only server (#5652) 2018-07-03 09:58:16 +02:00
Matthew Von-Maszewski 0264f3bc9b update gossip loop to be more responsive to other agents (#5390) 2018-05-22 16:30:27 +02:00
Kaveh Vahedipour 34f66539bd inception ignored leaders configuration (#5387) 2018-05-22 10:14:12 +02:00
Simon 17b1a2aafb Rest middleware refactoring (#5332) 2018-05-14 17:43:10 +02:00
Wilfried Goesgens 7d6e580780 Refactoring & code cleanup (#5138) (#5142) 2018-04-24 14:42:23 +02:00
Max Neunhöffer 7bae6980e8 Bug fix/agent lead hanger (#4147)
* Really enforce the hidden option --server.maximal-threads if given.
* Switch off --log.force-direct in scripts/startStandAloneAgency.sh
* Lower the timeout for sending AppendEntriesRPC to 150s.
* Erase _earliestPackage when becoming a leader.
* Challenge leadership in agent main loop.
* Use steady_clock for _earliestPackage.
* Change _lastAcked and _leaderSince to steady_clock as well.
* time difference calculations based on old readSystemClock to steadyClockToDouble
* All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps
* Inception system_clock to steady_clock
2017-12-27 16:45:39 +01:00
Jan 282be208cc remove TRI_usleep and TRI_sleep, and use std::this_thread::sleep_for … (#3817) 2017-12-06 18:43:49 +01:00
Kaveh Vahedipour 7e816db51e Bug fix/agency restart enhancements (#3619)
* Removed unused active(...) method in Agent
* Inception's restart from persistence allows peer with empty active RAFT list to join
* Agency's UUID is persisted outside of the database comparable to coordinator and db server action.
* Publicized Methods to UUID stuff in ServerState
* Inception method documentation
* Added --agency.disaster-recovery-id to allow specifying a known former agency id. This is potentially a very dangerous option.
* Delete unused methods.
* separate _id and _recoveryId
* populating active list with entire pool
* Improve logging.
* reject gossip from unknown agent, if pool is complete
2017-11-10 23:40:26 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism
* Allowing canceling of ongoing actions
* refactored asyncjobmanager
* refactoring some code
* adding read-only flag
* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode
* fixing "createsANewDatabaseWithAnInvalidUser"
* auth = off no longer makes everyone superuser
* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
Max Neunhöffer d86f27bd19 Bug fix/agency leader timeouts (#3373)
* Send out empty heartbeats regardless of non-empty AppendEntriesRPC.
* Also improve logging:
  Note if a log in the empty heartbeat sending takes > 0.01 s.
  Clearly mark places where a leader resigns in logging.
  Log if no empty heartbeat is sent out.
* Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses.
* Add debug logging for _lastAcked and challengeLeadership.
* Remove some unused code. Do not count ourselves in challengeLeadership.
* Removal of entire activation/deactivation mechanisms in agency
* TRI_microtime up to c++11
* added term to response to sendAppendEntries.
2017-10-06 10:11:51 +02:00
Kaveh Vahedipour 00650e6a3f Bug fix/agency mt fixes (#3158)
* added debugging methods
* try to fix invalid access in case of error
* remove unused members
* bugfixes and comments
* all agency fixes in
* merge bug
* partially unguarded Agent::lead fixed
* all agency fixes in
* added nrBlocked to thread startup eval
* added nrBlocked to thread startup eval
* recombination of cases in State::get
* some maps replaced with unordered_maps
* optimized maps some
2017-08-30 10:43:51 +02:00
Frank Celler ccd56c2571 Bug fix/inception typo (#2756)
small typo
2017-07-08 19:16:23 +02:00
Frank Celler 545e861829 Bug fix/agency prepare leading bug (#2752) 2017-07-08 17:08:30 +02:00
Kaveh Vahedipour 94cf025b34 check first time when ClusterComm is accessed, if not stopping 2017-05-18 11:48:41 +02:00
Kaveh Vahedipour c998e37462 Inception should not fatally exit, when in shutdown 2017-05-17 11:54:11 +02:00
Kaveh Vahedipour 243d646f9e avoid nullptr in Inception 2017-05-12 16:35:33 +02:00
Kaveh Vahedipour 7766c44aaa all agency threads shutdown in their destructors if not stopping yet 2017-04-25 09:34:08 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour 4cc830b0df merge from 3.1 2017-02-20 20:05:52 +01:00
jsteemann b3ac54d065 remove global namespace include 2017-02-13 13:03:33 +01:00
Kaveh Vahedipour 76e5dec3d7 agent with less traffic 2017-02-10 17:03:15 +01:00
Max Neunhoeffer 883c11ea45 Handle the case that ClusterComm is already shut down gracefully.
This touches every single place where ClusterComm is being used.
2017-02-07 15:31:40 +01:00
Kaveh Vahedipour f3cb1307a5 3.1 fixes backported to devel 2017-02-03 10:48:25 +01:00
jsteemann fa917937c4 do not use namespaces in header files 2017-02-01 13:41:31 +01:00
Kaveh Vahedipour 169cf88c0b too short timeouts for load situations 2017-01-11 08:58:31 +01:00
Kaveh Vahedipour f34796b432 move resilience should now be correct as a test 2017-01-10 17:30:09 +01:00
Kaveh Vahedipour fffba306a1 waitFor will report more paranoid 2017-01-10 13:51:31 +01:00
Kaveh Vahedipour 5b3d95298b agent restart from persistence with complete set of new endpoints 2017-01-03 15:39:52 +01:00
Kaveh Vahedipour 449800d922 agent id is in configuration part 2017-01-03 09:35:33 +01:00
Kaveh Vahedipour 466d645545 it is probably a must to continue if leader cannot be reached 2017-01-03 09:26:45 +01:00
Kaveh Vahedipour bd28896b69 do not resend inception message, if their leaderId and id are the same 2017-01-03 08:43:27 +01:00
Kaveh Vahedipour f380ebae31 remove deceased agents from AgencyComm 2017-01-02 18:50:26 +01:00
Kaveh Vahedipour 9d5a5537ce remove deceased agents from AgencyComm 2017-01-02 17:12:00 +01:00
Kaveh Vahedipour a2ee40d4f3 restarting agents inform rest of their new endpoints 2017-01-02 15:58:38 +01:00
Kaveh Vahedipour 5db9ec52ec investigation into agency comm errors 2016-12-28 11:45:57 +01:00
Kaveh Vahedipour 034961142a constituent does elections more efficiently 2016-12-19 17:19:58 +01:00
Kaveh Vahedipour 0e29e93816 race condition in agency when leader impaired 2016-12-19 15:00:32 +01:00
Kaveh Vahedipour 0d3c1b16d9 fairly confident about sendWithFailover 2016-12-16 17:55:10 +01:00
Kaveh Vahedipour 1312c59b6e 1st stage of fixing sendWithFailover 2016-12-16 15:23:24 +01:00
Kaveh Vahedipour 043c0bd92f cannot depend on Slice.getDouble 2016-12-15 15:32:09 +01:00
Kaveh Vahedipour 842d1030f0 Fixed dangling UUID problem in missing database directory 2016-12-13 15:36:19 +01:00
Kaveh Vahedipour 2b9c018817 fixed resilience 2016-12-09 16:35:32 +01:00