Max Neunhöffer
d86f27bd19
Bug fix/agency leader timeouts (#3373)
* Send out empty heartbeats regardless of non-empty AppendEntriesRPC.
* Also improve logging:
Note if a log in the empty heartbeat sending takes > 0.01 s.
Clearly mark places where a leader resigns in logging.
Log if no empty heartbeat is sent out.
* Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses.
* Add debug logging for _lastAcked and challengeLeadership.
* Remove some unused code. Do not count ourselves in challengeLeadership.
* Removal of entire activation/deactivation mechanisms in agency
* TRI_microtime up to c++11
* added term to response to sendAppendEntries.
2017-10-06 10:11:51 +02:00
Kaveh Vahedipour
00650e6a3f
Bug fix/agency mt fixes (#3158)
* added debugging methods
* try to fix invalid access in case of error
* remove unused members
* bugfixes and comments
* all agency fixes in
* merge bug
* partially unguarded Agent::lead fixed
* all agency fixes in
* added nrBlocked to thread startup eval
* added nrBlocked to thread startup eval
* recombination of cases in State::get
* some maps replaced with unordered_maps
* optimized some maps
2017-08-30 10:43:51 +02:00
Frank Celler
ccd56c2571
Bug fix/inception typo (#2756)
small typo
2017-07-08 19:16:23 +02:00
Frank Celler
545e861829
Bug fix/agency prepare leading bug (#2752)
2017-07-08 17:08:30 +02:00
Kaveh Vahedipour
94cf025b34
check first time when ClusterComm is accessed, if not stopping
2017-05-18 11:48:41 +02:00
Kaveh Vahedipour
c998e37462
Inception should not fatally exit, when in shutdown
2017-05-17 11:54:11 +02:00
Kaveh Vahedipour
243d646f9e
avoid nullptr in Inception
2017-05-12 16:35:33 +02:00
Kaveh Vahedipour
7766c44aaa
all agency threads shutdown in their destructors if not stopping yet
2017-04-25 09:34:08 +02:00
Kaveh Vahedipour
1f81ce28b0
merge in cpp & js from 3.1.18 yet to do tests
2017-04-21 15:41:05 +02:00
Kaveh Vahedipour
4cc830b0df
merge from 3.1
2017-02-20 20:05:52 +01:00
jsteemann
b3ac54d065
remove global namespace include
2017-02-13 13:03:33 +01:00
Kaveh Vahedipour
76e5dec3d7
agent with less traffic
2017-02-10 17:03:15 +01:00
Max Neunhoeffer
883c11ea45
Handle the case that ClusterComm is already shut down gracefully.
This touches every single place where ClusterComm is being used.
2017-02-07 15:31:40 +01:00
Kaveh Vahedipour
f3cb1307a5
3.1 fixes backported to devel
2017-02-03 10:48:25 +01:00
jsteemann
fa917937c4
do not use namespaces in header files
2017-02-01 13:41:31 +01:00
Kaveh Vahedipour
169cf88c0b
too short timeouts for load situations
2017-01-11 08:58:31 +01:00
Kaveh Vahedipour
f34796b432
move resilience should now be correct as a test
2017-01-10 17:30:09 +01:00
Kaveh Vahedipour
fffba306a1
waitFor will report more paranoid
2017-01-10 13:51:31 +01:00
Kaveh Vahedipour
5b3d95298b
agent restart from persistence with complete set of new endpoints
2017-01-03 15:39:52 +01:00
Kaveh Vahedipour
449800d922
agent id is in configuration part
2017-01-03 09:35:33 +01:00
Kaveh Vahedipour
466d645545
it is probably a must to continue if leader cannot be reached
2017-01-03 09:26:45 +01:00
Kaveh Vahedipour
bd28896b69
do not resend inception message, if their leaderId and id are the same
2017-01-03 08:43:27 +01:00
Kaveh Vahedipour
f380ebae31
remove deceased agents from AgencyComm
2017-01-02 18:50:26 +01:00
Kaveh Vahedipour
9d5a5537ce
remove deceased agents from AgencyComm
2017-01-02 17:12:00 +01:00
Kaveh Vahedipour
a2ee40d4f3
restarting agents inform rest of their new endpoints
2017-01-02 15:58:38 +01:00
Kaveh Vahedipour
5db9ec52ec
investigation into agency comm errors
2016-12-28 11:45:57 +01:00
Kaveh Vahedipour
034961142a
constituent does elections more efficiently
2016-12-19 17:19:58 +01:00
Kaveh Vahedipour
0e29e93816
race condition in agency when leader impaired
2016-12-19 15:00:32 +01:00
Kaveh Vahedipour
0d3c1b16d9
fairly confident about sendWithFailover
2016-12-16 17:55:10 +01:00
Kaveh Vahedipour
1312c59b6e
1st stage of fixing sendWithFailover
2016-12-16 15:23:24 +01:00
Kaveh Vahedipour
043c0bd92f
cannot depend on Slice.getDouble
2016-12-15 15:32:09 +01:00
Kaveh Vahedipour
842d1030f0
Fixed dangling UUID problem in missing database directory
2016-12-13 15:36:19 +01:00
Kaveh Vahedipour
2b9c018817
fixed resilience
2016-12-09 16:35:32 +01:00
Kaveh Vahedipour
b20de61ae8
Useful logging in failure mode of inception's persistence
2016-12-07 14:29:48 +01:00
Kaveh Vahedipour
9bf7d3fb5b
Useful logging in failure mode of inception's persistence
2016-12-07 12:12:49 +01:00
Kaveh Vahedipour
3b4266962a
Fatal error exit missing
2016-12-07 12:06:38 +01:00
Kaveh Vahedipour
47463a2f1c
Agency startup redone after revisit of design document
2016-12-07 11:56:41 +01:00
Kaveh Vahedipour
51b279346b
redirects to myself should be history
2016-12-06 17:10:15 +01:00
Kaveh Vahedipour
3a1a9c898c
correct handling of distributeShardsLike in FailedFollower
2016-12-05 15:44:53 +01:00
Kaveh Vahedipour
77c8c51865
FailedFollower and Windows build problems
2016-11-30 15:39:10 +01:00
Kaveh Vahedipour
5c3f5f8013
AddFollower added to supervision
2016-11-29 17:28:17 +01:00
Kaveh Vahedipour
7587053ffc
some minor warnings
2016-11-29 13:13:25 +01:00
Kaveh Vahedipour
acf8a12c53
more reliable RAFT timing?
2016-11-29 12:52:10 +01:00
Kaveh Vahedipour
8bf05ecf8c
Some verbosity in RAFT startup
2016-11-29 11:06:08 +01:00
Kaveh Vahedipour
029ff44bb0
Merge back FMH to devel
2016-11-25 16:03:13 +01:00
Kaveh Vahedipour
41e1ba144f
general transactions in agency comm
2016-11-25 09:24:41 +01:00
kvahed
64db982815
broken agency fixed. the list of active agents failed to fill.
2016-11-24 15:43:38 +01:00
Max Neunhoeffer
4f23998bb9
Wait with Inception gossip until maintenance mode is off.
2016-11-24 13:34:28 +01:00
Kaveh Vahedipour
62492195e9
Recapsulating MUTEX in key value Store
2016-11-04 11:42:08 +01:00
Kaveh Vahedipour
8185588bb2
double precision conserved for timeout in AgencyComm
2016-11-03 16:47:38 +01:00