arangodb

Commit Graph

Author	SHA1	Message	Date
Lars Maier	a1bae63cf1	[3.4] Verbose Abort Reason (#8878) * Added reason to job abort method. * Additional abort that is not in devel.	2019-05-01 13:54:47 +02:00
Max Neunhöffer	54f84cab92	Performance tuning for many shards. (#8577)	2019-03-29 21:34:45 +01:00
Max Neunhöffer	1365eebfac	Make AddFollower and RemoveFollower less aggressive. (#8477) * Make AddFollower and RemoveFollower less aggressive. * Adjust comment * Early exit in count loop. * Adjust comment in 2nd place. * CHANGELOG.	2019-03-21 15:27:22 +01:00
Max Neunhöffer	46e479376d	Further supervision fixes. (#8259) * Do not schedule Coordinators in Plan. * Finish failed server when server is no longer in health. * Fix removeServer checks. Check that server is no longer in use before removing it. Give 60s waiting time for condition to be met. Also observer agency lock. * Finish FailedFollower job if server no longer follower. This can happen because RemoveFollower was faster. * Only use GOOD servers as replacement followers. * Fix AddFollower for satellite collections. * Fix RemoveServer for satellite collections. * MoveShard handles moves from leader to followers * Prepare CleanoutServer and FailedServer for satellite collections. * More sorting out of AddFollower and RemoveFollower. * Fix RemoveFollower job w.r.t. choice of follower to remove. * Fix message. * kill you own sub jobs, please * Added preconditions to payloads for supervision's job finishers * Improve logging. * Add agency diagnostics to failed move shard test, start. * Add coordinator agency diagnostics. * Remove warning. * Add changelog entry. * Add agency diagnostics if things go sour with move shard. * Add agency diags when things go wrong 2. * API /_api/agency/state: back to old format. * Fix Windows compilation. * handle aborts in supervision and wait for the last Raft log to be committed * tests compiling, 2 failing for valid reasons * Correctly report TRI_ERROR_CLUSTER_CONNECTION_LOST as 503. * FailedLeader /FailedFollower cannot continue, when aborting blocks	2019-03-04 11:43:35 +01:00
Max Neunhöffer	b87f362f27	The big supervision fix. (#8243) * Updated CleanoutServerTests. Exclude servers in ToBeCleanedServers. Allow bad servers as new follower. * Prefer good servers. * Removed copy, sort and binary_search for a list of ~10 elements. * Fix move shard bug with compare. * MoveShard fixes, expansion of doForAllShards * Count only GOOD servers in actualReplicationFactor. * Make RemoveFollower remove broken servers. * Precondition on Plan Version for updating Current as leader. * CleanupServer to evict server from ToBeCleaned, when aborting * cleanoutserver with payload in finish * Use static string for ToBeCleanedOut. * Fixed typo in log message. * Change warning level. If a MoveShard job is aborted and we can no longer roll back, then we issue a WARNING rather than a DEBUG log message. * Another typo and log level. * Start to fix unit tests. * Does not make sense for AddFollowerTest to have a FAILED leader. * Only count GOOD followers in AddFollower. * Fix AddFollowerTest. * Report precondition failed in MoveShard follower case. * Add CHANGELOG.	2019-02-25 08:12:18 -05:00
Frank Celler	9477af198b	big reformat	2018-12-26 00:57:05 +01:00
Kaveh Vahedipour	28754cbf15	Feature/schmutz plus plus (#5972) - Schmutz now called "Maintenance" and completely implemented in C++ - Fix index locking bug in mmfiles - Fix a bug in mmfiles with silent option and repsert - Slightly increase supervision okperiod and graceperiod	2018-08-24 12:15:35 +02:00
Matthew Von-Maszewski	c0c149cf5b	Create non-throwing wrappers for Node access in Agency (#4598) * safety checkin of Node throw reduction. * final round of Node throw protection. Common accessors now protected to force code to hasAsXXX() functions.	2018-04-17 10:21:14 +02:00
Simon	68442dae5a	Fixing agency prefix in Agency/Job.cpp (#5039) * Fixing some test issues and fixing the agency prefix in Agency/Job.cpp * Making logic consistent in failed- leader / follower job * reverting condition back to == GOOD	2018-04-09 16:21:24 +02:00
Simon Grätzer	7c31960cf2	Feature/async failover (#3451)	2017-10-18 23:59:29 +02:00
m0ppers	bb1d303473	Cmake 5.0 complains about unused lambda captures (#3390)	2017-10-13 12:20:48 +02:00
Andreas Streichardt	439203dc3b	Better logging	2017-05-11 12:20:15 +02:00
Max Neunhoeffer	09ff77cce2	Make Windows VS compiler a bit happier.	2017-04-28 17:18:37 +02:00
Kaveh Vahedipour	1f81ce28b0	merge in cpp & js from 3.1.18 yet to do tests	2017-04-21 15:41:05 +02:00
Kaveh Vahedipour	4cc830b0df	merge from 3.1	2017-02-20 20:05:52 +01:00
jsteemann	b3ac54d065	remove global namespace include	2017-02-13 13:03:33 +01:00
Kaveh Vahedipour	76e5dec3d7	agent with less traffic	2017-02-10 17:03:15 +01:00
Kaveh Vahedipour	3f3633bd2c	supervision to proper preconditioning of jobs on plan	2017-01-27 15:29:22 +01:00
Kaveh Vahedipour	ab22ffa8ee	shard jobs should check for the plan to be the same as expected	2017-01-27 11:27:45 +01:00
Kaveh Vahedipour	c803d52f51	startLocalCluster handles port offset so that multiple clusters can be started on same machine	2017-01-27 09:33:42 +01:00
Kaveh Vahedipour	2b9c018817	fixed resilience	2016-12-09 16:35:32 +01:00
Kaveh Vahedipour	eddecc0a4c	clones method in Jobs more useful	2016-12-09 09:29:00 +01:00
Kaveh Vahedipour	c6ef45b64d	AddFollower to handle multiple followers at the same time	2016-12-08 15:12:05 +01:00
Kaveh Vahedipour	b930b23fc2	AddFollower jobs for newly arrived db server to satisfy replication factors	2016-12-07 16:20:47 +01:00
Frank Celler	e4ba82e8e9	rewrite of AgencyComm	2016-10-23 00:46:30 +02:00
jsteemann	34f7e27d6c	Merge branch 'devel' of https://github.com/arangodb/arangodb into generic-col-types	2016-09-08 09:27:53 +02:00
Frank Celler	52b1541f46	silenced warning in maintainer-mode	2016-09-08 08:41:58 +02:00
jsteemann	8ef63acf55	Merge branch 'devel' of https://github.com/arangodb/arangodb into generic-col-types	2016-09-07 15:24:51 +02:00
Kaveh Vahedipour	beb46cc1a0	cppcheck warnings	2016-09-07 15:11:10 +02:00
jsteemann	c14c6ab025	removed unused variables	2016-09-07 08:56:48 +02:00
Frank Celler	5a14ab5a12	silence warning	2016-09-06 23:20:56 +02:00
Andreas Streichardt	6396ac4dc7	Implement removeServer job	2016-09-06 16:49:25 +02:00

32 Commits