arangodb

Commit Graph

Author	SHA1	Message	Date
Max Neunhöffer	0599a1c79c	Fix unneeded return code. (#9853 )	2019-08-30 08:44:07 +02:00
Kaveh Vahedipour	139c5d3839	[3.5] agency lockup when removing 404ed callbacks and leadership preparation (#9846 ) * queue the write * finishing 3.5 * commenting	2019-08-29 14:32:48 +02:00
Kaveh Vahedipour	09d2745625	[3.5] clean up your crap, dbservers. alright, i'll do it. (#9722 ) * clean up your crap, dbservers. alright, i'll do it. * ts ts ts * body is shared_ptr * Update CHANGELOG * revert callback bodies to API specification * array needs be inside so that multiple unobserves to same key are possible	2019-08-22 14:26:44 +03:00
Max Neunhöffer	b753c895e4	Fix an agency bug found in Windows tests. (#9728 ) * Fix agency bug found in Windows tests. * CHANGELOG.	2019-08-16 12:17:09 +02:00
KVS85	e64080e207	Merge 3.5.1 back to 3.5 (#9713 ) * Bug fix 3.5/make arangosh reconnect (#9615) * make arangosh reconnect * added CHANGELOG entry * fix lagging AgencyCallbacks (#9620) * fix lagging AgencyCallbacks * optimizations, discussed with @mchacki * fix wording * updated CHANGELOG * fix yet another undefined behavior (#9629) * [3.5.1] Fail the FailedLeader Job if the new leader fails. (#9628) * Fail the FailedLeader Job if the new leader fails. * Updated changelog. * In case of timeout do not rollback. * Fixed catch tests. * Changed wording. * DELETED rollback. * reduce wait timeouts as a mitigation for notifying waiters without ho… (#9619) * reduce wait timeouts as a mitigation for notifying waiters without holding the required mutex this is a quick mitigation only, which reduces maximum wait time from 1 second to 100 milliseconds without changing other behavior. the main problem of notifying pending writers without successfully acquiring the required mutex still needs proper addressing. * adjust timing-dependent test * [3.5.1] Fast Controlled Leaderchange (#9634) * First draft of keeping in sync during controlled leader change. * Test if server is actually the leader in plan. * Updated changelog. * Added oldLeader check for set-the-leader request. * Small fixes. * Removed LOG_DEVEL. * less copying, more moving! 🚚 (#9645) * attempt to fix load_balancing tests in slow test environments (#9626) * Bug fix/fix swagger datatype (#9045) (#9602) * Bug fix/fix swagger datatype (#9045) * remove http so https arangos will work * verify that query parameters are proper swagger data types, fix offending documentation files * return the actual type - not the list of available ones * check formats * there is no uint64 in swagger * Fresh Swagger * Port TakeoverShardLeadership from devel to 3.5.1 (#9659) * Create TakeoverShardLeader job. * Add TakeoverShardLeadership to Action factory. * Add log message at level debug. * Sort out LOG_TOPIC ids. * Fix unit tests. 
* CHANGELOG. * Bug fix 3.5/hide mmfiles specific info in web ui (#9668) * attempt to fix load_balancing tests in slow test environments (#9626) * Bug fix/fix swagger datatype (#9045) (#9602) * Bug fix/fix swagger datatype (#9045) * remove http so https arangos will work * verify that query parameters are proper swagger data types, fix offending documentation files * return the actual type - not the list of available ones * check formats * there is no uint64 in swagger * Fresh Swagger * hide MMFiles-specific information when we don't need it * Ported ResignLeadership to 3.5 (#9656) * attempt to fix load_balancing tests in slow test environments (#9626) * Bug fix/fix swagger datatype (#9045) (#9602) * Bug fix/fix swagger datatype (#9045) * remove http so https arangos will work * verify that query parameters are proper swagger data types, fix offending documentation files * return the actual type - not the list of available ones * check formats * there is no uint64 in swagger * Fresh Swagger * Ported ResignLeadership to 3.5 * Add the actual http route. 
* Aardvark: Add k Shortest Paths example graph to UI (#9491) (#9661) * Aardvark: Add k Shortest Paths example graph to UI (#9491) * Add example graph to UI * Add kShortestPathsGraph to examples.js * Update example-graph.js * Update aardvark.js * Regenerate UI * add the ability to have cluster special examples (#9613) (#9663) * add the ability to have cluster special examples * Update get_cluster_health.md * fix abort condition, fix negative filtering for cluster tests * Test if job fails with unmet assertion * Remove cluster test example * germanize * better skip reasons * removing superfluous semicolons * Revert skip reasons, too noisy * various replication improvements: (#9675) * attempt to fix load_balancing tests in slow test environments (#9626) * Bug fix/fix swagger datatype (#9045) (#9602) * Bug fix/fix swagger datatype (#9045) * remove http so https arangos will work * verify that query parameters are proper swagger data types, fix offending documentation files * return the actual type - not the list of available ones * check formats * there is no uint64 in swagger * Fresh Swagger * various replication improvements: - better debuggability (more log details) - shorter minimum wait delay in active failover - fixed too early pruning of WAL files on leaders * Bug fix 3.5/fix rocksdb return code (#9692) * attempt to fix load_balancing tests in slow test environments (#9626) * Bug fix/fix swagger datatype (#9045) (#9602) * Bug fix/fix swagger datatype (#9045) * remove http so https arangos will work * verify that query parameters are proper swagger data types, fix offending documentation files * return the actual type - not the list of available ones * check formats * there is no uint64 in swagger * Fresh Swagger * fix return codes for concurrent writes to same documents * [3.5] Feature/rebootid notice changes, backport of #9523 (#9684) * Feature/rebootid notice changes, backport of #9523 * Fixed error code to not re-use an old one * Bug fix 3.5/issue 9679 
(#9682) * attempt to fix load_balancing tests in slow test environments (#9626) * Bug fix/fix swagger datatype (#9045) (#9602) * Bug fix/fix swagger datatype (#9045) * remove http so https arangos will work * verify that query parameters are proper swagger data types, fix offending documentation files * return the actual type - not the list of available ones * check formats * there is no uint64 in swagger * Fresh Swagger * fixed issue #9679 * bug-fix/issue-#9660 (#9704) (#9707) * bug-fix/issue-#9660 (#9704) * fix issue * Update tests/js/common/aql/aql-view-arangosearch-cluster.inc Co-Authored-By: Jan <jsteemann@users.noreply.github.com> * Update tests/js/common/aql/aql-view-arangosearch-noncluster.js Co-Authored-By: Jan <jsteemann@users.noreply.github.com> * fix cluster tests * Update CHANGELOG * [3.5] agency node fixes (#9698) * node fixes port from 3.4 * fixed change log * update rocksdb statistics to deliver sums from column family instead of single value from default family. (#9706) * Feature 3.5/geo functions (#9710) * Add support for WGS84 on distances (#9672) * Add area calculations (#9693) * Update CHANGELOG	2019-08-14 20:24:47 +03:00
Wilfried Goesgens	1907a7211b	Bug fix/cleanup system includes (#8962 )	2019-05-15 15:12:59 +02:00
Jan	c00442e31c	fix some issues found by cppcheck (#8959 )	2019-05-10 16:31:22 +02:00
Max Neunhöffer	80bfb85695	Port agency performance tuning for many shards to devel. (#8647 ) * Port agency performance tuning for many shards to devel. * Add more IDs to LOG_TOPIC calls. * Even more IDs for LOG_TOPIC. * Fix a duplicate LOG_TOPIC ID. * Fix an old merging bug in devel. * Don't hesitate between phases one and two for small clusters.	2019-04-11 11:14:56 +02:00
Max Neunhöffer	02281d3be4	Handle InitDone correctly. (#8552 ) * precondition plan / version in compaction / store TTL removal independent of local _ttl set * Agency init loops break when shutting down. * assertion failures in store on restarting following agents * Minor porting fixes from 3.4	2019-04-01 17:01:05 +02:00
Jan	80a6e621ee	don't allocate memory so often in ClusterComm requests (#8550 )	2019-03-26 00:31:56 +01:00
Jan Christoph Uhde	c3f7961b88	apply unique log ids (#8561 )	2019-03-25 20:26:51 +01:00
Lars Maier	4d49285754	[devel]agency appends entries with leaders timestamp (#8478 ) * agency's append entries with leader's timestamp * compatibility with appendEntries protocol without timestamps * Updated changelog.	2019-03-21 09:52:58 +01:00
Kaveh Vahedipour	098d2d086c	fix compaction behaviour of followers (#8348 )	2019-03-08 10:39:49 +01:00
Kaveh Vahedipour	68178ba165	[devel] supervision bug fix backports (#8314 ) * back ports for supervision fixes from 3.4 part 1 * back ports for supervision fixes from 3.4 part 2	2019-03-04 19:27:24 +01:00
Manuel Pöter	ecf4d9d62a	Fix race conditions in thread management. (#8032 )	2019-01-28 15:44:46 +01:00
Tobias Gödderz	a1d3bc3e94	Foxx queue jobs hanging after Foxxmaster crash (#7922 ) * Fixed bug where the Foxxmaster doesn't reset jobs after a crash when it should, or a non-master coordinator removes jobs in progress during startup * Added a regression test * Updated CHANGELOG * Fixed non-maintainer compile	2019-01-14 16:08:08 +01:00
Frank Celler	ac9f375fb5	big reformat	2018-12-26 00:54:03 +01:00
Kaveh Vahedipour	2e680a7f9b	Agency not starting in log level trace (#7684 )	2018-12-10 15:30:57 +01:00
Max Neunhöffer	f720703c38	Supervision bug fix to start with clean transient store. (#7325 ) * Supervision bug fix to start with clean transient store. * Add CHANGELOG entry.	2018-11-15 11:24:34 +01:00
Simon	10dc287eb3	Silence Tsan warnings (#7075 )	2018-10-25 15:50:39 +02:00
jsteemann	5f951840a9	fix compilation	2018-10-12 17:56:55 +02:00
Kaveh Vahedipour	d524ba616b	fixed hyperventing agent (#6776 ) * fixed hyperventing agent	2018-10-12 17:03:08 +02:00
Max Neunhöffer	2fc368028b	Fix a crash found by the agency torturer. (#6589 )	2018-09-28 15:15:26 +02:00
Kaveh Vahedipour	a73023e512	Bug fix/agency update endpoints (#6519 ) * update endpoints in agency done the RAFT way * fix mock interface * tests functioning with new agent interfacwe * handling non-leader	2018-09-28 15:14:48 +02:00
Dan Larkin-York	0dfabd8f04	Fix several TSan warnings (#6473 )	2018-09-14 11:16:45 +02:00
Simon	22b9c31c13	Removing ClusterComm ClientTransactionID (#6294 )	2018-09-12 22:15:16 +02:00
Vasiliy	e862efdc3b	issue 458.4: retrieve the system database via the SystemDatabaseFeature (#6299 )	2018-08-31 19:45:10 +02:00
Kaveh Vahedipour	fe9b2fecdc	notifyInactive has been lying aroung in the agent without being used. relique of the time, when we thought, that we would have an pool of agents from which we'd draw, if an agent failed (#6290 )	2018-08-31 10:48:39 +02:00
Kaveh Vahedipour	0080498e89	compaction index should not exceed local commit index (#5900 )	2018-07-17 15:54:20 +02:00
Kaveh Vahedipour	5b307db85d	Better log compaction	2018-07-16 12:09:58 +02:00
Kaveh Vahedipour	7df40fa905	backport agency fixes for replacing agent with total data loss (#5823 )	2018-07-11 11:23:48 +02:00
Kaveh Vahedipour	7b40a61b85	fixing issue when disaster recovered agent has new endpoint (#5809 )	2018-07-11 11:19:41 +02:00
Matthew Von-Maszewski	0264f3bc9b	update gossip loop to be more responsive to other agents (#5390 )	2018-05-22 16:30:27 +02:00
Wilfried Goesgens	7d6e580780	Refactoring & code cleanup (#5138 ) (#5142 )	2018-04-24 14:42:23 +02:00
Kaveh Vahedipour	f4edcc7ba8	Bug fix/supervision engine starting early on leadership change (#5062 ) * supervision must not work as long as agent is still preparing * leadersince atomic and pushed to end of leader preparation * More consistent use of integer types. * Slightly change order of events in Supervision loop.	2018-04-10 15:28:26 +02:00
Kaveh Vahedipour	2e2d947c1c	devel: fixed the missed changes to plan after agency callback is registred f… (#4775 ) * fixed the missed changes to plan after agency callback is registred for create collection * Force check in timeout case. * Sort out RestAgencyHandler behaviour for inquire. * Take "ongoing" stuff out of AgencyComm.	2018-03-14 12:01:17 +01:00
Kaveh Vahedipour	42f543fd10	constituent correctly persisiting _votedFor and _term (#4248 )	2018-01-16 09:47:25 +01:00
Matthew Von-Maszewski	ae77ff80c2	create independent executeLockedRead and executeLockedWrite to speed performance (#4177 )	2017-12-29 12:02:27 +01:00
Max Neunhöffer	927027695d	Sort out locking agency to separate reads and writes. (#4174 ) * disentagle writes and reads in agency * renamed _oLock to _outputLock. Documented read and write rules for _readDB and _commitIndex using _outputLock and _waitForCV. Adjusted code to match rules. * update executeLocked() knowing some callers use _readDB via readDB(). readDB() currently read only, but using write locks due to absolutely safe. * Lay out clear rules against deadlock in agency. * Avoid unprotected access to _commitIndex.	2017-12-28 11:27:20 +01:00
Max Neunhöffer	7bae6980e8	Bug fix/agent lead hanger (#4147 ) * Really enforce the hidden option --server.maximal-threads if given. * Switch off --log.force-direct in scripts/startStandAloneAgency.sh * Lower the timeout for sending AppendEntriesRPC to 150s. * Erase _earliestPackage when becoming a leader. * Challenge leadership in agent main loop. * Use steady_clock for _earliestPackage. * Change _lastAcked and _leaderSince to steady_clock as well. * time difference calculations based on old readSystemClock to steadyClockToDouble * All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps * Inception system_clock to steady_clock	2017-12-27 16:45:39 +01:00
Jan	282be208cc	remove TRI_usleep and TRI_sleep, and use std::this_thread::sleep_for … (#3817 )	2017-12-06 18:43:49 +01:00
Kaveh Vahedipour	2beaef41ff	Bug fix/agencycomm validate methods broken (#3784 )	2017-11-24 10:31:07 +01:00
Max Neunhöffer	766ab7c8cf	Fix agency shutdown bug. (#3683 ) * Fix agency shutdown bug. * Remove precondition that was not needed in AgencyComm::removeValues. * Fail fatally if threads do not shut down.	2017-11-14 16:33:46 +01:00
Kaveh Vahedipour	7e816db51e	Bug fix/agency restart enhancements (#3619 ) * Removed unused active(...) method in Agent * Inception's restart from persistence allows peer with empty active RAFT list to join * Agency's UUID is persisted outside of the database comparable to coordinator and db server action. * Publicized Methods to UUID stuff in ServerState * Inception method documentation * added --agency.disaster-recovery-id to allow for specification of known former agency id. this is a very dangerous option potentially. * Delete a unused methods. * separate _id and _recoveryId * populating active list with entire pool * Improve logging. * reject gossip from unknown agent, if pool is complete	2017-11-10 23:40:26 +01:00
Max Neunhöffer	3c0ee6908b	Bug fix/lead to agent (#3541 )	2017-11-09 11:10:09 +01:00
Max Neunhöffer	ee96c37237	Fix agency restart problems. (#3493 ) * Fix agency restart problems (port from a 3.2 fix). * Further fixes after Craneware rescue.	2017-10-25 18:05:58 +02:00
Kaveh Vahedipour	46333a762f	Bug fix/agency restart after compaction and holes in log (#3413 ) * State fixes holes in RAFT index range * Avoid application of entries older than compaction index _cur and guard for unsigned overflow	2017-10-13 16:01:41 +02:00
Max Neunhöffer	d86f27bd19	Bug fix/agency leader timeouts (#3373 ) * Send out empty heartbeats regardless of non-empty AppendEntriesRPC. * Also improve logging: Note if a log in the empty heartbeat sending takes > 0.01 s. Clearly mark places where a leader resigns in logging. Log if no empty heartbeat is sent out. * Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses. * Add debug logging for _lastAcked and challengeLeadership. * Remove some unused code. Do not count ourselves in challengeLeadership. * Removal of entire activation/deactivation mechanisms in agency * TRI_microtime up to c++11 * added term to response to sendAppendEntries.	2017-10-06 10:11:51 +02:00
Max Neunhoeffer	af3f977997	Revert "Send out empty heartbeats regardless of non-empty AppendEntriesRPC." This reverts commit `e974501446`.	2017-10-02 15:02:15 +02:00
Max Neunhoeffer	2852f80b5a	Revert "Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses." This reverts commit `45d37edfb2`.	2017-10-02 15:02:06 +02:00

393 Commits