1
0
Fork 0
Commit Graph

393 Commits

Author SHA1 Message Date
Max Neunhöffer 0599a1c79c
Fix unneeded return code. (#9853) 2019-08-30 08:44:07 +02:00
Kaveh Vahedipour 139c5d3839 [3.5] agency lockup when removing 404ed callbacks and leadership preparation (#9846)
* queue the write
* finishing 3.5
* commenting
2019-08-29 14:32:48 +02:00
Kaveh Vahedipour 09d2745625 [3.5] clean up your crap, dbservers. alright, i'll do it. (#9722)
* clean up your crap, dbservers. alright, i'll do it.

* ts ts ts

* body is shared_ptr

* Update CHANGELOG

* revert callback bodies to API specification

* array needs be inside so that multiple unobserves to same key are possible
2019-08-22 14:26:44 +03:00
Max Neunhöffer b753c895e4
Fix an agency bug found in Windows tests. (#9728)
* Fix agency bug found in Windows tests.
* CHANGELOG.
2019-08-16 12:17:09 +02:00
KVS85 e64080e207
Merge 3.5.1 back to 3.5 (#9713)
* Bug fix 3.5/make arangosh reconnect (#9615)

* make arangosh reconnect

* added CHANGELOG entry

* fix lagging AgencyCallbacks (#9620)

* fix lagging AgencyCallbacks

* optimizations, discussed with @mchacki

* fix wording

* updated CHANGELOG

* fix yet another undefined behavior (#9629)

* [3.5.1] Fail the FailedLeader Job if the new leader fails. (#9628)

* Fail the FailedLeader Job if the new leader fails.

* Updated changelog.

* In case of timeout do not rollback.

* Fixed catch tests.

* Changed wording.

* DELETED rollback.

* reduce wait timeouts as a mitigation for notifying waiters without ho… (#9619)

* reduce wait timeouts as a mitigation for notifying waiters without holding the required mutex

this is a quick mitigation only, which reduces maximum wait time from 1
second to 100 milliseconds without changing other behavior.

the main problem of notifying pending writers without successfully
acquiring the required mutex still needs proper addressing.

* adjust timing-dependent test

* [3.5.1] Fast Controlled Leaderchange (#9634)

* First draft of keeping in sync during controlled leader change.

* Test if server is actually the leader in plan.

* Updated changelog.

* Added oldLeader check for set-the-leader request.

* Small fixes.

* Removed LOG_DEVEL.

* less copying, more moving! 🚚 (#9645)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* Port TakeoverShardLeadership from devel to 3.5.1 (#9659)

* Create TakeoverShardLeader job.
* Add TakeoverShardLeadership to Action factory.
* Add log message at level debug.
* Sort out LOG_TOPIC ids.
* Fix unit tests.
* CHANGELOG.

* Bug fix 3.5/hide mmfiles specific info in web ui (#9668)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* hide MMFiles-specific information when we don't need it

* Ported ResignLeadership to 3.5 (#9656)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* Ported ResignLeadership to 3.5

* Add the actual http route.

* Aardvark: Add k Shortest Paths example graph to UI (#9491) (#9661)

* Aardvark: Add k Shortest Paths example graph to UI (#9491)

* Add example graph to UI

* Add kShortestPathsGraph to examples.js

* Update example-graph.js

* Update aardvark.js

* Regenerate UI

* add the ability to have cluster special examples (#9613) (#9663)

* add the ability to have cluster special examples

* Update get_cluster_health.md

* fix abort condition, fix negative filtering for cluster tests

* Test if job fails with unmet assertion

* Remove cluster test example

* germanize

* better skip reasons

* removing superfluous semicolons

* Revert skip reasons, too noisy

* various replication improvements: (#9675)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* various replication improvements:

- better debuggability (more log details)
- shorter minimum wait delay in active failover
- fixed too early pruning of WAL files on leaders

* Bug fix 3.5/fix rocksdb return code (#9692)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* fix return codes for concurrent writes to same documents

* [3.5] Feature/rebootid notice changes, backport of #9523 (#9684)

* Feature/rebootid notice changes, backport of #9523

* Fixed error code to not re-use an old one

* Bug fix 3.5/issue 9679 (#9682)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* fixed issue #9679

* bug-fix/issue-#9660 (#9704) (#9707)

* bug-fix/issue-#9660 (#9704)

* fix issue

* Update tests/js/common/aql/aql-view-arangosearch-cluster.inc

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update tests/js/common/aql/aql-view-arangosearch-noncluster.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* fix cluster tests

* Update CHANGELOG

* [3.5] agency node fixes (#9698)

* node fixes port from 3.4
* fixed change log

* update rocksdb statistics to deliver sums from column family instead of single value from default family. (#9706)

* Feature 3.5/geo functions (#9710)

* Add support for WGS84 on distances (#9672)

* Add area calculations (#9693)

* Update CHANGELOG
2019-08-14 20:24:47 +03:00
Wilfried Goesgens 1907a7211b Bug fix/cleanup system includes (#8962) 2019-05-15 15:12:59 +02:00
Jan c00442e31c
fix some issues found by cppcheck (#8959) 2019-05-10 16:31:22 +02:00
Max Neunhöffer 80bfb85695
Port agency performance tuning for many shards to devel. (#8647)
* Port agency performance tuning for many shards to devel.
* Add more IDs to LOG_TOPIC calls.
* Even more IDs for LOG_TOPIC.
* Fix a duplicate LOG_TOPIC ID.
* Fix an old merging bug in devel.
* Don't hesitate between phases one and two for small clusters.
2019-04-11 11:14:56 +02:00
Max Neunhöffer 02281d3be4
Handle InitDone correctly. (#8552)
* precondition plan / version in compaction / store TTL removal independent of local _ttl set
* Agency init loops break when shutting down.
* assertion failures in store on restarting following agents
* Minor porting fixes from 3.4
2019-04-01 17:01:05 +02:00
Jan 80a6e621ee
don't allocate memory so often in ClusterComm requests (#8550) 2019-03-26 00:31:56 +01:00
Jan Christoph Uhde c3f7961b88 apply unique log ids (#8561) 2019-03-25 20:26:51 +01:00
Lars Maier 4d49285754 [devel]agency appends entries with leaders timestamp (#8478)
* agency's append entries with leader's timestamp
* compatibility with appendEntries protocol without timestamps
* Updated changelog.
2019-03-21 09:52:58 +01:00
Kaveh Vahedipour 098d2d086c fix compaction behaviour of followers (#8348) 2019-03-08 10:39:49 +01:00
Kaveh Vahedipour 68178ba165 [devel] supervision bug fix backports (#8314)
* back ports for supervision fixes from 3.4 part 1

* back ports for supervision fixes from 3.4 part 2
2019-03-04 19:27:24 +01:00
Manuel Pöter ecf4d9d62a Fix race conditions in thread management. (#8032) 2019-01-28 15:44:46 +01:00
Tobias Gödderz a1d3bc3e94 Foxx queue jobs hanging after Foxxmaster crash (#7922)
* Fixed bug where the Foxxmaster doesn't reset jobs after a crash when it should, or a non-master coordinator removes jobs in progress during startup

* Added a regression test

* Updated CHANGELOG

* Fixed non-maintainer compile
2019-01-14 16:08:08 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
Kaveh Vahedipour 2e680a7f9b Agency not starting in log level trace (#7684) 2018-12-10 15:30:57 +01:00
Max Neunhöffer f720703c38
Supervision bug fix to start with clean transient store. (#7325)
* Supervision bug fix to start with clean transient store.

* Add CHANGELOG entry.
2018-11-15 11:24:34 +01:00
Simon 10dc287eb3 Silence Tsan warnings (#7075) 2018-10-25 15:50:39 +02:00
jsteemann 5f951840a9 fix compilation 2018-10-12 17:56:55 +02:00
Kaveh Vahedipour d524ba616b fixed hyperventing agent (#6776)
* fixed hyperventing agent
2018-10-12 17:03:08 +02:00
Max Neunhöffer 2fc368028b
Fix a crash found by the agency torturer. (#6589) 2018-09-28 15:15:26 +02:00
Kaveh Vahedipour a73023e512 Bug fix/agency update endpoints (#6519)
* update endpoints in agency done the RAFT way
* fix mock interface
* tests functioning with new agent interfacwe
* handling non-leader
2018-09-28 15:14:48 +02:00
Dan Larkin-York 0dfabd8f04 Fix several TSan warnings (#6473) 2018-09-14 11:16:45 +02:00
Simon 22b9c31c13 Removing ClusterComm ClientTransactionID (#6294) 2018-09-12 22:15:16 +02:00
Vasiliy e862efdc3b issue 458.4: retrieve the system database via the SystemDatabaseFeature (#6299) 2018-08-31 19:45:10 +02:00
Kaveh Vahedipour fe9b2fecdc notifyInactive has been lying aroung in the agent without being used. relique of the time, when we thought, that we would have an pool of agents from which we'd draw, if an agent failed (#6290) 2018-08-31 10:48:39 +02:00
Kaveh Vahedipour 0080498e89 compaction index should not exceed local commit index (#5900) 2018-07-17 15:54:20 +02:00
Kaveh Vahedipour 5b307db85d Better log compaction 2018-07-16 12:09:58 +02:00
Kaveh Vahedipour 7df40fa905 backport agency fixes for replacing agent with total data loss (#5823) 2018-07-11 11:23:48 +02:00
Kaveh Vahedipour 7b40a61b85 fixing issue when disaster recovered agent has new endpoint (#5809) 2018-07-11 11:19:41 +02:00
Matthew Von-Maszewski 0264f3bc9b update gossip loop to be more responsive to other agents (#5390) 2018-05-22 16:30:27 +02:00
Wilfried Goesgens 7d6e580780 Refactoring & code cleanup (#5138) (#5142) 2018-04-24 14:42:23 +02:00
Kaveh Vahedipour f4edcc7ba8 Bug fix/supervision engine starting early on leadership change (#5062)
* supervision must not work as long as agent is still preparing
* leadersince atomic and pushed to end of leader preparation
* More consistent use of integer types.
* Slightly change order of events in Supervision loop.
2018-04-10 15:28:26 +02:00
Kaveh Vahedipour 2e2d947c1c devel: fixed the missed changes to plan after agency callback is registred f… (#4775)
* fixed the missed changes to plan after agency callback is registred for create collection
* Force check in timeout case.
* Sort out RestAgencyHandler behaviour for inquire.
* Take "ongoing" stuff out of AgencyComm.
2018-03-14 12:01:17 +01:00
Kaveh Vahedipour 42f543fd10 constituent correctly persisiting _votedFor and _term (#4248) 2018-01-16 09:47:25 +01:00
Matthew Von-Maszewski ae77ff80c2 create independent executeLockedRead and executeLockedWrite to speed performance (#4177) 2017-12-29 12:02:27 +01:00
Max Neunhöffer 927027695d
Sort out locking agency to separate reads and writes. (#4174)
* disentagle writes and reads in agency
* renamed _oLock to _outputLock.  Documented read and write rules for _readDB and _commitIndex using _outputLock and _waitForCV.  Adjusted code to match rules.
* update executeLocked() knowing some callers use _readDB via readDB().  readDB() currently read only, but using write locks due to absolutely safe.
* Lay out clear rules against deadlock in agency.
* Avoid unprotected access to _commitIndex.
2017-12-28 11:27:20 +01:00
Max Neunhöffer 7bae6980e8
Bug fix/agent lead hanger (#4147)
* Really enforce the hidden option --server.maximal-threads if given.
* Switch off --log.force-direct in scripts/startStandAloneAgency.sh
* Lower the timeout for sending AppendEntriesRPC to 150s.
* Erase _earliestPackage when becoming a leader.
* Challenge leadership in agent main loop.
* Use steady_clock for _earliestPackage.
* Change _lastAcked and _leaderSince to steady_clock as well.
* time difference calculations based on old readSystemClock to steadyClockToDouble
* All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps
* Inception system_clock to steady_clock
2017-12-27 16:45:39 +01:00
Jan 282be208cc
remove TRI_usleep and TRI_sleep, and use std::this_thread::sleep_for … (#3817) 2017-12-06 18:43:49 +01:00
Kaveh Vahedipour 2beaef41ff Bug fix/agencycomm validate methods broken (#3784) 2017-11-24 10:31:07 +01:00
Max Neunhöffer 766ab7c8cf
Fix agency shutdown bug. (#3683)
* Fix agency shutdown bug.
* Remove precondition that was not needed in AgencyComm::removeValues.
* Fail fatally if threads do not shut down.
2017-11-14 16:33:46 +01:00
Kaveh Vahedipour 7e816db51e Bug fix/agency restart enhancements (#3619)
* Removed unused active(...) method in Agent
* Inception's restart from persistence allows peer with empty active RAFT list to join
* Agency's UUID is persisted outside of the database comparable to coordinator and db server action.
* Publicized Methods to UUID stuff in ServerState
* Inception method documentation
* added --agency.disaster-recovery-id to allow for specification of known former agency id. this is a very dangerous option potentially.
* Delete a unused methods.
* separate _id and _recoveryId
* populating active list with entire pool
* Improve logging.
* reject gossip from unknown agent, if pool is complete
2017-11-10 23:40:26 +01:00
Max Neunhöffer 3c0ee6908b Bug fix/lead to agent (#3541) 2017-11-09 11:10:09 +01:00
Max Neunhöffer ee96c37237 Fix agency restart problems. (#3493)
* Fix agency restart problems (port from a 3.2 fix).

* Further fixes after Craneware rescue.
2017-10-25 18:05:58 +02:00
Kaveh Vahedipour 46333a762f Bug fix/agency restart after compaction and holes in log (#3413)
* State fixes holes in RAFT index range
* Avoid application of entries older than compaction index _cur and guard for unsigned overflow
2017-10-13 16:01:41 +02:00
Max Neunhöffer d86f27bd19 Bug fix/agency leader timeouts (#3373)
* Send out empty heartbeats regardless of non-empty AppendEntriesRPC.
* Also improve logging:
  Note if a log in the empty heartbeat sending takes > 0.01 s.
  Clearly mark places where a leader resigns in logging.
  Log if no empty heartbeat is sent out.
* Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses.
* Add debug logging for _lastAcked and challengeLeadership.
* Remove some unused code. Do not count ourselves in challengeLeadership.
* Removal of entire activation/deactivation mechanisms in agency
* TRI_microtime up to c++11
* added term to response to sendAppendEntries.
2017-10-06 10:11:51 +02:00
Max Neunhoeffer af3f977997
Revert "Send out empty heartbeats regardless of non-empty AppendEntriesRPC."
This reverts commit e974501446.
2017-10-02 15:02:15 +02:00
Max Neunhoeffer 2852f80b5a
Revert "Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses."
This reverts commit 45d37edfb2.
2017-10-02 15:02:06 +02:00