Commit Graph

1387 Commits

Author SHA1 Message Date
Lars Maier c215e30299 Precs to check if collection exists (#9278)
* Adding preconditions for jobs to check that the collection still exists.

* Make it compile.

* Fixed tests.
2019-07-01 13:24:53 +02:00
Lars Maier eb1aa6e024 Response compression (#9300)
* First draft of response compression.

* Cleanup.

* Removed compression from /_api/version.

* Added ruby test for response compression.
2019-06-28 10:02:48 +02:00
Max Neunhöffer d6d362bd3b Fix agency election lock step bug. (#9351)
* Fix agency election lockstep bug.

Reset the base point for the random election timeout to now whenever we have
cast a vote, be it for us or for some other server.

* CHANGELOG.

* Fix compilation.
2019-06-27 22:06:26 +02:00
Lars Maier 50ce41e062 Release _to server when abort because of dropped collection or `something serious went wrong`. (#9267) 2019-06-17 15:04:44 +02:00
Jan Christoph Uhde 3f603f024f remove some containers from common.h (#9223)
* remove some containers from Common.h

* enterprise fixes
2019-06-07 13:27:24 +02:00
Kaveh Vahedipour 8a511934df [devel] fix ttl handling for object assignments (#9207)
* fix ttl handling for object assignments
* completed test handling
* CHANGELOG.
2019-06-06 15:48:40 +02:00
Lars Maier 1e94ecf414 Bug fix/supervision fixes4 (#9016)
* Try to fix agency problems with snapshots.

* Abort MoveShards jobs that have the failed server as fromServer.

* Report aborts.

* CHANGELOG.
2019-05-31 17:20:06 +02:00
Kaveh Vahedipour 773f3c8422 [devel] fix state clientlookuptable (#9066) 2019-05-30 04:24:46 +02:00
Jan 59b67cad40 fix various small annoyances (#9079) 2019-05-23 17:36:38 +02:00
Simon 394c070a4f Do not check isEnabled (#8997) 2019-05-17 16:33:17 +02:00
Wilfried Goesgens 1907a7211b Bug fix/cleanup system includes (#8962) 2019-05-15 15:12:59 +02:00
Andrey Abramov 8358b978be disable arangosearch analyzers in agency (#8989)
* disable arangosearch analyzers

* do nothing in `stop()` if feature is disabled
2019-05-14 17:37:38 +02:00
Matthew Von-Maszewski fb157b89ac Add protective barrier to sendWithFailover() in case MANAGER is nullptr (#8998) 2019-05-14 17:35:55 +02:00
Lars Maier d9ff6647d8 [devel] Do not skip the check if there is only a single agent. (#8943)
* Do not skip the check if there is only a single agent.
* Updated changelog.
2019-05-14 15:42:43 +02:00
Lars Maier 49c568e674 Added reason to job abort method. (#8877) 2019-05-14 15:39:53 +02:00
Jan c00442e31c fix some issues found by cppcheck (#8959) 2019-05-10 16:31:22 +02:00
Jan 976dc2b726 Bug fix/issues 2019 05 06 (#8913) 2019-05-07 12:17:16 +02:00
Lars Maier c99e8e8973 [devel] ClientID Agency Transaction (#8652)
* Changed clientId to format <serverid>:<uuid>.
* Changed behavior if id is not known.
2019-04-30 10:39:23 +02:00
Jan 449ab1ed8e Bug fix/cppcheck 13042019 (#8752) 2019-04-15 10:13:56 +02:00
Simon 937d743ba6 Bug fix/pregel stuff (#8733) 2019-04-11 15:58:28 +02:00
Max Neunhöffer 80bfb85695 Port agency performance tuning for many shards to devel. (#8647)
* Port agency performance tuning for many shards to devel.
* Add more IDs to LOG_TOPIC calls.
* Even more IDs for LOG_TOPIC.
* Fix a duplicate LOG_TOPIC ID.
* Fix an old merging bug in devel.
* Don't hesitate between phases one and two for small clusters.
2019-04-11 11:14:56 +02:00
Jan c6d3f8e052 Bug fix/pass on error messages (#8690) 2019-04-10 12:34:25 +02:00
Jan 6b9b9b0946 Bug fix/fix test muell (#8703) 2019-04-09 11:27:53 +02:00
Simon 7cd84a785a Remove Obsolete code (#8657) 2019-04-03 13:40:44 +02:00
Max Neunhöffer 02281d3be4 Handle InitDone correctly. (#8552)
* precondition plan / version in compaction / store TTL removal independent of local _ttl set
* Agency init loops break when shutting down.
* assertion failures in store on restarting following agents
* Minor porting fixes from 3.4
2019-04-01 17:01:05 +02:00
Jan d6d3e3daa4 initialize some member variables, added TODOs (#8545) 2019-03-26 12:57:32 +01:00
Jan 80a6e621ee don't allocate memory so often in ClusterComm requests (#8550) 2019-03-26 00:31:56 +01:00
Jan Christoph Uhde c3f7961b88 apply unique log ids (#8561) 2019-03-25 20:26:51 +01:00
Max Neunhöffer 55706e3c74 Make addfollower jobs less aggressive. (#8490)
* Make addfollower jobs less aggressive.
* CHANGELOG.
2019-03-21 15:24:31 +01:00
Lars Maier 4d49285754 [devel]agency appends entries with leaders timestamp (#8478)
* agency's append entries with leader's timestamp
* compatibility with appendEntries protocol without timestamps
* Updated changelog.
2019-03-21 09:52:58 +01:00
Kaveh Vahedipour 5038dfe685 supervision must not copy snapshots into jobs (#8425)
* supervision must not copy snapshots into jobs
* CHANGELOG.
2019-03-20 17:07:54 +01:00
Kaveh Vahedipour 237e079614 leader check needs to sit inside waitfor loop (#8445)
* leader check needs to sit inside waitfor loop
* Do not wait in Supervision for commits of new writes.
* CHANGELOG.
2019-03-20 16:34:54 +01:00
Simon 49cc3bcd1e Refactorings from cluster trx improvement branch (#8391) 2019-03-14 23:13:17 +01:00
Kaveh Vahedipour fa98e94d23 Supervision must not waitfor if no longer leading (#8403)
* Supervision must not waitfor if no longer leading

* Supervision must not waitfor if no longer leading
2019-03-13 13:18:10 +01:00
Max Neunhöffer 2a4f606df2 Various agency improvements. (#8380)
* Ignore satellite collections in shrinkCluster in agency.
* Abort RemoveFollower job if not enough in-sync followers or leader failure.
* Break quick wait loop in supervision if leadership is lost.
* In case of resigned leader, set isReady=false in clusterInventory.
* Fix catch tests.
2019-03-12 15:25:16 +01:00
Max Neunhöffer 30adf5e2d9 Fix an agency crash. (#8381)
* Check if transaction failed before accessing the result.
* FailedFollower has the same bug.
* Add CHANGELOG.
2019-03-12 15:24:37 +01:00
Kaveh Vahedipour 098d2d086c fix compaction behaviour of followers (#8348) 2019-03-08 10:39:49 +01:00
Kaveh Vahedipour ee751e8ba3 [devel] clear compilation warnings (#8345) 2019-03-08 10:35:09 +01:00
Kaveh Vahedipour 4b464aeb97 oversight (#8324)
* oversight of an abort
* fix waitFor trap in supervision
2019-03-05 23:31:18 +01:00
Kaveh Vahedipour 68178ba165 [devel] supervision bug fix backports (#8314)
* back ports for supervision fixes from 3.4 part 1

* back ports for supervision fixes from 3.4 part 2
2019-03-04 19:27:24 +01:00
Kaveh Vahedipour 22639f53f1 Failed servers now transactionally adds along the leader/follower jobs (#8096)
* failed servers now transactionally adds along the leader/follower jobs
* one pending still left
* typos
2019-02-05 13:56:27 +01:00
Manuel Pöter ecf4d9d62a Fix race conditions in thread management. (#8032) 2019-01-28 15:44:46 +01:00
Tobias Gödderz a1d3bc3e94 Foxx queue jobs hanging after Foxxmaster crash (#7922)
* Fixed bug where the Foxxmaster doesn't reset jobs after a crash when it should, or a non-master coordinator removes jobs in progress during startup

* Added a regression test

* Updated CHANGELOG

* Fixed non-maintainer compile
2019-01-14 16:08:08 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
Simon a2a0b03f43 Rdb index background (preliminary) (#7644) 2018-12-21 19:24:10 +01:00
Kaveh Vahedipour 2e680a7f9b Agency not starting in log level trace (#7684) 2018-12-10 15:30:57 +01:00
Jan 5bae3742e5 Feature/internal 3306 (#7683) 2018-12-06 16:19:28 +01:00
Lars Maier dd07d74d69 [devel] Bug fix/bad leader report current (#7585)
* Bug fix 3.4/bad leader report current (#7574)
* Initialize theLeader non-empty, thus not assuming leadership.
* Correct ClusterInfo to look into Target/CleanedServers.
* Prevent usage of to be cleaned out servers in new collections.
* After a restart, do not assume to be leader for a shard.
* Do nothing in phaseTwo if leader has not been touched. (#7579)
* Drop follower if it refuses to cooperate.

This is important since a dbserver that is follower for a shard will
after a reboot think that it is a leader, at least for a short amount
of time. If it came back quickly enough, the leader might not have
noticed that it was away.
2018-12-03 10:20:30 +01:00
Lars Maier 908df47cd7 [devel] Bug fix/cluster health ui timestamp (#7562) 2018-11-30 16:26:21 +01:00
Lars Maier 52cff7ad55 Feature/engine version added to agent configuration (#7481) (#7524)
* agents' is obtained from leader's configuration
* corrections in Supervision for advertised endpoints
* change log
* Updated Documentation for cluster/health.
* Unified naming convention.
* Fixed missing update of volatile fields.
* Set version in right order.
* Removed debug output.
* Fixed jslint - missing ;
2018-11-29 14:25:40 +01:00
Lars Maier f3ade0f860 Version/Engine Cluster Health (#7474)
* Export Version and Engine in Cluster Health. Additionally export `versionString` in registered Servers.

* Updated Changelog.
2018-11-27 14:56:00 +01:00
Max Neunhöffer d72e51ed8f Fix move leader shard. (#7445)
* Ungreylist move shard test.
* Move leader shard: wait until all but the old leader are in sync.
* Increase moveShard timeout to 10000 seconds.
* Add CHANGELOG entry.
* Fix compilation.
* Fix a misleading comment.
2018-11-26 15:04:04 +01:00
Kaveh Vahedipour 9ec6619b84 Bug fix/index readiness (#6541)
* indexes are marked  while still missing in Current
* index handling getCollection
* supervision gets indexes from isbuilding, when coordinator is gone before finishing
* seems right now
* fixed broken views
* remove junk comments
* cleanup
* node / supervision adjustments
* supervision fixes
* neunhoef remarks part i
* neunhoef remarks part ii
* neunhoef remarks part ii
* neunhoef remarks part iiI
* collection's current version please
* no need to wait for current once again
* no longer necessary code
* clear comments
* delete left overs
* dead code revived
2018-11-21 14:42:58 +01:00
Max Neunhöffer f720703c38 Supervision bug fix to start with clean transient store. (#7325)
* Supervision bug fix to start with clean transient store.

* Add CHANGELOG entry.
2018-11-15 11:24:34 +01:00
Markus Pfeiffer 39bdebf851 Port bug-fix-3.4/timeout-create-coll to devel (#7307)
* Fix loophole in error handling.
* Fix inquiry case of id not found: 404.
* Also handle correctly in AgencyComm.
* Fix agency tests.
* Fix error handling in dropCollectionOnCoordinator.
2018-11-14 10:03:55 +01:00
Jan 7306cdaa03 try not to throw so many exceptions from Supervision (#7227) 2018-11-07 15:36:41 +01:00
Simon c72818a9dc Make ensureIndexOnCoordinator more robust (#7110) 2018-10-29 17:45:46 +01:00
Simon 10dc287eb3 Silence Tsan warnings (#7075) 2018-10-25 15:50:39 +02:00
Heiko a13f68bc5b Bug fix/agency loop wrong credentials (#7039)
* arangod now exits when used wrong credentials during the startup process

* CHANGELOG
2018-10-25 14:15:50 +02:00
Simon d23aaa2198 Better agency pool update (#7040) 2018-10-24 16:23:21 +02:00
Simon 8b7a4099b8 Properly compare velocypack objects in Agency operations (#6921)
* Properly compare velocypack objects in Agency operations

* Add changelog

* added option for VPackDumper
2018-10-17 20:03:53 +02:00
jsteemann 5f951840a9 fix compilation 2018-10-12 17:56:55 +02:00
Kaveh Vahedipour d524ba616b fixed hyperventing agent (#6776)
* fixed hyperventing agent
2018-10-12 17:03:08 +02:00
Max Neunhöffer 2452dcc5d0 Remove a relic from early days in /Target/FailedServers. (#6690)
* Remove a relic from early days in /Target/FailedServers.
* Fix a test.
2018-10-09 13:52:32 +02:00
Jan e78d1aa541 Bug fix/even more ldap debugging (#6736) 2018-10-08 09:42:11 +02:00
Lars Maier 6546b908be Bug fix/cleanup lost collection inc plan v (#6720)
* Increase the current version rather than the plan version.
2018-10-04 15:38:41 +02:00
jsteemann b067d738e5 fixed indentation a bit 2018-10-03 13:25:32 +02:00
Simon 5837291495 Debug logs for ActiveFailover (#6684) 2018-10-02 15:10:50 +02:00
Jan c06f2d77da Feature/velocypack update (#6678) 2018-10-02 14:04:14 +02:00
Max Neunhöffer a549dd9264 Increase Plan/Version if follower is removed in MoveShard. (#6669)
This was forgotten when we added the `remainsFollower` flag.
2018-10-01 16:55:04 +02:00
Lars Maier 14d1487710 Catch all exceptions to prevent maintenance workers from crashing. (#6645)
* Catch all exceptions to prevent maintenance workers from crashing.
* Please don't free this.
* Unified code paths.
* Remove dub comment.
* Removed debug output.
* Deleted unneeded constructors.
* Assignment operator deleted.
2018-09-28 17:10:44 +02:00
Max Neunhöffer 2fc368028b Fix a crash found by the agency torturer. (#6589) 2018-09-28 15:15:26 +02:00
Kaveh Vahedipour a73023e512 Bug fix/agency update endpoints (#6519)
* update endpoints in agency done the RAFT way
* fix mock interface
* tests functioning with new agent interface
* handling non-leader
2018-09-28 15:14:48 +02:00
Lars Maier 3dbb0558f3 Clean lost collections in supervision (#6592)
* Working draft: clean lost collections in supervision.
* Added early exit as in spec.
* Finished test. Fixed logging.
2018-09-26 16:54:29 +02:00
Simon 0a9afccde5 Fix crash on Agency / DBserver with user JWT tokens (#6594) 2018-09-26 14:26:35 +02:00
Simon b16af5ac71 Fix superfluous QueryRegistry::close, cleanup (#6579) 2018-09-24 13:10:07 +02:00
Simon 912f109968 Add simple Future library (#6464) 2018-09-21 16:14:17 +02:00
Lars Maier 5929cafaf9 cleanoutServer Bug Fix (#6537)
* Fixing bug: cleanoutServer will no longer add old leader as follower.

* Fixed rollback.
2018-09-21 10:16:14 +02:00
Simon aa21ffdb7a Properly check syncer errors, catch more exceptions (#6520) 2018-09-17 16:39:23 +02:00
Dan Larkin-York 0dfabd8f04 Fix several TSan warnings (#6473) 2018-09-14 11:16:45 +02:00
Max Neunhöffer 84735955ea Add advertised endpoints. (#6104) 2018-09-13 16:30:55 +02:00
Simon 22b9c31c13 Removing ClusterComm ClientTransactionID (#6294) 2018-09-12 22:15:16 +02:00
Kaveh Vahedipour 6b2733625c Feature/static const strings cleanup (#6352)
* AgentConfiguration cleanup
* static strings in maintenance / agency
* more strings unified
* fix windows build
2018-09-11 13:40:03 +02:00
Jan 17ea2d4ec9 suppress some messages which are expected on shutdown (#6381) 2018-09-05 14:15:35 +02:00
Vasiliy 5329f34771 issue 465.2.2: remove redundant heap allocations and simplify API (#6349)
* issue 465.2.2: remove redundant heap allocations and simplify API

* address merge issue

* address more merge issues

* address more merge issues

* address review comments

* do not deallocate non-allocated instances
2018-09-05 13:37:37 +03:00
Vasiliy e862efdc3b issue 458.4: retrieve the system database via the SystemDatabaseFeature (#6299) 2018-08-31 19:45:10 +02:00
Jan 5873f63a72 Bug fix/fixes 2908 (#6279) 2018-08-31 17:26:54 +02:00
Lars Maier 63d9cfa081 Maintenance Fixes (#6284)
* Clean up for `FIXMEMAINTENANCE` comments: removed race condition, added errors and `notify()`s.
* Removed duplicated code.
* Added requested changes. Added error reporting for `UpdateCollection`.
* Make it compile. Add missing `notify()`.
* `CreateCollection` generates errors in all code paths.
* Fixed catch test.
2018-08-31 15:24:29 +02:00
Kaveh Vahedipour fe9b2fecdc notifyInactive has been lying around in the agent without being used. Relic of the time when we thought that we would have a pool of agents from which we'd draw if an agent failed (#6290) 2018-08-31 10:48:39 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
 - Fix index locking bug in mmfiles
 - Fix a bug in mmfiles with silent option and repsert
 - Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00
Simon 229c09d434 Allow dirty-reads from passive (#6136) 2018-08-20 16:26:14 +02:00
Matthew Von-Maszewski 86ea784372 bugfix: establish unique function name & implementation for communication retry status (#6150)
* initial checkin of isRetryOK().  Includes fixes to known code that has previously hung shutdowns by performing infinite retries.

* slight help on getting out of a loop faster during shutdown.  not essential.
2018-08-17 14:57:12 +02:00
Vasiliy 6fd541d110 issue 427.5: use ApplicationServer reference instead of pointer (#6145)
* issue 427.5: use ApplicationServer reference instead of pointer

* address MSVC build failure
2018-08-15 12:16:02 +03:00
Jan a5bb50b0bf remove methods from VelocyPackHelper that are also in VPackSlice (#5946) 2018-07-25 09:01:29 +02:00
Jan ac1d5aac9b allow starting agency with --console again (requires V8 then) (#5927) 2018-07-24 09:34:22 +02:00
Max Neunhoeffer 1c4beb4c34 Keep failed follower in followers list in Plan. 2018-07-23 11:25:10 +02:00
Kaveh Vahedipour 0080498e89 compaction index should not exceed local commit index (#5900) 2018-07-17 15:54:20 +02:00
Jan 006995a6a5 Bug fix/dont start v8 for agency (#5891)
* disable V8 for agency setups

* add missing section declaration (fixes unrelated Windows bug)
2018-07-17 11:24:53 +02:00
jsteemann a0e9865181 typos 2018-07-16 20:49:22 +02:00
jsteemann 44c7b1b476 remove tabstops 2018-07-16 15:00:12 +02:00