Commit Graph

1387 Commits

Author SHA1 Message Date
Lars Maier f3ade0f860 Version/Engine Cluster Health (#7474)
* Export Version and Engine in Cluster Health. Additionally export `versionString` in registered Servers.

* Updated Changelog.
2018-11-27 14:56:00 +01:00
Max Neunhöffer d72e51ed8f
Fix move leader shard. (#7445)
* Ungreylist move shard test.
* Move leader shard: wait until all but the old leader are in sync.
* Increase moveShard timeout to 10000 seconds.
* Add CHANGELOG entry.
* Fix compilation.
* Fix a misleading comment.
2018-11-26 15:04:04 +01:00
Kaveh Vahedipour 9ec6619b84 Bug fix/index readiness (#6541)
* indexes are marked  while still missing in Current
* index handling getCollection
* supervision gets indexes from isbuilding, when coordinator is gone before finishing
* seems right now
* fixed broken views
* remove junk comments
* cleanup
* node / supervision adjustments
* supervision fixes
* neunhoef remarks part i
* neunhoef remarks part ii
* neunhoef remarks part ii
* neunhoef remarks part iiI
* collection's current version please
* no need to wait for current once again
* no longer necessary code
* clear comments
* delete left overs
* dead code revived
2018-11-21 14:42:58 +01:00
Max Neunhöffer f720703c38
Supervision bug fix to start with clean transient store. (#7325)
* Supervision bug fix to start with clean transient store.

* Add CHANGELOG entry.
2018-11-15 11:24:34 +01:00
Markus Pfeiffer 39bdebf851 Port bug-fix-3.4/timeout-create-coll to devel (#7307)
* Fix loophole in error handling.
* Fix inquiry case of id not found: 404.
* Also handle correctly in AgencyComm.
* Fix agency tests.
* Fix error handling in dropCollectionOnCoordinator.
2018-11-14 10:03:55 +01:00
Jan 7306cdaa03
try not to throw so many exceptions from Supervision (#7227) 2018-11-07 15:36:41 +01:00
Simon c72818a9dc Make ensureIndexOnCoordinator more robust (#7110) 2018-10-29 17:45:46 +01:00
Simon 10dc287eb3 Silence Tsan warnings (#7075) 2018-10-25 15:50:39 +02:00
Heiko a13f68bc5b Bug fix/agency loop wrong credentials (#7039)
* arangod now exits when used wrong credentials during the startup process

* CHANGELOG
2018-10-25 14:15:50 +02:00
Simon d23aaa2198 Better agency pool update (#7040) 2018-10-24 16:23:21 +02:00
Simon 8b7a4099b8 Properly compare velocypack objects in Agency operations (#6921)
* Properly compare velocypack objects in Agency operations

* Add changelog

* added option for VPackDumper
2018-10-17 20:03:53 +02:00
jsteemann 5f951840a9 fix compilation 2018-10-12 17:56:55 +02:00
Kaveh Vahedipour d524ba616b fixed hyperventing agent (#6776)
* fixed hyperventing agent
2018-10-12 17:03:08 +02:00
Max Neunhöffer 2452dcc5d0
Remove a relic from early days in /Target/FailedServers. (#6690)
* Remove a relic from early days in /Target/FailedServers.
* Fix a test.
2018-10-09 13:52:32 +02:00
Jan e78d1aa541
Bug fix/even more ldap debugging (#6736) 2018-10-08 09:42:11 +02:00
Lars Maier 6546b908be Bug fix/cleanup lost collection inc plan v (#6720)
* Increase the current version rather than the plan version.
2018-10-04 15:38:41 +02:00
jsteemann b067d738e5 fixed indentation a bit 2018-10-03 13:25:32 +02:00
Simon 5837291495 Debug logs for ActiveFailover (#6684) 2018-10-02 15:10:50 +02:00
Jan c06f2d77da
Feature/velocypack update (#6678) 2018-10-02 14:04:14 +02:00
Max Neunhöffer a549dd9264
Increase Plan/Version if follower is removed in MoveShard. (#6669)
This was forgotten when we added the `remainsFollower` flag.
2018-10-01 16:55:04 +02:00
Lars Maier 14d1487710 Catch all exceptions to prevent maintenance workers from crashing. (#6645)
* Catch all exceptions to prevent maintenance workers from crashing.
* Please don't free this.
* Unified code paths.
* Remove dub comment.
* Removed debug output.
* Deleted unneeded constructors.
* Assignment operator deleted.
2018-09-28 17:10:44 +02:00
Max Neunhöffer 2fc368028b
Fix a crash found by the agency torturer. (#6589) 2018-09-28 15:15:26 +02:00
Kaveh Vahedipour a73023e512 Bug fix/agency update endpoints (#6519)
* update endpoints in agency done the RAFT way
* fix mock interface
* tests functioning with new agent interface
* handling non-leader
2018-09-28 15:14:48 +02:00
Lars Maier 3dbb0558f3 Clean lost collections in supervision (#6592)
* Working draft: clean lost collections in supervision.
* Added early exit as in spec.
* Finished test. Fixed logging.
2018-09-26 16:54:29 +02:00
Simon 0a9afccde5 Fix crash on Agency / DBserver with user JWT tokens (#6594) 2018-09-26 14:26:35 +02:00
Simon b16af5ac71 Fix superfluous QueryRegistry::close, cleanup (#6579) 2018-09-24 13:10:07 +02:00
Simon 912f109968 Add simple Future library (#6464) 2018-09-21 16:14:17 +02:00
Lars Maier 5929cafaf9 cleanoutServer Bug Fix (#6537)
* Fixing bug: cleanoutServer will no longer add old leader as follower.

* Fixed rollback.
2018-09-21 10:16:14 +02:00
Simon aa21ffdb7a Properly check syncer errors, catch more exceptions (#6520) 2018-09-17 16:39:23 +02:00
Dan Larkin-York 0dfabd8f04 Fix several TSan warnings (#6473) 2018-09-14 11:16:45 +02:00
Max Neunhöffer 84735955ea Add advertised endpoints. (#6104) 2018-09-13 16:30:55 +02:00
Simon 22b9c31c13 Removing ClusterComm ClientTransactionID (#6294) 2018-09-12 22:15:16 +02:00
Kaveh Vahedipour 6b2733625c Feature/static const strings cleanup (#6352)
* AgentConfiguration cleanup
* static strings in maintenance / agency
* more strings unified
* fix windows build
2018-09-11 13:40:03 +02:00
Jan 17ea2d4ec9
suppress some messages which are expected on shutdown (#6381) 2018-09-05 14:15:35 +02:00
Vasiliy 5329f34771 issue 465.2.2: remove redundant heap allocations and simplify API (#6349)
* issue 465.2.2: remove redundant heap allocations and simplify API

* address merge issue

* address more merge issues

* address more merge issues

* address review comments

* do not deallocate non-allocated instances
2018-09-05 13:37:37 +03:00
Vasiliy e862efdc3b issue 458.4: retrieve the system database via the SystemDatabaseFeature (#6299) 2018-08-31 19:45:10 +02:00
Jan 5873f63a72
Bug fix/fixes 2908 (#6279) 2018-08-31 17:26:54 +02:00
Lars Maier 63d9cfa081 Maintenance Fixes (#6284)
* Clean up for `FIXMEMAINTENANCE` comments: removed race condition, added errors and `notify()`s.
* Removed duplicated code.
* Added requested changes. Added error reporting for `UpdateCollection`.
* Make it compile. Add missing `notify()`.
* `CreateCollection` generates errors in all code paths.
* Fixed catch test.
2018-08-31 15:24:29 +02:00
Kaveh Vahedipour fe9b2fecdc notifyInactive has been lying around in the agent without being used. A relic of the time when we thought that we would have a pool of agents from which we'd draw if an agent failed (#6290) 2018-08-31 10:48:39 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
 - Fix index locking bug in mmfiles
 - Fix a bug in mmfiles with silent option and repsert
 - Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00
Simon 229c09d434 Allow dirty-reads from passive (#6136) 2018-08-20 16:26:14 +02:00
Matthew Von-Maszewski 86ea784372 bugfix: establish unique function name & implementation for communication retry status (#6150)
* initial checkin of isRetryOK().  Includes fixes to known code that has previously hung shutdowns by performing infinite retries.

* slight help on getting out of a loop faster during shutdown.  not essential.
2018-08-17 14:57:12 +02:00
Vasiliy 6fd541d110 issue 427.5: use ApplicationServer reference instead of pointer (#6145)
* issue 427.5: use ApplicationServer reference instead of pointer

* address MSVC build failure
2018-08-15 12:16:02 +03:00
Jan a5bb50b0bf
remove methods from VelocyPackHelper that are also in VPackSlice (#5946) 2018-07-25 09:01:29 +02:00
Jan ac1d5aac9b
allow starting agency with --console again (requires V8 then) (#5927) 2018-07-24 09:34:22 +02:00
Max Neunhoeffer 1c4beb4c34 Keep failed follower in followers list in Plan. 2018-07-23 11:25:10 +02:00
Kaveh Vahedipour 0080498e89 compaction index should not exceed local commit index (#5900) 2018-07-17 15:54:20 +02:00
Jan 006995a6a5
Bug fix/dont start v8 for agency (#5891)
* disable V8 for agency setups

* add missing section declaration (fixes unrelated Windows bug)
2018-07-17 11:24:53 +02:00
jsteemann a0e9865181 typos 2018-07-16 20:49:22 +02:00
jsteemann 44c7b1b476 remove tabstops 2018-07-16 15:00:12 +02:00
Michael Hackstein 7a95c5e675
Feature/feature phases (#5272)
* Added feature phases

* BasicsPhase and DatabasePhase to the required files. Server now has Feature circles and does not boot. Will be sorted out later on.

* Added ClusterPhase to features

* Added V8Phase to the required features

* Added AQLPhase to the affected features

* Added ServerPhase to Features

* Added FoxxPhase to the relevant features

* Added AgencyPhase to the relevant features

* Moved registration from local variable SYS_SYSTEM_REPLICATION_FACTOR from cluster to V8 as their ordering is now vice versa

* Moved Bootstrap feature into FoxxPhase. It could be moved to ServerPhase easily if the FoxxQueue dependency would be removed

* Final movement of Startup Phases. Now solved all circles.

* Removed merge conflict

* Moved ReplicationTimeout into cluster phase and fixed cross-phase requirements

* Added greetings phase. This phase separates the Basics Phase and is the first to be run. Includes Logger and Hello/Goodbye

* Added the GreetingsPhase in the corresponding features. Now all BasicsPhase features start after greetings Phase. There is some issue in this branch which prevents the Agency from Gossipping right now. Will be fixed next

* Moved creation of the Agent into the prepare phase of the feature. Thereby it is guaranteed that the agent at least exists before the GeneralServer is activating endpoints

* Recovery needs to be started after the ServerID

* Moved log output of FeaturePhases to DEBUG instead of ERROR.

* Added feature phases for clients

* ClusterFeature now does not directly require AgencyFeature any more

* Added requirement of TravEngineRegistryFeature in AQL feature. Otherwise shutdown may be undefined

* The ApplicationServer can now handout the list of ordered features. Used for testing purposes

* Fixed IResearchView Tests Setup to honor new feature ordering

* Fixed IResearchViewDBServer Tests Setup to honor new feature ordering

* Started fixing IResearchView Coordinator tests with startup ordering. Not finished yet

* Added startup phases to ViewCoordinator test

* Disabled expected logoutput in ClusterRepairsTest

* Fixed indentation in test code

* LinkCoordinator now honors startup ordering

* Link meta now honors startup ordering

* Suppress expected cluster logs in ViewTest

* Removed '#' accidentally added.
2018-07-16 14:09:36 +02:00
Kaveh Vahedipour 5b307db85d Better log compaction 2018-07-16 12:09:58 +02:00
Jan 201a6a308b
allow turning off statistics feature (#5883) 2018-07-16 10:46:43 +02:00
Simran 34ec56d421 Feature/misc spelling corrections (#5164) 2018-07-13 13:06:20 +02:00
Kaveh Vahedipour 7df40fa905 backport agency fixes for replacing agent with total data loss (#5823) 2018-07-11 11:23:48 +02:00
Kaveh Vahedipour 7b40a61b85 fixing issue when disaster recovered agent has new endpoint (#5809) 2018-07-11 11:19:41 +02:00
Tobias Gödderz fc3e11dbbc Async AQL (#5806)
* Modified header to new initializeCursor API

* Adapted initializeCursor to DONE/WAITING API. Compiles but not tested and no one reacts to WAITING state, it is not returned anywhere yet

* Subqueries now expect a WAITING return from initialize cursor. However they will just return a nullptr and pretend the query is empty, this will be fixed later

* First attempt to simulate thread waiting over information within the query

* Small fix to allow for isDirect handlers to go to sleep.

* Waiting in the necessary places now for the async request to be sent.

* Thank you auto-casting compiler, you are totally right I absolutely wanted to use this bool value as an index in my Array. How could I possibly not want to use it here?

* Include cond-var header

* Fixed mutex/cond_var usage

* Added oldAPI wrappers in AQL Blocks for get/skip some variants. This Commit compiles but is NOT tested

* Let getSome now return unique_ptr of AqlItemsBlocks. Also implemented the async variant of getSome in subqueries.

* Removed all references to OLD implementations in AQL. only the base wrappers are allowed to call OLD functions from now on. Now the testing part starts

* Fixed endless virtual recursion

* Implemented new getOrSkip API in SortBlock

* Implemented new getOrSkip API in LimitBlock

* Initialize all variables

* Fixed logic bug in SubqueryBlock

* getBlock in ExecutionBlock now returns a state. All blocks need to handle this properly!

* Created a wrapper getBlockOld that serves the old sync api and is used now in AQL. To be replaced over time.

* Added IndexBlock::skipSome and IndexBlock::getSome

* getBlock now returns its old return value along with the state

* Switch from getBlockOld to getBlock in IndexBlock::skipSome

* Switch from getBlockOld to getBlock in IndexBlock::getSome

* ShortestPathBlock::skipSome is not implemented! Added a regression test

* Attempt to fix SubQueryResult memory management

* Fixed LIMIT Block

* Moved from ShortestPathBlock::getSomeOld to ::getSome

* Implemented ASYNC api on SingletonBlock

* Adapted EnumerateCollectionBlock to new async API

* Fixed FilterBlock and adapted return block to async API

* Adapted NORESULTS block to async AQL api.

* Adapted Modification Blocks to async API

* Fixed some initialize cursor functions to reset values required during get/skipSome

* First steps to adapt ClusterNodes to Async AQL api. Not there yet, need to implement the core still

* Added async implementation for xxxForShard in ClusterBlocks. This commit changes internal logic of _doneForShard. Needs additional testing as soon as everything is in place.

* Adapted CalculationBlock to async API

* Adapted TraversalBlocks to ASYNC Aql. This is not optimal yet, we need a better decision if we are DONE or not on RETURN

* Adapted EnumerateListBlock to Async AQL api

* Adapted RemoteBlock to ASYNC API in getSome/skipSome. The whole thing is now LIVE in the cluster. Extensive testing to be started now

* Fixed IndexBlock WAITING behaviour if Waiting occurs during a index processing

* Adapted IResearchViewBlock to ASYNC AQL API

* Fixed SortingGatherBlock in WAITING state.

* Adapted IResearch ExecutionBlockMock to Async API

* Unified the HASMORE/DONE distinction. Code is much more readable now and harder to get incorrect 👍

* Implemented only theoretically reachable function of non void function.

* Fixed last commit

* Added inline TODO comments

* fix warning

* Fixed a clearing logic bug in RemoveNodes

* Fixed Error Handling in RemoteBlocks. Also fixed a logic bug (true/false simply has a 50% chance of getting it wrong) in Distribute and Scatter.

* remove unused methods

* Fixed failure test

* implement skipping

* Moved the Query Waiting out of the ExecutionEngine.

* changed one of the collect blocks

* Removed _upstreamState from ExecutionBlockMock, that is in the base-class now

* Added a Test Mock for an ExecutionBlock that simulates the WAITING/HASMORE/DONE api.

* do not check "hasMore" if not necessary

* Added DistinctCollectBlock::getOrSkipSome from ~Old and changed its return type

(still uses getBlockOld)

* Save state to resume in DistinctCollectBlock::getOrSkipSome

* Extracted redundant code

* fixed some ops

* added one more test

* fix endless blocking

* fix compile error

* fix test

* Refactored HashedCollectBlock::getOrSkipSome

* Return blocks to the manager

* Replaced usage of getBlockOld in HashedCollectBlock::getOrSkipSome

* remove unused shutdown calls, simplify ownership for expressions

* Removed superfluous variable

* Capture const variable by value

* Removed SortedCollectBlock::getOrSkipSomeOld in favour of getOrSkipSome

* Added a working version of SortedCollectBlock::getOrSkipSome

Has yet to be cleaned up

* Removed isTotalAggregation special treatment

* On no input, return a group of nulls (instead of no group at all)

* Bugfixes

* Simplified code

* Move return to the end, eliminate duplicate code

* Corrected skipped count in HashedCollectBlock

* Aligned getNextRow() implementations

* Added comments

* some cleanup

* fix potential memleak

* Bugfix

* Fixed failure tests

* Removed usage of getBlockOld in ExecutionBlock::getOrSkipSome

* Replaced hasMore with an async implementation (mostly)

* Removed getBlockOld()

* Added hasMoreState to the AQL API (and renamed hasMore methods to hasMoreState)

* RemoteBlock now uses the async hasMoreState route

* remove job queue

* options

* Bugfixes in the async implementation of LimitBlock

* LimitBlock::getOrSkipSome now always skips when calculating the fullcount

* fix compile warnings

* restrict threads

* Fixed api of Waiting ExecBlockMock. Unused yet

* Made SortedGatherBlock async-capable

* Removed nonEmptyIndex hack

* Removed duplicate traceGetSome~ calls, moved all to getSome

* Added asserts before replacing getNr*Registers

* Added a TODO note and a comment

* Removed getSomeWithoutRegisterClearoutOld()

* Removed skip()

* Removed common code by using getNr*Registers()

* Use getNr*Registers() in the TraversalBlock as well

* started to add lane

* started to add lane

* added lane

* completed lane

* removed debug output

* fixed merge

* Began working on a test suite for AQL tracing/profiling

* Added more tests and asserts in aql-profiler

* Made some ExecutionBlocks final

* Added a type enum to all blocks and the per-block stats

* Add block type to stats nodes when tracing AQL on block level

* Removed initializeCursor call from instantiateFromPlan

* Avoided additional getSome calls after DONE

* Added more profiler tests

* Refactored ExecutionBlock::getOrSkipSome and fixed two bugs

- set _upstreamState also when skipping
- explicitly use ExecutionBlock::getHasMoreState()

* Bugfix: update state

* Reuse parent _skipped wherever possible; rename where not (LimitBlock)

* Simplified SortedCollectBlock::getOrSkipSome and reused general pattern & code

* Implemented missing virtual function (with USE_FAILURE)

* Reset necessary values during initializeCursor

* Simplified code in EnumerateListBlock a little

* Added a test for DistinctCollectBlock in aql-profiler

* Avoid redundant getSome calls in DistinctCollectBlock

* fix compilation

* Fixed DistinctCollectBlock profiler test

* Added a second profiler test for the DistinctCollectBlock

* Added a profiler test for EnumerateCollectionBlock

* Bugfix in EnumerateListBlock

* added --server.fifoN-size

* Simplified EnumerateCollectionBlock::getSome

* Simplified EnumerateCollectionBlock::getSome, and return HASMORE less often when DONE

* Fix testEnumerateCollectionBlock1 for mmfiles

* do not pass by reference

* Fixed compile error

* fixed merge conflicts

* Added profiler tests for EnumerateCollectionBlock

* Test fix for mmfiles

* Fixed IResearch tests

* Bugfix in DistinctCollectBlock and a regression test

* Updated comment

* Bugfix for query statistics in cluster

* Check plan in distinct test

* Fix aql-profiler tests in cluster

* Removed unused line / bugfix for single server test runs

* This commit implements waking up of AQL queries. (#5651)

* Non-compiling intermediate commit for handover.

* Make branch compile again

* Started implementation of continueable rest cursor handler by moving the callbacks to the outer part. This is not yet fully tested!

* Made finalizeExecute noexcept. We cannot react to this errors as the response was potentially written before. Also introduced continueExecution in the RestHandler engine.

* First successful query wakeup.

* The wakeup callback now posts on the scheduler directly. A resthandler only needs to provide a callback that encapsulates the continueExecution call on this handler

* renamed finalizeExecute to shutdownExecute

* Added a differentiation between Handler and Callback in Query continuation. Handler will be posted in IO service. Callback will be executed directly

* fix audit log

* Removed callback from deleteQueryCursor. This cannot be waiting

* use CONDITION_LOCKER

* removed yet another thread-local variable

* Fixed forward declaration

* Made RestAqlHandler repeatable

* Use defer to close the query in RestAqlHandler. Now waiting will close the query as well.

* Added a mutex in the RestHandlers to make sure if the callback over network is too fast that there is only one Thread running in the RestHandler

* Captured the GeneralCommTask if it is posted to a RestHandler. This is necessary in the PAUSED case

* Refactoring of _noLockHeader responsibilities. Now the BaseHandler selects them and resets them after it is done. Only Coordinators are allowed to define them if a query is loaded.

* Removed reaction to existing nolockheaders in Coordinator Query Planning Phase

* Removed incorrect assertion.

* Further refactoring of NoLockHeaders. Now there is a wrapper class around it which allows for debugging and logging. The state now seems to be better. Also all non-rest-handler triggered queries clean up the NoLockHeaders properly.

* Fixed UserManager, now deletes nolock headers properly

* Swing to the Symphony of Destruction

* Forgot about community build...

* Fixed compiling of Catch tests

* Fixed community build

* need thread for size

* Made the restSimpleHandler repeatable

* Implemented dump and dumpSync in Cursors, Sync will block a thread, dump allows to wait, only relevant for Streaming cursor

* Reactivated StreamingCursors

* Removed debug output.

* Fixed false query continuation

* Reset thread output to non-debug

* Added missing return statements

* Allow some CollectionMethods to hand-in a context that may contain a transaction. This is meant to honor nolock headers.

* Fixed hidden merge conflict

* Bugfix in aql-profiler.js: use plan.nodes order, not stats

* Added two profiler tests for filter

* Avoid too many getBlock calls in the FilterBlock

* Removed debug output

* RemoteBlock API will now send a done(bool) flag whenever we request documents from remote Servers. It is possible that we are DONE and have a result. The pre 3.4.0 API uses exhausted which is exclusive to a result. This API is still implemented for backwards compatibility.

* Implemented an executeSync function in AqlQuery. This will block the thread until query execution is complete

* Added another test for FILTER, and one test for the HashedCollectBlock

* Added more tests for HashedCollectBlock; avoid unnecessary getSome calls

* Added a profiler IndexBlock test

* IndexBlock: avoid redundant getSome calls, added missing traceGetSomeEnd calls

* Added a second test profiling IndexBlock

* Added a third test for IndexBlock

* Moved general code to module

* Moved noncluster tests into a separate file

* Split aql-profiler testsuite into three files

* Added profiler tests for LimitBlock

* Added a test for NoResultsBlock

* Added profiler tests for TraversalBlock

* Shutdown of an AQL query is now asynchronous. However in Error-Cases it will be executed in a blocking way still

* Optimized TraversalBlock getSome calls due to new (nightly) test results

* Fixed std::min calls I broke

* Let shutdown calls in AQL wait, if the query is executed successfully.

* Fixed queryResult going out of scope

* fix compile error through merge conflict with devel

* Fixed compiler warning "mismatching tags"

* Removed debug log output

* Added TODO notes

* Fixed test fail due to devel merge

* Fixed some invalid sync waiting implementations

* Added a profiler test for SortBlock

* Added profiler tests for SortedCollectBlock

* Fixed bug introduced by devel merge

* Fixed Remoteblocks ignoring errors!

* Added some more continue Callbacks in used places. And removed debug log

* Removed debug log output

* Suppress clang warnings

* Bugfix: use of invalid stack pointer

* Bugfix: RemoteBlock::shutdown now sends code as int, not string

* Revert "Suppress clang warnings"

This reverts commit 05591649c59743c992edd5e78814edc8ca2a83e0.

* Bugfix: cleanup state in RemoteBlock ::shutdown, ::getSome and ::skipSome

* Bugfix in Subquery shutdown: don't skip subquery shutdown when main query shutdown failed

* Allow copy elision
2018-07-09 14:24:10 +02:00
Simon 545561e9a9 Read only server (#5652) 2018-07-03 09:58:16 +02:00
Wilfried Goesgens 3cd1a52dbb add the server state to the 'Flaky agency' error, since this may stop improperly configured servers from starting up (#5433) 2018-06-04 11:34:19 +02:00
Jan 8e6d5df129
fixed several minor compiler complaints (#5406) 2018-05-23 11:50:00 +02:00
Matthew Von-Maszewski 0264f3bc9b update gossip loop to be more responsive to other agents (#5390) 2018-05-22 16:30:27 +02:00
Kaveh Vahedipour 34f66539bd inception ignored leaders configuration (#5387) 2018-05-22 10:14:12 +02:00
Vasiliy 843e584746 issue 389.5: refactor StandaloneContext to be constructed with a TRI_vocbase_t& (#5370)
* issue 389.5: refactor StandaloneContext to be constructed with a TRI_vocbase_t&

* backport: address build issues
2018-05-17 01:15:50 +03:00
Simon 17b1a2aafb Rest middleware refactoring (#5332) 2018-05-14 17:43:10 +02:00
Simon f2b952134f Fixing agency pool update (#5316) 2018-05-14 14:56:19 +02:00
jsteemann 7f8a1cc614 Merge branch 'bug-fix/add-missing-overrides-and-final' of https://github.com/arangodb/arangodb into devel 2018-05-07 23:02:46 +02:00
Tobias Gödderz 8c87f51429 Feature/fix inconsistent distribute shards like job (#4743) 2018-05-07 16:53:08 +02:00
jsteemann 52de92d334 add missing override specifiers, add final specifiers 2018-05-04 09:01:50 +02:00
Jan 30b12e311b
Bug fix/remove most of aql js (#5223) 2018-04-30 11:17:11 +02:00
Simon 468231efc5 AQL Profiling code (#5165)
* initial start of profiling

* adding profiling code

* Fixing remote block tracing, fixing width and units

* Fixing some tests

* Various fixes

* addressing review comments
2018-04-24 16:17:30 +02:00
Wilfried Goesgens 7d6e580780 Refactoring & code cleanup (#5138) (#5142) 2018-04-24 14:42:23 +02:00
Matthew Von-Maszewski a84f7805ad Feature/mv thread death logging (#5111)
* Initial low level interface for thread crash reporting (and management).
* Add a member version of isClusterRole()
* isolate heartbeat thread creation to new StartHeartbeatThread().  create heartbeat thread even if not a cluster or if an agent.
* update runDBServer() and runCoordinator() to shutdown more quickly by polling isStopping() at additional locations.
* copying updates from different branch / PR
* basic thread crash logging.  Not yet tied into Agency arangod or have any specific threads posting crashes
* make Supervision thread a CriticalThread
* sandwich CriticalThread between Thread and other classes to create long term, repeating thread crash reporting.
* restore code lost upon branch update relating to new startHeartbeatThread() function
* add CriticalThread.cpp to build
* add new runAgentServer() function to loop for Agents.  Make Heartbeat thread derive from CriticalThread.
* remove debug line
2018-04-23 15:50:14 +02:00
Vasiliy 012aaa9469 issue 383.4: push vocbase validity check up from Query constructor out into arangodb::consensus::State, StatisticsWorker and AQLUserFunctions calls (#5177) 2018-04-23 14:52:42 +03:00
Simon 45fbed497b Supervision Job for Active Failover (#5066) 2018-04-23 12:49:41 +02:00
Jan 2b84348b77
remove call to requiresElevatedPrivileges with default value (#5172) 2018-04-23 11:28:24 +02:00
Kaveh Vahedipour 3d043b35a3 Feature/supervsion maintenance mode (#5108)
* Supervision goes to Maintenance mode, when /arango/Supervision/Maintenance exists
* coordinator route stands
* stop updates in transient, when supervision off
2018-04-20 13:23:22 +02:00
Kaveh Vahedipour 8bbe256633 Bug fix/node smapping logs (#5126)
* Node was spamming the logs for only bad reasons
* lets not spam the customers
2018-04-17 17:01:38 +02:00
Wilfried Goesgens 9f5323bc53 Bugfix/fix windows warnings (#5117)
* fix windows warning
2018-04-17 13:35:03 +02:00
Matthew Von-Maszewski c0c149cf5b Create non-throwing wrappers for Node access in Agency (#4598)
* safety checkin of Node throw reduction.
* final round of Node throw protection.  Common accessors now protected to force code to hasAsXXX() functions.
2018-04-17 10:21:14 +02:00
Kaveh Vahedipour f4edcc7ba8 Bug fix/supervision engine starting early on leadership change (#5062)
* supervision must not work as long as agent is still preparing
* leadersince atomic and pushed to end of leader preparation
* More consistent use of integer types.
* Slightly change order of events in Supervision loop.
2018-04-10 15:28:26 +02:00
Kaveh Vahedipour 53bc6914c0 Only payload, when in PENDING (#5021)
* Only payload, when in PENDING
* change log entry
2018-04-10 11:59:22 +02:00
Simon 68442dae5a Fixing agency prefix in Agency/Job.cpp (#5039)
* Fixing some test issues and fixing the agency prefix in Agency/Job.cpp
* Making logic consistent in failed-leader / follower job
* reverting condition back to == GOOD
2018-04-09 16:21:24 +02:00
Jan 7cb115a1a9
remove option `--cluster.my-local-info` (#4999) 2018-04-03 17:34:08 +02:00
Max Neunhoeffer 790824fd68
Merge remote-tracking branch 'origin/devel' into feature/arangosearch-cluster-views 2018-03-26 10:50:23 +02:00
Tobias Gödderz 7e53d3ed75 Bugfix / Supervision: removeFollower should remove the last follower(s) first (#4923)
* Added a test asserting the last followers are removed first as required by moveShard
* Remove the last followers first
* Removed unused includes
* Updated CHANGELOG
2018-03-23 09:34:04 +01:00
Max Neunhoeffer d4616a6063
Merge remote-tracking branch 'origin/devel' into feature/arangosearch-cluster-views 2018-03-19 10:08:47 +01:00
Vasiliy 148bdb7158 issue 344.6: remove some redundant functions (#4842) 2018-03-15 11:03:35 +01:00
Max Neunhoeffer 0f46598200
Merge remote-tracking branch 'origin/devel' into feature/arangosearch-cluster-views 2018-03-14 23:24:41 +01:00
Kaveh Vahedipour 2e2d947c1c devel: fixed the missed changes to plan after agency callback is registered f… (#4775)
* fixed the missed changes to plan after agency callback is registered for create collection
* Force check in timeout case.
* Sort out RestAgencyHandler behaviour for inquire.
* Take "ongoing" stuff out of AgencyComm.
2018-03-14 12:01:17 +01:00
Max Neunhoeffer 0a88c94805
Create Plan/Views/_system at Cluster deployment. 2018-03-08 11:08:45 +01:00
Tobias Gödderz 4f6847b1b8 Bug fix/supervision bug distributeshardslike and virtual collections (#4759) 2018-03-07 09:54:39 +01:00
Michael Hackstein 76e7461aa9
Revert "bug fix for jobs looking at distrubuteShardsLike and virtual collections (#4665)" (#4758)
This reverts commit 3c35cd32dd.
2018-03-05 17:48:29 +01:00
Kaveh Vahedipour 3c35cd32dd bug fix for jobs looking at distrubuteShardsLike and virtual collections (#4665) 2018-03-05 17:37:07 +01:00
Jan 5a67a048c5
bump version number for all local DDL changes and tell agency (#4685)
this allows other listeners (e.g. for DC2DC) to get notified when
DDL operations are carried out locally and need to be applied remotely
2018-03-05 17:06:34 +01:00
Simon 345fc3c0b7 Refactor Authentication Layer (devel) (#4592)
* Cherry Picking LDAP changes

* Adding missing merges

* Fixing remaining mentions of FeatureCacheFeature

* Fix jslint

* Fixing some failed tests

* Fixing cluster authentication issue, red tests

* Fixing ldap testsuite, adding trace logging

* Fixing ldap testsuite setup and LDAP recognition

(cherry picked from commit 686d28a779)

* Fixing wrong assert

* Adding changelog entry, making requested changes from code review

* Fixing dump_authentication, fix typos

* improvements found during code review

* oops

* more use of sessionstorage

* fix tests

* Fixing broken handling, disallowing adding of local users when disabled

* Fixing testInvalidGrants

* Removing undefined auth level externally

* Fixing previous commit

* added tests for ldap search mode

* intentionally removed `after` methods from tests

because they are executed before the tests start
no cleanup is performed right now after the authentication tests
however, a cleanup is done at start of every test

* ldap tests all modes

* forward port changes from 3.3

* added generated files

* forward port missing changes for web UI

* added generated files

* added generated files
2018-02-28 13:24:28 +01:00
Matthew Von-Maszewski e566150b2e There is a start-up race condition where collection could be in plan but not current. A server shutdown during this period locks system. (#4478) 2018-02-19 09:14:24 +01:00
Simon 35136a89c0 Fix some problems with active failover (#4540) 2018-02-09 15:11:53 +01:00
Jan b2ceb68205
Feature/small misc optimizations (#4504) 2018-02-08 09:25:07 +01:00
Kaveh Vahedipour 7f9786eb27 builder fixed for agency transaction. worked only for a single server. (#4436) 2018-02-06 23:14:53 +01:00
Kaveh Vahedipour 42f543fd10 constituent correctly persisting _votedFor and _term (#4248) 2018-01-16 09:47:25 +01:00
Jan b2b6c06cbf
Feature/efficiency (#3736) 2018-01-05 16:51:31 +01:00
Kaveh Vahedipour 7715c75c59 let's not miss failedserver removal (#4208)
* let's not miss failedserver removal
* remove resetting of FailedServers in test code
* Only call abortRequestsToFailedServers at most every 3 seconds.
2018-01-03 21:55:40 +01:00
Matthew Von-Maszewski ae77ff80c2 create independent executeLockedRead and executeLockedWrite to speed performance (#4177) 2017-12-29 12:02:27 +01:00
Max Neunhöffer 927027695d
Sort out locking agency to separate reads and writes. (#4174)
* disentangle writes and reads in agency
* renamed _oLock to _outputLock.  Documented read and write rules for _readDB and _commitIndex using _outputLock and _waitForCV.  Adjusted code to match rules.
* update executeLocked() knowing some callers use _readDB via readDB().  readDB() is currently read only, but uses write locks to be absolutely safe.
* Lay out clear rules against deadlock in agency.
* Avoid unprotected access to _commitIndex.
2017-12-28 11:27:20 +01:00
Max Neunhöffer 7bae6980e8
Bug fix/agent lead hanger (#4147)
* Really enforce the hidden option --server.maximal-threads if given.
* Switch off --log.force-direct in scripts/startStandAloneAgency.sh
* Lower the timeout for sending AppendEntriesRPC to 150s.
* Erase _earliestPackage when becoming a leader.
* Challenge leadership in agent main loop.
* Use steady_clock for _earliestPackage.
* Change _lastAcked and _leaderSince to steady_clock as well.
* time difference calculations based on old readSystemClock to steadyClockToDouble
* All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps
* Inception system_clock to steady_clock
2017-12-27 16:45:39 +01:00
Matthew Von-Maszewski 8723df7681 Fix supervisor thread crash (#4083)
* Server short name could arrive too late for first health check.  Would lead to supervisor thread crash.  Add test for this condition and defense against other unknown throws in health check.

* Correct capitalization of ShortName.  Add spaces to two Log lines.
2017-12-27 16:10:47 +01:00
Matthew Von-Maszewski cb56f0acf1 Have twice seen coordinator go into long loop on shutdown. Added two tests for isStopping() to break the loops. (#4138) 2017-12-21 20:32:16 +01:00
Kaveh Vahedipour 22e6a68747 Bug fix/integer overflow when calculating waits in constituent (#4050)
* integer overflow in Constituent could seize operation of Agency

* less likely integer overflow on double conversion

* less likely integer overflow on double conversion

* changed comparison to integer comparison as suggested by @neunhoef
2017-12-19 21:40:46 +01:00
Jan 9c76613e63
fix premature unlock (#3802)
* fix some deadlocks found by evil lock manager (tm)

* fix duplicate lock

* fix indentation

* ensure proper lock dependencies

* fix lock acquisition

* removed useless comment

* do not lock twice

* create either a V8 transaction context or a standalone transaction context, depending on if we are called from within V8 or not

* AQL micro optimizations

* use explicit constructor

* only use V8DealerFeature's ConditionLocker for acquiring a free V8 context

entering and exiting the selected context is then done later on without having to hold the ConditionLocker

* remove some recursive locks

* Disable custom deadlock detection when Thread Sanitizer is enabled

* Changing ifdef's

* grr

* broke gcc

* Using atomic for ApplicationServer::_server

* fix premature unlock

* add some asserts

* honor collection locking in cluster

* yet one more lock fix

* removed assertion

* some more bugfixes

* Fixing assert

(cherry picked from commit 1155df173bfb67303077fbe04ee8d909517bfd21)
2017-12-13 13:27:42 +01:00
Kaveh Vahedipour ace06575dd when upgrading from 3.1 LastHeartBeatAcked could also have been missing, when the 3.1 cluster had not run for long enough (#3757) 2017-12-08 15:56:19 +01:00
Jan 282be208cc
remove TRI_usleep and TRI_sleep, and use std::this_thread::sleep_for … (#3817) 2017-12-06 18:43:49 +01:00
Max Neunhöffer 74458d9d34 Add security check in AgencyComm::sendWithFailover. (#3838) 2017-12-06 10:50:40 +01:00
Kaveh Vahedipour 11cfa74495 Bug fix/no support for inquiry in send transaction with failover less verbosity (#3847) 2017-11-30 14:54:22 +01:00
Kaveh Vahedipour f7b4150b64 no clientId anymore in send/sendWithFailOver SPIs (#3819) 2017-11-28 10:47:36 +01:00
Kaveh Vahedipour c300eee5f0 minor (#3813) 2017-11-27 18:22:13 +01:00
Kaveh Vahedipour 27cd691bbf Bug fix/agencycomm validate methods broken (#3805) 2017-11-27 14:18:25 +01:00
Kaveh Vahedipour 2beaef41ff Bug fix/agencycomm validate methods broken (#3784) 2017-11-24 10:31:07 +01:00
Simon Grätzer 987daca85b Handle invalid endpoints in AgencyComm (#3729) 2017-11-17 16:35:59 +01:00
Kaveh Vahedipour 7b80deb5cc Fixed object assignment operator for agency's key value store (#3701)
* Fixed object assignment operator for agency's key value store
* Node's toJson is now actually toJson. getString should be used for string extractions
* adjust agency's documentation (clarify precondition)
2017-11-17 15:49:40 +01:00
Kaveh Vahedipour 255d90d26a cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency (#3709)
This is the HealthRecord upgrade patch.
2017-11-17 10:14:14 +01:00
Jan b4f6ee9273 Feature/improved index api for unique constraints and replication (#3715) 2017-11-16 21:02:01 +01:00
Jan 5abf0c1185 Bug fix/fixes 1511 (#3711) 2017-11-16 14:18:51 +01:00
Max Neunhöffer 766ab7c8cf
Fix agency shutdown bug. (#3683)
* Fix agency shutdown bug.
* Remove precondition that was not needed in AgencyComm::removeValues.
* Fail fatally if threads do not shut down.
2017-11-14 16:33:46 +01:00
Jan e1ecc6b02c fix some threading issues (#3659) 2017-11-12 22:34:51 +01:00
Kaveh Vahedipour c9621ff230 Feature/new agency checks for preconditions (#3612) 2017-11-11 22:48:23 +01:00
Max Neunhöffer bff630b332 Handle leader resignation race with redirectRequst. (#3663) 2017-11-11 19:38:29 +01:00
Kaveh Vahedipour 7e816db51e Bug fix/agency restart enhancements (#3619)
* Removed unused active(...) method in Agent
* Inception's restart from persistence allows peer with empty active RAFT list to join
* Agency's UUID is persisted outside of the database comparable to coordinator and db server action.
* Publicized Methods to UUID stuff in ServerState
* Inception method documentation
* added --agency.disaster-recovery-id to allow for specification of known former agency id. this is a very dangerous option potentially.
* Delete unused methods.
* separate _id and _recoveryId
* populating active list with entire pool
* Improve logging.
* reject gossip from unknown agent, if pool is complete
2017-11-10 23:40:26 +01:00
Jan bef52d7dc3
Bug fix/cleanup after cppcheck (#3639) 2017-11-10 13:53:28 +01:00
Max Neunhöffer 3c0ee6908b Bug fix/lead to agent (#3541) 2017-11-09 11:10:09 +01:00
Jan 98eecaae20 bug fix for agency precondition checks (#3579) 2017-11-06 23:55:41 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* auth = off no longer makes everyone superuser

* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
jsteemann a5c777e565 fix broken inquiry results in AgencyComm 2017-10-26 20:10:54 +02:00
Max Neunhöffer cb05d33e17 Term is a number not a string. (#3520) 2017-10-26 12:02:38 +02:00
Max Neunhöffer ee96c37237 Fix agency restart problems. (#3493)
* Fix agency restart problems (port from a 3.2 fix).

* Further fixes after Craneware rescue.
2017-10-25 18:05:58 +02:00
Michael Hackstein 15d9a4be5f Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510) 2017-10-25 18:03:24 +02:00
Jan 720e6df82e Bug fix/fixes 1910 (#3471)
* properly initialize all properties

* use faster comparison

* properly detect and handle "method not allowed"

* code-style

* remove unused variable

* narrow variable scope

* handle non-existence of AuthenticationFeature

* remove dead code

* replace some C string handling with std::strings

* moved assertion to the correct place

* honor number of array members for IN operator

* slightly adjust error messages

* slightly adjust some error messages

* try to fix issue with lingering replication contexts on shutdown

* clean up heartbeat thread a little bit

* small fixes
2017-10-23 09:17:36 +02:00
Max Neunhöffer 67300f9d77 Add a hidden AGENCY_DUMP for agency emergency recovery. (#3474) 2017-10-21 00:24:32 +02:00
Simon Grätzer fd3f9d99d9 Fixing webinterface access (#3464)
* intermediate commit

* Refactoring the ExecContext

* Fixing authentication

* Added start script

* some fixes

* fixed access to nullptr

* some c++

* fixed misleading message

* Made DatabaseGuard movable. Also adapted map insertions to _vocbase in Syncer classes, which failed to compile under older GCC versions

* added support for global flag to replication handler

* Started Refactoring in replication-static

* Fixing syncer code

* store applier configuration

* Static replication tests now test replication in a non system Database

* added flags to replication feature

* Adding some extra checks

* Fixing issue with rocksdb rest replication handler

* replication static now runs _system and otherdatabase replication tests.

* Fixing crash on startup

* Replication_sync now tests _system as well as other Database

* Fixing up heartbeat thread, adding global flag to rest handler

* Fixing wrong assert

* some cleanup, probably some tests are broken

* Made non-system db version of replication-ongoing tests

* fix determine-open-transaction

* Fixed ongoing tests. And added a test where we drop a database on slave while replication is still ongoing

* test fixes

* Activated ongoing other db tests. Also added a test that drops the DB on master, while the slave is still syncing.

* some better error reporting

* gradually switch to Result

* createCollection -> create

* re-activate using of collection ids for now

* enable auto-start

* Fixed create collection in replication ongoing test

* Added first draft of a test for global replication

* move to Result

* use system database for global applier

* improved error reporting

* fixed invalid URLs

* add test case filter

* load existing global applier configuration

* improve error reporting

* Added further tests for global replication

* Fixed global replication test, it now properly waits for replication. Timeouts after 10 seconds.

* Removed erroneous assertion

* improve error reporting

* intermediate commit

* Added a test-case for global replication where the Master already has some data and the slave is clean

* fix deletion of replication contexts

* Fixed JSLint

* compiling code

* fix typo

* do not fail for global applier when no database is configured

* intermediate commit

* syncer supports switch for 3.3 / 3.2

* fixed errors

* Fixing some replication bugs

* Fixing some assertions

* Fixed missing commit markers

* Fixing assertion on database drop

* Attempt to fix deadlock in applier and assertion

* Fixing some stupid things

* Support for collection parameter

* Accidentally turned off some tests

* Grrr

* Fixing wrong method call

* Fixed startscript

* Fixed assignment instead of equality check typo

* Added a test for interrupted replication. For now it just tests basics on _system database.

* Improved index tests on replication.

* properly initialize variable

* fixed some replication problems

* MMFiles wal access support

* fix replication issues

* Started mmfiles replication support

* fixing a bug

* Fixing an issue

* fixing some mmfiles stuff

* fix test

* reload users

* prevent pure virtual method call

* intermediate commit

* Making from exclusive

* do not call getMasterState if child syncer

* some reformatting

* Adding global support for handleCommandSync

* Fixing assertion

* removing some debug logs

* Changing return codes

* Fixing some issues in the rest handler

* Make replication less susceptible to errors

* remove some debug output

* return last log tick

* remove waits from tests

* fix two tests

* changing header for open-transactions call

* some fixes

* fix test

* invalidate cached databases

* merging request and execcontext

* try to fix assertion error

* renamed method

* fix compile warning

* small changes

* Always use execcontext

* Fixing an assert

* fix replication issues

* try to fix collection lookups

* try to fix master/slave start

* Changing comments in heartbeat thread

* fix wrong signature of READ_LOCKER_EVENTUAL

* log server role in testing mode

* Fixed authentication, removed execContext in favor of request context

* Adding cluster rest api

* Fixing cluster rest handler

* Fixing cluster callback

* Some refactoring

* Queue creation is not a single operation

* Allowed for leader redirects

* Setting start of batch

* Disabling 2.8 compat tests

* fix start/stop bugs

* jslint

* various little changes

* add flag for exposing jwt

* indentation

* cleanup

* Some changed to guid

* fixing tcp to http, vst

* changed endpoint header

* small fixes

* Reorder servers by health status

* Higher timeout

* Changing error messages

* update the fromTick when fetching multiple batches from the coordinator

* more debug info

* Reducing copy pasted code

* change uid generation

* reducing logspam

* more exceptions for redirects

* more exceptions

* attempt to fix uniqids in cluster

* centralize printing of HTTP errors in replication

* debug output

* fix messages for authentication

* cleanup

* removing --cluster.my-id, --cluster.my-local-info

* Added leadership race to bootstrap, determine foxxmaster on bootstrap, removing obsolete code

* improve error reporting in RestAqlHandler

* Changing heartbeat thread, fixing cluster_sync

* some more debug output

* added master

* attempt to make tests more deterministic

* added logging about indexes

* added some safety checks to the logger

* slightly better error messages

* fix location header for SSL

* fix error message

* try to make tests more deterministic

* change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED

* Fixing broken webinterface access

* reverting groovy change

* Fixing read-only internal users

* Using superuser rights for dashboard now

* Adding mode field to _admin/server/role

* added mode TRYAGAIN

* remove inventory lock (does not seem necessary here)

* remove invalid assertion

* fixing agency bugs

* Removing debug output

* return proper errors in case of "method not allowed"

* Fixed up some info messages

* jslint
2017-10-20 18:06:59 +02:00
Kaveh Vahedipour 428e163db9 Return the result of the inquiry (#3465) 2017-10-20 15:01:32 +02:00
Jan 7840d3f824 Bug fix/fixes 1810 (#3460)
* improve error reporting in RestAqlHandler

* added logging about indexes

* added some safety checks to the logger

* slightly better error messages

* fix location header for SSL

* fix error message

* try to make tests more deterministic

* change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED
2017-10-19 11:28:01 +02:00
Simon Grätzer 7c31960cf2 Feature/async failover (#3451) 2017-10-18 23:59:29 +02:00
Kaveh Vahedipour 46333a762f Bug fix/agency restart after compaction and holes in log (#3413)
* State fixes holes in RAFT index range
* Avoid application of entries older than compaction index _cur and guard for unsigned overflow
2017-10-13 16:01:41 +02:00
m0ppers bb1d303473 Cmake 5.0 complains about unused lambda captures (#3390) 2017-10-13 12:20:48 +02:00
Max Neunhöffer 9a2385b941 Add host id detection and show in /_admin/cluster/Health. (#3389) 2017-10-11 12:42:44 +02:00
Max Neunhöffer d86f27bd19 Bug fix/agency leader timeouts (#3373)
* Send out empty heartbeats regardless of non-empty AppendEntriesRPC.
* Also improve logging:
  Note if a log in the empty heartbeat sending takes > 0.01 s.
  Clearly mark places where a leader resigns in logging.
  Log if no empty heartbeat is sent out.
* Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses.
* Add debug logging for _lastAcked and challengeLeadership.
* Remove some unused code. Do not count ourselves in challengeLeadership.
* Removal of entire activation/deactivation mechanisms in agency
* TRI_microtime up to c++11
* added term to response to sendAppendEntries.
2017-10-06 10:11:51 +02:00
Max Neunhoeffer af3f977997
Revert "Send out empty heartbeats regardless of non-empty AppendEntriesRPC."
This reverts commit e974501446.
2017-10-02 15:02:15 +02:00
Max Neunhoeffer 2852f80b5a
Revert "Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses."
This reverts commit 45d37edfb2.
2017-10-02 15:02:06 +02:00
Max Neunhoeffer 45d37edfb2
Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses. 2017-10-02 15:01:11 +02:00
Max Neunhoeffer e974501446
Send out empty heartbeats regardless of non-empty AppendEntriesRPC.
Also improve logging:
  Note if a log in the empty heartbeat sending takes > 0.01 s.
  Clearly mark places where a leader resigns in logging.
  Log if no empty heartbeat is sent out.
2017-10-02 14:14:41 +02:00
Max Neunhöffer 47f367d3f0 Bug fix/agency compactor deadlock (#3335)
* Fix a deadlock between Agent thread and compactor thread.
* Improve comments in header.
* Organise clean shutdown of agency threads.
2017-09-28 12:20:57 +02:00