1
0
Fork 0
Commit Graph

329 Commits

Author SHA1 Message Date
Tobias Gödderz dc2e27db6c [3.4] Feature/rebootid notice changes, backport of #9523 (#9685)
* Feature/rebootid notice changes, backport of #9523

* Move to 3.4 and C++11-compatible code (except for test code)

* Backported tests/Mocks/Servers.{h,cpp}

* Rebuilt errorfiles

* Ported CallbackGuardTest from gtest to catch

* Ported RebootTrackerTest from gtest to catch

* Make sure the state method is called, so the overridden method is used during tests

* Fixed test to work with the old scheduler

* release version 3.4.8

* [3.4] fix agency lockup when removing 404-ed callbacks (#9839)

* Update arangod/Cluster/ServerState.cpp

Co-Authored-By: Markus Pfeiffer <markuspf@users.noreply.github.com>

* Instantiate the scheduler during ::prepare()

* Fix test crash introduced during backport

* Fix a compile error on Windows (thanks, V8)
2019-09-19 15:03:39 +03:00
KVS85 0000a2d459 Revert "Let the server honour the keep alive timeout (#9477)"
This reverts commit 80c75af552.
2019-09-11 19:25:14 +02:00
Simon 80c75af552 Let the server honour the keep alive timeout (#9477)
* Fix keep alive timeout

* add changelog

* Update CHANGELOG

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>
2019-09-11 13:05:14 +03:00
Lars Maier 5b41e5a5c8 [3.4] Scheduler Logging (#9028)
* Log when queues become filled up or are completly filled.
* Added latency for pushing.
* lowered log level.
* clever logging. and mutex. :(
* Reset local clock.
* Try to fix mac compile.
* Improve logging logic for half full and full queue.
* CHANGELOG.
2019-05-20 16:30:52 +02:00
Jan 4a1f25ed46
use JobGuard when querying users from DB in cluster (#8057)
* use JobGuard when querying users from DB in cluster

* fix test crashes
2019-01-30 12:00:50 +01:00
Jan 3c828347dc
do not simplify non-deterministic conditions (#7927) 2019-01-19 18:52:17 +01:00
Matthew Von-Maszewski 474f0cde31 Bug fix 3.4/scheduler empty reformat (#7872)
* added check for empty scheduler

* removed log, old is 1 not 0

* require running in this thread

* test

* added isDirect to callback

* signature fixed

* added drain

* added allowDirectHandling

* disabled for testing

* Add ExecContextScope object to direct call.

* try alternate initialization of ExecContextScope

* remove ExecContextScope, no help.  try _fifoSize as part of direct decision.

* strand management to minimize reuse of same strand per listen socket

* blind attempt to address Jenkins shutdown lock up.  may remove quickly.

* add filename and line to existing error log message

* Adjust queueOperation() to stop accepting items once isStopping() becomes true.

* revert previous check-in to MMFilesCollectorThread.cpp

* big reformat

* fixed merge conflicts

* Add CHANGELOG entry.
2019-01-08 20:39:42 +01:00
Frank Celler 9477af198b big reformat 2018-12-26 00:57:05 +01:00
Jan 677522991e
Feature/internal 3306 (#7683) (#7688) 2018-12-06 17:46:58 +01:00
jsteemann 9658300f11 revert Scheduler changes 2018-11-26 09:54:41 +01:00
Jan b363372c63
Bug fix 3.4/remove shutdown assertion (#7387) 2018-11-22 15:36:06 +01:00
Matthew Von-Maszewski 4362137ba4
Bugfix 3.4: Null pointer defense in Scheduler::post(callback) (#7285)
* defense against the dark arts (nullptr in _ioContext)

* move incQueued() so that we can imply race state of _ioContext.

* adjust to meet Jans expectations

* jsteeman noticed that queue count is not considered before shutdown ... bad

* add JobGuard object to manage working count.  should hold shutdown a tad longer.

* TEMPORARY HACK:  need to validate problem that is randomly occurring in Jenkins automation

* TEMPORARY HACK 2: trying to isolate an acceptable sequence.

* TEMPORARY HACK 3: trying to isolate an acceptable sequence.

* TEMPORARY HACK 4: so close ... seem to have all the moving parts isolated.  Come on Jenkin!

* shutdown now orderly finishes everything already in fifo queues and active on threads.  Then forces any late requests to execute on callers thread.
2018-11-16 12:20:00 -06:00
Matthew Von-Maszewski 1c3672be75
revert accidental check-in of thread delete constant edit (#7208) 2018-11-04 10:36:36 -05:00
Simon c073b9dbbe Make ensureIndexOnCoordinator more robust (#7110) (#7130) 2018-10-30 11:25:06 +01:00
Matthew Von-Maszewski a1af2e305f Bugfix 3.4: update Scheduler thread logic based upon testing (#7056) 2018-10-24 22:49:28 +02:00
Lars Maier d7863b4583 Bug fix 3.4/cluster comm threads start stop (#6939)
* Start ClusterComm threads in `ClusterFeature::start`. Stop ClusterComm threads in `ClusterFeature::stop`.

* Do not free objects in `Scheduler::shutdown`. Let the `unique_ptr` do their job. Stop ClusterComm threads in `ClusterFeature::stop`, but free instance in `ClusterFeature::unprepare`.

* `io_context` may contains lambdas that hold `shared_ptr`s to `Tasks` the required a functional `VocBase` in their destructor.

* Clean up.
2018-10-19 13:12:51 +02:00
Jan 18de63c7c8
Feature 3.4/medium priority (#6910) 2018-10-18 17:08:39 +02:00
Matthew Von-Maszewski a9ce39f85c Bugfix 3.4: Merge scheduler changes by Michael & Frank into recent overlapping code changes (#6928)
* manual recreation of bug-fix-3.4/scheduler-high-low within recent Scheduler changes.

* restore Documentation that was unintentionally deleted
2018-10-16 22:51:00 +02:00
Matthew Von-Maszewski 25bc48e548 minimum fixes to clear scheduler timeout problem. (#6878) 2018-10-15 09:06:15 +02:00
Matthew Von-Maszewski 887822afa6 Bug fix 3.4: libcurl threading changes (#6829)
* enable the ability to push results processing to threads
* have ClusterComm push libcurl response processing to Scheduler threads
* tuning changes from Matthew and Michael
* give new defaults to minimum thread count
* create multiple ClusterCommThreads, each with own Communicator object
* put PR notes in change log
* correct speling
* Also drain V8 queue.
* Add prio V8 to switch in canPostDirectly.
* Accept --server.minimal-threads even if maximal threads is not set.
* Reactivate stopping of threads.
2018-10-12 17:00:55 +02:00
Simon 05446dcac0 Bug fix 3.4/activefail debug (#6717) 2018-10-05 18:36:06 +02:00
Jan 676a51731d
close socket when shutting down (#6651) 2018-09-28 17:50:33 +02:00
Jan e37576e03c
Bug fix 3.4/fix ssl vst (#6548) 2018-09-19 23:22:40 +02:00
Jan 5873f63a72
Bug fix/fixes 2908 (#6279) 2018-08-31 17:26:54 +02:00
Lars Maier 889ce78dcf Batch Handler V8-lane bug (#6196) 2018-08-21 13:52:14 +02:00
Jan 86204ed0b8
fix memory leaks in arangosh connections (#6160) 2018-08-17 08:48:54 +02:00
Jan d6a3b66e2a
micro optimizations (#6162) 2018-08-16 08:50:16 +02:00
Vasiliy 6fd541d110 issue 427.5: use ApplicationServer reference instead of pointer (#6145)
* issue 427.5: use ApplicationServer reference instead of pointer

* address MSVC build failure
2018-08-15 12:16:02 +03:00
Frank Celler a688dc0962
Feature/remove job queue thread (#5986)
limiting V8 calls in flight
2018-08-10 12:17:43 +02:00
jsteemann 7000f49201 change log level from warning to debug for closed connections 2018-07-20 16:19:41 +02:00
jsteemann 2138dc0479 Merge branch 'bug-fix/fixes-1707' of https://github.com/arangodb/arangodb into devel 2018-07-19 16:02:30 +02:00
Jan 85da1aa7ef
fix unix domain socket connections (#5914) 2018-07-19 15:05:42 +02:00
jsteemann 1588c358b9 options cleanup 2018-07-17 12:33:10 +02:00
Jan 006995a6a5
Bug fix/dont start v8 for agency (#5891)
* disable V8 for agency setups

* add missing section declaration (fixes unrelated Windows bug)
2018-07-17 11:24:53 +02:00
jsteemann 36f05c07e0 cleanup of server options 2018-07-16 21:38:35 +02:00
Michael Hackstein 7a95c5e675
Feature/feature phases (#5272)
* Added feature phases

* BasicsPhase and DatabasePhase to the required files. Server now has Feature circles and does not boot. Will be sorted out later on.

* Added ClusterPhase to features

* Added V8Phase to the required features

* Added AQLPhase to the affected features

* Added ServerPhase to Features

* Added FoxxPhase to the relevant features

* Added AgencyPhase to the relevant features

* Moved registration from local variable SYS_SYSTEM_REPLICATION_FACTOR from cluster to V8 as their ordering is now vice versa

* Moved Bootstrap feature into FoxxPhase. It could be moved to ServerPhase easily if the FoxxQueue dependency would be removed

* Final movement of Startup Phases. Now solved all circles.

* Removed merge conflict

* Moved ReplicationTimeout into cluster phase and fixed cross-phase requirements

* Added greetings phase. This phase separates the Basics Phase and is the first to be run. Includes Logger and Hello/Goodbye

* Added the GreetingsPhase in the corresponding features. Now all BasicsPhase features start after greetings Phase. There is some issue in this branch which prevents the Agency from Gossipping right now. Will be fixed next

* Moved creation of the Agent into the prepare phase of the feature. THereby it is guaranteed that agents at least exists before the GeneralServer is activating endpoints

* Recovery needs to be started after the ServerID

* Moved log output of FeaturePhases to DEBUG instead of ERROR.

* Added feature phases for clients

* ClusterFeature now does not directly require AgencyFeature any more

* Added requirement of TravEngineRegistryFeature in AQL feature. Otherwise shutdown may be undefined

* The ApplicationServer can now handout the list of ordered features. Used for testing purposes

* Fixed IResearchVew Tests Setup to honor new feature ordering

* Fixed IResearchViewDBServer Tests Setup to honor new feature ordering

* Started fixing IResearchView Coordinator tests with startup ordering. Not finished yet

* Added startup phases to ViewCoordinator test

* Disabled expected logoutput in ClusterRepairsTest

* Fixed indention in test code

* LinkCoordinator now honors startup ordering

* Link meta now honors startup rdering

* Supress expected cluster logs in ViewTest

* Removed '#' accidentially added.
2018-07-16 14:09:36 +02:00
Jan 2cac8b8a51
honor some cppcheck recommendations (#5817) 2018-07-10 13:50:30 +02:00
Jan 9ab26ba972
Bug fix/fix hangs (#5808) 2018-07-10 11:39:37 +02:00
Tobias Gödderz fc3e11dbbc Async AQL (#5806)
* Modified header to new initializeCursor API

* Adapted initializeCursor to DONE/WAITING API. Compiles but not tested and no one reacts to WAITING state, it is not returned anywhere yet

* Subqueries now expect a WAITING return from initilize cursor. However they will just return a nullptr and pretend the query is empty, this will be fixed later

* First attempt to simulate thread waiting over information within the query

* Small fix to allow for isDirect handlers to go to sleep.

* Waiting in the necessary places now for the async request to be send.

* Thank you auto-casting compiler, you are totally right i absolutely wanted to use this bool value as an index in may Array. How could i possibly not want to use it here?

* Include cond-var header

* Fixed mutex/cond_var usage

* Added oldAPI wrappers in AQL Blocks for get/skip some variants. This Commit compiles but is NOT tested

* Let getSome now return unique_ptr of AqlItemsBlocks. Also implemented the async variant of getSome in subqueries.

* Removed all references to OLD implementations in AQL. only the base wrappers are allowed to call OLD functions from now on. Now the testing part starts

* Fixed endless virtual recursion

* Implemented new getOrSkip API in SortBlock

* Implemented new getOrSkip API in LimitBlock

* Initilaize all variables

* Fixed logic bug in SubqueryBlock

* getBlock in ExecutionBlock now returns a state. All blocks need to handle this properly!

* Createad a wrapper getBlockOld that servers the old sync api and is used now in AQL. To be replaced overtime.

* Added IndexBlock::skipSome and IndexBlock::getSome

* getBlock now returns its old return value along with the state

* Switch from getBlockOld to getBlock in IndexBlock::skipSome

* Switch from getBlockOld to getBlock in IndexBlock::getSome

* ShortestPathBlock::skipSome is not implemented! Added a regression test

* Attempt to fix SubQueryResult memory management

* Fixed LIMIT Block

* Moved from ShortestPathBlock::getSomeOld to ::getSome

* Implemented ASYNC api on SingletonBlock

* Adapted EnumerateCollectionBlock to new async API

* Fixed FilterBlock and adapted return block to async API

* Adapted NORESULTS block to async AQL api.

* Adapted Modification Blocks to async API

* Fixed some initialize cursor functions to reset values required during get/skipSome

* First steps to adapt ClusterNodes to Async AQL api. Not there yet, need to implement the core still

* Added asnyc implementation for xxxForShard in ClusterBlocks. This commit changes internal logic of _doneForShard. Needs additional testing as soon as everything is in place.

* Adapted CalculationBlock to async API

* Adapted TraversalBlocks to ASYNC Aql. This is not optimal yet, we need a better decission if we are DONE or not on RETURN

* Adapted EnumerateListBlock to Async AQL api

* Adapted RemoteBlock to ASYNC API in getSome/skipSome. The whole thing is now LIVE in the cluster. Exetensive testing to be started now

* Fixed IndexBlock WAITING behaviour if Waiting occurs during a index processing

* Adapted IReasearchViewBlock to ASYNC AQL API

* Fixed SortingGatherBlock in WAITING state.

* Adapted IResearch ExecutionBlockMock to Async API

* Unified the HASMORE/DONE distinction. Code is much more readable now and harder to get incorrect 👍

* Implemented tonly heoretically reachable function of non void function.

* Fixed last commit

* Added inline TODO comments

* fix warning

* Fixed a clearing logic bug in RemoveNodes

* Fixed Error Handling in RemoteBlocks. Also fixed a logic bug (true/false simply has a 50% chance of getting it wrong) in Distribute and Scatter.

* remove unused methods

* Fixed failure test

* implement skipping

* Moved the Query Waiting out of the ExecutionEngine.

* changed one of the collect blocks

* Removed _upstreamState from ExecutionBlockMock, that is in the base-class now

* Added a Test Mock for a an ExecutionBlock that simulates the WAITING/HASMORE/DONE api.

* do not check "hasMore" if not necessary

* Added DistinctCollectBlock::getOrSkipSome from ~Old and changed its return type

(still uses getBlockOld)

* Save state to resume in DistinctCollectBlock::getOrSkipSome

* Extracted redundant code

* fixed some ops

* added one more test

* fix endless blocking

* fix compile error

* fix test

* Refactored HashedCollectBlock::getOrSkipSome

* Return blocks to the manager

* Replaced usage of getBlockOld in HashedCollectBlock::getOrSkipSome

* remove unused shutdown calls, simplify ownership for expressions

* Removed superfluous variable

* Capture const variable by value

* Removed SortedCollectBlock::getOrSkipSomeOld in favour of getOrSkipSome

* Added a working version of SortedCollectBlock::getOrSkipSome

Has yet to be cleaned up

* Removed isTotalAggregation special treatment

* On no input, return a group of nulls (instead of no group at all)

* Bugfixes

* Simplified code

* Move return to the end, eliminate duplicate code

* Corrected skipped count in HashedCollectBlock

* Aligned getNextRow() implementations

* Added comments

* some cleanup

* fix potential memleak

* Bugfix

* Fixed failure tests

* Removed usage of getBlockOld in ExecutionBlock::getOrSkipSome

* Replaced hasMore with an async implementation (mostly)

* Removed getBlockOld()

* Added hasMoreState to the AQL API (and renamed hasMore methods to hasMoreState)

* RemoteBlock now uses the async hasMoreState route

* remove job queue

* options

* Bugfixes in the async implementation of LimitBlock

* LimitBlock::getOrSkipSome now always skips when calculating the fullcount

* fix compile warnings

* restrict threads

* Fixed api of Waiting ExecBlockMock. Unused yet

* Made SortedGatherBlock async-capable

* Removed nonEmptyIndex hack

* Removed duplicate traceGetSome~ calls, moved all to getSome

* Added asserts before replacing getNr*Registers

* Added a TODO note and a comment

* Removed getSomeWithoutRegisterClearoutOld()

* Removed skip()

* Removed common code by using getNr*Registers()

* Use getNr*Registers() in the TraversalBlock as well

* started to add lane

* started to add lane

* added lane

* completed lane

* removed debug output

* fixed merge

* Began working on a test suite for AQL tracing/profiling

* Added more tests and asserts in aql-profiler

* Made some ExecutionBlocks final

* Added a type enum to all blocks and the per-block stats

* Add block type to stats nodes when tracing AQL on block level

* Removed initializeCursor call from instantiateFromPlan

* Avoided additional getSome calls after DONE

* Added more profiler tests

* Refactored ExecutionBlock::getOrSkipSome and fixed two bugs

- set _upstreamState also when skipping
- explicitly use xecutionBlock::getHasMoreState()

* Bugfix: update state

* Reuse parent _skipped wherever possible; rename where not (LimitBlock)

* Simplified SortedCollectBlock::getOrSkipSome and reused general pattern & code

* Implemented missing virtual function (with USE_FAILURE)

* Reset neccessary values during initializeCursor

* Simplified code in EnumerateListBlock a little

* Added a test for DistinctCollectBlock in aql-profiler

* Avoid redundant getSome calls in DistinctCollectBlock

* fix compilation

* Fixed DistinctCollectBlock profiler test

* Added a second profiler test for the DistinctCollectBlock

* Added a profiler test for EnumerateCollectionBlock

* Bugfix in EnumerateListBlock

* added --server.fifoN-size

* Simplified EnumerateCollectionBlock::getSome

* Simplified EnumerateCollectionBlock::getSome, and return HASMORE less often when DONE

* Fix testEnumerateCollectionBlock1 for mmfiles

* do not pass by reference

* Fixed compile error

* fixed merge conflicts

* Added profiler tests for EnumerateCollectionBlock

* Test fix for mmfiles

* Fixed IResearch tests

* Bugfix in DistinctCollectBlock and a regression test

* Updated comment

* Bugfix for query statistics in cluster

* Check plan in distinct test

* Fix aql-profiler tests in cluster

* Removed unused line / bugfix for single server test runs

* This commit implements waking up of AQL queries. (#5651)

* Non-compiling intermediate commit for handover.

* Make branch compile again

* Started implementation of continueable rest cursor handler by moving the callbacks to the outer part. This is not yet fully tested!

* Made finalizeExecute noexcept. We cannot react to this errors as the response was potentially written before. Also introduced continueExecution in the RestHandler engine.

* First successful query wakeup.

* The wakeup callback now posts on the scheduler directly. A resthandler only needs to provide a callback that encapsulates the continueExecution call on this handler

* renamed finalizeExecute to shutdownExecute

* Added a differentiation between Handler and Callback in Query continuation. Handler will be posted in IO service. Callback will be executed directly

* fix audit log

* Removed callback from deleteQueryCursor. This cannot be waiting

* use CONDITION_LOCKER

* removed yet another thread-local variable

* Fixed forward declaration

* Made RestAqlHandler repeatable

* Use defer to close the query in RestAqlHandler. Now waiting will close the query as well.

* Added a mutex in the RestHandlers to make sure if the callback over network is too fast that there is only one Thread running in the RestHandler

* Captured the GeneralCommTask if it is posted to a RestHandler. This is necessary in the PAUSED case

* Refactoring of _noLockHeader responsibilities. Now the BaseHandler selects them and resets them after it is done. Only Coordinators are allowed to define them if a query is loaded.

* Removed reaction to existing nolockheaders in Coordinator Query Planning Phase

* Removed incorrect assertion.

* Further refactoring of NoLockHeaders. Now there is a wrapper class around it which allows for debugging and logging. The state now seems to be better. Also all non-rest-handler triggered queries clean up the NoLockHeaders properly.

* Fixed UserManager, now deletes nolock headers properly

* Swing to the Symphony of Destruction

* Forgot about community build...

* Fixed compiling of Catch tests

* Fixed community build

* need thread for size

* Made the restSimpleHndler repeatable

* Implemented dump and dumpSync in Cursors, Sync will block a thread, dump allows to wait, only relevant for Streaming cursor

* Reactivated StreamingCursors

* Removed debug output.

* Fixed false query continuation

* Reset thread output to non-debug

* Added missing return statements

* Allow some CollectionMethods to hand-in a context that may contain a transaction. This is meant to honor nolock headers.

* Fixed hidden merge conflict

* Bugfix in aql-profiler.js: use plan.nodes order, not stats

* Added two profiler tests for filter

* Avoid too many getBlock calls in the FilterBlock

* Removed debug output

* RemoteBlock API will now send a done(bool) flag whenever we request documents from remote Servers. It is possible that we are DONE and have a result. The pre 3.4.0 API uses exhausted which is exclusive to a result. This API is still implemented for beckwards compatibility.

* Implemented an executeSync function in AqlQuery. This will block the thread until query execution is complete

* Added another test for FILTER, and one test for the HashedCollectBlock

* Added more tests for HashedCollectBlock; avoid unneccessary getSome calls

* Added an profiler IndexBlock test

* IndexBlock: avoid redundant getSome calls, added missing traceGetSomeEnd calls

* Added a second test profiling IndexBlock

* Added a third test for IndexBlock

* Moved general code to module

* Moved noncluster tests into a separate file

* Split aql-profiler testsuite into three files

* Added profiler tests for LimitBlock

* Added a test for NoResultsBlock

* Added profiler tests for TraversalBlock

* Shutdown of an AQL query is now asynchronous. However in Error-Cases it will be executed in a blocking way still

* Optimized TraversalBlock getSome calls due to new (nightly) test results

* Fixed std::min calls I broke

* Let shutdown calls in AQL wait, if the query is executed successfully.

* Fixed queryResult going out of scope

* fix compile error through merge conflict with devel

* Fixed compiler warning "mismatching tags"

* Removed debug log output

* Added TODO notes

* Fixed test fail due to devel merge

* Fixed some invalid sync waiting implementations

* Added a profiler test for SortBlock

* Added profiler tests for SortedCollectBlock

* Fixed bug introduced by devel merge

* Fixed Remoteblocks ignoring errors!

* Added some more continue Callbacks in used places. And removed debug log

* Removed debug log output

* Suppress clang warnings

* Bugfix: use of invalid stack pointer

* Bugfix: RemoteBlock::shutdown now sends code as int, not string

* Revert "Suppress clang warnings"

This reverts commit 05591649c59743c992edd5e78814edc8ca2a83e0.

* Bugfix: cleanup state in RemoteBlock ::shutdown, ::getSome and ::skipSome

* Bugfix in Subquery shutdown: don't skip subquery shutdown when main query shutdown failed

* Allow copy elision
2018-07-09 14:24:10 +02:00
Frank Celler c6e1672338 fixed acceptor on windows 2018-06-11 22:21:36 +02:00
Frank Celler 5145153467 no domain sockets on Windows 2018-06-11 16:06:57 +02:00
Frank Celler efc030ea87 Feature/remove event loop (#5565) 2018-06-11 11:46:17 +02:00
Frank Celler c5ac519d1c Bug fix for Read/Write race [WIP] (#5534)
* added wrapper, added asio_ns
* Temporarily fix condition variable bug in job queue.
* preparation for 3.3 back-port
* clang-format
* removed unecessary check, this is now fixed by stand
* added missing RequestStatistics::SET_READ_END
* cosmetics
2018-06-08 10:51:54 +02:00
Frank Celler baec138715 clang-format 2018-06-07 09:26:11 +02:00
Simon ef87bb04a4 Allow RestHandler to pause execution (#5458) 2018-05-25 12:27:14 +02:00
Jan 8e6d5df129
fixed minor several compiler complaints (#5406) 2018-05-23 11:50:00 +02:00
Simon 17b1a2aafb Rest middleware refactoring (#5332) 2018-05-14 17:43:10 +02:00
Jan 9fbcddb3bd
Bug fix/make rest handlers indirect (#5323)
* make rest handlers indirect

* added assertions
2018-05-11 13:51:06 +02:00
jsteemann 7f8a1cc614 Merge branch 'bug-fix/add-missing-overrides-and-final' of https://github.com/arangodb/arangodb into devel 2018-05-07 23:02:46 +02:00
Simon 6db3258505 Possible windows fix (#5281) 2018-05-07 18:07:28 +02:00