
94 Commits

Author SHA1 Message Date
Lars Maier 5b41e5a5c8 [3.4] Scheduler Logging (#9028)
* Log when queues become filled up or are completely filled.
* Added latency for pushing.
* lowered log level.
* clever logging. and mutex. :(
* Reset local clock.
* Try to fix mac compile.
* Improve logging logic for half full and full queue.
* CHANGELOG.
2019-05-20 16:30:52 +02:00
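The commit above switches to logging only when a queue *becomes* half full or full, with a mutex around the bookkeeping. Below is a minimal C++ sketch of that transition-based idea, assuming a single fixed capacity. `QueueFillReporter` and `FillLevel` are hypothetical names, not part of ArangoDB's Scheduler. A vector of strings stands in for the real logger.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sketch: record a queue's fill level only when it changes, so a
// busy queue does not flood the log with identical warnings.
enum class FillLevel { Normal, HalfFull, Full };

class QueueFillReporter {
 public:
  explicit QueueFillReporter(std::size_t capacity) : _capacity(capacity) {
    assert(capacity > 0);
  }

  // Call after each push or pop with the current queue length.
  // Returns true if a new log line was recorded.
  bool observe(std::size_t length) {
    FillLevel level = classify(length);
    std::lock_guard<std::mutex> lock(_mutex);  // may be called from many threads
    if (level == _last) {
      return false;  // same level as last time: stay quiet
    }
    _last = level;
    _log.push_back(describe(level, length));
    return true;
  }

  std::vector<std::string> lines() const {
    std::lock_guard<std::mutex> lock(_mutex);
    return _log;
  }

 private:
  FillLevel classify(std::size_t length) const {
    if (length >= _capacity) return FillLevel::Full;
    if (2 * length >= _capacity) return FillLevel::HalfFull;
    return FillLevel::Normal;
  }

  std::string describe(FillLevel level, std::size_t length) const {
    const char* prefix = "queue back to normal: ";
    if (level == FillLevel::Full) prefix = "queue is full: ";
    if (level == FillLevel::HalfFull) prefix = "queue is half full: ";
    return prefix + std::to_string(length) + "/" + std::to_string(_capacity);
  }

  const std::size_t _capacity;
  mutable std::mutex _mutex;
  FillLevel _last = FillLevel::Normal;
  std::vector<std::string> _log;  // stands in for the real logger
};
```

Because `observe()` only records a line when the level changes, a queue that hovers near capacity produces one line instead of one per push.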
Matthew Von-Maszewski 474f0cde31 Bug fix 3.4/scheduler empty reformat (#7872)
* added check for empty scheduler

* removed log, old is 1 not 0

* require running in this thread

* test

* added isDirect to callback

* signature fixed

* added drain

* added allowDirectHandling

* disabled for testing

* Add ExecContextScope object to direct call.

* try alternate initialization of ExecContextScope

* remove ExecContextScope, no help.  try _fifoSize as part of direct decision.

* strand management to minimize reuse of same strand per listen socket

* blind attempt to address Jenkins shutdown lock up.  may remove quickly.

* add filename and line to existing error log message

* Adjust queueOperation() to stop accepting items once isStopping() becomes true.

* revert previous check-in to MMFilesCollectorThread.cpp

* big reformat

* fixed merge conflicts

* Add CHANGELOG entry.
2019-01-08 20:39:42 +01:00
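One of the notes in the commit above makes `queueOperation()` stop accepting items once `isStopping()` is true. Below is a minimal sketch of that rule, assuming a single mutex-protected FIFO. `StoppableFifo`, `beginShutdown()` and `size()` are hypothetical; only the names `queueOperation()` and `isStopping()` come from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical sketch: a FIFO whose queueOperation() refuses new items once
// shutdown has started. Not ArangoDB's actual Scheduler interface.
class StoppableFifo {
 public:
  // Returns false once stopping; the caller must then handle the work itself.
  bool queueOperation(std::function<void()> item) {
    assert(item && "queueOperation() needs a callable");
    std::lock_guard<std::mutex> lock(_mutex);
    if (_stopping) {
      return false;
    }
    _items.push_back(std::move(item));
    return true;
  }

  void beginShutdown() {
    std::lock_guard<std::mutex> lock(_mutex);
    _stopping = true;
  }

  bool isStopping() const {
    std::lock_guard<std::mutex> lock(_mutex);
    return _stopping;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(_mutex);
    return _items.size();
  }

 private:
  mutable std::mutex _mutex;
  bool _stopping = false;  // guarded by _mutex, like _items
  std::deque<std::function<void()>> _items;
};
```

The stop flag and the push share one lock on purpose. If `isStopping()` were checked under one lock and the push done under another, an item could slip in after shutdown had already drained the queue.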
Frank Celler 9477af198b big reformat 2018-12-26 00:57:05 +01:00
jsteemann 9658300f11 revert Scheduler changes 2018-11-26 09:54:41 +01:00
Jan b363372c63 Bug fix 3.4/remove shutdown assertion (#7387) 2018-11-22 15:36:06 +01:00
Matthew Von-Maszewski 4362137ba4 Bugfix 3.4: Null pointer defense in Scheduler::post(callback) (#7285)
* defense against the dark arts (nullptr in _ioContext)

* move incQueued() so that we can imply race state of _ioContext.

* adjust to meet Jan's expectations

* jsteemann noticed that queue count is not considered before shutdown ... bad

* add JobGuard object to manage working count.  should hold shutdown a tad longer.

* TEMPORARY HACK:  need to validate problem that is randomly occurring in Jenkins automation

* TEMPORARY HACK 2: trying to isolate an acceptable sequence.

* TEMPORARY HACK 3: trying to isolate an acceptable sequence.

* TEMPORARY HACK 4: so close ... seem to have all the moving parts isolated. Come on Jenkins!

* shutdown now orderly finishes everything already in fifo queues and active on threads. Then forces any late requests to execute on the caller's thread.
2018-11-16 12:20:00 -06:00
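The last note of the commit above describes an orderly shutdown. It finishes what is already in the FIFO queues and running on threads, then forces late requests onto the caller's thread, with a `JobGuard` keeping the working count raised. Below is a sketch of that ordering, assuming one FIFO and an explicit `runOne()` worker step instead of real threads. `MiniScheduler`, `PostResult` and this `JobGuard` are hypothetical and do not mirror ArangoDB's classes. The empty-callback check only loosely echoes the commit's null-pointer defense.

```cpp
#include <atomic>
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch (not ArangoDB's Scheduler): orderly shutdown that
// finishes queued and running jobs, then runs late posts on the caller.
enum class PostResult { Rejected, Queued, RanDirectly };

class MiniScheduler {
 public:
  // RAII guard that keeps the working count raised while a job is in flight.
  class JobGuard {
   public:
    explicit JobGuard(std::atomic<int>& working) : _working(working) {
      _working.fetch_add(1);
    }
    ~JobGuard() { _working.fetch_sub(1); }
    JobGuard(const JobGuard&) = delete;
    JobGuard& operator=(const JobGuard&) = delete;

   private:
    std::atomic<int>& _working;
  };

  PostResult post(std::function<void()> callback) {
    if (!callback) {
      return PostResult::Rejected;  // defend against an empty callback
    }
    {
      std::lock_guard<std::mutex> lock(_mutex);
      if (!_stopping) {
        _fifo.push_back(std::move(callback));
        return PostResult::Queued;
      }
    }
    JobGuard guard(_working);  // counted like any other job while it runs
    callback();                // late request: run on the caller's thread
    return PostResult::RanDirectly;
  }

  // One worker step: pop a job and run it. Returns false if the FIFO was empty.
  bool runOne() {
    JobGuard guard(_working);  // raised *before* popping (see note below)
    std::function<void()> job;
    {
      std::lock_guard<std::mutex> lock(_mutex);
      if (_fifo.empty()) {
        return false;
      }
      job = std::move(_fifo.front());
      _fifo.pop_front();
    }
    job();
    return true;
  }

  // Stop queueing, finish what is already queued, then wait for jobs that
  // other threads may still be running.
  void shutdown() {
    {
      std::lock_guard<std::mutex> lock(_mutex);
      _stopping = true;
    }
    while (runOne()) {
    }
    while (_working.load() != 0) {
      std::this_thread::yield();
    }
  }

 private:
  std::mutex _mutex;
  bool _stopping = false;  // guarded by _mutex
  std::deque<std::function<void()>> _fifo;
  std::atomic<int> _working{0};
};
```

`runOne()` raises the working count before it pops. A concurrent `shutdown()` therefore cannot see an empty FIFO and a zero count while a popped job has not yet started.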
Matthew Von-Maszewski 1c3672be75 revert accidental check-in of thread delete constant edit (#7208) 2018-11-04 10:36:36 -05:00
Simon c073b9dbbe Make ensureIndexOnCoordinator more robust (#7110) (#7130) 2018-10-30 11:25:06 +01:00
Matthew Von-Maszewski a1af2e305f Bugfix 3.4: update Scheduler thread logic based upon testing (#7056) 2018-10-24 22:49:28 +02:00
Lars Maier d7863b4583 Bug fix 3.4/cluster comm threads start stop (#6939)
* Start ClusterComm threads in `ClusterFeature::start`. Stop ClusterComm threads in `ClusterFeature::stop`.

* Do not free objects in `Scheduler::shutdown`. Let the `unique_ptr` do their job. Stop ClusterComm threads in `ClusterFeature::stop`, but free instance in `ClusterFeature::unprepare`.

* `io_context` may contain lambdas that hold `shared_ptr`s to `Tasks` that require a functional `VocBase` in their destructors.

* Clean up.
2018-10-19 13:12:51 +02:00
Jan 18de63c7c8 Feature 3.4/medium priority (#6910) 2018-10-18 17:08:39 +02:00
Matthew Von-Maszewski a9ce39f85c Bugfix 3.4: Merge scheduler changes by Michael & Frank into recent overlapping code changes (#6928)
* manual recreation of bug-fix-3.4/scheduler-high-low within recent Scheduler changes.

* restore Documentation that was unintentionally deleted
2018-10-16 22:51:00 +02:00
Matthew Von-Maszewski 25bc48e548 minimum fixes to clear scheduler timeout problem. (#6878) 2018-10-15 09:06:15 +02:00
Matthew Von-Maszewski 887822afa6 Bug fix 3.4: libcurl threading changes (#6829)
* enable the ability to push results processing to threads
* have ClusterComm push libcurl response processing to Scheduler threads
* tuning changes from Matthew and Michael
* give new defaults to minimum thread count
* create multiple ClusterCommThreads, each with own Communicator object
* put PR notes in change log
* correct spelling
* Also drain V8 queue.
* Add prio V8 to switch in canPostDirectly.
* Accept --server.minimal-threads even if maximal threads is not set.
* Reactivate stopping of threads.
2018-10-12 17:00:55 +02:00
Simon 05446dcac0 Bug fix 3.4/activefail debug (#6717) 2018-10-05 18:36:06 +02:00
Jan 5873f63a72 Bug fix/fixes 2908 (#6279) 2018-08-31 17:26:54 +02:00
Lars Maier 889ce78dcf Batch Handler V8-lane bug (#6196) 2018-08-21 13:52:14 +02:00
Jan 86204ed0b8 fix memory leaks in arangosh connections (#6160) 2018-08-17 08:48:54 +02:00
Jan d6a3b66e2a micro optimizations (#6162) 2018-08-16 08:50:16 +02:00
Frank Celler a688dc0962 Feature/remove job queue thread (#5986)
limiting V8 calls in flight
2018-08-10 12:17:43 +02:00
Jan 2cac8b8a51 honor some cppcheck recommendations (#5817) 2018-07-10 13:50:30 +02:00
Tobias Gödderz fc3e11dbbc Async AQL (#5806)
* Modified header to new initializeCursor API

* Adapted initializeCursor to DONE/WAITING API. Compiles but not tested and no one reacts to WAITING state, it is not returned anywhere yet

* Subqueries now expect a WAITING return from initialize cursor. However, they will just return a nullptr and pretend the query is empty; this will be fixed later

* First attempt to simulate thread waiting over information within the query

* Small fix to allow for isDirect handlers to go to sleep.

* Waiting in the necessary places now for the async request to be sent.

* Thank you, auto-casting compiler, you are totally right: I absolutely wanted to use this bool value as an index in my array. How could I possibly not want to use it here?

* Include cond-var header

* Fixed mutex/cond_var usage

* Added oldAPI wrappers in AQL Blocks for get/skip some variants. This Commit compiles but is NOT tested

* Let getSome now return unique_ptr of AqlItemsBlocks. Also implemented the async variant of getSome in subqueries.

* Removed all references to OLD implementations in AQL. only the base wrappers are allowed to call OLD functions from now on. Now the testing part starts

* Fixed endless virtual recursion

* Implemented new getOrSkip API in SortBlock

* Implemented new getOrSkip API in LimitBlock

* Initialize all variables

* Fixed logic bug in SubqueryBlock

* getBlock in ExecutionBlock now returns a state. All blocks need to handle this properly!

* Created a wrapper getBlockOld that serves the old sync API and is now used in AQL. To be replaced over time.

* Added IndexBlock::skipSome and IndexBlock::getSome

* getBlock now returns its old return value along with the state

* Switch from getBlockOld to getBlock in IndexBlock::skipSome

* Switch from getBlockOld to getBlock in IndexBlock::getSome

* ShortestPathBlock::skipSome is not implemented! Added a regression test

* Attempt to fix SubQueryResult memory management

* Fixed LIMIT Block

* Moved from ShortestPathBlock::getSomeOld to ::getSome

* Implemented ASYNC api on SingletonBlock

* Adapted EnumerateCollectionBlock to new async API

* Fixed FilterBlock and adapted return block to async API

* Adapted NORESULTS block to async AQL api.

* Adapted Modification Blocks to async API

* Fixed some initialize cursor functions to reset values required during get/skipSome

* First steps to adapt ClusterNodes to Async AQL api. Not there yet, need to implement the core still

* Added async implementation for xxxForShard in ClusterBlocks. This commit changes the internal logic of _doneForShard. Needs additional testing as soon as everything is in place.

* Adapted CalculationBlock to async API

* Adapted TraversalBlocks to ASYNC AQL. This is not optimal yet; we need a better decision on whether we are DONE or not on RETURN

* Adapted EnumerateListBlock to Async AQL api

* Adapted RemoteBlock to ASYNC API in getSome/skipSome. The whole thing is now LIVE in the cluster. Extensive testing to be started now

* Fixed IndexBlock WAITING behaviour if WAITING occurs during index processing

* Adapted IResearchViewBlock to ASYNC AQL API

* Fixed SortingGatherBlock in WAITING state.

* Adapted IResearch ExecutionBlockMock to Async API

* Unified the HASMORE/DONE distinction. Code is much more readable now and harder to get incorrect 👍

* Implemented only theoretically reachable function of non-void function.

* Fixed last commit

* Added inline TODO comments

* fix warning

* Fixed a clearing logic bug in RemoveNodes

* Fixed Error Handling in RemoteBlocks. Also fixed a logic bug (true/false simply has a 50% chance of getting it wrong) in Distribute and Scatter.

* remove unused methods

* Fixed failure test

* implement skipping

* Moved the Query Waiting out of the ExecutionEngine.

* changed one of the collect blocks

* Removed _upstreamState from ExecutionBlockMock, that is in the base-class now

* Added a Test Mock for an ExecutionBlock that simulates the WAITING/HASMORE/DONE API.

* do not check "hasMore" if not necessary

* Added DistinctCollectBlock::getOrSkipSome from ~Old and changed its return type

(still uses getBlockOld)

* Save state to resume in DistinctCollectBlock::getOrSkipSome

* Extracted redundant code

* fixed some ops

* added one more test

* fix endless blocking

* fix compile error

* fix test

* Refactored HashedCollectBlock::getOrSkipSome

* Return blocks to the manager

* Replaced usage of getBlockOld in HashedCollectBlock::getOrSkipSome

* remove unused shutdown calls, simplify ownership for expressions

* Removed superfluous variable

* Capture const variable by value

* Removed SortedCollectBlock::getOrSkipSomeOld in favour of getOrSkipSome

* Added a working version of SortedCollectBlock::getOrSkipSome

Has yet to be cleaned up

* Removed isTotalAggregation special treatment

* On no input, return a group of nulls (instead of no group at all)

* Bugfixes

* Simplified code

* Move return to the end, eliminate duplicate code

* Corrected skipped count in HashedCollectBlock

* Aligned getNextRow() implementations

* Added comments

* some cleanup

* fix potential memleak

* Bugfix

* Fixed failure tests

* Removed usage of getBlockOld in ExecutionBlock::getOrSkipSome

* Replaced hasMore with an async implementation (mostly)

* Removed getBlockOld()

* Added hasMoreState to the AQL API (and renamed hasMore methods to hasMoreState)

* RemoteBlock now uses the async hasMoreState route

* remove job queue

* options

* Bugfixes in the async implementation of LimitBlock

* LimitBlock::getOrSkipSome now always skips when calculating the fullcount

* fix compile warnings

* restrict threads

* Fixed api of Waiting ExecBlockMock. Unused yet

* Made SortedGatherBlock async-capable

* Removed nonEmptyIndex hack

* Removed duplicate traceGetSome~ calls, moved all to getSome

* Added asserts before replacing getNr*Registers

* Added a TODO note and a comment

* Removed getSomeWithoutRegisterClearoutOld()

* Removed skip()

* Removed common code by using getNr*Registers()

* Use getNr*Registers() in the TraversalBlock as well

* started to add lane

* started to add lane

* added lane

* completed lane

* removed debug output

* fixed merge

* Began working on a test suite for AQL tracing/profiling

* Added more tests and asserts in aql-profiler

* Made some ExecutionBlocks final

* Added a type enum to all blocks and the per-block stats

* Add block type to stats nodes when tracing AQL on block level

* Removed initializeCursor call from instantiateFromPlan

* Avoided additional getSome calls after DONE

* Added more profiler tests

* Refactored ExecutionBlock::getOrSkipSome and fixed two bugs

- set _upstreamState also when skipping
- explicitly use ExecutionBlock::getHasMoreState()

* Bugfix: update state

* Reuse parent _skipped wherever possible; rename where not (LimitBlock)

* Simplified SortedCollectBlock::getOrSkipSome and reused general pattern & code

* Implemented missing virtual function (with USE_FAILURE)

* Reset necessary values during initializeCursor

* Simplified code in EnumerateListBlock a little

* Added a test for DistinctCollectBlock in aql-profiler

* Avoid redundant getSome calls in DistinctCollectBlock

* fix compilation

* Fixed DistinctCollectBlock profiler test

* Added a second profiler test for the DistinctCollectBlock

* Added a profiler test for EnumerateCollectionBlock

* Bugfix in EnumerateListBlock

* added --server.fifoN-size

* Simplified EnumerateCollectionBlock::getSome

* Simplified EnumerateCollectionBlock::getSome, and return HASMORE less often when DONE

* Fix testEnumerateCollectionBlock1 for mmfiles

* do not pass by reference

* Fixed compile error

* fixed merge conflicts

* Added profiler tests for EnumerateCollectionBlock

* Test fix for mmfiles

* Fixed IResearch tests

* Bugfix in DistinctCollectBlock and a regression test

* Updated comment

* Bugfix for query statistics in cluster

* Check plan in distinct test

* Fix aql-profiler tests in cluster

* Removed unused line / bugfix for single server test runs

* This commit implements waking up of AQL queries. (#5651)

* Non-compiling intermediate commit for handover.

* Make branch compile again

* Started implementation of a continuable rest cursor handler by moving the callbacks to the outer part. This is not yet fully tested!

* Made finalizeExecute noexcept. We cannot react to these errors, as the response was potentially written already. Also introduced continueExecution in the RestHandler engine.

* First successful query wakeup.

* The wakeup callback now posts on the scheduler directly. A resthandler only needs to provide a callback that encapsulates the continueExecution call on this handler

* renamed finalizeExecute to shutdownExecute

* Added a differentiation between Handler and Callback in Query continuation. Handler will be posted in IO service. Callback will be executed directly

* fix audit log

* Removed callback from deleteQueryCursor. This cannot be waiting

* use CONDITION_LOCKER

* removed yet another thread-local variable

* Fixed forward declaration

* Made RestAqlHandler repeatable

* Use defer to close the query in RestAqlHandler. Now waiting will close the query as well.

* Added a mutex in the RestHandlers to make sure that only one thread is running in the RestHandler if the callback over the network is too fast

* Captured the GeneralCommTask if it is posted to a RestHandler. This is necessary in the PAUSED case

* Refactoring of _noLockHeader responsibilities. Now the BaseHandler selects them and resets them after it is done. Only Coordinators are allowed to define them if a query is loaded.

* Removed reaction to existing nolockheaders in Coordinator Query Planning Phase

* Removed incorrect assertion.

* Further refactoring of NoLockHeaders. Now there is a wrapper class around it which allows for debugging and logging. The state now seems to be better. Also all non-rest-handler triggered queries clean up the NoLockHeaders properly.

* Fixed UserManager, now deletes nolock headers properly

* Swing to the Symphony of Destruction

* Forgot about community build...

* Fixed compiling of Catch tests

* Fixed community build

* need thread for size

* Made the RestSimpleHandler repeatable

* Implemented dump and dumpSync in Cursors: dumpSync will block a thread, dump allows waiting. Only relevant for streaming cursors

* Reactivated StreamingCursors

* Removed debug output.

* Fixed false query continuation

* Reset thread output to non-debug

* Added missing return statements

* Allow some CollectionMethods to hand in a context that may contain a transaction. This is meant to honor nolock headers.

* Fixed hidden merge conflict

* Bugfix in aql-profiler.js: use plan.nodes order, not stats

* Added two profiler tests for filter

* Avoid too many getBlock calls in the FilterBlock

* Removed debug output

* RemoteBlock API will now send a done(bool) flag whenever we request documents from remote servers. It is possible that we are DONE and have a result. The pre-3.4.0 API uses exhausted, which is exclusive to a result. This API is still implemented for backwards compatibility.

* Implemented an executeSync function in AqlQuery. This will block the thread until query execution is complete

* Added another test for FILTER, and one test for the HashedCollectBlock

* Added more tests for HashedCollectBlock; avoid unnecessary getSome calls

* Added a profiler test for IndexBlock

* IndexBlock: avoid redundant getSome calls, added missing traceGetSomeEnd calls

* Added a second test profiling IndexBlock

* Added a third test for IndexBlock

* Moved general code to module

* Moved noncluster tests into a separate file

* Split aql-profiler testsuite into three files

* Added profiler tests for LimitBlock

* Added a test for NoResultsBlock

* Added profiler tests for TraversalBlock

* Shutdown of an AQL query is now asynchronous. However, in error cases it is still executed in a blocking way

* Optimized TraversalBlock getSome calls due to new (nightly) test results

* Fixed std::min calls I broke

* Let shutdown calls in AQL wait, if the query is executed successfully.

* Fixed queryResult going out of scope

* fix compile error through merge conflict with devel

* Fixed compiler warning "mismatching tags"

* Removed debug log output

* Added TODO notes

* Fixed test fail due to devel merge

* Fixed some invalid sync waiting implementations

* Added a profiler test for SortBlock

* Added profiler tests for SortedCollectBlock

* Fixed bug introduced by devel merge

* Fixed Remoteblocks ignoring errors!

* Added some more continue Callbacks in used places. And removed debug log

* Removed debug log output

* Suppress clang warnings

* Bugfix: use of invalid stack pointer

* Bugfix: RemoteBlock::shutdown now sends code as int, not string

* Revert "Suppress clang warnings"

This reverts commit 05591649c59743c992edd5e78814edc8ca2a83e0.

* Bugfix: cleanup state in RemoteBlock ::shutdown, ::getSome and ::skipSome

* Bugfix in Subquery shutdown: don't skip subquery shutdown when main query shutdown failed

* Allow copy elision
2018-07-09 14:24:10 +02:00
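The Async AQL commit above moves blocks to a three-state pull API (`DONE`, `HASMORE`, `WAITING`) in which `DONE` may still carry a final result. Below is a minimal sketch of how a block forwards `WAITING` and how a caller drains it. `ExecutionState` follows the state names in the commit, but `SlowUpstream`, `EvenFilter` and `drainAll()` are hypothetical. The retry loop stands in for the real wake-up callback.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of a DONE / HASMORE / WAITING pull API; the names are
// illustrative and do not match ArangoDB's ExecutionBlock signatures.
enum class ExecutionState { DONE, HASMORE, WAITING };
using Batch = std::pair<ExecutionState, std::vector<int>>;

// Stand-in for a remote upstream: every other call reports WAITING, as if a
// network request were still in flight.
class SlowUpstream {
 public:
  explicit SlowUpstream(std::vector<int> rows) : _rows(std::move(rows)) {}

  Batch getSome(std::size_t atMost) {
    if (!_responseArrived) {
      _responseArrived = true;  // pretend the reply lands before the next call
      return {ExecutionState::WAITING, std::vector<int>{}};
    }
    _responseArrived = false;  // the next batch needs a new request
    std::vector<int> out;
    while (out.size() < atMost && _pos < _rows.size()) {
      out.push_back(_rows[_pos++]);
    }
    // DONE may carry rows: the last batch and the end signal arrive together.
    ExecutionState state =
        _pos < _rows.size() ? ExecutionState::HASMORE : ExecutionState::DONE;
    return {state, std::move(out)};
  }

 private:
  std::vector<int> _rows;
  std::size_t _pos = 0;
  bool _responseArrived = false;
};

// A filter block that keeps only even values and passes WAITING straight up.
class EvenFilter {
 public:
  explicit EvenFilter(SlowUpstream& upstream) : _upstream(upstream) {}

  Batch getSome(std::size_t atMost) {
    Batch in = _upstream.getSome(atMost);
    if (in.first == ExecutionState::WAITING) {
      return in;  // nothing to do until the upstream is woken up
    }
    std::vector<int> kept;
    for (int v : in.second) {
      if (v % 2 == 0) kept.push_back(v);
    }
    return {in.first, std::move(kept)};
  }

 private:
  SlowUpstream& _upstream;
};

// Pulls until DONE and counts WAITING answers. A real engine would give the
// thread back to the scheduler instead of retrying in a loop.
template <typename Block>
std::pair<std::vector<int>, int> drainAll(Block& block, std::size_t batchSize) {
  assert(batchSize > 0);
  std::vector<int> all;
  int waits = 0;
  while (true) {
    Batch b = block.getSome(batchSize);
    all.insert(all.end(), b.second.begin(), b.second.end());
    if (b.first == ExecutionState::DONE) {
      return {all, waits};
    }
    if (b.first == ExecutionState::WAITING) {
      ++waits;
    }
  }
}
```

Reporting `DONE` together with the last rows saves one round trip compared with returning `HASMORE` and then an empty `DONE` batch.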
Frank Celler efc030ea87 Feature/remove event loop (#5565) 2018-06-11 11:46:17 +02:00
Frank Celler c5ac519d1c Bug fix for Read/Write race [WIP] (#5534)
* added wrapper, added asio_ns
* Temporarily fix condition variable bug in job queue.
* preparation for 3.3 back-port
* clang-format
* removed unnecessary check, this is now fixed by strand
* added missing RequestStatistics::SET_READ_END
* cosmetics
2018-06-08 10:51:54 +02:00
Frank Celler baec138715 clang-format 2018-06-07 09:26:11 +02:00
Simon ef87bb04a4 Allow RestHandler to pause execution (#5458) 2018-05-25 12:27:14 +02:00
Jan 8e6d5df129 fixed several minor compiler complaints (#5406) 2018-05-23 11:50:00 +02:00
Simon 17b1a2aafb Rest middleware refactoring (#5332) 2018-05-14 17:43:10 +02:00
jsteemann 7f8a1cc614 Merge branch 'bug-fix/add-missing-overrides-and-final' of https://github.com/arangodb/arangodb into devel 2018-05-07 23:02:46 +02:00
Simon fdee0544b7 Using asio::io_context::strands instead of locks (#5266)
* initial try adding strands

* working, stable amount of threads

* improve shell_client cluster

* Fixing some accounting for the scheduler

* Fix accounting

* Fixing wrong strand usage

* add missing return

* Fixing thread accounting

* More scheduler accounting issues

* Fixing various things

* Fixing some stuff

* Fixing some stuff

* Some more subtle bugfixes

* Some cleanup code

* fixing some stuff

* adding some more fixes

* Fixing possible issues

* Fixing missing _storeResult

* Fixing some stuff

* Reducing lambda stack, perhaps fixing hangups

* Fix writeunlocker

* Fixing possible issues

* adding some debugging stuff

* refactor sockets

* possible fixes

* Adding more job guards

* Fixing possible bug

* cleaning up some stuff

* working impl

* Remove debugging output

* Fixing build

* fixing import

* Fixing another bug

* removing debug log

* Removing examples

* Reverting scheduler working code

* Cleanup

* Addressing review comments
2018-05-07 15:58:19 +02:00
jsteemann 52de92d334 add missing override specifiers, add final specifiers 2018-05-04 09:01:50 +02:00
Jan 9c76613e63 fix premature unlock (#3802)
* fix some deadlocks found by evil lock manager (tm)

* fix duplicate lock

* fix indentation

* ensure proper lock dependencies

* fix lock acquisition

* removed useless comment

* do not lock twice

* create either a V8 transaction context or a standalone transaction context, depending on if we are called from within V8 or not

* AQL micro optimizations

* use explicit constructor

* only use V8DealerFeature's ConditionLocker for acquiring a free V8 context

entering and exiting the selected context is then done later on without having to hold the ConditionLocker

* remove some recursive locks

* Disable custom deadlock detection when Thread Sanitizer is enabled

* Changing ifdef's

* grr

* broke gcc

* Using atomic for ApplicationServer::_server

* fix premature unlock

* add some asserts

* honor collection locking in cluster

* yet one more lock fix

* removed assertion

* some more bugfixes

* Fixing assert

(cherry picked from commit 1155df173bfb67303077fbe04ee8d909517bfd21)
2017-12-13 13:27:42 +01:00
Jan 282be208cc remove TRI_usleep and TRI_sleep, and use std::this_thread::sleep_for … (#3817) 2017-12-06 18:43:49 +01:00
Jan 057e87f919 fix shutdown in case no threads can be started (#3648) 2017-11-10 10:21:51 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* auth = off no longer makes everyone superuser

* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
Jan 1ace247273 Bug fix/scheduling et al (#3161)
* added V8 context lifetime control options `--javascript.v8-contexts-max-invocations` and `--javascript.v8-contexts-max-age`

* make thread scheduling take into account most of the tasks dispatched via the io service
2017-08-30 10:40:02 +02:00
Frank Celler 6d08d4f4aa Bug fix/scheduler delete (#3077)
* removed delete call

* cleanup

* lower cpu activity of log thread too

* fix log messages

* do not enter threads into unordered_set, as it is unneeded

* do not compile in calls to disabled plan cache

* moved AQL regex cache from thread local variables to a class of its own

* more sensible thread creation and destruction
2017-08-25 12:00:17 +02:00
Jan 6180fcfdd1 Bug fix/prevent multiple journals (#3027)
* prevent multiple journals

* fix documentation

* remove _nrDesired, as it is not used anymore
2017-08-15 23:02:08 +02:00
jsteemann cbba71bb00 change feature order around 2017-05-10 14:29:20 +02:00
jsteemann 217d41f6f5 fix shutdown races 2017-05-09 10:24:40 +02:00
jsteemann 1d22f7bb61 yield on shutdown 2017-04-28 10:06:27 +02:00
jsteemann aa521d5412 better error messages 2017-04-27 14:47:19 +02:00
Frank Celler 45690bbbdd fixed queue size 2017-04-24 18:47:44 +02:00
Frank Celler 1a4675dfbf added queue size to statistics 2017-04-24 18:47:44 +02:00
Frank Celler d783d4ecae added queue time and request tracing with timings 2017-04-24 18:47:44 +02:00
jsteemann c5854d050b fix shutdown issue, modernize thread creation a bit 2017-04-19 16:57:53 +02:00
jsteemann 058e6002d3 try to fix shutdown races 2017-03-22 08:31:49 +01:00
jsteemann 93423ba273 try to fix shutdown races 2017-03-21 12:59:46 +01:00
Max Neunhoeffer 428b6aa67f Port thread fixes from 3.1 to devel. 2017-03-16 13:53:40 +01:00
Frank Celler 4a3fdf6351 raised default hard limit on threads for very small to 64 2017-02-23 09:08:39 +01:00