1
0
Fork 0
Commit Graph

284 Commits

Author SHA1 Message Date
Simon 35992ad67b Coordinator storage engine (#5405) 2018-05-22 19:30:27 +02:00
Kaveh Vahedipour c2c104d6b1 agency pool size had to be larger 1 (#5379) 2018-05-17 11:54:41 +02:00
Simon f2b952134f Fixing agency pool update (#5316) 2018-05-14 14:56:19 +02:00
Matthew Von-Maszewski 01cc6d2159 adjust thread crash logging for simple db server case. (#5234) 2018-05-02 22:35:20 +02:00
Jan 30b12e311b
Bug fix/remove most of aql js (#5223) 2018-04-30 11:17:11 +02:00
Simon 2e0e2574d4 Port from 3.3 (#5213) 2018-04-27 17:05:18 +02:00
Matthew Von-Maszewski a67df088b0 correct race condition leading to infinite job execution (#5201)
* fix infinite loop by setting _lastSyncTime within runBackgroundJob().  add code to make agency callback ignore _lastSyncTime limit.
* create change notes for this PR and previous PR 5114
2018-04-27 13:43:22 +02:00
Wilfried Goesgens ea507bb27c use std::chrono and std::date for date / time formatting (#5194) 2018-04-24 18:23:04 +02:00
Matthew Von-Maszewski a84f7805ad Feature/mv thread death logging (#5111)
* Initial low level interface for thread crash reporting (and management).
* Add a member version of isClusterRole()
* isolate heartbeat thread creation to new StartHeartbeatThread().  create heartbeat thread even if not a cluster or if an agent.
* update runDBServer() and runCoordinator() to shutdown more quickly by polling isStopping() at additional locations.
* copying updates from different branch / PR
* basic thread crash logging.  Not yet tied into Agency arangod or have any specific threads posting crashes
* make Supervision thread a CriticalThread
* sandwich CriticalThread between Thread and other classes to create long term, repeating thread crash reporting.
* restore code lost upon branch update relating to new startHeartbeatThread() function
* add CriticalThread.cpp to build
* add new runAgentServer() function to loop for Agents.  Make Heartbeat thread derive from CriticalThread.
* remove debug line
2018-04-23 15:50:14 +02:00
Simon 45fbed497b Supervision Job for Active Failover (#5066) 2018-04-23 12:49:41 +02:00
Matthew Von-Maszewski dd03ca5dd8 shutdown quicker on db server and coordinator heartbeat threads (#5114)
* shutdown quicker on db server and coordinator heartbeat threads

* Adjust the new condition variable usage to follow normal coding patterns.  Probably was ok.  But why take the chance.  And simplify future maintenance.
2018-04-20 15:00:08 +02:00
Simon 8be273efb8 Replication cleanup (#5105) 2018-04-17 08:17:42 +02:00
jsteemann 0bef01d85f add missing call to sendState 2018-04-04 17:29:18 +02:00
Max Neunhoeffer a84d758105
Increase a log level to warn (as in 3.3). 2018-03-23 12:47:40 +01:00
Max Neunhöffer 52dd3e82ce
Fix second cluster bootstrap hanger. (#4841)
Call loadCurrentDBServers regularly.
2018-03-15 00:44:15 +01:00
Kaveh Vahedipour 2e2d947c1c devel: fixed the missed changes to plan after agency callback is registred f… (#4775)
* fixed the missed changes to plan after agency callback is registred for create collection
* Force check in timeout case.
* Sort out RestAgencyHandler behaviour for inquire.
* Take "ongoing" stuff out of AgencyComm.
2018-03-14 12:01:17 +01:00
Jan 5a67a048c5
bump version number for all local DDL changes and tell agency (#4685)
this allows other listeners (e.g. for DC2DC) to get notified when
DDL operations are carried out locally and need to be applied remotely
2018-03-05 17:06:34 +01:00
Simon 345fc3c0b7 Refactor Authentication Layer (devel) (#4592)
* Cherry Picking LDAP changes

* Adding missing merges

* Fixing remaining mentions of FeatureCacheFeature

* Fix jslint

* Fixing some failed tests

* Fixing cluster authentication issue, red tests

* Fixing ldap testsuite, adding trace logging

* Fixint ldap tesuite setup and LDAP recognition

(cherry picked from commit 686d28a779)

* Fixing wrong assert

* Adding changelog entry, making requested changes from code review

* Fixing dump_authentication, fix typos

* improvements found during code review

* oops

* more use of sessionstorage

* fix tests

* Fixing broken handling, disallowing adding of local users when disabled

* Fixing testInvalidGrants

* Removing undefined auth level externally

* Fixing previous commit

* added tests for ldap search mode

* intentionally removed `after` methods from tests

because they are executed before the tests start
no cleanup is performed right now after the authentication tests
however, a cleanup is done at start of every test

* ldap tests all modes

* forward port changes from 3.3

* added generated files

* forward port missing changes for web UI

* added generated files

* added generated files
2018-02-28 13:24:28 +01:00
Simon 35136a89c0 Fix some problems with active failover (#4540) 2018-02-09 15:11:53 +01:00
jsteemann e1a5268fca fix Windows compile error 2017-12-13 18:18:45 +01:00
Jan 3143805c6f
make replication abortable (#4016) 2017-12-13 12:32:04 +01:00
Simon Grätzer dce677720d Fixing an issue with intermediate commits (#3975) 2017-12-12 09:17:18 +01:00
Jan 282be208cc
remove TRI_usleep and TRI_sleep, and use std::this_thread::sleep_for … (#3817) 2017-12-06 18:43:49 +01:00
Kaveh Vahedipour 11cfa74495 Bug fix/no support for inquiry in send transaction with failover less verbosity (#3847) 2017-11-30 14:54:22 +01:00
Kaveh Vahedipour 7015db790f replaced all AgencyGeneralTransactions by AgencyWriteTransaction (#3841) 2017-11-29 17:33:24 +01:00
Simon Grätzer 0e485f7441 Fixing collection name collection handling in Syncer (#3710) 2017-11-17 16:36:57 +01:00
m0ppers 6cbf7159be Feature/server mode (#3590)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* Fixing some bugs

* Fixing tests

* Canceling ongoing operations

* removing some unused code + some asserts

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* Current work

* proper ifs

* Migrate resthandler to c++ and implement mode

* readonly server mode spec test

* Add changelog

* change code so it expects a full object and not just a string

* Fix jslint
2017-11-10 17:56:21 +01:00
Jan 733f27e997 Bug fix/fix compilation with gxx7 (#3637) 2017-11-10 16:00:57 +01:00
Jan f6a90c4879 some cleanup (#3583) 2017-11-07 10:27:21 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* auth = off does not longer make everyone superuser

* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
Simon Grätzer 64e9377c05 Replacing /_api/collection with RestHandler (#3543) 2017-11-02 14:57:17 +01:00
Jan 4d03d3bd82
Bug fix/fixes 2410 (#3511)
* slightly clean up indexes

* proper stringification of error code

* improve diagnostic messages

* improve memory usage for incremental sync
2017-10-30 22:49:24 +01:00
Michael Hackstein 15d9a4be5f Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510) 2017-10-25 18:03:24 +02:00
Jan 720e6df82e Bug fix/fixes 1910 (#3471)
* properly initialize all properties

* use faster comparison

* properly detect and handle "method not allowed"

* code-style

* remove unused variable

* narrow variable scope

* handle non-existance of AuthenticationFeature

* remove dead code

* replace some C string handling with std::strings

* moved assertion to the correct place

* honor number of array members for IN operator

* slightly adjust error messages

* slighty adjust some error messages

* try to fix issue with lingering replication contexts on shutdown

* clean up heartbeat thread a little bit

* small fixes
2017-10-23 09:17:36 +02:00
Simon Grätzer fd3f9d99d9 Fixing webinterface access (#3464)
* intermediate commit

* Refactoring the ExecContext

* Fixing authentication

* Added start script

* some fixes

* fixed access to nullptr

* some c++

* fixed misleading message

* Made DatabaseGuard movable. Also adapted map insertions to _vocbase in Syncer classes, which failed to compile under older GCC versions

* added support for global flag to replication handler

* Started Refactoring in replication-static

* Fixing syncer code

* store applier configuration

* Static replication tests now test replication in a non system Database

* added flags to replication feature

* Adding some extra checks

* Fixing issue with rocksdb rest replication handler

* replication static now runs _system and otherdatabase replication tests.

* Fixing crash on startup

* Replication_sync now tests _system as well as other Database

* Fixing up heartbeat thread, adding global flag to rest handler

* Fixing wrong assert

* some cleanup, probably some tests are broken

* Made non-system db version of replication-ongoing tests

* fix determine-open-transaction

* Fixed ongoing tests. And added a test where we drop a database on slave while replication is still ongoing

* test fixes

* Activated ongoing other db tests. Also added a test that drops the DB on master, while the slave is still syncing.

* some better error reporting

* gradually switch to Result

* createCollection -> create

* re-activate using of collection ids for now

* enable auto-start

* Fixed create collection in replication ongoing test

* Added first draft of a test for global replication

* move to Result

* use system database for global applier

* improved error reporting

* fixed invalid URLs

* add test case filter

* load existing global applier configuration

* improve error reporting

* Added further tests for global replication

* Fixed global replication test, it now properly waits for replication. Timeouts after 10 seconds.

* Removed erronious assertion

* improve error reporting

* intermediate commit

* Added a test-case for global replication where the Master already has some data and the slave is clean

* fix deletion of replication contexts

* Fixed JSLint

* compiling code

* fix typo

* do not fail for global applier when no database is configured

* intermediate commit

* syncer supports switch for 3.3 / 3.2

* fixed errors

* Fixing some replication bugs

* Fixing some assertions

* Fixed missing commit markers

* Fixing assertion on database drop

* Attempt to fix deadlock in applier and assertion

* Fixing some stupid things

* Support for collection parameter

* Acidentally turned off some tests

* Grrr

* Fixing wrong method call

* Fixed startscript

* Fixed assignmet instead of equality check typo

* Added a test far interrupted replication. For now it justs tests basics on _system database.

* Improved index tests on replication.

* properly initialize variable

* fixed some replication problems

* MMFiles wal access support

* fix replication issues

* Started mmfiles replication support

* fixing a bug

* Fixing an issue

* fixing some mmfiles stuff

* fix test

* reload users

* prevent pure virtual method call

* intermediate commit

* Making from exclusive

* do not call getMasterState if child syncer

* some reformatting

* Adding global support for handleCommandSync

* Fixing assertion

* removing some debug logs

* Changing return codes

* Fixing some issues in the rest handler

* Make replication less susceptible to errors

* remove some debug output

* return last log tick

* remove waits from tests

* fix two tests

* changing header for open-transactions call

* some fixes

* fix test

* invalidate cached databases

* merging request and execcontext

* try to fix assertion error

* renamed method

* fix compile warning

* small changes

* Always use execcontext

* Fixing an assert

* fix replication issues

* try to fix collection lookups

* try to fix master/slave start

* Changing comments in heartbeat thread

* fix wrong signature of READ_LOCKER_EVENTUAL

* log server role in testing mode

* Fixed authentication, removed execContext in favor of request context

* Adding cluster rest api

* Fixing cluster rest handler

* Fixing cluster callback

* Some refactoring

* Queue creation is not a single operation

* Allowed for leader redirects

* Setting start of batch

* Disabling 2.8 compat tests

* fix start/stop bugs

* jslint

* various little changes

* add flag for exposing jwt

* indentation

* cleanup

* Some changed to guid

* fixing tcp to http, vst

* changed endpoint header

* small fixes

* Reorder servers by health status

* Higher timeout

* Changing error messages

* update the fromTick when fetching multiple batches from the coordinator

* more debug info

* Reducing copy pasted code

* change uid generation

* reducing logspam

* more exceptions for redirects

* more exceptions

* attempt to fix uniqids in cluster

* centralize printing of HTTP errors in replication

* debug output

* fix messages for authentication

* cleanup

* removing --cluster.my-id, --cluster.my-local-info

* Added leadership race to bootstrap, determine foxxmaster on boostrap, removing obsolete code

* improve error reporting in RestAqlHandler

* Changing heartbeat thread, fixing cluster_sync

* some more debug output

* added master

* attempt to make tests more deterministic

* added logging about indexes

* added some safety checks to the logger

* slighty better error messages

* fix location header for SSL

* fix error message

* try to make tests more deterministic

* change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED

* Fixing broken webinterface access

* reverting groovy change

* Fixing read-only internal users

* Using superuser rights for dashboard now

* Adding mode field to _admin/server/role

* added mode TRYAGAIN

* remove inventory lock (does not seem necessary here)

* remove invalid assertion

* fixing agency bugs

* Removing debug output

* return proper errors in case of "method not allowed"

* Fixed up some info messages

* jslint
2017-10-20 18:06:59 +02:00
Simon Grätzer 7c31960cf2 Feature/async failover (#3451) 2017-10-18 23:59:29 +02:00
Jan 1ace247273 Bug fix/scheduling et al (#3161)
* added V8 context lifetime control options `--javascript.v8-contexts-max-invocations` and `--javascript.v8-contexts-max-age`

* make thread scheduling take into account most of the tasks dispatched via the io service
2017-08-30 10:40:02 +02:00
Max Neunhöffer 2f874249bb Bug fix/adjust agency comm timeouts (#2765)
* Take out 503 timeouts altogether.
* Overhaul of AgencyComm::sendWithFailover loop.
* Let performRequests optionally ignore 404 coll not found.
* Fix error message "database not found" when AgencyComm failed.
* Add log entries in Agency if locks are acquired too slowly.
* Reexecute the javascript cluster sync stuff even if there was no plan/current change...So failed sync jobs can retry later...
* Cover callbacks in Communicator by lock. This fixes https://github.com/arangodb/planning/issues/370
* Put in delay in waiting for leader in agency test.
* Schmutz logging to heartbeat topic.
* Add more lock time diagnostic in agent.
* Switch on agencycomm tracing in coordinator.
2017-07-13 00:44:28 +02:00
Frank Celler bbe7484521 Feature/auth context (#2704)
* added read-only users
2017-07-02 23:15:57 +02:00
Andreas Streichardt 2e4f83fc08 Invalidate current coordinators on every 2nd heartbeatrun
needed for foxx resilience stuff
2017-05-11 18:35:33 +02:00
Kaveh Vahedipour 6ef775e52f fix bug with ioService gone before hearbeatthread 2017-04-25 08:54:33 +02:00
Kaveh Vahedipour 2dbd179656 Fix ocasionally hangling shut down on mac 2017-04-21 11:49:22 +02:00
Simon Grätzer 3eb1723dec Fixed edge index 2017-03-30 13:22:39 +02:00
Simon Grätzer 27c617fe10 Merge branch 'devel' of https://github.com/arangodb/arangodb into devel
# Conflicts:
#	3rdParty/V8/v8
#	arangod/Transaction/Methods.h
#	arangod/Utils/UserTransaction.h
#	arangod/V8Server/v8-collection.cpp
2017-03-01 14:52:35 +01:00
Kaveh Vahedipour 4cc830b0df merge from 3.1 2017-02-20 20:05:52 +01:00
Kaveh Vahedipour 7fbf9fb621 AgencyCallBacks registry and unregistry are more talkative than bool 2017-02-10 17:31:26 +01:00
Simon Grätzer edab268572 Merge branch 'devel' of https://github.com/arangodb/arangodb into devel
# Conflicts:
#	arangod/Aql/FunctionDefinitions.cpp
#	arangod/Aql/Functions.h
#	arangod/Utils/ExplicitTransaction.h
2017-02-10 15:21:24 +01:00
jsteemann d024a6d00a remove logging for non-topics 2017-02-10 09:32:50 +01:00
jsteemann 01d3ad67b1 Merge branch 'devel' of https://github.com/arangodb/arangodb into engine-api 2017-02-08 00:59:16 +01:00
Max Neunhoeffer 883c11ea45 Handle the case that ClusterComm is already shut down gracefully.
This touches every single place where ClusterComm is being used.
2017-02-07 15:31:40 +01:00