arangodb

Commit Graph

Author	SHA1	Message	Date
Jan	5abf0c1185	Bug fix/fixes 1511 (#3711 )	2017-11-16 14:18:51 +01:00
Max Neunhöffer	766ab7c8cf	Fix agency shutdown bug. (#3683 ) * Fix agency shutdown bug. * Remove precondition that was not needed in AgencyComm::removeValues. * Fail fatally if threads do not shut down.	2017-11-14 16:33:46 +01:00
Jan	e1ecc6b02c	fix some threading issues (#3659 )	2017-11-12 22:34:51 +01:00
Kaveh Vahedipour	c9621ff230	Feature/new agency checks for preconditions (#3612 )	2017-11-11 22:48:23 +01:00
Max Neunhöffer	bff630b332	Handle leader resignation race with redirectRequst. (#3663 )	2017-11-11 19:38:29 +01:00
Kaveh Vahedipour	7e816db51e	Bug fix/agency restart enhancements (#3619 ) * Removed unused active(...) method in Agent * Inception's restart from persistence allows peer with empty active RAFT list to join * Agency's UUID is persisted outside of the database comparable to coordinator and db server action. * Publicized Methods to UUID stuff in ServerState * Inception method documentation * added --agency.disaster-recovery-id to allow for specification of known former agency id. this is a very dangerous option potentially. * Delete a unused methods. * separate _id and _recoveryId * populating active list with entire pool * Improve logging. * reject gossip from unknown agent, if pool is complete	2017-11-10 23:40:26 +01:00
Jan	bef52d7dc3	Bug fix/cleanup after cppcheck (#3639 )	2017-11-10 13:53:28 +01:00
Max Neunhöffer	3c0ee6908b	Bug fix/lead to agent (#3541 )	2017-11-09 11:10:09 +01:00
Jan	98eecaae20	bug fix for agency precondition checks (#3579 )	2017-11-06 23:55:41 +01:00
Simon Grätzer	ee8209943f	Missing things for active / passive (#3578 ) * Switching from ttl to supervision based failover mechanism * Allowing canceling of ongoing actions * refactored asyncjobmanager * refactoring some code * adding read-only flag * catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode * fixing "createsANewDatabaseWithAnInvalidUser" * auth = off does not longer make everyone superuser * Fixing cluster_sync and maybe resilience	2017-11-04 20:30:23 +01:00
jsteemann	a5c777e565	fix broken inquiry results in AgencyComm	2017-10-26 20:10:54 +02:00
Max Neunhöffer	cb05d33e17	Term is a number not a string. (#3520 )	2017-10-26 12:02:38 +02:00
Max Neunhöffer	ee96c37237	Fix agency restart problems. (#3493 ) * Fix agency restart problems (port from a 3.2 fix). * Further fixes after Craneware rescue.	2017-10-25 18:05:58 +02:00
Michael Hackstein	15d9a4be5f	Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510 )	2017-10-25 18:03:24 +02:00
Jan	720e6df82e	Bug fix/fixes 1910 (#3471 ) * properly initialize all properties * use faster comparison * properly detect and handle "method not allowed" * code-style * remove unused variable * narrow variable scope * handle non-existance of AuthenticationFeature * remove dead code * replace some C string handling with std::strings * moved assertion to the correct place * honor number of array members for IN operator * slightly adjust error messages * slighty adjust some error messages * try to fix issue with lingering replication contexts on shutdown * clean up heartbeat thread a little bit * small fixes	2017-10-23 09:17:36 +02:00
Max Neunhöffer	67300f9d77	Add a hidden AGENCY_DUMP for agency emergency recovery. (#3474 )	2017-10-21 00:24:32 +02:00
Simon Grätzer	fd3f9d99d9	Fixing webinterface access (#3464 ) * intermediate commit * Refactoring the ExecContext * Fixing authentication * Added start script * some fixes * fixed access to nullptr * some c++ * fixed misleading message * Made DatabaseGuard movable. Also adapted map insertions to _vocbase in Syncer classes, which failed to compile under older GCC versions * added support for global flag to replication handler * Started Refactoring in replication-static * Fixing syncer code * store applier configuration * Static replication tests now test replication in a non system Database * added flags to replication feature * Adding some extra checks * Fixing issue with rocksdb rest replication handler * replication static now runs _system and otherdatabase replication tests. * Fixing crash on startup * Replication_sync now tests _system as well as other Database * Fixing up heartbeat thread, adding global flag to rest handler * Fixing wrong assert * some cleanup, probably some tests are broken * Made non-system db version of replication-ongoing tests * fix determine-open-transaction * Fixed ongoing tests. And added a test where we drop a database on slave while replication is still ongoing * test fixes * Activated ongoing other db tests. Also added a test that drops the DB on master, while the slave is still syncing. * some better error reporting * gradually switch to Result * createCollection -> create * re-activate using of collection ids for now * enable auto-start * Fixed create collection in replication ongoing test * Added first draft of a test for global replication * move to Result * use system database for global applier * improved error reporting * fixed invalid URLs * add test case filter * load existing global applier configuration * improve error reporting * Added further tests for global replication * Fixed global replication test, it now properly waits for replication. Timeouts after 10 seconds. 
* Removed erronious assertion * improve error reporting * intermediate commit * Added a test-case for global replication where the Master already has some data and the slave is clean * fix deletion of replication contexts * Fixed JSLint * compiling code * fix typo * do not fail for global applier when no database is configured * intermediate commit * syncer supports switch for 3.3 / 3.2 * fixed errors * Fixing some replication bugs * Fixing some assertions * Fixed missing commit markers * Fixing assertion on database drop * Attempt to fix deadlock in applier and assertion * Fixing some stupid things * Support for collection parameter * Acidentally turned off some tests * Grrr * Fixing wrong method call * Fixed startscript * Fixed assignmet instead of equality check typo * Added a test far interrupted replication. For now it justs tests basics on _system database. * Improved index tests on replication. * properly initialize variable * fixed some replication problems * MMFiles wal access support * fix replication issues * Started mmfiles replication support * fixing a bug * Fixing an issue * fixing some mmfiles stuff * fix test * reload users * prevent pure virtual method call * intermediate commit * Making from exclusive * do not call getMasterState if child syncer * some reformatting * Adding global support for handleCommandSync * Fixing assertion * removing some debug logs * Changing return codes * Fixing some issues in the rest handler * Make replication less susceptible to errors * remove some debug output * return last log tick * remove waits from tests * fix two tests * changing header for open-transactions call * some fixes * fix test * invalidate cached databases * merging request and execcontext * try to fix assertion error * renamed method * fix compile warning * small changes * Always use execcontext * Fixing an assert * fix replication issues * try to fix collection lookups * try to fix master/slave start * Changing comments in heartbeat thread * fix 
wrong signature of READ_LOCKER_EVENTUAL * log server role in testing mode * Fixed authentication, removed execContext in favor of request context * Adding cluster rest api * Fixing cluster rest handler * Fixing cluster callback * Some refactoring * Queue creation is not a single operation * Allowed for leader redirects * Setting start of batch * Disabling 2.8 compat tests * fix start/stop bugs * jslint * various little changes * add flag for exposing jwt * indentation * cleanup * Some changed to guid * fixing tcp to http, vst * changed endpoint header * small fixes * Reorder servers by health status * Higher timeout * Changing error messages * update the fromTick when fetching multiple batches from the coordinator * more debug info * Reducing copy pasted code * change uid generation * reducing logspam * more exceptions for redirects * more exceptions * attempt to fix uniqids in cluster * centralize printing of HTTP errors in replication * debug output * fix messages for authentication * cleanup * removing --cluster.my-id, --cluster.my-local-info * Added leadership race to bootstrap, determine foxxmaster on boostrap, removing obsolete code * improve error reporting in RestAqlHandler * Changing heartbeat thread, fixing cluster_sync * some more debug output * added master * attempt to make tests more deterministic * added logging about indexes * added some safety checks to the logger * slighty better error messages * fix location header for SSL * fix error message * try to make tests more deterministic * change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED * Fixing broken webinterface access * reverting groovy change * Fixing read-only internal users * Using superuser rights for dashboard now * Adding mode field to _admin/server/role * added mode TRYAGAIN * remove inventory lock (does not seem necessary here) * remove invalid assertion * fixing agency bugs * Removing debug output * return proper errors in case of "method not allowed" 
* Fixed up some info messages * jslint	2017-10-20 18:06:59 +02:00
Kaveh Vahedipour	428e163db9	Return the result of the inquiry (#3465 )	2017-10-20 15:01:32 +02:00
Jan	7840d3f824	Bug fix/fixes 1810 (#3460 ) * improve error reporting in RestAqlHandler * added logging about indexes * added some safety checks to the logger * slighty better error messages * fix location header for SSL * fix error message * try to make tests more deterministic * change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED	2017-10-19 11:28:01 +02:00
Simon Grätzer	7c31960cf2	Feature/async failover (#3451 )	2017-10-18 23:59:29 +02:00
Kaveh Vahedipour	46333a762f	Bug fix/agency restart after compaction and holes in log (#3413 ) * State fixes holes in RAFT index range * Avoid application of entries older than compaction index _cur and guard for unsigned overflow	2017-10-13 16:01:41 +02:00
m0ppers	bb1d303473	Cmake 5.0 complains about unused lambda captures (#3390 )	2017-10-13 12:20:48 +02:00
Max Neunhöffer	9a2385b941	Add host id detection and show in /_admin/cluster/Health. (#3389 )	2017-10-11 12:42:44 +02:00
Max Neunhöffer	d86f27bd19	Bug fix/agency leader timeouts (#3373 ) * Send out empty heartbeats regardless of non-empty AppendEntriesRPC. * Also improve logging: Note if a log in the empty heartbeat sending takes > 0.01 s. Clearly mark places where a leader resigns in logging. Log if no empty heartbeat is sent out. * Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses. * Add debug logging for _lastAcked and challengeLeadership. * Remove some unused code. Do not count ourselves in challengeLeadership. * Removal of entire activation/deactivation mechanisms in agency * TRI_microtime up to c++11 * added term to response to sendAppendEntries.	2017-10-06 10:11:51 +02:00
Max Neunhoeffer	af3f977997	Revert "Send out empty heartbeats regardless of non-empty AppendEntriesRPC." This reverts commit `e974501446`.	2017-10-02 15:02:15 +02:00
Max Neunhoeffer	2852f80b5a	Revert "Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses." This reverts commit `45d37edfb2`.	2017-10-02 15:02:06 +02:00
Max Neunhoeffer	45d37edfb2	Make leader more tolerant w.r.t. incoming AppendEntriesRPC responses.	2017-10-02 15:01:11 +02:00
Max Neunhoeffer	e974501446	Send out empty heartbeats regardless of non-empty AppendEntriesRPC. Also improve logging: Note if a log in the empty heartbeat sending takes > 0.01 s. Clearly mark places where a leader resigns in logging. Log if no empty heartbeat is sent out.	2017-10-02 14:14:41 +02:00
Max Neunhöffer	47f367d3f0	Bug fix/agency compactor deadlock (#3335 ) * Fix a deadlock between Agent thread and compactor thread. * Improve comments in header. * Organise clean shutdown of agency threads.	2017-09-28 12:20:57 +02:00
Max Neunhöffer	22e46978a6	Bug fix/sort out agency locks (#3306 ) New locking concept in Agency. Ensure empty heartbeats can be sent, answered and processed without long locks. Adjust logging. Fix compaction bugs.	2017-09-27 15:22:30 +02:00
Kaveh Vahedipour	3700f75b0c	State has to keep log for removeConflicts and acoording log all the way (#3249 )	2017-09-16 12:20:47 +02:00
Jan	5165155ed1	Bug fix/fixes 0609 (#3227 ) * do not use V8 variant of AQL functions in early optimization stage when a C++ variant is available * additionally, simplify AQL function definitions and aliases * warn when more than 90% of max mappings are in use * added C++ variant of replication catchup * added `--log.role` option * updated CHANGELOG * removed non-existing scheduler.threads option from config * removed useless __FILE__, __LINE__ invocations * updated CHANGELOG * allow a priority V8 context * remove TRI_CORE_MEM_ZONE * try to fix Windows errors & warnings * cleanup * removed memory zones altogether * exclude system collections from collection tests	2017-09-13 16:28:21 +02:00
Kaveh Vahedipour	627f344266	fixed a bug, where when servers failed, when also agency leadership c… (#3189 ) * fixed a bug, where when servers failed, when also agency leadership changes * redid entire design of checkDBServers/checkCoordinators. * comparison in supervision must be between oldPersisted and newHealth * UI stuff * UI stuff * FailedServer test needed adjustment * Hopefully final round * fixed supervision failure detection * FailedServer tests back to origin devel * oldNot documented among preconditions in Agency HTTP API docs * changed only look for status updated * non action line in api-cluster	2017-09-07 16:10:23 +02:00
Simon Grätzer	ffc465433a	No access collections Improvements (#3190 ) * consolidated EdgeDocumentToken * optimizing cluster traversal * adding skip collection checks * API cleanup * copying AQLValue to avoid use-after-free bugs * Fixing rocksdb SingleServerEdgeCursor * Fixing a collection resolving issue	2017-09-07 14:55:07 +02:00
Jan	0abbc3a3c6	fix duplicate mutex (#3215 )	2017-09-07 14:38:29 +02:00
Kaveh Vahedipour	e808867ddc	Bug fix/unordered map changes order in catch tests (#3175 ) * order of free and free2 changed with use of unordered_multimap vs multimap * fixing order independance for maps in catch tests fixed? * missing bits and pieces	2017-08-31 15:58:48 +02:00
Kaveh Vahedipour	00650e6a3f	Bug fix/agency mt fixes (#3158 ) * added debugging methods * try to fix invalid access in case of error * remove unused members * bugfixes and comments * all agency fixes in * merge bug * partially unguarded Agent::lead fixed * all agency fixes in * added nrBlocked to thread startup eval * added nrBlocked to thread startup eval * recombination of cases in State::get * some maps replaced with unordered_maps * optimized maps some	2017-08-30 10:43:51 +02:00
Kaveh Vahedipour	4c94a1c8ab	fixed a bug in create collection in cluster, where transaction result was not checked for success before access (#3137 )	2017-08-28 14:58:26 +02:00
Jan	47e29e6e1f	Bug fix/issues 1806 (#3069 ) * fix buffer overruns in linenoise for long input lines * don't make historian repeatedly print the same error messages that nothing can be done about * make the implementations of the logging operator<<s not throw exceptions, so that logging does throw exceptions as an unintended side effect * update CHANGELOG * improve error message * don't copy strings, but pass them by const reference	2017-08-18 22:58:09 +02:00
Kaveh Vahedipour	1d1e0f5a50	Feature/cluster id and extended health (#3046 ) * added unique id to cluster, added access to Health * added agents to health api * added agents to health api * added agents to health api * transaction information for api * agents listed like other servers * missing line through merge conflict	2017-08-18 11:13:23 +02:00
m0ppers	930dd8aad2	MSVC is pendantic (but right) (#3047 )	2017-08-17 21:25:34 +02:00
Frank Celler	e446cff433	added result code in error message	2017-08-12 10:52:32 +02:00
Jan	49fa0bf4b4	used the required mutex in Store::clear to avoid races (#2957 ) also added asserts for that the mutex is actually held everywhere where it is required	2017-08-05 17:15:51 +02:00
Jan	ed9d15156e	remove dependency on MMFiles features from non-MMFiles files (#2925 )	2017-08-01 22:16:43 +02:00
Kaveh Vahedipour	2bbb2224e8	agency size 1 issues on windows and arschlinux's hyping of the bleeding edge	2017-07-14 12:07:14 +02:00
Andreas Streichardt	f1a6f16135	Carrotfix to only stop inception thread if it has been started should really be done in the thread lib but that is a bit risky so shortly before the release	2017-07-14 10:20:49 +02:00
Max Neunhöffer	bf168a7496	Fix a bug that the URL was overwritten by redirect in AgencyComm. (#2794 )	2017-07-13 16:27:09 +02:00
Max Neunhöffer	2f874249bb	Bug fix/adjust agency comm timeouts (#2765 ) * Take out 503 timeouts altogether. * Overhaul of AgencyComm::sendWithFailover loop. * Let performRequests optionally ignore 404 coll not found. * Fix error message "database not found" when AgencyComm failed. * Add log entries in Agency if locks are acquired too slowly. * Reexecute the javascript cluster sync stuff even if there was no plan/current change...So failed sync jobs can retry later... * Cover callbacks in Communicator by lock. This fixes https://github.com/arangodb/planning/issues/370 * Put in delay in waiting for leader in agency test. * Schmutz logging to heartbeat topic. * Add more lock time diagnostic in agent. * Switch on agencycomm tracing in coordinator.	2017-07-13 00:44:28 +02:00
Kaveh Vahedipour	fd90318fd8	correct-funny-fail-rotation-after-compaction-bugfix (#2774 )	2017-07-12 22:39:23 +02:00
Jan	49d2313c2c	fix ub in agency compaction (#2736 )	2017-07-09 20:45:00 +02:00

1166 Commits