arangodb

Commit Graph

Author	SHA1	Message	Date
Lars Maier	2ed283ef3c	Fixing broken UI. (#7551 )	2018-11-29 19:31:45 +01:00
Kaveh Vahedipour	3225a7b16d	[3.4] Feature/engine version added to agent configuration (#7481 ) * agents' is obtained from leader's configuration * corrections in Supervision for advertised endpoints * change log * Updated Documentation for cluster/health. * Unified naming convention. * Fixed missing update of volatile fields. * Set version in right order. * Removed debug output. * Fixed jslint - missing ;	2018-11-29 12:00:47 +01:00
Lars Maier	154d449061	Export Version and Engine in Cluster Health. Additionally export `versionString` in registered Servers. (#7463 )	2018-11-27 09:15:38 +01:00
Kaveh Vahedipour	860fa21219	Bug fix 3.4/index readiness (#6716 ) * backport of test data generation for maintenance from devel * 3.4 working * fixing index use in cluster while still being built * fixed broken views * correct 200 for ensureIndex * merge with 3.4 * agency comm to handle replace in array * supervision changes * cluster info's exsureIndex * 3.4 ready * timeout * missing files from origin * neunhoef complaints * bogus entry * no need to wait for current once again * no longer necessary. done in IndexFactory now * correct comments * left overs * dead code revived * Move CHANGELOG entry to the right place.	2018-11-21 14:41:36 +01:00
Jan	2f9f168656	try not to throw so many exceptions from Supervision (#7226 )	2018-11-06 18:00:45 +01:00
Max Neunhöffer	fa683d3925	Remove a relic from early days in /Target/FailedServers. (#6689 ) * Remove a relic from early days in /Target/FailedServers. * Fix a test.	2018-10-09 13:49:38 +02:00
Lars Maier	03d5b26013	Increase current version when cleaning out a lost collection. (#6715 ) * Increase the current version rather than the plan version.	2018-10-04 13:49:54 +02:00
Lars Maier	09395e73de	Added try-catch-block. (#6649 ) * Added try-catch-block. * Removed debug output. * Deleted unneeded constructors. * Assignment operator deleted.	2018-09-28 17:09:50 +02:00
Lars Maier	0e9aa10c2a	Feature 3.4/cleanup lost collections (#6627 ) * Working draft: clean lost collections in supervision. * Added early exit as in spec. * Finished test. Fixed logging.	2018-09-27 10:35:39 +02:00
Simon	f79a7d1a8f	Fix crash on Agency / DBserver with user JWT tokens (#6595 )	2018-09-26 14:22:27 +02:00
Kaveh Vahedipour	2041e56f44	advertised endpoints (#6493 )	2018-09-14 10:05:46 +02:00
Kaveh Vahedipour	28754cbf15	Feature/schmutz plus plus (#5972 ) - Schmutz now called "Maintenance" and completely implemented in C++ - Fix index locking bug in mmfiles - Fix a bug in mmfiles with silent option and repsert - Slightly increase supervision okperiod and graceperiod	2018-08-24 12:15:35 +02:00
Simon	468231efc5	AQL Profiling code (#5165 ) * initial start of profiling * adding profiling code * Fixing remote block tracing, fixing width and units * Fixing some tests * Various fixes * adressing review comments	2018-04-24 16:17:30 +02:00
Matthew Von-Maszewski	a84f7805ad	Feature/mv thread death logging (#5111 ) * Initial low level interface for thread crash reporting (and management). * Add a member version of isClusterRole() * isolate heartbeat thread creation to new StartHeartbeatThread(). create heartbeat thread even if not a cluster or if an agent. * update runDBServer() and runCoordinator() to shutdown more quickly by polling isStopping() at additional locations. * copying updates from different branch / PR * basic thread crash logging. Not yet tied into Agency arangod or have any specific threads posting crashes * make Supervision thread a CriticalThread * sandwich CriticalThread between Thread and other classes to create long term, repeating thread crash reporting. * restore code lost upon branch update relating to new startHeartbeatThread() function * add CriticalThread.cpp to build * add new runAgentServer() function to loop for Agents. Make Heartbeat thread derive from CriticalThread. * remove debug line	2018-04-23 15:50:14 +02:00
Simon	45fbed497b	Supervision Job for Active Failover (#5066 )	2018-04-23 12:49:41 +02:00
Kaveh Vahedipour	3d043b35a3	Feature/supervsion maintenance mode (#5108 ) * Supervision goes to Maintenance mode, when /arango/Supervision/Maintenance exists * coordinator route stands * stop updates in transient, when supervision off	2018-04-20 13:23:22 +02:00
Matthew Von-Maszewski	c0c149cf5b	Create non-throwing wrappers for Node access in Agency (#4598 ) * safety checkin of Node throw reduction. * final round of Node throw protection. Common accessors now protected to force code to hasAsXXX() functions.	2018-04-17 10:21:14 +02:00
Kaveh Vahedipour	f4edcc7ba8	Bug fix/supervision engine starting early on leadership change (#5062 ) * supervision must not work as long as agent is still preparing * leadersince atomic and pushed to end of leader preparation * More consistent use of integer types. * Slightly change order of events in Supervision loop.	2018-04-10 15:28:26 +02:00
Kaveh Vahedipour	7f9786eb27	builder fixed for agency transaction. worked only for a single server. (#4436 )	2018-02-06 23:14:53 +01:00
Kaveh Vahedipour	7715c75c59	let's not miss failedserver removal (#4208 ) * let's not miss failedserver removal * remove resetting of FailedServers in test code * Only call abortRequestsToFailedServers at most every 3 seconds.	2018-01-03 21:55:40 +01:00
Matthew Von-Maszewski	ae77ff80c2	create independent executeLockedRead and executeLockedWrite to speed performance (#4177 )	2017-12-29 12:02:27 +01:00
Max Neunhöffer	7bae6980e8	Bug fix/agent lead hanger (#4147 ) * Really enforce the hidden option --server.maximal-threads if given. * Switch off --log.force-direct in scripts/startStandAloneAgency.sh * Lower the timeout for sending AppendEntriesRPC to 150s. * Erase _earliestPackage when becoming a leader. * Challenge leadership in agent main loop. * Use steady_clock for _earliestPackage. * Change _lastAcked and _leaderSince to steady_clock as well. * time difference calculations based on old readSystemClock to steadyClockToDouble * All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps * Inception system_clock to steady_clock	2017-12-27 16:45:39 +01:00
Matthew Von-Maszewski	8723df7681	Fix supervisor thread crash (#4083 ) * Server short name could arrive too late for first health check. Would lead to supervisor thread crash. Add test for this condition and defense against other unknown throws in health check. * Correct capitalization of ShortName. Add spaces to two Log lines.	2017-12-27 16:10:47 +01:00
Kaveh Vahedipour	ace06575dd	when upgrading from 3.1 LastHeartBeatAcked could also have been missing, when the 3.1 cluster had not run for long enough (#3757 )	2017-12-08 15:56:19 +01:00
Kaveh Vahedipour	c300eee5f0	minor (#3813 )	2017-11-27 18:22:13 +01:00
Kaveh Vahedipour	7b80deb5cc	Fixed object assignment operator for agency's key value store (#3701 ) * Fixed object assignment operator for agency's key value store * Node's toJson is now actually toJson. getString should be used for string extractions * adjust agency's documentation (clarify precondition)	2017-11-17 15:49:40 +01:00
Kaveh Vahedipour	255d90d26a	cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency (#3709 ) This is the HealthRecord upgrade patch.	2017-11-17 10:14:14 +01:00
Simon Grätzer	ee8209943f	Missing things for active / passive (#3578 ) * Switching from ttl to supervision based failover mechanism * Allowing canceling of ongoing actions * refactored asyncjobmanager * refactoring some code * adding read-only flag * catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode * fixing "createsANewDatabaseWithAnInvalidUser" * auth = off does not longer make everyone superuser * Fixing cluster_sync and maybe resilience	2017-11-04 20:30:23 +01:00
Michael Hackstein	15d9a4be5f	Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510 )	2017-10-25 18:03:24 +02:00
Simon Grätzer	7c31960cf2	Feature/async failover (#3451 )	2017-10-18 23:59:29 +02:00
Max Neunhöffer	9a2385b941	Add host id detection and show in /_admin/cluster/Health. (#3389 )	2017-10-11 12:42:44 +02:00
Kaveh Vahedipour	627f344266	fixed a bug, where when servers failed, when also agency leadership c… (#3189 ) * fixed a bug, where when servers failed, when also agency leadership changes * redid entire design of checkDBServers/checkCoordinators. * comparison in supervision must be between oldPersisted and newHealth * UI stuff * UI stuff * FailedServer test needed adjustment * Hopefully final round * fixed supervision failure detection * FailedServer tests back to origin devel * oldNot documented among preconditions in Agency HTTP API docs * changed only look for status updated * non action line in api-cluster	2017-09-07 16:10:23 +02:00
Kaveh Vahedipour	00650e6a3f	Bug fix/agency mt fixes (#3158 ) * added debugging methods * try to fix invalid access in case of error * remove unused members * bugfixes and comments * all agency fixes in * merge bug * partially unguarded Agent::lead fixed * all agency fixes in * added nrBlocked to thread startup eval * added nrBlocked to thread startup eval * recombination of cases in State::get * some maps replaced with unordered_maps * optimized maps some	2017-08-30 10:43:51 +02:00
Andreas Streichardt	8e15412e06	Wait for supervision node to prevent races	2017-06-09 15:52:29 +02:00
jsteemann	2930ab6b57	cppcheck	2017-05-15 22:39:16 +02:00
Andreas Streichardt	fe59502848	Fix server health	2017-05-11 12:20:15 +02:00
Kaveh Vahedipour	de77b5ec7a	getting rid of exceptions in supervision	2017-05-10 17:50:31 +02:00
Kaveh Vahedipour	b0e7ce40f0	avoid exceptions in supervision main thread when running without cluster	2017-05-04 14:37:03 +02:00
Kaveh Vahedipour	68efba18e8	keep agencyPrefix, when non set	2017-04-26 15:32:26 +02:00
jsteemann	4289105eb3	fix shutdown issue	2017-04-25 16:09:01 +02:00
Kaveh Vahedipour	09a6888d14	attempt at fixing shutdown bug on mac os x	2017-04-24 10:45:54 +02:00
jsteemann	ea8496f1a5	cppcheck	2017-04-21 20:19:36 +02:00
Kaveh Vahedipour	1f81ce28b0	merge in cpp & js from 3.1.18 yet to do tests	2017-04-21 15:41:05 +02:00
Kaveh Vahedipour	4cc830b0df	merge from 3.1	2017-02-20 20:05:52 +01:00
jsteemann	b3ac54d065	remove global namespace include	2017-02-13 13:03:33 +01:00
jsteemann	d024a6d00a	remove logging for non-topics	2017-02-10 09:32:50 +01:00
Andreas Streichardt	8349f56e40	Properly check return valiue	2017-02-07 15:15:56 +01:00
Kaveh Vahedipour	8d66d69f83	supervision handles coordinator demise correctly	2017-02-07 11:29:37 +01:00
Kaveh Vahedipour	f3cb1307a5	3.1 fixes backported to devel	2017-02-03 10:48:25 +01:00
jsteemann	fa917937c4	do not use namespaces in header files	2017-02-01 13:41:31 +01:00

210 Commits