Commit Graph

210 Commits

Author SHA1 Message Date
Lars Maier 2ed283ef3c Fixing broken UI. (#7551) 2018-11-29 19:31:45 +01:00
Kaveh Vahedipour 3225a7b16d [3.4] Feature/engine version added to agent configuration (#7481)
* agents' is obtained from leader's configuration
* corrections in Supervision for advertised endpoints
* change log
* Updated Documentation for cluster/health.
* Unified naming convention.
* Fixed missing update of volatile fields.
* Set version in right order.
* Removed debug output.
* Fixed jslint - missing ;
2018-11-29 12:00:47 +01:00
Lars Maier 154d449061 Export Version and Engine in Cluster Health. Additionally export `versionString` in registered Servers. (#7463) 2018-11-27 09:15:38 +01:00
Kaveh Vahedipour 860fa21219 Bug fix 3.4/index readiness (#6716)
* backport of test data generation for maintenance from devel
* 3.4 working
* fixing index use in cluster while still being built
* fixed broken views
* correct 200 for ensureIndex
* merge with 3.4
* agency comm to handle replace in array
* supervision changes
* cluster info's ensureIndex
* 3.4 ready
* timeout
* missing files from origin
* neunhoef complaints
* bogus entry
* no need to wait for current once again
* no longer necessary. done in IndexFactory now
* correct comments
* left overs
* dead code revived
* Move CHANGELOG entry to the right place.
2018-11-21 14:41:36 +01:00
Jan 2f9f168656 try not to throw so many exceptions from Supervision (#7226) 2018-11-06 18:00:45 +01:00
Max Neunhöffer fa683d3925 Remove a relic from early days in /Target/FailedServers. (#6689)
* Remove a relic from early days in /Target/FailedServers.
* Fix a test.
2018-10-09 13:49:38 +02:00
Lars Maier 03d5b26013 Increase current version when cleaning out a lost collection. (#6715)
* Increase the current version rather than the plan version.
2018-10-04 13:49:54 +02:00
Lars Maier 09395e73de Added try-catch-block. (#6649)
* Added try-catch-block.
* Removed debug output.
* Deleted unneeded constructors.
* Assignment operator deleted.
2018-09-28 17:09:50 +02:00
Lars Maier 0e9aa10c2a Feature 3.4/cleanup lost collections (#6627)
* Working draft: clean lost collections in supervision.
* Added early exit as in spec.
* Finished test. Fixed logging.
2018-09-27 10:35:39 +02:00
Simon f79a7d1a8f Fix crash on Agency / DBserver with user JWT tokens (#6595) 2018-09-26 14:22:27 +02:00
Kaveh Vahedipour 2041e56f44 advertised endpoints (#6493) 2018-09-14 10:05:46 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
- Fix index locking bug in mmfiles
- Fix a bug in mmfiles with silent option and repsert
- Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00
Simon 468231efc5 AQL Profiling code (#5165)
* initial start of profiling
* adding profiling code
* Fixing remote block tracing, fixing width and units
* Fixing some tests
* Various fixes
* addressing review comments
2018-04-24 16:17:30 +02:00
Matthew Von-Maszewski a84f7805ad Feature/mv thread death logging (#5111)
* Initial low level interface for thread crash reporting (and management).
* Add a member version of isClusterRole()
* isolate heartbeat thread creation to new StartHeartbeatThread().  create heartbeat thread even if not a cluster or if an agent.
* update runDBServer() and runCoordinator() to shutdown more quickly by polling isStopping() at additional locations.
* copying updates from different branch / PR
* basic thread crash logging.  Not yet tied into Agency arangod or have any specific threads posting crashes
* make Supervision thread a CriticalThread
* sandwich CriticalThread between Thread and other classes to create long term, repeating thread crash reporting.
* restore code lost upon branch update relating to new startHeartbeatThread() function
* add CriticalThread.cpp to build
* add new runAgentServer() function to loop for Agents.  Make Heartbeat thread derive from CriticalThread.
* remove debug line
2018-04-23 15:50:14 +02:00
Simon 45fbed497b Supervision Job for Active Failover (#5066) 2018-04-23 12:49:41 +02:00
Kaveh Vahedipour 3d043b35a3 Feature/supervsion maintenance mode (#5108)
* Supervision goes to Maintenance mode, when /arango/Supervision/Maintenance exists
* coordinator route stands
* stop updates in transient, when supervision off
2018-04-20 13:23:22 +02:00
Matthew Von-Maszewski c0c149cf5b Create non-throwing wrappers for Node access in Agency (#4598)
* safety checkin of Node throw reduction.
* final round of Node throw protection.  Common accessors now protected to force code to hasAsXXX() functions.
2018-04-17 10:21:14 +02:00
Kaveh Vahedipour f4edcc7ba8 Bug fix/supervision engine starting early on leadership change (#5062)
* supervision must not work as long as agent is still preparing
* leadersince atomic and pushed to end of leader preparation
* More consistent use of integer types.
* Slightly change order of events in Supervision loop.
2018-04-10 15:28:26 +02:00
Kaveh Vahedipour 7f9786eb27 builder fixed for agency transaction. worked only for a single server. (#4436) 2018-02-06 23:14:53 +01:00
Kaveh Vahedipour 7715c75c59 let's not miss failedserver removal (#4208)
* let's not miss failedserver removal
* remove resetting of FailedServers in test code
* Only call abortRequestsToFailedServers at most every 3 seconds.
2018-01-03 21:55:40 +01:00
Matthew Von-Maszewski ae77ff80c2 create independent executeLockedRead and executeLockedWrite to speed performance (#4177) 2017-12-29 12:02:27 +01:00
Max Neunhöffer 7bae6980e8 Bug fix/agent lead hanger (#4147)
* Really enforce the hidden option --server.maximal-threads if given.
* Switch off --log.force-direct in scripts/startStandAloneAgency.sh
* Lower the timeout for sending AppendEntriesRPC to 150s.
* Erase _earliestPackage when becoming a leader.
* Challenge leadership in agent main loop.
* Use steady_clock for _earliestPackage.
* Change _lastAcked and _leaderSince to steady_clock as well.
* time difference calculations based on old readSystemClock to steadyClockToDouble
* All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps
* Inception system_clock to steady_clock
2017-12-27 16:45:39 +01:00
Matthew Von-Maszewski 8723df7681 Fix supervisor thread crash (#4083)
* Server short name could arrive too late for first health check.  Would lead to supervisor thread crash.  Add test for this condition and defense against other unknown throws in health check.
* Correct capitalization of ShortName.  Add spaces to two Log lines.
2017-12-27 16:10:47 +01:00
Kaveh Vahedipour ace06575dd when upgrading from 3.1 LastHeartBeatAcked could also have been missing, when the 3.1 cluster had not run for long enough (#3757) 2017-12-08 15:56:19 +01:00
Kaveh Vahedipour c300eee5f0 minor (#3813) 2017-11-27 18:22:13 +01:00
Kaveh Vahedipour 7b80deb5cc Fixed object assignment operator for agency's key value store (#3701)
* Fixed object assignment operator for agency's key value store
* Node's toJson is now actually toJson. getString should be used for string extractions
* adjust agency's documentation (clarify precondition)
2017-11-17 15:49:40 +01:00
Kaveh Vahedipour 255d90d26a cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency (#3709)
This is the HealthRecord upgrade patch.
2017-11-17 10:14:14 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism
* Allowing canceling of ongoing actions
* refactored asyncjobmanager
* refactoring some code
* adding read-only flag
* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode
* fixing "createsANewDatabaseWithAnInvalidUser"
* auth = off no longer makes everyone superuser
* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
Michael Hackstein 15d9a4be5f Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510) 2017-10-25 18:03:24 +02:00
Simon Grätzer 7c31960cf2 Feature/async failover (#3451) 2017-10-18 23:59:29 +02:00
Max Neunhöffer 9a2385b941 Add host id detection and show in /_admin/cluster/Health. (#3389) 2017-10-11 12:42:44 +02:00
Kaveh Vahedipour 627f344266 fixed a bug, where when servers failed, when also agency leadership c… (#3189)
* fixed a bug, where when servers failed, when also agency leadership changes
* redid entire design of checkDBServers/checkCoordinators.
* comparison in supervision must be between oldPersisted and newHealth
* UI stuff
* UI stuff
* FailedServer test needed adjustment
* Hopefully final round
* fixed supervision failure detection
* FailedServer tests back to origin devel
* oldNot documented among preconditions in Agency HTTP API docs
* changed only look for status updated
* non action line in api-cluster
2017-09-07 16:10:23 +02:00
Kaveh Vahedipour 00650e6a3f Bug fix/agency mt fixes (#3158)
* added debugging methods
* try to fix invalid access in case of error
* remove unused members
* bugfixes and comments
* all agency fixes in
* merge bug
* partially unguarded Agent::lead fixed
* all agency fixes in
* added nrBlocked to thread startup eval
* added nrBlocked to thread startup eval
* recombination of cases in State::get
* some maps replaced with unordered_maps
* optimized maps some
2017-08-30 10:43:51 +02:00
Andreas Streichardt 8e15412e06 Wait for supervision node to prevent races 2017-06-09 15:52:29 +02:00
jsteemann 2930ab6b57 cppcheck 2017-05-15 22:39:16 +02:00
Andreas Streichardt fe59502848 Fix server health 2017-05-11 12:20:15 +02:00
Kaveh Vahedipour de77b5ec7a getting rid of exceptions in supervision 2017-05-10 17:50:31 +02:00
Kaveh Vahedipour b0e7ce40f0 avoid exceptions in supervision main thread when running without cluster 2017-05-04 14:37:03 +02:00
Kaveh Vahedipour 68efba18e8 keep agencyPrefix, when non set 2017-04-26 15:32:26 +02:00
jsteemann 4289105eb3 fix shutdown issue 2017-04-25 16:09:01 +02:00
Kaveh Vahedipour 09a6888d14 attempt at fixing shutdown bug on mac os x 2017-04-24 10:45:54 +02:00
jsteemann ea8496f1a5 cppcheck 2017-04-21 20:19:36 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Kaveh Vahedipour 4cc830b0df merge from 3.1 2017-02-20 20:05:52 +01:00
jsteemann b3ac54d065 remove global namespace include 2017-02-13 13:03:33 +01:00
jsteemann d024a6d00a remove logging for non-topics 2017-02-10 09:32:50 +01:00
Andreas Streichardt 8349f56e40 Properly check return value 2017-02-07 15:15:56 +01:00
Kaveh Vahedipour 8d66d69f83 supervision handles coordinator demise correctly 2017-02-07 11:29:37 +01:00
Kaveh Vahedipour f3cb1307a5 3.1 fixes backported to devel 2017-02-03 10:48:25 +01:00
jsteemann fa917937c4 do not use namespaces in header files 2017-02-01 13:41:31 +01:00