123 Commits

Author SHA1 Message Date
Lars Maier 6b04e3de03 Ported ResignLeadership to 3.4 (#9669)
* Ported ResignLeadership to 3.5

* Added http route.
2019-08-09 16:41:13 +02:00
Lars Maier 6733f2a54d Fixed lost precondition when removing server. (#8986) 2019-05-14 15:37:03 +02:00
Jan c56080bc2e use _system database in /_admin/cluster/health (#8897) 2019-05-06 09:14:15 +02:00
Max Neunhöffer 46e479376d Further supervision fixes. (#8259)
* Do not schedule Coordinators in Plan.

* Finish failed server when server is no longer in health.

* Fix removeServer checks.

Check that server is no longer in use before removing it. Give 60s
waiting time for condition to be met. Also observe agency lock.

* Finish FailedFollower job if server no longer follower.

This can happen because RemoveFollower was faster.

* Only use GOOD servers as replacement followers.

* Fix AddFollower for satellite collections.

* Fix RemoveServer for satellite collections.

* MoveShard handles moves from leader to followers

* Prepare CleanoutServer and FailedServer for satellite collections.

* More sorting out of AddFollower and RemoveFollower.

* Fix RemoveFollower job w.r.t. choice of follower to remove.

* Fix message.

* kill your own sub jobs, please

* Added preconditions to payloads for supervision's job finishers

* Improve logging.

* Add agency diagnostics to failed move shard test, start.

* Add coordinator agency diagnostics.

* Remove warning.

* Add changelog entry.

* Add agency diagnostics if things go sour with move shard.

* Add agency diags when things go wrong 2.

* API /_api/agency/state: back to old format.

* Fix Windows compilation.

* handle aborts in supervision and wait for the last Raft log to be committed

* tests compiling, 2 failing for valid reasons

* Correctly report TRI_ERROR_CLUSTER_CONNECTION_LOST as 503.

* FailedLeader/FailedFollower cannot continue when aborting blocks
2019-03-04 11:43:35 +01:00
Kaveh Vahedipour 3225a7b16d [3.4] Feature/engine version added to agent configuration (#7481)
* agents' is obtained from leader's configuration
* corrections in Supervision for advertised endpoints
* change log
* Updated Documentation for cluster/health.
* Unified naming convention.
* Fixed missing update of volatile fields.
* Set version in right order.
* Removed debug output.
* Fixed jslint - missing ;
2018-11-29 12:00:47 +01:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
 - Fix index locking bug in mmfiles
 - Fix a bug in mmfiles with silent option and repsert
 - Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00
Simon 18c3069117 Make cluster routes check roles (#6239) 2018-08-24 09:46:27 +02:00
Jan 102f15bece removed several unused internal APIs (#6193) 2018-08-20 12:57:58 +02:00
Max Neunhöffer a84d9f7335 Add an API to query for status of moveShard and cleanOutServer jobs. (#5593)
This is so far intentionally undocumented, since we want to collect
experience with it first.
2018-06-15 16:27:47 +02:00
Kaveh Vahedipour 3d043b35a3 Feature/supervsion maintenance mode (#5108)
* Supervision goes to Maintenance mode, when /arango/Supervision/Maintenance exists
* coordinator route stands
* stop updates in transient, when supervision off
2018-04-20 13:23:22 +02:00
Jan 5fd0bb7dbf removed remainders of dysfunctional `/_admin/cluster-test` and `/_admin/clusterCheckPort` API endpoints and removed them from documentation (#4861) 2018-03-18 22:48:09 +01:00
Simon 35136a89c0 Fix some problems with active failover (#4540) 2018-02-09 15:11:53 +01:00
Heiko 61de1b6099 Bug fix/optimize shard distribution api and ui (#3921)
* UI: document/edge editor now remembering their modes (e.g. code or tree)

* changed shardDistribution api behaviour, added PUT route to only fetch collection based shard distribution

* ui: optimized shards view, added missing cleanup function in nodes view

* broken test

* shard distribution tests did not fit the new api behaviour

* variables as reference

* CHANGELOG
2018-01-02 12:42:12 +01:00
Andreas Streichardt 5071a77340 Fix jslint 2017-11-21 11:36:25 +01:00
Andreas Streichardt 810ca8a9ef Fix removal of nodes 2017-11-20 17:13:37 +01:00
Michael Hackstein 5c633f9fae Bug fix/speedup shard distribution (#3645)
* Added a more sophisticated test for shardDistribution format

* Updated shard distribution test to use request instead of download

* Added a cxx reporter for the shard distribution. WIP

* Added some virtual functions/classes for Mocking

* Added a unittest for the new CXX ShardDistribution Reporter.

* The ShardDistributionReporter now reports Plan and Current correctly. However, it does not dare to find a good total/current value and just returns a default. Hence these tests are still red

* Shard distribution now uses the cxx variant

* The ShardDistribution reporter now tries to execute count on the shards

* Updated changelog

* Added error case tests. If the servers time out the mechanism will stop bothering after two seconds and just report default values.
2017-11-10 15:17:08 +01:00
Jan 7613bc4314 Bug fix/fixes 0211 (#3568)
* remove some non-unused V8 persistents

* do not throw that many bogus assertions

* do not rely on server role being defined

* slightly better debug output for V8 context debugging

* fix collection ids in inventory response

* simplify bootstrap a bit

* slightly better error handling

* make elapsed time a queryable value

* use less memory for stub collections

* added assertions that will always make sense

* added assertions

* do not garbage-collect while waiting

* less copying of parameters

* do not show "load indexes into memory" buttons for mmfiles engine

  as all indexes are in memory anyway

* when a collection is truncated via the web interface, flush the WAL and rotate all active journals

this will close all open journals on leader and followers and make them subject to compaction opportunities

* fix invalid server id values being passed from web interface to backend

* introduce afterTruncate method for indexes

* added test case for issue #3447

* updated CHANGELOG

* don't warn about replicationFactor for system collections

* check that the queries actually use the geo index and not some other index

* properly report error in web interface

* fix some internal checks that made truncate fail for bigger collections in maintainer mode

* also run a compact() operation after a serious truncate

in order to make iteration over the truncated range much faster
when the collection is next accessed

* increase default maximum number of V8 contexts to at least 16
2017-11-09 12:48:15 +01:00
Jan 53ea1ba560 remove dead apis (#3574) 2017-11-03 14:44:21 +01:00
Simon Grätzer 7c31960cf2 Feature/async failover (#3451) 2017-10-18 23:59:29 +02:00
Heiko a03a86fe46 fixed wrong selection of the database inside the internal cluster js api (#3202) 2017-09-13 17:19:18 +02:00
Kaveh Vahedipour 627f344266 fixed a bug, where when servers failed, when also agency leadership c… (#3189)
* fixed a bug, where when servers failed, when also agency leadership changes

* redid entire design of checkDBServers/checkCoordinators.

* comparison in supervision must be between oldPersisted and newHealth

* UI stuff

* UI stuff

* FailedServer test needed adjustment

* Hopefully final round

* fixed supervision failure detection

* FailedServer tests back to origin devel

* oldNot documented among preconditions in Agency HTTP API docs

* changed only look for status updated

* non action line in api-cluster
2017-09-07 16:10:23 +02:00
Kaveh Vahedipour 9cad75e4e8 Feature/cluster id and extended health (#3073)
* added unique id to cluster, added access to Health

* added agents to health api

* added agents to health api

* added agents to health api

* transaction information for api

* agents listed like other servers

* missing line through merge conflict

* fixed git merge glitch
2017-08-18 11:36:53 +02:00
Kaveh Vahedipour 1d1e0f5a50 Feature/cluster id and extended health (#3046)
* added unique id to cluster, added access to Health

* added agents to health api

* added agents to health api

* added agents to health api

* transaction information for api

* agents listed like other servers

* missing line through merge conflict
2017-08-18 11:13:23 +02:00
Kaveh Vahedipour 231a360b3b fixes for secondaries 2017-07-11 14:05:51 +02:00
Andreas Streichardt 7f8ff26f41 Fix jslint 2017-05-11 15:34:56 +02:00
Andreas Streichardt 9ec902ace8 Allow removing dbnodes 2017-05-11 15:18:29 +02:00
hkernbach d7bfe4dfc7 Merge branch 'devel' of github.com:arangodb/arangodb into devel 2017-05-10 14:26:45 +02:00
hkernbach 38a3bab42e added api cluster api routes 2017-05-10 14:26:26 +02:00
Andreas Streichardt 23911d66d0 Finally fix the error where suddenly an array of dbservers are being called 2017-05-10 12:10:09 +02:00
Max Neunhoeffer bf2e5f30ca Port 3.1 fixes to devel, shortName translation. 2017-04-26 10:02:14 +02:00
Max Neunhoeffer c8a205b1aa New /_api/cluster/endpoints.
Also fix documentation (and deprecate) /_api/endpoint.
2017-03-15 13:33:50 +01:00
Andreas Streichardt e345415879 Allow removing dbservers which are no longer in use 2017-02-13 17:23:39 +01:00
Andreas Streichardt 1b92c8e46b Allow removing dbservers 2017-02-13 17:23:39 +01:00
jsteemann 5a1fe3a341 jslint 2017-02-13 14:35:14 +01:00
Andreas Streichardt 1bb8f97773 Fix secondaries 2017-02-13 14:00:19 +01:00
Andreas Streichardt cf39ddd39a Fix jslint 2017-02-10 19:43:04 +01:00
Andreas Streichardt bc1df86ddd remove lock queries 2017-02-10 19:34:29 +01:00
Andreas Streichardt f1da0c54f6 jslint 2017-02-08 15:07:47 +01:00
Andreas Streichardt edd27e7eff more remove checks 2017-02-08 15:07:47 +01:00
Andreas Streichardt ad1c7e7c13 oh jslint 2017-02-08 10:49:36 +01:00
Andreas Streichardt 2e3026db91 Make it possible to remove failed coordinators 2017-02-08 10:26:13 +01:00
hkernbach 4ae274fbd3 fixed cluster parsing issue when node information is not available 2017-01-10 17:07:51 +01:00
Andreas Streichardt 975e7a65b2 Fix cluster internal requests in authentication enabled cluster 2017-01-05 16:20:15 +01:00
Kaveh Vahedipour 805f894368 fixed faulty output when primary db server from changeSecondary api could not be found 2016-11-30 11:27:54 +01:00
Andreas Streichardt 2abc46f3e6 fix shardview...no more endless loops during shard rebalancing...also
fix error management when trying to reach dbservers
2016-11-21 17:47:06 +01:00
jsteemann aa5bd85b4f fix log message 2016-11-21 14:06:10 +01:00
jsteemann 63f0b53db5 removed useless log message 2016-11-21 14:04:16 +01:00
hkernbach 23e15889c7 fixed shardDistribution info route 2016-10-25 13:40:28 +02:00
Kaveh Vahedipour 72bf15c118 Fixed moveShard to do distributeShardsLike in start instead of create 2016-10-06 15:32:41 +02:00
Kaveh Vahedipour ce8c1a0cac revisiting all supervision jobs 2016-10-05 17:16:02 +02:00