1
0
Fork 0
Commit Graph

161 Commits

Author SHA1 Message Date
Max Neunhöffer 328f46e3d6 This merges hotbackup and atomic-db-creation into 3.5. (#9968)
* Squashed commit of feature-3.5/hotbackup_devel.

This puts hotbackup into 3.5.

* Port atomic-database-creation-2 to 3.5.

* Remove some wrongly ported code.

* Fix compilation.

* Fix a manual merge error.

* Remove a feature from the mocks which does not exist in 3.5.

* Add some code which was forgotten in manual merge.

* Fix a problem introduced in a manual merge.

* reuse function

* Address some whitespace issues that came up in review

* aardvark should not create the frontend collection

* create _frontend collection from c++

* recheckAndUpdate Callback in CollectionWatcher

* Wrong author ;)

* rm outdated todo

* Update lib/Basics/VelocyPackHelper.h

Co-Authored-By: Michael Hackstein <michael@arangodb.com>

* use logger unique id, use startup logger

* not needed

* optimized vector shardid method

* do not create _modules collection lazy anymre

* Formatting.

* Assert instead of if/TRI_ASSERT(false)

* Don't use exceptions as control structure

* Re-add READ_LOCKER that got lost in translation

* Fix audit log in case database creation fails early.

* legacy sharding

* Add CHANGELOG entry.

* Retry database cancellation indefinitely

* Do not use exceptions in UpgradeTask

* DropCollection is a FAST_LANE action and should not need much time or else retry.

* Remove superflous addition of LdapFeature

Proudly brought to you by ASAN tests

* Fixed check for distributShardsLike sharding on _system database

* Fixed compile issue on tests

* Removed assertion that seems to be not correct yet on devel.

* Sort out google cloud storage as remote. (#9918)

* Add successful method to ClusterCommResult.
* Improve error forwarding for cluster internal communication.

* Feature/hotbackup list retries (#9924)

* retry hot backup listing for 2 minutes in cluster before giving up

* Enable api by default.

* fix broken list of non existing id (#9957)

* Fix compilation after manual merge.

* Fix another compilation problem.

* Yet more fixes for compilation.

* More compilation fixes.
2019-09-11 13:13:54 +03:00
Jan 2fe0527d25 check for duplicate server endpoints on cluster startup (#9863)
* check for duplicate server endpoints on cluster startup

* Update arangod/Cluster/ServerState.cpp
2019-09-03 11:49:08 +03:00
Jan d2efc6cea8 fix return value when reboot id cannot be retrieved (#9858) 2019-08-30 12:26:00 +03:00
KVS85 e64080e207
Merge 3.5.1 back to 3.5 (#9713)
* Bug fix 3.5/make arangosh reconnect (#9615)

* make arangosh reconnect

* added CHANGELOG entry

* fix lagging AgencyCallbacks (#9620)

* fix lagging AgencyCallbacks

* optimizations, discussed with @mchacki

* fix wording

* updated CHANGELOG

* fix yet another undefined behavior (#9629)

* [3.5.1] Fail the FailedLeader Job if the new leader fails. (#9628)

* Fail the FailedLeader Job if the new leader fails.

* Updated changelog.

* In case of timeout do not rollback.

* Fixed catch tests.

* Changed wording.

* DELETED rollback.

* reduce wait timeouts as a mitigation for notifying waiters without ho… (#9619)

* reduce wait timeouts as a mitigation for notifying waiters without holding the required mutex

this is a quick mitigation only, which reduces maximum wait time from 1
second to 100 milliseconds without changing other behavior.

the main problem of notifying pending writers without successfully
acquiring the required mutex still needs proper addressing.

* adjust timing-dependent test

* [3.5.1] Fast Controlled Leaderchange (#9634)

* First draft of keeping in sync during controlled leader change.

* Test if server is actually the leader in plan.

* Updated changelog.

* Added oldLeader check for set-the-leader request.

* Small fixes.

* Removed LOG_DEVEL.

* less copying, more moving! 🚚 (#9645)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* Port TakeoverShardLeadership from devel to 3.5.1 (#9659)

* Create TakeoverShardLeader job.
* Add TakeoverShardLeadership to Action factory.
* Add log message at level debug.
* Sort out LOG_TOPIC ids.
* Fix unit tests.
* CHANGELOG.

* Bug fix 3.5/hide mmfiles specific info in web ui (#9668)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* hide MMFiles-specific information when we don't need it

* Ported ResignLeadership to 3.5 (#9656)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* Ported ResignLeadership to 3.5

* Add the actual http route.

* Aardvark: Add k Shortest Paths example graph to UI (#9491) (#9661)

* Aardvark: Add k Shortest Paths example graph to UI (#9491)

* Add example graph to UI

* Add kShortestPathsGraph to examples.js

* Update example-graph.js

* Update aardvark.js

* Regenerate UI

* add the ability to have cluster special examples (#9613) (#9663)

* add the ability to have cluster special examples

* Update get_cluster_health.md

* fix abort condition, fix negative filtering for cluster tests

* Test if job fails with unmet assertion

* Remove cluster test example

* germanize

* better skip reasons

* removing superfluous semicolons

* Revert skip reasons, too noisy

* various replication improvements: (#9675)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* various replication improvements:

- better debuggability (more log details)
- shorter minimum wait delay in active failover
- fixed too early pruning of WAL files on leaders

* Bug fix 3.5/fix rocksdb return code (#9692)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* fix return codes for concurrent writes to same documents

* [3.5] Feature/rebootid notice changes, backport of #9523 (#9684)

* Feature/rebootid notice changes, backport of #9523

* Fixed error code to not re-use an old one

* Bug fix 3.5/issue 9679 (#9682)

* attempt to fix load_balancing tests in slow test environments (#9626)

* Bug fix/fix swagger datatype (#9045) (#9602)

* Bug fix/fix swagger datatype (#9045)

* remove http so https arangos will work

* verify that query parameters are proper swagger data types, fix offending documentation files

* return the actual type - not the list of available ones

* check formats

* there is no uint64 in swagger

* Fresh Swagger

* fixed issue #9679

* bug-fix/issue-#9660 (#9704) (#9707)

* bug-fix/issue-#9660 (#9704)

* fix issue

* Update tests/js/common/aql/aql-view-arangosearch-cluster.inc

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* Update tests/js/common/aql/aql-view-arangosearch-noncluster.js

Co-Authored-By: Jan <jsteemann@users.noreply.github.com>

* fix cluster tests

* Update CHANGELOG

* [3.5] agency node fixes (#9698)

* node fixes port from 3.4
* fixed change log

* update rocksdb statistics to deliver sums from column family instead of single value from default family. (#9706)

* Feature 3.5/geo functions (#9710)

* Add support for WGS84 on distances (#9672)

* Add area calculations (#9693)

* Update CHANGELOG
2019-08-14 20:24:47 +03:00
Max Neunhöffer 4da1a6afdf
Coordinators do not unregister at every shutdown. (#9134)
* Coordinators do not unregister at every shutdown.

Instead they create a new short name with every start.
This is needed for the transactions.

* Always new short id for coordinators. Never for DBServers!
2019-05-29 22:55:22 +02:00
Lars Maier c99e8e8973 [devel] ClientID Agency Transaction (#8652)
* Changed clientId to format <serverid>:<uuid>.
* Changed behavior if id is not known.
2019-04-30 10:39:23 +02:00
Jan Christoph Uhde c3f7961b88 apply unique log ids (#8561) 2019-03-25 20:26:51 +01:00
Simon 49cc3bcd1e Refactorings from cluster trx improvement branch (#8391) 2019-03-14 23:13:17 +01:00
Dan Larkin-York 4459cde000 Make coordinator short ID transient (#8234) 2019-03-01 09:48:36 +01:00
Tobias Gödderz a1d3bc3e94 Foxx queue jobs hanging after Foxxmaster crash (#7922)
* Fixed bug where the Foxxmaster doesn't reset jobs after a crash when it should, or a non-master coordinator removes jobs in progress during startup

* Added a regression test

* Updated CHANGELOG

* Fixed non-maintainer compile
2019-01-14 16:08:08 +01:00
Frank Celler ac9f375fb5 big reformat 2018-12-26 00:54:03 +01:00
Tobias Gödderz ceeae07ffe Reload Foxx routes during startup (#7533) 2018-12-10 12:41:50 +01:00
Jan 14c598c194
allow using UTF8 filenames for UUID directory (#7568) 2018-11-30 16:44:04 +01:00
Lars Maier f3ade0f860 Version/Engine Cluster Health (#7474)
* Export Version and Engine in Cluster Health. Additionally export `versionString` in registered Servers.

* Updated Changelog.
2018-11-27 14:56:00 +01:00
Wilfried Goesgens 0a7c7446af Bug fix/less exceptions (#7385) 2018-11-21 12:00:14 +01:00
Simon cb4c07e0ed Replace engine equality feature (#6931)
* replace engine equality feature

* remove pointless code
2018-10-17 14:41:47 +02:00
Jan 165bf3bd1b
fix arangojs issue 573 (#6767) 2018-10-10 09:19:24 +02:00
Dan Larkin-York 1f63f16396 Move some logging off of general topic. 2018-10-01 13:28:11 -04:00
Wilfried Goesgens a477df49cf Feature/windows utf16 fileaccess (#6534) 2018-09-24 19:41:17 +02:00
Max Neunhöffer 84735955ea Add advertised endpoints. (#6104) 2018-09-13 16:30:55 +02:00
Max Neunhöffer bdf8c7d1a4
Wait 2s after switching server mode before answering. (#6390)
This is needed because the change is propagated via the agency and the
heartbeat, which only happens once per second.
2018-09-05 17:04:27 +02:00
Kaveh Vahedipour 28754cbf15 Feature/schmutz plus plus (#5972)
- Schmutz now called "Maintenance" and completely implemented in C++
 - Fix index locking bug in mmfiles
 - Fix a bug in mmfiles with silent option and repsert
 - Slightly increase supervision okperiod and graceperiod
2018-08-24 12:15:35 +02:00
Jan 4d4135d25c
Feature/add dbserver as an alias for primary (#6072)
* add "DBSERVER" as an alias for "PRIMARY"

This allows specifying the value "DBSERVER" for `--cluster.my-role`.
"DBSERVER" is only treated as an alias for "PRIMARY", because several
other parts of the code and APIs use the string "PRIMARY".
Changing these from "PRIMARY" to "DBSERVER" would make the change
downwards-incompatible, which we do not want.

The downside of this alias-only solution is that even when specifying
a role value of "DBSERVER", the server will still report its role as
"PRIMARY", which may be a bit confusing. The server will also generate
its id as "PRMR-XXXX" as before:

    2018-08-03T15:23:09Z [9584] INFO {cluster} Starting up with role PRIMARY
    2018-08-03T15:23:09Z [9584] INFO {cluster} Cluster feature is turned on. Agency version: {"server":"arango","version":"3.4.devel","license":"enterprise"}, Agency endpoints: http+tcp://[::]:4001, server id: 'PRMR-f655b728-4cea-44ac-88e9-8b34baa80958', internal address: tcp://[::1]:8629, role: PRIMARY

* adjusted documentation to use "DBSERVER" instead of "PRIMARY"

* api doc

- secondary role not used anymore. stated.
- primary database is not clear. replaced with dbserver
- brief referenced only dbserver and coordinator - better to provide wider description, in line with what is described below, as other roles can be returned

* typo

* typo

* added starting from 3.4

* additional warning

* cited in the release note
2018-08-06 17:20:50 +02:00
Jan 1c8f6a75dd
Bug fix/fix issue 6076 (#6082) 2018-08-06 14:27:25 +02:00
Max Neunhöffer 014c3f7f53
Only load Plan and Current in ClusterInfo when actually needed. (#5649)
* Only update Plan and Current from Agency if not already done.
* Add read protection for getPlanVersion and getCurrentVersion.
* Add a further check to loadPlan and loadCurrent.
* Fix tests to new behaviour.
* Try to increase Plan/Version and Current/Version with every change.
* Add two more increments of Plan/Version
* Add missing increments in tests for Plan/Version.
* Add changelog entry.
2018-07-16 12:20:13 +02:00
Dan Larkin-York 8b0cb1c657 Restrict cursors to generating user (#5744) 2018-07-03 17:44:15 +02:00
Dan Larkin-York 21e16a8a24 Add load balancer awareness for cursor API (#5682) 2018-07-03 14:29:09 +02:00
Simon 545561e9a9 Read only server (#5652) 2018-07-03 09:58:16 +02:00
jsteemann a14b67f584 use nullptr 2018-05-25 18:55:37 +02:00
Simon 8be273efb8 Replication cleanup (#5105) 2018-04-17 08:17:42 +02:00
Jan 76dcd6ded5
added option `--cluster.require-persisted-id` (#5001) 2018-04-13 11:08:49 +02:00
Jan 7cb115a1a9
remove option `--cluster.my-local-info` (#4999) 2018-04-03 17:34:08 +02:00
Jan 9c76613e63
fix premature unlock (#3802)
* fix some deadlocks found by evil lock manager (tm)

* fix duplicate lock

* fix indentation

* ensure proper lock dependencies

* fix lock acquisition

* removed useless comment

* do not lock twice

* create either a V8 transaction context or a standalone transaction context, depending on if we are called from within V8 or not

* AQL micro optimizations

* use explicit constructor

* only use V8DealerFeature's ConditionLocker for acquiring a free V8 context

entering and exiting the selected context is then done later on without having to hold the ConditionLocker

* remove some recursive locks

* Disable custom deadlock detection when Thread Sanitizer is enabled

* Changing ifdef's

* grr

* broke gcc

* Using atomic for ApplicationServer::_server

* fix premature unlock

* add some asserts

* honor collection locking in cluster

* yet one more lock fix

* removed assertion

* some more bugfixes

* Fixing assert

(cherry picked from commit 1155df173bfb67303077fbe04ee8d909517bfd21)
2017-12-13 13:27:42 +01:00
Jan 282be208cc
remove TRI_usleep and TRI_sleep, and use std::this_thread::sleep_for … (#3817) 2017-12-06 18:43:49 +01:00
Kaveh Vahedipour 7015db790f replaced all AgencyGeneralTransactions by AgencyWriteTransaction (#3841) 2017-11-29 17:33:24 +01:00
m0ppers 6cbf7159be Feature/server mode (#3590)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* Fixing some bugs

* Fixing tests

* Canceling ongoing operations

* removing some unused code + some asserts

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* Current work

* proper ifs

* Migrate resthandler to c++ and implement mode

* readonly server mode spec test

* Add changelog

* change code so it expects a full object and not just a string

* Fix jslint
2017-11-10 17:56:21 +01:00
Kaveh Vahedipour b09c329a7c fixed 10 min dead time, when bogus --cluster.my-local-info specification (#3643) 2017-11-10 16:05:03 +01:00
Jan 7613bc4314 Bug fix/fixes 0211 (#3568)
* remove some non-unused V8 persistents

* do not throw that many bogus assertions

* do not rely on server role being defined

* slightly better debug output for V8 context debugging

* fix collection ids in inventory response

* simplify bootstrap a bit

* slightly better error handling

* make elapsed time a queryable value

* use less memory for stub collections

* added assertions that will always make sense

* added assertions

* do not garbage-collect while waiting

* less copying of parameters

* do not show "load indexes into memory" buttons for mmfiles engine

  as all indexes are in memory anyway

* when a collection is truncated via the web interface, flush the WAL and rotate all active journals

this will make close all open journals on leader and followers and make them subject to compaction opportunities

* fix invalid server id values being passed from web interface to backend

* introduce afterTruncate method for indexes

* added test case for issue #3447

* updated CHANGELOG

* don't warn about replicationFactor for system collections

* check that the queries actually use the geo index and not some other index

* properly report error in web interface

* fix some internals checks that made truncate fail for bigger collections in maintainer mode

* also run a compact() operation after a serious truncate

in order to make iteration over the truncated range much faster
when the collection is next accessed

* increase default maximum number of V8 contexts to at least 16
2017-11-09 12:48:15 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* auth = off does not longer make everyone superuser

* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
Simon Grätzer 3e211f0d9e Fix --cluster.my-local-info (#3481)
* Compatibility for old startup methods.

* removed my-local-info from docs
2017-10-23 12:20:17 +02:00
Simon Grätzer 7c31960cf2 Feature/async failover (#3451) 2017-10-18 23:59:29 +02:00
Max Neunhöffer 9a2385b941 Add host id detection and show in /_admin/cluster/Health. (#3389) 2017-10-11 12:42:44 +02:00
Jan 5165155ed1 Bug fix/fixes 0609 (#3227)
* do not use V8 variant of AQL functions in early optimization stage when a C++ variant is available

* additionally, simplify AQL function definitions and aliases

* warn when more than 90% of max mappings are in use

* added C++ variant of replication catchup

* added `--log.role` option

* updated CHANGELOG

* removed non-existing scheduler.threads option from config

* removed useless __FILE__, __LINE__ invocations

* updated CHANGELOG

* allow a priority V8 context

* remove TRI_CORE_MEM_ZONE

* try to fix Windows errors & warnings

* cleanup

* removed memory zones altogether

* exclude system collections from collection tests
2017-09-13 16:28:21 +02:00
Kaveh Vahedipour c1abc0333d cluster documentation varnish (#2553) 2017-06-12 19:02:11 +02:00
jsteemann 1df46f8923 Merge branch 'devel' of https://github.com/arangodb/arangodb into engine-api 2017-04-21 16:59:32 +02:00
Kaveh Vahedipour 1f81ce28b0 merge in cpp & js from 3.1.18 yet to do tests 2017-04-21 15:41:05 +02:00
Simon Grätzer b304d35ac2 Added rocksdb background thread 2017-04-20 18:50:52 +02:00
Andreas Streichardt e3d8f19368 Fix unused variables 2017-02-13 15:22:58 +01:00
Andreas Streichardt 1bb8f97773 Fix secondaries 2017-02-13 14:00:19 +01:00
Jan Christoph Uhde 16a9ddd78d fix ServerState.cpp 2017-02-10 20:52:26 +01:00