1
0
Fork 0
Commit Graph

1858 Commits

Author SHA1 Message Date
Heiko a16fd972eb optimized shard distribution api and ui (#4312) 2018-01-16 20:54:07 +01:00
Jan b8c043945f
try to not fail hard when a collection is dropped while the WAL is tailed (#4225) 2018-01-04 16:31:17 +01:00
Kaveh Vahedipour 56a9ad69b1 Bug fix 3.3/supervision no longer fails to remove server from failed when back to good (#4210)
* let's not miss failedserver removal
* remove resetting of FailedServers in test code
* Only call abortRequestsToFailedServers at most every 3 seconds.
2018-01-03 21:55:01 +01:00
Matthew Von-Maszewski d35ebbe6a1 move from devel to 3.3 the dynamic chooseTimeout() feature. (#4166) 2017-12-27 16:36:42 +01:00
Matthew Von-Maszewski f35215ea51 Have twice seen coordinator go into long loop on shutdown. Added two tests for isStopping() to break the loops. (#4139) 2017-12-21 20:56:14 +01:00
Jan 9c5893e7a7
fix premature unlock (#3802) (#4027)
* fix some deadlocks found by evil lock manager (tm)

* fix duplicate lock

* fix indentation

* ensure proper lock dependencies

* fix lock acquisition

* removed useless comment

* do not lock twice

* create either a V8 transaction context or a standalone transaction context, depending on if we are called from within V8 or not

* AQL micro optimizations

* use explicit constructor

* only use V8DealerFeature's ConditionLocker for acquiring a free V8 context

entering and exiting the selected context is then done later on without having to hold the ConditionLocker

* remove some recursive locks

* Disable custom deadlock detection when Thread Sanitizer is enabled

* Changing ifdef's

* grr

* broke gcc

* Using atomic for ApplicationServer::_server

* fix premature unlock

* add some asserts

* honor collection locking in cluster

* yet one more lock fix

* removed assertion

* some more bugfixes

* Fixing assert

(cherry picked from commit 1155df173bfb67303077fbe04ee8d909517bfd21)
2017-12-13 18:46:14 +01:00
Jan e7f93d8b9d
potentially fix send request timeout (#4025)
* potentially fix send request timeout

* fix globallyUniqueIds
2017-12-13 18:45:31 +01:00
Simon Grätzer cfd500c53c Fixing an issue with intermediate commits (#4006)
* Fixing an issue with intermediate commits

(cherry picked from commit 7f61fb9be0f22904be21e9dd26c3870035a5b318)

* Only check collection counts without intermediate commits

* Fixing assert
2017-12-12 23:15:41 +01:00
Jan d080350afa
make replication abortable (#4015) 2017-12-12 23:15:22 +01:00
Jan 29d81302ce Bug fix 3.3/agency transactions (#3936)
* Added 503/0 inquire tests to resilience tests (#3839)

* replaced all AgencyGeneralTransactions by AgencyWriteTransaction (#3841)
2017-12-07 10:39:36 +01:00
Kaveh Vahedipour f7b4150b64 no clientId anymore in send/sendWithFailOver SPIs (#3819) 2017-11-28 10:47:36 +01:00
Kaveh Vahedipour 27cd691bbf Bug fix/agencycomm validate methods broken (#3805) 2017-11-27 14:18:25 +01:00
Matthew Von-Maszewski 874e1d8eb0 correct stupid mistake: was erasing iterator in middle of same iterators loop (#3740) 2017-11-20 10:24:53 +01:00
Matthew Von-Maszewski 0558b4372d Improve ClusterComm::wait() (#3722) 2017-11-17 22:46:54 +01:00
Simon Grätzer 0e485f7441 Fixing collection name collection handling in Syncer (#3710) 2017-11-17 16:36:57 +01:00
Jan 4a199e2fe3 make truncate a bit less obstrusive (#3721) 2017-11-16 20:28:33 +01:00
Jan 5abf0c1185 Bug fix/fixes 1511 (#3711) 2017-11-16 14:18:51 +01:00
Michael Hackstein 3911a6a722 Fixed minor issue in shard distribution reporter thanks to @danielhlarkin for spotting (#3695) 2017-11-14 16:47:25 +01:00
Jan 8ece5914a3 do not warn when the user attempts to drop a non-existing index (#3682) 2017-11-13 17:47:57 +01:00
Jan ba9bc41457 fix some typos in code and docs (#3671) 2017-11-13 17:33:36 +01:00
Frank Celler 813f5c8541
fixed output (#3668) 2017-11-12 18:06:56 +01:00
Kaveh Vahedipour 7e816db51e Bug fix/agency restart enhancements (#3619)
* Removed unused active(...) method in Agent
* Inception's restart from persistence allows peer with empty active RAFT list to join
* Agency's UUID is persisted outside of the database comparable to coordinator and db server action.
* Publicized Methods to UUID stuff in ServerState
* Inception method documentation
* added --agency.disaster-recovery-id to allow for specification of known former agency id. this is a very dangerous option potentially.
* Delete a unused methods.
* separate _id and _recoveryId
* populating active list with entire pool
* Improve logging.
* reject gossip from unknown agent, if pool is complete
2017-11-10 23:40:26 +01:00
m0ppers 6cbf7159be Feature/server mode (#3590)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* Fixing some bugs

* Fixing tests

* Canceling ongoing operations

* removing some unused code + some asserts

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* Current work

* proper ifs

* Migrate resthandler to c++ and implement mode

* readonly server mode spec test

* Add changelog

* change code so it expects a full object and not just a string

* Fix jslint
2017-11-10 17:56:21 +01:00
Kaveh Vahedipour b09c329a7c fixed 10 min dead time, when bogus --cluster.my-local-info specification (#3643) 2017-11-10 16:05:03 +01:00
Jan 733f27e997 Bug fix/fix compilation with gxx7 (#3637) 2017-11-10 16:00:57 +01:00
Michael Hackstein 5c633f9fae Bug fix/speedup shard distribution (#3645)
* Added a more sophisticated test for shardDistribution format

* Updated shard distribution test to use request instead of download

* Added a cxx reporter for the shard distribuation. WIP

* Added some virtual functions/classes for Mocking

* Added a unittest for the new CXX ShardDistribution Reporter.

* The ShardDsitributionReporter now reports Plan and Current correctly. However it does not dare to find a good total/current value and just returns a default. Hence these tests are still red

* Shard distribution now uses the cxx variant

* The ShardDistribution reporter now tries to execute count on the shards

* Updated changelog

* Added error case tests. If the servers time out the mechanism will stop bothering after two seconds and just report default values.
2017-11-10 15:17:08 +01:00
Jan 7613bc4314 Bug fix/fixes 0211 (#3568)
* remove some non-unused V8 persistents

* do not throw that many bogus assertions

* do not rely on server role being defined

* slightly better debug output for V8 context debugging

* fix collection ids in inventory response

* simplify bootstrap a bit

* slightly better error handling

* make elapsed time a queryable value

* use less memory for stub collections

* added assertions that will always make sense

* added assertions

* do not garbage-collect while waiting

* less copying of parameters

* do not show "load indexes into memory" buttons for mmfiles engine

  as all indexes are in memory anyway

* when a collection is truncated via the web interface, flush the WAL and rotate all active journals

this will make close all open journals on leader and followers and make them subject to compaction opportunities

* fix invalid server id values being passed from web interface to backend

* introduce afterTruncate method for indexes

* added test case for issue #3447

* updated CHANGELOG

* don't warn about replicationFactor for system collections

* check that the queries actually use the geo index and not some other index

* properly report error in web interface

* fix some internals checks that made truncate fail for bigger collections in maintainer mode

* also run a compact() operation after a serious truncate

in order to make iteration over the truncated range much faster
when the collection is next accessed

* increase default maximum number of V8 contexts to at least 16
2017-11-09 12:48:15 +01:00
Jan f6a90c4879 some cleanup (#3583) 2017-11-07 10:27:21 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* auth = off does not longer make everyone superuser

* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
m0ppers b552853909 Feature/optional replication factor enforcement (#3570) 2017-11-02 21:41:25 +01:00
Simon Grätzer 64e9377c05 Replacing /_api/collection with RestHandler (#3543) 2017-11-02 14:57:17 +01:00
Jan 4d03d3bd82
Bug fix/fixes 2410 (#3511)
* slightly clean up indexes

* proper stringification of error code

* improve diagnostic messages

* improve memory usage for incremental sync
2017-10-30 22:49:24 +01:00
Simon Grätzer d14710b683 Updating Replication Factor (#3528)
* Changing replication factor in 3.2 (#3513)

* Allow changing replicationFactor on coordinator

* Fixing logic

* Allowing change of replication factor

* Additional input validation

* grrr

* Testing invalid inputs

(cherry picked from commit 47600e1ea1)

* jslint

(cherry picked from commit 4d223597db)

* cherry-pick commits from 3.2: 47600e1ea1 4d223597db d684eaa7f8 a898af3723
2017-10-26 15:28:05 +02:00
Jan e9ade3f02d increase timeout for truncate operation in cluster (#3515) 2017-10-26 10:29:18 +02:00
Jan 9eb5e1545f Bug fix/fix cluster foxx queue startup (#3507)
* slightly clean up indexes

* proper stringification of error code

* do not run AQL queries during Foxx queue startup

* always create keyspace
2017-10-25 18:07:27 +02:00
Michael Hackstein 15d9a4be5f Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510) 2017-10-25 18:03:24 +02:00
Simon Grätzer 3e211f0d9e Fix --cluster.my-local-info (#3481)
* Compatibility for old startup methods.

* removed my-local-info from docs
2017-10-23 12:20:17 +02:00
Jan 720e6df82e Bug fix/fixes 1910 (#3471)
* properly initialize all properties

* use faster comparison

* properly detect and handle "method not allowed"

* code-style

* remove unused variable

* narrow variable scope

* handle non-existance of AuthenticationFeature

* remove dead code

* replace some C string handling with std::strings

* moved assertion to the correct place

* honor number of array members for IN operator

* slightly adjust error messages

* slighty adjust some error messages

* try to fix issue with lingering replication contexts on shutdown

* clean up heartbeat thread a little bit

* small fixes
2017-10-23 09:17:36 +02:00
Simon Grätzer fd3f9d99d9 Fixing webinterface access (#3464)
* intermediate commit

* Refactoring the ExecContext

* Fixing authentication

* Added start script

* some fixes

* fixed access to nullptr

* some c++

* fixed misleading message

* Made DatabaseGuard movable. Also adapted map insertions to _vocbase in Syncer classes, which failed to compile under older GCC versions

* added support for global flag to replication handler

* Started Refactoring in replication-static

* Fixing syncer code

* store applier configuration

* Static replication tests now test replication in a non system Database

* added flags to replication feature

* Adding some extra checks

* Fixing issue with rocksdb rest replication handler

* replication static now runs _system and otherdatabase replication tests.

* Fixing crash on startup

* Replication_sync now tests _system as well as other Database

* Fixing up heartbeat thread, adding global flag to rest handler

* Fixing wrong assert

* some cleanup, probably some tests are broken

* Made non-system db version of replication-ongoing tests

* fix determine-open-transaction

* Fixed ongoing tests. And added a test where we drop a database on slave while replication is still ongoing

* test fixes

* Activated ongoing other db tests. Also added a test that drops the DB on master, while the slave is still syncing.

* some better error reporting

* gradually switch to Result

* createCollection -> create

* re-activate using of collection ids for now

* enable auto-start

* Fixed create collection in replication ongoing test

* Added first draft of a test for global replication

* move to Result

* use system database for global applier

* improved error reporting

* fixed invalid URLs

* add test case filter

* load existing global applier configuration

* improve error reporting

* Added further tests for global replication

* Fixed global replication test, it now properly waits for replication. Timeouts after 10 seconds.

* Removed erronious assertion

* improve error reporting

* intermediate commit

* Added a test-case for global replication where the Master already has some data and the slave is clean

* fix deletion of replication contexts

* Fixed JSLint

* compiling code

* fix typo

* do not fail for global applier when no database is configured

* intermediate commit

* syncer supports switch for 3.3 / 3.2

* fixed errors

* Fixing some replication bugs

* Fixing some assertions

* Fixed missing commit markers

* Fixing assertion on database drop

* Attempt to fix deadlock in applier and assertion

* Fixing some stupid things

* Support for collection parameter

* Acidentally turned off some tests

* Grrr

* Fixing wrong method call

* Fixed startscript

* Fixed assignmet instead of equality check typo

* Added a test far interrupted replication. For now it justs tests basics on _system database.

* Improved index tests on replication.

* properly initialize variable

* fixed some replication problems

* MMFiles wal access support

* fix replication issues

* Started mmfiles replication support

* fixing a bug

* Fixing an issue

* fixing some mmfiles stuff

* fix test

* reload users

* prevent pure virtual method call

* intermediate commit

* Making from exclusive

* do not call getMasterState if child syncer

* some reformatting

* Adding global support for handleCommandSync

* Fixing assertion

* removing some debug logs

* Changing return codes

* Fixing some issues in the rest handler

* Make replication less susceptible to errors

* remove some debug output

* return last log tick

* remove waits from tests

* fix two tests

* changing header for open-transactions call

* some fixes

* fix test

* invalidate cached databases

* merging request and execcontext

* try to fix assertion error

* renamed method

* fix compile warning

* small changes

* Always use execcontext

* Fixing an assert

* fix replication issues

* try to fix collection lookups

* try to fix master/slave start

* Changing comments in heartbeat thread

* fix wrong signature of READ_LOCKER_EVENTUAL

* log server role in testing mode

* Fixed authentication, removed execContext in favor of request context

* Adding cluster rest api

* Fixing cluster rest handler

* Fixing cluster callback

* Some refactoring

* Queue creation is not a single operation

* Allowed for leader redirects

* Setting start of batch

* Disabling 2.8 compat tests

* fix start/stop bugs

* jslint

* various little changes

* add flag for exposing jwt

* indentation

* cleanup

* Some changed to guid

* fixing tcp to http, vst

* changed endpoint header

* small fixes

* Reorder servers by health status

* Higher timeout

* Changing error messages

* update the fromTick when fetching multiple batches from the coordinator

* more debug info

* Reducing copy pasted code

* change uid generation

* reducing logspam

* more exceptions for redirects

* more exceptions

* attempt to fix uniqids in cluster

* centralize printing of HTTP errors in replication

* debug output

* fix messages for authentication

* cleanup

* removing --cluster.my-id, --cluster.my-local-info

* Added leadership race to bootstrap, determine foxxmaster on boostrap, removing obsolete code

* improve error reporting in RestAqlHandler

* Changing heartbeat thread, fixing cluster_sync

* some more debug output

* added master

* attempt to make tests more deterministic

* added logging about indexes

* added some safety checks to the logger

* slighty better error messages

* fix location header for SSL

* fix error message

* try to make tests more deterministic

* change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED

* Fixing broken webinterface access

* reverting groovy change

* Fixing read-only internal users

* Using superuser rights for dashboard now

* Adding mode field to _admin/server/role

* added mode TRYAGAIN

* remove inventory lock (does not seem necessary here)

* remove invalid assertion

* fixing agency bugs

* Removing debug output

* return proper errors in case of "method not allowed"

* Fixed up some info messages

* jslint
2017-10-20 18:06:59 +02:00
Jan 7840d3f824 Bug fix/fixes 1810 (#3460)
* improve error reporting in RestAqlHandler

* added logging about indexes

* added some safety checks to the logger

* slighty better error messages

* fix location header for SSL

* fix error message

* try to make tests more deterministic

* change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED
2017-10-19 11:28:01 +02:00
Simon Grätzer 7c31960cf2 Feature/async failover (#3451) 2017-10-18 23:59:29 +02:00
Wilfried Goesgens 88fd685296 Bug fix/fix shard state restore display (#3385)
* don't add  items into the array if we have an empty list member.

* we mustn't display 100% here, since its a rounding error. If, the checkmark would be set.

* rather use string cutting to round so we don't get into the 100% trap.
2017-10-17 14:08:07 +02:00
Max Neunhöffer 9a2385b941 Add host id detection and show in /_admin/cluster/Health. (#3389) 2017-10-11 12:42:44 +02:00
Jan 0561bf45ce Bug fix/isrestore (#3283)
* Make isRestore work in the cluster.

This covers sharded collections with default sharding and non-default
sharding.

* always use locally generate revision ids for storing and looking up documents
2017-10-03 11:53:49 +02:00
Jan 56fab56ff5 Bug fix/issues 2709 (#3333)
* logging improvements

* fixed copy&paste error

* enable assertions in non-release mode

* updated CHANGELOG

* fix --temp.path (was dysfunctional), fix creation of temporary directories and races
2017-10-01 09:43:03 +02:00
Jan 4ba38bf981 try to work around some assertions (#3296)
* remove obsolete values from relative config

* warn about using obsolete config parameters
2017-09-28 09:21:33 +02:00
Frank Celler 111c826026 Bugfix/modify document coordinator (#3266)
* Fixed ModifyDocument on coordinator. It uses the internally optimized function to extract _key values, which fails on Compact objects

* do not crash server when passing invalid document key type
2017-09-15 22:49:20 +02:00
m0ppers f0ae3d3de7 Bug fix/uaf communicator request abort (#3216)
* Fix abortion of requests

* Fix progress callback
2017-09-15 22:42:30 +02:00
Michael Hackstein 0d522a6136 Fixed ModifyDocument on coordinator. It uses the internally optimized function to extract _key values, which fails on Compact objects (#3258) 2017-09-15 14:23:04 +02:00
Jan 5165155ed1 Bug fix/fixes 0609 (#3227)
* do not use V8 variant of AQL functions in early optimization stage when a C++ variant is available

* additionally, simplify AQL function definitions and aliases

* warn when more than 90% of max mappings are in use

* added C++ variant of replication catchup

* added `--log.role` option

* updated CHANGELOG

* removed non-existing scheduler.threads option from config

* removed useless __FILE__, __LINE__ invocations

* updated CHANGELOG

* allow a priority V8 context

* remove TRI_CORE_MEM_ZONE

* try to fix Windows errors & warnings

* cleanup

* removed memory zones altogether

* exclude system collections from collection tests
2017-09-13 16:28:21 +02:00