1
0
Fork 0
Commit Graph

1197 Commits

Author SHA1 Message Date
Kaveh Vahedipour c07a706948 supervision fix for internal issue #2215 backport to 3.3 (#5063)
* supervision fix for internal issue #2215 backport to 3.3
2018-04-10 15:29:27 +02:00
Kaveh Vahedipour 3d8eb7541c internal issue #2215 (#5043) 2018-04-10 12:00:48 +02:00
Jan 765aed1368 bump version number for all local DDL changes and tell agency (#4684)
* bump version number for all local DDL changes and tell agency
  this allows other listeners (e.g. for DC2DC) to get notified when  
  DDL operations are carried out locally and need to be applied remotely
* Increase a log level.
2018-03-23 12:34:08 +01:00
Tobias Gödderz b34eb1c23f Bugfix / Supervision: removeFollower should remove the last follower(s) first (#4926)
* Added a test asserting the last followers are removed first as required by moveShard
* Remove the last followers first
* Removed unused includes
* Updated CHANGELOG
2018-03-23 09:33:04 +01:00
Kaveh Vahedipour 263fe1f6e9 3.3: fixed the missed changes to plan after agency callback is registred f… (#4777)
* fixed the missed changes to plan after agency callback is registred for create collection
* fixed agency inquire
2018-03-14 12:03:40 +01:00
jsteemann f68cd7f296 fix typo in variable name 2018-02-28 17:39:49 +01:00
Kaveh Vahedipour 4893e1ccca more information from inception, when agent is activated (#4691) 2018-02-28 16:11:58 +01:00
Kaveh Vahedipour a69c60d79b bug fix for jobs looking at distrubuteShardsLike and virtual collections (#4666) 2018-02-28 16:09:16 +01:00
Simon 1d57a46168 Refactor Authentication Layer (3.3) (#4588)
* added tests for revokeCollection and revokeDatabase

* optimized user permission test

* ui selection bugfix

* fixed ldap ui login

* login view

* Authentication refactoring

* localstorage fallback if user config is not available

* Fixing permission resolution test

* Adding missing import

* local storage queries now supported

* disabled collection task check for ldap

* added internal ldapEnabled function and ldap config to the ui

* more db creation tests

* removed console logs

* render fix

* Various authentication related fixes

* exec ldap test also for cluster

* Adding support to refresh user rights from external auth sources

* ldap test howto comment

* Handling roles more correctly

* jwt

* login view

* First part of rework of LDAP documentation.

* test roles in a ldap environment

* Changing role handling

* Finish revision of the LDAP chapter in the manual.

* Fixing user header

* Fixing some slight issues with LDAP users and roles

* Removing unused code

* Removing unused code

* added ldap test

* fixing a bug in restuserhandler

* more ldap tests

* ldap tests

* ldap tests

* optimized ldap testing, added cluster support, fixed some auth tests

* ldap cleanup test

* ldap tests

* auth tests

* ldap test

* Changing permission defaults

* revert change

* updated user helper test

* rm of try catch block connection

* Removing FeatureCacheFeature

* Changed permission resolution according to discussion

* updated the docs to clarify the permission resolution and the intricacies of LDAP users

* Fixing wrong permissions check in handling of PUT in ResUserHandler

* Using revision ID when replacing users

* Adding basic replace test

* tests

* Fixing some outstanding issues

* Fixing test setup, optimizing some stuff

* Fixing permission resolution rules, testsuite setup, etc

* Fix deadlock

* Adding error message for keyspace, slightly changing test setup

* Removing remaining mentions of FeatureCacheFeature

* Fix jslint

* Fixing some failed tests

* Fixing cluster authentication issue, red tests

* Fixing ldap testsuite, adding trace logging

* Fixint ldap tesuite setup and LDAP recognition

* Fixing an assert

* Cleanup, adding changelog entry

* fix typo

* Fixing dump_authentication test

* improvements found during code review

* oops

* updated CHANGELOG

* Fixing broken handling, disallowing adding of local users when disabled

* added tests for ldap search mode

* Fixing testInvalidGrants

(cherry picked from commit bc7ea2aaa29a9ed0974898f487e8a318f24912f1)

* Removing undefined auth level externally

(cherry picked from commit 70859f43ae6fd694fdbf70f669fbfdafc58e7913)

* Fixing previous commit

(cherry picked from commit 2fbcffd2ed657862ef9fb5e6d45201a6ec8ada69)

* more use of sessionstorage

* intentionally removed `after` methods from tests

because they are executed before the tests start
no cleanup is performed right now after the authentication tests
however, a cleanup is done at start of every test

* ldap tests all modes

* fix LDAP test invocation

* Added roles transformation to ldap test suite

* Fix compilation of community version.

* Imrpved the ldap testsuites by unifying their options

* fix permission problems for system collections

* Improved LDAP configuration documentation.

* Grunt.

* fixed some ro/rw display issues

* fixed some ro/rw display issues part 2

* grunt build

* bump version number

* Fixed typos in LDAP manual
2018-02-28 13:24:18 +01:00
Simon e96e899fd3 Active Failover for Foxx Services (3.3) (#4593) 2018-02-15 09:36:25 +01:00
Kaveh Vahedipour cce5b2decb Bug fix 3.3/supervision to delete removed nodes from health (#4455) 2018-02-13 15:55:42 +01:00
Kaveh Vahedipour a14c4bd02f constituent correctly persisiting _votedFor and _term (#4248) (#4320) 2018-01-17 10:37:16 +01:00
Kaveh Vahedipour 56a9ad69b1 Bug fix 3.3/supervision no longer fails to remove server from failed when back to good (#4210)
* let's not miss failedserver removal
* remove resetting of FailedServers in test code
* Only call abortRequestsToFailedServers at most every 3 seconds.
2018-01-03 21:55:01 +01:00
Matthew Von-Maszewski 41d1bfce23 create independent executeLockedRead and executeLockedWrite to speed performance (#4178) 2017-12-29 13:36:48 +01:00
Kaveh Vahedipour d6ce7a1301 Agency read write locks ported from devel (#4175) 2017-12-28 11:28:11 +01:00
Matthew Von-Maszewski 392ddde251 Bug fix 3.3: Fix supervisor thread crash (#4165)
* port devel branch to 3.3 of supervisor thread death fix
2017-12-27 22:34:29 +01:00
Max Neunhöffer ef8fcd101c
Port to 3.3 of various fixes around leadership preparation in agency. (#4150)
* Add logging for _earliestPackage in Agent.
* Really enforce the hidden option --server.maximal-threads if given.
* Switch off --log.force-direct in scripts/startStandAloneAgency.sh
* Lower the timeout for sending AppendEntriesRPC to 150s.
* Erase _earliestPackage when becoming a leader.
* Challenge leadership in agent main loop.
* Use steady_clock for _earliestPackage.
* Change _lastAcked and _leaderSince to steady_clock as well.
* time difference calculations based on old readSystemClock to steadyClockToDouble
* All system_clock transitioned to steady_clock in Agent. Remaining system_clock are user input / output or timestamps
* Inception system_clock to steady_clock
2017-12-27 16:47:16 +01:00
Matthew Von-Maszewski f35215ea51 Have twice seen coordinator go into long loop on shutdown. Added two tests for isStopping() to break the loops. (#4139) 2017-12-21 20:56:14 +01:00
Jan b7ee607312
Bug fix 3.3/integer overflow when calculating waits in constituent (#4090)
* integer overflow in Constituent could seize operation of Agency

* less likely integer overflow on double conversion

* less likely integer overflow on double conversion

* changed comparison to integer comparison as suggested by @neunhoef
2017-12-19 10:10:05 +01:00
Jan 9c5893e7a7
fix premature unlock (#3802) (#4027)
* fix some deadlocks found by evil lock manager (tm)

* fix duplicate lock

* fix indentation

* ensure proper lock dependencies

* fix lock acquisition

* removed useless comment

* do not lock twice

* create either a V8 transaction context or a standalone transaction context, depending on if we are called from within V8 or not

* AQL micro optimizations

* use explicit constructor

* only use V8DealerFeature's ConditionLocker for acquiring a free V8 context

entering and exiting the selected context is then done later on without having to hold the ConditionLocker

* remove some recursive locks

* Disable custom deadlock detection when Thread Sanitizer is enabled

* Changing ifdef's

* grr

* broke gcc

* Using atomic for ApplicationServer::_server

* fix premature unlock

* add some asserts

* honor collection locking in cluster

* yet one more lock fix

* removed assertion

* some more bugfixes

* Fixing assert

(cherry picked from commit 1155df173bfb67303077fbe04ee8d909517bfd21)
2017-12-13 18:46:14 +01:00
Jan 7af86685e3
when upgrading from 3.1 LastHeartBeatAcked could also have been missing, when the 3.1 cluster had not run for long enough (#3974) 2017-12-08 17:33:37 +01:00
Jan bec83181be Bug fix 3.3/add security check end with failover (#3911)
* Add security check in AgencyComm::sendWithFailover.

* some cleanup

* added some more tests

* add typeName() to AgencyCommTransaction to make the transaction type printable in debug messages

* improve debuggability
2017-12-07 10:33:59 +01:00
Jan ba729150bf backporting inquire fixes (#3920) 2017-12-07 10:27:41 +01:00
Kaveh Vahedipour f7b4150b64 no clientId anymore in send/sendWithFailOver SPIs (#3819) 2017-11-28 10:47:36 +01:00
Kaveh Vahedipour c300eee5f0 minor (#3813) 2017-11-27 18:22:13 +01:00
Kaveh Vahedipour 27cd691bbf Bug fix/agencycomm validate methods broken (#3805) 2017-11-27 14:18:25 +01:00
Kaveh Vahedipour 2beaef41ff Bug fix/agencycomm validate methods broken (#3784) 2017-11-24 10:31:07 +01:00
Simon Grätzer 987daca85b Handle invalid endpoints in AgencyComm (#3729) 2017-11-17 16:35:59 +01:00
Kaveh Vahedipour 7b80deb5cc Fixed object assignment operator for agency's key value store (#3701)
* Fixed object assignment operator for agency's key value store
* Node's toJson is now actually toJson. getString should be used for string extractions
* adjust agency's documentation (clarify precondition)
2017-11-17 15:49:40 +01:00
Kaveh Vahedipour 255d90d26a cherry pick from 3.2 pull request for bug-fix/supervision-thread-exists-on-pre3.2-agency (#3709)
This is the HealthRecord upgrade patch.
2017-11-17 10:14:14 +01:00
Jan b4f6ee9273 Feature/improved index api for unique constraints and replication (#3715) 2017-11-16 21:02:01 +01:00
Jan 5abf0c1185 Bug fix/fixes 1511 (#3711) 2017-11-16 14:18:51 +01:00
Max Neunhöffer 766ab7c8cf
Fix agency shutdown bug. (#3683)
* Fix agency shutdown bug.
* Remove precondition that was not needed in AgencyComm::removeValues.
* Fail fatally if threads do not shut down.
2017-11-14 16:33:46 +01:00
Jan e1ecc6b02c fix some threading issues (#3659) 2017-11-12 22:34:51 +01:00
Kaveh Vahedipour c9621ff230 Feature/new agency checks for preconditions (#3612) 2017-11-11 22:48:23 +01:00
Max Neunhöffer bff630b332 Handle leader resignation race with redirectRequst. (#3663) 2017-11-11 19:38:29 +01:00
Kaveh Vahedipour 7e816db51e Bug fix/agency restart enhancements (#3619)
* Removed unused active(...) method in Agent
* Inception's restart from persistence allows peer with empty active RAFT list to join
* Agency's UUID is persisted outside of the database comparable to coordinator and db server action.
* Publicized Methods to UUID stuff in ServerState
* Inception method documentation
* added --agency.disaster-recovery-id to allow for specification of known former agency id. this is a very dangerous option potentially.
* Delete a unused methods.
* separate _id and _recoveryId
* populating active list with entire pool
* Improve logging.
* reject gossip from unknown agent, if pool is complete
2017-11-10 23:40:26 +01:00
Jan bef52d7dc3
Bug fix/cleanup after cppcheck (#3639) 2017-11-10 13:53:28 +01:00
Max Neunhöffer 3c0ee6908b Bug fix/lead to agent (#3541) 2017-11-09 11:10:09 +01:00
Jan 98eecaae20 bug fix for agency precondition checks (#3579) 2017-11-06 23:55:41 +01:00
Simon Grätzer ee8209943f Missing things for active / passive (#3578)
* Switching from ttl to supervision based failover mechanism

* Allowing canceling of ongoing actions

* refactored asyncjobmanager

* refactoring some code

* adding read-only flag

* catching some exceptions to reduce log pollution, removing unnecessary code, removing tests for _changeMode

* fixing "createsANewDatabaseWithAnInvalidUser"

* auth = off does not longer make everyone superuser

* Fixing cluster_sync and maybe resilience
2017-11-04 20:30:23 +01:00
jsteemann a5c777e565 fix broken inquiry results in AgencyComm 2017-10-26 20:10:54 +02:00
Max Neunhöffer cb05d33e17 Term is a number not a string. (#3520) 2017-10-26 12:02:38 +02:00
Max Neunhöffer ee96c37237 Fix agency restart problems. (#3493)
* Fix agency restart problems (port from a 3.2 fix).

* Further fixes after Craneware rescue.
2017-10-25 18:05:58 +02:00
Michael Hackstein 15d9a4be5f Reactivated the failover of the FoxxMaster, it was not modified anymore after the current master dies (#3510) 2017-10-25 18:03:24 +02:00
Jan 720e6df82e Bug fix/fixes 1910 (#3471)
* properly initialize all properties

* use faster comparison

* properly detect and handle "method not allowed"

* code-style

* remove unused variable

* narrow variable scope

* handle non-existance of AuthenticationFeature

* remove dead code

* replace some C string handling with std::strings

* moved assertion to the correct place

* honor number of array members for IN operator

* slightly adjust error messages

* slighty adjust some error messages

* try to fix issue with lingering replication contexts on shutdown

* clean up heartbeat thread a little bit

* small fixes
2017-10-23 09:17:36 +02:00
Max Neunhöffer 67300f9d77 Add a hidden AGENCY_DUMP for agency emergency recovery. (#3474) 2017-10-21 00:24:32 +02:00
Simon Grätzer fd3f9d99d9 Fixing webinterface access (#3464)
* intermediate commit

* Refactoring the ExecContext

* Fixing authentication

* Added start script

* some fixes

* fixed access to nullptr

* some c++

* fixed misleading message

* Made DatabaseGuard movable. Also adapted map insertions to _vocbase in Syncer classes, which failed to compile under older GCC versions

* added support for global flag to replication handler

* Started Refactoring in replication-static

* Fixing syncer code

* store applier configuration

* Static replication tests now test replication in a non system Database

* added flags to replication feature

* Adding some extra checks

* Fixing issue with rocksdb rest replication handler

* replication static now runs _system and otherdatabase replication tests.

* Fixing crash on startup

* Replication_sync now tests _system as well as other Database

* Fixing up heartbeat thread, adding global flag to rest handler

* Fixing wrong assert

* some cleanup, probably some tests are broken

* Made non-system db version of replication-ongoing tests

* fix determine-open-transaction

* Fixed ongoing tests. And added a test where we drop a database on slave while replication is still ongoing

* test fixes

* Activated ongoing other db tests. Also added a test that drops the DB on master, while the slave is still syncing.

* some better error reporting

* gradually switch to Result

* createCollection -> create

* re-activate using of collection ids for now

* enable auto-start

* Fixed create collection in replication ongoing test

* Added first draft of a test for global replication

* move to Result

* use system database for global applier

* improved error reporting

* fixed invalid URLs

* add test case filter

* load existing global applier configuration

* improve error reporting

* Added further tests for global replication

* Fixed global replication test, it now properly waits for replication. Timeouts after 10 seconds.

* Removed erronious assertion

* improve error reporting

* intermediate commit

* Added a test-case for global replication where the Master already has some data and the slave is clean

* fix deletion of replication contexts

* Fixed JSLint

* compiling code

* fix typo

* do not fail for global applier when no database is configured

* intermediate commit

* syncer supports switch for 3.3 / 3.2

* fixed errors

* Fixing some replication bugs

* Fixing some assertions

* Fixed missing commit markers

* Fixing assertion on database drop

* Attempt to fix deadlock in applier and assertion

* Fixing some stupid things

* Support for collection parameter

* Acidentally turned off some tests

* Grrr

* Fixing wrong method call

* Fixed startscript

* Fixed assignmet instead of equality check typo

* Added a test far interrupted replication. For now it justs tests basics on _system database.

* Improved index tests on replication.

* properly initialize variable

* fixed some replication problems

* MMFiles wal access support

* fix replication issues

* Started mmfiles replication support

* fixing a bug

* Fixing an issue

* fixing some mmfiles stuff

* fix test

* reload users

* prevent pure virtual method call

* intermediate commit

* Making from exclusive

* do not call getMasterState if child syncer

* some reformatting

* Adding global support for handleCommandSync

* Fixing assertion

* removing some debug logs

* Changing return codes

* Fixing some issues in the rest handler

* Make replication less susceptible to errors

* remove some debug output

* return last log tick

* remove waits from tests

* fix two tests

* changing header for open-transactions call

* some fixes

* fix test

* invalidate cached databases

* merging request and execcontext

* try to fix assertion error

* renamed method

* fix compile warning

* small changes

* Always use execcontext

* Fixing an assert

* fix replication issues

* try to fix collection lookups

* try to fix master/slave start

* Changing comments in heartbeat thread

* fix wrong signature of READ_LOCKER_EVENTUAL

* log server role in testing mode

* Fixed authentication, removed execContext in favor of request context

* Adding cluster rest api

* Fixing cluster rest handler

* Fixing cluster callback

* Some refactoring

* Queue creation is not a single operation

* Allowed for leader redirects

* Setting start of batch

* Disabling 2.8 compat tests

* fix start/stop bugs

* jslint

* various little changes

* add flag for exposing jwt

* indentation

* cleanup

* Some changed to guid

* fixing tcp to http, vst

* changed endpoint header

* small fixes

* Reorder servers by health status

* Higher timeout

* Changing error messages

* update the fromTick when fetching multiple batches from the coordinator

* more debug info

* Reducing copy pasted code

* change uid generation

* reducing logspam

* more exceptions for redirects

* more exceptions

* attempt to fix uniqids in cluster

* centralize printing of HTTP errors in replication

* debug output

* fix messages for authentication

* cleanup

* removing --cluster.my-id, --cluster.my-local-info

* Added leadership race to bootstrap, determine foxxmaster on boostrap, removing obsolete code

* improve error reporting in RestAqlHandler

* Changing heartbeat thread, fixing cluster_sync

* some more debug output

* added master

* attempt to make tests more deterministic

* added logging about indexes

* added some safety checks to the logger

* slighty better error messages

* fix location header for SSL

* fix error message

* try to make tests more deterministic

* change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED

* Fixing broken webinterface access

* reverting groovy change

* Fixing read-only internal users

* Using superuser rights for dashboard now

* Adding mode field to _admin/server/role

* added mode TRYAGAIN

* remove inventory lock (does not seem necessary here)

* remove invalid assertion

* fixing agency bugs

* Removing debug output

* return proper errors in case of "method not allowed"

* Fixed up some info messages

* jslint
2017-10-20 18:06:59 +02:00
Kaveh Vahedipour 428e163db9 Return the result of the inquiry (#3465) 2017-10-20 15:01:32 +02:00
Jan 7840d3f824 Bug fix/fixes 1810 (#3460)
* improve error reporting in RestAqlHandler

* added logging about indexes

* added some safety checks to the logger

* slighty better error messages

* fix location header for SSL

* fix error message

* try to make tests more deterministic

* change error code from TRI_ERROR_INTERNAL (which we want to avoid) to TRI_ERROR_FAILED
2017-10-19 11:28:01 +02:00