mirror of https://gitee.com/bigwinds/arangodb
Bug fix/failover with min replication factor (#9486)
* Improve collection time of IResearchQueryOptimizationTest
* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and no one cares for it
* Added some assertions on minReplicationFactor
* Transaction API will now reject writes as soon as the minimal replication factor is NOT fulfilled
* added minReplicationFactor to the user interface, preparation for the collection api changes
* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo
* added minReplicationFactor usage to tests
* TODO TEMPORARY COMMIT FOR TESTING PLEASE REVERT ME
* minReplicationFactor can now be changed via the collection properties route
* fixed a wrong assert
* added minReplicationFactor to the graph management ui
* added minReplicationFactor to the gharial api
* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.
* adjusted description of minReplicationFactor
* FollowerInfo refactoring
* added gharial api graph creation tests with minimal replication factor
* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests
* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor
* Debug logging
* MORE Debug logging
* Included replication fast lane
* Use correct minReplicationFactor
* modified debug logging
* Fixed compile issues
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* Revert "MORE Debug logging". This reverts commit dab5af28c0.
* Revert "MORE Debug logging". This reverts commit 6134b664bd.
* Revert "MORE Debug logging". This reverts commit 80160bdf3b.
* Revert "MORE Debug logging". This reverts commit 06aabcdfe1.
* Removed debug output
* Added replication fast lane. Also refactored the commands as I cannot take it any more...
* Put some requests of RocksDBReplication onto CATCHUP lane.
* Put some requests of MMFilesReplication onto CATCHUP lane.
* Adjusted FAST and MED lane usage in supervised scheduler
* Added changelog entry
* Added new features entry
* A new leader will now keep old followers in case of failover
* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp (Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>)
* Fixed JSLINT
* Unified lane handling of replication handlers
* Sorry, forgotten in last commit
* replaced strings with static strings
* more use of static strings
* optimized min repl description in the ui
* decreased initial loop variable
* clean up of the createWithId test
* more use of static strings
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js (Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>)
* Added some comments on condition, renamed variable as suggested in review
* Added check for min replicationFactor to be non-zero
* Added assertion
* Added function to modify min and max replication factor in one go
* added missing semicolon
* removed devel log line
* Added a second information to follower info that can keep track of followers that have been in sync before a failover has taken place
* Maintenance now reports the previous version to follower info instead of lying by itself. The FollowerInfo now gets a failover-safe mode to report in-sync followers
* check replicationFactor against number of DBServers
* Add lie reporting in CURRENT
* Reverted most of my recent commits about the failover situation. The intended plan simply does not work out
* move replication checks from logical collection to rest collection handler
* added more replication tests
* Include assert only if we are not in gtest
* jslint
* set min repl factor to zero if satellite collection
* check replication attributes in v8 collection
* Initial commit, old plan, does not yet work
* fixed IResearch tests
* Included FailoverCandidates key. Not fully implemented
* fixed wrong assert
* unified in-sync follower reporting
* fixed compiler errors
* Cleanup locking, and fixed potential deadlocks
* Comments about locking order in FollowerInfo.
* properly check uint
* Keep old leader as potential failover candidate
* Transaction methods now use followerInfo to check if the leader can write; this might have the side effect that 'failoverCandidates' are updated
* Let agency check failoverCandidates if possible
* Initialize member variables
* Use unified follower reporting in DBServerAgencySync
* Removed obsolete variable, collecting it somewhere else
* repl factor attr check
* Reimplemented previous followers, second attempt now. PhaseOne and PhaseTwo can now synchronize on current.
* Fixed assertion, forgot an off-by-one
* adjusted test to be more precise now
* Fixed failover candidates list
* Disable write on dropping too many followers
* Allow to run updateFailoverCandidates multiple times with same leader.
* Final fixes, resilience tests now green, crossing fingers for jenkins
* Fixed race on atomics comparison
* Fixed invalid number type
* added nullptr handling
* added nullptr handling
* Removed invalid assert
* Make takeover of leadership an atomic operation
* Update tests/js/common/shell/shell-cluster-collection.js (Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>)
* Review fixes
* Fixed creation code to use takeoverLeadership
* Update arangod/Cluster/FollowerInfo.h (Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>)
* Applied review fixes
* There is no timeout
* Moved AQL + Pregel to INTERNAL_AQL lane, which is medium priority, to avoid deadlocks with sync replication
* More review fixes
* Use difference if you want to compare two vectors...
* Use std::string ...
* Now check if we are in recovery mode
* Added documentation for minReplicationFactor
* Added readme update as well in documentation
This commit is contained in:
parent c922c5f133
commit 36b1d290a9
@@ -38,6 +38,10 @@ factor. The number of _followers_ can be controlled using the
 `replicationFactor` parameter is the total number of copies being
 kept, that is, it is one plus the number of _followers_.
 
+In addition to the `replicationFactor` we have a `minReplicationFactor`
+that locks down a collection as soon as we have lost too many followers.
+
+
 Asynchronous replication
 ------------------------
 
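For illustration, a minimal arangosh sketch of how the two settings interact (the collection name and numbers are illustrative, not from this change):

```js
// Keep three copies of each shard, but stay writable as long as at
// least two copies are in sync; below that, the shard locks down.
db._create("orders", {
  numberOfShards: 4,
  replicationFactor: 3,     // leader plus two followers
  minReplicationFactor: 2   // refuse writes once fewer than 2 copies are in sync
});
```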
@@ -164,6 +164,15 @@ to the [naming conventions](../NamingConventions/README.md).
 dramatically when using joins in AQL at the costs of reduced write
 performance on these collections.
 
+- *minReplicationFactor* (optional, default is 1): in a cluster, this
+attribute determines how many copies of each shard are required
+to be in sync on the different DBServers. If we have less than this
+many copies in the cluster, a shard will refuse to write. The
+minReplicationFactor cannot be larger than replicationFactor.
+Please note: during server failures this might lead to writes
+not being possible until the failover is sorted out, and it might cause
+write slowdowns in trade for data durability.
+
 - *distributeShardsLike*: distribute the shards of this collection
 cloning the shard distribution of another. If this value is set,
 it will copy the attributes *replicationFactor*, *numberOfShards* and
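To make the lock-down behavior concrete, a hypothetical arangosh scenario; the collection name and the exact error surface are assumptions for illustration, not taken from this change:

```js
// With replicationFactor 3 and minReplicationFactor 2, losing two of the
// three DBServers holding a shard leaves one in-sync copy, which is below
// the minimum, so the shard rejects writes until failover catches up.
try {
  db.orders.insert({ value: 42 });
} catch (err) {
  // Assumed shape: the server answers the write with an error while the
  // collection is locked down, instead of accepting it.
  print(err.errorNum, err.errorMessage);
}
```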
@@ -43,6 +43,10 @@ determine the target shard for documents; *Cluster specific attribute.*
 @RESTSTRUCT{replicationFactor,collection_info,integer,optional,}
 contains how many copies of each shard are kept on different DBServers.; *Cluster specific attribute.*
 
+@RESTSTRUCT{minReplicationFactor,collection_info,integer,optional,}
+contains how many minimal copies of each shard are kept on different DBServers.
+The shards will refuse to write if we have less than this many copies in sync.; *Cluster specific attribute.*
+
 @RESTSTRUCT{shardingStrategy,collection_info,string,optional,}
 the sharding strategy selected for the collection; *Cluster specific attribute.*
 One of 'hash' or 'enterprise-hash-smart-edge'
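As a quick sanity check, a small arangosh sketch that fetches the properties document this struct describes (collection name illustrative):

```js
// In a cluster the response now carries both replication attributes.
var props = arango.GET("/_api/collection/orders/properties");
print(props.replicationFactor, props.minReplicationFactor);
```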
@@ -25,6 +25,11 @@ concurrent modifications to this graph.
 @RESTSTRUCT{replicationFactor,graph_representation,integer,required,}
 The replication factor used for every new collection in the graph.
 
+@RESTSTRUCT{minReplicationFactor,graph_representation,integer,optional,}
+The minimal replication factor used for every new collection in the graph.
+If one shard has fewer copies than the minimal replication factor, we cannot
+write to this shard, but to all others.
+
 @RESTSTRUCT{isSmart,graph_representation,boolean,required,}
 Flag if the graph is a SmartGraph (Enterprise Edition only) or not.
 
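For illustration, a hedged sketch of reading the new field back from the gharial API (graph name and exact response shape are assumed from the surrounding swagger docs):

```js
// The graph document returned by GET /_api/gharial/<name> should expose
// both replication attributes described above.
var info = arango.GET("/_api/gharial/social");
print(info.graph.replicationFactor, info.graph.minReplicationFactor);
```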
@@ -42,6 +42,11 @@ Cannot be modified later.
 @RESTSTRUCT{replicationFactor,post_api_gharial_create_opts,integer,required,}
 The replication factor used when initially creating collections for this graph.
 
+@RESTSTRUCT{minReplicationFactor,post_api_gharial_create_opts,integer,optional,}
+The minimal replication factor used for every new collection in the graph.
+If one shard has fewer copies than the minimal replication factor, we cannot
+write to this shard, but to all others.
+
 @RESTRETURNCODES
 
 @RESTRETURNCODE{201}
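Below is a sketch of passing these create options from the JavaScript general-graph module, which forwards them to this endpoint; the graph and collection names are illustrative:

```js
var graphModule = require("@arangodb/general-graph");
// Every collection created for the graph inherits both replication settings.
var graph = graphModule._create(
  "social",
  [graphModule._relation("knows", "persons", "persons")],
  [],  // no orphan collections
  { replicationFactor: 3, minReplicationFactor: 2 }
);
```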
@@ -52,7 +52,11 @@ In a cluster setup, the result will also contain the following attributes:
 determine the target shard for documents.
 
 * *replicationFactor*: determines how many copies of each shard are kept
-  on different DBServers.
+  on different DBServers. Has to be in the range of 1-10 *(Cluster only)*
 
+* *minReplicationFactor*: determines the number of minimal shard copies kept on
+  different DBServers. A shard will refuse to write if fewer than this amount
+  of copies are in sync. Has to be in the range of 1-replicationFactor *(Cluster only)*
+
 * *shardingStrategy*: the sharding strategy selected for the collection.
   This attribute will only be populated in cluster mode and is not populated
 
@@ -77,6 +81,10 @@ one or more of the following attribute(s):
 different DBServers, valid values are integer numbers
 in the range of 1-10 *(Cluster only)*
 
+* *minReplicationFactor*: Change the number of minimal shard copies kept on
+  different DBServers. A shard will refuse to write if fewer than this amount
+  of copies are in sync. Has to be in the range of 1-replicationFactor *(Cluster only)*
+
 **Note**: some other collection properties, such as *type*, *isVolatile*,
 *keyOptions*, *numberOfShards* or *shardingStrategy* cannot be changed once
 the collection is created.
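A minimal arangosh sketch of adjusting the new attribute through this properties route (collection name illustrative):

```js
// Per the note above, type, numberOfShards etc. are immutable after
// creation; the replication attributes can be changed later.
db.orders.properties({ replicationFactor: 3, minReplicationFactor: 2 });
```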
@@ -95,7 +95,8 @@ bool Job::finish(std::string const& server, std::string const& shard,
   try {
     jobType = pending.slice()[0].get("type").copyString();
   } catch (std::exception const&) {
-    LOG_TOPIC("76352", WARN, Logger::AGENCY) << "Failed to obtain type of job " << _jobId;
+    LOG_TOPIC("76352", WARN, Logger::AGENCY)
+        << "Failed to obtain type of job " << _jobId;
   }
 
   // Additional payload, which is to be executed in the finish transaction
@@ -156,7 +157,6 @@ bool Job::finish(std::string const& server, std::string const& shard,
       finished.add(prec.key.copyString(), prec.value);
     }
   }  // -- preconditions
-
   }
 
   write_ret_t res = singleWriteTransaction(_agent, finished, false);
@@ -189,11 +189,11 @@ std::string Job::randomIdleAvailableServer(Node const& snap,
   for (auto const& srv : snap.hasAsChildren(healthPrefix).first) {
     // ignore excluded servers
     if (std::find(std::begin(exclude), std::end(exclude), srv.first) != std::end(exclude)) {
-      continue ;
+      continue;
     }
     // ignore servers not in availableServers above:
     if (std::find(std::begin(as), std::end(as), srv.first) == std::end(as)) {
-      continue ;
+      continue;
     }
 
     std::string const& status = (*srv.second).hasAsString("Status").first;
@@ -269,8 +269,9 @@ size_t Job::countGoodOrBadServersInList(Node const& snap, VPackSlice const& serv
 }
 
 // The following counts in a given server list how many of the servers are
 // in Status "GOOD" or "BAD".
-size_t Job::countGoodOrBadServersInList(Node const& snap, std::vector<std::string> const& serverList) {
+size_t Job::countGoodOrBadServersInList(Node const& snap,
+                                        std::vector<std::string> const& serverList) {
   size_t count = 0;
   auto const& health = snap.hasAsChildren(healthPrefix);
   // Do we have a Health substructure?
@@ -294,7 +295,8 @@ size_t Job::countGoodOrBadServersInList(Node const& snap, std::vector<std::strin
 }
 
 /// @brief Check if a server is cleaned or to be cleaned out:
-bool Job::isInServerList(Node const& snap, std::string const& prefix, std::string const& server, bool isArray) {
+bool Job::isInServerList(Node const& snap, std::string const& prefix,
+                         std::string const& server, bool isArray) {
   VPackSlice slice;
   bool found = false;
   if (isArray) {
@@ -418,16 +420,15 @@ std::vector<Job::shard_t> Job::clones(Node const& snapshot, std::string const& d
 
   for (const auto& colptr : snapshot.hasAsChildren(databasePath).first) {  // collections
 
-    auto const &col = *colptr.second;
-    auto const &otherCollection = colptr.first;
+    auto const& col = *colptr.second;
+    auto const& otherCollection = colptr.first;
 
     if (otherCollection != collection && col.has("distributeShardsLike") &&  // use .has() form to prevent logging of missing
         col.hasAsSlice("distributeShardsLike").first.copyString() == collection) {
       auto const& theirshards = sortedShardList(col.hasAsNode("shards").first);
       if (theirshards.size() > 0) {  // do not care about virtual collections
         if (theirshards.size() == myshards.size()) {
-          ret.emplace_back(otherCollection,
-                           theirshards[steps]);
+          ret.emplace_back(otherCollection, theirshards[steps]);
         } else {
           LOG_TOPIC("3092e", ERR, Logger::SUPERVISION)
               << "Shard distribution of clone(" << otherCollection
@@ -452,10 +453,11 @@ std::string Job::findNonblockedCommonHealthyInSyncFollower(  // Which is in "GOO
 
   std::unordered_map<std::string, size_t> currentServers;
   for (const auto& clone : cs) {
-    auto currentShardPath = curColPrefix + db + "/" + clone.collection + "/" +
-                            clone.shard + "/servers";
-    auto plannedShardPath =
-        planColPrefix + db + "/" + clone.collection + "/shards/" + clone.shard;
+    auto sharedPath = db + "/" + clone.collection + "/";
+    auto currentShardPath = curColPrefix + sharedPath + clone.shard + "/servers";
+    auto currentFailoverCandidatesPath =
+        curColPrefix + sharedPath + clone.shard + "/servers";
+    auto plannedShardPath = planColPrefix + sharedPath + "shards/" + clone.shard;
     size_t i = 0;
 
     // start up race condition ... current might not have everything in plan
@@ -464,13 +466,30 @@ std::string Job::findNonblockedCommonHealthyInSyncFollower(  // Which is in "GOO
       continue;
    }  // if
 
-    for (const auto& server :
-         VPackArrayIterator(snap.hasAsArray(currentShardPath).first)) {
-      auto id = server.copyString();
+    bool isArray = false;
+    VPackSlice serverList;
+    // If we do have failover candidates, we should use them
+    std::tie(serverList, isArray) = snap.hasAsArray(currentFailoverCandidatesPath);
+    if (!isArray) {
+      // We have old DBServers that do not report failover candidates,
+      // Need to rely on current
+      std::tie(serverList, isArray) = snap.hasAsArray(currentShardPath);
+      TRI_ASSERT(isArray);
+      if (!isArray) {
+        THROW_ARANGO_EXCEPTION_MESSAGE(
+            TRI_ERROR_SUPERVISION_GENERAL_FAILURE,
+            "Could not find common insync server for: " + currentShardPath +
+                ", value is not an array.");
+      }
+    }
+    // Guaranteed by if above
+    TRI_ASSERT(serverList.isArray());
+    for (const auto& server : VPackArrayIterator(serverList)) {
      if (i++ == 0) {
        // Skip leader
        continue;
      }
+      auto id = server.copyString();
 
      if (!good[id]) {
        // Skip unhealthy servers
@@ -550,8 +569,8 @@ bool Job::abortable(Node const& snapshot, std::string const& jobId) {
   return false;
 }
 
-void Job::doForAllShards(Node const& snapshot, std::string& database,
-                         std::vector<shard_t>& shards,
+void Job::doForAllShards(
+    Node const& snapshot, std::string& database, std::vector<shard_t>& shards,
     std::function<void(Slice plan, Slice current, std::string& planPath, std::string& curPath)> worker) {
   for (auto const& collShard : shards) {
     std::string shard = collShard.shard;
@@ -49,9 +49,7 @@ class RestAqlHandler : public RestVocbaseBaseHandler {
 
  public:
  char const* name() const override final { return "RestAqlHandler"; }
-  RequestLane lane() const override final {
-    return RequestLane::CLUSTER_INTERNAL;
-  }
+  RequestLane lane() const override final { return RequestLane::CLUSTER_AQL; }
  RestStatus execute() override;
  RestStatus continueExecute() override;
 
@@ -165,6 +165,30 @@ class CollectionInfoCurrent {
     return v;
   }
 
+  //////////////////////////////////////////////////////////////////////////////
+  /// @brief returns the current failover candidates for the given shard
+  //////////////////////////////////////////////////////////////////////////////
+
+  TEST_VIRTUAL std::vector<ServerID> failoverCandidates(ShardID const& shardID) const {
+    std::vector<ServerID> v;
+
+    auto it = _vpacks.find(shardID);
+    if (it != _vpacks.end()) {
+      VPackSlice slice = it->second->slice();
+
+      VPackSlice servers = slice.get(StaticStrings::FailoverCandidates);
+      if (servers.isArray()) {
+        for (auto const& server : VPackArrayIterator(servers)) {
+          TRI_ASSERT(server.isString());
+          if (server.isString()) {
+            v.push_back(server.copyString());
+          }
+        }
+      }
+    }
+    return v;
+  }
+
   //////////////////////////////////////////////////////////////////////////////
   /// @brief returns the errorMessage entry for one shardID
   //////////////////////////////////////////////////////////////////////////////
@@ -93,7 +93,8 @@ CreateCollection::CreateCollection(MaintenanceFeature& feature, ActionDescriptio
   TRI_ASSERT(type == TRI_COL_TYPE_DOCUMENT || type == TRI_COL_TYPE_EDGE);
 
   if (!error.str().empty()) {
-    LOG_TOPIC("7c60f", ERR, Logger::MAINTENANCE) << "CreateCollection: " << error.str();
+    LOG_TOPIC("7c60f", ERR, Logger::MAINTENANCE)
+        << "CreateCollection: " << error.str();
     _result.reset(TRI_ERROR_INTERNAL, error.str());
     setState(FAILED);
   }
@@ -156,10 +157,12 @@ bool CreateCollection::first() {
         LOG_TOPIC("9db9a", DEBUG, Logger::MAINTENANCE)
             << "local collection " << database << "/"
             << shard << " successfully created";
-        col->followers()->setTheLeader(leader);
 
         if (leader.empty()) {
-          col->followers()->clear();
+          std::vector<std::string> noFollowers;
+          col->followers()->takeOverLeadership(noFollowers);
+        } else {
+          col->followers()->setTheLeader(leader);
         }
       });
 
@@ -70,7 +70,8 @@ Result DBServerAgencySync::getLocalCollections(VPackBuilder& collections) {
   }
 
   if (dbfeature == nullptr) {
-    LOG_TOPIC("d0ef2", ERR, Logger::HEARTBEAT) << "Failed to get feature database";
+    LOG_TOPIC("d0ef2", ERR, Logger::HEARTBEAT)
+        << "Failed to get feature database";
     return Result(TRI_ERROR_INTERNAL, "Failed to get feature database");
   }
 
@@ -80,9 +81,7 @@ Result DBServerAgencySync::getLocalCollections(VPackBuilder& collections) {
     if (!vocbase.use()) {
       return;
     }
-    auto unuse = scopeGuard([&vocbase] {
-      vocbase.release();
-    });
+    auto unuse = scopeGuard([&vocbase] { vocbase.release(); });
 
     collections.add(VPackValue(vocbase.name()));
 
@@ -100,8 +99,7 @@ Result DBServerAgencySync::getLocalCollections(VPackBuilder& collections) {
       // generate a collection definition identical to that which would be
      // persisted in the case of SingleServer
      collection->properties(collections,
-                             LogicalDataSource::makeFlags(
-                                 LogicalDataSource::Serialize::Detailed,
+                             LogicalDataSource::makeFlags(LogicalDataSource::Serialize::Detailed,
                                  LogicalDataSource::Serialize::ForPersistence));
 
      auto const& folls = collection->followers();
@@ -119,19 +117,7 @@ Result DBServerAgencySync::getLocalCollections(VPackBuilder& collections) {
        // we are the leader ourselves
        // In this case we report our in-sync followers here in the format
        // of the agency: [ leader, follower1, follower2, ... ]
-        collections.add(VPackValue("servers"));
-
-        {
-          VPackArrayBuilder guard(&collections);
-
-          collections.add(VPackValue(arangodb::ServerState::instance()->getId()));
-
-          std::shared_ptr<std::vector<ServerID> const> srvs = folls->get();
-
-          for (auto const& s : *srvs) {
-            collections.add(VPackValue(s));
-          }
-        }
+        folls->injectFollowerInfo(collections);
      }
    }
  }
@@ -151,14 +137,20 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
 
   LOG_TOPIC("62fd8", DEBUG, Logger::MAINTENANCE)
       << "DBServerAgencySync::execute starting";
+  DBServerAgencySyncResult result;
   auto* sysDbFeature =
       application_features::ApplicationServer::lookupFeature<SystemDatabaseFeature>();
   MaintenanceFeature* mfeature =
       ApplicationServer::getFeature<MaintenanceFeature>("Maintenance");
+  if (mfeature == nullptr) {
+    LOG_TOPIC("3a1f7", ERR, Logger::MAINTENANCE)
+        << "Could not load maintenance feature, can happen during shutdown.";
+    result.success = false;
+    result.errorMessage = "Could not load maintenance feature";
+    return result;
+  }
   arangodb::SystemDatabaseFeature::ptr vocbase =
       sysDbFeature ? sysDbFeature->use() : nullptr;
-  DBServerAgencySyncResult result;
 
   if (vocbase == nullptr) {
     LOG_TOPIC("18d67", DEBUG, Logger::MAINTENANCE)
@@ -196,20 +188,21 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
     VPackObjectBuilder o(&rb);
 
     auto startTimePhaseOne = std::chrono::steady_clock::now();
-    LOG_TOPIC("19aaf", DEBUG, Logger::MAINTENANCE) << "DBServerAgencySync::phaseOne";
+    LOG_TOPIC("19aaf", DEBUG, Logger::MAINTENANCE)
+        << "DBServerAgencySync::phaseOne";
     tmp = arangodb::maintenance::phaseOne(plan->slice(), local.slice(),
                                           serverId, *mfeature, rb);
     auto endTimePhaseOne = std::chrono::steady_clock::now();
     LOG_TOPIC("93f83", DEBUG, Logger::MAINTENANCE)
         << "DBServerAgencySync::phaseOne done";
 
-    if (endTimePhaseOne - startTimePhaseOne >
-        std::chrono::milliseconds(200)) {
+    if (endTimePhaseOne - startTimePhaseOne > std::chrono::milliseconds(200)) {
       // We take this as indication that many shards are in the system,
       // in this case: give some asynchronous jobs created in phaseOne a
       // chance to complete before we collect data for phaseTwo:
       LOG_TOPIC("ef730", DEBUG, Logger::MAINTENANCE)
-          << "DBServerAgencySync::hesitating between phases 1 and 2 for 0.1s...";
+          << "DBServerAgencySync::hesitating between phases 1 and 2 for "
+             "0.1s...";
       std::this_thread::sleep_for(std::chrono::milliseconds(100));
     }
 
@@ -224,6 +217,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
     LOG_TOPIC("675fd", TRACE, Logger::MAINTENANCE)
         << "DBServerAgencySync::phaseTwo - current state: " << current->toJson();
 
+    mfeature->increaseCurrentCounter();
+
     local.clear();
     glc = getLocalCollections(local);
     // We intentionally refetch local collections here, such that phase 2
@@ -237,7 +232,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
       return result;
     }
 
-    LOG_TOPIC("652ff", DEBUG, Logger::MAINTENANCE) << "DBServerAgencySync::phaseTwo";
+    LOG_TOPIC("652ff", DEBUG, Logger::MAINTENANCE)
+        << "DBServerAgencySync::phaseTwo";
 
     tmp = arangodb::maintenance::phaseTwo(plan->slice(), current->slice(),
                                           local.slice(), serverId, *mfeature, rb);
@@ -246,7 +242,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
         << "DBServerAgencySync::phaseTwo done";
 
   } catch (std::exception const& e) {
-    LOG_TOPIC("cd308", ERR, Logger::MAINTENANCE) << "Failed to handle plan change: " << e.what();
+    LOG_TOPIC("cd308", ERR, Logger::MAINTENANCE)
+        << "Failed to handle plan change: " << e.what();
   }
 
   if (rb.isClosed()) {
@@ -268,9 +265,9 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
 
       if (ao.value.hasKey("precondition")) {
        auto const precondition = ao.value.get("precondition");
-        preconditions.push_back(
-            AgencyPrecondition(
-                precondition.keyAt(0).copyString(), AgencyPrecondition::Type::VALUE, precondition.valueAt(0)));
+        preconditions.push_back(AgencyPrecondition(precondition.keyAt(0).copyString(),
+                                                   AgencyPrecondition::Type::VALUE,
+                                                   precondition.valueAt(0)));
      }
 
      if (op == "set") {
@@ -279,7 +276,6 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
      } else if (op == "delete") {
        operations.push_back(AgencyOperation(key, AgencySimpleOperationType::DELETE_OP));
      }
-
    }
    operations.push_back(AgencyOperation("Current/Version",
                                         AgencySimpleOperationType::INCREMENT_OP));
@@ -289,8 +285,7 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
    if (!r.successful()) {
      LOG_TOPIC("d73b8", INFO, Logger::MAINTENANCE)
          << "Error reporting to agency: _statusCode: " << r.errorCode()
-          << " message: " << r.errorMessage()
-          << ". This can be ignored, since it will be retried automatically.";
+          << " message: " << r.errorMessage() << ". This can be ignored, since it will be retried automatically.";
    } else {
      LOG_TOPIC("9b0b3", DEBUG, Logger::MAINTENANCE)
          << "Invalidating current in ClusterInfo";
@@ -317,8 +312,9 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
    result.errorMessage = "Report from phase 1 and 2 was no object.";
    try {
      std::string json = report.toJson();
-      LOG_TOPIC("65fde", WARN, Logger::MAINTENANCE) << "Report from phase 1 and 2 was: " << json;
-    } catch(std::exception const& exc) {
+      LOG_TOPIC("65fde", WARN, Logger::MAINTENANCE)
+          << "Report from phase 1 and 2 was: " << json;
+    } catch (std::exception const& exc) {
      LOG_TOPIC("54de2", WARN, Logger::MAINTENANCE)
          << "Report from phase 1 and 2 could not be dumped to JSON, error: "
          << exc.what() << ", head byte:" << report.head();
@@ -329,8 +325,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
          << "Report from phase 1 and 2, byte size: " << l;
      LOG_TOPIC("67421", WARN, Logger::MAINTENANCE)
          << "Bytes: "
-          << arangodb::basics::StringUtils::encodeHex((char const*) report.start(), l);
-    } catch(...) {
+          << arangodb::basics::StringUtils::encodeHex((char const*)report.start(), l);
+    } catch (...) {
      LOG_TOPIC("76124", WARN, Logger::MAINTENANCE)
          << "Report from phase 1 and 2, byte size throws.";
    }
@@ -342,7 +338,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
 
   auto took = duration<double>(clock::now() - start).count();
   if (took > 30.0) {
-    LOG_TOPIC("83cb8", WARN, Logger::MAINTENANCE) << "DBServerAgencySync::execute "
+    LOG_TOPIC("83cb8", WARN, Logger::MAINTENANCE)
+        << "DBServerAgencySync::execute "
           "took "
        << took << " s to execute handlePlanChange";
  }
@@ -25,6 +25,7 @@
 #include "FollowerInfo.h"
 
 #include "ApplicationFeatures/ApplicationServer.h"
+#include "Cluster/MaintenanceStrings.h"
 #include "Cluster/ServerState.h"
 #include "VocBase/LogicalCollection.h"
 
@@ -32,56 +33,12 @@
 
 using namespace arangodb;
 
-////////////////////////////////////////////////////////////////////////////////
-/// @brief change JSON under
-/// Current/Collection/<DB-name>/<Collection-ID>/<shard-ID>
-/// to add or remove a serverID, if add flag is true, the entry is added
-/// (if it is not yet there), otherwise the entry is removed (if it was
-/// there).
-////////////////////////////////////////////////////////////////////////////////
-
-static VPackBuilder newShardEntry(VPackSlice oldValue, ServerID const& sid, bool add) {
-  VPackBuilder newValue;
-  VPackSlice servers;
-  {
-    VPackObjectBuilder b(&newValue);
-    // Now need to find the `servers` attribute, which is a list:
-    for (auto const& it : VPackObjectIterator(oldValue)) {
-      if (it.key.isEqualString("servers")) {
-        servers = it.value;
+static std::string const inline reportName(bool isRemove) {
+  if (isRemove) {
+    return "FollowerInfo::remove";
   } else {
-        newValue.add(it.key);
-        newValue.add(it.value);
+    return "FollowerInfo::add";
   }
-    }
-    newValue.add(VPackValue("servers"));
-    if (servers.isArray() && servers.length() > 0) {
-      VPackArrayBuilder bb(&newValue);
-      newValue.add(servers[0]);
-      VPackArrayIterator it(servers);
-      bool done = false;
-      for (++it; it.valid(); ++it) {
-        if ((*it).isEqualString(sid)) {
-          if (add) {
-            newValue.add(*it);
-            done = true;
-          }
-        } else {
-          newValue.add(*it);
-        }
-      }
-      if (add && !done) {
-        newValue.add(VPackValue(sid));
-      }
-    } else {
-      VPackArrayBuilder bb(&newValue);
-      newValue.add(VPackValue(ServerState::instance()->getId()));
-      if (add) {
-        newValue.add(VPackValue(sid));
-      }
-    }
-  }
-  return newValue;
 }
 
 static std::string CurrentShardPath(arangodb::LogicalCollection& col) {
@ -136,6 +93,15 @@ Result FollowerInfo::add(ServerID const& sid) {
|
||||||
v = std::make_shared<std::vector<ServerID>>(*_followers);
|
v = std::make_shared<std::vector<ServerID>>(*_followers);
|
||||||
v->push_back(sid); // add a single entry
|
v->push_back(sid); // add a single entry
|
||||||
_followers = v; // will cast to std::vector<ServerID> const
|
_followers = v; // will cast to std::vector<ServerID> const
|
||||||
|
{
|
||||||
|
// insertIntoCandidates
|
||||||
|
if (std::find(_failoverCandidates->begin(), _failoverCandidates->end(), sid) ==
|
||||||
|
_failoverCandidates->end()) {
|
||||||
|
auto nextCandidates = std::make_shared<std::vector<ServerID>>(*_failoverCandidates);
|
||||||
|
nextCandidates->push_back(sid); // add a single entry
|
||||||
|
_failoverCandidates = nextCandidates; // will cast to std::vector<ServerID> const
|
||||||
|
}
|
||||||
|
}
|
||||||
#ifdef DEBUG_SYNC_REPLICATION
|
#ifdef DEBUG_SYNC_REPLICATION
|
||||||
if (!AgencyCommManager::MANAGER) {
|
if (!AgencyCommManager::MANAGER) {
|
||||||
return {TRI_ERROR_NO_ERROR};
|
return {TRI_ERROR_NO_ERROR};
|
||||||
|
@ -144,23 +110,15 @@ Result FollowerInfo::add(ServerID const& sid) {
|
||||||
}
|
}
|
||||||
|
|
||||||
// Now tell the agency
|
// Now tell the agency
|
||||||
TRI_ASSERT(_docColl != nullptr);
|
auto agencyRes = persistInAgency(false);
|
||||||
std::string curPath = CurrentShardPath(*_docColl);
|
if (agencyRes.ok() || agencyRes.is(TRI_ERROR_CLUSTER_NOT_LEADER)) {
|
||||||
std::string planPath = PlanShardPath(*_docColl);
|
// Not a leader is expected
|
||||||
AgencyComm ac;
|
return agencyRes;
|
||||||
double startTime = TRI_microtime();
|
}
|
||||||
do {
|
// Real error, report
|
||||||
AgencyReadTransaction trx(std::vector<std::string>(
|
|
||||||
{AgencyCommManager::path(planPath), AgencyCommManager::path(curPath)}));
|
|
||||||
AgencyCommResult res = ac.sendTransactionWithFailover(trx);
|
|
||||||
|
|
||||||
if (res.successful()) {
|
|
||||||
TRI_ASSERT(res.slice().isArray() && res.slice().length() == 1);
|
|
||||||
VPackSlice resSlice = res.slice()[0];
|
|
||||||
// Let's look at the results, note that both can be None!
|
|
||||||
velocypack::Slice planEntry = PlanShardEntry(*_docColl, resSlice);
|
|
||||||
velocypack::Slice currentEntry = CurrentShardEntry(*_docColl, resSlice);
|
|
||||||
|
|
||||||
|
<<<<<<< HEAD
|
||||||
|
=======
|
||||||
if (!currentEntry.isObject()) {
|
if (!currentEntry.isObject()) {
|
||||||
LOG_TOPIC("b753d", ERR, Logger::CLUSTER)
|
LOG_TOPIC("b753d", ERR, Logger::CLUSTER)
|
||||||
<< "FollowerInfo::add, did not find object in " << curPath;
|
<< "FollowerInfo::add, did not find object in " << curPath;
|
||||||
|
@ -210,14 +168,15 @@ Result FollowerInfo::add(ServerID const& sid) {
|
||||||
int errorCode = (application_features::ApplicationServer::isStopping())
|
int errorCode = (application_features::ApplicationServer::isStopping())
|
||||||
? TRI_ERROR_SHUTTING_DOWN
|
? TRI_ERROR_SHUTTING_DOWN
|
||||||
: TRI_ERROR_CLUSTER_AGENCY_COMMUNICATION_FAILED;
|
: TRI_ERROR_CLUSTER_AGENCY_COMMUNICATION_FAILED;
|
||||||
|
>>>>>>> c922c5f1332482ef29dff794d8af394d31c1b737
|
||||||
std::string errorMessage =
|
std::string errorMessage =
|
||||||
"unable to add follower in agency, timeout in agency CAS operation for "
|
"unable to add follower in agency, timeout in agency CAS operation for "
|
||||||
"key " +
|
"key " +
|
||||||
_docColl->vocbase().name() + "/" + std::to_string(_docColl->planId()) +
|
_docColl->vocbase().name() + "/" + std::to_string(_docColl->planId()) +
|
||||||
": " + TRI_errno_string(errorCode);
|
": " + TRI_errno_string(agencyRes.errorNumber());
|
||||||
LOG_TOPIC("6295b", ERR, Logger::CLUSTER) << errorMessage;
|
LOG_TOPIC("6295b", ERR, Logger::CLUSTER) << errorMessage;
|
||||||
|
agencyRes.reset(agencyRes.errorNumber(), std::move(errorMessage));
|
||||||
return {errorCode, std::move(errorMessage)};
|
return agencyRes;
|
||||||
}
|
}
|
||||||
|
|
||||||
////////////////////////////////////////////////////////////////////////////////
|
////////////////////////////////////////////////////////////////////////////////
|
||||||
|
@ -246,46 +205,192 @@ Result FollowerInfo::remove(ServerID const& sid) {
|
||||||
<< "Removing follower " << sid << " from " << _docColl->name();
|
<< "Removing follower " << sid << " from " << _docColl->name();
|
||||||
|
|
||||||
MUTEX_LOCKER(locker, _agencyMutex);
|
MUTEX_LOCKER(locker, _agencyMutex);
|
||||||
|
WRITE_LOCKER(canWriteLocker, _canWriteLock);
|
||||||
WRITE_LOCKER(writeLocker, _dataLock); // the data lock has to be locked until this function completes
|
WRITE_LOCKER(writeLocker, _dataLock); // the data lock has to be locked until this function completes
|
||||||
// because if the agency communication does not work
|
// because if the agency communication does not work
|
||||||
// local data is modified again.
|
// local data is modified again.
|
||||||
|
|
||||||
// First check if there is anything to do:
|
// First check if there is anything to do:
|
||||||
bool found = false;
|
if (std::find(_followers->begin(), _followers->end(), sid) == _followers->end()) {
|
||||||
for (auto const& s : *_followers) {
|
TRI_ASSERT(std::find(_failoverCandidates->begin(), _failoverCandidates->end(),
|
||||||
if (s == sid) {
|
sid) == _failoverCandidates->end());
|
||||||
found = true;
|
|
||||||
break;
|
|
||||||
}
|
|
||||||
}
|
|
||||||
if (!found) {
|
|
||||||
return {TRI_ERROR_NO_ERROR}; // nothing to do
|
return {TRI_ERROR_NO_ERROR}; // nothing to do
|
||||||
}
|
}
|
||||||
|
// Both lists have to be in sync at any time!
|
||||||
|
TRI_ASSERT(std::find(_failoverCandidates->begin(), _failoverCandidates->end(),
|
||||||
|
sid) != _failoverCandidates->end());
|
||||||
|
auto oldFollowers = _followers;
|
||||||
|
auto oldFailovers = _failoverCandidates;
|
||||||
|
{
|
||||||
auto v = std::make_shared<std::vector<ServerID>>();
|
auto v = std::make_shared<std::vector<ServerID>>();
|
||||||
if (_followers->size() > 0) {
|
TRI_ASSERT(!_followers->empty()); // well we found the element above \o/
|
||||||
v->reserve(_followers->size() - 1);
|
v->reserve(_followers->size() - 1);
|
||||||
for (auto const& i : *_followers) {
|
std::remove_copy(_followers->begin(), _followers->end(),
|
||||||
if (i != sid) {
|
std::back_inserter(*v.get()), sid);
|
||||||
v->push_back(i);
|
|
||||||
}
|
|
||||||
}
|
|
||||||
}
|
|
||||||
auto _oldFollowers = _followers;
|
|
||||||
_followers = v; // will cast to std::vector<ServerID> const
|
_followers = v; // will cast to std::vector<ServerID> const
|
||||||
|
}
|
||||||
|
{
|
||||||
|
auto v = std::make_shared<std::vector<ServerID>>();
|
||||||
|
TRI_ASSERT(!_failoverCandidates->empty()); // well we found the element above \o/
|
||||||
|
v->reserve(_failoverCandidates->size() - 1);
|
||||||
|
std::remove_copy(_failoverCandidates->begin(), _failoverCandidates->end(),
|
||||||
|
std::back_inserter(*v.get()), sid);
|
||||||
|
_failoverCandidates = v; // will cast to std::vector<ServerID> const
|
||||||
|
}
|
||||||
#ifdef DEBUG_SYNC_REPLICATION
|
#ifdef DEBUG_SYNC_REPLICATION
|
||||||
if (!AgencyCommManager::MANAGER) {
|
if (!AgencyCommManager::MANAGER) {
|
||||||
return {TRI_ERROR_NO_ERROR};
|
return {TRI_ERROR_NO_ERROR};
|
||||||
}
|
}
|
||||||
#endif
|
#endif
|
||||||
|
Result agencyRes = persistInAgency(true);
|
||||||
|
if (agencyRes.ok()) {
|
||||||
|
// +1 for the leader (me)
|
||||||
|
if (_followers->size() + 1 < _docColl->minReplicationFactor()) {
|
||||||
|
_canWrite = false;
|
||||||
|
}
|
||||||
|
// we are finished
|
||||||
|
LOG_TOPIC("be0cb", DEBUG, Logger::CLUSTER)
|
||||||
|
<< "Removing follower " << sid << " from " << _docColl->name() << "succeeded";
|
||||||
|
return agencyRes;
|
||||||
|
}
|
||||||
|
if (agencyRes.is(TRI_ERROR_CLUSTER_NOT_LEADER)) {
|
||||||
|
// Next run in Maintenance will fix this.
|
||||||
|
return agencyRes;
|
||||||
|
}
|
||||||
|
|
||||||
|
// rollback:
|
||||||
|
_followers = oldFollowers;
|
||||||
|
_failoverCandidates = oldFailovers;
|
||||||
|
std::string errorMessage =
|
||||||
|
"unable to remove follower from agency, timeout in agency CAS operation "
|
||||||
|
"for key " +
|
||||||
|
_docColl->vocbase().name() + "/" + std::to_string(_docColl->planId()) +
|
||||||
|
": " + TRI_errno_string(agencyRes.errorNumber());
|
||||||
|
LOG_TOPIC("a0dcc", ERR, Logger::CLUSTER) << errorMessage;
|
||||||
|
agencyRes.resetErrorMessage<std::string>(std::move(errorMessage));
|
||||||
|
return agencyRes;
|
||||||
|
}
|
||||||
|
|
||||||
|
//////////////////////////////////////////////////////////////////////////////
|
||||||
|
/// @brief clear follower list, no changes in agency necessary
|
||||||
|
//////////////////////////////////////////////////////////////////////////////
|
||||||
|
|
||||||
|
void FollowerInfo::clear() {
|
||||||
|
WRITE_LOCKER(canWriteLocker, _canWriteLock);
|
||||||
|
WRITE_LOCKER(writeLocker, _dataLock);
|
||||||
|
_followers = std::make_shared<std::vector<ServerID>>();
|
||||||
|
_failoverCandidates = std::make_shared<std::vector<ServerID>>();
|
||||||
|
_canWrite = false;
|
||||||
|
}
|
||||||
|
|
||||||
|
//////////////////////////////////////////////////////////////////////////////
|
||||||
|
/// @brief check whether the given server is a follower
|
||||||
|
//////////////////////////////////////////////////////////////////////////////
|
||||||
|
|
||||||
|
bool FollowerInfo::contains(ServerID const& sid) const {
|
||||||
|
READ_LOCKER(readLocker, _dataLock);
|
||||||
|
auto const& f = *_followers;
|
||||||
|
return std::find(f.begin(), f.end(), sid) != f.end();
|
||||||
|
}
|
||||||
|
|
||||||
|
////////////////////////////////////////////////////////////////////////////////
|
||||||
|
/// @brief Take over leadership for this shard.
|
||||||
|
/// Also inject information of a insync followers that we knew about
|
||||||
|
/// before a failover to this server has happened
|
||||||
|
////////////////////////////////////////////////////////////////////////////////
|
||||||
|
|
||||||
|
void FollowerInfo::takeOverLeadership(std::vector<std::string> const& previousInsyncFollowers) {
|
||||||
|
// This function copies over the information taken from the last CURRENT into a local vector.
|
||||||
|
// Where we remove the old leader and ourself from the list of followers
|
||||||
|
WRITE_LOCKER(canWriteLocker, _canWriteLock);
|
||||||
|
WRITE_LOCKER(writeLocker, _dataLock);
|
||||||
|
// Reset local structures, if we take over leadership we do not know anything!
|
||||||
|
_followers = std::make_shared<std::vector<ServerID>>();
|
||||||
|
_failoverCandidates = std::make_shared<std::vector<ServerID>>();
|
||||||
|
// We disallow writes until the first write.
|
||||||
|
_canWrite = false;
|
||||||
|
// Take over leadership
|
||||||
|
_theLeader = "";
|
||||||
|
_theLeaderTouched = true;
|
||||||
|
TRI_ASSERT(_failoverCandidates != nullptr && _failoverCandidates->empty());
|
||||||
|
if (previousInsyncFollowers.size() > 1) {
|
||||||
|
auto ourselves = arangodb::ServerState::instance()->getId();
|
||||||
|
auto failoverCandidates =
|
||||||
|
std::make_shared<std::vector<ServerID>>(previousInsyncFollowers);
|
||||||
|
auto myEntry =
|
||||||
|
std::find(failoverCandidates->begin(), failoverCandidates->end(), ourselves);
|
||||||
|
// We are a valid failover follower
|
||||||
|
TRI_ASSERT(myEntry != failoverCandidates->end());
|
||||||
|
// The first server is a different leader! (For some reason the job can be
|
||||||
|
// triggered twice) TRI_ASSERT(myEntry != failoverCandidates->begin());
|
||||||
|
failoverCandidates->erase(myEntry);
|
||||||
|
// Put us in front, put old leader somewhere, we do not really care
|
||||||
|
_failoverCandidates = failoverCandidates;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
////////////////////////////////////////////////////////////////////////////////
|
||||||
|
/// @brief Update the current information in the Agency. We update the failover-
|
||||||
|
/// list with the newest values, after this the guarantee is that
|
||||||
|
/// _followers == _failoverCandidates
|
||||||
|
////////////////////////////////////////////////////////////////////////////////
|
||||||
|
bool FollowerInfo::updateFailoverCandidates() {
|
||||||
|
MUTEX_LOCKER(agencyLocker, _agencyMutex);
|
||||||
|
// Acquire _canWriteLock first
|
||||||
|
WRITE_LOCKER(canWriteLocker, _canWriteLock);
|
||||||
|
// Next acquire _dataLock
|
||||||
|
WRITE_LOCKER(dataLocker, _dataLock);
|
||||||
|
if (_canWrite) {
|
||||||
|
// Short circuit, we have multiple writes in the above write lock
|
||||||
|
// The first needs to do things and flips _canWrite
|
||||||
|
// All followers can return as soon as the lock is released
|
||||||
|
#ifdef ARANGODB_ENABLE_MAINTAINER_MODE
|
||||||
|
TRI_ASSERT(_failoverCandidates->size() == _followers->size());
|
||||||
|
std::vector<std::string> diff;
|
||||||
|
std::set_symmetric_difference(_failoverCandidates->begin(),
|
||||||
|
_failoverCandidates->end(), _followers->begin(),
|
||||||
|
_followers->end(), std::back_inserter(diff));
|
||||||
|
TRI_ASSERT(diff.empty());
|
||||||
|
#endif
|
||||||
|
return _canWrite;
|
||||||
|
}
|
||||||
|
TRI_ASSERT(_followers->size() + 1 >= _docColl->minReplicationFactor());
|
||||||
|
// Update both lists (we use a copy here, as we are modifying them in other places individually!)
|
||||||
|
_failoverCandidates = std::make_shared<std::vector<ServerID> const>(*_followers);
|
||||||
|
// Just be sure
|
||||||
|
TRI_ASSERT(_failoverCandidates.get() != _followers.get());
|
||||||
|
TRI_ASSERT(_failoverCandidates->size() == _followers->size());
|
||||||
|
#ifdef ARANGODB_ENABLE_MAINTAINER_MODE
|
||||||
|
std::vector<std::string> diff;
|
||||||
|
std::set_symmetric_difference(_failoverCandidates->begin(),
|
||||||
|
_failoverCandidates->end(), _followers->begin(),
|
||||||
|
_followers->end(), std::back_inserter(diff));
|
||||||
|
TRI_ASSERT(diff.empty());
|
||||||
|
#endif
|
||||||
|
Result res = persistInAgency(true);
|
||||||
|
if (!res.ok()) {
|
||||||
|
// We could not persist the update in the agency.
|
||||||
|
// Collection left in RO mode.
|
||||||
|
LOG_TOPIC("7af00", INFO, Logger::CLUSTER)
|
||||||
|
<< "Could not persist insync follower for " << _docColl->vocbase().name()
|
||||||
|
<< "/" << std::to_string(_docColl->planId())
|
||||||
|
<< " keep RO-mode for now, next write will retry.";
|
||||||
|
TRI_ASSERT(!_canWrite);
|
||||||
|
} else {
|
||||||
|
_canWrite = true;
|
||||||
|
}
|
||||||
|
return _canWrite;
|
||||||
|
}

////////////////////////////////////////////////////////////////////////////////
/// @brief Persist information in Current
////////////////////////////////////////////////////////////////////////////////

Result FollowerInfo::persistInAgency(bool isRemove) const {
  // Now tell the agency
  TRI_ASSERT(_docColl != nullptr);
  std::string curPath = CurrentShardPath(*_docColl);
  std::string planPath = PlanShardPath(*_docColl);

  AgencyComm ac;
  double startTime = TRI_microtime();
  do {
    AgencyReadTransaction trx(std::vector<std::string>(
        {AgencyCommManager::path(planPath), AgencyCommManager::path(curPath)}));

@@ -299,7 +404,7 @@ Result FollowerInfo::remove(ServerID const& sid) {

      if (!currentEntry.isObject()) {
        LOG_TOPIC("01896", ERR, Logger::CLUSTER)
            << reportName(isRemove) << ", did not find object in " << curPath;
        if (!currentEntry.isNone()) {
          LOG_TOPIC("57c84", ERR, Logger::CLUSTER) << "Found: " << currentEntry.toJson();
        }

@@ -307,16 +412,16 @@ Result FollowerInfo::remove(ServerID const& sid) {
      if (!planEntry.isArray() || planEntry.length() == 0 || !planEntry[0].isString() ||
          !planEntry[0].isEqualString(ServerState::instance()->getId())) {
        LOG_TOPIC("42231", INFO, Logger::CLUSTER)
            << reportName(isRemove)
            << ", did not find myself in Plan: " << _docColl->vocbase().name()
            << "/" << std::to_string(_docColl->planId())
            << " (can happen when the leader changed recently).";
        if (!planEntry.isNone()) {
          LOG_TOPIC("ffede", INFO, Logger::CLUSTER) << "Found: " << planEntry.toJson();
        }
        return {TRI_ERROR_CLUSTER_NOT_LEADER};
      } else {
        auto newValue = newShardEntry(currentEntry);
        AgencyWriteTransaction trx;
        trx.preconditions.push_back(
            AgencyPrecondition(curPath, AgencyPrecondition::Type::VALUE, currentEntry));

@@ -328,19 +433,21 @@ Result FollowerInfo::remove(ServerID const& sid) {
            AgencyOperation("Current/Version", AgencySimpleOperationType::INCREMENT_OP));
        AgencyCommResult res2 = ac.sendTransactionWithFailover(trx);
        if (res2.successful()) {
          return {TRI_ERROR_NO_ERROR};
        }
      }
    }
    } else {
      LOG_TOPIC("b7333", WARN, Logger::CLUSTER)
          << reportName(isRemove) << ", could not read " << planPath << " and "
          << curPath << " in agency.";
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
  } while (TRI_microtime() < startTime + 7200 &&
           !application_features::ApplicationServer::isStopping());

@@ -367,23 +474,58 @@ Result FollowerInfo::remove(ServerID const& sid) {
  LOG_TOPIC("a0dcc", ERR, Logger::CLUSTER) << errorMessage;

  return {errorCode, std::move(errorMessage)};
}
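The loop above is a read-modify-write against the agency: read Plan/Current, build the new shard entry, and write it guarded by a precondition on the old value, retrying with backoff until success, shutdown, or timeout. A generic sketch of the same compare-and-swap pattern, against a hypothetical key-value store API (names are illustrative, not the agency API):

#include <chrono>
#include <optional>
#include <string>
#include <thread>

struct Store {  // hypothetical stand-in for the agency
  std::optional<std::string> read(std::string const& key);
  bool casWrite(std::string const& key, std::string const& expected,
                std::string const& desired);
};

bool persistWithRetry(Store& store, std::string const& key,
                      std::string (*update)(std::string const&),
                      std::chrono::seconds timeout) {
  auto deadline = std::chrono::steady_clock::now() + timeout;
  do {
    auto old = store.read(key);
    if (old && store.casWrite(key, *old, update(*old))) {
      return true;  // precondition held, write succeeded
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
  } while (std::chrono::steady_clock::now() < deadline);
  return false;  // caller keeps the collection in read-only mode
}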

////////////////////////////////////////////////////////////////////////////////
/// @brief inject the information about "servers" and "failoverCandidates"
////////////////////////////////////////////////////////////////////////////////

void FollowerInfo::injectFollowerInfoInternal(VPackBuilder& builder) const {
  auto ourselves = arangodb::ServerState::instance()->getId();
  TRI_ASSERT(builder.isOpenObject());
  builder.add(VPackValue(maintenance::SERVERS));
  {
    VPackArrayBuilder bb(&builder);
    builder.add(VPackValue(ourselves));
    for (auto const& f : *_followers) {
      builder.add(VPackValue(f));
    }
  }
  builder.add(VPackValue(StaticStrings::FailoverCandidates));
  {
    VPackArrayBuilder bb(&builder);
    builder.add(VPackValue(ourselves));
    for (auto const& f : *_failoverCandidates) {
      builder.add(VPackValue(f));
    }
  }
  TRI_ASSERT(builder.isOpenObject());
}
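For orientation, a standalone sketch of the same VelocyPack builder pattern (assuming the VelocyPack headers are available; the values are hypothetical):

#include <velocypack/Builder.h>
#include <velocypack/velocypack-aliases.h>
#include <iostream>

int main() {
  VPackBuilder builder;
  {
    VPackObjectBuilder guard(&builder);  // open object, auto-closed
    builder.add(VPackValue("servers"));
    {
      VPackArrayBuilder servers(&builder);  // open array, auto-closed
      builder.add(VPackValue("leader"));
      builder.add(VPackValue("follower-1"));
    }
    builder.add(VPackValue("failoverCandidates"));
    {
      VPackArrayBuilder candidates(&builder);
      builder.add(VPackValue("leader"));
      builder.add(VPackValue("follower-1"));
    }
  }
  std::cout << builder.slice().toJson() << "\n";
  return 0;
}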

////////////////////////////////////////////////////////////////////////////////
/// @brief change JSON under
/// Current/Collection/<DB-name>/<Collection-ID>/<shard-ID>
/// to reflect the current follower state: all attributes except "servers"
/// and "failoverCandidates" are copied over, and those two are injected
/// from the local follower information.
////////////////////////////////////////////////////////////////////////////////

VPackBuilder FollowerInfo::newShardEntry(VPackSlice oldValue) const {
  VPackBuilder newValue;
  TRI_ASSERT(oldValue.isObject());
  {
    VPackObjectBuilder b(&newValue);
    // Copy all but SERVERS and FailoverCandidates.
    // They will be injected later.
    for (auto const& it : VPackObjectIterator(oldValue)) {
      if (!it.key.isEqualString(maintenance::SERVERS) &&
          !it.key.isEqualString(StaticStrings::FailoverCandidates)) {
        newValue.add(it.key);
        newValue.add(it.value);
      }
    }
    injectFollowerInfoInternal(newValue);
  }
  return newValue;
}
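Effect of newShardEntry on a Current entry, roughly (a hypothetical before/after, values are illustrative only):

// Input (old Current entry):
//   { "indexes": [...], "servers": ["OLD-LEADER", "F1"],
//     "failoverCandidates": ["OLD-LEADER", "F1"] }
// Output on the new leader with local followers {F1}:
//   { "indexes": [...], "servers": ["NEW-LEADER", "F1"],
//     "failoverCandidates": ["NEW-LEADER", "F1"] }
// All other attributes pass through unchanged.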

@@ -25,11 +25,15 @@
#ifndef ARANGOD_CLUSTER_FOLLOWER_INFO_H
#define ARANGOD_CLUSTER_FOLLOWER_INFO_H 1

#include "ClusterInfo.h"

#include "Basics/Mutex.h"
#include "Basics/ReadWriteLock.h"
#include "Basics/Result.h"
#include "Basics/WriteLocker.h"
#include "StorageEngine/EngineSelectorFeature.h"
#include "StorageEngine/StorageEngine.h"
#include "VocBase/LogicalCollection.h"

namespace arangodb {

@@ -44,22 +48,41 @@ class Slice;
class FollowerInfo {
  // This is the list of real local followers
  std::shared_ptr<std::vector<ServerID> const> _followers;
  // This is the list of followers that have been in sync BEFORE we
  // triggered a failover to this server.
  // The list is filled only temporarily, and will be deleted as
  // soon as we can guarantee at least that many followers locally.
  std::shared_ptr<std::vector<ServerID> const> _failoverCandidates;

  // The agencyMutex is used to synchronise access to the agency.
  // The _dataLock is used to sync access to local data.
  // The _canWriteLock protects the flag that says whether we have enough followers.
  // The locking order to avoid deadlocks has to be as follows:
  // 1.) _agencyMutex
  // 2.) _canWriteLock
  // 3.) _dataLock
  mutable Mutex _agencyMutex;
  mutable arangodb::basics::ReadWriteLock _canWriteLock;
  mutable arangodb::basics::ReadWriteLock _dataLock;

  arangodb::LogicalCollection* _docColl;
  // if _theLeader is empty, then we are leading
  std::string _theLeader;
  bool _theLeaderTouched;
  // flag whether we have enough in-sync followers and can pass through writes
  bool _canWrite;

 public:
  explicit FollowerInfo(arangodb::LogicalCollection* d)
      : _followers(std::make_shared<std::vector<ServerID>>()),
        _failoverCandidates(std::make_shared<std::vector<ServerID>>()),
        _docColl(d),
        _theLeader(""),
        _theLeaderTouched(false),
        _canWrite(_docColl->replicationFactor() <= 1) {
    // On replicationFactor 1 we do not have any failover servers to maintain.
    // This should also disable satellite tracking.
  }

  ////////////////////////////////////////////////////////////////////////////////
  /// @brief get information about current followers of a shard.
  ////////////////////////////////////////////////////////////////////////////////

@@ -70,6 +93,23 @@ class FollowerInfo {
    return _followers;
  }

  ////////////////////////////////////////////////////////////////////////////////
  /// @brief get information about the current failover candidates of a shard.
  ////////////////////////////////////////////////////////////////////////////////

  std::shared_ptr<std::vector<ServerID> const> getFailoverCandidates() const {
    READ_LOCKER(readLocker, _dataLock);
    return _failoverCandidates;
  }

  ////////////////////////////////////////////////////////////////////////////////
  /// @brief Take over leadership for this shard.
  /// Also inject information about in-sync followers that we knew about
  /// before a failover to this server happened.
  ////////////////////////////////////////////////////////////////////////////////

  void takeOverLeadership(std::vector<std::string> const& previousInsyncFollowers);

  //////////////////////////////////////////////////////////////////////////////
  /// @brief add a follower to a shard, this is only done by the server side
  /// of the "get-in-sync" capabilities. This reports to the agency under

@@ -106,6 +146,9 @@ class FollowerInfo {
  //////////////////////////////////////////////////////////////////////////////

  void setTheLeader(std::string const& who) {
    // Empty leader => we are now the new leader.
    // This needs to be handled with takeOverLeadership.
    TRI_ASSERT(!who.empty());
    WRITE_LOCKER(writeLocker, _dataLock);
    _theLeader = who;
    _theLeaderTouched = true;

@@ -128,6 +171,54 @@ class FollowerInfo {
    READ_LOCKER(readLocker, _dataLock);
    return _theLeaderTouched;
  }

  bool allowedToWrite() {
    {
      auto engine = arangodb::EngineSelectorFeature::ENGINE;
      TRI_ASSERT(engine != nullptr);
      if (engine->inRecovery()) {
        return true;
      }
      READ_LOCKER(readLocker, _canWriteLock);
      if (_canWrite) {
        // Someone has decided we can write, fast path!
#ifdef ARANGODB_ENABLE_MAINTAINER_MODE
        // Invariant: we can only WRITE if we do not have other failover candidates
        READ_LOCKER(readLockerData, _dataLock);
        TRI_ASSERT(_followers->size() == _failoverCandidates->size());
        TRI_ASSERT(_followers->size() > _docColl->minReplicationFactor());
#endif
        return _canWrite;
      }
      READ_LOCKER(readLockerData, _dataLock);
      TRI_ASSERT(_docColl != nullptr);
      if (_followers->size() + 1 < _docColl->minReplicationFactor()) {
        // We know that we still do not have enough followers
        return false;
      }
    }
    return updateFailoverCandidates();
  }

  //////////////////////////////////////////////////////////////////////////////
  /// @brief Inject the information about followers into the builder.
  /// Builder needs to be an open object and is not allowed to contain
  /// the keys "servers" and "failoverCandidates".
  //////////////////////////////////////////////////////////////////////////////
  void injectFollowerInfo(arangodb::velocypack::Builder& builder) const {
    READ_LOCKER(readLockerData, _dataLock);
    injectFollowerInfoInternal(builder);
  }

 private:
  void injectFollowerInfoInternal(arangodb::velocypack::Builder& builder) const;

  bool updateFailoverCandidates();

  Result persistInAgency(bool isRemove) const;

  arangodb::velocypack::Builder newShardEntry(arangodb::velocypack::Slice oldValue) const;
};
}  // end namespace arangodb
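The three-level lock order documented above (agencyMutex, then canWriteLock, then dataLock) is what keeps allowedToWrite and updateFailoverCandidates deadlock-free: the fast path never takes the agency mutex, the slow path takes all three in order. A minimal illustration of ordered acquisition, with plain std::mutex standing in for the ArangoDB lock types:

#include <mutex>

struct OrderedLocks {
  std::mutex agencyMutex;   // 1.) always acquired first
  std::mutex canWriteLock;  // 2.) second
  std::mutex dataLock;      // 3.) last

  void slowPath() {  // analogous to updateFailoverCandidates
    std::lock_guard<std::mutex> g1(agencyMutex);
    std::lock_guard<std::mutex> g2(canWriteLock);
    std::lock_guard<std::mutex> g3(dataLock);
    // ... talk to the agency, then flip the write flag ...
  }
  void fastPath() {  // analogous to allowedToWrite
    std::lock_guard<std::mutex> g2(canWriteLock);  // never takes agencyMutex
    std::lock_guard<std::mutex> g3(dataLock);
    // ... check flags only ...
  }
};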

@@ -245,7 +245,8 @@ void handlePlanShard(VPackSlice const& cprops, VPackSlice const& ldb,
         {THE_LEADER, shouldBeLeading ? std::string() : leaderId},
         {SERVER_ID, serverId},
         {LOCAL_LEADER, lcol.get(THE_LEADER).copyString()},
         {FOLLOWERS_TO_DROP, followersToDropString},
         {OLD_CURRENT_COUNTER, std::to_string(feature.getCurrentCounter())}},
        HIGHER_PRIORITY, properties));
  } else {
    LOG_TOPIC("0285b", DEBUG, Logger::MAINTENANCE)

@@ -726,26 +727,14 @@ static VPackBuilder assembleLocalCollectionInfo(
          }
        }
      }
      collection->followers()->injectFollowerInfo(ret);
    }
    return ret;
  } catch (std::exception const& e) {
    ret.clear();
    std::string errorMsg(
        "Maintenance::assembleLocalCollectionInfo: Failed to lookup database ");
    errorMsg += database;
    errorMsg += ", exception: ";
    errorMsg += e.what();

@@ -852,8 +841,10 @@ arangodb::Result arangodb::maintenance::reportInCurrent(
      auto const planPath = std::vector<std::string>{dbName, colName, "shards", shName};
      if (!pdbs.hasKey(planPath)) {
        LOG_TOPIC("43242", DEBUG, Logger::MAINTENANCE)
            << "Ooops, we have a shard for which we believe to be the leader,"
               " but the Plan does not have it any more, we do not report in "
               "Current about this, database: "
            << dbName << ", shard: " << shName;
        continue;

@@ -863,7 +854,8 @@ arangodb::Result arangodb::maintenance::reportInCurrent(
      if (!thePlanList.isArray() || thePlanList.length() == 0 ||
          !thePlanList[0].isString() || !thePlanList[0].isEqualStringUnchecked(serverId)) {
        LOG_TOPIC("87776", DEBUG, Logger::MAINTENANCE)
            << "Ooops, we have a shard for which we believe to be the leader,"
               " but the Plan says otherwise, we do not report in Current "
               "about this, database: "
            << dbName << ", shard: " << shName;

@@ -923,7 +915,8 @@ arangodb::Result arangodb::maintenance::reportInCurrent(
      if (!pdbs.hasKey(planPath)) {
        LOG_TOPIC("65432", DEBUG, Logger::MAINTENANCE)
            << "Ooops, we have a shard for which we believe that we "
               "just resigned, but the Plan does not have it any more,"
               " we do not report in Current about this, database: "
            << dbName << ", shard: " << shName;
        continue;

@@ -57,7 +57,8 @@ bool findNotDoneActions(std::shared_ptr<maintenance::Action> const& action) {
MaintenanceFeature::MaintenanceFeature(application_features::ApplicationServer& server)
    : ApplicationFeature(server, "Maintenance"),
      _forceActivation(false),
      _maintenanceThreadsMax(2),
      _currentCounter(0) {
  // the number of threads will be adjusted later. it's just that we want to
  // initialize all members properly

@@ -116,7 +117,8 @@ void MaintenanceFeature::validateOptions(std::shared_ptr<ProgramOptions> options
        << "Need at least " << minThreadLimit << " maintenance-threads";
    _maintenanceThreadsMax = minThreadLimit;
  } else if (_maintenanceThreadsMax >= maxThreadLimit) {
    LOG_TOPIC("8fb0e", WARN, Logger::MAINTENANCE)
        << "maintenance-threads limited to " << maxThreadLimit;
    _maintenanceThreadsMax = maxThreadLimit;
  }
}

@@ -129,7 +131,8 @@ void MaintenanceFeature::start() {

  // _forceActivation is set by the catch tests
  if (!_forceActivation && (serverState->isAgent() || serverState->isSingleServer())) {
    LOG_TOPIC("deb1a", TRACE, Logger::MAINTENANCE)
        << "Disable maintenance-threads"
        << " for single-server or agents.";
    return;
  }

@@ -733,3 +736,41 @@ void MaintenanceFeature::delShardVersion(std::string const& shname) {
    _shardVersion.erase(it);
  }
}

uint64_t MaintenanceFeature::getCurrentCounter() const {
  // It is guaranteed that getCurrentCounter is not executed
  // concurrently to increase / wait.
  // This guarantee is created by the following:
  // 1) There is one infinite loop that will call
  //    PhaseOne and PhaseTwo in exactly this order.
  //    It is guaranteed that only one thread at a time is
  //    in this loop.
  //    Between PhaseOne and PhaseTwo, increaseCurrentCounter is called.
  //    Within PhaseOne this getCurrentCounter is called, but never after,
  //    so getCurrentCounter and increaseCurrentCounter are strictly serialized.
  // 2) waitForLargerCurrentCounter can be called in concurrent threads at any time.
  //    It is read-only, so it is safe to have it run concurrently to
  //    getCurrentCounter without any locking.
  //    However we need locking for increase and waitFor in order to guarantee
  //    their functionality.
  // For now we do not actually need this guard, but as this is NOT performance
  // critical we simply take it, to be safe for later use.
  std::unique_lock<std::mutex> guard(_currentCounterLock);
  return _currentCounter;
}

void MaintenanceFeature::increaseCurrentCounter() {
  std::unique_lock<std::mutex> guard(_currentCounterLock);
  _currentCounter++;
  _currentCounterCondition.notify_all();
}

void MaintenanceFeature::waitForLargerCurrentCounter(uint64_t old) {
  std::unique_lock<std::mutex> guard(_currentCounterLock);
  // Wait with a predicate: condition_variable::wait can wake up spuriously,
  // so re-check _currentCounter > old before returning.
  _currentCounterCondition.wait(guard, [this, old] { return _currentCounter > old; });
}
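The counter above is a small condition-variable handshake: PhaseOne records the counter, the maintenance loop bumps it after a fresh loadCurrent, and an action blocks until a newer Current is visible. A standalone version of the same pattern:

#include <condition_variable>
#include <cstdint>
#include <mutex>

struct CurrentCounter {
  std::mutex lock;
  std::condition_variable cond;
  uint64_t value = 0;

  uint64_t get() {
    std::unique_lock<std::mutex> guard(lock);
    return value;  // e.g. recorded in PhaseOne
  }
  void increase() {
    std::unique_lock<std::mutex> guard(lock);
    ++value;  // e.g. after loadCurrent, between PhaseOne and PhaseTwo
    cond.notify_all();
  }
  void waitForLarger(uint64_t old) {
    std::unique_lock<std::mutex> guard(lock);
    // predicate form guards against spurious wakeups
    cond.wait(guard, [&] { return value > old; });
  }
};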

@@ -36,7 +36,7 @@
namespace arangodb {

template <typename T>
struct SharedPtrComparer {
  bool operator()(std::shared_ptr<T> const& a, std::shared_ptr<T> const& b) {
    if (a == nullptr || b == nullptr) {

@@ -50,8 +50,6 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
 public:
  explicit MaintenanceFeature(application_features::ApplicationServer&);

  virtual ~MaintenanceFeature() {}

  struct errors_t {

@@ -156,7 +154,8 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
   * @brief Find and return first found not-done action or nullptr
   * @param desc Description of sought action
   */
  std::shared_ptr<maintenance::Action> findFirstNotDoneAction(
      std::shared_ptr<maintenance::ActionDescription> const& desc);

  /**
   * @brief add index error to bucket

@@ -298,18 +297,43 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
   */
  void delShardVersion(std::string const& shardId);

  /**
   * @brief Get the number of loadCurrent operations.
   * NOTE: The counter functions can be removed
   * as soon as we use a push-based approach on Plan and Current.
   * @return The most recent count of loadCurrent calls
   */
  uint64_t getCurrentCounter() const;

  /**
   * @brief Increase the counter for loadCurrent operations triggered
   * during maintenance. This is used to delay some Actions that
   * require a recent Current to continue.
   */
  void increaseCurrentCounter();

  /**
   * @brief Wait until the current counter is larger than the given old one.
   * The idea is to first request `getCurrentCounter`.
   * @param old The last value of getCurrentCounter(). This function
   * returns only once the recent counter is larger than old.
   */
  void waitForLargerCurrentCounter(uint64_t old);

 private:
  /// @brief common code used by multiple constructors
  void init();

  /// @brief Search for first action matching hash and predicate
  /// @return shared pointer to action object if exists, empty shared_ptr if not
  std::shared_ptr<maintenance::Action> findFirstActionHash(
      size_t hash,
      std::function<bool(std::shared_ptr<maintenance::Action> const&)> const& predicate);

  /// @brief Search for first action matching hash and predicate (with lock already held by caller)
  /// @return shared pointer to action object if exists, empty shared_ptr if not
  std::shared_ptr<maintenance::Action> findFirstActionHashNoLock(
      size_t hash,
      std::function<bool(std::shared_ptr<maintenance::Action> const&)> const& predicate);

  /// @brief Search for action by Id

@@ -321,7 +345,6 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
  std::shared_ptr<maintenance::Action> findActionIdNoLock(uint64_t hash);

 protected:
  /// @brief option for forcing this feature to always be enabled - used by the catch tests
  bool _forceActivation;

@@ -365,8 +388,8 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
  // we need to leave the action in _prioQueue (since we cannot remove anything
  // but the top from it), and simply put it into a different state.
  std::priority_queue<std::shared_ptr<maintenance::Action>,
                      std::vector<std::shared_ptr<maintenance::Action>>, SharedPtrComparer<maintenance::Action>>
      _prioQueue;

  /// @brief lock to protect _actionRegistry and state changes to MaintenanceActions within
  mutable arangodb::basics::ReadWriteLock _actionRegistryLock;

@@ -404,6 +427,15 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
  /// @brief shards have versions in order to be able to distinguish between
  /// independent actions
  std::unordered_map<std::string, size_t> _shardVersion;

  /// @brief Mutex for the current counter condition variable
  mutable std::mutex _currentCounterLock;

  /// @brief Condition variable where Actions can wait until _currentCounter increased
  std::condition_variable _currentCounterCondition;

  /// @brief counter for loadCurrent requests.
  uint64_t _currentCounter;
};

}  // namespace arangodb

@@ -69,6 +69,7 @@ constexpr char const* THE_LEADER = "theLeader";
constexpr char const* UNDERSCORE = "_";
constexpr char const* UPDATE_COLLECTION = "UpdateCollection";
constexpr char const* WAIT_FOR_SYNC = "waitForSync";
constexpr char const* OLD_CURRENT_COUNTER = "oldCurrentCounter";

}  // namespace maintenance
}  // namespace arangodb

@@ -77,21 +77,36 @@ UpdateCollection::UpdateCollection(MaintenanceFeature& feature, ActionDescription
  }
  TRI_ASSERT(desc.has(FOLLOWERS_TO_DROP));

  TRI_ASSERT(desc.has(OLD_CURRENT_COUNTER));

  if (!error.str().empty()) {
    LOG_TOPIC("a6e4c", ERR, Logger::MAINTENANCE)
        << "UpdateCollection: " << error.str();
    _result.reset(TRI_ERROR_INTERNAL, error.str());
    setState(FAILED);
  }
}

void handleLeadership(LogicalCollection& collection, std::string const& localLeader,
                      std::string const& plannedLeader,
                      std::string const& followersToDrop, std::string const& databaseName,
                      uint64_t oldCounter, MaintenanceFeature& feature) {
  auto& followers = collection.followers();

  if (plannedLeader.empty()) {   // Planned to lead
    if (!localLeader.empty()) {  // We were not leader, assume leadership
      // This will block the thread until we have fetched a new Current version
      // in the maintenance main thread.
      feature.waitForLargerCurrentCounter(oldCounter);
      auto currentInfo = ClusterInfo::instance()->getCollectionCurrent(
          databaseName, std::to_string(collection.planId()));
      if (currentInfo == nullptr) {
        // Collection has been dropped, we cannot continue here.
        return;
      }
      TRI_ASSERT(currentInfo != nullptr);
      auto failoverCandidates = currentInfo->failoverCandidates(collection.name());
      followers->takeOverLeadership(failoverCandidates);
      transaction::cluster::abortFollowerTransactionsOnShard(collection.id());
    } else {
      // If someone (the Supervision most likely) has thrown

@@ -138,6 +153,8 @@ bool UpdateCollection::first() {
  auto const& localLeader = _description.get(LOCAL_LEADER);
  auto const& followersToDrop = _description.get(FOLLOWERS_TO_DROP);
  auto const& props = properties();
  auto const& oldCounterString = _description.get(OLD_CURRENT_COUNTER);
  uint64_t oldCounter = basics::StringUtils::uint64(oldCounterString);

  try {
    DatabaseGuard guard(database);

@@ -152,7 +169,8 @@ bool UpdateCollection::first() {
    // resignation case is not handled here, since then
    // ourselves does not appear in shards[shard] but only
    // "_" + ourselves.
    handleLeadership(*coll, localLeader, plannedLeader, followersToDrop,
                     vocbase.name(), oldCounter, feature());
    _result = Collections::updateProperties(*coll, props, false);  // always a full-update

    if (!_result.ok()) {

@@ -173,7 +191,8 @@ bool UpdateCollection::first() {
    std::stringstream error;

    error << "action " << _description << " failed with exception " << e.what();
    LOG_TOPIC("79442", WARN, Logger::MAINTENANCE)
        << "UpdateCollection: " << error.str();
    _result.reset(TRI_ERROR_INTERNAL, error.str());
  }
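For reference, the ActionDescription built in handlePlanShard now carries the counter alongside the leadership keys, roughly (all values hypothetical):

// Shape of the description map consumed by UpdateCollection (illustrative only):
// {
//   "theLeader":         "",            // empty => we should lead
//   "serverId":          "PRMR-1234",
//   "localLeader":       "PRMR-9999",   // whom we were following locally
//   "followersToDrop":   "",
//   "oldCurrentCounter": "42"           // getCurrentCounter() at schedule time
// }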

@@ -65,6 +65,11 @@ enum class RequestLane {
  // V8 or having high priority.
  CLUSTER_INTERNAL,

  // For requests from the DBserver to the Coordinator or
  // from the Coordinator to the DBserver using AQL;
  // these have medium priority.
  CLUSTER_AQL,

  // For requests from the Coordinator to the
  // DBserver using V8.
  CLUSTER_V8,

@@ -115,6 +120,8 @@ inline RequestPriority PriorityRequestLane(RequestLane lane) {
      return RequestPriority::LOW;
    case RequestLane::CLUSTER_INTERNAL:
      return RequestPriority::HIGH;
    case RequestLane::CLUSTER_AQL:
      return RequestPriority::MED;
    case RequestLane::CLUSTER_V8:
      return RequestPriority::LOW;
    case RequestLane::CLUSTER_ADMIN:

@@ -41,9 +41,7 @@ class InternalRestTraverserHandler : public RestVocbaseBaseHandler {
  char const* name() const override final {
    return "InternalRestTraverserHandler";
  }
  RequestLane lane() const override final { return RequestLane::CLUSTER_AQL; }

 private:
  // @brief create a new Traverser Engine.

@@ -82,7 +82,7 @@ Conductor::Conductor(uint64_t executionNumber, TRI_vocbase_t& vocbase,
  if (_asyncMode) {
    LOG_TOPIC("1b1c2", DEBUG, Logger::PREGEL) << "Running in async mode";
  }
  VPackSlice lazy = _userParams.slice().get(Utils::lazyLoadingKey);
  _lazyLoading = _algorithm->supportsLazyLoading();
  _lazyLoading = _lazyLoading && (lazy.isNone() || lazy.getBoolean());
  if (_lazyLoading) {

@@ -98,8 +98,7 @@ Conductor::Conductor(uint64_t executionNumber, TRI_vocbase_t& vocbase,
}

Conductor::~Conductor() {
  if (_state != ExecutionState::CANCELED && _state != ExecutionState::DEFAULT) {
    try {
      this->cancel();
    } catch (...) {

@@ -120,11 +119,13 @@ void Conductor::start() {
  _globalSuperstep = 0;
  _state = ExecutionState::RUNNING;

  LOG_TOPIC("3a255", DEBUG, Logger::PREGEL)
      << "Telling workers to load the data";
  int res = _initializeWorkers(Utils::startExecutionPath, VPackSlice());
  if (res != TRI_ERROR_NO_ERROR) {
    _state = ExecutionState::CANCELED;
    LOG_TOPIC("30171", ERR, Logger::PREGEL)
        << "Not all DBServers started the execution";
  }
}

@@ -170,7 +171,8 @@ bool Conductor::_startGlobalStep() {
    _masterContext->_enterNextGSS = false;
    proceed = _masterContext->postGlobalSuperstep();
    if (!proceed) {
      LOG_TOPIC("0aa8e", DEBUG, Logger::PREGEL)
          << "Master context ended execution";
    }
  }

@@ -212,7 +214,8 @@ bool Conductor::_startGlobalStep() {
  res = _sendToAllDBServers(Utils::startGSSPath, b);  // call me maybe
  if (res != TRI_ERROR_NO_ERROR) {
    _state = ExecutionState::IN_ERROR;
    LOG_TOPIC("f34bb", ERR, Logger::PREGEL)
        << "Conductor could not start GSS " << _globalSuperstep;
    // the recovery mechanisms should take care of this
  } else {
    LOG_TOPIC("411a5", DEBUG, Logger::PREGEL) << "Conductor started new gss " << _globalSuperstep;

@@ -236,8 +239,9 @@ void Conductor::finishedWorkerStartup(VPackSlice const& data) {
    return;
  }

  LOG_TOPIC("76631", INFO, Logger::PREGEL)
      << "Running pregel with " << _totalVerticesCount << " vertices, "
      << _totalEdgesCount << " edges";
  if (_masterContext) {
    _masterContext->_globalSuperstep = 0;
    _masterContext->_vertexCount = _totalVerticesCount;

@@ -356,7 +360,8 @@ void Conductor::finishedRecoveryStep(VPackSlice const& data) {
    res = _sendToAllDBServers(Utils::continueRecoveryPath, b);

  } else {
    LOG_TOPIC("6ecf2", INFO, Logger::PREGEL)
        << "Recovery finished. Proceeding normally";

    // build the message, works for all cases
    VPackBuilder b;

@@ -393,7 +398,8 @@ void Conductor::startRecovery() {
  if (_state != ExecutionState::RUNNING && _state != ExecutionState::IN_ERROR) {
    return;  // maybe we are already in recovery mode
  } else if (_algorithm->supportsCompensation() == false) {
    LOG_TOPIC("12e0e", ERR, Logger::PREGEL)
        << "Algorithm does not support recovery";
    cancelNoLock();
    return;
  }

@@ -407,14 +413,15 @@ void Conductor::startRecovery() {

  // let's wait for a final state in the cluster
  _workHandle = SchedulerFeature::SCHEDULER->queueDelay(
      RequestLane::CLUSTER_AQL, std::chrono::seconds(2), [this](bool cancelled) {
        if (cancelled || _state != ExecutionState::RECOVERING) {
          return;  // seems like we are canceled
        }
        std::vector<ServerID> goodServers;
        int res = PregelFeature::instance()->recoveryManager()->filterGoodServers(_dbServers, goodServers);
        if (res != TRI_ERROR_NO_ERROR) {
          LOG_TOPIC("3d08b", ERR, Logger::PREGEL)
              << "Recovery proceedings failed";
          cancelNoLock();
          return;
        }

@@ -614,8 +621,8 @@ int Conductor::_initializeWorkers(std::string const& suffix, VPackSlice addition
  }

  std::shared_ptr<ClusterComm> cc = ClusterComm::instance();
  size_t nrGood =
      cc->performRequests(requests, 5.0 * 60.0, LogTopic("Pregel Conductor"), false);
  Utils::printResponses(requests);
  return nrGood == requests.size() ? TRI_ERROR_NO_ERROR : TRI_ERROR_FAILED;
}

@@ -651,7 +658,6 @@ int Conductor::_finalizeWorkers() {
}

void Conductor::finishedWorkerFinalize(VPackSlice data) {
  MUTEX_LOCKER(guard, _callbackMutex);
  _ensureUniqueResponse(data);
  if (_respondedServers.size() != _dbServers.size()) {

@@ -675,8 +681,7 @@ void Conductor::finishedWorkerFinalize(VPackSlice data) {
  LOG_TOPIC("063b5", INFO, Logger::PREGEL) << "Done. We did " << _globalSuperstep << " rounds";
  LOG_TOPIC("3cfa8", INFO, Logger::PREGEL)
      << "Startup Time: " << _computationStartTimeSecs - _startTimeSecs << "s";
  LOG_TOPIC("d43cb", INFO, Logger::PREGEL) << "Computation Time: " << compTime << "s";
  LOG_TOPIC("74e05", INFO, Logger::PREGEL) << "Storage Time: " << storeTime << "s";
  LOG_TOPIC("06f03", INFO, Logger::PREGEL) << "Overall: " << totalRuntimeSecs() << "s";
  LOG_TOPIC("03f2e", DEBUG, Logger::PREGEL) << "Stats: " << debugOut.toString();

@@ -686,7 +691,7 @@ void Conductor::finishedWorkerFinalize(VPackSlice data) {
  auto* scheduler = SchedulerFeature::SCHEDULER;
  if (scheduler) {
    uint64_t exe = _executionNumber;
    scheduler->queue(RequestLane::CLUSTER_AQL, [exe] {
      auto pf = PregelFeature::instance();
      if (pf) {
        pf->cleanupConductor(exe);

@@ -770,8 +775,7 @@ int Conductor::_sendToAllDBServers(std::string const& path, VPackBuilder const&
  if (conductor) {
    TRI_vocbase_t& vocbase = conductor->_vocbaseGuard.database();
    VPackBuilder response;
    PregelFeature::handleWorkerRequest(vocbase, path, message.slice(), response);
  }
});

@@ -794,9 +798,10 @@ int Conductor::_sendToAllDBServers(std::string const& path, VPackBuilder const&
  requests.emplace_back("server:" + server, rest::RequestType::POST, base + path, body);
}

size_t nrGood =
    cc->performRequests(requests, 5.0 * 60.0, LogTopic("Pregel Conductor"), false);
LOG_TOPIC("9de62", TRACE, Logger::PREGEL)
    << "Send " << path << " to " << nrGood << " servers";
Utils::printResponses(requests);
if (handle && nrGood == requests.size()) {
  for (ClusterCommRequest const& req : requests) {

@@ -1147,7 +1147,8 @@ Result RestReplicationHandler::processRestoreCollectionCoordinator(
  }

  if (!isValidMinReplFactorSlice) {
    if (replFactorSlice.isString() &&
        replFactorSlice.isEqualString("satellite")) {
      minReplicationFactor = 0;
    } else if (minReplicationFactor <= 0) {
      minReplicationFactor = 1;

@@ -53,9 +53,9 @@ bool isDirectDeadlockLane(RequestLane lane) {
  // Those tasks can not be executed directly.
  return lane == RequestLane::TASK_V8 || lane == RequestLane::CLIENT_V8 ||
         lane == RequestLane::CLUSTER_V8 || lane == RequestLane::INTERNAL_LOW ||
         lane == RequestLane::SERVER_REPLICATION || lane == RequestLane::CLUSTER_ADMIN ||
         lane == RequestLane::CLUSTER_INTERNAL || lane == RequestLane::AGENCY_CLUSTER ||
         lane == RequestLane::CLIENT_AQL || lane == RequestLane::CLUSTER_AQL;
}

}  // namespace
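A quick sanity sketch tying the two lane changes together (hypothetical asserts, assuming the enum and both functions above are visible at the call site):

// Illustrative checks only, not part of the change:
// assert(PriorityRequestLane(RequestLane::CLUSTER_AQL) == RequestPriority::MED);
// assert(isDirectDeadlockLane(RequestLane::CLUSTER_AQL));  // must be queued, never run inline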

@@ -1720,7 +1720,7 @@ OperationResult transaction::Methods::insertLocal(std::string const& collectionN
  if (!options.isSynchronousReplicationFrom.empty()) {
    return OperationResult(TRI_ERROR_CLUSTER_SHARD_LEADER_REFUSES_REPLICATION, options);
  }
  if (!followerInfo->allowedToWrite()) {
    // We cannot fulfill the minimum replication factor.
    // Reject write.
    LOG_TOPIC("d7306", ERR, Logger::REPLICATION)

@@ -2044,7 +2044,7 @@ OperationResult transaction::Methods::modifyLocal(std::string const& collectionN
  if (!options.isSynchronousReplicationFrom.empty()) {
    return OperationResult(TRI_ERROR_CLUSTER_SHARD_LEADER_REFUSES_REPLICATION);
  }
  if (!followerInfo->allowedToWrite()) {
    // We cannot fulfill the minimum replication factor.
    // Reject write.
    LOG_TOPIC("2e35a", ERR, Logger::REPLICATION)

@@ -2324,7 +2324,7 @@ OperationResult transaction::Methods::removeLocal(std::string const& collectionN
  if (!options.isSynchronousReplicationFrom.empty()) {
    return OperationResult(TRI_ERROR_CLUSTER_SHARD_LEADER_REFUSES_REPLICATION);
  }
  if (!followerInfo->allowedToWrite()) {
    // We cannot fulfill the minimum replication factor.
    // Reject write.
    LOG_TOPIC("f1f8e", ERR, Logger::REPLICATION)

@@ -2559,7 +2559,7 @@ OperationResult transaction::Methods::truncateLocal(std::string const& collectio
  if (!options.isSynchronousReplicationFrom.empty()) {
    return OperationResult(TRI_ERROR_CLUSTER_SHARD_LEADER_REFUSES_REPLICATION);
  }
  if (!followerInfo->allowedToWrite()) {
    // We cannot fulfill the minimum replication factor.
    // Reject write.
    LOG_TOPIC("7c1d4", ERR, Logger::REPLICATION)

@@ -228,6 +228,7 @@ std::string const StaticStrings::GraphCreateCollection("createCollection");

// Replication
std::string const StaticStrings::ReplicationSoftLockOnly("doSoftLockOnly");
std::string const StaticStrings::FailoverCandidates("failoverCandidates");

// misc strings
std::string const StaticStrings::LastValue("lastValue");

@@ -210,6 +210,7 @@ class StaticStrings {

  // Replication
  static std::string const ReplicationSoftLockOnly;
  static std::string const FailoverCandidates;

  // misc strings
  static std::string const LastValue;

@@ -252,13 +252,39 @@ class IResearchQueryOptimizationTest : public ::testing::Test {

NS_END

static std::vector<std::string> const EMPTY;

// -----------------------------------------------------------------------------
// --SECTION-- test suite
// -----------------------------------------------------------------------------

void addLinkToCollection(std::shared_ptr<arangodb::iresearch::IResearchView>& view) {
  auto updateJson = VPackParser::fromJson(
      "{ \"links\" : {"
      "\"collection_1\" : { \"includeAllFields\" : true }"
      "}}");
  EXPECT_TRUE((view->properties(updateJson->slice(), true).ok()));

  arangodb::velocypack::Builder builder;

  builder.openObject();
  view->properties(builder, arangodb::LogicalDataSource::makeFlags(
                                arangodb::LogicalDataSource::Serialize::Detailed));
  builder.close();

  auto slice = builder.slice();
  EXPECT_TRUE(slice.isObject());
  EXPECT_TRUE(slice.get("name").copyString() == "testView");
  EXPECT_TRUE(slice.get("type").copyString() ==
              arangodb::iresearch::DATA_SOURCE_TYPE.name());
  EXPECT_TRUE(slice.get("deleted").isNone());  // no system properties
  auto tmpSlice = slice.get("links");
  EXPECT_TRUE((true == tmpSlice.isObject() && 1 == tmpSlice.length()));
}

// dedicated to https://github.com/arangodb/arangodb/issues/8294
TEST_F(IResearchQueryOptimizationTest, test) {
  auto createJson = VPackParser::fromJson(
      "{ \

@@ -285,29 +311,7 @@ TEST_F(IResearchQueryOptimizationTest, test) {
  ASSERT_TRUE((false == !view));

  // add link to collection
  addLinkToCollection(view);

  std::deque<arangodb::ManagedDocumentResult> insertedDocs;