mirror of https://gitee.com/bigwinds/arangodb
Bug fix/failover with min replication factor (#9486)
* Improve collection time of IResearchQueryOptimizationTest
* Added a minReplicationFactor field in Collections. It is not possible to modify it yet and no one cares for it
* Added some assertions on minReplicationFactor
* Transaction API will now reject writes as soon as the minimal replication factor is NOT fulfilled
* added minReplicationFactor to the user interface, preparation for the collection api changes
* added minReplicationFactor to VocBaseCollection, RestReplicationHandler, RestCollectionHandler, ClusterMethods, ClusterInfo and ClusterCollectionCreationInfo
* added minReplicationFactor usage to tests
* TODO TEMPORARY COMMIT FOR TESTING PLEASE REVERT ME
* minReplicationFactor now able to change via collection properties route
* fixed wrong assert
* added minReplicationFactor to the graph management ui
* added minReplicationFactor to the gharial api
* Fixed off-by-one error in minReplicationFactor. We actually enforced one more.
* adjusted description of minReplicationFactor
* FollowerInfo Refactoring
* added gharial api graph creation tests with minimal replication factor
* proper cleanup of shell collection tests, removed lots of duplicate code, preparation for some new tests
* added collection create tests using invalid/valid names, replicationFactor and minReplicationFactor
* Debug logging
* MORE Debug logging
* Included replication fast lane
* Use correct minReplicationFactor
* modified debug logging
* Fixed compile issues
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* MORE Debug logging
* Revert "MORE Debug logging". This reverts commit dab5af28c0.
* Revert "MORE Debug logging". This reverts commit 6134b664bd.
* Revert "MORE Debug logging". This reverts commit 80160bdf3b.
* Revert "MORE Debug logging". This reverts commit 06aabcdfe1.
* Removed debug output
* Added replication fast lane. Also refactored the commands as I cannot take it any more...
* Put some requests of RocksDBReplication onto CATCHUP lane.
* Put some requests of MMFilesReplication onto CATCHUP lane.
* Adjusted FAST and MED lane usage in the supervised scheduler
* Added changelog entry
* Added new features entry
* A new leader will now keep old followers in case of failover
* Update arangod/Cluster/ClusterCollectionCreationInfo.cpp (Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>)
* Fixed JSLINT
* Unified lane handling of replication handlers
* Sorry, forgotten in last commit
* replaced strings with static strings
* more use of static strings
* optimized min repl description in the ui
* decr initial loop variable
* clean up of the createWithId test
* more use of static strings
* Update js/apps/system/_admin/aardvark/APP/frontend/js/views/collectionsView.js (Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>)
* Added some comments on condition, renamed variable as suggested in review
* Added check for min replicationFactor to be non-zero
* Added assertion
* Added function to modify min and max replication factor in one go
* added missing semicolon
* rm log devel
* Added a second piece of information to FollowerInfo that can keep track of followers that have been in sync before a failover has taken place
* Maintenance now reports the previous version to FollowerInfo instead of lying by itself. FollowerInfo now gets a failover-safe mode to report in-sync followers
* check replFactor against nr of dbservers
* Add lie reporting in CURRENT
* Reverted most of my recent commits about the failover situation. The intended plan simply does not work out
* move replication checks from logical collection to rest collection handler
* added more replication tests
* Include assert only if we are not in gtest
* jslint
* set min repl factor to zero if satellite collection
* check replication attributes in v8 collection
* Initial commit, old plan, does not yet work
* fixed ires tests
* Included FailoverCandidates key. Not fully implemented
* fixed wrong assert
* unified in-sync follower reporting
* fixed compiler errors
* Cleanup locking, and fixed potential deadlocks
* Comments about locking order in FollowerInfo.
* properly check uint
* Keep old leader as potential failover candidate
* Transaction methods now use FollowerInfo to check if the leader can write; this might have the side effect that 'failoverCandidates' are updated
* Let agency check failoverCandidates if possible
* Initialize member variables
* Use unified follower reporting in DBServerAgencySync
* Removed obsolete variable, collecting it somewhere else
* repl factor attr check
* Reimplemented previous followers, second attempt now. PhaseOne and PhaseTwo can now synchronize on current.
* Fixed assertion, forgot an off-by-one
* adjusted test to be more precise now
* Fixed failover candidates list
* Disable write on dropping too many followers
* Allow running updateFailoverCandidates multiple times with the same leader.
* Final fixes, resilience tests now green, crossing fingers for jenkins
* Fixed race on atomics comparison
* Fixed invalid number type
* added nullptr handling
* added nullptr handling
* Removed invalid assert
* Make takeover of leadership an atomic operation
* Update tests/js/common/shell/shell-cluster-collection.js (Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>)
* Review fixes
* Fixed creation code to use takeoverLeadership
* Update arangod/Cluster/FollowerInfo.h (Co-Authored-By: Tobias Gödderz <tobias@arangodb.com>)
* Applied review fixes
* There is no timeout
* Moved AQL + Pregel to the INTERNAL_AQL lane, which is medium priority, to avoid deadlocks with sync replication
* More review fixes
* Use difference if you want to compare two vectors...
* Use std::string ...
* Now check if we are in recovery mode
* Added documentation for minReplicationFactor
* Added readme update as well in documentation
This commit is contained in: parent c922c5f133, commit 36b1d290a9
@@ -38,6 +38,10 @@ factor. The number of _followers_ can be controlled using the
`replicationFactor` parameter is the total number of copies being
kept, that is, it is one plus the number of _followers_.

In addition to the `replicationFactor` there is a `minReplicationFactor`
that locks down a collection as soon as it has lost too many followers.

Asynchronous replication
------------------------
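To make the rule concrete, here is a small illustrative JavaScript sketch (not part of the patch) of the write check described above: a shard stays writable as long as the number of in-sync copies, the leader included, is at least `minReplicationFactor`.

```js
// Illustrative sketch of the described rule; not ArangoDB source code.
function shardIsWritable(inSyncFollowers, minReplicationFactor) {
  var copiesInSync = 1 + inSyncFollowers; // the leader counts as one copy
  return copiesInSync >= minReplicationFactor;
}

// With replicationFactor 3 and minReplicationFactor 2:
shardIsWritable(2, 2); // true  - all followers in sync
shardIsWritable(1, 2); // true  - one follower lost, still two copies in sync
shardIsWritable(0, 2); // false - shard locks down until the failover is resolved
```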
@@ -164,6 +164,15 @@ to the [naming conventions](../NamingConventions/README.md).
  dramatically when using joins in AQL at the cost of reduced write
  performance on these collections.

- *minReplicationFactor* (optional, default is 1): in a cluster, this
  attribute determines how many copies of each shard are required
  to be in sync on the different DBServers. If fewer than this many
  copies are in sync in the cluster, the shard will refuse writes. The
  minReplicationFactor cannot be larger than replicationFactor.
  Please note: during server failures this might lead to writes
  not being possible until the failover is sorted out, and it might cause
  write slowdowns in exchange for data durability.

- *distributeShardsLike*: distribute the shards of this collection
  cloning the shard distribution of another. If this value is set,
  it will copy the attributes *replicationFactor*, *numberOfShards* and
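A minimal arangosh sketch of the new option; the collection name and values below are illustrative, not taken from the patch:

```js
// Keep three copies of each shard and refuse writes as soon as fewer
// than two copies are in sync. minReplicationFactor must be >= 1 and
// must not exceed replicationFactor.
db._create("orders", {
  numberOfShards: 3,
  replicationFactor: 3,
  minReplicationFactor: 2
});
```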
@@ -43,6 +43,10 @@ determine the target shard for documents; *Cluster specific attribute.*
@RESTSTRUCT{replicationFactor,collection_info,integer,optional,}
contains how many copies of each shard are kept on different DBServers.; *Cluster specific attribute.*

@RESTSTRUCT{minReplicationFactor,collection_info,integer,optional,}
contains the minimum number of copies of each shard that have to be in sync on different DBServers.
A shard will refuse writes if fewer than this many copies are in sync.; *Cluster specific attribute.*

@RESTSTRUCT{shardingStrategy,collection_info,string,optional,}
the sharding strategy selected for the collection; *Cluster specific attribute.*
One of 'hash' or 'enterprise-hash-smart-edge'
@@ -25,6 +25,11 @@ concurrent modifications to this graph.
@RESTSTRUCT{replicationFactor,graph_representation,integer,required,}
The replication factor used for every new collection in the graph.

@RESTSTRUCT{minReplicationFactor,graph_representation,integer,optional,}
The minimal replication factor used for every new collection in the graph.
If a shard has fewer than the minimal replication factor of copies in sync,
writes to this shard are refused, while all other shards stay writable.

@RESTSTRUCT{isSmart,graph_representation,boolean,required,}
Flag if the graph is a SmartGraph (Enterprise Edition only) or not.
@@ -42,6 +42,11 @@ Cannot be modified later.
@RESTSTRUCT{replicationFactor,post_api_gharial_create_opts,integer,required,}
The replication factor used when initially creating collections for this graph.

@RESTSTRUCT{minReplicationFactor,post_api_gharial_create_opts,integer,optional,}
The minimal replication factor used for every new collection in the graph.
If a shard has fewer than the minimal replication factor of copies in sync,
writes to this shard are refused, while all other shards stay writable.

@RESTRETURNCODES

@RESTRETURNCODE{201}
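For illustration, the same options can also be passed when creating a graph from arangosh via the general-graph module; the graph and collection names below are made up:

```js
// arangosh sketch: every collection created for the graph gets
// replicationFactor 3 and minReplicationFactor 2.
var graphModule = require("@arangodb/general-graph");
graphModule._create(
  "social",
  [graphModule._relation("knows", ["persons"], ["persons"])],
  [], // no orphan collections
  { replicationFactor: 3, minReplicationFactor: 2 }
);
```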
@@ -52,7 +52,11 @@ In a cluster setup, the result will also contain the following attributes:
  determine the target shard for documents.

* *replicationFactor*: determines how many copies of each shard are kept
  on different DBServers.
  on different DBServers. Has to be in the range of 1-10 *(Cluster only)*

* *minReplicationFactor*: determines the minimum number of shard copies that
  have to be in sync on different DBServers; a shard will refuse writes if
  fewer than this many copies are in sync. Has to be in the range of
  1-replicationFactor *(Cluster only)*

* *shardingStrategy*: the sharding strategy selected for the collection.
  This attribute will only be populated in cluster mode and is not populated
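Reading these attributes back from arangosh might look like the following sketch (the shown result is abridged and illustrative):

```js
// arangosh sketch: inspect the cluster attributes of a collection.
var props = db.orders.properties();
// props could look like:
// { numberOfShards: 3, replicationFactor: 3, minReplicationFactor: 2,
//   shardingStrategy: "hash", ... }
print(props.replicationFactor, props.minReplicationFactor);
```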
@@ -77,6 +81,10 @@ one or more of the following attribute(s):
  different DBServers, valid values are integer numbers
  in the range of 1-10 *(Cluster only)*

* *minReplicationFactor*: Change the minimum number of shard copies that have
  to be in sync on different DBServers; a shard will refuse writes if fewer
  than this many copies are in sync. Has to be in the range of
  1-replicationFactor *(Cluster only)*

**Note**: some other collection properties, such as *type*, *isVolatile*,
*keyOptions*, *numberOfShards* or *shardingStrategy* cannot be changed once
the collection is created.
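A sketch of changing the value later through the properties route, assuming the illustrative collection from the earlier example:

```js
// arangosh sketch: require three in-sync copies before accepting writes.
// The call is rejected if the value is 0 or larger than replicationFactor.
db.orders.properties({ minReplicationFactor: 3 });
```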
@ -95,21 +95,22 @@ bool Job::finish(std::string const& server, std::string const& shard,
|
|||
try {
|
||||
jobType = pending.slice()[0].get("type").copyString();
|
||||
} catch (std::exception const&) {
|
||||
LOG_TOPIC("76352", WARN, Logger::AGENCY) << "Failed to obtain type of job " << _jobId;
|
||||
LOG_TOPIC("76352", WARN, Logger::AGENCY)
|
||||
<< "Failed to obtain type of job " << _jobId;
|
||||
}
|
||||
|
||||
// Additional payload, which is to be executed in the finish transaction
|
||||
Slice operations = Slice::emptyObjectSlice();
|
||||
Slice preconditions = Slice::emptyObjectSlice();
|
||||
Slice preconditions = Slice::emptyObjectSlice();
|
||||
|
||||
if (payload != nullptr) {
|
||||
Slice slice = payload->slice();
|
||||
TRI_ASSERT(slice.isObject() || slice.isArray());
|
||||
if (slice.isObject()) { // opers only
|
||||
if (slice.isObject()) { // opers only
|
||||
operations = slice;
|
||||
TRI_ASSERT(operations.isObject());
|
||||
} else {
|
||||
TRI_ASSERT(slice.length() < 3); // opers + precs only
|
||||
TRI_ASSERT(slice.length() < 3); // opers + precs only
|
||||
if (slice.length() > 0) {
|
||||
operations = slice[0];
|
||||
TRI_ASSERT(operations.isObject());
|
||||
|
@ -125,7 +126,7 @@ bool Job::finish(std::string const& server, std::string const& shard,
|
|||
{
|
||||
VPackArrayBuilder guard(&finished);
|
||||
|
||||
{ // operations --
|
||||
{ // operations --
|
||||
VPackObjectBuilder operguard(&finished);
|
||||
|
||||
addPutJobIntoSomewhere(finished, success ? "Finished" : "Failed",
|
||||
|
@ -148,15 +149,14 @@ bool Job::finish(std::string const& server, std::string const& shard,
|
|||
addReleaseShard(finished, shard);
|
||||
}
|
||||
|
||||
} // -- operations
|
||||
} // -- operations
|
||||
|
||||
if (preconditions.isObject() && preconditions.length() > 0) { // preconditions --
|
||||
if (preconditions.isObject() && preconditions.length() > 0) { // preconditions --
|
||||
VPackObjectBuilder precguard(&finished);
|
||||
for (auto const& prec : VPackObjectIterator(preconditions)) {
|
||||
finished.add(prec.key.copyString(), prec.value);
|
||||
}
|
||||
} // -- preconditions
|
||||
|
||||
} // -- preconditions
|
||||
}
|
||||
|
||||
write_ret_t res = singleWriteTransaction(_agent, finished, false);
|
||||
|
@ -168,16 +168,16 @@ bool Job::finish(std::string const& server, std::string const& shard,
|
|||
}
|
||||
} catch (std::exception const& e) {
|
||||
LOG_TOPIC("1fead", WARN, Logger::AGENCY)
|
||||
<< "Caught exception in finish, message: " << e.what();
|
||||
<< "Caught exception in finish, message: " << e.what();
|
||||
} catch (...) {
|
||||
LOG_TOPIC("7762f", WARN, Logger::AGENCY)
|
||||
<< "Caught unspecified exception in finish.";
|
||||
<< "Caught unspecified exception in finish.";
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
std::string Job::randomIdleAvailableServer(Node const& snap,
|
||||
std::vector<std::string> const& exclude) {
|
||||
std::vector<std::string> const& exclude) {
|
||||
std::vector<std::string> as = availableServers(snap);
|
||||
std::string ret;
|
||||
|
||||
|
@ -189,11 +189,11 @@ std::string Job::randomIdleAvailableServer(Node const& snap,
|
|||
for (auto const& srv : snap.hasAsChildren(healthPrefix).first) {
|
||||
// ignore excluded servers
|
||||
if (std::find(std::begin(exclude), std::end(exclude), srv.first) != std::end(exclude)) {
|
||||
continue ;
|
||||
continue;
|
||||
}
|
||||
// ignore servers not in availableServers above:
|
||||
if (std::find(std::begin(as), std::end(as), srv.first) == std::end(as)) {
|
||||
continue ;
|
||||
continue;
|
||||
}
|
||||
|
||||
std::string const& status = (*srv.second).hasAsString("Status").first;
|
||||
|
@ -242,7 +242,7 @@ size_t Job::countGoodOrBadServersInList(Node const& snap, VPackSlice const& serv
|
|||
auto const& health = snap.hasAsChildren(healthPrefix);
|
||||
// Do we have a Health substructure?
|
||||
if (health.second) {
|
||||
Node::Children const& healthData = health.first; // List of servers in Health
|
||||
Node::Children const& healthData = health.first; // List of servers in Health
|
||||
for (VPackSlice const serverName : VPackArrayIterator(serverList)) {
|
||||
if (serverName.isString()) {
|
||||
// serverName not a string? Then don't count
|
||||
|
@ -269,13 +269,14 @@ size_t Job::countGoodOrBadServersInList(Node const& snap, VPackSlice const& serv
|
|||
}
|
||||
|
||||
// The following counts in a given server list how many of the servers are
|
||||
// in Status "GOOD" or "BAD".
|
||||
size_t Job::countGoodOrBadServersInList(Node const& snap, std::vector<std::string> const& serverList) {
|
||||
// in Status "GOOD" or "BAD".
|
||||
size_t Job::countGoodOrBadServersInList(Node const& snap,
|
||||
std::vector<std::string> const& serverList) {
|
||||
size_t count = 0;
|
||||
auto const& health = snap.hasAsChildren(healthPrefix);
|
||||
// Do we have a Health substructure?
|
||||
if (health.second) {
|
||||
Node::Children const& healthData = health.first; // List of servers in Health
|
||||
Node::Children const& healthData = health.first; // List of servers in Health
|
||||
for (auto& serverStr : serverList) {
|
||||
// Now look up this server:
|
||||
auto it = healthData.find(serverStr);
|
||||
|
@ -294,7 +295,8 @@ size_t Job::countGoodOrBadServersInList(Node const& snap, std::vector<std::strin
|
|||
}
|
||||
|
||||
/// @brief Check if a server is cleaned or to be cleaned out:
|
||||
bool Job::isInServerList(Node const& snap, std::string const& prefix, std::string const& server, bool isArray) {
|
||||
bool Job::isInServerList(Node const& snap, std::string const& prefix,
|
||||
std::string const& server, bool isArray) {
|
||||
VPackSlice slice;
|
||||
bool found = false;
|
||||
if (isArray) {
|
||||
|
@ -309,7 +311,7 @@ bool Job::isInServerList(Node const& snap, std::string const& prefix, std::strin
|
|||
}
|
||||
}
|
||||
} else { // an object
|
||||
auto const& children = snap.hasAsChildren(prefix);
|
||||
auto const& children = snap.hasAsChildren(prefix);
|
||||
if (children.second) {
|
||||
for (auto const& srv : children.first) {
|
||||
if (srv.first == server) {
|
||||
|
@ -418,16 +420,15 @@ std::vector<Job::shard_t> Job::clones(Node const& snapshot, std::string const& d
|
|||
|
||||
for (const auto& colptr : snapshot.hasAsChildren(databasePath).first) { // collections
|
||||
|
||||
auto const &col = *colptr.second;
|
||||
auto const &otherCollection = colptr.first;
|
||||
auto const& col = *colptr.second;
|
||||
auto const& otherCollection = colptr.first;
|
||||
|
||||
if (otherCollection != collection && col.has("distributeShardsLike") && // use .has() form to prevent logging of missing
|
||||
col.hasAsSlice("distributeShardsLike").first.copyString() == collection) {
|
||||
auto const& theirshards = sortedShardList(col.hasAsNode("shards").first);
|
||||
if (theirshards.size() > 0) { // do not care about virtual collections
|
||||
if (theirshards.size() == myshards.size()) {
|
||||
ret.emplace_back(otherCollection,
|
||||
theirshards[steps]);
|
||||
ret.emplace_back(otherCollection, theirshards[steps]);
|
||||
} else {
|
||||
LOG_TOPIC("3092e", ERR, Logger::SUPERVISION)
|
||||
<< "Shard distribution of clone(" << otherCollection
|
||||
|
@ -452,10 +453,11 @@ std::string Job::findNonblockedCommonHealthyInSyncFollower( // Which is in "GOO
|
|||
|
||||
std::unordered_map<std::string, size_t> currentServers;
|
||||
for (const auto& clone : cs) {
|
||||
auto currentShardPath = curColPrefix + db + "/" + clone.collection + "/" +
|
||||
clone.shard + "/servers";
|
||||
auto plannedShardPath =
|
||||
planColPrefix + db + "/" + clone.collection + "/shards/" + clone.shard;
|
||||
auto sharedPath = db + "/" + clone.collection + "/";
|
||||
auto currentShardPath = curColPrefix + sharedPath + clone.shard + "/servers";
|
||||
auto currentFailoverCandidatesPath =
|
||||
curColPrefix + sharedPath + clone.shard + "/servers";
|
||||
auto plannedShardPath = planColPrefix + sharedPath + "shards/" + clone.shard;
|
||||
size_t i = 0;
|
||||
|
||||
// start up race condition ... current might not have everything in plan
|
||||
|
@ -464,13 +466,30 @@ std::string Job::findNonblockedCommonHealthyInSyncFollower( // Which is in "GOO
|
|||
continue;
|
||||
} // if
|
||||
|
||||
for (const auto& server :
|
||||
VPackArrayIterator(snap.hasAsArray(currentShardPath).first)) {
|
||||
auto id = server.copyString();
|
||||
bool isArray = false;
|
||||
VPackSlice serverList;
|
||||
// If we do have failover candidates, we should use them
|
||||
std::tie(serverList, isArray) = snap.hasAsArray(currentFailoverCandidatesPath);
|
||||
if (!isArray) {
|
||||
// We have old DBServers that do not report failover candidates,
|
||||
// Need to rely on current
|
||||
std::tie(serverList, isArray) = snap.hasAsArray(currentShardPath);
|
||||
TRI_ASSERT(isArray);
|
||||
if (!isArray) {
|
||||
THROW_ARANGO_EXCEPTION_MESSAGE(
|
||||
TRI_ERROR_SUPERVISION_GENERAL_FAILURE,
|
||||
"Could not find common insync server for: " + currentShardPath +
|
||||
", value is not an array.");
|
||||
}
|
||||
}
|
||||
// Guaranteed by the if above
|
||||
TRI_ASSERT(serverList.isArray());
|
||||
for (const auto& server : VPackArrayIterator(serverList)) {
|
||||
if (i++ == 0) {
|
||||
// Skip leader
|
||||
continue;
|
||||
}
|
||||
auto id = server.copyString();
|
||||
|
||||
if (!good[id]) {
|
||||
// Skip unhealthy servers
|
||||
|
@ -550,9 +569,9 @@ bool Job::abortable(Node const& snapshot, std::string const& jobId) {
|
|||
return false;
|
||||
}
|
||||
|
||||
void Job::doForAllShards(Node const& snapshot, std::string& database,
|
||||
std::vector<shard_t>& shards,
|
||||
std::function<void(Slice plan, Slice current, std::string& planPath, std::string& curPath)> worker) {
|
||||
void Job::doForAllShards(
|
||||
Node const& snapshot, std::string& database, std::vector<shard_t>& shards,
|
||||
std::function<void(Slice plan, Slice current, std::string& planPath, std::string& curPath)> worker) {
|
||||
for (auto const& collShard : shards) {
|
||||
std::string shard = collShard.shard;
|
||||
std::string collection = collShard.collection;
|
||||
|
|
|
@ -49,9 +49,7 @@ class RestAqlHandler : public RestVocbaseBaseHandler {
|
|||
|
||||
public:
|
||||
char const* name() const override final { return "RestAqlHandler"; }
|
||||
RequestLane lane() const override final {
|
||||
return RequestLane::CLUSTER_INTERNAL;
|
||||
}
|
||||
RequestLane lane() const override final { return RequestLane::CLUSTER_AQL; }
|
||||
RestStatus execute() override;
|
||||
RestStatus continueExecute() override;
|
||||
|
||||
|
|
|
@ -165,6 +165,30 @@ class CollectionInfoCurrent {
|
|||
return v;
|
||||
}
|
||||
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief returns the current failover candidates for the given shard
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
TEST_VIRTUAL std::vector<ServerID> failoverCandidates(ShardID const& shardID) const {
|
||||
std::vector<ServerID> v;
|
||||
|
||||
auto it = _vpacks.find(shardID);
|
||||
if (it != _vpacks.end()) {
|
||||
VPackSlice slice = it->second->slice();
|
||||
|
||||
VPackSlice servers = slice.get(StaticStrings::FailoverCandidates);
|
||||
if (servers.isArray()) {
|
||||
for (auto const& server : VPackArrayIterator(servers)) {
|
||||
TRI_ASSERT(server.isString());
|
||||
if (server.isString()) {
|
||||
v.push_back(server.copyString());
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
return v;
|
||||
}
|
||||
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief returns the errorMessage entry for one shardID
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
|
|
|
@ -93,7 +93,8 @@ CreateCollection::CreateCollection(MaintenanceFeature& feature, ActionDescriptio
|
|||
TRI_ASSERT(type == TRI_COL_TYPE_DOCUMENT || type == TRI_COL_TYPE_EDGE);
|
||||
|
||||
if (!error.str().empty()) {
|
||||
LOG_TOPIC("7c60f", ERR, Logger::MAINTENANCE) << "CreateCollection: " << error.str();
|
||||
LOG_TOPIC("7c60f", ERR, Logger::MAINTENANCE)
|
||||
<< "CreateCollection: " << error.str();
|
||||
_result.reset(TRI_ERROR_INTERNAL, error.str());
|
||||
setState(FAILED);
|
||||
}
|
||||
|
@ -156,10 +157,12 @@ bool CreateCollection::first() {
|
|||
LOG_TOPIC("9db9a", DEBUG, Logger::MAINTENANCE)
|
||||
<< "local collection " << database << "/"
|
||||
<< shard << " successfully created";
|
||||
col->followers()->setTheLeader(leader);
|
||||
|
||||
if (leader.empty()) {
|
||||
col->followers()->clear();
|
||||
std::vector<std::string> noFollowers;
|
||||
col->followers()->takeOverLeadership(noFollowers);
|
||||
} else {
|
||||
col->followers()->setTheLeader(leader);
|
||||
}
|
||||
});
|
||||
|
||||
|
@ -186,7 +189,7 @@ bool CreateCollection::first() {
|
|||
}
|
||||
|
||||
LOG_TOPIC("4562c", DEBUG, Logger::MAINTENANCE)
|
||||
<< "Create collection done, notifying Maintenance";
|
||||
<< "Create collection done, notifying Maintenance";
|
||||
|
||||
notify();
|
||||
|
||||
|
|
|
@ -70,7 +70,8 @@ Result DBServerAgencySync::getLocalCollections(VPackBuilder& collections) {
|
|||
}
|
||||
|
||||
if (dbfeature == nullptr) {
|
||||
LOG_TOPIC("d0ef2", ERR, Logger::HEARTBEAT) << "Failed to get feature database";
|
||||
LOG_TOPIC("d0ef2", ERR, Logger::HEARTBEAT)
|
||||
<< "Failed to get feature database";
|
||||
return Result(TRI_ERROR_INTERNAL, "Failed to get feature database");
|
||||
}
|
||||
|
||||
|
@ -80,9 +81,7 @@ Result DBServerAgencySync::getLocalCollections(VPackBuilder& collections) {
|
|||
if (!vocbase.use()) {
|
||||
return;
|
||||
}
|
||||
auto unuse = scopeGuard([&vocbase] {
|
||||
vocbase.release();
|
||||
});
|
||||
auto unuse = scopeGuard([&vocbase] { vocbase.release(); });
|
||||
|
||||
collections.add(VPackValue(vocbase.name()));
|
||||
|
||||
|
@ -100,9 +99,8 @@ Result DBServerAgencySync::getLocalCollections(VPackBuilder& collections) {
|
|||
// generate a collection definition identical to that which would be
|
||||
// persisted in the case of SingleServer
|
||||
collection->properties(collections,
|
||||
LogicalDataSource::makeFlags(
|
||||
LogicalDataSource::Serialize::Detailed,
|
||||
LogicalDataSource::Serialize::ForPersistence));
|
||||
LogicalDataSource::makeFlags(LogicalDataSource::Serialize::Detailed,
|
||||
LogicalDataSource::Serialize::ForPersistence));
|
||||
|
||||
auto const& folls = collection->followers();
|
||||
std::string const theLeader = folls->getLeader();
|
||||
|
@ -119,19 +117,7 @@ Result DBServerAgencySync::getLocalCollections(VPackBuilder& collections) {
|
|||
// we are the leader ourselves
|
||||
// In this case we report our in-sync followers here in the format
|
||||
// of the agency: [ leader, follower1, follower2, ... ]
|
||||
collections.add(VPackValue("servers"));
|
||||
|
||||
{
|
||||
VPackArrayBuilder guard(&collections);
|
||||
|
||||
collections.add(VPackValue(arangodb::ServerState::instance()->getId()));
|
||||
|
||||
std::shared_ptr<std::vector<ServerID> const> srvs = folls->get();
|
||||
|
||||
for (auto const& s : *srvs) {
|
||||
collections.add(VPackValue(s));
|
||||
}
|
||||
}
|
||||
folls->injectFollowerInfo(collections);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -151,14 +137,20 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
|
||||
LOG_TOPIC("62fd8", DEBUG, Logger::MAINTENANCE)
|
||||
<< "DBServerAgencySync::execute starting";
|
||||
|
||||
DBServerAgencySyncResult result;
|
||||
auto* sysDbFeature =
|
||||
application_features::ApplicationServer::lookupFeature<SystemDatabaseFeature>();
|
||||
MaintenanceFeature* mfeature =
|
||||
ApplicationServer::getFeature<MaintenanceFeature>("Maintenance");
|
||||
if (mfeature == nullptr) {
|
||||
LOG_TOPIC("3a1f7", ERR, Logger::MAINTENANCE)
|
||||
<< "Could not load maintenance feature, can happen during shutdown.";
|
||||
result.success = false;
|
||||
result.errorMessage = "Could not load maintenance feature";
|
||||
return result;
|
||||
}
|
||||
arangodb::SystemDatabaseFeature::ptr vocbase =
|
||||
sysDbFeature ? sysDbFeature->use() : nullptr;
|
||||
DBServerAgencySyncResult result;
|
||||
|
||||
if (vocbase == nullptr) {
|
||||
LOG_TOPIC("18d67", DEBUG, Logger::MAINTENANCE)
|
||||
|
@ -196,20 +188,21 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
VPackObjectBuilder o(&rb);
|
||||
|
||||
auto startTimePhaseOne = std::chrono::steady_clock::now();
|
||||
LOG_TOPIC("19aaf", DEBUG, Logger::MAINTENANCE) << "DBServerAgencySync::phaseOne";
|
||||
LOG_TOPIC("19aaf", DEBUG, Logger::MAINTENANCE)
|
||||
<< "DBServerAgencySync::phaseOne";
|
||||
tmp = arangodb::maintenance::phaseOne(plan->slice(), local.slice(),
|
||||
serverId, *mfeature, rb);
|
||||
auto endTimePhaseOne = std::chrono::steady_clock::now();
|
||||
LOG_TOPIC("93f83", DEBUG, Logger::MAINTENANCE)
|
||||
<< "DBServerAgencySync::phaseOne done";
|
||||
|
||||
if (endTimePhaseOne - startTimePhaseOne >
|
||||
std::chrono::milliseconds(200)) {
|
||||
if (endTimePhaseOne - startTimePhaseOne > std::chrono::milliseconds(200)) {
|
||||
// We take this as indication that many shards are in the system,
|
||||
// in this case: give some asynchronous jobs created in phaseOne a
|
||||
// chance to complete before we collect data for phaseTwo:
|
||||
LOG_TOPIC("ef730", DEBUG, Logger::MAINTENANCE)
|
||||
<< "DBServerAgencySync::hesitating between phases 1 and 2 for 0.1s...";
|
||||
<< "DBServerAgencySync::hesitating between phases 1 and 2 for "
|
||||
"0.1s...";
|
||||
std::this_thread::sleep_for(std::chrono::milliseconds(100));
|
||||
}
|
||||
|
||||
|
@ -224,6 +217,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
LOG_TOPIC("675fd", TRACE, Logger::MAINTENANCE)
|
||||
<< "DBServerAgencySync::phaseTwo - current state: " << current->toJson();
|
||||
|
||||
mfeature->increaseCurrentCounter();
|
||||
|
||||
local.clear();
|
||||
glc = getLocalCollections(local);
|
||||
// We intentionally refetch local collections here, such that phase 2
|
||||
|
@ -237,7 +232,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
return result;
|
||||
}
|
||||
|
||||
LOG_TOPIC("652ff", DEBUG, Logger::MAINTENANCE) << "DBServerAgencySync::phaseTwo";
|
||||
LOG_TOPIC("652ff", DEBUG, Logger::MAINTENANCE)
|
||||
<< "DBServerAgencySync::phaseTwo";
|
||||
|
||||
tmp = arangodb::maintenance::phaseTwo(plan->slice(), current->slice(),
|
||||
local.slice(), serverId, *mfeature, rb);
|
||||
|
@ -246,7 +242,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
<< "DBServerAgencySync::phaseTwo done";
|
||||
|
||||
} catch (std::exception const& e) {
|
||||
LOG_TOPIC("cd308", ERR, Logger::MAINTENANCE) << "Failed to handle plan change: " << e.what();
|
||||
LOG_TOPIC("cd308", ERR, Logger::MAINTENANCE)
|
||||
<< "Failed to handle plan change: " << e.what();
|
||||
}
|
||||
|
||||
if (rb.isClosed()) {
|
||||
|
@ -268,9 +265,9 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
|
||||
if (ao.value.hasKey("precondition")) {
|
||||
auto const precondition = ao.value.get("precondition");
|
||||
preconditions.push_back(
|
||||
AgencyPrecondition(
|
||||
precondition.keyAt(0).copyString(), AgencyPrecondition::Type::VALUE, precondition.valueAt(0)));
|
||||
preconditions.push_back(AgencyPrecondition(precondition.keyAt(0).copyString(),
|
||||
AgencyPrecondition::Type::VALUE,
|
||||
precondition.valueAt(0)));
|
||||
}
|
||||
|
||||
if (op == "set") {
|
||||
|
@ -279,7 +276,6 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
} else if (op == "delete") {
|
||||
operations.push_back(AgencyOperation(key, AgencySimpleOperationType::DELETE_OP));
|
||||
}
|
||||
|
||||
}
|
||||
operations.push_back(AgencyOperation("Current/Version",
|
||||
AgencySimpleOperationType::INCREMENT_OP));
|
||||
|
@ -288,9 +284,8 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
AgencyCommResult r = comm.sendTransactionWithFailover(currentTransaction);
|
||||
if (!r.successful()) {
|
||||
LOG_TOPIC("d73b8", INFO, Logger::MAINTENANCE)
|
||||
<< "Error reporting to agency: _statusCode: " << r.errorCode()
|
||||
<< " message: " << r.errorMessage()
|
||||
<< ". This can be ignored, since it will be retried automatically.";
|
||||
<< "Error reporting to agency: _statusCode: " << r.errorCode()
|
||||
<< " message: " << r.errorMessage() << ". This can be ignored, since it will be retried automatically.";
|
||||
} else {
|
||||
LOG_TOPIC("9b0b3", DEBUG, Logger::MAINTENANCE)
|
||||
<< "Invalidating current in ClusterInfo";
|
||||
|
@ -317,22 +312,23 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
result.errorMessage = "Report from phase 1 and 2 was no object.";
|
||||
try {
|
||||
std::string json = report.toJson();
|
||||
LOG_TOPIC("65fde", WARN, Logger::MAINTENANCE) << "Report from phase 1 and 2 was: " << json;
|
||||
} catch(std::exception const& exc) {
|
||||
LOG_TOPIC("65fde", WARN, Logger::MAINTENANCE)
|
||||
<< "Report from phase 1 and 2 was: " << json;
|
||||
} catch (std::exception const& exc) {
|
||||
LOG_TOPIC("54de2", WARN, Logger::MAINTENANCE)
|
||||
<< "Report from phase 1 and 2 could not be dumped to JSON, error: "
|
||||
<< exc.what() << ", head byte:" << report.head();
|
||||
<< "Report from phase 1 and 2 could not be dumped to JSON, error: "
|
||||
<< exc.what() << ", head byte:" << report.head();
|
||||
uint64_t l = 0;
|
||||
try {
|
||||
l = report.byteSize();
|
||||
LOG_TOPIC("54dda", WARN, Logger::MAINTENANCE)
|
||||
<< "Report from phase 1 and 2, byte size: " << l;
|
||||
<< "Report from phase 1 and 2, byte size: " << l;
|
||||
LOG_TOPIC("67421", WARN, Logger::MAINTENANCE)
|
||||
<< "Bytes: "
|
||||
<< arangodb::basics::StringUtils::encodeHex((char const*) report.start(), l);
|
||||
} catch(...) {
|
||||
<< "Bytes: "
|
||||
<< arangodb::basics::StringUtils::encodeHex((char const*)report.start(), l);
|
||||
} catch (...) {
|
||||
LOG_TOPIC("76124", WARN, Logger::MAINTENANCE)
|
||||
<< "Report from phase 1 and 2, byte size throws.";
|
||||
<< "Report from phase 1 and 2, byte size throws.";
|
||||
}
|
||||
}
|
||||
}
|
||||
|
@ -342,9 +338,10 @@ DBServerAgencySyncResult DBServerAgencySync::execute() {
|
|||
|
||||
auto took = duration<double>(clock::now() - start).count();
|
||||
if (took > 30.0) {
|
||||
LOG_TOPIC("83cb8", WARN, Logger::MAINTENANCE) << "DBServerAgencySync::execute "
|
||||
"took "
|
||||
<< took << " s to execute handlePlanChange";
|
||||
LOG_TOPIC("83cb8", WARN, Logger::MAINTENANCE)
|
||||
<< "DBServerAgencySync::execute "
|
||||
"took "
|
||||
<< took << " s to execute handlePlanChange";
|
||||
}
|
||||
|
||||
return result;
|
||||
|
|
|
@ -25,6 +25,7 @@
|
|||
#include "FollowerInfo.h"
|
||||
|
||||
#include "ApplicationFeatures/ApplicationServer.h"
|
||||
#include "Cluster/MaintenanceStrings.h"
|
||||
#include "Cluster/ServerState.h"
|
||||
#include "VocBase/LogicalCollection.h"
|
||||
|
||||
|
@ -32,56 +33,12 @@
|
|||
|
||||
using namespace arangodb;
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief change JSON under
|
||||
/// Current/Collection/<DB-name>/<Collection-ID>/<shard-ID>
|
||||
/// to add or remove a serverID, if add flag is true, the entry is added
|
||||
/// (if it is not yet there), otherwise the entry is removed (if it was
|
||||
/// there).
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
static VPackBuilder newShardEntry(VPackSlice oldValue, ServerID const& sid, bool add) {
|
||||
VPackBuilder newValue;
|
||||
VPackSlice servers;
|
||||
{
|
||||
VPackObjectBuilder b(&newValue);
|
||||
// Now need to find the `servers` attribute, which is a list:
|
||||
for (auto const& it : VPackObjectIterator(oldValue)) {
|
||||
if (it.key.isEqualString("servers")) {
|
||||
servers = it.value;
|
||||
} else {
|
||||
newValue.add(it.key);
|
||||
newValue.add(it.value);
|
||||
}
|
||||
}
|
||||
newValue.add(VPackValue("servers"));
|
||||
if (servers.isArray() && servers.length() > 0) {
|
||||
VPackArrayBuilder bb(&newValue);
|
||||
newValue.add(servers[0]);
|
||||
VPackArrayIterator it(servers);
|
||||
bool done = false;
|
||||
for (++it; it.valid(); ++it) {
|
||||
if ((*it).isEqualString(sid)) {
|
||||
if (add) {
|
||||
newValue.add(*it);
|
||||
done = true;
|
||||
}
|
||||
} else {
|
||||
newValue.add(*it);
|
||||
}
|
||||
}
|
||||
if (add && !done) {
|
||||
newValue.add(VPackValue(sid));
|
||||
}
|
||||
} else {
|
||||
VPackArrayBuilder bb(&newValue);
|
||||
newValue.add(VPackValue(ServerState::instance()->getId()));
|
||||
if (add) {
|
||||
newValue.add(VPackValue(sid));
|
||||
}
|
||||
}
|
||||
static std::string const inline reportName(bool isRemove) {
|
||||
if (isRemove) {
|
||||
return "FollowerInfo::remove";
|
||||
} else {
|
||||
return "FollowerInfo::add";
|
||||
}
|
||||
return newValue;
|
||||
}
|
||||
|
||||
static std::string CurrentShardPath(arangodb::LogicalCollection& col) {
|
||||
|
@ -136,6 +93,15 @@ Result FollowerInfo::add(ServerID const& sid) {
|
|||
v = std::make_shared<std::vector<ServerID>>(*_followers);
|
||||
v->push_back(sid); // add a single entry
|
||||
_followers = v; // will cast to std::vector<ServerID> const
|
||||
{
|
||||
// insertIntoCandidates
|
||||
if (std::find(_failoverCandidates->begin(), _failoverCandidates->end(), sid) ==
|
||||
_failoverCandidates->end()) {
|
||||
auto nextCandidates = std::make_shared<std::vector<ServerID>>(*_failoverCandidates);
|
||||
nextCandidates->push_back(sid); // add a single entry
|
||||
_failoverCandidates = nextCandidates; // will cast to std::vector<ServerID> const
|
||||
}
|
||||
}
|
||||
#ifdef DEBUG_SYNC_REPLICATION
|
||||
if (!AgencyCommManager::MANAGER) {
|
||||
return {TRI_ERROR_NO_ERROR};
|
||||
|
@ -144,23 +110,15 @@ Result FollowerInfo::add(ServerID const& sid) {
|
|||
}
|
||||
|
||||
// Now tell the agency
|
||||
TRI_ASSERT(_docColl != nullptr);
|
||||
std::string curPath = CurrentShardPath(*_docColl);
|
||||
std::string planPath = PlanShardPath(*_docColl);
|
||||
AgencyComm ac;
|
||||
double startTime = TRI_microtime();
|
||||
do {
|
||||
AgencyReadTransaction trx(std::vector<std::string>(
|
||||
{AgencyCommManager::path(planPath), AgencyCommManager::path(curPath)}));
|
||||
AgencyCommResult res = ac.sendTransactionWithFailover(trx);
|
||||
|
||||
if (res.successful()) {
|
||||
TRI_ASSERT(res.slice().isArray() && res.slice().length() == 1);
|
||||
VPackSlice resSlice = res.slice()[0];
|
||||
// Let's look at the results, note that both can be None!
|
||||
velocypack::Slice planEntry = PlanShardEntry(*_docColl, resSlice);
|
||||
velocypack::Slice currentEntry = CurrentShardEntry(*_docColl, resSlice);
|
||||
auto agencyRes = persistInAgency(false);
|
||||
if (agencyRes.ok() || agencyRes.is(TRI_ERROR_CLUSTER_NOT_LEADER)) {
|
||||
// Not a leader is expected
|
||||
return agencyRes;
|
||||
}
|
||||
// Real error, report
|
||||
|
||||
|
||||
if (!currentEntry.isObject()) {
|
||||
LOG_TOPIC("b753d", ERR, Logger::CLUSTER)
|
||||
<< "FollowerInfo::add, did not find object in " << curPath;
|
||||
|
@ -210,14 +168,15 @@ Result FollowerInfo::add(ServerID const& sid) {
|
|||
int errorCode = (application_features::ApplicationServer::isStopping())
|
||||
? TRI_ERROR_SHUTTING_DOWN
|
||||
: TRI_ERROR_CLUSTER_AGENCY_COMMUNICATION_FAILED;
|
||||
|
||||
std::string errorMessage =
|
||||
"unable to add follower in agency, timeout in agency CAS operation for "
|
||||
"key " +
|
||||
_docColl->vocbase().name() + "/" + std::to_string(_docColl->planId()) +
|
||||
": " + TRI_errno_string(errorCode);
|
||||
": " + TRI_errno_string(agencyRes.errorNumber());
|
||||
LOG_TOPIC("6295b", ERR, Logger::CLUSTER) << errorMessage;
|
||||
|
||||
return {errorCode, std::move(errorMessage)};
|
||||
agencyRes.reset(agencyRes.errorNumber(), std::move(errorMessage));
|
||||
return agencyRes;
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
@ -246,46 +205,192 @@ Result FollowerInfo::remove(ServerID const& sid) {
|
|||
<< "Removing follower " << sid << " from " << _docColl->name();
|
||||
|
||||
MUTEX_LOCKER(locker, _agencyMutex);
|
||||
WRITE_LOCKER(canWriteLocker, _canWriteLock);
|
||||
WRITE_LOCKER(writeLocker, _dataLock); // the data lock has to be locked until this function completes
|
||||
// because if the agency communication does not work
|
||||
// local data is modified again.
|
||||
|
||||
// First check if there is anything to do:
|
||||
bool found = false;
|
||||
for (auto const& s : *_followers) {
|
||||
if (s == sid) {
|
||||
found = true;
|
||||
break;
|
||||
}
|
||||
}
|
||||
if (!found) {
|
||||
if (std::find(_followers->begin(), _followers->end(), sid) == _followers->end()) {
|
||||
TRI_ASSERT(std::find(_failoverCandidates->begin(), _failoverCandidates->end(),
|
||||
sid) == _failoverCandidates->end());
|
||||
return {TRI_ERROR_NO_ERROR}; // nothing to do
|
||||
}
|
||||
|
||||
auto v = std::make_shared<std::vector<ServerID>>();
|
||||
if (_followers->size() > 0) {
|
||||
// Both lists have to be in sync at any time!
|
||||
TRI_ASSERT(std::find(_failoverCandidates->begin(), _failoverCandidates->end(),
|
||||
sid) != _failoverCandidates->end());
|
||||
auto oldFollowers = _followers;
|
||||
auto oldFailovers = _failoverCandidates;
|
||||
{
|
||||
auto v = std::make_shared<std::vector<ServerID>>();
|
||||
TRI_ASSERT(!_followers->empty()); // well we found the element above \o/
|
||||
v->reserve(_followers->size() - 1);
|
||||
for (auto const& i : *_followers) {
|
||||
if (i != sid) {
|
||||
v->push_back(i);
|
||||
}
|
||||
}
|
||||
std::remove_copy(_followers->begin(), _followers->end(),
|
||||
std::back_inserter(*v.get()), sid);
|
||||
_followers = v; // will cast to std::vector<ServerID> const
|
||||
}
|
||||
{
|
||||
auto v = std::make_shared<std::vector<ServerID>>();
|
||||
TRI_ASSERT(!_failoverCandidates->empty()); // well we found the element above \o/
|
||||
v->reserve(_failoverCandidates->size() - 1);
|
||||
std::remove_copy(_failoverCandidates->begin(), _failoverCandidates->end(),
|
||||
std::back_inserter(*v.get()), sid);
|
||||
_failoverCandidates = v; // will cast to std::vector<ServerID> const
|
||||
}
|
||||
auto _oldFollowers = _followers;
|
||||
_followers = v; // will cast to std::vector<ServerID> const
|
||||
#ifdef DEBUG_SYNC_REPLICATION
|
||||
if (!AgencyCommManager::MANAGER) {
|
||||
return {TRI_ERROR_NO_ERROR};
|
||||
}
|
||||
#endif
|
||||
Result agencyRes = persistInAgency(true);
|
||||
if (agencyRes.ok()) {
|
||||
// +1 for the leader (me)
|
||||
if (_followers->size() + 1 < _docColl->minReplicationFactor()) {
|
||||
_canWrite = false;
|
||||
}
|
||||
// we are finished
|
||||
LOG_TOPIC("be0cb", DEBUG, Logger::CLUSTER)
|
||||
<< "Removing follower " << sid << " from " << _docColl->name() << "succeeded";
|
||||
return agencyRes;
|
||||
}
|
||||
if (agencyRes.is(TRI_ERROR_CLUSTER_NOT_LEADER)) {
|
||||
// Next run in Maintenance will fix this.
|
||||
return agencyRes;
|
||||
}
|
||||
|
||||
// rollback:
|
||||
_followers = oldFollowers;
|
||||
_failoverCandidates = oldFailovers;
|
||||
std::string errorMessage =
|
||||
"unable to remove follower from agency, timeout in agency CAS operation "
|
||||
"for key " +
|
||||
_docColl->vocbase().name() + "/" + std::to_string(_docColl->planId()) +
|
||||
": " + TRI_errno_string(agencyRes.errorNumber());
|
||||
LOG_TOPIC("a0dcc", ERR, Logger::CLUSTER) << errorMessage;
|
||||
agencyRes.resetErrorMessage<std::string>(std::move(errorMessage));
|
||||
return agencyRes;
|
||||
}
|
||||
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief clear follower list, no changes in agency necessary
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
void FollowerInfo::clear() {
|
||||
WRITE_LOCKER(canWriteLocker, _canWriteLock);
|
||||
WRITE_LOCKER(writeLocker, _dataLock);
|
||||
_followers = std::make_shared<std::vector<ServerID>>();
|
||||
_failoverCandidates = std::make_shared<std::vector<ServerID>>();
|
||||
_canWrite = false;
|
||||
}
|
||||
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief check whether the given server is a follower
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
bool FollowerInfo::contains(ServerID const& sid) const {
|
||||
READ_LOCKER(readLocker, _dataLock);
|
||||
auto const& f = *_followers;
|
||||
return std::find(f.begin(), f.end(), sid) != f.end();
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief Take over leadership for this shard.
|
||||
/// Also inject information about in-sync followers that we knew about
|
||||
/// before a failover to this server has happened
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
void FollowerInfo::takeOverLeadership(std::vector<std::string> const& previousInsyncFollowers) {
|
||||
// This function copies over the information taken from the last CURRENT into a local vector.
|
||||
// Where we remove the old leader and ourself from the list of followers
|
||||
WRITE_LOCKER(canWriteLocker, _canWriteLock);
|
||||
WRITE_LOCKER(writeLocker, _dataLock);
|
||||
// Reset local structures, if we take over leadership we do not know anything!
|
||||
_followers = std::make_shared<std::vector<ServerID>>();
|
||||
_failoverCandidates = std::make_shared<std::vector<ServerID>>();
|
||||
// We disallow writes until the first write.
|
||||
_canWrite = false;
|
||||
// Take over leadership
|
||||
_theLeader = "";
|
||||
_theLeaderTouched = true;
|
||||
TRI_ASSERT(_failoverCandidates != nullptr && _failoverCandidates->empty());
|
||||
if (previousInsyncFollowers.size() > 1) {
|
||||
auto ourselves = arangodb::ServerState::instance()->getId();
|
||||
auto failoverCandidates =
|
||||
std::make_shared<std::vector<ServerID>>(previousInsyncFollowers);
|
||||
auto myEntry =
|
||||
std::find(failoverCandidates->begin(), failoverCandidates->end(), ourselves);
|
||||
// We are a valid failover follower
|
||||
TRI_ASSERT(myEntry != failoverCandidates->end());
|
||||
// The first server is a different leader! (For some reason the job can be
|
||||
// triggered twice) TRI_ASSERT(myEntry != failoverCandidates->begin());
|
||||
failoverCandidates->erase(myEntry);
|
||||
// Put us in front, put old leader somewhere, we do not really care
|
||||
_failoverCandidates = failoverCandidates;
|
||||
}
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief Update the current information in the Agency. We update the failover-
|
||||
/// list with the newest values, after this the guarantee is that
|
||||
/// _followers == _failoverCandidates
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
bool FollowerInfo::updateFailoverCandidates() {
|
||||
MUTEX_LOCKER(agencyLocker, _agencyMutex);
|
||||
// Acquire _canWriteLock first
|
||||
WRITE_LOCKER(canWriteLocker, _canWriteLock);
|
||||
// Next acquire _dataLock
|
||||
WRITE_LOCKER(dataLocker, _dataLock);
|
||||
if (_canWrite) {
|
||||
// Short circuit, we have multiple writes in the above write lock
|
||||
// The first needs to do things and flips _canWrite
|
||||
// All followers can return as soon as the lock is released
|
||||
#ifdef ARANGODB_ENABLE_MAINTAINER_MODE
|
||||
TRI_ASSERT(_failoverCandidates->size() == _followers->size());
|
||||
std::vector<std::string> diff;
|
||||
std::set_symmetric_difference(_failoverCandidates->begin(),
|
||||
_failoverCandidates->end(), _followers->begin(),
|
||||
_followers->end(), std::back_inserter(diff));
|
||||
TRI_ASSERT(diff.empty());
|
||||
#endif
|
||||
return _canWrite;
|
||||
}
|
||||
TRI_ASSERT(_followers->size() + 1 >= _docColl->minReplicationFactor());
|
||||
// Update both lists (we use a copy here, as we are modifying them in other places individually!)
|
||||
_failoverCandidates = std::make_shared<std::vector<ServerID> const>(*_followers);
|
||||
// Just be sure
|
||||
TRI_ASSERT(_failoverCandidates.get() != _followers.get());
|
||||
TRI_ASSERT(_failoverCandidates->size() == _followers->size());
|
||||
#ifdef ARANGODB_ENABLE_MAINTAINER_MODE
|
||||
std::vector<std::string> diff;
|
||||
std::set_symmetric_difference(_failoverCandidates->begin(),
|
||||
_failoverCandidates->end(), _followers->begin(),
|
||||
_followers->end(), std::back_inserter(diff));
|
||||
TRI_ASSERT(diff.empty());
|
||||
#endif
|
||||
Result res = persistInAgency(true);
|
||||
if (!res.ok()) {
|
||||
// We could not persist the update in the agency.
|
||||
// Collection left in RO mode.
|
||||
LOG_TOPIC("7af00", INFO, Logger::CLUSTER)
|
||||
<< "Could not persist insync follower for " << _docColl->vocbase().name()
|
||||
<< "/" << std::to_string(_docColl->planId())
|
||||
<< " keep RO-mode for now, next write will retry.";
|
||||
TRI_ASSERT(!_canWrite);
|
||||
} else {
|
||||
_canWrite = true;
|
||||
}
|
||||
return _canWrite;
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief Persist information in Current
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
Result FollowerInfo::persistInAgency(bool isRemove) const {
|
||||
// Now tell the agency
|
||||
TRI_ASSERT(_docColl != nullptr);
|
||||
std::string curPath = CurrentShardPath(*_docColl);
|
||||
std::string planPath = PlanShardPath(*_docColl);
|
||||
|
||||
AgencyComm ac;
|
||||
double startTime = TRI_microtime();
|
||||
do {
|
||||
AgencyReadTransaction trx(std::vector<std::string>(
|
||||
{AgencyCommManager::path(planPath), AgencyCommManager::path(curPath)}));
|
||||
|
@ -299,7 +404,7 @@ Result FollowerInfo::remove(ServerID const& sid) {
|
|||
|
||||
if (!currentEntry.isObject()) {
|
||||
LOG_TOPIC("01896", ERR, Logger::CLUSTER)
|
||||
<< "FollowerInfo::remove, did not find object in " << curPath;
|
||||
<< reportName(isRemove) << ", did not find object in " << curPath;
|
||||
if (!currentEntry.isNone()) {
|
||||
LOG_TOPIC("57c84", ERR, Logger::CLUSTER) << "Found: " << currentEntry.toJson();
|
||||
}
|
||||
|
@ -307,16 +412,16 @@ Result FollowerInfo::remove(ServerID const& sid) {
|
|||
if (!planEntry.isArray() || planEntry.length() == 0 || !planEntry[0].isString() ||
|
||||
!planEntry[0].isEqualString(ServerState::instance()->getId())) {
|
||||
LOG_TOPIC("42231", INFO, Logger::CLUSTER)
|
||||
<< "FollowerInfo::remove, did not find myself in Plan: "
|
||||
<< _docColl->vocbase().name() << "/"
|
||||
<< std::to_string(_docColl->planId())
|
||||
<< reportName(isRemove)
|
||||
<< ", did not find myself in Plan: " << _docColl->vocbase().name()
|
||||
<< "/" << std::to_string(_docColl->planId())
|
||||
<< " (can happen when the leader changed recently).";
|
||||
if (!planEntry.isNone()) {
|
||||
LOG_TOPIC("ffede", INFO, Logger::CLUSTER) << "Found: " << planEntry.toJson();
|
||||
}
|
||||
return {TRI_ERROR_CLUSTER_NOT_LEADER};
|
||||
} else {
|
||||
auto newValue = newShardEntry(currentEntry, sid, false);
|
||||
auto newValue = newShardEntry(currentEntry);
|
||||
AgencyWriteTransaction trx;
|
||||
trx.preconditions.push_back(
|
||||
AgencyPrecondition(curPath, AgencyPrecondition::Type::VALUE, currentEntry));
|
||||
|
@ -328,19 +433,21 @@ Result FollowerInfo::remove(ServerID const& sid) {
|
|||
AgencyOperation("Current/Version", AgencySimpleOperationType::INCREMENT_OP));
|
||||
AgencyCommResult res2 = ac.sendTransactionWithFailover(trx);
|
||||
if (res2.successful()) {
|
||||
// we are finished
|
||||
LOG_TOPIC("be0cb", DEBUG, Logger::CLUSTER)
|
||||
<< "Removing follower " << sid << " from " << _docColl->name()
|
||||
<< "succeeded";
|
||||
return {TRI_ERROR_NO_ERROR};
|
||||
}
|
||||
}
|
||||
}
|
||||
} else {
|
||||
LOG_TOPIC("b7333", WARN, Logger::CLUSTER)
|
||||
<< "FollowerInfo::remove, could not read " << planPath << " and "
|
||||
<< reportName(isRemove) << ", could not read " << planPath << " and "
|
||||
<< curPath << " in agency.";
|
||||
}
|
||||
|
||||
using namespace std::chrono_literals;
|
||||
std::this_thread::sleep_for(500ms);
|
||||
} while (!application_features::ApplicationServer::isStopping());
|
||||
return TRI_ERROR_SHUTTING_DOWN;
|
||||
|
||||
std::this_thread::sleep_for(std::chrono::milliseconds(500));
|
||||
} while (TRI_microtime() < startTime + 7200 &&
|
||||
!application_features::ApplicationServer::isStopping());
|
||||
|
@ -367,23 +474,58 @@ Result FollowerInfo::remove(ServerID const& sid) {
|
|||
LOG_TOPIC("a0dcc", ERR, Logger::CLUSTER) << errorMessage;
|
||||
|
||||
return {errorCode, std::move(errorMessage)};
|
||||
|
||||
}
|
||||
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief clear follower list, no changes in agency necessary
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief inject the information about "servers" and "failoverCandidates"
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
void FollowerInfo::clear() {
|
||||
WRITE_LOCKER(writeLocker, _dataLock);
|
||||
_followers = std::make_shared<std::vector<ServerID>>();
|
||||
void FollowerInfo::injectFollowerInfoInternal(VPackBuilder& builder) const {
|
||||
auto ourselves = arangodb::ServerState::instance()->getId();
|
||||
TRI_ASSERT(builder.isOpenObject());
|
||||
builder.add(VPackValue(maintenance::SERVERS));
|
||||
{
|
||||
VPackArrayBuilder bb(&builder);
|
||||
builder.add(VPackValue(ourselves));
|
||||
for (auto const& f : *_followers) {
|
||||
builder.add(VPackValue(f));
|
||||
}
|
||||
}
|
||||
builder.add(VPackValue(StaticStrings::FailoverCandidates));
|
||||
{
|
||||
VPackArrayBuilder bb(&builder);
|
||||
builder.add(VPackValue(ourselves));
|
||||
for (auto const& f : *_failoverCandidates) {
|
||||
builder.add(VPackValue(f));
|
||||
}
|
||||
}
|
||||
TRI_ASSERT(builder.isOpenObject());
|
||||
}
|
||||
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief check whether the given server is a follower
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief change JSON under
|
||||
/// Current/Collection/<DB-name>/<Collection-ID>/<shard-ID>
|
||||
/// to add or remove a serverID, if add flag is true, the entry is added
|
||||
/// (if it is not yet there), otherwise the entry is removed (if it was
|
||||
/// there).
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
bool FollowerInfo::contains(ServerID const& sid) const {
|
||||
READ_LOCKER(readLocker, _dataLock);
|
||||
auto const& f = *_followers;
|
||||
return std::find(f.begin(), f.end(), sid) != f.end();
|
||||
VPackBuilder FollowerInfo::newShardEntry(VPackSlice oldValue) const {
|
||||
VPackBuilder newValue;
|
||||
TRI_ASSERT(oldValue.isObject());
|
||||
{
|
||||
VPackObjectBuilder b(&newValue);
|
||||
// Copy all but SERVERS and FailoverCandidates.
|
||||
// They will be injected later.
|
||||
for (auto const& it : VPackObjectIterator(oldValue)) {
|
||||
if (!it.key.isEqualString(maintenance::SERVERS) &&
|
||||
!it.key.isEqualString(StaticStrings::FailoverCandidates)) {
|
||||
newValue.add(it.key);
|
||||
newValue.add(it.value);
|
||||
}
|
||||
}
|
||||
injectFollowerInfoInternal(newValue);
|
||||
}
|
||||
return newValue;
|
||||
}
|
|
@ -25,11 +25,15 @@
|
|||
#ifndef ARANGOD_CLUSTER_FOLLOWER_INFO_H
|
||||
#define ARANGOD_CLUSTER_FOLLOWER_INFO_H 1
|
||||
|
||||
#include "ClusterInfo.h"
|
||||
|
||||
#include "Basics/Mutex.h"
|
||||
#include "Basics/ReadWriteLock.h"
|
||||
#include "Basics/Result.h"
|
||||
#include "Basics/WriteLocker.h"
|
||||
#include "ClusterInfo.h"
|
||||
#include "StorageEngine/EngineSelectorFeature.h"
|
||||
#include "StorageEngine/StorageEngine.h"
|
||||
#include "VocBase/LogicalCollection.h"
|
||||
|
||||
namespace arangodb {
|
||||
|
||||
|
@ -44,22 +48,41 @@ class Slice;
|
|||
class FollowerInfo {
|
||||
// This is the list of real local followers
|
||||
std::shared_ptr<std::vector<ServerID> const> _followers;
|
||||
// This is the list of followers that have been insync BEFORE we
|
||||
// triggered a failover to this server.
|
||||
// The list is filled only temporarily, and will be deleted as
|
||||
// soon as we can guarantee at least so many followers locally.
|
||||
std::shared_ptr<std::vector<ServerID> const> _failoverCandidates;
|
||||
|
||||
// The agencyMutex is used to synchronise access to the agency.
|
||||
// the _dataLock is used to sync the access to local data.
|
||||
// The agencyMutex is always locked before the _dataLock is locked.
|
||||
// The _canWriteLock is used to protect flag if we do have enough followers
|
||||
// The locking ordering to avoid dead locks has to be as follows:
|
||||
// 1.) _agencyMutex
|
||||
// 2.) _canWriteLock
|
||||
// 3.) _dataLock
|
||||
mutable Mutex _agencyMutex;
|
||||
mutable arangodb::basics::ReadWriteLock _canWriteLock;
|
||||
mutable arangodb::basics::ReadWriteLock _dataLock;
|
||||
|
||||
arangodb::LogicalCollection* _docColl;
|
||||
std::string _theLeader;
|
||||
// if the latter is empty, then we are leading
|
||||
std::string _theLeader;
|
||||
bool _theLeaderTouched;
|
||||
// flag if we have enough in-sync followers and can let writes pass through
|
||||
bool _canWrite;
|
||||
|
||||
public:
|
||||
explicit FollowerInfo(arangodb::LogicalCollection* d)
|
||||
: _followers(std::make_shared<std::vector<ServerID>>()),
|
||||
_failoverCandidates(std::make_shared<std::vector<ServerID>>()),
|
||||
_docColl(d),
|
||||
_theLeaderTouched(false) {}
|
||||
_theLeader(""),
|
||||
_theLeaderTouched(false),
|
||||
_canWrite(_docColl->replicationFactor() <= 1) {
|
||||
// On replicationfactor 1 we do not have any failover servers to maintain.
|
||||
// This should also disable satellite tracking.
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief get information about current followers of a shard.
|
||||
|
@ -70,6 +93,23 @@ class FollowerInfo {
|
|||
return _followers;
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief get information about current followers of a shard.
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
std::shared_ptr<std::vector<ServerID> const> getFailoverCandidates() const {
|
||||
READ_LOCKER(readLocker, _dataLock);
|
||||
return _failoverCandidates;
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief Take over leadership for this shard.
|
||||
/// Also inject information of a insync followers that we knew about
|
||||
/// before a failover to this server has happened
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
void takeOverLeadership(std::vector<std::string> const& previousInsyncFollowers);
|
||||
|
||||
//////////////////////////////////////////////////////////////////////////////
|
||||
/// @brief add a follower to a shard, this is only done by the server side
|
||||
/// of the "get-in-sync" capabilities. This reports to the agency under
|
||||
|
@ -106,6 +146,9 @@ class FollowerInfo {
|
|||
//////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
void setTheLeader(std::string const& who) {
|
||||
// Empty leader => we are now new leader.
|
||||
// This needs to be handled with takeOverLeadership
|
||||
TRI_ASSERT(!who.empty());
|
||||
WRITE_LOCKER(writeLocker, _dataLock);
|
||||
_theLeader = who;
|
||||
_theLeaderTouched = true;
|
||||
|
@@ -128,6 +171,54 @@ class FollowerInfo {
READ_LOCKER(readLocker, _dataLock);
return _theLeaderTouched;
}

bool allowedToWrite() {
{
auto engine = arangodb::EngineSelectorFeature::ENGINE;
TRI_ASSERT(engine != nullptr);
if (engine->inRecovery()) {
return true;
}
READ_LOCKER(readLocker, _canWriteLock);
if (_canWrite) {
// Someone has decided we can write, fastPath!

#ifdef ARANGODB_USE_MAINTAINER_MODE
// Invariant, we can only WRITE if we do not have other failover candidates
READ_LOCKER(readLockerData, _dataLock);
TRI_ASSERT(_followers->size() == _failoverCandidates->size());
TRI_ASSERT(_followers->size() > _docColl->minReplicationFactor());
#endif
return _canWrite;
}
READ_LOCKER(readLockerData, _dataLock);
TRI_ASSERT(_docColl != nullptr);
if (_followers->size() + 1 < _docColl->minReplicationFactor()) {
// We know that we still do not have enough followers
return false;
}
}
return updateFailoverCandidates();
}
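The slow path above boils down to counting the leader itself towards the replication factor: a write can go through once the number of in-sync followers plus one reaches minReplicationFactor (the class additionally consults the engine recovery state and the failover candidates). A hedged, self-contained sketch of just that arithmetic, with made-up names:

#include <cstddef>

// Illustration only: the leader counts as one replica, so writes are possible
// as soon as inSyncFollowers + 1 >= minReplicationFactor.
inline bool enoughReplicasForWrite(std::size_t inSyncFollowers,
                                   std::size_t minReplicationFactor) {
  return inSyncFollowers + 1 >= minReplicationFactor;
}

// Example: with minReplicationFactor == 2, a single in-sync follower suffices;
// with minReplicationFactor == 1, the leader alone may accept writes.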

//////////////////////////////////////////////////////////////////////////////
/// @brief Inject the information about followers into the builder.
/// Builder needs to be an open object and is not allowed to contain
/// the keys "servers" and "failoverCandidates".
//////////////////////////////////////////////////////////////////////////////
void injectFollowerInfo(arangodb::velocypack::Builder& builder) const {
READ_LOCKER(readLockerData, _dataLock);
injectFollowerInfoInternal(builder);
}

private:
void injectFollowerInfoInternal(arangodb::velocypack::Builder& builder) const;

bool updateFailoverCandidates();

Result persistInAgency(bool isRemove) const;

arangodb::velocypack::Builder newShardEntry(arangodb::velocypack::Slice oldValue) const;
};
} // end namespace arangodb
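The class comments above pin down a strict lock acquisition order: _agencyMutex before _canWriteLock before _dataLock. A minimal, self-contained sketch of the same discipline using plain std::mutex stand-ins (the ArangoDB types are a Mutex and two ReadWriteLocks; this is an illustration, not the repository code):

#include <mutex>

// Hypothetical illustration of the documented order. Every code path that
// needs more than one of these locks must take them in the order 1 -> 2 -> 3;
// a thread taking them in another order could deadlock against this one.
struct FollowerInfoLockOrderSketch {
  std::mutex agencyMutex;   // 1.) synchronises access to the agency
  std::mutex canWriteLock;  // 2.) guards the _canWrite flag
  std::mutex dataLock;      // 3.) guards _followers and _failoverCandidates

  void updateFollowersAndFlag() {
    std::lock_guard<std::mutex> agency(agencyMutex);     // 1.)
    std::lock_guard<std::mutex> canWrite(canWriteLock);  // 2.)
    std::lock_guard<std::mutex> data(dataLock);          // 3.)
    // ... report to the agency, flip the flag, swap the follower lists ...
  }
};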

@@ -245,7 +245,8 @@ void handlePlanShard(VPackSlice const& cprops, VPackSlice const& ldb,
{THE_LEADER, shouldBeLeading ? std::string() : leaderId},
{SERVER_ID, serverId},
{LOCAL_LEADER, lcol.get(THE_LEADER).copyString()},
{FOLLOWERS_TO_DROP, followersToDropString}},
{FOLLOWERS_TO_DROP, followersToDropString},
{OLD_CURRENT_COUNTER, std::to_string(feature.getCurrentCounter())}},
HIGHER_PRIORITY, properties));
} else {
LOG_TOPIC("0285b", DEBUG, Logger::MAINTENANCE)

@@ -726,26 +727,14 @@ static VPackBuilder assembleLocalCollectionInfo(
}
}
}
ret.add(VPackValue(SERVERS));
{
VPackArrayBuilder a(&ret);
ret.add(VPackValue(ourselves));
// planServers may be `none` in the case that the shard is not
// contained in Plan, but in local.
if (planServers.isArray()) {
std::shared_ptr<std::vector<std::string> const> current =
collection->followers()->get();
for (auto const& server : *current) {
ret.add(VPackValue(server));
}
}
}
collection->followers()->injectFollowerInfo(ret);
}
return ret;
} catch (std::exception const& e) {
ret.clear();
std::string errorMsg(
"Maintenance::assembleLocalCollectionInfo: Failed to lookup database ");
"Maintenance::assembleLocalCollectionInfo: Failed to lookup "
"database ");
errorMsg += database;
errorMsg += ", exception: ";
errorMsg += e.what();

@@ -852,8 +841,10 @@ arangodb::Result arangodb::maintenance::reportInCurrent(
auto const planPath = std::vector<std::string>{dbName, colName, "shards", shName};
if (!pdbs.hasKey(planPath)) {
LOG_TOPIC("43242", DEBUG, Logger::MAINTENANCE)
<< "Ooops, we have a shard for which we believe to be the leader,"
" but the Plan does not have it any more, we do not report in "
<< "Ooops, we have a shard for which we believe to be the "
"leader,"
" but the Plan does not have it any more, we do not report "
"in "
"Current about this, database: "
<< dbName << ", shard: " << shName;
continue;

@@ -863,7 +854,8 @@ arangodb::Result arangodb::maintenance::reportInCurrent(
if (!thePlanList.isArray() || thePlanList.length() == 0 ||
!thePlanList[0].isString() || !thePlanList[0].isEqualStringUnchecked(serverId)) {
LOG_TOPIC("87776", DEBUG, Logger::MAINTENANCE)
<< "Ooops, we have a shard for which we believe to be the leader,"
<< "Ooops, we have a shard for which we believe to be the "
"leader,"
" but the Plan says otherwise, we do not report in Current "
"about this, database: "
<< dbName << ", shard: " << shName;

@@ -923,7 +915,8 @@ arangodb::Result arangodb::maintenance::reportInCurrent(
if (!pdbs.hasKey(planPath)) {
LOG_TOPIC("65432", DEBUG, Logger::MAINTENANCE)
<< "Ooops, we have a shard for which we believe that we "
"just resigned, but the Plan does not have it any more,"
"just resigned, but the Plan does not have it any "
"more,"
" we do not report in Current about this, database: "
<< dbName << ", shard: " << shName;
continue;

@@ -57,7 +57,8 @@ bool findNotDoneActions(std::shared_ptr<maintenance::Action> const& action) {
MaintenanceFeature::MaintenanceFeature(application_features::ApplicationServer& server)
: ApplicationFeature(server, "Maintenance"),
_forceActivation(false),
_maintenanceThreadsMax(2) {
_maintenanceThreadsMax(2),
_currentCounter(0) {
// the number of threads will be adjusted later. it's just that we want to
// initialize all members properly

@@ -116,7 +117,8 @@ void MaintenanceFeature::validateOptions(std::shared_ptr<ProgramOptions> options
<< "Need at least" << minThreadLimit << "maintenance-threads";
_maintenanceThreadsMax = minThreadLimit;
} else if (_maintenanceThreadsMax >= maxThreadLimit) {
LOG_TOPIC("8fb0e", WARN, Logger::MAINTENANCE) << "maintenance-threads limited to " << maxThreadLimit;
LOG_TOPIC("8fb0e", WARN, Logger::MAINTENANCE)
<< "maintenance-threads limited to " << maxThreadLimit;
_maintenanceThreadsMax = maxThreadLimit;
}
}

@@ -129,8 +131,9 @@ void MaintenanceFeature::start() {

// _forceActivation is set by the catch tests
if (!_forceActivation && (serverState->isAgent() || serverState->isSingleServer())) {
LOG_TOPIC("deb1a", TRACE, Logger::MAINTENANCE) << "Disable maintenance-threads"
<< " for single-server or agents.";
LOG_TOPIC("deb1a", TRACE, Logger::MAINTENANCE)
<< "Disable maintenance-threads"
<< " for single-server or agents.";
return;
}

@@ -442,7 +445,7 @@ std::shared_ptr<Action> MaintenanceFeature::findReadyAction(std::unordered_set<s
} // else
} // for
}
} // WRITE
} // WRITE

// no pointer ... wait 0.1 seconds unless woken up
if (!_isShuttingDown) {

@@ -733,3 +736,41 @@ void MaintenanceFeature::delShardVersion(std::string const& shname) {
_shardVersion.erase(it);
}
}

uint64_t MaintenanceFeature::getCurrentCounter() const {
// It is guaranteed that getCurrentCounter is not executed
// concurrently to increase / wait.
// This guarantee is created by the following:
// 1) There is one infinite loop that will call
// PhaseOne and PhaseTwo in exactly this order.
// It is guaranteed that only one thread at a time is
// in this loop.
// Between PhaseOne and PhaseTwo, increaseCurrentCounter is called.
// Within PhaseOne this getCurrentCounter is called, but never after.
// So getCurrentCounter and increaseCurrentCounter are strictly serialized.
// 2) waitForLargerCurrentCounter can be called in concurrent threads at any time.
// It is read-only, so it is safe to have it concurrent to getCurrentCounter
// without any locking.
// However we need locking for increase and waitFor in order to guarantee
// their functionality.
// For now we actually do not need this guard, but as this is NOT performance
// critical we can simply get it, just to be safe for later use.
std::unique_lock<std::mutex> guard(_currentCounterLock);
return _currentCounter;
}

void MaintenanceFeature::increaseCurrentCounter() {
std::unique_lock<std::mutex> guard(_currentCounterLock);
_currentCounter++;
_currentCounterCondition.notify_all();
}

void MaintenanceFeature::waitForLargerCurrentCounter(uint64_t old) {
std::unique_lock<std::mutex> guard(_currentCounterLock);
if (_currentCounter > old) {
return;
}
_currentCounterCondition.wait(guard);
TRI_ASSERT(_currentCounter > old);
return;
}
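Taken together, getCurrentCounter, increaseCurrentCounter and waitForLargerCurrentCounter form a small "wait until Current has been reloaded at least once more" primitive for maintenance actions. A self-contained sketch of that pattern with standard library types follows; the names and the PhaseOne/PhaseTwo framing are placeholders taken from the comments above, not the actual MaintenanceFeature interface.

#include <condition_variable>
#include <cstdint>
#include <mutex>

class CurrentCounterSketch {
  mutable std::mutex _lock;
  std::condition_variable _cond;
  uint64_t _counter = 0;

 public:
  // Read the counter; in the sketch this is what an action records in PhaseOne.
  uint64_t get() const {
    std::unique_lock<std::mutex> guard(_lock);
    return _counter;
  }

  // Bump the counter after a fresh Current has been loaded and wake waiters.
  void increase() {
    {
      std::unique_lock<std::mutex> guard(_lock);
      ++_counter;
    }
    _cond.notify_all();
  }

  // Block until the counter has moved past `old`; the predicate form of
  // wait() also protects against spurious wake-ups.
  void waitForLargerThan(uint64_t old) {
    std::unique_lock<std::mutex> guard(_lock);
    _cond.wait(guard, [&] { return _counter > old; });
  }
};

// Intended usage, mirroring the comments above:
//   uint64_t seen = counter.get();    // recorded while the action is scheduled
//   counter.increase();               // between PhaseOne and PhaseTwo
//   counter.waitForLargerThan(seen);  // the action resumes only afterwards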

@@ -36,7 +36,7 @@

namespace arangodb {

template<typename T>
template <typename T>
struct SharedPtrComparer {
bool operator()(std::shared_ptr<T> const& a, std::shared_ptr<T> const& b) {
if (a == nullptr || b == nullptr) {

@@ -50,8 +50,6 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
public:
explicit MaintenanceFeature(application_features::ApplicationServer&);

MaintenanceFeature();

virtual ~MaintenanceFeature() {}

struct errors_t {

@@ -156,7 +154,8 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
* @brief Find and return first found not-done action or nullptr
* @param desc Description of sought action
*/
std::shared_ptr<maintenance::Action> findFirstNotDoneAction(std::shared_ptr<maintenance::ActionDescription> const& desc);
std::shared_ptr<maintenance::Action> findFirstNotDoneAction(
std::shared_ptr<maintenance::ActionDescription> const& desc);

/**
* @brief add index error to bucket

@@ -298,19 +297,44 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
*/
void delShardVersion(std::string const& shardId);

/**
* @brief Get the number of loadCurrent operations.
* NOTE: The Counter functions can be removed
* as soon as we use a push based approach on Plan and Current
* @return The most recent count for getCurrent calls
*/
uint64_t getCurrentCounter() const;

/**
* @brief increase the counter for loadCurrent operations triggered
* during maintenance. This is used to delay some Actions that
* require a recent Current to continue
*/
void increaseCurrentCounter();

/**
* @brief wait until the current counter is larger than the given old one;
* the idea here is to first request getCurrentCounter().
* @param old The last number of getCurrentCounter(). This function will
* return only if the recent counter is larger than old.
*/
void waitForLargerCurrentCounter(uint64_t old);

private:
/// @brief common code used by multiple constructors
void init();

/// @brief Search for first action matching hash and predicate
/// @return shared pointer to action object if exists, empty shared_ptr if not
std::shared_ptr<maintenance::Action> findFirstActionHash(size_t hash,
std::function<bool(std::shared_ptr<maintenance::Action> const&)> const& predicate);
std::shared_ptr<maintenance::Action> findFirstActionHash(
size_t hash,
std::function<bool(std::shared_ptr<maintenance::Action> const&)> const& predicate);

/// @brief Search for first action matching hash and predicate (with lock already held by caller)
/// @return shared pointer to action object if exists, empty shared_ptr if not
std::shared_ptr<maintenance::Action> findFirstActionHashNoLock(size_t hash,
std::function<bool(std::shared_ptr<maintenance::Action> const&)> const& predicate);
std::shared_ptr<maintenance::Action> findFirstActionHashNoLock(
size_t hash,
std::function<bool(std::shared_ptr<maintenance::Action> const&)> const& predicate);

/// @brief Search for action by Id
/// @return shared pointer to action object if exists, nullptr if not

@@ -321,7 +345,6 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
std::shared_ptr<maintenance::Action> findActionIdNoLock(uint64_t hash);

protected:

/// @brief option for forcing this feature to always be enabled - used by the catch tests
bool _forceActivation;

@@ -365,8 +388,8 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
// we need to leave the action in _prioQueue (since we cannot remove anything
// but the top from it), and simply put it into a different state.
std::priority_queue<std::shared_ptr<maintenance::Action>,
std::vector<std::shared_ptr<maintenance::Action>>,
SharedPtrComparer<maintenance::Action>> _prioQueue;
std::vector<std::shared_ptr<maintenance::Action>>, SharedPtrComparer<maintenance::Action>>
_prioQueue;

/// @brief lock to protect _actionRegistry and state changes to MaintenanceActions within
mutable arangodb::basics::ReadWriteLock _actionRegistryLock;

@@ -404,6 +427,15 @@ class MaintenanceFeature : public application_features::ApplicationFeature {
/// @brief shards have versions in order to be able to distinguish between
/// independent actions
std::unordered_map<std::string, size_t> _shardVersion;

/// @brief Mutex for the current counter condition variable
mutable std::mutex _currentCounterLock;

/// @brief Condition variable that Actions can wait on until _currentCounter has increased
std::condition_variable _currentCounterCondition;

/// @brief counter for load_current requests.
uint64_t _currentCounter;
};

} // namespace arangodb

@@ -69,6 +69,7 @@ constexpr char const* THE_LEADER = "theLeader";
constexpr char const* UNDERSCORE = "_";
constexpr char const* UPDATE_COLLECTION = "UpdateCollection";
constexpr char const* WAIT_FOR_SYNC = "waitForSync";
constexpr char const* OLD_CURRENT_COUNTER = "oldCurrentCounter";

} // namespace maintenance
} // namespace arangodb

@@ -77,21 +77,36 @@ UpdateCollection::UpdateCollection(MaintenanceFeature& feature, ActionDescriptio
}
TRI_ASSERT(desc.has(FOLLOWERS_TO_DROP));

TRI_ASSERT(desc.has(OLD_CURRENT_COUNTER));

if (!error.str().empty()) {
LOG_TOPIC("a6e4c", ERR, Logger::MAINTENANCE) << "UpdateCollection: " << error.str();
LOG_TOPIC("a6e4c", ERR, Logger::MAINTENANCE)
<< "UpdateCollection: " << error.str();
_result.reset(TRI_ERROR_INTERNAL, error.str());
setState(FAILED);
}
}

void handleLeadership(LogicalCollection& collection, std::string const& localLeader,
std::string const& plannedLeader, std::string const& followersToDrop) {
std::string const& plannedLeader,
std::string const& followersToDrop, std::string const& databaseName,
uint64_t oldCounter, MaintenanceFeature& feature) {
auto& followers = collection.followers();

if (plannedLeader.empty()) { // Planned to lead
if (!localLeader.empty()) { // We were not leader, assume leadership
followers->setTheLeader(std::string());
followers->clear();
// This will block the thread until we have fetched a new Current version
// in the maintenance main thread.
feature.waitForLargerCurrentCounter(oldCounter);
auto currentInfo = ClusterInfo::instance()->getCollectionCurrent(
databaseName, std::to_string(collection.planId()));
if (currentInfo == nullptr) {
// Collection has been dropped, we cannot continue here.
return;
}
TRI_ASSERT(currentInfo != nullptr);
auto failoverCandidates = currentInfo->failoverCandidates(collection.name());
followers->takeOverLeadership(failoverCandidates);
transaction::cluster::abortFollowerTransactionsOnShard(collection.id());
} else {
// If someone (the Supervision most likely) has thrown

@@ -138,6 +153,8 @@ bool UpdateCollection::first() {
auto const& localLeader = _description.get(LOCAL_LEADER);
auto const& followersToDrop = _description.get(FOLLOWERS_TO_DROP);
auto const& props = properties();
auto const& oldCounterString = _description.get(OLD_CURRENT_COUNTER);
uint64_t oldCounter = basics::StringUtils::uint64(oldCounterString);

try {
DatabaseGuard guard(database);

@@ -152,7 +169,8 @@ bool UpdateCollection::first() {
// resignation case is not handled here, since then
// ourselves does not appear in shards[shard] but only
// "_" + ourselves.
handleLeadership(*coll, localLeader, plannedLeader, followersToDrop);
handleLeadership(*coll, localLeader, plannedLeader, followersToDrop,
vocbase.name(), oldCounter, feature());
_result = Collections::updateProperties(*coll, props, false); // always a full-update

if (!_result.ok()) {

@@ -173,7 +191,8 @@ bool UpdateCollection::first() {
std::stringstream error;

error << "action " << _description << " failed with exception " << e.what();
LOG_TOPIC("79442", WARN, Logger::MAINTENANCE) << "UpdateCollection: " << error.str();
LOG_TOPIC("79442", WARN, Logger::MAINTENANCE)
<< "UpdateCollection: " << error.str();
_result.reset(TRI_ERROR_INTERNAL, error.str());
}

@@ -65,6 +65,11 @@ enum class RequestLane {
// V8 or having high priority.
CLUSTER_INTERNAL,

// For requests from the DBserver to the Coordinator or
// from the Coordinator to the DBserver. Using AQL
// these have Medium priority.
CLUSTER_AQL,

// For requests from the Coordinator to the
// DBserver using V8.
CLUSTER_V8,

@@ -115,6 +120,8 @@ inline RequestPriority PriorityRequestLane(RequestLane lane) {
return RequestPriority::LOW;
case RequestLane::CLUSTER_INTERNAL:
return RequestPriority::HIGH;
case RequestLane::CLUSTER_AQL:
return RequestPriority::MED;
case RequestLane::CLUSTER_V8:
return RequestPriority::LOW;
case RequestLane::CLUSTER_ADMIN:

@@ -41,9 +41,7 @@ class InternalRestTraverserHandler : public RestVocbaseBaseHandler {
char const* name() const override final {
return "InternalRestTraverserHandler";
}
RequestLane lane() const override final {
return RequestLane::CLUSTER_INTERNAL;
}
RequestLane lane() const override final { return RequestLane::CLUSTER_AQL; }

private:
// @brief create a new Traverser Engine.

@@ -82,7 +82,7 @@ Conductor::Conductor(uint64_t executionNumber, TRI_vocbase_t& vocbase,
if (_asyncMode) {
LOG_TOPIC("1b1c2", DEBUG, Logger::PREGEL) << "Running in async mode";
}
VPackSlice lazy = _userParams.slice().get( Utils::lazyLoadingKey);
VPackSlice lazy = _userParams.slice().get(Utils::lazyLoadingKey);
_lazyLoading = _algorithm->supportsLazyLoading();
_lazyLoading = _lazyLoading && (lazy.isNone() || lazy.getBoolean());
if (_lazyLoading) {

@@ -98,8 +98,7 @@ Conductor::Conductor(uint64_t executionNumber, TRI_vocbase_t& vocbase,
}

Conductor::~Conductor() {
if (_state != ExecutionState::CANCELED &&
_state != ExecutionState::DEFAULT) {
if (_state != ExecutionState::CANCELED && _state != ExecutionState::DEFAULT) {
try {
this->cancel();
} catch (...) {

@@ -120,11 +119,13 @@ void Conductor::start() {
_globalSuperstep = 0;
_state = ExecutionState::RUNNING;

LOG_TOPIC("3a255", DEBUG, Logger::PREGEL) << "Telling workers to load the data";
LOG_TOPIC("3a255", DEBUG, Logger::PREGEL)
<< "Telling workers to load the data";
int res = _initializeWorkers(Utils::startExecutionPath, VPackSlice());
if (res != TRI_ERROR_NO_ERROR) {
_state = ExecutionState::CANCELED;
LOG_TOPIC("30171", ERR, Logger::PREGEL) << "Not all DBServers started the execution";
LOG_TOPIC("30171", ERR, Logger::PREGEL)
<< "Not all DBServers started the execution";
}
}

@@ -170,7 +171,8 @@ bool Conductor::_startGlobalStep() {
_masterContext->_enterNextGSS = false;
proceed = _masterContext->postGlobalSuperstep();
if (!proceed) {
LOG_TOPIC("0aa8e", DEBUG, Logger::PREGEL) << "Master context ended execution";
LOG_TOPIC("0aa8e", DEBUG, Logger::PREGEL)
<< "Master context ended execution";
}
}

@@ -212,7 +214,8 @@ bool Conductor::_startGlobalStep() {
res = _sendToAllDBServers(Utils::startGSSPath, b); // call me maybe
if (res != TRI_ERROR_NO_ERROR) {
_state = ExecutionState::IN_ERROR;
LOG_TOPIC("f34bb", ERR, Logger::PREGEL) << "Conductor could not start GSS " << _globalSuperstep;
LOG_TOPIC("f34bb", ERR, Logger::PREGEL)
<< "Conductor could not start GSS " << _globalSuperstep;
// the recovery mechanisms should take care of this
} else {
LOG_TOPIC("411a5", DEBUG, Logger::PREGEL) << "Conductor started new gss " << _globalSuperstep;

@@ -236,8 +239,9 @@ void Conductor::finishedWorkerStartup(VPackSlice const& data) {
return;
}

LOG_TOPIC("76631", INFO, Logger::PREGEL) << "Running pregel with " << _totalVerticesCount
<< " vertices, " << _totalEdgesCount << " edges";
LOG_TOPIC("76631", INFO, Logger::PREGEL)
<< "Running pregel with " << _totalVerticesCount << " vertices, "
<< _totalEdgesCount << " edges";
if (_masterContext) {
_masterContext->_globalSuperstep = 0;
_masterContext->_vertexCount = _totalVerticesCount;

@@ -356,7 +360,8 @@ void Conductor::finishedRecoveryStep(VPackSlice const& data) {
res = _sendToAllDBServers(Utils::continueRecoveryPath, b);

} else {
LOG_TOPIC("6ecf2", INFO, Logger::PREGEL) << "Recovery finished. Proceeding normally";
LOG_TOPIC("6ecf2", INFO, Logger::PREGEL)
<< "Recovery finished. Proceeding normally";

// build the message, works for all cases
VPackBuilder b;

@@ -393,7 +398,8 @@ void Conductor::startRecovery() {
if (_state != ExecutionState::RUNNING && _state != ExecutionState::IN_ERROR) {
return; // maybe we are already in recovery mode
} else if (_algorithm->supportsCompensation() == false) {
LOG_TOPIC("12e0e", ERR, Logger::PREGEL) << "Algorithm does not support recovery";
LOG_TOPIC("12e0e", ERR, Logger::PREGEL)
<< "Algorithm does not support recovery";
cancelNoLock();
return;
}

@@ -407,14 +413,15 @@ void Conductor::startRecovery() {

// let's wait for a final state in the cluster
_workHandle = SchedulerFeature::SCHEDULER->queueDelay(
RequestLane::CLUSTER_INTERNAL, std::chrono::seconds(2), [this](bool cancelled) {
RequestLane::CLUSTER_AQL, std::chrono::seconds(2), [this](bool cancelled) {
if (cancelled || _state != ExecutionState::RECOVERING) {
return; // seems like we are canceled
}
std::vector<ServerID> goodServers;
int res = PregelFeature::instance()->recoveryManager()->filterGoodServers(_dbServers, goodServers);
if (res != TRI_ERROR_NO_ERROR) {
LOG_TOPIC("3d08b", ERR, Logger::PREGEL) << "Recovery proceedings failed";
LOG_TOPIC("3d08b", ERR, Logger::PREGEL)
<< "Recovery proceedings failed";
cancelNoLock();
return;
}

@@ -614,15 +621,15 @@ int Conductor::_initializeWorkers(std::string const& suffix, VPackSlice addition
}

std::shared_ptr<ClusterComm> cc = ClusterComm::instance();
size_t nrGood = cc->performRequests(requests, 5.0 * 60.0,
LogTopic("Pregel Conductor"), false);
size_t nrGood =
cc->performRequests(requests, 5.0 * 60.0, LogTopic("Pregel Conductor"), false);
Utils::printResponses(requests);
return nrGood == requests.size() ? TRI_ERROR_NO_ERROR : TRI_ERROR_FAILED;
}

int Conductor::_finalizeWorkers() {
_callbackMutex.assertLockedByCurrentThread();
_finalizationStartTimeSecs = TRI_microtime();
_finalizationStartTimeSecs = TRI_microtime();

bool store = _state == ExecutionState::DONE;
store = store && _storeResults;

@@ -651,7 +658,6 @@ int Conductor::_finalizeWorkers() {
}

void Conductor::finishedWorkerFinalize(VPackSlice data) {

MUTEX_LOCKER(guard, _callbackMutex);
_ensureUniqueResponse(data);
if (_respondedServers.size() != _dbServers.size()) {

@@ -674,9 +680,8 @@ void Conductor::finishedWorkerFinalize(VPackSlice data) {

LOG_TOPIC("063b5", INFO, Logger::PREGEL) << "Done. We did " << _globalSuperstep << " rounds";
LOG_TOPIC("3cfa8", INFO, Logger::PREGEL)
<< "Startup Time: " << _computationStartTimeSecs - _startTimeSecs << "s";
LOG_TOPIC("d43cb", INFO, Logger::PREGEL)
<< "Computation Time: " << compTime << "s";
<< "Startup Time: " << _computationStartTimeSecs - _startTimeSecs << "s";
LOG_TOPIC("d43cb", INFO, Logger::PREGEL) << "Computation Time: " << compTime << "s";
LOG_TOPIC("74e05", INFO, Logger::PREGEL) << "Storage Time: " << storeTime << "s";
LOG_TOPIC("06f03", INFO, Logger::PREGEL) << "Overall: " << totalRuntimeSecs() << "s";
LOG_TOPIC("03f2e", DEBUG, Logger::PREGEL) << "Stats: " << debugOut.toString();

@@ -686,7 +691,7 @@ void Conductor::finishedWorkerFinalize(VPackSlice data) {
auto* scheduler = SchedulerFeature::SCHEDULER;
if (scheduler) {
uint64_t exe = _executionNumber;
scheduler->queue(RequestLane::CLUSTER_INTERNAL, [exe] {
scheduler->queue(RequestLane::CLUSTER_AQL, [exe] {
auto pf = PregelFeature::instance();
if (pf) {
pf->cleanupConductor(exe);

@@ -770,8 +775,7 @@ int Conductor::_sendToAllDBServers(std::string const& path, VPackBuilder const&
if (conductor) {
TRI_vocbase_t& vocbase = conductor->_vocbaseGuard.database();
VPackBuilder response;
PregelFeature::handleWorkerRequest(vocbase, path,
message.slice(), response);
PregelFeature::handleWorkerRequest(vocbase, path, message.slice(), response);
}
});
}

@@ -794,9 +798,10 @@ int Conductor::_sendToAllDBServers(std::string const& path, VPackBuilder const&
requests.emplace_back("server:" + server, rest::RequestType::POST, base + path, body);
}

size_t nrGood = cc->performRequests(requests, 5.0 * 60.0,
LogTopic("Pregel Conductor"), false);
LOG_TOPIC("9de62", TRACE, Logger::PREGEL) << "Send " << path << " to " << nrGood << " servers";
size_t nrGood =
cc->performRequests(requests, 5.0 * 60.0, LogTopic("Pregel Conductor"), false);
LOG_TOPIC("9de62", TRACE, Logger::PREGEL)
<< "Send " << path << " to " << nrGood << " servers";
Utils::printResponses(requests);
if (handle && nrGood == requests.size()) {
for (ClusterCommRequest const& req : requests) {

@@ -1147,7 +1147,8 @@ Result RestReplicationHandler::processRestoreCollectionCoordinator(
}

if (!isValidMinReplFactorSlice) {
if (replFactorSlice.isString() && replFactorSlice.isEqualString("satellite")) {
if (replFactorSlice.isString() &&
replFactorSlice.isEqualString("satellite")) {
minReplicationFactor = 0;
} else if (minReplicationFactor <= 0) {
minReplicationFactor = 1;

@@ -53,9 +53,9 @@ bool isDirectDeadlockLane(RequestLane lane) {
// Those tasks can not be executed directly.
return lane == RequestLane::TASK_V8 || lane == RequestLane::CLIENT_V8 ||
lane == RequestLane::CLUSTER_V8 || lane == RequestLane::INTERNAL_LOW ||
lane == RequestLane::SERVER_REPLICATION ||
lane == RequestLane::CLUSTER_ADMIN || lane == RequestLane::CLUSTER_INTERNAL ||
lane == RequestLane::AGENCY_CLUSTER || lane == RequestLane::CLIENT_AQL;
lane == RequestLane::SERVER_REPLICATION || lane == RequestLane::CLUSTER_ADMIN ||
lane == RequestLane::CLUSTER_INTERNAL || lane == RequestLane::AGENCY_CLUSTER ||
lane == RequestLane::CLIENT_AQL || lane == RequestLane::CLUSTER_AQL;
}

} // namespace

@@ -1720,7 +1720,7 @@ OperationResult transaction::Methods::insertLocal(std::string const& collectionN
if (!options.isSynchronousReplicationFrom.empty()) {
return OperationResult(TRI_ERROR_CLUSTER_SHARD_LEADER_REFUSES_REPLICATION, options);
}
if (followerInfo->get()->size() + 1 < collection->minReplicationFactor()) {
if (!followerInfo->allowedToWrite()) {
// We cannot fulfill the minimum replication factor.
// Reject write.
LOG_TOPIC("d7306", ERR, Logger::REPLICATION)

@@ -2044,7 +2044,7 @@ OperationResult transaction::Methods::modifyLocal(std::string const& collectionN
if (!options.isSynchronousReplicationFrom.empty()) {
return OperationResult(TRI_ERROR_CLUSTER_SHARD_LEADER_REFUSES_REPLICATION);
}
if (followerInfo->get()->size() + 1 < collection->minReplicationFactor()) {
if (!followerInfo->allowedToWrite()) {
// We cannot fulfill the minimum replication factor.
// Reject write.
LOG_TOPIC("2e35a", ERR, Logger::REPLICATION)

@@ -2324,7 +2324,7 @@ OperationResult transaction::Methods::removeLocal(std::string const& collectionN
if (!options.isSynchronousReplicationFrom.empty()) {
return OperationResult(TRI_ERROR_CLUSTER_SHARD_LEADER_REFUSES_REPLICATION);
}
if (followerInfo->get()->size() + 1 < collection->minReplicationFactor()) {
if (!followerInfo->allowedToWrite()) {
// We cannot fulfill the minimum replication factor.
// Reject write.
LOG_TOPIC("f1f8e", ERR, Logger::REPLICATION)

@@ -2559,7 +2559,7 @@ OperationResult transaction::Methods::truncateLocal(std::string const& collectio
if (!options.isSynchronousReplicationFrom.empty()) {
return OperationResult(TRI_ERROR_CLUSTER_SHARD_LEADER_REFUSES_REPLICATION);
}
if (followerInfo->get()->size() + 1 < collection->minReplicationFactor()) {
if (!followerInfo->allowedToWrite()) {
// We cannot fulfill the minimum replication factor.
// Reject write.
LOG_TOPIC("7c1d4", ERR, Logger::REPLICATION)

@@ -228,6 +228,7 @@ std::string const StaticStrings::GraphCreateCollection("createCollection");

// Replication
std::string const StaticStrings::ReplicationSoftLockOnly("doSoftLockOnly");
std::string const StaticStrings::FailoverCandidates("failoverCandidates");

// misc strings
std::string const StaticStrings::LastValue("lastValue");

@@ -210,6 +210,7 @@ class StaticStrings {

// Replication
static std::string const ReplicationSoftLockOnly;
static std::string const FailoverCandidates;

// misc strings
static std::string const LastValue;

@@ -252,13 +252,39 @@ class IResearchQueryOptimizationTest : public ::testing::Test {

NS_END

static std::vector<std::string> const EMPTY;

// -----------------------------------------------------------------------------
// --SECTION-- test suite
// -----------------------------------------------------------------------------

void addLinkToCollection(std::shared_ptr<arangodb::iresearch::IResearchView>& view) {
auto updateJson = VPackParser::fromJson(
"{ \"links\" : {"
"\"collection_1\" : { \"includeAllFields\" : true }"
"}}");
EXPECT_TRUE((view->properties(updateJson->slice(), true).ok()));

arangodb::velocypack::Builder builder;

builder.openObject();
view->properties(builder, arangodb::LogicalDataSource::makeFlags(
arangodb::LogicalDataSource::Serialize::Detailed));
builder.close();

auto slice = builder.slice();
EXPECT_TRUE(slice.isObject());
EXPECT_TRUE(slice.get("name").copyString() == "testView");
EXPECT_TRUE(slice.get("type").copyString() ==
arangodb::iresearch::DATA_SOURCE_TYPE.name());
EXPECT_TRUE(slice.get("deleted").isNone()); // no system properties
auto tmpSlice = slice.get("links");
EXPECT_TRUE((true == tmpSlice.isObject() && 1 == tmpSlice.length()));
}

// dedicated to https://github.com/arangodb/arangodb/issues/8294
TEST_F(IResearchQueryOptimizationTest, test) {
static std::vector<std::string> const EMPTY;

auto createJson = VPackParser::fromJson(
"{ \

@@ -285,29 +311,7 @@ TEST_F(IResearchQueryOptimizationTest, test) {
ASSERT_TRUE((false == !view));

// add link to collection
{
auto updateJson = VPackParser::fromJson(
"{ \"links\" : {"
"\"collection_1\" : { \"includeAllFields\" : true }"
"}}");
EXPECT_TRUE((view->properties(updateJson->slice(), true).ok()));

arangodb::velocypack::Builder builder;

builder.openObject();
view->properties(builder, arangodb::LogicalDataSource::makeFlags(
arangodb::LogicalDataSource::Serialize::Detailed));
builder.close();

auto slice = builder.slice();
EXPECT_TRUE(slice.isObject());
EXPECT_TRUE(slice.get("name").copyString() == "testView");
EXPECT_TRUE(slice.get("type").copyString() ==
arangodb::iresearch::DATA_SOURCE_TYPE.name());
EXPECT_TRUE(slice.get("deleted").isNone()); // no system properties
auto tmpSlice = slice.get("links");
EXPECT_TRUE((true == tmpSlice.isObject() && 1 == tmpSlice.length()));
}
addLinkToCollection(view);

std::deque<arangodb::ManagedDocumentResult> insertedDocs;