
Bug fix/collection babies race timeout (#9185)

* Fixed include guard.

* Forward port of 3.4 bug-fix

* Removed lockers altogether, we are already secured by the mutex

* Fixed recursive lock gathering
This commit is contained in:
Michael Hackstein 2019-06-13 19:11:24 +02:00 committed by Frank Celler
parent cc125b377c
commit 2c78e2471b
4 changed files with 39 additions and 26 deletions


@ -1,6 +1,9 @@
devel
-----
* Speed up collection creation process in cluster, if not all agency callbacks are
delivered successfully.
* increased performance of document inserts, by reducing the number of checks in unique / primary indexes
* fixed a callback function in the web UI where the variable `this` was out of scope.
@ -34,7 +37,7 @@ devel
v3.5.0-rc.3 (2019-05-31)
------------------------
* fix issue #9106: Sparse Skiplist Index on multiple fields not used for FILTER + SORT query
Allow AQL query optimizer to use sparse indexes in more cases, specifically when
@ -52,7 +55,7 @@ v3.5.0-rc.3 (2019-05-31)
* Bugfix for smart graph traversals with uniqueVertices: path, which could
sometimes lead to erroneous traversal results
* Pregel algorithms can be run with the option "useMemoryMaps: true" to be
able to run algorithms on data that is bigger than the available RAM.
* fix a race in TTL thread deactivation/shutdown
@ -80,7 +83,7 @@ v3.5.0-rc.2 (2019-05-23)
and uncompressed data blocks not fitting into the block cache
The error can only occur for collection or index scans with the RocksDB storage engine
when the RocksDB block cache is used and set to a very small size, plus its maximum size is
enforced by setting the `--rocksdb.enforce-block-cache-size-limit` option to `true`.
Previously these incomplete reads could have been ignored silently, making collection or
@ -88,7 +91,7 @@ v3.5.0-rc.2 (2019-05-23)
* fixed internal issue #3918: added optional second parameter "withId" to AQL
function PREGEL_RESULT
this parameter defaults to `false`. When set to `true` the results of the Pregel
computation run will also contain the `_id` attribute for each vertex and not
just `_key`. This allows distinguishing vertices from different vertex collections.
@ -99,9 +102,9 @@ v3.5.0-rc.2 (2019-05-23)
* internally switch unit tests framework from catch to gtest
* disable selection of index types "hash" and "skiplist" in the web interface when
using the RocksDB engine. The index types "hash", "skiplist" and "persistent" are
just aliases of each other with the RocksDB engine, so there is no need to offer all
of them. After initially only offering "hash" indexes, we decided to only offer
indexes of type "persistent", as it is technically the most
appropriate description.
@ -619,7 +622,7 @@ v3.4.6 (2019-05-21)
and uncompressed data blocks not fitting into the block cache
The error can only occur for collection or index scans with the RocksDB storage engine
when the RocksDB block cache is used and set to a very small size, plus its maximum size is
enforced by setting the `--rocksdb.enforce-block-cache-size-limit` option to `true`.
Previously these incomplete reads could have been ignored silently, making collection or
@ -627,7 +630,7 @@ v3.4.6 (2019-05-21)
* fixed internal issue #3918: added optional second parameter "withId" to AQL
function PREGEL_RESULT
this parameter defaults to `false`. When set to `true` the results of the Pregel
computation run will also contain the `_id` attribute for each vertex and not
just `_key`. This allows distinguishing vertices from different vertex collections.


@ -125,7 +125,7 @@ bool AgencyCallback::execute(std::shared_ptr<VPackBuilder> newData) {
   return result;
 }
-void AgencyCallback::executeByCallbackOrTimeout(double maxTimeout) {
+bool AgencyCallback::executeByCallbackOrTimeout(double maxTimeout) {
   // One needs to acquire the mutex of the condition variable
   // before entering this function!
   if (!_cv.wait(static_cast<uint64_t>(maxTimeout * 1000000.0)) &&
@ -134,5 +134,7 @@ void AgencyCallback::executeByCallbackOrTimeout(double maxTimeout) {
         << "Waiting done and nothing happened. Refetching to be sure";
     // mop: watches have not triggered during our sleep...recheck to be sure
     refetchAndUpdate(false, true);  // Force a check
+    return true;
   }
+  return false;
 }


@ -112,9 +112,12 @@ class AgencyCallback {
   //////////////////////////////////////////////////////////////////////////////
   /// @brief wait until a callback is received or a timeout has happened
   ///
+  /// @return true => if we got woken up after maxTimeout
+  ///         false => if someone else rang the condition variable
   //////////////////////////////////////////////////////////////////////////////
-  void executeByCallbackOrTimeout(double);
+  bool executeByCallbackOrTimeout(double);
   //////////////////////////////////////////////////////////////////////////////
   /// @brief private members


@ -1977,13 +1977,9 @@ Result ClusterInfo::createCollectionsCoordinator(std::string const& databaseName
   if (nrDone->load(std::memory_order_acquire) == infos.size()) {
     {
-      // We need to lock all condition variables
-      std::vector<::arangodb::basics::ConditionLocker> lockers;
-      for (auto& cb : agencyCallbacks) {
-        CONDITION_LOCKER(locker, cb->_cv);
-      }
+      // We do not need to lock all condition variables,
+      // we are safe by cacheMutex
       cbGuard.fire();
-      // After the guard is done we can release the lockers
     }
     // Now we need to remove TTL + the IsBuilding flag in Agency
     opers.clear();
@ -2009,13 +2005,9 @@ Result ClusterInfo::createCollectionsCoordinator(std::string const& databaseName
}
if (tmpRes > TRI_ERROR_NO_ERROR) {
{
// We need to lock all condition variables
std::vector<::arangodb::basics::ConditionLocker> lockers;
for (auto& cb : agencyCallbacks) {
CONDITION_LOCKER(locker, cb->_cv);
}
// We do not need to lock all condition variables
// we are save by cacheMutex
cbGuard.fire();
// After the guard is done we can release the lockers
}
// report error
@ -2047,9 +2039,22 @@ Result ClusterInfo::createCollectionsCoordinator(std::string const& databaseName
TRI_ASSERT(agencyCallbacks.size() == infos.size());
for (size_t i = 0; i < infos.size(); ++i) {
if (infos[i].state == ClusterCollectionCreationInfo::INIT) {
// This one has not responded, wait for it.
CONDITION_LOCKER(locker, agencyCallbacks[i]->_cv);
agencyCallbacks[i]->executeByCallbackOrTimeout(interval);
bool wokenUp = false;
{
// This one has not responded, wait for it.
CONDITION_LOCKER(locker, agencyCallbacks[i]->_cv);
wokenUp = agencyCallbacks[i]->executeByCallbackOrTimeout(interval);
}
if (wokenUp) {
++i;
// We got woken up by waittime, not by callback.
// Let us check if we skipped other callbacks as well
for (; i < infos.size(); ++i) {
if (infos[i].state == ClusterCollectionCreationInfo::INIT) {
agencyCallbacks[i]->refetchAndUpdate(true, false);
}
}
}
break;
}
}