diff --git a/CHANGELOG b/CHANGELOG
index 8759501957..bba67896c2 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,6 +1,9 @@
 devel
 -----
 
+* force connection timeout to be 7 seconds to allow libcurl time to retry lost DNS
+  queries.
+
 * fixes a routing issue within the web ui after the use of views
 
 * fixes some graph data parsing issues in the ui, e.g. cleaning up duplicate
@@ -87,7 +90,7 @@ v3.4.0-rc.3 (XXXX-XX-XX)
 
 * prevent creation of collections and views with the same in cluster setups
 
-* fixed issue #6770: document update: ignoreRevs parameter ignored 
+* fixed issue #6770: document update: ignoreRevs parameter ignored
 
 * added AQL query optimizer rules `simplify-conditions` and `fuse-filters`
 
@@ -865,30 +868,30 @@ v3.3.17 (2018-10-04)
 * upgraded arangosync version to 0.6.0
 
 * added several advanced options for configuring and debugging LDAP connections.
-  Please note that some of the following options are platform-specific and may not 
+  Please note that some of the following options are platform-specific and may not
   work on all platforms or with all LDAP servers reliably:
 
-  - `--ldap.serialized`: whether or not calls into the underlying LDAP library 
+  - `--ldap.serialized`: whether or not calls into the underlying LDAP library
     should be serialized. This option can be used to work around thread-unsafe
     LDAP library functionality.
-  - `--ldap.serialize-timeout`: sets the timeout value that is used when waiting to 
-    enter the LDAP library call serialization lock. This is only meaningful when 
-    `--ldap.serialized` has been set to `true`. 
-  - `--ldap.retries`: number of tries to attempt a connection. Setting this to values 
-    greater than one will make ArangoDB retry to contact the LDAP server in case no 
+  - `--ldap.serialize-timeout`: sets the timeout value that is used when waiting to
+    enter the LDAP library call serialization lock. This is only meaningful when
+    `--ldap.serialized` has been set to `true`.
+  - `--ldap.retries`: number of tries to attempt a connection. Setting this to values
+    greater than one will make ArangoDB retry to contact the LDAP server in case no
     connection can be made initially.
-  - `--ldap.restart`: whether or not the LDAP library should implicitly restart 
+  - `--ldap.restart`: whether or not the LDAP library should implicitly restart
     connections
-  - `--ldap.referrals`: whether or not the LDAP library should implicitly chase 
+  - `--ldap.referrals`: whether or not the LDAP library should implicitly chase
     referrals
-  - `--ldap.debug`: turn on internal OpenLDAP library output (warning: will print 
+  - `--ldap.debug`: turn on internal OpenLDAP library output (warning: will print
     to stdout).
-  - `--ldap.timeout`: timeout value (in seconds) for synchronous LDAP API calls 
+  - `--ldap.timeout`: timeout value (in seconds) for synchronous LDAP API calls
     (a value of 0 means default timeout).
-  - `--ldap.network-timeout`: timeout value (in seconds) after which network operations 
-    following the initial connection return in case of no activity (a value of 0 means 
+  - `--ldap.network-timeout`: timeout value (in seconds) after which network operations
+    following the initial connection return in case of no activity (a value of 0 means
     default timeout).
-  - `--ldap.async-connect`: whether or not the connection to the LDAP library will 
+  - `--ldap.async-connect`: whether or not the connection to the LDAP library will
     be done asynchronously.
 
 * fixed a shutdown race in ArangoDB's logger, which could have led to some buffered
@@ -902,7 +905,7 @@ v3.3.17 (2018-10-04)
 * fixed issue #6583: Agency node segfaults if sent an authenticated HTTP request
   is sent to its port
 
-* when cleaning out a leader it could happen that it became follower instead of 
+* when cleaning out a leader it could happen that it became follower instead of
   being removed completely
 
 * make synchronous replication detect more error cases when followers cannot
@@ -912,7 +915,7 @@ v3.3.17 (2018-10-04)
   VelocyStream protocol (VST)
 
   That combination could have led to spurious errors such as "TLS padding error"
-  or "Tag mismatch" and connections being closed 
+  or "Tag mismatch" and connections being closed
 
 * agency endpoint updates now go through RAFT
 
@@ -927,7 +930,7 @@ v3.3.16 (2018-09-19)
 
 * fixed issue #6495 (Document not found when removing records)
 
-* fixed undefined behavior in cluster plan-loading procedure that may have 
+* fixed undefined behavior in cluster plan-loading procedure that may have
   unintentionally modified a shared structure
 
 * reduce overhead of function initialization in AQL COLLECT aggregate functions,
@@ -974,18 +977,18 @@ v3.3.15 (2018-09-10)
 
 * added startup option `--query.optimizer-max-plans value`
 
-  This option allows limiting the number of query execution plans created by the 
+  This option allows limiting the number of query execution plans created by the
   AQL optimizer for any incoming queries. The default value is `128`.
 
-  By adjusting this value it can be controlled how many different query execution 
-  plans the AQL query optimizer will generate at most for any given AQL query. 
-  Normally the AQL query optimizer will generate a single execution plan per AQL query, 
+  By adjusting this value it can be controlled how many different query execution
+  plans the AQL query optimizer will generate at most for any given AQL query.
+  Normally the AQL query optimizer will generate a single execution plan per AQL query,
   but there are some cases in which it creates multiple competing plans. More plans
   can lead to better optimized queries, however, plan creation has its costs. The
-  more plans are created and shipped through the optimization pipeline, the more time 
+  more plans are created and shipped through the optimization pipeline, the more time
   will be spent in the optimizer.
 
-  Lowering this option's value will make the optimizer stop creating additional plans 
+  Lowering this option's value will make the optimizer stop creating additional plans
   when it has already created enough plans.
 
   Note that this setting controls the default maximum number of plans to create. The
@@ -1919,7 +1922,7 @@ v3.2.17 (XXXX-XX-XX)
 * make synchronous replication detect more error cases when followers cannot
   apply the changes from the leader
 
-* fixed undefined behavior in cluster plan-loading procedure that may have 
+* fixed undefined behavior in cluster plan-loading procedure that may have
   unintentionally modified a shared structure
 
 * cluster nodes should retry registering in agency until successful
diff --git a/arangod/VocBase/vocbase.cpp b/arangod/VocBase/vocbase.cpp
index 88ae3fa22b..53618929d6 100644
--- a/arangod/VocBase/vocbase.cpp
+++ b/arangod/VocBase/vocbase.cpp
@@ -872,7 +872,10 @@ void TRI_vocbase_t::shutdown() {
 
   // starts unloading of collections
   for (auto& collection : collections) {
-    collection->close();  // required to release indexes
+    {
+      WRITE_LOCKER_EVENTUAL(locker, collection->lock());
+      collection->close();  // required to release indexes
+    }
     unloadCollection(collection.get(), true);
   }
 
@@ -1828,6 +1831,7 @@ TRI_vocbase_t::~TRI_vocbase_t() {
 
   // do a final cleanup of collections
   for (auto& it : _collections) {
+    WRITE_LOCKER_EVENTUAL(locker, it->lock());
     it->close();  // required to release indexes
   }
 
@@ -2260,4 +2264,4 @@ TRI_voc_rid_t TRI_StringToRid(char const* p, size_t len, bool& isOld,
 
 // -----------------------------------------------------------------------------
 // --SECTION--                                                       END-OF-FILE
-// -----------------------------------------------------------------------------
\ No newline at end of file
+// -----------------------------------------------------------------------------
diff --git a/lib/SimpleHttpClient/Communicator.cpp b/lib/SimpleHttpClient/Communicator.cpp
index c196ae9eaf..6fce08d045 100644
--- a/lib/SimpleHttpClient/Communicator.cpp
+++ b/lib/SimpleHttpClient/Communicator.cpp
@@ -382,8 +382,11 @@ void Communicator::createRequestInProgress(NewRequest&& newRequest) {
   // in doubt change the timeout to _MS below and hardcode it to 999 and see if
   // the requests immediately fail
   // if not this hack can go away
-  if (connectTimeout <= 0) {
-    connectTimeout = 5;
+  if (connectTimeout <= 7) {
+    // matthewv: previously arangod default was 1. libcurl flushes its DNS cache
+    // every 60 seconds. Tests showed DNS packets lost under high load. libcurl
+    // retries DNS after 5 seconds. 7 seconds allows for one retry plus a little padding.
+    connectTimeout = 7;
   }
 
   curl_easy_setopt(
@@ -485,6 +488,14 @@ void Communicator::handleResult(CURL* handle, CURLcode rc) {
         << ::buildPrefix(rip->_ticketId) << "curl error details: " << rip->_errorBuffer;
   }
 
+  double namelookup;
+  curl_easy_getinfo(handle, CURLINFO_NAMELOOKUP_TIME, &namelookup);
+
+  if (5.0 <= namelookup) {
+    LOG_TOPIC(WARN, arangodb::Logger::FIXME)
+        << "libcurl DNS lookup took " << namelookup
+        << " seconds. Consider using static IP addresses.";
+  }  // if
+
   switch (rc) {
     case CURLE_OK: {
       long httpStatusCode = 200;