port 3.4 changes that give libcurl time to retry a failed DNS query. Also add changes to vocbase.cpp that were missed in previous PR. (#7132)

Matthew Von-Maszewski 2018-10-30 16:00:13 -04:00 committed by Max Neunhöffer
parent c8961b2faa
commit a054e31f73
3 changed files with 47 additions and 29 deletions

@@ -1,6 +1,9 @@
 devel
 -----
+* force connection timeout to be 7 seconds to allow libcurl time to retry lost DNS
+  queries.
 * fixes a routing issue within the web ui after the use of views
 * fixes some graph data parsing issues in the ui, e.g. cleaning up duplicate
@@ -87,7 +90,7 @@ v3.4.0-rc.3 (XXXX-XX-XX)
 * prevent creation of collections and views with the same name in cluster setups
 * fixed issue #6770: document update: ignoreRevs parameter ignored
 * added AQL query optimizer rules `simplify-conditions` and `fuse-filters`
@@ -865,30 +868,30 @@ v3.3.17 (2018-10-04)
 * upgraded arangosync version to 0.6.0
 * added several advanced options for configuring and debugging LDAP connections.
   Please note that some of the following options are platform-specific and may not
   work on all platforms or with all LDAP servers reliably:
   - `--ldap.serialized`: whether or not calls into the underlying LDAP library
     should be serialized.
     This option can be used to work around thread-unsafe LDAP library functionality.
   - `--ldap.serialize-timeout`: sets the timeout value that is used when waiting to
     enter the LDAP library call serialization lock. This is only meaningful when
     `--ldap.serialized` has been set to `true`.
   - `--ldap.retries`: number of tries to attempt a connection. Setting this to values
     greater than one will make ArangoDB retry to contact the LDAP server in case no
     connection can be made initially.
   - `--ldap.restart`: whether or not the LDAP library should implicitly restart
     connections
   - `--ldap.referrals`: whether or not the LDAP library should implicitly chase
     referrals
   - `--ldap.debug`: turn on internal OpenLDAP library output (warning: will print
     to stdout).
   - `--ldap.timeout`: timeout value (in seconds) for synchronous LDAP API calls
     (a value of 0 means default timeout).
   - `--ldap.network-timeout`: timeout value (in seconds) after which network operations
     following the initial connection return in case of no activity (a value of 0 means
     default timeout).
   - `--ldap.async-connect`: whether or not the connection to the LDAP library will
     be done asynchronously.
 * fixed a shutdown race in ArangoDB's logger, which could have led to some buffered
@@ -902,7 +905,7 @@ v3.3.17 (2018-10-04)
 * fixed issue #6583: Agency node segfaults if an authenticated HTTP request is sent to its port
 * when cleaning out a leader it could happen that it became follower instead of
   being removed completely
 * make synchronous replication detect more error cases when followers cannot
@@ -912,7 +915,7 @@ v3.3.17 (2018-10-04)
   VelocyStream protocol (VST)
   That combination could have led to spurious errors such as "TLS padding error"
   or "Tag mismatch" and connections being closed
 * agency endpoint updates now go through RAFT
@@ -927,7 +930,7 @@ v3.3.16 (2018-09-19)
 * fixed issue #6495 (Document not found when removing records)
 * fixed undefined behavior in cluster plan-loading procedure that may have
   unintentionally modified a shared structure
 * reduce overhead of function initialization in AQL COLLECT aggregate functions,
@@ -974,18 +977,18 @@ v3.3.15 (2018-09-10)
 * added startup option `--query.optimizer-max-plans value`
   This option allows limiting the number of query execution plans created by the
   AQL optimizer for any incoming queries. The default value is `128`.
   By adjusting this value it can be controlled how many different query execution
   plans the AQL query optimizer will generate at most for any given AQL query.
   Normally the AQL query optimizer will generate a single execution plan per AQL query,
   but there are some cases in which it creates multiple competing plans. More plans
   can lead to better optimized queries, however, plan creation has its costs. The
   more plans are created and shipped through the optimization pipeline, the more time
   will be spent in the optimizer.
   Lowering this option's value will make the optimizer stop creating additional plans
   when it has already created enough plans.
   Note that this setting controls the default maximum number of plans to create. The
@@ -1919,7 +1922,7 @@ v3.2.17 (XXXX-XX-XX)
 * make synchronous replication detect more error cases when followers cannot
   apply the changes from the leader
 * fixed undefined behavior in cluster plan-loading procedure that may have
   unintentionally modified a shared structure
 * cluster nodes should retry registering in agency until successful

@@ -872,7 +872,10 @@ void TRI_vocbase_t::shutdown() {
   // starts unloading of collections
   for (auto& collection : collections) {
-    collection->close();  // required to release indexes
+    {
+      WRITE_LOCKER_EVENTUAL(locker, collection->lock());
+      collection->close();  // required to release indexes
+    }
     unloadCollection(collection.get(), true);
   }
@@ -1828,6 +1831,7 @@ TRI_vocbase_t::~TRI_vocbase_t() {
   // do a final cleanup of collections
   for (auto& it : _collections) {
+    WRITE_LOCKER_EVENTUAL(locker, it->lock());
     it->close();  // required to release indexes
   }
@@ -2260,4 +2264,4 @@ TRI_voc_rid_t TRI_StringToRid(char const* p, size_t len, bool& isOld,
 // -----------------------------------------------------------------------------
 // --SECTION-- END-OF-FILE
 // -----------------------------------------------------------------------------
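
The vocbase.cpp hunks above wrap each `close()` call in a write lock, and in `shutdown()` the extra braces end the lock before `unloadCollection` runs. The following is a minimal sketch of that scoping pattern, not ArangoDB code: `DemoCollection` and `closeAll` are hypothetical names, and `std::mutex` with `std::lock_guard` stands in for `WRITE_LOCKER_EVENTUAL`, whose exact acquisition behavior is not shown in this diff.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a collection; not ArangoDB's real class.
struct DemoCollection {
  std::mutex rwLock;    // plays the role of collection->lock()
  bool closed = false;  // set once close() has run

  void close() { closed = true; }  // stands in for releasing indexes
};

// Closes every collection while holding its lock and returns how many
// collections are closed afterwards.
inline std::size_t closeAll(
    std::vector<std::unique_ptr<DemoCollection>>& collections) {
  std::size_t closedCount = 0;
  for (auto& collection : collections) {
    {
      // The inner braces limit the lock's lifetime to the close() call.
      std::lock_guard<std::mutex> locker(collection->rwLock);
      collection->close();
    }  // locker is destroyed here, so the lock is released
    // Follow-up work (unloadCollection in the diff) would run unlocked here.
    if (collection->closed) {
      ++closedCount;
    }
  }
  return closedCount;
}
```

The destructor hunk needs no extra braces because the loop body ends right after `close()`, so the locker goes out of scope at the same point either way.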

@@ -382,8 +382,11 @@ void Communicator::createRequestInProgress(NewRequest&& newRequest) {
   // in doubt change the timeout to _MS below and hardcode it to 999 and see if
   // the requests immediately fail
   // if not this hack can go away
-  if (connectTimeout <= 0) {
-    connectTimeout = 5;
+  if (connectTimeout <= 7) {
+    // matthewv: previously arangod default was 1. libcurl flushes its DNS cache
+    // every 60 seconds. Tests showed DNS packets lost under high load. libcurl
+    // retries DNS after 5 seconds. 7 seconds allows for one retry plus a little padding.
+    connectTimeout = 7;
   }
   curl_easy_setopt(
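
The new condition in this hunk acts as a lower bound: any requested connect timeout of 7 seconds or less becomes 7, and larger values pass through unchanged. A minimal sketch of that rule follows, assuming the timings stated in the commit's own comment (a DNS retry after 5 seconds, plus padding). The function and constant names are hypothetical and do not appear in Communicator.cpp.

```cpp
#include <algorithm>

// Values taken from the comment in the commit; the names are illustrative.
constexpr long kDnsRetrySeconds = 5;           // when libcurl retries DNS, per the commit
constexpr long kMinConnectTimeoutSeconds = 7;  // one retry plus a little padding

static_assert(kMinConnectTimeoutSeconds > kDnsRetrySeconds,
              "the floor must leave room for one DNS retry");

// Returns the timeout that would be used after applying the floor.
// Equivalent to: if (t <= 7) { t = 7; }
inline long effectiveConnectTimeout(long requestedSeconds) {
  return std::max(requestedSeconds, kMinConnectTimeoutSeconds);
}
```

For example, `effectiveConnectTimeout(0)` and `effectiveConnectTimeout(5)` both yield 7, while `effectiveConnectTimeout(30)` stays at 30.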
@@ -485,6 +488,14 @@ void Communicator::handleResult(CURL* handle, CURLcode rc) {
       << ::buildPrefix(rip->_ticketId) << "curl error details: " << rip->_errorBuffer;
   }
+  double namelookup;
+  curl_easy_getinfo(handle, CURLINFO_NAMELOOKUP_TIME, &namelookup);
+  if (5.0 <= namelookup) {
+    LOG_TOPIC(WARN, arangodb::Logger::FIXME) << "libcurl DNS lookup took "
+      << namelookup << " seconds. Consider using static IP addresses.";
+  }  // if
   switch (rc) {
     case CURLE_OK: {
       long httpStatusCode = 200;
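
The warning added in `handleResult` fires when the name lookup time that libcurl reports reaches 5 seconds, the point at which the commit's comment says a DNS retry happens. The sketch below isolates that threshold check so it can be tried without libcurl. `dnsLookupWarning` and `kSlowLookupSeconds` are hypothetical names; in the commit itself the value comes from `curl_easy_getinfo` with `CURLINFO_NAMELOOKUP_TIME` and the message goes to `LOG_TOPIC`.

```cpp
#include <sstream>
#include <string>

// Threshold used by the commit's check (5.0 <= namelookup).
constexpr double kSlowLookupSeconds = 5.0;

// Returns the warning text for a slow lookup, or an empty string when the
// lookup finished below the threshold.
inline std::string dnsLookupWarning(double lookupSeconds) {
  if (lookupSeconds < kSlowLookupSeconds) {
    return std::string();
  }
  std::ostringstream out;
  out << "libcurl DNS lookup took " << lookupSeconds
      << " seconds. Consider using static IP addresses.";
  return out.str();
}
```

A caller could initialize the lookup time to `0.0` before querying libcurl, so that a failed query simply yields no warning. That initialization is a suggestion in this sketch, not something the commit does.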