
Port 3.4 changes that give libcurl time to retry a failed DNS query. Also add changes to vocbase.cpp that were missed in the previous PR. (#7132)

Matthew Von-Maszewski 2018-10-30 16:00:13 -04:00 committed by Max Neunhöffer
parent c8961b2faa
commit a054e31f73
3 changed files with 47 additions and 29 deletions

@ -1,6 +1,9 @@
devel
-----
* force connection timeout to be 7 seconds to allow libcurl time to retry lost DNS
queries.
* fixes a routing issue within the web ui after the use of views
* fixes some graph data parsing issues in the ui, e.g. cleaning up duplicate
@ -87,7 +90,7 @@ v3.4.0-rc.3 (XXXX-XX-XX)
* prevent creation of collections and views with the same in cluster setups
* fixed issue #6770: document update: ignoreRevs parameter ignored
* fixed issue #6770: document update: ignoreRevs parameter ignored
* added AQL query optimizer rules `simplify-conditions` and `fuse-filters`
@ -865,30 +868,30 @@ v3.3.17 (2018-10-04)
* upgraded arangosync version to 0.6.0
* added several advanced options for configuring and debugging LDAP connections.
Please note that some of the following options are platform-specific and may not
Please note that some of the following options are platform-specific and may not
work on all platforms or with all LDAP servers reliably:
- `--ldap.serialized`: whether or not calls into the underlying LDAP library
- `--ldap.serialized`: whether or not calls into the underlying LDAP library
should be serialized.
This option can be used to work around thread-unsafe LDAP library functionality.
- `--ldap.serialize-timeout`: sets the timeout value that is used when waiting to
enter the LDAP library call serialization lock. This is only meaningful when
`--ldap.serialized` has been set to `true`.
- `--ldap.retries`: number of tries to attempt a connection. Setting this to values
greater than one will make ArangoDB retry to contact the LDAP server in case no
- `--ldap.serialize-timeout`: sets the timeout value that is used when waiting to
enter the LDAP library call serialization lock. This is only meaningful when
`--ldap.serialized` has been set to `true`.
- `--ldap.retries`: number of tries to attempt a connection. Setting this to values
greater than one will make ArangoDB retry to contact the LDAP server in case no
connection can be made initially.
- `--ldap.restart`: whether or not the LDAP library should implicitly restart
- `--ldap.restart`: whether or not the LDAP library should implicitly restart
connections
- `--ldap.referrals`: whether or not the LDAP library should implicitly chase
- `--ldap.referrals`: whether or not the LDAP library should implicitly chase
referrals
- `--ldap.debug`: turn on internal OpenLDAP library output (warning: will print
- `--ldap.debug`: turn on internal OpenLDAP library output (warning: will print
to stdout).
- `--ldap.timeout`: timeout value (in seconds) for synchronous LDAP API calls
- `--ldap.timeout`: timeout value (in seconds) for synchronous LDAP API calls
(a value of 0 means default timeout).
- `--ldap.network-timeout`: timeout value (in seconds) after which network operations
following the initial connection return in case of no activity (a value of 0 means
- `--ldap.network-timeout`: timeout value (in seconds) after which network operations
following the initial connection return in case of no activity (a value of 0 means
default timeout).
- `--ldap.async-connect`: whether or not the connection to the LDAP library will
- `--ldap.async-connect`: whether or not the connection to the LDAP library will
be done asynchronously.
* fixed a shutdown race in ArangoDB's logger, which could have led to some buffered
@ -902,7 +905,7 @@ v3.3.17 (2018-10-04)
* fixed issue #6583: Agency node segfaults if sent an authenticated HTTP request is sent to its port
* when cleaning out a leader it could happen that it became follower instead of
* when cleaning out a leader it could happen that it became follower instead of
being removed completely
* make synchronous replication detect more error cases when followers cannot
@ -912,7 +915,7 @@ v3.3.17 (2018-10-04)
VelocyStream protocol (VST)
That combination could have led to spurious errors such as "TLS padding error"
or "Tag mismatch" and connections being closed
or "Tag mismatch" and connections being closed
* agency endpoint updates now go through RAFT
@ -927,7 +930,7 @@ v3.3.16 (2018-09-19)
* fixed issue #6495 (Document not found when removing records)
* fixed undefined behavior in cluster plan-loading procedure that may have
* fixed undefined behavior in cluster plan-loading procedure that may have
unintentionally modified a shared structure
* reduce overhead of function initialization in AQL COLLECT aggregate functions,
@ -974,18 +977,18 @@ v3.3.15 (2018-09-10)
* added startup option `--query.optimizer-max-plans value`
This option allows limiting the number of query execution plans created by the
This option allows limiting the number of query execution plans created by the
AQL optimizer for any incoming queries. The default value is `128`.
By adjusting this value it can be controlled how many different query execution
plans the AQL query optimizer will generate at most for any given AQL query.
Normally the AQL query optimizer will generate a single execution plan per AQL query,
By adjusting this value it can be controlled how many different query execution
plans the AQL query optimizer will generate at most for any given AQL query.
Normally the AQL query optimizer will generate a single execution plan per AQL query,
but there are some cases in which it creates multiple competing plans. More plans
can lead to better optimized queries, however, plan creation has its costs. The
more plans are created and shipped through the optimization pipeline, the more time
more plans are created and shipped through the optimization pipeline, the more time
will be spent in the optimizer.
Lowering this option's value will make the optimizer stop creating additional plans
Lowering this option's value will make the optimizer stop creating additional plans
when it has already created enough plans.
Note that this setting controls the default maximum number of plans to create. The
@ -1919,7 +1922,7 @@ v3.2.17 (XXXX-XX-XX)
* make synchronous replication detect more error cases when followers cannot
apply the changes from the leader
* fixed undefined behavior in cluster plan-loading procedure that may have
* fixed undefined behavior in cluster plan-loading procedure that may have
unintentionally modified a shared structure
* cluster nodes should retry registering in agency until successful

@ -872,7 +872,10 @@ void TRI_vocbase_t::shutdown() {
// starts unloading of collections
for (auto& collection : collections) {
collection->close(); // required to release indexes
{
WRITE_LOCKER_EVENTUAL(locker, collection->lock());
collection->close(); // required to release indexes
}
unloadCollection(collection.get(), true);
}
@ -1828,6 +1831,7 @@ TRI_vocbase_t::~TRI_vocbase_t() {
// do a final cleanup of collections
for (auto& it : _collections) {
WRITE_LOCKER_EVENTUAL(locker, it->lock());
it->close(); // required to release indexes
}
@ -2260,4 +2264,4 @@ TRI_voc_rid_t TRI_StringToRid(char const* p, size_t len, bool& isOld,
// -----------------------------------------------------------------------------
// --SECTION-- END-OF-FILE
// -----------------------------------------------------------------------------
// -----------------------------------------------------------------------------
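
The vocbase.cpp hunks above wrap `collection->close()` in its own brace scope so that the write lock is dropped before `unloadCollection` runs. A minimal standalone sketch of that scoping pattern follows. It uses `std::mutex` and `std::lock_guard` in place of ArangoDB's `WRITE_LOCKER_EVENTUAL`, and `Collection`, `unload` and `shutdownAll` are hypothetical stand-ins, not the real classes. The stand-in `unload` takes the lock itself; that is an assumption made only to show why the scope matters, not a statement about what `unloadCollection` does.

```cpp
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a collection; not the ArangoDB class.
struct Collection {
  std::mutex lock;
  bool closed = false;
  bool unloaded = false;

  void close() { closed = true; }  // caller is expected to hold `lock`
};

// Hypothetical stand-in for the unload step. It acquires the lock itself
// (an assumption for illustration), so the caller must not hold it.
void unload(Collection& c) {
  std::lock_guard<std::mutex> guard(c.lock);
  c.unloaded = true;
}

void shutdownAll(std::vector<std::unique_ptr<Collection>>& collections) {
  for (auto& c : collections) {
    {
      // The guard lives only inside this inner scope.
      std::lock_guard<std::mutex> guard(c->lock);
      c->close();
    }  // lock released here, before unload() tries to take it
    unload(*c);
  }
}
```

Without the inner braces the guard would stay alive until the end of the loop body, and in this sketch `unload` would then block on a non-recursive mutex that the same thread already holds.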

@ -382,8 +382,11 @@ void Communicator::createRequestInProgress(NewRequest&& newRequest) {
// in doubt change the timeout to _MS below and hardcode it to 999 and see if
// the requests immediately fail
// if not this hack can go away
if (connectTimeout <= 0) {
connectTimeout = 5;
if (connectTimeout <= 7) {
// matthewv: previously arangod default was 1. libcurl flushes its DNS cache
// every 60 seconds. Tests showed DNS packets lost under high load. libcurl
// retries DNS after 5 seconds. 7 seconds allows for one retry plus a little padding.
connectTimeout = 7;
}
curl_easy_setopt(
@ -485,6 +488,14 @@ void Communicator::handleResult(CURL* handle, CURLcode rc) {
<< ::buildPrefix(rip->_ticketId) << "curl error details: " << rip->_errorBuffer;
}
double namelookup;
curl_easy_getinfo(handle, CURLINFO_NAMELOOKUP_TIME, &namelookup);
if (5.0 <= namelookup) {
LOG_TOPIC(WARN, arangodb::Logger::FIXME) << "libcurl DNS lookup took "
<< namelookup << " seconds. Consider using static IP addresses.";
} // if
switch (rc) {
case CURLE_OK: {
long httpStatusCode = 200;
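
The two Communicator hunks above come down to two thresholds: any connect timeout of 7 seconds or less is raised to 7, and a name lookup of 5 seconds or more triggers a warning. A minimal sketch of just that arithmetic follows, kept free of libcurl so it compiles on its own. The function and constant names are hypothetical, and the values and their rationale (a DNS retry after 5 seconds, plus padding) are taken from the commit's own comment rather than verified against libcurl.

```cpp
#include <algorithm>

// Hypothetical names; the values mirror the diff above.
constexpr long kMinConnectTimeoutSeconds = 7;     // one DNS retry (5 s) plus padding
constexpr double kSlowDnsThresholdSeconds = 5.0;  // lookup this slow gets a warning

// Same effect as `if (connectTimeout <= 7) connectTimeout = 7;`:
// short, zero or negative requests are raised to the minimum,
// longer ones pass through unchanged.
long effectiveConnectTimeout(long requestedSeconds) {
  return std::max(requestedSeconds, kMinConnectTimeoutSeconds);
}

// Same comparison as `if (5.0 <= namelookup)` in handleResult.
bool isSlowDnsLookup(double nameLookupSeconds) {
  return kSlowDnsThresholdSeconds <= nameLookupSeconds;
}
```

Note that the clamp sets a floor rather than a fixed value: a caller asking for 30 seconds still gets 30, while a caller asking for 1 second gets 7.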