improve documentation a bit (#8722)

Jan 2019-04-10 12:50:10 +02:00 committed by GitHub
parent 9c4af6cb7d
commit 44bc625317
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
5 changed files with 120 additions and 96 deletions

@ -7,39 +7,42 @@ Introduction to TTL (time-to-live) Indexes
The TTL index provided by ArangoDB is used for removing expired documents
from a collection.
The TTL index is set up by setting an `expireAfter` value and by picking a single
document attribute which contains the documents' creation date and time. Documents
are expired after `expireAfter` seconds after their creation time. The creation time
is specified as a numeric timestamp (Unix timestamp) or a date string in format
`YYYY-MM-DDTHH:MM:SS` with optional milliseconds. All date strings will be interpreted
as UTC dates.
The TTL index is set up by setting an `expireAfter` value and by selecting a single
document attribute which contains a reference timepoint. For each document, that
reference timepoint can then be specified as a numeric timestamp (Unix timestamp) or
a date string in format `YYYY-MM-DDTHH:MM:SS` with optional milliseconds.
All date strings will be interpreted as UTC dates.
For example, if `expireAfter` is set to 600 seconds (10 minutes) and the index
attribute is "creationDate" and there is the following document:
Documents will count as expired when wall clock time is beyond the per-document
reference timepoint value plus the index's `expireAfter` value.
### Removing documents at a fixed period after creation / update
One use case supported by TTL indexes is to remove documents at a fixed duration
after they have been created or last updated. This requires setting up the index
with an attribute that contains the documents' creation or last-updated time.
Let's assume the index attribute is set to "creationDate", and the `expireAfter`
attribute of the index was set to 600 seconds (10 minutes).
db.collection.ensureIndex({ type: "ttl", fields: ["creationDate"], expireAfter: 600 });
Let's further assume the following document now gets inserted into the collection:
{ "creationDate" : 1550165973 }
This document will be indexed with a creation date time value of `1550165973`,
which translates to the human-readable date `2019-02-14T17:39:33.000Z`. The document
This document will be indexed with a reference timepoint value of `1550165973`,
which translates to the human-readable date/time `2019-02-14T17:39:33.000Z`. The document
will expire 600 seconds afterwards, which is at timestamp `1550166573` (or
`2019-02-14T17:49:33.000Z` in the human-readable version).
The actual removal of expired documents will not necessarily happen immediately.
Expired documents will eventually removed by a background thread that is periodically
going through all TTL indexes and removing the expired documents.
There is no guarantee when exactly the removal of expired documents will be carried
out, so queries may still find and return documents that have already expired. These
will eventually be removed when the background thread kicks in and has capacity to
remove the expired documents. It is guaranteed however that only documents which are
past their expiration time will actually be removed.
`2019-02-14T17:49:33.000Z` in the human-readable version). From that point on, the
document is a candidate for being removed.
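The arithmetic of this example can be checked with a few lines of plain JavaScript. This is only a minimal sketch: the helper name `expirationTimepoint` is made up for illustration and is not part of ArangoDB.

```javascript
// Hypothetical helper (not an ArangoDB API): computes the point in time from
// which a document counts as expired, given its reference timepoint and the
// index's expireAfter value, both in seconds.
function expirationTimepoint(referenceSeconds, expireAfterSeconds) {
  return referenceSeconds + expireAfterSeconds;
}

const expiresAt = expirationTimepoint(1550165973, 600);
console.log(expiresAt); // 1550166573

// JavaScript dates work in milliseconds, so multiply by 1000 for display.
console.log(new Date(expiresAt * 1000).toISOString()); // 2019-02-14T17:49:33.000Z
```

The result only marks the earliest moment at which the document may be removed. It says nothing about when the removal actually happens.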
Please note that the numeric date time values for the index attribute should be
specified in milliseconds since January 1st 1970 (Unix timestamp). To calculate the current
specified in seconds since January 1st 1970 (Unix timestamp). To calculate the current
timestamp from JavaScript in this format, there is `Date.now() / 1000`, to calculate it
from an arbitrary Date instance, there is `Date.getTime() / 1000`.
from an arbitrary `Date` instance, there is `Date.getTime() / 1000`.
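As a short sketch of the conversion described above: `getTime()` is called on a `Date` instance and returns milliseconds, so the value is divided by 1000. The helper name `toUnixSeconds` is an assumption made for this example.

```javascript
// Hypothetical helper: converts a Date instance to seconds since
// January 1st 1970 (Unix timestamp).
function toUnixSeconds(date) {
  return date.getTime() / 1000;
}

// Current time in seconds, suitable as a reference timepoint value.
const nowInSeconds = Date.now() / 1000;

// The example document's reference timepoint, derived from a Date instance.
const creationDate = toUnixSeconds(new Date("2019-02-14T17:39:33.000Z"));
console.log(creationDate); // 1550165973
```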
Alternatively, the index attribute values can be specified as a date string in format
Alternatively, the reference timepoints can be specified as a date string in format
`YYYY-MM-DDTHH:MM:SS` with optional milliseconds. All date strings will be interpreted
as UTC dates.
@ -47,17 +50,61 @@ The above example document using a datestring attribute value would be
{ "creationDate" : "2019-02-14T17:39:33.000Z" }
Now any data-modification access to the document could update the value in the document's
`creationDate` attribute to the current date/time, which would prolong the existence
of the document and keep it from being expired and removed.
Setting a document's reference timepoint on initial insertion or updating it on every
subsequent modification of the document will not be performed by ArangoDB. Instead, it
is the task of client applications to set and update the reference timepoints whenever
the use case requires it.
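A client application could refresh the reference timepoint along the following lines. This is a sketch in plain JavaScript: the helper `withRefreshedTimepoint` and the sample document are hypothetical, and writing the result back to the collection is left to whatever update call the application already uses.

```javascript
// Hypothetical client-side helper: returns a copy of the document whose
// reference timepoint attribute is set to the given time, in seconds.
function withRefreshedTimepoint(doc, attribute, nowMs) {
  return Object.assign({}, doc, { [attribute]: Math.floor(nowMs / 1000) });
}

const stored = { _key: "session1", creationDate: 1550165973 };
const refreshed = withRefreshedTimepoint(
  stored,
  "creationDate",
  Date.parse("2019-02-14T17:45:00Z")
);
console.log(refreshed.creationDate); // 1550166300
```

With `expireAfter` set to 600, the refreshed document would become a removal candidate 600 seconds after the new value instead of the original one.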
### Removing documents at certain points in time
Another use case is to specify a per-document expiration/removal timepoint and to set
the `expireAfter` attribute to a low value (e.g. 0 seconds).
Let's assume the index attribute is set to "expireDate", and the `expireAfter`
attribute of the index was set to 0 seconds (immediately when wall clock time reaches
the value specified in `expireDate`).
db.collection.ensureIndex({ type: "ttl", fields: ["expireDate"], expireAfter: 0 });
When storing the following document in the collection, it will expire at the timepoint
specified in the document itself:
{ "expireDate" : "2019-03-28T01:06:00Z" }
As `expireAfter` was set to 0, the document will count as expired when wall clock time
has reached the timeout.
The `expireDate` value can differ per document.
This allows mixing of documents with different expiration periods by calculating their
expiration dates differently in the client application.
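One way a client application might calculate different expiration dates is sketched below. The helper `expireDateFor`, the document shapes and the retention periods are assumptions chosen for illustration.

```javascript
// Hypothetical helper: computes an absolute expiration date string (UTC)
// from a start time in milliseconds and a retention period in seconds.
function expireDateFor(startMs, retentionSeconds) {
  return new Date(startMs + retentionSeconds * 1000).toISOString();
}

const start = Date.parse("2019-03-28T00:06:00Z");

// Two documents in the same collection with different retention periods.
const shortLived = { type: "session", expireDate: expireDateFor(start, 3600) };
const longLived = { type: "report", expireDate: expireDateFor(start, 30 * 86400) };

console.log(shortLived.expireDate); // 2019-03-28T01:06:00.000Z
console.log(longLived.expireDate);  // 2019-04-27T00:06:00.000Z
```

Both documents can be covered by the same TTL index with `expireAfter` set to 0, because the retention period is already folded into each `expireDate` value.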
### Preventing documents from being removed
In case the index attribute contains neither a numeric value nor a proper date string,
the document will not be stored in the TTL index and thus will not become a candidate
for expiration and removal. Providing either a non-numeric value or even no value for
the index attribute is a supported way of keeping documents from being expired and removed.
the index attribute is a supported way to keep documents from being expired and removed.
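A client could, for instance, drop the index attribute from documents that should be kept. The sketch below uses plain JavaScript only. The helper `withoutTimepoint` is hypothetical, and how the change is written back depends on the client and update call in use.

```javascript
// Hypothetical helper: returns a copy of the document without the index
// attribute, so that it carries no reference timepoint.
function withoutTimepoint(doc, attribute) {
  const copy = Object.assign({}, doc);
  delete copy[attribute];
  return copy;
}

const pinned = withoutTimepoint(
  { name: "keep me", creationDate: 1550165973 },
  "creationDate"
);
console.log("creationDate" in pinned); // false
console.log(pinned.name);              // keep me
```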
There can at most be one TTL index per collection. It is not recommended to use
TTL indexes for user-land AQL queries, as TTL indexes may store a transformed,
always numerical version of the index attribute value.
### Limitations
The actual removal of expired documents will not necessarily happen immediately when
they have reached their expiration time.
Expired documents will eventually be removed by a background thread that is periodically
going through all TTL indexes and removing the expired documents.
There is no guarantee when exactly the removal of expired documents will be carried
out, so queries may still find and return documents that have already expired. These
will eventually be removed when the background thread kicks in and has spare capacity to
remove the expired documents. It is guaranteed however that only documents which are
past their expiration time will actually be removed.
The frequency for invoking the background removal thread can be configured using
the `--ttl.frequency` startup option. The frequency is specified in milliseconds.
The frequency for invoking the background removal thread can be configured
using the `--ttl.frequency` startup option.
In order to avoid "random" load spikes by the background thread suddenly kicking
in and removing a lot of documents at once, the number of to-be-removed documents
per thread invocation can be capped.
@ -66,6 +113,19 @@ controlled by the startup option `--ttl.max-total-removes`. The maximum number o
documents in a single collection at once can be controlled by the startup option
`--ttl.max-collection-removes`.
There can be at most one TTL index per collection. It is not recommended to rely on
TTL indexes for user-land AQL queries. This is because TTL indexes may store a transformed,
always numerical version of the index attribute value even if it was originally passed
in as a datestring.
Please note that there is one background thread per ArangoDB database server instance
for performing the removal of expired documents of all collections in all databases.
If the number of databases and collections with TTL indexes is high and there are many
documents to remove from these, the background thread may at least temporarily lag
behind with its removal operations. It should eventually catch up in case the number
of to-be-removed documents per invocation is not higher than the background thread's
configured threshold values.
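To get a rough feeling for how long catching up may take, a simplified estimate can be sketched as follows. The model is an assumption, not ArangoDB's actual scheduling: it supposes that every invocation removes the full `--ttl.max-total-removes` amount, that no further documents expire in the meantime, and it ignores the per-collection cap and the time the removals themselves take. The sample values 1000000 and 30000 match the defaults visible in the `TtlProperties` struct touched by this commit.

```javascript
// Simplified, hypothetical model of the background thread catching up:
// number of invocations needed, times the interval between invocations.
function estimateCatchUpSeconds(backlog, maxTotalRemoves, frequencyMs) {
  const invocations = Math.ceil(backlog / maxTotalRemoves);
  return invocations * (frequencyMs / 1000);
}

// 5 million expired documents, 1 million removals per invocation,
// one invocation every 30 seconds.
console.log(estimateCatchUpSeconds(5000000, 1000000, 30000)); // 150
```

If documents expire faster than the configured caps allow them to be removed, this estimate never converges, which is the lagging situation described above.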
Accessing TTL Indexes from the Shell
-------------------------------------

@ -65,11 +65,11 @@ different usage scenarios:
expired documents from a collection.
The TTL index is set up by setting an `expireAfter` value and by picking a single
document attribute which contains the documents' creation date and time. Documents
are expired after `expireAfter` seconds after their creation time. The creation time
is specified as either a numeric timestamp (Unix timestamp) or a date string in format
`YYYY-MM-DDTHH:MM:SS` with optional milliseconds. All date strings will be interpreted
as UTC dates.
document attribute which contains the documents' reference timepoint. Documents
are expired `expireAfter` seconds after their reference timepoint has been reached.
The documents' reference timepoint is specified as either a numeric timestamp
(Unix timestamp) or a date string in format `YYYY-MM-DDTHH:MM:SS` with optional
milliseconds. All date strings will be interpreted as UTC dates.
For example, if `expireAfter` is set to 600 seconds (10 minutes) and the index
attribute is "creationDate" and there is the following document:
@ -94,7 +94,7 @@ different usage scenarios:
past their expiration time will actually be removed.
Please note that the numeric date time values for the index attribute should be
specified in milliseconds since January 1st 1970 (Unix timestamp). To calculate the current
specified in seconds since January 1st 1970 (Unix timestamp). To calculate the current
timestamp from JavaScript in this format, there is `Date.now() / 1000`, to calculate it
from an arbitrary Date instance, there is `Date.getTime() / 1000`.
@ -111,8 +111,9 @@ different usage scenarios:
for expiration and removal. Providing either a non-numeric value or even no value for
the index attribute is a supported way of keeping documents from being expired and removed.
It is not recommended to use TTL indexes for user-land AQL queries, as TTL indexes may
store a transformed, always numerical version of the index attribute value.
It is not recommended to rely on TTL indexes for user-land AQL queries. This is because
TTL indexes may store a transformed, always numerical version of the index attribute value
even if it was originally passed in as a datestring.
- geo index: the geo index provided by ArangoDB allows searching for documents
within a radius around a two-dimensional earth coordinate (point), or to

@ -190,65 +190,18 @@ other operations on the collection.
TTL (time-to-live) Indexes
--------------------------
The new TTL indexes provided by ArangoDB can be used for removing expired documents
from a collection.
The new TTL indexes feature provided by ArangoDB can be used for automatically
removing expired documents from a collection.
A TTL index can be set up by setting an `expireAfter` value and by picking a single
document attribute which contains the documents' creation date and time. Documents
expire `expireAfter` seconds after their creation time. The creation time
is specified as either a numeric timestamp or a UTC datestring.
TTL indexes support eventual removal of documents which are past a configured
expiration timepoint. The expiration timepoints can be based upon the documents'
original insertion or last-updated timepoints, plus a period during which the
documents are retained.
Alternatively, expiration timepoints can be specified as absolute values per
document.
It is also possible to exclude documents from automatic expiration and removal.
For example, if `expireAfter` is set to 600 seconds (10 minutes) and the index
attribute is "creationDate" and there is the following document:
{ "creationDate" : 1550165973 }
This document will be indexed with a creation timestamp value of `1550165973`,
which translates to the human-readable date string `2019-02-14T17:39:33.000Z`. The
document will expire 600 seconds afterwards, which is at timestamp `1550166573` (or
`2019-02-14T17:49:33.000Z` in the human-readable version).
The actual removal of expired documents will not necessarily happen immediately.
Expired documents will eventually removed by a background thread that is periodically
going through all TTL indexes and removing the expired documents.
There is no guarantee when exactly the removal of expired documents will be carried
out, so queries may still find and return documents that have already expired. These
will eventually be removed when the background thread kicks in and has capacity to
remove the expired documents. It is guaranteed however that only documents which are
past their expiration time will actually be removed.
Please note that the numeric timestamp values for the index attribute should be
specified in seconds since January 1st 1970 (Unix timestamp). To calculate the current
timestamp from JavaScript in this format, there is `Date.now() / 1000`, to calculate it
from an arbitrary Date instance, there is `Date.getTime() / 1000`.
Alternatively, the index attribute values can be specified as a date string in format
`YYYY-MM-DDTHH:MM:SS` with optional milliseconds. All date strings will be interpreted
as UTC dates.
The above example document using a datestring attribute value would be
{ "creationDate" : "2019-02-14T17:39:33.000Z" }
In case the index attribute does not contain a numeric value nor a proper date string,
the document will not be stored in the TTL index and thus will not become a candidate
for expiration and removal. Providing either a non-numeric value or even no value for
the index attribute is a supported way of keeping documents from being expired and removed.
There can at most be one TTL index per collection. It is not recommended to use
TTL indexes for user-land AQL queries, as TTL indexes may store a transformed,
always numerical version of the index attribute value.
The frequency for invoking the background removal thread can be configured
using the `--ttl.frequency` startup option.
In order to avoid "random" load spikes by the background thread suddenly kicking
in and removing a lot of documents at once, the number of to-be-removed documents
per thread invocation can be capped.
The total maximum number of documents to be removed per thread invocation is
controlled by the startup option `--ttl.max-total-removes`. The maximum number of
documents in a single collection at once can be controlled by the startup option
`--ttl.max-collection-removes`.
Also see the [TTL Indexes](../Indexing/Ttl.md) page.
HTTP API extensions

@ -108,6 +108,9 @@ Result TtlProperties::fromVelocyPack(VPackSlice const& slice) {
return Result(TRI_ERROR_BAD_PARAMETER, "expecting numeric value for frequency");
}
frequency = slice.get("frequency").getNumericValue<uint64_t>();
if (frequency < TtlProperties::minFrequency) {
return Result(TRI_ERROR_BAD_PARAMETER, "too low value for frequency");
}
}
if (slice.hasKey("maxTotalRemoves")) {
if (!slice.get("maxTotalRemoves").isNumber()) {
@ -439,6 +442,12 @@ void TtlFeature::validateOptions(std::shared_ptr<ProgramOptions> options) {
<< "invalid value for '--ttl.max-collection-removes'.";
FATAL_ERROR_EXIT();
}
if (_properties.frequency < TtlProperties::minFrequency) {
LOG_TOPIC("ea696", FATAL, arangodb::Logger::STARTUP)
<< "too low value for '--ttl.frequency'.";
FATAL_ERROR_EXIT();
}
}
void TtlFeature::start() {

@ -55,6 +55,7 @@ struct TtlStatistics {
};
struct TtlProperties {
static constexpr uint64_t minFrequency = 1 * 1000; // milliseconds
uint64_t frequency = 30 * 1000; // milliseconds
uint64_t maxTotalRemoves = 1000000;
uint64_t maxCollectionRemoves = 1000000;