mirror of https://gitee.com/bigwinds/arangodb
142 lines
6.6 KiB
Plaintext
142 lines
6.6 KiB
Plaintext
!SECTION Which Index to use when
|
|
|
|
ArangoDB automatically indexes the `_key` attribute in each collection. There
|
|
is no need to index this attribute separately. Please note that a document's
|
|
`_id` attribute is derived from the `_key` attribute, and is thus implicitly
|
|
indexed, too.
|
|
|
|
ArangoDB will also automatically create an index on `_from` and `_to` in any
|
|
edge collection, meaning incoming and outgoing connections can be determined
|
|
efficiently.
|
|
|
|
!SUBSECTION Index types
|
|
|
|
Users can define additional indexes on one or multiple document attributes.
|
|
Several different index types are provided by ArangoDB. These indexes have
|
|
different usage scenarios:
|
|
|
|
- hash index: provides quick access to individual documents if (and only if)
|
|
all indexed attributes are provided in the search query. The index will only
|
|
be used for equality comparisons. It does not support range queries and
|
|
cannot be used for sorting.
|
|
|
|
The hash index is a good candidate if all or most queries on the indexed
|
|
attribute(s) are equality comparisons. It will be the most efficient index
|
|
type if the index is declared unique.
|
|
|
|
Insertions into a non-unique hash index are also very efficient. Update and
|
|
removal performance in a non-unique hash index depend on the key selectivity.
|
|
If the selectivity is low and keys repeat a lot, update and removal performance
|
|
in a non-unique hash index will degrade.
|
|
|
|
A non-unique hash index should therefore not be used if duplicate index values
|
|
are allowed and it is known that there will be many duplicate values in the index
|
|
and there will be updates or removals.
|
|
|
|
A non-unique hash index on an optional document attribute should be declared
|
|
sparse so that it will not index documents for which the index attribute is
|
|
not set.
|
|
|
|
- skiplist index: skiplists keep the indexed values in an order, so they can
|
|
be used for equality lookups, range queries and for sorting. For high selectivity
|
|
attributes, skiplist indexes will have a higher overhead than hash indexes. For
|
|
low selectivity attributes, skiplist indexes will be more efficient than non-unique
|
|
hash indexes.
|
|
|
|
Additionally, skiplist indexes allow more use cases (e.g. range queries, sorting)
|
|
than hash indexes. Furthermore, they can be used for lookups based on a leftmost
|
|
prefix of the index attributes.
|
|
|
|
- geo index: the geo index provided by ArangoDB allows searching for documents
|
|
within a radius around a two-dimensional earth coordinate (point), or to
|
|
find documents with are closest to a point. Document coordinates can either
|
|
be specified in two different document attributes or in a single attribute, e.g.
|
|
|
|
{ "latitude": 50.9406645, "longitude": 6.9599115 }
|
|
|
|
or
|
|
|
|
{ "coords": [ 50.9406645, 6.9599115 ] }
|
|
|
|
Geo indexes will only be invoked via special functions.
|
|
|
|
- full-text index: a full-text index can be used to index all words contained in
|
|
a specific attribute of all documents in a collection. Only words with a
|
|
(specifiable) minimum length are indexed. Word tokenization is done using
|
|
the word boundary analysis provided by libicu, which is taking into account
|
|
the selected language provided at server start.
|
|
|
|
The index supports complete match queries (full words) and prefix queries.
|
|
Full-text indexes will only be invoked via special functions.
|
|
|
|
- cap constraint: the cap constraint provided by ArangoDB indexes documents
|
|
not to speed up search queries, but to limit (cap) the number or size of
|
|
documents in a collection. This can be used to prevent collections from growing
|
|
permanently.
|
|
|
|
|
|
!SUBSECTION Sparse vs. non-sparse indexes
|
|
|
|
Hash indexes and skiplist indexes can optionally be created sparse. A sparse index
|
|
does not contain documents for which at least one of the index attribute is not set
|
|
or contains a value of `null`.
|
|
|
|
As such documents are excluded from sparse indexes, they may contain fewer documents than
|
|
their non-sparse counterparts. This enables faster indexing and can lead to reduced memory
|
|
usage in case the indexed attribute does occur only in some, but not all documents of the
|
|
collection. Sparse indexes will also reduce the number of collisions in non-unique hash
|
|
indexes in case non-existing or optional attributes are indexed.
|
|
|
|
In order to create a sparse index, an object with the attribute `sparse` can be added to
|
|
the index creation commands:
|
|
|
|
```js
|
|
db.collection.ensureHashIndex(attributeName, { sparse: true });
|
|
db.collection.ensureHashIndex(attributeName1, attributeName2, { sparse: true });
|
|
db.collection.ensureUniqueConstraint(attributeName, { sparse: true });
|
|
db.collection.ensureUniqueConstraint(attributeName1, attributeName2, { sparse: true });
|
|
|
|
db.collection.ensureSkiplist(attributeName, { sparse: true });
|
|
db.collection.ensureSkiplist(attributeName1, attributeName2, { sparse: true });
|
|
db.collection.ensureUniqueSkiplist(attributeName, { sparse: true });
|
|
db.collection.ensureUniqueSkiplist(attributeName1, attributeName2, { sparse: true });
|
|
```
|
|
|
|
When not explicitly set, the `sparse` attribute defaults to `false` for new indexes.
|
|
Other indexes than hash and skiplist do not support sparsity.
|
|
|
|
As sparse indexes may exclude some documents from the collection, they cannot be used for
|
|
all types of queries. Sparse hash indexes cannot be used to find documents for which at
|
|
least one of the indexed attributes has a value of `null`. For example, the following AQL
|
|
query cannot use a sparse index, even if one was created on attribute `attr`:
|
|
|
|
FOR doc In collection
|
|
FILTER doc.attr == null
|
|
RETURN doc
|
|
|
|
If the lookup value is non-constant, a sparse index may or may not be used, depending on
|
|
the other types of conditions in the query. If the optimizer can safely determine that
|
|
the lookup value cannot be `null`, a sparse index may be used. When uncertain, the optimizer
|
|
will not make use of a sparse index in a query in order to produce correct results.
|
|
|
|
For example, the following queries cannot use a sparse index on `attr` because the optimizer
|
|
will not know beforehand whether the comparison values for `doc.attr` will include `null`:
|
|
|
|
FOR doc In collection
|
|
FILTER doc.attr == SOME_FUNCTION(...)
|
|
RETURN doc
|
|
|
|
FOR other IN otherCollection
|
|
FOR doc In collection
|
|
FILTER doc.attr == other.attr
|
|
RETURN doc
|
|
|
|
Sparse skiplist indexes can be used for sorting if the optimizer can safely detect that the
|
|
index range does not include `null` for any of the index attributes.
|
|
|
|
Note that if you intend to use [joins](../AqlExamples/Join.html) it may be clever
|
|
to use non-sparsity and maybe even uniqueness for that attribute, else all items containing
|
|
the NULL-value will match against each other and thus produce large results.
|
|
|
|
|