mirror of https://gitee.com/bigwinds/arangodb
257 lines
12 KiB
Plaintext
!SECTION Index basics

Indexes allow fast access to documents, provided the indexed attribute(s)
are used in a query. While ArangoDB automatically indexes some system
attributes, users are free to create extra indexes on non-system attributes
of documents.

A user-defined index is created on collection level. Most user-defined indexes
can be created by specifying the names of the attributes which should be indexed.
Some index types allow indexing just one attribute (e.g. fulltext index) whereas
other index types allow indexing multiple attributes at the same time.

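As a rough illustration of "specifying the names of the attributes", the sketch below writes a few index definitions as plain objects. The collection name `users` and all attribute names are hypothetical, and `validateDefinition` is a helper made up for this example. In arangosh such an object would typically be passed to `db.users.ensureIndex(...)`, which needs a running server and is therefore only mentioned in a comment.

```js
// Hypothetical index definitions for an assumed collection "users".
// In arangosh each object would be passed to db.users.ensureIndex(...);
// that call is not executed here because it requires a server.
const indexDefinitions = [
  // fulltext index: exactly one attribute
  { type: "fulltext", fields: ["biography"] },
  // hash index covering two attributes at the same time
  { type: "hash", fields: ["lastName", "firstName"] },
  // skiplist index on a single attribute
  { type: "skiplist", fields: ["age"] }
];

// Index types limited to one attribute. Only fulltext is named in the
// text above, so only fulltext is listed here.
const singleAttributeTypes = new Set(["fulltext"]);

// Illustrative check: a definition needs at least one attribute name,
// and single-attribute index types accept exactly one.
function validateDefinition(definition) {
  if (!Array.isArray(definition.fields) || definition.fields.length === 0) {
    return false;
  }
  if (singleAttributeTypes.has(definition.type) && definition.fields.length !== 1) {
    return false;
  }
  return true;
}

for (const definition of indexDefinitions) {
  console.log(definition.type, definition.fields.join(", "), validateDefinition(definition));
}
```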
The system attributes `_id`, `_key`, `_from` and `_to` are automatically indexed
by ArangoDB, without the user being required to create extra indexes for them.

Therefore, indexing `_id`, `_key`, `_rev`, `_from`, and `_to` in a user-defined
index is often not required and is currently not supported by ArangoDB.

ArangoDB provides the following index types:

!SUBSECTION Primary Index

For each collection there will always be a *primary index* which is a hash index
for the [document keys](../Glossary/index.html#document_key) (`_key` attribute)
of all documents in the collection. The primary index allows quick selection
of documents in the collection using either the `_key` or `_id` attributes. It will
be used from within AQL queries automatically when performing equality lookups on
`_key` or `_id`.

There are also dedicated functions to find a document given its `_key` or `_id`
that will always make use of the primary index:

```js
db.collection.document("<document-key>");
db._document("<document-id>");
```

As the primary index is a hash index, it cannot be used for range queries or for sorting
on `_key` or `_id`.

The primary index of a collection cannot be dropped or changed, and there is no
mechanism to create user-defined primary indexes.

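The behavior of the primary index can be pictured with a small in-memory model. This is only a sketch, not ArangoDB's implementation: the class `PrimaryIndexModel` and its methods are invented for illustration, and it assumes that a document id has the form `<collection name>/<document key>`. It shows why equality lookups are cheap while a range over `_key` has to visit every entry.

```js
// Simplified model of a primary index: a hash map from _key to document.
class PrimaryIndexModel {
  constructor(collectionName) {
    this.collectionName = collectionName;
    this.byKey = new Map();
  }

  insert(doc) {
    if (this.byKey.has(doc._key)) {
      throw new Error("unique constraint violated for _key " + doc._key);
    }
    this.byKey.set(doc._key, doc);
  }

  // Equality lookup on _key: a single hash probe.
  lookupByKey(key) {
    return this.byKey.has(key) ? this.byKey.get(key) : null;
  }

  // Equality lookup on _id, assumed to be "<collection>/<_key>".
  lookupById(id) {
    const slash = id.indexOf("/");
    if (slash === -1 || id.slice(0, slash) !== this.collectionName) {
      return null;
    }
    return this.lookupByKey(id.slice(slash + 1));
  }

  // A range condition cannot use the hash structure: the map has no
  // order, so every entry must be inspected.
  rangeByKey(lower, upper) {
    const result = [];
    for (const [key, doc] of this.byKey) {
      if (key >= lower && key <= upper) {
        result.push(doc);
      }
    }
    return result;
  }
}

const primary = new PrimaryIndexModel("users");
primary.insert({ _key: "alice" });
primary.insert({ _key: "bob" });
primary.insert({ _key: "carol" });
console.log(primary.lookupById("users/bob"));
console.log(primary.rangeByKey("a", "b").length);
```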
!SUBSECTION Edges Index

Every [edge collection](../Glossary/index.html#edge_collection) also has an
automatically created *edges index*. The edges index provides quick access to
documents by either their `_from` or `_to` attributes. It can therefore be
used to quickly find connections between vertex documents and is invoked when
the connecting edges of a vertex are queried.

Edges indexes are used from within AQL when performing equality lookups on `_from`
or `_to` values in an edge collection. There are also dedicated functions to
find edges given their `_from` or `_to` values that will always make use of the
edges index:

```js
db.collection.edges("<from-value>");
db.collection.edges("<to-value>");
db.collection.outEdges("<from-value>");
db.collection.outEdges("<to-value>");
db.collection.inEdges("<from-value>");
db.collection.inEdges("<to-value>");
```

The edges index is a hash index. It can be used for equality lookups only, but not for range
queries or for sorting. As edges indexes are automatically created for edge collections, it
is not possible to create user-defined edges indexes.

The edges index cannot be dropped or changed.

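One way to picture the edges index is as two hash maps, one keyed by `_from` and one keyed by `_to`. The sketch below is a simplified model under that assumption, not the server's data structure. `EdgesIndexModel` is a name made up for this example. Its methods are named after the shell functions above and assume that `outEdges` returns edges starting at a vertex, `inEdges` returns edges ending at it, and `edges` returns both.

```js
// Simplified model of an edges index: two hash maps over the same edges.
class EdgesIndexModel {
  constructor() {
    this.byFrom = new Map(); // _from value -> array of edges
    this.byTo = new Map();   // _to value   -> array of edges
  }

  static add(map, key, edge) {
    if (!map.has(key)) {
      map.set(key, []);
    }
    map.get(key).push(edge);
  }

  insert(edge) {
    EdgesIndexModel.add(this.byFrom, edge._from, edge);
    EdgesIndexModel.add(this.byTo, edge._to, edge);
  }

  // Edges that start at the given vertex (equality lookup on _from).
  outEdges(vertexId) {
    return this.byFrom.get(vertexId) || [];
  }

  // Edges that end at the given vertex (equality lookup on _to).
  inEdges(vertexId) {
    return this.byTo.get(vertexId) || [];
  }

  // Edges in either direction; a self-loop is reported only once.
  edges(vertexId) {
    const seen = new Set(this.outEdges(vertexId));
    const result = [...seen];
    for (const edge of this.inEdges(vertexId)) {
      if (!seen.has(edge)) {
        result.push(edge);
      }
    }
    return result;
  }
}

const edgeIndex = new EdgesIndexModel();
edgeIndex.insert({ _key: "e1", _from: "v/a", _to: "v/b" });
edgeIndex.insert({ _key: "e2", _from: "v/b", _to: "v/c" });
edgeIndex.insert({ _key: "e3", _from: "v/b", _to: "v/b" });
console.log(edgeIndex.edges("v/b").map((edge) => edge._key));
```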
!SUBSECTION Hash Index

A hash index can be used to quickly find documents with specific attribute values.
The hash index is unsorted, so it supports equality lookups but no range queries or sorting.

A hash index can be created on one or multiple document attributes. A hash index will
only be used by a query if all indexed attributes are present in the search condition,
and if all attributes are compared using the equality (`==`) operator. Hash indexes are
used from within AQL and several query functions, e.g. `byExample`, `firstExample` etc.

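The rule for when a hash index applies can be written down as a small predicate. This is a sketch of the rule as stated above, not of the query optimizer. The function `canUseHashIndex` and the `{ attribute, operator }` shape used for conditions are assumptions made for this example.

```js
// Returns true if every indexed attribute appears in the search
// condition with the equality operator, as the rule above requires.
function canUseHashIndex(indexFields, conditions) {
  return indexFields.every((field) =>
    conditions.some(
      (condition) => condition.attribute === field && condition.operator === "=="
    )
  );
}

const hashFields = ["lastName", "firstName"];

// Both indexed attributes compared with ==: the index qualifies.
console.log(canUseHashIndex(hashFields, [
  { attribute: "lastName", operator: "==" },
  { attribute: "firstName", operator: "==" }
]));

// Only one of the two indexed attributes is present: it does not.
console.log(canUseHashIndex(hashFields, [
  { attribute: "lastName", operator: "==" }
]));

// A range comparison on an indexed attribute: it does not.
console.log(canUseHashIndex(hashFields, [
  { attribute: "lastName", operator: "==" },
  { attribute: "firstName", operator: ">" }
]));
```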
Hash indexes can optionally be declared to be unique, disallowing saving the same
value in the indexed attribute. Hash indexes can optionally be sparse.

The different types of hash indexes have the following characteristics:

* **unique hash index**: all documents in the collection must have different values for
  the attributes covered by the unique index. Trying to insert a document with the same
  key value as an already existing document will lead to a unique constraint
  violation.

  This type of index is not sparse. Documents that do not contain the index attributes or
  that have a value of `null` in the index attribute(s) will still be indexed.
  A key value of `null` may only occur once in the index, so this type of index cannot
  be used for optional attributes.

* **unique, sparse hash index**: all documents in the collection must have different
  values for the attributes covered by the unique index. Documents in which at least one
  of the index attributes is not set or has a value of `null` are not included in the
  index. This type of index can be used to ensure that there are no duplicate keys in
  the collection for documents which have the indexed attributes set. As the index will
  exclude documents for which the indexed attributes are `null` or not set, it can be
  used for optional attributes.

* **non-unique hash index**: all documents in the collection will be indexed. This type
  of index is not sparse. Documents that do not contain the index attributes or that have
  a value of `null` in the index attribute(s) will still be indexed. Duplicate key values
  can occur and do not lead to unique constraint violations.

* **non-unique, sparse hash index**: only those documents will be indexed that have all
  the indexed attributes set to a value other than `null`. It can be used for optional
  attributes.

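The four combinations of `unique` and `sparse` can be tried out with a toy model. This is a minimal sketch and not ArangoDB code: `HashIndexModel` is a made-up class, a missing attribute is treated like `null` as described above, and the key of a combined index is assumed to be the tuple of all indexed values.

```js
// Toy hash index supporting the unique and sparse options.
class HashIndexModel {
  constructor(fields, options = {}) {
    this.fields = fields;
    this.unique = Boolean(options.unique);
    this.sparse = Boolean(options.sparse);
    this.buckets = new Map(); // serialized key -> array of documents
  }

  // Missing attributes are treated like null.
  valuesOf(doc) {
    return this.fields.map((field) => (doc[field] === undefined ? null : doc[field]));
  }

  // Returns true if the document was indexed, false if a sparse index
  // skipped it. Throws on a unique constraint violation.
  insert(doc) {
    const values = this.valuesOf(doc);
    if (this.sparse && values.some((value) => value === null)) {
      return false;
    }
    const key = JSON.stringify(values);
    const bucket = this.buckets.get(key) || [];
    if (this.unique && bucket.length > 0) {
      throw new Error("unique constraint violated for key " + key);
    }
    bucket.push(doc);
    this.buckets.set(key, bucket);
    return true;
  }
}

// Unique, non-sparse: a second document without the attribute collides
// on the key null.
const uniqueIndex = new HashIndexModel(["email"], { unique: true });
uniqueIndex.insert({ _key: "1" });
try {
  uniqueIndex.insert({ _key: "2" });
} catch (err) {
  console.log(err.message);
}

// Unique, sparse: documents without the attribute are simply skipped.
const uniqueSparseIndex = new HashIndexModel(["email"], { unique: true, sparse: true });
console.log(uniqueSparseIndex.insert({ _key: "1" }));
console.log(uniqueSparseIndex.insert({ _key: "2" }));
```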
The amortized complexity of lookup, insert, update, and removal operations in unique hash
indexes is O(1).

Non-unique hash indexes have an amortized complexity of O(1) for inserts. Lookup, update
and removal operations in non-unique hash indexes have an amortized complexity that is
linearly correlated with the number of duplicates for a given key. That means non-unique
hash indexes should not be used on attributes with very low cardinality.

If a hash index is created on an attribute that is missing in all or many of the documents,
the behavior is as follows:

* if the index is sparse, the documents missing the attribute will not be indexed and not
  use index memory. These documents will not influence the update or removal performance
  for the index.

* if the index is non-sparse, the documents missing the attribute will be contained in the
  index with a key value of `null`. If many such documents get indexed, a lot of collisions
  will occur, and lookup, update and removal of documents will become expensive. This
  should be avoided if possible.

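The effect of a mostly missing attribute can be made visible by counting how many documents end up under each key. The sketch below uses a made-up helper `countPerKey` and a synthetic data set of 1000 documents in which only every hundredth document has the hypothetical attribute `nickname`. It is a model of the description above, not a measurement of ArangoDB.

```js
// Counts how many documents fall under each index key for one attribute.
// A missing attribute is treated like null; a sparse index skips it.
function countPerKey(docs, field, sparse) {
  const counts = new Map();
  for (const doc of docs) {
    const value = doc[field] === undefined ? null : doc[field];
    if (sparse && value === null) {
      continue;
    }
    const key = JSON.stringify(value);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Synthetic data: only every hundredth document has a nickname.
const sampleDocs = [];
for (let i = 0; i < 1000; i++) {
  const doc = { _key: String(i) };
  if (i % 100 === 0) {
    doc.nickname = "nick" + i;
  }
  sampleDocs.push(doc);
}

const nonSparseCounts = countPerKey(sampleDocs, "nickname", false);
const sparseCounts = countPerKey(sampleDocs, "nickname", true);

// Non-sparse: all documents without a nickname share the key null.
console.log("documents under null:", nonSparseCounts.get("null"));
// Sparse: those documents are not in the index at all.
console.log("keys in sparse index:", sparseCounts.size);
```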
!SUBSECTION Skiplist Index

A skiplist is a sorted index structure. It can be used to quickly find documents
with specific attribute values but also for range queries and returning documents from
the index in sorted order. Skiplists will be used from within AQL and several query
functions, e.g. `byExample`, `firstExample` etc.

Skiplist indexes will be used for lookups, range queries and sorting only if either all
index attributes are provided in a query, or if a leftmost prefix of the index attributes
is specified.

For example, if a skiplist index is created on attributes `value1` and `value2`, the
following conditions could use the index (note: the `<=` and `>=` operators are intentionally
omitted here for the sake of brevity):

    FILTER doc.value1 == ...
    FILTER doc.value1 < ...
    FILTER doc.value1 > ...
    FILTER doc.value1 > ... && doc.value1 < ...

    FILTER doc.value1 == ... && doc.value2 == ...
    FILTER doc.value1 == ... && doc.value2 > ...
    FILTER doc.value1 == ... && doc.value2 > ... && doc.value2 < ...

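The leftmost-prefix rule can be sketched as a function that counts how many leading index attributes a set of conditions can use. This is a simplified model, not the optimizer's logic. The function name and the `{ attribute, operator }` condition shape are assumptions for this example. The model also assumes that a range condition ends the usable prefix, which matches the examples above, where a range only ever appears on the last attribute used.

```js
const RANGE_OPERATORS = new Set(["<", "<=", ">", ">="]);

// Returns how many leading index attributes the conditions can use.
// A result of 0 means the index does not qualify in this model.
function usableSkiplistPrefix(indexFields, conditions) {
  let used = 0;
  for (const field of indexFields) {
    const onField = conditions.filter((condition) => condition.attribute === field);
    const hasEquality = onField.some((condition) => condition.operator === "==");
    const hasRange = onField.some((condition) => RANGE_OPERATORS.has(condition.operator));
    if (!hasEquality && !hasRange) {
      break; // gap in the prefix: later attributes cannot be used
    }
    used += 1;
    if (!hasEquality) {
      break; // assumption: a range condition ends the usable prefix
    }
  }
  return used;
}

const skiplistFields = ["value1", "value2"];

// FILTER doc.value1 == ... && doc.value2 > ... && doc.value2 < ...
console.log(usableSkiplistPrefix(skiplistFields, [
  { attribute: "value1", operator: "==" },
  { attribute: "value2", operator: ">" },
  { attribute: "value2", operator: "<" }
]));

// FILTER doc.value2 == ... (no condition on the first index attribute)
console.log(usableSkiplistPrefix(skiplistFields, [
  { attribute: "value2", operator: "==" }
]));
```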
In order to use a skiplist index for sorting, the index attributes must be specified in
the `SORT` clause of the query in the same order as they appear in the index definition.
Skiplist indexes are always created in ascending order, but they can be used to access
the indexed elements in either ascending or descending order. However, for a combined index
(an index on multiple attributes) this requires that the sort orders specified in the
`SORT` clause of a single query are either all ascending (`ASC` can be omitted,
as ascending is the default) or all descending.

For example, if the skiplist index is created on attributes `value1` and `value2` (in this order),
then the following sort clauses can use the index to determine the sort order:

* `SORT value1 ASC, value2 ASC` (and its equivalent `SORT value1, value2`)
* `SORT value1 DESC, value2 DESC`
* `SORT value1 ASC` (and its equivalent `SORT value1`)
* `SORT value1 DESC`

However, the following sort clauses cannot use the index to determine the sort order:

* `SORT value1 ASC, value2 DESC`
* `SORT value1 DESC, value2 ASC`
* `SORT value2` (and its equivalent `SORT value2 ASC`)
* `SORT value2 DESC` (because the first indexed attribute `value1` is not used in the sort clause)

Note: the latter two sort clauses cannot use the index because the sort clause does not
refer to a leftmost prefix of the index attributes.

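The two sorting requirements (leftmost prefix in index order, and a single direction for all attributes) can be combined into one check. The following is a sketch of the rules as stated above. The function `canSortWithSkiplist` and the `{ attribute, direction }` representation of a `SORT` clause are hypothetical.

```js
// Returns true if the sort clause names a leftmost prefix of the index
// attributes, in index order, with one common sort direction.
// A missing direction is treated as ASC, the default.
function canSortWithSkiplist(indexFields, sortClause) {
  if (sortClause.length === 0 || sortClause.length > indexFields.length) {
    return false;
  }
  const directionOf = (item) => (item.direction || "ASC").toUpperCase();
  const firstDirection = directionOf(sortClause[0]);
  return sortClause.every(
    (item, position) =>
      item.attribute === indexFields[position] && directionOf(item) === firstDirection
  );
}

const sortIndexFields = ["value1", "value2"];

// SORT value1 DESC, value2 DESC
console.log(canSortWithSkiplist(sortIndexFields, [
  { attribute: "value1", direction: "DESC" },
  { attribute: "value2", direction: "DESC" }
]));

// SORT value1 ASC, value2 DESC (mixed directions)
console.log(canSortWithSkiplist(sortIndexFields, [
  { attribute: "value1", direction: "ASC" },
  { attribute: "value2", direction: "DESC" }
]));

// SORT value2 (not a leftmost prefix)
console.log(canSortWithSkiplist(sortIndexFields, [{ attribute: "value2" }]));
```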
Skiplists can optionally be declared to be unique, disallowing saving the same
value in the indexed attribute. They can be sparse or non-sparse.

The different types of skiplist indexes have the following characteristics:

* **unique skiplist index**: all documents in the collection must have different values for
  the attributes covered by the unique index. Trying to insert a document with the same
  key value as an already existing document will lead to a unique constraint
  violation.

  This type of index is not sparse. Documents that do not contain the index attributes or
  that have a value of `null` in the index attribute(s) will still be indexed.
  A key value of `null` may only occur once in the index, so this type of index cannot
  be used for optional attributes.

* **unique, sparse skiplist index**: all documents in the collection must have different
  values for the attributes covered by the unique index. Documents in which at least one
  of the index attributes is not set or has a value of `null` are not included in the
  index. This type of index can be used to ensure that there are no duplicate keys in
  the collection for documents which have the indexed attributes set. As the index will
  exclude documents for which the indexed attributes are `null` or not set, it can be
  used for optional attributes.

* **non-unique skiplist index**: all documents in the collection will be indexed. This type
  of index is not sparse. Documents that do not contain the index attributes or that have
  a value of `null` in the index attribute(s) will still be indexed. Duplicate key values
  can occur and do not lead to unique constraint violations.

* **non-unique, sparse skiplist index**: only those documents will be indexed that have all
  the indexed attributes set to a value other than `null`. It can be used for optional
  attributes.

The operational amortized complexity for skiplist indexes is logarithmically correlated
with the number of documents in the index.

!SUBSECTION Geo Index

Users can create additional geo indexes on one or multiple attributes in collections.
A geo index is used to quickly find places on the surface of the earth.

The geo index stores two-dimensional coordinates. It can be created on either two
separate document attributes (latitude and longitude) or a single array attribute that
contains both latitude and longitude. Latitude and longitude must be numeric values.

The geo index provides operations to find documents with coordinates nearest to a given
comparison coordinate, and to find documents with coordinates that are within a specifiable
radius around a comparison coordinate.

The geo index is used via dedicated functions in AQL or the simple queries, but will
not be used for other types of queries or conditions.

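The two operations a geo index answers, nearest neighbors and everything within a radius, can be illustrated without an index by brute force. The sketch below is not how the geo index works internally. It computes great-circle distances with the haversine formula, assuming a spherical earth with a mean radius of 6371 km, and scans all documents. The helper names `nearest` and `withinRadius`, the attribute names `latitude` and `longitude`, and the sample coordinates are all chosen for this example.

```js
const EARTH_RADIUS_METERS = 6371000; // assumed mean earth radius

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance in meters between two coordinates (haversine).
function haversineDistance(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1, Math.sqrt(a)));
}

// The `limit` documents closest to the comparison coordinate.
function nearest(docs, latitude, longitude, limit) {
  return docs
    .map((doc) => ({
      doc,
      distance: haversineDistance(latitude, longitude, doc.latitude, doc.longitude)
    }))
    .sort((left, right) => left.distance - right.distance)
    .slice(0, limit);
}

// All documents within `radiusMeters` of the comparison coordinate.
function withinRadius(docs, latitude, longitude, radiusMeters) {
  return nearest(docs, latitude, longitude, docs.length).filter(
    (entry) => entry.distance <= radiusMeters
  );
}

// Approximate sample coordinates, for illustration only.
const places = [
  { name: "Cologne", latitude: 50.94, longitude: 6.96 },
  { name: "Berlin", latitude: 52.52, longitude: 13.4 },
  { name: "Paris", latitude: 48.86, longitude: 2.35 }
];

console.log(nearest(places, 50.9, 7.0, 1).map((entry) => entry.doc.name));
console.log(withinRadius(places, 50.9, 7.0, 100000).map((entry) => entry.doc.name));
```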
!SUBSECTION Fulltext Index

A fulltext index can be used to find words, or prefixes of words inside documents.
A fulltext index can be created on a single attribute only, and will index all words
contained in documents that have a textual value in that attribute. Only words with a (specifiable)
minimum length are indexed. Word tokenization is done using the word boundary analysis
provided by libicu, which takes into account the selected language provided at
server start. Words are indexed in their lower-cased form. The index supports complete
match queries (full words) and prefix queries, plus basic logical operations such as
`and`, `or` and `not` for combining partial results.

The fulltext index is sparse, meaning it will only index documents for which the index
attribute is set and contains a string value. Additionally, only words with a configurable
minimum length will be included in the index.

The fulltext index is used via dedicated functions in AQL or the simple queries, but will
not be enabled for other types of queries or conditions.

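The indexing steps described above (tokenize, lower-case, drop short words, skip non-string values) can be sketched as a small inverted index. This is an approximation only: it splits on non-letter, non-digit characters with a regular expression instead of using libicu's word boundary analysis, and all function names as well as the attribute name `text` are made up for this example.

```js
// Splits text into lower-cased words and drops words shorter than
// minLength. The regular expression is a stand-in for libicu.
function tokenize(text, minLength) {
  return text
    .toLowerCase()
    .split(/[^\p{L}\p{N}]+/u)
    .filter((word) => word.length > 0 && word.length >= minLength);
}

// Builds a map from word to the set of document keys containing it.
// Documents without a string value in `field` are skipped (sparse).
function buildFulltextIndex(docs, field, minLength) {
  const index = new Map();
  for (const doc of docs) {
    if (typeof doc[field] !== "string") {
      continue;
    }
    for (const word of tokenize(doc[field], minLength)) {
      if (!index.has(word)) {
        index.set(word, new Set());
      }
      index.get(word).add(doc._key);
    }
  }
  return index;
}

// Complete match: the word must be indexed exactly.
function matchComplete(index, word) {
  return new Set(index.get(word.toLowerCase()) || []);
}

// Prefix match: any indexed word starting with the prefix qualifies.
function matchPrefix(index, prefix) {
  const lowered = prefix.toLowerCase();
  const result = new Set();
  for (const [word, keys] of index) {
    if (word.startsWith(lowered)) {
      for (const key of keys) {
        result.add(key);
      }
    }
  }
  return result;
}

const textDocs = [
  { _key: "1", text: "Fast Index lookups" },
  { _key: "2", text: "An indexed attribute" },
  { _key: "3", text: 42 }
];
const fulltextIndex = buildFulltextIndex(textDocs, "text", 3);

console.log([...matchComplete(fulltextIndex, "index")]);
console.log([...matchPrefix(fulltextIndex, "index")]);
```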