!SECTION Index basics
Indexes allow fast access to documents, provided the indexed attribute(s)
are used in a query. While ArangoDB automatically indexes some system
attributes, users are free to create extra indexes on non-system attributes
of documents.
A user-defined index is created at the collection level. Most user-defined indexes
can be created by specifying the names of the attributes which should be indexed.
Some index types allow indexing just one attribute (e.g. fulltext index) whereas
other index types allow indexing multiple attributes at the same time.
The system attributes `_id`, `_key`, `_from` and `_to` are automatically indexed
by ArangoDB, without the user being required to create extra indexes for them.
Therefore, indexing `_id`, `_key`, `_rev`, `_from`, and `_to` in a user-defined
index is often not required and is currently not supported by ArangoDB.
ArangoDB provides the following index types:
!SUBSECTION Primary Index
For each collection there will always be a *primary index* which is a hash index
for the [document keys](../Glossary/index.html#document_key) (`_key` attribute)
of all documents in the collection. The primary index allows quick selection
of documents in the collection using either the `_key` or `_id` attributes. It will
be used from within AQL queries automatically when performing equality lookups on
`_key` or `_id`.
There are also dedicated functions to find a document given its `_key` or `_id`
that will always make use of the primary index:
```js
db.collection.document("<document-key>");
db._document("<document-id>");
```
As the primary index is a hash index, it cannot be used for range queries or for sorting
on `_key` or `_id`.
The primary index of a collection cannot be dropped or changed, and there is no
mechanism to create user-defined primary indexes.
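The difference between equality lookups and range queries on a hash index can be illustrated with a small, self-contained sketch. It is a simplified model built on a plain JavaScript `Map`, not ArangoDB's actual implementation; the sample documents and the helper names `lookupByKey` and `lookupByKeyRange` are made up for illustration.

```js
// Simplified model of a hash index on `_key` (not ArangoDB's actual implementation).
const documents = [
  { _key: "alice", age: 31 },
  { _key: "bob", age: 25 },
  { _key: "carol", age: 47 }
];

// Build the index: one hash table entry per document key.
const primaryIndex = new Map();
for (const doc of documents) {
  primaryIndex.set(doc._key, doc);
}

// Equality lookup: a single hash table probe.
function lookupByKey(key) {
  return primaryIndex.get(key) || null;
}

// Range lookup: a hash table keeps no useful key order,
// so every entry has to be inspected.
function lookupByKeyRange(low, high) {
  const result = [];
  for (const [key, doc] of primaryIndex) {
    if (key >= low && key <= high) {
      result.push(doc);
    }
  }
  return result;
}
```

The equality lookup touches one entry, whereas the range lookup degenerates into a full scan. This is why a hash-based index is not considered for range conditions or sorting.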
!SUBSECTION Edges Index
Every [edge collection](../Glossary/index.html#edge_collection) also has an
automatically created *edges index*. The edges index provides quick access to
documents by either their `_from` or `_to` attributes. It can therefore be
used to quickly find connections between vertex documents and is invoked when
the connecting edges of a vertex are queried.
Edges indexes are used from within AQL when performing equality lookups on `_from`
or `_to` values in an edge collection. There are also dedicated functions to
find edges given their `_from` or `_to` values that will always make use of the
edges index:
```js
db.collection.edges("<from-value>");
db.collection.edges("<to-value>");
db.collection.outEdges("<from-value>");
db.collection.outEdges("<to-value>");
db.collection.inEdges("<from-value>");
db.collection.inEdges("<to-value>");
```
The edges index is a hash index. It can be used for equality lookups only, but not for range
queries or for sorting. As edges indexes are automatically created for edge collections, it
is not possible to create user-defined edges indexes.
The edges index cannot be dropped or changed.
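Conceptually, an edges index behaves like two hash tables, one keyed by `_from` and one keyed by `_to`. The following sketch models that idea with plain JavaScript `Map` objects. It is an illustration under that assumption only; the sample edges and the standalone functions `outEdges`, `inEdges` and `allEdges` are hypothetical stand-ins, not the collection methods shown above.

```js
// Simplified model of an edges index (for illustration only).
const edges = [
  { _key: "e1", _from: "persons/alice", _to: "persons/bob" },
  { _key: "e2", _from: "persons/alice", _to: "persons/carol" },
  { _key: "e3", _from: "persons/bob", _to: "persons/alice" }
];

const byFrom = new Map(); // `_from` value -> edges starting at that vertex
const byTo = new Map();   // `_to` value   -> edges ending at that vertex

function addToIndex(index, key, edge) {
  if (!index.has(key)) {
    index.set(key, []);
  }
  index.get(key).push(edge);
}

for (const edge of edges) {
  addToIndex(byFrom, edge._from, edge);
  addToIndex(byTo, edge._to, edge);
}

// Each lookup is an equality probe into one of the two hash tables.
function outEdges(vertexId) {
  return byFrom.get(vertexId) || [];
}

function inEdges(vertexId) {
  return byTo.get(vertexId) || [];
}

// Note: an edge pointing from a vertex to itself would appear twice in this sketch.
function allEdges(vertexId) {
  return outEdges(vertexId).concat(inEdges(vertexId));
}
```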
!SUBSECTION Hash Index
A hash index can be used to quickly find documents with specific attribute values.
The hash index is unsorted, so it supports equality lookups but no range queries or sorting.
A hash index can be created on one or multiple document attributes. A hash index will
only be used by a query if all indexed attributes are present in the search condition,
and if all attributes are compared using the equality (`==`) operator. Hash indexes are
used from within AQL and several query functions, e.g. `byExample`, `firstExample` etc.
Hash indexes can optionally be declared to be unique, disallowing saving the same
value in the indexed attribute. Hash indexes can optionally be sparse.
The different types of hash indexes have the following characteristics:
* **unique hash index**: all documents in the collection must have different values for
the attributes covered by the unique index. Trying to insert a document with the same
key value as an already existing document will lead to a unique constraint
violation.
This type of index is not sparse. Documents that do not contain the index attributes or
that have a value of `null` in the index attribute(s) will still be indexed.
A key value of `null` may only occur once in the index, so this type of index cannot
be used for optional attributes.
* **unique, sparse hash index**: all documents in the collection must have different
values for the attributes covered by the unique index. Documents in which at least one
of the index attributes is not set or has a value of `null` are not included in the
index. This type of index can be used to ensure that there are no duplicate keys in
the collection for documents which have the indexed attributes set. As the index will
exclude documents for which the indexed attributes are `null` or not set, it can be
used for optional attributes.
* **non-unique hash index**: all documents in the collection will be indexed. This type
of index is not sparse. Documents that do not contain the index attributes or that have
a value of `null` in the index attribute(s) will still be indexed. Duplicate key values
can occur and do not lead to unique constraint violations.
* **non-unique, sparse hash index**: only those documents will be indexed that have all
the indexed attributes set to a value other than `null`. It can be used for optional
attributes.
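The four combinations above can be modelled in a few lines. The sketch below is a simplified illustration, not ArangoDB's implementation: `createHashIndex` is a hypothetical helper (not an ArangoDB API), and it assumes that a missing attribute is treated like `null`, as described in the list.

```js
// Hypothetical helper that models unique/sparse semantics; not an ArangoDB API.
function createHashIndex(fields, options) {
  const unique = Boolean(options && options.unique);
  const sparse = Boolean(options && options.sparse);
  const buckets = new Map(); // serialized key values -> array of documents

  // Missing attributes are treated like `null`.
  function keyOf(doc) {
    return fields.map(f => (doc[f] === undefined ? null : doc[f]));
  }

  return {
    // Returns true if the document was added to the index, false if it was skipped.
    insert(doc) {
      const values = keyOf(doc);
      if (sparse && values.some(v => v === null)) {
        return false; // sparse: the document stays in the collection but is not indexed
      }
      const key = JSON.stringify(values);
      const bucket = buckets.get(key) || [];
      if (unique && bucket.length > 0) {
        throw new Error("unique constraint violated");
      }
      bucket.push(doc);
      buckets.set(key, bucket);
      return true;
    },
    // Equality lookup on all indexed attributes.
    lookup(example) {
      return buckets.get(JSON.stringify(keyOf(example))) || [];
    }
  };
}
```

With `{ unique: true }`, a second document without the indexed attribute is rejected because the key `null` is already taken. With `{ unique: true, sparse: true }`, any number of such documents is accepted because they never enter the index.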
The amortized complexity of lookup, insert, update, and removal operations in unique hash
indexes is O(1).
Non-unique hash indexes have an amortized complexity of O(1) for inserts. Lookup, update
and removal operations in non-unique hash indexes have an amortized complexity that is
linearly correlated with the number of duplicates for a given key. That means non-unique
hash indexes should not be used on attributes with very low cardinality.
If a hash index is created on an attribute that is missing in all or many of the documents,
the behavior is as follows:
* if the index is sparse, the documents missing the attribute will not be indexed and not
use index memory. These documents will not influence the update or removal performance
for the index.
* if the index is non-sparse, the documents missing the attribute will be contained in the
index with a key value of `null`. If many such documents get indexed, a lot of collisions
will occur, and lookup, update and removal of documents will become expensive. This
should be avoided if possible.
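The effect on the number of duplicates per key can be made visible with a small experiment. The sketch below is a simplified model with made-up data: it counts how many documents would share each key value for a hypothetical `email` attribute that is set in only 10 of 1000 documents. The function `bucketSizes` is an illustrative helper, not an ArangoDB API.

```js
// Count how many documents would share each index key (simplified model).
function bucketSizes(docs, field, sparse) {
  const sizes = new Map();
  for (const doc of docs) {
    const value = doc[field] === undefined ? null : doc[field];
    if (sparse && value === null) {
      continue; // sparse: documents without the attribute are not indexed
    }
    const key = JSON.stringify(value);
    sizes.set(key, (sizes.get(key) || 0) + 1);
  }
  return sizes;
}

// 1000 documents, only every 100th one has an `email` attribute.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push(i % 100 === 0
    ? { _key: "d" + i, email: "user" + i + "@example.com" }
    : { _key: "d" + i });
}

const nonSparse = bucketSizes(docs, "email", false);
const sparse = bucketSizes(docs, "email", true);

console.log(nonSparse.get("null")); // documents sharing the key `null`
console.log(sparse.size);           // number of distinct keys in the sparse variant
```

In the non-sparse variant, all documents without the attribute end up under the single key `null`, which is exactly the many-duplicates situation described above. The sparse variant contains only the documents that actually have the attribute.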
!SUBSECTION Skiplist Index
A skiplist is a sorted index structure. It can be used to quickly find documents
with specific attribute values but also for range queries and returning documents from
the index in sorted order. Skiplists will be used from within AQL and several query
functions, e.g. `byExample`, `firstExample` etc.
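Why a sorted structure supports range queries and ordered output can be shown with a stand-in. The sketch below uses a sorted array with binary search instead of a real skiplist, which is a deliberate simplification; the sample values and the helpers `lowerBound` and `range` are made up for illustration.

```js
// Sorted array as a stand-in for a skiplist (a simplification, not ArangoDB's structure).
const entries = [
  { value: 17, _key: "a" },
  { value: 3, _key: "b" },
  { value: 42, _key: "c" },
  { value: 8, _key: "d" },
  { value: 25, _key: "e" }
].sort((x, y) => x.value - y.value);

// Binary search: position of the first entry whose value is >= target.
function lowerBound(target) {
  let low = 0;
  let high = entries.length;
  while (low < high) {
    const mid = (low + high) >>> 1;
    if (entries[mid].value < target) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  return low;
}

// All entries with min <= value < max, already in ascending order.
function range(min, max) {
  const result = [];
  for (let i = lowerBound(min); i < entries.length && entries[i].value < max; i++) {
    result.push(entries[i]);
  }
  return result;
}
```

The start of the range is located in a logarithmic number of steps, and the matching entries are then read sequentially in sorted order, so no separate sort step is needed.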
Skiplist indexes will be used for lookups, range queries and sorting only if either all
index attributes are provided in a query, or if a leftmost prefix of the index attributes
is specified.
For example, if a skiplist index is created on attributes `value1` and `value2`, the
following conditions could use the index (note: the `<=` and `>=` operators are intentionally
omitted here for the sake of brevity):

    FILTER doc.value1 == ...
    FILTER doc.value1 < ...
    FILTER doc.value1 > ...
    FILTER doc.value1 > ... && doc.value1 < ...
    FILTER doc.value1 == ... && doc.value2 == ...
    FILTER doc.value1 == ... && doc.value2 > ...
    FILTER doc.value1 == ... && doc.value2 > ... && doc.value2 < ...

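The leftmost prefix rule can be expressed as a small check. The function `usablePrefix` below is a hypothetical helper for illustration, not part of ArangoDB. It only looks at which attributes appear in a filter condition and ignores the comparison operators.

```js
// Returns the leading index attributes that are covered by the filter attributes.
// An empty result means the index cannot be used for the condition.
function usablePrefix(indexFields, filterAttributes) {
  const prefix = [];
  for (const field of indexFields) {
    if (!filterAttributes.includes(field)) {
      break; // a gap ends the usable prefix
    }
    prefix.push(field);
  }
  return prefix;
}

const indexFields = ["value1", "value2"];

console.log(usablePrefix(indexFields, ["value1"]));           // prefix of length 1
console.log(usablePrefix(indexFields, ["value1", "value2"])); // all index attributes
console.log(usablePrefix(indexFields, ["value2"]));           // empty: no leftmost prefix
```

A condition on `value2` alone yields an empty prefix, which corresponds to the case in which the index is not used.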
In order to use a skiplist index for sorting, the index attributes must be specified in
the `SORT` clause of the query in the same order as they appear in the index definition.
Skiplist indexes are always created in ascending order, but they can be used to access
the indexed elements in either ascending or descending order. However, for a combined index
(an index on multiple attributes) this requires that the sort orders specified in the `SORT`
clause of a single query are either all ascending (`ASC` can be omitted, as ascending is the
default) or all descending.
For example, if the skiplist index is created on attributes `value1` and `value2` (in this order),
then the following sort clauses can use the index to determine the sort order:
* `SORT value1 ASC, value2 ASC` (and its equivalent `SORT value1, value2`)
* `SORT value1 DESC, value2 DESC`
* `SORT value1 ASC` (and its equivalent `SORT value1`)
* `SORT value1 DESC`
However, the following sort clauses cannot use the index for sorting:
* `SORT value1 ASC, value2 DESC`
* `SORT value1 DESC, value2 ASC`
* `SORT value2` (and its equivalent `SORT value2 ASC`)
* `SORT value2 DESC` (because first indexed attribute `value1` is not used in sort clause)
Note: the latter two sort clauses cannot use the index because the sort clause does not
refer to a leftmost prefix of the index attributes.
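The two sorting rules (leftmost prefix in index order, and a single sort direction) can be combined into one check. The function `canSortWithIndex` is a hypothetical helper written for this illustration; it represents a `SORT` clause as an array of objects with an `attribute` and an optional `direction`.

```js
// Returns true if the sort clause can be served by a skiplist index on `indexFields`.
function canSortWithIndex(indexFields, sortClause) {
  if (sortClause.length === 0 || sortClause.length > indexFields.length) {
    return false;
  }
  // ASC is the default direction when none is given.
  const firstDirection = sortClause[0].direction || "ASC";
  return sortClause.every((item, i) =>
    item.attribute === indexFields[i] &&            // same order as in the index definition
    (item.direction || "ASC") === firstDirection    // all ascending or all descending
  );
}

const fields = ["value1", "value2"];

// SORT value1 DESC, value2 DESC
console.log(canSortWithIndex(fields, [
  { attribute: "value1", direction: "DESC" },
  { attribute: "value2", direction: "DESC" }
]));

// SORT value1 ASC, value2 DESC
console.log(canSortWithIndex(fields, [
  { attribute: "value1", direction: "ASC" },
  { attribute: "value2", direction: "DESC" }
]));
```

The first call corresponds to a clause from the list of usable sort clauses, the second to a clause with mixed directions.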
Skiplists can optionally be declared to be unique, disallowing saving the same
value in the indexed attribute. They can be sparse or non-sparse.
The different types of skiplist indexes have the following characteristics:
* **unique skiplist index**: all documents in the collection must have different values for
the attributes covered by the unique index. Trying to insert a document with the same
key value as an already existing document will lead to a unique constraint
violation.
This type of index is not sparse. Documents that do not contain the index attributes or
that have a value of `null` in the index attribute(s) will still be indexed.
A key value of `null` may only occur once in the index, so this type of index cannot
be used for optional attributes.
* **unique, sparse skiplist index**: all documents in the collection must have different
values for the attributes covered by the unique index. Documents in which at least one
of the index attributes is not set or has a value of `null` are not included in the
index. This type of index can be used to ensure that there are no duplicate keys in
the collection for documents which have the indexed attributes set. As the index will
exclude documents for which the indexed attributes are `null` or not set, it can be
used for optional attributes.
* **non-unique skiplist index**: all documents in the collection will be indexed. This type
of index is not sparse. Documents that do not contain the index attributes or that have
a value of `null` in the index attribute(s) will still be indexed. Duplicate key values
can occur and do not lead to unique constraint violations.
* **non-unique, sparse skiplist index**: only those documents will be indexed that have all
the indexed attributes set to a value other than `null`. It can be used for optional
attributes.
The operational amortized complexity for skiplist indexes is logarithmically correlated
with the number of documents in the index.
!SUBSECTION Geo Index
Users can create additional geo indexes on one or multiple attributes in collections.
A geo index is used to quickly find places on the surface of the earth.
The geo index stores two-dimensional coordinates. It can be created on either two
separate document attributes (latitude and longitude) or a single array attribute that
contains both latitude and longitude. Latitude and longitude must be numeric values.
The geo index provides operations to find documents with coordinates nearest to a given
comparison coordinate, and to find documents with coordinates that are within a specifiable
radius around a comparison coordinate.
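The semantics of the two operations can be sketched without an index by computing great-circle distances over a plain array. This is a brute-force illustration of what the operations return, not of how the geo index is organized. The haversine formula, the mean earth radius of 6371000 meters, the sample places, and the function names `near` and `within` are assumptions of this sketch.

```js
const EARTH_RADIUS_METERS = 6371000; // mean earth radius, an assumption of this sketch

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Great-circle distance between two coordinates (haversine formula).
function distanceInMeters(lat1, lon1, lat2, lon2) {
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.min(1, Math.sqrt(a)));
}

const places = [
  { name: "Berlin", latitude: 52.52, longitude: 13.405 },
  { name: "Cologne", latitude: 50.9375, longitude: 6.9603 },
  { name: "Paris", latitude: 48.8566, longitude: 2.3522 }
];

// All places with their distance to the comparison coordinate, nearest first.
function withDistances(lat, lon) {
  return places
    .map(p => ({ name: p.name, distance: distanceInMeters(lat, lon, p.latitude, p.longitude) }))
    .sort((x, y) => x.distance - y.distance);
}

// The `limit` places nearest to the comparison coordinate.
function near(lat, lon, limit) {
  return withDistances(lat, lon).slice(0, limit);
}

// All places within `radius` meters around the comparison coordinate.
function within(lat, lon, radius) {
  return withDistances(lat, lon).filter(p => p.distance <= radius);
}
```

A real geo index avoids computing the distance to every document, which is what makes these lookups fast on large collections.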
The geo index is used via dedicated functions in AQL or the simple queries, but will
not be enabled for other types of queries or conditions.
!SUBSECTION Fulltext Index
A fulltext index can be used to find words, or prefixes of words inside documents.
A fulltext index can be created on a single attribute only, and will index all words
contained in documents that have a textual value in that attribute. Only words with a (specifiable)
minimum length are indexed. Word tokenization is done using the word boundary analysis
provided by libicu, which takes into account the language selected at
server start. Words are indexed in their lower-cased form. The index supports complete
match queries (full words) and prefix queries, plus basic logical operations such as
`and`, `or` and `not` for combining partial results.
The fulltext index is sparse, meaning it will only index documents for which the index
attribute is set and contains a string value. Additionally, only words with a configurable
minimum length will be included in the index.
The fulltext index is used via dedicated functions in AQL or the simple queries, but will
not be enabled for other types of queries or conditions.
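The indexing rules described in this section (lower-casing, minimum word length, string values only, complete and prefix matches) can be sketched with a small inverted index. This is a simplified model: it splits ASCII text with a regular expression instead of using the libicu word boundary analysis, and the minimum length of 3 as well as all function names are assumptions made for illustration.

```js
const minLength = 3; // hypothetical minimum word length

// ASCII-only tokenization; the real index uses libicu word boundary analysis.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(w => w.length >= minLength);
}

const wordIndex = new Map(); // word -> Set of document keys

function indexDocument(doc, attribute) {
  const value = doc[attribute];
  if (typeof value !== "string") {
    return; // sparse: only string values are indexed
  }
  for (const word of tokenize(value)) {
    if (!wordIndex.has(word)) {
      wordIndex.set(word, new Set());
    }
    wordIndex.get(word).add(doc._key);
  }
}

// Complete match: documents containing exactly this word.
function complete(word) {
  return new Set(wordIndex.get(word.toLowerCase()) || []);
}

// Prefix match: documents containing any word that starts with the prefix.
function prefix(start) {
  const needle = start.toLowerCase();
  const result = new Set();
  for (const [word, keys] of wordIndex) {
    if (word.startsWith(needle)) {
      keys.forEach(k => result.add(k));
    }
  }
  return result;
}

// Logical `and`: documents present in both partial results.
function intersect(a, b) {
  return new Set([...a].filter(k => b.has(k)));
}

indexDocument({ _key: "1", text: "Indexes allow fast access" }, "text");
indexDocument({ _key: "2", text: "A fast skiplist index" }, "text");
indexDocument({ _key: "3", text: 42 }, "text"); // skipped, not a string
```

Partial results are plain sets of document keys, so `or` and `not` can be modelled in the same way with set union and set difference.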