!SECTION Index basics
Indexes allow fast access to documents, provided the indexed attribute(s)
are used in a query. While ArangoDB automatically indexes some system
attributes, users are free to create extra indexes on non-system attributes
of documents.
A user-defined index is created at the collection level. Most user-defined indexes
can be created by specifying the names of the attributes which should be indexed.
Some index types allow indexing just one attribute (e.g. fulltext index) whereas
other index types allow indexing multiple attributes at the same time.
The system attributes `_id`, `_key`, `_from` and `_to` are automatically indexed
by ArangoDB, without the user being required to create extra indexes for them.
Therefore, including `_id`, `_key`, `_rev`, `_from`, or `_to` in a user-defined
index is usually unnecessary, and it is currently not supported by ArangoDB.
ArangoDB provides the following index types:
!SUBSECTION Primary Index
For each collection there will always be a *primary index* which is a hash index
for the [document keys](../Glossary/README.html#document_key) (`_key` attribute)
of all documents in the collection. The primary index allows quick selection
of documents in the collection using either the `_key` or `_id` attributes.
There are also dedicated functions to find a document given its `_key` or `_id`
that will always make use of the primary index:
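```js
// look up a document in this collection by its _key
db.collection.document("<document-key>");
// look up a document in any collection by its _id
db._document("<document-id>");
```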
The primary index of a collection cannot be dropped or changed.
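The following toy model, in plain JavaScript (runnable with Node.js, not arangosh), shows how a single hash map keyed by `_key` can serve lookups by both `_key` and `_id`. The class `ToyPrimaryIndex` and its methods are hypothetical illustrations, not ArangoDB's internal implementation; the sketch assumes `_id` is built as `<collection-name>/<_key>`.

```javascript
// Toy primary index: a hash map from _key to document.
// Hypothetical illustration only, not ArangoDB's implementation.
class ToyPrimaryIndex {
  constructor(collectionName) {
    this.collectionName = collectionName;
    this.byKey = new Map(); // _key -> stored document
  }

  insert(doc) {
    // _key values are unique within a collection
    if (this.byKey.has(doc._key)) {
      throw new Error(`unique constraint violated: ${doc._key}`);
    }
    // assumption of this sketch: _id = "<collection-name>/<_key>"
    const stored = { ...doc, _id: `${this.collectionName}/${doc._key}` };
    this.byKey.set(doc._key, stored);
    return stored;
  }

  // Lookup by _key: a single hash map access.
  document(key) {
    return this.byKey.get(key) ?? null;
  }

  // Lookup by _id: strip the collection prefix, then reuse the same map.
  documentById(id) {
    const slash = id.indexOf("/");
    if (slash < 0 || id.slice(0, slash) !== this.collectionName) return null;
    return this.document(id.slice(slash + 1));
  }
}

const users = new ToyPrimaryIndex("users");
users.insert({ _key: "alice", age: 31 });
console.log(users.document("alice").age);
console.log(users.documentById("users/alice")._id);
```

Because `_id` only adds the collection name as a prefix, a lookup by `_id` reduces to a lookup by `_key` in the same map.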
!SUBSECTION Edges Index
Every [edge collection](../Glossary/README.html#edge_collection) also has an
automatically created *edges index*. The edges index provides quick access to
documents by either their `_from` or `_to` attributes. It can therefore be
used to quickly find connections between vertex documents and is invoked when
the connecting edges of a vertex are queried.
The edges index cannot be dropped or changed. Extra edges indexes cannot be
created on other attributes or in non-edge collections.
There are also dedicated functions to find edges given their `_from` or `_to`
values that will always make use of the edges index:
```js
// edges(): edges whose _from or _to equals the given value
db.collection.edges("<from-value>");
db.collection.edges("<to-value>");
// outEdges(): edges whose _from equals the given value
db.collection.outEdges("<from-value>");
// inEdges(): edges whose _to equals the given value
db.collection.inEdges("<to-value>");
```
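To illustrate why these lookups are fast, here is a minimal sketch of an edges index as two hash maps, one keyed by `_from` and one keyed by `_to`. The class `ToyEdgesIndex` and its method names are hypothetical plain JavaScript; they mirror the semantics of `edges` (either direction), `outEdges` (`_from` matches) and `inEdges` (`_to` matches), but this is not how ArangoDB stores the index internally.

```javascript
// Toy edges index: one map keyed by _from, one keyed by _to.
// Hypothetical illustration only, not ArangoDB's implementation.
class ToyEdgesIndex {
  constructor() {
    this.byFrom = new Map(); // vertex id -> edges starting there
    this.byTo = new Map(); // vertex id -> edges ending there
  }

  static addTo(map, vertex, edge) {
    if (!map.has(vertex)) map.set(vertex, []);
    map.get(vertex).push(edge);
  }

  insert(edge) {
    ToyEdgesIndex.addTo(this.byFrom, edge._from, edge);
    ToyEdgesIndex.addTo(this.byTo, edge._to, edge);
  }

  // Edges whose _from equals the vertex (outgoing).
  outEdges(vertex) {
    return this.byFrom.get(vertex) ?? [];
  }

  // Edges whose _to equals the vertex (incoming).
  inEdges(vertex) {
    return this.byTo.get(vertex) ?? [];
  }

  // Edges in either direction; this sketch lists a self-loop only once.
  edges(vertex) {
    const out = this.outEdges(vertex);
    const incoming = this.inEdges(vertex).filter((e) => !out.includes(e));
    return [...out, ...incoming];
  }
}

const knows = new ToyEdgesIndex();
knows.insert({ _from: "persons/alice", _to: "persons/bob" });
knows.insert({ _from: "persons/bob", _to: "persons/carol" });
console.log(knows.edges("persons/bob").length);
```

Each direction costs one map access, which is why finding the connections of a vertex does not require scanning the whole edge collection.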
!SUBSECTION Hash Index
A hash index can be used to quickly find documents with specific attribute values.
The hash index is unsorted, so it supports equality lookups but no range queries.
A hash index can be created on one or multiple document attributes. A hash index will
only be used by a query if all indexed attributes are present in the search condition,
and if all attributes are compared using the equality (`==`) operator.
Hash indexes can optionally be declared unique, which disallows saving the same
value (or, for multiple attributes, the same combination of values) in the indexed attribute(s).
Hash indexes are supported by AQL and by several query functions, such as `byExample`
and `firstExample`.
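The sketch below models a hash index over one or more attributes in plain JavaScript, to show why it only helps when every indexed attribute is compared with `==`: the lookup key is built from all indexed values at once. `ToyHashIndex` is hypothetical, treats a missing attribute as `null` (an assumption of the sketch), and is not ArangoDB's implementation.

```javascript
// Toy hash index over one or more attributes (hypothetical illustration).
class ToyHashIndex {
  constructor(fields, { unique = false } = {}) {
    this.fields = fields;
    this.unique = unique;
    this.buckets = new Map(); // serialized value tuple -> array of documents
  }

  // Combine all indexed values into one hash key (missing -> null here).
  keyOf(values) {
    return JSON.stringify(this.fields.map((f) => values[f] ?? null));
  }

  insert(doc) {
    const key = this.keyOf(doc);
    const bucket = this.buckets.get(key) ?? [];
    if (this.unique && bucket.length > 0) {
      throw new Error(`unique constraint violated for ${key}`);
    }
    bucket.push(doc);
    this.buckets.set(key, bucket);
  }

  // Usable only when every indexed attribute is given for an equality match.
  lookup(example) {
    for (const f of this.fields) {
      if (!(f in example)) {
        throw new Error(`index not usable: attribute "${f}" missing`);
      }
    }
    return this.buckets.get(this.keyOf(example)) ?? [];
  }

  // Removal scans the bucket, so its cost grows with the number of
  // documents sharing the same indexed values.
  remove(doc) {
    const key = this.keyOf(doc);
    const bucket = this.buckets.get(key);
    if (!bucket) return false;
    const pos = bucket.indexOf(doc);
    if (pos < 0) return false;
    bucket.splice(pos, 1);
    if (bucket.length === 0) this.buckets.delete(key);
    return true;
  }
}

const byName = new ToyHashIndex(["first", "last"], { unique: true });
const ada = { first: "Ada", last: "Lovelace" };
byName.insert(ada);
console.log(byName.lookup({ first: "Ada", last: "Lovelace" }).length);
```

The `remove` method also hints at why removals from a non-unique hash index slow down when many documents share the same indexed values: they all land in one bucket that has to be scanned.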
!SUBSECTION Skiplist Index
A skiplist is a sorted index structure. Skiplist indexes can be used to quickly find documents
with specific attribute values, but they also support range queries. They can also be used
for sorting in AQL.
A skiplist can be created on one or multiple document attributes.
Skiplists can optionally be declared unique, which disallows saving the same
value (or combination of values) in the indexed attribute(s).
Skiplists are supported by AQL and by several query functions, such as `byExample`
and `firstExample`.
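A skiplist keeps its entries ordered, which is what enables range queries and sorting. The following sketch uses a sorted array with binary search as a simpler stand-in for a real skiplist (same ordering guarantees, different data structure and insertion cost). `ToySortedIndex` is hypothetical plain JavaScript and assumes all indexed values are numbers; it ignores the type ordering a real database applies to mixed values.

```javascript
// Sorted-array stand-in for a skiplist index (hypothetical illustration).
// Both keep entries ordered; a real skiplist uses linked levels instead.
class ToySortedIndex {
  constructor(field) {
    this.field = field;
    this.entries = []; // [value, doc] pairs, sorted by value
  }

  // Position of the first entry whose value is >= v (binary search).
  lowerBound(v) {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.entries[mid][0] < v) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  insert(doc) {
    const v = doc[this.field];
    this.entries.splice(this.lowerBound(v), 0, [v, doc]);
  }

  // Range query: all documents with min <= value <= max.
  range(min, max) {
    const result = [];
    for (let i = this.lowerBound(min); i < this.entries.length; i++) {
      if (this.entries[i][0] > max) break;
      result.push(this.entries[i][1]);
    }
    return result;
  }

  // Equality is just a range with min === max.
  equal(v) {
    return this.range(v, v);
  }

  // Documents in ascending order, so the index can also serve sorting.
  sorted() {
    return this.entries.map(([, doc]) => doc);
  }
}

const byAge = new ToySortedIndex("age");
[42, 17, 30, 30, 65].forEach((age, i) => byAge.insert({ id: i, age }));
console.log(byAge.range(18, 50).map((d) => d.age));
```

A hash index cannot answer `range(18, 50)` without scanning every bucket, because hashing destroys the order that the binary search relies on.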
!SUBSECTION Geo Index
A geo index is used to quickly find places on the surface of the earth. The
geo index in ArangoDB supports near and within queries. There are special functions
to query geo indexes.
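For intuition about what near and within queries compute, here is a brute-force sketch in plain JavaScript that scans every document and measures great-circle distance with the haversine formula. A real geo index exists precisely to avoid this full scan, so the sketch only describes the results, not the index. The function names, the mean earth radius of 6,371 km, and the `latitude`/`longitude` and `coords` attribute names are assumptions of the sketch.

```javascript
// Brute-force illustration of "near" and "within" results (hypothetical).
// A real geo index avoids scanning every document; this sketch does not.
const EARTH_RADIUS_M = 6371000; // mean earth radius in meters (approximation)

// Accept either { latitude, longitude } or { coords: [lat, lon] }.
function toLatLon(doc) {
  if (Array.isArray(doc.coords)) return doc.coords;
  return [doc.latitude, doc.longitude];
}

// Great-circle distance in meters using the haversine formula.
function distance([lat1, lon1], [lat2, lon2]) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLon = rad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// "within": all documents no farther than radius meters from the point.
function within(docs, point, radius) {
  return docs.filter((d) => distance(toLatLon(d), point) <= radius);
}

// "near": the `limit` documents closest to the point.
function near(docs, point, limit) {
  return docs
    .map((d) => ({ doc: d, dist: distance(toLatLon(d), point) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, limit)
    .map((x) => x.doc);
}

const places = [
  { name: "Cologne", latitude: 50.9406645, longitude: 6.9599115 },
  { name: "Bonn", coords: [50.7374, 7.0982] },
  { name: "Berlin", coords: [52.52, 13.405] },
];
console.log(within(places, [50.94, 6.96], 30000).map((p) => p.name));
console.log(near(places, [52.5, 13.4], 1)[0].name);
```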
!SUBSECTION Fulltext Index
A fulltext index can be used to find words, or prefixes of words, inside documents.
A fulltext index can be set on one attribute only, and will index all words contained
in documents that have a textual value in this attribute. Only words with a (specifiable)
minimum length are indexed. Word tokenization is done using the word boundary analysis
provided by libicu, which takes into account the language selected at
server start. Words are indexed in their lower-cased form. The index supports complete
match queries (full words) and prefix queries.
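The sketch below shows the general shape of a fulltext index: an inverted index from lower-cased words to document ids, with a minimum word length and support for complete-word and prefix queries. It is hypothetical plain JavaScript, not ArangoDB's implementation, and it swaps libicu's language-aware word boundary analysis for a simple Unicode letter/digit regex.

```javascript
// Toy fulltext index: lower-cased words with a minimum length, stored in an
// inverted index (word -> set of document ids). The regex tokenizer is a
// simple stand-in for libicu's word boundary analysis.
class ToyFulltextIndex {
  constructor(field, minLength = 2) {
    this.field = field;
    this.minLength = minLength;
    this.words = new Map(); // word -> Set of document ids
  }

  tokenize(text) {
    const tokens = text.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? [];
    return tokens.filter((w) => w.length >= this.minLength);
  }

  insert(id, doc) {
    const text = doc[this.field];
    if (typeof text !== "string") return; // only textual values are indexed
    for (const word of this.tokenize(text)) {
      if (!this.words.has(word)) this.words.set(word, new Set());
      this.words.get(word).add(id);
    }
  }

  // Complete-word match.
  match(word) {
    return [...(this.words.get(word.toLowerCase()) ?? [])];
  }

  // Prefix match: a linear scan over the vocabulary in this sketch.
  prefix(p) {
    const lower = p.toLowerCase();
    const ids = new Set();
    for (const [word, docIds] of this.words) {
      if (word.startsWith(lower)) docIds.forEach((id) => ids.add(id));
    }
    return [...ids];
  }
}

const idx = new ToyFulltextIndex("text", 3);
idx.insert("a", { text: "Quick brown fox" });
idx.insert("b", { text: "The quickest route" });
idx.insert("c", { text: 42 }); // not a string, so not indexed
console.log(idx.match("QUICK"), idx.prefix("quick"));
```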
!SECTION Index Identifiers and Handles
An *index handle* uniquely identifies an index in the database. It is a string and
consists of the collection name and an *index identifier* separated by a `/`. The
index identifier part is a numeric value that is auto-generated by ArangoDB.
A specific index of a collection can be accessed using its *index handle* or
*index identifier* as follows:
```js
db.collection.index("<index-handle>");
db.collection.index("<index-identifier>");
db._index("<index-handle>");
```
For example, assume that the index handle, which is stored in the `_id`
attribute of the index, is `demo/362549736` and the index was created in a collection
named `demo`. Then this index can be accessed as:
```js
db.demo.index("demo/362549736");
```
Because the index handle is unique within the database, you can leave out the
*collection* and use the shortcut:
```js
db._index("demo/362549736");
```
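Since an index handle is just `<collection>/<identifier>`, it can be split with ordinary string operations. The helper `parseIndexHandle` below is hypothetical plain JavaScript (not part of arangosh) and assumes, as described above, that the identifier part is numeric.

```javascript
// Hypothetical helper: split an index handle "<collection>/<identifier>"
// into its two parts.
function parseIndexHandle(handle) {
  const slash = handle.indexOf("/");
  if (slash <= 0 || slash === handle.length - 1) {
    throw new Error(`not an index handle: ${handle}`);
  }
  const collection = handle.slice(0, slash);
  const id = handle.slice(slash + 1);
  // the identifier part is expected to be a numeric value
  if (!/^\d+$/.test(id)) {
    throw new Error(`index identifier is not numeric: ${id}`);
  }
  return { collection, id };
}

const parsed = parseIndexHandle("demo/362549736");
console.log(parsed.collection, parsed.id);
```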
!SECTION Which Index type to use when
ArangoDB automatically indexes the `_key` attribute in each collection. There
is no need to index this attribute separately. Please note that a document's
`_id` attribute is derived from the `_key` attribute, and is thus implicitly
indexed, too.
ArangoDB will also automatically create an index on `_from` and `_to` in any
edge collection, meaning incoming and outgoing connections can be determined
efficiently.
Users can define additional indexes on one or multiple document attributes.
Several different index types are provided by ArangoDB. These indexes have
different usage scenarios:
- hash index: provides quick access to individual documents if (and only if)
all indexed attributes are provided in the search query. The index will only
be used for equality comparisons. It does not support range queries and
cannot be used for sorting.
The hash index is a good candidate if all or most queries on the indexed
attribute(s) are equality comparisons. It will be the most efficient index
type if the index is declared unique.
Insertions into a non-unique hash index are also very efficient. Removal
performance in a non-unique hash index depends on how often the indexed
attribute's values repeat. If there are a lot of value repetitions, the
removal performance in a non-unique hash index will suffer.
A non-unique hash index should therefore not be used if there will be many
duplicate values in the index plus a lot of document removal operations
in the collection.
- skip list index: skip lists keep the indexed values in sorted order, so they can
be used for equality lookups, range queries and sorting. Skip list indexes
have a higher overhead than hash indexes, but they are more general and
allow more use cases (e.g. range queries). Additionally, they can be used
for attributes with lower selectivity, where non-unique hash indexes are not a
good fit.
- geo index: the geo index provided by ArangoDB allows searching for documents
within a radius around a two-dimensional earth coordinate (point), or finding
the documents that are closest to a point. Document coordinates can either
be specified in two different document attributes or in a single attribute, e.g.
`{ "latitude": 50.9406645, "longitude": 6.9599115 }`
or
`{ "coords": [ 50.9406645, 6.9599115 ] }`
- fulltext index: a fulltext index can be used to index all words contained in
a specific attribute of all documents in a collection. Only words with a
(specifiable) minimum length are indexed. Word tokenization is done using
the word boundary analysis provided by libicu, which takes into account
the language selected at server start.
The index supports complete match queries (full words) and prefix queries.
- cap constraint: the cap constraint provided by ArangoDB indexes documents
not to speed up search queries, but to limit (cap) the number or size of
documents in a collection.
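The cap constraint can be pictured as a collection that evicts documents once a limit is exceeded. The sketch below models only the count limit (not the size limit), and its eviction order (oldest inserted document first) is an assumption of this illustration rather than something stated above. `ToyCappedCollection` is hypothetical plain JavaScript.

```javascript
// Toy capped collection: keeps at most maxCount documents.
// Assumption of this sketch: the oldest inserted document is evicted first.
class ToyCappedCollection {
  constructor(maxCount) {
    this.maxCount = maxCount;
    this.docs = []; // oldest first
  }

  insert(doc) {
    this.docs.push(doc);
    while (this.docs.length > this.maxCount) {
      this.docs.shift(); // evict the oldest document
    }
  }

  count() {
    return this.docs.length;
  }
}

const events = new ToyCappedCollection(3);
for (let i = 1; i <= 5; i++) events.insert({ seq: i });
console.log(events.docs.map((d) => d.seq));
```

Unlike the other index types, nothing here speeds up lookups; the structure only enforces the limit on every insert.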