1
0
Fork 0
arangodb/Documentation/Books/Manual/indexing-index-utilization.md

102 lines
5.8 KiB
Markdown

---
layout: default
description: Index Utilization
---
Index Utilization
=================
In most cases ArangoDB will use a single index per collection in a given query. AQL queries can
use more than one index per collection when multiple FILTER conditions are combined with a
logical `OR` and these can be covered by indexes. AQL queries will use a single index per
collection when FILTER conditions are combined with logical `AND`.
Creating multiple indexes on different attributes of the same collection may give the query
optimizer more choices when picking an index. Creating multiple indexes on different attributes
can also help in speeding up different queries, with FILTER conditions on different attributes.
It is often beneficial to create an index on more than just one attribute. By adding more attributes
to an index, an index can become more selective and thus reduce the number of documents that
queries need to process.
ArangoDB's primary indexes, edges indexes and hash indexes will automatically provide selectivity
estimates. Index selectivity estimates are provided in the web interface, the `getIndexes()` return
value and in the `explain()` output for a given query.
The more selective an index is, the more documents it will filter on average. The index selectivity
estimates are therefore used by the optimizer when creating query execution plans when there are
multiple indexes the optimizer can choose from. The optimizer will then select a combination of
indexes with the lowest estimated total cost. In general, the optimizer will pick the indexes with
the highest estimated selectivity.
Sparse indexes may or may not be picked by the optimizer in a query. As sparse indexes do not contain
`null` values, they will not be used for queries if the optimizer cannot safely determine whether a
FILTER condition includes `null` values for the index attributes. The optimizer policy is to produce
correct results, regardless of whether or which index is used to satisfy FILTER conditions. If it is
unsure about whether using an index will violate the policy, it will not make use of the index.
Troubleshooting
---------------
When in doubt about whether and which indexes will be used for executing a given AQL query,
click the *Explain* button in the web interface in the *Queries* view or use
the `explain()` method for the statement as follows (from the ArangoShell):
```js
var query = "FOR doc IN collection FILTER doc.value > 42 RETURN doc";
var stmt = db._createStatement(query);
stmt.explain();
```
The `explain()` command will return a detailed JSON representation of the query's execution plan.
The JSON explain output is intended to be used by code. To get a human-readable and much more
compact explanation of the query, there is an explainer tool:
```js
var query = "FOR doc IN collection FILTER doc.value > 42 RETURN doc";
require("@arangodb/aql/explainer").explain(query);
```
If any of the explain methods shows that a query is not using indexes, the following steps may help:
* check if the attribute names in the query are correctly spelled. In a schema-free database, documents
in the same collection can have varying structures. There is no such thing as a *non-existing attribute*
error. A query that refers to attribute names not present in any of the documents will not return an
error, and obviously will not benefit from indexes.
* check the return value of the `getIndexes()` method for the collections used in the query and validate
that indexes are actually present on the attributes used in the query's filter conditions.
* if indexes are present but not used by the query, the indexes may have the wrong type. For example, a
hash index will only be used for equality comparisons (i.e. `==`) but not for other comparison types such
as `<`, `<=`, `>`, `>=`. Additionally hash indexes will only be used if all of the index attributes are
used in the query's FILTER conditions. A skiplist index will only be used if at least its first attribute
is used in a FILTER condition. If additionally of the skiplist index attributes are specified in the query
(from left-to-right), they may also be used and allow to filter more documents.
* using indexed attributes as function parameters or in arbitrary expressions will likely lead to the index
on the attribute not being used. For example, the following queries will not use an index on `value`:
FOR doc IN collection FILTER TO_NUMBER(doc.value) == 42 RETURN doc
FOR doc IN collection FILTER doc.value - 1 == 42 RETURN doc
In these cases the queries should be rewritten so that only the index attribute is present on one side of
the operator, or additional filters and indexes should be used to restrict the amount of documents otherwise.
* certain AQL functions such as `WITHIN()` or `FULLTEXT()` do utilize indexes internally, but their use is
not mentioned in the query explanation for functions in general. These functions will raise query errors
(at runtime) if no suitable index is present for the collection in question.
* the query optimizer will in general pick one index per collection in a query. It can pick more than
one index per collection if the FILTER condition contains multiple branches combined with logical `OR`.
For example, the following queries can use indexes:
FOR doc IN collection FILTER doc.value1 == 42 || doc.value1 == 23 RETURN doc
FOR doc IN collection FILTER doc.value1 == 42 || doc.value2 == 23 RETURN doc
FOR doc IN collection FILTER doc.value1 < 42 || doc.value2 > 23 RETURN doc
The two `OR`s in the first query will be converted to an `IN` list, and if there is a suitable index on
`value1`, it will be used. The second query requires two separate indexes on `value1` and `value2` and
will use them if present. The third query can use the indexes on `value1` and `value2` when they are
sorted.