mirror of https://gitee.com/bigwinds/arangodb
102 lines
5.8 KiB
Markdown
102 lines
5.8 KiB
Markdown
---
|
|
layout: default
|
|
description: Index Utilization
|
|
---
|
|
Index Utilization
|
|
=================
|
|
|
|
In most cases ArangoDB will use a single index per collection in a given query. AQL queries can
|
|
use more than one index per collection when multiple FILTER conditions are combined with a
|
|
logical `OR` and these can be covered by indexes. AQL queries will use a single index per
|
|
collection when FILTER conditions are combined with logical `AND`.
|
|
|
|
Creating multiple indexes on different attributes of the same collection may give the query
|
|
optimizer more choices when picking an index. Creating multiple indexes on different attributes
|
|
can also help in speeding up different queries, with FILTER conditions on different attributes.
|
|
|
|
It is often beneficial to create an index on more than just one attribute. By adding more attributes
|
|
to an index, an index can become more selective and thus reduce the number of documents that
|
|
queries need to process.
|
|
|
|
ArangoDB's primary indexes, edges indexes and hash indexes will automatically provide selectivity
|
|
estimates. Index selectivity estimates are provided in the web interface, the `getIndexes()` return
|
|
value and in the `explain()` output for a given query.
|
|
|
|
The more selective an index is, the more documents it will filter on average. The index selectivity
|
|
estimates are therefore used by the optimizer when creating query execution plans when there are
|
|
multiple indexes the optimizer can choose from. The optimizer will then select a combination of
|
|
indexes with the lowest estimated total cost. In general, the optimizer will pick the indexes with
|
|
the highest estimated selectivity.
|
|
|
|
Sparse indexes may or may not be picked by the optimizer in a query. As sparse indexes do not contain
|
|
`null` values, they will not be used for queries if the optimizer cannot safely determine whether a
|
|
FILTER condition includes `null` values for the index attributes. The optimizer policy is to produce
|
|
correct results, regardless of whether or which index is used to satisfy FILTER conditions. If it is
|
|
unsure about whether using an index will violate the policy, it will not make use of the index.
|
|
|
|
|
|
Troubleshooting
|
|
---------------
|
|
|
|
When in doubt about whether and which indexes will be used for executing a given AQL query,
|
|
click the *Explain* button in the web interface in the *Queries* view or use
|
|
the `explain()` method for the statement as follows (from the ArangoShell):
|
|
|
|
```js
|
|
var query = "FOR doc IN collection FILTER doc.value > 42 RETURN doc";
|
|
var stmt = db._createStatement(query);
|
|
stmt.explain();
|
|
```
|
|
|
|
The `explain()` command will return a detailed JSON representation of the query's execution plan.
|
|
The JSON explain output is intended to be used by code. To get a human-readable and much more
|
|
compact explanation of the query, there is an explainer tool:
|
|
|
|
```js
|
|
var query = "FOR doc IN collection FILTER doc.value > 42 RETURN doc";
|
|
require("@arangodb/aql/explainer").explain(query);
|
|
```
|
|
|
|
If any of the explain methods shows that a query is not using indexes, the following steps may help:
|
|
|
|
* check if the attribute names in the query are correctly spelled. In a schema-free database, documents
|
|
in the same collection can have varying structures. There is no such thing as a *non-existing attribute*
|
|
error. A query that refers to attribute names not present in any of the documents will not return an
|
|
error, and obviously will not benefit from indexes.
|
|
|
|
* check the return value of the `getIndexes()` method for the collections used in the query and validate
|
|
that indexes are actually present on the attributes used in the query's filter conditions.
|
|
|
|
* if indexes are present but not used by the query, the indexes may have the wrong type. For example, a
|
|
hash index will only be used for equality comparisons (i.e. `==`) but not for other comparison types such
|
|
as `<`, `<=`, `>`, `>=`. Additionally hash indexes will only be used if all of the index attributes are
|
|
used in the query's FILTER conditions. A skiplist index will only be used if at least its first attribute
|
|
is used in a FILTER condition. If additionally of the skiplist index attributes are specified in the query
|
|
(from left-to-right), they may also be used and allow to filter more documents.
|
|
|
|
* using indexed attributes as function parameters or in arbitrary expressions will likely lead to the index
|
|
on the attribute not being used. For example, the following queries will not use an index on `value`:
|
|
|
|
FOR doc IN collection FILTER TO_NUMBER(doc.value) == 42 RETURN doc
|
|
FOR doc IN collection FILTER doc.value - 1 == 42 RETURN doc
|
|
|
|
In these cases the queries should be rewritten so that only the index attribute is present on one side of
|
|
the operator, or additional filters and indexes should be used to restrict the amount of documents otherwise.
|
|
|
|
* certain AQL functions such as `WITHIN()` or `FULLTEXT()` do utilize indexes internally, but their use is
|
|
not mentioned in the query explanation for functions in general. These functions will raise query errors
|
|
(at runtime) if no suitable index is present for the collection in question.
|
|
|
|
* the query optimizer will in general pick one index per collection in a query. It can pick more than
|
|
one index per collection if the FILTER condition contains multiple branches combined with logical `OR`.
|
|
For example, the following queries can use indexes:
|
|
|
|
FOR doc IN collection FILTER doc.value1 == 42 || doc.value1 == 23 RETURN doc
|
|
FOR doc IN collection FILTER doc.value1 == 42 || doc.value2 == 23 RETURN doc
|
|
FOR doc IN collection FILTER doc.value1 < 42 || doc.value2 > 23 RETURN doc
|
|
|
|
The two `OR`s in the first query will be converted to an `IN` list, and if there is a suitable index on
|
|
`value1`, it will be used. The second query requires two separate indexes on `value1` and `value2` and
|
|
will use them if present. The third query can use the indexes on `value1` and `value2` when they are
|
|
sorted.
|