arangodb/Documentation/Books/AQL/execution-and-performance-o...

---
layout: default
description: AQL queries are sent through an optimizer before execution
---
The AQL query optimizer
=======================

AQL queries are sent through an optimizer before execution. The task of the optimizer is
to create an initial execution plan for the query, look for optimization opportunities and
apply them. As a result, the optimizer might produce multiple execution plans for a
single query. It will then calculate the costs for all plans and pick the plan with the
lowest total cost. This resulting plan is considered to be the *optimal plan*, which is
then executed.

The optimizer is designed to only perform optimizations if they are *safe*, in the
meaning that an optimization should not modify the result of a query. A notable exception
to this is that the optimizer is allowed to change the order of results for queries that
do not explicitly specify how results should be sorted.

Execution plans
---------------

The `explain` command can be used to query the optimal executed plan or even all plans
the optimizer has generated. Additionally, `explain` can reveal some more information
about the optimizer's view of the query.

### Inspecting plans using the explain helper

The `explain` method of `ArangoStatement` as shown in the next chapters creates very verbose output.
You can work on the output programmatically, or use this handsome tool that we created
to generate a more human readable representation.

You may use it like this: (we disable syntax highlighting here)
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_01_axplainer
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_01_axplainer}
    ~addIgnoreCollection("test")
    ~db._drop("test");
    db._create("test");
    for (i = 0; i < 100; ++i) { db.test.save({ value: i }); }
    db.test.ensureIndex({ type: "skiplist", fields: [ "value" ] });
    var explain = require("@arangodb/aql/explainer").explain;
    explain("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value", {colors:false});
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_01_axplainer
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}

### Execution plans in detail

Let's have a look at the raw json output of the same execution plan
using the `explain` method of `ArangoStatement`:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_01_explainCreate
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_01_explainCreate}
    stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain();
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_01_explainCreate
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
As you can see, the result details are very verbose so we will not show them in full in the next
sections. Instead, let's take a closer look at the results step by step.

#### Execution nodes

In general, an execution plan can be considered to be a pipeline of processing steps.
Each processing step is carried out by a so-called *execution node*

The `nodes` attribute of the `explain` result contains these *execution nodes* in
the *execution plan*. The output is still very verbose, so here's a shorted form of it:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_02_explainOverview
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_02_explainOverview}
    ~var stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain().plan.nodes.map(function (node) { return node.type; });
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_02_explainOverview
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
*Note that the list of nodes might slightly change in future versions of ArangoDB if
new execution node types get added or the optimizer create somewhat more
optimized plans).*

When a plan is executed, the query execution engine will start with the node at
the bottom of the list (i.e. the *ReturnNode*).

The *ReturnNode*'s purpose is to return data to the caller. It does not produce
data itself, so it will ask the node above itself, this is the *CalculationNode*
in our example.
*CalculationNode*s are responsible for evaluating arbitrary expressions. In our
example query, the *CalculationNode* will evaluate the value of `i.value`, which
is needed by the *ReturnNode*. The calculation will be applied for all data the
*CalculationNode* gets from the node above it, in our example the *IndexNode*.

Finally, all of this needs to be done for documents of collection `test`. This is
where the *IndexNode* enters the game. It will use an index (thus its name)
to find certain documents in the collection and ship it down the pipeline in the
order required by `SORT i.value`. The *IndexNode* itself has a *SingletonNode*
as its input. The sole purpose of a *SingletonNode* node is to provide a single empty
document as input for other processing steps. It is always the end of the pipeline.

Here's a summary:
* SingletonNode: produces an empty document as input for other processing steps.
* IndexNode: iterates over the index on attribute `value` in collection `test`
  in the order required by `SORT i.value`.
* CalculationNode: evaluates the result of the calculation `i.value > 97` to `true` or `false`
* CalculationNode: calculates return value `i.value`
* ReturnNode: returns data to the caller


#### Optimizer rules

Note that in the example, the optimizer has optimized the `SORT` statement away.
It can do it safely because there is a sorted skiplist index on `i.value`, which it has
picked in the *IndexNode*. As the index values are iterated over in sorted order
anyway, the extra *SortNode* would have been redundant and was removed.

Additionally, the optimizer has done more work to generate an execution plan that
avoids as much expensive operations as possible. Here is the list of optimizer rules
that were applied to the plan:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_03_explainRules
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_03_explainRules}
    ~var stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain().plan.rules;
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_03_explainRules
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
Here is the meaning of these rules in context of this query:
* `move-calculations-up`: moves a *CalculationNode* as far up in the processing pipeline
  as possible
* `move-filters-up`: moves a *FilterNode* as far up in the processing pipeline as
  possible
* `remove-redundant-calculations`: replaces references to variables with references to
  other variables that contain the exact same result. In the example query, `i.value`
  is calculated multiple times, but each calculation inside a loop iteration would
  produce the same value. Therefore, the expression result is shared by several nodes.
* `remove-unnecessary-calculations`: removes *CalculationNode*s whose result values are
  not used in the query. In the example this happens due to the `remove-redundant-calculations`
  rule having made some calculations unnecessary.
* `use-indexes`: use an index to iterate over a collection instead of performing a
  full collection scan. In the example case this makes sense, as the index can be
  used for filtering and sorting.
* `remove-filter-covered-by-index`: remove an unnecessary filter whose functionality
  is already covered by an index. In this case the index only returns documents
  matching the filter.
* `use-index-for-sort`: removes a `SORT` operation if it is already satisfied by
  traversing over a sorted index

Note that some rules may appear multiple times in the list, with number suffixes.
This is due to the same rule being applied multiple times, at different positions
in the optimizer pipeline.


#### Collections used in a query

The list of collections used in a plan (and query) is contained in the `collections`
attribute of a plan:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_04_explainCollections
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_04_explainCollections}
    ~var stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain().plan.collections
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_04_explainCollections
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
The `name` attribute contains the name of the `collection`, and `type` is the
access type, which can be either `read` or `write`.


#### Variables used in a query

The optimizer will also return a list of variables used in a plan (and query). This
list will contain auxiliary variables created by the optimizer itself. This list
can be ignored by end users in most cases.


#### Cost of a query

For each plan the optimizer generates, it will calculate the total cost. The plan
with the lowest total cost is considered to be the optimal plan. Costs are
estimates only, as the actual execution costs are unknown to the optimizer.
Costs are calculated based on heuristics that are hard-coded into execution nodes.
Cost values do not have any unit.


### Retrieving all execution plans

To retrieve not just the optimal plan but a list of all plans the optimizer has
generated, set the option `allPlans` to `true`:

This will return a list of all plans in the `plans` attribute instead of in the
`plan` attribute:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_05_explainAllPlans
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_05_explainAllPlans}
    ~var stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain({ allPlans: true });
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_05_explainAllPlans
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
### Retrieving the plan as it was generated by the parser / lexer

To retrieve the plan which closely matches your query, you may turn off most
optimization rules (i.e. cluster rules cannot be disabled if you're running
the explain on a cluster coordinator) set the option `rules` to `-all`:

This will return an unoptimized plan in the `plan`:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_06_explainUnoptimizedPlans
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_06_explainUnoptimizedPlans}
    ~var stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain({ optimizer: { rules: [ "-all" ] } });
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_06_explainUnoptimizedPlans
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
Note that some optimizations are already done at parse time (i.e. evaluate simple constant
calculation as `1 + 1`)


Turning specific optimizer rules off
------------------------------------

Optimizer rules can also be turned on or off individually, using the `rules` attribute.
This can be used to enable or disable one or multiple rules. Rules that shall be enabled
need to be prefixed with a `+`, rules to be disabled should be prefixed with a `-`. The
pseudo-rule `all` matches all rules.

Rules specified in `rules` are evaluated from left to right, so the following works to
turn on just the one specific rule:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_07_explainSingleRulePlans
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_07_explainSingleRulePlans}
    ~var stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain({ optimizer: { rules: [ "-all", "+use-index-range" ] } });
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_07_explainSingleRulePlans
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
By default, all rules are turned on. To turn off just a few specific rules, use something
like this:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_08_explainDisableSingleRulePlans
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_08_explainDisableSingleRulePlans}
    ~var stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain({ optimizer: { rules: [ "-use-index-range", "-use-index-for-sort" ] } });
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_08_explainDisableSingleRulePlans
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
The maximum number of plans created by the optimizer can also be limited using the
`maxNumberOfPlans` attribute:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_09_explainMaxNumberOfPlans
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_09_explainMaxNumberOfPlans}
    ~var stmt = db._createStatement("FOR i IN test FILTER i.value > 97 SORT i.value RETURN i.value");
    stmt.explain({ maxNumberOfPlans: 1 });
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_09_explainMaxNumberOfPlans
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
Optimizer statistics
--------------------

The optimizer will return statistics as a part of an `explain` result.

The following attributes will be returned in the `stats` attribute of an `explain` result:

- `plansCreated`: total number of plans created by the optimizer
- `rulesExecuted`: number of rules executed (note: an executed rule does not
  indicate a plan was actually modified by a rule)
- `rulesSkipped`: number of rules skipped by the optimizer

Warnings
--------

For some queries, the optimizer may produce warnings. These will be returned in
the `warnings` attribute of the `explain` result:
{% arangoshexample examplevar="examplevar" script="script" result="result" %}
    @startDocuBlockInline AQLEXP_10_explainWarn
    @EXAMPLE_ARANGOSH_OUTPUT{AQLEXP_10_explainWarn}
    var stmt = db._createStatement("FOR i IN 1..10 RETURN 1 / 0")
    stmt.explain().warnings;
    ~db._drop("test")
    ~removeIgnoreCollection("test")
    @END_EXAMPLE_ARANGOSH_OUTPUT
    @endDocuBlock AQLEXP_10_explainWarn
{% endarangoshexample %}
{% include arangoshexample.html id=examplevar script=script result=result %}
There is an upper bound on the number of warning a query may produce. If that
bound is reached, no further warnings will be returned.


Optimization in a cluster
-------------------------

When you're running AQL in the cluster, the parsing of the query is done on the
coordinator. The coordinator then chops the query into snipets, which are to
remain on the coordinator, and others that are to be distributed over the network
to the shards. The cutting sites are interconnected via *Scatter-*, *Gather-* and *RemoteNodes*.

These nodes mark the network borders of the snippets. The optimizer strives to reduce the amount
of data transfered via these network interfaces by pushing `FILTER`s out to the shards,
as it is vital to the query performance to reduce that data amount to transfer over the
network links.

Snippets marked with **DBS** are executed on the shards, **COOR** ones are excuted on the coordinator.

**As usual, the optimizer can only take certain assumptions for granted when doing so,
i.e. [user-defined functions have to be executed on the coordinator](extending.html).
If in doubt, you should modify your query to reduce the number interconnections between your snippets.**

When optimizing your query you may want to look at simpler parts of it first.

List of execution nodes
-----------------------

The following execution node types will appear in the output of `explain`:

* *SingletonNode*: the purpose of a *SingletonNode* is to produce an empty document
  that is used as input for other processing steps. Each execution plan will contain
  exactly one *SingletonNode* as its top node.
* *EnumerateCollectionNode*: enumeration over documents of a collection (given in
  its *collection* attribute) without using an index.
* *IndexNode*: enumeration over one or many indexes (given in its *indexes* attribute)
  of a collection. The index ranges are specified in the *condition* attribute of the node.
* *EnumerateListNode*: enumeration over a list of (non-collection) values.
* *FilterNode*: only lets values pass that satisfy a filter condition. Will appear once
  per *FILTER* statement.
* *LimitNode*: limits the number of results passed to other processing steps. Will
  appear once per *LIMIT* statement.
* *CalculationNode*: evaluates an expression. The expression result may be used by
  other nodes, e.g. *FilterNode*, *EnumerateListNode*, *SortNode* etc.
* *SubqueryNode*: executes a subquery.
* *SortNode*: performs a sort of its input values.
* *AggregateNode*: aggregates its input and produces new output variables. This will
  appear once per *COLLECT* statement.
* *ReturnNode*: returns data to the caller. Will appear in each read-only query at
  least once. Subqueries will also contain *ReturnNode*s.
* *InsertNode*: inserts documents into a collection (given in its *collection*
  attribute). Will appear exactly once in a query that contains an *INSERT* statement.
* *RemoveNode*: removes documents from a collection (given in its *collection*
  attribute). Will appear exactly once in a query that contains a *REMOVE* statement.
* *ReplaceNode*: replaces documents in a collection (given in its *collection*
  attribute). Will appear exactly once in a query that contains a *REPLACE* statement.
* *UpdateNode*: updates documents in a collection (given in its *collection*
  attribute). Will appear exactly once in a query that contains an *UPDATE* statement.
* *UpsertNode*: upserts documents in a collection (given in its *collection*
  attribute). Will appear exactly once in a query that contains an *UPSERT* statement.
* *NoResultsNode*: will be inserted if *FILTER* statements turn out to be never
  satisfiable. The *NoResultsNode* will pass an empty result set into the processing
  pipeline.

For queries in the cluster, the following nodes may appear in execution plans:

* *SingleRemoteOperationNode*: used on a coordinator to directly work with a single
  document on a DB-Server that was referenced by its `_key`.
* *ScatterNode*: used on a coordinator to fan-out data to one or multiple shards.
* *GatherNode*: used on a coordinator to aggregate results from one or many shards
  into a combined stream of results.
* *DistributeNode*: used on a coordinator to fan-out data to one or multiple shards,
  taking into account a collection's shard key.
* *RemoteNode*: a *RemoteNode* will perform communication with another ArangoDB
  instances in the cluster. For example, the cluster coordinator will need to
  communicate with other servers to fetch the actual data from the shards. It
  will do so via *RemoteNode*s. The data servers themselves might again pull
  further data from the coordinator, and thus might also employ *RemoteNode*s.
  So, all of the above cluster relevant nodes will be accompanied by a *RemoteNode*.


List of optimizer rules
-----------------------

The following optimizer rules may appear in the `rules` attribute of a plan:

* `move-calculations-up`: will appear if a *CalculationNode* was moved up in a plan.
  The intention of this rule is to move calculations up in the processing pipeline
  as far as possible (ideally out of enumerations) so they are not executed in loops
  if not required. It is also quite common that this rule enables further optimizations
  to kick in.
* `move-filters-up`: will appear if a *FilterNode* was moved up in a plan. The
  intention of this rule is to move filters up in the processing pipeline as far
  as possible (ideally out of inner loops) so they filter results as early as possible.
* `sort-in-values`: will appear when the values used as right-hand side of an `IN`
  operator will be pre-sorted using an extra function call. Pre-sorting the comparison
  array allows using a binary search in-list lookup with a logarithmic complexity instead
  of the default linear complexity in-list lookup.
* `remove-unnecessary-filters`: will appear if a *FilterNode* was removed or replaced.
  *FilterNode*s whose filter condition will always evaluate to *true* will be
  removed from the plan, whereas *FilterNode* that will never let any results pass
  will be replaced with a *NoResultsNode*.
* `remove-redundant-calculations`: will appear if redundant calculations (expressions
  with the exact same result) were found in the query. The optimizer rule will then
  replace references to the redundant expressions with a single reference, allowing
  other optimizer rules to remove the then-unneeded *CalculationNode*s.
* `remove-unnecessary-calculations`: will appear if *CalculationNode*s were removed
  from the query. The rule will removed all calculations whose result is not
  referenced in the query (note that this may be a consequence of applying other
  optimizations).
* `remove-redundant-sorts`: will appear if multiple *SORT* statements can be merged
  into fewer sorts.
* `interchange-adjacent-enumerations`: will appear if a query contains multiple
  *FOR* statements whose order were permuted. Permutation of *FOR* statements is
  performed because it may enable further optimizations by other rules.
* `remove-collect-variables`: will appear if an *INTO* clause was removed from a *COLLECT*
  statement because the result of *INTO* is not used. May also appear if a result
  of a *COLLECT* statement's *AGGREGATE* variables is not used.
* `propagate-constant-attributes`: will appear when a constant value was inserted
  into a filter condition, replacing a dynamic attribute value.
* `replace-or-with-in`: will appear if multiple *OR*-combined equality conditions
  on the same variable or attribute were replaced with an *IN* condition.
* `remove-redundant-or`: will appear if multiple *OR* conditions for the same variable
  or attribute were combined into a single condition.
* `use-indexes`: will appear when an index is used to iterate over a collection.
  As a consequence, an *EnumerateCollectionNode* was replaced with an
  *IndexNode* in the plan.
* `remove-filter-covered-by-index`: will appear if a *FilterNode* was removed or replaced
  because the filter condition is already covered by an *IndexNode*.
* `remove-filter-covered-by-traversal`: will appear if a *FilterNode* was removed or replaced
  because the filter condition is already covered by an *TraversalNode*.
* `use-index-for-sort`: will appear if an index can be used to avoid a *SORT*
  operation. If the rule was applied, a *SortNode* was removed from the plan.
* `move-calculations-down`: will appear if a *CalculationNode* was moved down in a plan.
  The intention of this rule is to move calculations down in the processing pipeline
  as far as possible (below *FILTER*, *LIMIT* and *SUBQUERY* nodes) so they are executed
  as late as possible and not before their results are required.
* `patch-update-statements`: will appear if an *UpdateNode* or *ReplaceNode* was patched
  to not buffer its input completely, but to process it in smaller batches. The rule will
  fire for an *UPDATE* or *REPLACE* query that is fed by a full collection scan or an index
  scan only, and that does not use any other collections, indexes, subqueries or traversals.
* `optimize-traversals`: will appear if either the edge or path output variable in an
  AQL traversal was optimized away, or if a *FILTER* condition from the query was moved
  in the *TraversalNode* for early pruning of results.
* `inline-subqueries`: will appear when a subquery was pulled out in its surrounding scope,
  e.g. `FOR x IN (FOR y IN collection FILTER y.value >= 5 RETURN y.test) RETURN x.a`
  would become `FOR tmp IN collection FILTER tmp.value >= 5 LET x = tmp.test RETURN x.a`
* `geo-index-optimizer`: will appear when a geo index is utilized.
* `replace-function-with-index`: will appear when a deprecated index function such as
   `FULLTEXT`, `NEAR`, `WITHIN` or `WITHIN_RECTANGLE` is replaced with a regular
   subquery.
* `fuse-filters`: will appear if the optimizer merges adjacent FILTER nodes together into
   a single FILTER node
* `simplify-conditions`: will appear if the optimizer replaces parts in a CalculationNode's
   expression with simpler expressions
* `remove-sort-rand`: will appear when a *SORT RAND()* expression is removed by
  moving the random iteration into an *EnumerateCollectionNode*. This optimizer rule
  is specific for the MMFiles storage engine.
* `reduce-extraction-to-projection`: will appear when an *EnumerationCollectionNode* or
  an *IndexNode* that would have extracted an entire document was modified to return
  only a projection of each document. Projections are limited to at most 5 different
  document attributes. This optimizer rule is specific for the RocksDB storage engine.
* `optimize-subqueries`: will appear when optimizations are applied to a subquery. The
  optimizer rule will add a *LIMIT* statement to qualifying subqueries to make them
  return less data. Another optimization performed by this rule is to modify the result
  value of subqueries in case only the number of subquery results is checked later.
  This saves copying the document data from the subquery to the outer scope and may
  enable follow-up optimizations.
* `sort-limit`: will appear when a *SortNode* is followed by a *LimitNode* with no
  intervening nodes that may change the element count (e.g. a *FilterNode* which
  could not be moved before the sort, or a source node like *EnumerateCollectionNode*).
  This is used to make the *SortNode* aware of the limit and offset from the *LimitNode*
  to enable some optimizations internal to the *SortNode* which allow for reduced
  memory usage and and in many cases, improved sorting speed. The optimizer may
  choose not to apply the rule if it decides that it will offer little or no benefit.
  In particular it will not apply the rule if the input size is very small or if
  the output from the `LimitNode` is similar in size to the input. In exceptionally rare
  cases, this rule could result in some small slowdown. If observed, one can
  disable the rule for the affected query at the cost of increased memory usage.

The following optimizer rules may appear in the `rules` attribute of cluster plans:

* `optimize-cluster-single-document-operations`: it may appear if you directly reference
  a document by its `_key`; in this case no AQL will be executed on the DB-Servers, instead
  the coordinator will directly work with the documents on the DB-Servers.
* `distribute-in-cluster`: will appear when query parts get distributed in a cluster.
  This is not an optimization rule, and it cannot be turned off.
* `scatter-in-cluster`: will appear when scatter, gather, and remote nodes are inserted
  into a distributed query. This is not an optimization rule, and it cannot be turned off.
* `distribute-filtercalc-to-cluster`: will appear when filters are moved up in a
  distributed execution plan. Filters are moved as far up in the plan as possible to
  make result sets as small as possible as early as possible.
* `distribute-sort-to-cluster`: will appear if sorts are moved up in a distributed query.
  Sorts are moved as far up in the plan as possible to make result sets as small as possible
  as early as possible.
* `remove-unnecessary-remote-scatter`: will appear if a RemoteNode is followed by a
  ScatterNode, and the ScatterNode is only followed by calculations or the SingletonNode.
  In this case, there is no need to distribute the calculation, and it will be handled
  centrally.
* `undistribute-remove-after-enum-coll`: will appear if a RemoveNode can be pushed into
  the same query part that enumerates over the documents of a collection. This saves
  inter-cluster roundtrips between the EnumerateCollectionNode and the RemoveNode.
* `collect-in-cluster`: will appear when a *CollectNode* on a coordinator is accompanied
  by extra *CollectNode*s on the database servers, which will do the heavy processing and
  allow the *CollectNode* on the coordinator to a light-weight aggregation only.
* `restrict-to-single-shard`: will appear if a collection operation (IndexNode or a
  data-modification node) will only affect a single shard, and the operation can be
  restricted to the single shard and is not applied for all shards. This optimization
  can be applied for queries that access a collection only once in the query, and that
  do not use traversals, shortest path queries and that do not access collection data
  dynamically using the `DOCUMENT`, `FULLTEXT`, `NEAR` or `WITHIN` AQL functions.
  Additionally, the optimizer will only pull off this optimization if can safely
  determine the values of all the collection's shard keys from the query, and when the
  shard keys are covered by a single index (this is always true if the shard key is
  the default `_key`).
* `smart-joins`: will appear when the query optimizer can reduce an inter-node join
  to a server-local join. This rule is only active in the *Enterprise Edition* of
  ArangoDB, and will only be employed when joining two collections with identical
  sharding setup via their shard keys.

Note that some rules may appear multiple times in the list, with number suffixes.
This is due to the same rule being applied multiple times, at different positions
in the optimizer pipeline.

### Additional optimizations applied

If a query iterates over a collection (for filtering or counting) but does not need
the actual document values later, the optimizer can apply a "scan-only" optimization
for *EnumerateCollectionNode*s and *IndexNode*s. In this case, it will not build up
a result with the document data at all, which may reduce work significantly especially
with the RocksDB storage engine. In case the document data is actually not needed
later on, it may be sensible to remove it from query strings so the optimizer can
apply the optimization.

If the optimization is applied, it will show up as "scan only" in an AQL
query's execution plan for an *EnumerateCollectionNode* or an *IndexNode*.


Additionally, the optimizer can apply an "index-only" optimization for AQL queries that
can satisfy the retrieval of all required document attributes directly from an index.

This optimization will be triggered for the RocksDB engine if an index is used
that covers all required attributes of the document used later on in the query.
If applied, it will save retrieving the actual document data (which would require
an extra lookup in RocksDB), but will instead build the document data solely
from the index values found. It will only be applied when using up to 5 attributes
from the document, and only if the rest of the document data is not used later
on in the query.

The optimization is currently available for the RocksDB engine for the index types
primary, edge, hash, skiplist and persistent.

If the optimization is applied, it will show up as "index only" in an AQL
query's execution plan for an *IndexNode*.