add Section for disabling the optimizer, fix grammer.
Willi Goesgens 2014-11-04 13:35:24 +01:00
parent f76cba14bb
commit a3fb94121e
1 changed files with 50 additions and 29 deletions

@ -41,11 +41,11 @@ let's take a closer look at the results step by step.
!SUBSUBSECTION Execution nodes !SUBSUBSECTION Execution nodes
In general, an execution plan can be considered to be a pipeline as processing steps. In general, an execution plan can be considered to be a pipeline of processing steps.
Each processing step is carried out by a so-called *execution node* Each processing step is carried out by a so-called *execution node*
The `nodes` attribute of the `explain` result contains a these execution nodes in The `nodes` attribute of the `explain` result contains these *execution nodes* in
the execution plan. The output is still very verbose, so here's a shorted form of it: the *execution plan*. The output is still very verbose, so here's a shorted form of it:
``` ```
arangosh> stmt.explain().plan.nodes.map(function (node) { return node.type; }); arangosh> stmt.explain().plan.nodes.map(function (node) { return node.type; });
[ [
@ -66,28 +66,30 @@ When a plan is executed, the query execution engine will start with the node at
the bottom of the list (i.e. the *ReturnNode*). the bottom of the list (i.e. the *ReturnNode*).
The *ReturnNode*'s purpose is to return data to the caller. It does not produce The *ReturnNode*'s purpose is to return data to the caller. It does not produce
data itself, so it will ask the node above itself, that is the *CalculationNode*. data itself, so it will ask the node above itself, this is the *CalculationNode*
in our example.
*CalculationNode*s are responsible for evaluating arbitrary expressions. In our *CalculationNode*s are responsible for evaluating arbitrary expressions. In our
example query, the *CalculationNode* will evaluate the value of `i.value`, which example query, the *CalculationNode* will evaluate the value of `i.value`, which
is needed by the *ReturnNode*. The calculation will be applied for all data the is needed by the *ReturnNode*. The calculation will be applied for all data the
*CalculationNode* gets from the node above it, i.e. the *FilterNode*. *CalculationNode* gets from the node above it, in our example the *FilterNode*.
*FilterNode*s will only let certain documents pass. Normally, filters are based on *FilterNode*s will only let certain documents pass. Normally, filters are based on
evaluationg an expression, and so it is in the example case. The filter expression the evaluation of an expression. The filters expression result (`i.value > 97`)
result is calculated in the *CalculationNode* above the *FilterNode*. is calculated in the *CalculationNode* above the *FilterNode*.
Finally, all of this needs to be done for documents of collection `test`. This is Finally, all of this needs to be done for documents of collection `test`. This is
where the *IndexRangeNode* comes into play. It will use an index (thus the name) where the *IndexRangeNode* enters the game. It will use an index (thus its name)
to find certain documents in the collection and ship it down the pipeline. The to find certain documents in the collection and ship it down the pipeline in the
*IndexRangeNode* itself has a *SingletonNode* as its input. The sole purpose of a order required by `SORT i.value`. The *IndexRangeNode* itself has a *SingletonNode*
*SingletonNode* node is to provide a single empty document as input for other as its input. The sole purpose of a *SingletonNode* node is to provide a single empty
processing steps. It is always the end of the pipeline. document as input for other processing steps. It is always the end of the pipeline.
Here's a summary: Here's a summary:
* SingletonNode: produces empty document as input for other processing steps. * SingletonNode: produces empty document as input for other processing steps.
* IndexRangeNode: iterates over the index on attribute `value` in collection `test` * IndexRangeNode: iterates over the index on attribute `value` in collection `test`
* CalculationNode: calculates condition value `i.value > 97` in the order required by `SORT i.value`.
* FilterNode: only lets documents pass that satisfy condition `i.value > 97` * CalculationNode: evaluates the result of the calculation `i.value > 97` to `true` or `false`
* FilterNode: only lets documents pass where above calculation returned `true`
* CalculationNode: calculates return value `i.value` * CalculationNode: calculates return value `i.value`
* ReturnNode: returns data to the caller * ReturnNode: returns data to the caller
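The node chain summarized above can be modelled in a few lines of plain JavaScript. This is only a rough sketch of the pull-based idea, not ArangoDB's implementation: the function names, the `sortedIndex` array and the `condition` / `result` variable names are all made up for illustration, and the real *IndexRangeNode* restricts the scanned range instead of visiting every index entry.

```javascript
// Simplified model of a pull-based pipeline. All names are illustrative
// stand-ins and do not correspond to ArangoDB internals.

// Produces a single empty document as input for the other steps.
function* singletonNode() {
  yield {};
}

// Walks a (hypothetical) sorted index and emits one row per index entry.
function* indexRangeNode(input, index) {
  for (const row of input) {
    for (const value of index) {
      yield Object.assign({}, row, { i: { value: value } });
    }
  }
}

// Evaluates an expression per row and stores the result under outName.
function* calculationNode(input, outName, expression) {
  for (const row of input) {
    yield Object.assign({}, row, { [outName]: expression(row) });
  }
}

// Lets only those rows pass whose precomputed condition is true.
function* filterNode(input, inName) {
  for (const row of input) {
    if (row[inName] === true) {
      yield row;
    }
  }
}

// Pulls rows from the node above it and hands the values to the caller.
function returnNode(input, inName) {
  const out = [];
  for (const row of input) {
    out.push(row[inName]);
  }
  return out;
}

const sortedIndex = [95, 96, 97, 98, 99]; // made-up index contents

const results = returnNode(
  calculationNode(
    filterNode(
      calculationNode(
        indexRangeNode(singletonNode(), sortedIndex),
        "condition", (row) => row.i.value > 97),
      "condition"),
    "result", (row) => row.i.value),
  "result");

console.log(results); // [ 98, 99 ]
```

Because every step is a generator, nothing is computed until the last step asks for rows, which mirrors how the *ReturnNode* requests data from the node above it.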
@ -97,10 +99,10 @@ Here's a summary:
Note that in the example, the optimizer has optimized the `SORT` statement away. Note that in the example, the optimizer has optimized the `SORT` statement away.
It can do it safely because there is a sorted index on `i.value`, which it has It can do it safely because there is a sorted index on `i.value`, which it has
picked in the *IndexRangeNode*. As the index values are iterated in sorted order picked in the *IndexRangeNode*. As the index values are iterated in sorted order
anyway, the extra `SORT` would be redundant and was removed. anyway, the extra *SortNode* would be redundant and was removed.
Additionally, the optimizer has done more work to generate an execution plan that Additionally, the optimizer has done more work to generate an execution plan that
avoid as much expensive operations as possible. Here is a list of optimizer rules avoids as much expensive operations as possible. Here is the list of optimizer rules
that were applied to the plan: that were applied to the plan:
arangosh> stmt.explain().plan.rules; arangosh> stmt.explain().plan.rules;
@ -115,10 +117,7 @@ arangosh> stmt.explain().plan.rules;
"use-index-for-sort" "use-index-for-sort"
] ]
*Note that the list of optimizer rules might change if new rules are added to the Here is the meaning of these rules in context of this query:
optimizer or the existing rules get modified.*
Here's what the rules mean in context of this query:
* `move-calculations-up`: moves a *CalculationNode* as far up in the processing pipeline * `move-calculations-up`: moves a *CalculationNode* as far up in the processing pipeline
as possible as possible
* `move-filters-up`: moves a *FilterNode* as far up in the processing pipeline as * `move-filters-up`: moves a *FilterNode* as far up in the processing pipeline as
@ -128,7 +127,7 @@ Here's what the rules mean in context of this query:
is calculated multiple times, but each calculation inside a loop iteration would is calculated multiple times, but each calculation inside a loop iteration would
produce the same value. Therefore, the expression result is shared by several nodes. produce the same value. Therefore, the expression result is shared by several nodes.
* `remove-unnecessary-calculations`: removes *CalculationNode*s whose result values are * `remove-unnecessary-calculations`: removes *CalculationNode*s whose result values are
not used in the query. In the example this is due to the `remove-redundant-calculations` not used in the query. In the example this happenes due to the `remove-redundant-calculations`
rule having made some calculations unnecessary. rule having made some calculations unnecessary.
* `use-index-range`: use an index to iterate over a collection instead of performing a * `use-index-range`: use an index to iterate over a collection instead of performing a
full collection scan. In the example case this makes sense, as the index can be full collection scan. In the example case this makes sense, as the index can be
@ -169,7 +168,7 @@ can be ignored by end users in most cases.
!SUBSUBSECTION Cost of a query !SUBSUBSECTION Cost of a query
For each plan the optimizer generates, it will calculate a total cost. The plan For each plan the optimizer generates, it will calculate the total cost. The plan
with the lowest total cost is considered to be the optimal plan. Costs are with the lowest total cost is considered to be the optimal plan. Costs are
estimates only, as the actual execution costs are unknown to the optimizer. estimates only, as the actual execution costs are unknown to the optimizer.
Costs are calculated based on heuristics that are hard-coded into execution nodes. Costs are calculated based on heuristics that are hard-coded into execution nodes.
@ -196,10 +195,31 @@ arangosh> stmt.explain({ allPlans: true });
} }
``` ```
!SUBSECTION Retrieving the plan as it was generated by the parser / lexer
To retrieve the plan which closely matches your query, you may turn off most
optimization rules (i.e. cluster rules cannot be disabled if you're running
the explain on a cluster coordinator) set the option `rules` to `-all`:
This will return an unoptimized plan in the `plan`:
```
arangosh> stmt.explain({ rules: [ '-all'] });
{
"plan" : {
...
},
...
}
```
Note that some optimisations are already done at parse time (i.e. evaluate simple constant
calculation as 1 + 1)
!SUBSECTION Warnings !SUBSECTION Warnings
For some queries, the optimizer might produce warnings. These will be returned in For some queries, the optimizer may produce warnings. These will be returned in
the `warnings` attribute of the `explain` result: the `warnings` attribute of the `explain` result:
``` ```
@ -213,7 +233,7 @@ arangosh> stmt.explain().warnings;
] ]
``` ```
There is an upper bound on the number of variables a query might produce. If that There is an upper bound on the number of warning a query may produce. If that
bound is reached, no further warnings will be returned. bound is reached, no further warnings will be returned.
@ -229,7 +249,7 @@ The following execution node types will appear in the output of `explain`:
* *IndexRangeNode*: enumeration over a specific index (given in its *index* attribute) * *IndexRangeNode*: enumeration over a specific index (given in its *index* attribute)
of a collection. The index range is specified in the *ranges* attribute of the node. of a collection. The index range is specified in the *ranges* attribute of the node.
* *EnumerateListNode*: enumeration over a list of (non-collection) values. * *EnumerateListNode*: enumeration over a list of (non-collection) values.
* *FilterNode*: only lets values pass that satisfy a fill condition. Will appear once * *FilterNode*: only lets values pass that satisfy a filter condition. Will appear once
per *FILTER* statement. per *FILTER* statement.
* *LimitNode*: limits the number of results passed to other processing steps. Will * *LimitNode*: limits the number of results passed to other processing steps. Will
appear once per *LIMIT* statement. appear once per *LIMIT* statement.
@ -265,6 +285,7 @@ For queries in the cluster, the following nodes may appear in execution plans:
communicate with other servers to fetch the actual data from the shards. It communicate with other servers to fetch the actual data from the shards. It
will do so via *RemoteNode*s. The data servers themselves might again pull will do so via *RemoteNode*s. The data servers themselves might again pull
further data from the coordinator, and thus might also employ *RemoteNode*s. further data from the coordinator, and thus might also employ *RemoteNode*s.
So, all of the above cluster relevant nodes will be accompanied by a *RemoteNode*.
!SUBSECTION List of optimizer rules !SUBSECTION List of optimizer rules
@ -282,7 +303,7 @@ The following optimizer rules may appear in the `rules` attribute of a plan:
removed from the plan, whereas *FilterNode* that will never let any results pass removed from the plan, whereas *FilterNode* that will never let any results pass
will be replaced with a *NoResultsNode*. will be replaced with a *NoResultsNode*.
* `remove-redundant-calculations`: will appear if redundant calculations (expressions * `remove-redundant-calculations`: will appear if redundant calculations (expressions
with the exact same result) are found in the query. The optimizer rule will then with the exact same result) were found in the query. The optimizer rule will then
replace references to the redundant expressions with a single reference, allowing replace references to the redundant expressions with a single reference, allowing
other optimizer rules to remove the then-unneeded *CalculationNode*s. other optimizer rules to remove the then-unneeded *CalculationNode*s.
* `remove-unnecessary-calculations`: will appear if *CalculationNode*s were removed * `remove-unnecessary-calculations`: will appear if *CalculationNode*s were removed
@ -292,13 +313,13 @@ The following optimizer rules may appear in the `rules` attribute of a plan:
* `remove-redundant-sorts`: will appear if multiple *SORT* statements can be merged * `remove-redundant-sorts`: will appear if multiple *SORT* statements can be merged
into fewer sorts. into fewer sorts.
* `interchange-adjacent-enumerations`: will appear if a query contains multiple * `interchange-adjacent-enumerations`: will appear if a query contains multiple
*FOR* statements whose order was permuted. Permutation of *FOR* statements is *FOR* statements whose order were permuted. Permutation of *FOR* statements is
performed because it may enable further optimizations by other rules. performed because it may enable further optimizations by other rules.
* `use-index-range`: will appear if an index can be used to iterate over a collection. * `use-index-range`: will appear if an index can be used to iterate over a collection.
As a consequence, an *EnumerateCollectionNode* will have been replaced with an As a consequence, an *EnumerateCollectionNode* was replaced with an
*IndexRangeNode* in the plan. *IndexRangeNode* in the plan.
* `use-index-for-sort`: will appear if an index can be used to avoid a *SORT* * `use-index-for-sort`: will appear if an index can be used to avoid a *SORT*
operation. If the rule was applied, a *SortNode* will have been removed from the operation. If the rule was applied, a *SortNode* has been removed from the
plan. plan.
The following optimizer rules may appear in the `rules` attribute of cluster plans: The following optimizer rules may appear in the `rules` attribute of cluster plans: