Merge branch 'devel' of github.com:arangodb/arangodb into devel

This commit is contained in:
Michael Hackstein 2015-10-21 16:20:44 +02:00
commit fe455671d4
5 changed files with 51 additions and 38 deletions

View File

@ -7,8 +7,10 @@ single query. It will then calculate the costs for all plans and pick the plan w
lowest total cost. This resulting plan is considered to be the *optimal plan*, which is
then executed.
The optimizer is designed to only perform optimization if they are *safe*, in the
meaning that an optimization does not modify the result of a query.
The optimizer is designed to only perform optimizations if they are *safe*, in the
sense that an optimization should not modify the result of a query. A notable exception
to this is that the optimizer is allowed to change the order of results for queries that
do not explicitly specify how results should be sorted.
!SUBSECTION Execution plans
@ -69,16 +71,16 @@ the evaluation of an expression. The filters expression result (`i.value > 97`)
is calculated in the *CalculationNode* above the *FilterNode*.
Finally, all of this needs to be done for documents of collection `test`. This is
where the *IndexRangeNode* enters the game. It will use an index (thus its name)
where the *IndexNode* enters the game. It will use an index (thus its name)
to find certain documents in the collection and ship them down the pipeline in the
order required by `SORT i.value`. The *IndexRangeNode* itself has a *SingletonNode*
order required by `SORT i.value`. The *IndexNode* itself has a *SingletonNode*
as its input. The sole purpose of a *SingletonNode* node is to provide a single empty
document as input for other processing steps. It is always the end of the pipeline.
Here's a summary:
* SingletonNode: produces empty document as input for other processing steps.
* IndexRangeNode: iterates over the index on attribute `value` in collection `test`
in the order required by `SORT i.value`.
* SingletonNode: produces an empty document as input for other processing steps.
* IndexNode: iterates over the index on attribute `value` in collection `test`
in the order required by `SORT i.value`.
* CalculationNode: evaluates the result of the calculation `i.value > 97` to `true` or `false`
* FilterNode: only lets documents pass where above calculation returned `true`
* CalculationNode: calculates return value `i.value`
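The node pipeline summarized above can be pictured with a small sketch using generator functions. This is not ArangoDB's execution engine: the function names, the row shape, the in-memory `sortedIndex` array, and the final `returnNode` step are all assumptions made for illustration.

```javascript
// Each "node" pulls rows from its input and passes rows on to the next node.
function* singletonNode() {
  yield {}; // a single empty document that starts the pipeline
}

function* indexNode(input, sortedIndex) {
  // Hypothetical index: an array of documents already sorted by `value`.
  for (const row of input) {
    for (const doc of sortedIndex) {
      yield { ...row, i: doc };
    }
  }
}

function* calculationNode(input, outVariable, expression) {
  for (const row of input) {
    yield { ...row, [outVariable]: expression(row) };
  }
}

function* filterNode(input, inVariable) {
  for (const row of input) {
    if (row[inVariable]) {
      yield row;
    }
  }
}

function* returnNode(input, inVariable) {
  for (const row of input) {
    yield row[inVariable];
  }
}

// Stand-in for the index on `value` in collection `test`.
const sortedIndex = [{ value: 95 }, { value: 98 }, { value: 99 }];

const source = indexNode(singletonNode(), sortedIndex);
const condition = calculationNode(source, "condition", (row) => row.i.value > 97);
const filtered = filterNode(condition, "condition");
const output = calculationNode(filtered, "result", (row) => row.i.value);
const results = Array.from(returnNode(output, "result"));
```

Because the stand-in index is iterated in sorted order, no separate sorting step is needed in this sketch.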
@ -88,9 +90,9 @@ Here's a summary:
!SUBSUBSECTION Optimizer rules
Note that in the example, the optimizer has optimized the `SORT` statement away.
It can do it safely because there is a sorted index on `i.value`, which it has
picked in the *IndexRangeNode*. As the index values are iterated in sorted order
anyway, the extra *SortNode* would be redundant and was removed.
It can do it safely because there is a sorted skiplist index on `i.value`, which it has
picked in the *IndexNode*. As the index values are iterated over in sorted order
anyway, the extra *SortNode* would have been redundant and was removed.
Additionally, the optimizer has done more work to generate an execution plan that
avoids as many expensive operations as possible. Here is the list of optimizer rules
@ -115,7 +117,7 @@ Here is the meaning of these rules in context of this query:
* `remove-unnecessary-calculations`: removes *CalculationNode*s whose result values are
not used in the query. In the example this happens due to the `remove-redundant-calculations`
rule having made some calculations unnecessary.
* `use-index-range`: use an index to iterate over a collection instead of performing a
* `use-index`: use an index to iterate over a collection instead of performing a
full collection scan. In the example case this makes sense, as the index can be
used for filtering and sorting.
* `use-index-for-sort`: removes a `SORT` operation if it is already satisfied by
@ -268,8 +270,8 @@ The following execution node types will appear in the output of `explain`:
exactly one *SingletonNode* as its top node.
* *EnumerateCollectionNode*: enumeration over documents of a collection (given in
its *collection* attribute) without using an index.
* *IndexRangeNode*: enumeration over a specific index (given in its *index* attribute)
of a collection. The index range is specified in the *ranges* attribute of the node.
* *IndexNode*: enumeration over one or many indexes (given in its *indexes* attribute)
of a collection. The index ranges are specified in the *condition* attribute of the node.
* *EnumerateListNode*: enumeration over a list of (non-collection) values.
* *FilterNode*: only lets values pass that satisfy a filter condition. Will appear once
per *FILTER* statement.
@ -291,6 +293,8 @@ The following execution node types will appear in the output of `explain`:
attribute). Will appear exactly once in a query that contains a *REPLACE* statement.
* *UpdateNode*: updates documents in a collection (given in its *collection*
attribute). Will appear exactly once in a query that contains an *UPDATE* statement.
* *UpsertNode*: upserts documents in a collection (given in its *collection*
attribute). Will appear exactly once in a query that contains an *UPSERT* statement.
* *NoResultsNode*: will be inserted if *FILTER* statements turn out to be never
satisfiable. The *NoResultsNode* will pass an empty result set into the processing
pipeline.
@ -349,11 +353,11 @@ The following optimizer rules may appear in the `rules` attribute of a plan:
on the same variable or attribute were replaced with an *IN* condition.
* `remove-redundant-or`: will appear if multiple *OR* conditions for the same variable
or attribute were combined into a single condition.
* `use-index-range`: will appear if an index can be used to iterate over a collection.
* `use-indexes`: will appear when an index is used to iterate over a collection.
As a consequence, an *EnumerateCollectionNode* was replaced with an
*IndexRangeNode* in the plan.
*IndexNode* in the plan.
* `remove-filters-covered-by-index`: will appear if a *FilterNode* was removed or replaced
because the filter condition is already covered by an *IndexRangeNode*.
because the filter condition is already covered by an *IndexNode*.
* `use-index-for-sort`: will appear if an index can be used to avoid a *SORT*
operation. If the rule was applied, a *SortNode* was removed from the plan.
* `move-calculations-down`: will appear if a *CalculationNode* was moved down in a plan.

View File

@ -1,6 +1,6 @@
!SECTION How ArangoDB uses Indexes
In general, ArangoDB will use a single index per collection in a given query. AQL queries can
In most cases ArangoDB will use a single index per collection in a given query. AQL queries can
use more than one index per collection when multiple FILTER conditions are combined with a
logical `OR` and these can be covered by indexes. AQL queries will use a single index per
collection when FILTER conditions are combined with logical `AND`.
@ -23,11 +23,11 @@ multiple indexes the optimizer can choose from. The optimizer will then select a
indexes with the lowest estimated total cost. In general, the optimizer will pick the indexes with
the highest estimated selectivity.
Sparse indexes do not contain `null` values. If the optimizer cannot safely determine whether a
FILTER condition includes `null` values, it will not make use of a sparse index. The optimizer
policy is to produce correct results, regardless of whether or which index is used to satisfy
FILTER conditions. If it is unsure about whether using an index will violate the policy, it will
not make use of the index.
Sparse indexes may or may not be picked by the optimizer in a query. As sparse indexes do not contain
`null` values, they will not be used for queries if the optimizer cannot safely determine whether a
FILTER condition includes `null` values for the index attributes. The optimizer policy is to produce
correct results, regardless of whether or which index is used to satisfy FILTER conditions. If it is
unsure about whether using an index will violate the policy, it will not make use of the index.
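Why a sparse index cannot safely answer a condition that may match `null` can be shown with a toy model. The sketch below is only an illustration: `buildIndex` and the `Map`-based index are hypothetical and do not reflect how ArangoDB stores indexes.

```javascript
const documents = [
  { _key: "a", value: 1 },
  { _key: "b", value: null },
  { _key: "c" }, // attribute missing, treated like null in this sketch
];

// Hypothetical helper: map each attribute value to the keys of matching documents.
function buildIndex(docs, attribute, sparse) {
  const index = new Map();
  for (const doc of docs) {
    const key = doc[attribute] === undefined ? null : doc[attribute];
    if (sparse && key === null) {
      continue; // a sparse index leaves out null and missing values
    }
    if (!index.has(key)) {
      index.set(key, []);
    }
    index.get(key).push(doc._key);
  }
  return index;
}

const sparseIndex = buildIndex(documents, "value", true);
const fullIndex = buildIndex(documents, "value", false);

// A lookup for `null` through the sparse index silently misses documents.
const viaSparse = sparseIndex.get(null) || [];
const viaFull = fullIndex.get(null) || [];
```

In this model the sparse lookup returns no documents while the non-sparse lookup finds both `b` and `c`, which is the kind of incorrect result the optimizer policy is meant to rule out.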
!SUBSECTION Troubleshooting
@ -76,12 +76,15 @@ If any of the explain methods shows that a query is not using indexes, the follo
In these cases the queries should be rewritten so that only the index attribute is present on one side of
the operator, or additional filters and indexes should be used to restrict the amount of documents otherwise.
* the query optimizer will in general picking one index per collection in a query. It can pick more than
* the query optimizer will in general pick one index per collection in a query. It can pick more than
one index per collection if the FILTER condition contains multiple branches combined with logical `OR`.
For example, the following queries can use more than one index:
For example, the following queries can use indexes:
FOR doc IN collection FILTER doc.value1 == 42 || doc.value1 == 23 RETURN doc
FOR doc IN collection FILTER doc.value1 == 42 || doc.value2 == 23 RETURN doc
FOR doc IN collection FILTER doc.value1 < 42 || doc.value2 > 23 RETURN doc
In the latter case, the query optimizer can only use indexes if there are indexes present on both `value1`
and `value2`.
The two `OR`s in the first query will be converted to an `IN` list, and if there is a suitable index on
`value1`, it will be used. The second query requires two separate indexes on `value1` and `value2` and
will use them if present. The third query can use the indexes on `value1` and `value2` when they are
sorted.
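The `OR`-to-`IN` conversion mentioned for the first query can be sketched as follows. This is not the optimizer's actual code: `collapseOrToIn` and the shape of the condition objects are hypothetical, and only equality branches are modelled.

```javascript
// Hypothetical input: OR-ed equality branches such as
// doc.value1 == 42 || doc.value1 == 23.
function collapseOrToIn(branches) {
  // Group the compared values by attribute name.
  const byAttribute = new Map();
  for (const { attribute, value } of branches) {
    if (!byAttribute.has(attribute)) {
      byAttribute.set(attribute, []);
    }
    byAttribute.get(attribute).push(value);
  }
  // One IN condition per attribute; each would need its own index.
  return Array.from(byAttribute, ([attribute, values]) => ({
    attribute,
    operator: "IN",
    values,
  }));
}

const sameAttribute = collapseOrToIn([
  { attribute: "value1", value: 42 },
  { attribute: "value1", value: 23 },
]);

const twoAttributes = collapseOrToIn([
  { attribute: "value1", value: 42 },
  { attribute: "value2", value: 23 },
]);
```

Here `sameAttribute` holds a single `IN` condition on `value1`, while `twoAttributes` holds one condition per attribute, mirroring the need for two separate indexes in the second query.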

View File

@ -35,8 +35,8 @@ db.collection.document("<document-key>");
db._document("<document-id>");
```
As the primary index is a hash index, it cannot be used for non-equality range
queries or for sorting.
As the primary index is an unsorted hash index, it cannot be used for non-equality
range queries or for sorting.
The primary index of a collection cannot be dropped or changed, and there is no
mechanism to create user-defined primary indexes.
@ -64,11 +64,11 @@ db.collection.inEdges("<from-value>");
db.collection.inEdges("<to-value>");
```
The edges index is a hash index. It can be used for equality lookups only, but not for range
queries or for sorting. As edges indexes are automatically created for edge collections, it
is not possible to create user-defined edges indexes.
Internally, the edges index is implemented as a hash index. It can be used for equality
lookups, but not for range queries or for sorting. As edges indexes are automatically
created for edge collections, it is not possible to create user-defined edges indexes.
The edges index cannot be dropped or changed.
An edges index cannot be dropped or changed.
!SUBSECTION Hash Index
@ -120,7 +120,7 @@ Non-unique hash indexes have an amortized complexity of O(1) for insert, update,
removal operations. That means non-unique hash indexes can be used on attributes with
low cardinality.
If a hash index is created on an attribute that it is missing in all or many of the documents,
If a hash index is created on an attribute that is missing in all or many of the documents,
the behavior is as follows:
* if the index is sparse, the documents missing the attribute will not be indexed and not
@ -130,6 +130,9 @@ the behavior is as follows:
* if the index is non-sparse, the documents missing the attribute will be contained in the
index with a key value of `null`.
Hash indexes support indexing array values if the index attribute name is extended with
a <i>[\*]</i>.
!SUBSECTION Skiplist Index
@ -217,6 +220,9 @@ The different types of skiplist indexes have the following characteristics:
The operational amortized complexity for skiplist indexes is logarithmically correlated
with the number of documents in the index.
Skiplist indexes support indexing array values if the index attribute name is extended with
a <i>[\*]</i>.
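What indexing array values means can be illustrated with a toy expansion step. The helper below is hypothetical and is not ArangoDB's implementation; in particular, skipping documents whose attribute is not an array is an assumption of this sketch.

```javascript
// Hypothetical helper: produce one index entry per array member, as an
// index on "tags[*]" would conceptually do for the attribute `tags`.
function indexEntries(doc, attribute) {
  const values = doc[attribute];
  if (!Array.isArray(values)) {
    return []; // assumption: non-array values produce no entries here
  }
  return values.map((value) => ({ key: value, doc: doc._key }));
}

const entries = indexEntries({ _key: "post1", tags: ["db", "aql"] }, "tags");
```

Each member of `tags` becomes its own lookup key pointing back at the document, so a lookup for a single tag can find the document.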
!SUBSECTION Geo Index
@ -231,8 +237,8 @@ Th geo index provides operations to find documents with coordinates nearest to a
comparison coordinate, and to find documents with coordinates that are within a specifiable
radius around a comparison coordinate.
The geo index is used via dedicated functions in AQL or the simple queries, but will
not enabled for other types of queries or conditions.
The geo index is used via dedicated functions in AQL or the simple query functions,
but will not be used for other types of queries or conditions.
!SUBSECTION Fulltext Index

View File

@ -1035,7 +1035,7 @@ ArangoCollection.prototype.ensureSkiplist = function () {
////////////////////////////////////////////////////////////////////////////////
/// @brief ensures that a fulltext index exists
/// @startDocuBlock ensureIndex
/// @startDocuBlock ensureFulltextIndex
/// `collection.ensureIndex({ type: "fulltext", fields: [ "field" ], minLength: minLength })`
///
/// Creates a fulltext index on all documents on attribute *field*.

View File

@ -430,7 +430,7 @@ function ModelAnnotationSpec () {
var Model = FoxxModel.extend({});
jsonSchema = toJSONSchema("myname", Model);
assertEqual(jsonSchema.id, "myname");
assertEqual(jsonSchema.required, []);
assertEqual(jsonSchema.required, undefined);
assertEqual(jsonSchema.properties, {});
},
@ -450,7 +450,7 @@ function ModelAnnotationSpec () {
jsonSchema = toJSONSchema("myname", Model);
assertEqual(jsonSchema.id, "myname");
assertEqual(jsonSchema.required, []);
assertEqual(jsonSchema.required, undefined);
assertEqual(jsonSchema.properties.x.type, "string");
},