From 876b60d1098ed816af74d3fff2464319b63dab90 Mon Sep 17 00:00:00 2001 From: Jan Steemann Date: Wed, 21 Oct 2015 15:48:20 +0200 Subject: [PATCH 1/3] updated optimizer documentation --- Documentation/Books/Users/Aql/Optimizer.mdpp | 36 ++++++++++--------- .../IndexHandling/HowArangoDBUsesIndexes.mdpp | 23 ++++++------ 2 files changed, 33 insertions(+), 26 deletions(-) diff --git a/Documentation/Books/Users/Aql/Optimizer.mdpp b/Documentation/Books/Users/Aql/Optimizer.mdpp index 06c28b53bd..4012ff1cfc 100644 --- a/Documentation/Books/Users/Aql/Optimizer.mdpp +++ b/Documentation/Books/Users/Aql/Optimizer.mdpp @@ -7,8 +7,10 @@ single query. It will then calculate the costs for all plans and pick the plan w lowest total cost. This resulting plan is considered to be the *optimal plan*, which is then executed. -The optimizer is designed to only perform optimization if they are *safe*, in the -meaning that an optimization does not modify the result of a query. +The optimizer is designed to only perform optimizations if they are *safe*, in the +sense that an optimization should not modify the result of a query. A notable exception +to this is that the optimizer is allowed to change the order of results for queries that +do not explicitly specify how results should be sorted. !SUBSECTION Execution plans @@ -69,16 +71,16 @@ the evaluation of an expression. The filters expression result (`i.value > 97`) is calculated in the *CalculationNode* above the *FilterNode*. Finally, all of this needs to be done for documents of collection `test`. This is -where the *IndexRangeNode* enters the game. It will use an index (thus its name) +where the *IndexNode* enters the game. It will use an index (thus its name) to find certain documents in the collection and ship it down the pipeline in the -order required by `SORT i.value`. The *IndexRangeNode* itself has a *SingletonNode* +order required by `SORT i.value`. The *IndexNode* itself has a *SingletonNode* as its input. 
The sole purpose of a *SingletonNode* node is to provide a single empty document as input for other processing steps. It is always the end of the pipeline. Here's a summary: -* SingletonNode: produces empty document as input for other processing steps. -* IndexRangeNode: iterates over the index on attribute `value` in collection `test` - in the order required by `SORT i.value`. +* SingletonNode: produces an empty document as input for other processing steps. +* IndexNode: iterates over the index on attribute `value` in collection `test` + in the order required by `SORT i.value`. * CalculationNode: evaluates the result of the calculation `i.value > 97` to `true` or `false` * FilterNode: only lets documents pass where above calculation returned `true` * CalculationNode: calculates return value `i.value` @@ -88,9 +90,9 @@ Here's a summary: !SUBSUBSECTION Optimizer rules Note that in the example, the optimizer has optimized the `SORT` statement away. -It can do it safely because there is a sorted index on `i.value`, which it has -picked in the *IndexRangeNode*. As the index values are iterated in sorted order -anyway, the extra *SortNode* would be redundant and was removed. +It can do it safely because there is a sorted skiplist index on `i.value`, which it has +picked in the *IndexNode*. As the index values are iterated over in sorted order +anyway, the extra *SortNode* would have been redundant and was removed. Additionally, the optimizer has done more work to generate an execution plan that avoids as much expensive operations as possible. Here is the list of optimizer rules @@ -115,7 +117,7 @@ Here is the meaning of these rules in context of this query: * `remove-unnecessary-calculations`: removes *CalculationNode*s whose result values are not used in the query. In the example this happens due to the `remove-redundant-calculations` rule having made some calculations unnecessary. 
-* `use-index-range`: use an index to iterate over a collection instead of performing a +* `use-indexes`: use an index to iterate over a collection instead of performing a full collection scan. In the example case this makes sense, as the index can be used for filtering and sorting. * `use-index-for-sort`: removes a `SORT` operation if it is already satisfied by @@ -268,8 +270,8 @@ The following execution node types will appear in the output of `explain`: exactly one *SingletonNode* as its top node. * *EnumerateCollectionNode*: enumeration over documents of a collection (given in its *collection* attribute) without using an index. -* *IndexRangeNode*: enumeration over a specific index (given in its *index* attribute) - of a collection. The index range is specified in the *ranges* attribute of the node. +* *IndexNode*: enumeration over one or many indexes (given in its *indexes* attribute) + of a collection. The index ranges are specified in the *condition* attribute of the node. * *EnumerateListNode*: enumeration over a list of (non-collection) values. * *FilterNode*: only lets values pass that satisfy a filter condition. Will appear once per *FILTER* statement. @@ -291,6 +293,8 @@ The following execution node types will appear in the output of `explain`: attribute). Will appear exactly once in a query that contains a *REPLACE* statement. * *UpdateNode*: updates documents in a collection (given in its *collection* attribute). Will appear exactly once in a query that contains an *UPDATE* statement. +* *UpsertNode*: upserts documents in a collection (given in its *collection* + attribute). Will appear exactly once in a query that contains an *UPSERT* statement. * *NoResultsNode*: will be inserted if *FILTER* statements turn out to be never satisfiable. The *NoResultsNode* will pass an empty result set into the processing pipeline. 
@@ -349,11 +353,11 @@ The following optimizer rules may appear in the `rules` attribute of a plan: on the same variable or attribute were replaced with an *IN* condition. * `remove-redundant-or`: will appear if multiple *OR* conditions for the same variable or attribute were combined into a single condition. -* `use-index-range`: will appear if an index can be used to iterate over a collection. +* `use-indexes`: will appear when an index is used to iterate over a collection. As a consequence, an *EnumerateCollectionNode* was replaced with an - *IndexRangeNode* in the plan. + *IndexNode* in the plan. * `remove-filters-covered-by-index`: will appear if a *FilterNode* was removed or replaced - because the filter condition is already covered by an *IndexRangeNode*. + because the filter condition is already covered by an *IndexNode*. * `use-index-for-sort`: will appear if an index can be used to avoid a *SORT* operation. If the rule was applied, a *SortNode* was removed from the plan. * `move-calculations-down`: will appear if a *CalculationNode* was moved down in a plan. diff --git a/Documentation/Books/Users/IndexHandling/HowArangoDBUsesIndexes.mdpp b/Documentation/Books/Users/IndexHandling/HowArangoDBUsesIndexes.mdpp index 0792e321db..01b40a4dae 100644 --- a/Documentation/Books/Users/IndexHandling/HowArangoDBUsesIndexes.mdpp +++ b/Documentation/Books/Users/IndexHandling/HowArangoDBUsesIndexes.mdpp @@ -1,6 +1,6 @@ !SECTION How ArangoDB uses Indexes -In general, ArangoDB will use a single index per collection in a given query. AQL queries can +In most cases ArangoDB will use a single index per collection in a given query. AQL queries can use more than one index per collection when multiple FILTER conditions are combined with a logical `OR` and these can be covered by indexes. AQL queries will use a single index per collection when FILTER conditions are combined with logical `AND`. @@ -23,11 +23,11 @@ multiple indexes the optimizer can choose from. 
The optimizer will then select a indexes with the lowest estimated total cost. In general, the optimizer will pick the indexes with the highest estimated selectivity. -Sparse indexes do not contain `null` values. If the optimizer cannot safely determine whether a -FILTER condition includes `null` values, it will not make use of a sparse index. The optimizer -policy is to produce correct results, regardless of whether or which index is used to satisfy -FILTER conditions. If it is unsure about whether using an index will violate the policy, it will -not make use of the index. +Sparse indexes may or may not be picked by the optimizer in a query. As sparse indexes do not contain +`null` values, they will not be used for queries if the optimizer cannot safely determine whether a +FILTER condition includes `null` values for the index attributes. The optimizer policy is to produce +correct results, regardless of whether or which index is used to satisfy FILTER conditions. If it is +unsure about whether using an index will violate the policy, it will not make use of the index. !SUBSECTION Troubleshooting @@ -76,12 +76,15 @@ If any of the explain methods shows that a query is not using indexes, the follo In these cases the queries should be rewritten so that only the index attribute is present on one side of the operator, or additional filters and indexes should be used to restrict the amount of documents otherwise. -* the query optimizer will in general picking one index per collection in a query. It can pick more than +* the query optimizer will in general pick one index per collection in a query. It can pick more than one index per collection if the FILTER condition contains multiple branches combined with logical `OR`. 
- For example, the following queries can use more than one index: + For example, the following queries can use indexes: FOR doc IN collection FILTER doc.value1 == 42 || doc.value1 == 23 RETURN doc FOR doc IN collection FILTER doc.value1 == 42 || doc.value2 == 23 RETURN doc + FOR doc IN collection FILTER doc.value1 < 42 || doc.value2 > 23 RETURN doc - In the latter case, the query optimizer can only use indexes if there are indexes present on both `value1` - and `value2`. + The two equality conditions combined with `OR` in the first query will be converted to an `IN` list, and if there is a suitable index on + `value1`, it will be used. The second query requires two separate indexes on `value1` and `value2` and + will use them if present. The third query can use the indexes on `value1` and `value2` if they are + sorted indexes. From 4eec43fb28f95222a364b665ace0c9cf49d10bc8 Mon Sep 17 00:00:00 2001 From: Alan Plum Date: Wed, 21 Oct 2015 16:13:46 +0200 Subject: [PATCH 2/3] JSONSchema is so much fun --- js/server/tests/shell-foxx-model.js | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/js/server/tests/shell-foxx-model.js b/js/server/tests/shell-foxx-model.js index b5aab16dbb..39a75161aa 100644 --- a/js/server/tests/shell-foxx-model.js +++ b/js/server/tests/shell-foxx-model.js @@ -430,7 +430,7 @@ function ModelAnnotationSpec () { var Model = FoxxModel.extend({}); jsonSchema = toJSONSchema("myname", Model); assertEqual(jsonSchema.id, "myname"); - assertEqual(jsonSchema.required, []); + assertEqual(jsonSchema.required, undefined); assertEqual(jsonSchema.properties, {}); }, @@ -450,7 +450,7 @@ function ModelAnnotationSpec () { jsonSchema = toJSONSchema("myname", Model); assertEqual(jsonSchema.id, "myname"); - assertEqual(jsonSchema.required, []); + assertEqual(jsonSchema.required, undefined); assertEqual(jsonSchema.properties.x.type, "string"); }, From becac4b4b17a71dc5d7522866a297ae3b81456a8 Mon Sep 17 00:00:00 2001 From: Jan Steemann Date: Wed, 21 Oct 2015 16:14:21 +0200 Subject: [PATCH 
3/3] documentation fixes --- .../Users/IndexHandling/IndexBasics.mdpp | 24 ++++++++++++------- .../modules/org/arangodb/arango-collection.js | 2 +- 2 files changed, 16 insertions(+), 10 deletions(-) diff --git a/Documentation/Books/Users/IndexHandling/IndexBasics.mdpp b/Documentation/Books/Users/IndexHandling/IndexBasics.mdpp index 3c82b517d5..c19a3a4de6 100644 --- a/Documentation/Books/Users/IndexHandling/IndexBasics.mdpp +++ b/Documentation/Books/Users/IndexHandling/IndexBasics.mdpp @@ -35,8 +35,8 @@ db.collection.document(""); db._document(""); ``` -As the primary index is a hash index, it cannot be used for non-equality range -queries or for sorting. +As the primary index is an unsorted hash index, it cannot be used for non-equality +range queries or for sorting. The primary index of a collection cannot be dropped or changed, and there is no mechanism to create user-defined primary indexes. @@ -64,11 +64,11 @@ db.collection.inEdges(""); db.collection.inEdges(""); ``` -The edges index is a hash index. It can be used for equality lookups only, but not for range -queries or for sorting. As edges indexes are automatically created for edge collections, it -is not possible to create user-defined edges indexes. +Internally, the edges index is implemented as a hash index. It can be used for equality +lookups, but not for range queries or for sorting. As edges indexes are automatically +created for edge collections, it is not possible to create user-defined edges indexes. -The edges index cannot be dropped or changed. +An edges index cannot be dropped or changed. !SUBSECTION Hash Index @@ -120,7 +120,7 @@ Non-unique hash indexes have an amortized complexity of O(1) for insert, update, removal operations. That means non-unique hash indexes can be used on attributes with low cardinality. 
-If a hash index is created on an attribute that it is missing in all or many of the documents, +If a hash index is created on an attribute that is missing in all or many of the documents, the behavior is as follows: * if the index is sparse, the documents missing the attribute will not be indexed and not @@ -130,6 +130,9 @@ the behavior is as follows: * if the index is non-sparse, the documents missing the attribute will be contained in the index with a key value of `null`. +Hash indexes support indexing array values if the index attribute name is extended with +a `[*]`. + !SUBSECTION Skiplist Index @@ -217,6 +220,9 @@ The different types of skiplist indexes have the following characteristics: The operational amortized complexity for skiplist indexes is logarithmically correlated with the number of documents in the index. +Skiplist indexes support indexing array values if the index attribute name is extended with +a `[*]`. + !SUBSECTION Geo Index @@ -231,8 +237,8 @@ Th geo index provides operations to find documents with coordinates nearest to a comparison coordinate, and to find documents with coordinates that are within a specifiable radius around a comparison coordinate. -The geo index is used via dedicated functions in AQL or the simple queries, but will -not enabled for other types of queries or conditions. +The geo index is used via dedicated functions in AQL or the simple queries functions, +but will not be used for other types of queries or conditions. 
!SUBSECTION Fulltext Index diff --git a/js/server/modules/org/arangodb/arango-collection.js b/js/server/modules/org/arangodb/arango-collection.js index 6b72269a5b..dccf23a58a 100644 --- a/js/server/modules/org/arangodb/arango-collection.js +++ b/js/server/modules/org/arangodb/arango-collection.js @@ -1035,7 +1035,7 @@ ArangoCollection.prototype.ensureSkiplist = function () { //////////////////////////////////////////////////////////////////////////////// /// @brief ensures that a fulltext index exists -/// @startDocuBlock ensureIndex +/// @startDocuBlock ensureFulltextIndex /// `collection.ensureIndex({ type: "fulltext", fields: [ "field" ], minLength: minLength })` /// /// Creates a fulltext index on all documents on attribute *field*.