Merge branch 'devel' of github.com:arangodb/arangodb into devel

2015-10-21 16:20:44 +02:00 · fe455671d4
parent 9fd375d684 e187ca26ec
commit fe455671d4
5 changed files with 51 additions and 38 deletions
--- a/Documentation/Books/Users/Aql/Optimizer.mdpp
+++ b/Documentation/Books/Users/Aql/Optimizer.mdpp
@@ -7,8 +7,10 @@ single query. It will then calculate the costs for all plans and pick the plan w
 lowest total cost. This resulting plan is considered to be the *optimal plan*, which is
 then executed.

-The optimizer is designed to only perform optimization if they are *safe*, in the 
-meaning that an optimization does not modify the result of a query.
+The optimizer is designed to only perform optimizations that are *safe*, meaning that an
+optimization should not modify the result of a query. A notable exception to this is that
+the optimizer is allowed to change the order of results for queries that do not explicitly
+specify how results should be sorted, i.e. queries without a `SORT` statement.


 !SUBSECTION Execution plans
@@ -69,16 +71,16 @@ the evaluation of an expression. The filters expression result (`i.value > 97`)
 is calculated in the *CalculationNode* above the *FilterNode*.

 Finally, all of this needs to be done for documents of collection `test`. This is
-where the *IndexRangeNode* enters the game. It will use an index (thus its name)
+where the *IndexNode* enters the game. It will use an index (thus its name)
 to find certain documents in the collection and ship it down the pipeline in the
-order required by `SORT i.value`. The *IndexRangeNode* itself has a *SingletonNode*
+order required by `SORT i.value`. The *IndexNode* itself has a *SingletonNode*
 as its input. The sole purpose of a *SingletonNode* node is to provide a single empty
 document as input for other processing steps. It is always the end of the pipeline.

 Here's a summary:
-* SingletonNode: produces empty document as input for other processing steps.
-* IndexRangeNode: iterates over the index on attribute `value` in collection `test`
-  in the order required by  `SORT i.value`.
+* SingletonNode: produces an empty document as input for other processing steps.
+* IndexNode: iterates over the index on attribute `value` in collection `test`
+  in the order required by `SORT i.value`.
 * CalculationNode: evaluates the result of the calculation `i.value > 97` to `true` or `false`
 * FilterNode: only lets documents pass where above calculation returned `true`
 * CalculationNode: calculates return value `i.value`
@@ -88,9 +90,9 @@ Here's a summary:
 !SUBSUBSECTION Optimizer rules

 Note that in the example, the optimizer has optimized the `SORT` statement away.
-It can do it safely because there is a sorted index on `i.value`, which it has
-picked in the *IndexRangeNode*. As the index values are iterated in sorted order
-anyway, the extra *SortNode* would be redundant and was removed.
+It can do it safely because there is a sorted skiplist index on `i.value`, which it has
+picked in the *IndexNode*. As the index values are iterated over in sorted order
+anyway, the extra *SortNode* would have been redundant and was removed.

 Additionally, the optimizer has done more work to generate an execution plan that
 avoids as much expensive operations as possible. Here is the list of optimizer rules
@@ -115,7 +117,7 @@ Here is the meaning of these rules in context of this query:
 * `remove-unnecessary-calculations`: removes *CalculationNode*s whose result values are
  not used in the query. In the example this happens due to the `remove-redundant-calculations`
  rule having made some calculations unnecessary.
-* `use-index-range`: use an index to iterate over a collection instead of performing a
+* `use-indexes`: uses an index to iterate over a collection instead of performing a
  full collection scan. In the example case this makes sense, as the index can be
  used for filtering and sorting.
 * `use-index-for-sort`: removes a `SORT` operation if it is already satisfied by 
@@ -268,8 +270,8 @@ The following execution node types will appear in the output of `explain`:
  exactly one *SingletonNode* as its top node.
 * *EnumerateCollectionNode*: enumeration over documents of a collection (given in
  its *collection* attribute) without using an index.
-* *IndexRangeNode*: enumeration over a specific index (given in its *index* attribute)
-  of a collection. The index range is specified in the *ranges* attribute of the node.
+* *IndexNode*: enumeration over one or more indexes (given in its *indexes* attribute)
+  of a collection. The index ranges are specified in the *condition* attribute of the node.
 * *EnumerateListNode*: enumeration over a list of (non-collection) values.
 * *FilterNode*: only lets values pass that satisfy a filter condition. Will appear once
  per *FILTER* statement.
@@ -291,6 +293,8 @@ The following execution node types will appear in the output of `explain`:
  attribute). Will appear exactly once in a query that contains a *REPLACE* statement.
 * *UpdateNode*: updates documents in a collection (given in its *collection* 
  attribute). Will appear exactly once in a query that contains an *UPDATE* statement.
+* *UpsertNode*: upserts documents in a collection (given in its *collection* 
+  attribute). Will appear exactly once in a query that contains an *UPSERT* statement.
 * *NoResultsNode*: will be inserted if *FILTER* statements turn out to be never
  satisfiable. The *NoResultsNode* will pass an empty result set into the processing
  pipeline. 
@@ -349,11 +353,11 @@ The following optimizer rules may appear in the `rules` attribute of a plan:
  on the same variable or attribute were replaced with an *IN* condition.
 * `remove-redundant-or`: will appear if multiple *OR* conditions for the same variable
  or attribute were combined into a single condition.
-* `use-index-range`: will appear if an index can be used to iterate over a collection.
+* `use-indexes`: will appear when an index is used to iterate over a collection.
  As a consequence, an *EnumerateCollectionNode* was replaced with an 
-  *IndexRangeNode* in the plan.
+  *IndexNode* in the plan.
 * `remove-filters-covered-by-index`: will appear if a *FilterNode* was removed or replaced
-  because the filter condition is already covered by an *IndexRangeNode*.
+  because the filter condition is already covered by an *IndexNode*.
 * `use-index-for-sort`: will appear if an index can be used to avoid a *SORT* 
  operation. If the rule was applied, a *SortNode* was removed from the plan.
 * `move-calculations-down`: will appear if a *CalculationNode* was moved down in a plan. 
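Why removing the *SortNode* in the execution-plan example is safe can be illustrated with a toy model in plain JavaScript. This is a minimal sketch, not ArangoDB code: a presorted array stands in for the skiplist index on `value`, the query is assumed to filter on `i.value > 97`, sort by `i.value` and return `i.value` as in the example, and `testCollection`, `fullScanPlan`, `buildSortedIndex` and `indexPlan` are hypothetical names. Like the example plan, the index variant keeps its filter step but drops the sort step.

```javascript
"use strict";

// Hypothetical stand-in for collection `test`: documents stored in arbitrary order.
const testCollection = [];
for (let v = 100; v >= 0; v -= 1) {
  testCollection.push({ _key: String(v), value: v });
}

// Without an index: EnumerateCollectionNode -> FilterNode -> SortNode -> ReturnNode.
function fullScanPlan(docs) {
  return docs
    .filter((i) => i.value > 97)       // FilterNode (returns a new array)
    .sort((a, b) => a.value - b.value) // SortNode (sorts that new array)
    .map((i) => i.value);              // CalculationNode + ReturnNode
}

// Toy sorted index: a copy of the documents in ascending order of `value`.
function buildSortedIndex(docs) {
  return docs.slice().sort((a, b) => a.value - b.value);
}

// With the index: IndexNode -> FilterNode -> ReturnNode. No SortNode is needed
// because iterating the index already yields documents in `SORT i.value` order.
function indexPlan(index) {
  const result = [];
  for (const i of index) {
    if (i.value > 97) {
      result.push(i.value);
    }
  }
  return result;
}

const index = buildSortedIndex(testCollection);
console.log(fullScanPlan(testCollection)); // [ 98, 99, 100 ]
console.log(indexPlan(index));             // [ 98, 99, 100 ]
```

The equivalence depends on the index order matching the requested `SORT`. Without a `SORT`, a full scan of this sample would yield `100, 99, 98` while the index yields `98, 99, 100`: the same values in a different order, which is exactly the ordering freedom the optimizer is allowed to use.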
--- a/Documentation/Books/Users/IndexHandling/HowArangoDBUsesIndexes.mdpp
+++ b/Documentation/Books/Users/IndexHandling/HowArangoDBUsesIndexes.mdpp
@@ -1,6 +1,6 @@
 !SECTION How ArangoDB uses Indexes

-In general, ArangoDB will use a single index per collection in a given query. AQL queries can
+In most cases ArangoDB will use a single index per collection in a given query. AQL queries can
 use more than one index per collection when multiple FILTER conditions are combined with a 
 logical `OR` and these can be covered by indexes. AQL queries will use a single index per
 collection when FILTER conditions are combined with logical `AND`.
@@ -23,11 +23,11 @@ multiple indexes the optimizer can choose from. The optimizer will then select a
 indexes with the lowest estimated total cost. In general, the optimizer will pick the indexes with
 the highest estimated selectivity.

-Sparse indexes do not contain `null` values. If the optimizer cannot safely determine whether a
-FILTER condition includes `null` values, it will not make use of a sparse index. The optimizer 
-policy is to produce correct results, regardless of whether or which index is used to satisfy 
-FILTER conditions. If it is unsure about whether using an index will violate the policy, it will 
-not make use of the index.
+Sparse indexes may or may not be picked by the optimizer in a query. As sparse indexes do not contain 
+`null` values, they will not be used for queries if the optimizer cannot safely determine whether a
+FILTER condition includes `null` values for the index attributes. The optimizer policy is to produce 
+correct results, regardless of whether or which index is used to satisfy FILTER conditions. If it is 
+unsure about whether using an index will violate the policy, it will not make use of the index.


 !SUBSECTION Troubleshooting
@@ -76,12 +76,15 @@ If any of the explain methods shows that a query is not using indexes, the follo
  In these cases the queries should be rewritten so that only the index attribute is present on one side of 
  the operator, or additional filters and indexes should be used to restrict the amount of documents otherwise.

-* the query optimizer will in general picking one index per collection in a query. It can pick more than
+* the query optimizer will in general pick one index per collection in a query. It can pick more than
  one index per collection if the FILTER condition contains multiple branches combined with logical `OR`.
-  For example, the following queries can use more than one index:
+  For example, the following queries can use indexes:

      FOR doc IN collection FILTER doc.value1 == 42 || doc.value1 == 23 RETURN doc
      FOR doc IN collection FILTER doc.value1 == 42 || doc.value2 == 23 RETURN doc
+      FOR doc IN collection FILTER doc.value1 < 42 || doc.value2 > 23 RETURN doc

-  In the latter case, the query optimizer can only use indexes if there are indexes present on both `value1` 
-  and `value2`.
+  The two equality conditions on `value1` in the first query will be converted into an `IN` list, and
+  if there is a suitable index on `value1`, it will be used. The second query requires two separate
+  indexes, one on `value1` and one on `value2`, and will use them if present. The third query can use
+  the indexes on `value1` and `value2` if both are sorted indexes (such as skiplist indexes).
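The rewrite of the first example query, where two equality conditions on the same attribute joined by `OR` become an `IN` list, can be sketched as follows. This is a simplified model, not the optimizer's implementation: the condition format and the functions `replaceOrWithIn`, `matchesEqualityOr` and `matchesIn` are hypothetical, and JavaScript `===` stands in for AQL `==`.

```javascript
"use strict";

// Hypothetical condition format: an OR of branches, each { attribute, op, value }.
// Returns an IN condition if every branch is `==` on the same attribute, else null.
function replaceOrWithIn(branches) {
  if (branches.length < 2) {
    return null;
  }
  const attribute = branches[0].attribute;
  const sameAttributeEquality = branches.every(
    (b) => b.op === "==" && b.attribute === attribute
  );
  if (!sameAttributeEquality) {
    return null; // e.g. value1 == 42 || value2 == 23 stays an OR
  }
  return { attribute, op: "IN", values: branches.map((b) => b.value) };
}

// Evaluate the original OR of equality branches against a document.
function matchesEqualityOr(doc, branches) {
  return branches.some((b) => doc[b.attribute] === b.value);
}

// Evaluate the rewritten IN condition against a document.
function matchesIn(doc, condition) {
  return condition.values.includes(doc[condition.attribute]);
}

const first = [
  { attribute: "value1", op: "==", value: 42 },
  { attribute: "value1", op: "==", value: 23 },
];
const second = [
  { attribute: "value1", op: "==", value: 42 },
  { attribute: "value2", op: "==", value: 23 },
];

const rewritten = replaceOrWithIn(first);
console.log(JSON.stringify(rewritten)); // {"attribute":"value1","op":"IN","values":[42,23]}
console.log(replaceOrWithIn(second));   // null

// The rewrite is only safe if both forms accept exactly the same documents.
const docs = [{ value1: 42 }, { value1: 23 }, { value1: 7 }];
for (const doc of docs) {
  console.log(matchesEqualityOr(doc, first) === matchesIn(doc, rewritten)); // true
}
```

A single `IN` condition on one attribute can then be served by one index on `value1`, whereas the second query's branches touch different attributes and need one index per branch.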
--- a/Documentation/Books/Users/IndexHandling/IndexBasics.mdpp
+++ b/Documentation/Books/Users/IndexHandling/IndexBasics.mdpp
@@ -35,8 +35,8 @@ db.collection.document("<document-key>");
 db._document("<document-id>");
 ```

-As the primary index is a hash index, it cannot be used for non-equality range 
-queries or for sorting.
+As the primary index is an unsorted hash index, it cannot be used for non-equality 
+range queries or for sorting.

 The primary index of a collection cannot be dropped or changed, and there is no
 mechanism to create user-defined primary indexes.
@@ -64,11 +64,11 @@ db.collection.inEdges("<from-value>");
 db.collection.inEdges("<to-value>");
 ```

-The edges index is a hash index. It can be used for equality lookups only, but not for range
-queries or for sorting. As edges indexes are automatically created for edge collections, it
-is not possible to create user-defined edges indexes.
+Internally, the edges index is implemented as a hash index. It can be used for equality 
+lookups, but not for range queries or for sorting. As edges indexes are automatically 
+created for edge collections, it is not possible to create user-defined edges indexes.

-The edges index cannot be dropped or changed.
+An edges index cannot be dropped or changed.


 !SUBSECTION Hash Index
@@ -120,7 +120,7 @@ Non-unique hash indexes have an amortized complexity of O(1) for insert, update,
 removal operations. That means non-unique hash indexes can be used on attributes with 
 low cardinality. 

-If a hash index is created on an attribute that it is missing in all or many of the documents,
+If a hash index is created on an attribute that is missing in all or many of the documents,
 the behavior is as follows:

 * if the index is sparse, the documents missing the attribute will not be indexed and not
@@ -130,6 +130,9 @@ the behavior is as follows:
 * if the index is non-sparse, the documents missing the attribute will be contained in the
  index with a key value of `null`. 

+Hash indexes support indexing array values if the index attribute name is extended with
+a `[*]` suffix (for example `tags[*]`).
+

 !SUBSECTION Skiplist Index

@@ -217,6 +220,9 @@ The different types of skiplist indexes have the following characteristics:
 The operational amortized complexity for skiplist indexes is logarithmically correlated
 with the number of documents in the index.

+Skiplist indexes support indexing array values if the index attribute name is extended with
+a `[*]` suffix (for example `tags[*]`).
+

 !SUBSECTION Geo Index

@@ -231,8 +237,8 @@ Th geo index provides operations to find documents with coordinates nearest to a
 comparison coordinate, and to find documents with coordinates that are within a specifiable
 radius around a comparison coordinate.

-The geo index is used via dedicated functions in AQL or the simple queries, but will
-not enabled for other types of queries or conditions.
+The geo index is used via dedicated functions in AQL or via the simple query functions, 
+but will not be used for other types of queries or conditions.


 !SUBSECTION Fulltext Index
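The effect of the *sparse* flag and of the `[*]` suffix on which keys end up in an index can be modelled with a small function. This is a minimal sketch under stated assumptions: `indexKeys` and the sample documents are hypothetical, only single-attribute indexes are modelled, the `sparse` flag is only applied to plain attributes, and skipping non-array values in an array index is an assumption of the sketch rather than behavior documented here.

```javascript
"use strict";

// Hypothetical helper: which keys would a single-attribute index store for `doc`?
// `field` is either a plain attribute ("value") or an array field ("tags[*]").
function indexKeys(doc, field, sparse) {
  const isArrayField = field.endsWith("[*]");
  const attribute = isArrayField ? field.slice(0, -3) : field;
  const present = Object.prototype.hasOwnProperty.call(doc, attribute);
  const value = present ? doc[attribute] : null;

  if (isArrayField) {
    // Assumption of this sketch: non-array values are not indexed by an array index.
    if (!Array.isArray(value)) {
      return [];
    }
    // One index entry per array element; duplicate elements are not treated specially.
    return value.slice();
  }

  if (value === null) {
    // Sparse indexes leave out documents whose attribute is missing or null;
    // non-sparse indexes store such documents under the key `null`.
    return sparse ? [] : [null];
  }
  return [value];
}

const docs = [
  { _key: "a", value: 1, tags: ["x", "y"] },
  { _key: "b", tags: "z" },
  { _key: "c", value: null },
];

for (const doc of docs) {
  console.log(
    doc._key,
    indexKeys(doc, "value", false), // non-sparse index on `value`
    indexKeys(doc, "value", true),  // sparse index on `value`
    indexKeys(doc, "tags[*]", false) // array index on `tags[*]`
  );
}
```

Documents `b` and `c` show why the optimizer avoids a sparse index unless it can rule out `null`: a query matching `value == null` would find them in the non-sparse index but not in the sparse one.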
--- a/js/server/modules/org/arangodb/arango-collection.js
+++ b/js/server/modules/org/arangodb/arango-collection.js
@@ -1035,7 +1035,7 @@ ArangoCollection.prototype.ensureSkiplist = function () {

 ////////////////////////////////////////////////////////////////////////////////
 /// @brief ensures that a fulltext index exists
-/// @startDocuBlock ensureIndex
+/// @startDocuBlock ensureFulltextIndex
 /// `collection.ensureIndex({ type: "fulltext", fields: [ "field" ], minLength: minLength })`
 ///
 /// Creates a fulltext index on all documents on attribute *field*.
--- a/js/server/tests/shell-foxx-model.js
+++ b/js/server/tests/shell-foxx-model.js
@@ -430,7 +430,7 @@ function ModelAnnotationSpec () {
      var Model = FoxxModel.extend({});
      jsonSchema = toJSONSchema("myname", Model);
      assertEqual(jsonSchema.id, "myname");
-      assertEqual(jsonSchema.required, []);
+      assertEqual(jsonSchema.required, undefined);
      assertEqual(jsonSchema.properties, {});
    },

@@ -450,7 +450,7 @@ function ModelAnnotationSpec () {

      jsonSchema = toJSONSchema("myname", Model);
      assertEqual(jsonSchema.id, "myname");
-      assertEqual(jsonSchema.required, []);
+      assertEqual(jsonSchema.required, undefined);
      assertEqual(jsonSchema.properties.x.type, "string");
    },