updated documentation

2015-02-13 00:23:06 +01:00 · 2015-02-13 00:23:06 +01:00 · bcdbf30ca2
parent 52783bd9fa
commit bcdbf30ca2
5 changed files with 373 additions and 120 deletions
--- a/56
+++ b/56
@ -1,6 +1,62 @@
 v2.5.0 (XXXX-XX-XX)
 -------------------
 * added support for sparse hash and skiplist indexes
  Hash and skiplist indexes can optionally be made sparse. Sparse indexes exclude documents
  in which at least one of the index attributes is either not set or has a value of `null`.
  As such documents are excluded from sparse indexes, they may contain fewer documents than
  their non-sparse counterparts. This enables faster indexing and can lead to reduced memory
  usage in case the indexed attribute does occur only in some, but not all documents of the 
  collection. Sparse indexes will also reduce the number of collisions in non-unique hash
  indexes in case non-existing or optional attributes are indexed.
  In order to create a sparse index, an object with the attribute `sparse` can be added to
  the index creation commands:
      db.collection.ensureHashIndex(attributeName, { sparse: true }); 
      db.collection.ensureHashIndex(attributeName1, attributeName2, { sparse: true }); 
      db.collection.ensureUniqueConstraint(attributeName, { sparse: true }); 
      db.collection.ensureUniqueConstraint(attributeName1, attributeName2, { sparse: true }); 
      db.collection.ensureSkiplist(attributeName, { sparse: true }); 
      db.collection.ensureSkiplist(attributeName1, attributeName2, { sparse: true }); 
      db.collection.ensureUniqueSkiplist(attributeName, { sparse: true }); 
      db.collection.ensureUniqueSkiplist(attributeName1, attributeName2, { sparse: true }); 
  When not explicitly set, the `sparse` attribute defaults to `false` for new indexes.
  Index types other than hash and skiplist do not support sparsity.
  As sparse indexes may exclude some documents from the collection, they cannot be used for
  all types of queries. Sparse hash indexes cannot be used to find documents for which at
  least one of the indexed attributes has a value of `null`. For example, the following AQL
  query cannot use a sparse index, even if one was created on attribute `attr`:
      FOR doc In collection 
        FILTER doc.attr == null 
        RETURN doc
  If the lookup value is non-constant, a sparse index may or may not be used, depending on
  the other types of conditions in the query. If the optimizer can safely determine that
  the lookup value cannot be `null`, a sparse index may be used. When uncertain, the optimizer
  will not make use of a sparse index in a query in order to produce correct results.
  For example, the following queries cannot use a sparse index on `attr` because the optimizer
  will not know beforehand whether the comparison values for `doc.attr` will include `null`:
      FOR doc In collection 
        FILTER doc.attr == SOME_FUNCTION(...) 
        RETURN doc
      FOR other IN otherCollection 
        FOR doc In collection 
          FILTER doc.attr == other.attr 
          RETURN doc
  Sparse skiplist indexes can be used for sorting if the optimizer can safely detect that the 
  index range does not include `null` for any of the index attributes. 
 * inspection of AQL data-modification queries will now detect if the data-modification part
  of the query can run in lockstep with the data retrieval part of the query, or if the data
  retrieval part must be executed before the data modification can start.
--- a/Documentation/Books/Users/IndexHandling/IndexBasics.mdpp
+++ b/Documentation/Books/Users/IndexHandling/IndexBasics.mdpp
@ -23,7 +23,9 @@ ArangoDB provides the following index types:
 For each collection there will always be a *primary index* which is a hash index 
 for the [document keys](../Glossary/README.html#document_key) (`_key` attribute)
 of all documents in the collection. The primary index allows quick selection
-of documents in the collection using either the `_key` or `_id` attributes.
+of documents in the collection using either the `_key` or `_id` attributes. It will
 be used from within AQL queries automatically when performing equality lookups on
 `_key` or `_id`. 
 There are also dedicated functions to find a document given its `_key` or `_id`
 that will always make use of the primary index:
@ -33,7 +35,11 @@ db.collection.document("<document-key>");
 db._document("<document-id>");
 ```
-The primary index of a collection cannot be dropped or changed.
+As the primary index is a hash index, it cannot be used for range queries or for sorting
 on `_key` or `_id`. 
 The primary index of a collection cannot be dropped or changed, and there is no
 mechanism to create user-defined primary indexes.
 !SUBSECTION Edges Index
@ -44,11 +50,10 @@ documents by either their `_from` or `_to` attributes. It can therefore be
 used to quickly find connections between vertex documents and is invoked when 
 the connecting edges of a vertex are queried. 
-The edges index cannot be dropped or changed. Extra edges indexes cannot be
+Edges indexes are used from within AQL when performing equality lookups on `_from`
-created on other attributes or in non-edge collections.
+or `_to` values in an edge collection. There are also dedicated functions to 
-
+find edges given their `_from` or `_to` values that will always make use of the 
-There are also dedidacted functions to find edges given their `_from` or `_to` 
+edges index:
 values that will always make use of the edges index:
 ```js
 db.collection.edges("<from-value>");
@ -59,146 +64,172 @@ db.collection.inEdges("<from-value>");
 db.collection.inEdges("<to-value>");
 ```
 The edges index is a hash index. It can be used for equality lookups only, but not for range
 queries or for sorting. As edges indexes are automatically created for edge collections, it
 is not possible to create user-defined edges indexes.
 The edges index cannot be dropped or changed.
 !SUBSECTION Hash Index
 A hash index can be used to quickly find documents with specific attribute values.
-The hash index is unsorted, so it supports equality lookups but no range queries.
+The hash index is unsorted, so it supports equality lookups but no range queries or sorting.
 A hash index can be created on one or multiple document attributes. A hash index will 
 only be used by a query if all indexed attributes are present in the search condition,
-and if all attributes are compared using the equality (`==`) operator. 
+and if all attributes are compared using the equality (`==`) operator. Hash indexes are 
 used from within AQL and several query functions, e.g. `byExample`, `firstExample` etc.
 Hash indexes can optionally be declared to be unique, disallowing saving the same
-value in the indexed attribute.
+value in the indexed attribute. Hash indexes can optionally be sparse.
-Hash indexes are supported by AQL and several query functions, e.g. `byExample`, 
+The different types of hash indexes have the following characteristics:
-`firstExample` etc.
+
 * **unique hash index**: all documents in the collection must have different values for 
  the attributes covered by the unique index. Trying to insert a document with the same 
  key value as an already existing document will lead to a unique constraint 
  violation. 
  This type of index is not sparse. Documents that do not contain the index attributes or 
  that have a value of `null` in the index attribute(s) will still be indexed. 
  A key value of `null` may only occur once in the index, so this type of index cannot 
  be used for optional attributes.
 * **unique, sparse hash index**: all documents in the collection must have different 
  values for the attributes covered by the unique index. Documents in which at least one
  of the index attributes is not set or has a value of `null` are not included in the 
  index. This type of index can be used to ensure that there are no duplicate keys in
  the collection for documents which have the indexed attributes set. As the index will
  exclude documents for which the indexed attributes are `null` or not set, it can be
  used for optional attributes.
 * **non-unique hash index**: all documents in the collection will be indexed. This type
  of index is not sparse. Documents that do not contain the index attributes or that have 
  a value of `null` in the index attribute(s) will still be indexed. Duplicate key values 
  can occur and do not lead to unique constraint violations.
 * **non-unique, sparse hash index**: only those documents will be indexed that have all
  the indexed attributes set to a value other than `null`. It can be used for optional
  attributes.
 The amortized complexity of lookup, insert, update, and removal operations in unique hash 
 indexes is O(1). 
 Non-unique hash indexes have an amortized complexity of O(1) for inserts. Lookup, update
 and removal operations in non-unique hash indexes have an amortized complexity that is 
 linearly correlated with the number of duplicates for a given key. That means non-unique 
 hash indexes should not be used on attributes with very low cardinality. 
 If a hash index is created on an attribute that is missing in all or many of the documents,
 the behavior is as follows:
 * if the index is sparse, the documents missing the attribute will not be indexed and not
  use index memory. These documents will not influence the update or removal performance
  for the index.
 * if the index is non-sparse, the documents missing the attribute will be contained in the
  index with a key value of `null`. If many such documents get indexed, a lot of collisions
  will occur, and lookup, update and removal of documents will become expensive. This 
  should be avoided if possible.
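
The effect described above can be illustrated with a toy model. The following is only a sketch in plain JavaScript, not ArangoDB's implementation: the helper names `buildHashIndex` and `countEntries` and the `email` attribute are made up for illustration, and a missing attribute is assumed to be indexed under the key `null`, as stated above.

```js
// Toy model of a hash index on a single attribute (hypothetical helper,
// not ArangoDB code). Each bucket holds all documents sharing one key.
function buildHashIndex(docs, attribute, sparse) {
  const buckets = new Map();
  for (const doc of docs) {
    // Assumption: a missing attribute is indexed like an explicit null.
    const key = doc[attribute] === undefined ? null : doc[attribute];
    if (sparse && key === null) {
      continue; // sparse index: the document is not indexed at all
    }
    if (!buckets.has(key)) {
      buckets.set(key, []);
    }
    buckets.get(key).push(doc);
  }
  return buckets;
}

// Total number of indexed documents across all buckets.
function countEntries(index) {
  let n = 0;
  for (const bucket of index.values()) {
    n += bucket.length;
  }
  return n;
}

// 1000 documents, only every 100th has the optional attribute set.
const docs = [];
for (let i = 0; i < 1000; i++) {
  docs.push(i % 100 === 0
    ? { _key: String(i), email: "user" + i + "@example.com" }
    : { _key: String(i) });
}

const nonSparse = buildHashIndex(docs, "email", false);
const sparse = buildHashIndex(docs, "email", true);

console.log(countEntries(nonSparse));    // 1000
console.log(nonSparse.get(null).length); // 990 documents collide on null
console.log(countEntries(sparse));       // 10
console.log(sparse.has(null));           // false
```

In this model the non-sparse index ends up with one large bucket for `null`, which is the collision case the text warns about, while the sparse index only holds the documents that actually have the attribute.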
 !SUBSECTION Skiplist Index
-A skiplist is a sorted index structure. They can be used to quickly find documents 
+A skiplist is a sorted index structure. It can be used to quickly find documents 
-with specific attribute values but also support range queries. They can also be used
+with specific attribute values but also for range queries and returning documents from
-for sorting in AQL.
+the index in sorted order. Skiplists will be used from within AQL and several query 
 functions, e.g. `byExample`, `firstExample` etc.
-A skiplist can be created on one or multiple document attributes. 
+Skiplist indexes will be used for lookups, range queries and sorting only if either all
 index attributes are provided in a query, or if a leftmost prefix of the index attributes
 is specified.
 For example, if a skiplist index is created on attributes `value1` and `value2`, the 
 following conditions could use the index (note: the `<=` and `>=` operators are intentionally
 omitted here for the sake of brevity):
    FILTER doc.value1 == ...
    FILTER doc.value1 < ...
    FILTER doc.value1 > ...
    FILTER doc.value1 > ... && doc.value1 < ...
    FILTER doc.value1 == ... && doc.value2 == ...
    FILTER doc.value1 == ... && doc.value2 > ...
    FILTER doc.value1 == ... && doc.value2 > ... && doc.value2 < ...
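
The leftmost-prefix rule can be sketched as a small check. This is an illustrative simplification, not the optimizer's actual logic: the function name `coveredPrefixLength` is hypothetical, and the sketch only looks at which attributes occur in the condition, ignoring the comparison operators.

```js
// Returns how many leading index attributes are covered by the attributes
// used in a FILTER condition. A result of 0 means that no leftmost prefix
// is specified, so the index would not be usable for the lookup.
// Hypothetical helper for illustration only.
function coveredPrefixLength(indexAttributes, filterAttributes) {
  const used = new Set(filterAttributes);
  let covered = 0;
  while (covered < indexAttributes.length && used.has(indexAttributes[covered])) {
    covered++;
  }
  return covered;
}

const index = ["value1", "value2"];

console.log(coveredPrefixLength(index, ["value1"]));           // 1
console.log(coveredPrefixLength(index, ["value1", "value2"])); // 2
console.log(coveredPrefixLength(index, ["value2"]));           // 0
```

A condition on `value2` alone yields 0 because `value2` is not a leftmost prefix of the index definition.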
 In order to use a skiplist index for sorting, the index attributes must be specified in
 the `SORT` clause of the query in the same order as they appear in the index definition.
 Sort orders cannot be mixed, i.e. the sort orders specified in the `SORT` clause must all
  be either ascending (optionally omitted as ascending is the default) or descending.
 Skiplists can optionally be declared to be unique, disallowing saving the same
-value in the indexed attribute.
+value in the indexed attribute. They can be sparse or non-sparse.
-Skiplists are supported by AQL and several query functions, e.g. `byExample`, 
+The different types of skiplist indexes have the following characteristics:
-`firstExample` etc.
+
 * **unique skiplist index**: all documents in the collection must have different values for 
  the attributes covered by the unique index. Trying to insert a document with the same 
  key value as an already existing document will lead to a unique constraint 
  violation. 
  This type of index is not sparse. Documents that do not contain the index attributes or 
  that have a value of `null` in the index attribute(s) will still be indexed. 
  A key value of `null` may only occur once in the index, so this type of index cannot 
  be used for optional attributes.
 * **unique, sparse skiplist index**: all documents in the collection must have different 
  values for the attributes covered by the unique index. Documents in which at least one
  of the index attributes is not set or has a value of `null` are not included in the 
  index. This type of index can be used to ensure that there are no duplicate keys in
  the collection for documents which have the indexed attributes set. As the index will
  exclude documents for which the indexed attributes are `null` or not set, it can be
  used for optional attributes.
 * **non-unique skiplist index**: all documents in the collection will be indexed. This type
  of index is not sparse. Documents that do not contain the index attributes or that have 
  a value of `null` in the index attribute(s) will still be indexed. Duplicate key values 
  can occur and do not lead to unique constraint violations.
 * **non-unique, sparse skiplist index**: only those documents will be indexed that have all
  the indexed attributes set to a value other than `null`. It can be used for optional
  attributes.
 The operational amortized complexity for skiplist indexes is logarithmically correlated
 with the number of documents in the index.
 !SUBSECTION Geo Index
-A geo index is used to find places on the surface of the earth fast. The
+Users can create additional geo indexes on one or multiple attributes in collections. 
-geo index in ArangoDB supports near and within queries. There are special functions
+A geo index is used to find places on the surface of the earth fast. 
-to query geo indexes.
+
 The geo index stores two-dimensional coordinates. It can be created on either two 
 separate document attributes (latitude and longitude) or a single array attribute that
 contains both latitude and longitude. Latitude and longitude must be numeric values.
 The geo index provides operations to find documents with coordinates nearest to a given 
 comparison coordinate, and to find documents with coordinates that are within a specifiable
 radius around a comparison coordinate.
 The geo index is used via dedicated functions in AQL or the simple queries, but will
 not be enabled for other types of queries or conditions.
 !SUBSECTION Fulltext Index
 A fulltext index can be used to find words, or prefixes of words inside documents. 
-A fulltext index can be set on one attribute only, and will index all words contained 
+A fulltext index can be created on a single attribute only, and will index all words 
-in documents that have a textual value in this attribute. Only words with a (specifyable) 
+contained in documents that have a textual value in that attribute. Only words with a (specifiable) 
 minimum length are indexed. Word tokenization is done using the word boundary analysis 
 provided by libicu, which is taking into account the selected language provided at 
 server start. Words are indexed in their lower-cased form. The index supports complete 
-match queries (full words) and prefix queries.
+match queries (full words) and prefix queries, plus basic logical operations such as 
 `and`, `or` and `not` for combining partial results.
 The fulltext index is sparse, meaning it will only index documents for which the index
 attribute is set and contains a string value. Additionally, only words with a configurable
 minimum length will be included in the index.
-!SECTION Index Identifiers and Handles 
+The fulltext index is used via dedicated functions in AQL or the simple queries, but will
-
+not be enabled for other types of queries or conditions.
 An *index handle* uniquely identifies an index in the database. It is a string and 
 consists of the collection name and an *index identifier* separated by a `/`. The 
 index identifier part is a numeric value that is auto-generated by ArangoDB.
 A specific index of a collection can be accessed using its *index handle* or
 *index identifier* as follows:
 ```js
 db.collection.index("<index-handle>");
 db.collection.index("<index-identifier>");
 db._index("<index-handle>");
 ```
 For example: Assume that the index handle, which is stored in the `_id`
 attribute of the index, is `demo/362549736` and the index was created in a collection
 named `demo`. Then this index can be accessed as:
 ```js
 db.demo.index("demo/362549736");
 ```
 Because the index handle is unique within the database, you can leave out the
 *collection* and use the shortcut:
 ```js
 db._index("demo/362549736");
 ```
 !SECTION Which Index type to use when
 ArangoDB automatically indexes the `_key` attribute in each collection. There
 is no need to index this attribute separately. Please note that a document's
 `_id` attribute is derived from the `_key` attribute, and is thus implicitly
 indexed, too.
 ArangoDB will also automatically create an index on `_from` and `_to` in any
 edge collection, meaning incoming and outgoing connections can be determined
 efficiently.
 Users can define additional indexes on one or multiple document attributes.
 Several different index types are provided by ArangoDB. These indexes have
 different usage scenarios:
 - hash index: provides quick access to individual documents if (and only if)
  all indexed attributes are provided in the search query. The index will only
  be used for equality comparisons. It does not support range queries and 
  cannot be used for sorting.
  The hash index is a good candidate if all or most queries on the indexed
  attribute(s) are equality comparisons. It will be the most efficient index 
  type if the index is declared unique. 
  Insertions into a non-unique hash index are also very efficient. Removal
  performance in a non-unique hash index depends on how often the indexed
  attribute's values repeat. If there are a lot of value repetitions, the
  removal performance in a non-unique hash index will suffer.
  A non-unique hash index should therefore not be used if duplicate index values 
  are allowed (i.e. when the hash index is not declared *unique*) and there
  will be many duplicate values in the index plus a lot of document removal
  operations in the collection.
 - skip list index: skip lists keep the indexed values in an order, so they can
  be used for equality lookups, range queries and for sorting. Skip list indexes 
  will have a higher overhead than hash indexes but they are more general and
  allow more use cases (e.g. range queries). Additionally, they can be used
  for lower selectivity attributes, when non-unique hash indexes are not a
  good fit.
 - geo index: the geo index provided by ArangoDB allows searching for documents
  within a radius around a two-dimensional earth coordinate (point), or to
  find documents which are closest to a point. Document coordinates can either 
  be specified in two different document attributes or in a single attribute, e.g.
      { "latitude": 50.9406645, "longitude": 6.9599115 }
  or
      { "coords": [ 50.9406645, 6.9599115 ] }
 - fulltext index: a fulltext index can be used to index all words contained in 
  a specific attribute of all documents in a collection. Only words with a 
  (specifiable) minimum length are indexed. Word tokenization is done using 
  the word boundary analysis provided by libicu, which is taking into account 
  the selected language provided at server start.
  The index supports complete match queries (full words) and prefix queries.
 - cap constraint: the cap constraint provided by ArangoDB indexes documents
  not to speed up search queries, but to limit (cap) the number or size of
  documents in a collection.
--- a/Documentation/Books/Users/IndexHandling/WhichIndex.mdpp
+++ b/Documentation/Books/Users/IndexHandling/WhichIndex.mdpp
@ -0,0 +1,135 @@
 !SECTION Which Index to use when
 ArangoDB automatically indexes the `_key` attribute in each collection. There
 is no need to index this attribute separately. Please note that a document's
 `_id` attribute is derived from the `_key` attribute, and is thus implicitly
 indexed, too.
 ArangoDB will also automatically create an index on `_from` and `_to` in any
 edge collection, meaning incoming and outgoing connections can be determined
 efficiently.
 !SUBSECTION Index types
 Users can define additional indexes on one or multiple document attributes.
 Several different index types are provided by ArangoDB. These indexes have
 different usage scenarios:
 - hash index: provides quick access to individual documents if (and only if)
  all indexed attributes are provided in the search query. The index will only
  be used for equality comparisons. It does not support range queries and 
  cannot be used for sorting.
  The hash index is a good candidate if all or most queries on the indexed
  attribute(s) are equality comparisons. It will be the most efficient index 
  type if the index is declared unique. 
  Insertions into a non-unique hash index are also very efficient. Update and
  removal performance in a non-unique hash index depend on the key selectivity.
  If the selectivity is low and keys repeat a lot, update and removal performance
  in a non-unique hash index will degrade.
  A non-unique hash index should therefore not be used if duplicate index values 
  are allowed and it is known that there will be many duplicate values in the index 
  and there will be updates or removals.
  A non-unique hash index on an optional document attribute should be declared
  sparse so that it will not index documents for which the index attribute is
  not set.
 - skiplist index: skiplists keep the indexed values in an order, so they can
  be used for equality lookups, range queries and for sorting. For high selectivity
  attributes, skiplist indexes will have a higher overhead than hash indexes. For
  low selectivity attributes, skiplist indexes will be more efficient than non-unique
  hash indexes.
  Additionally, skiplist indexes allow more use cases (e.g. range queries, sorting)
  than hash indexes. Furthermore, they can be used for lookups based on a leftmost
  prefix of the index attributes.
 - geo index: the geo index provided by ArangoDB allows searching for documents
  within a radius around a two-dimensional earth coordinate (point), or to
  find documents which are closest to a point. Document coordinates can either 
  be specified in two different document attributes or in a single attribute, e.g.
      { "latitude": 50.9406645, "longitude": 6.9599115 }
  or
      { "coords": [ 50.9406645, 6.9599115 ] }
  Geo indexes will only be invoked via special functions.
 - fulltext index: a fulltext index can be used to index all words contained in 
  a specific attribute of all documents in a collection. Only words with a 
  (specifiable) minimum length are indexed. Word tokenization is done using 
  the word boundary analysis provided by libicu, which is taking into account 
  the selected language provided at server start.
  The index supports complete match queries (full words) and prefix queries.
  Fulltext indexes will only be invoked via special functions.
 - cap constraint: the cap constraint provided by ArangoDB indexes documents
  not to speed up search queries, but to limit (cap) the number or size of
  documents in a collection. This can be used to prevent collections from growing 
  permanently.
 !SUBSECTION Sparse vs. non-sparse indexes
 Hash indexes and skiplist indexes can optionally be created sparse. A sparse index
 does not contain documents for which at least one of the index attributes is not set
 or contains a value of `null`.
 As such documents are excluded from sparse indexes, they may contain fewer documents than
 their non-sparse counterparts. This enables faster indexing and can lead to reduced memory
 usage in case the indexed attribute does occur only in some, but not all documents of the 
 collection. Sparse indexes will also reduce the number of collisions in non-unique hash
 indexes in case non-existing or optional attributes are indexed.
 In order to create a sparse index, an object with the attribute `sparse` can be added to
 the index creation commands:
 ```js
 db.collection.ensureHashIndex(attributeName, { sparse: true }); 
 db.collection.ensureHashIndex(attributeName1, attributeName2, { sparse: true }); 
 db.collection.ensureUniqueConstraint(attributeName, { sparse: true }); 
 db.collection.ensureUniqueConstraint(attributeName1, attributeName2, { sparse: true }); 
 db.collection.ensureSkiplist(attributeName, { sparse: true }); 
 db.collection.ensureSkiplist(attributeName1, attributeName2, { sparse: true }); 
 db.collection.ensureUniqueSkiplist(attributeName, { sparse: true }); 
 db.collection.ensureUniqueSkiplist(attributeName1, attributeName2, { sparse: true }); 
 ```
 When not explicitly set, the `sparse` attribute defaults to `false` for new indexes.
 Index types other than hash and skiplist do not support sparsity.
 As sparse indexes may exclude some documents from the collection, they cannot be used for
 all types of queries. Sparse hash indexes cannot be used to find documents for which at
 least one of the indexed attributes has a value of `null`. For example, the following AQL
 query cannot use a sparse index, even if one was created on attribute `attr`:
    FOR doc In collection 
      FILTER doc.attr == null 
      RETURN doc
 If the lookup value is non-constant, a sparse index may or may not be used, depending on
 the other types of conditions in the query. If the optimizer can safely determine that
 the lookup value cannot be `null`, a sparse index may be used. When uncertain, the optimizer
 will not make use of a sparse index in a query in order to produce correct results.
 For example, the following queries cannot use a sparse index on `attr` because the optimizer
  will not know beforehand whether the comparison values for `doc.attr` will include `null`:
    FOR doc In collection 
      FILTER doc.attr == SOME_FUNCTION(...) 
      RETURN doc
    FOR other IN otherCollection 
      FOR doc In collection 
        FILTER doc.attr == other.attr 
        RETURN doc
 Sparse skiplist indexes can be used for sorting if the optimizer can safely detect that the 
 index range does not include `null` for any of the index attributes. 
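
Why the optimizer has to be this careful can be shown with a small simulation. The following is a sketch in plain JavaScript under stated assumptions, not ArangoDB code: `fullScan`, `sparseIndexLookup` and `canUseSparseIndex` are hypothetical helpers, and a missing attribute is assumed to compare equal to `null`, matching the description of sparse indexes above.

```js
const docs = [
  { _key: "a", attr: 1 },
  { _key: "b", attr: null },
  { _key: "c" } // attr is not set
];

// Assumption: a missing attribute behaves like null in comparisons.
const valueOf = (doc) => (doc.attr === undefined ? null : doc.attr);

// Reference result: look at every document in the collection.
function fullScan(collection, value) {
  return collection.filter((d) => valueOf(d) === value).map((d) => d._key);
}

// A sparse index only contains documents whose attribute is set and non-null.
function sparseIndexLookup(collection, value) {
  const indexed = collection.filter((d) => valueOf(d) !== null);
  return indexed.filter((d) => valueOf(d) === value).map((d) => d._key);
}

// Simplified decision rule: use the sparse index only if the lookup value
// is known not to be null. `provenNonNull` is a made-up flag standing in
// for whatever the optimizer can derive from other conditions.
function canUseSparseIndex(lookup) {
  if (lookup.constant) {
    return lookup.value !== null;
  }
  return lookup.provenNonNull === true;
}

console.log(fullScan(docs, null));          // [ 'b', 'c' ]
console.log(sparseIndexLookup(docs, null)); // [] (would be a wrong result)
console.log(sparseIndexLookup(docs, 1));    // [ 'a' ]

console.log(canUseSparseIndex({ constant: true, value: null })); // false
console.log(canUseSparseIndex({ constant: true, value: 1 }));    // true
console.log(canUseSparseIndex({ constant: false }));             // false
```

For the `null` lookup, the sparse index returns nothing although two documents match, which is why such queries must not use it.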
--- a/Documentation/Books/Users/IndexHandling/WorkingWithIndexes.mdpp
+++ b/Documentation/Books/Users/IndexHandling/WorkingWithIndexes.mdpp
@ -1,5 +1,35 @@
 !CHAPTER Working with Indexes
 !SECTION Index Identifiers and Handles 
 An *index handle* uniquely identifies an index in the database. It is a string and 
 consists of the collection name and an *index identifier* separated by a `/`. The 
 index identifier part is a numeric value that is auto-generated by ArangoDB.
 A specific index of a collection can be accessed using its *index handle* or
 *index identifier* as follows:
 ```js
 db.collection.index("<index-handle>");
 db.collection.index("<index-identifier>");
 db._index("<index-handle>");
 ```
 For example: Assume that the index handle, which is stored in the `_id`
 attribute of the index, is `demo/362549736` and the index was created in a collection
 named `demo`. Then this index can be accessed as:
 ```js
 db.demo.index("demo/362549736");
 ```
 Because the index handle is unique within the database, you can leave out the
 *collection* and use the shortcut:
 ```js
 db._index("demo/362549736");
 ```
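
The structure of an index handle can be made explicit with a short helper. This is only a sketch in plain JavaScript: `parseIndexHandle` is a hypothetical function, not part of the ArangoDB API, and it assumes the handle has the `<collection name>/<index identifier>` form described above.

```js
// Splits an index handle into collection name and index identifier.
// Hypothetical helper for illustration; not an ArangoDB function.
function parseIndexHandle(handle) {
  const pos = handle.indexOf("/");
  if (pos <= 0 || pos === handle.length - 1) {
    throw new Error("not an index handle: " + handle);
  }
  return {
    collection: handle.slice(0, pos),
    identifier: handle.slice(pos + 1)
  };
}

const parsed = parseIndexHandle("demo/362549736");
console.log(parsed.collection); // demo
console.log(parsed.identifier); // 362549736
```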
 !SECTION Collection Methods
 !SUBSECTION Listing all indexes of a collection
--- a/Documentation/Books/Users/SUMMARY.md
+++ b/Documentation/Books/Users/SUMMARY.md
@ -211,6 +211,7 @@
 * [Administrating ArangoDB](AdministratingArango/README.md)
 * [Indexing](IndexHandling/README.md)
  * [Index Basics](IndexHandling/IndexBasics.md)
  * [Which Index to use when](IndexHandling/WhichIndex.md)
  * [Working with Indexes](IndexHandling/WorkingWithIndexes.md)
    * [Hash Indexes](IndexHandling/Hash.md)
    * [Skiplists](IndexHandling/Skiplist.md)