updated documentation

This commit is contained in:
Jan Steemann 2015-02-13 00:23:06 +01:00
parent 52783bd9fa
commit bcdbf30ca2
5 changed files with 373 additions and 120 deletions

@ -1,6 +1,62 @@
v2.5.0 (XXXX-XX-XX)
-------------------
* added support for sparse hash and skiplist indexes

  Hash and skiplist indexes can optionally be made sparse. Sparse indexes exclude documents
  in which at least one of the index attributes is either not set or has a value of `null`.

  As such documents are excluded from sparse indexes, they may contain fewer documents than
  their non-sparse counterparts. This enables faster indexing and can lead to reduced memory
  usage if the indexed attribute occurs in only some, but not all, documents of the
  collection. Sparse indexes will also reduce the number of collisions in non-unique hash
  indexes when non-existing or optional attributes are indexed.

  In order to create a sparse index, an object with the attribute `sparse` can be added to
  the index creation commands:

      db.collection.ensureHashIndex(attributeName, { sparse: true });
      db.collection.ensureHashIndex(attributeName1, attributeName2, { sparse: true });
      db.collection.ensureUniqueConstraint(attributeName, { sparse: true });
      db.collection.ensureUniqueConstraint(attributeName1, attributeName2, { sparse: true });
      db.collection.ensureSkiplist(attributeName, { sparse: true });
      db.collection.ensureSkiplist(attributeName1, attributeName2, { sparse: true });
      db.collection.ensureUniqueSkiplist(attributeName, { sparse: true });
      db.collection.ensureUniqueSkiplist(attributeName1, attributeName2, { sparse: true });

  When not explicitly set, the `sparse` attribute defaults to `false` for new indexes.
  Index types other than hash and skiplist do not support sparsity.

  As sparse indexes may exclude some documents from the collection, they cannot be used for
  all types of queries. Sparse hash indexes cannot be used to find documents for which at
  least one of the indexed attributes has a value of `null`. For example, the following AQL
  query cannot use a sparse index, even if one was created on attribute `attr`:

      FOR doc IN collection
        FILTER doc.attr == null
        RETURN doc

  If the lookup value is non-constant, a sparse index may or may not be used, depending on
  the other types of conditions in the query. If the optimizer can safely determine that
  the lookup value cannot be `null`, a sparse index may be used. When uncertain, the optimizer
  will not make use of a sparse index in a query in order to produce correct results.

  For example, the following queries cannot use a sparse index on `attr` because the optimizer
  will not know beforehand whether the comparison values for `doc.attr` will include `null`:

      FOR doc IN collection
        FILTER doc.attr == SOME_FUNCTION(...)
        RETURN doc

      FOR other IN otherCollection
        FOR doc IN collection
          FILTER doc.attr == other.attr
          RETURN doc

  Sparse skiplist indexes can be used for sorting if the optimizer can safely detect that the
  index range does not include `null` for any of the index attributes.

* inspection of AQL data-modification queries will now detect if the data-modification part
  of the query can run in lockstep with the data retrieval part of the query, or if the data
  retrieval part must be executed before the data modification can start.

@ -23,7 +23,9 @@ ArangoDB provides the following index types:
For each collection there will always be a *primary index* which is a hash index
for the [document keys](../Glossary/README.html#document_key) (`_key` attribute)
of all documents in the collection. The primary index allows quick selection
of documents in the collection using either the `_key` or `_id` attributes. It will
be used from within AQL queries automatically when performing equality lookups on
`_key` or `_id`.
There are also dedicated functions to find a document given its `_key` or `_id`
that will always make use of the primary index:
@ -33,7 +35,11 @@ db.collection.document("<document-key>");
db._document("<document-id>");
```
As the primary index is a hash index, it cannot be used for range queries or for sorting
on `_key` or `_id`.
The primary index of a collection cannot be dropped or changed, and there is no
mechanism to create user-defined primary indexes.
!SUBSECTION Edges Index
@ -44,11 +50,10 @@ documents by either their `_from` or `_to` attributes. It can therefore be
used to quickly find connections between vertex documents and is invoked when
the connecting edges of a vertex are queried.
Edges indexes are used from within AQL when performing equality lookups on `_from`
or `_to` values in an edge collection. There are also dedicated functions to
find edges given their `_from` or `_to` values that will always make use of the
edges index:
```js
db.collection.edges("<from-value>");
@ -59,146 +64,172 @@ db.collection.inEdges("<from-value>");
db.collection.inEdges("<to-value>");
```
The edges index is a hash index. It can be used for equality lookups only, but not for range
queries or for sorting. As edges indexes are automatically created for edge collections, it
is not possible to create user-defined edges indexes.
The edges index cannot be dropped or changed.
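To make the equality-only nature of the edges index concrete, the following plain JavaScript sketch models it as two hash maps keyed by `_from` and `_to`. It is a toy that runs in Node.js, not arangosh code and not how ArangoDB stores the index internally; the class `ToyEdgeIndex` and its methods are hypothetical names chosen to mirror the dedicated edge lookup functions listed above.

```js
// Toy model of an edges index: two hash maps keyed by _from and _to.
// Hypothetical illustration only; it supports exact-match lookups, nothing else.
class ToyEdgeIndex {
  constructor() {
    this.byFrom = new Map(); // _from value -> array of edges
    this.byTo = new Map();   // _to value -> array of edges
  }
  static add(map, key, edge) {
    if (!map.has(key)) map.set(key, []);
    map.get(key).push(edge);
  }
  insert(edge) {
    ToyEdgeIndex.add(this.byFrom, edge._from, edge);
    ToyEdgeIndex.add(this.byTo, edge._to, edge);
  }
  // Edges starting at the given vertex id.
  outEdges(vertexId) {
    return this.byFrom.get(vertexId) ?? [];
  }
  // Edges ending at the given vertex id.
  inEdges(vertexId) {
    return this.byTo.get(vertexId) ?? [];
  }
  // Edges in either direction; in this toy a self-loop is reported once.
  edges(vertexId) {
    return [...new Set([...this.outEdges(vertexId), ...this.inEdges(vertexId)])];
  }
}
```

Because both maps are keyed by exact vertex ids, a question such as "all edges whose `_from` is greater than some value" cannot be answered from them, which matches the statement that the edges index supports neither range queries nor sorting.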
!SUBSECTION Hash Index
A hash index can be used to quickly find documents with specific attribute values.
The hash index is unsorted, so it supports equality lookups but no range queries or sorting.
A hash index can be created on one or multiple document attributes. A hash index will
only be used by a query if all indexed attributes are present in the search condition,
and if all attributes are compared using the equality (`==`) operator. Hash indexes are
used from within AQL and several query functions, e.g. `byExample`, `firstExample` etc.
Hash indexes can optionally be declared to be unique, disallowing saving the same
value in the indexed attribute. Hash indexes can optionally be sparse.
The different types of hash indexes have the following characteristics:
* **unique hash index**: all documents in the collection must have different values for
the attributes covered by the unique index. Trying to insert a document with the same
key value as an already existing document will lead to a unique constraint
violation.
This type of index is not sparse. Documents that do not contain the index attributes or
that have a value of `null` in the index attribute(s) will still be indexed.
A key value of `null` may only occur once in the index, so this type of index cannot
be used for optional attributes.
* **unique, sparse hash index**: all documents in the collection must have different
values for the attributes covered by the unique index. Documents in which at least one
of the index attributes is not set or has a value of `null` are not included in the
index. This type of index can be used to ensure that there are no duplicate keys in
the collection for documents which have the indexed attributes set. As the index will
exclude documents for which the indexed attributes are `null` or not set, it can be
used for optional attributes.
* **non-unique hash index**: all documents in the collection will be indexed. This type
of index is not sparse. Documents that do not contain the index attributes or that have
a value of `null` in the index attribute(s) will still be indexed. Duplicate key values
can occur and do not lead to unique constraint violations.
* **non-unique, sparse hash index**: only those documents will be indexed that have all
the indexed attributes set to a value other than `null`. It can be used for optional
attributes.
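The four variants differ only in which documents enter the index and whether a second entry for the same key is rejected. The sketch below models just that decision in plain JavaScript (runnable with Node.js). `ToyHashIndex` is a hypothetical class, serializing keys with `JSON.stringify` is a simplifying assumption, and ArangoDB's real implementation is not shown here.

```js
// Toy model of the four hash index variants (unique/non-unique x sparse/non-sparse).
// Hypothetical illustration of which documents get indexed; not ArangoDB code.
class ToyHashIndex {
  constructor(fields, { unique = false, sparse = false } = {}) {
    this.fields = fields;
    this.unique = unique;
    this.sparse = sparse;
    this.buckets = new Map(); // serialized key -> array of document _keys
  }
  // Returns true if the document was indexed, false if a sparse index skipped it.
  insert(doc) {
    // A missing attribute is treated like an explicit null.
    const values = this.fields.map((f) => (doc[f] === undefined ? null : doc[f]));
    if (this.sparse && values.some((v) => v === null)) {
      return false; // sparse: skip documents with any unset or null index attribute
    }
    const key = JSON.stringify(values);
    const bucket = this.buckets.get(key) ?? [];
    if (this.unique && bucket.length > 0) {
      throw new Error(`unique constraint violated for key ${key}`);
    }
    bucket.push(doc._key);
    this.buckets.set(key, bucket);
    return true;
  }
}
```

With a unique, non-sparse index on a hypothetical `email` attribute, a first document without `email` is indexed under `null` and a second one is rejected; the unique, sparse variant skips both and only enforces uniqueness among documents that actually have an `email` value.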
The amortized complexity of lookup, insert, update, and removal operations in unique hash
indexes is O(1).
Non-unique hash indexes have an amortized complexity of O(1) for inserts. Lookup, update
and removal operations in non-unique hash indexes have an amortized complexity that is
linearly correlated with the number of duplicates for a given key. That means non-unique
hash indexes should not be used on attributes with very low cardinality.
If a hash index is created on an attribute that is missing in all or many of the documents,
the behavior is as follows:
* if the index is sparse, the documents missing the attribute will not be indexed and not
use index memory. These documents will not influence the update or removal performance
for the index.
* if the index is non-sparse, the documents missing the attribute will be contained in the
index with a key value of `null`. If many such documents get indexed, a lot of collisions
will occur, and lookup, update and removal of documents will become expensive. This
should be avoided if possible.
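The collision effect described above can be reproduced with a toy non-unique hash index that reports how many bucket entries a removal has to inspect. This is a hypothetical model for illustration only (the class name `ToyNonUniqueHashIndex`, the attribute name and the document counts are made up), assuming that removal scans the bucket of duplicates linearly.

```js
// Toy non-unique hash index that counts the entries a removal has to inspect.
// Hypothetical illustration: removal scans the bucket holding all duplicates of a key.
class ToyNonUniqueHashIndex {
  constructor(field, sparse) {
    this.field = field;
    this.sparse = sparse;
    this.buckets = new Map(); // indexed value -> array of document _keys
  }
  insert(doc) {
    const value = doc[this.field] ?? null; // missing attribute -> null
    if (this.sparse && value === null) return; // sparse: not indexed, no index memory used
    if (!this.buckets.has(value)) this.buckets.set(value, []);
    this.buckets.get(value).push(doc._key); // constant time per insert
  }
  // Removes a document and returns the number of bucket entries inspected.
  remove(doc) {
    const value = doc[this.field] ?? null;
    const bucket = this.buckets.get(value);
    if (bucket === undefined) return 0; // document was never indexed
    const pos = bucket.indexOf(doc._key); // linear in the number of duplicates
    if (pos !== -1) bucket.splice(pos, 1);
    return pos === -1 ? bucket.length : pos + 1;
  }
}
```

With 1,000 documents of which only 10 carry the indexed attribute, the non-sparse variant puts 990 entries under the `null` key, so removing the last of them inspects 990 entries; the sparse variant never indexed those documents and inspects none.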
!SUBSECTION Skiplist Index
A skiplist is a sorted index structure. It can be used to quickly find documents
with specific attribute values but also for range queries and returning documents from
the index in sorted order. Skiplists will be used from within AQL and several query
functions, e.g. `byExample`, `firstExample` etc.
A skiplist can be created on one or multiple document attributes.
Skiplist indexes will be used for lookups, range queries and sorting only if either all
index attributes are provided in a query, or if a leftmost prefix of the index attributes
is specified.
For example, if a skiplist index is created on attributes `value1` and `value2`, the
following conditions could use the index (note: the `<=` and `>=` operators are intentionally
omitted here for the sake of brevity):
```
FILTER doc.value1 == ...
FILTER doc.value1 < ...
FILTER doc.value1 > ...
FILTER doc.value1 > ... && doc.value1 < ...
FILTER doc.value1 == ... && doc.value2 == ...
FILTER doc.value1 == ... && doc.value2 > ...
FILTER doc.value1 == ... && doc.value2 > ... && doc.value2 < ...
```
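The leftmost-prefix rule can be expressed as a small function. The sketch below is a hypothetical simplification (the name `usablePrefixLength` and the `"=="`/`"range"` condition encoding are illustrative assumptions, not part of ArangoDB): it walks the index attributes in order and stops at the first attribute without a condition, or right after the first range condition.

```js
// Toy check: how many leading skiplist attributes can a filter use?
// conditions maps attribute name -> "==" (equality) or "range" (<, >, or both).
// Hypothetical simplification of the leftmost-prefix rule, not the real optimizer.
function usablePrefixLength(indexFields, conditions) {
  let used = 0;
  for (const field of indexFields) {
    const op = conditions[field];
    if (op === undefined) break; // the prefix ends at the first unconstrained attribute
    used++;
    if (op !== "==") break; // a range condition must be the last attribute used
  }
  return used; // 0 means the index cannot be used for this filter
}
```

For an index on `value1` and `value2`, a filter on `value2` alone yields 0 (index not usable), `value1 > ... && value1 < ...` yields 1, and `value1 == ... && value2 > ...` yields 2.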
In order to use a skiplist index for sorting, the index attributes must be specified in
the `SORT` clause of the query in the same order as they appear in the index definition.
Sort orders cannot be mixed, i.e. the sort orders specified in the `SORT` clause must all
be either ascending (optionally omitted, as ascending is the default) or descending.
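The two sorting requirements above (index attributes in index order, one common sort direction) can be checked mechanically. The function below is a hypothetical sketch that assumes the `SORT` attributes must form a leftmost prefix of the index attributes, as for lookups; it is not the optimizer's actual logic.

```js
// Toy check: can a skiplist on indexFields serve a SORT clause?
// sortSpec is an array of [attribute, "ASC" | "DESC"] pairs in SORT order.
// Hypothetical simplification of the stated rules, not the real optimizer.
function skiplistCanSort(indexFields, sortSpec) {
  if (sortSpec.length === 0 || sortSpec.length > indexFields.length) return false;
  const direction = sortSpec[0][1]; // all sort orders must match this one
  return sortSpec.every(
    ([attr, dir], i) => attr === indexFields[i] && dir === direction
  );
}
```

Sorting by `value1 DESC, value2 DESC` qualifies for an index on `value1` and `value2`, while `value1 ASC, value2 DESC` (mixed orders) and `value2, value1` (wrong attribute order) do not.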
Skiplists can optionally be declared to be unique, disallowing saving the same
value in the indexed attribute. They can be sparse or non-sparse.
The different types of skiplist indexes have the following characteristics:
* **unique skiplist index**: all documents in the collection must have different values for
the attributes covered by the unique index. Trying to insert a document with the same
key value as an already existing document will lead to a unique constraint
violation.
This type of index is not sparse. Documents that do not contain the index attributes or
that have a value of `null` in the index attribute(s) will still be indexed.
A key value of `null` may only occur once in the index, so this type of index cannot
be used for optional attributes.
* **unique, sparse skiplist index**: all documents in the collection must have different
values for the attributes covered by the unique index. Documents in which at least one
of the index attributes is not set or has a value of `null` are not included in the
index. This type of index can be used to ensure that there are no duplicate keys in
the collection for documents which have the indexed attributes set. As the index will
exclude documents for which the indexed attributes are `null` or not set, it can be
used for optional attributes.
* **non-unique skiplist index**: all documents in the collection will be indexed. This type
of index is not sparse. Documents that do not contain the index attributes or that have
a value of `null` in the index attribute(s) will still be indexed. Duplicate key values
can occur and do not lead to unique constraint violations.
* **non-unique, sparse skiplist index**: only those documents will be indexed that have all
the indexed attributes set to a value other than `null`. It can be used for optional
attributes.
The amortized complexity of skiplist index operations grows logarithmically with
the number of documents in the index.
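To illustrate where the logarithmic behavior and the sorted output come from, here is a minimal textbook skip list in JavaScript. It is a generic sketch of the data structure with numeric keys and random level promotion, not ArangoDB's C++ skiplist; `SkipList`, `insert` and `range` are illustrative names.

```js
// Minimal textbook skip list over numeric keys (duplicates allowed).
// Generic illustration of the data structure, not ArangoDB's implementation.
class SkipList {
  constructor(maxLevel = 16, p = 0.5) {
    this.maxLevel = maxLevel;
    this.p = p;     // probability of promoting a node one level up
    this.level = 1; // number of levels currently in use
    this.head = { key: -Infinity, next: new Array(maxLevel).fill(null) };
  }
  randomLevel() {
    let lvl = 1;
    while (lvl < this.maxLevel && Math.random() < this.p) lvl++;
    return lvl;
  }
  // Expected O(log n): descend from the top level, moving right while keys are smaller.
  insert(key) {
    const update = new Array(this.maxLevel).fill(this.head);
    let node = this.head;
    for (let i = this.level - 1; i >= 0; i--) {
      while (node.next[i] !== null && node.next[i].key < key) node = node.next[i];
      update[i] = node; // last node before the insert position on level i
    }
    const lvl = this.randomLevel();
    this.level = Math.max(this.level, lvl); // new levels link from head (update[] default)
    const fresh = { key, next: new Array(lvl).fill(null) };
    for (let i = 0; i < lvl; i++) {
      fresh.next[i] = update[i].next[i];
      update[i].next[i] = fresh;
    }
  }
  // Range query lo <= key < hi: expected O(log n) to find the start, then a
  // walk along level 0, which yields keys in ascending order.
  range(lo, hi) {
    let node = this.head;
    for (let i = this.level - 1; i >= 0; i--) {
      while (node.next[i] !== null && node.next[i].key < lo) node = node.next[i];
    }
    const out = [];
    for (let cur = node.next[0]; cur !== null && cur.key < hi; cur = cur.next[0]) {
      out.push(cur.key);
    }
    return out;
  }
}
```

Insertion and the search for the start of a range take an expected logarithmic number of steps; the scan then follows the bottom level, which is why results come back already sorted and why the same structure can serve `SORT`.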
!SUBSECTION Geo Index
A geo index is used to find places on the surface of the earth fast.
The geo index stores two-dimensional coordinates. It can be created on either two
separate document attributes (latitude and longitude) or a single array attribute that
contains both latitude and longitude. Latitude and longitude must be numeric values.
The geo index provides operations to find documents with coordinates nearest to a given
comparison coordinate, and to find documents with coordinates that are within a specifiable
radius around a comparison coordinate.
The geo index is used via dedicated functions in AQL or the simple queries, but will
not be enabled for other types of queries or conditions.
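The two geo operations can be stated precisely with a brute-force sketch. The code below scans every document and computes great-circle distances with the haversine formula on a spherical earth (mean radius 6,371 km, an assumption; ArangoDB's own distance computation may differ). A geo index exists precisely to avoid this full scan, so treat the code only as a description of the results that "near" and "within" style queries return. The helper names and the separate `latitude`/`longitude` attributes are hypothetical.

```js
// Brute-force sketch of what "near" and "within" queries compute.
// Assumes a spherical earth with mean radius 6,371 km (haversine formula).
const EARTH_RADIUS_M = 6371000;

function distanceMeters(lat1, lon1, lat2, lon2) {
  const rad = (deg) => (deg * Math.PI) / 180;
  const dLat = rad(lat2 - lat1);
  const dLon = rad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Documents within radiusMeters of (lat, lon), in their original order.
function within(docs, lat, lon, radiusMeters) {
  return docs.filter(
    (d) => distanceMeters(lat, lon, d.latitude, d.longitude) <= radiusMeters
  );
}

// The `limit` documents closest to (lat, lon), nearest first.
function near(docs, lat, lon, limit) {
  return docs
    .map((d) => ({ doc: d, dist: distanceMeters(lat, lon, d.latitude, d.longitude) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, limit)
    .map((x) => x.doc);
}
```

One degree of latitude comes out at roughly 111.2 km under this model; for documents that store both coordinates in a single array attribute, the accessors would read `d.coords[0]` and `d.coords[1]` instead.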
!SUBSECTION Fulltext Index
A fulltext index can be used to find words, or prefixes of words inside documents.
A fulltext index can be created on a single attribute only, and will index all words
contained in documents that have a textual value in that attribute. Only words with a (specifiable)
minimum length are indexed. Word tokenization is done using the word boundary analysis
provided by libicu, which takes into account the selected language provided at
server start. Words are indexed in their lower-cased form. The index supports complete
match queries (full words) and prefix queries, plus basic logical operations such as
`and`, `or` and `not` for combining partial results.
The fulltext index is sparse, meaning it will only index documents for which the index
attribute is set and contains a string value. Additionally, only words with a configurable
minimum length will be included in the index.
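A toy inverted index shows how complete-match and prefix queries, lower-casing, the minimum word length and the sparse behavior fit together. This is a hypothetical sketch, not ArangoDB's implementation: a simple Unicode regular expression stands in for libicu's word boundary analysis, and only the `and` combination is shown, as a set intersection.

```js
// Hypothetical toy fulltext index, not ArangoDB's implementation.
// Assumption: a simple Unicode regex stands in for libicu word-boundary analysis.
class ToyFulltextIndex {
  constructor(attribute, minLength = 2) {
    this.attribute = attribute;
    this.minLength = minLength;
    this.words = new Map(); // lower-cased word -> Set of document ids
  }
  insert(id, doc) {
    const text = doc[this.attribute];
    if (typeof text !== "string") return; // sparse: only string values are indexed
    for (const raw of text.match(/[\p{L}\p{N}]+/gu) ?? []) {
      const word = raw.toLowerCase();
      if (word.length < this.minLength) continue; // too short to be indexed
      if (!this.words.has(word)) this.words.set(word, new Set());
      this.words.get(word).add(id);
    }
  }
  // Complete-match query: documents containing exactly this word.
  matchWord(word) {
    return new Set(this.words.get(word.toLowerCase()) ?? []);
  }
  // Prefix query: documents containing any word that starts with the prefix.
  matchPrefix(prefix) {
    const p = prefix.toLowerCase();
    const result = new Set();
    for (const [word, ids] of this.words) {
      if (word.startsWith(p)) ids.forEach((id) => result.add(id));
    }
    return result;
  }
}

// Logical "and" of two partial results (set intersection).
function and(a, b) {
  return new Set([...a].filter((id) => b.has(id)));
}
```

After indexing `"The quick brown Fox"` (id 1), `"Quick foxes at play"` (id 2) and a document whose attribute holds the number `42` (id 3) with a minimum length of 3, `matchWord("fox")` finds only id 1, `matchPrefix("fox")` finds ids 1 and 2, `"at"` is not indexed, and id 3 is skipped entirely.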
The fulltext index is used via dedicated functions in AQL or the simple queries, but will
not be enabled for other types of queries or conditions.

@ -0,0 +1,135 @@
!SECTION Which Index to use when
ArangoDB automatically indexes the `_key` attribute in each collection. There
is no need to index this attribute separately. Please note that a document's
`_id` attribute is derived from the `_key` attribute, and is thus implicitly
indexed, too.
ArangoDB will also automatically create an index on `_from` and `_to` in any
edge collection, meaning incoming and outgoing connections can be determined
efficiently.
!SUBSECTION Index types
Users can define additional indexes on one or multiple document attributes.
Several different index types are provided by ArangoDB. These indexes have
different usage scenarios:
- hash index: provides quick access to individual documents if (and only if)
all indexed attributes are provided in the search query. The index will only
be used for equality comparisons. It does not support range queries and
cannot be used for sorting.
The hash index is a good candidate if all or most queries on the indexed
attribute(s) are equality comparisons. It will be the most efficient index
type if the index is declared unique.
Insertions into a non-unique hash index are also very efficient. Update and
removal performance in a non-unique hash index depends on the key selectivity.
If the selectivity is low and keys repeat a lot, update and removal performance
in a non-unique hash index will degrade.
A non-unique hash index should therefore not be used if duplicate index values
are allowed and it is known that there will be many duplicate values in the index
and there will be updates or removals.
A non-unique hash index on an optional document attribute should be declared
sparse so that it will not index documents for which the index attribute is
not set.
- skiplist index: skiplists keep the indexed values in an order, so they can
be used for equality lookups, range queries and for sorting. For high selectivity
attributes, skiplist indexes will have a higher overhead than hash indexes. For
low selectivity attributes, skiplist indexes will be more efficient than non-unique
hash indexes.
Additionally, skiplist indexes allow more use cases (e.g. range queries, sorting)
than hash indexes. Furthermore, they can be used for lookups based on a leftmost
prefix of the index attributes.
- geo index: the geo index provided by ArangoDB allows searching for documents
within a radius around a two-dimensional earth coordinate (point), or finding
documents that are closest to a point. Document coordinates can either
be specified in two different document attributes or in a single attribute, e.g.
{ "latitude": 50.9406645, "longitude": 6.9599115 }
or
{ "coords": [ 50.9406645, 6.9599115 ] }
Geo indexes will only be invoked via special functions.
- fulltext index: a fulltext index can be used to index all words contained in
a specific attribute of all documents in a collection. Only words with a
(specifiable) minimum length are indexed. Word tokenization is done using
the word boundary analysis provided by libicu, which takes into account
the selected language provided at server start.
The index supports complete match queries (full words) and prefix queries.
Fulltext indexes will only be invoked via special functions.
- cap constraint: the cap constraint provided by ArangoDB indexes documents
not to speed up search queries, but to limit (cap) the number or size of
documents in a collection. This can be used to prevent collections from growing
indefinitely.
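The capping behavior can be sketched as a bounded collection that evicts documents once a limit is exceeded. This toy only models a document-count cap, not a size cap, and assumes that the oldest documents are removed first; `ToyCappedCollection` is a hypothetical name, not an ArangoDB API.

```js
// Toy capped collection: keeps at most `maxCount` documents.
// Assumption: the oldest insertions are evicted first. Hypothetical, not ArangoDB code.
class ToyCappedCollection {
  constructor(maxCount) {
    this.maxCount = maxCount;
    this.docs = []; // insertion order, oldest first
  }
  save(doc) {
    this.docs.push(doc);
    while (this.docs.length > this.maxCount) {
      this.docs.shift(); // drop the oldest document to stay within the cap
    }
  }
}
```

Saving five documents into a collection capped at three leaves only the last three, so the collection never grows past its limit no matter how many documents are written.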
!SUBSECTION Sparse vs. non-sparse indexes
Hash indexes and skiplist indexes can optionally be created sparse. A sparse index
does not contain documents for which at least one of the index attributes is not set
or contains a value of `null`.
As such documents are excluded from sparse indexes, they may contain fewer documents than
their non-sparse counterparts. This enables faster indexing and can lead to reduced memory
usage if the indexed attribute occurs in only some, but not all, documents of the
collection. Sparse indexes will also reduce the number of collisions in non-unique hash
indexes when non-existing or optional attributes are indexed.
In order to create a sparse index, an object with the attribute `sparse` can be added to
the index creation commands:
```js
db.collection.ensureHashIndex(attributeName, { sparse: true });
db.collection.ensureHashIndex(attributeName1, attributeName2, { sparse: true });
db.collection.ensureUniqueConstraint(attributeName, { sparse: true });
db.collection.ensureUniqueConstraint(attributeName1, attributeName2, { sparse: true });
db.collection.ensureSkiplist(attributeName, { sparse: true });
db.collection.ensureSkiplist(attributeName1, attributeName2, { sparse: true });
db.collection.ensureUniqueSkiplist(attributeName, { sparse: true });
db.collection.ensureUniqueSkiplist(attributeName1, attributeName2, { sparse: true });
```
When not explicitly set, the `sparse` attribute defaults to `false` for new indexes.
Index types other than hash and skiplist do not support sparsity.
As sparse indexes may exclude some documents from the collection, they cannot be used for
all types of queries. Sparse hash indexes cannot be used to find documents for which at
least one of the indexed attributes has a value of `null`. For example, the following AQL
query cannot use a sparse index, even if one was created on attribute `attr`:
```
FOR doc IN collection
  FILTER doc.attr == null
  RETURN doc
```
If the lookup value is non-constant, a sparse index may or may not be used, depending on
the other types of conditions in the query. If the optimizer can safely determine that
the lookup value cannot be `null`, a sparse index may be used. When uncertain, the optimizer
will not make use of a sparse index in a query in order to produce correct results.
For example, the following queries cannot use a sparse index on `attr` because the optimizer
will not know beforehand whether the comparison values for `doc.attr` will include `null`:
```
FOR doc IN collection
  FILTER doc.attr == SOME_FUNCTION(...)
  RETURN doc
```
```
FOR other IN otherCollection
  FOR doc IN collection
    FILTER doc.attr == other.attr
    RETURN doc
```
Sparse skiplist indexes can be used for sorting if the optimizer can safely detect that the
index range does not include `null` for any of the index attributes.
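The conservative rules above can be summarized as two toy predicates. They are a hypothetical simplification of what the optimizer has to prove, not its actual implementation: an equality lookup is safe only for a known non-`null` constant, and a range (and therefore sorting) is safe only if its lower bound excludes `null`, relying on AQL's type order in which `null` compares lower than every other value. The function names and the `{ constant, value, inclusive }` encoding are made up for illustration.

```js
// Toy decision rules for when a sparse index could safely be used.
// Hypothetical simplification; the real optimizer considers much more context.
// A lookup value is { constant: true, value } or { constant: false } for
// expressions such as SOME_FUNCTION(...) or other.attr.

// Equality lookup (doc.attr == lookup): only a known non-null constant is safe.
function sparseUsableForEquality(lookup) {
  return lookup.constant === true && lookup.value !== null;
}

// Range or sort: null sorts before all other values in AQL, so the range
// excludes null only if it has a lower bound that itself excludes null.
// lower is undefined (no lower bound) or { constant, value, inclusive }.
function sparseUsableForRange(lower) {
  if (lower === undefined || lower.constant !== true) return false; // be conservative
  if (lower.value === null) return !lower.inclusive; // "> null" excludes null, ">= null" does not
  return true; // any non-null lower bound already lies above null
}
```

Under these rules, `doc.attr == null`, `doc.attr == other.attr` and a plain `SORT doc.attr` without a lower bound all fall back to not using the sparse index, which is the conservative outcome the text describes.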

@ -1,5 +1,35 @@
!CHAPTER Working with Indexes
!SECTION Index Identifiers and Handles
An *index handle* uniquely identifies an index in the database. It is a string and
consists of the collection name and an *index identifier* separated by a `/`. The
index identifier part is a numeric value that is auto-generated by ArangoDB.
A specific index of a collection can be accessed using its *index handle* or
*index identifier* as follows:
```js
db.collection.index("<index-handle>");
db.collection.index("<index-identifier>");
db._index("<index-handle>");
```
For example: Assume that the index handle, which is stored in the `_id`
attribute of the index, is `demo/362549736` and the index was created in a collection
named `demo`. Then this index can be accessed as:
```js
db.demo.index("demo/362549736");
```
Because the index handle is unique within the database, you can leave out the
*collection* and use the shortcut:
```js
db._index("demo/362549736");
```
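Since an index handle is just the collection name and the numeric index identifier joined by `/`, it can be split client-side. The helper below is a hypothetical sketch, not an ArangoDB function; it keeps the identifier as a string because such identifiers may exceed the range in which JavaScript numbers are exact.

```js
// Hypothetical helper: split an index handle such as "demo/362549736" into its
// collection name and numeric index identifier (the two parts joined by "/").
function parseIndexHandle(handle) {
  const slash = handle.indexOf("/");
  if (slash <= 0) throw new Error(`not an index handle: ${handle}`);
  const collection = handle.slice(0, slash);
  const id = handle.slice(slash + 1);
  if (!/^[0-9]+$/.test(id)) throw new Error(`index identifier must be numeric: ${id}`);
  return { collection, id }; // id stays a string to avoid precision loss
}
```

For the example handle above, `parseIndexHandle("demo/362549736")` yields the collection `demo` and the identifier `"362549736"`; a bare identifier without a collection name is rejected, since only the full handle is unique within the database.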
!SECTION Collection Methods
!SUBSECTION Listing all indexes of a collection

@ -211,6 +211,7 @@
* [Administrating ArangoDB](AdministratingArango/README.md)
* [Indexing](IndexHandling/README.md)
* [Index Basics](IndexHandling/IndexBasics.md)
* [Which Index to use when](IndexHandling/WhichIndex.md)
* [Working with Indexes](IndexHandling/WorkingWithIndexes.md)
* [Hash Indexes](IndexHandling/Hash.md)
* [Skiplists](IndexHandling/Skiplist.md)
@ -224,4 +225,4 @@
* [Document Keys](NamingConventions/DocumentKeys.md)
* [Attribute Names](NamingConventions/AttributeNames.md)
* [Error codes and meanings](ErrorCodes/README.md)
* [Glossary](Glossary/README.md)