Doc - Vertex Centric Indexes improvements (#5982)

- Fix example, make usage and utilization clearer
2018-07-31 16:28:47 +02:00 · 2018-07-31 16:28:47 +02:00 · fadd055176
parent 534977ee4e
commit fadd055176
1 changed files with 18 additions and 12 deletions
--- a/Documentation/Books/Manual/Indexing/IndexBasics.md
+++ b/Documentation/Books/Manual/Indexing/IndexBasics.md
@@ -413,7 +413,7 @@ value `bar` will be inserted only once:
 db.posts.insert({ tags: [ "foobar", "bar", "bar" ] });
 ```

-This is done to avoid redudant storage of the same index value for the same document, which
+This is done to avoid redundant storage of the same index value for the same document, which
 would not provide any benefit.

 If an array index is declared **unique**, the de-duplication of array values will happen before 
@@ -485,17 +485,21 @@ Vertex centric indexes
 As mentioned above, the most important indexes for graphs are the edge
 indexes, indexing the `_from` and `_to` attributes of edge collections.
 They provide very quick access to all edges originating in or arriving
-at a given vertex, which allows to quickly find all neighbours of a vertex
+at a given vertex, which makes it possible to quickly find all neighbors of a vertex
 in a graph.

 In many cases one would like to run more specific queries, for example
-finding amongst the edges originating in a given vertex only those
-with the 20 latest time stamps. Exactly this is achieved with "vertex 
-centric indexes". In a sense these are localized indexes for an edge
-collection, which sit at every single vertex.
+finding amongst the edges originating from a given vertex only those
+with a timestamp greater than or equal to some date and time. Exactly this
+is achieved with "vertex centric indexes". In a sense these are localized
+indexes for an edge collection, which sit at every single vertex.

 Technically, they are implemented in ArangoDB as indexes, which sort the 
-complete edge collection first by `_from` and then by other attributes.
+complete edge collection first by `_from` and then by other attributes
+for _OUTBOUND_ traversals, or first by `_to` and then by other attributes
+for _INBOUND_ traversals. For traversals in _ANY_ direction two indexes
+are needed, one with `_from` and the other with `_to` as first indexed field.
+
 If we for example have a skiplist index on the attributes `_from` and 
 `timestamp` of an edge collection, we can answer the above question
 very quickly with a single range lookup in the index.
@@ -513,15 +517,17 @@ would simply do
 db.edges.ensureIndex({"type":"skiplist", "fields": ["_from", "timestamp"]});
 ```

-Then, queries like
+in arangosh. Then, queries like

 ```js
 FOR v, e, p IN 1..1 OUTBOUND "V/1" edges
-  FILTER e.timestamp ALL >= "2016-11-09"
+  FILTER e.timestamp >= "2018-07-09"
  RETURN p
 ```

 will be considerably faster in case there are many edges originating
-in vertex `"V/1"` but only few with a recent time stamp.
-
-
+from vertex `"V/1"` but only a few with a recent timestamp. Note that the
+optimizer may prefer the default edge index over vertex centric indexes
+based on the costs it estimates, even if a vertex centric index might
+in fact be faster. Vertex centric indexes are more likely to be chosen
+for highly connected graphs and with the RocksDB storage engine.
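
To illustrate why sorting first by `_from` and then by `timestamp` lets the query in this change be answered with a single range lookup, here is a minimal sketch in plain JavaScript. It is a toy model using a sorted array and binary search, not ArangoDB's actual skiplist implementation; the helper names (`buildIndex`, `lowerBound`, `outboundSince`) and the sample edges are hypothetical and do not appear in the documentation being changed.

```js
// Toy model of a vertex centric index: edges kept sorted by (_from, timestamp).
// Illustration only; ArangoDB's real index structures work differently inside.

// Compare two [from, timestamp] keys lexicographically.
function compareKeys(a, b) {
  if (a[0] !== b[0]) return a[0] < b[0] ? -1 : 1;
  if (a[1] !== b[1]) return a[1] < b[1] ? -1 : 1;
  return 0;
}

// Build the "index": a copy of the edges sorted by (_from, timestamp).
function buildIndex(edges) {
  return edges.slice().sort((x, y) =>
    compareKeys([x._from, x.timestamp], [y._from, y.timestamp]));
}

// Binary search: first position whose key is >= the given key.
function lowerBound(index, key) {
  let lo = 0;
  let hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >>> 1;
    const midKey = [index[mid]._from, index[mid].timestamp];
    if (compareKeys(midKey, key) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// All edges leaving `from` with timestamp >= minTimestamp.
// One seek, then a contiguous scan that stops at the next vertex.
function outboundSince(index, from, minTimestamp) {
  const result = [];
  for (let i = lowerBound(index, [from, minTimestamp]);
       i < index.length && index[i]._from === from; i++) {
    result.push(index[i]);
  }
  return result;
}

// Hypothetical sample data, with ISO date strings as in the AQL example.
const sampleEdges = [
  { _from: "V/1", _to: "V/2", timestamp: "2018-07-01" },
  { _from: "V/1", _to: "V/3", timestamp: "2018-07-10" },
  { _from: "V/2", _to: "V/1", timestamp: "2018-07-20" },
  { _from: "V/1", _to: "V/4", timestamp: "2018-07-15" },
];

const sampleIndex = buildIndex(sampleEdges);
const recent = outboundSince(sampleIndex, "V/1", "2018-07-09");
console.log(recent.map((e) => e._to));
```

The scan only touches edges of `"V/1"` at or after the cutoff, which is the reason such an index helps most when a vertex has many edges but few recent ones. Sorting by `_to` first instead would give the same behavior for inbound lookups.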