Graph traversals in AQL
=======================

Syntax
------

There are two slightly different syntaxes for traversals in AQL:

- one for [named graphs](../../Manual/Graphs/index.html#named-graphs), and
- one for a [set of edge collections](#working-with-collection-sets)
  ([anonymous graph](../../Manual/Graphs/index.html#anonymous-graphs)).

### Working with named graphs

```
FOR vertex[, edge[, path]]
  IN [min[..max]]
  OUTBOUND|INBOUND|ANY startVertex
  GRAPH graphName
  [OPTIONS options]
```

- `FOR`: emits up to three variables:
  - **vertex** (object): the current vertex in a traversal
  - **edge** (object, *optional*): the current edge in a traversal
  - **path** (object, *optional*): representation of the current path with
    two members:
    - `vertices`: an array of all vertices on this path
    - `edges`: an array of all edges on this path
- `IN` `min..max`: the minimal and maximal depth for the traversal:
  - **min** (number, *optional*): edges and vertices returned by this query will
    start at the traversal depth of *min* (thus edges and vertices at lower depths
    will not be returned). If not specified, it defaults to 1. The minimal
    possible value is 0.
  - **max** (number, *optional*): up to *max* length paths are traversed.
    If omitted, *max* defaults to *min*. Thus only the vertices and edges at
    depth *min* are returned. *max* cannot be specified without *min*.
- `OUTBOUND|INBOUND|ANY`: follow outgoing edges, incoming edges, or edges pointing
  in either direction in the traversal. Note that the direction cannot be
  replaced by a bind parameter.
- **startVertex** (string|object): the vertex where the traversal originates.
  This can be specified in the form of an ID string or in the form of a document
  with the attribute `_id`. All other values will lead to a warning and an empty
  result. If the specified document does not exist, the result is empty as well
  and there is no warning.
- `GRAPH` **graphName** (string): the name identifying the named graph.
  Its vertex and edge collections will be looked up. Note that the graph name
  is a regular string, hence it must be enclosed in quote marks.
- `OPTIONS` **options** (object, *optional*): used to modify the execution of the
  traversal. Only the following attributes have an effect, all others are ignored:
  - **uniqueVertices** (string): optionally ensure vertex uniqueness
    - "path" – it is guaranteed that there is no path returned with a duplicate vertex
    - "global" – it is guaranteed that each vertex is visited at most once during
      the traversal, no matter how many paths lead from the start vertex to this one.
      If you start with a `min depth > 1`, a vertex that was found before *min* depth
      might not be returned at all (it still might be part of a path). **Note:**
      Using this configuration the result is not deterministic anymore. If there
      are multiple paths from *startVertex* to *vertex*, one of those is picked.
    - "none" (default) – no uniqueness check is applied on vertices
  - **uniqueEdges** (string): optionally ensure edge uniqueness
    - "path" (default) – it is guaranteed that there is no path returned with a
      duplicate edge
    - "global" – it is guaranteed that each edge is visited at most once during
      the traversal, no matter how many paths lead from the start vertex to this edge.
      If you start with a `min depth > 1`, an edge that was found before *min* depth
      might not be returned at all (it still might be part of a path). **Note:**
      Using this configuration the result is not deterministic anymore. If there
      are multiple paths from *startVertex* over *edge*, one of those is picked.
    - "none" – no uniqueness check is applied on edges. **Note:**
      Using this configuration the traversal will follow cycles in edges.
  - **bfs** (bool): optionally use the alternative breadth-first traversal algorithm
    - true – the traversal will be executed breadth-first. The results will first
      contain all vertices at depth 1, then all vertices at depth 2 and so on.
    - false (default) – the traversal will be executed depth-first. It will first
      return all paths from *min* depth to *max* depth for one vertex at depth 1,
      then for the next vertex at depth 1 and so on.
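
To make the interplay of *min*, *max*, `bfs`, `uniqueVertices` and `uniqueEdges` more concrete, here is a minimal in-memory sketch in plain JavaScript. It is not ArangoDB's implementation. It only models the semantics listed above for an `OUTBOUND` traversal, and it ignores indexes, filters and cluster behavior. The `traverse` function, the `{ _from, _to }` edge objects and the sample graph are hypothetical, chosen for illustration. The sketch is also deterministic because it walks edges in list order, whereas the real `"global"` modes are documented as non-deterministic.

```javascript
// Hypothetical sketch of the documented traversal options, OUTBOUND only.
// It models the semantics described above; it is not ArangoDB's implementation
// (no indexes, no filters, no cluster logic, deterministic edge order).
function traverse(edges, startVertex, opts = {}) {
  const min = opts.min ?? 1; // min defaults to 1
  const max = opts.max ?? min; // max defaults to min
  const uniqueVertices = opts.uniqueVertices ?? "none";
  const uniqueEdges = opts.uniqueEdges ?? "path";
  const bfs = opts.bfs ?? false;

  const results = [];
  const seenVertices = new Set([startVertex]); // for uniqueVertices: "global"
  const seenEdges = new Set(); // for uniqueEdges: "global"

  // Each frontier entry is a path { vertices, edges }; DFS pops it like a
  // stack, BFS shifts it like a queue.
  const frontier = [{ vertices: [startVertex], edges: [] }];

  while (frontier.length > 0) {
    const path = bfs ? frontier.shift() : frontier.pop();
    const depth = path.edges.length;
    if (depth >= min) {
      // Emit the (vertex, edge, path) triple for this depth.
      results.push({
        vertex: path.vertices[depth],
        edge: depth > 0 ? path.edges[depth - 1] : null,
        path,
      });
    }
    if (depth === max) continue; // never extend a path beyond max

    const current = path.vertices[depth];
    const next = [];
    for (const e of edges) {
      if (e._from !== current) continue; // OUTBOUND: follow outgoing edges only
      if (uniqueEdges === "path" && path.edges.includes(e)) continue;
      if (uniqueEdges === "global" && seenEdges.has(e)) continue;
      if (uniqueVertices === "path" && path.vertices.includes(e._to)) continue;
      if (uniqueVertices === "global" && seenVertices.has(e._to)) continue;
      seenEdges.add(e);
      seenVertices.add(e._to);
      next.push({ vertices: [...path.vertices, e._to], edges: [...path.edges, e] });
    }
    // For DFS, push in reverse so the first matching edge is explored first.
    frontier.push(...(bfs ? next : next.reverse()));
  }
  return results;
}

// Hypothetical sample graph: A -> B -> C, A -> G, and a cycle C -> A.
const sampleEdges = [
  { _from: "A", _to: "B" },
  { _from: "A", _to: "G" },
  { _from: "B", _to: "C" },
  { _from: "C", _to: "A" },
];

const keys = (opts) => traverse(sampleEdges, "A", opts).map((r) => r.vertex);
console.log(keys({ min: 1, max: 3 })); // [ 'B', 'C', 'A', 'G' ]
console.log(keys({ min: 1, max: 3, bfs: true })); // [ 'B', 'G', 'C', 'A' ]
console.log(keys({ min: 1, max: 3, uniqueVertices: "path" })); // [ 'B', 'C', 'G' ]
```

The depth-first run follows the cycle back to A before it returns G. The breadth-first run returns both depth-1 vertices first. With `uniqueVertices: "path"`, the path that revisits A is dropped.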

### Working with collection sets

```
FOR vertex[, edge[, path]]
  IN [min[..max]]
  OUTBOUND|INBOUND|ANY startVertex
  edgeCollection1, ..., edgeCollectionN
  [OPTIONS options]
```

Instead of `GRAPH graphName` you may specify a list of edge collections. Vertex
collections are determined by the edges in the edge collections. The traversal
options are the same as with the [named graph variant](#working-with-named-graphs).

If the same edge collection is specified multiple times, it will behave as if it
were specified only once. Specifying the same edge collection more than once is
only allowed when its occurrences do not have conflicting traversal directions.

### Traversing in mixed directions

For traversals with a list of edge collections you can optionally specify the
direction for some of the edge collections. Say, for example, you have three edge
collections *edges1*, *edges2* and *edges3*, where in *edges2* the direction has
no relevance, but in *edges1* and *edges3* the direction should be taken into account.
In this case you can use *OUTBOUND* as the general traversal direction and *ANY*
specifically for *edges2* as follows:

```
FOR vertex IN OUTBOUND
  startVertex
  edges1, ANY edges2, edges3
```

All collections in the list that do not specify their own direction will use the
direction defined after `IN`. This allows you to use a different direction for each
collection in your traversal.
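
The two rules for edge collection lists can be sketched in plain JavaScript. First, collections without their own direction inherit the one after `IN`. Second, a repeated collection counts once unless its directions conflict. `resolveDirections`, `neighbors` and the `collection` attribute on edges are hypothetical names for illustration, not ArangoDB APIs. The sketch also assumes that a conflict is detected by comparing the *resolved* directions.

```javascript
// Hypothetical helper, not an ArangoDB API: resolve the direction of each edge
// collection in a list like `OUTBOUND startVertex edges1, ANY edges2, edges3`.
// Entries are [collectionName] or [collectionName, explicitDirection].
function resolveDirections(defaultDirection, entries) {
  const directions = new Map();
  for (const [name, explicit] of entries) {
    const direction = explicit ?? defaultDirection; // inherit the direction after IN
    const previous = directions.get(name);
    if (previous !== undefined && previous !== direction) {
      throw new Error(`conflicting directions for edge collection ${name}`);
    }
    directions.set(name, direction); // a repeated collection collapses into one entry
  }
  return directions;
}

// One traversal step from `vertex`, honoring each collection's direction.
// Edges are assumed to carry a hypothetical `collection` attribute.
function neighbors(vertex, edges, directions) {
  const result = [];
  for (const e of edges) {
    const direction = directions.get(e.collection);
    if (direction === undefined) continue; // collection not part of the traversal
    const followOut = direction === "OUTBOUND" || direction === "ANY";
    const followIn = direction === "INBOUND" || direction === "ANY";
    if (followOut && e._from === vertex) result.push(e._to);
    if (followIn && e._to === vertex) result.push(e._from);
  }
  return result;
}

// OUTBOUND startVertex edges1, ANY edges2, edges3
const dirs = resolveDirections("OUTBOUND", [["edges1"], ["edges2", "ANY"], ["edges3"]]);

// Hypothetical edges around vertex "x".
const sample = [
  { collection: "edges1", _from: "x", _to: "y" }, // outgoing: followed
  { collection: "edges1", _from: "z", _to: "x" }, // incoming via OUTBOUND collection: ignored
  { collection: "edges2", _from: "w", _to: "x" }, // incoming via ANY collection: followed
];
console.log(neighbors("x", sample, dirs)); // [ 'y', 'w' ]
```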

### Graph traversals in a cluster

Due to the nature of graphs, edges may reference vertices from arbitrary
collections. Following the paths can thus involve documents from various
collections, and it is not possible to predict which ones will be visited in a
traversal. Hence, which collections need to be locked can only be determined
at run time. Deadlocks may occur under certain circumstances.

Please consider using the [`WITH` statement](../Operations/With.md) to
specify the collections you expect to be involved.

Using filters and the explainer to extrapolate the costs
--------------------------------------------------------

All three variables emitted by the traversals can also be used in filter
statements. For some of these filter statements the optimizer can detect that it
is possible to prune paths of traversals earlier, hence filtered results will
not be emitted to the variables in the first place. This may significantly
improve the performance of your query. Whenever a filter is not fulfilled,
the complete set of *vertex*, *edge* and *path* will be skipped. Paths
longer than *max* are never computed.

In the current state, `AND` combined filters can be optimized, but `OR`
combined filters cannot.

### Filtering on paths

Filtering on paths allows for the most powerful filtering and may have the
highest impact on performance. Using the path variable you can filter on
specific iteration depths. You can filter for absolute positions in the path
by specifying a positive number (which then qualifies for the optimizations),
or relative positions to the end of the path by specifying a negative number.
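
One way to see why positive positions suit early pruning is to compare them with negative positions as a path grows. A positive index names a fixed position that no longer changes once the traversal has walked past it, so a failed check can rule out every longer path built on it. A negative index is relative to the current end, so it points to a different element each time the path is extended. The JavaScript sketch below illustrates only this reasoning, not the optimizer itself. The `at` helper is hypothetical and mimics AQL array access, where negative indexes count from the end and out-of-range positions yield `null`.

```javascript
// Hypothetical helper mirroring AQL array access: negative indexes count
// from the end, out-of-range positions yield null.
function at(array, index) {
  const i = index < 0 ? array.length + index : index;
  return i >= 0 && i < array.length ? array[i] : null;
}

// Hypothetical vertex keys of one path as it grows, depth by depth.
const growth = [["A"], ["A", "B"], ["A", "B", "C"], ["A", "B", "C", "D"]];

// p.vertices[1] is fixed once it exists, so a failing check on it also
// fails for every longer path that extends this one.
const fixed = growth.map((vertices) => at(vertices, 1));

// p.vertices[-1] moves with the end of the path, so no prefix decides it.
const moving = growth.map((vertices) => at(vertices, -1));

console.log(fixed); // [ null, 'B', 'B', 'B' ]
console.log(moving); // [ 'A', 'B', 'C', 'D' ]
```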

#### Filtering edges on the path

```js
FOR v, e, p IN 1..5 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
  FILTER p.edges[0].theTruth == true
  RETURN p
```

This query returns all paths where the start edge (index 0) has the attribute
*theTruth* equal to *true*. The resulting paths will be up to 5 edges long.

#### Filtering vertices on the path

Similar to filtering the edges on the path, you can also filter the vertices:

```js
FOR v, e, p IN 1..5 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
  FILTER p.vertices[1]._key == "G"
  RETURN p
```

#### Combining several filters

And of course you can combine these filters in any way you like:

```js
FOR v, e, p IN 1..5 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
  FILTER p.edges[0].theTruth == true
     AND p.edges[1].theFalse == false
  FILTER p.vertices[1]._key == "G"
  RETURN p
```

The query returns all paths where the first edge has the attribute *theTruth*
equal to *true*, the vertex at depth 1 (index 1, right after the start vertex)
is "G", and the second edge has the attribute *theFalse* equal to *false*.
The resulting paths will be up to 5 edges long.

**Note**: Although we have defined a *min* of 1, we will only get results of
depth 2 or more. This is because for all results at depth 1 the second edge does
not exist and hence cannot fulfill the condition here.
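
The note above follows from how missing path positions evaluate. At depth 1, `p.edges[1]` does not exist, so `p.edges[1].theFalse` is `null` in AQL. `null == false` is false because AQL comparisons do not convert types. The JavaScript sketch below mimics that evaluation on hypothetical paths. `getAttr` is an assumed helper that models AQL's null-yielding attribute access; plain JavaScript would produce `undefined` or throw instead. The edge attribute values and the vertex key "H" are made up for illustration.

```javascript
// Hypothetical helper modeling AQL attribute access: a missing document or
// a missing attribute yields null.
function getAttr(doc, name) {
  return doc == null ? null : doc[name] ?? null;
}

// The combined filter from the query above, evaluated on one path.
// Strict equality mirrors AQL's type-strict ==.
function matches(p) {
  return (
    getAttr(p.edges[0], "theTruth") === true &&
    getAttr(p.edges[1], "theFalse") === false &&
    getAttr(p.vertices[1], "_key") === "G"
  );
}

// Hypothetical paths starting at A (attribute values are assumptions).
const edgeAG = { theTruth: true, theFalse: false };
const edgeGH = { theTruth: true, theFalse: false };
const depth1 = { vertices: [{ _key: "A" }, { _key: "G" }], edges: [edgeAG] };
const depth2 = {
  vertices: [{ _key: "A" }, { _key: "G" }, { _key: "H" }],
  edges: [edgeAG, edgeGH],
};

console.log(matches(depth1)); // false: there is no second edge
console.log(matches(depth2)); // true
```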

#### Filter on the entire path

With the help of array comparison operators, filters can also be defined
on the entire path, for example that ALL edges should have `theTruth == true`:

```js
FOR v, e, p IN 1..5 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
  FILTER p.edges[*].theTruth ALL == true
  RETURN p
```

Or that NONE of the edges should have `theTruth == true`:

```js
FOR v, e, p IN 1..5 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
  FILTER p.edges[*].theTruth NONE == true
  RETURN p
```

Both examples above are recognized by the optimizer and can potentially use
indexes other than the edge index.

It is also possible to define that at least one edge on the path has to fulfill the condition:

```js
FOR v, e, p IN 1..5 OUTBOUND 'circles/A' GRAPH 'traversalGraph'
  FILTER p.edges[*].theTruth ANY == true
  RETURN p
```

It is guaranteed that at least one, but potentially more edges fulfill the condition.
All of the above filters can be defined on vertices in the exact same way.
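
For non-empty paths, the three array comparison operators roughly correspond to JavaScript's `every` and `some` over the expanded values of `p.edges[*].theTruth`. The sketch below is an illustration only, not AQL's evaluation code. The `expand` helper is a hypothetical stand-in for the `[*]` operator, and the edges are made-up sample data.

```javascript
// Hypothetical stand-in for AQL's [*] expansion: collect one attribute from
// each edge, with missing attributes becoming null.
const expand = (docs, attr) => docs.map((d) => d[attr] ?? null);

// p.edges[*].theTruth ALL == true
const allTrue = (edges) => expand(edges, "theTruth").every((v) => v === true);
// p.edges[*].theTruth NONE == true
const noneTrue = (edges) => !expand(edges, "theTruth").some((v) => v === true);
// p.edges[*].theTruth ANY == true
const anyTrue = (edges) => expand(edges, "theTruth").some((v) => v === true);

// Hypothetical edges of one path: one true, one false, one missing.
const pathEdges = [{ theTruth: true }, { theTruth: false }, {}];
console.log(allTrue(pathEdges), noneTrue(pathEdges), anyTrue(pathEdges)); // false false true
```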

### Filtering on the path vs. filtering on vertices or edges

Filtering on the path influences the iteration on your graph. If certain conditions
aren't met, the traversal may stop continuing along this path.

In contrast, filters on vertex or edge only express whether you're interested in the
actual value of these documents. Thus, they influence the list of returned documents
(if you return `v` or `e`), similar to specifying a larger *min* value. If you specify
a *min* value of 2, the traversal over the first two nodes of these paths still has
to be executed; you just won't see them in your result array.

The same applies to filters on vertices or edges: the traverser has to walk along
these nodes, since you may be interested in documents further down the path.
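
The difference can be made concrete by counting how many vertices a traverser visits. In the hypothetical JavaScript sketch below, a condition on the path is checked while walking, so nothing below a rejected vertex at depth 1 is ever visited. A condition on the emitted vertex only hides that vertex from the output, and the walk continues below it. The adjacency list, the vertex keys and `walk` are assumptions for illustration, not ArangoDB internals.

```javascript
// Hypothetical outbound adjacency list: A forks into B and G.
const graph = { A: ["B", "G"], B: ["C"], C: [], G: ["H"], H: ["I"], I: [] };

// Depth-first walk from `start` up to `max` depth (min depth is 1).
// `keepPath(vertices)` is checked during the walk, like a path filter the
// optimizer can apply early; `keepVertex(v)` is only checked when emitting,
// like a filter on v.
function walk(start, max, keepPath, keepVertex) {
  const emitted = [];
  let visited = 0;
  const visit = (vertices) => {
    visited++;
    const v = vertices[vertices.length - 1];
    if (vertices.length > 1 && keepVertex(v)) emitted.push(v);
    if (vertices.length - 1 === max) return;
    for (const next of graph[v]) {
      const extended = [...vertices, next];
      if (keepPath(extended)) visit(extended); // rejected paths are not expanded
    }
  };
  visit([start]);
  return { emitted, visited };
}

const always = () => true;
// Like FILTER p.vertices[1]._key != 'G': G and everything below it is pruned.
const byPath = walk("A", 3, (vs) => vs[1] !== "G", always);
// Like FILTER v._key != 'G': G is hidden, but H and I are still visited.
const byVertex = walk("A", 3, always, (v) => v !== "G");
console.log(byPath, byVertex);
```

With this graph, the path filter visits 3 vertices and returns B and C. The vertex filter visits all 6 vertices and still returns H and I from behind G.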

### Examples

We will create a simple symmetric traversal demonstration graph:

@startDocuBlockInline GRAPHTRAV_01_create_graph
@EXAMPLE_ARANGOSH_OUTPUT{GRAPHTRAV_01_create_graph}
~addIgnoreCollection("circles");
~addIgnoreCollection("edges");
var examples = require("@arangodb/graph-examples/example-graph.js");
var graph = examples.loadGraph("traversalGraph");
db.circles.toArray();
db.edges.toArray();
@END_EXAMPLE_ARANGOSH_OUTPUT
@endDocuBlock GRAPHTRAV_01_create_graph

To get started, we select the full graph. For a better overview we only return
the vertex keys:

@startDocuBlockInline GRAPHTRAV_02_traverse_all
@EXAMPLE_ARANGOSH_OUTPUT{GRAPHTRAV_02_traverse_all}
db._query("FOR v IN 1..3 OUTBOUND 'circles/A' GRAPH 'traversalGraph' RETURN v._key");
db._query("FOR v IN 1..3 OUTBOUND 'circles/A' edges RETURN v._key");
@END_EXAMPLE_ARANGOSH_OUTPUT
@endDocuBlock GRAPHTRAV_02_traverse_all

We can nicely see that it heads for the first outer vertex, then goes back to
the branch to descend into the next tree. After that it returns to our start node
to descend again. As we can see, both queries return the same result: the first one
uses the named graph, the second uses the edge collections directly.

Now we only want the elements of a specific depth (min = max = 2), the ones that
are right behind the fork:

@startDocuBlockInline GRAPHTRAV_03_traverse_3
@EXAMPLE_ARANGOSH_OUTPUT{GRAPHTRAV_03_traverse_3}
db._query("FOR v IN 2..2 OUTBOUND 'circles/A' GRAPH 'traversalGraph' return v._key");
db._query("FOR v IN 2 OUTBOUND 'circles/A' GRAPH 'traversalGraph' return v._key");
@END_EXAMPLE_ARANGOSH_OUTPUT
@endDocuBlock GRAPHTRAV_03_traverse_3

As you can see, we can express this in two ways: with or without the *max* parameter
in the expression.
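
The equivalence of `2..2` and `2` follows from the defaulting rules in the syntax section: *min* defaults to 1, *max* defaults to *min*, *max* cannot be given without *min*, and the smallest *min* is 0. The hypothetical JavaScript helper below normalizes a depth specification according to exactly those rules. It is an illustration, not AQL's parser, and it does not check any further constraints AQL may impose.

```javascript
// Hypothetical helper: normalize the `[min[..max]]` part of a traversal
// according to the documented defaults.
function depthRange(spec) {
  if (spec === undefined) return { min: 1, max: 1 }; // no depth given at all
  const parts = spec.split("..");
  // Reject "..3" (max without min), "2.." and malformed specs.
  if (parts.length > 2 || parts.some((part) => part.trim() === "")) {
    throw new Error(`invalid depth range: ${spec}`);
  }
  const min = Number(parts[0]);
  const max = parts.length === 2 ? Number(parts[1]) : min; // max defaults to min
  if (!Number.isInteger(min) || !Number.isInteger(max) || min < 0) {
    throw new Error(`invalid depth range: ${spec}`);
  }
  return { min, max };
}

console.log(depthRange("2..2"), depthRange("2"), depthRange("1..3"));
```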

### Filter examples

Now let's start to add some filters. We want to cut off the branch on the right
side of the graph, and we may filter in two ways:

- we know the vertex at depth 1 has `_key` == `G`
- we know the `label` attribute of the edge connecting **A** to **G** is `right_foo`

@startDocuBlockInline GRAPHTRAV_04_traverse_4
@EXAMPLE_ARANGOSH_OUTPUT{GRAPHTRAV_04_traverse_4}
db._query("FOR v, e, p IN 1..3 OUTBOUND 'circles/A' GRAPH 'traversalGraph' FILTER p.vertices[1]._key != 'G' RETURN v._key");
db._query("FOR v, e, p IN 1..3 OUTBOUND 'circles/A' GRAPH 'traversalGraph' FILTER p.edges[0].label != 'right_foo' RETURN v._key");
@END_EXAMPLE_ARANGOSH_OUTPUT
@endDocuBlock GRAPHTRAV_04_traverse_4

As we can see, all vertices behind **G** are skipped in both queries.
The first filters on the vertex `_key`, the second on an edge label.
Note again: as soon as a filter is not fulfilled for any of the three elements
`v`, `e` or `p`, the complete set of these will be excluded from the result.

We may also combine several filters, for instance to filter out the right branch
(**G**) and the **E** branch:

@startDocuBlockInline GRAPHTRAV_05_traverse_5
@EXAMPLE_ARANGOSH_OUTPUT{GRAPHTRAV_05_traverse_5}
db._query("FOR v,e,p IN 1..3 OUTBOUND 'circles/A' GRAPH 'traversalGraph' FILTER p.vertices[1]._key != 'G' FILTER p.edges[1].label != 'left_blub' return v._key");
db._query("FOR v,e,p IN 1..3 OUTBOUND 'circles/A' GRAPH 'traversalGraph' FILTER p.vertices[1]._key != 'G' AND p.edges[1].label != 'left_blub' return v._key");
@END_EXAMPLE_ARANGOSH_OUTPUT
@endDocuBlock GRAPHTRAV_05_traverse_5

As you can see, combining two `FILTER` statements with an `AND` has the same result.

Comparing OUTBOUND / INBOUND / ANY
----------------------------------

All our previous examples traversed the graph in *OUTBOUND* edge direction.
You may, however, want to also traverse in reverse direction (*INBOUND*) or
both (*ANY*). Since `circles/A` only has outbound edges, we start our queries
from `circles/E`:

@startDocuBlockInline GRAPHTRAV_06_traverse_reverse_6
@EXAMPLE_ARANGOSH_OUTPUT{GRAPHTRAV_06_traverse_reverse_6}
db._query("FOR v IN 1..3 OUTBOUND 'circles/E' GRAPH 'traversalGraph' return v._key");
db._query("FOR v IN 1..3 INBOUND 'circles/E' GRAPH 'traversalGraph' return v._key");
db._query("FOR v IN 1..3 ANY 'circles/E' GRAPH 'traversalGraph' return v._key");
@END_EXAMPLE_ARANGOSH_OUTPUT
@endDocuBlock GRAPHTRAV_06_traverse_reverse_6

The first traversal will only walk in the forward (*OUTBOUND*) direction.
Therefore, from **E** we can only see **F**. Walking in reverse direction
(*INBOUND*), we see the path to **A**: **B** → **A**.

Walking in forward and reverse direction (*ANY*), we can see a more diverse result.
First of all, we see the simple paths to **F** and **A**. However, these vertices
have edges in other directions and they will be traversed.

**Note**: The traverser may use identical edges multiple times. For instance,
if it walks from **E** to **F**, it will continue to walk from **F** to **E**
using the same edge once again. Due to this, we will see duplicate nodes in the result.

Please note that the direction can't be passed in by a bind parameter.

Use the AQL explainer for optimizations
---------------------------------------

Now let's have a look at what the optimizer does behind the curtain and inspect
traversal queries using [the explainer](../ExecutionAndPerformance/Optimizer.md):

@startDocuBlockInline GRAPHTRAV_07_traverse_7
@EXAMPLE_ARANGOSH_OUTPUT{GRAPHTRAV_07_traverse_7}
db._explain("FOR v,e,p IN 1..3 OUTBOUND 'circles/A' GRAPH 'traversalGraph' LET localScopeVar = RAND() > 0.5 FILTER p.edges[0].theTruth != localScopeVar RETURN v._key", {}, {colors: false});
db._explain("FOR v,e,p IN 1..3 OUTBOUND 'circles/A' GRAPH 'traversalGraph' FILTER p.edges[0].label == 'right_foo' RETURN v._key", {}, {colors: false});
@END_EXAMPLE_ARANGOSH_OUTPUT
@endDocuBlock GRAPHTRAV_07_traverse_7

We now see two queries. In the first one we add a variable *localScopeVar*, which is
outside the scope of the traversal itself, so it is not known inside the traverser.
Therefore, this filter can only be executed after the traversal, which may be
undesired in large graphs. The second query, on the other hand, only operates on the
path, and therefore this condition can be used during the execution of the traversal.
Paths that are filtered out by this condition won't be processed at all.

And finally clean it up again:

@startDocuBlockInline GRAPHTRAV_99_drop_graph
@EXAMPLE_ARANGOSH_OUTPUT{GRAPHTRAV_99_drop_graph}
var examples = require("@arangodb/graph-examples/example-graph.js");
examples.dropGraph("traversalGraph");
~removeIgnoreCollection("circles");
~removeIgnoreCollection("edges");
@END_EXAMPLE_ARANGOSH_OUTPUT
@endDocuBlock GRAPHTRAV_99_drop_graph

If this traversal is not powerful enough for your needs, for instance because you
cannot describe your conditions as AQL filter statements, then you might want to
have a look at [manually crafted traversers](../../Manual/Graphs/Traversals/index.html).

Also see how to [combine graph traversals](../Examples/CombiningGraphTraversals.md).