mirror of https://gitee.com/bigwinds/arangodb
845 lines
29 KiB
Markdown
845 lines
29 KiB
Markdown
---
|
|
layout: default
|
|
description: Using Traversal Objects
|
|
---
|
|
Using Traversal Objects
|
|
=======================
|
|
|
|
{% hint 'warning' %}
|
|
The JavaScript module `@arangodb/graph/traversal` (*traversal module* for short)
|
|
is deprecated from version 3.4.0 on. The preferred way to traverse graphs is with AQL.
|
|
{% endhint %}
|
|
|
|
To use a traversal object, we first need to require the *traversal* module:
|
|
|
|
|
|
```js
|
|
var traversal = require("@arangodb/graph/traversal");
|
|
var examples = require("@arangodb/graph-examples/example-graph.js");
|
|
examples.loadGraph("worldCountry");
|
|
```
|
|
|
|
We then need to setup a configuration for the traversal and determine at which vertex to
|
|
start the traversal:
|
|
|
|
```js
|
|
var config = {
|
|
datasource: traversal.generalGraphDatasourceFactory("worldCountry"),
|
|
strategy: "depthfirst",
|
|
order: "preorder",
|
|
filter: traversal.visitAllFilter,
|
|
expander: traversal.inboundExpander,
|
|
maxDepth: 1
|
|
};
|
|
|
|
var startVertex = db._document("v/world");
|
|
```
|
|
|
|
**Note**: The startVertex needs to be a document, not only a document id.
|
|
|
|
We can then create a traverser and start the traversal by calling its *traverse* method.
|
|
Note that *traverse* needs a *result* object, which it can modify in place:
|
|
|
|
```js
|
|
var result = {
|
|
visited: {
|
|
vertices: [ ],
|
|
paths: [ ]
|
|
}
|
|
};
|
|
var traverser = new traversal.Traverser(config);
|
|
traverser.traverse(result, startVertex);
|
|
```
|
|
|
|
Finally, we can print the contents of the *results* object, limited to the visited vertices.
|
|
We will only print the name and type of each visited vertex for brevity:
|
|
|
|
```js
|
|
require("@arangodb").print(result.visited.vertices.map(function(vertex) {
|
|
return vertex.name + " (" + vertex.type + ")";
|
|
}));
|
|
```
|
|
|
|
|
|
The full script, which includes all steps carried out so far is thus:
|
|
|
|
```js
|
|
var traversal = require("@arangodb/graph/traversal");
|
|
|
|
var config = {
|
|
datasource: traversal.generalGraphDatasourceFactory("worldCountry"),
|
|
strategy: "depthfirst",
|
|
order: "preorder",
|
|
filter: traversal.visitAllFilter,
|
|
expander: traversal.inboundExpander,
|
|
maxDepth: 1
|
|
};
|
|
|
|
var startVertex = db._document("v/world");
|
|
var result = {
|
|
visited: {
|
|
vertices: [ ],
|
|
paths: [ ]
|
|
}
|
|
};
|
|
|
|
var traverser = new traversal.Traverser(config);
|
|
traverser.traverse(result, startVertex);
|
|
|
|
require("@arangodb").print(result.visited.vertices.map(function(vertex) {
|
|
return vertex.name + " (" + vertex.type + ")";
|
|
}));
|
|
```
|
|
|
|
The result is an array of vertices that were visited during the traversal, starting at the
|
|
start vertex (i.e. *v/world* in our example):
|
|
|
|
```js
|
|
[
|
|
"World (root)",
|
|
"Africa (continent)",
|
|
"Asia (continent)",
|
|
"Australia (continent)",
|
|
"Europe (continent)",
|
|
"North America (continent)",
|
|
"South America (continent)"
|
|
]
|
|
```
|
|
|
|
**Note**: The result is limited to vertices directly connected to the start vertex. We
|
|
achieved this by setting the *maxDepth* attribute to *1*. Not setting it would return the
|
|
full array of vertices.
|
|
|
|
Traversal Direction
|
|
-------------------
|
|
|
|
For the examples contained in this manual, we'll be starting the traversals at vertex
|
|
*v/world*. Vertices in our graph are connected like this:
|
|
|
|
```js
|
|
v/world <- is-in <- continent (Africa) <- is-in <- country (Algeria) <- is-in <- capital (Algiers)
|
|
```
|
|
|
|
To get any meaningful results, we must traverse the graph in **inbound** order. This means,
|
|
we'll be following all incoming edges of to a vertex. In the traversal configuration, we
|
|
have specified this via the *expander* attribute:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
expander: traversal.inboundExpander
|
|
};
|
|
```
|
|
|
|
For other graphs, we might want to traverse via the **outgoing** edges. For this, we can
|
|
use the *outboundExpander*. There is also an *anyExpander*, which will follow both outgoing
|
|
and incoming edges. This should be used with care and the traversal should always be
|
|
limited to a maximum number of iterations (e.g. using the *maxIterations* attribute) in
|
|
order to terminate at some point.
|
|
|
|
To invoke the default outbound expander for a graph, simply use the predefined function:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
expander: traversal.outboundExpander
|
|
};
|
|
```
|
|
|
|
Please note the outbound expander will not produce any output for the examples if we still
|
|
start the traversal at the *v/world* vertex.
|
|
|
|
Still, we can use the outbound expander if we start somewhere else in the graph, e.g.
|
|
|
|
```js
|
|
var traversal = require("@arangodb/graph/traversal");
|
|
|
|
var config = {
|
|
datasource: traversal.generalGraphDatasourceFactory("world_graph"),
|
|
strategy: "depthfirst",
|
|
order: "preorder",
|
|
filter: traversal.visitAllFilter,
|
|
expander: traversal.outboundExpander
|
|
};
|
|
|
|
var startVertex = db._document("v/capital-algiers");
|
|
var result = {
|
|
visited: {
|
|
vertices: [ ],
|
|
paths: [ ]
|
|
}
|
|
};
|
|
|
|
var traverser = new traversal.Traverser(config);
|
|
traverser.traverse(result, startVertex);
|
|
|
|
require("@arangodb").print(result.visited.vertices.map(function(vertex) {
|
|
return vertex.name + " (" + vertex.type + ")";
|
|
}));
|
|
```
|
|
|
|
The result is:
|
|
|
|
```js
|
|
[
|
|
"Algiers (capital)",
|
|
"Algeria (country)",
|
|
"Africa (continent)",
|
|
"World (root)"
|
|
]
|
|
```
|
|
|
|
which confirms that now we're going outbound.
|
|
|
|
Traversal Strategy
|
|
------------------
|
|
|
|
### Depth-first traversals
|
|
|
|
The visitation order of vertices is determined by the *strategy* and *order* attributes set
|
|
in the configuration. We chose *depthfirst* and *preorder*, meaning the traverser will
|
|
visit each vertex **before** handling connected edges (pre-order), and descend into any
|
|
connected edges before processing other vertices on the same level (depth-first).
|
|
|
|
Let's remove the *maxDepth* attribute now. We'll now be getting all vertices (directly
|
|
and indirectly connected to the start vertex):
|
|
|
|
```js
|
|
var config = {
|
|
datasource: traversal.generalGraphDatasourceFactory("world_graph"),
|
|
strategy: "depthfirst",
|
|
order: "preorder",
|
|
filter: traversal.visitAllFilter,
|
|
expander: traversal.inboundExpander
|
|
};
|
|
|
|
var result = {
|
|
visited: {
|
|
vertices: [ ],
|
|
paths: [ ]
|
|
}
|
|
};
|
|
|
|
var traverser = new traversal.Traverser(config);
|
|
traverser.traverse(result, startVertex);
|
|
|
|
require("@arangodb").print(result.visited.vertices.map(function(vertex) {
|
|
return vertex.name + " (" + vertex.type + ")";
|
|
}));
|
|
```
|
|
|
|
The result will be a longer array, assembled in depth-first, pre-order order. For
|
|
each continent found, the traverser will descend into linked countries, and then into
|
|
the linked capital:
|
|
|
|
```js
|
|
[
|
|
"World (root)",
|
|
"Africa (continent)",
|
|
"Algeria (country)",
|
|
"Algiers (capital)",
|
|
"Angola (country)",
|
|
"Luanda (capital)",
|
|
"Botswana (country)",
|
|
"Gaborone (capital)",
|
|
"Burkina Faso (country)",
|
|
"Ouagadougou (capital)",
|
|
...
|
|
]
|
|
```
|
|
|
|
Let's switch the *order* attribute from *preorder* to *postorder*. This will make the
|
|
traverser visit vertices **after** all connected vertices were visited (i.e. most distant
|
|
vertices will be emitted first):
|
|
|
|
```js
|
|
[
|
|
"Algiers (capital)",
|
|
"Algeria (country)",
|
|
"Luanda (capital)",
|
|
"Angola (country)",
|
|
"Gaborone (capital)",
|
|
"Botswana (country)",
|
|
"Ouagadougou (capital)",
|
|
"Burkina Faso (country)",
|
|
"Bujumbura (capital)",
|
|
"Burundi (country)",
|
|
"Yaounde (capital)",
|
|
"Cameroon (country)",
|
|
"N'Djamena (capital)",
|
|
"Chad (country)",
|
|
"Yamoussoukro (capital)",
|
|
"Cote d'Ivoire (country)",
|
|
"Cairo (capital)",
|
|
"Egypt (country)",
|
|
"Asmara (capital)",
|
|
"Eritrea (country)",
|
|
"Africa (continent)",
|
|
...
|
|
]
|
|
```
|
|
|
|
### Breadth-first traversals
|
|
|
|
If we go back to *preorder*, but change the strategy to *breadth-first* and re-run the
|
|
traversal, we'll see that the return order changes, and items on the same level will be
|
|
returned adjacently:
|
|
|
|
```js
|
|
[
|
|
"World (root)",
|
|
"Africa (continent)",
|
|
"Asia (continent)",
|
|
"Australia (continent)",
|
|
"Europe (continent)",
|
|
"North America (continent)",
|
|
"South America (continent)",
|
|
"Burkina Faso (country)",
|
|
"Burundi (country)",
|
|
"Cameroon (country)",
|
|
"Chad (country)",
|
|
"Algeria (country)",
|
|
"Angola (country)",
|
|
...
|
|
]
|
|
```
|
|
|
|
**Note**: The order of items returned for the same level is undefined.
|
|
This is because there is no natural order of edges for a vertex with
|
|
multiple connected edges. To explicitly set the order for edges on the
|
|
same level, you can specify an edge comparator function with the *sort*
|
|
attribute:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
sort: function (l, r) { return l._key < r._key ? 1 : -1; }
|
|
...
|
|
};
|
|
```
|
|
|
|
The arguments l and r are edge documents.
|
|
This will traverse edges of the same vertex in backward *_key* order:
|
|
|
|
```js
|
|
[
|
|
"World (root)",
|
|
"South America (continent)",
|
|
"North America (continent)",
|
|
"Europe (continent)",
|
|
"Australia (continent)",
|
|
"Asia (continent)",
|
|
"Africa (continent)",
|
|
"Ecuador (country)",
|
|
"Colombia (country)",
|
|
"Chile (country)",
|
|
"Brazil (country)",
|
|
"Bolivia (country)",
|
|
"Argentina (country)",
|
|
...
|
|
]
|
|
```
|
|
|
|
**Note**: This attribute only works for the usual expanders
|
|
*traversal.inboundExpander*, *traversal.outboundExpander*,
|
|
*traversal.anyExpander* and their corresponding "WithLabels" variants.
|
|
If you are using custom expanders
|
|
you have to organize the sorting within the specified expander.
|
|
|
|
### Writing Custom Visitors
|
|
|
|
So far we have used much of the traverser's default functions. The traverser is very
|
|
configurable and many of the default functions can be overridden with custom functionality.
|
|
|
|
For example, we have been using the default visitor function (which is always used if
|
|
the configuration does not contain the *visitor* attribute). The default visitor function
|
|
is called for each vertex in a traversal, and will push it into the result.
|
|
This is the reason why the *result* variable looked different after the traversal, and
|
|
needed to be initialized before the traversal was started.
|
|
|
|
Note that the default visitor (named `trackingVisitor`) will add every visited vertex
|
|
into the result, including the full paths from the start vertex. This is useful for learning and
|
|
debugging purposes, but should be avoided in production because it might produce (and
|
|
copy) huge amounts of data. Instead, only those data should be copied into the result
|
|
that are actually necessary.
|
|
|
|
The traverser comes with the following predefined visitors:
|
|
- *trackingVisitor*: this is the default visitor. It will copy all data of each visited
|
|
vertex plus the full path information into the result. This can be slow if the result
|
|
set is huge or vertices contain a lot of data.
|
|
- *countingVisitor*: this is a very lightweight visitor: all it does is increase a
|
|
counter in the result for each vertex visited. Vertex data and paths will not be copied
|
|
into the result.
|
|
- *doNothingVisitor*: if no action shall be carried out when a vertex is visited, this
|
|
visitor can be employed. It will not do anything and will thus be fast. It can be used
|
|
for performance comparisons with other visitors.
|
|
|
|
We can also write our own visitor function if we want to. The general function signature for
|
|
visitor functions is as follows:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
visitor: function (config, result, vertex, path, connected) { ... }
|
|
};
|
|
```
|
|
|
|
Note: the *connected* parameter value will only be set if the traversal order is
|
|
set to *preorder-expander*. Otherwise, this parameter won't be set by the traverser.
|
|
|
|
Visitor functions are not expected to return any values. Instead, they can modify the
|
|
*result* variable (e.g. by pushing the current vertex into it), or do anything else.
|
|
For example, we can create a simple visitor function that only prints information about
|
|
the current vertex as we traverse:
|
|
|
|
```js
|
|
var config = {
|
|
datasource: traversal.generalGraphDatasourceFactory("world_graph"),
|
|
strategy: "depthfirst",
|
|
order: "preorder",
|
|
filter: traversal.visitAllFilter,
|
|
expander: traversal.inboundExpander,
|
|
visitor: function (config, result, vertex, path) {
|
|
require("@arangodb").print("visiting vertex", vertex.name);
|
|
}
|
|
};
|
|
|
|
var traverser = new traversal.Traverser(config);
|
|
traverser.traverse(undefined, startVertex);
|
|
```
|
|
|
|
To write a visitor that increments a counter each time a vertex is visited,
|
|
we could write the following custom visitor:
|
|
|
|
```js
|
|
config.visitor = function (config, result, vertex, path, connected) {
|
|
if (! result) {
|
|
result = { };
|
|
}
|
|
|
|
if (! result.hasOwnProperty('count')) {
|
|
result.count = 0;
|
|
}
|
|
|
|
++result.count;
|
|
}
|
|
```
|
|
|
|
Note that such visitor is already predefined (it's the countingVisitor described
|
|
above). It can be used as follows:
|
|
```js
|
|
config.visitor = traversal.countingVisitor;
|
|
```
|
|
|
|
Another example of a visitor is one that collects the `_id` values of all vertices
|
|
visited:
|
|
|
|
```js
|
|
config.visitor = function (config, result, vertex, path, connected) {
|
|
if (! result) {
|
|
result = { };
|
|
}
|
|
if (! result.hasOwnProperty("visited")) {
|
|
result.visited = { vertices: [ ] };
|
|
}
|
|
|
|
result.visited.vertices.push(vertex._id);
|
|
}
|
|
```
|
|
|
|
When the traversal order is set to *preorder-expander*, the traverser will pass
|
|
a fifth parameter value into the visitor function. This parameter contains the
|
|
connected edges of the visited vertex as an array. This can be handy because in this
|
|
case the visitor will get all information about the vertex and the connected edges
|
|
together.
|
|
|
|
For example, the following visitor can be used to print only leaf nodes (that
|
|
do not have any further connected edges):
|
|
|
|
```js
|
|
config.visitor = function (config, result, vertex, path, connected) {
|
|
if (connected && connected.length === 0) {
|
|
require("@arangodb").print("found a leaf-node: ", vertex);
|
|
}
|
|
}
|
|
```
|
|
|
|
Note that for this visitor to work, the traversal *order* attribute needs to be
|
|
set to the value *preorder-expander*.
|
|
|
|
Filtering Vertices and Edges
|
|
----------------------------
|
|
|
|
### Filtering Vertices
|
|
|
|
So far we have printed or returned all vertices that were visited during the traversal.
|
|
This is not always required. If the result shall be restrict to just specific vertices,
|
|
we can use a filter function for vertices. It can be defined by setting the *filter*
|
|
attribute of a traversal configuration, e.g.:
|
|
|
|
```js
|
|
var config = {
|
|
filter: function (config, vertex, path) {
|
|
if (vertex.type !== 'capital') {
|
|
return 'exclude';
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
The above filter function will exclude all vertices that do not have a *type* value of
|
|
*capital*. The filter function will be called for each vertex found during the traversal.
|
|
It will receive the traversal configuration, the current vertex, and the full path from
|
|
the traversal start vertex to the current vertex. The path consists of an array of edges,
|
|
and an array of vertices. We could also filter everything but capitals by checking the
|
|
length of the path from the start vertex to the current vertex. Capitals will have a
|
|
distance of 3 from the *v/world* start vertex
|
|
(capital → is-in → country → is-in → continent → is-in → world):
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
filter: function (config, vertex, path) {
|
|
if (path.edges.length < 3) {
|
|
return 'exclude';
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
**Note**: If a filter function returns nothing (or *undefined*), the current vertex
|
|
will be included, and all connected edges will be followed. If a filter function
|
|
returns *exclude* the current vertex will be excluded from the result, and all still
|
|
all connected edges will be followed. If a filter function returns *prune*, the
|
|
current vertex will be included, but no connected edges will be followed.
|
|
|
|
For example, the following filter function will not descend into connected edges of
|
|
continents, limiting the depth of the traversal. Still, continent vertices will be
|
|
included in the result:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
filter: function (config, vertex, path) {
|
|
if (vertex.type === 'continent') {
|
|
return 'prune';
|
|
}
|
|
}
|
|
}
|
|
```
|
|
|
|
It is also possible to combine *exclude* and *prune* by returning an array with both
|
|
values:
|
|
|
|
```js
|
|
return [ 'exclude', 'prune' ];
|
|
```
|
|
|
|
Filtering Edges
|
|
---------------
|
|
|
|
It is possible to exclude certain edges from the traversal. To filter on edges, a
|
|
filter function can be defined via the *expandFilter* attribute. The *expandFilter*
|
|
is a function which is called for each edge during a traversal.
|
|
|
|
It will receive the current edge (*edge* variable) and the vertex which the edge
|
|
connects to (in the direction of the traversal). It also receives the current path
|
|
from the start vertex up to the current vertex (excluding the current edge and the
|
|
vertex the edge points to).
|
|
|
|
If the function returns *true*, the edge will be followed. If the function returns
|
|
*false*, the edge will not be followed.
|
|
Here is a very simple custom edge filter function implementation, which simply
|
|
includes edges if the (edges) path length is less than 1, and will exclude any
|
|
other edges. This will effectively terminate the traversal after the first level
|
|
of edges:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
expandFilter: function (config, vertex, edge, path) {
|
|
return (path.edges.length < 1);
|
|
}
|
|
};
|
|
```
|
|
|
|
Writing Custom Expanders
|
|
------------------------
|
|
|
|
The edges connected to a vertex are determined by the expander. So far we have used a
|
|
default expander (the default inbound expander to be precise). The default inbound
|
|
expander simply enumerates all connected ingoing edges for a vertex, based on the
|
|
[edge collection](appendix-glossary.html#edge-collection) specified in the traversal configuration.
|
|
|
|
There is also a default outbound expander, which will enumerate all connected outgoing
|
|
edges. Finally, there is an any expander, which will follow both ingoing and outgoing
|
|
edges.
|
|
|
|
If connected edges must be determined in some different fashion for whatever reason, a
|
|
custom expander can be written and registered by setting the *expander* attribute of the
|
|
configuration. The expander function signature is as follows:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
expander: function (config, vertex, path) { ... }
|
|
}
|
|
```
|
|
|
|
It is the expander's responsibility to return all edges and vertices directly
|
|
connected to the current vertex (which is passed via the *vertex* variable).
|
|
The full path from the start vertex up to the current vertex is also supplied via
|
|
the *path* variable.
|
|
An expander is expected to return an array of objects, which need to have an *edge*
|
|
and a *vertex* attribute each.
|
|
|
|
**Note**: If you want to rely on a particular order in which the edges
|
|
are traversed, you have to sort the edges returned by your expander
|
|
within the code of the expander. The functions to get outbound, inbound
|
|
or any edges from a vertex do not guarantee any particular order!
|
|
|
|
A custom implementation of an inbound expander could look like this (this is a
|
|
non-deterministic expander, which randomly decides whether or not to include
|
|
connected edges):
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
expander: function (config, vertex, path) {
|
|
var connected = [ ];
|
|
var datasource = config.datasource;
|
|
datasource.getInEdges(vertex._id).forEach(function (edge) {
|
|
if (Math.random() >= 0.5) {
|
|
connected.push({ edge: edge, vertex: (edge._from) });
|
|
}
|
|
});
|
|
return connected;
|
|
}
|
|
};
|
|
```
|
|
|
|
A custom expander can also be used as an edge filter because it has full control
|
|
over which edges will be returned.
|
|
|
|
Following are two examples of custom expanders that pick edges based on attributes
|
|
of the edges and the connected vertices.
|
|
|
|
Finding the connected edges / vertices based on an attribute *when* in the
|
|
connected vertices. The goal is to follow the edge that leads to the vertex
|
|
with the highest value in the *when* attribute:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
expander: function (config, vertex, path) {
|
|
var datasource = config.datasource;
|
|
// determine all outgoing edges
|
|
var outEdges = datasource.getOutEdges(vertex);
|
|
|
|
if (outEdges.length === 0) {
|
|
return [ ];
|
|
}
|
|
|
|
var data = [ ];
|
|
outEdges.forEach(function (edge) {
|
|
data.push({ edge: edge, vertex: datasource.getInVertex(edge) });
|
|
});
|
|
|
|
// sort outgoing vertices according to "when" attribute value
|
|
data.sort(function (l, r) {
|
|
if (l.vertex.when === r.vertex.when) {
|
|
return 0;
|
|
}
|
|
|
|
return (l.vertex.when < r.vertex.when ? 1 : -1);
|
|
});
|
|
|
|
// pick first vertex found (with highest "when" attribute value)
|
|
return [ data[0] ];
|
|
}
|
|
...
|
|
};
|
|
```
|
|
|
|
Finding the connected edges / vertices based on an attribute *when* in the
|
|
edge itself. The goal is to pick the one edge (out of potentially many) that
|
|
has the highest *when* attribute value:
|
|
|
|
```js
|
|
var config = {
|
|
...
|
|
expander: function (config, vertex, path) {
|
|
var datasource = config.datasource;
|
|
// determine all outgoing edges
|
|
var outEdges = datasource.getOutEdges(vertex);
|
|
|
|
if (outEdges.length === 0) {
|
|
return [ ]; // return an empty array
|
|
}
|
|
|
|
// sort all outgoing edges according to "when" attribute
|
|
outEdges.sort(function (l, r) {
|
|
if (l.when === r.when) {
|
|
return 0;
|
|
}
|
|
return (l.when < r.when ? -1 : 1);
|
|
});
|
|
|
|
// return first edge (the one with highest "when" value)
|
|
var edge = outEdges[0];
|
|
try {
|
|
var v = datasource.getInVertex(edge);
|
|
return [ { edge: edge, vertex: v } ];
|
|
}
|
|
catch (e) { }
|
|
|
|
return [ ];
|
|
}
|
|
...
|
|
};
|
|
```
|
|
|
|
|
|
Handling Uniqueness
|
|
-------------------
|
|
|
|
Graphs may contain cycles. To be on top of what happens when a traversal encounters a vertex
|
|
or an edge it has already visited, there are configuration options.
|
|
|
|
The default configuration is to visit every vertex, regardless of whether it was already visited
|
|
in the same traversal. However, edges will by default only be followed if they are not already
|
|
present in the current path.
|
|
|
|
Imagine the following graph which contains a cycle:
|
|
|
|
```
|
|
A -> B -> C -> A
|
|
```
|
|
|
|
When the traversal finds the edge from *C* to *A*, it will by default follow it. This is because
|
|
we have not seen this edge yet. It will also visit vertex *A* again. This is because by default
|
|
all vertices will be visited, regardless of whether already visited or not.
|
|
|
|
However, the traversal will not again following the outgoing edge from *A* to *B*. This is because
|
|
we already have the edge from *A* to *B* in our current path.
|
|
|
|
These default settings will prevent infinite traversals.
|
|
|
|
To adjust the uniqueness for visiting vertices, there are the following options for *uniqueness.vertices*:
|
|
|
|
* *"none"*: always visit a vertices, regardless of whether it was already visited or not
|
|
* *"global"*: visit a vertex only if it was not visited in the traversal
|
|
* *"path"*: visit a vertex if it is not included in the current path
|
|
|
|
To adjust the uniqueness for following edges, there are the following options for *uniqueness.edges*:
|
|
|
|
* *"none"*: always follow an edge, regardless of whether it was followed before
|
|
* *"global"*: follow an edge only if it wasn't followed in the traversal
|
|
* *"path"*: follow an edge if it is not included in the current path
|
|
|
|
Note that uniqueness checking will have some effect on both runtime and memory usage. For example, when
|
|
uniqueness checks are set to *"global"*, arrays of visited vertices and edges must be kept in memory while the
|
|
traversal is executed. Global uniqueness should thus only be used when a traversal is expected to visit
|
|
few nodes.
|
|
|
|
In terms of runtime, turning off uniqueness checks (by setting both options to *"none"*) is the best choice,
|
|
but it is only safe for graphs that do not contain cycles. When uniqueness checks are deactivated in a graph
|
|
with cycles, the traversal might not abort in a sensible amount of time.
|
|
|
|
|
|
Optimizations
|
|
-------------
|
|
|
|
There are a few options for making a traversal run faster.
|
|
|
|
The best option is to make the amount of visited vertices and followed edges as small as possible. This can
|
|
be achieved by writing custom filter and expander functions. Such functions should only include vertices of
|
|
interest, and only follow edges that might be interesting.
|
|
|
|
Traversal depth can also be bounded with the *minDepth* and *maxDepth* options.
|
|
|
|
Another way to speed up traversals is to write a custom visitor function. The default visitor function
|
|
(*trackingVisitor*) will copy every visited vertex into the result. If vertices contain lots of data,
|
|
this might be expensive. It is therefore recommended to only copy such data into the result that is actually
|
|
needed. The default visitor function will also copy the full path to the visited document into the result.
|
|
This is even more expensive and should be avoided if possible.
|
|
|
|
If the goal of a traversal is to only count the number of visited vertices, the prefab *countingVisitor*
|
|
will be much more efficient than the default visitor.
|
|
|
|
For graphs that are known to not contain any cycles, uniqueness checks should be turned off. This can achieved
|
|
via the *uniqueness* configuration options. Note that uniqueness checks should not be turned off for graphs
|
|
that are known contain cycles or if there is no information about the graph's structure.
|
|
|
|
By default, a traversal will only process a limited number of vertices. This is protect the user from
|
|
unintentionally run a never-ending traversal on a graph with cyclic data. How many vertices will be processed
|
|
at most is determined by the *maxIterations* configuration option. If a traversal hits the cap specified
|
|
by *maxIterations*, it will abort and throw a *too many iterations* exception. If this error is encountered,
|
|
the *maxIterations* value should be increased if it is made sure that the other traversal configuration
|
|
parameters are sane and the traversal will abort naturally at some point.
|
|
|
|
Finally, the *buildVertices* configuration option can be set to *false* to avoid looking up and fully constructing
|
|
vertex data. If all that's needed from vertices are the *_id* or *_key* attributes, the *buildvertices*
|
|
option can be set to *false*. If visitor, filter or expandFilter functions need to access other vertex
|
|
attributes, the option should not be changed.
|
|
|
|
|
|
Configuration Overview
|
|
----------------------
|
|
|
|
This section summarizes the configuration attributes for the traversal object. The
|
|
configuration can consist of the following attributes:
|
|
|
|
- *visitor*: visitor function for vertices. It will be called for all non-excluded vertices. The
|
|
general visitor function signature is *function (config, result, vertex, path)*. If the traversal
|
|
order is *preorder-expander*, the connecting edges of the visited vertex will be passed as the
|
|
fifth parameter, extending the function signature to: *function (config, result, vertex, path, edges)*.
|
|
|
|
Visitor functions are not expected to return values, but they may modify the *result* variable as
|
|
needed (e.g. by pushing vertex data into the result).
|
|
- *expander*: expander function that is responsible for returning edges and vertices
|
|
directly connected to a vertex. The function signature is *function (config, vertex, path)*.
|
|
The expander function is required to return an array of connection objects, consisting of an
|
|
*edge* and *vertex* attribute each. If there are no connecting edges, the expander is expected to
|
|
return an empty array.
|
|
- *filter*: vertex filter function. The function signature is *function (config, vertex, path)*. It
|
|
may return one of the following values:
|
|
- *undefined*: vertex will be included in the result and connected edges will be traversed
|
|
- *"exclude"*: vertex will not be included in the result and connected edges will be traversed
|
|
- *"prune"*: vertex will be included in the result but connected edges will not be traversed
|
|
- [ *"prune"*, *"exclude"* ]: vertex will not be included in the result and connected edges will not
|
|
be returned
|
|
- *expandFilter*: filter function applied on each edge/vertex combination determined by the expander.
|
|
The function signature is *function (config, vertex, edge, path)*. The function should return
|
|
*true* if the edge/vertex combination should be processed, and *false* if it should be ignored.
|
|
- *sort*: a filter function to determine the order in which connected edges are processed. The
|
|
function signature is *function (l, r)*. The function is required to return one of the following
|
|
values:
|
|
- *-1* if *l* should have a sort value less than *r*
|
|
- *1* if *l* should have a higher sort value than *r*
|
|
- *0* if *l* and *r* have the same sort value
|
|
- *strategy*: determines the visitation strategy. Possible values are *depthfirst* and *breadthfirst*.
|
|
- *order*: determines the visitation order. Possible values are *preorder*, *postorder*, and
|
|
*preorder-expander*. *preorder-expander* is the same as *preorder*, except that the signature of
|
|
the *visitor* function will change as described above.
|
|
- *itemOrder*: determines the order in which connections returned by the expander will be processed.
|
|
Possible values are *forward* and *backward*.
|
|
- *maxDepth*: if set to a value greater than *0*, this will limit the traversal to this maximum depth.
|
|
- *minDepth*: if set to a value greater than *0*, all vertices found on a level below the *minDepth*
|
|
level will not be included in the result.
|
|
- *maxIterations*: the maximum number of iterations that the traversal is allowed to perform. It is
|
|
sensible to set this number so unbounded traversals will terminate at some point.
|
|
- *uniqueness*: an object that defines how repeated visitations of vertices should be handled.
|
|
The *uniqueness* object can have a sub-attribute *vertices*, and a sub-attribute *edges*. Each
|
|
sub-attribute can have one of the following values:
|
|
- *"none"*: no uniqueness constraints
|
|
- *"path"*: element is excluded if it is already contained in the current path. This setting may be
|
|
sensible for graphs that contain cycles (e.g. A → B → C → A).
|
|
- *"global"*: element is excluded if it was already found/visited at any point during the traversal.
|
|
- *buildVertices*: this attribute controls whether vertices encountered during the traversal will be
|
|
looked up in the database and will be made available to visitor, filter, and expandFilter functions.
|
|
By default, vertices will be looked up and made available. However, there are some special use
|
|
cases when fully constructing vertex objects is not necessary and can be avoided. For example, if
|
|
a traversal is meant to only count the number of visited vertices but do not read any data from
|
|
vertices, this option might be set to *true*.
|