!CHAPTER Getting started

To use a traversal object, we first need to require the *traversal* module:

```js
var traversal = require("org/arangodb/graph/traversal");
```

We then need to set up a configuration for the traversal and determine at which vertex to
start the traversal:

```js
var config = {
  datasource: traversal.generalGraphDatasourceFactory("world_graph"),
  strategy: "depthfirst",
  order: "preorder",
  filter: traversal.visitAllFilter,
  expander: traversal.inboundExpander,
  maxDepth: 1
};

var startVertex = db._document("v/world");
```

**Note**: The *startVertex* needs to be a document, not just a document id.

We can then create a traverser and start the traversal by calling its *traverse* method.
Note that *traverse* needs a *result* object, which it can modify in place:

```js
var result = {
  visited: {
    vertices: [ ],
    paths: [ ]
  }
};
var traverser = new traversal.Traverser(config);
traverser.traverse(result, startVertex);
```

Finally, we can print the contents of the *result* object, limited to the visited vertices.
For brevity, we will only print the name and type of each visited vertex:

```js
require("internal").print(result.visited.vertices.map(function(vertex) {
  return vertex.name + " (" + vertex.type + ")";
}));
```

The full script, which includes all steps carried out so far, is thus:

```js
var traversal = require("org/arangodb/graph/traversal");

var config = {
  datasource: traversal.generalGraphDatasourceFactory("world_graph"),
  strategy: "depthfirst",
  order: "preorder",
  filter: traversal.visitAllFilter,
  expander: traversal.inboundExpander,
  maxDepth: 1
};

var startVertex = db._document("v/world");
var result = {
  visited: {
    vertices: [ ],
    paths: [ ]
  }
};

var traverser = new traversal.Traverser(config);
traverser.traverse(result, startVertex);

require("internal").print(result.visited.vertices.map(function(vertex) {
  return vertex.name + " (" + vertex.type + ")";
}));
```

The result is a list of vertices that were visited during the traversal, starting at the
start vertex (i.e. *v/world* in our example):

```js
[
  "World (root)",
  "Africa (continent)",
  "Asia (continent)",
  "Australia (continent)",
  "Europe (continent)",
  "North America (continent)",
  "South America (continent)"
]
```

**Note**: The result is limited to vertices directly connected to the start vertex. We
achieved this by setting the *maxDepth* attribute to *1*. Without it, the traversal would
return the full list of vertices.

!SUBSECTION Traversal Direction

For the examples contained in this manual, we'll be starting the traversals at vertex
*v/world*. Vertices in our graph are connected like this:

```js
v/world <- is-in <- continent (Africa) <- is-in <- country (Algeria) <- is-in <- capital (Algiers)
```

To get any meaningful results, we must traverse the graph in inbound order. This means
we'll be following all incoming edges of a vertex. In the traversal configuration, we
have specified this via the *expander* attribute:

```js
var config = {
  ...
  expander: traversal.inboundExpander
};
```

For other graphs, we might want to traverse via the outgoing edges. For this, we can
use the *outboundExpander*. There is also an *anyExpander*, which will follow both outgoing
and incoming edges. This should be used with care and the traversal should always be
limited to a maximum number of iterations (e.g. using the *maxIterations* attribute) in
order to terminate at some point.

To invoke the default outbound expander for a graph, simply use the predefined function:

```js
var config = {
  ...
  expander: traversal.outboundExpander
};
```

Please note that the outbound expander will not produce any output for the examples if we
still start the traversal at the *v/world* vertex.

Still, we can use the outbound expander if we start somewhere else in the graph, e.g.:

```js
var traversal = require("org/arangodb/graph/traversal");

var config = {
  datasource: traversal.generalGraphDatasourceFactory("world_graph"),
  strategy: "depthfirst",
  order: "preorder",
  filter: traversal.visitAllFilter,
  expander: traversal.outboundExpander
};

var startVertex = db._document("v/capital-algiers");
var result = {
  visited: {
    vertices: [ ],
    paths: [ ]
  }
};

var traverser = new traversal.Traverser(config);
traverser.traverse(result, startVertex);

require("internal").print(result.visited.vertices.map(function(vertex) {
  return vertex.name + " (" + vertex.type + ")";
}));
```

The result is:

```js
[
  "Algiers (capital)",
  "Algeria (country)",
  "Africa (continent)",
  "World (root)"
]
```

which confirms that we are now going outbound.

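
To make the difference between the two directions concrete, here is a minimal, self-contained sketch in plain JavaScript. It does not use the *traversal* module: the in-memory `edges` array, the vertex ids, and the helper names `inbound` and `outbound` are hypothetical and only mimic the `_from` / `_to` attributes of edge documents.

```js
// Hypothetical in-memory edge list mirroring the is-in chain shown above.
var edges = [
  { _from: "capital-algiers", _to: "country-algeria" },
  { _from: "country-algeria", _to: "continent-africa" },
  { _from: "continent-africa", _to: "world" }
];

// Inbound: follow edges pointing *to* the vertex, yielding their sources.
function inbound(vertexId) {
  return edges
    .filter(function (e) { return e._to === vertexId; })
    .map(function (e) { return e._from; });
}

// Outbound: follow edges starting *at* the vertex, yielding their targets.
function outbound(vertexId) {
  return edges
    .filter(function (e) { return e._from === vertexId; })
    .map(function (e) { return e._to; });
}

var fromWorldInbound = inbound("world");               // one neighbor
var fromWorldOutbound = outbound("world");             // no neighbors
var fromAlgiersOutbound = outbound("capital-algiers"); // one neighbor
```

In this sketch, the world vertex only has incoming edges, which is why an outbound walk starting there finds nothing to follow.
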
!SUBSECTION Traversal Strategy

!SUBSUBSECTION Depth-first traversals

The visitation order of vertices is determined by the *strategy* and *order* attributes set
in the configuration. We chose *depthfirst* and *preorder*, meaning the traverser will
emit each vertex before handling connected edges (pre-order), and descend into any
connected edges before processing other vertices on the same level (depth-first).

Let's remove the *maxDepth* attribute now. We'll now be getting all vertices (directly
and indirectly connected to the start vertex):

```js
var config = {
  datasource: traversal.generalGraphDatasourceFactory("world_graph"),
  strategy: "depthfirst",
  order: "preorder",
  filter: traversal.visitAllFilter,
  expander: traversal.inboundExpander
};

var result = {
  visited: {
    vertices: [ ],
    paths: [ ]
  }
};

var traverser = new traversal.Traverser(config);
traverser.traverse(result, startVertex);

require("internal").print(result.visited.vertices.map(function(vertex) {
  return vertex.name + " (" + vertex.type + ")";
}));
```

The result will be a longer list, assembled in depth-first pre-order. For
each continent found, the traverser will descend into linked countries, and then into
the linked capital:

```js
[
  "World (root)",
  "Africa (continent)",
  "Algeria (country)",
  "Algiers (capital)",
  "Angola (country)",
  "Luanda (capital)",
  "Botswana (country)",
  "Gaborone (capital)",
  "Burkina Faso (country)",
  "Ouagadougou (capital)",
  ...
]
```

Let's switch the *order* attribute from *preorder* to *postorder*. This will make the
traverser emit vertices after all connected vertices were visited (i.e. most distant
vertices will be emitted first):

```js
[
  "Algiers (capital)",
  "Algeria (country)",
  "Luanda (capital)",
  "Angola (country)",
  "Gaborone (capital)",
  "Botswana (country)",
  "Ouagadougou (capital)",
  "Burkina Faso (country)",
  "Bujumbura (capital)",
  "Burundi (country)",
  "Yaounde (capital)",
  "Cameroon (country)",
  "N'Djamena (capital)",
  "Chad (country)",
  "Yamoussoukro (capital)",
  "Cote d'Ivoire (country)",
  "Cairo (capital)",
  "Egypt (country)",
  "Asmara (capital)",
  "Eritrea (country)",
  "Africa (continent)",
  ...
]
```

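
The difference between the two orders can be illustrated with a small stand-alone sketch in plain JavaScript. It is not how the traverser is implemented: the `children` adjacency list and the `depthFirst` helper are hypothetical and only show where a vertex is emitted relative to the vertices connected to it.

```js
// Hypothetical adjacency list: each key maps to its directly connected vertices.
var children = {
  "World": [ "Africa", "Asia" ],
  "Africa": [ "Algeria" ],
  "Algeria": [ "Algiers" ],
  "Asia": [ ],
  "Algiers": [ ]
};

// Depth-first walk; `order` decides whether a vertex is emitted
// before ("preorder") or after ("postorder") its connected vertices.
function depthFirst(vertex, order, out) {
  if (order === "preorder") {
    out.push(vertex);
  }
  children[vertex].forEach(function (next) {
    depthFirst(next, order, out);
  });
  if (order === "postorder") {
    out.push(vertex);
  }
  return out;
}

var pre = depthFirst("World", "preorder", [ ]);
// [ "World", "Africa", "Algeria", "Algiers", "Asia" ]
var post = depthFirst("World", "postorder", [ ]);
// [ "Algiers", "Algeria", "Africa", "Asia", "World" ]
```

In both cases the walk descends as far as possible before moving to a sibling; only the point at which each vertex is recorded changes.
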
!SUBSUBSECTION Breadth-first traversals

If we go back to *preorder*, but change the strategy to *breadthfirst* and re-run the
traversal, we'll see that the return order changes, and items on the same level will be
returned adjacently:

```js
[
  "World (root)",
  "Africa (continent)",
  "Asia (continent)",
  "Australia (continent)",
  "Europe (continent)",
  "North America (continent)",
  "South America (continent)",
  "Burkina Faso (country)",
  "Burundi (country)",
  "Cameroon (country)",
  "Chad (country)",
  "Algeria (country)",
  "Angola (country)",
  ...
]
```

**Note**: The order of items returned for the same level is undefined.
This is because there is no natural order of edges for a vertex with
multiple connected edges. To explicitly set the order for edges on the
same level, you can specify an edge comparator function with the *sort*
attribute:

```js
var config = {
  ...
  sort: function (l, r) { return l._key < r._key ? 1 : -1; }
  ...
};
```

The arguments *l* and *r* are edge documents.
This will traverse edges of the same vertex in backward *_key* order:

```js
[
  "World (root)",
  "South America (continent)",
  "North America (continent)",
  "Europe (continent)",
  "Australia (continent)",
  "Asia (continent)",
  "Africa (continent)",
  "Ecuador (country)",
  "Colombia (country)",
  "Chile (country)",
  "Brazil (country)",
  "Bolivia (country)",
  "Argentina (country)",
  ...
]
```

**Note**: This attribute only works for the usual expanders
*traversal.inboundExpander*, *traversal.outboundExpander*,
*traversal.anyExpander* and their corresponding "WithLabels" variants.
If you are using custom expanders, you have to organize the sorting
within the specified expander.

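
The effect of a comparator on a breadth-first walk can be sketched in plain JavaScript as follows. This is an illustration only, not the traverser's implementation: the `connectionsByVertex` structure, its `key` field (standing in for an edge's *_key*), and the `breadthFirst` helper are all hypothetical.

```js
// Hypothetical graph: each vertex lists its connections as { key, vertex } pairs,
// where `key` stands in for the edge's _key.
var connectionsByVertex = {
  "World": [ { key: "a", vertex: "Africa" }, { key: "b", vertex: "Asia" } ],
  "Africa": [ { key: "c", vertex: "Algeria" }, { key: "d", vertex: "Angola" } ],
  "Asia": [ { key: "e", vertex: "Bhutan" } ],
  "Algeria": [ ],
  "Angola": [ ],
  "Bhutan": [ ]
};

// Breadth-first walk using a queue; `sort` is an optional edge comparator.
function breadthFirst(start, sort) {
  var queue = [ start ];
  var out = [ ];
  while (queue.length > 0) {
    var vertex = queue.shift();
    out.push(vertex);
    // Copy the list so that sorting does not mutate the graph itself.
    var edges = connectionsByVertex[vertex].slice();
    if (sort) {
      edges.sort(sort);
    }
    edges.forEach(function (edge) { queue.push(edge.vertex); });
  }
  return out;
}

var unsorted = breadthFirst("World");
// [ "World", "Africa", "Asia", "Algeria", "Angola", "Bhutan" ]
var backward = breadthFirst("World", function (l, r) {
  return l.key < r.key ? 1 : -1;
});
// [ "World", "Asia", "Africa", "Bhutan", "Angola", "Algeria" ]
```

Vertices on the same level still appear next to each other; the comparator only changes the order within each group of edges that leave the same vertex.
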
!SUBSUBSECTION Writing Custom Visitors

So far we have used many of the traverser's default functions. The traverser is very
configurable and many of the default functions can be overridden with custom functionality.

For example, we have been using the default visitor function (which is always used if
the configuration does not contain the *visitor* attribute). The default visitor function
is called for each vertex in a traversal, and will push it into the result.
This is the reason why the *result* variable looked different after the traversal, and
needed to be initialized before the traversal was started.

We can write our own visitor function if we want to. The general function signature for
visitor functions is as follows:

```js
var config = {
  ...
  visitor: function (config, result, vertex, path) { ... }
};
```

Visitor functions are not expected to return any values. Instead, they can modify the
*result* variable (e.g. by pushing the current vertex into it), or do anything else.
For example, we can create a simple visitor function that only prints information about
the current vertex as we traverse:

```js
var config = {
  datasource: traversal.generalGraphDatasourceFactory("world_graph"),
  strategy: "depthfirst",
  order: "preorder",
  filter: traversal.visitAllFilter,
  expander: traversal.inboundExpander,
  visitor: function (config, result, vertex, path) {
    require("internal").print("visiting vertex", vertex.name);
  }
};

var traverser = new traversal.Traverser(config);
traverser.traverse(undefined, startVertex);
```

!SUBSECTION Filtering Vertices and Edges

!SUBSUBSECTION Filtering Vertices

So far we have returned all vertices that were visited during the traversal. This is not
always required. If the result should be restricted to specific vertices, we can use a
filter function for vertices. It can be defined by setting the *filter* attribute of a
traversal configuration, e.g.:

```js
var config = {
  filter: function (config, vertex, path) {
    if (vertex.type !== 'capital') {
      return 'exclude';
    }
  }
};
```

The above filter function will exclude all vertices that do not have a *type* value of
*capital*. The filter function will be called for each vertex found during the traversal.
It will receive the traversal configuration, the current vertex, and the full path from
the traversal start vertex to the current vertex. The path consists of a list of edges
and a list of vertices. We could also exclude everything except capitals by checking the
length of the path from the start vertex to the current vertex. Capitals will have a
distance of 3 from the *v/world* start vertex
(capital -> is-in -> country -> is-in -> continent -> is-in -> world):

```js
var config = {
  ...
  filter: function (config, vertex, path) {
    if (path.edges.length < 3) {
      return 'exclude';
    }
  }
};
```

**Note**: If a filter function returns nothing (or *undefined*), the current vertex
will be included, and all connected edges will be followed. If a filter function
returns *exclude*, the current vertex will be excluded from the result, but all
connected edges will still be followed. If a filter function returns *prune*, the
current vertex will be included, but no connected edges will be followed.

For example, the following filter function will not descend into connected edges of
continents, limiting the depth of the traversal. Still, continent vertices will be
included in the result:

```js
var config = {
  ...
  filter: function (config, vertex, path) {
    if (vertex.type === 'continent') {
      return 'prune';
    }
  }
};
```

It is also possible to combine *exclude* and *prune* by returning a list with both
values:

```js
return [ 'exclude', 'prune' ];
```

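
The four possible outcomes of a vertex filter can be summarized with a small stand-alone sketch in plain JavaScript. It is a simplified model under stated assumptions, not the traverser's own code: the `graph` object and the helpers `flags` and `walk` are hypothetical, and the filter here receives only the vertex rather than the full *(config, vertex, path)* argument list.

```js
// Hypothetical tree: vertex name -> { type, children }.
var graph = {
  "World": { type: "root", children: [ "Africa" ] },
  "Africa": { type: "continent", children: [ "Algeria" ] },
  "Algeria": { type: "country", children: [ "Algiers" ] },
  "Algiers": { type: "capital", children: [ ] }
};

// Normalize a filter's return value (undefined, a string, or a list of strings).
function flags(value) {
  var list = value === undefined ? [ ] : [ ].concat(value);
  return {
    exclude: list.indexOf("exclude") !== -1,
    prune: list.indexOf("prune") !== -1
  };
}

// Depth-first, pre-order walk that honors the filter's decision.
function walk(name, filter, out) {
  var vertex = graph[name];
  var decision = flags(filter(vertex));
  if (!decision.exclude) {
    out.push(name);            // vertex is part of the result
  }
  if (!decision.prune) {
    vertex.children.forEach(function (child) { walk(child, filter, out); });
  }
  return out;
}

var onlyCapitals = walk("World", function (v) {
  if (v.type !== "capital") { return "exclude"; }
}, [ ]);
// [ "Algiers" ]
var stopAtContinents = walk("World", function (v) {
  if (v.type === "continent") { return "prune"; }
}, [ ]);
// [ "World", "Africa" ]
var hideAndStop = walk("World", function (v) {
  if (v.type === "continent") { return [ "exclude", "prune" ]; }
}, [ ]);
// [ "World" ]
```

Note how *exclude* alone keeps descending (the capital is still reached), while *prune* stops the descent at the vertex it was returned for.
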
!SUBSUBSECTION Filtering Edges

It is possible to exclude certain edges from the traversal. To filter on edges, a
filter function can be defined via the *expandFilter* attribute. The *expandFilter*
is a function which is called for each edge during a traversal.

It will receive the current edge (*edge* variable) and the vertex which the edge
connects to (in the direction of the traversal). It also receives the current path
from the start vertex up to the current vertex (excluding the current edge and the
vertex the edge points to).

If the function returns *true*, the edge will be followed. If the function returns
*false*, the edge will not be followed.
Here is a very simple custom edge filter function implementation, which
includes edges if the (edges) path length is less than 1, and excludes any
other edges. This will effectively terminate the traversal after the first level
of edges:

```js
var config = {
  ...
  expandFilter: function (config, vertex, edge, path) {
    return (path.edges.length < 1);
  }
};
```

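
As a rough illustration of how such an edge filter acts on the connections produced by an expander, consider the following plain JavaScript sketch. The `candidates` list and the `applyExpandFilter` helper are hypothetical; how the traverser applies the filter internally is an assumption here, and only the *expandFilter* signature is taken from the text above.

```js
// Hypothetical connections, shaped like the objects an expander returns.
var candidates = [
  { edge: { _key: "e1" }, vertex: { name: "Africa" } },
  { edge: { _key: "e2" }, vertex: { name: "Asia" } }
];

// Keep only the connections for which the edge filter returns true.
function applyExpandFilter(config, connections, path) {
  return connections.filter(function (c) {
    return config.expandFilter(config, c.vertex, c.edge, path);
  });
}

var filterConfig = {
  // Follow edges only while the current path holds fewer than 1 edge.
  expandFilter: function (config, vertex, edge, path) {
    return (path.edges.length < 1);
  }
};

// At the start vertex the path has no edges yet, so both edges are followed.
var atStart = applyExpandFilter(filterConfig, candidates,
                                { edges: [ ], vertices: [ ] });
// One level down the path already holds one edge, so nothing is followed.
var oneLevelDown = applyExpandFilter(filterConfig, candidates,
                                     { edges: [ { } ], vertices: [ { } ] });
```
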
!SUBSECTION Writing Custom Expanders

The edges connected to a vertex are determined by the expander. So far we have used a
default expander (the default inbound expander to be precise). The default inbound
expander simply enumerates all connected ingoing edges for a vertex, based on the
[edge collection](../Glossary/README.html#edge_collection) specified in the traversal configuration.

There is also a default outbound expander, which will enumerate all connected outgoing
edges. Finally, there is an any expander, which will follow both ingoing and outgoing
edges.

If connected edges must be determined in some different fashion for whatever reason, a
custom expander can be written and registered by setting the *expander* attribute of the
configuration. The expander function signature is as follows:

```js
var config = {
  ...
  expander: function (config, vertex, path) { ... }
}
```

It is the expander's responsibility to return all edges and vertices directly
connected to the current vertex (which is passed via the *vertex* variable).
The full path from the start vertex up to the current vertex is also supplied via
the *path* variable.
An expander is expected to return a list of objects, which need to have an *edge*
and a *vertex* attribute each.

**Note**: If you want to rely on a particular order in which the edges
are traversed, you have to sort the edges returned by your expander
within the code of the expander. The functions to get outbound, inbound
or any edges from a vertex do not guarantee any particular order!

A custom implementation of an inbound expander could look like this (this is a
non-deterministic expander, which randomly decides whether or not to include
connected edges):

```js
var config = {
  ...
  expander: function (config, vertex, path) {
    var connections = [ ];
    var datasource = config.datasource;
    datasource.getInEdges(vertex._id).forEach(function (edge) {
      if (Math.random() >= 0.5) {
        connections.push({ edge: edge, vertex: (edge._from) });
      }
    });
    return connections;
  }
};
```

A custom expander can also be used as an edge filter because it has full control
over which edges will be returned.

The following are two examples of custom expanders that pick edges based on attributes
of the edges and the connected vertices.

The first example finds the connected edges / vertices based on an attribute *when* in
the connected vertices. The goal is to follow the edge that leads to the vertex
with the highest value in the *when* attribute:

```js
var config = {
  ...
  expander: function (config, vertex, path) {
    var datasource = config.datasource;
    // determine all outgoing edges
    var outEdges = datasource.getOutEdges(vertex);

    if (outEdges.length === 0) {
      return [ ];
    }

    var data = [ ];
    outEdges.forEach(function (edge) {
      data.push({ edge: edge, vertex: datasource.getInVertex(edge) });
    });

    // sort outgoing vertices according to "when" attribute value (descending)
    data.sort(function (l, r) {
      if (l.vertex.when === r.vertex.when) {
        return 0;
      }

      return (l.vertex.when < r.vertex.when ? 1 : -1);
    });

    // pick first vertex found (with highest "when" attribute value)
    return [ data[0] ];
  }
  ...
};
```

The second example finds the connected edges / vertices based on an attribute *when* in
the edge itself. The goal is to pick the one edge (out of potentially many) that
has the highest *when* attribute value:

```js
var config = {
  ...
  expander: function (config, vertex, path) {
    var datasource = config.datasource;
    // determine all outgoing edges
    var outEdges = datasource.getOutEdges(vertex);

    if (outEdges.length === 0) {
      return [ ]; // return an empty list
    }

    // sort all outgoing edges by "when" attribute, highest value first
    outEdges.sort(function (l, r) {
      if (l.when === r.when) {
        return 0;
      }
      return (l.when < r.when ? 1 : -1);
    });

    // return first edge (the one with highest "when" value)
    var edge = outEdges[0];
    try {
      var v = datasource.getInVertex(edge);
      return [ { edge: edge, vertex: v } ];
    }
    catch (e) { }

    return [ ];
  }
  ...
};
```

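
When only the single best edge is needed, sorting the whole list is not strictly necessary. As a possible alternative, the following plain JavaScript sketch selects the edge with the highest *when* value in one pass. The sample `sampleEdges` documents and the `newestEdge` helper are hypothetical and not part of the traversal module:

```js
// Hypothetical edge documents with a numeric "when" attribute.
var sampleEdges = [
  { _key: "e1", when: 3 },
  { _key: "e2", when: 9 },
  { _key: "e3", when: 5 }
];

// Return the edge with the highest "when" value, or null for an empty list.
function newestEdge(edges) {
  return edges.reduce(function (best, edge) {
    return (best === null || edge.when > best.when) ? edge : best;
  }, null);
}

var picked = newestEdge(sampleEdges); // the edge with _key "e2"
var none = newestEdge([ ]);           // null
```

Inside a custom expander, such a helper could replace the sort-and-take-first step; the surrounding logic (fetching the edges and resolving the connected vertex) would stay the same.
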
!SUBSECTION Configuration Overview

This section summarizes the configuration attributes for the traversal object. The
configuration can consist of the following attributes:

- *visitor*: visitor function for vertices. The function signature is *function (config, result, vertex, path)*.
  This function is not expected to return a value, but may modify the *result* variable as needed
  (e.g. by pushing vertex data into the result).
- *expander*: expander function that is responsible for returning edges and vertices
  directly connected to a vertex. The function signature is *function (config, vertex, path)*.
  The expander function is required to return a list of connection objects, consisting of an
  *edge* and *vertex* attribute each.
- *filter*: vertex filter function. The function signature is *function (config, vertex, path)*. It
  may return one of the following values:
  - *undefined*: vertex will be included in the result and connected edges will be traversed
  - *exclude*: vertex will not be included in the result and connected edges will be traversed
  - *prune*: vertex will be included in the result but connected edges will not be traversed
  - [ *prune*, *exclude* ]: vertex will not be included in the result and connected edges will not
    be traversed
- *expandFilter*: filter function applied on each edge/vertex combination determined by the expander.
  The function signature is *function (config, vertex, edge, path)*. The function should return
  *true* if the edge/vertex combination should be processed, and *false* if it should be ignored.
- *sort*: a comparator function to determine the order in which connected edges are processed. The
  function signature is *function (l, r)*. The function is required to return one of the following
  values:
  - *-1* if *l* should have a sort value less than *r*
  - *1* if *l* should have a higher sort value than *r*
  - *0* if *l* and *r* have the same sort value
- *strategy*: determines the visitation strategy. Possible values are *depthfirst* and *breadthfirst*.
- *order*: determines the visitation order. Possible values are *preorder* and *postorder*.
- *itemOrder*: determines the order in which connections returned by the expander will be processed.
  Possible values are *forward* and *backward*.
- *maxDepth*: if set to a value greater than *0*, this will limit the traversal to this maximum depth.
- *minDepth*: if set to a value greater than *0*, all vertices found on a level below the *minDepth*
  level will not be included in the result.
- *maxIterations*: the maximum number of iterations that the traversal is allowed to perform. It is
  sensible to set this number so unbounded traversals will terminate at some point.
- *uniqueness*: an object that defines how repeated visitations of vertices should be handled.
  The *uniqueness* object can have a sub-attribute *vertices*, and a sub-attribute *edges*. Each
  sub-attribute can have one of the following values:
  - *none*: no uniqueness constraints
  - *path*: element is excluded if it is already contained in the current path. This setting may be
    sensible for graphs that contain cycles (e.g. A -> B -> C -> A).
  - *global*: element is excluded if it was already found/visited at any point during the traversal.
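
To illustrate the difference between the *path* and *global* uniqueness settings for vertices, here is a simplified, self-contained sketch in plain JavaScript. It models the semantics described in the list above under stated assumptions and is not the traverser's implementation: the `next` adjacency list and the `walkUnique` helper are hypothetical.

```js
// Hypothetical cyclic graph: A -> B -> C -> A, plus a shortcut A -> C.
var next = { A: [ "B", "C" ], B: [ "C" ], C: [ "A" ] };

// Depth-first, pre-order walk with a vertex uniqueness mode of "path" or "global".
function walkUnique(start, uniqueness) {
  var out = [ ];
  var seen = { };
  function visit(vertex, path) {
    // "path": skip a vertex that already occurs on the current path.
    if (uniqueness === "path" && path.indexOf(vertex) !== -1) { return; }
    // "global": skip a vertex that was visited anywhere before.
    if (uniqueness === "global" && seen[vertex]) { return; }
    seen[vertex] = true;
    out.push(vertex);
    next[vertex].forEach(function (n) { visit(n, path.concat(vertex)); });
  }
  visit(start, [ ]);
  return out;
}

var perPath = walkUnique("A", "path");
// [ "A", "B", "C", "C" ]: C is reached via two different paths
var globally = walkUnique("A", "global");
// [ "A", "B", "C" ]: each vertex appears at most once
```

With no uniqueness constraint at all, this walk would never terminate on the cyclic graph, which is the situation that a limit such as *maxIterations* or *maxDepth* is meant to guard against.
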