Distributed Iterative Graph Processing (Pregel)
======

Distributed graph processing enables you to do online analytical processing
directly on graphs stored in ArangoDB. This is intended to help you gain analytical insights
into your data, without having to use external processing systems. Examples of algorithms
to execute are PageRank, Vertex Centrality, Vertex Closeness, and Connected Components.

The processing system inside ArangoDB is based on [Pregel: A System for Large-Scale Graph Processing](http://www.dcs.bbk.ac.uk/~dell/teaching/cc/paper/sigmod10/p135-malewicz.pdf) by Malewicz et al. (Google), 2010.
This concept enables us to perform distributed graph processing without the need for distributed global locking.


### Requirements

To enable iterative graph processing for your data, you will need to ensure
that your vertex and edge collections are sharded in a specific way.

The Pregel computing model requires all edges to be present on the DB-Server where
the vertex document identified by the ```_from``` value is located.
This means the vertex collections need to be sharded by ```_key```, and the edge collections
need to be sharded by an attribute which always contains the ```_key``` of the ```_from``` vertex.
Our implementation currently requires every edge collection to be sharded by a ```vertex``` attribute.
Additionally, you will need to specify the ```distributeShardsLike``` option to ensure that all collections
use the same hashing for their sharding attributes, so that corresponding shards end up on the same DB-Server.
A consolidated setup sketch follows the list below.


* Create the main vertex collection: ```db._create("vertices", {shardKeys: ["_key"]});```
* Optionally create additional vertex collections: ```db._create("additional", {distributeShardsLike: "vertices"});```
* Create one or more edge collections: ```db._createEdgeCollection("edges", {shardKeys: ["vertex"], distributeShardsLike: "vertices"});```
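
Putting this together, here is a minimal arangosh setup sketch. The collection names are just examples, and ```numberOfShards``` is only shown to illustrate that the lead collection determines the number of shards for all collections that use ```distributeShardsLike```:

    // lead collection, sharded by _key
    db._create("vertices", {shardKeys: ["_key"], numberOfShards: 4});
    // further vertex collections must follow the sharding of the lead collection
    db._create("additional", {distributeShardsLike: "vertices"});
    // edge collections are sharded by the custom "vertex" attribute,
    // which has to contain the _key of the _from vertex of each edge
    db._createEdgeCollection("edges", {shardKeys: ["vertex"], distributeShardsLike: "vertices"});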


You will need to ensure that edge documents contain the proper value in their sharding attribute.
For a vertex document ```{_key: "A", value: 0}``` in the collection "vertices",
the edge documents will look like this:

    {_from: "vertices/A", _to: "vertices/B", vertex: "A"}
    {_from: "vertices/A", _to: "vertices/C", vertex: "A"}
    {_from: "vertices/A", _to: "vertices/D", vertex: "A"}
    ...


This ensures that, when an edge document is inserted, it is placed in a shard
on the proper DB-Server. There are no restrictions on additional attributes in your documents.
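
For example, using arangosh, the documents above could be inserted like this (a minimal sketch using the collection names from this page):

    db.vertices.insert({_key: "A", value: 0});
    db.vertices.insert({_key: "B", value: 0});
    // the "vertex" attribute repeats the _key of the _from vertex, so the edge
    // is stored in a shard on the same DB-Server as that vertex
    db.edges.insert({_from: "vertices/A", _to: "vertices/B", vertex: "A"});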

### Arangosh API

#### Starting an Algorithm Execution

The Pregel API is accessible through the ```@arangodb/pregel``` package.

    var pregel = require("@arangodb/pregel");
    var params = {};
    var execution = pregel.start("<algorithm>", "<graph>", params);

The returned value is an execution identifier which uniquely identifies this run of the algorithm.


#### Status of an Algorithm Execution

The execution identifier returned by the ```pregel.start(...)``` method can be used to
track the status of your algorithm.


    var execution = pregel.start("sssp", "demograph", {source: "vertices/V"});
    var status = pregel.status(execution);


The result gives you some information about the current state of the algorithm execution.
The status might look like this:

    {
      "state" : "running",
      "gss" : 12,
      "totalRuntime" : 123.23,
      "aggregators" : {
        "converged" : false,
        "max" : true,
        "phase" : 2
      },
      "sendCount" : 3240364978,
      "receivedCount" : 3240364975
    }
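
The status can be polled until the execution has finished. Below is a minimal polling sketch; it assumes that the ```state``` field switches to another value, e.g. ```"done"```, once the algorithm has terminated (the exact final state value is an assumption here):

    var pregel = require("@arangodb/pregel");
    var internal = require("internal");

    var execution = pregel.start("sssp", "demograph", {source: "vertices/V"});
    var status = pregel.status(execution);
    // keep polling while the algorithm is still running
    while (status.state === "running") {
      internal.sleep(1);
      status = pregel.status(execution);
    }
    print("finished after " + status.gss + " global supersteps");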



### Available Algorithms

#### Page Rank

    var pregel = require("@arangodb/pregel");
    pregel.start("pagerank", "graphname", {});


#### Single-Source Shortest Path

    var pregel = require("@arangodb/pregel");
    pregel.start("sssp", "graphname", {source: "vertices/1337"});