arangodb/arangod/Pregel
jsteemann 44c7b1b476 remove tabstops 2018-07-16 15:00:12 +02:00

Pregel Subsystem

The Pregel subsystem implements a variety of graph algorithms. This README is intended mainly for internal use.

Protocol

Message format between DBServers:

{sender:"someid", executionNumber:1337, globalSuperstep:123, messages: [<vertexID1>, <slice1>, vertexID2, <slice2>] }

Any type of slice is supported.
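To make the envelope easier to picture, here is a minimal sketch in plain JavaScript. It assumes that the `messages` array is a flat list alternating a vertex ID with that vertex's payload. The helper name `decodeMessages` and the sample payloads are hypothetical and not part of the Pregel code.

```javascript
// Hypothetical helper: groups a flat [id, payload, id, payload, ...] array
// into {vertexId, payload} records. The alternating layout is an assumption.
function decodeMessages(envelope) {
  const flat = envelope.messages;
  if (flat.length % 2 !== 0) {
    throw new Error("messages must contain (vertexID, payload) pairs");
  }
  const pairs = [];
  for (let i = 0; i < flat.length; i += 2) {
    pairs.push({ vertexId: flat[i], payload: flat[i + 1] });
  }
  return pairs;
}

// Sample envelope using the field names from the format above.
// The payloads are made up; any slice type may appear in their place.
const envelope = {
  sender: "someid",
  executionNumber: 1337,
  globalSuperstep: 123,
  messages: ["vertexID1", 0.25, "vertexID2", { rank: 0.5 }],
};

console.log(decodeMessages(envelope));
```

The sketch only illustrates the pairing. The real messages are exchanged as VelocyPack slices, not as JavaScript objects.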

Useful Commands

Import a graph, e.g. https://github.com/arangodb/example-datasets/tree/master/Graphs/1000. First rename the columns to '_key', '_from' and '_to'; arangoimport will keep those.

In arangosh:

db._create('vertices', {numberOfShards: 2});
db._createEdgeCollection('alt_edges');
db._createEdgeCollection('edges', {numberOfShards: 2, shardKeys:["_vertex"], distributeShardsLike:'vertices'});

arangoimport --file generated_vertices.csv --type csv --collection vertices --overwrite true --server.endpoint http+tcp://127.0.0.1:8530

Or generate the vertices directly in arangosh:

for(var i=0; i < 5000; i++) db.vertices.save({_key:i+""});

arangoimport --file generated_edges.csv --type csv --collection alt_edges --overwrite true --from-collection-prefix "vertices" --to-collection-prefix "vertices" --convert false --server.endpoint http+tcp://127.0.0.1:8530

AQL script to copy edge collection into one with '_vertex':

FOR doc IN alt_edges
  INSERT {_vertex:SUBSTRING(doc._from,FIND_FIRST(doc._from,"/")+1), _from:doc._from, _to:doc._to} IN edges

AQL script to sum the 'result' attribute over all vertices:

LET values = (
  FOR s IN vertices
  RETURN s.result
)
RETURN SUM(values)
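The '_vertex' value written by the copy query is the part of '_from' after the first "/", i.e. the key of the source vertex. Since 'edges' is sharded by '_vertex' and distributed like 'vertices', this presumably places each edge on the same shard as its source vertex. The following plain JavaScript sketch mirrors the SUBSTRING/FIND_FIRST expression; the function name `vertexShardKey` and the sample edge are hypothetical.

```javascript
// Mirrors SUBSTRING(from, FIND_FIRST(from, "/") + 1) from the AQL query.
// Like FIND_FIRST, indexOf yields -1 when "/" is absent, so the whole
// string is returned in that case.
function vertexShardKey(from) {
  return from.slice(from.indexOf("/") + 1);
}

// Sample edge document; the values are made up for illustration.
const edge = { _from: "vertices/42", _to: "vertices/7" };

const copy = {
  _vertex: vertexShardKey(edge._from),
  _from: edge._from,
  _to: edge._to,
};

console.log(copy);
```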

AWK Scripts

Make CSV file with unique IDs

cat edges.csv | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | awk '!seen[$0]++' > vertices.csv

Make CSV file with arango compatible edges

cat edges.csv | awk -F" " '{print "profiles/" $1 "\tprofiles/" $2 "\t" $1}' >> arango-edges.csv
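For readers less familiar with awk, the two preprocessing steps above can be sketched in plain JavaScript. This is only an illustration, assuming each input line holds a source ID and a target ID separated by whitespace. The function names and the sample data are hypothetical.

```javascript
// Step 1: split on any whitespace, drop empty tokens, keep the first
// occurrence of each ID (a Set preserves insertion order).
function uniqueIds(text) {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  return [...new Set(tokens)];
}

// Step 2: turn "source target" lines into tab-separated
// "<collection>/source <collection>/target source" rows.
// Unlike the awk one-liner, blank lines are skipped here.
function toArangoEdges(text, collection = "profiles") {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => {
      const [from, to] = line.trim().split(/\s+/);
      return `${collection}/${from}\t${collection}/${to}\t${from}`;
    });
}

// Made-up sample input with three edges.
const sample = "1 2\n2 3\n1 3\n";

console.log(uniqueIds(sample));
console.log(toArangoEdges(sample).join("\n"));
```

The third column repeats the source ID, which matches the `$1` at the end of the awk print statement.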

arangoimport --file vertices.csv --type csv --collection twitter_v --overwrite true --convert false --server.endpoint http+tcp://127.0.0.1:8530 -c none

arangoimport --file arango-edges.csv --type csv --collection twitter_e --overwrite true --convert false --separator "\t" --server.endpoint http+tcp://127.0.0.1:8530 -c none