arangodb/arangod/Pregel
Simon Grätzer afaab2e8d5 Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
Algos Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
examples
Aggregator.h Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
AggregatorHandler.cpp fixing aggregators 2017-01-23 18:20:30 +01:00
AggregatorHandler.h fixing aggregators 2017-01-23 18:20:30 +01:00
AlgoRegistry.cpp Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
AlgoRegistry.h
Algorithm.h Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
CommonFormats.h Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
Conductor.cpp Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
Conductor.h Aggregator refactoring 2017-01-20 14:42:01 +01:00
Graph.h Fixed PageRank 2017-01-24 02:04:53 +01:00
GraphFormat.h Connected components, removing some template parameters 2017-01-23 12:24:00 +01:00
GraphStore.cpp Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
GraphStore.h Connected components, removing some template parameters 2017-01-23 12:24:00 +01:00
IncomingCache.cpp Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
IncomingCache.h Adaptive message buffers, Optimized message format 2017-01-22 15:53:11 +01:00
Iterators.h Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
MasterContext.h Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
MemoryMapped.cpp
MemoryMapped.h
MessageCombiner.h
MessageFormat.h Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
OutgoingCache.cpp Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
OutgoingCache.h Fixing message counting 2017-01-19 18:19:44 +01:00
PregelFeature.cpp fixing aggregators 2017-01-23 18:20:30 +01:00
PregelFeature.h
README.md
Recovery.cpp Fixed pregel API 2017-01-21 19:00:37 +01:00
Recovery.h Fixed pregel API 2017-01-21 19:00:37 +01:00
Statistics.h
Utils.cpp Adaptive message buffers, Optimized message format 2017-01-22 15:53:11 +01:00
Utils.h Adaptive message buffers, Optimized message format 2017-01-22 15:53:11 +01:00
VertexComputation.h Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
Worker.cpp Working on PageRank and SCC 2017-01-24 16:37:21 +01:00
Worker.h Adaptive message buffers, Optimized message format 2017-01-22 15:53:11 +01:00
WorkerConfig.cpp
WorkerConfig.h
WorkerContext.h

README.md

Pregel Subsystem

Protocol

Message format between DBServers:

{sender:"someid", executionNumber:1337, globalSuperstep:123, messages: [, , vertexID2, ] }

Any type of slice is supported.
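Below is a minimal sketch of building and reading such a message batch. It assumes that the flat messages array alternates target vertex IDs and message values, which is what vertexID2 in the third position suggests. The helper names buildMessageBatch and readMessageBatch are hypothetical and are not part of the ArangoDB code base.

```javascript
// Sketch only: assumes `messages` is a flat array of alternating
// (vertexID, value) entries. Helper names are hypothetical.
function buildMessageBatch(sender, executionNumber, globalSuperstep, outgoing) {
  const messages = [];
  for (const [vertexId, value] of outgoing) {
    messages.push(vertexId, value); // vertex ID first, then its payload
  }
  return { sender, executionNumber, globalSuperstep, messages };
}

function readMessageBatch(batch) {
  const { messages } = batch;
  if (messages.length % 2 !== 0) {
    throw new Error("messages must contain (vertexID, value) pairs");
  }
  const pairs = [];
  for (let i = 0; i < messages.length; i += 2) {
    pairs.push([messages[i], messages[i + 1]]);
  }
  return pairs;
}

// Any JSON-like value may be a payload, mirroring "any type of slice".
const batch = buildMessageBatch("someid", 1337, 123, [
  ["vertexID1", 0.25],
  ["vertexID2", { rank: 0.5 }],
]);
console.log(JSON.stringify(batch.messages));
// prints ["vertexID1",0.25,"vertexID2",{"rank":0.5}]
```

Keeping the array flat avoids one small wrapper object per message. The price is that the receiver has to trust the pairing, which is why the reader rejects odd-length arrays.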

Useful Commands

Import a graph, e.g. https://github.com/arangodb/example-datasets/tree/master/Graphs/1000
First rename the columns to '_key', '_from' and '_to'; arangoimp keeps those attribute names.

In arangosh:

db._create('vertices', {numberOfShards: 2});
db._createEdgeCollection('alt_edges');
db._createEdgeCollection('edges', {numberOfShards: 2, shardKeys:["_vertex"], distributeShardsLike:'vertices'});
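The edges collection is sharded by _vertex and uses distributeShardsLike 'vertices'. The intent is for an edge whose _vertex equals a vertex's _key to land on the same shard as that vertex. The sketch below illustrates that idea. FNV-1a is a stand-in chosen for illustration; ArangoDB's real shard hash is different. What the sketch shows is that the shard index depends only on the shard-key value and the shard count.

```javascript
// Illustration only: FNV-1a (32-bit, over UTF-16 code units) stands in
// for ArangoDB's actual shard hash function.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, keep unsigned
  }
  return hash >>> 0;
}

// The shard index is a function of the shard-key value only.
function shardIndex(shardKeyValue, numberOfShards) {
  return fnv1a32(shardKeyValue) % numberOfShards;
}

const numberOfShards = 2; // both collections above use 2 shards
const vertex = { _key: "42" };
const edge = { _vertex: "42", _from: "vertices/42", _to: "vertices/7" };

const vertexShard = shardIndex(vertex._key, numberOfShards); // vertices shard by _key
const edgeShard = shardIndex(edge._vertex, numberOfShards);  // edges shard by _vertex
console.log(vertexShard === edgeShard); // prints true
```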

arangoimp --file generated_vertices.csv --type csv --collection vertices --overwrite true --server.endpoint http+tcp://127.0.0.1:8530

Or create the vertices directly in arangosh instead: for (var i = 0; i < 5000; i++) db.vertices.save({_key: i + ""});

arangoimp --file generated_edges.csv --type csv --collection alt_edges --overwrite true --from-collection-prefix "vertices" --to-collection-prefix "vertices" --convert false --server.endpoint http+tcp://127.0.0.1:8530

AQL query to copy the edge collection into one that carries the '_vertex' attribute:

FOR doc IN alt_edges INSERT {_vertex: SUBSTRING(doc._from, FIND_FIRST(doc._from, "/") + 1), _from: doc._from, _to: doc._to} IN edges

AQL query to sum the 'result' attribute over all vertices:

LET values = (FOR s IN vertices RETURN s.result) RETURN SUM(values)
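The _vertex expression in the AQL copies everything after the first "/" of _from, which is the source vertex's key. Below is a small JavaScript sketch of the same string logic, useful for checking what the query will produce. The function names are hypothetical.

```javascript
// Mirrors SUBSTRING(doc._from, FIND_FIRST(doc._from, "/") + 1):
// take everything after the first "/" of a document handle.
function keyFromHandle(handle) {
  const slash = handle.indexOf("/"); // -1 if absent, like FIND_FIRST
  return handle.substring(slash + 1); // -1 + 1 = 0 keeps the whole string
}

// Builds the document the AQL INSERTs into the `edges` collection.
function toShardedEdge(doc) {
  return { _vertex: keyFromHandle(doc._from), _from: doc._from, _to: doc._to };
}

console.log(JSON.stringify(toShardedEdge({ _from: "vertices/17", _to: "vertices/4" })));
// prints {"_vertex":"17","_from":"vertices/17","_to":"vertices/4"}
```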

AWK Scripts

Make a CSV file with the unique vertex IDs:

cat edges.csv | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | awk '!seen[$0]++' > vertices.csv
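The pipeline has three steps. It splits the edge list on whitespace, drops empty tokens, and keeps each ID the first time it appears (awk's '!seen[$0]++'). Here is an equivalent JavaScript sketch working on an in-memory string. The function name uniqueIds is hypothetical.

```javascript
// Same idea as: tr '[:space:]' '[\n*]' | grep -v "^\s*$" | awk '!seen[$0]++'
// Split on whitespace, skip empty tokens, keep first occurrences in order.
function uniqueIds(text) {
  const seen = new Set();
  const ids = [];
  for (const token of text.split(/\s+/)) {
    if (token === "" || seen.has(token)) continue;
    seen.add(token);
    ids.push(token);
  }
  return ids;
}

console.log(uniqueIds("1 2\n2 3\n1 3\n").join(",")); // prints 1,2,3
```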

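Make a CSV file with ArangoDB-compatible edges (write to a new file; appending to edges.csv while reading it would mix both formats and may never terminate):

cat edges.csv | awk -F" " '{print "profiles/" $1 "\tprofiles/" $2 "\t" $1}' > arango-twitter-edges.csv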
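The awk one-liner turns each "src dst" line into three tab-separated columns: the prefixed _from, the prefixed _to, and the bare source key for _vertex. Below is a JavaScript sketch of the same transformation. The function names are hypothetical. One deliberate difference: this sketch skips blank lines, whereas awk would still print a line for them.

```javascript
// Same transformation as the awk one-liner:
// "src dst" -> "profiles/src<TAB>profiles/dst<TAB>src" (last column = _vertex).
function toArangoEdgeLine(line, prefix = "profiles/") {
  const [from, to] = line.trim().split(/\s+/); // awk -F" " splits on runs of blanks
  return `${prefix}${from}\t${prefix}${to}\t${from}`;
}

// Converts a whole edge list; blank lines are skipped in this sketch.
function convertEdges(text, prefix) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => toArangoEdgeLine(line, prefix));
}

console.log(convertEdges("1 2\n3 1\n").join("\n"));
```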

arangoimp --file twitter-vertices.csv --type csv --collection twitter_v --overwrite true --convert false --server.endpoint http+tcp://127.0.0.1:8530

arangoimp --file arango-twitter-edges.csv --type csv --collection twitter_e --overwrite true --convert false --separator "\t" --server.endpoint http+tcp://127.0.0.1:8530