!CHAPTER Sharding

Sharding allows you to use multiple machines to run a cluster of ArangoDB
instances that together constitute a single database. This enables
you to store much more data, since ArangoDB distributes the data
automatically across the different servers. In many situations you can
also gain higher data throughput, again because the load can be
distributed across multiple machines.

In a cluster there are essentially two types of processes: "DBservers"
and "coordinators". The former actually store the data; the latter
expose the database to the outside world. The clients talk to the
coordinators exactly as they would talk to a single ArangoDB instance
via the REST interface. The coordinators know about the configuration of
the cluster and automatically forward incoming requests to the
right DBservers.
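
For example, once you are connected with *arangosh* to any coordinator,
you can create and use a sharded collection just as you would on a
single server. The following is a minimal sketch; the endpoint, the
collection name and the *numberOfShards* value are only illustrative:

```js
// Connect arangosh to one of your coordinators, e.g.
//   arangosh --server.endpoint tcp://coordinator-host:8529
// (hypothetical host name)

// Create a collection whose documents are distributed over 4 shards;
// the coordinator sends each document to the responsible DBserver.
db._create("users", { numberOfShards: 4 });

// Normal operations work exactly as on a single instance:
db.users.save({ name: "alice" });
db.users.count();   // the coordinator aggregates results from all shards
```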

As a central highly available service to hold the cluster configuration
and to synchronize reconfiguration and fail-over operations, we currently
use an external program called *etcd* (see its [GitHub
page](https://github.com/coreos/etcd)). It provides a hierarchical
key-value store with strong consistency and reliability guarantees.
This service is called the "agency" and its processes are called "agents".

All this is admittedly a relatively complicated setup that involves many
steps for the startup and shutdown of clusters. Therefore we have created
convenience functionality to plan, set up, start and shut down clusters.

The whole process works in two phases: first the "planning" phase and
then the "running" phase. In the planning phase it is decided which
processes with which roles run on which machine, which ports they use,
where the central agency resides and which ports its agents use. The
result of the planning phase is a "cluster plan", which is simply a
relatively large data structure in JSON format. You can then use this
cluster plan to start up, shut down, check and clean up your cluster.
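
As a minimal sketch of the planning phase, assuming the JavaScript
*Planner* interface of the *@arangodb/cluster* module (the option values
below are only an example):

```js
// In arangosh on one of the dispatchers:
var Planner = require("@arangodb/cluster").Planner;

// Decide how many processes of each role the cluster should have.
var p = new Planner({
  numberOfDBservers: 3,
  numberOfCoordinators: 2
});

// The resulting cluster plan is a plain JSON data structure that
// records which process runs on which machine and port.
var plan = p.getPlan();
```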

This latter phase uses so-called "dispatchers". A dispatcher is yet another
ArangoDB instance, and you have to install exactly one such instance on
every machine that will take part in your cluster. No special
configuration whatsoever is needed and you can organize authentication
exactly as you would in a normal ArangoDB instance. You only have
to activate the dispatcher functionality in the configuration file
(see the options *cluster.disable-dispatcher-kickstarter* and
*cluster.disable-dispatcher-interface*, which are both initially
set to *true* in the standard setup we ship).
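
In the configuration file this amounts to flipping the two options
mentioned above; a sketch of the relevant section might look like this:

```
[cluster]
disable-dispatcher-kickstarter = false
disable-dispatcher-interface = false
```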

You can then use any of these dispatchers to plan and start your
cluster. In the planning phase, you have to tell the planner about all
dispatchers in your cluster, and it will automatically distribute your
agency, DBserver and coordinator processes amongst the dispatchers. The
result is the cluster plan, which you feed into the kickstarter. The
kickstarter is a program that actually uses the dispatchers to
manipulate the processes in your cluster. It runs on one of the
dispatchers, analyses the cluster plan, executes those actions for
which it is itself responsible, and forwards all other actions
to the corresponding dispatchers. This is possible because the cluster
plan incorporates the information about all dispatchers.
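
Continuing the sketch from the planning phase above, and again assuming
the *@arangodb/cluster* JavaScript interface, feeding the plan into the
kickstarter might look like this:

```js
var Kickstarter = require("@arangodb/cluster").Kickstarter;

// Hand the cluster plan to the kickstarter on this dispatcher.
var k = new Kickstarter(plan);

// Start all agency, DBserver and coordinator processes; actions for
// other machines are forwarded to their dispatchers.
k.launch();

// Later on:
k.isHealthy();   // check that all cluster processes are still running
k.shutdown();    // stop the whole cluster
k.cleanup();     // remove the data directories created at launch
```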

We also offer a graphical user interface to the cluster planner and
dispatcher.