1
0
Fork 0
arangodb/Documentation/Books/Users/Replication
Jan Steemann d92057dd03 the great rename: array => object, list => array 2014-12-18 22:33:23 +01:00
..
Components.mdpp the great rename: array => object, list => array 2014-12-18 22:33:23 +01:00
ExampleSetup.mdpp updated replication manual 2014-07-06 15:25:36 +02:00
Limitations.mdpp include or exclude system collections from replication 2014-12-05 14:58:43 +01:00
Overhead.mdpp updated replication manual 2014-07-06 15:25:36 +02:00
README.mdpp updated replication manual 2014-07-06 15:25:36 +02:00

README.mdpp

!CHAPTER Introduction to Replication

!SUBSECTION How the replication works

Starting with ArangoDB 1.4, ArangoDB comes with optional asynchronous master-slave 
replication. Replication is configured on a per-database level, meaning that 
different databases in the same ArangoDB instance can have different replication
settings. Replication must be turned on explicitly before it becomes active for a
database.

In a typical master-slave replication setup, clients direct *all* their write 
operations for a specific database to the master. The master database is the only 
place to connect to when making any insertions/updates/deletions.

The master database will log all write operations in its write-ahead log.
Any number of slaves can then connect to the master database and fetch data from the
master database's write-ahead log. The slaves then can apply all the events from the log in 
the same order locally. After that, they will have the same state of data as the master
database.

Transactions are honored in replication, i.e. transactional write operations will 
become visible on slaves atomically.

As all write operations will be logged to a master database's write-ahead log, the 
replication in ArangoDB currently cannot be used for write-scaling. The main purposes 
of the replication in current ArangoDB are to provide read-scalability and "hot backups" 
of specific databases.

It is possible to connect multiple slave databases to the same master database. Slave
databases should be used as read-only instances, and no user-initiated write operations 
should be carried out on them. Otherwise data conflicts may occur that cannot be solved 
automatically, and that will make the replication stop. Master-master (or multi-master) 
replication is not currently supported in ArangoDB.

The replication in ArangoDB is asynchronous, meaning that slaves will *pull* changes 
from the master database. Slaves need to know to which master database they should 
connect to, but a master database is not aware of the slaves that replicate from it. 
When the network connection between the master database and a slave goes down, write 
operations on the master can continue normally. When the network is up again, slaves 
can reconnect to the master database and transfer the remaining changes. This will 
happen automatically provided slaves are configured appropriately.


!SUBSECTION Replication lag

In this setup, write operations are applied first in the master database, and applied 
in the slave database(s) afterwards. 

For example, let's assume a write operation is executed in the master database 
at point in time t0. To make a slave database apply the same operation, it must first 
fetch the write operation's data from master database's write-ahead log, then parse it and 
apply it locally. This will happen at some point in time after t0, let's say t1. 

The difference between t1 and t0 is called the *replication lag*, and it is unavoidable 
in asynchronous replication. The amount of replication lag depends on many factors, a 
few of which are:

* the network capacity between the slaves and the master
* the load of the master and the slaves
* the frequency in which slaves poll the master for updates

Between t0 and t1, the state of data on the master is newer than the state of data
on the slave(s). At point in time t1, the state of data on the master and slave(s)
is consistent again (provided no new data modifications happened on the master in
between). Thus, the replication will lead to an *eventually consistent* state of data.

!SUBSECTION Replication configuration

The replication is turned off by default. In order to create a master-slave setup,
the so-called *replication applier* needs to be enabled on both on a slave databases.
Since ArangoDB 2.2, the master database does not need any special configuration, as
it will automatically log all data modification operations into its write-ahead log.
The write-ahead log is then used in replication. Previous versions of ArangoDB
required the so-called *replication logger* to be activated on the master, too. The
replication logger does not have any purpose in ArangoDB 2.2 and higher, and the
object is kept for compatibility with previous versions only.

Replication is configured on a per-database level. If multiple database are to be 
replicated, the replication must be set up individually per database.

The replication applier on the slave can be used to perform a one-time synchronisation
with the master (and then stop), or to perform an ongoing replication of changes. To
resume replication on slave restart, the *autoStart* attribute of the replication 
applier must be set to true.