arangodb/Documentation/Books/Users/Replication/Components.mdpp

!CHAPTER Components

The replication architecture in ArangoDB 1.4 consists of two main components, which
can be used together or in isolation: the *replication logger* and the *replication applier*.

Both are available in ArangoDB 1.4 and can be administered via the command line or
a REST API (see [HTTP Interface for Replication](../HttpReplications/README.md)).

In most cases, the *replication logger* will be run inside a master database, and the
*replication applier* will be executed in a slave database.

As there can be multiple database inside an ArangoDB instance, there can be multiple
replication loggers and multiple replication appliers per ArangoDB instance.

!SUBSECTION Replication Logger

!SUBSUBSECTION Purpose

The purpose of the replication logger is to log all changes that modify the state
of data in a specific database. This includes document insertions, updates, and deletions.
It also includes creating, dropping, renaming and changing collections and indexes.

When the replication logger is used, it will log all these write operations for the
database in its *event log*, in the same order in which the operations were carried out
originally.

Reading the event log sequentially provides a list of all write operations carried out
in the database. Replication clients can request certain log events from the logger.

For example, ArangoDB's replication applier will permanently query the latest events
from the replication logger. The applier of the slave database will then apply all changes
locally to get to the same state of data as the master database. It will keep track of
which parts of the event log it already synchronized, meaning that it will perform an
incremental synchronization.

Technically, the event log is a system collection named *_replication* inside the database
the applier runs in. The event log is persisted and will still be present after a server
shutdown or crash. The event log's underlying collection should not be modified by users
directly. It should only be accessed using the special API methods offered by ArangoDB.

!SUBSUBSECTION Starting and Stopping

ArangoDB will only log changes if the replication logger is turned on for the specific
database. Should there be any write operations in the database while the replication logger
is turned off, these events will be lost for replication.

To turn on the replication logger for a database once, the following command can be executed:

    require("org/arangodb/replication").logger.start();

Note that starting the replication logger does not necessarily mean it will be started
automatically on all following server startups. This can be configured separately (keep on
reading for this).

Please note that the above command turns on the replication logger for the current database
only, but not for all databases. To turn on the replication logger for multiple databases,
the above command must be executed separately for each database.

To turn the replication logger off for the current database, execute the *stop* command:

    require("org/arangodb/replication").logger.stop();

This has turned off the logger, but still the logger may be started again automatically
the next time the ArangoDB server is started. This can be configured separately (again,
keep on reading).

To determine the current state of the replication logger in the current database (including
whether it is currently running or not), use the *state* command:

    require("org/arangodb/replication").logger.state();

The result might look like this:

    {
      "state" : {
        "running" : false,
        "lastLogTick" : "255376126918573",
        "totalEvents" : 0,
        "time" : "2013-08-02T11:01:28Z"
      },
      "server" : {
        "version" : "1.4.devel",
        "serverId" : "53904504772335"
      },
      "clients" : [ ]
    }

The *running* attribute indicates whether the logger is currently enabled and will
log any events. The *totalEvents* attribute will indicate how many log events have been
logged since the start of the ArangoDB server. The value will not be reset between
multiple stops and restarts of the logger. Finally, the *lastLogTick* value will indicate
the id of the last event that was logged. It can be used to determine whether new
events were logged, and is also used by the replication applier for incremental
fetching of data.

Note: the replication logger state can also be queried via the HTTP API (see @ref HttpReplication).

!SUBSUBSECTION Configuration

To determine whether the current database's replication logger is automatically started
when the ArangoDB server is started, the logger has a separate configuration. The
configuration is stored in a file *REPLICATION-LOGGER-CONFIG* inside the current database's
directory. If it does not exist, ArangoDB will use default values.

To check and adjust the configuration of the replication logger for the current database,
use the *properties* command. To view the current configuration, use *properties* without
any arguments:

    require("org/arangodb/replication").logger.properties();

The result might look like this:

    {
      "autoStart" : false,
      "logRemoteChanges" : false,
      "maxEvents" : 1048576,
      "maxEventsSize" : 134217728
    }

The *autoStart* attribute indicates whether the current database's replication logger is
automatically started whenever the ArangoDB server is started. You may want to set it to
*true* for a proper master setup. To do so, supply the updated attributes to the *properties*
command, e.g.:

    require("org/arangodb/replication").logger.properties({
      autoStart: true
    });

This will ensure the replication logger for the current database is automatically started
on all following ArangoDB starts unless *autoStart* is set to *false* again.

Note that only those attributes will be changed that you supplied in the argument to *properties*.
All other configuration values will remain unchanged. Also note that the replication logger
can be reconfigured while it is running.

It a replication logger is turned on, the event log may be allowed to grow indefinitely.
It may be sensible to set a maximum size or a maximum number of events to keep.
If one of these thresholds is reached during logging, the replication logger will start
removing the oldest events from the event log automatically.

The thresholds are set by adjusting the *maxEvents* and *maxEventsSize* attributes via the
*properties* command. The *maxEvents* attribute is the maximum number of events to keep
in the event log before starting to remove oldest events. The *maxEventsSize* is the maximum
cumulated size of event log data (in bytes) that is kept in the log before starting to remove
oldest events. If both are set to a value of *0*, it means that the number and size
of log events to keep is unrestricted. If both are set to a non-zero value, it means
that there are restrictions on both the number and cumulated size of events to keep,
and if either of the restrictions is hit, the deletion will be triggered.

The following command will set the threshold to 5000 events to keep, irrespective of the
cumulated event data sizes:

    require("org/arangodb/replication").logger.properties({
      maxEvents: 5000,
      maxEventsSize: 0
    });


!SUBSECTION Replication Applier

!SUBSUBSECTION Purpose

The purpose of the replication applier is to read data from a master database's event log,
and apply them locally. The applier will check the master database for new events periodically.
It will perform an incremental synchronization, i.e. only asking the master for events
that occurred after the last synchronization.

The replication applier does not get notified by the master database when there are "new"
events, but uses the pull principle. It might thus take some time (the *replication lag*)
before an event from the master database gets shipped to and applied in a slave database.

The replication applier of a database is run in a separate thread. It may encounter problems
when a log event from the master cannot be applied safely, or when the connection to the master
database goes down (network outage, master database is down or unavailable etc.). In this case,
the database's replication applier thread might terminate itself. It is then up to the
administrator to fix the problem and restart the database's replication applier.

If the replication applier cannot connect to the master database, or the communication fails at
some point during the synchronization, the replication applier will try to reconnect to
the master database. It will give up reconnecting only after a configurable amount of connection
attempts.

The replication applier state is queryable at any time by using the *state* command of the
applier. This will return the state of the applier of the current database:

    require("org/arangodb/replication").applier.state();

The result might look like this:

    {
      "state" : {
        "running" : false,
        "lastAppliedContinuousTick" : "231848832948633",
        "lastProcessedContinuousTick" : "231848832948633",
        "lastAvailableContinuousTick" : null,
        "progress" : {
          "time" : "2013-08-02T11:40:08Z",
          "message" : "applier created",
          "failedConnects" : 0
        },
        "totalRequests" : 0,
        "totalFailedConnects" : 0,
        "totalEvents" : 0,
        "lastError" : {
          "errorNum" : 0
        },
        "time" : "2013-08-02T11:40:22Z"
      },
      "server" : {
        "version" : "1.4.devel",
        "serverId" : "53904504772335"
      },
      "endpoint" : "tcp://master.domain.org:8529"
    }

The *running* attribute indicates whether the replication applier of the current database
is currently running and polling the server at *endpoint* for new events.

The *failedConnects* attribute shows how many failed connection attempts the replication
applier currently has encountered in a row. In contrast, the *totalFailedConnects* attribute
indicates how many failed connection attempts the applier has made in total. The
*totalRequest* attribute shows how many requests the applier has sent to the master database
in total. The *totalEvents* attribute shows how many log events the applier has read from the
master.

The *message* sub-attribute of the *progress* sub-attribute gives a brief hint of what the
applier currently does (if it is running). The *lastError* attribute also has an optional
*errorMessage* sub-attribute, showing the latest error message. The *errorNum* sub-attribute of the
*lastError* attribute can be used by clients to programmatically check for errors. It should be *0*
if there is no error, and it should be non-zero when the applier terminated itself due to
a problem.

Here is an example of the state after the replication applier terminated itself due to
(repeated) connection problems:

    {
      "state" : {
        "running" : false,
        "progress" : {
          "time" : "2013-08-02T12:40:25Z",
          "message" : "applier stopped",
          "failedConnects" : 6
        },
        "totalRequests" : 18,
        "totalFailedConnects" : 11,
        "totalEvents" : 0,
        "lastError" : {
          "time" : "2013-08-02T12:40:25Z",
          "errorMessage" : "could not connect to master at tcp://master.example.org:8529: Could not connect to 'tcp:/...",
          "errorNum" : 1400
        },
        ...
      }
    }

Note: the state of a database's replication applier is queryable via the HTTP API, too.
Please refer to [HTTP Interface for Replication](../HttpReplications/README.md) for more details.

!SUBSUBSECTION Starting and Stopping

To start and stop the applier in the current database, the *start* and *stop* commands can
be used:

    require("org/arangodb/replication").applier.start(<tick>);
    require("org/arangodb/replication").applier.stop();

Note that starting a replication applier without setting up an initial configuration will
fail. The replication applier will look for its configuration in a file named
*REPLICATION-APPLIER-CONFIG* in the current database's directory. If the file is not present,
ArangoDB will use some default configuration, but it cannot guess the endpoint (the address
of the master database) the applier should connect to. Thus starting the applier without
configuration will fail.

Note that the first time you start the applier, you should pass *1* as value for tick. Otherwise
you might see the message

    "lastError" : {
      "time" : "2014-04-16T16:59:16Z",
      "errorMessage" : "no start tick",
      "errorNum" : 1413
    }

Note that starting a database's replication applier via the *start* command will not necessarily
start the applier on the next and following ArangoDB server restarts. Additionally, stopping a
database's replication applier manually will not necessarily prevent the applier from being
started again on the next server start. All of this is configurable separately (hang on reading).

Please note that when starting the replication applier of database, it will resume where it stopped.
This is sensible because replication log events should be applied incrementally. If the
replication applier of a database has never been started before, it needs some *tick* value from the
master database's event log from which to start fetching events.

!SUBSUBSECTION Configuration

To configure the replication applier of a specific database, use the *properties* command. Using
it without any arguments will return the current configuration:

    require("org/arangodb/replication").applier.properties();

The result might look like this:

    {
      "requestTimeout" : 300,
      "connectTimeout" : 10,
      "ignoreErrors" : 0,
      "maxConnectRetries" : 10,
      "chunkSize" : 0,
      "autoStart" : false,
      "adaptivePolling" : true
    }

Note that there is no *endpoint* attribute configured yet. The *endpoint* attribute is required
for the replication applier to be startable. You may also want to configure a username and password
for the connection via the *username* and *password* attributes.

    require("org/arangodb/replication").applier.properties({
      endpoint: "tcp://master.domain.org:8529",
      username:  "root",
      password: "secret"
    });

This will re-configure the replication applier for the current database. The configuration will be
used from the next start of the replication applier. The replication applier cannot be re-configured
while it is running. It must be stopped first to be re-configured.

To make the replication applier of the current database start automatically when the ArangoDB server
starts, use the *autoStart* attribute.

Setting the *adaptivePolling* attribute to *true* will make the replication applier poll the
master database for changes with a variable frequency. The replication applier will then lower the
frequency when the master is idle, and increase it when the master can provide new events).
Otherwise the replication applier will poll the master database for changes with a constant frequency
of 5 seconds if the master database's replication logger is turned off, and 0.5 seconds if it is
turned on.

To set a timeout for connection and following request attempts, use the *connectTimeout* and
*requestTimeout* values. The *maxConnectRetries* attribute configures after how many failed
connection attempts in a row the replication applier will give up and turn itself off.
You may want to set this to a high value so that temporary network outages do not lead to the
replication applier stopping itself.

The *chunkSize* attribute can be used to control the approximate maximum size of a master's
response (in bytes). Setting it to a low value may make the master respond faster (less data is
assembled before the master sends the response), but may require more request-response roundtrips.
Set it to *0* to use ArangoDB's built-in default value.

The following example will set most of the discussed properties for the current database's applier:

    require("org/arangodb/replication").applier.properties({
      endpoint: "tcp://master.domain.org:8529",
      username:  "root",
      password: "secret",
      adaptivePolling: true,
      connectTimeout: 15,
      maxConnectRetries: 100,
      chunkSize: 262144,
      autoStart: true
    });

After the applier is now fully configured, it could theoretically be started. However, we
may first need an initial sychronization of all collections and their data from the master before
we start the replication applier. The reason is that the replication logger of the master database
may not contain all data we need for a full synchronization.

The only safe method for doing a full synchronization is thus to

* make sure the replication logger of the master database is running
* perform an initial full sync with the master database
* note the master database's replication logger *tick* value and
* start the continuous replication applier using this tick value.

The initial synchronization for the current database is executed with the *sync* command:

    require("org/arangodb/replication").sync({
      endpoint: "tcp://master.domain.org:8529",
      username: "root",
      password: "secret
    });

Warning: *sync* will do a full synchronization of the collections in the current database with
collections present in the master database.
Any local instances of the collections and all their data are removed! Only execute this
command when you are sure you want to remove the local data!

As *sync* does a full synchronization, it may take a while to execute.
When *sync* completes successfully, it show a list of collections it has synchronized in the
*collections* attribute. It will also return the master database's replication logger tick value
at the time the *sync* was started on the master. The tick value is contained in the *lastLogTick*
attribute of the *sync* command:

    {
      "lastLogTick" : "231848833079705",
      "collections" : [ ... ]
    }

If the master database's replication logger is turned on, you can now start the continuous
synchronization for the current database with the command

    require("org/arangodb/replication").applier.start("231848833079705");

Note that the tick values should be handled as strings. Using numeric data types for tick
values is unsafe because they might exceed the 32 bit value and the IEEE754 double accuracy
ranges.