mirror of https://gitee.com/bigwinds/arangodb
405 lines
18 KiB
Plaintext
405 lines
18 KiB
Plaintext
!CHAPTER Components
|
|
|
|
The replication architecture in ArangoDB 1.4 consists of two main components, which
|
|
can be used together or in isolation: the *replication logger* and the *replication applier*.
|
|
|
|
Both are available in ArangoDB 1.4 and can be administered via the command line or
|
|
a REST API (see [HTTP Interface for Replication](../HttpReplications/README.md)).
|
|
|
|
In most cases, the *replication logger* will be run inside a master database, and the
|
|
*replication applier* will be executed in a slave database.
|
|
|
|
As there can be multiple database inside an ArangoDB instance, there can be multiple
|
|
replication loggers and multiple replication appliers per ArangoDB instance.
|
|
|
|
!SUBSECTION Replication Logger
|
|
|
|
!SUBSUBSECTION Purpose
|
|
|
|
The purpose of the replication logger is to log all changes that modify the state
|
|
of data in a specific database. This includes document insertions, updates, and deletions.
|
|
It also includes creating, dropping, renaming and changing collections and indexes.
|
|
|
|
When the replication logger is used, it will log all these write operations for the
|
|
database in its *event log*, in the same order in which the operations were carried out
|
|
originally.
|
|
|
|
Reading the event log sequentially provides a list of all write operations carried out
|
|
in the database. Replication clients can request certain log events from the logger.
|
|
|
|
For example, ArangoDB's replication applier will permanently query the latest events
|
|
from the replication logger. The applier of the slave database will then apply all changes
|
|
locally to get to the same state of data as the master database. It will keep track of
|
|
which parts of the event log it already synchronized, meaning that it will perform an
|
|
incremental synchronization.
|
|
|
|
Technically, the event log is a system collection named *_replication* inside the database
|
|
the applier runs in. The event log is persisted and will still be present after a server
|
|
shutdown or crash. The event log's underlying collection should not be modified by users
|
|
directly. It should only be accessed using the special API methods offered by ArangoDB.
|
|
|
|
!SUBSUBSECTION Starting and Stopping
|
|
|
|
ArangoDB will only log changes if the replication logger is turned on for the specific
|
|
database. Should there be any write operations in the database while the replication logger
|
|
is turned off, these events will be lost for replication.
|
|
|
|
To turn on the replication logger for a database once, the following command can be executed:
|
|
|
|
require("org/arangodb/replication").logger.start();
|
|
|
|
Note that starting the replication logger does not necessarily mean it will be started
|
|
automatically on all following server startups. This can be configured separately (keep on
|
|
reading for this).
|
|
|
|
Please note that the above command turns on the replication logger for the current database
|
|
only, but not for all databases. To turn on the replication logger for multiple databases,
|
|
the above command must be executed separately for each database.
|
|
|
|
To turn the replication logger off for the current database, execute the *stop* command:
|
|
|
|
require("org/arangodb/replication").logger.stop();
|
|
|
|
This has turned off the logger, but still the logger may be started again automatically
|
|
the next time the ArangoDB server is started. This can be configured separately (again,
|
|
keep on reading).
|
|
|
|
To determine the current state of the replication logger in the current database (including
|
|
whether it is currently running or not), use the *state* command:
|
|
|
|
require("org/arangodb/replication").logger.state();
|
|
|
|
The result might look like this:
|
|
|
|
{
|
|
"state" : {
|
|
"running" : false,
|
|
"lastLogTick" : "255376126918573",
|
|
"totalEvents" : 0,
|
|
"time" : "2013-08-02T11:01:28Z"
|
|
},
|
|
"server" : {
|
|
"version" : "1.4.devel",
|
|
"serverId" : "53904504772335"
|
|
},
|
|
"clients" : [ ]
|
|
}
|
|
|
|
The *running* attribute indicates whether the logger is currently enabled and will
|
|
log any events. The *totalEvents* attribute will indicate how many log events have been
|
|
logged since the start of the ArangoDB server. The value will not be reset between
|
|
multiple stops and restarts of the logger. Finally, the *lastLogTick* value will indicate
|
|
the id of the last event that was logged. It can be used to determine whether new
|
|
events were logged, and is also used by the replication applier for incremental
|
|
fetching of data.
|
|
|
|
Note: the replication logger state can also be queried via the HTTP API (see @ref HttpReplication).
|
|
|
|
!SUBSUBSECTION Configuration
|
|
|
|
To determine whether the current database's replication logger is automatically started
|
|
when the ArangoDB server is started, the logger has a separate configuration. The
|
|
configuration is stored in a file *REPLICATION-LOGGER-CONFIG* inside the current database's
|
|
directory. If it does not exist, ArangoDB will use default values.
|
|
|
|
To check and adjust the configuration of the replication logger for the current database,
|
|
use the *properties* command. To view the current configuration, use *properties* without
|
|
any arguments:
|
|
|
|
require("org/arangodb/replication").logger.properties();
|
|
|
|
The result might look like this:
|
|
|
|
{
|
|
"autoStart" : false,
|
|
"logRemoteChanges" : false,
|
|
"maxEvents" : 1048576,
|
|
"maxEventsSize" : 134217728
|
|
}
|
|
|
|
The *autoStart* attribute indicates whether the current database's replication logger is
|
|
automatically started whenever the ArangoDB server is started. You may want to set it to
|
|
*true* for a proper master setup. To do so, supply the updated attributes to the *properties*
|
|
command, e.g.:
|
|
|
|
require("org/arangodb/replication").logger.properties({
|
|
autoStart: true
|
|
});
|
|
|
|
This will ensure the replication logger for the current database is automatically started
|
|
on all following ArangoDB starts unless *autoStart* is set to *false* again.
|
|
|
|
Note that only those attributes will be changed that you supplied in the argument to *properties*.
|
|
All other configuration values will remain unchanged. Also note that the replication logger
|
|
can be reconfigured while it is running.
|
|
|
|
It a replication logger is turned on, the event log may be allowed to grow indefinitely.
|
|
It may be sensible to set a maximum size or a maximum number of events to keep.
|
|
If one of these thresholds is reached during logging, the replication logger will start
|
|
removing the oldest events from the event log automatically.
|
|
|
|
The thresholds are set by adjusting the *maxEvents* and *maxEventsSize* attributes via the
|
|
*properties* command. The *maxEvents* attribute is the maximum number of events to keep
|
|
in the event log before starting to remove oldest events. The *maxEventsSize* is the maximum
|
|
cumulated size of event log data (in bytes) that is kept in the log before starting to remove
|
|
oldest events. If both are set to a value of *0*, it means that the number and size
|
|
of log events to keep is unrestricted. If both are set to a non-zero value, it means
|
|
that there are restrictions on both the number and cumulated size of events to keep,
|
|
and if either of the restrictions is hit, the deletion will be triggered.
|
|
|
|
The following command will set the threshold to 5000 events to keep, irrespective of the
|
|
cumulated event data sizes:
|
|
|
|
require("org/arangodb/replication").logger.properties({
|
|
maxEvents: 5000,
|
|
maxEventsSize: 0
|
|
});
|
|
|
|
|
|
!SUBSECTION Replication Applier
|
|
|
|
!SUBSUBSECTION Purpose
|
|
|
|
The purpose of the replication applier is to read data from a master database's event log,
|
|
and apply them locally. The applier will check the master database for new events periodically.
|
|
It will perform an incremental synchronization, i.e. only asking the master for events
|
|
that occurred after the last synchronization.
|
|
|
|
The replication applier does not get notified by the master database when there are "new"
|
|
events, but uses the pull principle. It might thus take some time (the *replication lag*)
|
|
before an event from the master database gets shipped to and applied in a slave database.
|
|
|
|
The replication applier of a database is run in a separate thread. It may encounter problems
|
|
when a log event from the master cannot be applied safely, or when the connection to the master
|
|
database goes down (network outage, master database is down or unavailable etc.). In this case,
|
|
the database's replication applier thread might terminate itself. It is then up to the
|
|
administrator to fix the problem and restart the database's replication applier.
|
|
|
|
If the replication applier cannot connect to the master database, or the communication fails at
|
|
some point during the synchronization, the replication applier will try to reconnect to
|
|
the master database. It will give up reconnecting only after a configurable amount of connection
|
|
attempts.
|
|
|
|
The replication applier state is queryable at any time by using the *state* command of the
|
|
applier. This will return the state of the applier of the current database:
|
|
|
|
require("org/arangodb/replication").applier.state();
|
|
|
|
The result might look like this:
|
|
|
|
{
|
|
"state" : {
|
|
"running" : false,
|
|
"lastAppliedContinuousTick" : "231848832948633",
|
|
"lastProcessedContinuousTick" : "231848832948633",
|
|
"lastAvailableContinuousTick" : null,
|
|
"progress" : {
|
|
"time" : "2013-08-02T11:40:08Z",
|
|
"message" : "applier created",
|
|
"failedConnects" : 0
|
|
},
|
|
"totalRequests" : 0,
|
|
"totalFailedConnects" : 0,
|
|
"totalEvents" : 0,
|
|
"lastError" : {
|
|
"errorNum" : 0
|
|
},
|
|
"time" : "2013-08-02T11:40:22Z"
|
|
},
|
|
"server" : {
|
|
"version" : "1.4.devel",
|
|
"serverId" : "53904504772335"
|
|
},
|
|
"endpoint" : "tcp://master.domain.org:8529"
|
|
}
|
|
|
|
The *running* attribute indicates whether the replication applier of the current database
|
|
is currently running and polling the server at *endpoint* for new events.
|
|
|
|
The *failedConnects* attribute shows how many failed connection attempts the replication
|
|
applier currently has encountered in a row. In contrast, the *totalFailedConnects* attribute
|
|
indicates how many failed connection attempts the applier has made in total. The
|
|
*totalRequest* attribute shows how many requests the applier has sent to the master database
|
|
in total. The *totalEvents* attribute shows how many log events the applier has read from the
|
|
master.
|
|
|
|
The *message* sub-attribute of the *progress* sub-attribute gives a brief hint of what the
|
|
applier currently does (if it is running). The *lastError* attribute also has an optional
|
|
*errorMessage* sub-attribute, showing the latest error message. The *errorNum* sub-attribute of the
|
|
*lastError* attribute can be used by clients to programmatically check for errors. It should be *0*
|
|
if there is no error, and it should be non-zero when the applier terminated itself due to
|
|
a problem.
|
|
|
|
Here is an example of the state after the replication applier terminated itself due to
|
|
(repeated) connection problems:
|
|
|
|
{
|
|
"state" : {
|
|
"running" : false,
|
|
"progress" : {
|
|
"time" : "2013-08-02T12:40:25Z",
|
|
"message" : "applier stopped",
|
|
"failedConnects" : 6
|
|
},
|
|
"totalRequests" : 18,
|
|
"totalFailedConnects" : 11,
|
|
"totalEvents" : 0,
|
|
"lastError" : {
|
|
"time" : "2013-08-02T12:40:25Z",
|
|
"errorMessage" : "could not connect to master at tcp://master.example.org:8529: Could not connect to 'tcp:/...",
|
|
"errorNum" : 1400
|
|
},
|
|
...
|
|
}
|
|
}
|
|
|
|
Note: the state of a database's replication applier is queryable via the HTTP API, too.
|
|
Please refer to [HTTP Interface for Replication](../HttpReplications/README.md) for more details.
|
|
|
|
!SUBSUBSECTION Starting and Stopping
|
|
|
|
To start and stop the applier in the current database, the *start* and *stop* commands can
|
|
be used:
|
|
|
|
require("org/arangodb/replication").applier.start(<tick>);
|
|
require("org/arangodb/replication").applier.stop();
|
|
|
|
Note that starting a replication applier without setting up an initial configuration will
|
|
fail. The replication applier will look for its configuration in a file named
|
|
*REPLICATION-APPLIER-CONFIG* in the current database's directory. If the file is not present,
|
|
ArangoDB will use some default configuration, but it cannot guess the endpoint (the address
|
|
of the master database) the applier should connect to. Thus starting the applier without
|
|
configuration will fail.
|
|
|
|
Note that the first time you start the applier, you should pass *1* as value for tick. Otherwise
|
|
you might see the message
|
|
|
|
"lastError" : {
|
|
"time" : "2014-04-16T16:59:16Z",
|
|
"errorMessage" : "no start tick",
|
|
"errorNum" : 1413
|
|
}
|
|
|
|
Note that starting a database's replication applier via the *start* command will not necessarily
|
|
start the applier on the next and following ArangoDB server restarts. Additionally, stopping a
|
|
database's replication applier manually will not necessarily prevent the applier from being
|
|
started again on the next server start. All of this is configurable separately (hang on reading).
|
|
|
|
Please note that when starting the replication applier of database, it will resume where it stopped.
|
|
This is sensible because replication log events should be applied incrementally. If the
|
|
replication applier of a database has never been started before, it needs some *tick* value from the
|
|
master database's event log from which to start fetching events.
|
|
|
|
!SUBSUBSECTION Configuration
|
|
|
|
To configure the replication applier of a specific database, use the *properties* command. Using
|
|
it without any arguments will return the current configuration:
|
|
|
|
require("org/arangodb/replication").applier.properties();
|
|
|
|
The result might look like this:
|
|
|
|
{
|
|
"requestTimeout" : 300,
|
|
"connectTimeout" : 10,
|
|
"ignoreErrors" : 0,
|
|
"maxConnectRetries" : 10,
|
|
"chunkSize" : 0,
|
|
"autoStart" : false,
|
|
"adaptivePolling" : true
|
|
}
|
|
|
|
Note that there is no *endpoint* attribute configured yet. The *endpoint* attribute is required
|
|
for the replication applier to be startable. You may also want to configure a username and password
|
|
for the connection via the *username* and *password* attributes.
|
|
|
|
require("org/arangodb/replication").applier.properties({
|
|
endpoint: "tcp://master.domain.org:8529",
|
|
username: "root",
|
|
password: "secret"
|
|
});
|
|
|
|
This will re-configure the replication applier for the current database. The configuration will be
|
|
used from the next start of the replication applier. The replication applier cannot be re-configured
|
|
while it is running. It must be stopped first to be re-configured.
|
|
|
|
To make the replication applier of the current database start automatically when the ArangoDB server
|
|
starts, use the *autoStart* attribute.
|
|
|
|
Setting the *adaptivePolling* attribute to *true* will make the replication applier poll the
|
|
master database for changes with a variable frequency. The replication applier will then lower the
|
|
frequency when the master is idle, and increase it when the master can provide new events).
|
|
Otherwise the replication applier will poll the master database for changes with a constant frequency
|
|
of 5 seconds if the master database's replication logger is turned off, and 0.5 seconds if it is
|
|
turned on.
|
|
|
|
To set a timeout for connection and following request attempts, use the *connectTimeout* and
|
|
*requestTimeout* values. The *maxConnectRetries* attribute configures after how many failed
|
|
connection attempts in a row the replication applier will give up and turn itself off.
|
|
You may want to set this to a high value so that temporary network outages do not lead to the
|
|
replication applier stopping itself.
|
|
|
|
The *chunkSize* attribute can be used to control the approximate maximum size of a master's
|
|
response (in bytes). Setting it to a low value may make the master respond faster (less data is
|
|
assembled before the master sends the response), but may require more request-response roundtrips.
|
|
Set it to *0* to use ArangoDB's built-in default value.
|
|
|
|
The following example will set most of the discussed properties for the current database's applier:
|
|
|
|
require("org/arangodb/replication").applier.properties({
|
|
endpoint: "tcp://master.domain.org:8529",
|
|
username: "root",
|
|
password: "secret",
|
|
adaptivePolling: true,
|
|
connectTimeout: 15,
|
|
maxConnectRetries: 100,
|
|
chunkSize: 262144,
|
|
autoStart: true
|
|
});
|
|
|
|
After the applier is now fully configured, it could theoretically be started. However, we
|
|
may first need an initial sychronization of all collections and their data from the master before
|
|
we start the replication applier. The reason is that the replication logger of the master database
|
|
may not contain all data we need for a full synchronization.
|
|
|
|
The only safe method for doing a full synchronization is thus to
|
|
|
|
* make sure the replication logger of the master database is running
|
|
* perform an initial full sync with the master database
|
|
* note the master database's replication logger *tick* value and
|
|
* start the continuous replication applier using this tick value.
|
|
|
|
The initial synchronization for the current database is executed with the *sync* command:
|
|
|
|
require("org/arangodb/replication").sync({
|
|
endpoint: "tcp://master.domain.org:8529",
|
|
username: "root",
|
|
password: "secret
|
|
});
|
|
|
|
Warning: *sync* will do a full synchronization of the collections in the current database with
|
|
collections present in the master database.
|
|
Any local instances of the collections and all their data are removed! Only execute this
|
|
command when you are sure you want to remove the local data!
|
|
|
|
As *sync* does a full synchronization, it may take a while to execute.
|
|
When *sync* completes successfully, it show a list of collections it has synchronized in the
|
|
*collections* attribute. It will also return the master database's replication logger tick value
|
|
at the time the *sync* was started on the master. The tick value is contained in the *lastLogTick*
|
|
attribute of the *sync* command:
|
|
|
|
{
|
|
"lastLogTick" : "231848833079705",
|
|
"collections" : [ ... ]
|
|
}
|
|
|
|
If the master database's replication logger is turned on, you can now start the continuous
|
|
synchronization for the current database with the command
|
|
|
|
require("org/arangodb/replication").applier.start("231848833079705");
|
|
|
|
Note that the tick values should be handled as strings. Using numeric data types for tick
|
|
values is unsafe because they might exceed the 32 bit value and the IEEE754 double accuracy
|
|
ranges.
|
|
|