Per-Database Setup
==================

This page describes the master/slave replication process for a specific database within
an ArangoDB instance. That means that only the specified database will be replicated.

Setting up a working master-slave replication requires two ArangoDB instances:

* **master**: the instance to which all data-modification operations are directed
* **slave**: the instance that replicates the data from the master. We will start a
  _replication applier_ on it, and it will fetch data from the master database's
  _write-ahead log_ and apply its operations locally

For the following example setup, we will use the instance *tcp://master.domain.org:8529* as the
_master_, and the instance *tcp://slave.domain.org:8530* as a _slave_.

The goal is to have all data from the database *_system* on the _master_ *tcp://master.domain.org:8529*
replicated to the database *_system* on the _slave_ *tcp://slave.domain.org:8530*.

On the _master_, nothing special needs to be done, as all write operations will automatically be
logged in the master's _write-ahead log_ (WAL).

All-in-one setup
----------------

To make the replication copy the initial data from the **master** to the **slave** and start the
continuous replication on the **slave**, there is an all-in-one command:

```js
require("@arangodb/replication").setupReplication(configuration);
```

The following example demonstrates how to use the command for setting up replication
for the *_system* database. Note that it should be run on the **slave** and not the
**master**:

```js
db._useDatabase("_system");
require("@arangodb/replication").setupReplication({
  endpoint: "tcp://master.domain.org:8529",
  username: "myuser",
  password: "mypasswd",
  verbose: false,
  includeSystem: false,
  incremental: true,
  autoResync: true
});
```

The command will return when the initial synchronization is finished and the continuous replication
has been started, or when the initial synchronization has failed.

If the initial synchronization is successful, the command will store the given configuration on
the slave. It also configures the continuous replication to start automatically if the slave is
restarted, i.e. *autoStart* is set to *true*.

If the command is run while the slave's replication applier is already running, it will first
stop the running applier, drop its configuration and do a resynchronization of data with the
**master**. It will then use the provided configuration, overwriting any previously existing
replication configuration on the **slave**.

Initial synchronization
-----------------------

The initial synchronization and continuous replication applier can also be started separately.
To start replication on the **slave**, make sure there currently is no replication applier running.

The following commands stop a running applier in the slave's *_system* database:

```js
db._useDatabase("_system");
require("@arangodb/replication").applier.stop();
```

The *stop* operation will terminate any replication activity in the _system database on the slave.

After that, the initial synchronization can be run. It will copy the collections from the **master**
to the **slave**, overwriting existing data. To run the initial synchronization, execute the following
commands on the **slave**:

```js
db._useDatabase("_system");
require("@arangodb/replication").sync({
  endpoint: "tcp://master.domain.org:8529",
  username: "myuser",
  password: "mypasswd",
  verbose: false
});
```

Username and password only need to be specified when the **master** requires authentication.
To check what the synchronization is currently doing, set the *verbose* option to *true*.
If set, the synchronization will create log messages with the current synchronization status.

**Warning**: The *sync* command will replace data in the **slave** database with data from the
**master** database! Only execute these commands if you have verified you are on the correct server,
in the correct database!

The sync operation will return an attribute named *lastLogTick* which we'll need to note. The
last log tick will be used as the starting point for subsequent replication activity. Let's
assume we got the following last log tick:

```js
{
  "lastLogTick" : "40694126",
  ...
}
```

Initial synchronization from the ArangoShell
--------------------------------------------

The initial synchronization via the *sync* command may take a long time to complete. The shell
will block until the slave has completed the initial synchronization or until an error occurs.
By default, the *sync* command in the ArangoShell will poll the slave for a status update every
10 seconds.

Optionally the *sync* command can be made non-blocking by setting its *async* option to true.
In this case, the *sync* command will return instantly with an id string, and the initial
synchronization will run detached on the slave server. To fetch the current status of the *sync*
progress from the ArangoShell, the *getSyncResult* function can be used as follows:

```js
db._useDatabase("_system");
var replication = require("@arangodb/replication");

/* run command in async mode */
var id = replication.sync({
  endpoint: "tcp://master.domain.org:8529",
  username: "myuser",
  password: "mypasswd",
  async: true
});

/* now query the status of our operation */
print(replication.getSyncResult(id));
```

*getSyncResult* will return *false* as long as the synchronization is not complete, and return the
synchronization result otherwise.

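For scripts, this poll-until-done pattern can be sketched as follows. The helper name
`waitForSyncResult` and the mocked job are illustrative only and not part of the ArangoDB API;
in arangosh you would pass in the real `getSyncResult` from `@arangodb/replication` and sleep
between polls.

```js
// Illustrative helper (NOT part of the ArangoDB API): poll an async sync
// job until getSyncResult() returns something other than false.
// `getSyncResult` is injected so the sketch is self-contained; in arangosh
// it would be require("@arangodb/replication").getSyncResult.
function waitForSyncResult(getSyncResult, id, maxTries) {
  for (let i = 0; i < maxTries; i++) {
    const result = getSyncResult(id);
    if (result !== false) {
      return result; // finished; the result contains lastLogTick
    }
    // in arangosh, sleep between polls here, e.g. require("internal").wait(10)
  }
  throw new Error("sync job did not finish within " + maxTries + " polls");
}

// demo with a mocked job that completes on the third poll
let polls = 0;
const mockGetSyncResult = () => (++polls < 3 ? false : { lastLogTick: "40694126" });
const result = waitForSyncResult(mockGetSyncResult, "some-job-id", 10);
// result.lastLogTick can then be fed to applier.start()
```
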
Continuous synchronization
--------------------------

When the initial synchronization is finished, the continuous replication applier can be started using
the last log tick provided by the *sync* command. Before starting it, there is at least one
configuration option to consider: replication on the **slave** will be running until the
**slave** gets shut down. When the slave server gets restarted, replication will be turned off again.
To change this, we first need to configure the slave's replication applier and set its
*autoStart* attribute.

Here's the command to configure the replication applier with several options, including the
*autoStart* attribute:

```js
db._useDatabase("_system");
require("@arangodb/replication").applier.properties({
  endpoint: "tcp://master.domain.org:8529",
  username: "myuser",
  password: "mypasswd",
  autoStart: true,
  autoResync: true,
  autoResyncRetries: 2,
  adaptivePolling: true,
  includeSystem: false,
  requireFromPresent: false,
  idleMinWaitTime: 0.5,
  idleMaxWaitTime: 1.5,
  verbose: false
});
```

An important consideration for replication is whether data from system collections (such as
*_graphs* or *_users*) should be applied. The *includeSystem* option controls that. If set to
*true*, changes in system collections will be replicated. Otherwise, they will not be replicated.
It is often not necessary to replicate data from system collections, and doing so may even cause
problems on the slave, because the slave needs its own system collections in order to start and
remain operational.

{% hint 'warning' %}
There is a separate option *includeFoxxQueues* for controlling whether Foxx queue jobs from the
system collections `_jobs` and `_queues` should be replicated. Documents from these collections
are not replicated by default in order to avoid execution of Foxx queue jobs on the slave.
{% endhint %}

The *requireFromPresent* attribute controls how the applier behaves if it detects that the
master can no longer provide the data for the tick value from which the slave wants to resume.
This may be the case if the master does not have a big enough backlog of historic WAL logfiles,
for example when replication is re-started after a longer pause. When *requireFromPresent* is set
to *true*, the replication applier will check at start whether the tick from which it starts or
resumes replication is still present on the master. If it is not, there would be data loss, and
the replication applier will abort with an appropriate error message. If set to *false*, the
replication applier will start anyway and ignore the data loss.

The *autoResync* option can be used in conjunction with the *requireFromPresent* option as follows:
when both *requireFromPresent* and *autoResync* are set to *true* and the master cannot provide the
log data the slave had requested, the replication applier will stop as usual. But due to the fact
that *autoResync* is set to true, the slave will automatically trigger a full resync of all data with
the master. After that, the replication applier will go into continuous replication mode again.
Additionally, setting *autoResync* to *true* will trigger a full re-synchronization of data when
the continuous replication is started and detects that there is no start tick value.

Note that automatic re-synchronization (*autoResync* option set to *true*) may transfer a lot of
data from the master to the slave and can therefore be expensive. Still it's turned on here so
there's less need for manual intervention.

The *autoResyncRetries* option can be used to control the number of resynchronization retries that
will be performed in a row when automatic resynchronization is enabled and kicks in. Setting this to
*0* will effectively disable *autoResync*. Setting it to some other value will limit the number of
retries that are performed. This helps prevent endless retries in case resynchronizations always fail.

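The combined effect of *requireFromPresent*, *autoResync*, and *autoResyncRetries* can be
summarized as a small decision model. This is an illustrative sketch of the behavior described
above, not code taken from ArangoDB; the function name and return values are made up for the
example.

```js
// Illustrative model (NOT ArangoDB code) of how the applier reacts when the
// master can no longer provide the slave's start tick, based on the
// requireFromPresent / autoResync / autoResyncRetries options described above.
function onStartTickMissing(config, resyncAttempt) {
  if (!config.requireFromPresent) {
    return "continue";    // start anyway, accepting potential data loss
  }
  if (config.autoResync && resyncAttempt < config.autoResyncRetries) {
    return "full-resync"; // re-run the initial synchronization automatically
  }
  return "abort";         // stop with an error; manual intervention needed
}

const config = { requireFromPresent: true, autoResync: true, autoResyncRetries: 2 };
const first = onStartTickMissing(config, 0);  // within the retry limit
const third = onStartTickMissing(config, 2);  // retry limit reached
const lax = onStartTickMissing({ requireFromPresent: false }, 0);
```
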
Now it's time to start the replication applier on the slave using the last log tick we got
before:

```js
db._useDatabase("_system");
require("@arangodb/replication").applier.start("40694126");
```

This will replicate all operations happening in the master's system database and apply them
on the slave, too.

After that, you should be able to monitor the state and progress of the replication
applier by executing the *state* command on the slave server:

```js
db._useDatabase("_system");
require("@arangodb/replication").applier.state();
```

Please note that stopping the replication applier on the slave using the *stop* command
should be avoided. The reason is that currently ongoing transactions (that have partly been
replicated to the slave) will need to be restarted after a restart of the replication
applier. Stopping and restarting the replication applier on the slave should thus only be
performed if there is certainty that the master is currently fully idle and all transactions
have been replicated fully.

Note that while a slave has only partly executed a transaction from the master, it might keep
a write lock on the collections involved in the transaction.

You may also want to check the master and slave states via the HTTP APIs
(see [HTTP Interface for Replication](../../../HTTP/Replications/index.html)).

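For scripted monitoring, the applier state of a specific database is available via the HTTP
endpoint */_api/replication/applier-state*, scoped to a database with the usual */_db/&lt;name&gt;*
URL prefix. A minimal sketch of building such a request URL (host and database name are the
example values from this page):

```js
// Build the URL for the applier-state endpoint of a given database.
// /_api/replication/applier-state is part of the ArangoDB HTTP replication
// API; host and database here are example values.
function applierStateUrl(host, database) {
  return "http://" + host + "/_db/" + encodeURIComponent(database) +
         "/_api/replication/applier-state";
}

const url = applierStateUrl("slave.domain.org:8530", "_system");
// the URL can then be fetched with any HTTP client, using the slave's credentials
```
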