mirror of https://gitee.com/bigwinds/arangodb
297 lines
14 KiB
Markdown
297 lines
14 KiB
Markdown
_Replication applier_
|
|
=====================
|
|
|
|
Replication configuration
|
|
-------------------------
|
|
|
|
The replication is turned off by default. In order to create a master-slave setup,
|
|
the so-called _replication applier_ needs to be enabled on the _slave_ databases.
|
|
|
|
Replication is configured on a per-database level, or, starting from 3.3.0 at server level.
|
|
|
|
The _replication applier_ on the _slave_ can be used to perform a one-time synchronization
|
|
with the _master_ (and then stop), or to perform an ongoing replication of changes. To
|
|
resume replication on _slave_ restart, the *autoStart* attribute of the replication
|
|
applier must be set to *true*.
|
|
|
|
_setupReplication_ Command
|
|
--------------------------
|
|
|
|
To copy the initial data from the _master_ to the _slave_ and start the
|
|
continuous replication, there is an all-in-one command *setupReplication*.
|
|
|
|
From _ArangoSH_:
|
|
|
|
```js
|
|
require("@arangodb/replication").setupReplication(configuration);
|
|
```
|
|
|
|
The following example demonstrates how to use the command for setting up replication
|
|
for the *_system* database. Note that it should be run on the _slave_ and not the _master_:
|
|
|
|
```js
|
|
db._useDatabase("_system");
|
|
require("@arangodb/replication").setupReplication({
|
|
endpoint: "tcp://master.domain.org:8529",
|
|
username: "myuser",
|
|
password: "mypasswd",
|
|
verbose: false,
|
|
includeSystem: false,
|
|
incremental: true,
|
|
autoResync: true
|
|
});
|
|
```
|
|
|
|
The command will return when the initial synchronization is finished and the continuous replication
|
|
is started, or in case the initial synchronization has failed.
|
|
|
|
If the initial synchronization is successful, the command will store the given configuration on
|
|
the _slave_. It also configures the continuous replication to start automatically if the slave is
|
|
restarted, i.e. *autoStart* is set to *true*.
|
|
|
|
If the command is run while the slave's replication applier is already running, it will first
|
|
stop the running applier, drop its configuration and do a resynchronization of data with the
|
|
_master_. It will then use the provided configuration, overwriting any previously existing replication
|
|
configuration on the slave.
|
|
|
|
### Starting and Stopping the _replication applier_
|
|
|
|
To manually start and stop the _replication applier_ in the current database, the *start* and *stop* commands
|
|
can be used like this:
|
|
|
|
```js
|
|
require("@arangodb/replication").applier.start(<tick>);
|
|
require("@arangodb/replication").applier.stop();
|
|
```
|
|
|
|
**Note**: Starting a _replication applier_ without setting up an initial configuration will
|
|
fail. The replication applier will look for its configuration in a file named
|
|
*REPLICATION-APPLIER-CONFIG* in the current database's directory. If the file is not present,
|
|
ArangoDB will use some default configuration, but it cannot guess the endpoint (the address
|
|
of the master database) the applier should connect to. Thus starting the applier without
|
|
configuration will fail.
|
|
|
|
Note that at the first time you start the applier, you should pass the value returned in the
|
|
*lastLogTick* attribute of the initial sync operation.
|
|
|
|
**Note**: Starting a database's replication applier via the *start* command will not necessarily
|
|
start the applier on the next and following ArangoDB server restarts. Additionally, stopping a
|
|
database's replication applier manually will not necessarily prevent the applier from being
|
|
started again on the next server start. All of this is configurable separately (hang on reading).
|
|
|
|
**Note**: when stopping and restarting the replication applier of database, it will resume where
|
|
it last stopped. This is sensible because replication log events should be applied incrementally.
|
|
If the replication applier of a database has never been started before, it needs some *tick* value
|
|
from the master's log from which to start fetching events.
|
|
|
|
There is one caveat to consider when stopping a replication on the slave: if there are still
|
|
ongoing replicated transactions that are neither committed or aborted, stopping the replication
|
|
applier will cause these operations to be lost for the slave. If these transactions commit on the
|
|
master later and the replication is resumed, the slave will not be able to commit these transactions,
|
|
too. Thus stopping the replication applier on the slave manually should only be done if there
|
|
is certainty that there are no ongoing transactions on the master.
|
|
|
|
|
|
### _Replication applier_ Configuration
|
|
|
|
To configure the _replication applier_ of a specific database, use the *properties* command. Using
|
|
it without any arguments will return the applier's current configuration:
|
|
|
|
```js
|
|
require("@arangodb/replication").applier.properties();
|
|
```
|
|
|
|
The result might look like this:
|
|
|
|
```js
|
|
{
|
|
"requestTimeout" : 600,
|
|
"connectTimeout" : 10,
|
|
"ignoreErrors" : 0,
|
|
"maxConnectRetries" : 10,
|
|
"chunkSize" : 0,
|
|
"autoStart" : false,
|
|
"adaptivePolling" : true,
|
|
"includeSystem" : true,
|
|
"requireFromPresent" : false,
|
|
"autoResync" : false,
|
|
"autoResyncRetries" : 2,
|
|
"verbose" : false
|
|
}
|
|
```
|
|
|
|
**Note**: There is no *endpoint* attribute configured yet. The *endpoint* attribute is required
|
|
for the replication applier to be startable. You may also want to configure a username and password
|
|
for the connection via the *username* and *password* attributes.
|
|
|
|
```js
|
|
require("@arangodb/replication").applier.properties({
|
|
endpoint: "tcp://master.domain.org:8529",
|
|
username: "root",
|
|
password: "secret",
|
|
verbose: false
|
|
});
|
|
```
|
|
|
|
This will re-configure the replication applier for the current database. The configuration will be
|
|
used from the next start of the replication applier. The replication applier cannot be re-configured
|
|
while it is running. It must be stopped first to be re-configured.
|
|
|
|
To make the replication applier of the current database start automatically when the ArangoDB server
|
|
starts, use the *autoStart* attribute.
|
|
|
|
Setting the *adaptivePolling* attribute to *true* will make the replication applier poll the
|
|
master database for changes with a variable frequency. The replication applier will then lower the
|
|
frequency when the master is idle, and increase it when the master can provide new events).
|
|
Otherwise the replication applier will poll the master database for changes with a constant frequency.
|
|
|
|
The *idleMinWaitTime* attribute controls the minimum wait time (in seconds) that the replication applier
|
|
will intentionally idle before fetching more log data from the master in case the master has already
|
|
sent all its log data. This wait time can be used to control the frequency with which the replication
|
|
applier sends HTTP log fetch requests to the master in case there is no write activity on the master.
|
|
|
|
The *idleMaxWaitTime* attribute controls the maximum wait time (in seconds) that the replication
|
|
applier will intentionally idle before fetching more log data from the master in case the master has
|
|
already sent all its log data and there have been previous log fetch attempts that resulted in no more
|
|
log data. This wait time can be used to control the maximum frequency with which the replication
|
|
applier sends HTTP log fetch requests to the master in case there is no write activity on the master
|
|
for longer periods. Note that this configuration value will only be used if the option *adaptivePolling*
|
|
is set to *true*.
|
|
|
|
To set a timeout for connection and following request attempts, use the *connectTimeout* and
|
|
*requestTimeout* values. The *maxConnectRetries* attribute configures after how many failed
|
|
connection attempts in a row the replication applier will give up and turn itself off.
|
|
You may want to set this to a high value so that temporary network outages do not lead to the
|
|
replication applier stopping itself.
|
|
The *connectRetryWaitTime* attribute configures how long the replication applier will wait
|
|
before retrying the connection to the master in case of connection problems.
|
|
|
|
The *chunkSize* attribute can be used to control the approximate maximum size of a master's
|
|
response (in bytes). Setting it to a low value may make the master respond faster (less data is
|
|
assembled before the master sends the response), but may require more request-response roundtrips.
|
|
Set it to *0* to use ArangoDB's built-in default value.
|
|
|
|
The *includeSystem* attribute controls whether changes to system collections (such as *_graphs* or
|
|
*_users*) should be applied. If set to *true*, changes in these collections will be replicated,
|
|
otherwise, they will not be replicated. It is often not necessary to replicate data from system
|
|
collections, especially because it may lead to confusion on the slave because the slave needs to
|
|
have its own system collections in order to start and keep operational.
|
|
|
|
The *requireFromPresent* attribute controls whether the applier will start synchronizing in case
|
|
it detects that the master cannot provide data for the initial tick value provided by the slave.
|
|
This may be the case if the master does not have a big enough backlog of historic WAL logfiles,
|
|
and when the replication is re-started after a longer pause. When *requireFromPresent* is set to
|
|
*true*, then the replication applier will check at start whether the start tick from which it starts
|
|
or resumes replication is still present on the master. If not, then there would be data loss. If
|
|
*requireFromPresent* is *true*, the replication applier will abort with an appropriate error message.
|
|
If set to *false*, then the replication applier will still start, and ignore the data loss.
|
|
|
|
The *autoResync* option can be used in conjunction with the *requireFromPresent* option as follows:
|
|
when both *requireFromPresent* and *autoResync* are set to *true* and the master cannot provide the
|
|
log data the slave requests, the replication applier will stop as usual. But due to the fact
|
|
that *autoResync* is set to true, the slave will automatically trigger a full resync of all data with
|
|
the master. After that, the replication applier will go into continuous replication mode again.
|
|
Additionally, setting *autoResync* to *true* will trigger a full re-synchronization of data when
|
|
the continuous replication is started and detects that there is no start tick value.
|
|
|
|
Automatic re-synchronization may transfer a lot of data from the master to the slave and can be
|
|
expensive. It is therefore turned off by default. When turned off, the slave will never perform an
|
|
automatic re-synchronization with the master.
|
|
|
|
The *autoResyncRetries* option can be used to control the number of resynchronization retries that
|
|
will be performed in a row when automatic resynchronization is enabled and kicks in. Setting this to
|
|
*0* will effectively disable *autoResync*. Setting it to some other value will limit the number of retries
|
|
that are performed. This helps preventing endless retries in case resynchronizations always fail.
|
|
|
|
The *verbose* attribute controls the verbosity of the replication logger. Setting it to `true` will
|
|
make the replication applier write a line to the log for every operation it performs. This should
|
|
only be used for diagnosing replication problems.
|
|
|
|
The following example will set most of the discussed properties for the current database's applier:
|
|
|
|
```js
|
|
require("@arangodb/replication").applier.properties({
|
|
endpoint: "tcp://master.domain.org:8529",
|
|
username: "root",
|
|
password: "secret",
|
|
adaptivePolling: true,
|
|
connectTimeout: 15,
|
|
maxConnectRetries: 100,
|
|
chunkSize: 262144,
|
|
autoStart: true,
|
|
includeSystem: true,
|
|
autoResync: true,
|
|
autoResyncRetries: 2,
|
|
});
|
|
```
|
|
|
|
After the applier is now fully configured, it could theoretically be started. However, we
|
|
may first need an initial synchronization of all collections and their data from the master before
|
|
we start the replication applier.
|
|
|
|
The only safe method for doing a full synchronization (or re-synchronization) is thus to
|
|
|
|
* stop the replication applier on the slave (if currently running)
|
|
* perform an initial full sync with the master database
|
|
* note the master database's *lastLogTick* value and
|
|
* start the continuous replication applier on the slave using this tick value.
|
|
|
|
The initial synchronization for the current database is executed with the *sync* command:
|
|
|
|
```js
|
|
require("@arangodb/replication").sync({
|
|
endpoint: "tcp://master.domain.org:8529",
|
|
username: "root",
|
|
password: "secret,
|
|
includeSystem: true
|
|
});
|
|
```
|
|
|
|
The *includeSystem* option controls whether data from system collections (such as *_graphs* and
|
|
*_users*) shall be synchronized.
|
|
|
|
The initial synchronization can optionally be configured to include or exclude specific
|
|
collections using the *restrictType* and *restrictCollection* parameters.
|
|
|
|
The following command only synchronizes collection *foo* and *bar*:
|
|
|
|
```js
|
|
require("@arangodb/replication").sync({
|
|
endpoint: "tcp://master.domain.org:8529",
|
|
username: "root",
|
|
password: "secret,
|
|
restrictType: "include",
|
|
restrictCollections: [ "foo", "bar" ]
|
|
});
|
|
```
|
|
|
|
Using a *restrictType* of *exclude*, all collections but the specified will be synchronized.
|
|
|
|
**Warning**: *sync* will do a full synchronization of the collections in the current database with
|
|
collections present in the master database.
|
|
Any local instances of the collections and all their data are removed! Only execute this
|
|
command if you are sure you want to remove the local data!
|
|
|
|
As *sync* does a full synchronization, it might take a while to execute.
|
|
When *sync* completes successfully, it returns an array of collections it has synchronized in its
|
|
*collections* attribute. It will also return the master database's last log tick value
|
|
at the time the *sync* was started on the master. The tick value is contained in the *lastLogTick*
|
|
attribute of the *sync* command:
|
|
|
|
```js
|
|
{
|
|
"lastLogTick" : "231848833079705",
|
|
"collections" : [ ... ]
|
|
}
|
|
```
|
|
Now you can start the continuous synchronization for the current database on the slave
|
|
with the command
|
|
|
|
```js
|
|
require("@arangodb/replication").applier.start("231848833079705");
|
|
```
|
|
|
|
**Note**: The tick values should be treated as strings. Using numeric data types for tick
|
|
values is unsafe because they might exceed the 32 bit value and the IEEE754 double accuracy
|
|
ranges.
|