Master/Slave Architecture
=========================

Introduction
------------

In a _Master/Slave_ setup, one or more ArangoDB _slaves_ asynchronously replicate from a _master_.

The _master_ is the ArangoDB instance to which all data-modification operations should be directed.

The _slave_ is the ArangoDB instance that replicates the data from the master.

Components
----------

### Replication Logger

**Purpose**

The _replication logger_ writes all data-modification operations into the _write-ahead log_. Clients can then read this log to replay any data modification on a different server.

**Checking the state**

To query the current state of the _logger_, use the *state* command:

    require("@arangodb/replication").logger.state();

The result might look like this:

```js
{
  "state" : {
    "running" : true,
    "lastLogTick" : "2064735086",
    "lastUncommittedLogTick" : "2064735086",
    "totalEvents" : 2064735086,
    "time" : "2019-03-01T11:38:39Z"
  },
  "server" : {
    "version" : "3.4.4",
    "serverId" : "135694526467864",
    "engine" : "rocksdb"
  },
  "clients" : [
    {
      "serverId" : "46402312160836",
      "time" : "2019-03-01T11:38:39Z",
      "expires" : "2019-03-01T13:38:39Z",
      "lastServedTick" : "2064459411"
    },
    {
      "serverId" : "260321896124903",
      "time" : "2019-03-01T11:29:45Z",
      "expires" : "2019-03-01T13:29:45Z",
      "lastServedTick" : "2002717896"
    }
  ]
}
```

The *running* attribute will always be true. In earlier versions of ArangoDB, replication was optional and this attribute could have been *false*.

The *totalEvents* attribute indicates how many log events have been logged since the start of the ArangoDB server.

The *lastLogTick* value indicates the _id_ of the last committed operation that was written to the server's _write-ahead log_. It can be used to determine whether new operations were logged, and the _replication applier_ also uses it for incremental fetching of data.

The *lastUncommittedLogTick* value contains the _id_ of the last uncommitted operation that was written to the server's WAL.
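Because *lastLogTick* can be used to determine whether new operations were logged, a client could compare two snapshots of the logger state taken some time apart. The following is a minimal sketch, not part of the `@arangodb/replication` module: the helper name `hasNewOperations` is hypothetical, and it assumes a JavaScript runtime with `BigInt` support (for example a recent Node.js; older arangosh versions may lack it). Tick values appear as strings in the output above, so the sketch compares them as `BigInt` values rather than as numbers to avoid precision loss for large tick values.

```js
// Hypothetical helper (not part of @arangodb/replication): compares the
// lastLogTick of two logger state snapshots, e.g. two results of
// require("@arangodb/replication").logger.state() taken some time apart.
// Assumes a runtime with BigInt support.
function hasNewOperations(previousState, currentState) {
  // Ticks are reported as strings; BigInt keeps full integer precision.
  const before = BigInt(previousState.state.lastLogTick);
  const after = BigInt(currentState.state.lastLogTick);
  return after > before;
}

// Usage with two hand-written snapshots (only the field the helper reads).
const earlier = { state: { lastLogTick: "2064459411" } };
const later = { state: { lastLogTick: "2064735086" } };
console.log(hasNewOperations(earlier, later)); // true
console.log(hasNewOperations(later, later)); // false
```

Comparing the strings as plain JavaScript numbers would work for the small sample values, but could silently round values above `Number.MAX_SAFE_INTEGER`.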
For the RocksDB storage engine, *lastLogTick* and *lastUncommittedLogTick* are identical, as the WAL only contains committed operations.

The *clients* attribute reveals which clients (slaves) have connected to the master recently, and up to which tick value they have caught up with the replication.

**Note**: The replication logger state can also be queried via the [HTTP API](../../../../HTTP/Replications/index.html).

To query which data ranges are still available for replication clients to fetch, the logger provides the *firstTick* and *tickRanges* functions:

    require("@arangodb/replication").logger.firstTick();

This returns the minimum tick value that the server can provide to replication clients via its replication APIs. The *tickRanges* function returns the minimum and maximum tick values per logfile:

    require("@arangodb/replication").logger.tickRanges();

### Replication Applier

**Purpose**

The purpose of the _replication applier_ is to read data from a master database's event log and apply the operations locally. The _applier_ periodically checks the master database for new operations. It performs an incremental synchronization, i.e. it only asks the master for operations that occurred after the last synchronization.

The _replication applier_ does not get notified by the master database when "new" operations are available, but instead uses the pull principle. It might thus take some time (the so-called *replication lag*) before an operation from the master database gets shipped to, and applied in, a slave database.

The _replication applier_ of a database runs in a separate thread. It may encounter problems when an operation from the master cannot be applied safely, or when the connection to the master database goes down (network outage, master database is down or unavailable, etc.). In this case, the database's _replication applier_ thread might terminate itself.
It is then up to the administrator to fix the problem and restart the database's _replication applier_.

If the _replication applier_ cannot connect to the master database, or the communication fails at some point during the synchronization, the _replication applier_ will try to reconnect to the master database. It gives up reconnecting only after a configurable number of connection attempts.

The _replication applier_ state can be queried at any time by using the *state* command of the _applier_. This returns the state of the _applier_ of the current database:

```js
require("@arangodb/replication").applier.state();
```

The result might look like this:

```js
{
  "state" : {
    "started" : "2019-03-01T11:36:33Z",
    "running" : true,
    "phase" : "running",
    "lastAppliedContinuousTick" : "2050724544",
    "lastProcessedContinuousTick" : "2050724544",
    "lastAvailableContinuousTick" : "2050724546",
    "safeResumeTick" : "2050694546",
    "ticksBehind" : 2,
    "progress" : {
      "time" : "2019-03-01T11:36:33Z",
      "message" : "fetching master log from tick 2050694546, last scanned tick 2050664547, first regular tick 2050544543, barrier: 0, open transactions: 1, chunk size 6291456",
      "failedConnects" : 0
    },
    "totalRequests" : 2,
    "totalFailedConnects" : 0,
    "totalEvents" : 50010,
    "totalDocuments" : 50000,
    "totalRemovals" : 0,
    "totalResyncs" : 0,
    "totalOperationsExcluded" : 0,
    "totalApplyTime" : 1.1071290969848633,
    "averageApplyTime" : 1.1071290969848633,
    "totalFetchTime" : 0.2129514217376709,
    "averageFetchTime" : 0.10647571086883545,
    "lastError" : {
      "errorNum" : 0
    },
    "time" : "2019-03-01T11:36:34Z"
  },
  "server" : {
    "version" : "3.4.4",
    "serverId" : "46402312160836"
  },
  "endpoint" : "tcp://master.example.org",
  "database" : "test"
}
```

The *running* attribute indicates whether the _replication applier_ of the current database is currently running and polling the master at *endpoint* for new events.

The *started* attribute shows at what date and time the applier was started (if at all).
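The tick values in the applier state give a rough, tick-based view of the replication lag: for the sample output above, *lastAvailableContinuousTick* minus *lastProcessedContinuousTick* is 2, the same value as *ticksBehind*. The sketch below is a hypothetical helper, not an ArangoDB API, and does not claim to reproduce how the server computes *ticksBehind*. A tick difference is only a relative indicator: it is not necessarily the number of pending operations, and it says nothing about the lag in seconds. The sketch assumes a JavaScript runtime with `BigInt` support.

```js
// Hypothetical helper (not an ArangoDB API): estimates how many ticks the
// applier is behind, using the tick values reported by
// require("@arangodb/replication").applier.state().
// Assumes a runtime with BigInt support.
function ticksBehindMaster(applierState) {
  const s = applierState.state;
  // Tick values are strings; BigInt avoids precision loss for large ticks.
  const available = BigInt(s.lastAvailableContinuousTick);
  const processed = BigInt(s.lastProcessedContinuousTick);
  // Never report a negative lag.
  return available > processed ? available - processed : 0n;
}

// Usage with the relevant fields of the sample output above.
const sample = {
  state: {
    lastProcessedContinuousTick: "2050724544",
    lastAvailableContinuousTick: "2050724546",
    ticksBehind: 2,
  },
};
console.log(ticksBehindMaster(sample) === BigInt(sample.state.ticksBehind)); // true
```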
The *progress.failedConnects* attribute shows how many failed connection attempts in a row the _replication applier_ has encountered so far. In contrast, the *totalFailedConnects* attribute indicates how many failed connection attempts the _applier_ has made in total.

The *totalRequests* attribute shows how many requests the _applier_ has sent to the master database in total.

The *totalEvents* attribute shows how many log events the _applier_ has read from the master. The *totalDocuments* and *totalRemovals* attributes indicate how many document operations the slave has applied locally.

The attributes *totalApplyTime* and *totalFetchTime* show the total time the applier spent applying data batches locally, and the total time the applier waited on data-fetching requests to the master, respectively. The *averageApplyTime* and *averageFetchTime* attributes show the average times clocked for these operations. Note that the average times are greatly influenced by the chunk size used in the applier configuration (bigger chunk sizes mean fewer requests to the master, but the batches include more data and take more time to create and apply).

The *progress.message* sub-attribute provides a brief hint of what the _applier_ is currently doing (if it is running). The *lastError* attribute also has an optional *errorMessage* sub-attribute, showing the latest error message. The *errorNum* sub-attribute of the *lastError* attribute can be used by clients to programmatically check for errors. It should be *0* if there is no error, and it should be non-zero if the _applier_ terminated itself due to a problem.
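Since *lastError.errorNum* is meant for programmatic error checks, a monitoring script could combine it with *running* and *progress.failedConnects*. The function below is a minimal sketch under these assumptions: its name `checkApplier`, the shape of the returned object, and the choice to treat a stopped applier without a recorded error as unhealthy are illustrative decisions, not ArangoDB behavior. It only reads attributes described in this section.

```js
// Hypothetical monitoring helper (not an ArangoDB API). It inspects the
// object returned by require("@arangodb/replication").applier.state().
function checkApplier(applierState) {
  const s = applierState.state;
  const lastError = s.lastError || { errorNum: 0 };
  const failedConnects = s.progress ? s.progress.failedConnects : 0;

  // A non-zero errorNum means the applier ran into a problem.
  if (lastError.errorNum !== 0) {
    return {
      healthy: false,
      reason: "error " + lastError.errorNum + ": " +
        (lastError.errorMessage || "(no error message)"),
      failedConnects: failedConnects,
    };
  }
  // Assumption: an applier that is not running needs attention even
  // without a recorded error (e.g. it was stopped manually).
  if (!s.running) {
    return { healthy: false, reason: "applier is not running", failedConnects: failedConnects };
  }
  return {
    healthy: true,
    reason: s.progress ? s.progress.message : "",
    failedConnects: failedConnects,
  };
}

// Usage with hand-written, abbreviated states (only the fields read above).
const ok = {
  state: { running: true, progress: { message: "fetching master log", failedConnects: 0 }, lastError: { errorNum: 0 } },
};
const failed = {
  state: {
    running: false,
    progress: { message: "applier shut down", failedConnects: 6 },
    lastError: { errorNum: 1400, errorMessage: "could not connect to master" },
  },
};
console.log(checkApplier(ok).healthy); // true
console.log(checkApplier(failed).reason); // error 1400: could not connect to master
```

A script like this could run periodically and alert an administrator, who then fixes the underlying problem and restarts the applier.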
Below is an example of the state after the _replication applier_ terminated itself due to (repeated) connection problems:

```js
{
  "state" : {
    "started" : "2019-03-01T11:51:18Z",
    "running" : false,
    "phase" : "inactive",
    "lastAppliedContinuousTick" : "2101606350",
    "lastProcessedContinuousTick" : "2101606370",
    "lastAvailableContinuousTick" : "2101606370",
    "safeResumeTick" : "2101606350",
    "progress" : {
      "time" : "2019-03-01T11:52:45Z",
      "message" : "applier shut down",
      "failedConnects" : 6
    },
    "totalRequests" : 19,
    "totalFailedConnects" : 6,
    "totalEvents" : 0,
    "totalDocuments" : 0,
    "totalRemovals" : 0,
    "totalResyncs" : 0,
    "totalOperationsExcluded" : 0,
    "totalApplyTime" : 0,
    "averageApplyTime" : 0,
    "totalFetchTime" : 0.03386974334716797,
    "averageFetchTime" : 0.0028224786122639975,
    "lastError" : {
      "errorNum" : 1400,
      "time" : "2019-03-01T11:52:45Z",
      "errorMessage" : "could not connect to master at tcp://127.0.0.1:8529 for URL /_api/wal/tail?chunkSize=6291456&barrier=0&from=2101606369&lastScanned=2101606370&serverId=46402312160836&includeSystem=true&includeFoxxQueues=false: Could not connect to 'http+tcp://127.0.0.1:852..."
    },
    "time" : "2019-03-01T11:52:56Z"
  },
  "server" : {
    "version" : "3.4.4",
    "serverId" : "46402312160836"
  },
  "endpoint" : "tcp://master.example.org",
  "database" : "test"
}
```

**Note**: The state of a database's replication applier can be queried via the HTTP API, too. Please refer to [HTTP Interface for Replication](../../../../HTTP/Replications/index.html) for more details.