1
0
Fork 0

updated documentation

This commit is contained in:
Jan Steemann 2013-10-01 17:36:09 +02:00
parent ed64939dfc
commit 46d49e55b2
12 changed files with 333 additions and 168 deletions

View File

@ -94,3 +94,22 @@ Managing Databases using HTTP {#HttpDatabaseHttp}
@anchor HttpDatabaseDelete
@copydetails JSF_delete_api_database
@CLEARPAGE
Notes on Databases {#HttpDatabaseGlossary}
==========================================
Please keep in mind that each database contains its own system collections,
which need to set up when a database is created. This will make the creation
of a database take a while. Replication is configured on a per-database level,
meaning that any replication logging or applying for the a new database must
be configured explicitly after a new database has been created. Foxx applications
are also available only in the context of the database they have been installed
in. A new database will only provide access to the system applications shipped
with ArangoDB (that is the web interface at the moment) and no other Foxx
applications until they are explicitly installed for the particular database.
@copydoc GlossaryDatabase
@copydoc GlossaryDatabaseName
@copydoc GlossaryDatabaseOrganisation

View File

@ -10,3 +10,4 @@ TOC {#HttpDatabaseTOC}
- @ref HttpDatabaseList "GET /_api/database"
- @ref HttpDatabaseCreate "POST /_api/database"
- @ref HttpDatabaseDelete "DELETE /_api/database/database-name"
- @ref HttpDatabaseGlossary

View File

@ -18,12 +18,17 @@ The HTTP replication interface serves four main purposes:
- fetch the changelog from a server (used for incremental synchronisation of changes)
- administer the replication applier (starting, stopping, configuring, querying state)
Please note that all replication operations work on a database level. If an
ArangoDB server contains more than one database, the replication system must be
configured individually per database, and replicating the data of multiple
databases will require multiple operations.
Replication Dump Commands {#HttpReplicationDumpCommands}
--------------------------------------------------------
The `inventory` method can be used to query an ArangoDB server's current
The `inventory` method can be used to query an ArangoDB database's current
set of collections plus their indexes. Clients can use this method to get an
overview of which collections are present on the server. They can use this information
overview of which collections are present in the database. They can use this information
to either start a full or a partial synchronisation of data, e.g. to initiate a backup
or the incremental data synchronisation.
@ -42,16 +47,17 @@ parts of the dump results in the same order as they are served to them.
@anchor HttpReplicationDump
@copydetails triagens::arango::RestReplicationHandler::handleCommandDump
The `sync` method can be used by replication clients to connect an ArangoDB server
The `sync` method can be used by replication clients to connect an ArangoDB database
to a remote endpoint, fetch the remote list of collections and indexes, and collection
data.
It will thus create a local backup of the state of data at the remote ArangoDB server.
It will thus create a local backup of the state of data at the remote ArangoDB database.
`sync` works on a database level.
`sync` will first fetch the list of collections and indexes from the remote endpoint.
It does so by calling the `inventory` API of the remote server. It will then purge
data on the local ArangoDB instance, and after start will transfer collection data
from the remote server to the local ArangoDB instance. It will extract data from the
remote server by calling the remote's `dump` API until all data are fetched.
It does so by calling the `inventory` API of the remote database. It will then purge
data in the local ArangoDB database, and after start will transfer collection data
from the remote database to the local ArangoDB database. It will extract data from the
remote database by calling the remote database's `dump` API until all data are fetched.
As mentioned, `sync` will remove data from the local instance, and thus must be handled
with caution.
@ -63,8 +69,8 @@ with caution.
Replication Logger Commands {#HttpReplicationLoggerCommands}
------------------------------------------------------------
The logger commands allow starting, starting, and fetching the current state of
the replication logger.
The logger commands allow starting, starting, and fetching the current state of a
database's replication logger.
@anchor HttpReplicationLoggerGetConfig
@copydetails triagens::arango::RestReplicationHandler::handleCommandLoggerGetConfig
@ -85,11 +91,11 @@ the replication logger.
@anchor HttpReplicationLoggerState
@copydetails triagens::arango::RestReplicationHandler::handleCommandLoggerState
To query the latest changes logged by the replication logger, the Http interface
To query the latest changes logged by the replication logger, the HTTP interface
also provides the `logger-follow`.
This method should be used by replication clients to incrementally fetch updates
from an ArangoDB instance.
from an ArangoDB database.
@anchor HttpReplicationLoggerFollow
@copydetails triagens::arango::RestReplicationHandler::handleCommandLoggerFollow
@ -98,7 +104,7 @@ Replication Applier Commands {#HttpReplicationApplierCommands}
--------------------------------------------------------------
The applier commands allow to remotely start, stop, and query the state and
configuration of an ArangoDB server's replication applier.
configuration of an ArangoDB database's replication applier.
@anchor HttpReplicationApplierGetConfig
@copydetails triagens::arango::RestReplicationHandler::handleCommandApplierGetConfig

View File

@ -4,6 +4,11 @@ Database {#GlossaryDatabase}
@GE{Database}: ArangoDB can handle multiple databases in the same server
instance. Databases can be used to logically group and separate data. An ArangoDB
database consists of collections and dedicated database-specific worker processes.
A database contains its own collections (which cannot be accessed from other databases),
Foxx applications, and replication loggers and appliers. Each ArangoDB database
contains its own system collections (e.g. `_users`, `_replication`, ...).
There will always be at least one database in ArangoDB. This is the default
database, named `_system`. This database cannot be dropped, and provides special
operations for creating, dropping, and enumerating databases.
@ -11,5 +16,6 @@ Users can create additional databases and give them unique names to access them
Database management operations cannot be initiated from out of user-defined databases.
When ArangoDB is accessed via its HTTP REST API, the database name is read from the
request URI. If the request URI does not contain a database name, the database name
is automatically determined by some algorithm.
first part of the request URI path (e.g. `/_db/_system/...`). If the request URI does
not contain a database name, the database name is automatically determined by the
algorithm described in @ref HttpDatabaseMapping.

View File

@ -15,8 +15,29 @@ for details.
Foxx {#NewFeatures14Foxx}
-------------------------
### Foxx Improvements
TODO
### Standard Authentication
TODO
### Foxx Manager
ArangoDB builds now ship with an additional binary `foxx-manager` that can be used
to conveniently manage Foxx application in an ArangoDB instance. To make it easy to
start with Foxx, the `foxx-manager` provides commands to download example Foxx
applications from a central repository at `https://github.com/arangodb/foxx-apps`,
and install them locally:
> foxx-manager update
> foxx-manager install hello-foxx /hello-foxx
This will first mirror the central list of available Foxx applications locally, and
after that make the `hello-foxx` application available in the local ArangoDB database
under the URL prefix `/hello-foxx`.
Multiple Databases {#NewFeatures14MultipleDatabases}
----------------------------------------------------
@ -98,10 +119,6 @@ binary:
unix> foxx-manager --server.database mydb update
unix> foxx-manager --server.database mydb install hello-foxx /hello-foxx
@copydoc GlossaryDatabaseName
@copydoc GlossaryDatabaseOrganisation
Runtime Endpoint Management {#NewFeatures14Endpoints}
-----------------------------------------------------

View File

@ -114,6 +114,32 @@ There is also the option of using so-called *system applications*. These are not
database-specific but shared between all databases. ArangoDB is shipped with one
system application named *aardvark*, which is ArangoDB's web-based admin interface.
Addressing specific Databases in Requests {#Upgrading14Databases}
=================================================================
In pre-1.4, an ArangoDB instance only managed a single database, so there was no need
to specify a database name in any of the incoming HTTP requests. In ArangoDB 1.4, the
server can manage multiple datbases, and it may be necessary in a request to specify
which database the request should be executed in.
A specific ArangoDB database can be explicitly addressed by putting the database name
into the URL. If the first part of the URL path is `/_db/...`, ArangoDB will interpret
the `...` as the database name, and strip off the database name from the URL path for
further processing. This allows any actions that use URL paths to remain fully functional.
If no database name is specified in the URL path, ArangoDB will use the algorithm
described in @ref HttpDatabaseMapping to determine the database context for the request.
If no extra endpoints are used, the algorithm will default to the `_system` database.
A just upgraded ArangoDB instance will have all its collections and applications be
mapped to the `_system` database too, meaning an upgraded ArangoDB instance should remain
fully functional.
ArangoDB clients and drivers are not forced to supplydatabase names as part of the ArangoDB
URLs they call because of this compatibility functionaltiy. However, if clients need to
access a specific database in a multiple database context, they will be required to supply
the database name as part of the URLs. Most clients will use just one database most of the
time, making it sufficient to set the database name when the connection to the server is
established and then prefixing all requests with the database name initially set.
Troubleshooting {#Upgrading14Troubleshooting}
=============================================

View File

@ -2,5 +2,6 @@ TOC {#Upgrading14TOC}
=====================
- @ref Upgrading14Introduction
- @ref Upgrading14Databases
- @ref Upgrading14Troubleshooting
- @ref Upgrading14RemovedFeatures

View File

@ -4,11 +4,14 @@ Replication Events{#RefManualReplication}
@NAVIGATE_RefManualReplication
@EMBEDTOC{RefManualReplicationTOC}
The replication logger in ArangoDB will log all events into the `_replication`
system collection. It will only log events when the logger is enabled.
A database's replication logger in ArangoDB will log all its events into the
`_replication` system collection in the same database. The `_replication`
collection will therefore only contain events of a single database.
Events will only be logged during the time the database's replication logger
is enabled.
The following sections describe in detail the structure of the log events
returned by this API.
returned by the replication APIs.
Replication Event Types{#RefManualReplicationEventTypes}
========================================================

View File

@ -335,8 +335,39 @@ by the `BaseMiddleware` provided by Foxx.
The Request Object
------------------
Every request object has the `path` method from the underlying Actions.
This is the complete path as supplied by the user as a String.
The `request` object inherits several attributes from the underlying Actions:
* `user`: the name of the current ArangoDB user. This will be populated only
if authentication is turned on, and will be `null` otherwise.
* `database`: the name of the current database (e.g. `_system`)
* `protocol`: `http` or `https`
* `server`: a JSON object with sub-attributes `address` (containing server
host name or IP address) and `port` (server port).
* `path`: request URI path, with potential database name stripped off.
* `url`: request URI path + query string, with potential database name
stripped off
* `headers`: a JSON object with the request headers as key/value pairs
* `cookies`: a JSON object with the request cookies as key/value pairs
* `requestType`: the request method (e.g. "GET", "POST", "PUT", ...)
* `requestBody`: the complete body of the request as a string
* `parameters`: a JSON object with all parameters set in the URL as key/value
pairs
* `urlParameters`: a JSON object with all named parameters defined for the
route as key/value pairs.
In addition to these attributes, a Foxx request objects provides the following
convenience methods:
### Body
@ -357,7 +388,7 @@ The Response Object
Every response object has the body attribute from the underlying Actions
to set the raw body by hand.
You provide your response body as a String here.
You provide your response body as a string here.
### Status
@ -749,7 +780,7 @@ the URL or the accept header. Say you request an URL like `/people.json`:
The `FormatMiddleware` will set the format of the request to JSON and then
delete the `.json` from the request. You can therefore write handlers that
do not take an `extension` into consideration and instead handle the
format via a simple String. To determine the format of the request it
format via a simple string. To determine the format of the request it
checks the URL and then the `accept` header. If one of them gives a format
or both give the same, the format is set. If the formats are not the same,
an error is raised.

View File

@ -36,10 +36,6 @@ supported way in ArangoDB is to use one database at a time for a command,
an action, a script or a query. Operations started in one database must
not switch the database later and continue operating in another.
@copydoc GlossaryDatabase
@copydoc GlossaryDatabaseName
@CLEARPAGE
Working with Databases {#HandlingDatabasesShell}
================================================
@ -82,3 +78,23 @@ database only.
@anchor HandlingDatabasesDrop
@copydetails JS_DropDatabase
@CLEARPAGE
Notes about Databases {#HandlingDatabasesGlossary}
--------------------------------------------------
Please keep in mind that each database contains its own system collections,
which need to set up when a database is created. This will make the creation
of a database take a while. Replication is configured on a per-database level,
meaning that any replication logging or applying for the a new database must
be configured explicitly after a new database has been created. Foxx applications
are also available only in the context of the database they have been installed
in. A new database will only provide access to the system applications shipped
with ArangoDB (that is the web interface at the moment) and no other Foxx
applications until they are explicitly installed for the particular database.
@copydoc GlossaryDatabase
@copydoc GlossaryDatabaseName
@copydoc GlossaryDatabaseOrganisation

View File

@ -13,3 +13,4 @@ TOC {#HandlingDatabasesTOC}
- @ref HandlingDatabasesList "db._listDatabases"
- @ref HandlingDatabasesCreate "db._createDatabase"
- @ref HandlingDatabasesDrop "db._dropDatabase"
- @ref HandlingDatabasesGlossary

View File

@ -8,30 +8,36 @@ Introduction to Replication {#UserManualReplicationIntro}
=========================================================
Starting with ArangoDB 1.4, ArangoDB comes with optional asynchronous master-slave
replication.
replication. Replication is configured on a per-database level, meaning that
different databases in the same ArangoDB instance can have different replication
settings. Replication must be turned on explicitly before it becomes active for a
database.
In a typical master-slave replication setup, clients direct *all* their write
operations to the master. The master is the only place to connect to when making
any insertions/updates/deletions.
operations for a specific database to the master. The master database is the only
place to connect to when making any insertions/updates/deletions.
The master will log all write operations in its so-called *event log*. The event log
can be considered as an ordered stream of changes.
The master database will log all write operations in its so-called *event log*.
The event log can be considered as an ordered stream of changes for the database.
Any number of slave servers can then connect to the master and fetch data from the
master's event log. The slaves then can apply all the events from the log in the
same order locally. After that, they will have the same state of data as the master.
Any number of slaves can then connect to the master database and fetch data from the
master database's event log. The slaves then can apply all the events from the log in
the same order locally. After that, they will have the same state of data as the master
database.
In this setup, write operations are applied first on the master, and applied on the
slave afterwards. For example, let's assume a write operation is applied and logged
on the master at point in time t0.
In this setup, write operations are applied first in the master database, and applied
in the slave database(s) afterwards.
To make a slave apply the same operation, it must first fetch the write operation's
data from master's event log, then parse it and apply it locally. This will happen
after t0, let's say at point in time t1. The difference between t1 and t0 is called the
*replication lag*, and it is unavoidable in asychronous replication. The amount of
replication lag depends on many factors. A few of them are:
For example, let's assume a write operation is applied and logged in the master database
at point in time t0. To make a slave database apply the same operation, it must first
fetch the write operation's data from master database's event log, then parse it and
apply it locally. This will happen at some point in time after t0, let's say t1.
The difference between t1 and t0 is called the *replication lag*, and it is unavoidable
in asychronous replication. The amount of replication lag depends on many factors, a
few of which are:
- the network capacity between the slaves and the master
- the load and the master and the slaves
- the load of the master and the slaves
- the frequency in which slaves poll the master for updates
Between t0 and t1, the state of data on the master is newer than the state of data
@ -42,26 +48,30 @@ between). Thus, the replication will lead to an *eventually consistent* state of
Transactions are honored in replication, i.e. transactional write operations will
become visible on slaves atomically.
As all write operations will be logged to the master's event log, the replication in
ArangoDB 1.4 cannot be used for write-scaling. The main purposes of the replication
in ArangoDB 1.4 are to provide read-scalability and "hot backups".
As all write operations will be logged to a master database's event log, the replication
in ArangoDB 1.4 cannot be used for write-scaling. The main purposes of the replication
in ArangoDB 1.4 are to provide read-scalability and "hot backups" for specific databases.
It is possible to connect multiple slaves to the same master. Slaves should be used as
read-only instances, and no user-initiated write operations should be carried out on
them. Otherwise data conflicts may occur that cannot be solved automatically, and that
will make the replication stop. Master-master (or multi-master) replication is not
supported in ArangoDB 1.4.
It is possible to connect multiple slave databaes to the same master database. Slave
databases should be used as read-only instances, and no user-initiated write operations
should be carried out on them. Otherwise data conflicts may occur that cannot be solved
automatically, and that will make the replication stop. Master-master (or multi-master)
replication is not supported in ArangoDB 1.4.
The replication in ArangoDB 1.4 is asychronous, meaning that slaves will *pull*
changes from the master. Slaves need to know to which master they should connect to,
but a master is not aware of the slaves that replicate from it. When the network
connection between the master and a slave goes down, write operations on the
master can continue normally. When the network is up again, slaves can reconnect
to the master and transfer the remaining changes. This will happen automatically
provided slaves are configured appropriately.
changes from the master database. Slaves need to know to which master database they should
connect to, but a master database is not aware of the slaves that replicate from it.
When the network connection between the master database and a slave goes down, write
operations on the master can continue normally. When the network is up again, slaves
can reconnect to the master database and transfer the remaining changes. This will
happen automatically provided slaves are configured appropriately.
The replication is turned off by default. In order to create a master-slave setup,
the replication features need to be enabled on both the master and the slave(s).
the replication features need to be enabled on both the master database and the slave
databases.
Replication is configured on a per-database level. If multiple database are to be
replicated, the replication must be set up individually per database.
Components {#UserManualReplicationComponents}
=============================================
@ -72,8 +82,11 @@ can be used together or in isolation: the *replication logger* and the *replicat
Both are available in ArangoDB 1.4 and can be administered via the command line or
a REST API (see @ref HttpReplication).
In most cases, the *replication logger* will be run on a master, and the
*replication applier* will be executed on slave servers.
In most cases, the *replication logger* will be run inside a master database, and the
*replication applier* will be executed in a slave database.
As there can be multiple database inside an ArangoDB instance, there can be multiple
replication loggers and multiple replication appliers per ArangoDB instance.
Replication Logger {#UserManualReplicationLogger}
-------------------------------------------------
@ -81,33 +94,34 @@ Replication Logger {#UserManualReplicationLogger}
###Purpose
The purpose of the replication logger is to log all changes that modify the state
of data. This includes document insertions, updates, and deletions. It also includes
creating, dropping, renaming and changing collections and indexes.
of data in a specific database. This includes document insertions, updates, and deletions.
It also includes creating, dropping, renaming and changing collections and indexes.
When the replication logger is used, it will log all these write operations in its
*event log*, in the same order in which the operations were carried out originally.
When the replication logger is used, it will log all these write operations for the
database in its *event log*, in the same order in which the operations were carried out
originally.
Reading the event log sequentially provides a list of all write operations carried out.
Replication clients can request certain log events from the logger.
Reading the event log sequentially provides a list of all write operations carried out
in the database. Replication clients can request certain log events from the logger.
For example, ArangoDB's replication applier will permanently query the latest events
from the replication logger. The applier will then apply all changes locally to get to
the same state of data as the master server. It will keep track of which parts of
the event log it already synchronised, meaning that it will perform incremental
synchronisation.
from the replication logger. The applier of the slave database will then apply all changes
locally to get to the same state of data as the master database. It will keep track of
which parts of the event log it already synchronised, meaning that it will perform an
incremental synchronisation.
Technically, the event log is a system collection named `_replication`. The event log
is persisted and will still be present after a server shutdown or crash. The event log's
underlying collection should not be modified by users directly. It should ony be
accessed using the special API methods offered by ArangoDB.
Technically, the event log is a system collection named `_replication` inside the database
the applier runs in. The event log is persisted and will still be present after a server
shutdown or crash. The event log's underlying collection should not be modified by users
directly. It should only be accessed using the special API methods offered by ArangoDB.
###Starting and Stopping
ArangoDB will only log changes if the replication logger is turned on. Should there be
any write operations while the replication logger is turned off, these events will
be lost for replication.
ArangoDB will only log changes if the replication logger is turned on for the specific
database. Should there be any write operations in the database while the replication logger
is turned off, these events will be lost for replication.
To turn on the replication logger once, the following command can be executed:
To turn on the replication logger for a database once, the following command can be executed:
require("org/arangodb/replication").logger.start();
@ -115,7 +129,11 @@ Note that starting the replication logger does not necessarily mean it will be s
automatically on all following server startups. This can be configured seperately (keep on
reading for this).
To turn the replication logger off, execute the `stop` command:
Please note that the above command turns on the replication logger for the current database
only, but not for all databases. To turn on the replication logger for multiple databases,
the above command must be executed separately for each database.
To turn the replication logger off for the current database, execute the `stop` command:
require("org/arangodb/replication").logger.stop();
@ -123,8 +141,8 @@ This has turned off the logger, but still the logger may be started again automa
the next time the ArangoDB server is started. This can be configured separately (again,
keep on reading).
To determine the current state of the replication logger (including whether it is currently
running or not), use the `state` command:
To determine the current state of the replication logger in the current database (including
whether it is currently running or not), use the `state` command:
require("org/arangodb/replication").logger.state();
@ -156,13 +174,14 @@ Note: the replication logger state can also be queried via the HTTP API (see @re
###Configuration
To determine whether the replication logger is automatically started when the ArangoDB
server is started, the logger has a separate configuration. The configuration is stored
in a file `REPLICATION-LOGGER-CONFIG` inside the database directory. If it does not
exist, ArangoDB will use default values.
To determine whether the current database's replication logger is automatically started
when the ArangoDB server is started, the logger has a separate configuration. The
configuration is stored in a file `REPLICATION-LOGGER-CONFIG` inside the current database's
directory. If it does not exist, ArangoDB will use default values.
To check and adjust the configuration of the replication logger, use the `properties` command.
To view the current configuration, use `properties` without any arguments:
To check and adjust the configuration of the replication logger for the current database,
use the `properties` command. To view the current configuration, use `properties` without
any arguments:
require("org/arangodb/replication").logger.properties();
@ -175,22 +194,23 @@ The result might look like this:
"maxEventsSize" : 134217728
}
The `autoStart` attribute indicates whether the replication logger is automatically started
whenever the ArangoDB server is started. You may want to set it to `true` for a proper master
setup. To do so, supply the updated attributes to the `properties` command, e.g.:
The `autoStart` attribute indicates whether the current database's replication logger is
automatically started whenever the ArangoDB server is started. You may want to set it to
`true` for a proper master setup. To do so, supply the updated attributes to the `properties`
command, e.g.:
require("org/arangodb/replication").logger.properties({
autoStart: true
});
This will ensure the replication logger is automatically started on all following ArangoDB
starts unless `autoStart` is set to `false` again.
This will ensure the replication logger for the current database is automatically started
on all following ArangoDB starts unless `autoStart` is set to `false` again.
Note that only those attributes will be changed that you supplied in the argument to `properties`.
All other configuration values will remain unchanged. Also note that the replication logger
can be reconfigured while it is running.
It the replication logger is turned on, the event log may be allowed to grow indefinitely.
It a replication logger is turned on, the event log may be allowed to grow indefinitely.
It may be sensible to set a maximum size or a maximum number of events to keep.
If one of these thresholds is reached during logging, the replication logger will start
removing the oldest events from the event log automatically.
@ -218,28 +238,28 @@ Replication Applier {#UserManualReplicationApplier}
###Purpose
The purpose of the replication applier is to read data from a master's event log, and
apply them locally. The applier will check the master for new events periodically.
The purpose of the replication applier is to read data from a master database's event log,
and apply them locally. The applier will check the master database for new events periodically.
It will perform an incremental synchronisation, i.e. only asking the master for events
that occurred after the last synchronisation.
The replication applier does not get notified by the master when there are "new" events,
but uses the pull principle. It might thus take some time (the *replication lag*) before
an event that from the master gets shipped to and applied on a slave.
The replication applier does not get notified by the master database when there are "new"
events, but uses the pull principle. It might thus take some time (the *replication lag*)
before an event from the master database gets shipped to and applied in a slave database.
The replication applier is run in a separate thread. It may encounter problems when a
log event from the master cannot be applied safely, or when the connection to the master
goes down (network outage, master is down etc.). In this case, the replication applier
thread might terminate itself. It is then up to the administrator to fix the problem and
restart the replication applier.
The replication applier of a database is run in a separate thread. It may encounter problems
when a log event from the master cannot be applied safely, or when the connection to the master
database goes down (network outage, master database is down or unavailable etc.). In this case,
the database's replication applier thread might terminate itself. It is then up to the
administrator to fix the problem and restart the database's replication applier.
If the replication applier cannot connect to the master, or the communication fails at
If the replication applier cannot connect to the master database, or the communication fails at
some point during the synchronisation, the replication applier will try to reconnect to
the master. It will give up reconnecting only after a configurable amount of connection
the master database. It will give up reconnecting only after a configurable amount of connection
attempts.
The replication applier state is queryable at any time by using the `state` command of the
applier:
applier. This will return the state of the applier of the current database:
require("org/arangodb/replication").applier.state();
@ -271,14 +291,14 @@ The result might look like this:
"endpoint" : "tcp://master.domain.org:8529"
}
The `running` attribute indicates whether the replication applier is currently running
and polling the server at `endpoint` for new events.
The `running` attribute indicates whether the replication applier of the current database
is currently running and polling the server at `endpoint` for new events.
The `failedConnects` attribute shows how many failed connection attempts the replication
applier currently has encountered in a row. In contrast, the `totalFailedConnects` attribute
indicates how many failed connection attempts the applier has made in total. The
`totalRequest` attribute shows how many requests the applier has sent to the master in
total. The `totalEvents` attribute shows how many log events the applier has read from the
`totalRequest` attribute shows how many requests the applier has sent to the master database
in total. The `totalEvents` attribute shows how many log events the applier has read from the
master.
The `message` sub-attribute of the `progress` sub-attribute gives a brief hint of what the
@ -311,36 +331,38 @@ Here is an example of the state after the replication applier terminated itself
}
}
Note: the state of the replication applier is queryable via the HTTP API, too. Please refer to
@ref HttpReplication for more details.
Note: the state of a database's replication applier is queryable via the HTTP API, too.
Please refer to @ref HttpReplication for more details.
###Starting and Stopping
To start and stop the applier, the `start` and `stop` commands can be used:
To start and stop the applier in the current database, the `start` and `stop` commands can
be used:
require("org/arangodb/replication").applier.start(<tick>);
require("org/arangodb/replication").applier.stop();
Note that starting a replication applier without setting up an initial configuration will
fail. The replication applier will look for its configuration in a file named
`REPLICATION-APPLIER-CONFIG` in the database directory. If the file is not present, ArangoDB
will use some default configuration, but it cannot guess the endpoint (the address of the master)
the applier should connect to. Thus starting the applier without configuration will fail.
`REPLICATION-APPLIER-CONFIG` in the current database's directory. If the file is not present,
ArangoDB will use some default configuration, but it cannot guess the endpoint (the address
of the master database) the applier should connect to. Thus starting the applier without
configuration will fail.
Note that starting the replication applier via the `start` command will not necessarily start
the applier on the next and following ArangoDB server restarts. Additionally, stopping the applier
manually will not necessarily prevent the applier from being started again on the next
server start. All of this is configurable seperately (hang on reading).
Note that starting a database's replication applier via the `start` command will not necessarily
start the applier on the next and following ArangoDB server restarts. Additionally, stopping a
database's replication applier manually will not necessarily prevent the applier from being
started again on the next server start. All of this is configurable seperately (hang on reading).
Please note that when starting the replication applier, it will resume where it stopped.
Please note that when starting the replication applier of database, it will resume where it stopped.
This is sensible because replication log events should be applied incrementally. If the
replication applier has never been started before, it needs some `tick` value from the
master's event log from which to start fetching events.
replication applier of a database has never been started before, it needs some `tick` value from the
master database's event log from which to start fetching events.
####Configuration
To configure the replication applier, use the `properties` command. Using it without any
arguments will return the current configuration:
To configure the replication applier of a specific database, use the `properties` command. Using
it without any arguments will return the current configuration:
require("org/arangodb/replication").applier.properties();
@ -366,18 +388,19 @@ for the connection via the `username` and `password` attributes.
password: "secret"
});
This will re-configure the replication applier. The configuration will be used from the next
start of the replication applier. The replication applier cannot be re-configured while it is
running. It must be stopped first to be re-configured.
This will re-configure the replication applier for the current database. The configuration will be
used from the next start of the replication applier. The replication applier cannot be re-configured
while it is running. It must be stopped first to be re-configured.
To make the replication applier start automatically when the ArangoDB server starts, use the
`autoStart` attribute.
To make the replication applier of the current database start automatically when the ArangoDB server
starts, use the `autoStart` attribute.
Setting the `adaptivePolling` attribute to `true` will make the replication applier poll the
master for changes with a variable frequency. The replication applier will then lower the
master database for changes with a variable frequency. The replication applier will then lower the
frequency when the master is idle, and increase it when the master can provide new events).
Otherwise the replication applier will poll the master for changes with a constant frequency of
5 seconds if the master's replication logger is turned off, and 0.5 seconds if it is turned on.
Otherwise the replication applier will poll the master database for changes with a constant frequency
of 5 seconds if the master database's replication logger is turned off, and 0.5 seconds if it is
turned on.
To set a timeout for connection and following request attempts, use the `connectTimeout` and
`requestTimeout` values. The `maxConnectRetries` attribute configures after how many failed
@ -386,11 +409,11 @@ You may want to set this to a high value so that temporary network outages do no
replication applier stopping itself.
The `chunkSize` attribute can be used to control the approximate maximum size of a master's
reponse (in bytes). Setting it to a low value may make the master respond faster (less data is
response (in bytes). Setting it to a low value may make the master respond faster (less data is
assembled before the master sends the response), but may require more request-response roundtrips.
Set it to `0` to use ArangoDB's built-in default value.
The following example will set most of the discussed properties:
The following example will set most of the discussed properties for the current database's applier:
require("org/arangodb/replication").applier.properties({
endpoint: "tcp://master.domain.org:8529",
@ -405,15 +428,17 @@ The following example will set most of the discussed properties:
After the applier is now fully configured, it could theoretically be started. However, we
may first need an initial sychronisation of all collections and their data from the master before
we start the replication applier. The reason is that the replication logger on the master
may have been turned on the first after some collections have been created, or it may have
been turned off temporarily etc.
we start the replication applier. The reason is that the replication logger of the master database
may not contain all data we need for a full synchronisation.
The only safe method for initially starting the continuous replication applier is thus to
do a full synchronisation with the master first, note the master's current `tick` value, and
start the continuous replication applier using this tick value.
The only safe method for doing a full synchronisation is thus to
The initial synchronisation is executed with the `sync` command:
* make sure the replication logger of the master database is running
* perform an intiial full sync with the master database
* note the master database's replication logger `tick` value and
* start the continuous replication applier using this tick value.
The initial synchronisation for the current database is executed with the `sync` command:
require("org/arangodb/replication").sync({
endpoint: "tcp://master.domain.org:8529",
@ -421,14 +446,15 @@ The initial synchronisation is executed with the `sync` command:
password: "secret
});
Warning: `sync` will do a full synchronisation of the collections present on the master.
Warning: `sync` will do a full synchronisation of the collections in the current database with
collections present in the master database.
Any local instances of the collections and all their data are removed! Only execute this
command when you are sure you want to remove the local data!
As `sync` does a full synchronisation, it may take a while to execute.
When `sync` completes successfully, it show a list of collections it has synchronised in the
`collections` attribute. It will also return the master's replication logger tick value at
the time the `sync` was started on the master. The tick value is contained in the `lastLogTick`
`collections` attribute. It will also return the master database's replication logger tick value
at the time the `sync` was started on the master. The tick value is contained in the `lastLogTick`
attribute of the `sync` command:
{
@ -436,8 +462,8 @@ attribute of the `sync` command:
"collections" : [ ... ]
}
If the master's replication logger is turned on, you can now start the continuous synchronisation
with the command
If the master database's replication logger is turned on, you can now start the continuous
synchronisation for the current database with the command
require("org/arangodb/replication").applier.start("231848833079705");
@ -451,16 +477,18 @@ Example Setup {#UserManualReplicationSetup}
Setting up a working master-slave replication requires two ArangoDB instances:
- _master_: this is the instance we'll activate the replication logger on
- _slave_: on this instance, we'll start a replication applier, and this will fetch data from the
master's events log and apply all events locally
master database's events log and apply all events locally
For the following example setup, we'll use the instance *tcp://master.domain.org:8529* as the
master, and the instance *tcp://slave.domain.org:8530* as a slave.
The goal is to have all data from the master *tcp://master.domain.org:8529* be replicated to
the slave *tcp://slave.domain.org:8530*.
The goal is to have all data from the database `_system` on master *tcp://master.domain.org:8529*
be replicated to the database `_system` on the slave *tcp://slave.domain.org:8530*.
On the *master*, configure the replication logger to start automatically on ArangoDB startup.
Additionally, set some restrictions for the event log size:
On the *master*, configure the replication logger for database `_system` to start automatically
on ArangoDB startup. Additionally, set some restrictions for the event log size:
db._useDatabase("_system");
require("org/arangodb/replication").logger.properties({
autoStart: true,
@ -468,30 +496,36 @@ Additionally, set some restrictions for the event log size:
maxEventsSize: 0
});
After that, start the replication logger on the master:
After that, start the replication logger for the `_system` database on the master:
db._useDatabase("_system");
require("org/arangodb/replication").logger.start();
On the *slave*, make sure there currently is no replication applier running:
On the *slave*, make sure there currently is no replication applier running in the `_system`
database:
db._useDatabase("_system");
require("org/arangodb/replication").applier.stop();
After that, do an initial sync of the slave with data from the master. Execute the following
command on the slave:
commands on the slave:
db._useDatabase("_system");
require("org/arangodb/replication").sync({
endpoint: "tcp://master.example.org:8529",
username: "myuser",
password: "mypasswd"
});
Warning: this will replace data on the slave with data from the master! Only execute the
command if you have verified you are on the correct server!
Warning: this will replace data in the slave database with data from the master database!
Only execute the commands if you have verified you are on the correct server, in the
correct database!
Then re-configure the slave's replication applier to point to the master, set the username and
password for the connection, and set the `autoStart` attribute:
Then re-configure the slave database's replication applier to point to the master database,
set the username and password for the connection, and set the `autoStart` attribute:
db._useDatabase("_system");
require("org/arangodb/replication").applier.properties({
endpoint: "tcp://master.example.org:8529",
username: "myuser",
@ -500,13 +534,15 @@ password for the connection, and set the `autoStart` attribute:
adaptivePolling: true
});
After that, start the applier on the slave:
After that, start the applier in the slave database:
db._useDatabase("_system");
require("org/arangodb/replication").applier.start();
After that, you should be able to monitor the state and progress of the replication
applier by executing the `state` command on the slave:
applier by executing the `state` command in the slave database:
db._useDatabase("_system");
require("org/arangodb/replication").applier.state();
You may also want to check the master and slave states via the HTTP APIs (see @ref
@ -518,8 +554,8 @@ Replication Limitations {#UserManualReplicationLimitations}
The replication in ArangoDB 1.4-alpha has a few limitations still. Some of these
limitations may be removed in later versions of ArangoDB:
- the event log on the master is currently written directly after a write operation
is carried out on the master. In case the master crashes between having executed the
- the event log of the master database is currently written directly after a write operation
is carried out in the master database. In case the master crashes between having executed the
write operation and having it written into the event log, the write operation may
have been executed on the master, but may be lost for replication and not be applied
on any slaves.
@ -544,6 +580,8 @@ limitations may be removed in later versions of ArangoDB:
- there currently is one replication logger and one replication applier per ArangoDB
database. It is thus not possible to have a slave apply the events logs from
multiple masters.
- replication is set up on a per-database level. When using ArangoDB with multiple
databases, replication must be configured individually for each database.
- the replication applier is single-threaded, but write operations on the master may
be executed in parallel if they affect different collections. Thus the replication
applier might not be able to catch up with a very powerful and loaded master.