From ecd033491f652794e554571eb352a0f86f151ec1 Mon Sep 17 00:00:00 2001
From: sleto-it <31849787+sleto-it@users.noreply.github.com>
Date: Wed, 14 Feb 2018 18:59:18 +0100
Subject: [PATCH] Doc - ArangoSync doc integration (#4590)

---
 .../Manual/Administration/DC2DC/README.md     |  38 +-
 .../Books/Manual/Deployment/DC2DC.md          |   9 +-
 .../Deployment/DC2DC/ArangoSyncMaster.md      |  20 +-
 .../Deployment/DC2DC/ArangoSyncWorkers.md     |  18 +-
 .../Books/Manual/Deployment/DC2DC/Cluster.md  |  11 +-
 .../Manual/Deployment/DC2DC/KafkaZookeeper.md |   7 +-
 .../Deployment/DC2DC/PrometheusGrafana.md     |   9 +-
 .../Manual/GettingStarted/DC2DC/README.md     | 372 ++++++++++++++++++
 .../Books/Manual/Monitoring/DC2DC/README.md   |  11 +-
 Documentation/Books/Manual/SUMMARY.md         |   8 +
 .../Manual/Scalability/DC2DC/Applicability.md |   4 +-
 .../Manual/Scalability/DC2DC/Introduction.md  |  32 +-
 .../Books/Manual/Scalability/DC2DC/README.md  |  11 +-
 .../Manual/Scalability/DC2DC/Requirements.md  |   7 +-
 .../Books/Manual/Security/DC2DC/README.md     |  51 +--
 .../Manual/Troubleshooting/DC2DC/README.md    |  58 +--
 16 files changed, 536 insertions(+), 130 deletions(-)
 create mode 100644 Documentation/Books/Manual/GettingStarted/DC2DC/README.md

diff --git a/Documentation/Books/Manual/Administration/DC2DC/README.md b/Documentation/Books/Manual/Administration/DC2DC/README.md
index b5bfa13965..233e3a1154 100644
--- a/Documentation/Books/Manual/Administration/DC2DC/README.md
+++ b/Documentation/Books/Manual/Administration/DC2DC/README.md
@@ -1,26 +1,27 @@
+
 # Datacenter to datacenter replication administration
 
-This Section includes information related to the administration of the _datacenter 
+This section includes information related to the administration of the _datacenter
 to datacenter replication_.
 
 For a general introduction to the _datacenter to datacenter replication_, please
-refer to the [Datacenter to datacenter replication](..\..\Scalability\DC2DC\README.md)
+refer to the [Datacenter to datacenter replication](../../Scalability/DC2DC/README.md)
 chapter.
 
 ## Starting synchronization
 
-Once all components of the _ArangoSync_ solution have been deployed and are 
-running properly, _ArangoSync_ will not automatically replicate database structure 
+Once all components of the _ArangoSync_ solution have been deployed and are
+running properly, _ArangoSync_ will not automatically replicate database structure
 and content. For that, it is necessary to configure synchronization.
 
 To configure synchronization, you need the following:
 
 - The endpoint of the sync master in the target datacenter.
 - The endpoint of the sync master in the source datacenter.
-- A certificate (in keyfile format) used for client authentication of the sync master 
+- A certificate (in keyfile format) used for client authentication of the sync master
   (with the sync master in the source datacenter).
 - A CA certificate (public key only) for verifying the integrity of the sync masters.
-- A username+password pair (or client certificate) for authenticating the configure 
+- A username+password pair (or client certificate) for authenticating the configure
   request with the sync master (in the target datacenter).
 
 With that information, run:
 
 ```bash
 arangosync configure sync \
   --master.endpoint= \
   --master.keyfile= \
   --source.endpoint= \
   --source.cacert= \
   --auth.user= \
   --auth.password=
 ```
 
-The command will finish quickly. Afterwards it will take some time until 
+The command will finish quickly. Afterwards it will take some time until
 the clusters in both datacenters are in sync.
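+
+For illustration only, here is a hypothetical invocation with every placeholder
+filled in (the endpoints, file paths and credentials below are made-up examples,
+not ArangoSync defaults; reading the password from an environment variable merely
+keeps it out of your shell history):
+
+```bash
+# All values in this sketch are placeholders.
+export SYNC_PASSWORD=mySecretPassword
+arangosync configure sync \
+  --master.endpoint=https://dc-b-master1.example.com:8629 \
+  --master.keyfile=/etc/arangosync/client-auth.keyfile \
+  --source.endpoint=https://dc-a-master1.example.com:8629 \
+  --source.cacert=/etc/arangosync/tls-ca.crt \
+  --auth.user=syncadmin \
+  --auth.password="${SYNC_PASSWORD}"
+```
+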
+## Inspect status + Use the following command to inspect the status of the synchronization of a datacenter: ```bash @@ -51,7 +54,7 @@ arangosync get status \ Note: Invoking this command on the target datacenter will return different results from invoking it on the source datacenter. You need insight in both results to get a "complete picture". -Where the `get status` command gives insight in the status of synchronization, there +Where the `get status` command gives insight in the status of synchronization, there are more detailed commands to give insight in tasks & registered workers. Use the following command to get a list of all synchronization tasks in a datacenter: @@ -84,9 +87,9 @@ arangosync get workers \ -v ``` -## Stoping synchronization +## Stopping synchronization -If you no longer want to synchronize data from a source to a target datacenter +If you no longer want to synchronize data from a source to a target datacenter you must stop it. To do so, run the following command: ```bash @@ -100,7 +103,7 @@ The command will wait until synchronization has completely stopped before return If the synchronization is not completely stopped within a reasonable period (2 minutes by default) the command will fail. -If the source datacenter is no longer available it is not possible to stop synchronization in +If the source datacenter is no longer available it is not possible to stop synchronization in a graceful manner. If that happens abort the synchronization with the following command: ```bash @@ -109,8 +112,9 @@ arangosync abort sync \ --auth.user= \ --auth.password= ``` -If the source datacenter recovers after an `abort sync` has been executed, it is -needed to "cleanup" ArangoSync in the source datacenter. + +If the source datacenter recovers after an `abort sync` has been executed, it is +needed to "cleanup" ArangoSync in the source datacenter. To do so, execute the following command: ```bash @@ -120,12 +124,12 @@ arangosync abort outgoing sync \ --auth.password= ``` -## Reversing synchronization direction +## Reversing synchronization direction If you want to reverse the direction of synchronization (e.g. after a failure -in datacenter A and you switched to the datacenter B for fallback), you +in datacenter A and you switched to the datacenter B for fallback), you must first stop (or abort) the original synchronization. Once that is finished (and cleanup has been applied in case of abort), -you must now configure the synchronization again, but with swapped -source & target settings. \ No newline at end of file +you must now configure the synchronization again, but with swapped +source & target settings. diff --git a/Documentation/Books/Manual/Deployment/DC2DC.md b/Documentation/Books/Manual/Deployment/DC2DC.md index 8212a42d6c..9f46bacbac 100644 --- a/Documentation/Books/Manual/Deployment/DC2DC.md +++ b/Documentation/Books/Manual/Deployment/DC2DC.md @@ -1,12 +1,13 @@ -# Datacenter to datacenter replication deployment + +# Datacenter to datacenter replication deployment This chapter describes how to deploy all the components needed for _datacenter to datacenter replication_. For a general introduction to _datacenter to datacenter replication_, please refer -to the [Datacenter to datacenter replication](..\Scalability\DC2DC\README.md) chapter. +to the [Datacenter to datacenter replication](../Scalability/DC2DC/README.md) chapter. -[Requirements](..\Scalability\DC2DC\Requirements.md) can be found in this section. 
+[Requirements](../Scalability/DC2DC/Requirements.md) can be found in this section. Deployment steps: @@ -14,4 +15,4 @@ Deployment steps: - [Kafka & Zookeeper](DC2DC/KafkaZookeeper.md) - [ArangoSync Master](DC2DC/ArangoSyncMaster.md) - [ArangoSync Workers](DC2DC/ArangoSyncWorkers.md) -- [Prometheus & Grafana (optional)](DC2DC/PrometheusGrafana.md) +- [Prometheus & Grafana (optional)](DC2DC/PrometheusGrafana.md) diff --git a/Documentation/Books/Manual/Deployment/DC2DC/ArangoSyncMaster.md b/Documentation/Books/Manual/Deployment/DC2DC/ArangoSyncMaster.md index 2d92e6ad19..58d84f2a34 100644 --- a/Documentation/Books/Manual/Deployment/DC2DC/ArangoSyncMaster.md +++ b/Documentation/Books/Manual/Deployment/DC2DC/ArangoSyncMaster.md @@ -1,12 +1,14 @@ -# ArangoSync Master + +# ArangoSync Master The _ArangoSync Master_ is responsible for managing all synchronization, creating tasks and assigning those to the _ArangoSync Workers_. -
At least 2 instances muts be deployed in each datacenter.
+
+At least 2 instances must be deployed in each datacenter.
 One instance will be the "leader", the other will be an inactive slave. When the leader
 is gone for a short while, one of the other instances will take over.
 
-With clusters of a significant size, the _sync master_ will require a 
+With clusters of a significant size, the _sync master_ will require a
 significant set of resources. Therefore it is recommended to deploy the
 _sync masters_ on their own servers, equipped with sufficient CPU power and memory
 capacity.
 
@@ -14,12 +16,12 @@ To start an _ArangoSync Master_ using a `systemd` service, use a unit like this:
 
 ```text
 [Unit]
-Description=Run ArangoSync in master mode 
+Description=Run ArangoSync in master mode
 After=network.target
 
 [Service]
 Restart=on-failure
-EnvironmentFile=/etc/arangodb.env 
+EnvironmentFile=/etc/arangodb.env
 EnvironmentFile=/etc/arangodb.env.local
 LimitNOFILE=8192
 ExecStart=/usr/sbin/arangosync run master \
@@ -41,8 +43,8 @@ TimeoutStopSec=60
 WantedBy=multi-user.target
 ```
 
-The _sync master_ needs a TLS server certificate and a 
-If you want the service to create a TLS certificate & client authentication 
+The _sync master_ needs a TLS server certificate and a client authentication certificate.
+If you want the service to create a TLS certificate & client authentication
 certificate, for authenticating with _ArangoSync Masters_ in another datacenter,
 at every start, add this to the `Service` section.
 
@@ -65,7 +67,7 @@ ExecStartPre=/usr/sbin/arangosync create client-auth keyfile \
 ```
 
 The _ArangoSync Master_ must be reachable on a TCP port `${MASTERPORT}` (used with `--server.port` option).
-This port must be reachable from inside the datacenter (by sync workers and operations) 
+This port must be reachable from inside the datacenter (by sync workers and operations)
 and from inside of the other datacenter (by sync masters in the other datacenter).
 
 ## Recommended deployment environment
 
@@ -73,4 +75,4 @@
 Since the _sync masters_ can be CPU intensive when running lots of databases & collections,
 it is recommended to run them on dedicated machines with a lot of CPU power.
 
-Consider these machines "pets".
\ No newline at end of file
+Consider these machines "pets".
diff --git a/Documentation/Books/Manual/Deployment/DC2DC/ArangoSyncWorkers.md b/Documentation/Books/Manual/Deployment/DC2DC/ArangoSyncWorkers.md
index 6b3322831b..b12f60083d 100644
--- a/Documentation/Books/Manual/Deployment/DC2DC/ArangoSyncWorkers.md
+++ b/Documentation/Books/Manual/Deployment/DC2DC/ArangoSyncWorkers.md
@@ -1,26 +1,28 @@
+
 # ArangoSync Workers
 
 The _ArangoSync Worker_ is responsible for executing synchronization tasks.
-
For optimal performance at least 1 _worker_ instance must be placed on -every machine that has an ArangoDB _DBserver_ running. This ensures that tasks + +For optimal performance at least 1 _worker_ instance must be placed on +every machine that has an ArangoDB _DBserver_ running. This ensures that tasks can be executed with minimal network traffic outside of the machine. -Since _sync workers_ will automatically stop once their TLS server certificate expires +Since _sync workers_ will automatically stop once their TLS server certificate expires (which is set to 2 years by default), it is recommended to run at least 2 instances -of a _worker_ on every machine in the datacenter. That way, tasks can still be -assigned in the most optimal way, even when a _worker_ in temporarily down for a +of a _worker_ on every machine in the datacenter. That way, tasks can still be +assigned in the most optimal way, even when a _worker_ is temporarily down for a restart. To start an _ArangoSync Worker_ using a `systemd` service, use a unit like this: ```text [Unit] -Description=Run ArangoSync in worker mode +Description=Run ArangoSync in worker mode After=network.target [Service] Restart=on-failure -EnvironmentFile=/etc/arangodb.env +EnvironmentFile=/etc/arangodb.env EnvironmentFile=/etc/arangodb.env.local Environment=PORT=8729 LimitNOFILE=1000000 @@ -48,4 +50,4 @@ you can decide to run multiple _sync workers_ on each machine in order to spread The _sync workers_ should be run on all machines that also contain an ArangoDB _DBServer_. The _sync worker_ can be memory intensive when running lots of databases & collections. -Consider these machines "cattle". \ No newline at end of file +Consider these machines "cattle". diff --git a/Documentation/Books/Manual/Deployment/DC2DC/Cluster.md b/Documentation/Books/Manual/Deployment/DC2DC/Cluster.md index f151e01497..a089df5dc2 100644 --- a/Documentation/Books/Manual/Deployment/DC2DC/Cluster.md +++ b/Documentation/Books/Manual/Deployment/DC2DC/Cluster.md @@ -1,4 +1,5 @@ -# ArangoDB cluster + +# ArangoDB cluster There are several ways to start an ArangoDB cluster. In this section we will focus on our recommended way to start ArangoDB: the ArangoDB _Starter_. @@ -14,7 +15,7 @@ The _Starter_ simplifies things for the operator and will coordinate a distribut cluster startup across several machines and assign cluster roles automatically. When started on several machines and enough machines have joined, the _Starters_ -will start _Agents_, s_Coordinators_ and _DBservers_ on these machines. +will start _Agents_, _Coordinators_ and _DBservers_ on these machines. When running the _Starter_ will supervise its child tasks (namely _Coordinators_, _DBservers_ and _Agents_) and restart them in case of failure. @@ -23,7 +24,7 @@ To start the cluster using a `systemd` unit file use the following: ```text [Unit] -Description=Run the ArangoDB Starter +Description=Run the ArangoDB Starter After=network.target [Service] @@ -48,7 +49,7 @@ Note that we set `rocksdb` in the unit service file. ## Cluster authentication -The communication between the cluster nodes use a token (JWT) to authenticate. +The communication between the cluster nodes use a token (JWT) to authenticate. This must be shared between cluster nodes. Sharing secrets is obviously a very delicate topic. The above workflow assumes @@ -76,4 +77,4 @@ The _Starter_ itself will use port `8528`. 
Since the _Agents_ are so critical to the availability of both the ArangoDB and the ArangoSync cluster, it is recommended to run _Agents_ on dedicated machines. Consider these machines "pets". -_Coordinators_ and _DBServers_ can be deployed of other machines that should be considered "cattle". +_Coordinators_ and _DBServers_ can be deployed on other machines that should be considered "cattle". diff --git a/Documentation/Books/Manual/Deployment/DC2DC/KafkaZookeeper.md b/Documentation/Books/Manual/Deployment/DC2DC/KafkaZookeeper.md index 77a6e818e9..85f79c848f 100644 --- a/Documentation/Books/Manual/Deployment/DC2DC/KafkaZookeeper.md +++ b/Documentation/Books/Manual/Deployment/DC2DC/KafkaZookeeper.md @@ -1,7 +1,8 @@ -# Kafka & Zookeeper + +# Kafka & Zookeeper -- How to deploy zookeeper -- How to deploy kafka +- How to deploy zookeeper +- How to deploy kafka - Accessible ports ## Recommended deployment environment diff --git a/Documentation/Books/Manual/Deployment/DC2DC/PrometheusGrafana.md b/Documentation/Books/Manual/Deployment/DC2DC/PrometheusGrafana.md index 169842ced0..6461d84cc3 100644 --- a/Documentation/Books/Manual/Deployment/DC2DC/PrometheusGrafana.md +++ b/Documentation/Books/Manual/Deployment/DC2DC/PrometheusGrafana.md @@ -1,9 +1,10 @@ + # Prometheus & Grafana (optional) _ArangoSync_ provides metrics in a format supported by [Prometheus](https://prometheus.io). We also provide a standard set of dashboards for viewing those metrics in [Grafana](https://grafana.org). -If you want to use these tools, please refer to their websites for instructions +If you want to use these tools, please refer to their websites for instructions on how to deploy them. After deployment, you must configure _Prometheus_ using a configuration file that @@ -55,7 +56,7 @@ scrape_configs: tls_config: insecure_skip_verify: true static_configs: - - targets: + - targets: - "${IPWORKERA1}:8729" - "${IPWORKERA2}:8729" - "${IPWORKERB1}:8729" @@ -81,8 +82,8 @@ scrape_configs: replacement: 2 ``` -Note: The above example assumes 2 datacenters, with 2 _sync masters_ & 2 _sync workers_ -per datacenter. You have to replace all `${...}` variables in the above configuration +Note: The above example assumes 2 datacenters, with 2 _sync masters_ & 2 _sync workers_ +per datacenter. You have to replace all `${...}` variables in the above configuration with applicable values from your environment. ## Recommended deployment environment diff --git a/Documentation/Books/Manual/GettingStarted/DC2DC/README.md b/Documentation/Books/Manual/GettingStarted/DC2DC/README.md new file mode 100644 index 0000000000..11a6be590f --- /dev/null +++ b/Documentation/Books/Manual/GettingStarted/DC2DC/README.md @@ -0,0 +1,372 @@ + +# Datacenter to datacenter Replication + +## About + +At some point in the growth of a database, there comes a need for +replicating it across multiple datacenters. + +Reasons for that can be: + +- Fallback in case of a disaster in one datacenter. +- Regional availability +- Separation of concerns + +And many more. + +This tutorial describes what the ArangoSync datacenter to datacenter +replication solution (ArangoSync from now on) offers, +when to use it, when not to use it and how to configure, +operate, troubleshoot it & keep it safe. + +### What is it + +ArangoSync is a solution that enables you to asynchronously replicate +the entire structure and content in an ArangoDB cluster in one place to a cluster +in another place. Typically it is used from one datacenter to another. +
It is not a solution for replicating single server instances. + +The replication done by ArangoSync is **asynchronous**. This means that when +a client is writing data into the source datacenter, it will consider the +request finished before the data has been replicated to the other datacenter. +The time needed to completely replicate changes to the other datacenter is +typically in the order of seconds, but this can vary significantly depending on +load, network & computer capacity. + +ArangoSync performs replication in a **single direction** only. That means that +you can replicate data from cluster A to cluster B or from cluster B to cluster A, +but never at the same time. +
Data modified in the destination cluster **will be lost!** + +Replication is a completely **autonomous** process. Once it is configured it is +designed to run 24/7 without frequent manual intervention. +
This does not mean that it requires no maintenance or attention at all. +
As with any distributed system some attention is needed to monitor its operation +and keep it secure (e.g. certificate & password rotation). + +Once configured, ArangoSync will replicate both **structure and data** of an +**entire cluster**. This means that there is no need to make additional configuration +changes when adding/removing databases or collections. +
Also meta data such as users, Foxx applications & jobs are automatically replicated.
+
+### When to use it... and when not
+
+ArangoSync is a good solution in all cases where you want to replicate
+data from one cluster to another without the requirement that the data
+is available immediately in the other cluster.
+
+ArangoSync is not a good solution when one of the following applies:
+
+- You want to replicate data from cluster A to cluster B and from cluster B
+  to cluster A at the same time.
+- You need synchronous replication between 2 clusters.
+- There is no network connection between cluster A and B.
+- You want complete control over which database, collection & documents are replicated and which are not.
+
+## Requirements
+
+To use ArangoSync you need the following:
+
+- Two datacenters, each running an ArangoDB Enterprise cluster, version 3.3 or higher.
+- A network connection between both datacenters with accessible endpoints
+  for several components (see individual components for details).
+- TLS certificates for ArangoSync master instances (can be self-signed).
+- TLS certificates for Kafka brokers (can be self-signed).
+- Optional (but recommended) TLS certificates for ArangoDB clusters (can be self-signed).
+- Client certificates CA for ArangoSync masters (typically self-signed).
+- Client certificates for ArangoSync masters (typically self-signed).
+- At least 2 instances of the ArangoSync master in each datacenter.
+- One instance of the ArangoSync worker on every machine in each datacenter.
+
+Note: In several places you will need a (x509) certificate.
+
The [certificates](#certificates) section below provides more guidance for creating
+and renewing these certificates.
+
+Besides the above list, you probably want to use the following:
+
+- An orchestrator to keep all components running. In this tutorial we will use `systemd` as an example.
+- A log file collector for centralized collection & access to the logs of all components.
+- A metrics collector & viewing solution such as Prometheus + Grafana.
+
+## Deployment
+
+In the following paragraphs you'll learn which components have to be deployed
+for datacenter to datacenter replication. For detailed deployment instructions,
+consult the [reference manual](../../Deployment/DC2DC.md).
+
+### ArangoDB cluster
+
+Datacenter to datacenter replication requires an ArangoDB cluster in both data centers,
+configured with the `rocksdb` storage engine.
+
+Since the cluster agents are so critical to the availability of both the ArangoDB and the ArangoSync cluster,
+it is recommended to run agents on dedicated machines. Consider these machines "pets".
+
+Coordinators and dbservers can be deployed on other machines that should be considered "cattle".
+
+### Kafka & Zookeeper
+
+Kafka & Zookeeper are needed when using the `kafka` type message queue.
+
+Since the kafka brokers are really CPU and memory intensive,
+it is recommended to run zookeeper & kafka on dedicated machines.
+
+Consider these machines "pets".
+
+### Sync Master
+
+The Sync Master is responsible for managing all synchronization, creating tasks and assigning
+those to workers.
+
At least 2 instances must be deployed in each datacenter.
+One instance will be the "leader", the other will be an inactive slave. When the leader
+is gone for a short while, one of the other instances will take over.
+
+With clusters of a significant size, the sync master will require a significant set of resources.
+Therefore it is recommended to deploy sync masters on their own servers, equipped with sufficient
+CPU power and memory capacity.
+
+The sync master must be reachable on TCP port 8629 (default).
+This port must be reachable from inside the datacenter (by sync workers and operations)
+and from inside of the other datacenter (by sync masters in the other datacenter).
+
+Since the sync masters can be CPU intensive when running lots of databases & collections,
+it is recommended to run them on dedicated machines with a lot of CPU power.
+
+Consider these machines "pets".
+
+### Sync Workers
+
+The Sync Worker is responsible for executing synchronization tasks.
+
+For optimal performance at least 1 worker instance must be placed on
+every machine that has an ArangoDB `dbserver` running. This ensures that tasks
+can be executed with minimal network traffic outside of the machine.
+
+Since sync workers will automatically stop once their TLS server certificate expires
+(which is set to 2 years by default),
+it is recommended to run at least 2 instances of a worker on every machine in the datacenter.
+That way, tasks can still be assigned in the most optimal way, even when a worker is temporarily
+down for a restart.
+
+The sync worker must be reachable on TCP port 8729 (default).
+This port must be reachable from inside the datacenter (by sync masters).
+When using the `direct` message queue type, this port must also be reachable from
+the other datacenter.
+
+Note that a large file descriptor limit is required when using the `kafka` message queue type.
+With kafka, the sync worker requires about 30 file descriptors per shard.
+If you use hardware with huge resources, and still run out of file descriptors,
+you can decide to run multiple sync workers on each machine in order to spread the tasks across them.
+
+The sync workers should be run on all machines that also contain an ArangoDB dbserver.
+The sync worker can be memory intensive when running lots of databases & collections.
+
+Consider these machines "cattle".
+
+### Prometheus & Grafana (optional)
+
+ArangoSync provides metrics in a format supported by [Prometheus](https://prometheus.io).
+We also provide a standard set of dashboards for viewing those metrics in [Grafana](https://grafana.org).
+
+If you want to use these tools, go to their websites for instructions on how to deploy them.
+
+After deployment, you must configure prometheus using a configuration file that instructs
+it about which targets to scrape. For ArangoSync you should configure scrape targets for
+all sync masters and all sync workers.
+Consult the [reference manual](../../Deployment/DC2DC/PrometheusGrafana.md) for a sample configuration.
+
+Prometheus can be a memory & CPU intensive process. It is recommended to keep it
+on machines other than those used to run the ArangoDB cluster or ArangoSync components.
+
+Consider these machines "cattle", unless you configure alerting on prometheus,
+in which case it is recommended to consider these machines "pets".
+
+## Configuration
+
+Once all components of the ArangoSync solution have been deployed and are
+running properly, ArangoSync will not automatically replicate database structure
+and content. For that, it is necessary to configure synchronization.
+
+To configure synchronization, you need the following:
+
+- The endpoint of the sync master in the target datacenter.
+- The endpoint of the sync master in the source datacenter.
+- A certificate (in keyfile format) used for client authentication of the sync master
+  (with the sync master in the source datacenter).
+- A CA certificate (public key only) for verifying the integrity of the sync masters.
+- A username+password pair (or client certificate) for authenticating the configure
+  request with the sync master (in the target datacenter).
+
+With that information, run:
+
+```bash
+arangosync configure sync \
+  --master.endpoint= \
+  --master.keyfile= \
+  --source.endpoint= \
+  --source.cacert= \
+  --auth.user= \
+  --auth.password=
+```
+
+The command will finish quickly. Afterwards it will take some time until
+the clusters in both datacenters are in sync.
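+
+As a concrete sketch (all values below are hypothetical examples, not defaults),
+with the target datacenter's sync master on `https://dc-b.example.com:8629` and
+the source datacenter's sync master on `https://dc-a.example.com:8629`, the call
+could look like this:
+
+```bash
+arangosync configure sync \
+  --master.endpoint=https://dc-b.example.com:8629 \
+  --master.keyfile=/etc/arangosync/client-auth.keyfile \
+  --source.endpoint=https://dc-a.example.com:8629 \
+  --source.cacert=/etc/arangosync/tls-ca.crt \
+  --auth.user=syncadmin \
+  --auth.password=mySecretPassword
+```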
+
+Use the following command to inspect the status of the synchronization of a datacenter:
+
+```bash
+arangosync get status \
+    --master.endpoint= \
+    --auth.user= \
+    --auth.password= \
+    -v
+```
+
+Note: Invoking this command on the target datacenter will return different results from
+invoking it on the source datacenter. You need insight in both results to get a "complete picture".
+
+ArangoSync has more commands to inspect the status of synchronization.
+Consult the [reference manual](../../Administration/DC2DC/README.md#inspect-status) for details.
+
+### Stop synchronization
+
+If you no longer want to synchronize data from a source to a target datacenter
+you must stop it. To do so, run the following command:
+
+```bash
+arangosync stop sync \
+  --master.endpoint= \
+  --auth.user= \
+  --auth.password=
+```
+
+The command will wait until synchronization has completely stopped before returning.
+If the synchronization is not completely stopped within a reasonable period (2 minutes by default)
+the command will fail.
+
+If the source datacenter is no longer available it is not possible to stop synchronization in
+a graceful manner. Consult the [reference manual](../../Administration/DC2DC/README.md#stopping-synchronization) for instructions on how to abort synchronization in
+this case.
+
+### Reversing synchronization direction
+
+If you want to reverse the direction of synchronization (e.g. after a failure
+in datacenter A and you switched to the datacenter B for fallback), you
+must first stop (or abort) the original synchronization.
+
+Once that is finished (and cleanup has been applied in case of abort),
+you must now configure the synchronization again, but with swapped
+source & target settings.
+
+## Operations & Maintenance
+
+ArangoSync is a distributed system with a lot of different components.
+As with any such system, it requires some, but not a lot, of operational
+support.
+
+### What means are available to monitor status
+
+All of the components of ArangoSync provide means to monitor their status.
+Below you'll find an overview per component.
+
+- Sync master & workers: The `arangosync` servers running as either master
+  or worker, provide:
+  - A status API, see `arangosync get status`. Make sure that all statuses report `running`.
+    For even more detail the following commands are also available:
+    `arangosync get tasks`, `arangosync get masters` & `arangosync get workers`.
+  - A log on the standard output. Log levels can be configured using `--log.level` settings.
+  - A metrics API `GET /metrics`. This API is compatible with Prometheus.
+    Sample Grafana dashboards for inspecting these metrics are available.
+
+- ArangoDB cluster: The `arangod` servers that make up the ArangoDB cluster
+  provide:
+  - A log file. This is configurable with settings with a `log.` prefix.
+    E.g. `--log.output=file://myLogFile` or `--log.level=info`.
+  - A statistics API `GET /_admin/statistics`
+
+- Kafka cluster: The kafka brokers provide:
+  - A log file, see settings with `log.` prefix in its `server.properties` configuration file.
+
+- Zookeeper: The zookeeper agents provide:
+  - A log on standard output.
+
+### What to look for while monitoring status
+
+The very first thing to do when monitoring the status of ArangoSync is to
+look into the status provided by `arangosync get status ... -v`.
+When not everything is in the `running` state (on both datacenters), this is an
+indication that something may be wrong. In case that happens, give it some time
+(incremental synchronization may take quite some time for large collections)
+and look at the status again. If the statuses do not change (or change, but not reach `running`)
+it is time to inspect the metrics & log files.
+When the metrics or logs seem to indicate a problem in a sync master or worker, it is
+safe to restart it, as long as only 1 instance is restarted at a time.
+Give restarted instances some time to "catch up".
+
+### 'What if ...'
+
+Please consult the [reference manual](../../Troubleshooting/DC2DC/README.md) for detailed descriptions of what to do in case of certain
+problems and how & what information to provide to support so they can assist you best when needed.
+
+### Metrics
+
+ArangoSync (master & worker) provide metrics that can be used for monitoring the ArangoSync
+solution. These metrics are available using the following HTTPS endpoints:
+
+- GET `/metrics`: Provides metrics in a format supported by Prometheus.
+- GET `/metrics.json`: Provides the same metrics in JSON format.
+
+Both endpoints include help information per metrics.
+
+Note: Both endpoints require authentication. Besides the usual authentication methods
+these endpoints are also accessible using a special bearer token specified using the `--monitoring.token`
+command line option.
+
+Consult the [reference manual](../../Monitoring/DC2DC/README.md#metrics) for sample output of the metrics endpoints.
+
+## Security
+
+### Firewall settings
+
+The components of ArangoSync use (TCP) network connections to communicate with each other.
+
+Consult the [reference manual](../../Security/DC2DC/README.md#firewall-settings) for a detailed list of connections and the ports that should be accessible.
+
+### Certificates
+
+Digital certificates are used in many places in ArangoSync for both encryption
+and authentication.
+
+In ArangoSync all network connections are using Transport Layer Security (TLS),
+a set of protocols that ensure that all network traffic is encrypted.
+For this TLS certificates are used. The server side of the network connection
+offers a TLS certificate. This certificate is (often) verified by the client side of the network
+connection, to ensure that the certificate is signed by a trusted Certificate Authority (CA).
+This ensures the integrity of the server.
+
+In several places additional certificates are used for authentication. In those cases
+the client side of the connection offers a client certificate (on top of an existing TLS connection).
+The server side of the connection uses the client certificate to authenticate
+the client and (optionally) decides which rights should be assigned to the client.
+
+Note: ArangoSync does allow the use of certificates signed by a well-known CA (e.g. Verisign),
+however it is more convenient (and common) to use your own CA.
+
+Consult the [reference manual](../../Security/DC2DC/README.md#certificates) for detailed instructions on how to create these certificates.
+
+#### Renewing certificates
+
+All certificates have meta information in them that limits their use in function,
+target & lifetime.
+
A certificate created for client authentication (function) cannot be used as a TLS server certificate +(same is true for the reverse). +
A certificate for host `myserver` (target) cannot be used for host `anotherserver`. +
A certificate that is valid until October 2017 (lifetime) cannot be used after October 2017.
+
+If anything changes in function, target or lifetime, you need a new certificate.
+
+The procedure for creating a renewed certificate is the same as for creating a "first" certificate.
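+
+Since the procedure is identical, renewing (for example) a TLS server certificate
+is just a matter of re-running the corresponding create command; the paths and
+hostname below are hypothetical:
+
+```bash
+arangosync create tls keyfile \
+  --cacert=my-tls-ca.crt --cakey=my-tls-ca.key \
+  --host=myserver \
+  --keyfile=my-tls-cert.keyfile
+```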
+After creating the renewed certificate, the process(es) using it have to be updated.
+This means restarting them. All ArangoSync components are designed to support stopping and starting
+single instances, but do not restart more than 1 instance at the same time.
+As soon as 1 instance has been restarted, give it some time to "catch up" before restarting
+the next instance.
diff --git a/Documentation/Books/Manual/Monitoring/DC2DC/README.md b/Documentation/Books/Manual/Monitoring/DC2DC/README.md
index d2392d4e8f..fa2d4e24e1 100644
--- a/Documentation/Books/Manual/Monitoring/DC2DC/README.md
+++ b/Documentation/Books/Manual/Monitoring/DC2DC/README.md
@@ -1,16 +1,17 @@
+
 # Monitoring datacenter to datacenter replication
 
-This section includes information related to the monitoring of the _datacenter 
+This section includes information related to the monitoring of the _datacenter
 to datacenter replication_.
 
 For a general introduction to the _datacenter to datacenter replication_, please
 refer to the [Datacenter to datacenter replication](../../Scalability/DC2DC/README.md)
 chapter.
 
-# Metrics
+## Metrics
 
-_ArangoSync_ (_master_ & _worker_) provide metrics that can be used for monitoring 
-the _datacenter to datacenter repliation_ solution. These metrics are available 
+_ArangoSync_ (_master_ & _worker_) provide metrics that can be used for monitoring
+the _datacenter to datacenter replication_ solution. These metrics are available
 using the following HTTPS endpoints:
 
 - GET `/metrics`: Provides metrics in a format supported by Prometheus.
 - GET `/metrics.json`: Provides the same metrics in JSON format.
 
 Both endpoints include help information per metrics.
 
-Note: Both endpoints require authentication. Besides the usual authentication methods 
+Note: Both endpoints require authentication. Besides the usual authentication methods
 these endpoints are also accessible using a special bearer token specified using the `--monitoring.token`
 command line option.
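+
+For example, a quick manual check of a sync master's metrics could look like this
+(hypothetical host; `-k` skips TLS verification and is only acceptable when you
+use self-signed certificates):
+
+```bash
+curl -k -H "Authorization: Bearer ${MONITORINGTOKEN}" \
+  https://dc-a-master1.example.com:8629/metrics
+```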
diff --git a/Documentation/Books/Manual/SUMMARY.md b/Documentation/Books/Manual/SUMMARY.md index 65aa4341ce..90505e1435 100644 --- a/Documentation/Books/Manual/SUMMARY.md +++ b/Documentation/Books/Manual/SUMMARY.md @@ -16,6 +16,8 @@ * [Coming from SQL](GettingStarted/ComingFromSql.md) # https://@github.com/arangodb-helper/arangodb.git;arangodb;docs/Manual;;/ * [ArangoDB Starter](GettingStarted/Starter/README.md) +# https://@github.com/arangodb/arangosync.git;arangosync;docs/Manual;;/ + * [Datacenter to datacenter Replication](GettingStarted/DC2DC/README.md) # * [Coming from MongoDB](GettingStarted/ComingFromMongoDb.md) #TODO # * [Highlights](Highlights.md) @@ -25,6 +27,7 @@ * [Architecture](Scalability/Architecture.md) * [Data models](Scalability/DataModels.md) * [Limitations](Scalability/Limitations.md) +# https://@github.com/arangodb/arangosync.git;arangosync;docs/Manual;;/ * [Datacenter to datacenter replication](Scalability/DC2DC/README.md) * [Introduction](Scalability/DC2DC/Introduction.md) * [Applicability](Scalability/DC2DC/Applicability.md) @@ -146,6 +149,7 @@ * [Cluster: Local test setups](Deployment/Local.md) * [Cluster: Processes](Deployment/Distributed.md) * [Cluster: Docker](Deployment/Docker.md) +# https://@github.com/arangodb/arangosync.git;arangosync;docs/Manual;;/ * [Multiple Datacenters](Deployment/DC2DC.md) * [Cluster](Deployment/DC2DC/Cluster.md) * [Kafka & Zookeeper](Deployment/DC2DC/KafkaZookeeper.md) @@ -211,6 +215,7 @@ * [Configuration](Administration/Replication/Synchronous/Configuration.md) * [Satellite Collections](Administration/Replication/Synchronous/Satellites.md) * [Cluster](Administration/Cluster/README.md) +# https://@github.com/arangodb/arangosync.git;arangosync;docs/Manual;;/ * [Datacenter to datacenter replication](Administration/DC2DC/README.md) * [Sharding](Administration/Sharding/README.md) # * [Authentication](Administration/Sharding/Authentication.md) @@ -233,12 +238,15 @@ * [Datafile Debugger](Troubleshooting/DatafileDebugger.md) * [Arangobench](Troubleshooting/Arangobench.md) * [Cluster](Troubleshooting/Cluster/README.md) +# https://@github.com/arangodb/arangosync.git;arangosync;docs/Manual;;/ * [Datacenter to datacenter replication](Troubleshooting/DC2DC/README.md) # * [Monitoring](Monitoring/README.md) +# https://@github.com/arangodb/arangosync.git;arangosync;docs/Manual;;/ * [Datacenter to datacenter replication](Monitoring/DC2DC/README.md) # * [Security](Security/README.md) +# https://@github.com/arangodb/arangosync.git;arangosync;docs/Manual;;/ * [Datacenter to datacenter replication](Security/DC2DC/README.md) # * [Architecture](Architecture/README.md) diff --git a/Documentation/Books/Manual/Scalability/DC2DC/Applicability.md b/Documentation/Books/Manual/Scalability/DC2DC/Applicability.md index 573b15bfe4..bdca383836 100644 --- a/Documentation/Books/Manual/Scalability/DC2DC/Applicability.md +++ b/Documentation/Books/Manual/Scalability/DC2DC/Applicability.md @@ -1,11 +1,13 @@ + # When to use it... and when not The _datacenter to datacenter replication_ is a good solution in all cases where -you want to replicate data from one cluster to another without the requirement +you want to replicate data from one cluster to another without the requirement that the data is available immediately in the other cluster. The _datacenter to datacenter replication_ is not a good solution when one of the following applies: + - You want to replicate data from cluster A to cluster B and from cluster B to cluster A at the same time. 
- You need synchronous replication between 2 clusters.
 - There is no network connection between cluster A and B.
 - You want complete control over which database, collection & documents are replicated and which are not.
diff --git a/Documentation/Books/Manual/Scalability/DC2DC/Introduction.md b/Documentation/Books/Manual/Scalability/DC2DC/Introduction.md
index b471ae1860..a50585d632 100644
--- a/Documentation/Books/Manual/Scalability/DC2DC/Introduction.md
+++ b/Documentation/Books/Manual/Scalability/DC2DC/Introduction.md
@@ -1,17 +1,19 @@
+
 # Introduction
 
-At some point in the grows of a database, there comes a need for replicating it 
+At some point in the growth of a database, there comes a need for replicating it
 across multiple datacenters.
 
 Reasons for that can be:
+
 - Fallback in case of a disaster in one datacenter
-- Regional availability 
-- Separation of concerns 
+- Regional availability
+- Separation of concerns
 
 And many more.
 
-Starting from version 3.3, ArangoDB supports _datacenter to datacenter 
-replication_, via the _ ArangoSync_ tool.
+Starting from version 3.3, ArangoDB supports _datacenter to datacenter
+replication_, via the _ArangoSync_ tool.
 
 ArangoDB's _datacenter to datacenter replication_ is a solution that enables you
 to asynchronously replicate the entire structure and content in an ArangoDB Cluster
 to another.
 
 ![ArangoDB DC2DC](dc2dc.png)
 
@@ -21,25 +23,25 @@ to another.
 
-The replication done by _ArangoSync_ in **asynchronous**. That means that when 
-a client is writing data into the source datacenter, it will consider the 
+The replication done by _ArangoSync_ is **asynchronous**. That means that when
+a client is writing data into the source datacenter, it will consider the
 request finished before the data has been replicated to the other datacenter.
 The time needed to completely replicate changes to the other datacenter is
-typically in the order of seconds, but this can vary significantly depending on 
+typically in the order of seconds, but this can vary significantly depending on
 load, network & computer capacity.
 
-_ArangoSync_ performs replication in a **single direction** only. That means that 
-you can replicate data from cluster _A_ to cluster _B_ or from cluster _B_ to 
-cluster _A_, but never at the same time.
+_ArangoSync_ performs replication in a **single direction** only. That means that
+you can replicate data from cluster _A_ to cluster _B_ or from cluster _B_ to
+cluster _A_, but never at the same time.
Data modified in the destination cluster **will be lost!** -Replication is a completely **autonomous** process. Once it is configured it is +Replication is a completely **autonomous** process. Once it is configured it is designed to run 24/7 without frequent manual intervention.
This does not mean that it requires no maintenance or attention at all. -
As with any distributed system some attention is needed to monitor its operation +
As with any distributed system some attention is needed to monitor its operation and keep it secure (e.g. certificate & password rotation). -Once configured, _ArangoSync_ will replicate both **structure and data** of an -**entire cluster**. This means that there is no need to make additional configuration +Once configured, _ArangoSync_ will replicate both **structure and data** of an +**entire cluster**. This means that there is no need to make additional configuration changes when adding/removing databases or collections.
Also meta data such as users, foxx application & jobs are automatically replicated. diff --git a/Documentation/Books/Manual/Scalability/DC2DC/README.md b/Documentation/Books/Manual/Scalability/DC2DC/README.md index 457641624e..22d2934c0d 100644 --- a/Documentation/Books/Manual/Scalability/DC2DC/README.md +++ b/Documentation/Books/Manual/Scalability/DC2DC/README.md @@ -1,3 +1,4 @@ + # Datacenter to datacenter replication This chapter introduces ArangoDB's _datacenter to datacenter replication_ (DC2DC). @@ -5,8 +6,8 @@ This chapter introduces ArangoDB's _datacenter to datacenter replication_ (DC2DC For further information about _datacenter to datacenter replication_, please refer to the following sections: -- [Deployment](..\..\Deployment\DC2DC.md) -- [Administration](..\..\Administration\DC2DC\README.md) -- [Troubleshooting](..\..\Troubleshooting\DC2DC\README.md) -- [Monitoring](..\..\Monitoring\DC2DC\README.md) -- [Security](..\..\Security\DC2DC\README.md) +- [Deployment](../../Deployment/DC2DC.md) +- [Administration](../../Administration/DC2DC/README.md) +- [Troubleshooting](../../Troubleshooting/DC2DC/README.md) +- [Monitoring](../../Monitoring/DC2DC/README.md) +- [Security](../../Security/DC2DC/README.md) diff --git a/Documentation/Books/Manual/Scalability/DC2DC/Requirements.md b/Documentation/Books/Manual/Scalability/DC2DC/Requirements.md index f450410a0c..1d49862f99 100644 --- a/Documentation/Books/Manual/Scalability/DC2DC/Requirements.md +++ b/Documentation/Books/Manual/Scalability/DC2DC/Requirements.md @@ -1,4 +1,5 @@ -# Requirements + +# Requirements To use _datacenter to datacenter replication_ you need the following: @@ -13,8 +14,8 @@ To use _datacenter to datacenter replication_ you need the following: - At least 2 instances of the _ArangoSync master_ in each datacenter. - One instances of the _ArangoSync worker_ on every machine in each datacenter. -Note: In several places you will need a (x509) certificate. -
The [Certificates](..\..\Security\DC2DC\README.md#certificates) section provides more guidance for creating +Note: In several places you will need a (x509) certificate. +
The [Certificates](../../Security/DC2DC/README.md#certificates) section provides more guidance for creating and renewing these certificates. Besides the above list, you probably want to use the following: diff --git a/Documentation/Books/Manual/Security/DC2DC/README.md b/Documentation/Books/Manual/Security/DC2DC/README.md index aae7e5498c..11ecb5b666 100644 --- a/Documentation/Books/Manual/Security/DC2DC/README.md +++ b/Documentation/Books/Manual/Security/DC2DC/README.md @@ -1,3 +1,4 @@ + # Datacenter to datacenter Security This section includes information related to the _datacenter to datacenter replication_ @@ -54,25 +55,25 @@ Below you'll find an overview of these connections and the TCP ports that should be able to connect to all of these ports. By default Zookeeper uses: - + - port `2181` for client communication - port `2888` for follower communication - port `3888` for leader elections -## Certificates +## Certificates -Digital certificates are used in many places in _ArangoSync_ for both encryption +Digital certificates are used in many places in _ArangoSync_ for both encryption and authentication.
In ArangoSync all network connections are using Transport Layer Security (TLS), a set of protocols that ensure that all network traffic is encrypted. -For this TLS certificates are used. The server side of the network connection +For this TLS certificates are used. The server side of the network connection offers a TLS certificate. This certificate is (often) verified by the client side of the network -connection, to ensure that the certificate is signed by a trusted Certificate Authority (CA). +connection, to ensure that the certificate is signed by a trusted Certificate Authority (CA). This ensures the integrity of the server. -
In several places additional certificates are used for authentication. In those cases +
In several places additional certificates are used for authentication. In those cases
+the client side of the connection offers a client certificate (on top of an existing TLS connection).
-The server side of the connection uses the client certificate to authenticate 
+The server side of the connection uses the client certificate to authenticate
 the client and (optionally) decides which rights should be assigned to the client.
 
 Note: ArangoSync does allow the use of certificates signed by a well-known CA (e.g. Verisign),
 however it is more convenient (and common) to use your own CA.
 
 ### Formats
 
-All certificates are x509 certificates with a public key, a private key and 
-an optional chain of certificates used to sign the certificate (this chain is 
+All certificates are x509 certificates with a public key, a private key and
+an optional chain of certificates used to sign the certificate (this chain is
 typically provided by the Certificate Authority (CA)).
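+
+If you need to check what is inside one of these files, plain OpenSSL (not part
+of ArangoSync, shown here only as a generic sketch) can print the subject,
+hostnames and validity period of a certificate:
+
+```bash
+# Inspect a certificate in public-key (.crt) format (hypothetical file name).
+openssl x509 -in my-tls-cert.crt -noout -text
+```
+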
 Depending on their use, certificates are stored in different formats.
 
 The following formats are used:
 
-- Public key only (`.crt`): A file that contains only the public key of 
+- Public key only (`.crt`): A file that contains only the public key of
   a certificate with an optional chain of parent certificates (public keys of certificates used to sign the certificate).
-
Since this format contains only public keys, it is not a problem if its contents +
Since this format contains only public keys, it is not a problem if its contents
 are exposed. It must still be stored in a safe place to avoid losing it.
- Private key only (`.key`): A file that contains only the private key of a certificate.
It is vital to protect these files and store them in a safe place. -- Keyfile with public & private key (`.keyfile`): A file that contains the public key of +- Keyfile with public & private key (`.keyfile`): A file that contains the public key of a certificate, an optional chain of parent certificates and a private key. -
Since this format also contains a private key, it is vital to protect these files +
Since this format also contains a private key, it is vital to protect these files and store them in a safe place. - Java keystore (`.jks`): A file containing a set of public and private keys.
It is possible to protect access to the content of this file using a keystore password. -
Since this format can contain private keys, it is vital to protect these files +
Since this format can contain private keys, it is vital to protect these files and store them in a safe place (even when its content is protected with a keystore password). ### Creating certificates @@ -110,7 +111,7 @@ ArangoSync provides commands to create all certificates needed. #### TLS server certificates To create a certificate used for TLS servers in the **keyfile** format, -you need the public key of the CA (`--cacert`), the private key of +you need the public key of the CA (`--cacert`), the private key of the CA (`--cakey`) and one or more hostnames (or IP addresses). Then run: @@ -124,7 +125,7 @@ arangosync create tls keyfile \ Make sure to store the generated keyfile (`my-tls-cert.keyfile`) in a safe place. To create a certificate used for TLS servers in the **crt** & **key** format, -you need the public key of the CA (`--cacert`), the private key of +you need the public key of the CA (`--cacert`), the private key of the CA (`--cakey`) and one or more hostnames (or IP addresses). Then run: @@ -141,15 +142,17 @@ Make sure to protect and store the generated files (`my-tls-cert.crt` & `my-tls- #### Client authentication certificates To create a certificate used for client authentication in the **keyfile** format, -you need the public key of the CA (`--cacert`), the private key of +you need the public key of the CA (`--cacert`), the private key of the CA (`--cakey`) and one or more hostnames (or IP addresses) or email addresses. Then run: -``` + +```bash arangosync create client-auth keyfile \ --cacert=my-client-auth-ca.crt --cakey=my-client-auth-ca.key \ [--host= | --email=] \ - --keyfile=my-client-auth-cert.keyfile + --keyfile=my-client-auth-cert.keyfile ``` + Make sure to protect and store the generated keyfile (`my-client-auth-cert.keyfile`) in a safe place. #### CA certificates @@ -162,15 +165,17 @@ arangosync create tls ca \ ``` Make sure to protect and store both generated files (`my-tls-ca.crt` & `my-tls-ca.key`) in a safe place. -
Note: CA certificates have a much longer lifetime than normal certificates. +
Note: CA certificates have a much longer lifetime than normal certificates. Therefore even more care is needed to store them safely. To create a CA certificate used to **sign client authentication certificates**, run: -``` + +```bash arangosync create client-auth ca \ - --cert=my-client-auth-ca.crt --key=my-client-auth-ca.key + --cert=my-client-auth-ca.crt --key=my-client-auth-ca.key ``` -Make sure to protect and store both generated files (`my-client-auth-ca.crt` & `my-client-auth-ca.key`) + +Make sure to protect and store both generated files (`my-client-auth-ca.crt` & `my-client-auth-ca.key`) in a safe place.
Note: CA certificates have a much longer lifetime than normal certificates.
 Therefore even more care is needed to store them safely.
diff --git a/Documentation/Books/Manual/Troubleshooting/DC2DC/README.md b/Documentation/Books/Manual/Troubleshooting/DC2DC/README.md
index a623802132..978a228071 100644
--- a/Documentation/Books/Manual/Troubleshooting/DC2DC/README.md
+++ b/Documentation/Books/Manual/Troubleshooting/DC2DC/README.md
@@ -1,17 +1,18 @@
+
 # Troubleshooting datacenter to datacenter replication
 
-The _datacenter to datacenter replication_ is a distributed system with a lot 
-different components. As with any such system, it requires some, but not a lot, 
-of operational support. 
+The _datacenter to datacenter replication_ is a distributed system with a lot of
+different components. As with any such system, it requires some, but not a lot,
+of operational support.
 
 This section includes information on how to troubleshoot the
 _datacenter to datacenter replication_.
 
 For a general introduction to the _datacenter to datacenter replication_, please
-refer to the [Datacenter to datacenter replication](..\..\Scalability\DC2DC\README.md)
+refer to the [Datacenter to datacenter replication](../../Scalability/DC2DC/README.md)
 chapter.
 
-## What means are available to monitor status 
+## What means are available to monitor status
 
 All of the components of _ArangoSync_ provide means to monitor their status.
 Below you'll find an overview per component.
 
@@ -25,7 +26,7 @@ Below you'll find an overview per component.
   - A metrics API `GET /metrics`. This API is compatible with Prometheus.
     Sample Grafana dashboards for inspecting these metrics are available.
 
-- ArangoDB cluster: The `arangod` servers that make up the ArangoDB cluster 
+- ArangoDB cluster: The `arangod` servers that make up the ArangoDB cluster
   provide:
   - A log file. This is configurable with settings with a `log.` prefix.
     E.g. `--log.output=file://myLogFile` or `--log.level=info`.
@@ -39,18 +40,18 @@ Below you'll find an overview per component.
 
 ## What to look for while monitoring status
 
-The very first thing to do when monitoring the status of ArangoSync is to 
+The very first thing to do when monitoring the status of ArangoSync is to
 look into the status provided by `arangosync get status ... -v`.
-When not everything is in the `running` state (on both datacenters), this is an 
-indication that something may be wrong. In case that happens, give it some time 
+When not everything is in the `running` state (on both datacenters), this is an
+indication that something may be wrong. In case that happens, give it some time
 (incremental synchronization may take quite some time for large collections)
 and look at the status again. If the statuses do not change (or change, but not reach `running`)
 it is time to inspect the metrics & log files.
-
When the metrics or logs seem to indicate a problem in a sync master or worker, it is +
When the metrics or logs seem to indicate a problem in a sync master or worker, it is
 safe to restart it, as long as only 1 instance is restarted at a time.
 Give restarted instances some time to "catch up".
 
 ## What to do when problems remain
 
 When a problem remains and restarting masters/workers does not solve the problem,
 contact support. Make sure to provide support with the following information:
@@ -70,24 +71,24 @@ contact support. Make sure to provide support with the following informa
 
 ## What to do when a source datacenter is down
 
 When you use ArangoSync for backup of your cluster from one datacenter
-to another and the source datacenter has a complete outage, you may consider 
+to another and the source datacenter has a complete outage, you may consider
 switching your applications to the target (backup) datacenter.
 
 This is what you must do in that case:
 
-1. [Stop synchronization](..\..\Administration\DC2DC\README.md#stoping-synchronization) using:
+1. [Stop synchronization](../../Administration/DC2DC/README.md#stopping-synchronization) using:
 
    ```bash
   arangosync stop sync ...
   ```
 
   When the source datacenter is completely unresponsive this will not succeed.
   In that case use:
-   
+
   ```bash
   arangosync abort sync ...
-  ``` 
-  
-  See [Stoping synchronization](..\..\Administration\DC2DC\README.md#stoping-synchronization)
+  ```
+
+  See [Stopping synchronization](../../Administration/DC2DC/README.md#stopping-synchronization)
   for how to cleanup the source datacenter when it becomes available again.
1. Verify that configuration has completely stopped using:
   ```bash
   arangosync get status ... -v
   ```
1. Reconfigure your applications to use the target (backup) datacenter.
 
-When the original source datacenter is restored, you may switch roles and 
-make it the target datacenter. To do so, use `arangosync configure sync ...` 
-as described in [Reversing synchronization direction](..\..\Administration\DC2DC\README.md#reversing-synchronization-direction).
+When the original source datacenter is restored, you may switch roles and
+make it the target datacenter. To do so, use `arangosync configure sync ...`
+as described in [Reversing synchronization direction](../../Administration/DC2DC/README.md#reversing-synchronization-direction).
 
 ## What to do in case of a planned network outage
 
-All ArangoSync tasks send out heartbeat messages out to the other datacenter 
-to indicate "it is still alive". The other datacenter assumes the connection is 
+All ArangoSync tasks send heartbeat messages out to the other datacenter
+to indicate "it is still alive". The other datacenter assumes the connection is
 "out of sync" when it does not receive any messages for a certain period of time.
 
-If you're planning some sort of maintenance where you know the connectivity 
-will be lost for some time (e.g. 3 hours), you can prepare ArangoSync for that 
-such that it will hold of re-synchronization for a given period of time.
+If you're planning some sort of maintenance where you know the connectivity
+will be lost for some time (e.g. 3 hours), you can prepare ArangoSync for that
+such that it will hold off re-synchronization for a given period of time.
 
 To do so, on both datacenters, run:
 
 ```bash
 arangosync set message timeout \
     --master.endpoint= \
     --auth.user= \
     --auth.password= \
     3h
 ```
+
+The last argument is the period that ArangoSync should hold off resynchronization for.
+This can be minutes (e.g. `15m`) or hours (e.g. `3h`).
 
 If maintenance is taking longer than expected, you can use the same command to extend
-the hold of period (e.g. to `4h`).
+the hold-off period (e.g. to `4h`).
 
-After the maintenance, use the same command restore the hold of period to its 
+After the maintenance, use the same command to restore the hold-off period to its
 default of `1h`.
 
 ## What to do in case of a document that exceeds the message queue limits