Add documentation for synchronous replication.

2016-06-10 16:56:31 +02:00 · 2016-06-10 16:56:31 +02:00 · 3af860b3e8
parent 794a08dd86
commit 3af860b3e8
2 changed files with 55 additions and 0 deletions
--- a/Documentation/Books/Manual/DataModeling/Collections/DatabaseMethods.mdpp
+++ b/Documentation/Books/Manual/DataModeling/Collections/DatabaseMethods.mdpp
@@ -125,6 +125,19 @@ to the [naming conventions](../NamingConventions/README.md).
  attribute and this can only be done efficiently if this is the
  only shard key by delegating to the individual shards.

+* *replicationFactor* (optional, default is 1): in a cluster, this
+  attribute determines how many copies of each shard are kept on
+  different DBServers. The value 1 means that only one copy is kept
+  (no synchronous replication). A value of k means that k copies are
+  kept in total, that is, k-1 replicas in addition to the leader copy.
+  Any two copies reside on different DBServers. Replication between
+  them is synchronous: every write operation to the "leader" copy is
+  replicated to all "follower" replicas before the write operation is
+  reported as successful.
+
+  If a server fails, this is detected automatically and one of the
+  servers holding copies takes over, usually without an error being
+  reported.
+
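A minimal sketch of how the *replicationFactor* property could be passed, written in plain JavaScript so that it also runs outside arangosh. The helper `makeClusterCollectionProperties`, its `dbServerCount` parameter and the collection name `orders` are hypothetical. The check that k does not exceed the number of DBServers is an assumption derived from the rule that any two copies reside on different DBServers, not a description of ArangoDB's own validation.

```javascript
// Hypothetical helper, not part of ArangoDB: builds the properties object that
// would be passed as the second argument of db._create() in arangosh.
function makeClusterCollectionProperties(numberOfShards, replicationFactor, dbServerCount) {
  if (!Number.isInteger(replicationFactor) || replicationFactor < 1) {
    throw new RangeError("replicationFactor must be an integer >= 1");
  }
  // Any two copies of a shard reside on different DBServers, so k copies
  // need at least k DBServers (assumption derived from the text above).
  if (replicationFactor > dbServerCount) {
    throw new RangeError(
      `replicationFactor ${replicationFactor} exceeds the ${dbServerCount} available DBServers`
    );
  }
  return {
    numberOfShards: numberOfShards,
    replicationFactor: replicationFactor, // k copies: 1 leader + (k - 1) followers
  };
}

const props = makeClusterCollectionProperties(4, 2, 3);
console.log(JSON.stringify(props)); // {"numberOfShards":4,"replicationFactor":2}

// In arangosh, connected to a coordinator, one would then call e.g.:
//   db._create("orders", props);
```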
 `db._create(collection-name, properties, type)`

 Specifies the optional *type* of the collection, it can either be *document* 
--- a/Documentation/Books/Manual/DataModeling/Collections/README.mdpp
+++ b/Documentation/Books/Manual/DataModeling/Collections/README.mdpp
@@ -44,3 +44,45 @@ use

 This call will create a new collection called *collection-name*.
 This method is a database method and is documented in detail at [Database Methods](DatabaseMethods.md#create)
+
+!SUBSECTION Synchronous replication
+
+Starting with ArangoDB 3.0, the distributed version offers synchronous
+replication, that is, the option to replicate all data automatically
+within the ArangoDB cluster. This is configured for sharded collections
+on a per-collection basis by specifying a "replication factor" when the
+collection is created. A replication factor of k means that altogether
+k copies of each shard are kept in the cluster on k different servers,
+and are kept in sync. That is, every write operation is automatically
+replicated to all copies.
+
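To make "k copies on k different servers" concrete, here is a hypothetical round-robin placement in JavaScript. It is not ArangoDB's shard distribution algorithm; the shard names `s0`, `s1`, and so on, and the server names are made up, and treating the first server in each list as the leader is an assumption for illustration only.

```javascript
// Hypothetical placement sketch, NOT ArangoDB's actual algorithm: gives each
// shard k copies on k distinct servers, rotating the starting server so that
// leaders are spread across the cluster. The first entry plays the leader.
function placeShards(numberOfShards, replicationFactor, servers) {
  if (replicationFactor > servers.length) {
    throw new RangeError("need at least replicationFactor distinct servers");
  }
  const placement = {};
  for (let s = 0; s < numberOfShards; s++) {
    const copies = [];
    for (let c = 0; c < replicationFactor; c++) {
      // (s + c) mod n yields replicationFactor distinct indices
      // as long as replicationFactor <= servers.length.
      copies.push(servers[(s + c) % servers.length]);
    }
    placement[`s${s}`] = copies;
  }
  return placement;
}

const placement = placeShards(3, 2, ["DBServer1", "DBServer2", "DBServer3"]);
console.log(placement);
// each shard gets 2 distinct servers, e.g. s2 -> DBServer3 (leader), DBServer1
```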
+This is organized using a leader/follower model. At all times, one of the
+servers holding replicas for a shard is "the leader" and all others
+are "followers". This configuration is held in the Agency (see
+[Scalability](../../Scalability/README.md) for details of the ArangoDB
+cluster architecture). Every write operation is sent to the leader
+by one of the coordinators, and then replicated to all followers
+before the operation is reported to have succeeded. The leader keeps
+a record of which followers are currently in sync. In case of network
+problems or a failure of a follower, the leader can and will drop that
+follower temporarily after 3 seconds, so that service can resume. In due
+course, the follower will automatically resynchronize with the leader to
+restore resilience.
+
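The write path and the 3-second follower drop described above can be sketched as a small synchronous simulation. Everything in it (`ShardLeader`, `ackLatencyMs`, the in-memory `data` arrays) is a hypothetical model chosen for illustration: simulated latencies stand in for real network calls, and the later resynchronization of a dropped follower is not modeled.

```javascript
// Simplified, synchronous model of the write path; NOT ArangoDB code.
// Each follower has a simulated acknowledgement latency in ms (Infinity = unreachable).
const FOLLOWER_TIMEOUT_MS = 3000; // "drop a follower temporarily after 3 seconds"

class ShardLeader {
  constructor(followers) {
    this.data = [];
    this.followers = followers;
    this.inSync = new Set(followers.map((f) => f.name)); // followers currently in sync
  }

  write(doc) {
    this.data.push(doc); // apply the write on the leader first
    // Replication to all in-sync followers runs concurrently, so the leader
    // waits for the slowest of them, but never longer than the timeout.
    let waitedMs = 0;
    for (const f of this.followers) {
      if (!this.inSync.has(f.name)) continue; // dropped followers resync later (not modeled)
      if (f.ackLatencyMs > FOLLOWER_TIMEOUT_MS) {
        this.inSync.delete(f.name); // drop the follower so service can resume
        waitedMs = Math.max(waitedMs, FOLLOWER_TIMEOUT_MS);
      } else {
        f.data.push(doc); // follower applied the write and acknowledged it
        waitedMs = Math.max(waitedMs, f.ackLatencyMs);
      }
    }
    // Only now is the write reported as successful to the coordinator.
    return { ok: true, waitedMs, inSync: [...this.inSync] };
  }
}

const follower1 = { name: "follower1", ackLatencyMs: 2, data: [] };
const follower2 = { name: "follower2", ackLatencyMs: Infinity, data: [] }; // network problem
const leader = new ShardLeader([follower1, follower2]);
console.log(leader.write({ _key: "a" })); // waitedMs 3000; only follower1 still in sync
console.log(leader.write({ _key: "b" })); // waitedMs 2; follower2 is no longer waited for
```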
+If a leader fails, the cluster Agency automatically initiates a failover
+routine after around 15 seconds, promoting one of the followers to
+leader. The other followers (and the former leader, when it comes back)
+automatically resynchronize with the new leader to restore resilience.
+Usually, this whole failover procedure can be handled transparently
+for the coordinator, so that user code does not even see an error
+message.
+
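A similarly hedged sketch of the failover decision: `superviseShard`, the heartbeat map and the shard record are hypothetical stand-ins, not the Agency's real data structures. The sketch only encodes the rules stated above: after roughly 15 seconds without the leader, a follower is promoted, and the remaining followers and the former leader resynchronize with it.

```javascript
// Hypothetical model of the failover rule described above; these are not the
// Agency's real data structures or supervision code.
const FAILOVER_AFTER_MS = 15000; // "after around 15 seconds"

function superviseShard(shard, lastHeartbeatMs, nowMs) {
  const silentForMs = nowMs - lastHeartbeatMs[shard.leader];
  if (silentForMs < FAILOVER_AFTER_MS || shard.inSyncFollowers.length === 0) {
    return shard; // leader still healthy, or no in-sync follower to promote
  }
  const [newLeader, ...otherFollowers] = shard.inSyncFollowers;
  return {
    leader: newLeader,
    inSyncFollowers: [],
    // The other followers and the former leader (once it is back) must
    // resynchronize with the new leader before they count as in sync again.
    resyncing: [...shard.resyncing, ...otherFollowers, shard.leader],
  };
}

const shard = { leader: "DBServer1", inSyncFollowers: ["DBServer2", "DBServer3"], resyncing: [] };
const lastHeartbeat = { DBServer1: 0, DBServer2: 16000, DBServer3: 16000 };
console.log(superviseShard(shard, lastHeartbeat, 16000));
// new leader DBServer2; DBServer3 and DBServer1 are resynchronizing
```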
+This fault tolerance comes at the cost of increased latency.
+Each write operation needs an additional network roundtrip for the
+synchronous replication to the followers, but all replication operations
+to all followers happen concurrently. This is why the default replication
+factor is 1, which means no replication.
+
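The latency trade-off can be checked with simple arithmetic. The function below is a back-of-the-envelope model, not a benchmark, and the millisecond figures are invented: because replication to all followers runs concurrently, a write costs roughly the leader's local work plus the slowest follower roundtrip, rather than the sum of all roundtrips.

```javascript
// Back-of-the-envelope model (an assumption, not a measurement): the leader
// sends the write to all followers concurrently, so replication adds the
// slowest follower's roundtrip, not the sum of all roundtrips.
function writeLatencyMs(leaderLocalMs, followerRoundtripsMs) {
  const replicationMs = followerRoundtripsMs.length === 0
    ? 0 // replication factor 1: no followers, no extra roundtrip
    : Math.max(...followerRoundtripsMs);
  return leaderLocalMs + replicationMs;
}

// Hypothetical numbers: 1 ms local write, followers answering in 2 ms and 3 ms.
console.log(writeLatencyMs(1, []));     // 1  (replication factor 1)
console.log(writeLatencyMs(1, [2, 3])); // 4  (replication factor 3, concurrent)
// Sequential replication would instead cost 1 + 2 + 3 = 6 ms.
```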
+For details on how to switch on synchronous replication for a collection,
+see the *replicationFactor* property of the database method
+`db._create(collection-name, properties)` in the section about
+[Database Methods](DatabaseMethods.md#create).