
Manual polishing.

Max Neunhoeffer 2016-06-23 10:53:51 +02:00
parent ac4ed25dba
commit c1fe21df03
1 changed file with 14 additions and 14 deletions


@@ -4,7 +4,7 @@
Synchronous replication can be configured per collection via the property *replicationFactor*. Synchronous replication requires a cluster to operate.
-Whenever you specify a *replicationFactor* greater than 1 when creating a collection, synchronous replication will be activated for this collection. The cluster will determine suitable *leaders* and *followers* for every requested shard (*numberOfShards*) within the cluster. When requesting data of a shard, only the current leader will be asked, whereas followers will only keep their copy in sync. Using *synchronous replication* alone will guarantee consistency and high availability at the cost of reduced performance (due to every write request having to be executed on the followers). Combining it with [sharding](Sharding.md) will counteract that issue.
+Whenever you specify a *replicationFactor* greater than 1 when creating a collection, synchronous replication will be activated for this collection. The cluster will determine suitable *leaders* and *followers* for every requested shard (*numberOfShards*) within the cluster. When requesting data of a shard, only the current leader will be asked, whereas followers will only keep their copy in sync. Using *synchronous replication* alone will guarantee consistency and high availability at the cost of reduced performance (due to every write request having to be executed on the followers).
In a cluster, synchronous replication will be managed by the *coordinators* for the client. The data will always be stored on *primaries*.
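For context, activating synchronous replication when creating a collection might look like the following arangosh sketch. The collection name test and the concrete values for *numberOfShards* and *replicationFactor* are illustrative assumptions, not part of this commit:

127.0.0.1:8530@_system> db._create("test", {"numberOfShards": 4, "replicationFactor": 2})
127.0.0.1:8530@_system> db.test.properties().replicationFactor
2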
@@ -21,7 +21,7 @@ The following example will give you an idea of how synchronous operation has bee
127.0.0.1:8530@_system> db.test.insert({"replication": "😎"})
4. The coordinator will write the data to the leader, which in turn will
-replicate it to the follower
+replicate it to the follower.
5. Only when both were successful is the result reported to be successful:
{
@@ -38,20 +38,20 @@ replicate it to the follower
!SUBSECTION Automatic failover
-Whenever the leader of a shard is failing and there is a query trying to access data of that shard, the coordinator will continue trying to contact the leader until it times out. Every 15 seconds the internal cluster supervision will validate cluster health. If the leader didn't come back in time the supervision will reorganize the cluster. The coordinator will then contact the new leader.
+Whenever the leader of a shard is failing and there is a query trying to access data of that shard, the coordinator will continue trying to contact the leader until it times out. The internal cluster supervision will check cluster health every few seconds and will take action if there is no heartbeat from a server for 15 seconds. If the leader doesn't come back in time, the supervision will reorganize the cluster by promoting, for each shard, a follower that is in sync with its leader to be the new leader. From then on the coordinators will contact the new leader.
The process is best outlined using an example:
-1. Leader of a shard (let's name it DBServer1) is going down
-2. Coordinator is asked to return a document of a shard DBServer1 is managing:
+1. The leader of a shard (let's name it DBServer001) is going down.
+2. A coordinator is asked to return a document of a shard DBServer001 is managing:
127.0.0.1:8530@_system> db.test.document("100069")
-3. Coordinator tries to contact the leader (DBServer1) and times out
-4. Coordinator retries to contact the leader (DBServer1) and times out
-5. Supervision detects outage of DBServer1
-6. Supervision promotes one of the followers to be leader and makes DBServer1 a follower
-7. Coordinator retries to contact the leader (DBServer2) and returns the result
+3. The coordinator tries to contact the leader (DBServer001) and times out.
+4. The coordinator retries to contact the leader (DBServer001) and times out.
+5. The supervision detects the outage of DBServer001.
+6. The supervision promotes one of the followers (say DBServer002) that is in sync to be the new leader and makes DBServer001 a follower.
+7. The coordinator retries to contact the new leader (DBServer002) and returns the result:
{
"_key" : "100069",
@@ -59,8 +59,8 @@ The process is best outlined using an example:
"_rev" : "513",
"replication" : "😎"
}
-8. After a while the supervision declares DBServer1 to be completely dead
-9. New followers are determined from the pool of DBServers
-10. New followers sync their data from the leader
+8. After a while the supervision declares DBServer001 to be completely dead.
+9. A new follower is determined from the pool of DBServers.
+10. The new follower syncs its data from the leader and order is restored.
-Please note that there may still be timeouts. Depending on when exactly the request has been made (with regard to the supervision heartbeat) and depending on the time needed to reconfigure the cluster, the coordinator might fail with a timeout error!
+Please note that there may still be timeouts. Depending on when exactly the request has been made (with regard to the supervision) and depending on the time needed to reconfigure the cluster, the coordinator might fail with a timeout error!
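To illustrate the note above, a client can simply retry a read that hits such a failover window. This is a hedged arangosh sketch, not part of the commit or of ArangoDB's own API; the helper name, the document key 100069 and the retry parameters are assumptions:

// Hypothetical retry helper: keep asking for a document while a new shard leader is being elected.
function documentWithRetry(collection, key, attempts) {
  for (var i = 0; i < attempts; i++) {
    try {
      return collection.document(key);      // normally answered by the current shard leader
    } catch (err) {
      require("internal").wait(5);          // give the supervision a few seconds to promote a follower
    }
  }
  throw new Error("document " + key + " still unavailable after " + attempts + " attempts");
}

documentWithRetry(db.test, "100069", 6);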