mirror of https://gitee.com/bigwinds/arangodb
A lot of progress with the Scalability chapter, nearly ready.
This commit is contained in:
parent 8adf1ac3b6
commit 0d5703c334

@ -18,8 +18,9 @@
# * [Coming from MongoDB](GettingStarted/ComingFromMongoDb.md) #TODO
#
* [Scalability](Scalability/README.md)
* [Cluster](Scalability/Cluster.md)
# * [Joins](Scalability/Joins.md)
* [Architecture](Scalability/Architecture.md)
* [Data models](Scalability/DataModels.md)
* [Limitations](Scalability/Limitations.md)
#
* [Data model & modeling](DataModeling/README.md)
# * [Collections](FirstSteps/CollectionsAndDocuments.md) #TODO

@ -142,8 +143,8 @@
* [Implementation](Administration/Replication/Synchronous/Implementation.md)
* [Configuration](Administration/Replication/Synchronous/Configuration.md)
* [Sharding](Administration/Sharding/README.md)
* [Authentication](Administration/Sharding/Authentication.md)
* [Firewall setup](Administration/Sharding/FirewallSetup.md)
# * [Authentication](Administration/Sharding/Authentication.md)
# * [Firewall setup](Administration/Sharding/FirewallSetup.md)
* [Web Interface](Administration/WebInterface/README.md)
* [Queries](Administration/WebInterface/AqlEditor.md)
* [Collections](Administration/WebInterface/Collections.md)

@ -0,0 +1,270 @@
!SECTION Architecture

The cluster architecture of ArangoDB is a CP master/master model with no
single point of failure. With "CP" we mean that in the presence of a
network partition, the database prefers internal consistency over
availability. With "master/master" we mean that clients can send their
requests to an arbitrary node, and experience the same view on the
database regardless. "No single point of failure" means that the cluster
can continue to serve requests, even if one machine fails completely.

In this way, ArangoDB has been designed as a distributed multi-model
database. This section gives a short outline of the cluster architecture and
how the above features and capabilities are achieved.

!SUBSECTION Structure of an ArangoDB cluster

An ArangoDB cluster consists of a number of ArangoDB instances
which talk to each other over the network. They play different roles,
which will be explained in detail below. The current configuration
of the cluster is held in the "Agency", which is a highly-available,
resilient key/value store based on an odd number of ArangoDB instances.

!SUBSUBSECTION Cluster ID

Every non-Agency ArangoDB instance in a cluster is assigned a unique
ID during its startup. Using this ID, a node is identifiable
throughout the cluster. All cluster operations will communicate
via this ID.

For the various instances in an ArangoDB cluster there are four distinct
roles: Agents, Coordinators, Primary and Secondary DBservers. In the
following sections we will shed light on each of them.

!SUBSUBSECTION Agents

One or multiple Agents form the Agency in an ArangoDB cluster. The
Agency is the central place to store the configuration in a cluster. It
performs leader elections and provides other synchronisation services for
the whole cluster. Without the Agency none of the other components can
operate.

While generally invisible to the outside, it is the heart of the
cluster. As such, fault tolerance is of course a must-have for the
Agency. To achieve that, the Agents use the [Raft Consensus
Algorithm](https://raft.github.io/). The algorithm formally guarantees
conflict-free configuration management within the ArangoDB cluster.

At its core the Agency manages a big configuration tree. It supports
transactional read and write operations on this tree, and other servers
can subscribe to HTTP callbacks for all changes to the tree.
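
The Agency's HTTP interface is internal and applications normally never talk
to it directly, but a minimal sketch may make the transactional model clearer.
The agent endpoint, the port and the keys used below are illustrative
assumptions, not a stable public API:

```python
import requests

AGENT = "http://localhost:8531"  # assumed agent endpoint; adjust to your setup

# Transactional read: a list of read transactions, each listing the paths to read.
read_body = [["/arango/Plan/Collections"]]
print(requests.post(f"{AGENT}/_api/agency/read", json=read_body).json())

# Transactional write: each transaction maps paths in the tree to operations.
write_body = [[{"/example/counter": {"op": "set", "new": 42}}]]  # illustrative key
print(requests.post(f"{AGENT}/_api/agency/write", json=write_body).json())
```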

!SUBSUBSECTION Coordinators

Coordinators should be accessible from the outside. These are the ones
the clients talk to. They coordinate cluster tasks like
executing queries and running Foxx services. They know where the
data is stored and will optimize where to run user-supplied queries or
parts thereof. Coordinators are stateless and can thus easily be shut down
and restarted as needed.
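
For example, a client can send an AQL query to any coordinator through the
standard cursor API; the coordinator plans the query, distributes the work to
the DBservers holding the relevant shards and streams the result back. A small
sketch in Python, where the endpoint, database and collection names are
placeholders:

```python
import requests

COORDINATOR = "http://localhost:8529"  # any coordinator of the cluster (placeholder)

payload = {
    "query": "FOR u IN users FILTER u.age >= @minAge RETURN u.name",
    "bindVars": {"minAge": 21},
    "batchSize": 100,
}
resp = requests.post(f"{COORDINATOR}/_db/_system/_api/cursor", json=payload)
resp.raise_for_status()
body = resp.json()
print(body["result"])   # first batch of results
print(body["hasMore"])  # further batches can be fetched with PUT /_api/cursor/<id>
```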

!SUBSUBSECTION Primary DBservers

Primary DBservers are the ones where the data is actually hosted. They
host shards of data, and with synchronous replication a primary may
either be the leader or a follower for a shard.

They should not be accessed from the outside but indirectly through the
coordinators. They may also execute queries in part or as a whole when
asked by a coordinator.

!SUBSUBSECTION Secondaries

Secondary DBservers are asynchronous replicas of primaries. For each
primary, there can be one or more secondaries. Since the replication
works asynchronously (eventual consistency), the replication does not
impede the performance of the primaries. On the other hand, their
replica of the data can be slightly out of date. The secondaries are
perfectly suitable for backups as they don't interfere with the normal
cluster operation.

!SUBSECTION Sharding

Using the roles outlined above, an ArangoDB cluster is able to distribute
data in so-called shards across multiple primaries. From the outside
this process is fully transparent, and as such we achieve the goals of
what other systems call "master-master replication". In an ArangoDB
cluster you talk to any coordinator, and whenever you read or write data
it will automatically figure out where the data is stored (read) or to
be stored (write). The information about the shards is shared across the
coordinators using the Agency.
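
As an illustration, the number of shards is set per collection when it is
created, and afterwards documents are written and read through any coordinator
without the client ever knowing which DBserver holds which shard. A sketch
against a coordinator's HTTP API, with placeholder endpoint and names:

```python
import requests

COORDINATOR = "http://localhost:8529"  # placeholder; any coordinator works

# Create a collection split into 4 shards. By default documents are distributed
# over the shards by their "_key" attribute; other shard keys can be configured
# via the "shardKeys" attribute.
requests.post(
    f"{COORDINATOR}/_db/_system/_api/collection",
    json={"name": "customers", "numberOfShards": 4},
).raise_for_status()

# Writes go to any coordinator, which routes them to the DBserver holding the
# responsible shard.
requests.post(
    f"{COORDINATOR}/_db/_system/_api/document/customers",
    json={"_key": "c1", "name": "Anna", "country": "de"},
).raise_for_status()

# Reads are routed the same way; the client never needs to know the shard layout.
print(requests.get(f"{COORDINATOR}/_db/_system/_api/document/customers/c1").json())
```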

!SUBSECTION Many sensible configurations

This architecture is very flexible and thus allows many configurations,
which are suitable for different usage scenarios:

1. The default configuration is to run exactly one coordinator and
   one primary DBserver on each machine. This achieves the classical
   master/master setup, since there is a perfect symmetry between the
   different nodes: clients can equally well talk to any one of the
   coordinators, and all of them expose the same view of the datastore.
2. One can deploy more coordinators than DBservers. This is a sensible
   approach if one needs a lot of CPU power for the Foxx services,
   because they run on the coordinators.
3. One can deploy more DBservers than coordinators if more data capacity
   is needed and query performance is the lesser bottleneck.
4. One can deploy a coordinator on each machine where an application
   server (e.g. a node.js server) runs, and the Agents and DBservers
   on a separate set of machines elsewhere. This avoids a network hop
   between the application server and the database and thus decreases
   latency. Essentially, this moves some of the database distribution
   logic to the machine where the client runs.

These four options shall suffice for now. The important piece of information
here is that the coordinator layer can be scaled and deployed independently
from the DBserver layer.

!SUBSECTION Replication

ArangoDB offers two ways of data replication within a cluster: synchronous
and asynchronous. In this section we explain some details and highlight
the advantages and disadvantages respectively.

!SUBSUBSECTION Synchronous replication with automatic failover

Synchronous replication works on a per-shard basis. One configures for
each collection how many copies of each shard are kept in the cluster.
At any given time, one of the copies is declared to be the "leader" and
all other replicas are "followers". Write operations for this shard
are always sent to the DBserver which happens to hold the leader copy,
which in turn replicates the changes to all followers before the operation
is considered to be done and reported back to the coordinator.
Read operations are all served by the server holding the leader copy;
this allows ArangoDB to provide snapshot semantics for complex transactions.
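
The number of copies is configured per collection at creation time via the
`replicationFactor` attribute. A brief sketch follows; the endpoint and names
are placeholders, and the exact attribute availability may depend on the
ArangoDB version:

```python
import requests

COORDINATOR = "http://localhost:8529"  # placeholder coordinator endpoint

# Keep two copies of every shard: one leader and one synchronously replicated
# follower. A replicationFactor of 1 would switch synchronous replication off
# for this collection.
requests.post(
    f"{COORDINATOR}/_db/_system/_api/collection",
    json={"name": "orders", "numberOfShards": 8, "replicationFactor": 2},
).raise_for_status()
```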

If a DBserver that holds a follower copy of a shard fails, then the leader
can no longer synchronize its changes to that follower. After a short timeout
(3 seconds), the leader gives up on the follower, declares it to be
out of sync, and continues service without the follower. When the server
with the follower copy comes back, it automatically resynchronizes its
data with the leader and synchronous replication is restored.

If a DBserver that holds a leader copy of a shard fails, then the leader
can no longer serve any requests. It will no longer send a heartbeat to
the Agency. Therefore, a supervision process running in the Raft leader
of the Agency can take the necessary action (after 15 seconds of missing
heartbeats), namely to promote one of the servers that hold in-sync
replicas of the shard to leader for that shard. This involves a
reconfiguration in the Agency and means that coordinators
now contact a different DBserver for requests to this shard. Service
resumes. The other surviving replicas automatically resynchronize their
data with the new leader. When the DBserver with the original leader
copy comes back, it notices that it now holds a follower replica,
resynchronizes its data with the new leader, and order is restored.

All shard data synchronizations are done in an incremental way, such that
resynchronizations are quick. This technology makes it possible to move shards
(follower and leader ones) between DBservers without service interruptions.
Therefore, an ArangoDB cluster can move all the data on a specific DBserver
to other DBservers and then shut down that server in a controlled way.
This makes it possible to scale down an ArangoDB cluster without service
interruption, loss of fault tolerance or data loss. Furthermore, one can
rebalance the distribution of the shards, either manually or automatically.

All these operations can be triggered via a REST/JSON API or via the
graphical web UI. All failover operations are completely handled within
the ArangoDB cluster.

Obviously, synchronous replication involves a certain increased latency for
write operations, simply because there is one more network hop within the
cluster for every request. Therefore the user can set the replication factor
to 1, which means that only one copy of each shard is kept, thereby
switching off synchronous replication. This is a suitable setting for
less important or easily recoverable data for which low-latency write
operations matter.

!SUBSUBSECTION Asynchronous replication with automatic failover

Asynchronous replication works differently, in that it is organised
using primary and secondary DBservers. Each secondary server replicates
all the data held on a primary by polling in an asynchronous way. This
process has very little impact on the performance of the primary. The
disadvantage is that there is a delay between the confirmation of a
write operation that is sent to the client and the actual replication of
the data. If the primary server fails during this delay, then committed
and confirmed data can be lost.

Nevertheless, we also offer automatic failover with this setup. Contrary
to the synchronous case, here the failover management is done from outside
the ArangoDB cluster. In a future version we might move this management
into the supervision process in the Agency, but as of now, the management
is done via the Mesos framework scheduler for ArangoDB (see below).

The granularity of the replication is a whole ArangoDB instance with
all data that resides on that instance, which means that
you need twice as many instances as without asynchronous replication.
Synchronous replication is more flexible in that respect: you can have
smaller and larger instances, and if one fails, the data can be rebalanced
across the remaining ones.

!SUBSECTION Microservices and zero administration

The design and capabilities of ArangoDB are geared towards usage in
modern microservice architectures of applications. With the
[Foxx services](../Foxx/README.md) it is very easy to deploy a
data-centric microservice within an ArangoDB cluster.

Alternatively, one can deploy multiple instances of ArangoDB within the
same project. One part of the project might need a scalable document
store, another might need a graph database, and yet another might need
the full power of a multi-model database actually mixing the various
data models. There are enormous efficiency benefits to be reaped by
being able to use a single technology for various roles in a project.

To simplify the life of devops teams in such a scenario, we try as much as
possible to use a zero-administration approach for ArangoDB. A running
ArangoDB cluster is resilient against failures and essentially repairs
itself in case of temporary failures. See the next section for further
capabilities in this direction.

!SUBSECTION Apache Mesos integration

For the distributed setup, we use the Apache Mesos infrastructure by default.
ArangoDB is a fully certified package for the Mesosphere DC/OS and can thus
be deployed essentially with a few mouse clicks or a single command, once
you have an existing DC/OS cluster. But even on a plain Apache Mesos cluster
one can deploy ArangoDB via Marathon with a single API call and some JSON
configuration.
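
To make the Marathon route concrete, the sketch below posts an application
definition to Marathon's `/v2/apps` endpoint. The Marathon address, the Docker
image name and the resource values are purely illustrative placeholders, not
the official ArangoDB framework definition that ships with the DC/OS package:

```python
import requests

MARATHON = "http://marathon.example.com:8080"  # placeholder Marathon endpoint

# Purely illustrative app definition; the real ArangoDB-on-Mesos framework
# ships its own JSON with the proper image, resources and framework options.
app = {
    "id": "/arangodb-cluster",
    "instances": 1,
    "cpus": 1.0,
    "mem": 2048,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "arangodb/example-framework:latest", "network": "HOST"},
    },
}
resp = requests.post(f"{MARATHON}/v2/apps", json=app)
print(resp.status_code, resp.json())
```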

The advantage of this approach is that we can not only implement the
initial deployment, but also the later management: automatic
replacement of failed instances and the scaling of the ArangoDB cluster
(triggered manually or even automatically). Since all manipulations are
done either via the graphical web UI or via JSON/REST calls, one can even
implement autoscaling very easily.

A DC/OS cluster is a very natural environment to deploy microservice
architectures, since it is so convenient to deploy various services,
including potentially multiple ArangoDB cluster instances, within the
same DC/OS cluster. The built-in service discovery makes it extremely
simple to connect the various microservices, and Mesos automatically
takes care of the distribution and deployment of the various tasks.

As of June 2016 we offer the Apache Mesos integration; later there will
be integration with other cluster management infrastructures. See the
[Deployment](../Deployment/README.md) chapter and its subsections for
instructions.

It is possible to deploy an ArangoDB cluster by simply launching a bunch of
Docker containers with the right command line options to link them up,
or even by starting multiple ArangoDB processes on a single machine. In that
case, synchronous replication will work within the deployed ArangoDB cluster,
and automatic failover in the sense that the duties of a failed server will
automatically be assigned to another, surviving one. However, since the
ArangoDB cluster cannot launch additional instances from within itself,
replacement of failed nodes is not automatic, and scaling up and down has to
be managed manually. This is why we do not recommend this setup for production
deployment.

!SUBSECTION Authentication

As of version 3.0, ArangoDB authentication is **NOT** supported within a
cluster. You **HAVE** to properly secure your cluster to the outside.
Most setups will have a secured data center anyway, and ArangoDB will
be accessed from the outside via an application. To this application
only the coordinators need to be made available. If you want to isolate
even further, you can install a reverse proxy like haproxy or nginx in
front of the coordinators (that will also allow easy access from the
application).

@ -1,39 +0,0 @@
!CHAPTER Cluster Scalability

ArangoDB has been designed as a distributed multi model database. In this chapter we will give a short outline on the cluster architecture.

!SUBSECTION Cluster ID

Every node in a cluster will be assigned a uniquely generated ID during its startup. Using its ID a node is identifiable througout the cluster. All cluster operations will communicate via this ID.

!SUBSECTION Roles in an ArangoDB cluster

In an ArangoDB cluster there are 4 distinct roles: Agents, Coordinators, Primaries and Secondaries. In the following sections we will shed light on each of them.

!SUBSUBSECTION Agents

One or multiple Agents form the Agency in an ArangoDB cluster. The Agency is the central place to store the configuration in a cluster. Without it none of the other components can operate. While generally invisible to the outside it is the heart of the cluster. As such, failure tolerance is of course a must have for the Agency. To achieve that the Agents are using the [Raft Consensus Algorithm](https://raft.github.io/). The algorithm formally guarantees conflict free configuration management within the ArangoDB cluster.

At its core the Agency manages a big configuration tree. It supports transactional read and write operations on this tree.

!SUBSUBSECTION Coordinators

Coordinators should be accessible from the outside. These are the ones the clients should talk to. They will coordinate cluster tasks like executing queries and running foxx applications. They know where the data is stored and will optimize where to run user supplied queries or parts thereof.

!SUBSUBSECTION Primaries

Primaries are the ones where the data is actually hosted. They host shards of data and using synchronized replication a Primary may either be leader or follower for a shard.

They should not be accessed from the outside but indirect through the coordinator. They may also execute queries in part or as a whole when asked by a coordinator.

!SUBSUBSECTION Secondaries

Secondaries are asynchronous followers of primaries. They are perfectly suitable for backups as they don't interfere with the normal cluster operation.

!SUBSECTION Sharding

Using the roles outlined above an ArangoDB cluster is able to distribute data in so called shards across multiple primaries. From the outside this process is fully transparent and as such we achieve the goals of what other systems call "master-master replication". In an ArangoDB cluster you talk to any coordinator and whenever you read or write data it will automatically figure out where the data is stored (read) or to be stored (write). The information about the shards is shared across the coordinators using the Agency.

!SUBSECTION Authentication

As of version 3.0 ArangoDB authentication is **NOT** supported within a cluster. You **HAVE** to properly secure your cluster to the outside. Most setups will have a secured data center anyway and ArangoDB will be accessed from the outside via an application. To this application only the coordinators need to be made available. If you want to isolate even further you can install a reverse proxy like haproxy or nginx in front of the coordinators (that will also allow easy access from the application).

@ -0,0 +1,35 @@
!SECTION Different data models and scalability

In this section we discuss scalability in the context of the different
data models supported by ArangoDB.

!SUBSECTION Key/value pairs

The key/value store data model is the easiest to scale. In ArangoDB,
this is implemented in the sense that a document collection always has
a primary key `_key` attribute and in the absence of further secondary
indexes the document collection behaves like a simple key/value store.

The only operations that are possible in this context are single key
lookups and key/value pair insertions and updates. If `_key` is the
only sharding attribute then the sharding is done with respect to the
primary key and all these operations scale linearly. If the sharding is
done using different shard keys, then a lookup of a single key involves
asking all shards and thus does not scale linearly.
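
To illustrate, with the default sharding on `_key` a single insert or lookup
only ever touches the one shard (and thus one DBserver) that is responsible
for that key. A sketch with placeholder endpoint and collection names,
assuming the collection has already been created:

```python
import requests

COORDINATOR = "http://localhost:8529"  # placeholder; any coordinator
BASE = f"{COORDINATOR}/_db/_system/_api/document"

# Insert a key/value pair; with sharding on `_key` the coordinator can compute
# the responsible shard directly from the key.
requests.post(
    f"{BASE}/kvstore", json={"_key": "user1234", "value": {"logins": 7}}
).raise_for_status()

# Single-key lookup: again only the one shard holding "user1234" is asked.
print(requests.get(f"{BASE}/kvstore/user1234").json())
```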

!SUBSECTION Document store

For the document store case, essentially the same arguments apply even
in the presence of secondary indexes, since an index for a sharded
collection is simply a local index for each shard. Therefore,
single document operations still scale linearly with the size of the
cluster, unless a special sharding configuration makes lookups or
write operations more expensive.
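
For instance, a hash index created on a sharded collection is maintained as a
local index on every shard. A hedged sketch using the index API, with
placeholder endpoint and names:

```python
import requests

COORDINATOR = "http://localhost:8529"  # placeholder; any coordinator

# Create a hash index on the "email" attribute of the (sharded) "users"
# collection; internally each shard keeps its own copy of this index.
resp = requests.post(
    f"{COORDINATOR}/_db/_system/_api/index",
    params={"collection": "users"},
    json={"type": "hash", "unique": False, "fields": ["email"]},
)
resp.raise_for_status()
print(resp.json())
```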

!SUBSECTION Complex queries and joins

TODO

!SUBSECTION Graph database

TODO

@ -0,0 +1,13 @@
!SECTION Limitations

ArangoDB has no built-in limitations to horizontal scalability. The
central resilient Agency will easily sustain hundreds of DBservers
and coordinators, and the usual database operations work completely
decentrally and do not require assistance from the Agency.

Likewise, the supervision process in the Agency can easily deal
with lots of servers, since none of its activities are performance
critical.

Obviously, an ArangoDB cluster is limited by the available resources
of CPU, memory, disk, and network bandwidth and latency.

@ -1,35 +1,23 @@
!CHAPTER Scalability

For single instance setups we provide binary packages for various Linux
distributions, for Mac OSX and for Windows, as well as Docker images.
Installation is mostly straightforward using the standard package managers
or deployment strategies. See also
[our download page](https://www.arangodb.com/download/).
ArangoDB is a distributed database supporting multiple data models,
and can thus be scaled horizontally, that is, by using many servers,
typically based on commodity hardware. This approach not only delivers
performance and capacity improvements, but also achieves
resilience by means of replication and automatic failover. Furthermore,
one can build systems that scale their capacity dynamically up and down
automatically according to demand.

For the distributed setup, we use the Apache Mesos infrastructure by default.
ArangoDB is a fully certified package for the Mesosphere DC/OS and can thus
be deployed essentially with a few mouse clicks or a single command, once
you have an existing DC/OS cluster. But even on a plain Apache Mesos cluster
one can deploy ArangoDB via Marathon with a single API call and some JSON
configuration.
Obviously, one can also scale ArangoDB vertically, that is, by using
ever larger servers. However, this has the disadvantage that the
costs grow faster than linearly with the size of the server, and
none of the resilience and dynamic capabilities can be achieved
in this way.

The advantage of this approach is that we can not only implement the
initial deployment, but also the later management of automatic
replacement of failed instances and the scaling of the ArangoDB cluster
(triggered manually or even automatically).
In this chapter we explain the distributed architecture of ArangoDB and
discuss its scalability features and limitations:

As of June 2016, we offer Apache Mesos integration, later there will
be integration with other cluster management infrastructures. See the
[Deployment](../Deployment/README.md) chapter and its subsections for
instructions.
- [ArangoDB's distributed architecture](Architecture.md)
- [Different data models and scalability](DataModels.md)
- [Limitations](Limitations.md)

It is possible to deploy an ArangoDB cluster by simply launching a bunch of
Docker containers with the right command line options to link them up,
or even on a single machine starting multiple ArangoDB processes. In that
case, synchronous replication will work within the deployed ArangoDB cluster,
and automatic failover in the sense that the duties of a failed server will
automatically be assigned to another, surviving one. However, since the
ArangoDB cluster cannot within itself launch additional instances, replacement
of failed nodes is not automatic and scaling up and down has to be managed
manually. This is why we do not recommend this setup for production
deployment.