1
0
Fork 0

Merge branch 'devel' of github.com:arangodb/arangodb into devel

This commit is contained in:
hkernbach 2016-06-22 00:54:43 +02:00
commit 3d7a507a7c
3 changed files with 75 additions and 25 deletions

View File

@ -18,18 +18,13 @@ An ArangoDB cluster consists of a number of ArangoDB instances
which talk to each other over the network. They play different roles, which talk to each other over the network. They play different roles,
which will be explained in detail below. The current configuration which will be explained in detail below. The current configuration
of the cluster is held in the "Agency", which is a highly-available of the cluster is held in the "Agency", which is a highly-available
resilient key/value store based on an odd number of ArangoDB instances. resilient key/value store based on an odd number of ArangoDB instances
running [Raft Consensus Protocol](https://raft.github.io/).
!SUBSUBSECTION Cluster ID
Every non-Agency ArangoDB instance in a cluster is assigned a unique
ID during its startup. Using its ID a node is identifiable
throughout the cluster. All cluster operations will communicate
via this ID.
For the various instances in an ArangoDB cluster there are 4 distinct For the various instances in an ArangoDB cluster there are 4 distinct
roles: Agents, Coordinators, Primary and Secondary DBservers. In the roles: Agents, Coordinators, Primary and Secondary DBservers. In the
following sections we will shed light on each of them. following sections we will shed light on each of them. Note that the
tasks for all roles run the same binary from the same Docker image.
!SUBSUBSECTION Agents !SUBSUBSECTION Agents
@ -70,13 +65,21 @@ asked by a coordinator.
!SUBSUBSECTION Secondaries !SUBSUBSECTION Secondaries
Secondary DBservers are asynchronous replicas of primaries. For each Secondary DBservers are asynchronous replicas of primaries. If one is
primary, there can be one ore more secondaries. Since the replication using only synchronous replication, one does not need secondaries at all.
works asynchronously (eventual consistency), the replication does not For each primary, there can be one or more secondaries. Since the
impede the performance of the primaries. On the other hand, their replication works asynchronously (eventual consistency), the replication
replica of the data can be slightly out of date. The secondaries are does not impede the performance of the primaries. On the other hand,
perfectly suitable for backups as they don't interfere with the normal their replica of the data can be slightly out of date. The secondaries
cluster operation. are perfectly suitable for backups as they don't interfere with the
normal cluster operation.
!SUBSUBSECTION Cluster ID
Every non-Agency ArangoDB instance in a cluster is assigned a unique
ID during its startup. Using its ID a node is identifiable
throughout the cluster. All cluster operations will communicate
via this ID.
!SUBSECTION Sharding !SUBSECTION Sharding
@ -205,7 +208,7 @@ modern microservice architectures of applications. With the
[Foxx services](../Foxx/README.md) it is very easy to deploy a data [Foxx services](../Foxx/README.md) it is very easy to deploy a data
centric microservice within an ArangoDB cluster. centric microservice within an ArangoDB cluster.
Alternatively, one can deploy multiple instances of ArangoDB within the In addition, one can deploy multiple instances of ArangoDB within the
same project. One part of the project might need a scalable document same project. One part of the project might need a scalable document
store, another might need a graph database, and yet another might need store, another might need a graph database, and yet another might need
the full power of a multi-model database actually mixing the various the full power of a multi-model database actually mixing the various
@ -221,7 +224,7 @@ capabilities in this direction.
!SUBSECTION Apache Mesos integration !SUBSECTION Apache Mesos integration
For the distributed setup, we use the Apache Mesos infrastructure by default. For the distributed setup, we use the Apache Mesos infrastructure by default.
ArangoDB is a fully certified package for the Mesosphere DC/OS and can thus ArangoDB is a fully certified package for DC/OS and can thus
be deployed essentially with a few mouse clicks or a single command, once be deployed essentially with a few mouse clicks or a single command, once
you have an existing DC/OS cluster. But even on a plain Apache Mesos cluster you have an existing DC/OS cluster. But even on a plain Apache Mesos cluster
one can deploy ArangoDB via Marathon with a single API call and some JSON one can deploy ArangoDB via Marathon with a single API call and some JSON
@ -268,3 +271,5 @@ even further you can install a reverse proxy like haproxy or nginx in
front of the coordinators (that will also allow easy access from the front of the coordinators (that will also allow easy access from the
application). application).
Authentication in the cluster will be added soon after the initial 3.0
release.

View File

@ -17,7 +17,7 @@ primary key and all these operations scale linearly. If the sharding is
done using different shard keys, then a lookup of a single key involves done using different shard keys, then a lookup of a single key involves
asking all shards and thus does not scale linearly. asking all shards and thus does not scale linearly.
!SUBSECTION document store !SUBSECTION Document store
For the document store case even in the presence of secondary indexes For the document store case even in the presence of secondary indexes
essentially the same arguments apply, since an index for a sharded essentially the same arguments apply, since an index for a sharded
@ -26,10 +26,51 @@ single document operations still scale linearly with the size of the
cluster, unless a special sharding configuration makes lookups or cluster, unless a special sharding configuration makes lookups or
write operations more expensive. write operations more expensive.
!SUBSECTION complex queries and joins For a deeper analysis of this topic see
[this blog post](https://mesosphere.com/blog/2015/11/30/arangodb-benchmark-dcos/)
in which good linear scalability of ArangoDB for single document operations
is demonstrated.
TODO
!SUBSECTION graph database !SUBSECTION Complex queries and joins
TODO The AQL query language allows complex queries, using multiple
collections, secondary indexes as well as joins. In particular with
the latter, scaling can be a challenge, since if the data to be
joined resides on different machines, a lot of communication
has to happen. The AQL query execution engine organises a data
pipeline across the cluster to put together the results in the
most efficient way. The query optimizer is aware of the cluster
structure and knows what data is where and how it is indexed.
Therefore, it can arrive at an informed decision about what parts
of the query ought to run where in the cluster.
Nevertheless, for certain complicated joins, there are limits as
to what can be achieved. A very important case that can be
optimized relatively easily is if one of the collections involved
in the join is small enough such that it is possible to
replicated its data on all machines. We call such a collection a
"satellite collection". Due to the replication a join involving
such a collection can be executed locally without too much
communication overhead.
!SUBSECTION Graph database
Graph databases are particularly good at queries on graphs that involve
paths in the graph of an a priori unknown length. For example, finding
the shortest path between two vertices in a graph, or finding all
paths that match a certain pattern starting at a given vertex are such
example.
However, if the vertices and edges along the occurring paths are
distributed across the cluster, then a lot of communication is
necessary between nodes, and performance suffers. To achieve good
performance at scale, it is therefore necessary, to get the
distribution of the graph data across the shards in the cluster
right. Most of the time, the application developers and users of
ArangoDB know best, how their graphs a structured. Therefore,
ArangoDB allows users to specify, according to which attributes
the graph data is sharded. A useful first step is usually to make
sure that the edges originating at a vertex reside on the same
cluster node as the vertex.

View File

@ -8,8 +8,12 @@ resilience by means of replication and automatic failover. Furthermore,
one can build systems that scale their capacity dynamically up and down one can build systems that scale their capacity dynamically up and down
automatically according to demand. automatically according to demand.
Obviously, one can also scale ArangoDB vertically, that is, by using One can also scale ArangoDB vertically, that is, by using
ever larger servers. However, this has the disadvantage that the ever larger servers. There is no builtin limitation in ArangoDB,
for example, the server will automatically use more threads if
more CPUs are present.
However, scaling vertically has the disadvantage that the
costs grow faster than linear with the size of the server, and costs grow faster than linear with the size of the server, and
none of the resilience and dynamical capabilities can be achieved none of the resilience and dynamical capabilities can be achieved
in this way. in this way.