Merge branch 'devel' of github.com:arangodb/arangodb into devel

@@ -18,18 +18,13 @@ An ArangoDB cluster consists of a number of ArangoDB instances
which talk to each other over the network. They play different roles,
which will be explained in detail below. The current configuration
of the cluster is held in the "Agency", which is a highly available,
resilient key/value store based on an odd number of ArangoDB instances
running the [Raft Consensus Protocol](https://raft.github.io/).

For the various instances in an ArangoDB cluster there are 4 distinct
roles: Agents, Coordinators, Primary and Secondary DBservers. In the
following sections we will shed light on each of them. Note that the
tasks for all roles run the same binary from the same Docker image.

!SUBSUBSECTION Agents

@@ -70,13 +65,21 @@ asked by a coordinator.

!SUBSUBSECTION Secondaries

Secondary DBservers are asynchronous replicas of primaries. If one is
using only synchronous replication, one does not need secondaries at all.
For each primary, there can be one or more secondaries. Since the
replication works asynchronously (eventual consistency), it does not
impede the performance of the primaries. On the other hand, their
replica of the data can be slightly out of date. The secondaries are
perfectly suitable for backups as they don't interfere with the normal
cluster operation.

!SUBSUBSECTION Cluster ID

Every non-Agency ArangoDB instance in a cluster is assigned a unique
ID during its startup. Using its ID a node is identifiable
throughout the cluster. All cluster operations will communicate
via this ID.

!SUBSECTION Sharding

@@ -205,7 +208,7 @@ modern microservice architectures of applications. With the
[Foxx services](../Foxx/README.md) it is very easy to deploy a
data-centric microservice within an ArangoDB cluster.

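As an illustration, here is a minimal sketch of such a Foxx service; the
route, the "users" collection, and the document layout are made-up
examples, not part of the original text.

```js
// Minimal Foxx service sketch (index.js of a service); the route and the
// "users" collection are hypothetical examples.
'use strict';
const createRouter = require('@arangodb/foxx/router');
const db = require('@arangodb').db;

const router = createRouter();
module.context.use(router);

// GET /users/:key fetches one document from the "users" collection.
router.get('/users/:key', function (req, res) {
  res.send(db._collection('users').document(req.pathParams.key));
});
```

Mounted in the cluster, such a service executes on the coordinators and
thus scales with their number.
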
In addition, one can deploy multiple instances of ArangoDB within the
same project. One part of the project might need a scalable document
store, another might need a graph database, and yet another might need
the full power of a multi-model database actually mixing the various

@@ -221,7 +224,7 @@ capabilities in this direction.

!SUBSECTION Apache Mesos integration

For the distributed setup, we use the Apache Mesos infrastructure by default.
ArangoDB is a fully certified package for DC/OS and can thus
be deployed essentially with a few mouse clicks or a single command, once
you have an existing DC/OS cluster. But even on a plain Apache Mesos cluster
one can deploy ArangoDB via Marathon with a single API call and some JSON

@@ -268,3 +271,5 @@ even further you can install a reverse proxy like haproxy or nginx in
front of the coordinators (that will also allow easy access from the
application).

Authentication in the cluster will be added soon after the initial 3.0
release.

@@ -17,7 +17,7 @@ primary key and all these operations scale linearly. If the sharding is
done using different shard keys, then a lookup of a single key involves
asking all shards and thus does not scale linearly.

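To make this concrete, here is a small arangosh sketch; the collection
name, the shard count, and the document are invented for illustration.

```js
// Shard a collection by its primary key so that a single-document lookup
// is routed to exactly one shard; names and numbers are examples.
db._create("users", { numberOfShards: 8, shardKeys: ["_key"] });
db.users.save({ _key: "alice", name: "Alice" });

// The coordinator derives the responsible shard from "alice" and asks
// only that one DBserver:
db.users.document("alice");

// With shardKeys like ["country"], a lookup by _key would have to be
// sent to all shards instead.
```
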
!SUBSECTION Document store

For the document store case, even in the presence of secondary indexes,
essentially the same arguments apply, since an index for a sharded

@@ -26,10 +26,51 @@ single document operations still scale linearly with the size of the
cluster, unless a special sharding configuration makes lookups or
write operations more expensive.

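For completeness, a hedged arangosh sketch of a secondary index on a
sharded collection; the collection and the "email" attribute are again
invented examples.

```js
// A hash index on "email" accelerates equality lookups; unless the
// indexed attribute coincides with the shard keys, such a lookup still
// has to consult every shard.
db.users.ensureIndex({ type: "hash", fields: ["email"] });
db._query(`FOR u IN users FILTER u.email == @e RETURN u`,
          { e: "alice@example.com" }).toArray();
```
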
For a deeper analysis of this topic see
[this blog post](https://mesosphere.com/blog/2015/11/30/arangodb-benchmark-dcos/),
in which good linear scalability of ArangoDB for single document operations
is demonstrated.

!SUBSECTION Complex queries and joins

The AQL query language allows complex queries, using multiple
collections, secondary indexes, as well as joins. In particular with
the latter, scaling can be a challenge, since data that is to be
joined may reside on different machines, and then a lot of communication
has to happen. The AQL query execution engine organizes a data
pipeline across the cluster to put together the results in the
most efficient way. The query optimizer is aware of the cluster
structure and knows what data is where and how it is indexed.
Therefore, it can arrive at an informed decision about which parts
of the query ought to run where in the cluster.

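As a hedged illustration, a join issued from arangosh; the collections
"users" and "orders" and their attributes are invented:

```js
// A join across two (potentially differently sharded) collections; the
// coordinator assembles the pipeline and merges the partial results.
db._query(`
  FOR u IN users
    FOR o IN orders
      FILTER o.userId == u._key
      RETURN { user: u.name, total: o.total }
`).toArray();
```
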
Nevertheless, for certain complicated joins, there are limits as
to what can be achieved. A very important case that can be
optimized relatively easily is if one of the collections involved
in the join is small enough that it is possible to
replicate its data on all machines. We call such a collection a
"satellite collection". Due to the replication, a join involving
such a collection can be executed locally without too much
communication overhead.

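In later ArangoDB Enterprise releases this idea is exposed as a
collection option; the following is only a hedged sketch, since the
exact syntax postdates the text above and the collection name is
invented:

```js
// Assumption: a satellite collection is requested via a special
// replicationFactor value (ArangoDB Enterprise, later versions).
db._create("countries", { replicationFactor: "satellite" });

// A join against "countries" can then run locally on each DBserver,
// because every server holds a full copy of the collection.
```
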
!SUBSECTION Graph database

Graph databases are particularly good at queries on graphs that involve
paths of an a priori unknown length. Examples are finding
the shortest path between two vertices in a graph, or finding all
paths that match a certain pattern starting at a given vertex.

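For instance, a shortest-path query in AQL might look like the following
hedged sketch; the graph name "social" and the "persons" documents are
invented:

```js
// Find the vertices on a shortest path between two persons in a named
// graph; "persons/alice", "persons/bob" and "social" are example names.
db._query(`
  FOR v IN OUTBOUND SHORTEST_PATH 'persons/alice' TO 'persons/bob'
    GRAPH 'social'
    RETURN v._key
`).toArray();
```
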
However, if the vertices and edges along the occurring paths are
distributed across the cluster, then a lot of communication is
necessary between nodes, and performance suffers. To achieve good
performance at scale, it is therefore necessary to get the
distribution of the graph data across the shards in the cluster
right. Most of the time, the application developers and users of
ArangoDB know best how their graphs are structured. Therefore,
ArangoDB allows users to specify according to which attributes
the graph data is sharded. A useful first step is usually to make
sure that the edges originating at a vertex reside on the same
cluster node as the vertex.

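A hedged arangosh sketch of such a setup; the collections, the shard
count, the "region" attribute, and the use of distributeShardsLike are
assumptions for illustration, not prescribed by the text:

```js
// Shard vertices and edges by the same attribute, and let the edge
// shards follow the distribution of the vertex shards, so that a vertex
// and the edges originating at it end up on the same cluster node.
db._create("persons", { numberOfShards: 8, shardKeys: ["region"] });
db._createEdgeCollection("knows", {
  numberOfShards: 8,
  shardKeys: ["region"],
  distributeShardsLike: "persons"   // assumption: option is available
});

// Give a vertex and its outgoing edges the same shard key value:
db.persons.save({ _key: "alice", region: "eu" });
db.knows.save({ _from: "persons/alice", _to: "persons/bob", region: "eu" });
```
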
@@ -8,8 +8,12 @@ resilience by means of replication and automatic failover. Furthermore,
one can build systems that scale their capacity up and down
dynamically according to demand.

One can also scale ArangoDB vertically, that is, by using
ever larger servers. There is no built-in limitation in ArangoDB;
for example, the server will automatically use more threads if
more CPUs are present.

However, scaling vertically has the disadvantage that the
costs grow faster than linearly with the size of the server, and
none of the resilience and dynamic scaling capabilities can be achieved
in this way.