mirror of https://gitee.com/bigwinds/arangodb

commit 3d7a507a7c
Merge branch 'devel' of github.com:arangodb/arangodb into devel
@@ -18,18 +18,13 @@ An ArangoDB cluster consists of a number of ArangoDB instances
which talk to each other over the network. They play different roles,
which will be explained in detail below. The current configuration
of the cluster is held in the "Agency", which is a highly available,
resilient key/value store based on an odd number of ArangoDB instances
running the [Raft Consensus Protocol](https://raft.github.io/).

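Applications normally never talk to the Agency directly, but as a rough
sketch of its key/value nature, one can read a subtree from it over its
HTTP interface. The endpoint and the path below are assumptions for
illustration, not an official recipe:

```js
// Rough sketch: read a subtree from the Agency's key/value store.
// Assumes arangosh is connected to one of the agents; the path
// "/arango/Plan" is illustrative.
var result = arango.POST("/_api/agency/read",
    JSON.stringify([["/arango/Plan"]]));
print(result);
```
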
For the various instances in an ArangoDB cluster there are 4 distinct
roles: Agents, Coordinators, Primary and Secondary DBservers. In the
following sections we will shed light on each of them. Note that the
tasks for all roles run the same binary from the same Docker image.

!SUBSUBSECTION Agents

@@ -70,13 +65,21 @@ asked by a coordinator.

!SUBSUBSECTION Secondaries

Secondary DBservers are asynchronous replicas of primaries. If one is
using only synchronous replication, one does not need secondaries at all.
For each primary, there can be one or more secondaries. Since the
replication works asynchronously (eventual consistency), the replication
does not impede the performance of the primaries. On the other hand,
their replica of the data can be slightly out of date. The secondaries
are perfectly suitable for backups as they don't interfere with the
normal cluster operation.

!SUBSUBSECTION Cluster ID

Every non-Agency ArangoDB instance in a cluster is assigned a unique
ID during its startup. Using its ID a node is identifiable
throughout the cluster. All cluster operations will communicate
via this ID.

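For illustration, a node can be asked for its ID via the administrative
HTTP API; a minimal arangosh sketch, assuming a connection to a cluster
node:

```js
// Ask the server we are connected to for its cluster-wide ID
// (only meaningful for cluster nodes, not for a single server).
var result = arango.GET("/_admin/server/id");
print(result);
```
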
!SUBSECTION Sharding

@@ -205,7 +208,7 @@ modern microservice architectures of applications. With the
[Foxx services](../Foxx/README.md) it is very easy to deploy a
data-centric microservice within an ArangoDB cluster.

In addition, one can deploy multiple instances of ArangoDB within the
same project. One part of the project might need a scalable document
store, another might need a graph database, and yet another might need
the full power of a multi-model database actually mixing the various

@@ -221,7 +224,7 @@ capabilities in this direction.
!SUBSECTION Apache Mesos integration

For the distributed setup, we use the Apache Mesos infrastructure by default.
ArangoDB is a fully certified package for DC/OS and can thus
be deployed essentially with a few mouse clicks or a single command, once
you have an existing DC/OS cluster. But even on a plain Apache Mesos cluster
one can deploy ArangoDB via Marathon with a single API call and some JSON

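To give an idea of what such a Marathon deployment looks like, here is a
rough sketch of an app definition; the app id, Docker image name and
resource figures are invented placeholders, not the official package
values:

```js
// Sketch of a Marathon app definition (placeholder values throughout).
var arangodbApp = {
  id: "/arangodb",
  cpus: 2,
  mem: 4096,
  instances: 1,
  container: {
    type: "DOCKER",
    docker: { image: "arangodb/some-mesos-framework-image", network: "HOST" }
  }
};
// Deploying is then a single API call against Marathon, e.g.:
//   POST http://<marathon-host>:8080/v2/apps   (body: the JSON above)
```
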
@@ -268,3 +271,5 @@ even further you can install a reverse proxy like haproxy or nginx in
front of the coordinators (that will also allow easy access from the
application).

Authentication in the cluster will be added soon after the initial 3.0
release.

@@ -17,7 +17,7 @@ primary key and all these operations scale linearly. If the sharding is
done using different shard keys, then a lookup of a single key involves
asking all shards and thus does not scale linearly.

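As a small sketch of the linearly scaling case (the collection name,
shard count and document are invented): with the default shard key
`_key`, a single-key lookup is routed to exactly one shard.

```js
// Create a collection sharded by the default shard key `_key`.
var users = db._create("users", { numberOfShards: 9 });
users.save({ _key: "12345", name: "Alice" });
// The coordinator hashes "12345" and contacts only the one
// responsible shard, so such lookups scale linearly.
users.document("12345");
```
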
!SUBSECTION Document store

For the document store case, even in the presence of secondary indexes,
essentially the same arguments apply, since an index for a sharded

@@ -26,10 +26,51 @@ single document operations still scale linearly with the size of the
cluster, unless a special sharding configuration makes lookups or
write operations more expensive.

For a deeper analysis of this topic see
[this blog post](https://mesosphere.com/blog/2015/11/30/arangodb-benchmark-dcos/)
in which good linear scalability of ArangoDB for single document operations
is demonstrated.

!SUBSECTION Complex queries and joins

The AQL query language allows complex queries, using multiple
collections, secondary indexes as well as joins. In particular with
the latter, scaling can be a challenge, since if the data to be
joined resides on different machines, a lot of communication
has to happen. The AQL query execution engine organises a data
pipeline across the cluster to put together the results in the
most efficient way. The query optimizer is aware of the cluster
structure and knows what data is where and how it is indexed.
Therefore, it can arrive at an informed decision about what parts
of the query ought to run where in the cluster.

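As a sketch of such a join in arangosh (the collections `users` and
`orders` and their attributes are invented for illustration):

```js
// A join across two (possibly differently sharded) collections.
// The coordinator builds the cross-cluster data pipeline for us.
db._query(`
  FOR u IN users
    FOR o IN orders
      FILTER o.userId == u._key
      RETURN { user: u.name, total: o.total }
`).toArray();
```
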
Nevertheless, for certain complicated joins, there are limits as
to what can be achieved. A very important case that can be
optimized relatively easily is if one of the collections involved
in the join is small enough such that it is possible to
replicate its data on all machines. We call such a collection a
"satellite collection". Due to the replication, a join involving
such a collection can be executed locally without too much
communication overhead.

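Later ArangoDB Enterprise Edition releases expose this idea via the
`replicationFactor` collection property; a sketch under that assumption,
with invented collection names:

```js
// Sketch (Enterprise Edition, later releases): a small lookup table
// replicated to every DBserver, so joins against it stay local.
var countries = db._create("countries", { replicationFactor: "satellite" });
db._query(`
  FOR o IN orders
    FOR c IN countries
      FILTER o.country == c._key
      RETURN { order: o._key, region: c.region }
`).toArray();
```
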
!SUBSECTION Graph database

Graph databases are particularly good at queries on graphs that involve
paths in the graph of an a priori unknown length. For example, finding
the shortest path between two vertices in a graph, or finding all
paths that match a certain pattern starting at a given vertex, are such
examples.

However, if the vertices and edges along the occurring paths are
distributed across the cluster, then a lot of communication is
necessary between nodes, and performance suffers. To achieve good
performance at scale, it is therefore necessary to get the
distribution of the graph data across the shards in the cluster
right. Most of the time, the application developers and users of
ArangoDB know best how their graphs are structured. Therefore,
ArangoDB allows users to specify according to which attributes
the graph data is sharded. A useful first step is usually to make
sure that the edges originating at a vertex reside on the same
cluster node as the vertex.
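A sketch of this first step in arangosh; the collections, the `region`
attribute, and the use of `distributeShardsLike` to co-locate the edge
shards with the vertex shards are all assumptions for illustration:

```js
// Shard vertices by an application-chosen attribute...
var vertices = db._create("vertices",
    { numberOfShards: 8, shardKeys: ["region"] });
// ...and shard edges by the same attribute, with their shards
// placed on the same DBservers as the vertex shards.
var edges = db._createEdgeCollection("edges",
    { shardKeys: ["region"], distributeShardsLike: "vertices" });
vertices.save({ _key: "v1", region: "eu" });
// Give each edge the region of its source vertex, so the edge lands
// in the shard that also holds that vertex.
edges.save({ _from: "vertices/v1", _to: "vertices/v2", region: "eu" });
```
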
@@ -8,8 +8,12 @@ resilience by means of replication and automatic failover. Furthermore,
one can build systems that scale their capacity dynamically up and down
automatically according to demand.

One can also scale ArangoDB vertically, that is, by using
ever larger servers. There is no built-in limitation in ArangoDB;
for example, the server will automatically use more threads if
more CPUs are present.

However, scaling vertically has the disadvantage that the
costs grow faster than linearly with the size of the server, and
none of the resilience and dynamic capabilities can be achieved
in this way.