---
layout: default
description: In general, a single server configuration and a cluster configuration of ArangoDB behave very similarly
---
Single Instance vs. Cluster
===========================

In general, a single server configuration and a cluster configuration
of ArangoDB behave very similarly. However, there are differences due to
the different nature of these setups. This can lead to a discrepancy in behavior
between these two configurations. A summary of potential differences follows.

See [Migrating from Single Instance to Cluster](deployment-migrating-single-instance-cluster.html)
for practical information.

Locking and dead-lock prevention
--------------------------------

In a single server configuration all data is local and dead-locks can
easily be detected. In a cluster configuration data is distributed to
many servers and some conflicts cannot be detected easily. Therefore,
some operations (like locking shards) have to be performed sequentially
and in a strictly predefined order, so that dead-locks are avoided by
design.

Document Keys
-------------

In a cluster the *autoincrement* key generator is not supported. You
have to use the *traditional* key generator or user-defined keys.

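A minimal arangosh sketch of both options; the collection name, shard count,
and document are made-up examples:

```js
// arangosh: create a cluster collection with the "traditional" key generator
// ("products" and its sharding settings are hypothetical)
db._create("products", {
  numberOfShards: 3,
  keyOptions: { type: "traditional" }  // "autoincrement" is not available in a cluster
});

// Alternatively, prescribe user-defined keys when inserting documents
db.products.save({ _key: "product-1234", name: "gadget" });
```
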
Indexes
-------

### Unique constraints

There are restrictions on the allowed unique constraints in a cluster.
Any unique constraint which cannot be checked locally on a per-shard
basis is not allowed in a cluster setup. More concretely, unique
constraints in a cluster are only allowed in the following situations:

- there is always a unique constraint on the primary key `_key`; if
  the collection is not sharded by `_key`, then `_key` must be
  automatically generated by the database and cannot be prescribed by
  the client
- the collection has only one shard, in which case the same unique
  constraints are allowed as in the single instance case
- if the collection is sharded by exactly one attribute other than
  `_key`, then there can be a unique constraint on that attribute

These restrictions are imposed because otherwise checking for a unique
constraint violation would involve checking with all shards, which would
have a considerable performance impact.

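As an illustration of the third case, here is a hedged arangosh sketch; the
collection name, shard key, and index type are made-up examples. Because every
`email` value maps to exactly one shard, a unique index on `email` can be
checked locally on that shard:

```js
// arangosh: shard the collection by "email" so a unique index on "email"
// can be checked per shard ("users" and the attributes are hypothetical)
db._create("users", {
  numberOfShards: 5,
  shardKeys: ["email"]
});

// Allowed: the unique constraint coincides with the shard key
db.users.ensureIndex({ type: "persistent", fields: ["email"], unique: true });

// Not allowed in a cluster: a unique index on an attribute the collection is
// not sharded by (e.g. a unique index on ["username"] here), as checking it
// would require asking all shards.
```
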
Renaming
--------

It is not possible to rename collections or views in a cluster.

AQL
---

The AQL syntax for single server and cluster is identical. However,
there is one additional requirement (regarding `WITH`) and possible
performance differences.

### WITH

The `WITH` keyword in AQL must be used to declare which collections
are used in the query. For most AQL queries the required collections
can be deduced from the query itself. However, with traversals this is
not possible if edge collections are used directly. See
[AQL WITH operation](aql/operations-with.html)
for details. The `WITH` statement is not necessary when using named graphs
for the traversals.

As deadlocks cannot be detected easily in a cluster environment, the
`WITH` keyword is mandatory for this particular situation in a cluster,
but not in a single server.

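A minimal sketch of such a traversal in arangosh, with hypothetical `persons`
(vertex) and `knows` (edge) collections; because the edge collection is used
directly, the vertex collection must be declared with `WITH` in a cluster:

```js
// arangosh: traversal over directly addressed collections; in a cluster the
// vertex collection has to be declared up front ("persons"/"knows" are hypothetical)
db._query(`
  WITH persons
  FOR v, e IN 1..2 OUTBOUND "persons/alice" knows
    RETURN v._key
`);
```
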
### Performance

Performance of AQL queries can vary between single server and cluster.
If a query can be distributed to many DBServers and executed in
parallel then cluster performance can be better, for example if you
do a distributed `COLLECT` aggregation or a distributed `FILTER`
operation.

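A hedged sketch of such a query (the `orders` collection and its attributes
are made up); the `FILTER` and large parts of the `COLLECT` aggregation can be
pushed down to the individual shards and run in parallel:

```js
// arangosh: aggregation that can be partially executed on each shard
// ("orders", "status", "country" and "amount" are hypothetical)
db._query(`
  FOR o IN orders
    FILTER o.status == "paid"
    COLLECT country = o.country
    AGGREGATE total = SUM(o.amount)
    RETURN { country: country, total: total }
`);
```
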
On the other hand, if you do a join or a traversal and the data is not
local to one server then the performance can be worse compared to a
single server. This is especially true for traversals if the data is
not sharded with care. Our smart graph feature helps with this for
traversals.

Single document operations can have a higher throughput in a cluster but
will also have a higher latency, due to an additional network hop from
the coordinator to the DBServer.

Any operation that needs to find documents by anything other than the
shard key will have to fan out to all shards, so it will be a lot
slower than when referring to the documents using the shard
key. Optimized lookups by shard key can only be used for equality
lookups, e.g. not for range lookups.

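A hedged illustration (collection, shard key, and values are made up): if
`users` is sharded by `country`, only the first query below can be routed to a
single shard; the other two have to ask all shards.

```js
// arangosh: assuming "users" is sharded by "country"
// Equality lookup on the shard key: routed to exactly one shard
db._query(`FOR u IN users FILTER u.country == "DE" RETURN u`);

// Lookup by a non-shard-key attribute: fans out to all shards
db._query(`FOR u IN users FILTER u.age == 42 RETURN u`);

// Range condition, even on the shard key, also fans out to all shards
db._query(`FOR u IN users FILTER u.country >= "A" AND u.country < "F" RETURN u`);
```
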
### Memory usage

Some query results must be built up in memory on a coordinator, for
example if a dataset needs to be sorted on the fly. This can relatively
easily overwhelm a coordinator if the dataset is sharded across multiple
DBServers. Use indexes and streaming cursors (>= 3.4) to circumvent this
problem.

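A hedged sketch of that advice; the collection and attribute are made up, and
passing the `stream` option as the third argument of `db._query` in arangosh
is an assumption. An index on the sort attribute lets the sort be served by
the index, and a streaming cursor avoids materializing the full result on the
coordinator first.

```js
// arangosh sketch: keep coordinator memory low for a sorted result
// ("events" and "timestamp" are hypothetical; option placement is an assumption)
db.events.ensureIndex({ type: "persistent", fields: ["timestamp"] });

var cursor = db._query(
  "FOR e IN events SORT e.timestamp DESC RETURN e",
  {},
  { stream: true }
);
while (cursor.hasNext()) {
  cursor.next();  // process documents as they arrive
}
```
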
Transactions
------------

Using a single instance of ArangoDB, multi-document / multi-collection
queries are guaranteed to be fully ACID. This is more than many other
NoSQL database systems support. In cluster mode, single-document
operations are also fully ACID. Multi-document / multi-collection
queries in a cluster are not ACID, which is equally the case for
competing database systems. See [Transactions](transactions.html)
for details.

Batch operations for multiple documents in the same collection are only
fully transactional in a single instance.

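For orientation, a hedged arangosh sketch of a multi-collection transaction
(collection names and documents are made up); on a single instance this runs
fully ACID, while the same call in a cluster does not carry the full ACID
guarantees described above.

```js
// arangosh: a multi-collection transaction ("accounts" and "audit" are hypothetical)
db._executeTransaction({
  collections: { write: ["accounts", "audit"] },
  action: function () {
    var db = require("@arangodb").db;
    db.accounts.update("alice", { balance: 100 });
    db.audit.save({ account: "alice", change: 100 });
  }
});
```
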
Smart graphs
------------

In smart graphs there are restrictions on the values of the `_key`
attributes. Essentially, the `_key` attribute values for vertices must
be prefixed with the string value of the smart graph attribute and a
colon. A similar restriction applies for the edges.

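A hedged illustration of the vertex key format, assuming a smart graph whose
smart graph attribute is `country` and whose vertex collection is `customers`
(both names and values are made up):

```js
// Vertex keys take the form "<smart graph attribute value>:<key suffix>"
db.customers.save({
  _key: "DE:alice",   // the "DE" prefix must match the country attribute below
  country: "DE",
  name: "Alice"
});
```
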
Foxx
----

Foxx apps run on the coordinators of a cluster. Since coordinators are
stateless, one must not use regular file accesses in Foxx apps in a
cluster.

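A hedged sketch of the pattern this implies (route and collection names are
made up): keep any state a Foxx service needs in a collection, which every
coordinator can reach, instead of writing files on the local coordinator.

```js
// Foxx service sketch: persist state in the database, not on the local file system
"use strict";
const createRouter = require("@arangodb/foxx/router");
const { db } = require("@arangodb");
const router = createRouter();
module.context.use(router);

router.post("/ping", function (req, res) {
  // visible on every coordinator, unlike a file written on this one
  db.service_state.save({ lastPing: new Date().toISOString() });
  res.json({ ok: true });
});
```
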
Agency
------

A cluster deployment needs a central, RAFT-based key/value store called
"the agency" to keep the current cluster configuration and manage
failover. Being RAFT-based, this is a real-time system. If your servers
running the agency instances (typically three or five) receive too much
load, the RAFT protocol stops working and the whole stability of the
cluster is endangered. If you foresee this problem, run the agency
instances on separate nodes. All this is not necessary in a single
server deployment.

Dump/Restore
------------

At the time of this writing, the `arangodump` utility in a cluster
cannot guarantee a consistent snapshot across multiple shards or even
multiple collections. This is in line with most other current NoSQL
database systems. We are working on a consistent snapshot and
incremental backup capability for 3.5. In a single server, `arangodump`
produces a consistent snapshot.