---
layout: default
description: In general, a single server configuration and a cluster configuration of ArangoDB behave very similarly
---
Single Instance vs. Cluster
===========================
In general, a single server configuration and a cluster configuration
of ArangoDB behave very similarly. However, there are differences due
to the different nature of these setups, which can lead to discrepancies
in behavior between the two configurations. A summary of potential
differences follows.
See [Migrating from Single Instance to Cluster](deployment-migrating-single-instance-cluster.html)
for practical information.

Locking and deadlock prevention
-------------------------------
In a single server configuration, all data is local and deadlocks can
easily be detected. In a cluster configuration, data is distributed to
many servers and some conflicts cannot be detected easily. Therefore,
some operations (such as locking shards) are performed sequentially and
in a strictly predefined order, so that deadlocks are avoided by design.

Document Keys
-------------
In a cluster the *autoincrement* key generator is not supported. You
have to use the *traditional* key generator or user-defined keys.
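
As a minimal sketch (endpoint, credentials, and the collection name are
placeholders), creating a cluster collection with the *traditional* key
generator via the HTTP API could look like this:

```python
import requests

# Placeholder endpoint and credentials for a local deployment.
BASE = "http://localhost:8529/_db/_system"
AUTH = ("root", "")

# In a cluster, keyOptions.type must be "traditional" (or keys must be
# supplied by the client); "autoincrement" is rejected.
resp = requests.post(
    f"{BASE}/_api/collection",
    json={
        "name": "products",  # hypothetical collection name
        "numberOfShards": 3,
        "keyOptions": {"type": "traditional", "allowUserKeys": True},
    },
    auth=AUTH,
)
print(resp.status_code, resp.json())
```
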
Indexes
-------

### Unique constraints
There are restrictions on the allowed unique constraints in a cluster.
Any unique constraint which cannot be checked locally on a per shard
basis is not allowed in a cluster setup. More concretely, unique
constraints in a cluster are only allowed in the following situations:

- there is always a unique constraint on the primary key `_key`; if
  the collection is not sharded by `_key`, then `_key` must be
  automatically generated by the database and cannot be prescribed by
  the client
- the collection has only one shard, in which case the same unique
  constraints are allowed as in the single instance case
- if the collection is sharded by exactly one attribute other than
  `_key`, then there can be a unique constraint on that attribute

These restrictions are imposed because otherwise checking for a unique
constraint violation would involve contacting all shards, which would
have a considerable performance impact.
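
For illustration, here is a minimal sketch (endpoint, credentials, and
names are placeholders) of the third case above: a collection sharded by
a single attribute other than `_key`, with a unique index on exactly
that attribute:

```python
import requests

BASE = "http://localhost:8529/_db/_system"  # placeholder endpoint
AUTH = ("root", "")                         # placeholder credentials

# Shard the collection by the attribute that carries the unique
# constraint, so uniqueness can be checked locally on each shard.
requests.post(
    f"{BASE}/_api/collection",
    json={"name": "users", "numberOfShards": 3, "shardKeys": ["email"]},
    auth=AUTH,
)

# A unique index on the shard key attribute is allowed; a unique index
# on any other attribute would be rejected in a cluster.
resp = requests.post(
    f"{BASE}/_api/index",
    params={"collection": "users"},
    json={"type": "hash", "fields": ["email"], "unique": True},
    auth=AUTH,
)
print(resp.json())
```
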
Renaming
--------
It is not possible to rename collections or views in a cluster.

AQL
---
The AQL syntax for single server and cluster is identical. However,
there is one additional requirement (regarding `WITH`) and there can
be performance differences.

### WITH
The `WITH` keyword in AQL must be used to declare which collections
are used in the query. For most AQL queries the required collections
can be deduced from the query itself. However, with traversals this is
not possible if edge collections are used directly. See
[AQL WITH operation](aql/operations-with.html)
for details. The `WITH` statement is not necessary when using named
graphs for the traversals.

As deadlocks cannot easily be detected in a cluster environment, the
`WITH` keyword is mandatory in this particular situation in a cluster,
but not in a single server.
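
As a sketch of this cluster-mandatory case (endpoint, collection, and
document names are made up), a traversal that uses an edge collection
directly must declare its vertex collections with `WITH`; here the query
is sent through the HTTP cursor API:

```python
import requests

BASE = "http://localhost:8529/_db/_system"  # placeholder endpoint
AUTH = ("root", "")

# WITH declares the vertex collections the traversal may touch; in a
# cluster this is mandatory because the edge collection "knows" is
# used directly (an anonymous graph) instead of a named graph.
query = """
WITH persons
FOR v, e IN 1..2 OUTBOUND 'persons/alice' knows
  RETURN v._key
"""

resp = requests.post(f"{BASE}/_api/cursor", json={"query": query}, auth=AUTH)
print(resp.json().get("result"))
```
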
### Performance
Performance of AQL queries can vary between single server and cluster.
If a query can be distributed to many DBServers and executed in
parallel, then cluster performance can be better, for example when you
do a distributed `COLLECT` aggregation or a distributed `FILTER`
operation.

On the other hand, if you do a join or a traversal and the data is not
local to one server, then the performance can be worse compared to a
single server. This is especially true for traversals if the data is
not sharded with care. Our smart graph feature helps with this for
traversals.

Single document operations can have a higher throughput in a cluster
but will also have a higher latency, due to the additional network hop
from Coordinator to DBServer.

Any operation that needs to find documents by anything other than the
shard key has to fan out to all shards, so it is a lot slower than
referring to the documents by their shard key. Optimized lookups by
shard key can only be used for equality lookups, not for range lookups.
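
To illustrate (reusing the hypothetical `users` collection sharded by
`email` from the sketch above), an equality filter on the shard key can
be routed to a single shard, while any other filter fans out:

```python
import requests

BASE = "http://localhost:8529/_db/_system"  # placeholder endpoint
AUTH = ("root", "")

def run(query, bind_vars):
    r = requests.post(
        f"{BASE}/_api/cursor",
        json={"query": query, "bindVars": bind_vars},
        auth=AUTH,
    )
    return r.json().get("result")

# Equality lookup on the shard key: routed to exactly one shard.
run("FOR u IN users FILTER u.email == @e RETURN u", {"e": "a@example.com"})

# Filter on a non-shard-key attribute (or a range on the shard key):
# must be sent to all shards.
run("FOR u IN users FILTER u.name == @n RETURN u", {"n": "Alice"})
```
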
### Memory usage
Some query results must be built up in memory on a Coordinator, for
example if a dataset needs to be sorted on the fly. This can
relatively easily overwhelm a Coordinator if the dataset is sharded
across multiple DBServers. Use indexes and streaming cursors
(ArangoDB >= 3.4) to avoid this problem.
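
A streaming cursor can be requested through the cursor API by setting
the `stream` option (endpoint, collection, and query are placeholders);
result batches are then fetched one at a time instead of the full result
being materialized on the Coordinator:

```python
import requests

BASE = "http://localhost:8529/_db/_system"  # placeholder endpoint
AUTH = ("root", "")

# options.stream (ArangoDB >= 3.4) makes the server hand out batches
# as they are produced instead of buffering the whole result first;
# here the sort is assumed to be covered by the index on "email".
body = requests.post(
    f"{BASE}/_api/cursor",
    json={
        "query": "FOR u IN users SORT u.email RETURN u.email",
        "batchSize": 1000,
        "options": {"stream": True},
    },
    auth=AUTH,
).json()

while body.get("hasMore"):
    # PUT on the cursor id fetches the next batch.
    body = requests.put(f"{BASE}/_api/cursor/{body['id']}", auth=AUTH).json()
```
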
Transactions
------------
Using a single instance of ArangoDB, multi-document / multi-collection
queries are guaranteed to be fully ACID. This is more than many other
NoSQL database systems support. In cluster mode, single-document
operations are also fully ACID. Multi-document / multi-collection
queries in a cluster are not ACID, which is equally the case for
competing database systems. See [Transactions](transactions.html)
for details.

Batch operations for multiple documents in the same collection are only
fully transactional in a single instance.
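
As a sketch (endpoint, collection, keys, and balances are made up), a
multi-document transaction can be submitted as a server-side JavaScript
function via the HTTP API; on a single instance this runs fully ACID,
in a cluster it does not:

```python
import requests

BASE = "http://localhost:8529/_db/_system"  # placeholder endpoint
AUTH = ("root", "")

# The "action" function is executed server-side as one transaction.
resp = requests.post(
    f"{BASE}/_api/transaction",
    json={
        "collections": {"write": ["accounts"]},  # hypothetical collection
        "action": """
            function () {
              var db = require('@arangodb').db;
              db.accounts.update('alice', { balance: 50 });
              db.accounts.update('bob', { balance: 150 });
            }
        """,
    },
    auth=AUTH,
)
print(resp.json())
```
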
Smart graphs
------------
In smart graphs there are restrictions on the values of the `_key`
attribute. Essentially, the `_key` attribute values for vertices must
be prefixed with the string value of the smart graph attribute and a
colon. A similar restriction applies to the edges.
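
For example (endpoint, collection, and attribute names are made up),
with a smart graph whose smart graph attribute is `country`, a vertex
key must start with that attribute's value followed by a colon:

```python
import requests

BASE = "http://localhost:8529/_db/_system"  # placeholder endpoint
AUTH = ("root", "")

# "customers" is a hypothetical smart vertex collection whose graph
# was created with smartGraphAttribute "country": the _key must have
# the form "<attribute value>:<local key>".
resp = requests.post(
    f"{BASE}/_api/document/customers",
    json={"_key": "germany:alice", "country": "germany", "name": "Alice"},
    auth=AUTH,
)
print(resp.status_code)
```
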
Foxx
----
Foxx apps run on the Coordinators of a cluster. Since Coordinators are
stateless, Foxx apps must not use regular file access in a cluster, as
local files are not shared between Coordinators.

Agency
------
A cluster deployment needs a central, RAFT-based key/value store called
"the agency" to keep the current cluster configuration and to manage
failover. Being RAFT-based, this is a real-time system. If the servers
running the agency instances (typically three or five) receive too much
load, the RAFT protocol stops working and the stability of the whole
cluster is endangered. If you foresee this problem, run the agency
instances on separate nodes. None of this is necessary in a single
server deployment.

Dump/Restore
------------
At the time of this writing, the `arangodump` utility in a cluster
cannot guarantee a consistent snapshot across multiple shards or even
multiple collections. This is in line with most other current NoSQL
database systems. We are working on a consistent snapshot and
incremental backup capability for 3.5. In a single server, `arangodump`
produces a consistent snapshot.