Cluster Administration
======================

This _Section_ includes information related to the administration of an ArangoDB Cluster.

For a general introduction to the ArangoDB Cluster, please refer to the Cluster [chapter](../../Scalability/Cluster/README.md).

Please also check the following talks:

| # | Date            | Title                                                                        | Who                                     | Link |
|---|-----------------|------------------------------------------------------------------------------|-----------------------------------------|------|
| 1 | 10th April 2018 | Fundamentals and Best Practices of ArangoDB Cluster Administration            | Kaveh Vahedipour, ArangoDB Cluster Team | [Online Meetup Page](https://www.meetup.com/online-ArangoDB-meetup/events/248996022/) & [Video](https://www.youtube.com/watch?v=RQ33fkgUg64) |
| 2 | 29th May 2018   | Fundamentals and Best Practices of ArangoDB Cluster Administration: Part II   | Kaveh Vahedipour, ArangoDB Cluster Team | [Online Meetup Page](https://www.meetup.com/online-ArangoDB-meetup/events/250869684/) & [Video](https://www.youtube.com/watch?v=jj7YpTaL3pI) |

Enabling synchronous replication
--------------------------------

For an introduction to _Synchronous Replication_ in a Cluster, please refer to the
[_Cluster Architecture_](../../Scalability/Cluster/Architecture.md#synchronous-replication) section.

Synchronous replication can be enabled per _collection_. When creating a _collection_,
you may specify the number of _replicas_ using the *replicationFactor* parameter.
The default value is `1`, which effectively *disables* synchronous replication
among _DBServers_.

Whenever you specify a _replicationFactor_ greater than 1 when creating a collection,
synchronous replication will be activated for this collection. The Cluster will
determine suitable _leaders_ and _followers_ for every requested _shard_
(_numberOfShards_) within the Cluster.

Example:

```
127.0.0.1:8530@_system> db._create("test", {"replicationFactor": 3})
```

In the above case, any write operation will require 2 replicas to report success
from now on.

Preparing growth
----------------

You may create a _collection_ with a higher _replication factor_ than there are
currently available _DBServers_. When additional _DBServers_ become available,
the _shards_ are automatically replicated to the newly available _DBServers_.

To create a _collection_ with a higher _replication factor_ than available
_DBServers_, set the option _enforceReplicationFactor_ to _false_ when creating
the collection from the _ArangoShell_ (the option is not available from the web
interface), e.g.:

```
db._create("test", { replicationFactor: 4 }, { enforceReplicationFactor: false });
```

The default value for _enforceReplicationFactor_ is _true_.

**Note:** multiple _replicas_ of the same _shard_ can never coexist on the same
_DBServer_ instance.
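To verify which settings are actually in effect for a collection (such as its
*replicationFactor*), you can query the collection properties via the HTTP API of
any _Coordinator_. A minimal sketch, assuming a _Coordinator_ reachable at
`localhost:8529`, the `_system` database and authentication disabled:

```
# fetch the properties of the "test" collection, including its replicationFactor
curl http://localhost:8529/_db/_system/_api/collection/test/properties
```

The response is a JSON document that contains, among other attributes, the
configured `replicationFactor` of the collection.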
Sharding
--------

For an introduction to _Sharding_ in a Cluster, please refer to the
[_Cluster Architecture_](../../Scalability/Cluster/Architecture.md#sharding) section.

The number of _shards_ can be configured at _collection_ creation time, e.g. in the
web UI or via the _ArangoDB Shell_:

```
127.0.0.1:8529@_system> db._create("sharded_collection", {"numberOfShards": 4});
```

To configure a custom _hashing_ for another attribute (the default is `_key`):

```
127.0.0.1:8529@_system> db._create("sharded_collection", {"numberOfShards": 4, "shardKeys": ["country"]});
```

In the example above, 'country' is used as the _shardKeys_, which can be useful to
keep the data of every country in one shard and results in better performance for
queries that work on a per-country basis. It is also possible to specify multiple
`shardKeys`.

Note however that if you change the shard keys from their default `["_key"]`, then
finding a document in the collection by its primary key involves a request to every
single shard. Furthermore, in this case one can no longer prescribe the primary key
value of a new document but must use the automatically generated one. This latter
restriction comes from the fact that ensuring uniqueness of the primary key would
be very inefficient if the user could specify the primary key.

On which _DBServer_ in a Cluster a particular _shard_ is kept is undefined. There is
no option to configure an affinity based on certain _shard_ keys.

Unique indexes (hash, skiplist, persistent) on sharded collections are only allowed
if the fields used to determine the shard key are also included in the list of
attribute paths for the index:

| shardKeys | indexKeys |             |
|----------:|----------:|------------:|
|         a |         a |     allowed |
|         a |         b | not allowed |
|         a |      a, b |     allowed |
|      a, b |         a | not allowed |
|      a, b |         b | not allowed |
|      a, b |      a, b |     allowed |
|      a, b |   a, b, c |     allowed |
|   a, b, c |      a, b | not allowed |
|   a, b, c |   a, b, c |     allowed |

Moving/Rebalancing _shards_
---------------------------

A _shard_ can be moved from one _DBServer_ to another, and the entire shard
distribution can be rebalanced, using the corresponding buttons in the web
[UI](../../Programs/WebInterface/Cluster.md).

Replacing/Removing a _Coordinator_
----------------------------------

_Coordinators_ are effectively stateless and can be replaced, added and removed
without more consideration than meeting the necessities of the particular
installation.

To take out a _Coordinator_, stop the _Coordinator_'s instance by issuing
`kill -SIGTERM <pid>`. Approximately 15 seconds later the cluster UI on any other
_Coordinator_ will mark the _Coordinator_ in question as failed. Almost
simultaneously, a trash bin icon will appear to the right of the name of the
_Coordinator_. Clicking that icon will remove the _Coordinator_ from the
coordinator registry.

Any new _Coordinator_ instance that is informed of where to find any/all _Agents_
via the `--cluster.agency-endpoint` option will be integrated as a new
_Coordinator_ into the cluster. You may also just restart the _Coordinator_ as
before and it will reintegrate itself into the cluster.
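A minimal sketch of how such a new _Coordinator_ could be started manually with
`arangod` is shown below. The endpoints, port numbers and directory name are
placeholders, and the exact set of options may differ depending on your ArangoDB
version and deployment method (e.g. when using the _ArangoDB Starter_):

```
# start a new Coordinator and point it at one of the Agents
# (values in <...> are placeholders for your environment)
arangod --server.endpoint tcp://0.0.0.0:8531 \
        --cluster.my-address tcp://<this-host>:8531 \
        --cluster.my-role COORDINATOR \
        --cluster.agency-endpoint tcp://<agent-host>:<agent-port> \
        --database.directory coordinator-new
```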
Replacing/Removing a _DBServer_
-------------------------------

_DBServers_ are where the data of an ArangoDB cluster is stored. They do not publish
a web UI and are not meant to be accessed by any entity other than _Coordinators_
(to perform client requests) or other _DBServers_ (to uphold replication and
resilience).

The clean way of removing a _DBServer_ is to first relieve it of all its
responsibilities for shards. This applies to _followers_ as well as _leaders_ of
shards. The requirement for this operation is that no collection in any of the
databases has a `replicationFactor` greater than or equal to the current number of
_DBServers_ minus one.

Cleaning out `DBServer004`, for example, works as follows when issued to any
_Coordinator_ of the cluster:

`curl /_admin/cluster/cleanOutServer -d '{"id":"DBServer004"}'`

After the _DBServer_ has been cleaned out, you will find a trash bin icon to the
right of the name of the _DBServer_ in any _Coordinator_'s UI. Clicking on it will
remove the _DBServer_ in question from the cluster.

Firing up any _DBServer_ from a clean data directory, specifying any or all of the
agency endpoints, will integrate the new _DBServer_ into the cluster (see the
sketch below). To distribute shards onto the new _DBServer_, click the
`Distribute Shards` button at the bottom of the `Shards` page in every database.
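Analogous to the _Coordinator_ example above, a replacement _DBServer_ could be
started manually as sketched below. Again, the values in `<...>` are placeholders,
the data directory must be empty, and older versions may expect the role name
`PRIMARY` instead of `DBSERVER`:

```
# start a new DBServer from an empty data directory,
# pointing it at one of the Agents (values in <...> are placeholders)
arangod --server.endpoint tcp://0.0.0.0:8530 \
        --cluster.my-address tcp://<this-host>:8530 \
        --cluster.my-role DBSERVER \
        --cluster.agency-endpoint tcp://<agent-host>:<agent-port> \
        --database.directory dbserver-new
```

Once the new _DBServer_ has registered itself with the _Agency_, it appears in the
cluster UI and can receive shards via `Distribute Shards` as described above.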