Migration from ArangoDB 2.8 to 3.0
==================================

Problem
-------

I want to use ArangoDB 3.0 from now on, but I still have data in ArangoDB 2.8
that I need to migrate. I am running an ArangoDB 3.0 cluster (and
possibly an ArangoDB 2.8 cluster as well).

Solution
--------

The internal data format changed completely from ArangoDB 2.8 to 3.0,
so you have to dump all data using `arangodump` and then
restore it to the new ArangoDB instance using `arangorestore`.

General instructions for this procedure can be found
[in the manual](../../Manual/Administration/Upgrading/Upgrading30.html).
Here, we cover some additional details about the cluster case.

### Dumping the data in ArangoDB 2.8

Basically, dumping the data works with the following command (use the `arangodump`
from your ArangoDB 2.8 distribution!):

    arangodump --server.endpoint tcp://localhost:8530 --output-directory dump

or a variation of it. For details, see the manual page mentioned above and
[this section](https://docs.arangodb.com/2.8/HttpBulkImports/Arangodump.html).
If your ArangoDB 2.8 instance is a cluster, simply use one of the
coordinator endpoints as the `--server.endpoint` above.

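Before restoring, it can be worth checking that the dump looks complete. As described in the next section, `arangodump` writes two JSON files per collection. The sketch below assumes they are named `<collection>.structure.json` and `<collection>.data.json`; that naming is an assumption, so compare it with the contents of your own dump directory first. The function name `missing_data_files` is made up for this example.

```shell
# Print the name of every collection whose structure file has no
# matching data file in the given dump directory.
# Assumption: files are named <collection>.structure.json and
# <collection>.data.json.
missing_data_files() {
  dir=$1
  for structure in "$dir"/*.structure.json; do
    [ -e "$structure" ] || continue   # the glob matched nothing
    name=$(basename "$structure" .structure.json)
    [ -e "$dir/$name.data.json" ] || printf '%s\n' "$name"
  done
}

# Usage (prints nothing if every collection has both files):
#   missing_data_files dump
```
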
### Restoring the data in ArangoDB 3.0

The output consists of JSON files in the output directory, two for each
collection: one for the structure and one for the data. The data format
is 100% compatible with ArangoDB 3.0, except that ArangoDB 3.0 has
an additional option in the structure files for synchronous replication,
namely the attribute `replicationFactor`, which specifies how many
copies of the data for each shard are kept in the cluster.

Therefore, you can simply use this command (use the `arangorestore` from
your ArangoDB 3.0 distribution!):

    arangorestore --server.endpoint tcp://localhost:8530 --input-directory dump

to import your data into your new ArangoDB 3.0 instance. See
[this page](../../Manual/Administration/Arangorestore.html)
for details on the available command line options. If your ArangoDB 3.0
instance is a cluster, then simply use one of the coordinators as
`--server.endpoint`.

That is it, your data is migrated.

### Controlling the number of shards and the replication factor

This procedure works for all four combinations of single server and cluster
for source and destination respectively. If the target is a single server,
everything simply works.

So it remains to explain how one controls the number of shards and the
replication factor if the destination is a cluster.

If the source was a cluster, `arangorestore` will use the same number
of shards as before, unless you tell it otherwise. Since ArangoDB 2.8
does not have synchronous replication, it does not produce dumps
with the `replicationFactor` attribute, and so `arangorestore` will
use replication factor 1 for all collections. If the source was a
single server, the same will happen, and in addition `arangorestore`
will create every collection with just a single shard.

There are essentially 3 ways to change this behaviour:

1. The first is to create the collections explicitly on the
   ArangoDB 3.0 cluster, and then set the `--create-collection false` flag.
   In this case you can control the number of shards and the replication
   factor for each collection individually when you create them.
2. The second is to use `arangorestore`'s options
   `--default-number-of-shards` and `--default-replication-factor`
   (the latter was introduced in Version 3.0.2)
   to specify default values, which are used if the
   dump files do not specify numbers. This means that all such
   restored collections will have the same number of shards and
   replication factor.
3. If you need more control, you can edit the structure files
   in the dump. They are plain JSON files, so you can first
   run them through a JSON pretty printer to make editing easier. For the
   replication factor, add a `replicationFactor`
   attribute with a numerical value to the `parameters` subobject.
   For the number of shards, locate the `shards` subattribute of the
   `parameters` attribute and edit it so that it has the right
   number of attributes. The actual names of the attributes as well
   as their values do not matter. Alternatively, add a `numberOfShards`
   attribute to the `parameters` subobject; this will override the
   `shards` attribute (this possibility was introduced in Version
   3.0.2).

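For the third option, editing many structure files by hand gets tedious. The following is a minimal sketch of how the edit could be scripted, not an official tool. It assumes the structure files contain a `"parameters":{` object that does not yet carry `numberOfShards` or `replicationFactor`, and it uses a plain text substitution rather than a JSON parser, so inspect the result before restoring. The function name `set_sharding` is made up for this example, and relying on `numberOfShards` requires Version 3.0.2 or later, as noted above.

```shell
# Add numberOfShards and replicationFactor to the "parameters" object
# of one structure file. Text substitution only, not a JSON parser.
set_sharding() {
  file=$1
  shards=$2
  repl=$3

  # Both values must be plain non-negative integers.
  for n in "$shards" "$repl"; do
    case $n in
      ''|*[!0-9]*) echo "expected an integer, got '$n'" >&2; return 1 ;;
    esac
  done

  # Refuse to touch files without a parameters object.
  if ! grep -q '"parameters"[[:space:]]*:[[:space:]]*{' "$file"; then
    echo "$file: no parameters object found" >&2
    return 1
  fi

  # Refuse to create duplicate keys.
  if grep -Eq '"(numberOfShards|replicationFactor)"' "$file"; then
    echo "$file: attribute already present, edit it by hand" >&2
    return 1
  fi

  # Insert both attributes right after the opening brace of "parameters".
  sed 's/"parameters"[[:space:]]*:[[:space:]]*{/&"numberOfShards":'"$shards"',"replicationFactor":'"$repl"',/' \
    "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage: 4 shards and replication factor 2 for every collection in "dump".
#   for f in dump/*.structure.json; do set_sharding "$f" 4 2; done
```

Because the function takes one file at a time, different collections can be given different values, which is the point of this option compared to the default-value flags.
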
Note that you can remove individual collections from your dump by
deleting their pair of structure and data files in the dump directory.
In this way you can restore your data in several steps or even
parallelise the restore operation by running multiple `arangorestore`
processes concurrently on different dump directories. You should
consider using different coordinators for the different `arangorestore`
processes in this case.

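One way to prepare such a parallel restore is to distribute the collections of a dump over several directories. The sketch below copies each pair of files round-robin into directories named `<dump>.part0`, `<dump>.part1`, and so on. The function name, the `.partN` directory names and the coordinator host names in the usage comment are all hypothetical, and the file naming (`<collection>.structure.json`, `<collection>.data.json`) is an assumption to verify against your own dump.

```shell
# Copy the collections of a dump directory round-robin into
# <src>.part0 ... <src>.part(N-1). The original dump is left untouched.
split_dump() {
  src=$1
  parts=$2

  # The number of parts must be a positive integer.
  case $parts in
    ''|*[!0-9]*|0) echo "number of parts must be a positive integer" >&2; return 1 ;;
  esac

  i=0
  for structure in "$src"/*.structure.json; do
    [ -e "$structure" ] || continue   # the glob matched nothing
    name=$(basename "$structure" .structure.json)
    target="$src.part$((i % parts))"
    mkdir -p "$target"
    cp "$structure" "$target/"
    # Copy the data file as well if the dump contains one.
    if [ -e "$src/$name.data.json" ]; then
      cp "$src/$name.data.json" "$target/"
    fi
    i=$((i + 1))
  done
}

# Usage: two concurrent restores against two different coordinators.
#   split_dump dump 2
#   arangorestore --server.endpoint tcp://coordinator1:8530 --input-directory dump.part0 &
#   arangorestore --server.endpoint tcp://coordinator2:8530 --input-directory dump.part1 &
#   wait
```
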
All these possibilities together give you full control over the sharding
layout of your data in the new ArangoDB 3.0 cluster.