Doc - Fast Cluster Restore Procedure (#7756)
Parent ef03234331, commit 37c5c1239d
Backup and restore can be done via the tools
[_arangodump_](../Programs/Arangodump/README.md) and
[_arangorestore_](../Programs/Arangorestore/README.md).

{% hint 'tip' %}
In order to speed up the _arangorestore_ performance in a Cluster environment,
the [Fast Cluster Restore](../Programs/Arangorestore/FastClusterRestore.md)
procedure is recommended.
{% endhint %}

Performing frequent backups is important and a recommended best practice that
can allow you to recover your data in case unexpected problems occur.
Hardware failures, system crashes, or users mistakenly deleting data can always
_arangodump_ will by default connect to the *_system* database using the default
endpoint. If you want to connect to a different database or a different endpoint,
or use authentication, you can use the following command-line options:

- `--server.database <string>`: name of the database to connect to
- `--server.endpoint <string>`: endpoint to connect to
- `--server.username <string>`: username
- `--server.password <string>`: password to use (omit this and you'll be prompted for the
  password)
- `--server.authentication <bool>`: whether or not to use authentication

Here's an example of dumping data from a non-standard endpoint, using a dedicated
[database name](../../Appendix/Glossary.md#database-name):
By default, _arangodump_ will dump both structural information and documents from all
non-system collections. To adjust this, there are the following command-line
arguments:

- `--dump-data <bool>`: set to *true* to include documents in the dump. Set to *false*
  to exclude documents. The default value is *true*.
- `--include-system-collections <bool>`: whether or not to include system collections
  in the dump. The default value is *false*. **Set to _true_ if you are using named
  graphs that you are interested in restoring.**
Cluster Backup
--------------

Starting with Version 2.1 of ArangoDB, the *arangodump* tool also
supports sharding and can be used to back up data from a Cluster.
Simply point it to one of the _Coordinators_ and it
will behave exactly as described above, working on sharded collections
in the Cluster.

Please see the [Limitations](Limitations.md).
_arangorestore_ can be invoked from the command-line as follows:

    arangorestore --input-directory "dump"

This will connect to an ArangoDB server and reload structural information and
documents found in the input directory *dump*. Please note that the input directory
must have been created by running *arangodump* before.

_arangorestore_ will by default connect to the *_system* database using the default
endpoint. If you want to connect to a different database or a different endpoint,
or use authentication, you can use the following command-line options:

- `--server.database <string>`: name of the database to connect to
- `--server.endpoint <string>`: endpoint to connect to
- `--server.username <string>`: username
- `--server.password <string>`: password to use (omit this and you'll be prompted for the
  password)
- `--server.authentication <bool>`: whether or not to use authentication

Since version 2.6 _arangorestore_ provides the option *--create-database*. Setting this
option to *true* will create the target database if it does not exist. When creating the
target database, the username and password passed to _arangorestore_ (in options
*--server.username* and *--server.password*) will be used to create an initial user for the
new database.

The option `--force-same-database` allows restricting arangorestore operations to a
database with the same name as in the source dump's "dump.json" file. It can thus be used
to prevent restoring data into a "wrong" database by accident.

For example, if a dump was taken from database `a`, and the restore is attempted into
database `b`, then with the `--force-same-database` option set to `true`, arangorestore
will abort instantly.
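The effect of `--force-same-database` can be modeled in a few lines. This is a sketch only, not ArangoDB's actual code; in particular, the `database` property name used for `dump.json` below is an assumption for illustration:

```javascript
// Illustrative model of the --force-same-database check (not ArangoDB's code).
// Assumption: dump.json records the source database under a "database" key.
function checkSameDatabase(dumpJsonText, targetDatabase) {
  const meta = JSON.parse(dumpJsonText);
  if (meta.database !== targetDatabase) {
    // arangorestore aborts instantly in this situation
    throw new Error(
      `dump was taken from database '${meta.database}', not '${targetDatabase}'`);
  }
}

checkSameDatabase('{"database":"a"}', "a"); // passes silently

let aborted = false;
try {
  checkSameDatabase('{"database":"a"}', "b"); // dump from 'a', restore into 'b'
} catch (e) {
  aborted = true;
}
console.log("aborted:", aborted); // aborted: true
```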
Here's an example of reloading data to a non-standard endpoint, using a dedicated
[database name](../../Appendix/Glossary.md#database-name):

    arangorestore --server.endpoint tcp://192.168.173.13:8531 --server.username backup --server.database mydb --input-directory "dump"

To create the target database when restoring, use a command like this:

    arangorestore --server.username backup --server.database newdb --create-database true --input-directory "dump"

_arangorestore_ will print out its progress while running, and will end with a line
showing some aggregate statistics:

    Processed 2 collection(s), read 2256 byte(s) from datafiles, sent 2 batch(es)

By default, _arangorestore_ will re-create all non-system collections found in the input
directory and load data into them. If the target database already contains collections
which are also present in the input directory, the existing collections in the database
will be dropped and re-created with the data found in the input directory.

The following parameters are available to adjust this behavior:

- `--create-collection <bool>`: set to *true* to create collections in the target
  database. If the target database already contains a collection with the same name,
  it will be dropped first and then re-created with the properties found in the input
  directory. Set to *false* to keep existing collections in the target database. If
  set to *false* and _arangorestore_ encounters a collection that is present in the
  input directory but not in the target database, it will abort. The default value is *true*.
- `--import-data <bool>`: set to *true* to load document data into the collections in
  the target database. Set to *false* to not load any document data. The default value
  is *true*.
- `--include-system-collections <bool>`: whether or not to include system collections
  when re-creating collections or reloading data. The default value is *false*.

For example, to (re-)create all non-system collections and load document data into them, use:

    arangorestore --create-collection true --import-data true --input-directory "dump"
This will drop potentially existing collections in the target database that are also present
in the input directory.

To include system collections too, use *--include-system-collections true*:

    arangorestore --create-collection true --import-data true --include-system-collections true --input-directory "dump"

To (re-)create all non-system collections without loading document data, use:
To just load document data into all non-system collections, use:

To restrict reloading to just specific collections, there is the *--collection* option.
It can be specified multiple times if required:

    arangorestore --collection myusers --collection myvalues --input-directory "dump"

Collections will be processed in alphabetical order by _arangorestore_, with all document
collections being processed before all [edge collections](../../Appendix/Glossary.md#edge-collection).
This remains valid also when multiple threads are in use (from v3.4.0 on).
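The ordering described above can be sketched as a sort comparator. This is an illustration only; the `isEdge` flag below is a stand-in for however the collection type is actually recorded in the dump:

```javascript
// Sketch of the restore order: document collections first, then edge
// collections, alphabetically within each group.
const dumped = [
  { name: "friendships", isEdge: true },
  { name: "users", isEdge: false },
  { name: "cities", isEdge: false },
  { name: "livesIn", isEdge: true },
];

const ordered = [...dumped].sort((a, b) =>
  a.isEdge !== b.isEdge
    ? (a.isEdge ? 1 : -1)           // document collections before edge collections
    : a.name.localeCompare(b.name)  // alphabetical within each group
);

console.log(ordered.map((c) => c.name));
// → [ 'cities', 'users', 'friendships', 'livesIn' ]
```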
Note however that when restoring an edge collection, no internal checks are made to
validate that the documents the edges connect actually exist. As a consequence, when
restoring individual collections which are part of a graph, you are not required to
restore them in a specific order.

{% hint 'warning' %}
When restoring only a subset of collections of your database, and graphs are in use, you will need
to make sure you are restoring all the needed collections (the ones that are part of the graph) as
otherwise you might have edges pointing to non-existing documents.
{% endhint %}

To restrict reloading to specific views, there is the *--view* option.
Should you specify the *--collection* parameter, views will not be restored _unless_ you explicitly
specify them via the *--view* option.

    arangorestore --collection myusers --view myview --input-directory "dump"

In the case of an arangosearch view, you must make sure that the linked collections are
either also restored or already present on the server.
@ -132,8 +141,8 @@ See [Arangodump](../Arangodump/Examples.md#encryption) for details.
|
|||
Reloading Data into a different Collection
|
||||
------------------------------------------
|
||||
|
||||
_arangorestore_ will restore document and edges data with the exact same *_key*, *_rev*, *_from*
|
||||
and *_to* values as found in the input directory.
|
||||
_arangorestore_ will restore document and edges data with the exact same *_key*, *_rev*, *_from*
|
||||
and *_to* values as found in the input directory.
|
||||
|
||||
With some creativity you can also use _arangodump_ and _arangorestore_ to transfer data from one
|
||||
collection into another (either on the same server or not). For example, to copy data from
|
||||
|
you can start with the following command:

    arangodump --collection myvalues --server.database mydb --output-directory "dump"

This will create two files, *myvalues.structure.json* and *myvalues.data.json*, in the output
directory. To load data from the datafile into an existing collection *mycopyvalues* in database
*mycopy*, rename the files to *mycopyvalues.structure.json* and *mycopyvalues.data.json*.
After that, run the following command:

    arangorestore --collection mycopyvalues --server.database mycopy --input-directory "dump"
Restoring in a Cluster
----------------------

From v2.1 on, the *arangorestore* tool supports sharding and can be
used to restore data into a Cluster. Simply point it to one of the
_Coordinators_ in your Cluster and it will work as usual but on sharded
collections in the Cluster.

If *arangorestore* is asked to restore a collection, it will use the same
number of shards, replication factor and shard keys as when the collection
was dumped. The distribution of the shards to the servers will also be the
same as at the time of the dump, provided that the number of _DBServers_ in
the cluster dumped from is identical to the number of DBServers in the
to-be-restored-to cluster.
To modify the number of _shards_ or the _replication factor_ for all or just
some collections, *arangorestore* provides the options `--number-of-shards`
and `--replication-factor` (starting from v3.3.22 and v3.4.2). These options
can be specified multiple times as well, in order to override the settings
for dedicated collections, e.g.

    arangorestore --number-of-shards 2 --number-of-shards mycollection=3 --number-of-shards test=4

The above will restore all collections except "mycollection" and "test" with
2 shards. "mycollection" will have 3 shards when restored, and "test" will
have 4. It is possible to omit the default value and only use
collection-specific overrides. In this case, the number of shards for any
collections not overridden will be determined by looking into the
"numberOfShards" values contained in the dump.
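The precedence rule (a per-collection override beats the bare default, which beats the dump's own "numberOfShards") can be sketched with a small function. This is a simplified model for illustration, not arangorestore's actual implementation:

```javascript
// Simplified model of --number-of-shards resolution (not arangorestore's code).
function resolveNumberOfShards(collection, specs) {
  let result = null; // null: fall back to "numberOfShards" from the dump
  for (const spec of specs) {
    const eq = spec.indexOf("=");
    if (eq === -1) {
      if (result === null) result = Number(spec);   // bare default value
    } else if (spec.slice(0, eq) === collection) {
      result = Number(spec.slice(eq + 1));          // per-collection override wins
    }
  }
  return result;
}

const specs = ["2", "mycollection=3", "test=4"];
console.log(resolveNumberOfShards("mycollection", specs)); // 3
console.log(resolveNumberOfShards("test", specs));         // 4
console.log(resolveNumberOfShards("other", specs));        // 2
```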
The `--replication-factor` option works in the same way, e.g.

    arangorestore --replication-factor 2 --replication-factor mycollection=1

will set the replication factor to 2 for all collections but "mycollection", which will get a
replication factor of just 1.

If a collection was dumped from a single instance and is then restored into
a cluster, the sharding will be done by the `_key` attribute by default. One can
If you restore a collection that was dumped from a cluster into a single
ArangoDB instance, the number of shards, replication factor and shard keys will silently
be ignored.

### Factors affecting speed of arangorestore in a Cluster

The following factors affect the speed of _arangorestore_ in a Cluster:

- **Replication Factor**: the higher the _replication factor_, the more
  time the restore will take. To speed up the restore you can restore
  using a _replication factor_ of 1 and then increase it again
  after the restore. This will reduce the number of network hops needed
  during the restore.
- **Restore Parallelization**: if the collections are not restored in
  parallel, the restore speed is highly affected. A parallel restore can
  be done from v3.4.0 on by using the `--threads` option of _arangorestore_.
  Before v3.4.0 it is possible to achieve parallelization by restoring
  on multiple _Coordinators_ at the same time. Depending on your specific
  case, parallelizing on multiple _Coordinators_ can still be useful even
  when the `--threads` option is in use (from v3.4.0).

{% hint 'tip' %}
Please refer to the [Fast Cluster Restore](FastClusterRestore.md) page
for further operative details on how to take the two factors described
above into account when restoring using _arangorestore_.
{% endhint %}

### Restoring collections with sharding prototypes

*arangorestore* will yield an error when trying to restore a
Fast Cluster Restore
====================

The _Fast Cluster Restore_ procedure documented in this page is recommended
to speed up the performance of [_arangorestore_](../Arangorestore/README.md)
in a Cluster environment.

It is assumed that a Cluster environment is running and a _logical_ backup
with [_arangodump_](../Arangodump/README.md) has already been taken.

{% hint 'info' %}
The procedure described in this page is particularly useful for ArangoDB
version 3.3, but can be used in 3.4 and later versions as well. Note that
from v3.4, _arangorestore_ includes the option `--threads`, which can already
be a good first step in achieving restore parallelization and its speed benefit.
However, the procedure below allows for even further parallelization (making
use of different _Coordinators_), and the part regarding temporarily setting
the _replication factor_ to 1 is still useful in 3.4 and later versions.
{% endhint %}

The speed improvement obtained by the procedure below is achieved by:

1. Restoring into a Cluster that has _replication factor_ 1, thus reducing
   the number of network hops needed during the restore operation (the
   _replication factor_ is reverted to its initial value at the end of the
   procedure - steps #2, #3 and #6).
2. Restoring multiple collections in parallel on different _Coordinators_
   (steps #4 and #5).
{% hint 'info' %}
Please refer to
[this](Examples.md#factors-affecting-speed-of-arangorestore-in-a-cluster)
section for further context on the factors affecting restore speed when restoring
using _arangorestore_ in a Cluster.
{% endhint %}

Step 1: Copy the _dump_ directory to all _Coordinators_
-------------------------------------------------------

The first step is to copy the directory that contains the _dump_ to all machines
where _Coordinators_ are running.

{% hint 'tip' %}
This step is not strictly required, as the backup can be restored over the
network. However, if the restore is executed locally, the restore speed is
significantly improved.
{% endhint %}
Step 2: Restore collection structures
-------------------------------------

The collection structures have to be restored from exactly one _Coordinator_ (any
_Coordinator_ can be used) with a command similar to the following one. Please add
any additional option needed for your specific use case, e.g. `--create-database`
if the database where you want to restore does not exist yet:

```
arangorestore
--server.endpoint <endpoint-of-a-coordinator>
--server.database <database-name>
--server.password <password>
--import-data false
--input-directory <dump-directory>
```

{% hint 'info' %}
If you are using v3.3.22 or higher, or v3.4.2 or higher, please also add to the
command above the option `--replication-factor 1`.
{% endhint %}
The option `--import-data false` tells _arangorestore_ to restore only the
collection structure and no data.

Step 3: Set _Replication Factor_ to 1
--------------------------------------

{% hint 'info' %}
This step is **not** needed if you are using v3.3.22 or higher, or v3.4.2 or
higher, and you have used the option `--replication-factor 1` in the previous step.
{% endhint %}

To speed up the restore, it is possible to set the _replication factor_ to 1 before
importing any data. Run the following command from exactly one _Coordinator_ (any
_Coordinator_ can be used):

```
echo 'db._collections().filter(function(c) { return c.name()[0] !== "_"; })
.forEach(function(c) { print("collection:", c.name(), "replicationFactor:",
c.properties().replicationFactor); c.properties({ replicationFactor: 1 }); });'
| arangosh
--server.endpoint <endpoint-of-a-coordinator>
--server.database <database-name>
--server.username <user-name>
--server.password <password>
```
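The filter inside the one-liner above, `c.name()[0] !== "_"`, simply skips system collections, so only user collections get their replication factor changed. Shown standalone on a list of plain names:

```javascript
// The system-collection filter from the arangosh one-liner, standalone:
// collection names starting with "_" denote system collections and are skipped.
const names = ["_users", "myusers", "_graphs", "myvalues"];
const nonSystem = names.filter((n) => n[0] !== "_");
console.log(nonSystem); // [ 'myusers', 'myvalues' ]
```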
Step 4: Create parallel restore scripts
---------------------------------------

Now that the Cluster is prepared, the `parallelRestore` script will be used.

Please create the `parallelRestore` script below in any of your _Coordinators_.

When executed (see below for further details), this script will create other
scripts that can then be copied to and executed on each _Coordinator_.
```
#!/bin/sh
#
# Version: 0.3
#
# Release Notes:
# - v0.3: fixed a bug that was happening when the collection name included an underscore
# - v0.2: compatibility with version 3.4: now each coordinator_<number-of-coordinator>.sh
#         includes a single restore command (instead of one for each collection),
#         which allows making use of the --threads option in v3.4.0 and later
# - v0.1: initial version

if test -z "$ARANGOSH" ; then
  export ARANGOSH=arangosh
fi
cat > /tmp/parallelRestore$$.js <<'EOF'
var fs = require("fs");
var print = require("internal").print;
var exit = require("internal").exit;
var arangorestore = "arangorestore";
var env = require("internal").env;
if (env.hasOwnProperty("ARANGORESTORE")) {
  arangorestore = env["ARANGORESTORE"];
}

// Check ARGUMENTS: dumpDir coordinator1 coordinator2 ...

if (ARGUMENTS.length < 2) {
  print("Need at least two arguments DUMPDIR and COORDINATOR_ENDPOINTS!");
  exit(1);
}

var dumpDir = ARGUMENTS[0];
var coordinators = ARGUMENTS[1].split(",");
var otherArgs = ARGUMENTS.slice(2);

// Quickly check the dump dir:
var files = fs.list(dumpDir).filter(f => !fs.isDirectory(f));
var found = files.indexOf("ENCRYPTION");
if (found === -1) {
  print("This directory does not have an ENCRYPTION entry.");
  exit(2);
}
// Remove ENCRYPTION entry:
files = files.slice(0, found).concat(files.slice(found + 1));

for (let i = 0; i < files.length; ++i) {
  if (files[i].slice(-5) !== ".json") {
    print("This directory has files which do not end in '.json'!");
    exit(3);
  }
}

// Derive the collection name from each data file name:
files = files.map(function(f) {
  var fullName = fs.join(dumpDir, f);
  var collName = "";
  if (f.slice(-10) === ".data.json") {
    var pos;
    if (f.slice(0, 1) === "_") { // system collection
      pos = f.slice(1).indexOf("_") + 1;
      collName = "_" + f.slice(1, pos);
    } else {
      pos = f.lastIndexOf("_");
      collName = f.slice(0, pos);
    }
  }
  return {name: fullName, collName, size: fs.size(fullName)};
});
files = files.sort(function(a, b) { return b.size - a.size; });
var dataFiles = [];
for (let i = 0; i < files.length; ++i) {
  if (files[i].name.slice(-10) === ".data.json") {
    dataFiles.push(i);
  }
}

// Produce the scripts, one for each coordinator:
var scripts = [];
var collections = [];
for (let i = 0; i < coordinators.length; ++i) {
  scripts.push([]);
}

// Distribute the data files round-robin over the coordinators:
var cnum = 0;
for (let i = 0; i < dataFiles.length; ++i) {
  var f = files[dataFiles[i]];
  if (typeof collections[cnum] == 'undefined') {
    collections[cnum] = (`--collection ${f.collName}`);
  } else {
    collections[cnum] += (` --collection ${f.collName}`);
  }
  cnum += 1;
  if (cnum >= coordinators.length) {
    cnum = 0;
  }
}

for (let i = 0; i < coordinators.length; ++i) {
  scripts[i].push(`${arangorestore} --input-directory ${dumpDir} --server.endpoint ${coordinators[i]} ` + collections[i] + ' ' + otherArgs.join(" "));
}

for (let i = 0; i < coordinators.length; ++i) {
  let f = "coordinator_" + i + ".sh";
  print("Writing file", f, "...");
  fs.writeFileSync(f, scripts[i].join("\n"));
}
EOF

${ARANGOSH} --javascript.execute /tmp/parallelRestore$$.js -- "$@"
rm /tmp/parallelRestore$$.js
```
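The filename-to-collection-name mapping inside the script can be exercised standalone (same logic as above, just outside arangosh):

```javascript
// Standalone copy of the script's collection-name extraction:
// dump data files are named <collection>_<hash>.data.json, and system
// collection names keep their leading underscore.
function collNameFromDataFile(f) {
  if (f.slice(-10) !== ".data.json") return "";
  if (f.slice(0, 1) === "_") { // system collection
    const pos = f.slice(1).indexOf("_") + 1;
    return "_" + f.slice(1, pos);
  }
  // v0.3 fix: use the LAST "_" so names containing underscores survive
  return f.slice(0, f.lastIndexOf("_"));
}

console.log(collNameFromDataFile("myvalues_abc123.data.json")); // myvalues
console.log(collNameFromDataFile("my_col_abc123.data.json"));   // my_col
console.log(collNameFromDataFile("_users_abc123.data.json"));   // _users
```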
To run this script, all _Coordinator_ endpoints of the Cluster have to be
provided. The script accepts all options of the tool _arangorestore_.

The command below can for instance be used on a Cluster with three
_Coordinators_:

```
./parallelRestore <dump-directory>
tcp://<ip-of-coordinator1>:<port-of-coordinator1>,
tcp://<ip-of-coordinator2>:<port-of-coordinator2>,
tcp://<ip-of-coordinator3>:<port-of-coordinator3>
--server.username <username>
--server.password <password>
--server.database <database_name>
--create-collection false
```

**Notes:**

- The option `--create-collection false` is passed since the collection
  structures were already created in the previous step.
- Starting from v3.4.0, the _arangorestore_ option *--threads N* can be
  passed to the command above, where _N_ is an integer, to further parallelize
  the restore (the default is `--threads 2`).

The above command will create three scripts, where three corresponds to
the number of listed _Coordinators_.

The resulting scripts are named `coordinator_<number-of-coordinator>.sh` (e.g.
`coordinator_0.sh`, `coordinator_1.sh`, `coordinator_2.sh`).
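The way these scripts partition the work can be sketched in a few lines: after sorting the data files by size (largest first, to balance load), the script assigns them to _Coordinators_ round-robin, so each `coordinator_<i>.sh` restores a disjoint subset of collections. A minimal model of that assignment:

```javascript
// Sketch of parallelRestore's round-robin assignment of data files to
// Coordinators: file i goes to coordinator i % coordinatorCount.
function assignRoundRobin(dataFiles, coordinatorCount) {
  const buckets = Array.from({ length: coordinatorCount }, () => []);
  dataFiles.forEach((f, i) => buckets[i % coordinatorCount].push(f));
  return buckets;
}

const buckets = assignRoundRobin(
  ["a.data.json", "b.data.json", "c.data.json", "d.data.json"], 3);
console.log(buckets);
// → [ [ 'a.data.json', 'd.data.json' ], [ 'b.data.json' ], [ 'c.data.json' ] ]
```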
Step 5: Execute parallel restore scripts
----------------------------------------

The `coordinator_<number-of-coordinator>.sh` scripts that were created in the
previous step now have to be executed on each machine where a _Coordinator_
is running. This will start a parallel restore of the dump.

Step 6: Revert to the initial _Replication Factor_
--------------------------------------------------

Once the _arangorestore_ process on every _Coordinator_ is completed, the
_replication factor_ has to be set back to its initial value.

Run the following command from exactly one _Coordinator_ (any _Coordinator_ can be
used). Please adjust the `replicationFactor` value to your specific case (2 in the
example below):

```
echo 'db._collections().filter(function(c) { return c.name()[0] !== "_"; })
.forEach(function(c) { print("collection:", c.name(), "replicationFactor:",
c.properties().replicationFactor); c.properties({ replicationFactor: 2 }); });'
| arangosh
--server.endpoint <endpoint-of-a-coordinator>
--server.database <database-name>
--server.username <user-name>
--server.password <password>
```
If you want to import data in formats like JSON or CSV, see

_Arangorestore_ can restore selected collections or all collections of a backup,
optionally including _system_ collections. One can restore the structure, i.e.
the collections with their configuration, with or without data.
Views can also be dumped or restored (either all of them or selectively).

{% hint 'tip' %}
In order to speed up the _arangorestore_ performance in a Cluster environment,
the [Fast Cluster Restore](FastClusterRestore.md)
procedure is recommended.
{% endhint %}
  * [Limitations](Programs/Arangodump/Limitations.md)
* [Arangorestore](Programs/Arangorestore/README.md)
  * [Examples](Programs/Arangorestore/Examples.md)
  * [Fast Cluster Restore](Programs/Arangorestore/FastClusterRestore.md)
  * [Options](Programs/Arangorestore/Options.md)
* [Arangoimport](Programs/Arangoimport/README.md)
  * [Examples JSON](Programs/Arangoimport/ExamplesJson.md)