mirror of https://gitee.com/bigwinds/arangodb
increased default --batch-size for arangoimp, improved documentation for arangoimp
parent a94435be26
commit 017d36bfc2
@@ -75,7 +75,7 @@ Please note that by default, _arangoimp_ will import data into the specified
 collection in the default database (`_system`). To specify a different database,
 use the `--server.database` option when invoking _arangoimp_.
 
-An _arangoimp_ import will print out the final results on the command line.
+An _arangoimp_ import run will print out the final results on the command line.
 By default, it shows the number of documents created, the number of errors that
 occurred on the server side, and the total number of input file lines/documents
 that it processed. Additionally, _arangoimp_ will print out details about errors
@@ -87,6 +87,45 @@ Example:
 errors: 0
 total: 2
 
+
+Please note that _arangoimp_ supports two formats when importing JSON data from
+a file. The first format requires the input file to contain one JSON document
+in each line, e.g.
+
+{ "_key": "one", "value": 1 }
+{ "_key": "two", "value": 2 }
+{ "_key": "foo", "value": "bar" }
+...
+
+The above format can be imported sequentially by _arangoimp_. It will read data
+from the input file in chunks and send it in batches to the server. Each batch
+will be about as big as specified in the command-line parameter `--batch-size`.
+
+An alternative is to put one big JSON document into the input file like this:
+
+[
+{ "_key": "one", "value": 1 },
+{ "_key": "two", "value": 2 },
+{ "_key": "foo", "value": "bar" },
+...
+]
+
+This format allows line breaks within the input file as required. The downside
+is that the whole input file will need to be read by _arangoimp_ before it can
+send the first batch. This might be a problem if the input file is big. By
+default, _arangoimp_ will allow importing such files up to a size of about 16 MB.
+
+If you want to allow your _arangoimp_ instance to use more memory, you may want
+to increase the maximum file size by specifying the command-line option
+`--batch-size`. For example, to set the batch size to 32 MB, use the following
+command:
+
+unix> arangoimp --file "data.json" --type json --collection "users" --batch-size 33554432
+
+Please also note that you may need to increase the value of `--batch-size` if
+a single document inside the input file is bigger than the value of `--batch-size`.
+
+
 Importing CSV Data {#ImpManualCsv}
 ==================================
 
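Note on the line-by-line format documented in this change: the description of reading the input in chunks and sending batches of roughly `--batch-size` bytes can be illustrated with a small sketch. This is not _arangoimp_'s actual implementation; the function name `iter_batches` and the choice to raise an error for an oversized document are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[bytes], batch_size: int) -> Iterator[List[bytes]]:
    """Group input lines into batches of at most batch_size bytes.

    Hypothetical helper: it only illustrates the batching idea and does not
    reproduce how arangoimp splits its input internally.
    """
    batch: List[bytes] = []
    used = 0
    for line in lines:
        # A single document bigger than the batch size can never fit;
        # the documentation suggests increasing --batch-size in that case.
        if len(line) > batch_size:
            raise ValueError("document exceeds batch size; increase batch_size")
        # Flush the current batch before it would grow past the limit.
        if batch and used + len(line) > batch_size:
            yield batch
            batch, used = [], 0
        batch.append(line)
        used += len(line)
    # Emit whatever is left after the last line.
    if batch:
        yield batch
```

Because every line is a complete document, each batch is self-contained and the input never has to be held in memory as a whole.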
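Note on the array format documented in this change: since a file holding one big JSON array has to be read completely before the first batch can be sent, one possible workaround is to convert it to the one-document-per-line format before importing. The sketch below uses only Python's standard `json` module; the helper name `array_to_lines` is hypothetical, and the conversion itself still loads the whole array into memory once.

```python
import json
from typing import IO


def array_to_lines(src: IO[str], dst: IO[str]) -> int:
    """Rewrite a top-level JSON array as one JSON document per line.

    Hypothetical helper, not part of arangoimp. Returns the number of
    documents written.
    """
    documents = json.load(src)
    if not isinstance(documents, list):
        raise ValueError("expected a top-level JSON array")
    for doc in documents:
        # json.dumps without an indent never emits line breaks,
        # so every document ends up on exactly one line.
        dst.write(json.dumps(doc) + "\n")
    return len(documents)
```

The converted file can then be imported sequentially, so the import itself no longer depends on the whole file fitting into a single batch.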
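Note on the last documented paragraph, which says `--batch-size` may need to be raised when a single document is bigger than its value: a quick way to find out is to measure the largest document in a line-by-line input file. The helper `largest_document_size` below is hypothetical, and it assumes sizes are counted in bytes of the raw line, which may differ slightly from how _arangoimp_ measures them.

```python
from typing import Iterable


def largest_document_size(lines: Iterable[bytes]) -> int:
    """Return the byte length of the longest non-empty line, or 0 if none.

    Hypothetical helper for choosing a --batch-size value; trailing line
    breaks are not counted.
    """
    return max(
        (len(line.rstrip(b"\r\n")) for line in lines if line.strip()),
        default=0,
    )
```

With a file opened in binary mode, `largest_document_size(f)` gives a lower bound for `--batch-size`. For comparison, the 32 MB value from the documented example is 32 * 1024 * 1024 = 33554432 bytes.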
@@ -15,6 +15,8 @@ online manual, available at http://www.arangodb.org/
 
 The most important startup options are:
 
+.IP "--batch-size <uint64>"
+maximum size of data batches that are sent to the server
 .IP "--configuration <string>"
 read configuration from file <string>
 .IP "--collection <string>"
@@ -15,6 +15,8 @@ online manual, available at http://www.arangodb.org/
 
 The most important startup options are:
 
+OPTION "--batch-size <uint64>"
+maximum size of data batches that are sent to the server ENDOPTION
 OPTION "--configuration <string>"
 read configuration from file <string> ENDOPTION
 OPTION "--collection <string>"
@@ -285,7 +285,7 @@ namespace triagens {
 if (fd != STDIN_FILENO) {
 TRI_CLOSE(fd);
 }
-_errorMessage = "import file is too big.";
+_errorMessage = "import file is too big. please increase the value of --batch-size (currently " + StringUtils::itoa(_maxUploadSize) + ")";
 return false;
 }
 
@@ -81,7 +81,7 @@ V8ClientConnection* ClientConnection = 0;
 /// @brief max size body size (used for imports)
 ////////////////////////////////////////////////////////////////////////////////
 
-static uint64_t ChunkSize = 1024 * 1024 * 4;
+static uint64_t ChunkSize = 1024 * 1024 * 16;
 
 ////////////////////////////////////////////////////////////////////////////////
 /// @brief quote character(s)