
increased default --batch-size for arangoimp, improved documentation for arangoimp
Jan Steemann 2014-05-23 18:22:00 +02:00
parent a94435be26
commit 017d36bfc2
5 changed files with 46 additions and 3 deletions

@@ -75,7 +75,7 @@ Please note that by default, _arangoimp_ will import data into the specified
 collection in the default database (`_system`). To specify a different database,
 use the `--server.database` option when invoking _arangoimp_.
-An _arangoimp_ import will print out the final results on the command line.
+An _arangoimp_ import run will print out the final results on the command line.
 By default, it shows the number of documents created, the number of errors that
 occurred on the server side, and the total number of input file lines/documents
 that it processed. Additionally, _arangoimp_ will print out details about errors
@@ -87,6 +87,45 @@ Example:
 errors: 0
 total: 2
+Please note that _arangoimp_ supports two formats when importing JSON data from
+a file. The first format requires the input file to contain one JSON document
+in each line, e.g.
+{ "_key": "one", "value": 1 }
+{ "_key": "two", "value": 2 }
+{ "_key": "foo", "value": "bar" }
+...
+The above format can be imported sequentially by _arangoimp_. It will read data
+from the input file in chunks and send it in batches to the server. Each batch
+will be about as big as specified in the command-line parameter `--batch-size`.
+An alternative is to put one big JSON document into the input file like this:
+[
+{ "_key": "one", "value": 1 },
+{ "_key": "two", "value": 2 },
+{ "_key": "foo", "value": "bar" },
+...
+]
+This format allows line breaks within the input file as required. The downside
+is that the whole input file will need to be read by _arangoimp_ before it can
+send the first batch. This might be a problem if the input file is big. By
+default, _arangoimp_ will allow importing such files up to a size of about 16 MB.
+If you want to allow your _arangoimp_ instance to use more memory, you may want
+to increase the maximum file size by specifying the command-line option
+`--batch-size`. For example, to set the batch size to 32 MB, use the following
+command:
+unix> arangoimp --file "data.json" --type json --collection "users" --batch-size 33554432
+Please also note that you may need to increase the value of `--batch-size` if
+a single document inside the input file is bigger than the value of `--batch-size`.
 Importing CSV Data {#ImpManualCsv}
 ==================================

@@ -15,6 +15,8 @@ online manual, available at http://www.arangodb.org/
 The most important startup options are:
+.IP "--batch-size <uint64>"
+maximum size of data batches that are sent to the server
 .IP "--configuration <string>"
 read configuration from file <string>
 .IP "--collection <string>"

@@ -15,6 +15,8 @@ online manual, available at http://www.arangodb.org/
 The most important startup options are:
+OPTION "--batch-size <uint64>"
+maximum size of data batches that are sent to the server ENDOPTION
 OPTION "--configuration <string>"
 read configuration from file <string> ENDOPTION
 OPTION "--collection <string>"

@@ -285,7 +285,7 @@ namespace triagens {
 if (fd != STDIN_FILENO) {
 TRI_CLOSE(fd);
 }
-_errorMessage = "import file is too big.";
+_errorMessage = "import file is too big. please increase the value of --batch-size (currently " + StringUtils::itoa(_maxUploadSize) + ")";
 return false;
 }
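
To illustrate what the extended error message tells the user, here is a standalone sketch of a size check that produces the same text. It is not the actual import helper code: the function `checkImportSize`, the constant `DefaultMaxUploadSize`, and the `>` comparison are assumptions made for this example, and `std::to_string` stands in for `StringUtils::itoa`.

```cpp
#include <cstdint>
#include <string>

// Assumed stand-in for the new default upload limit (16 MB, in bytes).
static const std::uint64_t DefaultMaxUploadSize = 1024ULL * 1024ULL * 16ULL;

// Hypothetical check: returns false and fills errorMessage when the input is
// larger than the configured limit. The exact comparison used by the real
// import helper is not shown in the diff, so ">" is an assumption.
bool checkImportSize(std::uint64_t fileSize,
                     std::uint64_t maxUploadSize,
                     std::string& errorMessage) {
  if (fileSize > maxUploadSize) {
    errorMessage =
        "import file is too big. please increase the value of --batch-size "
        "(currently " + std::to_string(maxUploadSize) + ")";
    return false;
  }
  errorMessage.clear();
  return true;
}
```

With the limit set to `DefaultMaxUploadSize`, an oversized file produces a message ending in `(currently 16777216)`, which gives the user the current value to raise.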

@@ -81,7 +81,7 @@ V8ClientConnection* ClientConnection = 0;
 /// @brief max size body size (used for imports)
 ////////////////////////////////////////////////////////////////////////////////
-static uint64_t ChunkSize = 1024 * 1024 * 4;
+static uint64_t ChunkSize = 1024 * 1024 * 16;
 ////////////////////////////////////////////////////////////////////////////////
 /// @brief quote character(s)