increased default --batch-size for arangoimp, improved documentation for arangoimp

2014-05-23 18:22:00 +02:00 · 017d36bfc2
parent a94435be26
commit 017d36bfc2
5 changed files with 46 additions and 3 deletions
--- a/Documentation/ToolsManual/ImpManual.md
+++ b/Documentation/ToolsManual/ImpManual.md
@@ -75,7 +75,7 @@ Please note that by default, _arangoimp_ will import data into the specified
 collection in the default database (`_system`). To specify a different database, 
 use the `--server.database` option when invoking _arangoimp_. 
-An _arangoimp_ import will print out the final results on the command line.
+An _arangoimp_ import run will print out the final results on the command line.
 By default, it shows the number of documents created, the number of errors that
 occurred on the server side, and the total number of input file lines/documents
 that it processed. Additionally, _arangoimp_ will print out details about errors 
@@ -87,6 +87,45 @@ Example:
    errors:           0
    total:            2
+Please note that _arangoimp_ supports two formats when importing JSON data from 
+a file. The first format requires the input file to contain one JSON document
+in each line, e.g.
+    { "_key": "one", "value": 1 }
+    { "_key": "two", "value": 2 }
+    { "_key": "foo", "value": "bar" }
+    ...
+The above format can be imported sequentially by _arangoimp_. It will read data
+from the input file in chunks and send it in batches to the server. Each batch
+will be about as big as specified in the command-line parameter `--batch-size`.
+An alternative is to put one big JSON document into the input file like this:
+    [
+      { "_key": "one", "value": 1 },
+      { "_key": "two", "value": 2 },
+      { "_key": "foo", "value": "bar" },
+      ...
+    ]
+This format allows line breaks within the input file as required. The downside 
+is that the whole input file will need to be read by _arangoimp_ before it can
+send the first batch. This might be a problem if the input file is big. By
+default, _arangoimp_ will allow importing such files up to a size of about 16 MB.
+If you want to allow your _arangoimp_ instance to use more memory, you may want
+to increase the maximum file size by specifying the command-line option
+`--batch-size`. For example, to set the batch size to 32 MB, use the following
+command:
+    unix> arangoimp --file "data.json" --type json --collection "users" --batch-size 33554432
+Please also note that you may need to increase the value of `--batch-size` if
+a single document inside the input file is bigger than the value of `--batch-size`.
 Importing CSV Data {#ImpManualCsv}
 ==================================
--- a/Documentation/man/man1/arangoimp.1
+++ b/Documentation/man/man1/arangoimp.1
@@ -15,6 +15,8 @@ online manual, available at http://www.arangodb.org/
 The most important startup options are:
+.IP "--batch-size <uint64>"
+maximum size of data batches that are sent to the server
 .IP "--configuration <string>"
 read configuration from file <string> 
 .IP "--collection <string>"
--- a/Documentation/man1/arangoimp
+++ b/Documentation/man1/arangoimp
@@ -15,6 +15,8 @@ online manual, available at http://www.arangodb.org/
 The most important startup options are:
+OPTION "--batch-size <uint64>"
+maximum size of data batches that are sent to the server ENDOPTION
 OPTION "--configuration <string>"
 read configuration from file <string> ENDOPTION
 OPTION "--collection <string>"
--- a/arangosh/V8Client/ImportHelper.cpp
+++ b/arangosh/V8Client/ImportHelper.cpp
@@ -285,7 +285,7 @@ namespace triagens {
            if (fd != STDIN_FILENO) {
              TRI_CLOSE(fd);
            }
-            _errorMessage = "import file is too big.";
+            _errorMessage = "import file is too big. please increase the value of --batch-size (currently " + StringUtils::itoa(_maxUploadSize) + ")";
            return false;
          }
--- a/arangosh/V8Client/arangoimp.cpp
+++ b/arangosh/V8Client/arangoimp.cpp
@@ -81,7 +81,7 @@ V8ClientConnection* ClientConnection = 0;
 /// @brief max size body size (used for imports)
 ////////////////////////////////////////////////////////////////////////////////
-static uint64_t ChunkSize = 1024 * 1024 * 4;
+static uint64_t ChunkSize = 1024 * 1024 * 16;
 ////////////////////////////////////////////////////////////////////////////////
 /// @brief quote character(s)
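
Note on the sizes used in this commit: the sketch below is not part of the change. It only illustrates how the byte values relate to each other, namely the old default (`1024 * 1024 * 4`), the new default (`1024 * 1024 * 16`), and the `33554432` used in the documentation example for 32 MB. The helper names `megabytesToBytes` and `fitsInSingleBatch` are hypothetical and do not exist in the arangoimp sources. The exact comparison arangoimp performs before reporting "import file is too big" is not shown in this diff, so the `<=` check is an assumption.

```cpp
#include <cstdint>

// Hypothetical helper (not in the arangoimp sources): converts a size given
// in MB to bytes, using 1 MB = 1024 * 1024 bytes as in the ChunkSize default.
constexpr std::uint64_t megabytesToBytes(std::uint64_t megabytes) {
  return megabytes * 1024ULL * 1024ULL;
}

// Hypothetical check (assumption): an input file in the single-array JSON
// format can only be imported if it fits into one batch. The real boundary
// condition inside ImportHelper.cpp may differ.
constexpr bool fitsInSingleBatch(std::uint64_t fileSizeBytes,
                                 std::uint64_t batchSizeBytes) {
  return fileSizeBytes <= batchSizeBytes;
}

// Old and new defaults from this commit, and the documentation example value.
static_assert(megabytesToBytes(4) == 4194304ULL, "old default, 4 MB");
static_assert(megabytesToBytes(16) == 16777216ULL, "new default, 16 MB");
static_assert(megabytesToBytes(32) == 33554432ULL, "--batch-size for 32 MB");
```

Under these assumptions, a 20 MB array-format file would be rejected with the new 16 MB default and accepted once `--batch-size 33554432` is passed.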