Arangoimport Examples: JSON

Importing JSON-encoded Data

We will be using these example user records to import:

{ "name" : { "first" : "John", "last" : "Connor" }, "active" : true, "age" : 25, "likes" : [ "swimming" ] }
{ "name" : { "first" : "Jim", "last" : "O'Brady" }, "age" : 19, "likes" : [ "hiking", "singing" ] }
{ "name" : { "first" : "Lisa", "last" : "Jones" }, "dob" : "1981-04-09", "likes" : [ "running" ] }

To import these records, all you need to do is put them into a file (with one line for each record to import) and run the following command:

arangoimport --file "data.json" --type jsonl --collection "users"
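
Producing such a file from within an application is straightforward. The following is a minimal Python sketch (the script and file name are just examples) that writes the records above as line-delimited JSON:

```python
import json

# The example user records from above. json.dumps() without indentation
# keeps each document on a single line, as the jsonl input type expects.
records = [
    {"name": {"first": "John", "last": "Connor"}, "active": True, "age": 25, "likes": ["swimming"]},
    {"name": {"first": "Jim", "last": "O'Brady"}, "age": 19, "likes": ["hiking", "singing"]},
    {"name": {"first": "Lisa", "last": "Jones"}, "dob": "1981-04-09", "likes": ["running"]},
]

with open("data.json", "w", encoding="utf-8") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```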

This will transfer the data to the server, import the records, and print a status summary. To show the intermediate progress during the import process, the option --progress can be added. This option shows the percentage of the input file that has been sent to the server. It is mainly useful for big import files.

arangoimport --file "data.json" --type jsonl --collection users --progress true

It is also possible to use the output of another command as an input for arangoimport. For example, the following shell command can be used to pipe data from the cat process to arangoimport (Linux/Cygwin only):

cat data.json | arangoimport --file - --type jsonl --collection users

In a Command Prompt or PowerShell on Windows, you can use the type command instead:

type data.json | arangoimport --file - --type jsonl --collection users

Note that you have to use --file - if you want to use another command as input for arangoimport. No progress can be reported for such imports as the size of the input will be unknown to arangoimport.
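
Any program that prints line-delimited JSON to standard output can act as the data source. A minimal Python sketch (the script name generate.py and the document contents are made up):

```python
import json
import sys

# Hypothetical generator: emit documents to stdout as line-delimited
# JSON, so the output can be piped straight into arangoimport --file -.
lines = [json.dumps({"_key": "doc%d" % i, "value": i}) for i in range(3)]
sys.stdout.write("\n".join(lines) + "\n")
```

The output could then be piped into the importer, e.g. python generate.py | arangoimport --file - --type jsonl --collection users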

By default, the endpoint tcp://127.0.0.1:8529 will be used. If you want to specify a different endpoint, you can use the --server.endpoint option. You probably want to specify a database user and password as well. You can do so by using the options --server.username and --server.password. If you do not specify a password, you will be prompted for one.

arangoimport --server.endpoint tcp://127.0.0.1:8529 --server.username root --file "data.json" --type jsonl --collection "users"

Note that the collection (users in this case) must already exist or the import will fail. If you want to create a new collection with the import data, you need to specify the --create-collection option. Note that by default it creates a document collection, not an edge collection.

arangoimport --file "data.json" --type jsonl --collection "users" --create-collection true

To create an edge collection instead, use the --create-collection-type option and set it to edge:

arangoimport --file "data.json" --collection "myedges" --create-collection true --create-collection-type edge
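
When importing into an edge collection, each document in the input file must provide _from and _to attributes that reference vertex documents by their document handle (collection/key). A minimal example record (the keys and the relation attribute are illustrative):

```json
{ "_from": "users/john", "_to": "users/jim", "relation": "knows" }
```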

When importing data into an existing collection it is often convenient to first remove all data from the collection and then start the import. This can be achieved by passing the --overwrite parameter to arangoimport. If it is set to true, any existing data in the collection will be removed prior to the import. Note that any existing index definitions for the collection will be preserved even if --overwrite is set to true.

arangoimport --file "data.json" --type jsonl --collection "users" --overwrite true

As the import file already contains the data in JSON format, attribute names and data types are fully preserved. As can be seen in the example data, there is no need for all data records to have the same attribute names or types. Records can be inhomogeneous.

Please note that by default, arangoimport imports data into the specified collection in the default database (_system). To import into a different database, use the --server.database option when invoking arangoimport. To import into a nonexistent database, you need to pass --create-database true; note that --create-database defaults to false.

The tool also supports parallel imports, with multiple threads. Using multiple threads may provide a speedup, especially when using the RocksDB storage engine. To specify the number of parallel threads use the --threads option:

arangoimport --threads 4 --file "data.json" --type jsonl --collection "users"

Note that using multiple threads may lead to a non-sequential import of the input data. Data that appears later in the input file may be imported earlier than data that appears earlier in the input file. This is normally not a problem but may cause issues when there are data dependencies or duplicates in the import data. In this case, the number of threads should be set to 1.
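
As a precaution, one could scan the input for duplicate _key values before deciding on a thread count. A minimal Python sketch (the function name is made up):

```python
import json
from collections import Counter

def duplicate_keys(path):
    """Return the _key values that occur more than once in a
    line-delimited JSON file. If any are found, importing with
    --threads 1 keeps the processing order of the input predictable."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                # Documents without an explicit _key are counted under None
                counts[json.loads(line).get("_key")] += 1
    return [k for k, n in counts.items() if k is not None and n > 1]
```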

JSON input file formats

arangoimport supports two formats when importing JSON data from a file:

  • JSON
  • JSONL

The first format, which we already used above, is commonly known as JSONL, also called newline-delimited JSON. However, in contrast to the JSONL specification, arangoimport requires each line of the input file to contain one complete JSON object, e.g.

{ "_key": "one", "value": 1 }
{ "_key": "two", "value": 2 }
{ "_key": "foo", "value": "bar" }
...

So one could argue that this is only a subset of JSONL, which also permits top-level values other than objects, such as arrays, on each line.

The above format can be imported sequentially by arangoimport. It will read data from the input file in chunks and send it in batches to the server. Each batch will be about as big as specified in the command-line parameter --batch-size.
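
To catch formatting problems before an import, one could verify that every non-empty line of the input file is a complete JSON object. A minimal Python sketch (the function name is made up):

```python
import json

def validate_jsonl(path):
    """Check that every non-empty line is one complete JSON object,
    as arangoimport's line-wise format requires. Returns a list of
    (line_number, error) tuples for offending lines."""
    errors = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                doc = json.loads(line)
            except ValueError as exc:
                errors.append((lineno, str(exc)))
                continue
            if not isinstance(doc, dict):
                # Valid JSON, but not an object (e.g. an array or number)
                errors.append((lineno, "not a JSON object"))
    return errors
```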

An alternative is to put one big JSON array into the input file like this:

[
  { "_key": "one", "value": 1 },
  { "_key": "two", "value": 2 },
  { "_key": "foo", "value": "bar" },
  ...
]

This format allows line breaks in the input file within documents. The downside is that the whole input file will need to be read by arangoimport before it can send the first batch. This might be a problem if the input file is big. By default, arangoimport will allow importing such files up to a size of about 16 MB.
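
For big array-format files, one workaround is to convert them to the line-wise format up front, so that arangoimport can stream them in batches. A minimal Python sketch (the function name is made up):

```python
import json

def array_to_jsonl(src, dst):
    """Convert a file containing one big JSON array of documents into
    line-delimited JSON. The array still has to be parsed fully here,
    but only once; the resulting file can then be imported in batches."""
    with open(src, encoding="utf-8") as f:
        docs = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        for doc in docs:
            f.write(json.dumps(doc) + "\n")
```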

If you want to allow your arangoimport instance to use more memory, you may want to increase the maximum file size by specifying the command-line option --batch-size. For example, to set the batch size to 32 MB, use the following command:

arangoimport --file "data.json" --type json --collection "users" --batch-size 33554432

Please also note that you may need to increase the value of --batch-size if a single document inside the input file is bigger than the current batch size.
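
To pick a sufficient batch size for a line-wise input file, one could measure the largest single document in it. A minimal Python sketch (the function name is made up):

```python
def largest_document_size(path):
    """Return the size in bytes of the largest single line (document)
    in a line-delimited JSON file; --batch-size should be at least
    this large so that every document fits into one batch."""
    largest = 0
    with open(path, "rb") as f:
        for line in f:
            largest = max(largest, len(line.strip()))
    return largest
```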