1
0
Fork 0
arangodb/Documentation/ToolsManual/ImpManual.md

117 lines
5.1 KiB
Markdown

Importing Data into an ArangoDB Database {#ImpManual}
=====================================================
@NAVIGATE_ImpManual
@EMBEDTOC{ImpManualTOC}
This manual describes the ArangoDB importer _arangoimp_, which can be used for
bulk imports.
The most convenient method to import a lot of data into ArangoDB is to use the
`arangoimp` command-line tool. It allows you to import data records from a file
into an existing database collection.
It is possible to import document keys with the documents using the `_key`
attribute. When importing into an edge collection, it is mandatory that all
imported documents have the `_from` and `_to` attributes, and that they contain
valid references.
Let's assume for the following examples you want to import user records into an
existing collection named "users" on the server.
Importing JSON-encoded Data {#ImpManualJson}
============================================
Let's further assume the import at hand is encoded in JSON. We'll be using these
example user records to import:
@verbinclude arangoimp-data-json
To import these records, all you need to do is to put them into a file (with one
line for each record to import) and run the following command:
unix> arangoimp --file "data.json" --type json --collection "users"
This will transfer the data to the server, import the records, and print a
status summary. To show the intermediate progress during the import process, the
option `--progress` can be added. This option will show the percentage of the
input file that has been sent to the server. This will only be useful for big
import files.
unix> arangoimp --file "data.json" --type json --collection "users" --progress true
By default, the endpoint `tcp://127.0.0.1:8529` will be used. If you want to
specify a different endpoint, you can use the `--server.endpoint` option. You
probably want to specify a database user and password as well. You can do so by
using the options `--server.username` and `--server.password`. If you do not
specify a password, you will be prompted for one.
unix> arangoimp --server.endpoint tcp://127.0.0.1:8529 --server.username root --file "data.json" --type json --collection "users"
Note that the collection (`users` in this case) must already exist or the import
will fail. If you want to create a new collection with the import data, you need
to specify the `--create-collection` option. Note that it is only possible to
create a document collection using the `--create-collection` flag.
unix> arangoimp --file "data.json" --type json --collection "users" --create-collection true
As the import file already contains the data in JSON format, attribute names and
data types are fully preserved. As can be seen in the example data, there is no
need for all data records to have the same attribute names or types. Records can
be inhomogenous.
Please note that by default, _arangoimp_ will import data into the specified
collection in the default database (`_system`). To specify a different database,
use the `--server.database` option when invoking _arangoimp_.
Importing CSV Data {#ImpManualCsv}
==================================
_arangoimp_ also offers the possibility to import data from CSV files. This
comes handy when the data at hand is in CSV format already and you don't want to
spend time converting them to JSON for the import.
To import data from a CSV file, make sure your file contains the attribute names
in the first row. All the following lines in the file will be interpreted as
data records and will be imported.
The CSV import requires the data to have a homogenuous structure. All records
must have exactly the same amount of columns as there are headers.
The cell values can have different data types though. If a cell does not have
any value, it can be left empty in the file. These values will not be imported
so the attributes will not "be there" in document created. Values enclosed in
quotes will be imported as strings, so to import numeric values, boolean values
or the null value, don't enclose the value into the quotes in your file.
We'll be using the following import for the CSV import:
@verbinclude arangoimp-data-csv
The command line to execute the import then is:
unix> arangoimp --file "data.csv" --type csv --collection "users"
Note that the quote and separator characters can be adjusted via the
`--quote` and `--separator` arguments when invoking _arangoimp_. The importer
supports Windows (CRLF) and Unix (LF) line breaks.
Importing TSV Data {#ImpManualTsv}
==================================
You may also import tab-separated values (TSV) from a file. This format is very
simple: every line in the file represents a data record. There is no quoting or
escaping. That also means that the separator character (which defaults to the
tabstop symbol) must not be used anywhere in the actual data.
As with CSV, the first line in the TSV file must contain the attribute names,
and all lines must have an identical number of values.
If a different separator character or string should be used, it can be specified
with the `--separator` argument.
An example command line to execute the TSV import is:
unix> arangoimp --file "data.tsv" --type tsv --collection "users"