New Features in ArangoDB 1.2
@NAVIGATE_NewFeatures12 @EMBEDTOC{NewFeatures12TOC}
Features and Improvements
The following list shows in detail which features have been added or improved in ArangoDB 1.2. ArangoDB 1.2 also contains several bugfixes that are not listed here.
User-defined document keys
ArangoDB 1.2 allows users to determine their own document keys to uniquely
identify documents.
In previous versions of ArangoDB, each document automatically got an _id
value determined by the server. The document id part of this _id value
was an ever-increasing integer number that the user could not change or preselect.
Additionally, big integers are hard to memorize.
ArangoDB 1.2 allows specifying custom values for the document id part of the _id
values. This can be done by providing a _key attribute when creating documents.
The value specified for _key will automatically become part of the _id value
of a document, and can be used to retrieve a document later:
arangosh> db._create("mycollection");
[ArangoCollection 316959596850, "mycollection" (type document, status loaded)]
arangosh> db.mycollection.save({ _key: "mytest" });
{ "error" : false, "_id" : "mycollection/mytest", "_rev" : "316961300786", "_key" : "mytest" }
arangosh> db.mycollection.toArray();
[ { "_id" : "mycollection/mytest", "_rev" : "316961300786", "_key" : "mytest" } ]
arangosh> db.mycollection.document("mytest");
{ "_key" : "mytest", "_id" : "mycollection/mytest", "_rev" : "316961300786" }
This should be much more intuitive and easier than dealing with the server-generated integer values of previous versions.
Specifying a value for the _key attribute is optional. If the attribute is not
specified, ArangoDB will create its own value as it did in previous versions.
When upgrading from ArangoDB 1.1 to ArangoDB 1.2, the document ids will automatically
be converted into equivalent _key
values, meaning the same identifiers can be
used after an upgrade.
There are a few restrictions for the values that can be used inside _key (please
refer to @ref DocumentKeys for details). Document key values are immutable, i.e. they
cannot be changed after a document has been created. The only way to change a _key
value of a document is to remove the document and re-insert it with a different key.
Document keys can also be used with the REST API at all places where a document id was required before.
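The relationship between _key and _id described above can be modelled in a few lines of plain JavaScript. This is an illustrative sketch only, not ArangoDB code: the helper names buildId and splitId are hypothetical, and the sketch assumes that a key contains no "/" character (see @ref DocumentKeys for the actual restrictions).

```javascript
// Illustrative model only; buildId and splitId are hypothetical helpers.

// Compose a document _id from a collection name and a document key.
function buildId(collectionName, key) {
  return collectionName + "/" + key;
}

// Split an _id back into its collection name and key parts.
// Assumption: the key part contains no "/" character.
function splitId(id) {
  var pos = id.indexOf("/");
  if (pos === -1) {
    throw new Error("not a valid _id: " + id);
  }
  return { collection: id.substring(0, pos), key: id.substring(pos + 1) };
}

var id = buildId("mycollection", "mytest");
console.log(id);                      // mycollection/mytest
console.log(splitId(id).collection);  // mycollection
console.log(splitId(id).key);         // mytest
```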
Usage of collection names instead of ids
ArangoDB 1.1 and earlier versions required clients to use server-generated collection ids instead of the collections' actual names in many of the REST APIs. To make using the REST APIs easier, this has been changed so collection names can be specified instead of collection ids.
Collection names are also returned as part of documents' _id values to make comparing
results easier in case just the collection name is known.
Here is an example AQL query from ArangoDB 1.1 that works fine, because both the collection id and the document id are known:
arangosh> db._create("test");
[ArangoCollection 402238582, "test" (type document, status loaded)]
arangosh> db.test.save({ value : 1});
{ "_id" : "402238582/403745910", "_rev" : 403745910 }
arangosh> stmt = db._createStatement({ query: 'FOR t IN test FILTER t._id == "402238582/403745910" RETURN t' });
arangosh> cursor = stmt.execute();
arangosh> while (cursor.hasNext()) { print(cursor.next()); }
{ "_id" : "402238582/403745910", "_rev" : 403745910, "value" : 1 }
However, if either the collection id or the document id is not precisely known at query time, the filter condition will not match and the query returns no result:
arangosh> stmt = db._createStatement({ query: 'FOR t IN test FILTER t._id == "403745910" RETURN t' });
arangosh> cursor = stmt.execute();
arangosh> while (cursor.hasNext()) { print(cursor.next()); }
Effectively this meant that in 1.1 you had to know both collection names and collection ids in several cases, making ArangoDB 1.1 harder to use than necessary.
In ArangoDB 1.2, the above query can be rewritten to the much more intuitive form:
arangosh> stmt = db._createStatement({ query: 'FOR t IN test FILTER t._id == "test/mykey" RETURN t' });
...
or even:
arangosh> stmt = db._createStatement({ query: 'FOR t IN test FILTER t._key == "mykey" RETURN t' });
...
Collection names can also be used to reference other documents when creating edges, which makes setting up and querying edges much easier in ArangoDB 1.2:
arangosh> db._create("myvertexcollection");
[ArangoCollection 316959891364, "myvertexcollection" (type document, status loaded)]
arangosh> db._createEdgeCollection("myedgecollection");
[ArangoCollection 316961464228, "myedgecollection" (type edge, status loaded)]
arangosh> db.myvertexcollection.save({ _key: "vertex1", value: 1234 });
{ "error" : false, "_id" : "myvertexcollection/vertex1", "_rev" : "316963299236", "_key" : "vertex1" }
arangosh> db.myvertexcollection.save({ _key: "vertex2", value: 6789 });
{ "error" : false, "_id" : "myvertexcollection/vertex2", "_rev" : "316963364772", "_key" : "vertex2" }
arangosh> db.myedgecollection.save("myvertexcollection/vertex1", "myvertexcollection/vertex2", { type: "some value" });
{ "error" : false, "_id" : "myedgecollection/316963561380", "_rev" : "316963561380", "_key" : "316963561380" }
arangosh> db.myedgecollection.outEdges("myvertexcollection/vertex1");
[
{ "_id" : "myedgecollection/316963561380", "_rev" : "316963561380", "_key" : "316963561380", "_from" : "myvertexcollection/vertex1", "_to" : "myvertexcollection/vertex2", "type" : "some value" }
]
Additional graph functionality
- @BOOK_REF{JSModuleGraph}
- @BOOK_REF{HttpGraph}
AQL Improvements
Additional functions
The following functions have been added in the ArangoDB Query Language (AQL) in ArangoDB 1.2:
- TRAVERSAL(): returns the result of a graph traversal. The starting point and direction of the traversal have to be specified. It is also possible to configure the traversal by setting the visitation order (e.g. depth-first vs. breadth-first, pre-order vs. post-order visitation), limiting the traversal depth, specifying uniqueness constraints, and limiting the traversal to specific types of edges.
- TRAVERSAL_TREE(): the same as TRAVERSAL(), but will return the result in a hierarchical format
- EDGES(): accesses edges connected to a vertex via an AQL function
- ATTRIBUTES(): returns the names of all attributes of a document as a list
- KEEP(): keeps only the specified attributes of a document, and removes all others
- UNSET(): removes only the specified attributes from a document, and preserves all others
- MATCHES(): checks if a document matches one of multiple example documents
- LIKE(): pattern-based text comparison
- FULLTEXT(): executes queries on a fulltext index
The documentation for the individual functions is available at @ref AqlFunctions.
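To illustrate how KEEP() and UNSET() complement each other, here is a minimal sketch in plain JavaScript. The helpers keep and unset are hypothetical stand-ins that operate on ordinary objects; they only model the behaviour described in the list above and say nothing about how AQL implements these functions.

```javascript
// Plain JavaScript model of the KEEP() and UNSET() semantics described above.
// keep and unset are hypothetical helper names, not ArangoDB functions.

// Returns a copy of doc that contains only the attributes listed in names.
function keep(doc, names) {
  var result = {};
  Object.keys(doc).forEach(function (name) {
    if (names.indexOf(name) !== -1) {
      result[name] = doc[name];
    }
  });
  return result;
}

// Returns a copy of doc without the attributes listed in names.
function unset(doc, names) {
  var result = {};
  Object.keys(doc).forEach(function (name) {
    if (names.indexOf(name) === -1) {
      result[name] = doc[name];
    }
  });
  return result;
}

var doc = { name: "test", value: 1, flag: true };
console.log(keep(doc, [ "name", "value" ]));   // { name: 'test', value: 1 }
console.log(unset(doc, [ "name", "value" ]));  // { flag: true }
```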
Syntactic extensions
AQL now also allows the following syntactic constructs, which were not available in previous versions of ArangoDB:
FUNC(...)[....]
FUNC(...).attribute
The above syntax constructs allow accessing an indexed or named attribute of a function return value directly, without having to store the return value in an intermediate variable.
An example query that is made possible by this change (returns names of all available collections):
RETURN COLLECTIONS()[*].name
This query did not work in ArangoDB 1.1, and the workaround was to either use a for loop and process the individual function return values separately, or to store them in a temporary variable as follows:
FOR c IN COLLECTIONS() RETURN c.name
LET c = COLLECTIONS() RETURN c[*].name
Note: ArangoDB 1.2 still supports these queries, but the above syntax is simpler and more intuitive.
ArangoDB 1.2 also supports negative list indexes to return list elements from
the end of the list using the [ ] operator.
To return the last element from a list, the following query can be used in ArangoDB 1.2:
LET l = [ 1, 2, 3, 4 ] RETURN l[-1]
whereas in ArangoDB 1.1, the length of the list had to be determined, and the element offset had to be calculated based on that:
LET l = [ 1, 2, 3, 4 ] RETURN l[LENGTH(l) - 1]
Note: ArangoDB 1.2 still supports this, but the above form is much simpler.
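The index arithmetic behind negative list indexes can be sketched in plain JavaScript as follows. elementAt is a hypothetical helper that mimics the described behaviour; how AQL treats indexes that are out of range is not modelled here.

```javascript
// elementAt is a hypothetical helper that mimics the described l[-1] behaviour.
function elementAt(list, index) {
  // Negative indexes count from the end of the list:
  // -1 is the last element, -2 the one before it, and so on.
  var pos = index < 0 ? list.length + index : index;
  return list[pos];
}

var l = [ 1, 2, 3, 4 ];
console.log(elementAt(l, -1));            // 4
console.log(elementAt(l, l.length - 1));  // 4, the ArangoDB 1.1 style
console.log(elementAt(l, -2));            // 3
```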
Index usage and optimisations
AQL queries in ArangoDB 1.2 will use indexes in more cases than in previous versions of ArangoDB:
- querying with an equality predicate on the _key attribute will try to use the primary index
- querying on _from and _to attributes of documents in an edge collection will try to use the edges index
- if the result of the PATHS() function is restricted to specific source or destination vertexes (identified by dedicated _id values), the query will try to use the edges index.
- some specific LIMIT and SORT operations will be pushed upwards in the execution pipeline to make queries faster.
Note: "try" in this context means that the query optimiser will try to apply the
mentioned optimisations. Specific optimisations might not be applied if other parts
of the query prohibit the optimisations (e.g. querying with logical OR on different
attributes will likely disable several optimisations).
Fulltext indexes
- added fulltext index type. This index type allows indexing words and prefixes of words from a specific document attribute. The index can be queried using a SimpleQueryFull object, the HTTP REST API at /_api/simple/fulltext, or via AQL.
CORS support
ArangoDB 1.2 offers support for Cross-Origin Resource Sharing (CORS), allowing it to be used as an AJAX back end even in a cross-domain request setup.
Browsers will normally not allow cross-domain requests for AJAX calls for security reasons. Modern browsers allow defined exceptions from this same-origin policy rule, provided the client code sets some specific properties in the XMLHttpRequest object. Browsers will then patch requests with some additional headers, and will allow cross-domain requests if the server answers the request with a specifically patched response (see @EXTREF_S{http://en.wikipedia.org/wiki/Cross-origin_resource_sharing, CORS on Wikipedia} for more details).
To support CORS, ArangoDB 1.2 will answer requests that include an HTTP Origin header
with an Access-Control-Allow-Origin header containing the value originally sent in the request.
ArangoDB 1.2 will also answer CORS preflight requests that are issued from browsers
via the HTTP OPTIONS method.
More details on how ArangoDB handles CORS requests can be found in @ref CommunicationCors.
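The Origin handling described above can be modelled as a small function in plain JavaScript. This is a sketch of the described behaviour only, not the server's implementation: corsResponseHeaders is a hypothetical helper, header names are assumed to be lower-cased already, and the handling of preflight OPTIONS requests is not modelled.

```javascript
// corsResponseHeaders is a hypothetical helper. It models only the behaviour
// described in the text: the value of the request's Origin header is returned
// in the Access-Control-Allow-Origin response header.
// Assumption: header names have been normalised to lower case.
function corsResponseHeaders(requestHeaders) {
  var responseHeaders = {};
  var origin = requestHeaders["origin"];
  if (origin !== undefined) {
    responseHeaders["access-control-allow-origin"] = origin;
  }
  return responseHeaders;
}

// Request with an Origin header: the origin is echoed back.
console.log(corsResponseHeaders({ origin: "http://example.com" }));

// Request without an Origin header: no CORS header is added.
console.log(corsResponseHeaders({}));
```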
In-memory collections
ArangoDB collections are persistent and collection data is synchronised to disk frequently.
For some specific use cases persistence and synchronisation are overkill. This is true for "unimportant" data that can be reproduced from other sources easily. Examples of such data are temporary calculations or caches.
To support managing data that does not require any sort of persistence, ArangoDB 1.2 introduces in-memory collections as an experimental feature.
Experimental means that the feature might not be available on all platforms, and that it might change drastically or even be removed in a future version of ArangoDB.
To create an in-memory collection without persistence, the optional attribute
isVolatile can be set to true:
arangosh> db._create("mycache", { isVolatile: true});
[ArangoCollection 316961790811, "mycache" (type document, status loaded)]
The collection can be used normally, and its definition (name, id, type) will remain even after a server restart. However, all data written into in-memory collections is lost when the collection is unloaded or the server is shut down:
arangosh> db.mycache.save({ value : 1 });
{ "error" : false, "_id" : "mycache/316963298139", "_rev" : "316963298139", "_key" : "316963298139" }
arangosh> db.mycache.count();
1
arangosh> db.mycache.unload();
arangosh> db.mycache.count();
0
So please be careful when employing this type of collection. It should never be used for any critical data that requires persistence and durability.
The benefit of this type of collection is that document operations in it cause no disk overhead. The actual performance gains over regular collections are highly platform-dependent.
Other Improvements
Remove by Example
ArangoDB 1.2 provides a "Remove by Example" method that can be used to delete all documents from a collection that match the specified example. This can be used to delete multiple documents with a single operation.
The removeByExample
method returns the number of documents removed:
arangosh> db._create("mycollection");
arangosh> db.mycollection.save({ value: 1 });
arangosh> db.mycollection.save({ value: 2 });
arangosh> db.mycollection.save({ value: 3 });
arangosh> db.mycollection.save({ value: 1 });
arangosh> db.mycollection.removeByExample({ value: 1 });
2
The number of documents to remove can optionally be limited.
The method is also available via the REST API (@ref HttpSimple).
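The matching and counting behaviour can be sketched on a plain JavaScript array. The helpers below are hypothetical and only model the semantics described in this section: a document matches if every attribute of the example has an equal value (only top-level attributes and simple values are compared here), and the optional limit caps the number of removed documents.

```javascript
// matchesExample and removeByExample are hypothetical helpers operating on a
// plain array; they only model the semantics described in the text.

// A document matches if every attribute in the example has an equal value.
function matchesExample(doc, example) {
  return Object.keys(example).every(function (name) {
    return doc[name] === example[name];
  });
}

// Removes matching documents in place and returns the number removed.
// If limit is given, at most that many documents are removed.
function removeByExample(docs, example, limit) {
  var removed = 0;
  // Iterate backwards so that splice() does not shift unvisited elements.
  for (var i = docs.length - 1; i >= 0; --i) {
    if (limit !== undefined && removed >= limit) {
      break;
    }
    if (matchesExample(docs[i], example)) {
      docs.splice(i, 1);
      ++removed;
    }
  }
  return removed;
}

var docs = [ { value: 1 }, { value: 2 }, { value: 3 }, { value: 1 } ];
console.log(removeByExample(docs, { value: 1 }));  // 2
console.log(docs.length);                          // 2
```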
Replace / Update by Example
ArangoDB 1.2 also provides "Replace by Example" and "Update by Example" methods that can be used to fully or partially update documents in a collection that match the specified example. This can be used to replace document values easily with a single operation.
The replaceByExample and updateByExample methods return the number of documents
modified. Both operations can be limited to a specific number of documents if required.
Both methods are also available via the REST API (@ref HttpSimple).
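The difference between a full replacement and a partial update can be sketched in plain JavaScript. The helpers below are hypothetical models working on a plain array: system attributes such as _id, _key and _rev as well as the optional limit are not modelled, and only top-level attributes are compared.

```javascript
// Hypothetical helpers illustrating "full" versus "partial" modification.
// System attributes (_id, _key, _rev) and the optional limit are not modelled.

function matchesExample(doc, example) {
  return Object.keys(example).every(function (name) {
    return doc[name] === example[name];
  });
}

// Copies all attributes of source into target and returns target.
function assign(target, source) {
  Object.keys(source).forEach(function (name) {
    target[name] = source[name];
  });
  return target;
}

// Full replacement: matching documents are replaced by a copy of newValue.
function replaceByExample(docs, example, newValue) {
  var modified = 0;
  docs.forEach(function (doc, i) {
    if (matchesExample(doc, example)) {
      docs[i] = assign({}, newValue);
      ++modified;
    }
  });
  return modified;
}

// Partial update: newValue is merged into matching documents.
function updateByExample(docs, example, newValue) {
  var modified = 0;
  docs.forEach(function (doc) {
    if (matchesExample(doc, example)) {
      assign(doc, newValue);
      ++modified;
    }
  });
  return modified;
}

var a = [ { value: 1, name: "x" }, { value: 2, name: "y" } ];
console.log(replaceByExample(a, { value: 1 }, { status: "done" }));  // 1
console.log(a[0]);  // { status: 'done' }

var b = [ { value: 1, name: "x" }, { value: 2, name: "y" } ];
console.log(updateByExample(b, { value: 1 }, { status: "done" }));   // 1
console.log(b[0]);  // { value: 1, name: 'x', status: 'done' }
```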
Collection revision id
ArangoDB 1.2 collections have a revision property. The current revision of a collection can be queried to determine whether data in a collection has changed.
The information can be used by clients to invalidate caches etc. Here's an example:
arangosh> db._create("mycollection");
[ArangoCollection 316964608859, "mycollection" (type document, status loaded)]
arangosh> db.mycollection.save({ value : 1 });
{ "error" : false, "_id" : "mycollection/316966312795", "_rev" : "316966312795", "_key" : "316966312795" }
arangosh> db.mycollection.revision();
316966312795
arangosh> db.mycollection.save({ value : 2 });
{ "error" : false, "_id" : "mycollection/316966378331", "_rev" : "316966378331", "_key" : "316966378331" }
arangosh> db.mycollection.revision();
316966378331
arangosh> db.mycollection.remove("316966378331");
true
arangosh> db.mycollection.revision();
316966443867
The revision value returned should be treated as an opaque string. Clients should only use the value to check whether it is different from a previous revision value.
The revision value is also available via the REST API (@ref HttpCollection).
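A client-side cache that uses the collection revision for invalidation might look like the following sketch in plain JavaScript. RevisionCache is a hypothetical helper, and getRevision and loadData are caller-supplied functions (for example wrappers around the REST API); the usage part replaces them with stand-ins so the sketch runs without a server. The revision is converted to a string and only compared for inequality, in line with the advice to treat it as an opaque value.

```javascript
// RevisionCache is a hypothetical client-side helper. getRevision and loadData
// are caller-supplied functions, e.g. wrappers around the REST API.
function RevisionCache(getRevision, loadData) {
  this.getRevision = getRevision;
  this.loadData = loadData;
  this.revision = null;
  this.data = null;
}

RevisionCache.prototype.get = function () {
  // Treat the revision as an opaque string: only test whether it differs
  // from the previously seen value, never compare it numerically.
  var current = String(this.getRevision());
  if (current !== this.revision) {
    this.data = this.loadData();
    this.revision = current;
  }
  return this.data;
};

// Usage with stand-in functions instead of a real server:
var fakeRevision = "316966312795";
var loads = 0;
var cache = new RevisionCache(
  function () { return fakeRevision; },
  function () { return ++loads; }
);

cache.get();                    // first access, data is loaded
cache.get();                    // revision unchanged, cached data is reused
fakeRevision = "316966378331";
cache.get();                    // revision changed, data is reloaded
console.log(loads);             // 2
```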
Number of shapes
In some cases it is useful to determine how many shapes are present in a specific collection ("shape" is ArangoDB lingo for a document format, consisting of attribute names and data types).
Though shapes are implicitly managed by ArangoDB and users cannot create or delete shapes, the number of shapes can give a useful hint on how much variety the documents in a collection have or had.
The number of shapes is returned by a collection's figures()
method, and is also provided
in the return value of the figures REST API call (@ref HttpCollection).
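To make the notion of a shape more concrete, the following plain JavaScript sketch counts distinct document formats in a list of documents. The helpers are hypothetical, and the signature used (sorted attribute names plus a simple type name) is an assumption made for illustration; it only approximates what ArangoDB internally treats as a shape.

```javascript
// typeName, shapeSignature and countShapes are hypothetical helpers. The
// signature used here is an assumption and only approximates ArangoDB shapes.
function typeName(value) {
  if (value === null) {
    return "null";
  }
  if (Array.isArray(value)) {
    return "list";
  }
  return typeof value;
}

// Builds a string from the sorted attribute names and their value types.
function shapeSignature(doc) {
  return Object.keys(doc).sort().map(function (name) {
    return name + ":" + typeName(doc[name]);
  }).join(",");
}

// Counts how many distinct signatures occur in a list of documents.
function countShapes(docs) {
  var seen = {};
  docs.forEach(function (doc) {
    seen[shapeSignature(doc)] = true;
  });
  return Object.keys(seen).length;
}

var docs = [
  { name: "a", value: 1 },
  { value: 2, name: "b" },        // same attribute names and types as above
  { name: "c", value: "three" },  // value is a string here
  { name: "d" }                   // value is missing
];
console.log(countShapes(docs));   // 3
```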
Edges index
Insertion into an edges index is faster than in ArangoDB 1.1. This speeds up inserting documents into an edges collection, or loading an edges collection. The benefits will only be noticeable if the collection is big.
Allow importing of JSON lists
arangoimp and the REST bulk import API (@ref HttpImport) of ArangoDB allow mass importing
documents from a file in these formats:
- individual JSON documents: each line in the input file is a standalone JSON document:
    { _key: "key1", value: 1 }
    { _key: "key2", value: 2 }
    ...
- CSV: the input file must have a header line which defines the attribute names. All following lines are considered to be the attribute values
- TSV: like CSV, but using tabs instead of commas as the separator
ArangoDB 1.2 also allows importing an input file that contains a JSON list of JSON documents as follows:
[
{ _key: "key1", value: 1 },
{ _key: "key2", value: 2 },
...
]
Note that there is a size restriction for this format. The whole input file needs to
fit into a single buffer, or arangoimp will refuse to import the data.
The buffer size can be adjusted when invoking arangoimp by setting the --max-upload-size
option to an appropriate value. To set the buffer to 16 MB:
> arangoimp --file "input.json" --type json --max-upload-size 16777216 --collection myimport --create-collection true
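The difference between the line-wise JSON format and the JSON list format, including the buffer size restriction for the latter, can be sketched in plain JavaScript (run with Node.js). parseImport is a hypothetical helper and not part of arangoimp; deciding the format by looking at the first non-whitespace character is an assumption made for this sketch, and the input uses strict JSON with quoted attribute names.

```javascript
// parseImport is a hypothetical helper, not part of arangoimp. Detecting the
// format from the first non-whitespace character is an assumption.
function parseImport(text, maxUploadSize) {
  var trimmed = text.trim();

  if (trimmed.charAt(0) === "[") {
    // JSON list: the whole input has to fit into a single buffer.
    if (Buffer.byteLength(text, "utf8") > maxUploadSize) {
      throw new Error("input exceeds buffer size of " + maxUploadSize + " bytes");
    }
    return JSON.parse(trimmed);
  }

  // One standalone JSON document per line; empty lines are skipped.
  return trimmed.split("\n").filter(function (line) {
    return line.trim() !== "";
  }).map(function (line) {
    return JSON.parse(line);
  });
}

var maxUploadSize = 16 * 1024 * 1024;  // 16 MB = 16777216 bytes

var lines = '{ "_key": "key1", "value": 1 }\n{ "_key": "key2", "value": 2 }';
var list = '[ { "_key": "key1", "value": 1 }, { "_key": "key2", "value": 2 } ]';

console.log(parseImport(lines, maxUploadSize).length);  // 2
console.log(parseImport(list, maxUploadSize).length);   // 2
```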
Importing into edge collections
ArangoDB 1.2 also allows importing documents into edge collections. Previous versions only supported importing documents into document collections.
When importing data into edge collections, it is mandatory that all imported documents
contain the _from and _to attributes containing valid references.
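Before starting an import, input documents could be pre-checked on the client side along the following lines. This is a sketch in plain JavaScript with hypothetical helpers: the "collection/key" pattern is taken from the examples in this document, and the server's actual validation rules may differ.

```javascript
// isReference and isValidEdge are hypothetical helpers. The "collection/key"
// pattern is taken from the examples in this document; the server's actual
// validation rules may differ.
function isReference(value) {
  if (typeof value !== "string") {
    return false;
  }
  var pos = value.indexOf("/");
  // There must be at least one character before and after the slash.
  return pos > 0 && pos < value.length - 1;
}

function isValidEdge(doc) {
  return isReference(doc._from) && isReference(doc._to);
}

var edges = [
  { _from: "myvertexcollection/vertex1", _to: "myvertexcollection/vertex2" },
  { _from: "myvertexcollection/vertex1" },                 // _to is missing
  { _from: "vertex1", _to: "myvertexcollection/vertex2" }  // no collection name
];
console.log(edges.map(isValidEdge));  // [ true, false, false ]
```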
Logging of import failures
When documents are imported via the REST bulk import API (@ref HttpImport), ArangoDB returns an aggregate response containing the number of documents inserted plus the number of errors that occurred during the import.
ArangoDB 1.2 supplies additional information about the errors that occurred in the server log files. This information can be used to find documents that could not be imported (for example, because of a unique constraint violation, malformed data etc.).
Pretty printing in arangosh
arangosh now uses colorised output by default, which makes all of its output
easier to grasp.
Note that colorised output may not be supported on all platforms.
Deactivatable statistics gathering
ArangoDB normally includes functionality to automatically gather request and connection-related statistics. Gathering these statistics happens automatically in a background thread. Collecting statistics can be quite CPU-intensive, especially when multiple ArangoDB instances are run on the same physical host.
Statistics features can be turned off at compile time using the --enable-figures=no
configure option. However, this option affects compilation and thus was not available
when using any of the precompiled packages.
For those, ArangoDB 1.2 now provides a startup option --server.disable-statistics that
allows turning off the statistics gathering for performance reasons even with an
already compiled binary.
Usage is:
> arangod --server.disable-statistics true ...
@BNAVIGATE_NewFeatures12