20 KiB

Raw Blame History

New Features in ArangoDB 1.2

@NAVIGATE_NewFeatures12 @EMBEDTOC{NewFeatures12TOC}

Features and Improvements

The following list shows in detail which features have been added or improved in ArangoDB 1.2. ArangoDB 1.2 also contains several bugfixes that are not listed here.

User-defined document keys

ArangoDB 1.2 allows users to determine their own document keys to uniquely identify documents. In previous versions of ArangoDB, each document automatically got an _id value determined by the server. The document id part of this _id value was an ever-increasing integer number that the user could not change or preselect. Additionally, big integers are hard to memoize.

ArangoDB 1.2 allows specifying own values for the document id part of the _id values. This can be done by providing a _key attribute when creating documents.

The value specified for _key will automatically become part of the _id value of a document, and can be used to retrieve a document later:

arangosh> db._create("mycollection");
[ArangoCollection 316959596850, "mycollection" (type document, status loaded)]

arangosh> db.mycollection.save({ _key: "mytest" });
{ "error" : false, "_id" : "mycollection/mytest", "_rev" : "316961300786", "_key" : "mytest" }

arangosh> db.mycollection.toArray();
[ { "_id" : "mycollection/mytest", "_rev" : "316961300786", "_key" : "mytest" } ]

arangosh> db.mycollection.document("mytest");
{ "_key" : "mytest", "_id" : "mycollection/mytest", "_rev" : "316961300786" }

This should be much more intuitive and easier than dealing with the server-generated integer values of previous versions.

Specifiying a value for the _key attribute is optional. If the attribute is not specified, ArangoDB will create its own value as it did in previous versions. When upgrading from ArangoDB 1.1 to ArangoDB 1.2, the document ids will automatically be converted into equivalent _key values, meaning the same identifiers can be used after an upgrade.

There are a few restrictions for the values that can be used inside _key (please refer to @ref DocumentKeys for details). Document key values are immutable, i.e. they cannot be changed after a document has been created. The only way to change a _key value of a document is to remove the document and re-insert it with a different key.

Document keys can also be used with the REST API at all places where a document id was required before.

Usage of collection names instead of ids

ArangoDB 1.1 and earlier versions required clients to the use server-generated collection ids instead of the collections' actual names in many of the REST APIs. To make using the REST APIs easier, this has been changed so collection names can be specified instead of collection ids.

Collection names are also returned as part of documents' _id values to make comparing results easier in case just the collection name was known.

Here is an example AQL query from ArangoDB 1.1 that works fine, because both the collection id and the document id are known:

arangosh> db._create("test");
[ArangoCollection 402238582, "test" (type document, status loaded)]

arangosh> db.test.save({ value : 1});
{ "_id" : "402238582/403745910", "_rev" : 403745910 }

arangosh> stmt = db._createStatement({ query: 'FOR t IN test FILTER t._id == "402238582/403745910" RETURN t' });
arangosh> cursor = stmt.execute();                                                                            
arangosh> while (cursor.hasNext()) { print(cursor.next()); }
{ "_id" : "402238582/403745910", "_rev" : 403745910, "value" : 1 }

However, if either the collection id or the document id are not precisely known at query time, then the filter condition will fail.

arangosh> stmt = db._createStatement({ query: 'FOR t IN test FILTER t._id == "403745910" RETURN t' });
arangosh> cursor = stmt.execute();                                                                            
arangosh> while (cursor.hasNext()) { print(cursor.next()); }

Effectively this meant that in 1.1 you had to know both collection names and collection ids in several cases, making ArangoDB 1.1 harder to use than necessary.

In ArangoDB 1.2, the above query can be rewritten to the much more intuitive form:

arangosh> stmt = db._createStatement({ query: 'FOR t IN test FILTER t._id == "test/mykey" RETURN t' });
...

or even:

arangosh> stmt = db._createStatement({ query: 'FOR t IN test FILTER t._key == "mykey" RETURN t' });
...

Collection names can also be used to reference other documents when creating edges, which makes setting up and querying edges much easier in ArangoDB 1.2:

arangosh> db._create("myvertexcollection");
[ArangoCollection 316959891364, "myvertexcollection" (type document, status loaded)]

arangosh> db._createEdgeCollection("myedgecollection");
[ArangoCollection 316961464228, "myedgecollection" (type edge, status loaded)]

arangosh> db.myvertexcollection.save({ _key: "vertex1", value: 1234 });
{ "error" : false, "_id" : "myvertexcollection/vertex1", "_rev" : "316963299236", "_key" : "vertex1" }

arangosh> db.myvertexcollection.save({ _key: "vertex2", value: 6789 });
{ "error" : false, "_id" : "myvertexcollection/vertex2", "_rev" : "316963364772", "_key" : "vertex2" }

arangosh> db.myedgecollection.save("myvertexcollection/vertex1", "myvertexcollection/vertex2", { type: "some value" });
{ "error" : false, "_id" : "myedgecollection/316963561380", "_rev" : "316963561380", "_key" : "316963561380" } 

arangosh> db.myedgecollection.outEdges("myvertexcollection/vertex1");
[ 
  { "_id" : "myedgecollection/316963561380", "_rev" : "316963561380", "_key" : "316963561380", "_from" : "myvertexcollection/vertex1", "_to" : "myvertexcollection/vertex2", "type" : "some value" } 
]

Additional graph functionality

@BOOK_REF{JSModuleGraph}
@BOOK_REF{HttpGraph}

AQL Improvements

Additional functions

The following functions have been added in the ArangoDB Query Language (AQL) in ArangoDB 1.2:

TRAVERSAL(): returns the result of a graph traversal.

The starting point and direction of the traversal have to be specified. It is also possible to configure the traversal by setting the visitation order (e.g. depth-first vs. breadth-first, pre-order vs. post-order visitation), limiting the traversal depth, specifying uniqueness constraints, and limiting the traversal to specific types of edges.
TRAVERSAL_TREE(): the same as TRAVERSAL(), but will return the result in a hierarchical format
EDGES(): access edges connected to a vertex from via an AQL function
ATTRIBUTES(): returns the names of all attributes of a document as a list
KEEP(): keeps only the specified attributes of a document, and removes all others
UNSET(): removes only the specified attributes from a document, and preserves all others
MATCHES(): to check if a document matches one of multiple example documents
LIKE(): pattern-based text comparison
FULLTEXT(): to execute queries on a fulltext index

The documentation for the individual functions is available at @ref AqlFunctions

Syntactic extensions

AQL now also allows the following syntactic constructs, which were not available in previous versions of ArangoDB:

FUNC(...)[....]
FUNC(...).attribute

The above syntax constructs allow accessing an indexed or named attribute of a function return value directly, without having to store the return value in an intermediate variable.

An example query that is made possible by this change (returns names of all available collections):

RETURN COLLECTIONS()[*].name

This query did not work in ArangoDB 1.1, and the workaround was to either use a for loop and process the individual function return values separately, or to store them in a temporary variable as follows:

FOR c IN COLLECTIONS() RETURN c.name
LET c = COLLECTIONS() RETURN c[*].name

Note: ArangoDB 1.2 still supports these queries, but the above syntax is much simpler and intuitive.

ArangoDB 1.2 also supports negative list indexes to return list elements from the end of the list using the [ ] operator.

To return the last element from a list, the following query can be used in ArangoDB 1.2:

LET l = [ 1, 2, 3, 4 ] RETURN l[-1]

whereas in ArangoDB 1.1, the length of the list had to be determined, and the element offset had to be calculated based on that:

LET l = [ 1, 2, 3, 4 ] RETURN l[LENGTH(l) - 1]

Note: ArangoDB 1.2 still supports this, but the above form is much simpler.

Index usage and optimisations

AQL queries in ArangoDB 1.2 will use indexes in more cases than in previous versions of ArangoDB:

querying with an equality predicate on the _key attribute will try to use the primary index
querying on _from and _to attributes of documents in an edge collection will try to use the edges index
if the result of the PATHS() function is restricted to specific source or destination vertexes (identified by dedicated _id values), the query will try to use the edges index.
some specific LIMIT and SORT operations will be pushed upwards in the execution pipeline to make queries faster.

Note: try in this context means that the query optimiser will try to apply the mentioned optimisations. Specific optimisations might not be applied if other parts of the query prohibit the optimisations (e.g. querying with logical OR on different attributes will likely disable several optimisations).

Fulltext indexes

added fulltext index type. This index type allows indexing words and prefixes of words from a specific document attribute. The index can be queries using a SimpleQueryFull object, the HTTP REST API at /_api/simple/fulltext, or via AQL

CORS support

ArangoDB 1.2 offers support for Cross-Origin Resource Sharing (CORS), allowing it to be used as AJAX back end even in a cross-domain request setup.

Browsers will normally not allow cross-domain requests for AJAX calls due to security reasons. Modern browsers allow defined exceptions from this same-origin policy rule, provided the client code sets some specific properties in the XMLHttpRequest object. Browsers will then patch requests with some additional headers, and will allow cross-domain requests if the server returns that request with a specifically patched response (see @EXTREF_S{http://en.wikipedia.org/wiki/Cross-origin_resource_sharing, CORS on Wikipedia} for more details.

To support CORS, ArangoDB 1.2 will answer requests that include a HTTP Origin header with an Access-Control-Allow-Origin header with the originally request value. ArangoDB 1.2 will also answer CORS preflight requests that are issued from browsers via the HTTP OPTIONS method. More details on how ArangoDB handles CORS requests can be found in @ref CommunicationCors.

In-memory collections

ArangoDB collections are persistent and collection data is synchronised to disk frequently.

For some specific use cases persistence and synchronisation are overkill. This is true for "unimportant" data that can be reproduced from other sources easily. An example for such data are temporary calculations or caches.

To support managing data that does not require any sort of persistence, ArangoDB 1.2 introduces in-memory collections as an experimental feature.

Experimental means that the feature might not be available on all platforms, and that it might change drastically or even be removed in a future version of ArangoDB.

To create an in-memory collection without persistence, the optional attribute isVolatile can be set to true:

arangosh> db._create("mycache", { isVolatile: true});
[ArangoCollection 316961790811, "mycache" (type document, status loaded)]

The collection can be used normally, and its definition (name, id, type) will remain even after a server restart. However, all data written into in-memory collections is lost when the collection is unloaded or the server is shut down:

arangosh> db.mycache.save({ value : 1 });
{ "error" : false, "_id" : "mycache/316963298139", "_rev" : "316963298139", "_key" : "316963298139" } 

arangosh> db.mycache.count();
1

arangosh> db.mycache.unload();

arangosh> db.mycache.count();
0

So please be careful when employing this type of collection. It should never be used for any critical data that requires persistence and durability.

The benefits of this type of collection are that there is no disk overhead for document operations in it. The actual performance gains over regular collections are highly platform-dependent.

Other Improvements

Remove by Example

ArangoDB 1.2 provides a "Remove by Example" method that can be used to delete all documents from a collection that match the specified example. This can be used to delete multiple documents with a single operation.

The removeByExample method returns the number of documents removed:

arangosh> db._create("mycollection");
arangosh> db.mycollection.save({ value: 1 });
arangosh> db.mycollection.save({ value: 2 });
arangosh> db.mycollection.save({ value: 3 });
arangosh> db.mycollection.save({ value: 1 });

arangosh> db.mycollection.removeByExample({ value: 1 });
2

The number of documents to remove can optionally be limited.

The method is also available via the REST API (@ref HttpSimple).

Replace / Update by Example

ArangoDB 1.2 also provides "Replace by Example" and "Update by Example" methods that can be used to fully or partially update documents from a collection that match the specified example. This can be used to replace document values easily with a single operation.

The replaceByExample and updateByExaple methods return the number of documents modified. Both operations can be limited to a specific number of documents if required.

Both methods are also available via the REST API (@ref HttpSimple).

Collection revision id

ArangoDB 1.2 collections have a revision property. The current revision of a collection can be queried to determine whether data in a collection has changed.

The information can be used by clients to invalidate caches etc. Here's an example:

arangosh> db._create("mycollection");
[ArangoCollection 316964608859, "mycollection" (type document, status loaded)]

arangosh> db.mycollection.save({ value : 1 });
{ "error" : false, "_id" : "mycollection/316966312795", "_rev" : "316966312795", "_key" : "316966312795" }

arangosh> db.mycollection.revision();
316966312795

arangosh> db.mycollection.save({ value : 2 });
{ "error" : false, "_id" : "mycollection/316966378331", "_rev" : "316966378331", "_key" : "316966378331" }

arangosh> db.mycollection.revision();
316966378331

arangosh> db.mycollection.remove("316966378331");
true

arangosh> db.mycollection.revision();
316966443867

The revision value returned should be treated as an opaque string. Clients should only use the value to check whether it is different from a previous revision value.

The revision value is also available via the REST API (@ref HttpCollection).

Number of shapes

In some cases it is useful to determine how many shapes are present in a specific collection ("shape" is ArangoDB lingo for a document format, consisting of attribute names and data types).

Though shapes are implicitly managed by ArangoDB and users can not create or delete shapes, the number of shapes can give a useful hint on how much variety the documents in a collection have or had.

The number of shapes is returned by a collection's figures() method, and is also provided in the return value of the figures REST API call (@ref HttpCollection).

Edges index

Insertion into an edges index is faster than in ArangoDB 1.1. This speeds up inserting documents into an edges collection, or loading an edges collection. The benefits will only be noticeable if the collection is big.

Allow importing of JSON lists

arangoimp and the REST bulk import API (@ref HttpImport) of ArangoDB allow mass importing documents from a file in these formats:

individual JSON documents: each line in the input file is a standalone JSON document: { _key: "key1", value: 1 } { _key: "key2", value: 2 } ...
CSV: the input file must have a header line which defines the attribute names. All following lines are considered to be the attribute values
TSV: like CSV, but not using a comma as the separator but tabs

ArangoDB 1.2 also allows to import an input file that contains a JSON list of JSON documents as follows:

[
  { _key: "key1", value: 1 },
  { _key: "key2", value: 2 },
  ...
]

Note that there is a size restriction for this format. The whole input file needs to fit into a single buffer, or arangoimp will refuse to import the data.

The buffer size can be adjusted when invoking arangoimp by setting the --max-upload-size option to an appropriate value. To set the buffer to 16 MB:

> arangoimp --file "input.json" --type json --max-upload-size 16777216 --collection myimport --create-collection true

Importing into edge collections

ArangoDB 1.2 also allows to import documents into edge collections. Previous versions only supported importing documents into document collections.

When importing data into edge collections, it is mandatory that all imported documents contain the _from and _to attributes containing valid references.

Logging of import failures

When documents are imported via the REST bulk import API (@ref HttpImport), ArangoDB returns an aggregate response containing the number of documents inserted plus the number of errors that occurred during the import.

ArangoDB 1.2 supplies additional information about the errors that occurred in the server log files. This information can be used to find documents that could not be imported (for example, because of a unique constraint violation, malformed data etc.).

Pretty printing in arangosh

arangosh now uses colorised output by default, which makes all of its output more easy to grasp.

Note that colorised output may not be supported on all platforms.

Deactivatable statistics gathering

ArangoDB normally includes functionality to automatically gather request and connection-related statistics. Gathering these statistics happens automatically in a background thread. Collecting statistics can be quite CPU-intensive, especially when multiple ArangoDB instances are run on the same physical host.

Statistics features can be turned off at compile time using the --enable-figures=no configure option. However, this option affects compilation and thus was not available when using any of th precompiled packages.

For those, ArangoDB 1.2 now provides a startup option --server.disable-statistics that allows turning off the statistics gathering for performance reasons even with an already compiled binary.

Usage is:

> arangod --server.disable-statistics true ...

@BNAVIGATE_NewFeatures12

20 KiB Raw Blame History