23 KiB

Raw Blame History

New Features in ArangoDB 1.4

@NAVIGATE_NewFeatures14 @EMBEDTOC{NewFeatures14TOC}

Features and Improvements

The following list shows in detail which features have been added or improved in ArangoDB 1.4. ArangoDB 1.4 also contains several bugfixes that are not listed here - see the @EXTREF{https://github.com/triAGENS/ArangoDB/blob/devel/CHANGELOG,CHANGELOG} for details.

Foxx

Foxx Improvements

Foxx has now left the feature preview state and is now a full feature. It is not backwards compatible with the feature preview. Please check the Foxx documentation for details. Some of the most important pitfalls when upgrading:

Foxx.Application is now Foxx.Controller
In your manifest file: apps should now be controllers

All future changes will be handled with deprecations beforehand.

There are a lot of new features including JSDoc Style annotations, an authentication module, enhanced standard Repository, new route annotations and a possibility to annotate all routes of a controller at once.

Foxx Manager

ArangoDB builds now ship with an additional binary foxx-manager that can be used to conveniently manage Foxx application in an ArangoDB instance. To make it easy to start with Foxx, the foxx-manager provides commands to download example Foxx applications from a central repository at https://github.com/arangodb/foxx-apps, and install them locally:

> foxx-manager update
> foxx-manager install hello-foxx /hello-foxx

This will first mirror the central list of available Foxx applications locally, and after that make the hello-foxx application available in the local ArangoDB database under the URL prefix /hello-foxx.

Multiple Databases

Traditionally ArangoDB provided collections, but no way of grouping them. When an ArangoDB server was used for multiple applications, there could be collection name clashes, so this was worked around by prefixing collection names with unique application names etc.

Since version 1.4, ArangoDB provides databases as the highest level grouping element. Multiple databases can exist in parallel, and each collection belongs to exactly one database. Collection names need to be unique within their database, but not globally. Thus it is now possible to use the same collection name in different databases.

Individual databases are kept relatively separate by ArangoDB. For example, every databases has its own system collections (e.g. _users, _replication). This allows setting up different users for different databases. Replication is also configured on a per-database level. Other data stored in system collections such as graphs and AQL user function definitions are also database-specific.

Server-side actions (including transactions and Foxx) also belong to exactly one database. They are only allowed to access data from their own database, and must not switch to another. AQL queries can also access collections in the current database only. There is no way to refer to collections of another than the current database.

There is no intention to offer multi-database or cross-database operations in future releases.

By default, ArangoDB comes with one database, which is named _system. This database will always be there. Installations that upgrade from older versions to 1.4 will have all their collections be moved into the _system database during the upgrade procedure.

The _system database cannot be dropped, though all user-defined collections in it can be dropped normally if required. All database management operations such as creating new databases, dropping databases, and retrieving the list of existing databases can only be carried out from within the _system database. Fetching the server logs (via the /_api/log API) is also restricted the _system database.

All command-line tools (e.g. arangosh, arangoimp, foxx-manager) will connect to the _system database by default. They also provide the option --server.database to connect to a different database, e.g.:

> arangosh --server.database mydb --server.endpoint tcp://127.0.0.1:8529

From within arangosh, the name of the current database can always be retrieved using the db._name() method:

arangosh> db._name(); 
_system

To create a new database, use the db._createDatabase() command (note: you need to be inside the _system database to execute this command):

arangosh> db._createDatabase("mydb");

To retrieve the list of available databases, issue the following command from within arangosh (note: you need to be inside the _system database to execute this command):

arangosh> db._listDatabases();

To switch to a different database, use the db._useDatabase() method:

arangosh> db._useDatabase("mydb");

To remove an entire database with all collections, use the db._dropDatabase() method (note: dropping a database can be executed from within the _system database):

arangosh> db._dropDatabase("mydb");

Databases are dropped asynchronously when all clients have disconnected and all references to the database have been garbage-collected. This is similar to how collections are dropped.

Foxx applications are also active on a per-database level. Each database in an ArangoDB instance can have different Foxx applications installed. To install a Foxx application for a specific database, use the --server.database option when invoking the foxx manager binary:

unix> foxx-manager --server.database mydb update
unix> foxx-manager --server.database mydb install hello-foxx /hello-foxx

Please note that when using the ArangoDB web interface, it will show the results of the currently selected database only. This will in many cases be the _system database. Please check the @ref Upgrading14Databases "section on how to specify a database in the request" to use the web interface with any other than the _system database.

Runtime Endpoint Management

The ArangoDB server can listen for incoming requests on multiple endpoints.

The endpoints are normally specified either in ArangoDB's configuration file or on the command-line, using the @ref CommandLineArangoEndpoint "--server.endpoint" option. In previous versions of ArangoDB, the endpoints could not be changed while the server was running.

In ArangoDB 1.4, the endpoints can also be changed at runtime. Each endpoint can optionally be restricted to a specific database (or a list of databases). This endpoint-to- database mapping allows the usage of different port numbers for different databases.
This may be useful in multi-tenant setups.

A multi-endpoint setup may also be useful to turn on encrypted communication for just specific databases.

Asynchronous Execution

ArangoDB 1.4 provides an additional mechanism for asynchronous request execution. Clients can optionally mark any request as "to be executed asynchronously". Asynchronous execution decouples the execution from the actual server response, allowing clients to continue working earlier without waiting for the operation to finish.

Asynchronous requests are queued on the server and executed by the server as soon as possible. The maximum size of the queue can be bounded by using the startup option --scheduler.maximal-queue-size.

The results of asynchronously executed requests are either dropped immediately by the server (i.e. fire and forget) or are stored in memory for later retrieval by the client. ArangoDB provides an API for clients to retrieve the status of async jobs, and to retrieve their responses.

Please refer to @ref HttpJob for more details.

Dump and Reload Tools

ArangoDB 1.4 comes with two tools for dumping collection data from an ArangoDB database and reloading them later. These tools, arangodump and arangoreload can be used to create and restore logical backups of an ArangoDB database.

arangodump and arangoreload are client tools that connect to a running ArangoDB server. The dump tool will write the collection data into the file system in JSON format. The restore tool can later be used to load the data created with the dump tool, either into the same ArangoDB server instance, or into a different one.

The documentation for the tools can be found here:

@ref DumpManual
@ref RestoreManual

Replication

See @ref UserManualReplication for details.

Web Interface Improvements

The web interface now provides a graph viewer on the Graphs tab. The graph viewer can be used to explore and navigate an existing ArangoDB graph. It supports both graphs in the _graphs system collection as well as user-defined graphs that are composed of an arbitrary vertex and edge collection. Please note that when using ArangoDB's web interface with Internet Explorer (IE), you will need IE version 9 or higher. The graph viewer relies on client-side SVG which is not available in previous versions of IE.

The Dashboard tab in the web interface provides an overview of server figures, which can be adjusted to user needs. New figures are polled by the web interface in a configurable interval, and can be graphed as time-series.

The new API provides the full documentation of ArangoDB's built-in HTTP API. This allows browsing the ArangoDB API locally without the need to refer to the arangodb.org website for documentation.

The Applications tab in the web interface provides an improved overview of installed and available Foxx applications.

The AQL Editor now provides some example queries and allows saving user-defined queries for later reuse.

The Logs tab in the web interface is now available in the _system database only. This prevents the global server logs to be visible from within other than the _system database.

The details view of collections in the web interface now shows more detailed figures and also a collection's available indexes. Indexes can be created and deleted directly from the web interface now.

The web interface now also allows searching for documents based on arbitrary document attribute values and has a document import / upload facility.

Journal File Handling Changes

###Journal Creation

In ArangoDB versions prior to 1.4, a journal file was always created when a collection was created. With the default journal size being 32 MB this has "wasted" a lot of disk space and RAM for collections that were created empty and did not contain any documents.

ArangoDB 1.4 delays the creation of a collection journal file until the first write in the collection. If no journal is present when the first write is performed, a journal is created on the fly. The same happens if a collection has no journal due to a previous journal file rotation. The next write in the collection will create a new journal then.

This means that from now on there may be cases when a collection intentionally has no journal. This should not affect end users except if they work on the collection files directly with tools other than ArangoDB. And end users might benefit from ArangoDB using less disk space for empty collections.

###Explicit Journal Rotation

Collections now also have a method named rotate, which can be called by clients to explicitly close a collection's active journal. Explicitly closing a journal is not necessary for normal operations, but as it turns the journal into a read-only datafile, data from the journal becomes visible for the compaction. The compaction will always ignore data in the current journal, and only data from datafiles will be compacted. Explicitly closing the datafile and turning it into a journal can thus be used to have data compacted if it is known that the last journal contained a lot of data that has become superfluous due to later updates or deletions.

For example, calling rotate may make sense in the following situation:

// create a collection and insert some data
var test = db._create("test");
for (var i = 0; i < 10000; ++i) {
  test.save({ _key: "test" + i, value: "abcdefg" }); 
}

...
// the following makes the collection empty, but "wasted" space will not yet be compacted
test.truncate(); 

// to make data in the journal become visible for the compaction, explicitly 
// rotate the journal:
test.rotate();

The actual gains that can be achieved by forcing rotation vary on journal size, document sizes, performed operations and even the platform type, but in some situations the storage savings can be enormous as can be seen in the following example:

var show = function (what, collection) { 
  var figures = collection.figures(); 
  require("internal").print(what, figures.datafiles.fileSize + figures.journals.fileSize);
};

var test = db._create("test");
for (var i = 0; i < 10000; ++i) {
  test.save({ _key: "test" + i, value: "abcdefg" }); 
}

show("before truncate", test);
test.truncate();
show("after truncate", test);
test.rotate();

// wait for compaction
require("internal").wait(30);
show("after rotate", test);

The results for the above on some 64 bit test machine are:

before truncate 33554432
after truncate 33554432
after rotate 184

Using rotate may also be useful when data in a collection is known to not change in the immediate future. After having completed all write operations on a collection, performing a rotate will reduce the size of the current journal to the actually required size (journals are pre-allocated with a specific size when they are created) before making the journal a datafile. Thus rotate may cause disk space savings, even if the datafiles does not qualify for being compacted after the rotation.

Journal rotation is executed asynchronously, meaning a call to rotate might return before the rotation was executed.

The rotate function is also exposed via the collection REST API via PUT /_api/collection/<name>/rotate.

###Compaction

The collection compaction process in 1.4 may now merge smaller datafiles together. This can reduce the number of file descriptors needed for a collection that has undergone a lot of compaction runs.

###Less Syncing during Creation

When collections are created, less sync system calls are executed when creating the initial shapes for a collection. A collection is also created without a journal since 1.4 if it is not populated instantly. This may reduce the overhead of creating a collection in 1.4 compared to 1.3 and earlier.

AQL Improvements

The following functions have been added to AQL:

UNION_DISTINCT(list1, list2, ...): returns the union of distinct values of all lists specified. The function expects at least two list values as its arguments. The result is a list of values in an undefined order.
MINUS(list1, list2, ...): returns the difference of all lists specified. The function expects at least two list values as its arguments. The result is a list of values that occur in the first list but not in any of the subsequent lists. The order of the result list is undefined and should not be relied on. Note: duplicates will be removed.
INTERSECTION(list1, list2, ...): returns the intersection of all lists specified. The function expects at least two list values as its arguments. The result is a list of values that occur in all arguments. The order of the result list is undefined and should not be relied on. Note: duplicates will be removed.

AQL query parse error messages now indicate the error position (line/column). If an unknown collection is referenced in an AQL query, this is now more clearly stated in the errors message.

AQL also has support for single-line comments now. Single-line comments are initiated with a // and terminate automatically at line end.

AQL's LIMIT clause can now be used with bind parameters.

When executing an AQL query via an HTTP call (using the POST /_api/cursor API), it is now possible to make the query return the number of documents before applying a LIMIT. This can be achieved by providing the fullCount attribute in the options attribute. See @ref HttpCursor for more details.

Collection Functions

###Cap Constraints

Cap constraints in ArangoDB can now be used to limit the number as well as the total size of active documents in a collection. The size can be specified in bytes. This can be used to keep a collection from growing endlessly.

The arguments for creating a cap constraint are now:

collection.ensureCapConstraint(<count>, <byteSize>);

The following examples show how the constraints be employed in isolation or together:

collection.ensureCapConstraint(100, 0);           // limit only number of documents
collection.ensureCapConstraint(100);              // same

collection.ensureCapConstraint(0, 1024 * 1024);   // limit only byte size
collection.ensureCapConstraint(500, 1024 * 1024); // limit both count and byte size.

If two constraints are defined, the first met constraint will trigger the automatic deletion of "old" documents from the collection.

###Exists Function

ArangoDB 1.4 now provides exists methods on a collection and the database level. The function can be used to quickly check if a document with a specific key or revision is present.

###Checksum Function

ArangoDB provides a checksum method for collections. This method can be used to calculate a hash value of the keys and optionally the document data in a collection. The obtained hash value can be used to check

if data in a collection has changed over time: this can be achieved by calculating the checksum of the collection at two different points in time and comparing them. If nothing has changed, the two checksums should be identical).
if two collections contain the same keys (and or revisions) and optionally document data

The checksum function is also exposed via the collection REST API at GET /_api/collection/<name>/checksum.

###First and Last Functions

ArangoDB 1.4 collections provide methods first and last to retrieve the oldest and newest documents from a collection easily. The "age" of documents in a collection is determined by document insertion or update time.

Using first and last, it is possible to use collections as a LIFO stacks or FIFO queues.

###Extended Figures

The figures method in ArangoDB 1.4 now also includes figures for the shape files owned by a collection. These additional values can be used to better judge the actual disk space usage of a collection. In pre-1.4 ArangoDB, the figures call did not include the size of shape datafiles for a collection.

Miscellaneous Improvements

ArangoDB 1.4 now provides a REST API to execute server-side traversals with custom traversal functions. The API is described @ref HttpTraversals "here".

The bulk import API now provides a complete URL parameter that can be used to control the behaviour when at least one document cannot be imported. Setting complete to true will abort the whole import and roll back any already imported documents. Setting it to false or omitting it will make the import continue importing documents even if some documents could not be imported. This is also the behaviour that previous ArangoDB versions exposed.

Command-Line Options added

Options can be specified on the command line or in configuration files. If a string @VARIABLE@ occurs in the value, it is replaced by the corresponding environment variable.

The following command-line options have been added for arangod in ArangoDB 1.4:

--server.allow-method-override: this option can be set to allow overriding the HTTP request method in a request using one of the following custom headers:
- x-http-method-override
- x-http-method
- x-method-override Using this option allows bypassing proxies and tools that would otherwise just let certain types of requests pass. Enabling this option may impose a security risk, so it should only be used in very controlled environments. The default value for this option is false (no method overriding allowed).

--server.default-api-compatibility: this option can be used to determine the compatibility of (some) server API return values. The value for this parameter is a server version number calculated as follows: 10000 * major + 100 * minor (e.g. 10400 for ArangoDB 1.3). The default value is 10400 (1.4), the minimum allowed value is 10300 (1.3).

When setting this option to a value lower than the current server version, the server might respond with old-style results to "old" clients, increasing compatibility with "old" (non-up-to-date) clients. In ArangoDB 1.4.0, this option mainly affects the style of the returned location headers:

When set to 10300, the returned location headers will not include the database name. When set to 10400 or higher, the location headers returned will also include the database name.

The default value for this option is 10400 (ArangoDB 1.4), so ArangoDB will return the new-style location headers (including database name) by default. If you use a non-1.4 compatible ArangoDB client driver, you may set this option to make ArangoDB return the old-style headers.

--scheduler.maximal-queue-size: limits the size of the asynchronous request execution queue. Please have a look at @ref NewFeatures14Async for more details.
--server.authenticate-system-only: require authentication only for requests going to internal APIs, that is URLs starting with /_ (e.g. /_api/...).
--server.disable-replication-applier: start the server with all replication appliers disabled. This might be useful if you want to fix a replication setup problem.
--server.disable-replication-logger: start the server with all replication loggers disabled. This might be useful if you want to fix a replication setup problem.
--log.requests-file: log incoming HTTP requests to a file
--log.content-filter: restrict log output to messages containing a specific string only. This is a debugging option.
--log.source-filter (renamed from --log.filter): restrict trace/debug log output to messages originated by specific C/C++ source files. This is a debugging option.

The following command-line options have been added for arangosh in 1.4:

--server.database: allows specifying the database name for the connection
--prompt: allows setting a user-defined prompt in arangosh. A %d in the prompt will be replaced with the name of the current database.

The following command-line options have been added for arangoimp in 1.4:

--server.database: allows specifying the database name for the connection

@BNAVIGATE_NewFeatures14

23 KiB Raw Blame History