New Features in ArangoDB 1.1
@NAVIGATE_NewFeatures11 @EMBEDTOC{NewFeatures11TOC}
Features and Improvements
The following list shows in detail which features have been added or improved in ArangoDB 1.1. Additionally, ArangoDB 1.1 contains a lot of bugfixes that are not listed separately.
Collection Types
In ArangoDB 1.1, collections are now explicitly typed:
- regular documents go into document-only collections,
- and edges go into edge collections.
In 1.0, collections were untyped, and edges and documents could be mixed in the same collection. Whether a collection was to be treated as an edge or document collection was decided at runtime by looking at the prefix used (e.g. db.xxx vs. edges.xxx).
The explicit collection types used in ArangoDB allow users to query the collection type at runtime and make decisions based on the type:
arangosh> db.users.type();
Extra JavaScript functions have been introduced to create collections of a specific type:
arangosh> db._createDocumentCollection("users");
arangosh> db._createEdgeCollection("relationships");
The "traditional" functions are still available:
arangosh> db._create("users");
arangosh> edges._create("relationships");
The ArangoDB web interface also allows the explicit creation of edge collections.
Batch Requests
ArangoDB 1.1 provides a new REST API for batch requests at /_api/batch.
Clients can use the API to send multiple requests to ArangoDB at once. They can package multiple requests into just one aggregated request.
ArangoDB will then unpack the aggregated request and process the contained requests one by one. When done, it will send an aggregated response to the client, which the client can then unpack to get the list of individual responses.
Using the batch request API may save network overhead because it reduces the number of HTTP requests and responses that clients and ArangoDB need to exchange. This may be especially important if the network is slow or if the individual requests are small and the network overhead per request would be significant.
Note that packing multiple individual requests into one aggregated request on the client side introduces some overhead itself. The same is true for unpacking the aggregated request and assembling the aggregated response on the server side. Using batch requests may still be beneficial in many cases, but they should be used only when they replace a considerable number of individual requests.
For more information see @ref HttpBatch and @EXTREF{http://www.arangodb.org/2012/10/04/gain-factor-of-5-using-batch-updates,this blog article}.
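The packaging step on the client side can be sketched as follows. This is a minimal sketch, not the definitive wire format: it assumes the multipart layout described in the HttpBatch reference (a multipart envelope whose parts carry the content type application/x-arango-batchpart). The helper name buildBatchBody, the boundary string and the collection query parameter in the example paths are illustrative assumptions.

```javascript
// Hypothetical helper: packs several HTTP requests into one multipart body
// for POST /_api/batch. The part content type and the overall layout are
// assumptions; check the HttpBatch reference before relying on them.
function buildBatchBody(requests, boundary) {
  const parts = requests.map(function (req) {
    const lines = [
      "--" + boundary,
      "Content-Type: application/x-arango-batchpart",
      "",
      req.method + " " + req.path + " HTTP/1.1",
      "",
    ];
    if (req.body !== undefined) {
      lines.push(JSON.stringify(req.body));
    }
    return lines.join("\r\n");
  });
  // The closing delimiter marks the end of the aggregated request.
  return parts.join("\r\n") + "\r\n--" + boundary + "--\r\n";
}

const boundary = "XXXsubpartXXX"; // arbitrary; must not occur inside the parts
const body = buildBatchBody(
  [
    { method: "POST", path: "/_api/document?collection=users", body: { name: "foo" } },
    { method: "POST", path: "/_api/document?collection=users", body: { name: "bar" } },
  ],
  boundary
);
console.log(body);
```

The resulting string would be sent as the body of a single POST request to /_api/batch, with the same boundary announced in the request's Content-Type header.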
Support for Partial Updates
The REST API for documents now offers the HTTP PATCH method to partially update documents. A partial update allows specifying only the attributes to change instead of the full document. Internally, it will merge the supplied attributes into the existing document.
Completely overwriting/replacing entire documents is still available via the HTTP PUT method, as in ArangoDB 1.0. In arangosh, the partial update method is named update, and the previously existing replace method still performs a replacement of the entire document as before.
This call will replace just the active attribute of the document user. All other attributes will remain unmodified. The document revision number will be updated, as updating creates a new revision:
arangosh> db.users.update(user, { "active" : false });
In contrast, the replace method will replace the entire existing document with the data supplied. All other attributes will be removed. Replacing will also create a new revision:
arangosh> db.users.replace(user, { "active" : false });
For more information, please check @ref RestDocument.
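The difference between the two operations can be illustrated with plain JavaScript objects. This is only a sketch of the semantics described above, not ArangoDB's implementation: the function names are hypothetical, the merge shown covers top-level attributes only, and system attributes such as the revision are left out.

```javascript
// Partial update (PATCH / update): supplied attributes are merged into the
// existing document; attributes that are not mentioned stay untouched.
function applyUpdate(existing, patch) {
  return Object.assign({}, existing, patch);
}

// Full replacement (PUT / replace): the stored data becomes exactly what
// was supplied; all other attributes are dropped.
function applyReplace(existing, replacement) {
  return Object.assign({}, replacement);
}

const user = { name: "foo", age: 42, active: true };
console.log(applyUpdate(user, { active: false }));  // name and age survive
console.log(applyReplace(user, { active: false })); // only active remains
```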
AQL Improvements
The following functions have been added or extended in the ArangoDB Query Language (AQL) in ArangoDB 1.1:
- MERGE_RECURSIVE(): new function that merges documents recursively. In particular, it will merge sub-attributes, a functionality not provided by the previously existing MERGE() function.
- NOT_NULL(): now works with any number of arguments and returns the first non-null argument. If all arguments are null, the function will return null, too.
- FIRST_LIST(): new function that returns the first argument that is a list, and null if none of the arguments are lists.
- FIRST_DOCUMENT(): new function that returns the first argument that is a document, and null if none of the arguments are documents.
- TO_LIST(): converts the argument into a list.
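The behaviour described for these functions can be mimicked in plain JavaScript. The functions below are hypothetical stand-ins that follow the descriptions in the list above; they are not the AQL implementation and may differ from it in edge cases.

```javascript
// A "document" here is a plain object that is neither null nor a list.
function isDocument(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Returns the first argument accepted by the predicate, or null if none is.
function firstMatching(args, predicate) {
  for (const arg of args) {
    if (predicate(arg)) return arg;
  }
  return null;
}

const notNull = (...args) => firstMatching(args, (arg) => arg !== null);
const firstList = (...args) => firstMatching(args, (arg) => Array.isArray(arg));
const firstDocument = (...args) => firstMatching(args, isDocument);

// Shallow merge: a sub-document on the right replaces the one on the left.
function merge(left, right) {
  return Object.assign({}, left, right);
}

// Recursive merge: sub-documents present on both sides are merged as well.
function mergeRecursive(left, right) {
  const result = Object.assign({}, left);
  for (const key of Object.keys(right)) {
    result[key] = isDocument(result[key]) && isDocument(right[key])
      ? mergeRecursive(result[key], right[key])
      : right[key];
  }
  return result;
}

const a = { name: { first: "J" } };
const b = { name: { last: "D" } };
console.log(merge(a, b));          // name holds only the attribute from b
console.log(mergeRecursive(a, b)); // name holds the attributes from a and b
```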
Disk Synchronisation Improvements
Synchronisation of Shape Data
ArangoDB 1.1 provides an option --database.force-sync-shapes that controls whether shape data (information about document attribute names and attribute value types) should be synchronised to disk directly after each write, or if synchronisation is allowed to happen asynchronously. The latter option allows ArangoDB to return faster from operations that involve new document shapes.
In ArangoDB 1.0, shape information was always synchronised to disk, and users did not have any options. The default value of --database.force-sync-shapes in ArangoDB 1.1 is true so it is fully compatible with ArangoDB 1.0. However, in ArangoDB 1.1 the direct synchronisation can be turned off by setting the value to false. Direct synchronisation of shape data will then be disabled for collections that have a waitForSync value of false. Shape data will always be synchronised directly for collections that have a waitForSync value of true.
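The interplay of the two settings can be written down as a small truth function. This is a sketch of the rule stated above; the function name is hypothetical.

```javascript
// Returns true if shape data is synchronised to disk directly after a write.
// forceSyncShapes:       value of --database.force-sync-shapes (default: true)
// collectionWaitForSync: the waitForSync value of the collection written to
function shapesSyncedDirectly(forceSyncShapes, collectionWaitForSync) {
  return forceSyncShapes || collectionWaitForSync;
}

for (const force of [true, false]) {
  for (const waitForSync of [true, false]) {
    console.log(
      "force-sync-shapes=" + force + ", waitForSync=" + waitForSync +
      " -> direct sync: " + shapesSyncedDirectly(force, waitForSync)
    );
  }
}
```

Only the combination of --database.force-sync-shapes set to false and a collection with waitForSync set to false allows shape data to be synchronised asynchronously.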
Still, ArangoDB 1.1 may need to perform less synchronisation when it writes shape data (attribute names and attribute value types of collection documents).
Users may benefit if they save documents with many different structures (in terms of document attribute names and attribute value types) in the same collection. If only a small number of distinct document shapes is used, the effect will not be noticeable.
Finer Control of Disk Sync Behavior for CRUD operations
ArangoDB stores all document data in memory-mapped files. When adding new documents, updating existing documents or deleting documents, these modifications are appended at the end of the currently used memory-mapped datafile.
It is configurable whether ArangoDB should respond directly and synchronise the changes to disk asynchronously, or whether it should force the synchronisation before responding. The parameter that controls this is named waitForSync and can be set on a per-collection level.
Often, synchronisation is not required on collection level, but on operation level. ArangoDB 1.1 tries to improve on this by providing extra parameters for the REST and JavaScript document and edge modification operations.
These parameters can be used to force synchronisation for operations that work on collections that have waitForSync set to false.
The following REST API methods support the parameter waitForSync to force synchronisation:
- POST /_api/document: adding a document
- POST /_api/edge: adding an edge
- PATCH /_api/document: partially update a document
- PATCH /_api/edge: partially update an edge
- PUT /_api/document: replace a document
- PUT /_api/edge: replace an edge
- DELETE /_api/document: delete a document
- DELETE /_api/edge: delete an edge
If the waitForSync parameter is omitted or set to false, the collection-level synchronisation behavior will be applied. Setting the parameter to true will force synchronisation.
The following JavaScript methods support forcing synchronisation, too:
- save()
- update()
- replace()
- delete()
Force synchronisation of a save operation:
> db.users.save({"name":"foo"}, true);
If the second parameter is omitted or set to false, the collection-level synchronisation behavior will be applied. Setting the parameter to true will force synchronisation.
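The rule for combining the collection-level setting with the per-operation parameter can be sketched as follows. The function name is hypothetical; the logic follows the description above for both the REST parameter and the JavaScript methods.

```javascript
// Decides whether an operation waits for the disk sync before responding.
// collectionWaitForSync: the waitForSync value of the collection
// operationWaitForSync:  the per-operation parameter (true, false or omitted)
function effectiveWaitForSync(collectionWaitForSync, operationWaitForSync) {
  if (operationWaitForSync === true) {
    return true; // explicitly forced for this operation
  }
  // Omitted or false: fall back to the collection-level behavior.
  return collectionWaitForSync;
}

console.log(effectiveWaitForSync(false, true));      // forced by the operation
console.log(effectiveWaitForSync(false, undefined)); // collection setting applies
console.log(effectiveWaitForSync(true, false));      // collection setting applies
```

Note that, under this rule, the per-operation parameter can only strengthen the guarantee: passing false does not disable synchronisation for a collection that has waitForSync set to true.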
Server Statistics
ArangoDB 1.1 allows querying the server status via the administration front-end or via REST API methods.
The following methods are available:
- GET /_admin/connection-statistics: provides connection statistics
- GET /_admin/request-statistics: provides request statistics
Both methods return the current figures and historical values. The historical figures are aggregated. They can be used to monitor the current server status as well as to get an overview of how the figures developed over time and look for trends.
The ArangoDB web interface is using these APIs to provide charts with the server connection statistics figures. It has a new tab "Statistics" for this purpose.
For more information on the APIs, please refer to @S_EXTREF_S{http://www.arangodb.org/manuals/1.1/HttpSystem.html#HttpSystemConnectionStatistics,HttpSystemConnectionStatistics} and @S_EXTREF{http://www.arangodb.org/manuals/1.1/HttpSystem.html#HttpSystemRequestStatistics,HttpSystemRequestStatistics}.
Endpoints and SSL support
ArangoDB can now listen for incoming connections on one or more "endpoints" of different types. In ArangoDB lingo, an endpoint is the combination of a protocol and some configuration for it.
The currently supported protocol types are:
- tcp: for unencrypted connections over TCP/IP
- ssl: for secure connections using SSL over TCP/IP
- unix: for connections over Unix domain sockets
You should note that the data transferred inside the protocol is still HTTP, regardless of the chosen protocol. The endpoint protocol can thus be understood as the envelope that all HTTP communication is shipped inside.
To specify an endpoint, ArangoDB 1.1 introduces a new option --server.endpoint. The values accepted by this option have the following specification syntax:
tcp://host:port (HTTP over IPv4)
tcp://[host]:port (HTTP over IPv6)
ssl://host:port (HTTP over SSL-encrypted IPv4)
ssl://[host]:port (HTTP over SSL-encrypted IPv6)
unix:///path/to/socket (HTTP over Unix domain socket)
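To make the syntax concrete, the following sketch parses endpoint specifications of the forms listed above. It is a hypothetical helper for illustration, not the parser arangod uses; the fallback to port 8529 reflects the default port named for TCP endpoints.

```javascript
const DEFAULT_PORT = 8529; // default port when none is given

// Hypothetical parser for the endpoint forms listed above.
function parseEndpoint(spec) {
  const unix = /^unix:\/\/(\/.+)$/.exec(spec);
  if (unix) {
    return { protocol: "unix", path: unix[1] };
  }
  // tcp://host[:port] and ssl://host[:port]; IPv6 hosts are wrapped in [ ].
  const net = /^(tcp|ssl):\/\/(?:\[([^\]]+)\]|([^:\[\]]+))(?::(\d+))?$/.exec(spec);
  if (!net) {
    throw new Error("unsupported endpoint specification: " + spec);
  }
  return {
    protocol: net[1],
    host: net[2] !== undefined ? net[2] : net[3],
    ipv6: net[2] !== undefined,
    port: net[4] !== undefined ? Number(net[4]) : DEFAULT_PORT,
  };
}

console.log(parseEndpoint("tcp://192.168.173.13:8529"));
console.log(parseEndpoint("ssl://[::1]:8530"));
console.log(parseEndpoint("unix:///var/run/arango.sock"));
```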
TCP endpoints
The configuration options for the tcp endpoint type are hostname/IP address and an optional port number. If the port is omitted, the default port number of 8529 is used.
To make the server listen to connections coming in for IP 192.168.173.13 on TCP/IP port 8529:
> bin/arangod --server.endpoint tcp://192.168.173.13:8529
To make the server listen to connections coming in for IP 127.0.0.1 on TCP/IP port 999:
> bin/arangod --server.endpoint tcp://127.0.0.1:999
SSL endpoints
SSL endpoints can be used for secure, encrypted connections to ArangoDB. The connection is secured using SSL. SSL is computationally intensive so using it will result in an (unavoidable) performance degradation when compared to plain-text requests.
The configuration options for the ssl endpoint type are the same as for tcp endpoints.
To make the server listen to SSL connections coming in for IP 192.168.173.13 on TCP/IP port 8529:
> bin/arangod --server.endpoint ssl://192.168.173.13:8529
As multiple endpoints can be configured, ArangoDB can serve SSL and non-SSL requests in parallel, provided they use different ports:
> bin/arangod --server.endpoint tcp://192.168.173.13:8529 --server.endpoint ssl://192.168.173.13:8530
Unix domain socket endpoints
The unix endpoint type can only be used if clients are on the same host as the arangod server. Connections will then be established using a Unix domain socket, which is backed by a socket descriptor file. This type of connection should be slightly more efficient than TCP/IP.
The configuration option for a unix endpoint type is the socket descriptor filename.
To make the server use a Unix domain socket with filename /var/run/arango.sock:
> bin/arangod --server.endpoint unix:///var/run/arango.sock
Improved HTTP Request Handling
Error Handling
ArangoDB 1.1 better handles malformed HTTP requests than ArangoDB 1.0 did. When it encounters an invalid HTTP request, it might answer with some HTTP status codes that ArangoDB 1.0 did not use:
- HTTP 411 Length Required will be returned for requests that have a negative value in their Content-Length HTTP header.
- HTTP 413 Request Entity Too Large will be returned for too big requests. The maximum size is 512 MB at the moment.
- HTTP 431 Request Header Field Too Large will be returned for requests with too long HTTP headers. The maximum size per header field is 1 MB at the moment.
For requests that are not completely shipped from the client to the server, the server will allow the client 90 seconds of time before closing the dangling connection.
If the Content-Length HTTP header in an incoming request is set and contains a value that is less than the length of the HTTP body sent, the server will return a HTTP 400 Bad Request.
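The rules above can be summarised in a small classification function. This is a sketch under stated assumptions: the request shape (lower-cased header names plus a body length in bytes), the order of the checks, measuring a header field by the length of its value, and applying the 512 MB limit to the body are all choices made for illustration, not documented server behavior.

```javascript
const MAX_BODY_BYTES = 512 * 1024 * 1024;   // 512 MB request size limit
const MAX_HEADER_FIELD_BYTES = 1024 * 1024; // 1 MB per header field

// Hypothetical check mirroring the rules above. `request` is an illustrative
// shape: { headers: { "lower-cased-name": "value" }, bodyLength: number }.
// Returns the HTTP status code to answer with, or null if no rule applies.
function statusForMalformedRequest(request) {
  for (const value of Object.values(request.headers)) {
    if (String(value).length > MAX_HEADER_FIELD_BYTES) {
      return 431; // Request Header Field Too Large
    }
  }
  const rawLength = request.headers["content-length"];
  if (rawLength !== undefined) {
    const declared = Number(rawLength);
    if (declared < 0) {
      return 411; // Length Required
    }
    if (declared < request.bodyLength) {
      return 400; // Bad Request: body is longer than announced
    }
  }
  if (request.bodyLength > MAX_BODY_BYTES) {
    return 413; // Request Entity Too Large
  }
  return null;
}

console.log(statusForMalformedRequest({ headers: { "content-length": "-1" }, bodyLength: 0 }));
console.log(statusForMalformedRequest({ headers: { "content-length": "3" }, bodyLength: 10 }));
```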
Keep-Alive
In version 1.1, ArangoDB will behave as follows when it comes to HTTP keep-alive:
- if a client sends a Connection: close HTTP header, the server will close the connection as requested
- if a client sends a Connection: keep-alive HTTP header, the server will not close the connection but keep it alive as requested
- if a client does not send any Connection HTTP header, the server will assume keep-alive if the request was an HTTP/1.1 request, and close if the request was an HTTP/1.0 request
- dangling keep-alive connections will be closed automatically by the server after a configurable number of seconds. To adjust the value, use the new server option --server.keep-alive-timeout.
- Keep-alive can be turned off in ArangoDB by setting --server.keep-alive-timeout to a value of 0.
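The decision rules above can be condensed into one function. This is a sketch, not the server's code: the function name and parameter shapes are hypothetical, header values are compared case-insensitively, and a timeout of 0 is assumed to take precedence over a Connection: keep-alive header.

```javascript
// Decides whether the server keeps a connection open after a request.
// httpVersion:      "1.0" or "1.1"
// connectionHeader: value of the Connection header, or undefined if absent
// keepAliveTimeout: value of --server.keep-alive-timeout in seconds
function keepsConnectionAlive(httpVersion, connectionHeader, keepAliveTimeout) {
  if (keepAliveTimeout === 0) {
    return false; // keep-alive is turned off entirely (assumed precedence)
  }
  if (connectionHeader !== undefined) {
    const value = connectionHeader.toLowerCase();
    if (value === "close") return false;
    if (value === "keep-alive") return true;
  }
  // No usable Connection header: fall back to the protocol default.
  return httpVersion === "1.1";
}

console.log(keepsConnectionAlive("1.1", undefined, 300));    // HTTP/1.1 default
console.log(keepsConnectionAlive("1.0", undefined, 300));    // HTTP/1.0 default
console.log(keepsConnectionAlive("1.0", "keep-alive", 300)); // requested by client
```

The timeout value of 300 seconds in the usage lines is an arbitrary example value.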
Configurable Backlog
ArangoDB 1.1 adds an option --server.backlog-size to configure the system backlog size. The backlog size controls the maximum number of queued connections and is used by the listen() system call.
The default value in ArangoDB is 10, the maximum value is platform-dependent.
Using V8 Options
To use arbitrary options the V8 engine provides, ArangoDB 1.1 introduces a new startup option --javascript.v8-options. All options that shall be passed to V8 without being interpreted by ArangoDB can be put inside this option. ArangoDB itself will ignore these options and will let V8 handle them. It is up to V8 to handle these options and complain about wrong options. In case of invalid options, V8 may refuse to start, which will also abort the startup of ArangoDB.
To get a list of all options that the V8 engine in the currently used version of ArangoDB supports, you can use the value --help, which will just be passed through to V8:
> bin/arangod --javascript.v8-options "--help" /tmp/voctest
Other Improvements
Smaller Hash Indexes
Some internal structures have been adjusted in ArangoDB 1.1 so that hash index entries consume considerably less memory.
Installations may benefit if they use unique or non-unique hash indexes on collections.
arangoimp
arangoimp now allows specifying the end-of-line (EOL) character of the input file. This allows better support for files created on Windows systems with \r\n EOLs.
arangoimp also supports importing input files in TSV format. TSV is a simple separated-value format similar to CSV, but with the tab character as the separator, no quoting for values and thus no support for line breaks inside the values.
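The TSV rules and the configurable EOL can be illustrated with a few lines of JavaScript. This is a hypothetical reader written for this explanation, not arangoimp's implementation: it splits records on the given EOL sequence and fields on the tab character, and treats quote characters as ordinary data.

```javascript
// Hypothetical TSV reader following the description above: tab-separated
// fields, no quoting, one record per line. `eol` is the end-of-line
// sequence of the input, e.g. "\n", or "\r\n" for files created on Windows.
function parseTsv(text, eol) {
  return text
    .split(eol)
    .filter((line) => line.length > 0) // skip the empty tail after the last EOL
    .map((line) => line.split("\t"));
}

const windowsFile = "name\tage\r\nfoo\t42\r\nbar\t23\r\n";
console.log(parseTsv(windowsFile, "\r\n"));
```

Passing the wrong EOL sequence (for example "\n" for the input above) would leave a trailing \r in the last field of every record, which is the kind of problem the EOL setting avoids.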
libicu
ArangoDB uses ICU - International Components for Unicode (icu-project.org) for string sorting and string normalization.
ArangoDB 1.1 adds the option --default-language to select a locale for sorting and comparing strings. The default locale is set to be the system locale on that platform.
@BNAVIGATE_NewFeatures11