Features and Improvements
=========================
The following list shows in detail which features have been added or improved in
ArangoDB 3.2. ArangoDB 3.2 also contains several bugfixes that are not listed
here.
Storage engines
---------------
ArangoDB 3.2 offers two storage engines:
* the always-existing memory-mapped files storage engine
* a new storage engine based on [RocksDB](https://www.github.com/facebook/rocksdb/)
### Memory-mapped files storage engine (MMFiles)
The former storage engine (named MMFiles engine henceforth) persists data in memory-mapped
files.
Any data changes are done first in the engine's write-ahead log (WAL). The WAL
is replayed after a crash so the engine offers durability and crash-safety. Data
from the WAL is eventually moved to collection-specific datafiles. The files are
always written in an append-only fashion, so data in files is never overwritten.
Obsolete data in files will eventually be purged by background compaction threads.
Most of this engine's indexes are built in RAM. When a collection is loaded, this requires
rebuilding the indexes in RAM from the data stored on disk. The MMFiles engine has
collection-level locking.
This storage engine is a good choice when data (including the indexes) can fit in the
server's available RAM. If the size of data plus the in-memory indexes exceeds the size
of the available RAM, then this engine may try to allocate more memory than available.
This will either make the operating system swap out parts of the data (and cause disk I/O)
or, when no swap space is configured, invoke the operating system's out-of-memory process
killer.
The locking strategy allows parallel reads and is often good enough in read-mostly
workloads. Writes need exclusive locks on the collections, so they can block other
operations in the same collection. The locking strategy also provides transactional consistency
and isolation.
### RocksDB storage engine
The RocksDB storage engine is new in ArangoDB 3.2. It is designed to store datasets
that are bigger than the server's available RAM. It persists all data (including the
indexes) in a RocksDB instance.
That means any document read or write operations will be answered by RocksDB under the
hood. RocksDB will serve the data from its own in-RAM caches or from disk.
The RocksDB engine has a write-ahead log (WAL) and uses background threads for compaction.
It supports data compression.
The RocksDB storage engine has document-level locking. Read operations do not block and
are never blocked by other operations. Write operations only block writes on the same
documents/index values. Because multiple writers can operate in parallel on the same
collection, there is the possibility of write-write conflicts. If such a write conflict
is detected, one of the write operations is aborted with error 1200 ("conflict").
Client applications can then either abort the operation or retry, based on the required
consistency semantics.
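The retry option mentioned above can be sketched on the client side as follows. This is a minimal illustration, not a driver API: the `ArangoWriteError` exception class and the `write_with_retry` helper are hypothetical names, and only the error number 1200 comes from the text above.

```python
import random
import time

CONFLICT_ERROR_NUM = 1200  # write-write conflict, as named in the text above


class ArangoWriteError(Exception):
    """Hypothetical exception that carries the server's error number."""

    def __init__(self, error_num, message):
        super().__init__(message)
        self.error_num = error_num


def write_with_retry(write, max_attempts=5, base_delay=0.01):
    """Call write() and retry only if it fails with a write-write conflict."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write()
        except ArangoWriteError as err:
            if err.error_num != CONFLICT_ERROR_NUM or attempt == max_attempts:
                raise  # a different error, or retries exhausted: give up
            # Exponential backoff with jitter, so that competing writers
            # are less likely to collide again on the next attempt.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())
```

Whether retrying is appropriate depends on the operation: a write that was computed from a previously read document value should usually re-read the document before trying again.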
### Storage engine selection
The storage engine to use in an ArangoDB cluster or a single-server instance must be
selected initially. The default storage engine in ArangoDB 3.2 is the MMFiles engine if
no storage engine is selected explicitly. This ensures all users upgrading from earlier
versions can continue with the well-known MMFiles engine.
To select the storage engine, use the configuration option `--server.storage-engine`.
It can be set to `mmfiles`, `rocksdb` or `auto`. The first two values explicitly
select a storage engine, whereas `auto` chooses the storage engine that was
previously selected. If no engine was selected previously, `auto` will select the
MMFiles engine. Once an engine has been selected, the selection is written to a file
named `ENGINE` in the server's database directory and is read from there at every
subsequent server start.
Once the storage engine has been selected, the selection cannot be changed by adjusting
`--server.storage-engine`. To switch to another storage engine, the server must be
restarted with another (empty) database directory. To use data created with the other
storage engine, the data must first be dumped with the old engine and then restored
using the new storage engine. This can be achieved by invoking arangodump and
arangorestore.
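The selection rules described above can be modelled in a few lines. The sketch below is an illustration of the documented behavior, not arangod's implementation: the function name `resolve_engine` is hypothetical, the `ENGINE` file is assumed to contain just the engine name as plain text, and treating a mismatch as a hard error is an assumption of the sketch.

```python
import tempfile
from pathlib import Path

DEFAULT_ENGINE = "mmfiles"  # default in 3.2, according to the text above


def resolve_engine(option: str, database_dir: Path) -> str:
    """Model how `--server.storage-engine` and the ENGINE file interact."""
    if option not in ("auto", "mmfiles", "rocksdb"):
        raise ValueError(f"unknown storage engine option: {option}")

    engine_file = database_dir / "ENGINE"
    if engine_file.exists():
        # Assumption: the file holds the engine name as plain text.
        stored = engine_file.read_text().strip()
        if option not in ("auto", stored):
            # Assumption: a mismatch is an error, because the selection
            # cannot be changed once the directory has been initialized.
            raise RuntimeError(
                f"database directory was created with '{stored}'; "
                "dump and restore into an empty directory to switch"
            )
        return stored

    # First start with this directory: persist the selection.
    engine = DEFAULT_ENGINE if option == "auto" else option
    engine_file.write_text(engine)
    return engine


fresh_dir = Path(tempfile.mkdtemp())
print(resolve_engine("auto", fresh_dir))
```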
Unlike in MySQL, the storage engine selection in ArangoDB is for an entire cluster or
an entire single-server instance. All databases and collections will use the same storage
engine.
### RocksDB storage engine: known issues
The RocksDB storage engine in this release has a few known issues and missing features.
These will be resolved in the following releases:
* index selectivity estimates are missing. All indexes will report their selectivity
estimate as `0.2`. This may lead to non-optimal indexes being used in a query.
* the geo index is not yet implemented
* the number of documents reported for collections (`db.<collection>.count()`) may be
slightly wrong during transactions
* transactions are de facto limited in size, but no size restriction is currently
enforced. These restrictions will be implemented in a future release
* the engine is not yet performance-optimized or well configured
* the datafile debugger (arango-dfdb) cannot be used with this storage engine
### RocksDB storage engine: supported index types
The existing indexes in the RocksDB engine are all persistent. The following indexes are
supported there:
* primary: this type of index is automatically created. It indexes `_id` / `_key`
* edge: this index is automatically created for edge collections. It indexes
`_from` and `_to`
* hash, skiplist, persistent: these are user-defined indexes. Despite their names, they are
neither hash nor skiplist indexes. These index types map to the same RocksDB-based
sorted index implementation. The same is true for the "persistent" index. The names
"hash", "skiplist" and "persistent" are only used for compatibility with the MMFiles
engine, where these indexes existed in previous versions and still exist in the current
version of ArangoDB.
* fulltext: user-defined sorted inverted index on words occurring in documents
Memory management
-----------------
* added startup options `--vm.resident-limit` and `--vm.path` for file-backed
memory mapping after reaching a configurable maximum RAM size
This prevents ArangoDB from using all available RAM when using large datasets.
This will also lower the chances of the arangod process being killed by the
operating system's OOM killer.
Note: these options are not available in all builds and environments.
* make arangod start with fewer V8 JavaScript contexts
This speeds up the server start and makes arangod use less memory at start.
Whenever a V8 context is needed by a Foxx action or some other JavaScript operation
and there is no usable V8 context, a new context will be created dynamically now.
Up to `--javascript.v8-contexts` V8 contexts will be created, so this option
will change its meaning. Previously as many V8 contexts as specified by this
option were created at server start, and the number of V8 contexts did not
change at runtime. Now up to this number of V8 contexts will be in use at the
same time, but the actual number of V8 contexts is dynamic.
The garbage collector thread will automatically delete unused V8 contexts after
a while. The number of spare contexts will go down to as few as configured in
the new option `--javascript.v8-contexts-minimum`. That many V8 contexts
are also created at server start.
The first few requests in new V8 contexts may take longer than in contexts
that have been there already. Performance may therefore suffer a bit for the
initial requests sent to ArangoDB, or in the few but performance-critical
situations in which new V8 contexts need to be created. If this is a
concern, it can be addressed by setting `--javascript.v8-contexts-minimum`
and `--javascript.v8-contexts` to a relatively high value, which guarantees
that this many V8 contexts are created at startup and kept around even
when unused.
Waiting for an unused V8 context will now also abort and write a log message
in case no V8 context can be acquired/created after 60 seconds.
* the number of pending operations in arangod can now be limited to a configurable
number. If this number is exceeded, the server will now respond with HTTP 503
(service unavailable). The maximum size of pending operations is controlled via
the startup option `--server.maximal-queue-size`. Setting it to 0 means "no limit".
* the in-memory document revisions cache was removed entirely because it did not
provide the expected benefits. The 3.1 implementation shadowed document data in
RAM, which increased the server's RAM usage but did not noticeably speed up
document lookups.
This also obsoletes the startup options `--database.revision-cache-chunk-size` and
`--database.revision-cache-target-size`.
The MMFiles engine now does not use a document revisions cache but has in-memory
indexes and maps documents to RAM automatically via mmap when documents are
accessed. The RocksDB engine has its own mechanism for caching accessed documents.
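The dynamic V8 context handling described in this section can be pictured with a toy pool. This is only a model of the documented behavior, not arangod code: the `ContextPool` class and its method names are hypothetical, `minimum` and `maximum` stand in for `--javascript.v8-contexts-minimum` and `--javascript.v8-contexts`, and the sketch fails immediately where the server would wait for up to 60 seconds.

```python
class ContextPool:
    """Toy model of a dynamically sized pool of V8 contexts."""

    def __init__(self, minimum: int, maximum: int):
        if not 0 < minimum <= maximum:
            raise ValueError("need 0 < minimum <= maximum")
        self.minimum = minimum
        self.maximum = maximum
        self._next_id = 0
        # `minimum` contexts are created up front, at "server start".
        self.idle = [self._create() for _ in range(minimum)]
        self.busy = set()

    def _create(self) -> int:
        self._next_id += 1
        return self._next_id

    def total(self) -> int:
        return len(self.idle) + len(self.busy)

    def acquire(self) -> int:
        if self.idle:
            ctx = self.idle.pop()          # reuse an existing context
        elif self.total() < self.maximum:
            ctx = self._create()           # create a new one on demand
        else:
            # The server would wait here; the sketch simply gives up.
            raise RuntimeError("no V8 context available")
        self.busy.add(ctx)
        return ctx

    def release(self, ctx: int) -> None:
        self.busy.remove(ctx)
        self.idle.append(ctx)

    def collect_garbage(self) -> None:
        # Drop unused contexts until only `minimum` contexts remain.
        while self.idle and self.total() > self.minimum:
            self.idle.pop()
```

Setting `minimum` equal to `maximum` in this model corresponds to the previous behavior, in which the number of contexts was fixed at server start.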
Communication Layer
-------------------
* HTTP responses returned by arangod will now include the extra HTTP header
`x-content-type-options: nosniff` to work around a cross-site scripting bug
in MSIE
* the default value for `--ssl.protocol` was changed from TLSv1 to TLSv1.2.
When not explicitly set, arangod and all client tools will now use TLSv1.2.
* the JSON data in all incoming HTTP requests is now validated for duplicate
attribute names.
Incoming JSON data with duplicate attribute names will now be rejected as
invalid. Previous versions of ArangoDB only validated the uniqueness of
attribute names inside incoming JSON for some API endpoints, but not
consistently for all APIs.
* Internal JavaScript REST actions will now hide their stack traces from the client
in HTTP responses. Instead, they will always log them to the logfile.
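For clients that want to catch duplicate attribute names before sending a request, the check can be reproduced locally. The following is a minimal sketch using Python's standard `json` module and its `object_pairs_hook` parameter. The helper names `reject_duplicates` and `parse_strict` are made up for this example, and the sketch says nothing about how arangod performs the validation internally.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises if an object repeats an attribute name."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate attribute name: {key!r}")
        result[key] = value
    return result


def parse_strict(body: str):
    """Parse JSON text, rejecting objects with duplicate attribute names."""
    return json.loads(body, object_pairs_hook=reject_duplicates)


# The same name in different (nested) objects is fine.
print(parse_strict('{"a": 1, "b": {"a": 2}}'))

try:
    parse_strict('{"a": 1, "a": 2}')
except ValueError as err:
    print("rejected:", err)
```

Without such a hook, `json.loads` silently keeps the last value of a repeated name, which is the kind of ambiguity the server-side validation removes.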
JavaScript
----------
* updated V8 version to 5.7.0.0
* change undocumented behaviour in case of invalid revision ids in
`If-Match` and `If-None-Match` headers from 400 (BAD) to 412 (PRECONDITION
FAILED).
* change default string truncation length from 80 characters to 256 characters for
`print`/`printShell` functions in ArangoShell and arangod. This will emit longer
prefixes of string values before truncating them with `...`, which is helpful
for debugging. This change is mostly useful when using the ArangoShell (arangosh).
Pregel
------
AQL
---
### Optimizer improvements
* Geo indexes are now implicitly and automatically used when using appropriate SORT/FILTER
statements in AQL, without the need to use the somewhat limited special-purpose geo AQL
functions `NEAR` or `WITHIN`.
Compared to using the special-purpose AQL functions, this approach has the
advantage that it is more composable, and will also honor any `LIMIT` values
used in the AQL query.
The special purpose `NEAR` AQL function can now be substituted with the
following AQL (provided there is a geo index present on the `doc.latitude`
and `doc.longitude` attributes):

    FOR doc IN geoSort
      SORT DISTANCE(doc.latitude, doc.longitude, 0, 0)
      LIMIT 5
      RETURN doc

`WITHIN` can be substituted with the following AQL:

    FOR doc IN geoFilter
      FILTER DISTANCE(doc.latitude, doc.longitude, 0, 0) < 2000
      RETURN doc

Note that this will work in the MMFiles engine only.
### Miscellaneous improvements
* the slow query list now contains the values of bind variables used in the
slow queries. Bind variables are also provided for the currently running
queries. This helps debugging slow or blocking queries that use dynamic
collection names via bind parameters.
* AQL breaking change in cluster:
The SHORTEST_PATH statement using edge collection names instead
of a graph name now requires the vertex collection names to be stated
explicitly within the AQL query in the cluster. This can be done by adding
`WITH <name>` at the beginning of the query.
Example:
```
FOR v,e IN OUTBOUND SHORTEST_PATH @start TO @target edges [...]
```
Now has to be:
```
WITH vertices
FOR v,e IN OUTBOUND SHORTEST_PATH @start TO @target edges [...]
```
This change is necessary to avoid deadlock situations in the cluster case.
An error message stating the above is included.
Client tools
------------
* added data export tool, arangoexport.
arangoexport can be used to export collections to json, jsonl or xml
and export a graph or collections to xgmml.
* added "jsonl" as input file type for arangoimp
* added `--translate` option for arangoimp to translate attribute names from
the input files to attribute names expected by ArangoDB
The `--translate` option can be specified multiple times (once per translation
to be executed). The following example renames the "id" column from the input
file to "_key", the "from" column to "_from", and the "to" column to "_to":

    arangoimp --type csv --file data.csv --translate "id=_key" --translate "from=_from" --translate "to=_to"

`--translate` works for CSV and TSV inputs only.
* changed default value for client tools option `--server.max-packet-size` from 128 MB
to 256 MB. This allows transferring bigger result sets from the server without the
client tools rejecting them as invalid.
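The effect of the `--translate` option on a CSV input can be illustrated with a small stand-alone sketch. It mimics only the attribute renaming described above: the function names `parse_translations` and `translate_csv` are hypothetical, all values stay strings here, and nothing in it reflects how arangoimp is implemented.

```python
import csv
import io


def parse_translations(specs):
    """Turn ["id=_key", ...] into {"id": "_key", ...}."""
    mapping = {}
    for spec in specs:
        source, sep, target = spec.partition("=")
        if not sep or not source or not target:
            raise ValueError(f"expected 'from=to', got {spec!r}")
        mapping[source] = target
    return mapping


def translate_csv(text, specs):
    """Read CSV text and return one dict per row with renamed attributes."""
    mapping = parse_translations(specs)
    reader = csv.DictReader(io.StringIO(text))
    return [
        {mapping.get(name, name): value for name, value in row.items()}
        for row in reader
    ]


data = "id,from,to,label\n1,v/a,v/b,knows\n"
for doc in translate_csv(data, ["id=_key", "from=_from", "to=_to"]):
    print(doc)
```

Columns without a matching translation, such as `label` in this example, keep their original names.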
Authentication
--------------
* added [LDAP](../Administration/Configuration/Ldap.md) authentication (Enterprise only)
Foxx
----
* the [cookie session transport](../Foxx/Sessions/Transports/Cookie.md) now supports all options supported by the [cookie method of the response object](../Foxx/Router/Response.md#cookie).
* it's now possible to provide your own version of the `graphql-sync` module when using the [GraphQL extensions for Foxx](../Foxx/GraphQL.md) by passing a copy of the module using the new _graphql_ option.
* custom API endpoints can now be tagged using the [tag method](../Foxx/Router/Endpoints.md#tag) to generate cleaner Swagger documentation.