Features and Improvements in ArangoDB 3.5
=========================================
The following list shows in detail which features have been added or improved in
ArangoDB 3.5. ArangoDB 3.5 also contains several bug fixes that are not listed
here.

AQL
---
### SORT-LIMIT optimization
A new SORT-LIMIT optimization has been added. This optimization will be applied
by the query optimizer if there is a SORT statement followed by a LIMIT statement, and the
overall number of documents to return is relatively small in relation to the total
number of documents to be sorted. In this case, the optimizer will use a size-constrained
heap to keep only the required number of results in memory, which can drastically
reduce memory usage and, for some queries, also the execution time of the sort.
If the optimization is applied, it will show up as the "sort-limit" rule in the query
execution plan.
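For example, a query of the following shape is a candidate for the optimization. A minimal
arangosh sketch, assuming an example collection `observations` with a numeric `value`
attribute:

```js
// check whether the "sort-limit" rule was applied to a query
// ("observations" and "value" are example names):
var query = "FOR doc IN observations SORT doc.value DESC LIMIT 10 RETURN doc";
var plan = db._createStatement({ query: query }).explain().plan;
print(plan.rules.indexOf("sort-limit") !== -1); // true if the optimization kicked in
```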
### Index hints in AQL
Users may now take advantage of the `indexHint` inline query option to override
the internal optimizer decision regarding which index to use to serve content
from a given collection. The index hint works with the named indices feature
described below, making it easy to specify which index to use.
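A sketch of the syntax in arangosh (collection, attribute, and index names are examples):

```js
// ask the optimizer to prefer the index named "byValue" for this FOR loop:
db._query(`
  FOR doc IN observations
    OPTIONS { indexHint: "byValue" }
    FILTER doc.value == 42
    RETURN doc
`);
```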
### Sorted primary index (RocksDB engine)
The query optimizer can now make use of the sortedness of primary indexes if the
RocksDB engine is used. This means the primary index can be utilized for queries
that sort by either the `_key` or `_id` attributes of a collection and also for
range queries on these attributes.
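For example, with the RocksDB engine, queries of the following forms can now be served
from the sorted primary index (collection name is an example):

```js
// sorting by _key can use the sorted primary index:
db._query("FOR doc IN products SORT doc._key RETURN doc");
// ...and so can range queries on _key:
db._query('FOR doc IN products FILTER doc._key >= "a" && doc._key < "k" RETURN doc');
```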
In the list of documents for a collection in the web interface, the documents will
now always be sorted in lexicographical order of their `_key` values. A previous
exception for keys representing quasi-numerical values has been removed from the
sorting in the web interface. Removing this exception can also speed up the display
of the list of documents.
This change potentially affects the order in which documents are displayed in the
list of documents overview in the web interface. A document with a key value of "10" will
now be displayed before a document with a key value of "9". In previous versions of
ArangoDB it was exactly the opposite.
### Edge index query optimization (RocksDB engine)
An AQL query that uses the edge index only and returns the opposite side of
the edge can now be executed in a more optimized way, e.g.
```
FOR edge IN edgeCollection FILTER edge._from == "v/1" RETURN edge._to
```
is fully covered by the RocksDB edge index.
For the MMFiles engine this optimization does not apply.
### AQL syntax improvements
AQL now allows the usage of floating point values without a leading zero, e.g.
`.1234`. Previous versions of ArangoDB required a leading zero in front of
the decimal separator, i.e. `0.1234`.

Smart Joins
-----------
The "smart joins" feature available in the ArangoDB Enterprise Edition allows running
joins between two sharded collections with performance close to that of a local join
operation.
The prerequisite for this is that the two collections have an identical sharding setup,
established via the `distributeShardsLike` attribute of one of the collections.
Quick example setup for two collections with identical sharding:

```js
> db._create("products", { numberOfShards: 3, shardKeys: ["_key"] });
> db._create("orders", { distributeShardsLike: "products", shardKeys: ["productId"] });
> db.orders.ensureIndex({ type: "hash", fields: ["productId"] });
```
Now an AQL query that joins the two collections via their shard keys will benefit from
the smart join optimization, e.g.

```
FOR p IN products
  FOR o IN orders
    FILTER p._key == o.productId
    RETURN o
```
In this query's execution plan, the extra hop via the coordinator that is normally
required for generic joins can be saved. Thanks to the smart join optimization,
the query's execution is as simple as:

```
Execution plan:
 Id   NodeType                  Site  Est.   Comment
  1   SingletonNode             DBS      1   * ROOT
  3   EnumerateCollectionNode   DBS      9     - FOR o IN orders   /* full collection scan, 3 shard(s) */
  7   IndexNode                 DBS      0     - FOR p IN products   /* primary index scan, scan only, 3 shard(s) */
 10   RemoteNode                COOR     0     - REMOTE
 11   GatherNode                COOR     0     - GATHER
  6   ReturnNode                COOR     0     - RETURN o
```
Without the smart join optimization, there will be an extra hop via the
coordinator for shipping the data from each shard of the one collection to
each shard of the other collection, which will be a lot more expensive:
```
Execution plan:
 Id   NodeType        Site  Est.   Comment
  1   SingletonNode   DBS      1   * ROOT
 16   IndexNode       DBS      3     - FOR p IN products   /* primary index scan, index only, projections: `_key`, 3 shard(s) */
 14   RemoteNode      COOR     3     - REMOTE
 15   GatherNode      COOR     3     - GATHER
  8   ScatterNode     COOR     3     - SCATTER
  9   RemoteNode      DBS      3     - REMOTE
  7   IndexNode       DBS      3     - FOR o IN orders   /* hash index scan, 3 shard(s) */
 10   RemoteNode      COOR     3     - REMOTE
 11   GatherNode      COOR     3     - GATHER
  6   ReturnNode      COOR     3     - RETURN o
```
In the end, smart joins can optimize away a lot of the inter-node network
requests normally required for performing a join between sharded collections.
The performance advantage of smart joins compared to regular joins will grow
with the number of shards of the underlying collections.
In general, for two collections with `n` shards each, the minimal number of
network requests for the general join (without the smart joins optimization) will be
`n * (n + 2)`, so the number of network requests increases quadratically with the
number of shards. For example, with 3 shards per collection this amounts to at least
`3 * (3 + 2) = 15` requests.
Smart joins can get away with a minimal number of `n` requests here (3 in the example),
which scales linearly with the number of shards.
Smart joins will also be especially advantageous for queries that have to ship a lot
of data around for performing the join, but that will filter out most of the data
after the join. In this case smart joins should greatly outperform the general join,
as they will eliminate most of the inter-node data shipping overhead.
Also see the [Smart Joins](../SmartJoins.md) page.

Background Index Creation
-------------------------
Creating new indexes is by default done under an exclusive collection lock. This means
that the collection (or the respective shards) is not available for write operations
as long as the index is being created. This "foreground" index creation can be undesirable
if you have to perform it on a live system without a dedicated maintenance window.
Starting with ArangoDB 3.5, indexes can also be created in the background, without using an
exclusive lock during the entire index creation. The collection remains basically available,
so that other CRUD operations can run on the collection while the index is being created.
This can be achieved by setting the *inBackground* attribute when creating an index.
To create an index in the background in *arangosh* just specify `inBackground: true`,
like in the following example:
```js
db.collection.ensureIndex({ type: "hash", fields: [ "value" ], inBackground: true });
```
Indexes that are still in the build process will not be visible via the ArangoDB APIs.
Nevertheless, it is not possible to create the same index twice via the *ensureIndex* API
while an index is still being created. AQL queries also will not use these indexes until
the index reports back as fully created. Note that the initial *ensureIndex* call or HTTP
request will still block until the index is completely ready. Existing single-threaded
client programs can thus safely set the *inBackground* option to *true* and continue to
work as before.
While an index is being built in the background, the collection cannot be renamed or
dropped. These operations will block until the index creation is finished. This is equally
the case with foreground index creation.
After an interrupted index build (e.g. due to a server crash), the partially built index
will be removed. In an ArangoDB cluster, the index might then be automatically recreated
on the affected shards.
Background index creation might be slower than foreground index creation and require
more RAM. Under a write-heavy load (specifically many remove, update, or replace operations),
the background index creation needs to keep a list of removed documents in RAM. This might
become unsustainable if this list grows to tens of millions of entries.
Building an index is always a write-heavy operation, so it is always a good idea to build
indexes during times with less load.
Please note that background index creation is useful only in combination with the RocksDB
storage engine. With the MMFiles storage engine, creating an index will always block any
other operations on the collection.

TTL (time-to-live) Indexes
--------------------------
The new TTL indexes feature provided by ArangoDB can be used for automatically
removing expired documents from a collection.
TTL indexes support the eventual removal of documents which are past a configured
expiration timepoint. The expiration timepoints can be based upon the documents'
original insertion or last-updated timepoints, plus an added period during
which to retain the documents.
Alternatively, expiration timepoints can be specified as absolute values per
document.
It is also possible to exclude documents from automatic expiration and removal.
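A minimal arangosh sketch of these variants, assuming an example collection `sessions`
with a `createdAt` attribute:

```js
// remove documents 600 seconds after their stored creation timestamp:
db._create("sessions");
db.sessions.ensureIndex({ type: "ttl", fields: ["createdAt"], expireAfter: 600 });

// expiration relative to a stored timepoint (numeric Unix timestamp in seconds):
db.sessions.insert({ createdAt: Date.now() / 1000 });

// absolute per-document expiry can be modeled by setting expireAfter to 0
// and storing the desired expiration timepoint in the indexed attribute.

// documents without the indexed attribute are excluded from automatic removal:
db.sessions.insert({ note: "this document is never auto-expired" });
```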
Please also note that TTL indexes are designed exactly for the purpose of removing
expired documents from collections. It is *not recommended* to rely on TTL indexes
for user-land AQL queries. This is because TTL indexes may internally store a transformed,
always numerical version of the index attribute value, even if it was originally passed in
as a date string. As a result, TTL indexes will likely not be used for filtering and sort
operations in user-land AQL queries.
Also see the [TTL Indexes](../Indexing/Ttl.md) page.

HTTP API extensions
-------------------
### Extended index API
The HTTP API for creating indexes at POST `/_api/index` has been extended two-fold:
* to create a TTL (time-to-live) index, it is now possible to specify a value of `ttl`
  in the `type` attribute. When creating a TTL index, the attribute `expireAfter` is
  also required. That attribute contains the expiration time (in seconds), which is
  based on the documents' index attribute value.
* to create an index in the background, the attribute `inBackground` can be set to
  `true` (both extensions are combined in the sketch below).
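For example, a sketch in arangosh, using its built-in HTTP helper `arango.POST`
(collection and attribute names are examples):

```js
// create a TTL index in the background via the extended HTTP API:
arango.POST("/_api/index?collection=sessions", JSON.stringify({
  type: "ttl",
  fields: ["createdAt"],
  expireAfter: 600,
  inBackground: true
}));
```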
### API for querying the responsible shard
The HTTP API for collections has gained an additional route for retrieving the responsible
shard for a document at PUT `/_api/collection/<name>/responsibleShard`.
When calling this route, the request body is supposed to contain the document for which
the responsible shard should be determined. The response will contain an attribute `shardId`
containing the ID of the shard that is responsible for that document.
A method `collection.getResponsibleShard(document)` was added to the JS API as well.
It does not matter if the document actually exists or not, as the shard responsibility
is determined from the document's attribute values only.
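A sketch of both variants in arangosh when connected to a coordinator (collection name,
document key, and the returned shard ID are examples):

```js
// via the HTTP API (the shardId in the response is cluster-specific):
arango.PUT("/_api/collection/products/responsibleShard",
           JSON.stringify({ _key: "test123" }));
// => { ..., "shardId": "s2000001" }

// via the JS API:
db.products.getResponsibleShard({ _key: "test123" });
```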
Please note that this API is only meaningful and available on a cluster coordinator.
### Foxx API for running tests
The HTTP API for running Foxx service tests now supports a `filter` attribute,
which can be used to limit which test cases should be executed.

Web interface
-------------
When using the RocksDB engine, the selection of index types "persistent" and "skiplist"
has been removed from the web interface when creating new indexes.
The index types "hash", "skiplist" and "persistent" are just aliases of each other
when using the RocksDB engine, so there is no need to offer them all.

JavaScript
----------
### V8 updated
The bundled version of the V8 JavaScript engine has been upgraded from 5.7.492.77 to
7.1.302.28.
Among other things, the new version of V8 provides a native JavaScript `BigInt` type which
can be used to store arbitrary-precision integers. However, to store such `BigInt` objects
in ArangoDB, they need to be explicitly converted to either strings or simple JavaScript
numbers.
Converting BigInts to strings for storage is preferred because converting a BigInt to a
simple number may lead to precision loss.
```js
// will fail with "bad parameter" error:
value = BigInt("123456789012345678901234567890");
db.collection.insert({ value });
// will succeed:
db.collection.insert({ value: String(value) });
// will succeed, but lead to precision loss:
db.collection.insert({ value: Number(value) });
```
The new V8 version also changes the default timezone of date strings to be conditional
on whether a time part is included:
```js
> new Date("2019-04-01");
Mon Apr 01 2019 02:00:00 GMT+0200 (Central European Summer Time)
> new Date("2019-04-01T00:00:00");
Mon Apr 01 2019 00:00:00 GMT+0200 (Central European Summer Time)
```
If the timezone is explicitly set in the date string, then the specified timezone will
always be honored:
```js
> new Date("2019-04-01Z");
Mon Apr 01 2019 02:00:00 GMT+0200 (Central European Summer Time)
> new Date("2019-04-01T00:00:00Z");
Mon Apr 01 2019 02:00:00 GMT+0200 (Central European Summer Time)
```
### JavaScript security options
ArangoDB 3.5 provides several new options for restricting the functionality of
JavaScript application code running in the server, with the intent to make a setup
more secure.
There now exist startup options for restricting which environment variables and
values of which configuration options JavaScript code is allowed to read. These
options can be set to prevent leaking of confidential information from the
environment or the setup into the JavaScript application code.
Additionally there are options to restrict outbound HTTP connections from JavaScript
applications to certain endpoints and to restrict filesystem access from JavaScript
applications to certain directories only.
Finally there are startup options to turn off the REST APIs for managing Foxx
services, which can be used to prevent installation and uninstallation of Foxx
applications on a server. A separate option is provided to turn off access and
connections to the central Foxx app store via the web interface.
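As an illustrative sketch only (the option names and values here are assumptions;
consult the overview linked below for the authoritative list), a locked-down server
might be started along these lines:

```
# illustrative sketch; verify option names against the Security Options page
arangod \
  --javascript.environment-variables-whitelist "^ARANGODB_" \
  --javascript.endpoints-whitelist "api.example.com:443" \
  --javascript.files-whitelist "^/tmp/" \
  --foxx.api false \
  --foxx.store false
```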
A complete overview of the security options can be found in [Security Options](../Security/SecurityOptions.md).
### Foxx
Request credentials are now exposed via the `auth` property:
```js
const tokens = context.collection("tokens");
router.get("/authorized", (req, res) => {
if (!req.auth || !req.auth.bearer || !tokens.exists(req.auth.bearer)) {
res.throw(403, "Not authenticated");
}
// ...
});
```
### API improvements
Collections now provide the `documentId` method to derive document ids from keys.
Before:
```js
const collection = context.collection("users");
const documentKey = "my-document-key";
const documentId = `${collection.name()}/${documentKey}`;
```
After:
```js
const collection = context.collection("users");
const documentKey = "my-document-key";
const documentId = collection.documentId(documentKey);
```

Client tools
------------
### Dump and restore all databases
**arangodump** got an option `--all-databases` to make it dump all available databases
instead of just a single database specified via the option `--server.database`.
When set to true, this makes arangodump dump all available databases the current
user has access to. The option `--all-databases` cannot be used in combination with
the option `--server.database`.
When `--all-databases` is used, arangodump will create a subdirectory with the data
of each dumped database. Databases will be dumped one after the other. However,
inside each database, the collections of the database can be dumped in parallel
using multiple threads.
When dumping all databases, the consistency guarantees of arangodump are the same
as when dumping multiple single databases individually, so the dump does not provide
cross-database consistency of the data.
**arangorestore** got an option `--all-databases` to make it restore all databases from
inside the subdirectories of the specified dump directory, instead of just the
single database specified via the option `--server.database`.
Using the option for arangorestore only makes sense for dumps created with arangodump
and the `--all-databases` option. As with arangodump, arangorestore cannot be invoked
with both the `--all-databases` and `--server.database` options at the same time.
Additionally, the option `--force-same-database` cannot be used together with
`--all-databases`.
If the to-be-restored databases do not exist on the target server, then restoring data
into them will fail unless the option `--create-database` is also specified for
arangorestore. Please note that in this case a database user must be used that has
access to the `_system` database, in order to create the databases on restore.
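A minimal sketch of a full dump-and-restore cycle (the directory name is an example):

```
arangodump    --all-databases true --output-directory "all-dump"
arangorestore --all-databases true --create-database true --input-directory "all-dump"
```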
### Warning if connected to DBServer
Under normal circumstances there should be no need to connect to a
database server in a cluster with one of the client tools, and any
user operations carried out there may cause trouble.
The client tools arangosh, arangodump, and arangorestore will therefore now emit
a warning when they are used to connect to a database server node in a cluster.

Startup option changes
----------------------
The value type of the hidden startup option `--rocksdb.recycle-log-file-num` has
been changed from numeric to boolean in ArangoDB 3.5, as the option is also a
boolean option in the underlying RocksDB library.
Client configurations that use this startup option should be adjusted so that the
option is set to a boolean value instead of a numeric value.
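For example, an existing configuration file entry would change as follows (the option
is hidden and normally does not need to be set at all):

```
[rocksdb]
# ArangoDB 3.4 (numeric):
# recycle-log-file-num = 4
# ArangoDB 3.5 (boolean):
recycle-log-file-num = true
```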

Miscellaneous
-------------
### Improved overview of available program options
The `--help-all` command-line option for all ArangoDB executables will now also
show all hidden program options.
Previously, hidden program options were only returned when invoking arangod or
a client tool with the cryptic `--help-.` option. Now `--help-all` simply returns
them as well.
### Fewer system collections
The system collections `_frontend`, `_modules` and `_routing` are not created
anymore for new databases by default.
`_modules` and `_routing` are only needed for legacy functionality.
Existing `_routing` collections will not be touched as they may contain user-defined
entries, and will continue to work.
Existing `_modules` collections will also remain functional.
The `_frontend` collection may still be required for actions triggered by the
web interface, but it will automatically be created lazily if needed.
### Named indices
Indices now have an additional `name` field, which allows for more useful
identifiers. System indices, like the primary and edge indices, have default
names (`primary` and `edge`, respectively). If no `name` value is specified
on index creation, one will be auto-generated (e.g. `idx_13820395`). The index
name _cannot_ be changed after index creation. No two indices on the same
collection may share the same name, but two indices on different collections
may.
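A short arangosh sketch (collection, attribute, and index names are examples):

```js
// create an index with an explicit, user-chosen name:
db.collection.ensureIndex({ type: "hash", fields: ["value"], name: "byValue" });
// without a name, an identifier like "idx_13820395" is auto-generated:
db.collection.ensureIndex({ type: "skiplist", fields: ["other"] });
```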
### ID values in log messages
By default, ArangoDB and its client tools now show a 5-digit unique ID value in
any of their log messages, e.g.

```
2019-03-25T21:23:19Z [8144] INFO [cf3f4] ArangoDB (version 3.5.0 enterprise [linux]) is ready for business. Have fun!
```
In this message, the `cf3f4` is the message's unique ID value. ArangoDB users can
use this ID to build custom monitoring or alerting based on specific log ID values.
Existing log ID values are supposed to stay constant in future releases of arangod.
Additionally the unique log ID values can be used by the ArangoDB support to find
out which component of the product exactly generated a log message. The IDs also
make disambiguation of identical log messages easier.
The presence of these ID values in log messages may confuse custom log message filtering
or routing mechanisms that parse log messages and that rely on the old log message
format.
This can be fixed by adjusting any existing log message parsers and making them aware
of the ID values. The ID values are always 5-byte strings, consisting of the characters
`[0-9a-f]`. ID values are placed directly after the log level (e.g. `INFO`).
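As a sketch, a parser could extract the ID with a pattern along these lines (the set of
log level names is abbreviated here):

```js
// extract the 5-character log ID that follows the log level:
const line = '2019-03-25T21:23:19Z [8144] INFO [cf3f4] ArangoDB (version 3.5.0 enterprise [linux]) is ready for business. Have fun!';
const match = line.match(/ (?:TRACE|DEBUG|INFO|WARNING|ERROR|FATAL) \[([0-9a-f]{5})\]/);
// match[1] === "cf3f4"
```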
Alternatively, the log IDs can be suppressed in all log messages by setting the startup
option `--log.ids false` when starting arangod or any of the client tools.

Internal
--------
We have moved from C++11 to C++14, which allows us to use some of the simplifications,
features, and guarantees that this newer standard provides.
To compile ArangoDB from source, a compiler that supports C++14 is now required.
The bundled JEMalloc memory allocator used in ArangoDB release packages has been
upgraded from version 5.0.1 to version 5.2.0.
The bundled version of the RocksDB library has been upgraded from 5.16 to 6.0.