mirror of https://gitee.com/bigwinds/arangodb
1142 lines
38 KiB
Plaintext
1142 lines
38 KiB
Plaintext
!CHAPTER High-level operations
|
|
|
|
!SUBSECTION FOR
|
|
|
|
The *FOR* keyword can be to iterate over all elements of an array.
|
|
The general syntax is:
|
|
|
|
```
|
|
FOR variable-name IN expression
|
|
```
|
|
|
|
Each array element returned by *expression* is visited exactly once. It is
|
|
required that *expression* returns an array in all cases. The empty array is
|
|
allowed, too. The current array element is made available for further processing
|
|
in the variable specified by *variable-name*.
|
|
|
|
```
|
|
FOR u IN users
|
|
RETURN u
|
|
```
|
|
|
|
This will iterate over all elements from the array *users* (note: this array
|
|
consists of all documents from the collection named "users" in this case) and
|
|
make the current array element available in variable *u*. *u* is not modified in
|
|
this example but simply pushed into the result using the *RETURN* keyword.
|
|
|
|
Note: When iterating over collection-based arrays as shown here, the order of
|
|
documents is undefined unless an explicit sort order is defined using a *SORT*
|
|
statement.
|
|
|
|
The variable introduced by *FOR* is available until the scope the *FOR* is
|
|
placed in is closed.
|
|
|
|
Another example that uses a statically declared array of values to iterate over:
|
|
|
|
```
|
|
FOR year IN [ 2011, 2012, 2013 ]
|
|
RETURN { "year" : year, "isLeapYear" : year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) }
|
|
```
|
|
|
|
Nesting of multiple *FOR* statements is allowed, too. When *FOR* statements are
|
|
nested, a cross product of the array elements returned by the individual *FOR*
|
|
statements will be created.
|
|
|
|
```
|
|
FOR u IN users
|
|
FOR l IN locations
|
|
RETURN { "user" : u, "location" : l }
|
|
```
|
|
|
|
In this example, there are two array iterations: an outer iteration over the array
|
|
*users* plus an inner iteration over the array *locations*. The inner array is
|
|
traversed as many times as there are elements in the outer array. For each
|
|
iteration, the current values of *users* and *locations* are made available for
|
|
further processing in the variable *u* and *l*.
|
|
|
|
!SUBSECTION RETURN
|
|
|
|
The *RETURN* statement can be used to produce the result of a query.
|
|
It is mandatory to specify a *RETURN* statement at the end of each block in a
|
|
data-selection query, otherwise the query result would be undefined. Using
|
|
*RETURN* on the main level in data-modification queries is optional.
|
|
|
|
The general syntax for *RETURN* is:
|
|
|
|
```
|
|
RETURN expression
|
|
```
|
|
|
|
The *expression* returned by *RETURN* is produced for each iteration in the block the
|
|
*RETURN* statement is placed in. That means the result of a *RETURN* statement
|
|
is always an array (this includes the empty array). To return all elements from
|
|
the currently iterated array without modification, the following simple form can
|
|
be used:
|
|
|
|
```
|
|
FOR variable-name IN expression
|
|
RETURN variable-name
|
|
```
|
|
|
|
As *RETURN* allows specifying an expression, arbitrary computations can be
|
|
performed to calculate the result elements. Any of the variables valid in the
|
|
scope the *RETURN* is placed in can be used for the computations.
|
|
|
|
Note: *RETURN* will close the current scope and eliminate all local variables in
|
|
it.
|
|
|
|
!SUBSUBSECTION RETURN DISTINCT
|
|
|
|
Since ArangoDB 2.7, *RETURN* can optionally be followed by the *DISTINCT* keyword.
|
|
The *DISTINCT* keyword will ensure uniqueness of the values returned by the
|
|
*RETURN* statement:
|
|
|
|
```
|
|
FOR variable-name IN expression
|
|
RETURN DISTINCT expression
|
|
```
|
|
|
|
If the *DISTINCT* is applied on an expression that itself is an array or a subquery,
|
|
the *DISTINCT* will not make the values in each array or subquery result unique, but instead
|
|
ensure that the result contains only distinct arrays or subquery results. To make
|
|
the result of an array or a subquery unique, simply apply the *DISTINCT* for the
|
|
array or the subquery.
|
|
|
|
For example, the following query will apply *DISTINCT* on its subquery results,
|
|
but not inside the subquery:
|
|
|
|
```
|
|
FOR what IN 1..2
|
|
RETURN DISTINCT (
|
|
FOR i IN [ 1, 2, 3, 4, 1, 3 ]
|
|
RETURN i
|
|
)
|
|
```
|
|
|
|
Here we'll have a *FOR* loop with two iterations that each execute a subquery. The
|
|
*DISTINCT* here is applied on the two subquery results. Both subqueries return the
|
|
same result value (that is [ 1, 2, 3, 4, 1, 3 ]), so after *DISTINCT* there will
|
|
only be one occurrence of the value [ 1, 2, 3, 4, 1, 3 ] left:
|
|
|
|
```
|
|
[
|
|
[ 1, 2, 3, 4, 1, 3 ]
|
|
]
|
|
```
|
|
|
|
If the goal is to apply the *DISTINCT* inside the subquery, it needs to be moved
|
|
there:
|
|
|
|
```
|
|
FOR what IN 1..2
|
|
LET sub = (
|
|
FOR i IN [ 1, 2, 3, 4, 1, 3 ]
|
|
RETURN DISTINCT i
|
|
)
|
|
RETURN sub
|
|
```
|
|
|
|
In the above case, the *DISTINCT* will make the subquery results unique, so that
|
|
each subquery will return a unique array of values ([ 1, 2, 3, 4 ]). As the subquery
|
|
is executed twice and there is no *DISTINCT* on the top-level, that array will be
|
|
returned twice:
|
|
|
|
```
|
|
[
|
|
[ 1, 2, 3, 4 ],
|
|
[ 1, 2, 3, 4 ]
|
|
]
|
|
```
|
|
|
|
Note: the order of results is undefined for *RETURN DISTINCT*.
|
|
|
|
Note: *RETURN DISTINCT* is not allowed on the top-level of a query if there is no *FOR*
|
|
loop in front of it.
|
|
|
|
|
|
!SUBSECTION FILTER
|
|
|
|
The *FILTER* statement can be used to restrict the results to elements that
|
|
match an arbitrary logical condition. The general syntax is:
|
|
|
|
```
|
|
FILTER condition
|
|
```
|
|
|
|
*condition* must be a condition that evaluates to either *false* or *true*. If
|
|
the condition result is false, the current element is skipped, so it will not be
|
|
processed further and not be part of the result. If the condition is true, the
|
|
current element is not skipped and can be further processed.
|
|
|
|
```
|
|
FOR u IN users
|
|
FILTER u.active == true && u.age < 39
|
|
RETURN u
|
|
```
|
|
|
|
In the above example, all array elements from *users* will be included that have
|
|
an attribute *active* with value *true* and that have an attribute *age* with a
|
|
value less than *39* (including *null* ones). All other elements from *users*
|
|
will be skipped and not be included the result produced by *RETURN*.
|
|
|
|
It is allowed to specify multiple *FILTER* statements in a query, and even in
|
|
the same block. If multiple *FILTER* statements are used, their results will be
|
|
combined with a logical and, meaning all filter conditions must be true to
|
|
include an element.
|
|
|
|
```
|
|
FOR u IN users
|
|
FILTER u.active == true
|
|
FILTER u.age < 39
|
|
RETURN u
|
|
```
|
|
|
|
!SUBSECTION SORT
|
|
|
|
The *SORT* statement will force a sort of the array of already produced
|
|
intermediate results in the current block. *SORT* allows specifying one or
|
|
multiple sort criteria and directions. The general syntax is:
|
|
|
|
```
|
|
SORT expression direction
|
|
```
|
|
|
|
Specifying the *direction* is optional. The default (implicit) direction for a
|
|
sort is the ascending order. To explicitly specify the sort direction, the
|
|
keywords *ASC* (ascending) and *DESC* can be used. Multiple sort criteria can be
|
|
separated using commas.
|
|
|
|
Note: when iterating over collection-based arrays, the order of documents is
|
|
always undefined unless an explicit sort order is defined using *SORT*.
|
|
|
|
```
|
|
FOR u IN users
|
|
SORT u.lastName, u.firstName, u.id DESC
|
|
RETURN u
|
|
```
|
|
|
|
Note that constant *SORT* expressions can be used to indicate that no particular
|
|
sort order is desired. Constant *SORT* expressions will be optimized away by the AQL
|
|
optimizer during optimization, but specifying them explicitly may enable further
|
|
optimizations if the optimizer does not need to take into account any particular
|
|
sort order.
|
|
|
|
!SUBSECTION LIMIT
|
|
|
|
The *LIMIT* statement allows slicing the result array using an
|
|
offset and a count. It reduces the number of elements in the result to at most
|
|
the specified number. Two general forms of *LIMIT* are followed:
|
|
|
|
```
|
|
LIMIT count
|
|
LIMIT offset, count
|
|
```
|
|
|
|
The first form allows specifying only the *count* value whereas the second form
|
|
allows specifying both *offset* and *count*. The first form is identical using
|
|
the second form with an *offset* value of *0*.
|
|
|
|
The *offset* value specifies how many elements from the result shall be
|
|
discarded. It must be 0 or greater. The *count* value specifies how many
|
|
elements should be at most included in the result.
|
|
|
|
```
|
|
FOR u IN users
|
|
SORT u.firstName, u.lastName, u.id DESC
|
|
LIMIT 0, 5
|
|
RETURN u
|
|
```
|
|
|
|
!SUBSECTION LET
|
|
|
|
The *LET* statement can be used to assign an arbitrary value to a variable. The
|
|
variable is then introduced in the scope the *LET* statement is placed in. The
|
|
general syntax is:
|
|
|
|
```
|
|
LET variable-name = expression
|
|
```
|
|
|
|
*LET* statements are mostly used to declare complex computations and to avoid
|
|
repeated computations of the same value at multiple parts of a query.
|
|
|
|
```
|
|
FOR u IN users
|
|
LET numRecommendations = LENGTH(u.recommendations)
|
|
RETURN {
|
|
"user" : u,
|
|
"numRecommendations" : numRecommendations,
|
|
"isPowerUser" : numRecommendations >= 10
|
|
}
|
|
```
|
|
|
|
In the above example, the computation of the number of recommendations is
|
|
factored out using a *LET* statement, thus avoiding computing the value twice in
|
|
the *RETURN* statement.
|
|
|
|
Another use case for *LET* is to declare a complex computation in a subquery,
|
|
making the whole query more readable.
|
|
|
|
```
|
|
FOR u IN users
|
|
LET friends = (
|
|
FOR f IN friends
|
|
FILTER u.id == f.userId
|
|
RETURN f
|
|
)
|
|
LET memberships = (
|
|
FOR m IN memberships
|
|
FILTER u.id == m.userId
|
|
RETURN m
|
|
)
|
|
RETURN {
|
|
"user" : u,
|
|
"friends" : friends,
|
|
"numFriends" : LENGTH(friends),
|
|
"memberShips" : memberships
|
|
}
|
|
```
|
|
|
|
!SUBSECTION COLLECT
|
|
|
|
The *COLLECT* keyword can be used to group an array by one or multiple group
|
|
criteria.
|
|
|
|
The *COLLECT* statement will eliminate all local variables in the current
|
|
scope. After *COLLECT* only the variables introduced by *COLLECT* itself are
|
|
available.
|
|
|
|
The general syntaxes for *COLLECT* are:
|
|
|
|
```
|
|
COLLECT variable-name = expression options
|
|
COLLECT variable-name = expression INTO groups-variable options
|
|
COLLECT variable-name = expression INTO groups-variable = projection-expression options
|
|
COLLECT variable-name = expression INTO groups-variable KEEP keep-variable options
|
|
COLLECT variable-name = expression WITH COUNT INTO count-variable options
|
|
COLLECT WITH COUNT INTO count-variable options
|
|
```
|
|
|
|
!SUBSUBSECTION Grouping syntaxes
|
|
|
|
The first syntax form of *COLLECT* only groups the result by the defined group
|
|
criteria specified in *expression*. In order to further process the results
|
|
produced by *COLLECT*, a new variable (specified by *variable-name*) is introduced.
|
|
This variable contains the group value.
|
|
|
|
Here's an example query that find the distinct values in *u.city* and makes
|
|
them available in variable *city*:
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT city = u.city
|
|
RETURN {
|
|
"city" : city
|
|
}
|
|
```
|
|
|
|
The second form does the same as the first form, but additionally introduces a
|
|
variable (specified by *groups-variable*) that contains all elements that fell into the
|
|
group. This works as follows: The *groups-variable* variable is an array containing
|
|
as many elements as there are in the group. Each member of that array is
|
|
a JSON object in which the value of every variable that is defined in the
|
|
AQL query is bound to the corresponding attribute. Note that this considers
|
|
all variables that are defined before the *COLLECT* statement, but not those on
|
|
the top level (outside of any *FOR*), unless the *COLLECT* statement is itself
|
|
on the top level, in which case all variables are taken. Furthermore note
|
|
that it is possible that the optimizer moves *LET* statements out of *FOR*
|
|
statements to improve performance.
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT city = u.city INTO groups
|
|
RETURN {
|
|
"city" : city,
|
|
"usersInCity" : groups
|
|
}
|
|
```
|
|
|
|
In the above example, the array *users* will be grouped by the attribute
|
|
*city*. The result is a new array of documents, with one element per distinct
|
|
*u.city* value. The elements from the original array (here: *users*) per city are
|
|
made available in the variable *groups*. This is due to the *INTO* clause.
|
|
|
|
*COLLECT* also allows specifying multiple group criteria. Individual group
|
|
criteria can be separated by commas:
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT country = u.country, city = u.city INTO groups
|
|
RETURN {
|
|
"country" : country,
|
|
"city" : city,
|
|
"usersInCity" : groups
|
|
}
|
|
```
|
|
|
|
In the above example, the array *users* is grouped by country first and then
|
|
by city, and for each distinct combination of country and city, the users
|
|
will be returned.
|
|
|
|
|
|
!SUBSUBSECTION Discarding obsolete variables
|
|
|
|
The third form of *COLLECT* allows rewriting the contents of the *groups-variable*
|
|
using an arbitrary *projection-expression*:
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT country = u.country, city = u.city INTO groups = u.name
|
|
RETURN {
|
|
"country" : country,
|
|
"city" : city,
|
|
"userNames" : groups
|
|
}
|
|
```
|
|
|
|
In the above example, only the *projection-expression* is *u.name*. Therefore,
|
|
only this attribute is copied into the *groups-variable* for each document.
|
|
This is probably much more efficient than copying all variables from the scope into
|
|
the *groups-variable* as it would happen without a *projection-expression*.
|
|
|
|
The expression following *INTO* can also be used for arbitrary computations:
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT country = u.country, city = u.city INTO groups = {
|
|
"name" : u.name,
|
|
"isActive" : u.status == "active"
|
|
}
|
|
RETURN {
|
|
"country" : country,
|
|
"city" : city,
|
|
"usersInCity" : groups
|
|
}
|
|
```
|
|
|
|
*COLLECT* also provides an optional *KEEP* clause that can be used to control
|
|
which variables will be copied into the variable created by `INTO`. If no
|
|
*KEEP* clause is specified, all variables from the scope will be copied as
|
|
sub-attributes into the *groups-variable*.
|
|
This is safe but can have a negative impact on performance if there
|
|
are many variables in scope or the variables contain massive amounts of data.
|
|
|
|
The following example limits the variables that are copied into the *groups-variable*
|
|
to just *name*. The variables *u* and *someCalculation* also present in the scope
|
|
will not be copied into *groups-variable* because they are not listed in the *KEEP* clause:
|
|
|
|
```
|
|
FOR u IN users
|
|
LET name = u.name
|
|
LET someCalculation = u.value1 + u.value2
|
|
COLLECT city = u.city INTO groups KEEP name
|
|
RETURN {
|
|
"city" : city,
|
|
"userNames" : groups[*].name
|
|
}
|
|
```
|
|
|
|
*KEEP* is only valid in combination with *INTO*. Only valid variable names can
|
|
be used in the *KEEP* clause. *KEEP* supports the specification of multiple
|
|
variable names.
|
|
|
|
|
|
!SUBSUBSECTION Group length calculation
|
|
|
|
*COLLECT* also provides a special *WITH COUNT* clause that can be used to
|
|
determine the number of group members efficiently.
|
|
|
|
The simplest form just returns the number of items that made it into the
|
|
*COLLECT*:
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT WITH COUNT INTO length
|
|
RETURN length
|
|
```
|
|
|
|
The above is equivalent to, but more efficient than:
|
|
|
|
```
|
|
RETURN LENGTH(
|
|
FOR u IN users
|
|
RETURN length
|
|
)
|
|
```
|
|
|
|
The *WITH COUNT* clause can also be used to efficiently count the number
|
|
of items in each group:
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT age = u.age WITH COUNT INTO length
|
|
RETURN {
|
|
"age" : age,
|
|
"count" : length
|
|
}
|
|
```
|
|
|
|
Note: the *WITH COUNT* clause can only be used together with an *INTO* clause.
|
|
|
|
|
|
!SUBSUBSECTION COLLECT variants
|
|
|
|
Since ArangoDB 2.6, there are two variants of *COLLECT* that the optimizer can
|
|
choose from: the *sorted* variant and the *hash* variant. The *hash* variant only becomes a
|
|
candidate for *COLLECT* statements that do not use an *INTO* clause.
|
|
|
|
The optimizer will always generate a plan that employs the *sorted* method. The *sorted* method
|
|
requires its input to be sorted by the group criteria specified in the *COLLECT* clause.
|
|
To ensure correctness of the result, the AQL optimizer will automatically insert a *SORT*
|
|
statement into the query in front of the *COLLECT* statement. The optimizer may be able to
|
|
optimize away that *SORT* statement later if a sorted index is present on the group criteria.
|
|
|
|
In case a *COLLECT* qualifies for using the *hash* variant, the optimizer will create an extra
|
|
plan for it at the beginning of the planning phase. In this plan, no extra *SORT* statement will be
|
|
added in front of the *COLLECT*. This is because the *hash* variant of *COLLECT* does not require
|
|
sorted input. Instead, a *SORT* statement will be added after the *COLLECT* to sort its output.
|
|
This *SORT* statement may be optimized away again in later stages.
|
|
If the sort order of the *COLLECT* is irrelevant to the user, adding the extra instruction *SORT null*
|
|
after the *COLLECT* will allow the optimizer to remove the sorts altogether:
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT age = u.age
|
|
SORT null /* note: will be optimized away */
|
|
RETURN age
|
|
```
|
|
|
|
Which *COLLECT* variant is used by the optimizer depends on the optimizer's cost estimations. The
|
|
created plans with the different *COLLECT* variants will be shipped through the regular optimization
|
|
pipeline. In the end, the optimizer will pick the plan with the lowest estimated total cost as usual.
|
|
|
|
In general, the *sorted* variant of *COLLECT* should be preferred in cases when there is a sorted index
|
|
present on the group criteria. In this case the optimizer can eliminate the *SORT* statement in front
|
|
of the *COLLECT*, so that no *SORT* will be left.
|
|
|
|
If there is no sorted index available on the group criteria, the up-front sort required by the *sorted*
|
|
variant can be expensive. In this case it is likely that the optimizer will prefer the *hash* variant
|
|
of *COLLECT*, which does not require its input to be sorted.
|
|
|
|
Which variant of *COLLECT* was actually used can be figured out by looking into the execution plan of
|
|
a query, specifically the *AggregateNode* and its *aggregationOptions* attribute.
|
|
|
|
|
|
!SUBSUBSECTION Setting COLLECT options
|
|
|
|
*options* can be used in a *COLLECT* statement to inform the optimizer about the preferred *COLLECT*
|
|
method. When specifying the following appendix to a *COLLECT* statement, the optimizer will always use
|
|
the *sorted* variant of *COLLECT* and not even create a plan using the *hash* variant:
|
|
|
|
```
|
|
OPTIONS { method: "sorted" }
|
|
```
|
|
|
|
Note that specifying *hash* as method will not make the optimizer use the *hash* variant. This is
|
|
because the *hash* variant is not eligible for all queries. Instead, if no options or any other method
|
|
than *sorted* are specified in *OPTIONS*, the optimizer will use its regular cost estimations.
|
|
|
|
|
|
!SUBSUBSECTION COLLECT vs. RETURN DISTINCT
|
|
|
|
In order to make a result set unique, one can either use *COLLECT* or *RETURN DISTINCT*. Behind the
|
|
scenes, both variants will work by creating an *AggregateNode*. For both variants, the optimizer
|
|
may try the sorted and the hashed variant of *COLLECT*. The difference is therefore mainly syntactical,
|
|
with *RETURN DISTINCT* saving a bit of typing when compared to an equivalent *COLLECT*:
|
|
|
|
```
|
|
FOR u IN users
|
|
RETURN DISTINCT u.age
|
|
```
|
|
|
|
```
|
|
FOR u IN users
|
|
COLLECT age = u.age
|
|
RETURN age
|
|
```
|
|
|
|
However, *COLLECT* is vastly more flexible than *RETURN DISTINCT*. Additionally, the order of results is
|
|
undefined for a *RETURN DISTINCT*, whereas for a *COLLECT* the results will be sorted.
|
|
|
|
|
|
!SUBSECTION REMOVE
|
|
|
|
The *REMOVE* keyword can be used to remove documents from a collection. On a
|
|
single server, the document removal is executed transactionally in an
|
|
all-or-nothing fashion. For sharded collections, the entire remove operation
|
|
is not transactional.
|
|
|
|
Only a single *REMOVE* statement is allowed per AQL query, and it cannot be combined
|
|
with other data-modification or retrieval operations. A remove operation is
|
|
restricted to a single collection, and the [collection name](../Glossary/index.html#collection_name) must not be dynamic.
|
|
|
|
The syntax for a remove operation is:
|
|
|
|
```
|
|
REMOVE key-expression IN collection options
|
|
```
|
|
|
|
*collection* must contain the name of the collection to remove the documents
|
|
from. *key-expression* must be an expression that contains the document identification.
|
|
This can either be a string (which must then contain the [document key](../Glossary/index.html#document_key)) or a
|
|
document, which must contain a *_key* attribute.
|
|
|
|
The following queries are thus equivalent:
|
|
|
|
```
|
|
FOR u IN users
|
|
REMOVE { _key: u._key } IN users
|
|
|
|
FOR u IN users
|
|
REMOVE u._key IN users
|
|
|
|
FOR u IN users
|
|
REMOVE u IN users
|
|
```
|
|
|
|
**Note**: A remove operation can remove arbitrary documents, and the documents
|
|
do not need to be identical to the ones produced by a preceding *FOR* statement:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
REMOVE { _key: CONCAT('test', i) } IN users
|
|
|
|
FOR u IN users
|
|
FILTER u.active == false
|
|
REMOVE { _key: u._key } IN backup
|
|
```
|
|
|
|
!SUBSUBSECTION Setting query options
|
|
|
|
*options* can be used to suppress query errors that may occur when trying to
|
|
remove non-existing documents. For example, the following query will fail if one
|
|
of the to-be-deleted documents does not exist:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
REMOVE { _key: CONCAT('test', i) } IN users
|
|
```
|
|
|
|
By specifying the *ignoreErrors* query option, these errors can be suppressed so
|
|
the query completes:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
REMOVE { _key: CONCAT('test', i) } IN users OPTIONS { ignoreErrors: true }
|
|
```
|
|
|
|
To make sure data has been written to disk when a query returns, there is the *waitForSync*
|
|
query option:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
REMOVE { _key: CONCAT('test', i) } IN users OPTIONS { waitForSync: true }
|
|
```
|
|
|
|
!SUBSUBSECTION Returning the removed documents
|
|
|
|
The removed documents can also be returned by the query. In this case, the `REMOVE`
|
|
statement must be followed by a `RETURN` statement (intermediate `LET` statements
|
|
are allowed, too).`REMOVE` introduces the pseudo-value `OLD` to refer to the removed
|
|
documents:
|
|
|
|
```
|
|
REMOVE key-expression IN collection options RETURN OLD
|
|
```
|
|
|
|
Following is an example using a variable named `removed` for capturing the removed
|
|
documents. For each removed document, the document key will be returned.
|
|
|
|
```
|
|
FOR u IN users
|
|
REMOVE u IN users
|
|
LET removed = OLD
|
|
RETURN removed._key
|
|
```
|
|
|
|
!SUBSECTION UPDATE
|
|
|
|
The *UPDATE* keyword can be used to partially update documents in a collection. On a
|
|
single server, updates are executed transactionally in an all-or-nothing fashion.
|
|
For sharded collections, the entire update operation is not transactional.
|
|
|
|
Only a single *UPDATE* statement is allowed per AQL query, and it cannot be combined
|
|
with other data-modification or retrieval operations. An update operation is
|
|
restricted to a single collection, and the collection name must not be dynamic.
|
|
The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be
|
|
updated.
|
|
|
|
The two syntaxes for an update operation are:
|
|
|
|
```
|
|
UPDATE document IN collection options
|
|
UPDATE key-expression WITH document IN collection options
|
|
```
|
|
|
|
*collection* must contain the name of the collection in which the documents should
|
|
be updated. *document* must be a document that contains the attributes and values
|
|
to be updated. When using the first syntax, *document* must also contain the *_key*
|
|
attribute to identify the document to be updated.
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users
|
|
```
|
|
|
|
The following query is invalid because it does not contain a *_key* attribute and
|
|
thus it is not possible to determine the documents to be updated:
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE { name: CONCAT(u.firstName, u.lastName) } IN users
|
|
```
|
|
|
|
When using the second syntax, *key-expression* provides the document identification.
|
|
This can either be a string (which must then contain the document key) or a
|
|
document, which must contain a *_key* attribute.
|
|
|
|
The following queries are equivalent:
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users
|
|
|
|
FOR u IN users
|
|
UPDATE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users
|
|
|
|
FOR u IN users
|
|
UPDATE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users
|
|
```
|
|
|
|
An update operation may update arbitrary documents which do not need to be identical
|
|
to the ones produced by a preceding *FOR* statement:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
UPDATE CONCAT('test', i) WITH { foobar: true } IN users
|
|
|
|
FOR u IN users
|
|
FILTER u.active == false
|
|
UPDATE u WITH { status: 'inactive' } IN backup
|
|
```
|
|
|
|
!SUBSUBSECTION Setting query options
|
|
|
|
*options* can be used to suppress query errors that may occur when trying to
|
|
update non-existing documents or violating unique key constraints:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
UPDATE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true }
|
|
```
|
|
|
|
An update operation will only update the attributes specified in *document* and
|
|
leave other attributes untouched. Internal attributes (such as *_id*, *_key*, *_rev*,
|
|
*_from* and *_to*) cannot be updated and are ignored when specified in *document*.
|
|
Updating a document will modify the document's revision number with a server-generated value.
|
|
|
|
When updating an attribute with a null value, ArangoDB will not remove the attribute
|
|
from the document but store a null value for it. To get rid of attributes in an update
|
|
operation, set them to null and provide the *keepNull* option:
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE u WITH { foobar: true, notNeeded: null } IN users OPTIONS { keepNull: false }
|
|
```
|
|
|
|
The above query will remove the *notNeeded* attribute from the documents and update
|
|
the *foobar* attribute normally.
|
|
|
|
There is also the option *mergeObjects* that controls whether object contents will be
|
|
merged if an object attribute is present in both the *UPDATE* query and in the
|
|
to-be-updated document.
|
|
|
|
The following query will set the updated document's *name* attribute to the exact
|
|
same value that is specified in the query. This is due to the *mergeObjects* option
|
|
being set to *false*:
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: false }
|
|
```
|
|
|
|
Contrary, the following query will merge the contents of the *name* attribute in the
|
|
original document with the value specified in the query:
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: true }
|
|
```
|
|
|
|
Attributes in *name* that are present in the to-be-updated document but not in the
|
|
query will now be preserved. Attributes that are present in both will be overwritten
|
|
with the values specified in the query.
|
|
|
|
Note: the default value for *mergeObjects* is *true*, so there is no need to specify it
|
|
explicitly.
|
|
|
|
To make sure data are durable when an update query returns, there is the *waitForSync*
|
|
query option:
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE u WITH { foobar: true } IN users OPTIONS { waitForSync: true }
|
|
```
|
|
|
|
!SUBSUBSECTION Returning the modified documents
|
|
|
|
The modified documents can also be returned by the query. In this case, the `UPDATE`
|
|
statement needs to be followed a `RETURN` statement (intermediate `LET` statements
|
|
are allowed, too). These statements can refer to the pseudo-values `OLD` and `NEW`.
|
|
The `OLD` pseudo-value refers to the document revisions before the update, and `NEW`
|
|
refers to document revisions after the update.
|
|
|
|
Both `OLD` and `NEW` will contain all document attributes, even those not specified
|
|
in the update expression.
|
|
|
|
```
|
|
UPDATE document IN collection options RETURN OLD
|
|
UPDATE document IN collection options RETURN NEW
|
|
UPDATE key-expression WITH document IN collection options RETURN OLD
|
|
UPDATE key-expression WITH document IN collection options RETURN NEW
|
|
```
|
|
|
|
Following is an example using a variable named `previous` to capture the original
|
|
documents before modification. For each modified document, the document key is returned.
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE u WITH { value: "test" }
|
|
LET previous = OLD
|
|
RETURN previous._key
|
|
```
|
|
|
|
The following query uses the `NEW` pseudo-value to return the updated documents,
|
|
without some of the system attributes:
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE u WITH { value: "test" }
|
|
LET updated = NEW
|
|
RETURN UNSET(updated, "_key", "_id", "_rev")
|
|
```
|
|
|
|
It is also possible to return both `OLD` and `NEW`:
|
|
|
|
```
|
|
FOR u IN users
|
|
UPDATE u WITH { value: "test" }
|
|
RETURN { before: OLD, after: NEW }
|
|
```
|
|
|
|
|
|
!SUBSECTION REPLACE
|
|
|
|
The *REPLACE* keyword can be used to completely replace documents in a collection. On a
|
|
single server, the replace operation is executed transactionally in an all-or-nothing
|
|
fashion. For sharded collections, the entire replace operation is not transactional.
|
|
|
|
Only a single *REPLACE* statement is allowed per AQL query, and it cannot be combined
|
|
with other data-modification or retrieval operations. A replace operation is
|
|
restricted to a single collection, and the collection name must not be dynamic.
|
|
The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be
|
|
replaced.
|
|
|
|
The two syntaxes for a replace operation are:
|
|
|
|
```
|
|
REPLACE document IN collection options
|
|
REPLACE key-expression WITH document IN collection options
|
|
```
|
|
|
|
*collection* must contain the name of the collection in which the documents should
|
|
be replaced. *document* is the replacement document. When using the first syntax, *document*
|
|
must also contain the *_key* attribute to identify the document to be replaced.
|
|
|
|
```
|
|
FOR u IN users
|
|
REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName), status: u.status } IN users
|
|
```
|
|
|
|
The following query is invalid because it does not contain a *_key* attribute and
|
|
thus it is not possible to determine the documents to be replaced:
|
|
|
|
```
|
|
FOR u IN users
|
|
REPLACE { name: CONCAT(u.firstName, u.lastName, status: u.status) } IN users
|
|
```
|
|
|
|
When using the second syntax, *key-expression* provides the document identification.
|
|
This can either be a string (which must then contain the document key) or a
|
|
document, which must contain a *_key* attribute.
|
|
|
|
The following queries are equivalent:
|
|
|
|
```
|
|
FOR u IN users
|
|
REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users
|
|
|
|
FOR u IN users
|
|
REPLACE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users
|
|
|
|
FOR u IN users
|
|
REPLACE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users
|
|
|
|
FOR u IN users
|
|
REPLACE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users
|
|
```
|
|
|
|
A replace will fully replace an existing document, but it will not modify the values
|
|
of internal attributes (such as *_id*, *_key*, *_from* and *_to*). Replacing a document
|
|
will modify a document's revision number with a server-generated value.
|
|
|
|
A replace operation may update arbitrary documents which do not need to be identical
|
|
to the ones produced by a preceding *FOR* statement:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
REPLACE CONCAT('test', i) WITH { foobar: true } IN users
|
|
|
|
FOR u IN users
|
|
FILTER u.active == false
|
|
REPLACE u WITH { status: 'inactive', name: u.name } IN backup
|
|
```
|
|
|
|
!SUBSUBSECTION Setting query options
|
|
|
|
*options* can be used to suppress query errors that may occur when trying to
|
|
replace non-existing documents or when violating unique key constraints:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
REPLACE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true }
|
|
```
|
|
|
|
To make sure data are durable when a replace query returns, there is the *waitForSync*
|
|
query option:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
REPLACE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { waitForSync: true }
|
|
```
|
|
|
|
!SUBSUBSECTION Returning the modified documents
|
|
|
|
The modified documents can also be returned by the query. In this case, the `REPLACE`
|
|
statement must be followed by a `RETURN` statement (intermediate `LET` statements are
|
|
allowed, too). The `OLD` pseudo-value can be used to refer to document revisions before
|
|
the replace, and `NEW` refers to document revisions after the replace.
|
|
|
|
Both `OLD` and `NEW` will contain all document attributes, even those not specified
|
|
in the replace expression.
|
|
|
|
|
|
```
|
|
REPLACE document IN collection options RETURN OLD
|
|
REPLACE document IN collection options RETURN NEW
|
|
REPLACE key-expression WITH document IN collection options RETURN OLD
|
|
REPLACE key-expression WITH document IN collection options RETURN NEW
|
|
```
|
|
|
|
Following is an example using a variable named `previous` to return the original
|
|
documents before modification. For each replaced document, the document key will be
|
|
returned:
|
|
|
|
```
|
|
FOR u IN users
|
|
REPLACE u WITH { value: "test" }
|
|
LET previous = OLD
|
|
RETURN previous._key
|
|
```
|
|
|
|
The following query uses the `NEW` pseudo-value to return the replaced
|
|
documents (without some of their system attributes):
|
|
|
|
```
|
|
FOR u IN users
|
|
REPLACE u WITH { value: "test" }
|
|
LET replaced = NEW
|
|
RETURN UNSET(replaced, '_key', '_id', '_rev')
|
|
```
|
|
|
|
|
|
!SUBSECTION INSERT
|
|
|
|
The *INSERT* keyword can be used to insert new documents into a collection. On a
|
|
single server, an insert operation is executed transactionally in an all-or-nothing
|
|
fashion. For sharded collections, the entire insert operation is not transactional.
|
|
|
|
Only a single *INSERT* statement is allowed per AQL query, and it cannot be combined
|
|
with other data-modification or retrieval operations. An insert operation is
|
|
restricted to a single collection, and the collection name must not be dynamic.
|
|
|
|
The syntax for an insert operation is:
|
|
|
|
```
|
|
INSERT document IN collection options
|
|
```
|
|
|
|
**Note**: The *INTO* keyword is also allowed in the place of *IN*.
|
|
|
|
*collection* must contain the name of the collection into which the documents should
|
|
be inserted. *document* is the document to be inserted, and it may or may not contain
|
|
a *_key* attribute. If no *_key* attribute is provided, ArangoDB will auto-generate
|
|
a value for *_key* value. Inserting a document will also auto-generate a document
|
|
revision number for the document.
|
|
|
|
```
|
|
FOR i IN 1..100
|
|
INSERT { value: i } IN numbers
|
|
```
|
|
|
|
When inserting into an [edge collection](../Glossary/index.html#edge_collection), it is mandatory to specify the attributes
|
|
*_from* and *_to* in document:
|
|
|
|
```
|
|
FOR u IN users
|
|
FOR p IN products
|
|
FILTER u._key == p.recommendedBy
|
|
INSERT { _from: u._id, _to: p._id } IN recommendations
|
|
```
|
|
|
|
!SUBSUBSECTION Setting query options
|
|
|
|
*options* can be used to suppress query errors that may occur when violating unique
|
|
key constraints:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
INSERT { _key: CONCAT('test', i), name: "test" } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true }
|
|
```
|
|
|
|
To make sure data are durable when an insert query returns, there is the *waitForSync*
|
|
query option:
|
|
|
|
```
|
|
FOR i IN 1..1000
|
|
INSERT { _key: CONCAT('test', i), name: "test" } WITH { foobar: true } IN users OPTIONS { waitForSync: true }
|
|
```
|
|
|
|
!SUBSUBSECTION Returning the inserted documents
|
|
|
|
The inserted documents can also be returned by the query. In this case, the `INSERT`
|
|
statement can be a `RETURN` statement (intermediate `LET` statements are allowed, too).
|
|
To refer to the inserted documents, the `INSERT` statement introduces a pseudo-value
|
|
named `NEW`.
|
|
|
|
The documents contained in `NEW` will contain all attributes, even those auto-generated by
|
|
the database (e.g. `_id`, `_key`, `_rev`, `_from`, and `_to`).
|
|
|
|
|
|
```
|
|
INSERT document IN collection options RETURN NEW
|
|
```
|
|
|
|
Following is an example using a variable named `inserted` to return the inserted
|
|
documents. For each inserted document, the document key is returned:
|
|
|
|
```
|
|
FOR i IN 1..100
|
|
INSERT { value: i }
|
|
LET inserted = NEW
|
|
RETURN inserted._key
|
|
```
|
|
|
|
|
|
!SUBSECTION UPSERT
|
|
|
|
The *UPSERT* keyword can be used for checking whether certain documents exist,
|
|
and to update them in case they exist, or create them in case they do not exist.
|
|
On a single server, upserts are executed transactionally in an all-or-nothing fashion.
|
|
For sharded collections, the entire update operation is not transactional.
|
|
|
|
Only a single *UPSERT* statement is allowed per AQL query, and it cannot be combined
|
|
with other data-modification or retrieval operations. An upsert operation is
|
|
restricted to a single collection, and the collection name must not be dynamic.
|
|
|
|
The syntax for an upsert operation is:
|
|
|
|
```
|
|
UPSERT search-expression INSERT insert-expression UPDATE update-expression IN collection options
|
|
UPSERT search-expression INSERT insert-expression REPLACE update-expression IN collection options
|
|
```
|
|
|
|
When using the *UPDATE* variant of the upsert operation, the found document will be
|
|
partially updated, meaning only the attributes specified in *update-expression* will be
|
|
updated or added. When using the *REPLACE* variant of upsert, existing documents will
|
|
be replaced with the contexts of *update-expression*.
|
|
|
|
Updating a document will modify the document's revision number with a server-generated value.
|
|
The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be
|
|
updated.
|
|
|
|
The *search-expression* contains the document to be looked for. It must be an object
|
|
literal without dynamic attribute names. In case no such document can be found in
|
|
*collection*, a new document will be inserted into the collection as specified in the
|
|
*insert-expression*.
|
|
|
|
In case at least one document in *collection* matches the *search-expression*, it will
|
|
be updated using the *update-expression*. When more than one document in the collection
|
|
matches the *search-expression*, it is undefined which of the matching documents will
|
|
be updated. It is therefore often sensible to make sure by other means (such as unique
|
|
indexes, application logic etc.) that at most one document matches *search-expression*.
|
|
|
|
The following query will look in the *users* collection for a document with a specific
|
|
*name* attribute value. If the document exists, its *logins* attribute will be increased
|
|
by one. If it does not exist, a new document will be inserted, consisting of the
|
|
attributes *name*, *logins*, and *dateCreated*:
|
|
|
|
```
|
|
UPSERT { name: 'superuser' }
|
|
INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() }
|
|
UPDATE { logins: OLD.logins + 1 } IN users
|
|
```
|
|
|
|
Note that in the *UPDATE* case it is possible to refer to the previous version of the
|
|
document using the *OLD* pseudo-value.
|
|
|
|
|
|
!SUBSUBSECTION Setting query options
|
|
|
|
As in several above examples, the *ignoreErrors* option can be used to suppress query
|
|
errors that may occur when trying to violate unique key constraints.
|
|
|
|
When updating or replacing an attribute with a null value, ArangoDB will not remove the
|
|
attribute from the document but store a null value for it. To get rid of attributes in
|
|
an upsert operation, set them to null and provide the *keepNull* option.
|
|
|
|
There is also the option *mergeObjects* that controls whether object contents will be
|
|
merged if an object attribute is present in both the *UPDATE* query and in the
|
|
to-be-updated document.
|
|
|
|
Note: the default value for *mergeObjects* is *true*, so there is no need to specify it
|
|
explicitly.
|
|
|
|
To make sure data are durable when an update query returns, there is the *waitForSync*
|
|
query option.
|
|
|
|
|
|
!SUBSUBSECTION Returning documents
|
|
|
|
`UPSERT` statements can optionally return data. To do so, they need to be followed
|
|
by a `RETURN` statement (intermediate `LET` statements are allowed, too). These statements
|
|
can optionally perform calculations and refer to the pseudo-values `OLD` and `NEW`.
|
|
In case the upsert performed an insert operation, `OLD` will have a value of *null*.
|
|
In case the upsert performed an update or replace operation, `OLD` will contain the
|
|
previous version of the document, before update/replace.
|
|
|
|
`NEW` will always be populated. It will contain the inserted document in case the
|
|
upsert performed an insert, or the updated/replaced document in case it performed an
|
|
update/replace.
|
|
|
|
This can also be used to check whether the upsert has performed an insert or an update
|
|
internally:
|
|
|
|
```
|
|
UPSERT { name: 'superuser' }
|
|
INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() }
|
|
UPDATE { logins: OLD.logins + 1 } IN users
|
|
RETURN { doc: NEW, type: OLD ? 'update' : 'insert' }
|
|
```
|
|
|