diff --git a/Documentation/Books/Users/Aql/Operations.mdpp b/Documentation/Books/Users/Aql/Operations.mdpp index 26c12359e5..6a30e82a89 100644 --- a/Documentation/Books/Users/Aql/Operations.mdpp +++ b/Documentation/Books/Users/Aql/Operations.mdpp @@ -1,1160 +1,17 @@ !CHAPTER High-level operations -!SUBSECTION FOR - -The *FOR* keyword can be to iterate over all elements of an array. -The general syntax is: - -``` -FOR variable-name IN expression -``` - -There also is a special case for graph traversals: - -``` -FOR vertex-variable-name, edge-variable-name, path-variable-name IN traversal-expression -``` - -For this special case see [the graph traversals chapter](GraphTraversals.md). -For all other cases read on: - -Each array element returned by *expression* is visited exactly once. It is -required that *expression* returns an array in all cases. The empty array is -allowed, too. The current array element is made available for further processing -in the variable specified by *variable-name*. - -``` -FOR u IN users - RETURN u -``` - -This will iterate over all elements from the array *users* (note: this array -consists of all documents from the collection named "users" in this case) and -make the current array element available in variable *u*. *u* is not modified in -this example but simply pushed into the result using the *RETURN* keyword. - -Note: When iterating over collection-based arrays as shown here, the order of -documents is undefined unless an explicit sort order is defined using a *SORT* -statement. - -The variable introduced by *FOR* is available until the scope the *FOR* is -placed in is closed. - -Another example that uses a statically declared array of values to iterate over: - -``` -FOR year IN [ 2011, 2012, 2013 ] - RETURN { "year" : year, "isLeapYear" : year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) } -``` - -Nesting of multiple *FOR* statements is allowed, too. When *FOR* statements are -nested, a cross product of the array elements returned by the individual *FOR* -statements will be created. - -``` -FOR u IN users - FOR l IN locations - RETURN { "user" : u, "location" : l } -``` - -In this example, there are two array iterations: an outer iteration over the array -*users* plus an inner iteration over the array *locations*. The inner array is -traversed as many times as there are elements in the outer array. For each -iteration, the current values of *users* and *locations* are made available for -further processing in the variable *u* and *l*. - -!SUBSECTION RETURN - -The *RETURN* statement can be used to produce the result of a query. -It is mandatory to specify a *RETURN* statement at the end of each block in a -data-selection query, otherwise the query result would be undefined. Using -*RETURN* on the main level in data-modification queries is optional. - -The general syntax for *RETURN* is: - -``` -RETURN expression -``` - -The *expression* returned by *RETURN* is produced for each iteration in the block the -*RETURN* statement is placed in. That means the result of a *RETURN* statement -is always an array (this includes the empty array). To return all elements from -the currently iterated array without modification, the following simple form can -be used: - -``` -FOR variable-name IN expression - RETURN variable-name -``` - -As *RETURN* allows specifying an expression, arbitrary computations can be -performed to calculate the result elements. Any of the variables valid in the -scope the *RETURN* is placed in can be used for the computations. - -Note: *RETURN* will close the current scope and eliminate all local variables in -it. - -!SUBSUBSECTION RETURN DISTINCT - -Since ArangoDB 2.7, *RETURN* can optionally be followed by the *DISTINCT* keyword. -The *DISTINCT* keyword will ensure uniqueness of the values returned by the -*RETURN* statement: - -``` -FOR variable-name IN expression - RETURN DISTINCT expression -``` - -If the *DISTINCT* is applied on an expression that itself is an array or a subquery, -the *DISTINCT* will not make the values in each array or subquery result unique, but instead -ensure that the result contains only distinct arrays or subquery results. To make -the result of an array or a subquery unique, simply apply the *DISTINCT* for the -array or the subquery. - -For example, the following query will apply *DISTINCT* on its subquery results, -but not inside the subquery: - -``` -FOR what IN 1..2 - RETURN DISTINCT ( - FOR i IN [ 1, 2, 3, 4, 1, 3 ] - RETURN i - ) -``` - -Here we'll have a *FOR* loop with two iterations that each execute a subquery. The -*DISTINCT* here is applied on the two subquery results. Both subqueries return the -same result value (that is [ 1, 2, 3, 4, 1, 3 ]), so after *DISTINCT* there will -only be one occurrence of the value [ 1, 2, 3, 4, 1, 3 ] left: - -``` -[ - [ 1, 2, 3, 4, 1, 3 ] -] -``` - -If the goal is to apply the *DISTINCT* inside the subquery, it needs to be moved -there: - -``` -FOR what IN 1..2 - LET sub = ( - FOR i IN [ 1, 2, 3, 4, 1, 3 ] - RETURN DISTINCT i - ) - RETURN sub -``` - -In the above case, the *DISTINCT* will make the subquery results unique, so that -each subquery will return a unique array of values ([ 1, 2, 3, 4 ]). As the subquery -is executed twice and there is no *DISTINCT* on the top-level, that array will be -returned twice: - -``` -[ - [ 1, 2, 3, 4 ], - [ 1, 2, 3, 4 ] -] -``` - -Note: the order of results is undefined for *RETURN DISTINCT*. - -Note: *RETURN DISTINCT* is not allowed on the top-level of a query if there is no *FOR* -loop in front of it. - - -!SUBSECTION FILTER - -The *FILTER* statement can be used to restrict the results to elements that -match an arbitrary logical condition. The general syntax is: - -``` -FILTER condition -``` - -*condition* must be a condition that evaluates to either *false* or *true*. If -the condition result is false, the current element is skipped, so it will not be -processed further and not be part of the result. If the condition is true, the -current element is not skipped and can be further processed. - -``` -FOR u IN users - FILTER u.active == true && u.age < 39 - RETURN u -``` - -In the above example, all array elements from *users* will be included that have -an attribute *active* with value *true* and that have an attribute *age* with a -value less than *39* (including *null* ones). All other elements from *users* -will be skipped and not be included the result produced by *RETURN*. - -It is allowed to specify multiple *FILTER* statements in a query, and even in -the same block. If multiple *FILTER* statements are used, their results will be -combined with a logical and, meaning all filter conditions must be true to -include an element. - -``` -FOR u IN users - FILTER u.active == true - FILTER u.age < 39 - RETURN u -``` - -!SUBSECTION SORT - -The *SORT* statement will force a sort of the array of already produced -intermediate results in the current block. *SORT* allows specifying one or -multiple sort criteria and directions. The general syntax is: - -``` -SORT expression direction -``` - -Specifying the *direction* is optional. The default (implicit) direction for a -sort is the ascending order. To explicitly specify the sort direction, the -keywords *ASC* (ascending) and *DESC* can be used. Multiple sort criteria can be -separated using commas. - -Note: when iterating over collection-based arrays, the order of documents is -always undefined unless an explicit sort order is defined using *SORT*. - -``` -FOR u IN users - SORT u.lastName, u.firstName, u.id DESC - RETURN u -``` - -Note that constant *SORT* expressions can be used to indicate that no particular -sort order is desired. Constant *SORT* expressions will be optimized away by the AQL -optimizer during optimization, but specifying them explicitly may enable further -optimizations if the optimizer does not need to take into account any particular -sort order. - -!SUBSECTION LIMIT - -The *LIMIT* statement allows slicing the result array using an -offset and a count. It reduces the number of elements in the result to at most -the specified number. Two general forms of *LIMIT* are followed: - -``` -LIMIT count -LIMIT offset, count -``` - -The first form allows specifying only the *count* value whereas the second form -allows specifying both *offset* and *count*. The first form is identical using -the second form with an *offset* value of *0*. - -The *offset* value specifies how many elements from the result shall be -discarded. It must be 0 or greater. The *count* value specifies how many -elements should be at most included in the result. - -``` -FOR u IN users - SORT u.firstName, u.lastName, u.id DESC - LIMIT 0, 5 - RETURN u -``` - -!SUBSECTION LET - -The *LET* statement can be used to assign an arbitrary value to a variable. The -variable is then introduced in the scope the *LET* statement is placed in. The -general syntax is: - -``` -LET variable-name = expression -``` - -*LET* statements are mostly used to declare complex computations and to avoid -repeated computations of the same value at multiple parts of a query. - -``` -FOR u IN users - LET numRecommendations = LENGTH(u.recommendations) - RETURN { - "user" : u, - "numRecommendations" : numRecommendations, - "isPowerUser" : numRecommendations >= 10 - } -``` - -In the above example, the computation of the number of recommendations is -factored out using a *LET* statement, thus avoiding computing the value twice in -the *RETURN* statement. - -Another use case for *LET* is to declare a complex computation in a subquery, -making the whole query more readable. - -``` -FOR u IN users - LET friends = ( - FOR f IN friends - FILTER u.id == f.userId - RETURN f - ) - LET memberships = ( - FOR m IN memberships - FILTER u.id == m.userId - RETURN m - ) - RETURN { - "user" : u, - "friends" : friends, - "numFriends" : LENGTH(friends), - "memberShips" : memberships - } -``` - -!SUBSECTION COLLECT - -The *COLLECT* keyword can be used to group an array by one or multiple group -criteria. - -The *COLLECT* statement will eliminate all local variables in the current -scope. After *COLLECT* only the variables introduced by *COLLECT* itself are -available. - -The general syntaxes for *COLLECT* are: - -``` -COLLECT variable-name = expression options -COLLECT variable-name = expression INTO groups-variable options -COLLECT variable-name = expression INTO groups-variable = projection-expression options -COLLECT variable-name = expression INTO groups-variable KEEP keep-variable options -COLLECT variable-name = expression WITH COUNT INTO count-variable options -COLLECT WITH COUNT INTO count-variable options -``` - -!SUBSUBSECTION Grouping syntaxes - -The first syntax form of *COLLECT* only groups the result by the defined group -criteria specified in *expression*. In order to further process the results -produced by *COLLECT*, a new variable (specified by *variable-name*) is introduced. -This variable contains the group value. - -Here's an example query that find the distinct values in *u.city* and makes -them available in variable *city*: - -``` -FOR u IN users - COLLECT city = u.city - RETURN { - "city" : city - } -``` - -The second form does the same as the first form, but additionally introduces a -variable (specified by *groups-variable*) that contains all elements that fell into the -group. This works as follows: The *groups-variable* variable is an array containing -as many elements as there are in the group. Each member of that array is -a JSON object in which the value of every variable that is defined in the -AQL query is bound to the corresponding attribute. Note that this considers -all variables that are defined before the *COLLECT* statement, but not those on -the top level (outside of any *FOR*), unless the *COLLECT* statement is itself -on the top level, in which case all variables are taken. Furthermore note -that it is possible that the optimizer moves *LET* statements out of *FOR* -statements to improve performance. - -``` -FOR u IN users - COLLECT city = u.city INTO groups - RETURN { - "city" : city, - "usersInCity" : groups - } -``` - -In the above example, the array *users* will be grouped by the attribute -*city*. The result is a new array of documents, with one element per distinct -*u.city* value. The elements from the original array (here: *users*) per city are -made available in the variable *groups*. This is due to the *INTO* clause. - -*COLLECT* also allows specifying multiple group criteria. Individual group -criteria can be separated by commas: - -``` -FOR u IN users - COLLECT country = u.country, city = u.city INTO groups - RETURN { - "country" : country, - "city" : city, - "usersInCity" : groups - } -``` - -In the above example, the array *users* is grouped by country first and then -by city, and for each distinct combination of country and city, the users -will be returned. - - -!SUBSUBSECTION Discarding obsolete variables - -The third form of *COLLECT* allows rewriting the contents of the *groups-variable* -using an arbitrary *projection-expression*: - -``` -FOR u IN users - COLLECT country = u.country, city = u.city INTO groups = u.name - RETURN { - "country" : country, - "city" : city, - "userNames" : groups - } -``` - -In the above example, only the *projection-expression* is *u.name*. Therefore, -only this attribute is copied into the *groups-variable* for each document. -This is probably much more efficient than copying all variables from the scope into -the *groups-variable* as it would happen without a *projection-expression*. - -The expression following *INTO* can also be used for arbitrary computations: - -``` -FOR u IN users - COLLECT country = u.country, city = u.city INTO groups = { - "name" : u.name, - "isActive" : u.status == "active" - } - RETURN { - "country" : country, - "city" : city, - "usersInCity" : groups - } -``` - -*COLLECT* also provides an optional *KEEP* clause that can be used to control -which variables will be copied into the variable created by `INTO`. If no -*KEEP* clause is specified, all variables from the scope will be copied as -sub-attributes into the *groups-variable*. -This is safe but can have a negative impact on performance if there -are many variables in scope or the variables contain massive amounts of data. - -The following example limits the variables that are copied into the *groups-variable* -to just *name*. The variables *u* and *someCalculation* also present in the scope -will not be copied into *groups-variable* because they are not listed in the *KEEP* clause: - -``` -FOR u IN users - LET name = u.name - LET someCalculation = u.value1 + u.value2 - COLLECT city = u.city INTO groups KEEP name - RETURN { - "city" : city, - "userNames" : groups[*].name - } -``` - -*KEEP* is only valid in combination with *INTO*. Only valid variable names can -be used in the *KEEP* clause. *KEEP* supports the specification of multiple -variable names. - - -!SUBSUBSECTION Group length calculation - -*COLLECT* also provides a special *WITH COUNT* clause that can be used to -determine the number of group members efficiently. - -The simplest form just returns the number of items that made it into the -*COLLECT*: - -``` -FOR u IN users - COLLECT WITH COUNT INTO length - RETURN length -``` - -The above is equivalent to, but more efficient than: - -``` -RETURN LENGTH( - FOR u IN users - RETURN length -) -``` - -The *WITH COUNT* clause can also be used to efficiently count the number -of items in each group: - -``` -FOR u IN users - COLLECT age = u.age WITH COUNT INTO length - RETURN { - "age" : age, - "count" : length - } -``` - -Note: the *WITH COUNT* clause can only be used together with an *INTO* clause. - - -!SUBSUBSECTION COLLECT variants - -Since ArangoDB 2.6, there are two variants of *COLLECT* that the optimizer can -choose from: the *sorted* variant and the *hash* variant. The *hash* variant only becomes a -candidate for *COLLECT* statements that do not use an *INTO* clause. - -The optimizer will always generate a plan that employs the *sorted* method. The *sorted* method -requires its input to be sorted by the group criteria specified in the *COLLECT* clause. -To ensure correctness of the result, the AQL optimizer will automatically insert a *SORT* -statement into the query in front of the *COLLECT* statement. The optimizer may be able to -optimize away that *SORT* statement later if a sorted index is present on the group criteria. - -In case a *COLLECT* qualifies for using the *hash* variant, the optimizer will create an extra -plan for it at the beginning of the planning phase. In this plan, no extra *SORT* statement will be -added in front of the *COLLECT*. This is because the *hash* variant of *COLLECT* does not require -sorted input. Instead, a *SORT* statement will be added after the *COLLECT* to sort its output. -This *SORT* statement may be optimized away again in later stages. -If the sort order of the *COLLECT* is irrelevant to the user, adding the extra instruction *SORT null* -after the *COLLECT* will allow the optimizer to remove the sorts altogether: - -``` -FOR u IN users - COLLECT age = u.age - SORT null /* note: will be optimized away */ - RETURN age -``` - -Which *COLLECT* variant is used by the optimizer depends on the optimizer's cost estimations. The -created plans with the different *COLLECT* variants will be shipped through the regular optimization -pipeline. In the end, the optimizer will pick the plan with the lowest estimated total cost as usual. - -In general, the *sorted* variant of *COLLECT* should be preferred in cases when there is a sorted index -present on the group criteria. In this case the optimizer can eliminate the *SORT* statement in front -of the *COLLECT*, so that no *SORT* will be left. - -If there is no sorted index available on the group criteria, the up-front sort required by the *sorted* -variant can be expensive. In this case it is likely that the optimizer will prefer the *hash* variant -of *COLLECT*, which does not require its input to be sorted. - -Which variant of *COLLECT* was actually used can be figured out by looking into the execution plan of -a query, specifically the *AggregateNode* and its *aggregationOptions* attribute. - - -!SUBSUBSECTION Setting COLLECT options - -*options* can be used in a *COLLECT* statement to inform the optimizer about the preferred *COLLECT* -method. When specifying the following appendix to a *COLLECT* statement, the optimizer will always use -the *sorted* variant of *COLLECT* and not even create a plan using the *hash* variant: - -``` -OPTIONS { method: "sorted" } -``` - -Note that specifying *hash* as method will not make the optimizer use the *hash* variant. This is -because the *hash* variant is not eligible for all queries. Instead, if no options or any other method -than *sorted* are specified in *OPTIONS*, the optimizer will use its regular cost estimations. - - -!SUBSUBSECTION COLLECT vs. RETURN DISTINCT - -In order to make a result set unique, one can either use *COLLECT* or *RETURN DISTINCT*. Behind the -scenes, both variants will work by creating an *AggregateNode*. For both variants, the optimizer -may try the sorted and the hashed variant of *COLLECT*. The difference is therefore mainly syntactical, -with *RETURN DISTINCT* saving a bit of typing when compared to an equivalent *COLLECT*: - -``` -FOR u IN users - RETURN DISTINCT u.age -``` - -``` -FOR u IN users - COLLECT age = u.age - RETURN age -``` - -However, *COLLECT* is vastly more flexible than *RETURN DISTINCT*. Additionally, the order of results is -undefined for a *RETURN DISTINCT*, whereas for a *COLLECT* the results will be sorted. - - -!SUBSECTION REMOVE - -The *REMOVE* keyword can be used to remove documents from a collection. On a -single server, the document removal is executed transactionally in an -all-or-nothing fashion. For sharded collections, the entire remove operation -is not transactional. - -Each *REMOVE* operation is restricted to a single collection, and the -[collection name](../Glossary/README.md#collection-name) must not be dynamic. -Only a single *REMOVE* statement per collection is allowed per AQL query, and -it cannot be followed by read operations that access the same collection, by -traversal operations, or AQL functions that can read documents. - -The syntax for a remove operation is: - -``` -REMOVE key-expression IN collection options -``` - -*collection* must contain the name of the collection to remove the documents -from. *key-expression* must be an expression that contains the document identification. -This can either be a string (which must then contain the [document key](../Glossary/README.md#document-key)) or a -document, which must contain a *_key* attribute. - -The following queries are thus equivalent: - -``` -FOR u IN users - REMOVE { _key: u._key } IN users - -FOR u IN users - REMOVE u._key IN users - -FOR u IN users - REMOVE u IN users -``` - -**Note**: A remove operation can remove arbitrary documents, and the documents -do not need to be identical to the ones produced by a preceding *FOR* statement: - -``` -FOR i IN 1..1000 - REMOVE { _key: CONCAT('test', i) } IN users - -FOR u IN users - FILTER u.active == false - REMOVE { _key: u._key } IN backup -``` - -!SUBSUBSECTION Setting query options - -*options* can be used to suppress query errors that may occur when trying to -remove non-existing documents. For example, the following query will fail if one -of the to-be-deleted documents does not exist: - -``` -FOR i IN 1..1000 - REMOVE { _key: CONCAT('test', i) } IN users -``` - -By specifying the *ignoreErrors* query option, these errors can be suppressed so -the query completes: - -``` -FOR i IN 1..1000 - REMOVE { _key: CONCAT('test', i) } IN users OPTIONS { ignoreErrors: true } -``` - -To make sure data has been written to disk when a query returns, there is the *waitForSync* -query option: - -``` -FOR i IN 1..1000 - REMOVE { _key: CONCAT('test', i) } IN users OPTIONS { waitForSync: true } -``` - -!SUBSUBSECTION Returning the removed documents - -The removed documents can also be returned by the query. In this case, the `REMOVE` -statement must be followed by a `RETURN` statement (intermediate `LET` statements -are allowed, too).`REMOVE` introduces the pseudo-value `OLD` to refer to the removed -documents: - -``` -REMOVE key-expression IN collection options RETURN OLD -``` - -Following is an example using a variable named `removed` for capturing the removed -documents. For each removed document, the document key will be returned. - -``` -FOR u IN users - REMOVE u IN users - LET removed = OLD - RETURN removed._key -``` - -!SUBSECTION UPDATE - -The *UPDATE* keyword can be used to partially update documents in a collection. On a -single server, updates are executed transactionally in an all-or-nothing fashion. -For sharded collections, the entire update operation is not transactional. - -Each *UPDATE* operation is restricted to a single collection, and the -[collection name](../Glossary/README.md#collection-name) must not be dynamic. -Only a single *UPDATE* statement per collection is allowed per AQL query, and -it cannot be followed by read operations that access the same collection, by -traversal operations, or AQL functions that can read documents. -The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be -updated. - -The two syntaxes for an update operation are: - -``` -UPDATE document IN collection options -UPDATE key-expression WITH document IN collection options -``` - -*collection* must contain the name of the collection in which the documents should -be updated. *document* must be a document that contains the attributes and values -to be updated. When using the first syntax, *document* must also contain the *_key* -attribute to identify the document to be updated. - -``` -FOR u IN users - UPDATE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users -``` - -The following query is invalid because it does not contain a *_key* attribute and -thus it is not possible to determine the documents to be updated: - -``` -FOR u IN users - UPDATE { name: CONCAT(u.firstName, u.lastName) } IN users -``` - -When using the second syntax, *key-expression* provides the document identification. -This can either be a string (which must then contain the document key) or a -document, which must contain a *_key* attribute. - -The following queries are equivalent: - -``` -FOR u IN users - UPDATE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users - -FOR u IN users - UPDATE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users - -FOR u IN users - UPDATE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users -``` - -An update operation may update arbitrary documents which do not need to be identical -to the ones produced by a preceding *FOR* statement: - -``` -FOR i IN 1..1000 - UPDATE CONCAT('test', i) WITH { foobar: true } IN users - -FOR u IN users - FILTER u.active == false - UPDATE u WITH { status: 'inactive' } IN backup -``` - -!SUBSUBSECTION Setting query options - -*options* can be used to suppress query errors that may occur when trying to -update non-existing documents or violating unique key constraints: - -``` -FOR i IN 1..1000 - UPDATE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } -``` - -An update operation will only update the attributes specified in *document* and -leave other attributes untouched. Internal attributes (such as *_id*, *_key*, *_rev*, -*_from* and *_to*) cannot be updated and are ignored when specified in *document*. -Updating a document will modify the document's revision number with a server-generated value. - -When updating an attribute with a null value, ArangoDB will not remove the attribute -from the document but store a null value for it. To get rid of attributes in an update -operation, set them to null and provide the *keepNull* option: - -``` -FOR u IN users - UPDATE u WITH { foobar: true, notNeeded: null } IN users OPTIONS { keepNull: false } -``` - -The above query will remove the *notNeeded* attribute from the documents and update -the *foobar* attribute normally. - -There is also the option *mergeObjects* that controls whether object contents will be -merged if an object attribute is present in both the *UPDATE* query and in the -to-be-updated document. - -The following query will set the updated document's *name* attribute to the exact -same value that is specified in the query. This is due to the *mergeObjects* option -being set to *false*: - -``` -FOR u IN users - UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: false } -``` - -Contrary, the following query will merge the contents of the *name* attribute in the -original document with the value specified in the query: - -``` -FOR u IN users - UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: true } -``` - -Attributes in *name* that are present in the to-be-updated document but not in the -query will now be preserved. Attributes that are present in both will be overwritten -with the values specified in the query. - -Note: the default value for *mergeObjects* is *true*, so there is no need to specify it -explicitly. - -To make sure data are durable when an update query returns, there is the *waitForSync* -query option: - -``` -FOR u IN users - UPDATE u WITH { foobar: true } IN users OPTIONS { waitForSync: true } -``` - -!SUBSUBSECTION Returning the modified documents - -The modified documents can also be returned by the query. In this case, the `UPDATE` -statement needs to be followed a `RETURN` statement (intermediate `LET` statements -are allowed, too). These statements can refer to the pseudo-values `OLD` and `NEW`. -The `OLD` pseudo-value refers to the document revisions before the update, and `NEW` -refers to document revisions after the update. - -Both `OLD` and `NEW` will contain all document attributes, even those not specified -in the update expression. - -``` -UPDATE document IN collection options RETURN OLD -UPDATE document IN collection options RETURN NEW -UPDATE key-expression WITH document IN collection options RETURN OLD -UPDATE key-expression WITH document IN collection options RETURN NEW -``` - -Following is an example using a variable named `previous` to capture the original -documents before modification. For each modified document, the document key is returned. - -``` -FOR u IN users - UPDATE u WITH { value: "test" } - LET previous = OLD - RETURN previous._key -``` - -The following query uses the `NEW` pseudo-value to return the updated documents, -without some of the system attributes: - -``` -FOR u IN users - UPDATE u WITH { value: "test" } - LET updated = NEW - RETURN UNSET(updated, "_key", "_id", "_rev") -``` - -It is also possible to return both `OLD` and `NEW`: - -``` -FOR u IN users - UPDATE u WITH { value: "test" } - RETURN { before: OLD, after: NEW } -``` - - -!SUBSECTION REPLACE - -The *REPLACE* keyword can be used to completely replace documents in a collection. On a -single server, the replace operation is executed transactionally in an all-or-nothing -fashion. For sharded collections, the entire replace operation is not transactional. - -Each *REPLACE* operation is restricted to a single collection, and the -[collection name](../Glossary/README.md#collection-name) must not be dynamic. -Only a single *REPLACE* statement per collection is allowed per AQL query, and -it cannot be followed by read operations that access the same collection, by -traversal operations, or AQL functions that can read documents. -The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be -replaced. - -The two syntaxes for a replace operation are: - -``` -REPLACE document IN collection options -REPLACE key-expression WITH document IN collection options -``` - -*collection* must contain the name of the collection in which the documents should -be replaced. *document* is the replacement document. When using the first syntax, *document* -must also contain the *_key* attribute to identify the document to be replaced. - -``` -FOR u IN users - REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName), status: u.status } IN users -``` - -The following query is invalid because it does not contain a *_key* attribute and -thus it is not possible to determine the documents to be replaced: - -``` -FOR u IN users - REPLACE { name: CONCAT(u.firstName, u.lastName, status: u.status) } IN users -``` - -When using the second syntax, *key-expression* provides the document identification. -This can either be a string (which must then contain the document key) or a -document, which must contain a *_key* attribute. - -The following queries are equivalent: - -``` -FOR u IN users - REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users - -FOR u IN users - REPLACE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users - -FOR u IN users - REPLACE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users - -FOR u IN users - REPLACE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users -``` - -A replace will fully replace an existing document, but it will not modify the values -of internal attributes (such as *_id*, *_key*, *_from* and *_to*). Replacing a document -will modify a document's revision number with a server-generated value. - -A replace operation may update arbitrary documents which do not need to be identical -to the ones produced by a preceding *FOR* statement: - -``` -FOR i IN 1..1000 - REPLACE CONCAT('test', i) WITH { foobar: true } IN users - -FOR u IN users - FILTER u.active == false - REPLACE u WITH { status: 'inactive', name: u.name } IN backup -``` - -!SUBSUBSECTION Setting query options - -*options* can be used to suppress query errors that may occur when trying to -replace non-existing documents or when violating unique key constraints: - -``` -FOR i IN 1..1000 - REPLACE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } -``` - -To make sure data are durable when a replace query returns, there is the *waitForSync* -query option: - -``` -FOR i IN 1..1000 - REPLACE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { waitForSync: true } -``` - -!SUBSUBSECTION Returning the modified documents - -The modified documents can also be returned by the query. In this case, the `REPLACE` -statement must be followed by a `RETURN` statement (intermediate `LET` statements are -allowed, too). The `OLD` pseudo-value can be used to refer to document revisions before -the replace, and `NEW` refers to document revisions after the replace. - -Both `OLD` and `NEW` will contain all document attributes, even those not specified -in the replace expression. - - -``` -REPLACE document IN collection options RETURN OLD -REPLACE document IN collection options RETURN NEW -REPLACE key-expression WITH document IN collection options RETURN OLD -REPLACE key-expression WITH document IN collection options RETURN NEW -``` - -Following is an example using a variable named `previous` to return the original -documents before modification. For each replaced document, the document key will be -returned: - -``` -FOR u IN users - REPLACE u WITH { value: "test" } - LET previous = OLD - RETURN previous._key -``` - -The following query uses the `NEW` pseudo-value to return the replaced -documents (without some of their system attributes): - -``` -FOR u IN users - REPLACE u WITH { value: "test" } - LET replaced = NEW - RETURN UNSET(replaced, '_key', '_id', '_rev') -``` - - -!SUBSECTION INSERT - -The *INSERT* keyword can be used to insert new documents into a collection. On a -single server, an insert operation is executed transactionally in an all-or-nothing -fashion. For sharded collections, the entire insert operation is not transactional. - -Each *INSERT* operation is restricted to a single collection, and the -[collection name](../Glossary/README.md#collection-name) must not be dynamic. -Only a single *INSERT* statement per collection is allowed per AQL query, and -it cannot be followed by read operations that access the same collection, by -traversal operations, or AQL functions that can read documents. - -The syntax for an insert operation is: - -``` -INSERT document IN collection options -``` - -**Note**: The *INTO* keyword is also allowed in the place of *IN*. - -*collection* must contain the name of the collection into which the documents should -be inserted. *document* is the document to be inserted, and it may or may not contain -a *_key* attribute. If no *_key* attribute is provided, ArangoDB will auto-generate -a value for *_key* value. Inserting a document will also auto-generate a document -revision number for the document. - -``` -FOR i IN 1..100 - INSERT { value: i } IN numbers -``` - -When inserting into an [edge collection](../Glossary/README.md#edge-collection), it is mandatory to specify the attributes -*_from* and *_to* in document: - -``` -FOR u IN users - FOR p IN products - FILTER u._key == p.recommendedBy - INSERT { _from: u._id, _to: p._id } IN recommendations -``` - -!SUBSUBSECTION Setting query options - -*options* can be used to suppress query errors that may occur when violating unique -key constraints: - -``` -FOR i IN 1..1000 - INSERT { _key: CONCAT('test', i), name: "test" } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } -``` - -To make sure data are durable when an insert query returns, there is the *waitForSync* -query option: - -``` -FOR i IN 1..1000 - INSERT { _key: CONCAT('test', i), name: "test" } WITH { foobar: true } IN users OPTIONS { waitForSync: true } -``` - -!SUBSUBSECTION Returning the inserted documents - -The inserted documents can also be returned by the query. In this case, the `INSERT` -statement can be a `RETURN` statement (intermediate `LET` statements are allowed, too). -To refer to the inserted documents, the `INSERT` statement introduces a pseudo-value -named `NEW`. - -The documents contained in `NEW` will contain all attributes, even those auto-generated by -the database (e.g. `_id`, `_key`, `_rev`, `_from`, and `_to`). - - -``` -INSERT document IN collection options RETURN NEW -``` - -Following is an example using a variable named `inserted` to return the inserted -documents. For each inserted document, the document key is returned: - -``` -FOR i IN 1..100 - INSERT { value: i } - LET inserted = NEW - RETURN inserted._key -``` - - -!SUBSECTION UPSERT - -The *UPSERT* keyword can be used for checking whether certain documents exist, -and to update them in case they exist, or create them in case they do not exist. -On a single server, upserts are executed transactionally in an all-or-nothing fashion. -For sharded collections, the entire update operation is not transactional. - -Each *UPSERT* operation is restricted to a single collection, and the -[collection name](../Glossary/README.md#collection-name) must not be dynamic. -Only a single *UPSERT* statement per collection is allowed per AQL query, and -it cannot be followed by read operations that access the same collection, by -traversal operations, or AQL functions that can read documents. - -The syntax for an upsert operation is: - -``` -UPSERT search-expression INSERT insert-expression UPDATE update-expression IN collection options -UPSERT search-expression INSERT insert-expression REPLACE update-expression IN collection options -``` - -When using the *UPDATE* variant of the upsert operation, the found document will be -partially updated, meaning only the attributes specified in *update-expression* will be -updated or added. When using the *REPLACE* variant of upsert, existing documents will -be replaced with the contexts of *update-expression*. - -Updating a document will modify the document's revision number with a server-generated value. -The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be -updated. - -The *search-expression* contains the document to be looked for. It must be an object -literal without dynamic attribute names. In case no such document can be found in -*collection*, a new document will be inserted into the collection as specified in the -*insert-expression*. - -In case at least one document in *collection* matches the *search-expression*, it will -be updated using the *update-expression*. When more than one document in the collection -matches the *search-expression*, it is undefined which of the matching documents will -be updated. It is therefore often sensible to make sure by other means (such as unique -indexes, application logic etc.) that at most one document matches *search-expression*. - -The following query will look in the *users* collection for a document with a specific -*name* attribute value. If the document exists, its *logins* attribute will be increased -by one. If it does not exist, a new document will be inserted, consisting of the -attributes *name*, *logins*, and *dateCreated*: - -``` -UPSERT { name: 'superuser' } -INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() } -UPDATE { logins: OLD.logins + 1 } IN users -``` - -Note that in the *UPDATE* case it is possible to refer to the previous version of the -document using the *OLD* pseudo-value. - - -!SUBSUBSECTION Setting query options - -As in several above examples, the *ignoreErrors* option can be used to suppress query -errors that may occur when trying to violate unique key constraints. - -When updating or replacing an attribute with a null value, ArangoDB will not remove the -attribute from the document but store a null value for it. To get rid of attributes in -an upsert operation, set them to null and provide the *keepNull* option. - -There is also the option *mergeObjects* that controls whether object contents will be -merged if an object attribute is present in both the *UPDATE* query and in the -to-be-updated document. - -Note: the default value for *mergeObjects* is *true*, so there is no need to specify it -explicitly. - -To make sure data are durable when an update query returns, there is the *waitForSync* -query option. - - -!SUBSUBSECTION Returning documents - -`UPSERT` statements can optionally return data. To do so, they need to be followed -by a `RETURN` statement (intermediate `LET` statements are allowed, too). These statements -can optionally perform calculations and refer to the pseudo-values `OLD` and `NEW`. -In case the upsert performed an insert operation, `OLD` will have a value of *null*. -In case the upsert performed an update or replace operation, `OLD` will contain the -previous version of the document, before update/replace. - -`NEW` will always be populated. It will contain the inserted document in case the -upsert performed an insert, or the updated/replaced document in case it performed an -update/replace. - -This can also be used to check whether the upsert has performed an insert or an update -internally: - -``` -UPSERT { name: 'superuser' } -INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() } -UPDATE { logins: OLD.logins + 1 } IN users -RETURN { doc: NEW, type: OLD ? 'update' : 'insert' } -``` +The following high-level operations are described here after: + +- *FOR* : to iterate over all elements of an array. +- *RETURN* : to produce the result of a query. +- *FILTER* : to restrict the results to elements that match an arbitrary logical condition. +- *SORT* : to force a sort of the array of already produced intermediate results. +- *LIMIT* : to reduce the number of elements in the result to at most the specified number. +- *LET* : to assign an arbitrary value to a variable. +- *COLLECT* : to group an array by one or multiple group criteria. +- *REMOVE* : to remove documents from a collection. +- *UPDATE* : to partially update documents in a collection. +- *REPLACE* : to completely replace documents in a collection. +- *INSERT* : to insert new documents into a collection. +- *UPSERT* : to update an existing document, or create it in the case it does not exist. diff --git a/Documentation/Books/Users/AqlOperations/Collect.mdpp b/Documentation/Books/Users/AqlOperations/Collect.mdpp new file mode 100644 index 0000000000..7e29b775ea --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Collect.mdpp @@ -0,0 +1,263 @@ +!CHAPTER COLLECT + +The *COLLECT* keyword can be used to group an array by one or multiple group +criteria. + +The *COLLECT* statement will eliminate all local variables in the current +scope. After *COLLECT* only the variables introduced by *COLLECT* itself are +available. + +The general syntaxes for *COLLECT* are: + +``` +COLLECT variable-name = expression options +COLLECT variable-name = expression INTO groups-variable options +COLLECT variable-name = expression INTO groups-variable = projection-expression options +COLLECT variable-name = expression INTO groups-variable KEEP keep-variable options +COLLECT variable-name = expression WITH COUNT INTO count-variable options +COLLECT WITH COUNT INTO count-variable options +``` + +!SUBSECTION Grouping syntaxes + +The first syntax form of *COLLECT* only groups the result by the defined group +criteria specified in *expression*. In order to further process the results +produced by *COLLECT*, a new variable (specified by *variable-name*) is introduced. +This variable contains the group value. + +Here's an example query that find the distinct values in *u.city* and makes +them available in variable *city*: + +``` +FOR u IN users + COLLECT city = u.city + RETURN { + "city" : city + } +``` + +The second form does the same as the first form, but additionally introduces a +variable (specified by *groups-variable*) that contains all elements that fell into the +group. This works as follows: The *groups-variable* variable is an array containing +as many elements as there are in the group. Each member of that array is +a JSON object in which the value of every variable that is defined in the +AQL query is bound to the corresponding attribute. Note that this considers +all variables that are defined before the *COLLECT* statement, but not those on +the top level (outside of any *FOR*), unless the *COLLECT* statement is itself +on the top level, in which case all variables are taken. Furthermore note +that it is possible that the optimizer moves *LET* statements out of *FOR* +statements to improve performance. + +``` +FOR u IN users + COLLECT city = u.city INTO groups + RETURN { + "city" : city, + "usersInCity" : groups + } +``` + +In the above example, the array *users* will be grouped by the attribute +*city*. The result is a new array of documents, with one element per distinct +*u.city* value. The elements from the original array (here: *users*) per city are +made available in the variable *groups*. This is due to the *INTO* clause. + +*COLLECT* also allows specifying multiple group criteria. Individual group +criteria can be separated by commas: + +``` +FOR u IN users + COLLECT country = u.country, city = u.city INTO groups + RETURN { + "country" : country, + "city" : city, + "usersInCity" : groups + } +``` + +In the above example, the array *users* is grouped by country first and then +by city, and for each distinct combination of country and city, the users +will be returned. + + +!SUBSECTION Discarding obsolete variables + +The third form of *COLLECT* allows rewriting the contents of the *groups-variable* +using an arbitrary *projection-expression*: + +``` +FOR u IN users + COLLECT country = u.country, city = u.city INTO groups = u.name + RETURN { + "country" : country, + "city" : city, + "userNames" : groups + } +``` + +In the above example, only the *projection-expression* is *u.name*. Therefore, +only this attribute is copied into the *groups-variable* for each document. +This is probably much more efficient than copying all variables from the scope into +the *groups-variable* as it would happen without a *projection-expression*. + +The expression following *INTO* can also be used for arbitrary computations: + +``` +FOR u IN users + COLLECT country = u.country, city = u.city INTO groups = { + "name" : u.name, + "isActive" : u.status == "active" + } + RETURN { + "country" : country, + "city" : city, + "usersInCity" : groups + } +``` + +*COLLECT* also provides an optional *KEEP* clause that can be used to control +which variables will be copied into the variable created by `INTO`. If no +*KEEP* clause is specified, all variables from the scope will be copied as +sub-attributes into the *groups-variable*. +This is safe but can have a negative impact on performance if there +are many variables in scope or the variables contain massive amounts of data. + +The following example limits the variables that are copied into the *groups-variable* +to just *name*. The variables *u* and *someCalculation* also present in the scope +will not be copied into *groups-variable* because they are not listed in the *KEEP* clause: + +``` +FOR u IN users + LET name = u.name + LET someCalculation = u.value1 + u.value2 + COLLECT city = u.city INTO groups KEEP name + RETURN { + "city" : city, + "userNames" : groups[*].name + } +``` + +*KEEP* is only valid in combination with *INTO*. Only valid variable names can +be used in the *KEEP* clause. *KEEP* supports the specification of multiple +variable names. + + +!SUBSECTION Group length calculation + +*COLLECT* also provides a special *WITH COUNT* clause that can be used to +determine the number of group members efficiently. + +The simplest form just returns the number of items that made it into the +*COLLECT*: + +``` +FOR u IN users + COLLECT WITH COUNT INTO length + RETURN length +``` + +The above is equivalent to, but more efficient than: + +``` +RETURN LENGTH( + FOR u IN users + RETURN length +) +``` + +The *WITH COUNT* clause can also be used to efficiently count the number +of items in each group: + +``` +FOR u IN users + COLLECT age = u.age WITH COUNT INTO length + RETURN { + "age" : age, + "count" : length + } +``` + +Note: the *WITH COUNT* clause can only be used together with an *INTO* clause. + + +!SUBSECTION COLLECT variants + +Since ArangoDB 2.6, there are two variants of *COLLECT* that the optimizer can +choose from: the *sorted* variant and the *hash* variant. The *hash* variant only becomes a +candidate for *COLLECT* statements that do not use an *INTO* clause. + +The optimizer will always generate a plan that employs the *sorted* method. The *sorted* method +requires its input to be sorted by the group criteria specified in the *COLLECT* clause. +To ensure correctness of the result, the AQL optimizer will automatically insert a *SORT* +statement into the query in front of the *COLLECT* statement. The optimizer may be able to +optimize away that *SORT* statement later if a sorted index is present on the group criteria. + +In case a *COLLECT* qualifies for using the *hash* variant, the optimizer will create an extra +plan for it at the beginning of the planning phase. In this plan, no extra *SORT* statement will be +added in front of the *COLLECT*. This is because the *hash* variant of *COLLECT* does not require +sorted input. Instead, a *SORT* statement will be added after the *COLLECT* to sort its output. +This *SORT* statement may be optimized away again in later stages. +If the sort order of the *COLLECT* is irrelevant to the user, adding the extra instruction *SORT null* +after the *COLLECT* will allow the optimizer to remove the sorts altogether: + +``` +FOR u IN users + COLLECT age = u.age + SORT null /* note: will be optimized away */ + RETURN age +``` + +Which *COLLECT* variant is used by the optimizer depends on the optimizer's cost estimations. The +created plans with the different *COLLECT* variants will be shipped through the regular optimization +pipeline. In the end, the optimizer will pick the plan with the lowest estimated total cost as usual. + +In general, the *sorted* variant of *COLLECT* should be preferred in cases when there is a sorted index +present on the group criteria. In this case the optimizer can eliminate the *SORT* statement in front +of the *COLLECT*, so that no *SORT* will be left. + +If there is no sorted index available on the group criteria, the up-front sort required by the *sorted* +variant can be expensive. In this case it is likely that the optimizer will prefer the *hash* variant +of *COLLECT*, which does not require its input to be sorted. + +Which variant of *COLLECT* was actually used can be figured out by looking into the execution plan of +a query, specifically the *AggregateNode* and its *aggregationOptions* attribute. + + +!SUBSECTION Setting COLLECT options + +*options* can be used in a *COLLECT* statement to inform the optimizer about the preferred *COLLECT* +method. When specifying the following appendix to a *COLLECT* statement, the optimizer will always use +the *sorted* variant of *COLLECT* and not even create a plan using the *hash* variant: + +``` +OPTIONS { method: "sorted" } +``` + +Note that specifying *hash* as method will not make the optimizer use the *hash* variant. This is +because the *hash* variant is not eligible for all queries. Instead, if no options or any other method +than *sorted* are specified in *OPTIONS*, the optimizer will use its regular cost estimations. + + +!SUBSECTION COLLECT vs. RETURN DISTINCT + +In order to make a result set unique, one can either use *COLLECT* or *RETURN DISTINCT*. Behind the +scenes, both variants will work by creating an *AggregateNode*. For both variants, the optimizer +may try the sorted and the hashed variant of *COLLECT*. The difference is therefore mainly syntactical, +with *RETURN DISTINCT* saving a bit of typing when compared to an equivalent *COLLECT*: + +``` +FOR u IN users + RETURN DISTINCT u.age +``` + +``` +FOR u IN users + COLLECT age = u.age + RETURN age +``` + +However, *COLLECT* is vastly more flexible than *RETURN DISTINCT*. Additionally, the order of results is +undefined for a *RETURN DISTINCT*, whereas for a *COLLECT* the results will be sorted. + + + diff --git a/Documentation/Books/Users/AqlOperations/Filter.mdpp b/Documentation/Books/Users/AqlOperations/Filter.mdpp new file mode 100644 index 0000000000..595d84451f --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Filter.mdpp @@ -0,0 +1,37 @@ +!CHAPTER FILTER + +The *FILTER* statement can be used to restrict the results to elements that +match an arbitrary logical condition. The general syntax is: + +``` +FILTER condition +``` + +*condition* must be a condition that evaluates to either *false* or *true*. If +the condition result is false, the current element is skipped, so it will not be +processed further and not be part of the result. If the condition is true, the +current element is not skipped and can be further processed. + +``` +FOR u IN users + FILTER u.active == true && u.age < 39 + RETURN u +``` + +In the above example, all array elements from *users* will be included that have +an attribute *active* with value *true* and that have an attribute *age* with a +value less than *39* (including *null* ones). All other elements from *users* +will be skipped and not be included the result produced by *RETURN*. + +It is allowed to specify multiple *FILTER* statements in a query, and even in +the same block. If multiple *FILTER* statements are used, their results will be +combined with a logical and, meaning all filter conditions must be true to +include an element. + +``` +FOR u IN users + FILTER u.active == true + FILTER u.age < 39 + RETURN u +``` + diff --git a/Documentation/Books/Users/AqlOperations/For.mdpp b/Documentation/Books/Users/AqlOperations/For.mdpp new file mode 100644 index 0000000000..ce3a555eb8 --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/For.mdpp @@ -0,0 +1,64 @@ +!CHAPTER FOR + + +The *FOR* keyword can be to iterate over all elements of an array. +The general syntax is: + +``` +FOR variable-name IN expression +``` + +There also is a special case for graph traversals: + +``` +FOR vertex-variable-name, edge-variable-name, path-variable-name IN traversal-expression +``` + +For this special case see [the graph traversals chapter](../Aql/GraphTraversals.md). +For all other cases read on: + +Each array element returned by *expression* is visited exactly once. It is +required that *expression* returns an array in all cases. The empty array is +allowed, too. The current array element is made available for further processing +in the variable specified by *variable-name*. + +``` +FOR u IN users + RETURN u +``` + +This will iterate over all elements from the array *users* (note: this array +consists of all documents from the collection named "users" in this case) and +make the current array element available in variable *u*. *u* is not modified in +this example but simply pushed into the result using the *RETURN* keyword. + +Note: When iterating over collection-based arrays as shown here, the order of +documents is undefined unless an explicit sort order is defined using a *SORT* +statement. + +The variable introduced by *FOR* is available until the scope the *FOR* is +placed in is closed. + +Another example that uses a statically declared array of values to iterate over: + +``` +FOR year IN [ 2011, 2012, 2013 ] + RETURN { "year" : year, "isLeapYear" : year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) } +``` + +Nesting of multiple *FOR* statements is allowed, too. When *FOR* statements are +nested, a cross product of the array elements returned by the individual *FOR* +statements will be created. + +``` +FOR u IN users + FOR l IN locations + RETURN { "user" : u, "location" : l } +``` + +In this example, there are two array iterations: an outer iteration over the array +*users* plus an inner iteration over the array *locations*. The inner array is +traversed as many times as there are elements in the outer array. For each +iteration, the current values of *users* and *locations* are made available for +further processing in the variable *u* and *l*. + diff --git a/Documentation/Books/Users/AqlOperations/Insert.mdpp b/Documentation/Books/Users/AqlOperations/Insert.mdpp new file mode 100644 index 0000000000..69173180e6 --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Insert.mdpp @@ -0,0 +1,85 @@ +!CHAPTER INSERT + +The *INSERT* keyword can be used to insert new documents into a collection. On a +single server, an insert operation is executed transactionally in an all-or-nothing +fashion. For sharded collections, the entire insert operation is not transactional. + +Each *INSERT* operation is restricted to a single collection, and the +[collection name](../Glossary/README.md#collection-name) must not be dynamic. +Only a single *INSERT* statement per collection is allowed per AQL query, and +it cannot be followed by read operations that access the same collection, by +traversal operations, or AQL functions that can read documents. + +The syntax for an insert operation is: + +``` +INSERT document IN collection options +``` + +**Note**: The *INTO* keyword is also allowed in the place of *IN*. + +*collection* must contain the name of the collection into which the documents should +be inserted. *document* is the document to be inserted, and it may or may not contain +a *_key* attribute. If no *_key* attribute is provided, ArangoDB will auto-generate +a value for *_key* value. Inserting a document will also auto-generate a document +revision number for the document. + +``` +FOR i IN 1..100 + INSERT { value: i } IN numbers +``` + +When inserting into an [edge collection](../Glossary/README.md#edge-collection), it is mandatory to specify the attributes +*_from* and *_to* in document: + +``` +FOR u IN users + FOR p IN products + FILTER u._key == p.recommendedBy + INSERT { _from: u._id, _to: p._id } IN recommendations +``` + +!SUBSECTION Setting query options + +*options* can be used to suppress query errors that may occur when violating unique +key constraints: + +``` +FOR i IN 1..1000 + INSERT { _key: CONCAT('test', i), name: "test" } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } +``` + +To make sure data are durable when an insert query returns, there is the *waitForSync* +query option: + +``` +FOR i IN 1..1000 + INSERT { _key: CONCAT('test', i), name: "test" } WITH { foobar: true } IN users OPTIONS { waitForSync: true } +``` + +!SUBSECTION Returning the inserted documents + +The inserted documents can also be returned by the query. In this case, the `INSERT` +statement can be a `RETURN` statement (intermediate `LET` statements are allowed, too). +To refer to the inserted documents, the `INSERT` statement introduces a pseudo-value +named `NEW`. + +The documents contained in `NEW` will contain all attributes, even those auto-generated by +the database (e.g. `_id`, `_key`, `_rev`, `_from`, and `_to`). + + +``` +INSERT document IN collection options RETURN NEW +``` + +Following is an example using a variable named `inserted` to return the inserted +documents. For each inserted document, the document key is returned: + +``` +FOR i IN 1..100 + INSERT { value: i } + LET inserted = NEW + RETURN inserted._key +``` + + diff --git a/Documentation/Books/Users/AqlOperations/Let.mdpp b/Documentation/Books/Users/AqlOperations/Let.mdpp new file mode 100644 index 0000000000..2a88a969df --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Let.mdpp @@ -0,0 +1,50 @@ +!CHAPTER LET + +The *LET* statement can be used to assign an arbitrary value to a variable. The +variable is then introduced in the scope the *LET* statement is placed in. The +general syntax is: + +``` +LET variable-name = expression +``` + +*LET* statements are mostly used to declare complex computations and to avoid +repeated computations of the same value at multiple parts of a query. + +``` +FOR u IN users + LET numRecommendations = LENGTH(u.recommendations) + RETURN { + "user" : u, + "numRecommendations" : numRecommendations, + "isPowerUser" : numRecommendations >= 10 + } +``` + +In the above example, the computation of the number of recommendations is +factored out using a *LET* statement, thus avoiding computing the value twice in +the *RETURN* statement. + +Another use case for *LET* is to declare a complex computation in a subquery, +making the whole query more readable. + +``` +FOR u IN users + LET friends = ( + FOR f IN friends + FILTER u.id == f.userId + RETURN f + ) + LET memberships = ( + FOR m IN memberships + FILTER u.id == m.userId + RETURN m + ) + RETURN { + "user" : u, + "friends" : friends, + "numFriends" : LENGTH(friends), + "memberShips" : memberships + } +``` + diff --git a/Documentation/Books/Users/AqlOperations/Limit.mdpp b/Documentation/Books/Users/AqlOperations/Limit.mdpp new file mode 100644 index 0000000000..9c0032181b --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Limit.mdpp @@ -0,0 +1,26 @@ +!CHAPTER LIMIT + +The *LIMIT* statement allows slicing the result array using an +offset and a count. It reduces the number of elements in the result to at most +the specified number. Two general forms of *LIMIT* are followed: + +``` +LIMIT count +LIMIT offset, count +``` + +The first form allows specifying only the *count* value whereas the second form +allows specifying both *offset* and *count*. The first form is identical using +the second form with an *offset* value of *0*. + +The *offset* value specifies how many elements from the result shall be +discarded. It must be 0 or greater. The *count* value specifies how many +elements should be at most included in the result. + +``` +FOR u IN users + SORT u.firstName, u.lastName, u.id DESC + LIMIT 0, 5 + RETURN u +``` + diff --git a/Documentation/Books/Users/AqlOperations/Remove.mdpp b/Documentation/Books/Users/AqlOperations/Remove.mdpp new file mode 100644 index 0000000000..7da97d7a6f --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Remove.mdpp @@ -0,0 +1,98 @@ + +!CHAPTER REMOVE + +The *REMOVE* keyword can be used to remove documents from a collection. On a +single server, the document removal is executed transactionally in an +all-or-nothing fashion. For sharded collections, the entire remove operation +is not transactional. + +Each *REMOVE* operation is restricted to a single collection, and the +[collection name](../Glossary/README.md#collection-name) must not be dynamic. +Only a single *REMOVE* statement per collection is allowed per AQL query, and +it cannot be followed by read operations that access the same collection, by +traversal operations, or AQL functions that can read documents. + +The syntax for a remove operation is: + +``` +REMOVE key-expression IN collection options +``` + +*collection* must contain the name of the collection to remove the documents +from. *key-expression* must be an expression that contains the document identification. +This can either be a string (which must then contain the [document key](../Glossary/README.md#document-key)) or a +document, which must contain a *_key* attribute. + +The following queries are thus equivalent: + +``` +FOR u IN users + REMOVE { _key: u._key } IN users + +FOR u IN users + REMOVE u._key IN users + +FOR u IN users + REMOVE u IN users +``` + +**Note**: A remove operation can remove arbitrary documents, and the documents +do not need to be identical to the ones produced by a preceding *FOR* statement: + +``` +FOR i IN 1..1000 + REMOVE { _key: CONCAT('test', i) } IN users + +FOR u IN users + FILTER u.active == false + REMOVE { _key: u._key } IN backup +``` + +!SUBSECTION Setting query options + +*options* can be used to suppress query errors that may occur when trying to +remove non-existing documents. For example, the following query will fail if one +of the to-be-deleted documents does not exist: + +``` +FOR i IN 1..1000 + REMOVE { _key: CONCAT('test', i) } IN users +``` + +By specifying the *ignoreErrors* query option, these errors can be suppressed so +the query completes: + +``` +FOR i IN 1..1000 + REMOVE { _key: CONCAT('test', i) } IN users OPTIONS { ignoreErrors: true } +``` + +To make sure data has been written to disk when a query returns, there is the *waitForSync* +query option: + +``` +FOR i IN 1..1000 + REMOVE { _key: CONCAT('test', i) } IN users OPTIONS { waitForSync: true } +``` + +!SUBSECTION Returning the removed documents + +The removed documents can also be returned by the query. In this case, the `REMOVE` +statement must be followed by a `RETURN` statement (intermediate `LET` statements +are allowed, too).`REMOVE` introduces the pseudo-value `OLD` to refer to the removed +documents: + +``` +REMOVE key-expression IN collection options RETURN OLD +``` + +Following is an example using a variable named `removed` for capturing the removed +documents. For each removed document, the document key will be returned. + +``` +FOR u IN users + REMOVE u IN users + LET removed = OLD + RETURN removed._key +``` + diff --git a/Documentation/Books/Users/AqlOperations/Replace.mdpp b/Documentation/Books/Users/AqlOperations/Replace.mdpp new file mode 100644 index 0000000000..fa40291d82 --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Replace.mdpp @@ -0,0 +1,131 @@ +!CHAPTER REPLACE + +The *REPLACE* keyword can be used to completely replace documents in a collection. On a +single server, the replace operation is executed transactionally in an all-or-nothing +fashion. For sharded collections, the entire replace operation is not transactional. + +Each *REPLACE* operation is restricted to a single collection, and the +[collection name](../Glossary/README.md#collection-name) must not be dynamic. +Only a single *REPLACE* statement per collection is allowed per AQL query, and +it cannot be followed by read operations that access the same collection, by +traversal operations, or AQL functions that can read documents. +The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be +replaced. + +The two syntaxes for a replace operation are: + +``` +REPLACE document IN collection options +REPLACE key-expression WITH document IN collection options +``` + +*collection* must contain the name of the collection in which the documents should +be replaced. *document* is the replacement document. When using the first syntax, *document* +must also contain the *_key* attribute to identify the document to be replaced. + +``` +FOR u IN users + REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName), status: u.status } IN users +``` + +The following query is invalid because it does not contain a *_key* attribute and +thus it is not possible to determine the documents to be replaced: + +``` +FOR u IN users + REPLACE { name: CONCAT(u.firstName, u.lastName, status: u.status) } IN users +``` + +When using the second syntax, *key-expression* provides the document identification. +This can either be a string (which must then contain the document key) or a +document, which must contain a *_key* attribute. + +The following queries are equivalent: + +``` +FOR u IN users + REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users + +FOR u IN users + REPLACE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users + +FOR u IN users + REPLACE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users + +FOR u IN users + REPLACE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users +``` + +A replace will fully replace an existing document, but it will not modify the values +of internal attributes (such as *_id*, *_key*, *_from* and *_to*). Replacing a document +will modify a document's revision number with a server-generated value. + +A replace operation may update arbitrary documents which do not need to be identical +to the ones produced by a preceding *FOR* statement: + +``` +FOR i IN 1..1000 + REPLACE CONCAT('test', i) WITH { foobar: true } IN users + +FOR u IN users + FILTER u.active == false + REPLACE u WITH { status: 'inactive', name: u.name } IN backup +``` + +!SUBSECTION Setting query options + +*options* can be used to suppress query errors that may occur when trying to +replace non-existing documents or when violating unique key constraints: + +``` +FOR i IN 1..1000 + REPLACE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } +``` + +To make sure data are durable when a replace query returns, there is the *waitForSync* +query option: + +``` +FOR i IN 1..1000 + REPLACE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { waitForSync: true } +``` + +!SUBSECTION Returning the modified documents + +The modified documents can also be returned by the query. In this case, the `REPLACE` +statement must be followed by a `RETURN` statement (intermediate `LET` statements are +allowed, too). The `OLD` pseudo-value can be used to refer to document revisions before +the replace, and `NEW` refers to document revisions after the replace. + +Both `OLD` and `NEW` will contain all document attributes, even those not specified +in the replace expression. + + +``` +REPLACE document IN collection options RETURN OLD +REPLACE document IN collection options RETURN NEW +REPLACE key-expression WITH document IN collection options RETURN OLD +REPLACE key-expression WITH document IN collection options RETURN NEW +``` + +Following is an example using a variable named `previous` to return the original +documents before modification. For each replaced document, the document key will be +returned: + +``` +FOR u IN users + REPLACE u WITH { value: "test" } + LET previous = OLD + RETURN previous._key +``` + +The following query uses the `NEW` pseudo-value to return the replaced +documents (without some of their system attributes): + +``` +FOR u IN users + REPLACE u WITH { value: "test" } + LET replaced = NEW + RETURN UNSET(replaced, '_key', '_id', '_rev') +``` + diff --git a/Documentation/Books/Users/AqlOperations/Return.mdpp b/Documentation/Books/Users/AqlOperations/Return.mdpp new file mode 100644 index 0000000000..0f4891e6d0 --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Return.mdpp @@ -0,0 +1,99 @@ +!CHAPTER RETURN + +The *RETURN* statement can be used to produce the result of a query. +It is mandatory to specify a *RETURN* statement at the end of each block in a +data-selection query, otherwise the query result would be undefined. Using +*RETURN* on the main level in data-modification queries is optional. + +The general syntax for *RETURN* is: + +``` +RETURN expression +``` + +The *expression* returned by *RETURN* is produced for each iteration in the block the +*RETURN* statement is placed in. That means the result of a *RETURN* statement +is always an array (this includes the empty array). To return all elements from +the currently iterated array without modification, the following simple form can +be used: + +``` +FOR variable-name IN expression + RETURN variable-name +``` + +As *RETURN* allows specifying an expression, arbitrary computations can be +performed to calculate the result elements. Any of the variables valid in the +scope the *RETURN* is placed in can be used for the computations. + +Note: *RETURN* will close the current scope and eliminate all local variables in +it. + +!SUBSECTION RETURN DISTINCT + +Since ArangoDB 2.7, *RETURN* can optionally be followed by the *DISTINCT* keyword. +The *DISTINCT* keyword will ensure uniqueness of the values returned by the +*RETURN* statement: + +``` +FOR variable-name IN expression + RETURN DISTINCT expression +``` + +If the *DISTINCT* is applied on an expression that itself is an array or a subquery, +the *DISTINCT* will not make the values in each array or subquery result unique, but instead +ensure that the result contains only distinct arrays or subquery results. To make +the result of an array or a subquery unique, simply apply the *DISTINCT* for the +array or the subquery. + +For example, the following query will apply *DISTINCT* on its subquery results, +but not inside the subquery: + +``` +FOR what IN 1..2 + RETURN DISTINCT ( + FOR i IN [ 1, 2, 3, 4, 1, 3 ] + RETURN i + ) +``` + +Here we'll have a *FOR* loop with two iterations that each execute a subquery. The +*DISTINCT* here is applied on the two subquery results. Both subqueries return the +same result value (that is [ 1, 2, 3, 4, 1, 3 ]), so after *DISTINCT* there will +only be one occurrence of the value [ 1, 2, 3, 4, 1, 3 ] left: + +``` +[ + [ 1, 2, 3, 4, 1, 3 ] +] +``` + +If the goal is to apply the *DISTINCT* inside the subquery, it needs to be moved +there: + +``` +FOR what IN 1..2 + LET sub = ( + FOR i IN [ 1, 2, 3, 4, 1, 3 ] + RETURN DISTINCT i + ) + RETURN sub +``` + +In the above case, the *DISTINCT* will make the subquery results unique, so that +each subquery will return a unique array of values ([ 1, 2, 3, 4 ]). As the subquery +is executed twice and there is no *DISTINCT* on the top-level, that array will be +returned twice: + +``` +[ + [ 1, 2, 3, 4 ], + [ 1, 2, 3, 4 ] +] +``` + +Note: the order of results is undefined for *RETURN DISTINCT*. + +Note: *RETURN DISTINCT* is not allowed on the top-level of a query if there is no *FOR* +loop in front of it. + diff --git a/Documentation/Books/Users/AqlOperations/Sort.mdpp b/Documentation/Books/Users/AqlOperations/Sort.mdpp new file mode 100644 index 0000000000..00bccbd184 --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Sort.mdpp @@ -0,0 +1,31 @@ + +!CHAPTER SORT + +The *SORT* statement will force a sort of the array of already produced +intermediate results in the current block. *SORT* allows specifying one or +multiple sort criteria and directions. The general syntax is: + +``` +SORT expression direction +``` + +Specifying the *direction* is optional. The default (implicit) direction for a +sort is the ascending order. To explicitly specify the sort direction, the +keywords *ASC* (ascending) and *DESC* can be used. Multiple sort criteria can be +separated using commas. + +Note: when iterating over collection-based arrays, the order of documents is +always undefined unless an explicit sort order is defined using *SORT*. + +``` +FOR u IN users + SORT u.lastName, u.firstName, u.id DESC + RETURN u +``` + +Note that constant *SORT* expressions can be used to indicate that no particular +sort order is desired. Constant *SORT* expressions will be optimized away by the AQL +optimizer during optimization, but specifying them explicitly may enable further +optimizations if the optimizer does not need to take into account any particular +sort order. + diff --git a/Documentation/Books/Users/AqlOperations/Update.mdpp b/Documentation/Books/Users/AqlOperations/Update.mdpp new file mode 100644 index 0000000000..48827a7918 --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Update.mdpp @@ -0,0 +1,177 @@ +!CHAPTER UPDATE + +The *UPDATE* keyword can be used to partially update documents in a collection. On a +single server, updates are executed transactionally in an all-or-nothing fashion. +For sharded collections, the entire update operation is not transactional. + +Each *UPDATE* operation is restricted to a single collection, and the +[collection name](../Glossary/README.md#collection-name) must not be dynamic. +Only a single *UPDATE* statement per collection is allowed per AQL query, and +it cannot be followed by read operations that access the same collection, by +traversal operations, or AQL functions that can read documents. +The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be +updated. + +The two syntaxes for an update operation are: + +``` +UPDATE document IN collection options +UPDATE key-expression WITH document IN collection options +``` + +*collection* must contain the name of the collection in which the documents should +be updated. *document* must be a document that contains the attributes and values +to be updated. When using the first syntax, *document* must also contain the *_key* +attribute to identify the document to be updated. + +``` +FOR u IN users + UPDATE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users +``` + +The following query is invalid because it does not contain a *_key* attribute and +thus it is not possible to determine the documents to be updated: + +``` +FOR u IN users + UPDATE { name: CONCAT(u.firstName, u.lastName) } IN users +``` + +When using the second syntax, *key-expression* provides the document identification. +This can either be a string (which must then contain the document key) or a +document, which must contain a *_key* attribute. + +The following queries are equivalent: + +``` +FOR u IN users + UPDATE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users + +FOR u IN users + UPDATE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users + +FOR u IN users + UPDATE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users +``` + +An update operation may update arbitrary documents which do not need to be identical +to the ones produced by a preceding *FOR* statement: + +``` +FOR i IN 1..1000 + UPDATE CONCAT('test', i) WITH { foobar: true } IN users + +FOR u IN users + FILTER u.active == false + UPDATE u WITH { status: 'inactive' } IN backup +``` + +!SUBSECTION Setting query options + +*options* can be used to suppress query errors that may occur when trying to +update non-existing documents or violating unique key constraints: + +``` +FOR i IN 1..1000 + UPDATE { _key: CONCAT('test', i) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } +``` + +An update operation will only update the attributes specified in *document* and +leave other attributes untouched. Internal attributes (such as *_id*, *_key*, *_rev*, +*_from* and *_to*) cannot be updated and are ignored when specified in *document*. +Updating a document will modify the document's revision number with a server-generated value. + +When updating an attribute with a null value, ArangoDB will not remove the attribute +from the document but store a null value for it. To get rid of attributes in an update +operation, set them to null and provide the *keepNull* option: + +``` +FOR u IN users + UPDATE u WITH { foobar: true, notNeeded: null } IN users OPTIONS { keepNull: false } +``` + +The above query will remove the *notNeeded* attribute from the documents and update +the *foobar* attribute normally. + +There is also the option *mergeObjects* that controls whether object contents will be +merged if an object attribute is present in both the *UPDATE* query and in the +to-be-updated document. + +The following query will set the updated document's *name* attribute to the exact +same value that is specified in the query. This is due to the *mergeObjects* option +being set to *false*: + +``` +FOR u IN users + UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: false } +``` + +Contrary, the following query will merge the contents of the *name* attribute in the +original document with the value specified in the query: + +``` +FOR u IN users + UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: true } +``` + +Attributes in *name* that are present in the to-be-updated document but not in the +query will now be preserved. Attributes that are present in both will be overwritten +with the values specified in the query. + +Note: the default value for *mergeObjects* is *true*, so there is no need to specify it +explicitly. + +To make sure data are durable when an update query returns, there is the *waitForSync* +query option: + +``` +FOR u IN users + UPDATE u WITH { foobar: true } IN users OPTIONS { waitForSync: true } +``` + +!SUBSECTION Returning the modified documents + +The modified documents can also be returned by the query. In this case, the `UPDATE` +statement needs to be followed a `RETURN` statement (intermediate `LET` statements +are allowed, too). These statements can refer to the pseudo-values `OLD` and `NEW`. +The `OLD` pseudo-value refers to the document revisions before the update, and `NEW` +refers to document revisions after the update. + +Both `OLD` and `NEW` will contain all document attributes, even those not specified +in the update expression. + +``` +UPDATE document IN collection options RETURN OLD +UPDATE document IN collection options RETURN NEW +UPDATE key-expression WITH document IN collection options RETURN OLD +UPDATE key-expression WITH document IN collection options RETURN NEW +``` + +Following is an example using a variable named `previous` to capture the original +documents before modification. For each modified document, the document key is returned. + +``` +FOR u IN users + UPDATE u WITH { value: "test" } + LET previous = OLD + RETURN previous._key +``` + +The following query uses the `NEW` pseudo-value to return the updated documents, +without some of the system attributes: + +``` +FOR u IN users + UPDATE u WITH { value: "test" } + LET updated = NEW + RETURN UNSET(updated, "_key", "_id", "_rev") +``` + +It is also possible to return both `OLD` and `NEW`: + +``` +FOR u IN users + UPDATE u WITH { value: "test" } + RETURN { before: OLD, after: NEW } +``` + diff --git a/Documentation/Books/Users/AqlOperations/Upsert.mdpp b/Documentation/Books/Users/AqlOperations/Upsert.mdpp new file mode 100644 index 0000000000..02f869d0ed --- /dev/null +++ b/Documentation/Books/Users/AqlOperations/Upsert.mdpp @@ -0,0 +1,98 @@ +!CHAPTER UPSERT + +The *UPSERT* keyword can be used for checking whether certain documents exist, +and to update them in case they exist, or create them in case they do not exist. +On a single server, upserts are executed transactionally in an all-or-nothing fashion. +For sharded collections, the entire update operation is not transactional. + +Each *UPSERT* operation is restricted to a single collection, and the +[collection name](../Glossary/README.md#collection-name) must not be dynamic. +Only a single *UPSERT* statement per collection is allowed per AQL query, and +it cannot be followed by read operations that access the same collection, by +traversal operations, or AQL functions that can read documents. + +The syntax for an upsert operation is: + +``` +UPSERT search-expression INSERT insert-expression UPDATE update-expression IN collection options +UPSERT search-expression INSERT insert-expression REPLACE update-expression IN collection options +``` + +When using the *UPDATE* variant of the upsert operation, the found document will be +partially updated, meaning only the attributes specified in *update-expression* will be +updated or added. When using the *REPLACE* variant of upsert, existing documents will +be replaced with the contexts of *update-expression*. + +Updating a document will modify the document's revision number with a server-generated value. +The system attributes of documents (*_key*, *_id*, *_from*, *_to*, *_rev*) cannot be +updated. + +The *search-expression* contains the document to be looked for. It must be an object +literal without dynamic attribute names. In case no such document can be found in +*collection*, a new document will be inserted into the collection as specified in the +*insert-expression*. + +In case at least one document in *collection* matches the *search-expression*, it will +be updated using the *update-expression*. When more than one document in the collection +matches the *search-expression*, it is undefined which of the matching documents will +be updated. It is therefore often sensible to make sure by other means (such as unique +indexes, application logic etc.) that at most one document matches *search-expression*. + +The following query will look in the *users* collection for a document with a specific +*name* attribute value. If the document exists, its *logins* attribute will be increased +by one. If it does not exist, a new document will be inserted, consisting of the +attributes *name*, *logins*, and *dateCreated*: + +``` +UPSERT { name: 'superuser' } +INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() } +UPDATE { logins: OLD.logins + 1 } IN users +``` + +Note that in the *UPDATE* case it is possible to refer to the previous version of the +document using the *OLD* pseudo-value. + + +!SUBSECTION Setting query options + +As in several above examples, the *ignoreErrors* option can be used to suppress query +errors that may occur when trying to violate unique key constraints. + +When updating or replacing an attribute with a null value, ArangoDB will not remove the +attribute from the document but store a null value for it. To get rid of attributes in +an upsert operation, set them to null and provide the *keepNull* option. + +There is also the option *mergeObjects* that controls whether object contents will be +merged if an object attribute is present in both the *UPDATE* query and in the +to-be-updated document. + +Note: the default value for *mergeObjects* is *true*, so there is no need to specify it +explicitly. + +To make sure data are durable when an update query returns, there is the *waitForSync* +query option. + + +!SUBSECTION Returning documents + +`UPSERT` statements can optionally return data. To do so, they need to be followed +by a `RETURN` statement (intermediate `LET` statements are allowed, too). These statements +can optionally perform calculations and refer to the pseudo-values `OLD` and `NEW`. +In case the upsert performed an insert operation, `OLD` will have a value of *null*. +In case the upsert performed an update or replace operation, `OLD` will contain the +previous version of the document, before update/replace. + +`NEW` will always be populated. It will contain the inserted document in case the +upsert performed an insert, or the updated/replaced document in case it performed an +update/replace. + +This can also be used to check whether the upsert has performed an insert or an update +internally: + +``` +UPSERT { name: 'superuser' } +INSERT { name: 'superuser', logins: 1, dateCreated: DATE_NOW() } +UPDATE { logins: OLD.logins + 1 } IN users +RETURN { doc: NEW, type: OLD ? 'update' : 'insert' } +``` + diff --git a/Documentation/Books/Users/SUMMARY.md b/Documentation/Books/Users/SUMMARY.md index 222eba8fa7..7800fce219 100644 --- a/Documentation/Books/Users/SUMMARY.md +++ b/Documentation/Books/Users/SUMMARY.md @@ -88,6 +88,18 @@ * [Query Results](Aql/QueryResults.md) * [Operators](Aql/Operators.md) * [High level Operations](Aql/Operations.md) + * [For](AqlOperations/For.md) + * [Return](AqlOperations/Return.md) + * [Filter](AqlOperations/Filter.md) + * [Sort](AqlOperations/Sort.md) + * [Limit](AqlOperations/Limit.md) + * [Let](AqlOperations/Let.md) + * [Collect](AqlOperations/Collect.md) + * [Remove](AqlOperations/Remove.md) + * [Update](AqlOperations/Update.md) + * [Replace](AqlOperations/Replace.md) + * [Insert](AqlOperations/Insert.md) + * [Upsert](AqlOperations/Upsert.md) * [Graphs](Aql/Graphs.md) * [Traversal](Aql/GraphTraversals.md) * [Named Operations](Aql/GraphOperations.md) @@ -246,3 +258,4 @@ * [Attribute Names](NamingConventions/AttributeNames.md) * [Error codes and meanings](ErrorCodes/README.md) * [Glossary](Glossary/README.md) +