!CHAPTER High-level operations !SUBSECTION FOR The *FOR* keyword can be to iterate over all elements of an array. The general syntax is: ``` FOR variable-name IN expression ``` Each array element returned by *expression* is visited exactly once. It is required that *expression* returns an array in all cases. The empty array is allowed, too. The current array element is made available for further processing in the variable specified by *variable-name*. ``` FOR u IN users RETURN u ``` This will iterate over all elements from the array *users* (note: this array consists of all documents from the collection named "users" in this case) and make the current array element available in variable *u*. *u* is not modified in this example but simply pushed into the result using the *RETURN* keyword. Note: When iterating over collection-based arrays as shown here, the order of documents is undefined unless an explicit sort order is defined using a *SORT* statement. The variable introduced by *FOR* is available until the scope the *FOR* is placed in is closed. Another example that uses a statically declared array of values to iterate over: ``` FOR year IN [ 2011, 2012, 2013 ] RETURN { "year" : year, "isLeapYear" : year % 4 == 0 && (year % 100 != 0 || year % 400 == 0) } ``` Nesting of multiple *FOR* statements is allowed, too. When *FOR* statements are nested, a cross product of the array elements returned by the individual *FOR* statements will be created. ``` FOR u IN users FOR l IN locations RETURN { "user" : u, "location" : l } ``` In this example, there are two array iterations: an outer iteration over the array *users* plus an inner iteration over the arry *locations*. The inner array is traversed as many times as there are elements in the outer array. For each iteration, the current values of *users* and *locations* are made available for further processing in the variable *u* and *l*. !SUBSECTION RETURN The *RETURN* statement can (and must) be used to produce the result of a query. It is mandatory to specify a *RETURN* statement at the end of each block in a query, otherwise the query result would be undefined. The general syntax for *return* is: ``` RETURN expression ``` The *expression* returned by *RETURN* is produced for each iteration the *RETURN* statement is placed in. That means the result of a *RETURN* statement is always an array (this includes the empty array). To return all elements from the currently iterated array without modification, the following simple form can be used: ``` FOR variable-name IN expression RETURN variable-name ``` As *RETURN* allows specifying an expression, arbitrary computations can be performed to calculate the result elements. Any of the variables valid in the scope the *RETURN* is placed in can be used for the computations. Note: Return will close the current scope and eliminate all local variables in it. !SUBSECTION FILTER The *FILTER* statement can be used to restrict the results to elements that match an arbitrary logical condition. The general syntax is: ``` FILTER condition ``` *condition* must be a condition that evaluates to either *false* or *true*. If the condition result is false, the current element is skipped, so it will not be processed further and not be part of the result. If the condition is true, the current element is not skipped and can be further processed. ``` FOR u IN users FILTER u.active == true && u.age < 39 RETURN u ``` In the above example, all array elements from *users* will be included that have an attribute *active* with value *true* and that have an attribute *age* with a value less than *39* (including *null* ones). All other elements from *users* will be skipped and not be included the result produced by *RETURN*. It is allowed to specify multiple *FILTER* statements in a query, and even in the same block. If multiple *FILTER* statements are used, their results will be combined with a logical and, meaning all filter conditions must be true to include an element. ``` FOR u IN users FILTER u.active == true FILTER u.age < 39 RETURN u ``` !SUBSECTION SORT The *SORT* statement will force a sort of the array of already produced intermediate results in the current block. *SORT* allows specifying one or multiple sort criteria and directions. The general syntax is: ``` SORT expression direction ``` Specifying the *direction* is optional. The default (implicit) direction for a sort is the ascending order. To explicitly specify the sort direction, the keywords *ASC* (ascending) and *DESC* can be used. Multiple sort criteria can be separated using commas. Note: when iterating over collection-based arrays, the order of documents is always undefined unless an explicit sort order is defined using *SORT*. ``` FOR u IN users SORT u.lastName, u.firstName, u.id DESC RETURN u ``` !SUBSECTION LIMIT The *LIMIT* statement allows slicing the result array using an offset and a count. It reduces the number of elements in the result to at most the specified number. Two general forms of *LIMIT* are followed: ``` LIMIT count LIMIT offset, count ``` The first form allows specifying only the *count* value whereas the second form allows specifying both *offset* and *count*. The first form is identical using the second form with an *offset* value of *0*. The *offset* value specifies how many elements from the result shall be discarded. It must be 0 or greater. The *count* value specifies how many elements should be at most included in the result. ``` FOR u IN users SORT u.firstName, u.lastName, u.id DESC LIMIT 0, 5 RETURN u ``` !SUBSECTION LET The *LET* statement can be used to assign an arbitrary value to a variable. The variable is then introduced in the scope the *LET* statement is placed in. The general syntax is: ``` LET variable-name = expression ``` *LET* statements are mostly used to declare complex computations and to avoid repeated computations of the same value at multiple parts of a query. ``` FOR u IN users LET numRecommendations = LENGTH(u.recommendations) RETURN { "user" : u, "numRecommendations" : numRecommendations, "isPowerUser" : numRecommendations >= 10 } ``` In the above example, the computation of the number of recommendations is factored out using a *LET* statement, thus avoiding computing the value twice in the *RETURN* statement. Another use case for *LET* is to declare a complex computation in a subquery, making the whole query more readable. ``` FOR u IN users LET friends = ( FOR f IN friends FILTER u.id == f.userId RETURN f ) LET memberships = ( FOR m IN memberships FILTER u.id == m.userId RETURN m ) RETURN { "user" : u, "friends" : friends, "numFriends" : LENGTH(friends), "memberShips" : memberships } ``` !SUBSECTION COLLECT The *COLLECT* keyword can be used to group an array by one or multiple group criteria. The *COLLECT* statement will eliminate all local variables in the current scope. After *COLLECT* only the variables introduced by *COLLECT* itself are available. The general syntaxes for *COLLECT* are: ``` COLLECT variable-name = expression COLLECT variable-name = expression INTO groups-variable COLLECT variable-name = expression INTO groups-variable = projection-expression COLLECT variable-name = expression INTO groups-variable KEEP keep-variable COLLECT variable-name = expression WITH COUNT INTO count-variable COLLECT WITH COUNT INTO count-variable ``` Note: the first four forms of *COLLECT* require their input be sorted by the group criteria. To ensure correctness of the result, the AQL optimizer will automatically insert a *SORT* statement into the query in front of the *COLLECT* statement. The optimizer can optimize away the *SORT* statement later if a sorted index is present on the group criteria. This will make *COLLECT* operations more efficient. !SUBSUBSECTION Grouping syntaxes The first syntax form of *COLLECT* only groups the result by the defined group criteria specified in *expression*. In order to further process the results produced by *COLLECT*, a new variable (specified by *variable-name*) is introduced. This variable contains the group value. Here's an example query that find the distinct values in *u.city* and makes them available in variable *city*: ``` FOR u IN users COLLECT city = u.city RETURN { "city" : city } ``` The second form does the same as the first form, but additionally introduces a variable (specified by *groups-variable*) that contains all elements that fell into the group. This works as follows: The *groups-variable* variable is an array containing as many elements as there are in the group. Each member of that array is a JSON object in which the value of every variable that is defined in the AQL query is bound to the corresponding attribute. Note that this considers all variables that are defined before the *COLLECT* statement, but not those on the top level (outside of any *FOR*), unless the *COLLECT* statement is itself on the top level, in which case all variables are taken. Furthermore note that it is possible that the optimizer moves *LET* statements out of *FOR* statements to improve performance. ``` FOR u IN users COLLECT city = u.city INTO groups RETURN { "city" : city, "usersInCity" : groups } ``` In the above example, the array *users* will be grouped by the attribute *city*. The result is a new array of documents, with one element per distinct *u.city* value. The elements from the original array (here: *users*) per city are made available in the variable *groups*. This is due to the *INTO* clause. *COLLECT* also allows specifying multiple group criteria. Individual group criteria can be separated by commas: ``` FOR u IN users COLLECT country = u.country, city = u.city INTO groups RETURN { "country" : country, "city" : city, "usersInCity" : groups } ``` In the above example, the array *users* is grouped by country first and then by city, and for each distinct combination of country and city, the users will be returned. !SUBSUBSECTION Discarding obsolete variables The third form of *COLLECT* allows rewriting the contents of the *groups-variable* using an arbitrary *projection-expression*: ``` FOR u IN users COLLECT country = u.country, city = u.city INTO groups = u.name RETURN { "country" : country, "city" : city, "userNames" : groups } ``` In the above example, only the *projection-expression* is *u.name*. Therefore, only this attribute is copied into the *groups-variable* for each document. This is probably much more efficient than copying all variables from the scope into the *groups-variable* as it would happen without a *projection-expression*. The expression following *INTO* can also be used for arbitrary computations: ``` FOR u IN users COLLECT country = u.country, city = u.city INTO groups = { "name" : u.name, "isActive" : u.status == "active" } RETURN { "country" : country, "city" : city, "usersInCity" : groups } ``` *COLLECT* also provides an optional *KEEP* clause that can be used to control which variables will be copied into the variable created by `INTO`. If no *KEEP* clause is specified, all variables from the scope will be copied as sub-attributes into the *groups-variable*. This is safe but can have a negative impact on performance if there are many variables in scope or the variables contain massive amounts of data. The following example limits the variables that are copied into the *groups-variable* to just *name*. The variables *u* and *someCalculation* also present in the scope will not be copied into *groups-variable* because they are not listed in the *KEEP* clause: ``` FOR u IN users LET name = u.name LET someCalculation = u.value1 + u.value2 COLLECT city = u.city INTO groups KEEP name RETURN { "city" : city, "userNames" : groups[*].name } ``` *KEEP* is only valid in combination with *INTO*. Only valid variable names can be used in the *KEEP* clause. *KEEP* supports the specification of multiple variable names. !SUBSUBSECTION Group length calculation *COLLECT* also provides a special *WITH COUNT* clause that can be used to determine the number of group members efficiently. The simplest form just returns the number of items that made it into the *COLLECT*: ``` FOR u IN users COLLECT WITH COUNT INTO length RETURN length ``` The above is equivalent to, but more efficient than: ``` RETURN LENGTH( FOR u IN users RETURN length ) ``` The *WITH COUNT* clause can also be used to efficiently count the number of items in each group: ``` FOR u IN users COLLECT age = u.age WITH COUNT INTO length RETURN { "age" : age, "count" : length } ``` Note: the *WITH COUNT* clause can only be used together with an *INTO* clause. !SUBSECTION REMOVE The *REMOVE* keyword can be used to remove documents from a collection. On a single server, the document removal is executed transactionally in an all-or-nothing fashion. For sharded collections, the entire remove operation is not transactional. Only a single *REMOVE* statement is allowed per AQL query, and it cannot be combined with other data-modification or retrieval operations. A remove operation is restricted to a single collection, and the [collection name](../Glossary/README.html#collection_name) must not be dynamic. The syntax for a remove operation is: ``` REMOVE key-expression IN collection options ``` *collection* must contain the name of the collection to remove the documents from. *key-expression* must be an expression that contains the document identification. This can either be a string (which must then contain the [document key](../Glossary/README.html#document_key)) or a document, which must contain a *_key* attribute. The following queries are thus equivalent: ``` FOR u IN users REMOVE { _key: u._key } IN users FOR u IN users REMOVE u._key IN users FOR u IN users REMOVE u IN users ``` **Note**: A remove operation can remove arbitrary documents, and the documents do not need to be identical to the ones produced by a preceding *FOR* statement: ``` FOR i IN 1..1000 REMOVE { _key: CONCAT('test', TO_STRING(i)) } IN users FOR u IN users FILTER u.active == false REMOVE { _key: u._key } IN backup ``` *options* can be used to suppress query errors that may occur when trying to remove non-existing documents. For example, the following query will fail if one of the to-be-deleted documents does not exist: ``` FOR i IN 1..1000 REMOVE { _key: CONCAT('test', TO_STRING(i)) } IN users ``` By specifying the *ignoreErrors* query option, these errors can be suppressed so the query completes: ``` FOR i IN 1..1000 REMOVE { _key: CONCAT('test', TO_STRING(i)) } IN users OPTIONS { ignoreErrors: true } ``` To make sure data has been written to disk when a query returns, there is the *waitForSync* query option: ``` FOR i IN 1..1000 REMOVE { _key: CONCAT('test', TO_STRING(i)) } IN users OPTIONS { waitForSync: true } ``` !SUBSECTION UPDATE The *UPDATE* keyword can be used to partially update documents in a collection. On a single server, updates are executed transactionally in an all-or-nothing fashion. For sharded collections, the entire update operation is not transactional. Only a single *UPDATE* statement is allowed per AQL query, and it cannot be combined with other data-modification or retrieval operations. An update operation is restricted to a single collection, and the collection name must not be dynamic. The two syntaxes for an update operation are: ``` UPDATE document IN collection options UPDATE key-expression WITH document IN collection options ``` *collection* must contain the name of the collection in which the documents should be updated. *document* must be a document that contains the attributes and values to be updated. When using the first syntax, *document* must also contain the *_key* attribute to identify the document to be updated. ``` FOR u IN users UPDATE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users ``` The following query is invalid because it does not contain a *_key* attribute and thus it is not possible to determine the documents to be updated: ``` FOR u IN users UPDATE { name: CONCAT(u.firstName, u.lastName) } IN users ``` When using the second syntax, *key-expression* provides the document identification. This can either be a string (which must then contain the document key) or a document, which must contain a *_key* attribute. The following queries are equivalent: ``` FOR u IN users UPDATE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users FOR u IN users UPDATE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users FOR u IN users UPDATE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users ``` An update operation may update arbitrary documents which do not need to be identical to the ones produced by a preceding *FOR* statement: ``` FOR i IN 1..1000 UPDATE CONCAT('test', TO_STRING(i)) WITH { foobar: true } IN users FOR u IN users FILTER u.active == false UPDATE u WITH { status: 'inactive' } IN backup ``` *options* can be used to suppress query errors that may occur when trying to update non-existing documents or violating unique key constraints: ``` FOR i IN 1..1000 UPDATE { _key: CONCAT('test', TO_STRING(i)) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } ``` An update operation will only update the attributes specified in *document* and leave other attributes untouched. Internal attributes (such as *_id*, *_key*, *_rev*, *_from* and *_to*) cannot be updated and are ignored when specified in *document*. Updating a document will modify the document's revision number with a server-generated value. When updating an attribute with a null value, ArangoDB will not remove the attribute from the document but store a null value for it. To get rid of attributes in an update operation, set them to null and provide the *keepNull* option: ``` FOR u IN users UPDATE u WITH { foobar: true, notNeeded: null } IN users OPTIONS { keepNull: false } ``` The above query will remove the *notNeeded* attribute from the documents and update the *foobar* attribute normally. There is also the option *mergeObjects* that controls whether object contents will be merged if an object attribute is merged if it is present in both the *UPDATE* query and in the to-be-updated document. The following query will set the updated document's *name* attribute to the exact same value that is specified in the query. This is due to the *mergeObjects* option being set to *false*: ``` FOR u IN users UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: false } ``` Contrary, the following query will merge the contents of the *name* attribute in the original document with the value specified in the query: ``` FOR u IN users UPDATE u WITH { name: { first: "foo", middle: "b.", last: "baz" } } IN users OPTIONS { mergeObjects: true } ``` Attributes in *name* that are present in the to-be-updated document but not in the query will now be preserved. Attributes that are present in both will be overwritten with the values specified in the query. Note: the default value for *mergeObjects* is *true*, so there is no need to specify it explicitly. To make sure data are durable when an update query returns, there is the *waitForSync* query option: ``` FOR u IN users UPDATE u WITH { foobar: true } IN users OPTIONS { waitForSync: true } ``` !SUBSECTION REPLACE The *REPLACE* keyword can be used to completely replace documents in a collection. On a single server, the replace operation is executed transactionally in an all-or-nothing fashion. For sharded collections, the entire replace operation is not transactional. Only a single *REPLACE* statement is allowed per AQL query, and it cannot be combined with other data-modification or retrieval operations. A replace operation is restricted to a single collection, and the collection name must not be dynamic. The two syntaxes for a replace operation are: ``` REPLACE document IN collection options REPLACE key-expression WITH document IN collection options ``` *collection* must contain the name of the collection in which the documents should be replaced. *document* is the replacement document. When using the first syntax, *document* must also contain the *_key* attribute to identify the document to be replaced. ``` FOR u IN users REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName), status: u.status } IN users ``` The following query is invalid because it does not contain a *_key* attribute and thus it is not possible to determine the documents to be replaced: ``` FOR u IN users REPLACE { name: CONCAT(u.firstName, u.lastName, status: u.status) } IN users ``` When using the second syntax, *key-expression* provides the document identification. This can either be a string (which must then contain the document key) or a document, which must contain a *_key* attribute. The following queries are equivalent: ``` FOR u IN users REPLACE { _key: u._key, name: CONCAT(u.firstName, u.lastName) } IN users FOR u IN users REPLACE u._key WITH { name: CONCAT(u.firstName, u.lastName) } IN users FOR u IN users REPLACE { _key: u._key } WITH { name: CONCAT(u.firstName, u.lastName) } IN users FOR u IN users REPLACE u WITH { name: CONCAT(u.firstName, u.lastName) } IN users ``` A replace will fully replace an existing document, but it will not modify the values of internal attributes (such as *_id*, *_key*, *_from* and *_to*). Replacing a document will modify a document's revision number with a server-generated value. A replace operation may update arbitrary documents which do not need to be identical to the ones produced by a preceding *FOR* statement: ``` FOR i IN 1..1000 REPLACE CONCAT('test', TO_STRING(i)) WITH { foobar: true } IN users FOR u IN users FILTER u.active == false REPLACE u WITH { status: 'inactive', name: u.name } IN backup ``` *options* can be used to suppress query errors that may occur when trying to replace non-existing documents or when violating unique key constraints: ``` FOR i IN 1..1000 REPLACE { _key: CONCAT('test', TO_STRING(i)) } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } ``` To make sure data are durable when a replace query returns, there is the *waitForSync* query option: ``` FOR i IN 1..1000 REPLACE { _key: CONCAT('test', TO_STRING(i)) } WITH { foobar: true } IN users OPTIONS { waitForSync: true } ``` !SUBSECTION INSERT The *INSERT* keyword can be used to insert new documents into a collection. On a single server, an insert operation is executed transactionally in an all-or-nothing fashion. For sharded collections, the entire insert operation is not transactional. Only a single *INSERT* statement is allowed per AQL query, and it cannot be combined with other data-modification or retrieval operations. An insert operation is restricted to a single collection, and the collection name must not be dynamic. The syntax for an insert operation is: ``` INSERT document IN collection options ``` **Note**: The *INTO* keyword is also allowed in the place of *IN*. *collection* must contain the name of the collection into which the documents should be inserted. *document* is the document to be inserted, and it may or may not contain a *_key* attribute. If no *_key* attribute is provided, ArangoDB will auto-generate a value for *_key* value. Inserting a document will also auto-generate a document revision number for the document. ``` FOR i IN 1..100 INSERT { value: i } IN numbers ``` When inserting into an [edge collection](../Glossary/README.html#edge_collection), it is mandatory to specify the attributes *_from* and *_to* in document: ``` FOR u IN users FOR p IN products FILTER u._key == p.recommendedBy INSERT { _from: u._id, _to: p._id } IN recommendations ``` *options* can be used to suppress query errors that may occur when violating unique key constraints: ``` FOR i IN 1..1000 INSERT { _key: CONCAT('test', TO_STRING(i)), name: "test" } WITH { foobar: true } IN users OPTIONS { ignoreErrors: true } ``` To make sure data are durable when an insert query returns, there is the *waitForSync* query option: ``` FOR i IN 1..1000 INSERT { _key: CONCAT('test', TO_STRING(i)), name: "test" } WITH { foobar: true } IN users OPTIONS { waitForSync: true } ```