!CHAPTER Language basics !SUBSECTION Query types An AQL query must either return a result (indicated by usage of the *RETURN* keyword) or execute a data-modification operation (indicated by usage of one of the keywords *INSERT*, *UPDATE*, *REPLACE* or *REMOVE*). The AQL parser will return an error if it detects more than one data-modification operation in the same query or if it cannot figure out if the query is meant to be a data retrieval or a modification operation. AQL only allows *one* query in a single query string; thus semicolons to indicate the end of one query and separate multiple queries (as seen in SQL) are not allowed. !SUBSECTION Whitespace Whitespaces (blanks, carriage returns, line feeds, and tab stops) can be used in the query text to increase its readability. Tokens have to be separated by any number of whitespaces. Whitespace within strings or names must be enclosed in quotes in order to be preserved. !SUBSECTION Comments Comments can be embedded at any position in a query. The text contained in the comment is ignored by the AQL parser. Multi-line comments cannot be nested, which means subsequent comment starts within comments are ignored, comment ends will end the comment. AQL supports two types of comments: - Single line comments: These start with a double forward slash and end at the end of the line, or the end of the query string (whichever is first). - Multi line comments: These start with a forward slash and asterisk, and end with an asterisk and a following forward slash. They can span as many lines as necessary. /* this is a comment */ RETURN 1 /* these */ RETURN /* are */ 1 /* multiple */ + /* comments */ 1 /* this is a multi line comment */ // a single line comment !SUBSECTION Keywords On the top level, AQL offers the following operations: - `FOR`: array iteration - `RETURN`: results projection - `FILTER`: results filtering - `SORT`: result sorting - `LIMIT`: result slicing - `LET`: variable assignment - `COLLECT`: result grouping - `INSERT`: insertion of new documents - `UPDATE`: (partial) update of existing documents - `REPLACE`: replacement of existing documents - `REMOVE`: removal of existing documents - `UPSERT`: insertion or update of existing documents Each of the above operations can be initiated in a query by using a keyword of the same name. An AQL query can (and typically does) consist of multiple of the above operations. An example AQL query may look like this: FOR u IN users FILTER u.type == "newbie" && u.active == true RETURN u.name In this example query, the terms *FOR*, *FILTER*, and *RETURN* initiate the higher-level operation according to their name. These terms are also keywords, meaning that they have a special meaning in the language. For example, the query parser will use the keywords to find out which high-level operations to execute. That also means keywords can only be used at certain locations in a query. This also makes all keywords reserved words that must not be used for other purposes than they are intended for. For example, it is not possible to use a keyword as a collection or attribute name. If a collection or attribute need to have the same name as a keyword, the collection or attribute name needs to be quoted. Keywords are case-insensitive, meaning they can be specified in lower, upper, or mixed case in queries. In this documentation, all keywords are written in upper case to make them distinguishable from other query parts. In addition to the higher-level operations keywords, there are other keywords. The current list of keywords is: - FOR - RETURN - FILTER - SORT - LIMIT - LET - COLLECT - INSERT - UPDATE - REPLACE - REMOVE - UPSERT - WITH - ASC - DESC - IN - INTO - NOT - AND - OR - NULL - TRUE - FALSE Additional keywords may be added in future versions of ArangoDB. !SUBSECTION Names In general, names are used to identify objects (collections, attributes, variables, and functions) in AQL queries. The maximum supported length of any name is 64 bytes. Names in AQL are always case-sensitive. Keywords must not be used as names. If a reserved keyword should be used as a name, the name must be enclosed in backticks. Enclosing a name in backticks makes it possible to use otherwise reserved keywords as names. An example for this is: FOR f IN `filter` RETURN f.`sort` Due to the backticks, *filter* and *sort* are interpreted as names and not as keywords here. !SUBSUBSECTION Collection names Collection names can be used in queries as they are. If a collection happens to have the same name as a keyword, the name must be enclosed in backticks. Please refer to the [Naming Conventions in ArangoDB](../NamingConventions/CollectionNames.md) about collection naming conventions. !SUBSUBSECTION Attribute names When referring to attributes of documents from a collection, the fully qualified attribute name must be used. This is because multiple collections with ambiguous attribute names may be used in a query. To avoid any ambiguity, it is not allowed to refer to an unqualified attribute name. Please refer to the [Naming Conventions in ArangoDB](../NamingConventions/AttributeNames.md) for more information about the attribute naming conventions. FOR u IN users FOR f IN friends FILTER u.active == true && f.active == true && u.id == f.userId RETURN u.name In the above example, the attribute names *active*, *name*, *id*, and *userId* are qualified using the collection names they belong to (*u* and *f* respectively). !SUBSUBSECTION Variable names AQL allows the user to assign values to additional variables in a query. All variables that are assigned a value must have a name that is unique within the context of the query. Variable names must be different from the names of any collection name used in the same query. FOR u IN users LET friends = u.friends RETURN { "name" : u.name, "friends" : friends } In the above query, *users* is a collection name, and both *u* and *friends* are variable names. This is because the *FOR* and *LET* operations need target variables to store their intermediate results. Allowed characters in variable names are the letters *a* to *z* (both in lower and upper case), the numbers *0* to *9* and the underscore (*_*) symbol. A variable name must not start with a number. If a variable name starts with the underscore character, it must also contain at least one letter (a-z or A-Z). !SUBSECTION Data types AQL supports both primitive and compound data types. The following types are available: - Primitive types: Consisting of exactly one value - null: An empty value, also: The absence of a value - bool: Boolean truth value with possible values *false* and *true* - number: Signed (real) number - string: UTF-8 encoded text value - Compound types: Consisting of multiple values - array: Sequence of values, referred to by their positions - object / document: Sequence of values, referred to by their names !SUBSUBSECTION Numeric literals Numeric literals can be integers or real values. They can optionally be signed using the *+* or *-* symbols. The scientific notation is also supported. 1 42 -1 -42 1.23 -99.99 0.1 -4.87e103 All numeric values are treated as 64-bit double-precision values internally. The internal format used is IEEE 754. !SUBSUBSECTION String literals String literals must be enclosed in single or double quotes. If the used quote character is to be used itself within the string literal, it must be escaped using the backslash symbol. Backslash literals themselves also be escaped using a backslash. "yikes!" "don't know" "this is a \"quoted\" word" "this is a longer string." "the path separator on Windows is \\" 'yikes!' 'don\'t know' 'this is a longer string." 'the path separator on Windows is \\' All string literals must be UTF-8 encoded. It is currently not possible to use arbitrary binary data if it is not UTF-8 encoded. A workaround to use binary data is to encode the data using base64 or other algorithms on the application side before storing, and decoding it on application side after retrieval. !SUBSUBSECTION Arrays AQL supports two compound types: - arrays: A composition of unnamed values, each accessible by their positions - objects / documents: A composition of named values, each accessible by their names The first supported compound type is the array type. Arrays are effectively sequences of (unnamed / anonymous) values. Individual array elements can be accessed by their positions. The order of elements in an array is important. An *array-declaration* starts with the *[* symbol and ends with the *]* symbol. An *array-declaration* contains zero or many *expression*s, separated from each other with the *,* symbol. In the easiest case, an array is empty and thus looks like: [ ] Array elements can be any legal *expression* values. Nesting of arrays is supported. [ 1, 2, 3 ] [ -99, "yikes!", [ true, [ "no"], [ ] ], 1 ] [ [ "fox", "marshal" ] ] Individual array values can later be accesses bd their positions using the *[]* accessor. The position of the accessed element must be a numeric value. Positions start at 0. It is also possible to use negative index values to access array values starting from the end of the array. This is convenient if the length of the array is unknown and access to elements at the end of the array is required. // access 1st array element (element start at index 0) u.friends[0] // access 3rd array element u.friends[2] // access last array element u.friends[-1] // access second last array element u.friends[-2] !SUBSUBSECTION Objects / Documents The other supported compound type is the object (or document) type. Objects are a composition of zero to many attributes. Each attribute is a name/value pair. Object attributes can be accessed individually by their names. Object declarations start with the *{* symbol and end with the *}* symbol. An object contains zero to many attribute declarations, separated from each other with the *,* symbol. In the simplest case, an object is empty. Its declaration would then be: { } Each attribute in an object is a name / value pair. Name and value of an attribute are separated using the *:* symbol. The attribute name is mandatory and must be specified as a quoted or unquoted string. If a keyword is to be used as an attribute name, the name must be quoted. Any valid expression can be used as an attribute value. That also means nested objects can be used as attribute values: { name : "Peter" } { "name" : "Vanessa", "age" : 15 } { "name" : "John", likes : [ "Swimming", "Skiing" ], "address" : { "street" : "Cucumber lane", "zip" : "94242" } } Individual object attributes can later be accessed by their names using the *.* accessor. If a non-existing attribute is accessed, the result is *null*. u.address.city.name u.friends[0].name.first !SUBSECTION Bind parameters AQL supports the usage of bind parameters, thus allowing to separate the query text from literal values used in the query. It is good practice to separate the query text from the literal values because this will prevent (malicious) injection of keywords and other collection names into an existing query. This injection would be dangerous because it may change the meaning of an existing query. Using bind parameters, the meaning of an existing query cannot be changed. Bind parameters can be used everywhere in a query where literals can be used. The syntax for bind parameters is *@nameparameter* where *nameparameter* is the actual parameter name. The bind parameter values need to be passed along with the query when it is executed, but not as part of the query text itself. FOR u IN users FILTER u.id == @id && u.name == @nameparameter RETURN u Bind parameter names must start with any of the letters *a* to *z* (both in lower and upper case) or a digit (*0* to *9*), and can be followed by any letter, digit or the underscore symbol. A special type of bind parameter exists for injecting collection names. This type of bind parameter has a name prefixed with an additional *@* symbol (thus when using the bind parameter in a query, two *@* symbols must be used). FOR u IN @@collection FILTER u.active == true RETURN u !SUBSECTION Type and value order When checking for equality or inequality or when determining the sort order of values, AQL uses a deterministic algorithm that takes both the data types and the actual values into account. The compared operands are first compared by their data types, and only by their data values if the operands have the same data types. The following type order is used when comparing data types: null < bool < number < string < array < object / document This means *null* is the smallest type in AQL and *document* is the type with the highest order. If the compared operands have a different type, then the comparison result is determined and the comparison is finished. For example, the boolean *true* value will always be less than any numeric or string value, any array (even an empty array) or any object / document. Additionally, any string value (even an empty string) will always be greater than any numeric value, a boolean value, *true* or *false*. null < false null < true null < 0 null < '' null < ' ' null < '0' null < 'abc' null < [ ] null < { } false < true false < 0 false < '' false < ' ' false < '0' false < 'abc' false < [ ] false < { } true < 0 true < '' true < ' ' true < '0' true < 'abc' true < [ ] true < { } 0 < '' 0 < ' ' 0 < '0' 0 < 'abc' 0 < [ ] 0 < { } '' < ' ' '' < '0' '' < 'abc' '' < [ ] '' < { } [ ] < { } If the two compared operands have the same data types, then the operands values are compared. For the primitive types (null, boolean, number, and string), the result is defined as follows: - null: *null* is equal to *null* - boolean: *false* is less than *true* - number: numeric values are ordered by their cardinal value - string: string values are ordered using a localized comparison, Note: unlike in SQL, *null* can be compared to any value, including *null* itself, without the result being converted into *null* automatically. For compound, types the following special rules are applied: Two array values are compared by comparing their individual elements position by position, starting at the first element. For each position, the element types are compared first. If the types are not equal, the comparison result is determined, and the comparison is finished. If the types are equal, then the values of the two elements are compared. If one of the arrays is finished and the other array still has an element at a compared position, then *null* will be used as the element value of the fully traversed array. If an array element is itself a compound value (an array or an object / document), then the comparison algorithm will check the element's sub values recursively. The element's sub-elements are compared recursively. [ ] < [ 0 ] [ 1 ] < [ 2 ] [ 1, 2 ] < [ 2 ] [ 99, 99 ] < [ 100 ] [ false ] < [ true ] [ false, 1 ] < [ false, '' ] Two object / documents operands are compared by checking attribute names and value. The attribute names are compared first. Before attribute names are compared, a combined array of all attribute names from both operands is created and sorted lexicographically. This means that the order in which attributes are declared in an object / document is not relevant when comparing two objects / documents. The combined and sorted array of attribute names is then traversed, and the respective attributes from the two compared operands are then looked up. If one of the objects / documents does not have an attribute with the sought name, its attribute value is considered to be *null*. Finally, the attribute value of both objects / documents is compared using the before mentioned data type and value comparison. The comparisons are performed for all object / document attributes until there is an unambiguous comparison result. If an unambiguous comparison result is found, the comparison is finished. If there is no unambiguous comparison result, the two compared objects / documents are considered equal. { } < { "a" : 1 } { } < { "a" : null } { "a" : 1 } < { "a" : 2 } { "b" : 1 } < { "a" : 0 } { "a" : { "c" : true } } < { "a" : { "c" : 0 } } { "a" : { "c" : true, "a" : 0 } } < { "a" : { "c" : false, "a" : 1 } } { "a" : 1, "b" : 2 } == { "b" : 2, "a" : 1 } !SUBSECTION Accessing data from collections Collection data can be accessed by specifying a collection name in a query. A collection can be understood as an array of documents, and that is how they are treated in AQL. Documents from collections are normally accessing using the *FOR* keyword. Note that when iterating over documents from a collection, the order of documents is undefined. To traverse documents in an explicit and deterministic order, the *SORT* keyword should be used in addition. Data in collections is stored in documents, with each document potentially having different attributes than other documents. This is true even for documents of the same collection. It is therefore quite normal to encounter documents that do not have some or all of the attributes that are queried in an AQL query. In this case, the non-existing attributes in the document will be treated as if they would exist with a value of *null*. That means that comparing a document attribute to *null* will return true if the document has the particular attribute and the attribute has a value of *null*, or that the document does not have the particular attribute at all. For example, the following query will return all documents from the collection *users* that have a value of *null* in the attribute *name*, plus all documents from *users* that do not have the *name* attribute at all: FOR u IN users FILTER u.name == null RETURN u Furthermore, *null* is less than any other value (excluding *null* itself). That means documents with non-existing attributes may be included in the result when comparing attribute values with the less than or less equal operators. For example, the following query will return all documents from the collection *users* that have an attribute *age* with a value less than *39*, but also all documents from the collection that do not have the attribute *age* at all. FOR u IN users FILTER u.age < 39 RETURN u This behavior should always be taken into account when writing queries.