!CHAPTER Grouping

To group results by arbitrary criteria, AQL provides the *COLLECT* keyword.
*COLLECT* performs grouping, but no aggregation. Aggregation can still be
added to the query if required.

!SUBSECTION Ensuring uniqueness

*COLLECT* can be used to make a result set unique. The following query will return each distinct
`age` attribute value only once:

```
FOR u IN users
  COLLECT age = u.age
  RETURN age
```

This query groups by the criterion value (*age*) only; the members of each group are not tracked.

Grouping can also be done on multiple levels using *COLLECT*:

```
FOR u IN users
  COLLECT status = u.status, age = u.age
  RETURN { status, age }
```

Alternatively, *RETURN DISTINCT* can be used to make a result set unique. *RETURN DISTINCT* supports a
single criterion only:

```
FOR u IN users
  RETURN DISTINCT u.age
```

Note: the order of results is undefined for *RETURN DISTINCT*.

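When a defined order is required, one option is to deduplicate with *COLLECT* and then sort explicitly. The following is a minimal sketch that assumes the same *users* collection as the examples above:

```
FOR u IN users
  COLLECT age = u.age
  SORT age ASC
  RETURN age
```

The explicit *SORT* makes the intended order part of the query instead of leaving it to the implementation.
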
!SUBSECTION Fetching group values

To group users by age, and return the names of the users with the highest ages,
we'll issue a query like this:

```
FOR u IN users
  FILTER u.active == true
  COLLECT age = u.age INTO usersByAge
  SORT age DESC LIMIT 0, 5
  RETURN {
    "age" : age,
    "users" : usersByAge[*].u.name
  }

[
  {
    "age" : 37,
    "users" : [
      "John",
      "Sophia"
    ]
  },
  {
    "age" : 36,
    "users" : [
      "Fred",
      "Emma"
    ]
  },
  {
    "age" : 34,
    "users" : [
      "Madison"
    ]
  },
  {
    "age" : 33,
    "users" : [
      "Chloe",
      "Michael"
    ]
  },
  {
    "age" : 32,
    "users" : [
      "Alexander"
    ]
  }
]
```

The query groups all active users by their *age* attribute. There will be one
result document per distinct *age* value (leaving aside the *LIMIT*). For each group,
we have access to the matching documents via the *usersByAge* variable introduced in
the *COLLECT* statement.

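To see what the *INTO* variable holds, it can help to return it unmodified. The sketch below reuses the query from above; the attribute name *group* in the result is only an illustrative choice. Each element of *usersByAge* is expected to be an object with one attribute per variable that was in scope before the *COLLECT*, which is why the user documents are reached through the *u* attribute:

```
FOR u IN users
  FILTER u.active == true
  COLLECT age = u.age INTO usersByAge
  RETURN {
    "age" : age,
    // each array element has the shape { "u" : <user document> }
    "group" : usersByAge
  }
```
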
!SUBSECTION Variable Expansion

The *usersByAge* variable contains the full documents found, and as we're only
interested in user names, we'll use the expansion operator <i>[\*]</i> to extract just the
*name* attribute of all user documents in each group.

The <i>[\*]</i> expansion operator is just a handy shortcut. Instead of <i>usersByAge[\*].u.name</i>
we could also write:

```
FOR temp IN usersByAge
  RETURN temp.u.name
```

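Placed in context, the long form is written as a subquery in parentheses. The following sketch rewrites the earlier query without the expansion operator and should produce the same result:

```
FOR u IN users
  FILTER u.active == true
  COLLECT age = u.age INTO usersByAge
  SORT age DESC LIMIT 0, 5
  RETURN {
    "age" : age,
    // subquery equivalent of usersByAge[*].u.name
    "users" : (FOR temp IN usersByAge RETURN temp.u.name)
  }
```
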
!SUBSECTION Grouping by multiple criteria

To group by multiple criteria, we'll use multiple arguments in the *COLLECT* clause.
For example, to group users by *ageGroup* (a derived value we need to calculate first)
and then by *gender*, we'll do:

```
FOR u IN users
  FILTER u.active == true
  COLLECT ageGroup = FLOOR(u.age / 5) * 5,
          gender = u.gender INTO group
  SORT ageGroup DESC
  RETURN {
    "ageGroup" : ageGroup,
    "gender" : gender
  }

[
  {
    "ageGroup" : 35,
    "gender" : "f"
  },
  {
    "ageGroup" : 35,
    "gender" : "m"
  },
  {
    "ageGroup" : 30,
    "gender" : "f"
  },
  {
    "ageGroup" : 30,
    "gender" : "m"
  },
  {
    "ageGroup" : 25,
    "gender" : "f"
  },
  {
    "ageGroup" : 25,
    "gender" : "m"
  }
]
```

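The derived *ageGroup* value rounds each age down to the nearest multiple of 5. The following standalone sketch applies the same expression to a few hand-picked sample ages (the literal array is an illustration and not part of the *users* data):

```
FOR age IN [ 25, 29, 30, 37 ]
  RETURN {
    "age" : age,
    "ageGroup" : FLOOR(age / 5) * 5
  }
```

With this expression, 25 and 29 fall into group 25, 30 into group 30, and 37 into group 35.
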
!SUBSECTION Aggregation

So far we have only grouped data without aggregating it. Adding aggregation is simple in AQL,
as all that needs to be done is to run an aggregate function on the array created by
the *INTO* clause of a *COLLECT* statement:

```
FOR u IN users
  FILTER u.active == true
  COLLECT ageGroup = FLOOR(u.age / 5) * 5,
          gender = u.gender INTO group
  SORT ageGroup DESC
  RETURN {
    "ageGroup" : ageGroup,
    "gender" : gender,
    "numUsers" : LENGTH(group)
  }

[
  {
    "ageGroup" : 35,
    "gender" : "f",
    "numUsers" : 2
  },
  {
    "ageGroup" : 35,
    "gender" : "m",
    "numUsers" : 2
  },
  {
    "ageGroup" : 30,
    "gender" : "f",
    "numUsers" : 4
  },
  {
    "ageGroup" : 30,
    "gender" : "m",
    "numUsers" : 4
  },
  {
    "ageGroup" : 25,
    "gender" : "f",
    "numUsers" : 2
  },
  {
    "ageGroup" : 25,
    "gender" : "m",
    "numUsers" : 2
  }
]
```

We have used the function *LENGTH* here (it returns the length of an array). This is
equivalent to SQL's `SELECT g, COUNT(*) FROM ... GROUP BY g`.
In addition to *LENGTH*, AQL also provides *MAX*, *MIN*, *SUM* and *AVERAGE* as
basic aggregation functions.

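The other aggregation functions are applied in the same way, namely to an array derived from the group. The sketch below is one possible use; the result attribute names *minAge*, *maxAge* and *avgAge* are illustrative. The expansion operator first turns the group into an array of ages, which is then passed to each function:

```
FOR u IN users
  FILTER u.active == true
  COLLECT ageGroup = FLOOR(u.age / 5) * 5 INTO group
  SORT ageGroup DESC
  RETURN {
    "ageGroup" : ageGroup,
    "minAge" : MIN(group[*].u.age),
    "maxAge" : MAX(group[*].u.age),
    "avgAge" : AVERAGE(group[*].u.age)
  }
```
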
In AQL all aggregation functions can be run on arrays only. If an aggregation function
is run on anything that is not an array, an error will occur and the query will fail.

!SUBSECTION Post-filtering aggregated data

To filter on the results of a grouping or aggregation operation (i.e. something
similar to *HAVING* in SQL), simply add another *FILTER* clause after the *COLLECT*
statement.

For example, to get the 3 *ageGroup*s with the most users in them:

```
FOR u IN users
  FILTER u.active == true
  COLLECT ageGroup = FLOOR(u.age / 5) * 5 INTO group
  LET numUsers = LENGTH(group)
  FILTER numUsers > 2 // group must contain at least 3 users in order to qualify
  SORT numUsers DESC
  LIMIT 0, 3
  RETURN {
    "ageGroup" : ageGroup,
    "numUsers" : numUsers,
    "users" : group[*].u.name
  }

[
  {
    "ageGroup" : 30,
    "numUsers" : 8,
    "users" : [
      "Abigail",
      "Madison",
      "Anthony",
      "Alexander",
      "Isabella",
      "Chloe",
      "Daniel",
      "Michael"
    ]
  },
  {
    "ageGroup" : 25,
    "numUsers" : 4,
    "users" : [
      "Mary",
      "Mariah",
      "Jim",
      "Diego"
    ]
  },
  {
    "ageGroup" : 35,
    "numUsers" : 4,
    "users" : [
      "Fred",
      "John",
      "Emma",
      "Sophia"
    ]
  }
]
```

To increase readability, the repeated expression *LENGTH(group)* was put into a variable
*numUsers*. The *FILTER* on *numUsers* is the equivalent of an SQL *HAVING* clause.