Commit 28e60eb

en doc for sequence aggregation
1 parent 2b4dc0d commit 28e60eb

3 files changed

+221
-24
lines changed

hexo-docs/source/en/entity-sequence.md

Lines changed: 4 additions & 4 deletions
@@ -169,7 +169,7 @@ inline fun <E : Entity<E>, T : Table<E>> EntitySequence<E, T>.sortedBy(
169169
): EntitySequence<E, T>
170170
```
171171

172-
Ktorm provides a `sortedBy` function, which allows us to specify the *order by* closure for the sequence's internal query. The function accepts a closure as its parameter in which we need to return a column or expression. The following code obtains all the employees and sorts them by their salaries:
172+
Ktorm provides a `sortedBy` function, which allows us to specify the *order by* clause for the sequence's internal query. The function accepts a closure as its parameter in which we need to return a column or expression. The following code obtains all the employees and sorts them by their salaries:
173173

174174
```kotlin
175175
val employees = Employees.asSequence().sortedBy { it.salary }.toList()
@@ -293,7 +293,7 @@ select t_employee.name
293293
from t_employee
294294
```
295295

296-
If we want to select two or more columns, we can changed to `mapColumns2` or `mapColumns3`, then we need to wrap our selected columns by `Pair` or `Triple` in the closure, and the function's return type becomes `List<Pair<C1?, C2?>>` or `List<Triple<C1?, C2?, C3?>>`. The example below prints the IDs, names and hired days of the employees in department 1:
296+
If we want to select two or more columns, we can change to `mapColumns2` or `mapColumns3`, then we need to wrap our selected columns by `Pair` or `Triple` in the closure, and the function's return type becomes `List<Pair<C1?, C2?>>` or `List<Triple<C1?, C2?, C3?>>`. The example below prints the IDs, names and hired days of the employees in department 1:
297297

298298
```kotlin
299299
// MySQL datediff function
@@ -347,13 +347,13 @@ In addition to the basic forms, there are also many variants for these functions
347347

348348
### fold/reduce/forEach
349349

350-
This serial of functions provide features of iteration and folding, and their usages are also the same as the corresponding ones of `kotlin.Sequence`. The following code calculates the total salaries of all employees:
350+
This series of functions provides iteration and folding features, and their usage is the same as that of the corresponding functions of `kotlin.Sequence`. The following code calculates the total salary of all employees:
351351

352352
```kotlin
353353
val totalSalary = Employees.asSequence().fold(0L) { acc, employee -> acc + employee.salary }
354354
```
355355

356-
Of course, if only the total salaries are needed, we don't have to write codes in that way. Because the performance is really poor, as all employees are obtained from the database. Here we just show you the usage of the `fold` function. It's better to use `sumBy`:
356+
Of course, if only the total salary is needed, we don't have to write the code that way, because the performance is really poor: all employees are obtained from the database. Here we just show you the usage of the `fold` function. It's better to use `sumBy`:
357357

358358
```kotlin
359359
val totalSalary = Employees.sumBy { it.salary }

hexo-docs/source/en/sequence-aggregation.md

Lines changed: 199 additions & 2 deletions
@@ -4,6 +4,203 @@ lang: en
44
related_path: zh-cn/sequence-aggregation.html
55
---
66

7-
# Entity Sequence
7+
# Sequence Aggregation
88

9-
Under construction...
9+
The entity sequence APIs not only allow us to obtain entities from databases just like using `kotlin.Sequence`, but they also provide rich support for aggregations, so we can conveniently count the columns, sum them, or calculate their averages, etc.
10+
11+
> Note: entity sequence APIs are only available since Ktorm version 2.0.
12+
13+
## Simple Aggregation
14+
15+
Let's learn the definition of the extension function `aggregateColumns` first:
16+
17+
```kotlin
18+
inline fun <E : Entity<E>, T : Table<E>, C : Any> EntitySequence<E, T>.aggregateColumns(
19+
    aggregationSelector: (T) -> ColumnDeclaring<C>
20+
): C?
21+
```
22+
23+
It's a terminal operation, and it accepts a closure as its parameter, in which we need to return an aggregate expression. Ktorm will create an aggregate query, using the current filter condition and selecting the aggregate expression specified by us, then execute the query and obtain the aggregate result. The following code obtains the max salary in department 1:
24+
25+
```kotlin
26+
val max = Employees
27+
.asSequenceWithoutReferences()
28+
.filter { it.departmentId eq 1 }
29+
.aggregateColumns { max(it.salary) }
30+
```
31+
32+
If we want to aggregate two or more columns, we can change to `aggregateColumns2` or `aggregateColumns3`, then we need to wrap our aggregate expressions by `Pair` or `Triple` in the closure, and the function's return type becomes `Pair<C1?, C2?>` or `Triple<C1?, C2?, C3?>`. The example below obtains the average and the range of salaries in department 1:
33+
34+
```kotlin
35+
val (avg, diff) = Employees
36+
.asSequenceWithoutReferences()
37+
.filter { it.departmentId eq 1 }
38+
.aggregateColumns2 { Pair(avg(it.salary), max(it.salary) - min(it.salary)) }
39+
```
40+
41+
Generated SQL:
42+
43+
```sql
44+
select avg(t_employee.salary), max(t_employee.salary) - min(t_employee.salary)
45+
from t_employee
46+
where t_employee.department_id = ?
47+
```
48+
49+
> Are there other functions like `aggregateColumns4` or more? I'm sorry to say no. Just like `mapColumns` in the former section, we don't think it's a frequently used and irreplaceable feature. If you really need it, it's easy to implement by yourself, or you can raise an issue with us.
50+
51+
Additionally, Ktorm provides many convenient helper functions, all of which are implemented on top of `aggregateColumns`. For example, we can use `maxBy { it.salary }` to obtain the max salary, which is equivalent to `aggregateColumns { max(it.salary) }`. Here is a list of these functions:
52+
53+
| Name | Usage Example | Description | Equivalent |
54+
| --------- | ---------------------------------- | ------------------------------------------- | ------------------------------------------------------------ |
55+
| count | `count { it.salary greater 1000 }` | Count those whose salary is greater than 1000 | `filter { it.salary greater 1000 }`<br/>`.aggregateColumns { count() }` |
56+
| any | `any { it.salary greater 1000 }` | True if anyone's salary is greater than 1000 | `count { it.salary greater 1000 } > 0` |
57+
| none | `none { it.salary greater 1000 }` | True if no one's salary is greater than 1000 | `count { it.salary greater 1000 } == 0` |
58+
| all | `all { it.salary greater 1000 }` | True if everyone's salary is greater than 1000 | `count { it.salary lessEq 1000 } == 0` |
59+
| sumBy | `sumBy { it.salary }` | Obtain the salaries' sum | `aggregateColumns { sum(it.salary) }` |
60+
| maxBy | `maxBy { it.salary }` | Obtain the salaries' max value | `aggregateColumns { max(it.salary) }` |
61+
| minBy | `minBy { it.salary }` | Obtain the salaries' min value | `aggregateColumns { min(it.salary) }` |
62+
| averageBy | `averageBy { it.salary }` | Obtain the average salary | `aggregateColumns { avg(it.salary) }` |
63+
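The "Equivalent" column for `any`, `none` and `all` can be checked on plain Kotlin collections. The sketch below is only an in-memory analogy, not Ktorm code: `SalaryRow` and the three helper functions are hypothetical names introduced for illustration, and `count` here is the `kotlin.collections` function rather than the entity sequence one.

```kotlin
// Hypothetical in-memory stand-in for a row of t_employee; not a Ktorm entity.
data class SalaryRow(val name: String, val salary: Long)

// Each helper is written in terms of a count, mirroring the "Equivalent" column.
fun anyAbove(rows: List<SalaryRow>, limit: Long): Boolean =
    rows.count { it.salary > limit } > 0

fun noneAbove(rows: List<SalaryRow>, limit: Long): Boolean =
    rows.count { it.salary > limit } == 0

// "all" counts the rows that violate the predicate (salary <= limit) and expects zero.
fun allAbove(rows: List<SalaryRow>, limit: Long): Boolean =
    rows.count { it.salary <= limit } == 0

fun main() {
    val rows = listOf(SalaryRow("a", 100), SalaryRow("b", 50), SalaryRow("c", 200))
    println(anyAbove(rows, 150))   // true
    println(noneAbove(rows, 150))  // false
    println(allAbove(rows, 40))    // true
}
```

Note that `all` is expressed by counting the rows that fail the condition, which is why the table uses `lessEq` instead of `greater` in that row.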
64+
## Grouping Aggregation
65+
66+
To use grouping aggregations, we first need to learn how to group elements in an entity sequence. Ktorm provides two different grouping functions: `groupBy` and `groupingBy`.
67+
68+
### groupBy
69+
70+
```kotlin
71+
inline fun <E : Entity<E>, K> EntitySequence<E, *>.groupBy(
72+
    keySelector: (E) -> K
73+
): Map<K, List<E>>
74+
```
75+
76+
`groupBy` is a terminal operation: it executes the internal query and iterates the query results immediately, extracts a grouping key from each element with the `keySelector` closure, and finally collects the elements into the groups they belong to. The following code obtains all the employees and groups them by their departments:
77+
78+
```kotlin
79+
val employees = Employees.asSequence().groupBy { it.department.id }
80+
```
81+
82+
Here, the type of `employees` is `Map<Int, List<Employee>>`, in which the keys are departments' IDs, and the values are the lists of employees belonging to those departments. Now that we have the employees' data for every department, we are able to do some aggregate calculations over the data. The following code calculates the average salary for each department:
83+
84+
```kotlin
85+
val averageSalaries = Employees
86+
.asSequence()
87+
.groupBy { it.department.id }
88+
.mapValues { (_, employees) -> employees.map { it.salary }.average() }
89+
```
90+
91+
But, unfortunately, the aggregate calculation here is performed inside the JVM, and the generated SQL still obtains all the employees, although we don't really need them:
92+
93+
```sql
94+
select *
95+
from t_employee
96+
left join t_department _ref0 on t_employee.department_id = _ref0.id
97+
```
98+
99+
Here, the only thing we need is the average salaries, but we still have to obtain all the employees' data from the database. The performance loss may be intolerable in most cases. It would be better to generate proper SQL using *group by* clauses and aggregate functions, and move the aggregate calculations back to the database. To solve this problem, we need to use the `groupingBy` function.
100+
101+
> Note that these two functions are designed for very different purposes. `groupBy` is a terminal operation, as it obtains all the entity objects and divides them into groups inside JVM memory. `groupingBy`, however, is an intermediate operation: it adds a *group by* clause to the final generated SQL, and particular aggregations should be specified using the following extension functions of `EntityGrouping`.
102+
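The same eager versus deferred split exists in the Kotlin standard library, which may make the distinction easier to see. The sketch below uses only `kotlin.collections` on an in-memory list, so it says nothing about the SQL Ktorm generates; `DeptRow` and both function names are hypothetical.

```kotlin
// Hypothetical in-memory row; stands in for an employee record.
data class DeptRow(val departmentId: Int, val salary: Long)

// Eager: groupBy builds a Map<Int, List<DeptRow>> holding every element,
// and only afterwards is each list averaged.
fun averageByGroupBy(rows: List<DeptRow>): Map<Int, Double> =
    rows.groupBy { it.departmentId }
        .mapValues { (_, members) -> members.map { it.salary }.average() }

// Deferred: groupingBy only records the key selector. The fold keeps a running
// (sum, count) pair per key instead of a list of members.
fun averageByGroupingBy(rows: List<DeptRow>): Map<Int, Double> =
    rows.groupingBy { it.departmentId }
        .fold(0L to 0) { (sum, n), row -> (sum + row.salary) to (n + 1) }
        .mapValues { (_, acc) -> acc.first.toDouble() / acc.second }

fun main() {
    val rows = listOf(DeptRow(1, 100), DeptRow(1, 200), DeptRow(2, 50))
    println(averageByGroupBy(rows))
    println(averageByGroupingBy(rows))
}
```

Both functions return the same averages; the difference is whether the intermediate groups are materialized, which is the in-memory counterpart of fetching every entity versus letting the database aggregate.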
103+
### groupingBy
104+
105+
```kotlin
106+
fun <E : Entity<E>, T : Table<E>, K : Any> EntitySequence<E, T>.groupingBy(
107+
    keySelector: (T) -> ColumnDeclaring<K>
108+
): EntityGrouping<E, T, K> {
109+
    return EntityGrouping(this, keySelector)
110+
}
111+
```
112+
113+
The `groupingBy` function is an intermediate operation, and it accepts a closure as its parameter, in which we should return a `ColumnDeclaring<K>` as the grouping key. The grouping key can be a column or expression, and it'll be used in the SQL's *group by* clause. Actually, the `groupingBy` function doesn't do anything by itself; it just returns a newly created `EntityGrouping` with the `keySelector` given by us. The definition of `EntityGrouping` is simple:
114+
115+
```kotlin
116+
data class EntityGrouping<E : Entity<E>, T : Table<E>, K : Any>(
117+
    val sequence: EntitySequence<E, T>,
118+
    val keySelector: (T) -> ColumnDeclaring<K>
119+
) {
120+
    fun asKotlinGrouping(): kotlin.collections.Grouping<E, K?> { ... }
121+
}
122+
```
123+
124+
Most of the `EntityGrouping`'s APIs are provided as extension functions. Let's learn the `aggregateColumns` first:
125+
126+
```kotlin
127+
inline fun <E : Entity<E>, T : Table<E>, K : Any, C : Any> EntityGrouping<E, T, K>.aggregateColumns(
128+
    aggregationSelector: (T) -> ColumnDeclaring<C>
129+
): Map<K?, C?>
130+
```
131+
132+
Similar to the `aggregateColumns` of `EntitySequence`, it's a terminal operation, and it accepts a closure as its parameter, in which we should return an aggregate expression. Ktorm will create an aggregate query, using the current filter condition and the grouping key, selecting the aggregate expression specified by us, then execute the query and obtain the aggregate results. Its return type is `Map<K?, C?>`, in which the keys are our grouping keys, and the values are the aggregate results for the groups. The following code obtains the average salaries for each department:
133+
134+
```kotlin
135+
val averageSalaries = Employees
136+
.asSequenceWithoutReferences()
137+
.groupingBy { it.departmentId }
138+
.aggregateColumns { avg(it.salary) }
139+
```
140+
141+
Now we can see that the generated SQL uses a *group by* clause and does the aggregation inside the database:
142+
143+
```sql
144+
select t_employee.department_id, avg(t_employee.salary)
145+
from t_employee
146+
group by t_employee.department_id
147+
```
148+
149+
If we want to aggregate two or more columns, we can change to `aggregateColumns2` or `aggregateColumns3`, then we need to wrap our aggregate expressions by `Pair` or `Triple` in the closure, and the function’s return type becomes `Map<K?, Pair<C1?, C2?>>` or `Map<K?, Triple<C1?, C2?, C3?>>`. The following code prints the averages and the ranges of salaries for each department:
150+
151+
```kotlin
152+
Employees
153+
.asSequenceWithoutReferences()
154+
.groupingBy { it.departmentId }
155+
.aggregateColumns2 { Pair(avg(it.salary), max(it.salary) - min(it.salary)) }
156+
.forEach { departmentId, (avg, diff) ->
157+
println("$departmentId:$avg:$diff")
158+
}
159+
```
160+
161+
Generated SQL:
162+
163+
```sql
164+
select t_employee.department_id, avg(t_employee.salary), max(t_employee.salary) - min(t_employee.salary)
165+
from t_employee
166+
group by t_employee.department_id
167+
```
168+
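Because the result type is `Map<K?, Pair<C1?, C2?>>`, both the group key and each aggregate value may be null, for instance when the grouped column itself holds NULL. The sketch below shows one way to consume such a map defensively. It works on a hand-built map rather than a real query result, and `formatGroups` and the `"n/a"` placeholder are assumptions made for illustration.

```kotlin
// Formats each group as "key:avg:diff", substituting "n/a" for any null part.
// The map type mirrors Map<K?, Pair<C1?, C2?>> with K = Int, C1 = Double, C2 = Long.
fun formatGroups(results: Map<Int?, Pair<Double?, Long?>>): List<String> =
    results.map { (departmentId, stats) ->
        val (avg, diff) = stats
        "${departmentId ?: "n/a"}:${avg ?: "n/a"}:${diff ?: "n/a"}"
    }

fun main() {
    // Hand-built sample data, not the output of an actual query.
    val sample: Map<Int?, Pair<Double?, Long?>> = mapOf(
        1 to Pair(150.0, 100L),
        null to Pair(null, null)
    )
    formatGroups(sample).forEach(::println)
}
```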
169+
Additionally, Ktorm provides many convenient helper functions, all of which are implemented on top of `aggregateColumns`. Here is a list of them:
170+
171+
| Name | Usage Example | Description | Equivalent |
172+
| ----------------- | ----------------------------- | ------------------------------------------ | ------------------------------------- |
173+
| eachCount(To) | `eachCount()` | Obtain record counts for each group | `aggregateColumns { count() }` |
174+
| eachSumBy(To) | `eachSumBy { it.salary }` | Obtain salaries' sums for each group | `aggregateColumns { sum(it.salary) }` |
175+
| eachMaxBy(To) | `eachMaxBy { it.salary }` | Obtain salaries' max values for each group | `aggregateColumns { max(it.salary) }` |
176+
| eachMinBy(To) | `eachMinBy { it.salary }` | Obtain salaries' min values for each group | `aggregateColumns { min(it.salary) }` |
177+
| eachAverageBy(To) | `eachAverageBy { it.salary }` | Obtain salaries' averages for each group | `aggregateColumns { avg(it.salary) }` |
178+
179+
With these functions, we can write the code below to obtain average salaries for each department:
180+
181+
```kotlin
182+
val averageSalaries = Employees
183+
.asSequenceWithoutReferences()
184+
.groupingBy { it.departmentId }
185+
.eachAverageBy { it.salary }
186+
```
187+
188+
Besides, Ktorm also provides `aggregate`, `fold` and `reduce`. They have the same names as the extension functions of `kotlin.collections.Grouping`, and their usages are exactly the same. The following code calculates the total salaries for each department:
189+
190+
```kotlin
191+
val totalSalaries = Employees
192+
.asSequenceWithoutReferences()
193+
.groupingBy { it.departmentId }
194+
.fold(0L) { acc, employee ->
195+
acc + employee.salary
196+
}
197+
```
198+
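For readers less familiar with the `kotlin.collections.Grouping` functions that these names come from, here is a minimal sketch of the standard-library versions on an in-memory list. It does not touch Ktorm; `PayRow` and the three function names are hypothetical.

```kotlin
// Hypothetical in-memory row; stands in for an employee record.
data class PayRow(val departmentId: Int, val salary: Long)

// fold: starts every group from the same initial value.
fun totalsWithFold(rows: List<PayRow>): Map<Int, Long> =
    rows.groupingBy { it.departmentId }
        .fold(0L) { acc, row -> acc + row.salary }

// reduce: has no initial value; the first element of each group seeds the accumulator.
fun highestPaid(rows: List<PayRow>): Map<Int, PayRow> =
    rows.groupingBy { it.departmentId }
        .reduce { _, best, row -> if (row.salary > best.salary) row else best }

// aggregate: the most general form; the accumulator is null for a group's first element.
fun totalsWithAggregate(rows: List<PayRow>): Map<Int, Long> =
    rows.groupingBy { it.departmentId }
        .aggregate { _, acc: Long?, row, _ -> (acc ?: 0L) + row.salary }

fun main() {
    val rows = listOf(PayRow(1, 100), PayRow(1, 200), PayRow(2, 50))
    println(totalsWithFold(rows))
    println(highestPaid(rows))
    println(totalsWithAggregate(rows))
}
```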
199+
Of course, if only the total salaries are needed, we don’t have to write the code that way, because the performance is really poor: all employees are obtained from the database. Here we just show you the usage of the `fold` function. It’s better to use `eachSumBy`:
200+
201+
```kotlin
202+
val totalSalaries = Employees
203+
.asSequenceWithoutReferences()
204+
.groupingBy { it.departmentId }
205+
.eachSumBy { it.salary }
206+
```
