DataFrame 1.0 Help

groupBy

Splits the rows of a DataFrame into groups using one or several columns as grouping keys.

The groupBy function returns a GroupBy object. A GroupBy is a dataframe-like structure that contains one or more key columns and a group FrameColumn. Key columns contain all unique combinations of key values, and the group FrameColumn contains the corresponding groups of rows (each represented as a DataFrame). Each row in a GroupBy corresponds to a keys/group combination.
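The sketch below makes the key/group layout concrete. It is a minimal plain-Kotlin model built on standard-library collections, not the DataFrame API, and the names `Person`, `KeyGroup`, and `groupRows` are hypothetical. Keys come out in the order they first appear because that is what Kotlin's `groupBy` does. This is not a claim about how DataFrame orders groups.

```kotlin
// Hypothetical row type standing in for one DataFrame row.
data class Person(val name: String, val age: Int, val isHappy: Boolean)

// One row of the modeled GroupBy: a unique key plus every row that shares it.
data class KeyGroup<K, R>(val key: K, val group: List<R>)

// Splits rows into groups by a key selector. Kotlin's groupBy returns a
// LinkedHashMap, so keys come out in the order they first appear.
fun <K, R> groupRows(rows: List<R>, keyOf: (R) -> K): List<KeyGroup<K, R>> =
    rows.groupBy(keyOf).map { (key, group) -> KeyGroup(key, group) }

val people = listOf(
    Person("Alice", 15, true),
    Person("Bob", 45, false),
    Person("Charlie", 20, true),
)

// Two KeyGroup entries: true -> [Alice, Charlie], false -> [Bob].
val byHappiness = groupRows(people) { it.isHappy }
```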

groupBy(moveToTop = true) { columns }
    [ transformations ]
    reducer | aggregator | pivot

transformations = [ .sortByGroup { expression } | .sortByGroupDesc { expression } | .sortByCount() | .sortByCountAsc() | .sortByKey() | .sortByKeyDesc() | .sortBy { columns } | .sortByDesc { columns } ]
    [ .updateGroups { frameExpression } ]
    [ .filter { rowExpression } ]
    [ .add(column) { rowExpression } ]

reducer = .minBy { column } | .maxBy { column } | .medianBy { rowExpression } | .percentileBy(percentile) { rowExpression } | .first [ { rowCondition } ] | .last [ { rowCondition } ]
    .concat() | .into([column]) [{ rowExpression }] | .values { valueColumns }

aggregator = .count()
    | .concat()
    | .concatWithKeys()
    | .toDataFrame()
    | .into([column]) [{ rowExpression }]
    | .values { valueColumns }
    | .aggregate { aggregations }
    | .<stat> [ { columns } ]

pivot = .pivot { columns }
    [ .default(defaultValue) ]
    pivotReducer | pivotAggregator

See column selectors for how to select the columns for this operation. See also groupBy transformations, groupBy reducing, groupBy aggregations, and pivot + groupBy.

df.groupBy { isHappy }
df.groupBy("isHappy")
df.groupBy { name.firstName and isHappy }
df.groupBy { "name"["firstName"]<String>() and "isHappy" }
df.groupBy { age / 10 named "ageDecade" }
df.groupBy { "age"<Int>() / 10 named "ageDecade" }

Grouping columns can be created in place:

df.groupBy { expr { name.firstName.length + name.lastName.length } named "nameLength" }
df.groupBy { expr { "name"["firstName"]<String>().length + "name"["lastName"]<String>().length } named "nameLength" }

With the optional moveToTop parameter, you can choose whether to make a selected nested column a top-level column:

df.groupBy(moveToTop = true) { name.firstName }
df.groupBy(moveToTop = true) { "name"["firstName"]<String>() }

or to keep it inside a ColumnGroup:

df.groupBy(moveToTop = false) { name.firstName }
df.groupBy(moveToTop = false) { "name"["firstName"]<String>() }

Returns a GroupBy object.

Transformation

A GroupBy can be transformed into a new GroupBy using one of the following methods:

  • sortByGroup/sortByGroupDesc — sorts the order of groups (and their corresponding keys) by values computed with a DataFrameExpression applied to each group;

  • sortByCount/sortByCountAsc — sorts the order of groups (and their corresponding keys) by the number of rows they contain;

  • sortByKey/sortByKeyDesc — sorts the order of groups (and their corresponding keys) by the grouping key values;

  • sortBy/sortByDesc — sorts the order of rows within each group by one or more column values;

  • updateGroups — transforms each group into a new one using the provided transforming function;

  • filter — keeps only the groups (rows of the GroupBy) that satisfy the given predicate;

  • add — adds a new column to each group.
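The sketch below imitates a few of these transformations on a plain list of key/group pairs. It shows which transformations reorder whole groups and which act inside each group. It uses only the Kotlin standard library, and `KeyGroup`, `sortedByCountDesc`, `sortedWithin`, and `filterGroups` are hypothetical names. The sketch also assumes that `sortByCount` puts the largest groups first, as the `Asc` suffix of `sortByCountAsc` suggests.

```kotlin
data class Person(val name: String, val age: Int)

// Hypothetical model of one GroupBy row: a key and its group of rows.
data class KeyGroup<K, Row>(val key: K, val group: List<Row>)

// Like sortByCount (assumed descending): larger groups move to the front.
fun <K, Row> List<KeyGroup<K, Row>>.sortedByCountDesc(): List<KeyGroup<K, Row>> =
    sortedByDescending { it.group.size }

// Like sortBy: rows are reordered inside each group; group order is unchanged.
fun <K, Row, C : Comparable<C>> List<KeyGroup<K, Row>>.sortedWithin(
    selector: (Row) -> C,
): List<KeyGroup<K, Row>> = map { KeyGroup(it.key, it.group.sortedBy(selector)) }

// Like filter: the predicate sees a whole key/group pair and keeps or drops it.
fun <K, Row> List<KeyGroup<K, Row>>.filterGroups(
    predicate: (KeyGroup<K, Row>) -> Boolean,
): List<KeyGroup<K, Row>> = filter(predicate)

val byCity = listOf(
    KeyGroup("Moscow", listOf(Person("Alice", 30))),
    KeyGroup("Berlin", listOf(Person("Bob", 45), Person("Dan", 20))),
)
```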

Any DataFrame with a FrameColumn can be reinterpreted as a GroupBy:

val df = dataFrameOf(
    "key" to columnOf(1, 2),
    "data" to columnOf(df[0..3], df[4..6]),
) // create dataframe with two columns
df.asGroupBy { data } // convert dataframe to GroupBy by interpreting 'data' column as groups

val df = dataFrameOf(
    "key" to columnOf(1, 2),
    "data" to columnOf(df[0..3], df[4..6]),
) // create dataframe with two columns
df.asGroupBy("data") // convert dataframe to GroupBy by interpreting 'data' column as groups

Examples of transformation

sortByGroup

df.groupBy { isHappy }.sortByGroup { mean { age } }
df.groupBy("isHappy").sortByGroup { mean("age") }

sortByCount

df.groupBy { age }.sortByCount()
df.groupBy("age").sortByCount()

sortByKey

df.groupBy { age }.sortByKey()
df.groupBy("age").sortByKey()

sortBy

df.groupBy { isHappy }.sortBy { age }
df.groupBy("isHappy").sortBy("age")

updateGroups

df.groupBy { isHappy }.updateGroups { sortByDesc { age }.take(2) }
df.groupBy("isHappy").updateGroups { sortByDesc("age").take(2) }

filter

df.groupBy { isHappy }.filter { group.median { age } > 20 }
df.groupBy("isHappy").filter { group.median { "age"<Int>() } > 20 }

add

df.groupBy { isHappy }.add("isAdult") { age >= 18 }
df.groupBy("isHappy").add("isAdult") { "age"<Int>() >= 18 }

Reducing

A GroupBy can be reduced into a DataFrame: each group is collapsed into a single representative row, and these rows are concatenated into a new DataFrame.

Reducing is a specific case of aggregation.

Reducing consists of two steps.

Step 1: use a reducing function to make a single row from each group

To perform a reducing operation, use the following functions:

  • first/last – to get the first / last row (optionally, the first or last one that satisfies a predicate) of each group.

  • minBy/maxBy – to get from each group the row with the smallest / largest result of the row expression supplied to the function.

  • medianBy/percentileBy – to get from each group the row whose row-expression result is at, or closest to, the estimated median / percentile index of those results within the group.

These functions return a ReducedGroupBy, a transitional object between reducing the groups and specifying how the resulting reduced rows (either original or transformed) are represented in a new DataFrame.
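The sketch below is a rough standard-library model of step 1: every group collapses to one representative row, which stays next to its key. The helper `reduceGroups` and the `Person` rows are hypothetical. `medianBy` and `percentileBy` are not modeled. In this sketch, a group with no matching row simply maps to `null`, which may differ from how DataFrame represents that case.

```kotlin
data class Person(val name: String, val age: Int, val weight: Int?, val isHappy: Boolean)

val people = listOf(
    Person("Alice", 15, 54, true),
    Person("Bob", 45, 87, false),
    Person("Charlie", 20, null, true),
    Person("Dan", 30, 90, false),
)

// Groups keyed by isHappy: true -> [Alice, Charlie], false -> [Bob, Dan].
val groups: Map<Boolean, List<Person>> = people.groupBy { it.isHappy }

// Hypothetical reducer: collapses each group to at most one row, keeping the key.
fun <K, R> Map<K, List<R>>.reduceGroups(pick: (List<R>) -> R?): Map<K, R?> =
    mapValues { (_, group) -> pick(group) }

// Like minBy { age }: the youngest person in each group.
val youngest = groups.reduceGroups { group -> group.minByOrNull { it.age } }

// Like last { weight == null }: null when no row in the group matches.
val lastWithoutWeight = groups.reduceGroups { group -> group.lastOrNull { it.weight == null } }
```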

Examples of reducing

df.groupBy
df.groupBy { isHappy }
df.groupBy("isHappy")
first
df.groupBy { isHappy }.first { age == 30 }
df.groupBy("isHappy").first { it["age"] == 30 }
last
df.groupBy { isHappy }.last { weight == null }
df.groupBy("isHappy").last { it["weight"] == null }
minBy
df.groupBy { isHappy }.minBy { weight }
df.groupBy("isHappy").minBy("weight")
maxBy
df.groupBy { isHappy }.maxBy { age }
df.groupBy("isHappy").maxBy("age")
medianBy
df.groupBy { isHappy }.medianBy { weight }
df.groupBy("isHappy").medianBy("weight")
percentileBy
df.groupBy { isHappy }.percentileBy(25.0) { weight }
df.groupBy("isHappy").percentileBy(25.0, "weight")

Step 2: transform the result to a DataFrame

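A ReducedGroupBy can be transformed into a DataFrame using the following functions:

  • concat – concatenates the reduced rows into a new DataFrame.

  • values – keeps only the selected columns of the reduced rows.

  • into – stores the reduced rows, or values computed from them with a RowExpression, in a new column with the given name.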

Each method returns a new DataFrame that includes the grouping key columns, containing all unique grouping key values (or value combinations for multiple keys) along with their corresponding reduced rows.
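The shape of the step 2 result can be sketched with plain maps, one map per output row. The key column always comes first. It is followed by one of three things:

  • all fields of the reduced row, like `concat`;

  • a subset of those fields, like `values`;

  • a single new column, like `into`.

This is an illustration under those assumptions, not the DataFrame implementation. `concatLike`, `valuesLike`, and `intoLike` are hypothetical names.

```kotlin
data class Person(val name: String, val age: Int, val city: String)

// Hypothetical output of step 1: one reduced row per isHappy key.
val reduced: Map<Boolean, Person> = mapOf(
    true to Person("Alice", 15, "Moscow"),
    false to Person("Dan", 30, "Milan"),
)

// Each output row is modeled as a column-name -> value map.
typealias Row = Map<String, Any?>

// concat-like: the key column followed by every field of the reduced row.
fun concatLike(r: Map<Boolean, Person>): List<Row> =
    r.map { (key, p) -> mapOf("isHappy" to key, "name" to p.name, "age" to p.age, "city" to p.city) }

// values-like: the key column plus only the selected fields (here name and age).
fun valuesLike(r: Map<Boolean, Person>): List<Row> =
    r.map { (key, p) -> mapOf("isHappy" to key, "name" to p.name, "age" to p.age) }

// into-like: the key column plus one new column computed from each reduced row.
fun intoLike(r: Map<Boolean, Person>, column: String, expr: (Person) -> Any?): List<Row> =
    r.map { (key, p) -> mapOf("isHappy" to key, column to expr(p)) }
```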

Examples of transforming

concat
df.groupBy { isHappy }.minBy { age }.concat()
df.groupBy("isHappy").minBy("age").concat()
values
df.groupBy { isHappy }.minBy { age }.values { name and age and city }
df.groupBy("isHappy").minBy("age").values("name", "age", "city")
into
df.groupBy { isHappy }.minBy { age }.into("youngest") { name }
df.groupBy("isHappy").minBy("age").into("youngest") { getColumnGroup("name") }

Aggregation

A GroupBy can be directly transformed into a new DataFrame by applying one or more aggregation operations to its groups.

Aggregation is a generalization of reducing.

The following aggregation methods are available:

  • concat — concatenates all rows from all groups into a single DataFrame, without preserving grouping keys.

  • toDataFrame — returns this GroupBy as a DataFrame with the grouping keys and corresponding groups in FrameColumn.

  • concatWithKeys — a variant of concat that also adds grouping key columns that were not present in the original DataFrame, such as keys computed in place with expr.

  • into — creates a new column containing a list of values computed with a RowExpression for each group, or a new FrameColumn containing the groups themselves.

  • values — collects all column values for every group without aggregation. For a ValueColumn of type T it will gather group values into lists of type List<T>. For a ColumnGroup it will gather group values into a DataFrame and convert that ColumnGroup into a FrameColumn.

  • count — creates a DataFrame containing the grouping key columns and an additional column with the number of rows in each corresponding group.

  • aggregate — performs a set of custom aggregations using AggregateDsl, allowing you to compute one or more statistics for every group of a GroupBy. The body of this function is executed for every data group and has a receiver of type DataFrame that represents the current data group being aggregated. To add a new column to the resulting DataFrame, pass the name of the new column to the infix function into.

Each of these methods returns a new DataFrame that includes the grouping key columns (except for concat) along with the columns of values aggregated from the corresponding groups.
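Once the groups are held in a `Map`, several of these aggregations have direct standard-library analogues. The sketch below mirrors `count`, `into`, `concat`, and a small `aggregate` on hypothetical `Person` rows. It is a conceptual model only and does not use the DataFrame API.

```kotlin
data class Person(val name: String, val age: Int, val city: String)

val people = listOf(
    Person("Alice", 15, "Moscow"),
    Person("Bob", 45, "Berlin"),
    Person("Charlie", 20, "Moscow"),
)

// Groups keyed by city, in first-appearance order: Moscow, then Berlin.
val groups: Map<String, List<Person>> = people.groupBy { it.city }

// count-like: the key plus the number of rows in its group.
val counts: Map<String, Int> = groups.mapValues { (_, g) -> g.size }

// into-like: the key plus a list of values computed for each row of its group.
val agesByCity: Map<String, List<Int>> = groups.mapValues { (_, g) -> g.map { it.age } }

// concat-like: all rows again, in group order, without the key column.
val concatenated: List<Person> = groups.values.flatten()

// aggregate-like: several named statistics per group.
val summary: Map<String, Map<String, Any?>> = groups.mapValues { (_, g) ->
    mapOf(
        "total" to g.size,
        "adults" to g.count { it.age > 18 },
        "oldest" to g.maxByOrNull { it.age }?.name,
    )
}
```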

Examples of aggregation

concat on GroupBy

concat unions all data groups of a GroupBy back into a DataFrame with the original columns, in the new row order produced by grouping:

df.groupBy { isHappy }.concat()
df.groupBy("isHappy").concat()

toDataFrame on GroupBy

Any GroupBy can be reinterpreted as a DataFrame with a FrameColumn:

df.groupBy { isHappy }.toDataFrame()
df.groupBy("isHappy").toDataFrame()

concatWithKeys on GroupBy

df.groupBy { expr { age >= 18 } named "isAdult" }.concatWithKeys()
df.groupBy { expr { "age"<Int>() >= 18 } named "isAdult" }.concatWithKeys()
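Computed keys matter for `concatWithKeys`, and the reason can be sketched in plain Kotlin. Here the key `isAdult` is computed rather than stored. A `concat`-style result would therefore lose it, while a `concatWithKeys`-style result attaches it to every row. `PersonWithKey`, `concatWithComputedKey`, and `concatWithoutKey` are hypothetical names used only for this illustration.

```kotlin
data class Person(val name: String, val age: Int)

// Output row: the original fields plus the computed grouping key.
data class PersonWithKey(val name: String, val age: Int, val isAdult: Boolean)

val people = listOf(Person("Alice", 15), Person("Bob", 45), Person("Charlie", 20))

// Groups by a key that is not a field of Person, then concatenates the groups
// while attaching each group's key to its rows (a concatWithKeys-like result).
fun concatWithComputedKey(rows: List<Person>): List<PersonWithKey> =
    rows.groupBy { it.age >= 18 }
        .flatMap { (isAdult, group) -> group.map { PersonWithKey(it.name, it.age, isAdult) } }

// A plain concat-like result: same row order, but the isAdult key is lost.
fun concatWithoutKey(rows: List<Person>): List<Person> =
    rows.groupBy { it.age >= 18 }.values.flatten()
```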

into on GroupBy

df.groupBy { isHappy }.into("ages") { age }
df.groupBy("isHappy").into("ages") { "age"<Int>() }

values on GroupBy

all columns
df.groupBy { isHappy }.values()
df.groupBy("isHappy").values()
selected columns
df.groupBy { isHappy }.values { name and age }
df.groupBy("isHappy").values("name", "age")
rename columns
df.groupBy { isHappy }.values { age into "ages" }
df.groupBy("isHappy").values { "age" into "ages" }

count on GroupBy

df.groupBy { city }.count()
df.groupBy("city").count()

aggregate on GroupBy

df.groupBy { city }.aggregate {
    count() into "total"
    count { age > 18 } into "adults"
    median { age } into "median age"
    min { age } into "min age"
    maxBy { age }.name into "oldest"
}
df.groupBy("city").aggregate {
    count() into "total"
    count { "age"<Int>() > 18 } into "adults"
    median("age") into "median age"
    min("age") into "min age"
    maxBy("age")["name"] into "oldest"
}
// or
df.groupBy("city").aggregate {
    count() into "total"
    count { "age"<Int>() > 18 } into "adults"
    "age"<Int>().median() into "median age"
    "age"<Int>().min() into "min age"
    maxBy("age")["name"] into "oldest"
}

If only one aggregation function is used, the column name can be omitted:

df.groupBy { city }.aggregate { maxBy { age }.name }
df.groupBy("city").aggregate { maxBy("age")["name"] }

Aggregation statistics

Aggregation statistics are predefined shortcuts for common statistical aggregations such as sum, mean, median, and others.

Each function computes a statistic across the rows of a group and returns the result as a new column (or several columns) in the resulting DataFrame.

The available aggregation statistics include count, max, min, sum, mean, std, median, and percentile.

To compute one or several statistics for every group of a GroupBy, use the aggregate function.

The functions max, maxOf, and maxFor differ as follows. They all calculate the maximum of values, but:

  • max computes it on the selected columns. If more than one column is selected, for each group it computes one maximum value among all selected columns.

  • maxOf computes it by a row expression: the expression is calculated for each row of the group and the maximum value is returned.

  • maxFor computes it for each of the selected columns within each group. If more than one column is selected, for each group it computes the maximum value for each selected column separately.

Similar logic applies to other statistics.
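The three variants can be contrasted on a single group with a standard-library sketch. The functions `maxAcross`, `maxOfExpr`, and `maxPerColumn` are hypothetical stand-ins for `max`, `maxOf`, and `maxFor` respectively. The rows contain no nulls, so null handling is not modeled.

```kotlin
data class Person(val age: Int, val weight: Int)

// A single group with two rows.
val group = listOf(Person(age = 25, weight = 60), Person(age = 40, weight = 55))

// max { age and weight }: one value, the maximum over all selected columns together.
fun maxAcross(rows: List<Person>): Int =
    rows.flatMap { listOf(it.age, it.weight) }.maxOf { it }

// maxOf { expression }: evaluate the expression per row, then take the largest result.
fun maxOfExpr(rows: List<Person>, expr: (Person) -> Int): Int = rows.maxOf(expr)

// maxFor { age and weight }: a separate maximum for each selected column.
fun maxPerColumn(rows: List<Person>): Map<String, Int> = mapOf(
    "age" to rows.maxOf { it.age },
    "weight" to rows.maxOf { it.weight },
)
```

On this group, `maxAcross` yields a single number (60). `maxPerColumn` yields one number per column (40 for age, 60 for weight).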

Direct aggregations

The most common aggregation functions can be computed directly on a GroupBy.

Examples of direct aggregations
max
df.groupBy { city }.max() // max for every column with mutually comparable values
df.groupBy("city").max() // max for every column with mutually comparable values
df.groupBy { isHappy }.max { age and weight }
df.groupBy("isHappy").max("age", "weight")
df.groupBy { isHappy }.maxFor { age and weight }
df.groupBy("isHappy").maxFor("age", "weight")
df.groupBy { isHappy }.maxOf { if (age < 30) weight else null }
df.groupBy("isHappy").maxOf { if ("age"<Int>() < 30) "weight"<Int?>() else null }
df.groupBy { city }.max { name.firstName.map { it.length } and name.lastName.map { it.length } } // maximum length of firstName or lastName into column "max"
df.groupBy("city").max { "name"["firstName"]<String>().map { it.length } and "name"["lastName"]<String>().map { it.length } } // maximum length of firstName or lastName into column "max"
min
df.groupBy { isHappy }.min { age }
df.groupBy("isHappy").min("age")
df.groupBy { city }.minFor { (age into "minAge") and (weight into "minWeight") } // min age into column "minAge", min weight into column "minWeight"
df.groupBy("city").minFor { ("age"<Int>() into "minAge") and ("weight"<Int?>() into "minWeight") } // min age into column "minAge", min weight into column "minWeight"
sum
df.groupBy { city }.sum("totalWeight") { weight } // sum of weights into column "totalWeight"
df.groupBy("city").sum("weight", name = "totalWeight") // sum of weights into column "totalWeight"
mean
df.groupBy { city }.mean() // mean for every numeric column
df.groupBy("city").mean() // mean for every numeric column
df.groupBy { city }.meanOf("meanRatio") { weight?.div(age) } // mean of weight/age into column "meanRatio"
df.groupBy("city").meanOf("meanRatio") { "weight"<Int?>()?.div("age"<Int>()) } // mean of weight/age into column "meanRatio"
std
df.groupBy { isHappy }.std { age }
df.groupBy("isHappy").std("age")
median
df.groupBy { isHappy }.median { age }
df.groupBy("isHappy").median("age")
df.groupBy { city }.medianFor { age and weight } // median age into column "age", median weight into column "weight"
df.groupBy("city").medianFor("age", "weight") // median age into column "age", median weight into column "weight"
percentile
df.groupBy { isHappy }.percentile(25.0) { age }
df.groupBy("isHappy").percentile(25.0) { "age"<Int>() }

Pivot + GroupBy

A GroupBy can be pivoted with the pivot method. It produces a PivotGroupBy that combines vertical and horizontal grouping, enabling computation of cross-group, matrix-like statistics.

df.groupBy { isHappy }.pivot { name.firstName }
df.groupBy("isHappy").pivot { "name"["firstName"]<String>() }
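The matrix-like result can be approximated with nested maps. The outer key plays the role of the `groupBy` key, and the inner key plays the role of the `pivot` key. This plain-Kotlin sketch counts the rows in each cell, pivoting by city rather than first name to give clearer numbers. `crossCount` is a hypothetical helper. Key combinations that never occur are simply absent rather than filled with a default.

```kotlin
data class Person(val name: String, val city: String, val isHappy: Boolean)

val people = listOf(
    Person("Alice", "Moscow", true),
    Person("Bob", "Berlin", false),
    Person("Charlie", "Moscow", true),
    Person("Dan", "Berlin", true),
)

// Outer key = groupBy key (matrix row), inner key = pivot key (matrix column),
// cell = number of rows with that combination.
fun <R, RK, CK> crossCount(rows: List<R>, rowKey: (R) -> RK, colKey: (R) -> CK): Map<RK, Map<CK, Int>> =
    rows.groupBy(rowKey).mapValues { (_, group) -> group.groupingBy(colKey).eachCount() }

val matrix = crossCount(people, { it.isHappy }, { it.city })
```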

For more information, see pivot + groupBy.

13 May 2026