groupBy

groupBy(moveToTop = true) { columns }
      [ transformations ]
      reducer | aggregator | pivot

transformations = [ .sortByCount() | .sortByCountAsc() | .sortBy { columns } | .sortByDesc { columns } ]
                  [ .updateGroups { frameExpression } ]
                  [ .add(column) { rowExpression } ]

reducer = .minBy { column } | .maxBy { column } | .first [ { rowCondition } ] | .last [ { rowCondition } ]
          .concat() | .into([column]) [{ rowExpression }] | .values { valueColumns }

aggregator = .count() | .concat() | .into([column]) [{ rowExpression }] | .values { valueColumns } | .aggregate { aggregations } | .<stat> [ { columns } ]

pivot = .pivot { columns }
      [ .default(defaultValue) ]
         pivotReducer | pivotAggregator

df.groupBy { name }
df.groupBy { city and name.lastName }
df.groupBy { age / 10 named "ageDecade" }

df.groupBy { expr { name.firstName.length + name.lastName.length } named "nameLength" }

With the optional moveToTop parameter, you can choose whether to make a selected nested column a top-level column:

df.groupBy(moveToTop = true) { name.lastName }

or to keep it inside a ColumnGroup:

df.groupBy(moveToTop = false) { name.lastName }

Returns a GroupBy object.
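
As a rough sketch of what the returned GroupBy holds, the snippet below groups a small hand-made frame and inspects its keys and data groups. The people frame, its column names, and the use of string column names instead of generated accessors are assumptions made for illustration only.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data, not the `df` used elsewhere on this page.
val people = dataFrameOf("name", "city", "age")(
    "Alice", "London", 15,
    "Bob", "Dubai", 45,
    "Charlie", "London", 20,
)

// Group rows by the "city" column; the result is a GroupBy, not a DataFrame.
val byCity = people.groupBy("city")

// One key row per distinct city, and one data group (a DataFrame) per key.
val keyCount = byCity.keys.rowsCount()
val groupSizes = byCity.groups.toList().map { it.rowsCount() }

fun main() {
    println(keyCount)
    println(groupSizes)
}
```

Here keys and groups are read directly from the GroupBy, which is handy when checking a grouping before choosing a reducer or aggregator.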

Transformation

A GroupBy DataFrame is a DataFrame with one chosen FrameColumn containing data groups.

Any DataFrame with a FrameColumn can be reinterpreted as a GroupBy DataFrame:

val key by columnOf(1, 2) // create int column with name "key"
val data by columnOf(df[0..3], df[4..6]) // create frame column with name "data"
val dfWithGroups = dataFrameOf(key, data) // create dataframe with two columns (named differently to avoid shadowing df)

dfWithGroups.asGroupBy { data } // convert dataframe to GroupBy by interpreting 'data' column as groups

And any GroupBy DataFrame can be reinterpreted as a DataFrame with a FrameColumn:

df.groupBy { city }.toDataFrame()

Use concat to union all data groups of a GroupBy back into a single DataFrame that contains the original rows in the new order produced by grouping:

df.groupBy { name }.concat()
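
To make the row reordering concrete, here is a small sketch on hypothetical data (the people frame and its string column names are assumptions, not part of the original example). Rows that share a key end up next to each other, with groups appearing in order of each key's first occurrence.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: the two "London" rows are not adjacent.
val people = dataFrameOf("name", "city", "age")(
    "Alice", "London", 15,
    "Bob", "Dubai", 45,
    "Charlie", "London", 20,
)

// Group by city, then stitch the groups back into one DataFrame.
val regrouped = people.groupBy("city").concat()

// Names in the order produced by grouping: London rows first, then Dubai.
val namesInNewOrder = regrouped["name"].toList()

fun main() {
    println(namesInNewOrder)
}
```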

Aggregation

To compute one or several statistics for every group of a GroupBy, use the aggregate function. Its body is executed for every data group and has a receiver of type DataFrame that represents the data group currently being aggregated. To add a new column to the resulting DataFrame, pass the name of the new column to the infix function into:

df.groupBy { city }.aggregate {
    count() into "total"
    count { age > 18 } into "adults"
    median { age } into "median age"
    min { age } into "min age"
    maxBy { age }.name into "oldest"
}
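
The same pattern can be tried end to end on a tiny frame. This is only a sketch: the people data and the string-based column access ("age"<Int>()) are assumptions chosen so the example does not depend on generated column accessors.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data.
val people = dataFrameOf("name", "city", "age")(
    "Alice", "London", 15,
    "Bob", "Dubai", 45,
    "Charlie", "London", 20,
)

// The aggregate body runs once per group; `this` is that group's DataFrame.
val stats = people.groupBy("city").aggregate {
    count() into "total"                        // rows in the group
    count { "age"<Int>() > 18 } into "adults"   // rows matching a condition
    maxOf { "age"<Int>() } into "max age"       // largest age in the group
}

// Each `into` call becomes one column; each group becomes one row.
val totals = stats["total"].toList()
val adults = stats["adults"].toList()
val maxAges = stats["max age"].toList()

fun main() {
    println(stats)
}
```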

df.groupBy { city }.aggregate { maxBy { age }.name }

df.groupBy { city }.max() // max for every comparable column
df.groupBy { city }.mean() // mean for every numeric column
df.groupBy { city }.max { age } // max age into column "age"
df.groupBy { city }.sum("total weight") { weight } // sum of weights into column "total weight"
df.groupBy { city }.count() // number of rows into column "count"
df.groupBy { city }
    .max { name.firstName.length() and name.lastName.length() } // maximum length of firstName or lastName into column "max"
df.groupBy { city }
    .medianFor { age and weight } // median age into column "age", median weight into column "weight"
df.groupBy { city }
    .minFor { (age into "min age") and (weight into "min weight") } // min age into column "min age", min weight into column "min weight"
df.groupBy { city }.meanOf("mean ratio") { weight?.div(age) } // mean of weight/age into column "mean ratio"
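
The direct statistics above can be read as shorthand for a one-line aggregate call. The sketch below compares the two forms for count on hypothetical data (the people frame and string column names are assumptions for illustration).

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data.
val people = dataFrameOf("name", "city", "age")(
    "Alice", "London", 15,
    "Bob", "Dubai", 45,
    "Charlie", "London", 20,
)

// Shortcut: the statistic is called directly on the GroupBy.
val viaShortcut = people.groupBy("city").count()

// Long form: the same statistic written inside aggregate with an explicit name.
val viaAggregate = people.groupBy("city").aggregate { count() into "count" }

val shortcutCounts = viaShortcut["count"].toList()
val aggregateCounts = viaAggregate["count"].toList()

fun main() {
    println(shortcutCounts)
    println(aggregateCounts)
}
```

The shortcut form is usually enough for a single statistic; aggregate is the better fit when several differently named columns are needed in one result.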

To get all column values for every group without aggregation, use the values function:

df.groupBy { city }.values()
df.groupBy { city }.values { name and age }
df.groupBy { city }.values { weight into "weights" }
