Dataframe 0.13 Help

pivot

Splits the rows of DataFrame and groups them horizontally into new columns based on values from one or several columns of original DataFrame.

pivot (inward = true) { pivotColumns } [ .groupBy { indexColumns } | .groupByOther() ] [ .default(defaultValue) ] reducer | aggregator reducer = .minBy { column } | .maxBy { column } | .first [ { rowCondition } ] | .last [ { rowCondition } ] .with { rowExpression } | .values { valueColumns } aggregator = .count() | .matches() | .frames() | .with { rowExpression } | .values { valueColumns } | .aggregate { aggregations } | .<stat> [ { columns } ]

Parameters:

  • inward — if true generated columns will be nested inside original column, otherwise they will be top-level

  • pivotColumns — columns with values for horizontal data grouping and generation of new columns

  • indexColumns — columns with values for vertical data grouping

  • defaultValue — value to fill mismatched pivot-index column pairs

  • valueColumns — columns with output values

df.pivot { city }
val city by column<String?>() df.pivot { city }
df.pivot("city")

To pivot several columns at once you can combine them using and or then infix function:

  • and will pivot columns independently

  • then will create column hierarchy from combinations of values from pivoted columns

df.pivot { city and name.firstName } df.pivot { city then name.firstName }
val city by column<String?>() val name by columnGroup() val firstName by name.column<String>() df.pivot { city and firstName } df.pivot { city then firstName }
df.pivot { "city" and "name"["firstName"] } df.pivot { "city" then "name"["firstName"] }

pivot + groupBy

To create matrix table that is expanded both horizontally and vertically, apply groupBy transformation passing the columns for vertical grouping. Reversed order of pivot and groupBy will produce the same result.

df.pivot { city }.groupBy { name } // same as df.groupBy { name }.pivot { city }
val city by column<String?>() val name by columnGroup() df.pivot { city }.groupBy { name } // same as df.groupBy { name }.pivot { city }
df.pivot("city").groupBy("name") // same as df.groupBy("name").pivot("city")

To group by all columns except pivoted use groupByOther:

df.pivot { city }.groupByOther()

Aggregation

To aggregate data groups with one or several statistics use aggregate:

df.pivot { city }.aggregate { minBy { age }.name }
val city by column<String?>() val name by columnGroup() val firstName by name.column<String>() val age by column<Int>() val weight by column<Int?>() df.pivot { city }.aggregate { minBy(age).name }
df.pivot("city").aggregate { minBy("age")["name"] }
df.pivot { city }.groupBy { name.firstName }.aggregate { meanFor { age and weight } into "means" stdFor { age and weight } into "stds" maxByOrNull { weight }?.name?.lastName into "biggest" }
val city by column<String?>() val name by columnGroup() val firstName by name.column<String>() val age by column<Int>() val weight by column<Int?>() df.pivot { city }.groupBy { firstName }.aggregate { meanFor { age and weight } into "means" stdFor { age and weight } into "stds" maxByOrNull(weight)?.name?.lastName into "biggest" }
df.pivot("city").groupBy { "name"["firstName"] }.aggregate { meanFor("age", "weight") into "means" stdFor("age", "weight") into "stds" maxByOrNull("weight")?.getColumnGroup("name")?.get("lastName") into "biggest" }

Shortcuts for common aggregation functions are also available:

df.pivot { city }.maxFor { age and weight } df.groupBy { name }.pivot { city }.median { age }
val city by column<String?>() val name by columnGroup() val age by column<Int>() val weight by column<Int?>() df.pivot { city }.maxFor { age and weight } df.groupBy { name }.pivot { city }.median { age }
df.pivot("city").maxFor("age", "weight") df.groupBy("name").pivot("city").median("age")

By default, when aggregation function produces several values for single data group, column hierarchy in resulting DataFrame will be indexed first by pivot keys and then by the names of aggregated values. To reverse this order so that resulting columns will be indexed first by names of aggregated values and then by pivot keys, use separate=true flag that is available in multi-result aggregation operations, such as aggregate or <stat>For:

df.pivot { city }.maxFor(separate = true) { age and weight } df.pivot { city }.aggregate(separate = true) { min { age } into "min age" maxOrNull { weight } into "max weight" }
val city by column<String?>() val age by column<Int>() val weight by column<Int?>() df.pivot { city }.maxFor(separate = true) { age and weight } df.pivot { city }.aggregate(separate = true) { min { age } into "min age" maxOrNull { weight } into "max weight" }
df.pivot("city").maxFor("age", "weight", separate = true) df.pivot("city").aggregate(separate = true) { min("age") into "min age" maxOrNull("weight") into "max weight" }

By default, any aggregation function will result in null value for those matrix cells, where intersection of column and row keys produced an empty data group. You can specify default value for any aggregation by default infix function. This value will replace all null results of aggregation function over non-empty data groups as well. To use one default value for all aggregation functions, use default() before aggregation.

df.pivot { city }.groupBy { name }.aggregate { min { age } default 0 } df.pivot { city }.groupBy { name }.default(0).min()
val city by column<String?>() val age by column<Int>() val weight by column<Int?>() val name by columnGroup() df.pivot { city }.groupBy { name }.aggregate { min { age } default 0 } df.pivot { city }.groupBy { name }.default(0).min()
df.pivot("city").groupBy("name").aggregate { min("age") default 0 } df.pivot("city").groupBy("name").default(0).min()
df.pivot { city }.groupBy { name }.aggregate { median { age } into "median age" default 0 minOrNull { weight } into "min weight" default 100 }
val city by column<String?>() val age by column<Int>() val weight by column<Int?>() val name by columnGroup() df.pivot { city }.groupBy { name }.aggregate { median { age } into "median age" default 0 minOrNull { weight } into "min weight" default 100 }
df.pivot("city").groupBy("name").aggregate { median("age") into "median age" default 0 minOrNull("weight") into "min weight" default 100 }

Pivot inside aggregate

pivot transformation can be used inside aggregate function of groupBy. This allows to combine column pivoting with other groupBy aggregations:

df.groupBy { name.firstName }.aggregate { pivot { city }.aggregate(separate = true) { mean { age } into "mean age" count() into "count" } count() into "total" }
val city by column<String?>() val name by columnGroup() val firstName by name.column<String>() val age by column<Int>() df.groupBy { firstName }.aggregate { pivot { city }.aggregate(separate = true) { mean { age } into "mean age" count() into "count" } count() into "total" }
df.groupBy { "name"["firstName"] }.aggregate { pivot("city").aggregate(separate = true) { mean("age") into "mean age" count() into "count" } count() into "total" }

pivotCounts

Pivots with count statistics one or several columns preserving all other columns of DataFrame or GroupBy DataFrame.

df.pivotCounts { city } // same as df.pivot { city }.groupByOther().count() df.groupBy { name }.pivotCounts { city } // same as df.groupBy { name }.pivot { city }.count() // same as df.groupBy { name }.aggregate { pivotCounts { city } }