pivot
Splits the rows of DataFrame
and groups them horizontally into new columns based on values from one or several columns of original DataFrame
.
pivot (inward = true) { pivotColumns }
[ .groupBy { indexColumns } | .groupByOther() ]
[ .default(defaultValue) ]
reducer | aggregator
reducer = .minBy { column } | .maxBy { column } | .first [ { rowCondition } ] | .last [ { rowCondition } ]
.with { rowExpression } | .values { valueColumns }
aggregator = .count() | .matches() | .frames() | .with { rowExpression } | .values { valueColumns } | .aggregate { aggregations } | .<stat> [ { columns } ]
Parameters:
inward
— if true
generated columns will be nested inside original column, otherwise they will be top-level
pivotColumns
— columns with values for horizontal data grouping and generation of new columns
indexColumns
— columns with values for vertical data grouping
defaultValue
— value to fill mismatched pivot-index column pairs
valueColumns
— columns with output values
val city by column<String?>()
df.pivot { city }
To pivot several columns at once you can combine them using and
or then
infix function:
df.pivot { city and name.firstName }
df.pivot { city then name.firstName }
val city by column<String?>()
val name by columnGroup()
val firstName by name.column<String>()
df.pivot { city and firstName }
df.pivot { city then firstName }
df.pivot { "city" and "name"["firstName"] }
df.pivot { "city" then "name"["firstName"] }
pivot + groupBy
To create matrix table that is expanded both horizontally and vertically, apply groupBy
transformation passing the columns for vertical grouping. Reversed order of pivot
and groupBy
will produce the same result.
df.pivot { city }.groupBy { name }
// same as
df.groupBy { name }.pivot { city }
val city by column<String?>()
val name by columnGroup()
df.pivot { city }.groupBy { name }
// same as
df.groupBy { name }.pivot { city }
df.pivot("city").groupBy("name")
// same as
df.groupBy("name").pivot("city")
To group by all columns except pivoted use groupByOther
:
df.pivot { city }.groupByOther()
Aggregation
To aggregate data groups with one or several statistics use aggregate
:
df.pivot { city }.aggregate { minBy { age }.name }
val city by column<String?>()
val name by columnGroup()
val firstName by name.column<String>()
val age by column<Int>()
val weight by column<Int?>()
df.pivot { city }.aggregate { minBy(age).name }
df.pivot("city").aggregate { minBy("age")["name"] }
df.pivot { city }.groupBy { name.firstName }.aggregate {
meanFor { age and weight } into "means"
stdFor { age and weight } into "stds"
maxByOrNull { weight }?.name?.lastName into "biggest"
}
val city by column<String?>()
val name by columnGroup()
val firstName by name.column<String>()
val age by column<Int>()
val weight by column<Int?>()
df.pivot { city }.groupBy { firstName }.aggregate {
meanFor { age and weight } into "means"
stdFor { age and weight } into "stds"
maxByOrNull(weight)?.name?.lastName into "biggest"
}
df.pivot("city").groupBy { "name"["firstName"] }.aggregate {
meanFor("age", "weight") into "means"
stdFor("age", "weight") into "stds"
maxByOrNull("weight")?.getColumnGroup("name")?.get("lastName") into "biggest"
}
Shortcuts for common aggregation functions are also available:
df.pivot { city }.maxFor { age and weight }
df.groupBy { name }.pivot { city }.median { age }
val city by column<String?>()
val name by columnGroup()
val age by column<Int>()
val weight by column<Int?>()
df.pivot { city }.maxFor { age and weight }
df.groupBy { name }.pivot { city }.median { age }
df.pivot("city").maxFor("age", "weight")
df.groupBy("name").pivot("city").median("age")
By default, when aggregation function produces several values for single data group, column hierarchy in resulting DataFrame
will be indexed first by pivot keys and then by the names of aggregated values. To reverse this order so that resulting columns will be indexed first by names of aggregated values and then by pivot keys, use separate=true
flag that is available in multi-result aggregation operations, such as aggregate
or <stat>For
:
df.pivot { city }.maxFor(separate = true) { age and weight }
df.pivot { city }.aggregate(separate = true) {
min { age } into "min age"
maxOrNull { weight } into "max weight"
}
val city by column<String?>()
val age by column<Int>()
val weight by column<Int?>()
df.pivot { city }.maxFor(separate = true) { age and weight }
df.pivot { city }.aggregate(separate = true) {
min { age } into "min age"
maxOrNull { weight } into "max weight"
}
df.pivot("city").maxFor("age", "weight", separate = true)
df.pivot("city").aggregate(separate = true) {
min("age") into "min age"
maxOrNull("weight") into "max weight"
}
By default, any aggregation function will result in null
value for those matrix cells, where intersection of column and row keys produced an empty data group. You can specify default value for any aggregation by default
infix function. This value will replace all null
results of aggregation function over non-empty data groups as well. To use one default value for all aggregation functions, use default()
before aggregation.
df.pivot { city }.groupBy { name }.aggregate { min { age } default 0 }
df.pivot { city }.groupBy { name }.default(0).min()
val city by column<String?>()
val age by column<Int>()
val weight by column<Int?>()
val name by columnGroup()
df.pivot { city }.groupBy { name }.aggregate { min { age } default 0 }
df.pivot { city }.groupBy { name }.default(0).min()
df.pivot("city").groupBy("name").aggregate { min("age") default 0 }
df.pivot("city").groupBy("name").default(0).min()
df.pivot { city }.groupBy { name }.aggregate {
median { age } into "median age" default 0
minOrNull { weight } into "min weight" default 100
}
val city by column<String?>()
val age by column<Int>()
val weight by column<Int?>()
val name by columnGroup()
df.pivot { city }.groupBy { name }.aggregate {
median { age } into "median age" default 0
minOrNull { weight } into "min weight" default 100
}
df.pivot("city").groupBy("name").aggregate {
median("age") into "median age" default 0
minOrNull("weight") into "min weight" default 100
}
Pivot inside aggregate
pivot transformation can be used inside aggregate
function of groupBy
. This allows to combine column pivoting with other groupBy
aggregations:
df.groupBy { name.firstName }.aggregate {
pivot { city }.aggregate(separate = true) {
mean { age } into "mean age"
count() into "count"
}
count() into "total"
}
val city by column<String?>()
val name by columnGroup()
val firstName by name.column<String>()
val age by column<Int>()
df.groupBy { firstName }.aggregate {
pivot { city }.aggregate(separate = true) {
mean { age } into "mean age"
count() into "count"
}
count() into "total"
}
df.groupBy { "name"["firstName"] }.aggregate {
pivot("city").aggregate(separate = true) {
mean("age") into "mean age"
count() into "count"
}
count() into "total"
}
pivotCounts
Pivots with count
statistics one or several columns preserving all other columns of DataFrame
or GroupBy DataFrame
.
df.pivotCounts { city }
// same as
df.pivot { city }.groupByOther().count()
df.groupBy { name }.pivotCounts { city }
// same as
df.groupBy { name }.pivot { city }.count()
// same as
df.groupBy { name }.aggregate {
pivotCounts { city }
}
pivotMatches
Pivots with Boolean
statistics one or several columns preserving all other columns of DataFrame
.
df.pivotMatches { city }
// same as
df.pivot { city }.groupByOther().matches()
df.groupBy { name }.pivotMatches { city }
// same as
df.groupBy { name }.pivot { city }.matches()
// same as
df.groupBy { name }.aggregate {
pivotMatches { city }
}
Last modified: 09 December 2024