pivot
Splits the rows of a DataFrame
and groups them horizontally into new columns based on values from one or several columns of the original DataFrame
.
pivot (inward = true) { pivotColumns }
[ .groupBy { indexColumns } | .groupByOther() ]
[ .default(defaultValue) ]
reducer | aggregator
reducer = .minBy { column } | .maxBy { column } | .first [ { rowCondition } ] | .last [ { rowCondition } ]
.with { rowExpression } | .values { valueColumns }
aggregator = .count() | .matches() | .frames() | .with { rowExpression } | .values { valueColumns } | .aggregate { aggregations } | .<stat> [ { columns } ]
See column selectors for how to select the columns for this operation.
Parameters:
inward
— iftrue
generated columns are nested inside the original column, otherwise they will be top-levelpivotColumns
— columns with values for horizontal data grouping and generation of new columnsindexColumns
— columns with values for vertical data groupingdefaultValue
— value to fill mismatched pivot-index column pairsvalueColumns
— columns with output values
df.pivot { city }
df.pivot("city")
To pivot several columns at once, you can combine them using and
or then
infix function:
and
will pivot columns independentlythen
will create column hierarchy from combinations of values from pivoted columns
df.pivot { city and name.firstName }
df.pivot { city then name.firstName }
df.pivot { "city" and "name"["firstName"] }
df.pivot { "city" then "name"["firstName"] }
To create a matrix table that is expanded both horizontally and vertically, apply groupBy
transformation passing the columns for vertical grouping. Reversed order of pivot
and groupBy
will produce the same result.
df.pivot { city }.groupBy { name }
// same as
df.groupBy { name }.pivot { city }
df.pivot("city").groupBy("name")
// same as
df.groupBy("name").pivot("city")
To group by all columns except pivoted use groupByOther
:
df.pivot { city }.groupByOther()
To aggregate data groups with one or several statistics use aggregate
:
df.pivot { city }.aggregate { minBy { age }.name }
df.pivot("city").aggregate { minBy("age")["name"] }
df.pivot { city }.groupBy { name.firstName }.aggregate {
meanFor { age and weight } into "means"
stdFor { age and weight } into "stds"
maxByOrNull { weight }?.name?.lastName into "biggest"
}
df.pivot("city").groupBy { "name"["firstName"] }.aggregate {
meanFor("age", "weight") into "means"
stdFor("age", "weight") into "stds"
maxByOrNull("weight")?.getColumnGroup("name")?.get("lastName") into "biggest"
}
Shortcuts for common aggregation functions are also available:
df.pivot { city }.maxFor { age and weight }
df.groupBy { name }.pivot { city }.median { age }
df.pivot("city").maxFor("age", "weight")
df.groupBy("name").pivot("city").median("age")
By default, when aggregation function produces several values for single data group, column hierarchy in resulting DataFrame
will be indexed first by pivot keys and then by the names of aggregated values. To reverse this order so that resulting columns will be indexed first by names of aggregated values and then by pivot keys, use separate=true
flag that is available in multi-result aggregation operations, such as aggregate
or <stat>For
:
df.pivot { city }.maxFor(separate = true) { age and weight }
df.pivot { city }.aggregate(separate = true) {
min { age } into "min age"
maxOrNull { weight } into "max weight"
}
df.pivot("city").maxFor("age", "weight", separate = true)
df.pivot("city").aggregate(separate = true) {
min("age") into "min age"
maxOrNull("weight") into "max weight"
}
By default, any aggregation function will result in null
value for those matrix cells, where intersection of column and row keys produced an empty data group. You can specify default value for any aggregation by default
infix function. This value will replace all null
results of aggregation function over non-empty data groups as well. To use one default value for all aggregation functions, use default()
before aggregation.
df.pivot { city }.groupBy { name }.aggregate { min { age } default 0 }
df.pivot { city }.groupBy { name }.default(0).min()
df.pivot("city").groupBy("name").aggregate { min("age") default 0 }
df.pivot("city").groupBy("name").default(0).min()
df.pivot { city }.groupBy { name }.aggregate {
median { age } into "median age" default 0
minOrNull { weight } into "min weight" default 100
}
df.pivot("city").groupBy("name").aggregate {
median("age") into "median age" default 0
minOrNull("weight") into "min weight" default 100
}
pivot transformation can be used inside aggregate
function of groupBy
. This allows combining column pivoting with other groupBy
aggregations:
df.groupBy { name.firstName }.aggregate {
pivot { city }.aggregate(separate = true) {
mean { age } into "mean age"
count() into "count"
}
count() into "total"
}
df.groupBy { "name"["firstName"] }.aggregate {
pivot("city").aggregate(separate = true) {
mean("age") into "mean age"
count() into "count"
}
count() into "total"
}
Pivots with count
statistics one or several columns preserving all other columns of DataFrame
or GroupBy DataFrame
.
df.pivotCounts { city }
// same as
df.pivot { city }.groupByOther().count()
df.groupBy { name }.pivotCounts { city }
// same as
df.groupBy { name }.pivot { city }.count()
// same as
df.groupBy { name }.aggregate {
pivotCounts { city }
}
Pivots with Boolean
statistics one or several columns preserving all other columns of DataFrame
.
df.pivotMatches { city }
// same as
df.pivot { city }.groupByOther().matches()
df.groupBy { name }.pivotMatches { city }
// same as
df.groupBy { name }.pivot { city }.matches()
// same as
df.groupBy { name }.aggregate {
pivotMatches { city }
}