pivot
Splits the rows of a DataFrame and groups them horizontally into new columns based on values from one or several columns of the original DataFrame.
See column selectors for how to select the columns for this operation, pivot + groupBy, pivot reducing, pivot aggregation.
Parameters:
inward— iftruegenerated columns are nested inside the original column, otherwise they will be top-levelpivotColumns— columns with values for horizontal data grouping and generation of new columnsindexColumns— columns with values for vertical data groupingdefaultValue— value to fill mismatched pivot-index column pairsvalueColumns— columns with output values
To pivot several columns at once, you can combine them using and or then infix function:
andwill pivot columns independentlythenwill create column hierarchy from combinations of values from pivoted columns
pivot + groupBy
To create a matrix table that is expanded both horizontally and vertically, apply groupBy transformation passing the columns for vertical grouping. Reversed order of pivot and groupBy will produce the same result.
To group by all columns except pivoted, use groupByOther:
PivotGroupBy supports the same reducing and aggregation operations as Pivot. In addition, PivotGroupBy supports the default and matches functions.
Reducing
A Pivot can be reduced into a DataRow. It means that each Pivot group is collapsed into a single row, and then these rows are concatenated into a single DataRow with Pivot keys as top-level columns or as column groups.
Reducing is a specific case of aggregation.
Step 1: use a reducing method
Use the following functions to collapse each group in a Pivot into a single row:
first/last— take the first or last row (optionally, the first or last one that satisfies a predicate) of each group;minBy/maxBy— take the row with the minimum or maximum value of the givenRowExpressionevaluated on rows within each group;medianBy/percentileBy— take the row with the median or a specific percentile value of the givenRowExpressionevaluated on rows within each group.
These functions return an instance of ReducedPivot, which is a class serving as a transitional step between performing a reduction on Pivot groups and specifying how the resulting reduced rows should be represented in a resulting DataRow.
Examples of reducing
df.pivot
first
The first function can be used with a predicate:
last
The last function can be used with a predicate:
minBy
maxBy
medianBy
percentileBy
Step 2: transform the resulting reduced rows into a new DataRow
ReducedPivot can be transformed into a new DataRow (either using the original reduced rows or their transformed versions).
To perform this transformation, use one of the following functions:
values— creates a newDataRowcontaining the values from the reduced rows in the selected columns;with— computes a new value for each reduced row using aRowExpressionand produces aDataRowcontaining these computed values.
Each of these functions returns a new DataRow with Pivot keys as top-level columns (or as column groups) and values composed of the reduced results from each group.
Examples of transforming
values
with
Aggregation
Each Pivot group can be aggregated — that is, transformed into a new value, DataRow, or DataFrame — and then concatenated into a single DataRow composed of these aggregated results, with pivot keys as top-level columns or as column groups.
The following aggregation methods are available:
frames— returns thisPivotas aDataRowwith pivot keys ascolumns(orcolumn groups) and corresponding groups stored asFrameColumns;values— collects values from all rows of each group for the selected columns into a singleList(values fromcolumn groupsare collected into aFrameColumn);count— creates aDataRowwith thePivotkey columns containing the number of rows in each corresponding group;with— creates aDataRowcontaining values computed using aRowExpressionacross all rows of each group and collected into a single List for every group;aggregate— performs a set of custom aggregations usingAggregateDsl, allowing computation of one or more derived values per group;various
aggregation statistics.
Each of these methods returns a new DataRow with Pivot keys as top-level columns (or as column groups) and values representing the aggregated results of each group.
Examples of aggregation
frames
values on Pivot
count
with on Pivot
aggregate
If only one aggregation function is used, the column name can be omitted:
Aggregation statistics
Aggregation statistics are predefined shortcuts for common statistical aggregations such as sum, mean, median, and others. These aggregation operations are applied to each group within a Pivot.
Each function computes a statistic across the rows of a pivot group and returns the result as a new value in the resulting DataRow (for Pivot) or as a new column in the resulting DataFrame (for PivotGroupBy).
The following aggregation statistics are available:
To compute one or several statistics per every pivot group, use the aggregate function.
The functions max, maxOf, and maxFor differ as follows. They all calculate the maximum of values, but:
maxcomputes it on the selected columns. If more than one column is selected, for each group it computes one maximum value among all selected columns.maxOfcomputes it by arow expression: the expression is calculated for each row of the group and the maximum value is returned.maxForcomputes it for each of the selected columns within each group. If more than one column is selected, for each group it computes the maximum value for each selected column separately.
Similar logic applies to other statistics.
Direct aggregations
Most common aggregation functions can be computed directly on a Pivot or PivotGroupBy.
When applied to a Pivot (without groupBy), the result is a DataRow with pivot keys as columns and the computed statistic as values.
When applied to a PivotGroupBy, the result is a DataFrame where rows correspond to the grouping keys and columns correspond to the pivot keys, with computed statistics as cell values.
Examples of direct aggregations
max
min
sum
mean
std
median
percentile
The separate flag
By default, when an aggregation function produces several values for a single data group, the column hierarchy in the resulting DataFrame is indexed first by pivot keys and then by the names of aggregated values. To reverse this order so that resulting columns are indexed first by names of aggregated values and then by pivot keys, use the separate=true flag that is available in multi-result aggregation operations, such as aggregate or <stat>For:
Default values for aggregation
By default, any aggregation function will result in a null value for those matrix cells where intersection of column and row keys produced an empty data group. You can specify a default value for any aggregation by using the default infix function. This value will replace all null results of the aggregation function over non-empty data groups as well. To use one default value for all aggregation functions, use default() before aggregation.
Pivot inside aggregate
pivot can be used inside the aggregate function of groupBy. This allows combining column pivoting with other groupBy aggregations:
pivotCounts
Pivots one or several columns with count aggregation, preserving all other columns of a DataFrame or a GroupBy.
pivotMatches
Pivots one or several columns with Boolean aggregation, preserving all other columns of DataFrame. Each cell in the resulting pivoted columns contains true if that value was present in the original data for the corresponding row, and false otherwise.