Summary statistics
Every summary statistic can be used in aggregations of DataFrame, GroupBy, Pivot, and PivotGroupBy.
sum, mean, std are available for numeric columns of types Int, Double, Float, BigDecimal, Long, Byte.
min/max, median are available for Comparable columns.
When a statistic x is applied to several columns, it can be computed in several modes:
x(): DataRow computes a separate value for every suitable column
x { columns }: Value computes a single value across all given columns
xFor { columns }: DataRow computes a separate value for every given column
xOf { rowExpression }: Value computes a single value across the results of a row expression evaluated for every row
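The four modes can be sketched with sum as the statistic. This is a minimal illustration, assuming a recent Kotlin DataFrame release on the classpath and string-based column access; the sample data and the column names (name, city, age, weight) are hypothetical, and exact return types may differ between library versions.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; column names are illustrative only.
val df = dataFrameOf("name", "city", "age", "weight")(
    "Alice", "London", 15, 54,
    "Bob", "Dubai", 45, 87,
    "Charlie", "London", 20, 70,
)

fun main() {
    // x(): one sum per numeric column, returned as a DataRow
    println(df.sum())

    // x { columns }: a single sum over all values of both columns
    println(df.sum { "age"<Int>() and "weight"<Int>() })

    // xFor { columns }: one sum per listed column, returned as a DataRow
    println(df.sumFor { "age"<Int>() and "weight"<Int>() })

    // xOf { rowExpression }: a single sum over the per-row products
    println(df.sumOf { "age"<Int>() * "weight"<Int>() })
}
```

With this data, the row expression in sumOf yields 810, 3915, and 1400, so the last call should print their total, 6125.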
The min and max statistics have an additional mode, by:
minBy { rowExpression }: DataRow finds the row with the minimal result of the expression
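As a rough sketch of the by mode, the example below picks whole rows rather than values. It assumes the same hypothetical sample data as elsewhere in this section and string-based column access; maxBy is assumed to mirror minBy.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; column names are illustrative only.
val people = dataFrameOf("name", "city", "age", "weight")(
    "Alice", "London", 15, 54,
    "Bob", "Dubai", 45, 87,
    "Charlie", "London", 20, 70,
)

fun main() {
    // Row with the smallest age; the result is a DataRow, not a number
    val youngest = people.minBy { "age"<Int>() }
    println(youngest["name"])

    // Row with the largest weight
    val heaviest = people.maxBy { "weight"<Int>() }
    println(heaviest["name"])
}
```

Because the result is a DataRow, every other column of the selected row (here name and city) stays available.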
groupBy statistics
When a statistic is applied to a GroupBy DataFrame, it is computed for every data group.
If a statistic is applied in a mode that returns a single value for every data group, the value is stored in a single column named after the statistic.
You can also pass a custom name for the aggregated column.
If a statistic is applied in a mode that returns a separate value for every column in a data group, the aggregated values are stored in columns with the original column names.
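The naming rules above can be sketched as follows, using mean on a hypothetical data set grouped by city. This assumes a recent Kotlin DataFrame release and string-based column access; the custom name "mean age" is an arbitrary example.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; column names are illustrative only.
val persons = dataFrameOf("name", "city", "age", "weight")(
    "Alice", "London", 15, 54,
    "Bob", "Dubai", 45, 87,
    "Charlie", "London", 20, 70,
)

// Single value per group: stored in a column named after the statistic
val defaultName = persons.groupBy("city").mean { "age"<Int>() }

// Single value per group with a custom column name
val customName = persons.groupBy("city").mean("mean age") { "age"<Int>() }

// Separate value per column: original column names are kept
val perColumn = persons.groupBy("city").meanFor { "age"<Int>() and "weight"<Int>() }

fun main() {
    println(defaultName.columnNames()) // expected: city, mean
    println(customName.columnNames())  // expected: city, mean age
    println(perColumn.columnNames())   // expected: city, age, weight
}
```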
pivot statistics
When a statistic is applied to Pivot or PivotGroupBy, it is computed for every data group.
If a statistic is applied in a mode that returns a single value for every data group, the value is stored in a matrix cell without any name.
If a statistic is applied in a mode that returns a separate value for every column in a data group, every cell in the matrix will contain a DataRow with values for every aggregated column.
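A minimal sketch of both cases, assuming a recent Kotlin DataFrame release, string-based column access, and hypothetical sample data pivoted by city:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; column names are illustrative only.
val sample = dataFrameOf("name", "city", "age", "weight")(
    "Alice", "London", 15, 54,
    "Bob", "Dubai", 45, 87,
    "Charlie", "London", 20, 70,
)

fun main() {
    // Single value per group: each city cell holds one unnamed value
    println(sample.pivot("city").maxOf { "age"<Int>() / 2 })

    // Separate value per column: each city cell holds a DataRow
    // with an entry for age and an entry for weight
    println(sample.pivot("city").maxFor { "age"<Int>() and "weight"<Int>() })
}
```

Printing the two results side by side shows the difference in cell structure: plain values in the first case, nested age and weight values under each city in the second.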
To group columns in aggregation results not by pivoted values but by aggregated columns, apply the separate flag.
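The effect of the flag can be sketched as below. This assumes that the For-style pivot statistics accept a separate parameter, as maxFor does in recent Kotlin DataFrame releases; the sample data is hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; column names are illustrative only.
val data = dataFrameOf("name", "city", "age", "weight")(
    "Alice", "London", 15, 54,
    "Bob", "Dubai", 45, 87,
    "Charlie", "London", 20, 70,
)

fun main() {
    // Default layout: top level is the pivoted value (city),
    // with age and weight nested inside each city
    println(data.pivot("city").maxFor { "age"<Int>() and "weight"<Int>() })

    // separate = true: top level is the aggregated column (age, weight),
    // with the cities nested inside each of them
    println(data.pivot("city").maxFor(separate = true) { "age"<Int>() and "weight"<Int>() })
}
```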