# Summary statistics

Every summary statistic can be used in aggregations of `DataFrame`, `DataColumn`, `GroupBy`, `Pivot` and `PivotGroupBy`:

```kotlin
df.mean()
df.age.sum()
df.groupBy { city }.mean()
df.pivot { city }.median()
df.pivot { city }.groupBy { name.lastName }.std()
```

`sum`, `mean` and `std` are available for numeric columns of types `Int`, `Double`, `Float`, `BigDecimal`, `Long` and `Byte`.

`min`, `max` and `median` are available for `Comparable` columns.
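The split between numeric and `Comparable` statistics can be illustrated with a plain-Kotlin sketch. It has no DataFrame dependency and the helper names are hypothetical. `sum`, `mean` and `std` need arithmetic. `min`, `max` and `median` only need an ordering. Two details are assumptions of this sketch, not statements about the library: the standard deviation uses the sample divisor `n - 1`, and for an even number of `Comparable` values the median picks the lower middle element.

```kotlin
import kotlin.math.sqrt

// Numeric statistics: require arithmetic on the values.
fun sumOfValues(values: List<Double>): Double = values.sum()

fun meanOfValues(values: List<Double>): Double = values.average()

// Sample standard deviation (divisor n - 1 is an assumption of this sketch).
fun stdOfValues(values: List<Double>): Double {
    val m = values.average()
    val squaredDeviations = values.sumOf { (it - m) * (it - m) }
    return sqrt(squaredDeviations / (values.size - 1))
}

// Comparable statistics: only an ordering is needed, so any Comparable type works.
// For an even count this picks the lower middle element (assumption of this sketch).
fun <T : Comparable<T>> minMaxMedian(values: List<T>): Triple<T, T, T> {
    val sorted = values.sorted()
    return Triple(sorted.first(), sorted.last(), sorted[(sorted.size - 1) / 2])
}
```

For example, `minMaxMedian(listOf("b", "a", "c"))` works on strings, where `sumOfValues` would not apply.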

When a statistic `x` is applied to several columns, it can be computed in several modes:

- `x(): DataRow` computes a separate value for every suitable column
- `x { columns }: Value` computes a single value across all given columns
- `xFor { columns }: DataRow` computes a separate value for every given column
- `xOf { rowExpression }: Value` computes a single value across the results of a row expression evaluated for every row

The `min` and `max` statistics have an additional mode, `by`:

- `minBy { rowExpression }: DataRow` finds the row with the minimal result of the expression; `maxBy` finds the row with the maximal result
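The `by` mode returns a whole row rather than a value. Kotlin's standard collection functions `minByOrNull` and `maxByOrNull` behave analogously. The sketch below uses them on a hypothetical `CityRow` type as a stand-in for DataFrame rows. It is not the DataFrame API itself.

```kotlin
// Hypothetical row type standing in for a DataFrame row.
data class CityRow(val name: String, val age: Int, val weight: Int?)

// Like minBy { age }: returns the whole row with the smallest expression result (null if there are no rows).
fun youngest(rows: List<CityRow>): CityRow? = rows.minByOrNull { it.age }

// Like maxBy { weight ?: 0 }: any row expression can serve as the selector.
fun heaviest(rows: List<CityRow>): CityRow? = rows.maxByOrNull { it.weight ?: 0 }
```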

```kotlin
df.sum() // sum of values for every numeric column
df.sum { age and weight } // sum of all values in `age` and `weight`
df.sumFor { age and weight } // sum of values for `age` and `weight` separately
df.sumOf { (weight ?: 0) / age } // sum of the expression evaluated for every row
```
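To make the difference between the four `sum` modes concrete, here is a plain-Kotlin emulation over a list of rows. It assumes a hypothetical `PersonRow` type. Column selection is modelled as a hypothetical name-to-getter map, and nulls are skipped. This only mirrors the results described above; it does not reproduce how DataFrame implements them.

```kotlin
// Hypothetical row type; `weight` is nullable as in the examples above.
data class PersonRow(val age: Int, val weight: Int?)

// Hypothetical stand-in for column selection: column name -> value getter.
val numericColumns: Map<String, (PersonRow) -> Int?> = mapOf<String, (PersonRow) -> Int?>(
    "age" to { r: PersonRow -> r.age },
    "weight" to { r: PersonRow -> r.weight },
)

// xFor { columns }: one sum per selected column, skipping nulls.
fun sumForColumns(rows: List<PersonRow>, columns: Map<String, (PersonRow) -> Int?>): Map<String, Int> =
    columns.mapValues { (_, getter) -> rows.mapNotNull(getter).sum() }

// x(): the per-column sums for every suitable (here: every numeric) column.
fun sumEveryColumn(rows: List<PersonRow>): Map<String, Int> = sumForColumns(rows, numericColumns)

// x { columns }: a single sum across all values of the selected columns.
fun sumAcrossColumns(rows: List<PersonRow>, columns: Map<String, (PersonRow) -> Int?>): Int =
    sumForColumns(rows, columns).values.sum()

// xOf { rowExpression }: evaluate the expression for every row, then sum the results (integer division).
fun sumOfExpression(rows: List<PersonRow>): Int = rows.sumOf { (it.weight ?: 0) / it.age }
```

The `x()` and `xFor` variants return one value per column, keyed by column name. `x { }` and `xOf { }` collapse everything into a single number.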

## groupBy statistics

When a statistic is applied to a `GroupBy`, it is computed for every data group.

If a statistic is applied in a mode that returns a single value for every data group, that value is stored in a single column named after the statistic.

```kotlin
df.groupBy { city }.mean { age } // [`city`, `mean`]
df.groupBy { city }.meanOf { age / 2 } // [`city`, `mean`]
```
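The shape of a single-value `groupBy` aggregation can be sketched with the Kotlin standard library's `groupBy`. In the result, each group contributes one row made of its key and one value in a column called `mean`. `CitizenRow` and `CityMean` are hypothetical types used only for this illustration.

```kotlin
// Hypothetical input row.
data class CitizenRow(val city: String?, val age: Int)

// Hypothetical result row: the key column plus one column named after the statistic.
data class CityMean(val city: String?, val mean: Double)

// Like groupBy { city }.mean { age }: one value per group.
fun meanAgeByCity(rows: List<CitizenRow>): List<CityMean> =
    rows.groupBy { it.city }
        .map { (city, group) -> CityMean(city, group.map { it.age }.average()) }

// Like groupBy { city }.meanOf { age / 2 }: the expression is evaluated per row before averaging.
fun meanOfHalfAgeByCity(rows: List<CitizenRow>): List<CityMean> =
    rows.groupBy { it.city }
        .map { (city, group) -> CityMean(city, group.map { it.age / 2 }.average()) }
```

Groups appear in the order their keys first occur, because `groupBy` returns a `LinkedHashMap`.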

You can also pass a custom name for the aggregated column:

```kotlin
df.groupBy { city }.mean("mean age") { age } // [`city`, `mean age`]
df.groupBy { city }.meanOf("custom") { age / 2 } // [`city`, `custom`]
```

If a statistic is applied in a mode that returns a separate value for every column in a data group, the aggregated values are stored in columns with the original column names.

```kotlin
df.groupBy { city }.meanFor { age and weight } // [`city`, `age`, `weight`]
df.groupBy { city }.mean() // [`city`, `age`, `weight`, ...]
```
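A per-column aggregation keeps the original column names inside each group. The following plain-Kotlin sketch models each result row as a map from column name to mean, keyed by city. The `ResidentRow` type is hypothetical, and skipping nulls in `weight` is an assumption of this sketch.

```kotlin
// Hypothetical input row.
data class ResidentRow(val city: String?, val age: Int, val weight: Int?)

// Like groupBy { city }.meanFor { age and weight }:
// for each city, one mean per aggregated column, stored under the original column name.
fun meanForByCity(rows: List<ResidentRow>): Map<String?, Map<String, Double>> =
    rows.groupBy { it.city }.mapValues { (_, group) ->
        mapOf(
            "age" to group.map { it.age }.average(),
            // nulls are skipped here (assumption of this sketch)
            "weight" to group.mapNotNull { it.weight }.average(),
        )
    }
```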

## pivot statistics

When a statistic is applied to `Pivot` or `PivotGroupBy`, it is computed for every data group.

If a statistic is applied in a mode that returns a single value for every data group, the value is stored in the corresponding matrix cell without any name.

```kotlin
// Extension properties API
df.groupBy { city }.pivot { name.lastName }.mean { age }
df.groupBy { city }.pivot { name.lastName }.meanOf { age / 2.0 }
```

```kotlin
// Column accessors API
val city by column<String?>()
val age by column<Int>()
val name by columnGroup()
val lastName by name.column<String>()

df.groupBy { city }.pivot { lastName }.mean { age }
df.groupBy { city }.pivot { lastName }.meanOf { age() / 2.0 }
```

```kotlin
// String API
df.groupBy("city").pivot { "name"["lastName"] }.mean("age")
df.groupBy("city").pivot { "name"["lastName"] }.meanOf { "age"<Int>() / 2.0 }
```
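One way to picture the resulting matrix is a nested map. The outer key is the `groupBy` key, which gives the matrix row. The inner key is the pivoted value, which gives the matrix column. The cell holds the unnamed single value. `PersonRecord` is a hypothetical flat row type, and this is a plain-Kotlin model rather than the DataFrame result type. City/last-name combinations that have no rows are simply absent from the inner map in this sketch.

```kotlin
// Hypothetical flat row with the group key, pivot key and aggregated value.
data class PersonRecord(val city: String?, val lastName: String, val age: Int)

// Like groupBy { city }.pivot { lastName }.mean { age }:
// outer key = matrix row (city), inner key = matrix column (last name), value = mean age in that cell.
fun meanAgeMatrix(rows: List<PersonRecord>): Map<String?, Map<String, Double>> =
    rows.groupBy { it.city }.mapValues { (_, cityRows) ->
        cityRows.groupBy { it.lastName }.mapValues { (_, cell) -> cell.map { it.age }.average() }
    }
```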

If a statistic is applied in a mode that returns a separate value for every column in a data group, every cell in the matrix contains a `DataRow` with values for every aggregated column.

```kotlin
df.groupBy { city }.pivot { name.lastName }.meanFor { age and weight }
df.groupBy { city }.pivot { name.lastName }.mean()
```
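Extending the nested-map picture, a per-column aggregation puts a row-like map in every cell instead of a single number. In this sketch that map goes from column name to mean and plays the role of the `DataRow`. `PersonEntry` is a hypothetical type, and skipping nulls in `weight` is an assumption of this sketch.

```kotlin
// Hypothetical flat row.
data class PersonEntry(val city: String?, val lastName: String, val age: Int, val weight: Int?)

// Like groupBy { city }.pivot { lastName }.meanFor { age and weight }:
// every (city, lastName) cell holds a map of column name -> mean, standing in for a DataRow.
fun meanForMatrix(rows: List<PersonEntry>): Map<String?, Map<String, Map<String, Double>>> =
    rows.groupBy { it.city }.mapValues { (_, cityRows) ->
        cityRows.groupBy { it.lastName }.mapValues { (_, cell) ->
            mapOf(
                "age" to cell.map { it.age }.average(),
                // nulls are skipped here (assumption of this sketch)
                "weight" to cell.mapNotNull { it.weight }.average(),
            )
        }
    }
```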

To group the columns of the aggregation result by aggregated column rather than by pivoted value, set the `separate` flag:

```kotlin
df.groupBy { city }.pivot { name.lastName }.meanFor(separate = true) { age and weight }
df.groupBy { city }.pivot { name.lastName }.mean(separate = true)
```
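In nested-map terms, `separate = true` can be read as swapping the two inner levels. Without it, each group row is organized as pivoted value → column → value. With it, the organization becomes column → pivoted value → value. The generic helper below is hypothetical and only illustrates that regrouping. The actual column layout DataFrame produces may differ in details.

```kotlin
// Hypothetical helper: regroup each group's cells from pivotValue -> column -> value
// into column -> pivotValue -> value, keeping first-seen order at every level.
fun <K, P, C, V> separateByColumn(matrix: Map<K, Map<P, Map<C, V>>>): Map<K, Map<C, Map<P, V>>> =
    matrix.mapValues { (_, cells) ->
        val byColumn = linkedMapOf<C, MutableMap<P, V>>()
        for ((pivotKey, row) in cells) {
            for ((column, value) in row) {
                byColumn.getOrPut(column) { linkedMapOf() }[pivotKey] = value
            }
        }
        byColumn
    }
```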