Summary statistics

df.mean()
df.age.sum()
df.groupBy { city }.mean()
df.pivot { city }.median()
df.pivot { city }.groupBy { name.lastName }.std()

sum, mean, and std are available for (primitive) number columns of types Int, Double, Float, Long, Byte, and Short, and for any mix of those.

min/max, median, and percentile are available for self-comparable columns, that is, columns of type T : Comparable<T>, such as DateTime, String, or Int. This includes all primitive number columns, but not a mix of different number types.
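The restriction on mixed number types mirrors plain Kotlin, where Number itself is not Comparable. The following is a minimal sketch using only the standard library, not the DataFrame API; the helper name maxAsDouble is hypothetical. It shows one way to bring mixed values to a single type before taking a maximum.

```kotlin
// Hypothetical helper, standard library only; not part of the DataFrame API.
// Mixed numbers are converted to Double so that they become mutually comparable.
fun maxAsDouble(values: List<Number>): Double? =
    values.map { it.toDouble() }.maxOrNull()

fun main() {
    // Int, Double and Long in one list: no common Comparable type until converted.
    val mixed: List<Number> = listOf(1, 2.5, 3L)
    println(maxAsDouble(mixed))
}
```

Converting to Double can lose precision for very large Long values, so a different common type may be a better choice depending on the data.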

In all cases, null values are ignored.

NaN values can optionally be ignored by setting the skipNaN flag to true. When it's set to false, a NaN in the input will be propagated to the result.
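The null and NaN rules can be illustrated without the library. The following is a minimal plain-Kotlin sketch of the semantics described above, not the library's implementation; the helper name meanOfValues is hypothetical, and only its skipNaN parameter is named after the flag in the text.

```kotlin
// Hypothetical helper that mimics the documented semantics; not part of the DataFrame API.
fun meanOfValues(values: List<Double?>, skipNaN: Boolean = false): Double {
    // Nulls are always dropped.
    val present = values.filterNotNull()
    // NaNs are dropped only when skipNaN is true.
    val used = if (skipNaN) present.filterNot { it.isNaN() } else present
    // Kotlin's average() yields NaN if any element is NaN (and also for an empty list).
    return used.average()
}

fun main() {
    val values = listOf(1.0, null, 3.0, Double.NaN)
    println(meanOfValues(values))                 // the NaN propagates to the result
    println(meanOfValues(values, skipNaN = true)) // only 1.0 and 3.0 are averaged
}
```

What the library returns for an empty input is not covered by this sketch.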

Big numbers (BigInteger, BigDecimal) are generally not supported for statistics. Please convert them to primitive types before using statistics.
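As a rough illustration of the suggested conversion, the sketch below turns BigDecimal and BigInteger values into primitives using only the Kotlin and Java standard libraries. The helper names toPrimitiveDoubles and toPrimitiveLongs are hypothetical, and how the conversion is applied to an actual data frame column is not shown here.

```kotlin
import java.math.BigDecimal
import java.math.BigInteger

// Hypothetical helpers, standard library only; not part of the DataFrame API.

// BigDecimal -> Double. Values may lose precision in the conversion.
fun toPrimitiveDoubles(values: List<BigDecimal>): List<Double> =
    values.map { it.toDouble() }

// BigInteger -> Long. longValueExact() throws ArithmeticException if a value does not fit.
fun toPrimitiveLongs(values: List<BigInteger>): List<Long> =
    values.map { it.longValueExact() }

fun main() {
    val prices = listOf(BigDecimal("1.5"), BigDecimal("2.5"))
    val counts = listOf(BigInteger.valueOf(3), BigInteger.valueOf(4))
    println(toPrimitiveDoubles(prices).average())
    println(toPrimitiveLongs(counts).sum())
}
```

Using longValueExact rather than toLong makes an overflow fail loudly instead of silently truncating, which is usually preferable before computing statistics.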

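When a statistic x is applied to several columns, it can be computed in several modes: x() computes a separate value for every suitable column; x { columns } computes a single value across all given columns; xFor { columns } computes a separate value for every given column; xOf { rowExpression } computes a single value across the results of a row expression evaluated for every row.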

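min/max, median, and percentile have an additional mode, by (for example minBy), which selects a row by the result of a row expression instead of returning the value of the statistic.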

df.sum() // sum of values per every numeric column
df.sum { age and weight } // sum of all values in `age` and `weight`
df.sumFor(skipNaN = true) { age and weight } // sum of values per `age` and `weight` separately
df.sumOf { (weight ?: 0) / age } // sum of expression evaluated for every row
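The difference between the modes may be easier to see on a small hand-rolled model. The sketch below uses plain Kotlin collections rather than the DataFrame API; the Person class, its sample values, and the helper functions are all hypothetical, and only the column names age and weight are taken from the calls above.

```kotlin
// Hypothetical row model; weight is nullable, as implied by `weight ?: 0` above.
data class Person(val age: Int, val weight: Int?)

// Like sum { age and weight }: one value across both columns, nulls ignored.
fun sumAcross(rows: List<Person>): Int =
    rows.sumOf { it.age + (it.weight ?: 0) }

// Like sumFor { age and weight }: one value per column, nulls ignored.
fun sumPerColumn(rows: List<Person>): Map<String, Int> = mapOf(
    "age" to rows.sumOf { it.age },
    "weight" to rows.mapNotNull { it.weight }.sum(),
)

// Like sumOf { (weight ?: 0) / age }: the expression is evaluated per row, then summed.
fun sumOfExpression(rows: List<Person>): Int =
    rows.sumOf { (it.weight ?: 0) / it.age }

fun main() {
    val rows = listOf(Person(15, 54), Person(45, null), Person(20, 60))
    println(sumAcross(rows))
    println(sumPerColumn(rows))
    println(sumOfExpression(rows))
}
```

Note that the row expression here uses integer division, so each per-row result is truncated before the values are summed.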

groupBy statistics

df.groupBy { city }.mean { age } // [`city`, `mean`]
df.groupBy { city }.meanOf { age / 2 } // [`city`, `mean`]

df.groupBy { city }.mean("mean age") { age } // [`city`, `mean age`]
df.groupBy { city }.meanOf("custom") { age / 2 } // [`city`, `custom`]

df.groupBy { city }.meanFor { age and weight } // [`city`, `age`, `weight`]
df.groupBy { city }.mean() // [`city`, `age`, `weight`, ...]

pivot statistics

When a statistic is applied to Pivot or PivotGroupBy, it is computed for every data group.

If a statistic is applied in a mode that returns a single value for every data group, it will be stored in a DataFrame cell without any name.

df.groupBy { city }.pivot { name.lastName }.mean { age }
df.groupBy { city }.pivot { name.lastName }.meanOf { age / 2.0 }

df.groupBy { city }.pivot { name.lastName }.meanFor { age and weight }
df.groupBy { city }.pivot { name.lastName }.mean()

To group columns in aggregation results not by pivoted values, but by aggregated columns, apply the separate flag:

df.groupBy { city }.pivot { name.lastName }.meanFor(separate = true) { age and weight }
df.groupBy { city }.pivot { name.lastName }.mean(separate = true)
