Summary statistics

Last modified: 15 May 2025

Basic summary statistics:

Aggregating summary statistics:

Every summary statistic can be used in aggregations of:

sum, mean, and std are available for (primitive) number columns of types Int, Double, Float, Long, Byte, Short, and any mix of those.

min/max, median, and percentile are available for self-comparable columns, that is, columns of type T : Comparable<T> such as DateTime, String, or Int. This includes all primitive number columns, but not a mix of different number types.
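
As a small illustration of the self-comparable requirement, the sketch below calls min and max on a String column. It assumes the `columnOf` builder and the `min`/`max` extensions on `DataColumn` from the Kotlin DataFrame API; the fruit values are made up for the example.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Illustrative data, not taken from the original page.
val fruits = columnOf("pear", "apple", "fig")

// String is Comparable<String>, so min/max follow its natural (lexicographic) ordering.
val firstFruit = fruits.min()
val lastFruit = fruits.max()

fun main() {
    println("min = $firstFruit, max = $lastFruit")
}
```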

In all cases, null values are ignored.

NaN values can optionally be ignored by setting the skipNaN flag to true. When it's set to false, a NaN in the input will be propagated to the result.
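
The null and NaN rules above can be sketched on single columns as follows. This is a minimal example that assumes the `columnOf` builder and the `mean` extension with its `skipNaN` parameter; the column contents are invented for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Illustrative columns, not taken from the original page.
val withNull = columnOf(1.0, null, 3.0)
val withNaN = columnOf(1.0, Double.NaN, 3.0)

// The null is ignored, so only 1.0 and 3.0 contribute.
val meanIgnoringNull = withNull.mean()

// With skipNaN = false the NaN is propagated to the result.
val meanPropagatingNaN = withNaN.mean(skipNaN = false)

// With skipNaN = true the NaN is dropped before computing the mean.
val meanSkippingNaN = withNaN.mean(skipNaN = true)

fun main() {
    println(meanIgnoringNull)
    println(meanPropagatingNaN)
    println(meanSkippingNaN)
}
```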

Big numbers (BigInteger, BigDecimal) are generally not supported for statistics. Please convert them to primitive types before using statistics.
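
One possible way to do such a conversion is sketched below, assuming the `convert { ... }.with { ... }` operation and the String-based `mean` overload of the Kotlin DataFrame API. The `item` and `price` columns are hypothetical, and converting to Double may lose precision for very large or very precise values.

```kotlin
import java.math.BigDecimal
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical data frame with a BigDecimal column.
val prices = dataFrameOf("item", "price")(
    "tea", BigDecimal("1.5"),
    "coffee", BigDecimal("2.5"),
)

// Convert the BigDecimal column to Double so that statistics can be applied.
val pricesAsDouble = prices.convert { "price"<BigDecimal>() }.with { it.toDouble() }

// mean over the converted column
val meanPrice = pricesAsDouble.mean("price")

fun main() {
    println(meanPrice)
}
```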

When a statistic x is applied to several columns, it can be computed in one of several modes:

  • x(): DataRow computes a separate value for every suitable column

  • x { columns }: Value computes a single value across all given columns

  • xFor { columns }: DataRow computes a separate value for every given column

  • xOf { rowExpression }: Value computes a single value across the results of the row expression evaluated for every row
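
The four modes listed above can be sketched with `mean` as the statistic. This is a hedged example, not the library's canonical snippet: it assumes the String-based overloads `mean(vararg columns)` and `meanFor(vararg columns)` as well as `meanOf` with a row expression, and the `name`, `age`, and `weight` columns are invented for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Illustrative data, not taken from the original page.
val people = dataFrameOf("name", "age", "weight")(
    "Alice", 20, 50.0,
    "Bob", 40, 70.0,
)

// x(): one value per suitable (numeric) column, returned as a DataRow.
val meanPerColumn = people.mean()

// x { columns }: a single value across all values of both columns.
val meanOverall = people.mean("age", "weight")

// xFor { columns }: one value per given column, returned as a DataRow.
val meanPerGivenColumn = people.meanFor("age", "weight")

// xOf { rowExpression }: a single value across the per-row results of the expression.
val meanWeightPerYear = people.meanOf { "weight"<Double>() / "age"<Int>() }

fun main() {
    println(meanPerColumn)
    println(meanOverall)
    println(meanPerGivenColumn)
    println(meanWeightPerYear)
}
```

Note the difference between the second and third calls: the former pools all values of `age` and `weight` into one result, while the latter keeps one result per column.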

min/max, median, and percentile have an additional mode, by:

  • minBy { rowExpression }: DataRow finds the row with the minimal result of the rowExpression

  • medianBy { rowExpression }: DataRow finds the row where the median lies, based on the results of the rowExpression
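
A minimal sketch of the by mode, assuming `minBy` and `medianBy` accept a row expression as described above. The data is made up, and an odd number of rows is used so that the median row is unambiguous.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Illustrative data, not taken from the original page.
val members = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
    "Charlie", 20,
)

// The whole row with the smallest age, not just the age itself.
val youngest = members.minBy { "age"<Int>() }

// The row whose age is the median of all ages.
val medianMember = members.medianBy { "age"<Int>() }

fun main() {
    println(youngest)
    println(medianMember)
}
```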

To perform statistics for a single row, see row statistics.