Dataframe 0.14 Help

Summary statistics

Every summary statistic can be used in aggregations of:

  • DataFrame

  • DataColumn

  • GroupBy

  • Pivot

  • PivotGroupBy

df.mean()
df.age.sum()
df.groupBy { city }.mean()
df.pivot { city }.median()
df.pivot { city }.groupBy { name.lastName }.std()

sum, mean and std are available for numeric columns of types Int, Double, Float, BigDecimal, Long and Byte.

min, max and median are available for Comparable columns.
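The split between the two groups follows from what each computation needs: ordering alone is enough to pick a minimum, a maximum or a middle element, while a sum or a mean needs arithmetic. The sketch below uses only the Kotlin standard library, not the DataFrame API. `lowerMedian` and `meanOrNull` are hypothetical helper names, and returning the lower middle element for an even count is an assumption of this sketch, not a statement about how DataFrame computes medians.

```kotlin
// Sketch: min, max and median only need to order values, so a Comparable
// bound is enough. Works for strings, dates, numbers, etc.
fun <T : Comparable<T>> lowerMedian(values: List<T>): T? {
    if (values.isEmpty()) return null
    val sorted = values.sorted()
    // For an even count, return the lower middle element (an assumption of
    // this sketch: arbitrary Comparable values cannot be averaged).
    return sorted[(sorted.size - 1) / 2]
}

// sum and mean need arithmetic, so this sketch restricts mean to numbers.
fun meanOrNull(values: List<Double>): Double? =
    if (values.isEmpty()) null else values.average()
```

For example, `lowerMedian(listOf("pear", "apple", "fig"))` returns `"fig"`, which has no numeric counterpart for `meanOrNull`.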

When a statistic x is applied to several columns, it can be computed in several modes:

  • x() returns a DataRow with a separate value for every suitable column

  • x { columns } returns a single value computed across all given columns

  • xFor { columns } returns a DataRow with a separate value for every given column

  • xOf { rowExpression } returns a single value computed across the results of rowExpression evaluated for every row

min and max statistics have an additional mode, by:

  • minBy { rowExpression } returns the DataRow for which rowExpression gives the minimal result; maxBy { rowExpression } likewise returns the row with the maximal result
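To make the difference between min and minBy concrete, here is a stdlib-only Kotlin sketch in which instances of a data class stand in for rows. `Person`, `minRowBy` and `maxRowBy` are hypothetical names; the helpers simply delegate to the standard `minByOrNull` and `maxByOrNull`, so ties resolve to the first matching row.

```kotlin
// Hypothetical stand-in for a DataFrame row.
data class Person(val name: String, val age: Int, val weight: Int?)

val people = listOf(
    Person("Alice", 15, 54),
    Person("Bob", 45, 87),
    Person("Charlie", 20, null),
)

// min/max return the extreme *value*; the "by" mode returns the whole *row*
// for which the expression result is extreme (null for an empty list).
fun <R, V : Comparable<V>> minRowBy(rows: List<R>, expr: (R) -> V): R? = rows.minByOrNull(expr)
fun <R, V : Comparable<V>> maxRowBy(rows: List<R>, expr: (R) -> V): R? = rows.maxByOrNull(expr)
```

Here `people.minOf { it.age }` gives the value 15, whereas `minRowBy(people) { it.age }` gives the whole row for Alice.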

df.sum() // sum of values for every numeric column
df.sum { age and weight } // sum of all values in `age` and `weight`
df.sumFor { age and weight } // sum of values for `age` and `weight` separately
df.sumOf { (weight ?: 0) / age } // sum of the expression evaluated for every row
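The four modes can be mimicked on an ordinary Kotlin list to show what each one returns. This is a sketch under stated assumptions, not the DataFrame implementation: rows are a hypothetical `Person` data class, a column selection is modeled as a map from column name to an extractor function, and null values are skipped when summing. `sumFor` plays the role of both `sum()` (over all numeric columns) and `sumFor { ... }`; `sumAcross` and `sumOfRows` are likewise hypothetical names.

```kotlin
// Hypothetical stand-in for a DataFrame row: one non-numeric, two numeric columns.
data class Person(val city: String, val age: Int, val weight: Int?)

val people = listOf(
    Person("Moscow", 15, 54),
    Person("Milan", 45, 87),
    Person("Milan", 20, null),
)

// A column selection, modeled as column name -> value extractor. `city` is left
// out because it is not numeric, so these are the "suitable" columns for sum.
val numericColumns: Map<String, (Person) -> Int?> = mapOf(
    "age" to { p: Person -> p.age },
    "weight" to { p: Person -> p.weight },
)

// sum() / sumFor { columns }: a separate value per column (nulls skipped).
fun sumFor(rows: List<Person>, columns: Map<String, (Person) -> Int?>): Map<String, Int> =
    columns.mapValues { (_, extract) -> rows.mapNotNull(extract).sum() }

// sum { columns }: one value across all values of all given columns.
fun sumAcross(rows: List<Person>, columns: Map<String, (Person) -> Int?>): Int =
    sumFor(rows, columns).values.sum()

// sumOf { rowExpression }: one value across the expression's per-row results.
fun sumOfRows(rows: List<Person>, expr: (Person) -> Int): Int = rows.map(expr).sum()
```

With this data, `sumFor(people, numericColumns)` yields `age` = 80 and `weight` = 141 (one value per column), `sumAcross(people, numericColumns)` yields the single value 221, and `sumOfRows(people) { (it.weight ?: 0) / it.age }` yields 4, using integer division as in the expression above.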

groupBy statistics

When a statistic is applied to a GroupBy DataFrame, it is computed for every data group.

If a statistic is applied in a mode that returns a single value for every data group, that value is stored in a single column named after the statistic.

df.groupBy { city }.mean { age } // [`city`, `mean`]
df.groupBy { city }.meanOf { age / 2 } // [`city`, `mean`]

You can also pass a custom name for the aggregated column:

df.groupBy { city }.mean("mean age") { age } // [`city`, `mean age`]
df.groupBy { city }.meanOf("custom") { age / 2 } // [`city`, `custom`]
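The naming rule for groupBy aggregations can be sketched with the standard library's `groupBy`: each group produces one result row, modeled here as a map from column name to value, whose aggregated entry is called `mean` unless a custom name is supplied. `Person` and `meanAgeByCity` are hypothetical, and the sketch covers only the single-value mode; it is not the DataFrame implementation.

```kotlin
// Hypothetical stand-in for a DataFrame row.
data class Person(val city: String, val age: Int)

val people = listOf(
    Person("Moscow", 15),
    Person("Milan", 45),
    Person("Milan", 21),
)

// One result row per group, modeled as column name -> value. The aggregated
// column is named "mean" by default, or after the custom name when one is given.
fun meanAgeByCity(rows: List<Person>, name: String = "mean"): List<Map<String, Any>> =
    rows.groupBy { it.city }.map { (city, group) ->
        mapOf("city" to city, name to group.map { it.age }.average())
    }
```

`meanAgeByCity(people)` returns `[{city=Moscow, mean=15.0}, {city=Milan, mean=33.0}]`, and `meanAgeByCity(people, "mean age")` uses `mean age` as the key instead.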

If a statistic is applied in a mode that returns a separate value for every column in a data group, the aggregated values are stored in columns with the original column names.

df.groupBy { city }.meanFor { age and weight } // [`city`, `age`, `weight`]
df.groupBy { city }.mean() // [`city`, `age`, `weight`, ...]
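In the per-column mode, each group instead yields one value per aggregated column, keyed by the original column name. A stdlib-only sketch follows, with hypothetical `Person`, `averageOrNull` and `meanForByCity`; skipping null weights and reporting null for a group with no values are choices of this sketch, not documented DataFrame behavior.

```kotlin
// Hypothetical stand-in for a DataFrame row; weight may be missing.
data class Person(val city: String, val age: Int, val weight: Int?)

val people = listOf(
    Person("Moscow", 15, 54),
    Person("Milan", 45, 87),
    Person("Milan", 21, null),
)

// Mean of the values, or null when there are none.
fun List<Double>.averageOrNull(): Double? = if (isEmpty()) null else average()

// One result row per group; aggregated values keep the original column names.
fun meanForByCity(rows: List<Person>): Map<String, Map<String, Double?>> =
    rows.groupBy { it.city }.mapValues { (_, group) ->
        mapOf(
            "age" to group.map { it.age.toDouble() }.averageOrNull(),
            "weight" to group.mapNotNull { it.weight?.toDouble() }.averageOrNull(),
        )
    }
```

For Milan, the sketch returns `age` = 33.0 and `weight` = 87.0, because the missing weight is skipped.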

pivot statistics

When a statistic is applied to a Pivot or PivotGroupBy, it is computed for every data group.

If a statistic is applied in a mode that returns a single value for every data group, that value is stored in the matrix cell without any name.

// Extension properties API
df.groupBy { city }.pivot { name.lastName }.mean { age }
df.groupBy { city }.pivot { name.lastName }.meanOf { age / 2.0 }

// Column accessors API
val city by column<String?>()
val age by column<Int>()
val name by columnGroup()
val lastName by name.column<String>()

df.groupBy { city }.pivot { lastName }.mean { age }
df.groupBy { city }.pivot { lastName }.meanOf { age() / 2.0 }

// String API
df.groupBy("city").pivot { "name"["lastName"] }.mean("age")
df.groupBy("city").pivot { "name"["lastName"] }.meanOf { "age"<Int>() / 2.0 }
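Conceptually, `groupBy { city }.pivot { lastName }.mean { age }` produces a matrix: one row per city, one column per last name, and a bare number in each cell. The nested-map sketch below reproduces that shape with the standard library only. `Person` and `meanAgeMatrix` are hypothetical, and city/last-name combinations that never occur are simply absent from the inner maps in this sketch.

```kotlin
// Hypothetical stand-in for a DataFrame row with a flattened last name.
data class Person(val city: String, val lastName: String, val age: Int)

val people = listOf(
    Person("Milan", "Smith", 20),
    Person("Milan", "Smith", 30),
    Person("Milan", "Brown", 40),
    Person("Moscow", "Smith", 50),
)

// Rows keyed by city, matrix columns keyed by last name,
// and one unnamed mean age per cell.
fun meanAgeMatrix(rows: List<Person>): Map<String, Map<String, Double>> =
    rows.groupBy { it.city }.mapValues { (_, cityRows) ->
        cityRows.groupBy { it.lastName }
            .mapValues { (_, cell) -> cell.map { it.age }.average() }
    }
```

Here the Milan/Smith cell holds 25.0, and there is no Moscow/Brown cell at all.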

If a statistic is applied in a mode that returns a separate value for every column in a data group, every cell in the matrix contains a DataRow with values for every aggregated column.

df.groupBy { city }.pivot { name.lastName }.meanFor { age and weight }
df.groupBy { city }.pivot { name.lastName }.mean()
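In the per-column mode, each cell holds a row of values instead of a bare number. A nested-map sketch of that shape, using only the standard library and hypothetical names (`Person`, `meanForMatrix`), models each cell as a map from aggregated column name to its mean.

```kotlin
// Hypothetical stand-in for a DataFrame row.
data class Person(val city: String, val lastName: String, val age: Int, val weight: Int)

val people = listOf(
    Person("Milan", "Smith", 20, 60),
    Person("Milan", "Smith", 30, 80),
    Person("Moscow", "Brown", 50, 90),
)

// city -> lastName -> cell, where each cell is a "row":
// aggregated column name -> mean of that column within the cell.
fun meanForMatrix(rows: List<Person>): Map<String, Map<String, Map<String, Double>>> =
    rows.groupBy { it.city }.mapValues { (_, cityRows) ->
        cityRows.groupBy { it.lastName }.mapValues { (_, cell) ->
            mapOf(
                "age" to cell.map { it.age }.average(),
                "weight" to cell.map { it.weight }.average(),
            )
        }
    }
```

The Milan/Smith cell here is the row `{age=25.0, weight=70.0}`.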

To group the columns of the aggregation result by aggregated column rather than by pivoted value, set the separate flag:

df.groupBy { city }.pivot { name.lastName }.meanFor(separate = true) { age and weight }
df.groupBy { city }.pivot { name.lastName }.mean(separate = true)
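The separate flag reorganizes the column hierarchy of the result: the aggregated column comes first, and the pivoted values sit beneath it. The generic helper below (a hypothetical name, standard library only) performs that regrouping on nested maps of the form row key, then pivot key, then column name, then value. It illustrates the resulting layout only and is not how DataFrame implements the flag.

```kotlin
// Regroups  row key -> pivot key -> column -> value
// into      row key -> column -> pivot key -> value,
// the column hierarchy that separate = true describes.
fun <K, P, C, V> separate(matrix: Map<K, Map<P, Map<C, V>>>): Map<K, Map<C, Map<P, V>>> =
    matrix.mapValues { (_, byPivot) ->
        val byColumn = linkedMapOf<C, MutableMap<P, V>>()
        for ((pivotKey, cell) in byPivot) {
            for ((column, value) in cell) {
                byColumn.getOrPut(column) { linkedMapOf() }[pivotKey] = value
            }
        }
        byColumn
    }

// Hypothetical per-column pivot result: city -> last name -> column -> mean.
val matrix = mapOf(
    "Milan" to mapOf(
        "Smith" to mapOf("age" to 25.0, "weight" to 70.0),
        "Brown" to mapOf("age" to 40.0, "weight" to 90.0),
    ),
)
```

After `separate(matrix)`, the Milan row has an `age` group containing `{Smith=25.0, Brown=40.0}` and a `weight` group containing `{Smith=70.0, Brown=90.0}`.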
Last modified: 27 September 2024