Dataframe 0.13 Help

DataRow

DataRow represents a single record: one row of data within a DataFrame.

Row functions

  • index(): Int — sequential row number in DataFrame, starts from 0

  • prev(): DataRow? — previous row (null for the first row)

  • next(): DataRow? — next row (null for the last row)

  • diff(T) { rowExpression }: T / diffOrNull { rowExpression }: T? — difference between the results of a row expression calculated for current and previous rows

  • explode(columns): DataFrame<T> — spread lists and DataFrames vertically into new rows

  • values(): List<Any?> — list of all cell values from the current row

  • valuesOf<T>(): List<T> — list of values of the given type

  • columnsCount(): Int — number of columns

  • columnNames(): List<String> — list of all column names

  • columnTypes(): List<KType> — list of all column types

  • namedValues(): List<NameValuePair<Any?>> — list of name-value pairs where name is a column name and value is cell value

  • namedValuesOf<T>(): List<NameValuePair<T>> — list of name-value pairs where value has given type

  • transpose(): DataFrame<NameValuePair<*>> — dataframe of two columns: name: String is column names and value: Any? is cell values

  • transposeTo<T>(): DataFrame<NameValuePair<T>> — dataframe of two columns: name: String is column names and value: T is cell values

  • getRow(Int): DataRow — row from DataFrame by row index

  • getRows(Iterable<Int>): DataFrame — dataframe with subset of rows selected by absolute row index.

  • relative(Iterable<Int>): DataFrame — dataframe with subset of rows selected by relative row index: relative(-1..1) will return previous, current and next row. Requested indices will be coerced to the valid range and invalid indices will be skipped

  • getValue<T>(columnName) — cell value of type T in this row for the given columnName

  • getValueOrNull<T>(columnName) — cell value of type T? in this row for the given columnName, or null if there is no such column

  • get(column): T — cell value in this row for the given column

  • String.invoke<T>(): T — cell value of type T in this row, where the receiver string is the column name

  • ColumnPath.invoke<T>(): T — cell value of type T in this row, where the receiver is the column path

  • ColumnReference.invoke(): T — cell value of type T in this row, where the receiver is the column reference

  • df(): DataFrame — DataFrame that the current row belongs to
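A few of the row functions above can be tried on a small frame. This is a minimal sketch, assuming the library's dataFrameOf builder and wildcard import; sampleFrame and its data are hypothetical names introduced only for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: three rows, two columns.
fun sampleFrame() = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
    "Charlie", 20,
)

fun main() {
    val df = sampleFrame()
    val row = df[1] // DataRow for the second record

    println(row.index())                      // sequential row number, starting from 0
    println(row.prev()?.get("name"))          // cell from the previous row
    println(row.next()?.get("name"))          // cell from the next row
    println(row.values())                     // all cell values of this row
    println(row.columnNames())                // all column names
    println(row.getValue<Int>("age"))         // typed access by column name
    println(row.relative(-1..1).rowsCount())  // previous, current and next row
    println(df[0].prev())                     // the first row has no previous row
}
```

Navigation functions such as prev() and next() return nullable rows, so the safe-call operator is used before reading a cell from them.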

Row expressions

Row expressions provide a value for every row of DataFrame and are used in add, filter, forEach, update and other operations.

// Row expression computes values for a new column
df.add("fullName") { name.firstName + " " + name.lastName }

// Row expression computes updated values
df.update { weight }.at(1, 3, 4).with { prev()?.weight }

// Row expression computes cell content for values of pivoted column
df.pivot { city }.with { name.lastName.uppercase() }

Row expression signature: DataRow.(DataRow) -> T. The current row is passed both as the implicit receiver (this) and as the explicit argument (it), so row values can be accessed with or without the it keyword. Both refer to the same DataRow object.
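To illustrate that the receiver and the argument are interchangeable, here is a small sketch that builds the same column twice using string-based column access. The frame contents and the function names withReceiver and withArgument are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data.
fun people() = dataFrameOf("firstName", "lastName")(
    "Alice", "Cooper",
    "Bob", "Dylan",
)

// Implicit receiver: cells are read directly from `this` row.
fun withReceiver(): List<Any?> =
    people()
        .add("fullName") { "firstName"<String>() + " " + "lastName"<String>() }["fullName"]
        .toList()

// Explicit argument: the same row is read through `it`.
fun withArgument(): List<Any?> =
    people()
        .add("fullName") { it.getValue<String>("firstName") + " " + it.getValue<String>("lastName") }["fullName"]
        .toList()

fun main() {
    println(withReceiver())
    println(withArgument())
}
```

Both functions should produce the same list of full names; which form to use is a matter of readability.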

Row conditions

Row condition is a special case of row expression that returns Boolean.

// Row condition is used to filter rows by index
df.filter { index() % 5 == 0 }

// Row condition is used to drop rows where `age` is the same as in the previous row
df.drop { diffOrNull { age } == 0 }

// Row condition is used to filter rows for value update
df.update { weight }.where { index() > 4 && city != "Paris" }.with { 50 }

Row condition signature: DataRow.(DataRow) -> Boolean
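The following sketch applies row conditions with string-based column access instead of generated properties. The sample data and function names are hypothetical, and the comparison against prev() is used here as a plain stand-in for the diffOrNull form shown above.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: the third row repeats the age of the second.
fun ages() = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 45,
    "Charlie", 45,
    "Dana", 20,
)

// Keep rows with an even index.
fun evenRows() = ages().filter { index() % 2 == 0 }

// Keep rows that satisfy a condition on a cell value.
fun adults() = ages().filter { "age"<Int>() > 18 }

// Drop rows whose age equals the age in the previous row.
fun withoutRepeats() = ages().drop { prev()?.getValue<Int>("age") == "age"<Int>() }

fun main() {
    println(evenRows())
    println(adults())
    println(withoutRepeats())
}
```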

Row statistics

The following statistics are available for DataRow:

  • rowMax

  • rowMin

  • rowSum

  • rowMean

  • rowStd

  • rowMedian

These statistics are applied only to values of appropriate types; incompatible values are ignored. For example, if a DataFrame has columns of type String and Int, rowSum() computes the sum of the Int values in a row and ignores the String values.
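The String and Int case described above can be sketched as follows. The frame and the names scores, firstRowSum and firstRowMean are hypothetical. The exact numeric type returned by rowSum() is not stated on this page, so the result is converted to Double before printing.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: one String column and two Int columns.
fun scores() = dataFrameOf("name", "math", "art")(
    "Alice", 1, 2,
    "Bob", 3, 5,
)

// Sum of the numeric cells in the first row; the String cell is ignored.
fun firstRowSum(): Double = scores()[0].rowSum().toDouble()

// Mean of the numeric cells in the first row.
fun firstRowMean(): Double = scores()[0].rowMean()

fun main() {
    println(firstRowSum())
    println(firstRowMean())
}
```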

To apply statistics only to values of a particular type, use the -Of versions:

  • rowMaxOf<T>

  • rowMinOf<T>

  • rowSumOf<T>

  • rowMeanOf<T>

  • rowMedianOf<T>
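As a sketch of the typed variants, the example below restricts the statistics to a single value type in a row that mixes Int and Double cells. The data and function names are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: two Int columns and one Double column.
fun mixed() = dataFrameOf("a", "b", "c")(
    1, 2, 0.5,
)

// Only Int cells take part in the sum.
fun intSum(): Int = mixed()[0].rowSumOf<Int>()

// Only Double cells take part in the sum.
fun doubleSum(): Double = mixed()[0].rowSumOf<Double>()

// Largest Int cell in the row.
fun intMax(): Int = mixed()[0].rowMaxOf<Int>()

fun main() {
    println(intSum())
    println(doubleSum())
    println(intMax())
}
```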

Last modified: 29 March 2024