DataFrame 1.0 Help

DataRow

DataRow represents a single record, one piece of data within a DataFrame.

Row functions

  • index(): Int — sequential row number in the DataFrame, starting from 0;

  • prev(): DataRow? — previous row (null for the first row);

  • next(): DataRow? — next row (null for the last row);

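  • diff(T) { rowExpression }: T / diffOrNull { rowExpression }: T? — difference between the results of a row expression calculated for the current and previous rows; for the first row, which has no previous row, diff returns the value passed as its argument and diffOrNull returns null;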

  • explode(columns): DataFrame<T> — spread lists and DataFrame objects vertically into new rows;

  • values(): List<Any?> — list of all cell values from the current row;

  • valuesOf<T>(): List<T> — list of values of the given type;

  • columnsCount(): Int — number of columns;

  • columnNames(): List<String> — list of all column names;

  • columnTypes(): List<KType> — list of all column types;

  • namedValues(): List<NameValuePair<Any?>> — list of name-value pairs where name is a column name and value is a cell value;

  • namedValuesOf<T>(): List<NameValuePair<T>> — list of name-value pairs where the value has the given type;

  • transpose(): DataFrame<NameValuePair<*>> — DataFrame with two columns: name: String for column names and value: Any? for cell values;

  • transposeTo<T>(): DataFrame<NameValuePair<T>> — DataFrame with two columns: name: String for column names and value: T for cell values;

  • getRow(Int): DataRow — row from the DataFrame by a row index;

  • getRows(Iterable<Int>): DataFrame — DataFrame with a subset of rows selected by absolute row indices;

  • relative(Iterable<Int>): DataFrame — DataFrame with a subset of rows selected by relative row indices: relative(-1..1) will return the previous, current, and next row. Requested indices will be coerced to the valid range and invalid indices will be skipped;

  • getValue<T>(columnName) — cell value of type T by this row and the given columnName;

  • getValueOrNull<T>(columnName) — cell value of type T? by this row and the given columnName or null if there's no such column;

  • get(column): T — cell value by this row and the given column;

  • String.invoke<T>(): T — cell value of type T from this row for the column whose name is the receiver string;

  • ColumnPath.invoke<T>(): T — cell value of type T from this row for the column at the receiver column path;

  • ColumnReference.invoke(): T — cell value of type T from this row for the receiver column;

  • df(): DataFrame — DataFrame that the current row belongs to.
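The description of relative above can be read as plain index arithmetic. The sketch below is a standalone model of that reading and not the library's implementation: relativeToAbsolute is a hypothetical helper, and it assumes that an offset pointing outside the table is simply skipped.

```kotlin
// Hypothetical helper, not part of the library: maps relative offsets to absolute
// row indices and skips the ones that fall outside 0 until rowCount.
fun relativeToAbsolute(current: Int, rowCount: Int, offsets: Iterable<Int>): List<Int> =
    offsets.map { current + it }.filter { it in 0 until rowCount }

// For a table with three rows, the window -1..1 shrinks at both edges.
val fromFirstRow = relativeToAbsolute(current = 0, rowCount = 3, offsets = -1..1)  // [0, 1]
val fromMiddleRow = relativeToAbsolute(current = 1, rowCount = 3, offsets = -1..1) // [0, 1, 2]
val fromLastRow = relativeToAbsolute(current = 2, rowCount = 3, offsets = -1..1)   // [1, 2]
```

Under this reading, the first and last rows get a two-row window, because the missing neighbour has no valid index.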

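The examples below use a dataframe df with the following columns: name (a column group with nested firstName and lastName columns), age, city, weight, and isHappy.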

Row expressions

Row expressions provide a value for every row of a DataFrame and are used in add, filter, forEach, update, and other operations. There are two types of row expressions, differing in what the it argument refers to:

RowExpression

RowExpression computes a new value for every selected cell given the DataRow of that cell. Both this and it keywords in RowExpression refer to the same DataRow. Row values can be accessed with or without these keywords.

RowExpression signature: DataRow.(DataRow) -> T.
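The signature is an ordinary Kotlin function type with a receiver and one parameter. The sketch below models it in plain Kotlin without the library, to show why this and it see the same row. Row and eval are hypothetical stand-ins, not DataFrame API.

```kotlin
// Hypothetical stand-in for DataRow: a row is modeled as a map from column name to value.
class Row(private val cells: Map<String, Any?>) {
    operator fun get(column: String): Any? = cells[column]
}

// Evaluates an expression shaped like DataRow.(DataRow) -> T,
// passing the same row as both the receiver and the argument.
fun <T> Row.eval(expression: Row.(Row) -> T): T = expression(this, this)

val row = Row(mapOf("firstName" to "Alice", "lastName" to "Cooper"))

// Access through the implicit receiver ("this")...
val viaThis: String = row.eval { "${get("firstName")} ${get("lastName")}" }

// ...or through the argument ("it"); both refer to the same row.
val viaIt: String = row.eval { "${it["firstName"]} ${it["lastName"]}" }
```

Both values come out as "Alice Cooper", which is why the keywords can be dropped in a row expression.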

RowExpression examples

add
// Row expression computes values for a new column
df.add("fullName") { name.firstName + " " + name.lastName }

// The same expression, accessing columns by string names
df.add("fullName") { "name"["firstName"]<String>() + " " + "name"["lastName"]<String>() }

pivot
// Row expression computes cell content for values of pivoted column
df.pivot { city }.with { name.lastName.uppercase() }

// The same expression, accessing columns by string names
df.pivot { city }.with { "name"["lastName"]<String>().uppercase() }

RowValueExpression

RowValueExpression computes a new value for every selected cell given the DataRow of that cell and the current value of that cell. this refers to the current DataRow, and it refers to the current value of the cell. RowValueExpression is used after selecting columns in functions such as update or convert.

RowValueExpression signature: DataRow.(C) -> T.
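Here the receiver and the parameter differ: the row is this and the cell is it. The plain-Kotlin sketch below models that split with made-up data. Row and mapColumn are hypothetical stand-ins and say nothing about how the library implements update or convert.

```kotlin
// Hypothetical stand-in for DataRow that knows its position, so prev() can be modeled.
class Row(private val table: List<Map<String, Any?>>, private val index: Int) {
    operator fun get(column: String): Any? = table[index][column]
    fun prev(): Row? = if (index > 0) Row(table, index - 1) else null
}

// Hypothetical helper: evaluates an expression shaped like DataRow.(C) -> T for every
// cell of one column, with the row as receiver and the cell value as argument.
@Suppress("UNCHECKED_CAST")
fun <C, T> mapColumn(
    table: List<Map<String, Any?>>,
    column: String,
    expression: Row.(C) -> T,
): List<T> = table.indices.map { i ->
    val row = Row(table, i)
    row.expression(row[column] as C)
}

// Made-up sample data with one missing weight.
val weights = listOf(
    mapOf<String, Any?>("weight" to 54),
    mapOf<String, Any?>("weight" to null),
    mapOf<String, Any?>("weight" to 68),
)

// "it" is the current cell; prev() is resolved on the row ("this").
val filled: List<Int?> =
    mapColumn<Int?, Int?>(weights, "weight") { it ?: (prev()?.get("weight") as Int?) }
// filled is [54, 54, 68]
```

In this model, prev() reads the original table, so the missing value is filled from the unmodified previous row.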

RowValueExpression examples

update (expression)
// "it" refers to the current "weight" cell, and "prev()" is called on the row "this"
df.update { weight }.at(2, 3, 5).with { it ?: prev()?.weight }

// The same update, accessing columns by string names
df.update("weight").at(2, 3, 5).with { it ?: prev()?.get("weight") }

convert
// "it" refers to the current "city" cell
df.convert { city }.notNull { it.uppercase() }

// The same conversion, accessing columns by string names
df.convert("city").notNull { (it as String).uppercase() }

Row conditions

A row condition is a special case of a row expression that returns Boolean. There are two types of row conditions:

RowFilter

RowFilter evaluates a DataRow and returns a Boolean indicating whether the row should be included in the result. Both this and it in RowFilter refer to the same DataRow. RowFilter is used in functions such as filter, drop, first, and count.

RowFilter signature: DataRow.(DataRow) -> Boolean.
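Because the predicate has a single shape, one condition can be reused by several operations. The sketch below shows this in plain Kotlin with made-up data. Row, keepRows, dropRows, and countRows are hypothetical stand-ins for illustration, not the library's filter, drop, and count.

```kotlin
// Hypothetical stand-in for DataRow: a row is modeled as a map from column name to value.
class Row(private val cells: Map<String, Any?>) {
    operator fun get(column: String): Any? = cells[column]
}

// Hypothetical helpers taking a predicate shaped like DataRow.(DataRow) -> Boolean.
fun List<Row>.keepRows(predicate: Row.(Row) -> Boolean): List<Row> = filter { it.predicate(it) }
fun List<Row>.dropRows(predicate: Row.(Row) -> Boolean): List<Row> = filterNot { it.predicate(it) }
fun List<Row>.countRows(predicate: Row.(Row) -> Boolean): Int = count { it.predicate(it) }

// Made-up sample data.
val people = listOf(
    Row(mapOf("name" to "Alice", "age" to 15)),
    Row(mapOf("name" to "Bob", "age" to 45)),
    Row(mapOf("name" to "Charlie", "age" to 20)),
)

// One condition, written once and reused below.
val isAdult: Row.(Row) -> Boolean = { (get("age") as Int) >= 18 }

val adults = people.keepRows(isAdult).map { it["name"] }  // [Bob, Charlie]
val minors = people.dropRows(isAdult).map { it["name"] }  // [Alice]
val adultCount = people.countRows(isAdult)                // 2
```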

RowFilter examples

filter
// Row filter is used to filter rows
df.filter { name.firstName == "Alice" && age >= 18 }

// The same filter, accessing columns by string names
df.filter { "name"["firstName"]<String>() == "Alice" && "age"<Int>() >= 18 }

drop
// Row filter is used to drop rows where `city` or `weight` is null
df.drop { city == null || weight == null }

// The same filter, accessing columns by string names
df.drop { "city"<String?>() == null || "weight"<Int?>() == null }

first
// Row filter is used to take the first row where `city` is Milan
df.first { city == "Milan" }

// The same filter, accessing columns by string names
df.first { "city"<String?>() == "Milan" }

count
// Row filter is used to count happy people
df.count { isHappy } // the result is 5

// The same filter, accessing columns by string names
df.count { "isHappy"() } // the result is 5

RowValueFilter

RowValueFilter is used after selecting columns in functions such as update, gather, and format. Like RowFilter, it returns a Boolean indicating whether the row should be included in the result. However, unlike RowFilter, where both this and it refer to the current DataRow, RowValueFilter uses the current row as this and can also access the selected column value from this row as it.

RowValueFilter signature: DataRow.(C) -> Boolean.
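A value filter can test the selected cell through it and still consult the rest of the row through this. The sketch below models this in plain Kotlin with made-up data. Row and matchingColumns are hypothetical stand-ins, not the library's where.

```kotlin
// Hypothetical stand-in for DataRow: a row is modeled as a map from column name to value.
class Row(private val cells: Map<String, Any?>) {
    operator fun get(column: String): Any? = cells[column]
}

// Hypothetical helper: returns the selected columns whose cell passes a predicate
// shaped like DataRow.(C) -> Boolean (row as receiver, cell value as argument).
fun Row.matchingColumns(columns: List<String>, predicate: Row.(Any?) -> Boolean): List<String> =
    columns.filter { column -> predicate(this, this[column]) }

// Made-up sample row with two unfilled fields.
val profile = Row(mapOf("name" to "Alice", "age" to 15, "city" to null, "weight" to null))

// "it" is the value of each selected cell.
val unfilled = profile.matchingColumns(listOf("age", "city", "weight")) { it == null }
// unfilled is [city, weight]

// The same predicate can also read other columns of the row through "this".
val unfilledForAlice =
    profile.matchingColumns(listOf("age", "city")) { it == null && get("name") == "Alice" }
// unfilledForAlice is [city]
```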

RowValueFilter examples

update (condition)
// Row value filter is used to filter rows for value update
df.update { age }.where { name.firstName == "Alice" && name.lastName == "Cooper" }.with { 16 }

// The same update, accessing columns by string names
df.update("age")
    .where { "name"["firstName"]<String>() == "Alice" && "name"["lastName"]<String>() == "Cooper" }
    .with { 16 }

gather
// Row value filter is used to gather only unfilled profile fields
df.gather { age and city and weight and isHappy }
    .where { it == null }
    .into("field", "value")

// The same gather, accessing columns by string names
df.gather("age", "city", "weight", "isHappy")
    .where { it == null }
    .into("field", "value")

format
// Row value filter is used to format only rows with minors
df
    .format()
    .where { age < 18 }
    .with { background(RgbColor(242, 210, 189)) and textColor(black) }

// The same formatting, accessing columns by string names
df
    .format()
    .where { "age"<Int>() < 18 }
    .with { background(RgbColor(242, 210, 189)) and textColor(black) }

Row statistics

The following statistics are available for DataRow:

  • rowSum

  • rowMean

  • rowStd

These statistics will be applied only to values of appropriate types, and incompatible values will be ignored. For example, if a dataframe has columns of types String and Int, rowSum() will compute the sum of the Int values in the row and ignore String values.

To apply statistics only to values of a particular type, use -Of versions:

  • rowSumOf<T>

  • rowMeanOf<T>

  • rowStdOf<T>

  • rowMinOf<T>

  • rowMaxOf<T>

  • rowMedianOf<T>

  • rowPercentileOf<T>
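The type-based selection described above can be sketched in plain Kotlin. The two functions below are hypothetical stand-ins that only model the "ignore incompatible values" rule. The result type and the null handling of the real rowSum and rowSumOf are not confirmed here.

```kotlin
// Hypothetical stand-in for rowSum(): adds up numeric cells and skips all other values.
fun sumOfNumbers(values: List<Any?>): Double =
    values.filterIsInstance<Number>().sumOf { it.toDouble() }

// Hypothetical stand-in for rowSumOf<T>(): only cells of the requested type are counted.
inline fun <reified T : Number> sumOfType(values: List<Any?>): Double =
    values.filterIsInstance<T>().sumOf { it.toDouble() }

// Made-up row values of mixed types.
val cells: List<Any?> = listOf("Alice", 15, 54.5, null, true)

val total = sumOfNumbers(cells)       // 15 + 54.5 = 69.5
val intTotal = sumOfType<Int>(cells)  // only the Int cell: 15.0
```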

30 June 2026