DataFrame 1.0 Help

String API

The String API is the most basic and straightforward way to select columns in Kotlin DataFrame operations.

In String API operation overloads, selected column names are provided directly as String values in function arguments:

// Select "name" and "info" columns
df.select("name", "info")
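For readers who want to run the call above end to end, here is a minimal sketch. The sample data, the extra "city" column, and the helper name sampleDf are assumptions made for illustration and are not part of the original page.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data; the "city" column exists only to show that it is dropped
fun sampleDf(): DataFrame<*> =
    dataFrameOf("name", "info", "city")(
        "Alice", "engineer", "Berlin",
        "Bob", "designer", "Oslo",
    )

fun main() {
    // String API overload: column names are passed directly as String arguments
    val selected = sampleDf().select("name", "info")
    println(selected.columnNames()) // [name, info]
}
```

Because the names are plain strings, a misspelled column name is only detected at runtime, not at compile time.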

String Column Accessors

The String API can also be used inside the Columns Selection DSL and row expressions via String column accessors.

String column accessors let you access nested columns and can be combined with extension properties or with any other Columns Selection DSL methods.

String column accessors are created using special functions. In the Columns Selection DSL, they have the special type ColumnAccessor, while in row expressions they resolve to concrete value types.

You can optionally specify the column type as a type argument of the String column accessor creation function. This is required for row expressions and for some operations with a column selection. If the specified type does not match the actual column type, a runtime exception may be thrown.
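The role of the type argument can be sketched on a small flat dataframe. The data and the helper name people are hypothetical; the sketch assumes that mean accepts only numeric column selections, which is why col needs the explicit Int type there.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical flat dataframe used only for this illustration
fun people(): DataFrame<*> =
    dataFrameOf("name", "age")(
        "Alice", 23,
        "Bob", 27,
    )

fun main() {
    val df = people()

    // Column selection: mean needs a numeric column,
    // so the type is passed as a type argument of col
    val meanAge = df.mean { col<Int>("age") }
    println(meanAge) // 25.0

    // Row expression: the type argument tells the compiler
    // which value type getValue returns
    val older = df.filter { getValue<Int>("age") > 25 }
    println(older.rowsCount()) // 1
}
```

The type argument is not checked against the data at compile time, so it should match the actual column type.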

| Columns Selection DSL | Row Expressions | Description |
|---|---|---|
| col("name") / col<T>("name") | getValue<T>("name") | Resolves into a general DataColumn / row value with the provided "name" and type T. |
| colGroup("name") / colGroup<T>("name") | getColumnGroup("name") | Resolves into a ColumnGroup with the provided "name" and type T. Can be used for accessing nested columns. |
| valueCol("name") / valueCol<T>("name") | getValue<T>("name") | Resolves into a ValueColumn / row value with the provided "name" and type T. |
| frameCol("name") / frameCol<T>("name") | getFrameColumn("name") | Resolves into a FrameColumn / DataFrame with the provided "name" and type T. |

Example

Consider a simple hierarchical dataframe from example.csv.

This table consists of two columns: name, which is a String column, and info, which is a column group containing two nested value columns: age of type Int and height of type Double.

| name | info / age | info / height |
|---|---|---|
| Alice | 23 | 175.5 |
| Bob | 27 | 160.2 |
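To follow the examples without example.csv, the same table can be built in code. This is a sketch under assumptions: the helper name exampleDf is hypothetical, and grouping the flat "age" and "height" columns into "info" is one possible way to obtain the hierarchy, not necessarily how the original file is read.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Builds the example table in code instead of reading example.csv
fun exampleDf(): DataFrame<*> =
    dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    )
        // Nest the flat "age" and "height" columns under an "info" column group
        .group("age", "height").into("info")

fun main() {
    val df = exampleDf()
    println(df.columnNames()) // names of the top-level columns
    df.print()                // prints the whole dataframe
}
```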

Columns Selection DSL

Get a single "height" subcolumn from the "info" column group

df.getColumn { colGroup("info").col("height") }

Select the "age" subcolumn from the "info" column group and the "name" column

df.select { colGroup("info").col("age") and col("name") }

Calculate the mean value of the ("info"/"age") column; specify the column type as a col type argument

df.mean { colGroup("info").col<Int>("age") }

Combine extension properties and String column accessors. Select the "height" and "name" columns, assuming we have extension properties for the "info" and "name" columns but not for the ("info"/"height") column

df.select { info.col("height") and name }

Combine Columns Selection DSL and String Column Accessors. Remove all Number columns from the dataframe except ("info"/"age")

df.remove { colsAtAnyDepth().colsOf<Number>() except colGroup("info").col("age") }

Select all subcolumns from the "info" column group

df.select { colGroup("info").select { col("age") and col("height") } }
// or
df.select { colGroup("info").allCols() }

Row Expressions

Add a new "heightInt" column by casting the "height" column values to Int

df.add("heightInt") { "info"["height"]<Double>().toInt() }

Filter rows where the ("info"/"age") column value is greater than or equal to 18

df.filter { "info"["age"]<Int>() >= 18 }
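The same two row expressions can also be written with the getColumnGroup and getValue accessors listed in the table of String column accessors. This is a sketch under assumptions: it rebuilds the example table with a hypothetical exampleDf helper, and it assumes that getColumnGroup in a row expression returns a row-like value on which getValue can be called.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper that rebuilds the example table in code
fun exampleDf(): DataFrame<*> =
    dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    ).group("age", "height").into("info")

fun main() {
    val df = exampleDf()

    // Add "heightInt" by reading ("info"/"height") as Double and truncating it to Int
    val withHeightInt = df.add("heightInt") {
        getColumnGroup("info").getValue<Double>("height").toInt()
    }
    println(withHeightInt["heightInt"].toList())

    // Keep rows where ("info"/"age") is greater than or equal to 18
    val adults = df.filter { getColumnGroup("info").getValue<Int>("age") >= 18 }
    println(adults.rowsCount())
}
```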

Invoked String API

Alternatively, you can create column accessors by invoking a String directly, optionally with a type argument. This creates the same column accessors as the functions in the Columns Selection DSL. You can access nested columns using the String.get or String.invoke operators, or using the String.select {} function, where the receiver is the column group name.

// Columns Selection DSL

// Get a single "height" subcolumn from the "info" column group
df.getColumn { "info"["height"]<Double>() }

// Select the "age" subcolumn of the "info" column group
// and the "name" column
df.select { "info"["age"] and "name"() }

// Calculate the mean value of the ("info"/"age") column;
// specify the column type as an invocation type argument
df.mean { "info" { "age"<Int>() } }

// Select all subcolumns from the "info" column group
df.select { "info" { "age"() and "height"() } }
// or
df.select { "info".allCols() }

// Row Expressions

// Add a new "heightInt" column by
// casting the "height" column values to `Int`
df.add("heightInt") { "info"["height"]<Double>().toInt() }

// Filter rows where the ("info"/"age") column value
// is greater than or equal to 18
df.filter { "info"["age"]<Int>() >= 18 }

When should I use the String API?

The String API is a good starting point for learning the library and understanding how column selection works.

For production code we strongly recommend using the Extension Properties API instead. It is more concise, fully type-safe, and provides better IDE support.

However, using the Extension Properties API is sometimes impossible or requires too much extra work. In such cases, use String column accessors.

10 March 2026