String API
The String API is the most basic and straightforward way to select columns in Kotlin DataFrame operations.
In String API operation overloads, selected column names are provided directly as String values in function arguments:
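As a minimal sketch of such overloads, assuming a small in-memory dataframe built with `dataFrameOf` (the sample data and variable names below are illustrative assumptions):

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Illustrative flat dataframe; names and values are assumptions for this sketch.
    val df = dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    )

    // Column names are passed directly as String arguments.
    val names = df.select("name")
    val nameAndAge = df.select("name", "age")
    val withoutHeight = df.remove("height")
    val sortedByAge = df.sortBy("age")

    names.print()
    nameAndAge.print()
    withoutHeight.print()
    sortedByAge.print()
}
```

Because the names are plain strings, a misspelled column name is only detected at runtime.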
String Column Accessors
The String API can also be used inside the Columns Selection DSL and row expressions via String column accessors.
String column accessors allow you to access nested columns and to combine them with extension properties or with any other Columns Selection DSL methods.
String column accessors are created using special functions. In the Columns Selection DSL, they have the special type ColumnAccessor, while in row expressions they resolve to concrete value types.
You can optionally specify the column type as a type argument of the String column accessor creation function. This is required for row expressions and for some operations with a column selection. If the specified type does not match the actual column type, a runtime exception may be thrown.
Columns Selection DSL | Row Expressions | |
|---|---|---|
|
| Resolves into general |
|
| Resolves into |
|
| Resolves into |
|
| Resolves into |
Example
Consider a simple hierarchical dataframe from example.csv.
This table consists of two columns: name, which is a String column, and info, which is a column group containing two nested value columns — age of type Int, and height of type Double.
| name | info | |
|---|---|---|
| | age | height |
| Alice | 23 | 175.5 |
| Bob | 27 | 160.2 |
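For experimenting without the CSV file, an equivalent dataframe could be built in memory. This is a sketch that assumes the `dataFrameOf` builder and the `group(...).into(...)` operation; it is not how the original example loads its data.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Build a flat dataframe, then nest "age" and "height" under an "info" column group.
    val df = dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    ).group("age", "height").into("info")

    // Print the data and the resulting hierarchical schema.
    df.print()
    println(df.schema())
}
```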
Columns Selection DSL
Get a single "height" subcolumn from the "info" column group
Select the "age" subcolumn from the "info" column group and the "name" column
Calculate the mean value of the ("info"/"age") column; specify the column type as a type argument of col
Combine Extension Properties and String Column Accessors. Select the "height" and "name" columns, assuming we have extension properties for the "info" and "name" columns but not for the ("info"/"height") column
Combine Columns Selection DSL and String Column Accessors. Remove all Number columns from the dataframe except ("info"/"age")
Select all subcolumns from the "info" column group
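The selections listed above can be sketched as follows. The sketch assumes a recent Kotlin DataFrame release in which `colGroup`, `col`, `colsAtAnyDepth`, `colsOf`, and `allCols` are available in the Columns Selection DSL, and it builds the example dataframe in memory instead of reading example.csv. The combination with extension properties is left out because it requires generated properties.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // In-memory stand-in for example.csv (an assumption of this sketch).
    val df = dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    ).group("age", "height").into("info")

    // Single "height" subcolumn from the "info" column group.
    val height = df.select { colGroup("info").col("height") }

    // "age" subcolumn from "info" together with the top-level "name" column.
    val ageAndName = df.select { colGroup("info").col("age") and col("name") }

    // Mean of ("info"/"age"); the column type is given as a type argument of col.
    val meanAge = df.mean { colGroup("info").col<Int>("age") }

    // Remove all Number columns at any depth except ("info"/"age").
    val withoutOtherNumbers = df.remove {
        colsAtAnyDepth().colsOf<Number>() except colGroup("info").col("age")
    }

    // All subcolumns of the "info" column group.
    val infoColumns = df.select { colGroup("info").allCols() }

    height.print()
    ageAndName.print()
    println(meanAge) // (23 + 27) / 2 = 25.0
    withoutOtherNumbers.print()
    infoColumns.print()
}
```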
Row Expressions
Add a new "heightInt" column by casting the "height" column values to Int
Filter rows where the ("info"/"age") column value is greater than or equal to 18
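A possible sketch of these two row expressions, assuming the `getColumnGroup` and `getValue` row functions and the same in-memory stand-in for example.csv:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // In-memory stand-in for example.csv (an assumption of this sketch).
    val df = dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    ).group("age", "height").into("info")

    // Add "heightInt" by reading ("info"/"height") as Double and truncating it to Int.
    val withHeightInt = df.add("heightInt") {
        getColumnGroup("info").getValue<Double>("height").toInt()
    }

    // Keep rows where ("info"/"age") is greater than or equal to 18.
    val adults = df.filter {
        getColumnGroup("info").getValue<Int>("age") >= 18
    }

    withHeightInt.print()
    adults.print()
}
```

The type arguments matter here: if the stated type does not match the actual column type, the cast may fail at runtime.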
Invoked String API
Alternatively, you can invoke a String directly, optionally with a type argument, to create a column accessor. The result is the same column accessor as the one created in the Columns Selection DSL. Nested columns can be accessed with the String.get or String.invoke operators, or with the String.select {} function, where the receiver is the column group name.
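A hedged sketch of the invoked form, assuming the `String.get`, `String.invoke`, and `String.select` members of the Columns Selection DSL and of row expressions behave as described, with an in-memory stand-in for example.csv:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // In-memory stand-in for example.csv (an assumption of this sketch).
    val df = dataFrameOf("name", "age", "height")(
        "Alice", 23, 175.5,
        "Bob", 27, 160.2,
    ).group("age", "height").into("info")

    // String.get builds a path to the nested column.
    val height = df.select { "info"["height"] }

    // Invoking the path with a type argument gives a typed column.
    val meanAge = df.mean { "info"["age"]<Int>() }

    // String.select: the receiver is the column group name.
    val infoColumns = df.select { "info".select { all() } }

    // The same invocation works in a row expression, where it resolves to the value.
    val adults = df.filter { "info"["age"]<Int>() >= 18 }

    height.print()
    println(meanAge)
    infoColumns.print()
    adults.print()
}
```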
When should I use the String API?
The String API is a good starting point for learning the library and understanding how column selection works.
For production code we strongly recommend using the Extension Properties API instead. It is more concise, fully type-safe, and provides better IDE support.
However, the Extension Properties API is sometimes unavailable or requires too much extra work. In such cases, use String Column Accessors.