Extension Properties API
When working with a DataFrame, the most convenient and reliable way to access its columns, both in operations and when retrieving column values in row expressions, is through auto-generated extension properties. They are generated from the dataframe schema, with the name and type of each property inferred from the name and type of the corresponding column. This also works for all types of hierarchical dataframes.
Example
Consider a simple hierarchical dataframe from example.csv.
This table consists of two columns: name, which is a String column, and info, which is a column group containing two nested value columns: age of type Int, and height of type Double.
| name | info | |
|---|---|---|
| | age | height |
| Alice | 23 | 175.5 |
| Bob | 27 | 160.2 |
Read the DataFrame from the CSV file:
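A minimal sketch of this step, assuming a Kotlin Notebook cell with the dataframe library enabled and example.csv in the working directory. The variable name df is only illustrative, and older library versions spell the reader readCSV:

```kotlin
// Notebook cell. Assumes the library was enabled earlier with: %use dataframe
// Reads the file and infers the schema from its contents.
val df = DataFrame.readCsv("example.csv")
```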
In Kotlin Notebook, once the cell has been executed, a data schema and extension properties for this DataFrame are generated, so you can use them to access columns in operations inside the Column Selector DSL and the DataRow API:
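The following sketch shows how such extension properties might be used. It assumes df holds the dataframe from example.csv, with the name column and the info group described above, and that it was declared in a previously executed cell:

```kotlin
// Assumes df was declared in an earlier, already executed notebook cell.

// Get a nested column through the column group.
df.info.age

// Column Selector DSL: sort by several columns.
df.sortBy { name and info.height }

// DataRow API: inside the row condition, name is a String and info.age is an Int.
df.filter { name.startsWith("A") && info.age >= 16 }
```

Inside sortBy the properties resolve to columns, while inside filter the same names resolve to the typed values of the current row.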
If you change the dataframe's schema by renaming a column, changing its type, or adding a new column, you need to run the cell with the new DataFrame declaration first. For example, rename the name column to "firstName":
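A sketch of the rename, assuming df is the dataframe from the earlier cell. The variable name dfRenamed is illustrative:

```kotlin
// Notebook cell. Declaring a new variable lets the notebook
// generate extension properties for the updated schema.
val dfRenamed = df.rename { name }.into("firstName")
```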
After running the cell with the code above, you can use the firstName extension property in the following cells:
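For illustration, assuming the renamed dataframe was stored in a variable called dfRenamed (a hypothetical name) in a cell that has already been executed:

```kotlin
// Later notebook cell: the firstName property is now available.
dfRenamed.firstName

// It works in row expressions as well.
dfRenamed.filter { firstName == "Alice" }

// And in the Column Selector DSL, for example to rename the column back.
dfRenamed.rename { firstName }.into("name")
```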
See the Quickstart Guide in Kotlin Notebook with basic Extension Properties API examples.
With the Compiler Plugin, for now, if you read a DataFrame from a file or URL, you need to define its schema manually. You can do it quickly with generate..() methods.
Define schemas:
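A possible pair of schema declarations for the example table is sketched below. The class names Person and PersonInfo are assumptions chosen for illustration. The nested property of a schema type is meant to model the info column group:

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema

// Schema of the nested "info" column group (hypothetical class name).
@DataSchema
data class PersonInfo(
    val age: Int,
    val height: Double,
)

// Schema of the whole dataframe (hypothetical class name).
@DataSchema
data class Person(
    val info: PersonInfo,
    val name: String,
)
```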
Read the DataFrame from the CSV file and specify the schema with .convertTo() or cast():
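A sketch of both variants, assuming a schema class named Person (hypothetical) annotated with @DataSchema, and assuming example.csv is read into the structure shown in the table above:

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.convertTo
import org.jetbrains.kotlinx.dataframe.io.readCsv

fun main() {
    // convertTo() converts the dataframe to match the schema where it can.
    val df = DataFrame.readCsv("example.csv").convertTo<Person>()

    // cast() only changes the compile-time type and leaves the data as it is.
    val dfCast = DataFrame.readCsv("example.csv").cast<Person>()
}
```

convertTo() is the safer choice when the inferred types may differ from the declared schema, since cast() does not adjust the data.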
Extensions for this DataFrame will be generated automatically by the plugin, so you can use them to access columns in operations inside the Column Selector DSL and the DataRow API.
Moreover, new extensions are generated on the fly after each schema change, such as renaming a column, changing its type, or adding a new column. For example, rename the name column to "firstName", after which the firstName extension can be used in the following operations:
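A sketch of such a chain, assuming the compiler plugin is enabled in the project and the dataframe is typed with a schema that has name and info columns. The helper function and the Person type parameter are hypothetical:

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper. It compiles only with the compiler plugin enabled,
// because firstName exists only in the schema the plugin derives after rename.
fun renameExample(df: DataFrame<Person>) {
    val dfRenamed = df.rename { name }.into("firstName")

    // firstName is available immediately, with no extra declaration.
    dfRenamed.firstName
    dfRenamed.filter { firstName == "Alice" }
    dfRenamed.sortBy { firstName and info.age }
}
```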
See the Compiler Plugin Example IDEA project with basic Extension Properties API examples.
Properties name generation
By default, each extension property is generated with a name equal to the original column name.
If the original column name cannot be used as a property name (for example, if it contains spaces or has a name equal to a keyword in Kotlin), it will be enclosed in backticks.
However, sometimes the original column name contains special symbols and can't be used as a property name in backticks. In such cases, special symbols in the auto-generated property name will be replaced.
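To illustrate the backtick case, here is a small sketch for Kotlin Notebook. The column name "first name" and the sample values are made up for the example:

```kotlin
// Cell 1: a dataframe whose first column name contains a space (illustrative data).
val df = dataFrameOf("first name", "age")(
    "Alice", 23,
    "Bob", 27,
)

// Cell 2 (after Cell 1 has been executed): the generated property
// keeps the original name and is accessed in backticks.
df.`first name`
df.filter { `first name`.startsWith("A") }
```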
If you don't want to change the actual column name, but you need a convenient accessor for this column, you can use the @ColumnName annotation in a manually declared data schema. It allows you to use a property name different from the original column name without changing the column's actual name:
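A minimal sketch of such a schema. The schema name Person, the column name "first name", and the property names are assumptions made for illustration:

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.ColumnName
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema

// Hypothetical schema: the column is still called "first name" in the data,
// while the accessor is exposed under the more convenient name firstName.
@DataSchema
interface Person {
    @ColumnName("first name")
    val firstName: String
    val age: Int
}
```

Once a dataframe is typed with this schema, for example through cast<Person>(), the column could be reached as firstName while its name in the data stays "first name".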