DataFrame 1.0 Help

Extension Properties API

When working with a DataFrame, the most convenient and reliable way to access its columns, both in operations and when retrieving column values in row expressions, is through auto-generated extension properties. They are generated from the dataframe schema: the name and type of each property are inferred from the name and type of the corresponding column. They also work for all kinds of hierarchical dataframes.
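To make the idea concrete, here is a minimal, library-free sketch of how schema-typed extension properties can work. `ToyRow`, `PersonSchema`, `InfoSchema`, and `aliceRow` are hypothetical names invented for this illustration. They are not DataFrame types, and the code the library actually generates may look different.

```kotlin
// Toy stand-ins, NOT part of the DataFrame library.
// T is a phantom "schema" type: it carries no data and only decides
// which extension properties are visible on the row.
class ToyRow<T>(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? = values[column]
}

// Hypothetical schema markers: one for the whole row, one for a nested group.
interface InfoSchema
interface PersonSchema

// What "generated" accessors could look like: the property name and type
// mirror the column, and the lookup itself is done by column name.
val ToyRow<PersonSchema>.name: String
    get() = this["name"] as String

@Suppress("UNCHECKED_CAST")
val ToyRow<PersonSchema>.info: ToyRow<InfoSchema>
    get() = this["info"] as ToyRow<InfoSchema>

val ToyRow<InfoSchema>.age: Int
    get() = this["age"] as Int

val ToyRow<InfoSchema>.height: Double
    get() = this["height"] as Double

// Builds one sample row with a String column and a nested group.
fun aliceRow(): ToyRow<PersonSchema> = ToyRow(
    mapOf(
        "name" to "Alice",
        "info" to ToyRow<InfoSchema>(mapOf("age" to 23, "height" to 175.5)),
    )
)
```

With these declarations, `aliceRow().info.age` is a typed `Int`, and a misspelled property name fails at compile time rather than at runtime, which is the main benefit over string-based access.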

Example

Consider a simple hierarchical dataframe from example.csv.

This dataframe consists of two columns:

  • name, which is a String column

  • info, which is a column group containing two nested value columns:

    • age of type Int

    • height of type Double

name     info
         age    height
Alice    23     175.5
Bob      27     160.2

Read the DataFrame from the CSV file:

val df = DataFrame.readCsv("example.csv")

After the cell is executed, a data schema and extension properties for this DataFrame are generated. You can then use the extensions to access columns, both in operations inside the Column Selector DSL and in the DataRow API:

// Get nested column
df.info.age

// Sort by multiple columns
df.sortBy { name and info.height }

// Filter rows using a row condition.
// These extensions express the exact value in the row
// with the corresponding type:
df.filter { name.startsWith("A") && info.age >= 16 }

If you change the dataframe's schema by renaming a column, changing its type, or adding a new column, you need to run a cell with the new DataFrame declaration first. For example, rename the name column to "firstName":

val dfRenamed = df.rename { name }.into("firstName")

After running the cell with the code above, you can use firstName extensions in the following cells:

dfRenamed.firstName
dfRenamed.rename { firstName }.into("name")
dfRenamed.filter { firstName == "Nikita" }

See the Quickstart Guide in Kotlin Notebook with basic Extension Properties API examples.

When using the compiler plugin, you currently need to define the schema manually if you read a DataFrame from a file or URL. You can do it quickly with generate..() methods.

Define schemas:

// Data schema of the "info" column group
@DataSchema
interface Info {
    val age: Int
    val height: Double
}

// Data schema of the entire DataFrame
@DataSchema
interface Person {
    val info: Info
    val name: String
}

Read the DataFrame from the CSV file and specify the schema with .convertTo() or cast():

val df = DataFrame.readCsv("example.csv").cast<Person>()

The plugin generates extension properties for this DataFrame automatically, so you can use them to access columns, both in operations inside the Column Selector DSL and in the DataRow API.

// Get nested column
df.info.age

// Sort by multiple columns
df.sortBy { name and info.height }

// Filter rows using a row condition.
// These extensions express the exact value in the row
// with the corresponding type:
df.filter { name.startsWith("A") && info.age >= 16 }

Moreover, new extensions are generated on the fly after each schema change, such as renaming a column, changing its type, or adding a new column. For example, rename the name column to "firstName" and then use the firstName extension in the following operations:

// Rename "name" column into "firstName"
df.rename { name }.into("firstName")
    // Can use `firstName` extension in the row condition
    // right after renaming
    .filter { firstName == "Nikita" }

See the Compiler Plugin Example IDEA project for basic Extension Properties API examples.

Property name generation

By default, each extension property is generated with a name equal to the original column name.

val df = dataFrameOf("size_in_inches" to listOf(..))
df.size_in_inches

If the original column name cannot be used as a property name (for example, if it contains spaces or matches a Kotlin keyword), it is enclosed in backticks.

val df = dataFrameOf("size in inches" to listOf(..))
df.`size in inches`

However, sometimes the original column name contains special symbols and can't be used as a property name in backticks. In such cases, special symbols in the auto-generated property name will be replaced.

val df = dataFrameOf("size\nin:inches" to listOf(..))
df.`size in - inches`
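The mapping in the example above can be mimicked with a small helper. This is only an illustrative sketch: `toyPropertyName` is a hypothetical function, and it reproduces just the two replacements visible in that example (a line break becomes a space, a colon becomes " - "). The library's actual replacement rules are not specified here and may cover more symbols.

```kotlin
// Hypothetical helper, NOT the library's actual rule set: it only
// reproduces the two replacements visible in the example above.
fun toyPropertyName(columnName: String): String =
    columnName
        .replace("\n", " ")   // line break -> single space
        .replace(":", " - ")  // colon -> " - "
```

Applied to "size\nin:inches", this yields "size in - inches", which matches the property name used in the example. Names without such symbols pass through unchanged.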

If you need a convenient accessor for such a column but don't want to change the actual column name, use the @ColumnName annotation in a manually declared data schema. It lets the property name differ from the column name, which stays unchanged:

@DataSchema
interface Info {
    @ColumnName("size\nin:inches")
    val sizeInInches: Double
}

val df = dataFrameOf("size\nin:inches" to listOf(..)).cast<Info>()
df.sizeInInches

Custom extension properties

Sometimes it is useful to define your own extension properties based on a data schema.

For example, consider a simple dataframe with two columns and the following BranchData schema:

@DataSchema
interface BranchData {
    val expenses: Long
    val revenue: Long
}

// Read DataFrame and cast its type parameter to BranchData
val df = DataFrame.readCsv("branchData.csv").cast<BranchData>()

You can define an extension property for DataRow<BranchData> to create a convenient shortcut:

// Use generated extension properties to create a new one
val DataRow<BranchData>.profit
    get() = revenue - expenses

You can then use it, for example, in row expressions:

val dfProfitable = df.filter { it.profit > 0 }

Note that if you change the actual schema of a dataframe (by performing operations that modify its structure), this extension property can no longer be used, because it is tied to the specific schema.

df.add("name") { "branchName" }
    // unresolved because of `add`
    .filter { it.profit > 0 }

However, you can work around this by casting back to the original schema:

df.add("name") { "branchName" }
    .filter { it.cast<BranchData>().profit > 0 }
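The reason a custom property stops resolving after a structural change, and why casting back helps, can be shown with a library-free sketch. Everything here (`ToyRow`, its `cast` and `add`, the `BranchData` marker, and `profitAfterAdd`) is a hypothetical stand-in written for this illustration and does not reflect how DataFrame is implemented.

```kotlin
// Toy stand-ins, NOT DataFrame types.
class ToyRow<T>(val values: Map<String, Any?>) {
    // Unchecked re-labelling of the schema type, similar in spirit to cast<T>():
    // the data is untouched, only the compile-time schema changes.
    fun <R> cast(): ToyRow<R> = ToyRow(values)
}

// Hypothetical schema marker.
interface BranchData

// Stand-ins for the generated accessors.
val ToyRow<BranchData>.revenue: Long get() = values["revenue"] as Long
val ToyRow<BranchData>.expenses: Long get() = values["expenses"] as Long

// Custom property built on top of them; its receiver is fixed to BranchData.
val ToyRow<BranchData>.profit: Long get() = revenue - expenses

// A structural change returns a row whose schema type is no longer BranchData.
fun <T> ToyRow<T>.add(column: String, value: Any?): ToyRow<Any?> =
    ToyRow(values + (column to value))

fun profitAfterAdd(): Long {
    val row = ToyRow<BranchData>(mapOf("revenue" to 120L, "expenses" to 100L))
    val extended = row.add("name", "branchName")
    // extended.profit  // would not compile: the receiver is ToyRow<Any?>
    return extended.cast<BranchData>().profit
}
```

The cast is unchecked in this sketch, so it is only safe while the columns the property relies on (here `revenue` and `expenses`) are still present with the expected types.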
27 May 2026