Data Schemas in Gradle projects
Last modified: 14 January 2025
In Gradle projects, the Kotlin DataFrame library provides:
Annotation processing for generation of extension properties.
Annotation processing for DataSchema inference from datasets.
Gradle task for DataSchema inference from datasets.
Configuration
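To use the extension properties API in a Gradle project, add the dataframe plugin and the library dependency as follows, first for the Kotlin DSL (build.gradle.kts), then for the Groovy DSL (build.gradle):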
plugins {
    id("org.jetbrains.kotlinx.dataframe") version "0.15.0"
}
dependencies {
    implementation("org.jetbrains.kotlinx:dataframe:0.15.0")
}
plugins {
    id("org.jetbrains.kotlinx.dataframe") version "0.15.0"
}
dependencies {
    implementation 'org.jetbrains.kotlinx:dataframe:0.15.0'
}
Annotation processing
Declare data schemas in your code and use them to access data in DataFrame objects. A data schema is a class or interface annotated with @DataSchema:
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
@DataSchema
interface Person {
    val name: String
    val age: Int
}
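Execute the assemble task to generate type-safe accessors for the schema, then use them in your code:
import org.jetbrains.kotlinx.dataframe.api.*

// Build a small frame and mark it as conforming to the Person schema
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
).cast<Person>()
// `age` is available only after the accessors are generated (`assemble`, `build`, or `kspKotlin`)
val teens = df.filter { age in 10..19 }
teens.print()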
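To make the idea of generated accessors more concrete, here is a minimal plain-Kotlin sketch of the pattern. It does not use the real library: Row, Frame, and samplePeople are hypothetical stand-ins, and the two extension properties only approximate what the annotation processor might generate for Person.

```kotlin
// Hypothetical stand-ins for illustration only; NOT the real Kotlin DataFrame types.
class Row<T>(private val values: Map<String, Any?>) {
    // Untyped, string-based access to a cell
    operator fun get(column: String): Any? = values[column]
}

class Frame<T>(val rows: List<Row<T>>) {
    // Keeps the rows for which the predicate (with the row as receiver) holds
    fun filter(predicate: Row<T>.() -> Boolean): Frame<T> =
        Frame(rows.filter { it.predicate() })
}

// Plays the role of the @DataSchema declaration
interface Person {
    val name: String
    val age: Int
}

// Assumed shape of generated accessors: typed extension properties
// that hide the string lookup and the cast.
val Row<Person>.name: String get() = this["name"] as String
val Row<Person>.age: Int get() = this["age"] as Int

fun samplePeople(): Frame<Person> = Frame(
    listOf(
        Row<Person>(mapOf("name" to "Alice", "age" to 15)),
        Row<Person>(mapOf("name" to "Bob", "age" to 20)),
    ),
)

fun main() {
    // `age` resolves to the extension property because the receiver is Row<Person>
    val teens = samplePeople().filter { age in 10..19 }
    teens.rows.forEach { println("${it.name} ${it.age}") } // prints "Alice 15"
}
```

Because the accessors are declared on Row<Person> rather than on every row, they are only visible once a frame has been cast to the schema type, which mirrors why cast<Person>() appears in the example above.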
Schema inference
Specify the schema using your preferred method and execute the assemble task.
The @ImportDataSchema annotation must be placed above the package directive. You can import a schema from a URL or from a relative file path. By default, relative paths are resolved against the project root directory. To change this, pass the dataframe.resolutionDir option to the annotation processor. For example:
ksp {
    arg("dataframe.resolutionDir", file("data").absolutePath)
}
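As an illustration of how this option interacts with a relative path, the sketch below assumes a hypothetical local copy of the dataset at data/jetbrains_repositories.csv and an illustrative package name org.example. With dataframe.resolutionDir set as above, the path in the annotation would be resolved against the data directory. Note that the annotation sits above the package directive.

```kotlin
// Assumption: <project>/data/jetbrains_repositories.csv exists (hypothetical local copy)
// and dataframe.resolutionDir points at <project>/data.
@file:ImportDataSchema(
    "Repository",
    "jetbrains_repositories.csv",
)

package org.example // illustrative package; the file annotation must come before this line

import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
```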
Note that, because processing is incremental, an imported schema is re-generated only if the source code has changed since the previous invocation, even by a single character.
For the following configuration, the file Repository.Generated.kt will be generated in the build/generated/ksp/ folder, in the same package as the file containing the annotation.
@file:ImportDataSchema(
    "Repository",
    "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
)
import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
import org.jetbrains.kotlinx.dataframe.api.*
See the KDocs for @ImportDataSchema in the IDE or on GitHub for more details.
Alternatively, to infer the schema with the Gradle task, put the configuration below in build.gradle or build.gradle.kts.
For the following configuration, the file Repository.Generated.kt will be generated in the build/generated/dataframe/org/example folder.
dataframes {
    schema {
        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
        name = "org.example.Repository"
    }
}
After running assemble, the following code should compile and run:
// Repository.readCSV() has argument 'path' with default value https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv
val df = Repository.readCSV()
// Use generated properties to access data in rows
df.maxBy { stargazersCount }.print()
// Or to access columns in dataframe.
print(df.fullName.count { it.contains("kotlin") })