Dataframe 0.13 Help

Data Schemas in Gradle projects

In Gradle projects, the Kotlin DataFrame library provides:

  1. Annotation processing for generation of extension properties.

  2. Annotation processing for DataSchema inference from datasets.

  3. A Gradle task for DataSchema inference from datasets.

Configuration

To use the extension properties API in a Gradle project, add the dataframe plugin as follows.

Kotlin DSL (build.gradle.kts):

    plugins {
        id("org.jetbrains.kotlinx.dataframe") version "0.13.1"
    }

    dependencies {
        implementation("org.jetbrains.kotlinx:dataframe:0.13.1")
    }

Groovy DSL (build.gradle):

    plugins {
        id("org.jetbrains.kotlinx.dataframe") version "0.13.1"
    }

    dependencies {
        implementation 'org.jetbrains.kotlinx:dataframe:0.13.1'
    }

Annotation processing

Declare data schemas in your code and use them to access data in DataFrames. A data schema is a class or interface annotated with @DataSchema:

    import org.jetbrains.kotlinx.dataframe.annotations.DataSchema

    @DataSchema
    interface Person {
        val name: String
        val age: Int
    }

Execute the assemble task to generate type-safe accessors for schemas:

    import org.jetbrains.kotlinx.dataframe.api.*

    val df = dataFrameOf("name", "age")(
        "Alice", 15,
        "Bob", 20
    ).cast<Person>()
    // age only available after executing `build` or `kspKotlin`!
    val teens = df.filter { age in 10..19 }
    teens.print()
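The reason `age` resolves only after a build is that it is not a member of the dataframe; it is an extension property that the annotation processor generates from the schema. The sketch below imitates that idea using only the Kotlin standard library. `TypedRow`, `TypedFrame`, `samplePeople`, and `teenNames` are hypothetical stand-ins invented for illustration; they are not the library's classes, and the real generated code is likely to differ in detail.

```kotlin
// Simplified stand-ins for a typed row and frame (NOT the real DataFrame classes).
class TypedRow<T>(private val values: Map<String, Any?>) {
    operator fun get(column: String): Any? = values[column]
}

class TypedFrame<T>(val rows: List<TypedRow<T>>) {
    // The predicate runs with the row as receiver, so extension properties are in scope.
    fun filter(predicate: TypedRow<T>.() -> Boolean): TypedFrame<T> =
        TypedFrame(rows.filter { it.predicate() })
}

// The schema acts purely as a type marker; it is never instantiated.
interface Person {
    val name: String
    val age: Int
}

// What a generator could emit for Person: one extension property per schema property.
val TypedRow<Person>.name: String get() = this["name"] as String
val TypedRow<Person>.age: Int get() = this["age"] as Int

// Hypothetical helper that builds a small frame typed as Person.
fun samplePeople(): TypedFrame<Person> = TypedFrame(
    listOf(
        TypedRow<Person>(mapOf("name" to "Alice", "age" to 15)),
        TypedRow<Person>(mapOf("name" to "Bob", "age" to 20)),
    )
)

// `age` and `name` compile only because the extension properties above exist.
fun teenNames(): List<String> =
    samplePeople().filter { age in 10..19 }.rows.map { it.name }
```

Deleting the two extension properties makes `teenNames` fail to compile, which mirrors what happens when the accessors have not been generated yet.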

Schema inference

Specify the schema with your preferred method (annotation processing or the Gradle task) and execute the assemble task.

With annotation processing, the @ImportDataSchema annotation must be placed above the package directive. You can import schemas from a URL or from a relative file path. By default, relative paths are resolved against the project root directory. To change this, pass the dataframe.resolutionDir option to the annotation processor. For example:

    ksp {
        arg("dataframe.resolutionDir", file("data").absolutePath)
    }
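With that option set, a relative path in the annotation should be resolved against the data directory instead of the project root. A minimal sketch, assuming a file named jetbrains_repositories.csv exists in that directory and that the source file belongs to a hypothetical package org.example:

```kotlin
// Assumption: data/jetbrains_repositories.csv exists in the project.
// The path is relative to the directory passed as dataframe.resolutionDir.
@file:ImportDataSchema(
    "Repository",
    "jetbrains_repositories.csv",
)

// The annotation sits above the package directive (hypothetical package name).
package org.example

import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
```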

Note that because processing is incremental, an imported schema is regenerated only if the source code has changed since the previous invocation, by at least one character.

For the following configuration, the file Repository.Generated.kt will be generated in the build/generated/ksp/ folder, in the same package as the file containing the annotation.

    @file:ImportDataSchema(
        "Repository",
        "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv",
    )

    import org.jetbrains.kotlinx.dataframe.annotations.ImportDataSchema
    import org.jetbrains.kotlinx.dataframe.api.*

See the KDocs for @ImportDataSchema in the IDE or on GitHub for more details.

With the Gradle task, put the following in build.gradle or build.gradle.kts. For this configuration, the file Repository.Generated.kt will be generated in the build/generated/dataframe/org/example folder.

    dataframes {
        schema {
            data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
            name = "org.example.Repository"
        }
    }

See the reference and examples for more details.

After assemble, the following code should compile and run:

    // Repository.readCSV() has argument 'path' with default value
    // https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv
    val df = Repository.readCSV()
    // Use generated properties to access data in rows
    df.maxBy { stargazersCount }.print()
    // Or to access columns in dataframe.
    print(df.fullName.count { it.contains("kotlin") })
Last modified: 29 March 2024