Gradle plugin reference
Examples
In the simplest case, a schema can be declared like this:
dataframes {
    // output: build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
    schema {
        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
    }
}
Note that the names of the generated file and interface are normalized: the data file name is split by '_' and ' ', and the parts are joined in CamelCase.
You can also set parsing options for CSV:
dataframes {
    // output: build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt
    schema {
        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
        csvOptions {
            delimiter = ','
        }
    }
}
In this case, the output path depends on your directory structure. For a project with the package org.example, the path will be build/generated/dataframe/main/kotlin/org/example/dataframe/JetbrainsRepositories.Generated.kt.
Note that the name of the Kotlin file is derived from the name of the data file with the suffix .Generated, and the package is derived from the directory structure with the child directory dataframe (here, org.example.dataframe).
The name of the data schema itself is JetbrainsRepositories. You can specify it explicitly:
schema {
    // output: build/generated/dataframe/main/kotlin/org/example/dataframe/MyName.Generated.kt
    data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
    name = "MyName"
}
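When name is not set, the schema name is derived from the data file as described above. As a rough illustration, here is that rule in plain Kotlin. The functions schemaNameFromData and generatedFileName are hypothetical helpers, not part of the plugin's API. The sketch assumes that the name is taken from the last '/'-separated segment of the path or URL, that the extension is dropped, and that each part keeps the case of its remaining letters.

```kotlin
/**
 * Hypothetical illustration of the default schema-name rule; not the plugin's own code.
 * Assumption: the name comes from the last '/'-separated segment of the data path or URL.
 */
fun schemaNameFromData(data: String): String {
    // ".../jetbrains_repositories.csv" -> "jetbrains_repositories"
    val baseName = data.substringAfterLast('/').substringBeforeLast('.')
    // Split by '_' and ' ', drop empty parts, capitalize each part, and join them (CamelCase).
    return baseName
        .split('_', ' ')
        .filter { it.isNotEmpty() }
        .joinToString(separator = "") { part -> part.replaceFirstChar { it.uppercase() } }
}

/** Hypothetical helper: the generated Kotlin file is named after the schema with the ".Generated" suffix. */
fun generatedFileName(schemaName: String): String = "$schemaName.Generated.kt"
```

For example, the jetbrains_repositories.csv URL above maps to JetbrainsRepositories and therefore to JetbrainsRepositories.Generated.kt.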
If you want to change the default package for all schemas:
dataframes {
    packageName = "org.example"
    // Schemas...
}
You can also set packageName for a specific schema only:
dataframes {
    // output: build/generated/dataframe/main/kotlin/org/example/data/Data.Generated.kt
    schema {
        packageName = "org.example.data"
        data = file("path/to/data.csv")
    }
}
If you want both a non-default name and a non-default package, use a fully qualified name:
dataframes {
    // output: build/generated/dataframe/main/kotlin/org/example/data/OtherName.Generated.kt
    schema {
        name = "org.example.data.OtherName"
        data = file("path/to/data.csv")
    }
}
By default, the plugin generates output into the main source set. The source set can be specified for all schemas or for a specific schema:
dataframes {
    packageName = "org.example"
    sourceSet = "test"
    // output: build/generated/dataframe/test/kotlin/org/example/Data.Generated.kt
    schema {
        data = file("path/to/data.csv")
    }
    // output: build/generated/dataframe/integrationTest/kotlin/org/example/Data.Generated.kt
    schema {
        sourceSet = "integrationTest"
        data = file("path/to/data.csv")
    }
}
To put the generated files in a different directory, set src:
dataframes {
    // output: schemas/org/example/test/OtherName.Generated.kt
    schema {
        data = "https://raw.githubusercontent.com/Kotlin/dataframe/master/data/jetbrains_repositories.csv"
        name = "org.example.test.OtherName"
        src = file("schemas")
    }
}
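The path rules from these examples can be combined into a small model. This is a sketch under assumptions, not the plugin's code. OutputSettings and outputFile are hypothetical names. The inferred package (such as org.example.dataframe) is passed in rather than computed from the project layout. A fully qualified name is assumed to take precedence over packageName.

```kotlin
import java.io.File

/**
 * Hypothetical model of the schema-level settings that affect the output path.
 * None of these names belong to the plugin's API.
 */
data class OutputSettings(
    val name: String,                // simple or fully qualified schema name
    val sourceSet: String? = null,   // schema-level sourceSet, if set
    val packageName: String? = null, // schema-level packageName, if set
    val src: File? = null,           // schema-level output root, if set
)

fun outputFile(
    schema: OutputSettings,
    inferredPackage: String,          // package inferred from the project layout (assumption: supplied by caller)
    defaultSourceSet: String? = null, // sourceSet inside dataframes, if set
    defaultPackage: String? = null,   // packageName inside dataframes, if set
): File {
    // Schema-level settings override the dataframes defaults; "main" is the fallback source set.
    val sourceSet = schema.sourceSet ?: defaultSourceSet ?: "main"
    val root = schema.src ?: File("build/generated/dataframe/$sourceSet/kotlin")

    // "org.example.data.OtherName" -> package "org.example.data", simple name "OtherName".
    val qualified = '.' in schema.name
    val simpleName = schema.name.substringAfterLast('.')
    // Assumption: a qualified name wins over any packageName setting.
    val pkg = if (qualified) {
        schema.name.substringBeforeLast('.')
    } else {
        schema.packageName ?: defaultPackage ?: inferredPackage
    }

    // Each package segment becomes a directory; the file gets the ".Generated.kt" suffix.
    return File(File(root, pkg.replace('.', File.separatorChar)), "$simpleName.Generated.kt")
}
```

With defaultSourceSet = "test" and defaultPackage = "org.example", a schema named Data resolves to build/generated/dataframe/test/kotlin/org/example/Data.Generated.kt, matching the source-set example above.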
Schema Definitions from SQL Databases
To generate a schema for an existing SQL table, you need to define a few parameters to establish a JDBC connection: the URL (passed to the data field), the username, and the password.
In addition, specify the tableName parameter so that the data from the table with that name is converted to a dataframe.
dataframes {
    schema {
        data = "jdbc:mariadb://localhost:3306/imdb"
        name = "org.example.imdb.Actors"
        jdbcOptions {
            user = "root"
            password = "pass"
            tableName = "actors"
        }
    }
}
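It is usually best to avoid hard-coding database credentials in a build script. A possible variant of the table example reads them from environment variables instead. This is a build.gradle.kts configuration fragment, not a standalone program. The variable names IMDB_DB_USER and IMDB_DB_PASSWORD are hypothetical, and the sketch assumes that user and password accept plain String values, as in the example above.

```kotlin
// build.gradle.kts: a sketch; the environment variable names are hypothetical.
dataframes {
    schema {
        data = "jdbc:mariadb://localhost:3306/imdb"
        name = "org.example.imdb.Actors"
        jdbcOptions {
            // Read credentials from the environment instead of committing them to the build script.
            user = System.getenv("IMDB_DB_USER") ?: "root"
            password = System.getenv("IMDB_DB_PASSWORD").orEmpty()
            tableName = "actors"
        }
    }
}
```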
To generate a schema for the result of an SQL query, define the same connection parameters as before, together with the query itself in sqlQuery:
dataframes {
    schema {
        data = "jdbc:mariadb://localhost:3306/imdb"
        name = "org.example.imdb.TarantinoFilms"
        jdbcOptions {
            user = "root"
            password = "pass"
            // No space between GROUP_CONCAT and "(": MariaDB/MySQL reject it unless IGNORE_SPACE is set
            sqlQuery = """
                SELECT name, year, rank,
                GROUP_CONCAT(genre) AS "genres"
                FROM movies JOIN movies_directors ON movie_id = movies.id
                JOIN directors ON directors.id = director_id
                LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
                WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
                GROUP BY name, year, rank
                ORDER BY year
                """
        }
    }
}
Find full example code here.
NOTE: This functionality is experimental. For now, only four databases are supported: MariaDB, MySQL, PostgreSQL, and SQLite.
Support for JSON and date-time types is also limited. Please take this into account when using this feature.
DSL reference
Inside dataframes, you can configure parameters that apply to all schemas. Configuration inside schema overrides these defaults for a specific schema. Here is the full DSL for declaring data schemas:
dataframes {
    sourceSet = "mySources" // [optional; default: "main"]
    packageName = "org.jetbrains.data" // [optional; default: common package under source set]
    visibility = // [optional; default: if explicitApiMode enabled then EXPLICIT_PUBLIC, else IMPLICIT_PUBLIC]
    // KOTLIN SCRIPT: DataSchemaVisibility.INTERNAL, DataSchemaVisibility.IMPLICIT_PUBLIC, DataSchemaVisibility.EXPLICIT_PUBLIC
    // GROOVY SCRIPT: 'internal', 'implicit_public', 'explicit_public'

    withoutDefaultPath() // disable a default path for all schemas
    // i.e., the plugin won't copy the "data" property of the schemas to generated companion objects

    // split property names by delimiters (arguments of this method), lowercase parts and join to camel case
    // enabled by default
    withNormalizationBy('_') // [optional; default: ['\t', '_', ' ']]
    withoutNormalization() // disable property names normalization

    schema {
        sourceSet /* String */ = "…" // [optional; override default]
        packageName /* String */ = "…" // [optional; override default]
        visibility /* DataSchemaVisibility */ = "…" // [optional; override default]
        src /* File */ = file("…") // [optional; default: file("build/generated/dataframe/$sourceSet/kotlin")]
        data /* URL | File | String */ = "…" // Data in JSON or CSV format, or a JDBC URL
        name = "org.jetbrains.data.Person" // [optional; default: from filename]
        csvOptions {
            delimiter /* Char */ = ';' // [optional; default: ',']
        }

        // See names normalization
        withNormalizationBy('_') // enable property names normalization for this schema and use these delimiters
        withoutNormalization() // disable property names normalization for this schema
        withoutDefaultPath() // disable the default path for this schema
        withDefaultPath() // enable the default path for this schema
    }
}
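Two rules from this reference can be sketched in plain Kotlin. The first is property-name normalization (split by the delimiters, lowercase the parts, join to camel case). The second is the override order for visibility. Everything here is a hypothetical illustration rather than the plugin's implementation. SchemaVisibility stands in for DataSchemaVisibility, and leaving names without any delimiter unchanged is an assumption.

```kotlin
// Hypothetical sketch of two rules from the DSL reference; not the plugin's implementation.

// Default delimiters listed for withNormalizationBy.
val defaultDelimiters = charArrayOf('\t', '_', ' ')

fun normalizePropertyName(column: String, delimiters: CharArray = defaultDelimiters): String {
    // Assumption: a name containing none of the delimiters is kept as-is.
    if (column.none { it in delimiters }) return column
    // Split by the delimiters, drop empty parts, and lowercase each part.
    val parts = column.split(*delimiters).filter { it.isNotEmpty() }.map { it.lowercase() }
    if (parts.isEmpty()) return column
    // Join to camel case: the first part stays lowercase, later parts start with an uppercase letter.
    return parts.first() + parts.drop(1).joinToString("") { part -> part.replaceFirstChar { it.uppercase() } }
}

// Hypothetical stand-in for DataSchemaVisibility.
enum class SchemaVisibility { INTERNAL, IMPLICIT_PUBLIC, EXPLICIT_PUBLIC }

// A value set inside schema wins over the dataframes default;
// if neither is set, the result depends on whether explicit API mode is enabled.
fun effectiveVisibility(
    schemaLevel: SchemaVisibility?,
    dataframesLevel: SchemaVisibility?,
    explicitApiMode: Boolean,
): SchemaVisibility =
    schemaLevel ?: dataframesLevel
        ?: if (explicitApiMode) SchemaVisibility.EXPLICIT_PUBLIC else SchemaVisibility.IMPLICIT_PUBLIC
```

For example, a column named first_name would become the property firstName under the default delimiters.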
Last modified: 27 September 2024