Data Schemas/Data Classes Generation
Special utility functions that generate the code of useful Kotlin definitions (returned as a String) based on the current DataFrame schema.
inline fun <reified T> DataFrame<T>.generateInterfaces(): CodeString
fun <T> DataFrame<T>.generateInterfaces(markerName: String): CodeString
Generates @DataSchema interfaces for this DataFrame (including all nested DataFrame columns and column groups) as Kotlin interfaces.
This is useful when working with the compiler plugin in cases where the schema cannot be inferred automatically from the source.
markerName: String – The base name to use for generated interfaces. If not specified, the simple name of the DataFrame's T type argument is used.
CodeString – A value class wrapper for String, containing the generated Kotlin code of @DataSchema interfaces without extension properties.
df
df.generateInterfaces()
Output:
@DataSchema(isOpen = false)
interface _DataFrameType11 {
val amount: kotlin.Double
val orderId: kotlin.Int
}
@DataSchema
interface _DataFrameType1 {
val orders: List<_DataFrameType11>
val user: kotlin.String
}
By adding these interfaces to your project with the compiler plugin enabled,
you'll gain full support for the extension properties API and type-safe operations.
Use cast to apply the generated schema to a DataFrame:
df.cast<_DataFrameType1>().filter { orders.all { orderId >= 102 } }
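The `df` used in these examples is not defined on this page. As a minimal sketch, a frame with the same shape (a `user` column and a nested `orders` frame column) might be built and passed to `generateInterfaces` as follows. The row values and the marker name `Customer` are assumptions chosen for illustration, and the sketch assumes that `DataFrame` values passed to the builder are inferred as a nested frame column.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Hypothetical sample data: each user row holds a nested DataFrame of orders.
    val df = dataFrameOf("user", "orders")(
        "Alice", dataFrameOf("orderId", "amount")(101, 50.0, 102, 75.5),
        "Bob", dataFrameOf("orderId", "amount")(103, 20.0),
    )

    // Uses the overload that takes an explicit base name for the generated interfaces.
    val code = df.generateInterfaces("Customer")

    // Print the generated declarations so they can be copied into the project.
    println(code)
}
```

Passing an explicit marker name avoids the auto-generated `_DataFrameType…` names shown in the output above.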
inline fun <reified T> DataFrame<T>.generateDataClasses(
markerName: String? = null,
extensionProperties: Boolean = false,
visibility: MarkerVisibility = MarkerVisibility.IMPLICIT_PUBLIC,
useFqNames: Boolean = false,
nameNormalizer: NameNormalizer = NameNormalizer.default,
): CodeString
Generates Kotlin data classes corresponding to the DataFrame schema (including all nested DataFrame columns and column groups).
Useful when you want to:
Work with the data as regular Kotlin data classes.
Serialize the data using data classes.
Extract structured types for further use in your application.
markerName: String? – The base name to use for generated data classes. If null, the simple name of the DataFrame's T type argument is used. Default: null.
extensionProperties: Boolean – Whether to generate extension properties in addition to data class declarations. Default: false.
visibility: MarkerVisibility – Visibility modifier for the generated declarations. Default: MarkerVisibility.IMPLICIT_PUBLIC.
useFqNames: Boolean – If true, fully qualified type names are used in the generated code. Default: false.
nameNormalizer: NameNormalizer – Strategy for converting column names (with spaces, underscores, etc.) to valid Kotlin identifiers. Default: NameNormalizer.default.
CodeString – A value class wrapper for String, containing the generated Kotlin code of data class declarations and, optionally, extension properties.
df.generateDataClasses("Customer")
Output:
@DataSchema
data class Customer1(
val amount: Double,
val orderId: Int
)
@DataSchema
data class Customer(
val orders: List<Customer1>,
val user: String
)
Add these classes to your project and convert the DataFrame to a list of typed objects:
val customers: List<Customer> = df.cast<Customer>().toList()
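The optional parameters of `generateDataClasses` can be combined through named arguments. The sketch below only uses parameters from the signature above. The sample frame and its values are assumptions for illustration, and no particular output is claimed.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Hypothetical sample data with the same shape as the example above.
    val df = dataFrameOf("user", "orders")(
        "Alice", dataFrameOf("orderId", "amount")(101, 50.0, 102, 75.5),
        "Bob", dataFrameOf("orderId", "amount")(103, 20.0),
    )

    // Data classes plus extension properties, with fully qualified type names.
    val code = df.generateDataClasses(
        markerName = "Customer",
        extensionProperties = true,
        useFqNames = true,
    )

    println(code)
}
```

Fully qualified names make the generated code independent of the imports in the file it is pasted into, at the cost of longer declarations.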
inline fun <reified T> DataFrame<T>.generateCode(
fields: Boolean = true,
extensionProperties: Boolean = true,
): CodeString
fun <T> DataFrame<T>.generateCode(
markerName: String,
fields: Boolean = true,
extensionProperties: Boolean = true,
visibility: MarkerVisibility = MarkerVisibility.IMPLICIT_PUBLIC,
): CodeString
Generates data schema interfaces in the same way as generateInterfaces(), along with explicit extension properties. Useful if you don't use the compiler plugin.
markerName: String – The base name to use for generated interfaces. If not specified, the simple name of the DataFrame's T type argument is used.
fields: Boolean – Whether to generate fields (val ...) inside interfaces. Default: true.
extensionProperties: Boolean – Whether to generate extension properties for the schema. Default: true.
visibility: MarkerVisibility – Visibility modifier for the generated declarations. Default: MarkerVisibility.IMPLICIT_PUBLIC.
CodeString – A value class wrapper for String, containing the generated Kotlin code of @DataSchema interfaces and/or extension properties.
df.generateCode("Customer")
Output:
@DataSchema(isOpen = false)
interface Customer1 {
val amount: kotlin.Double
val orderId: kotlin.Int
}
val org.jetbrains.kotlinx.dataframe.ColumnsContainer<Customer1>.amount: org.jetbrains.kotlinx.dataframe.DataColumn<kotlin.Double> @JvmName("Customer1_amount") get() = this["amount"] as org.jetbrains.kotlinx.dataframe.DataColumn<kotlin.Double>
val org.jetbrains.kotlinx.dataframe.DataRow<Customer1>.amount: kotlin.Double @JvmName("Customer1_amount") get() = this["amount"] as kotlin.Double
val org.jetbrains.kotlinx.dataframe.ColumnsContainer<Customer1>.orderId: org.jetbrains.kotlinx.dataframe.DataColumn<kotlin.Int> @JvmName("Customer1_orderId") get() = this["orderId"] as org.jetbrains.kotlinx.dataframe.DataColumn<kotlin.Int>
val org.jetbrains.kotlinx.dataframe.DataRow<Customer1>.orderId: kotlin.Int @JvmName("Customer1_orderId") get() = this["orderId"] as kotlin.Int
@DataSchema
interface Customer {
val orders: List<Customer1>
val user: kotlin.String
}
val org.jetbrains.kotlinx.dataframe.ColumnsContainer<Customer>.orders: org.jetbrains.kotlinx.dataframe.DataColumn<org.jetbrains.kotlinx.dataframe.DataFrame<Customer1>> @JvmName("Customer_orders") get() = this["orders"] as org.jetbrains.kotlinx.dataframe.DataColumn<org.jetbrains.kotlinx.dataframe.DataFrame<Customer1>>
val org.jetbrains.kotlinx.dataframe.DataRow<Customer>.orders: org.jetbrains.kotlinx.dataframe.DataFrame<Customer1> @JvmName("Customer_orders") get() = this["orders"] as org.jetbrains.kotlinx.dataframe.DataFrame<Customer1>
val org.jetbrains.kotlinx.dataframe.ColumnsContainer<Customer>.user: org.jetbrains.kotlinx.dataframe.DataColumn<kotlin.String> @JvmName("Customer_user") get() = this["user"] as org.jetbrains.kotlinx.dataframe.DataColumn<kotlin.String>
val org.jetbrains.kotlinx.dataframe.DataRow<Customer>.user: kotlin.String @JvmName("Customer_user") get() = this["user"] as kotlin.String
By adding this generated code to your project, you can use the extension properties API for fully type-safe column access and transformations.
Use cast to apply the generated schema to a DataFrame:
df.cast<Customer>()
.add("ordersTotal") { orders.sumOf { it.amount } }
.filter { user.startsWith("A") }
.rename { user }.into("customer")
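The `fields` and `extensionProperties` flags of `generateCode` control which parts of the output are emitted. As a rough sketch, again with an assumed sample frame, the two flags might be toggled independently like this:

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

fun main() {
    // Hypothetical sample data with the same shape as the example above.
    val df = dataFrameOf("user", "orders")(
        "Alice", dataFrameOf("orderId", "amount")(101, 50.0, 102, 75.5),
        "Bob", dataFrameOf("orderId", "amount")(103, 20.0),
    )

    // Interfaces with fields, but no extension properties.
    val interfacesOnly = df.generateCode("Customer", extensionProperties = false)

    // Extension properties, with the val declarations inside interfaces omitted.
    val withoutFields = df.generateCode("Customer", fields = false)

    println(interfacesOnly)
    println(withoutFields)
}
```

Leaving both flags at their defaults produces the combined output shown earlier in this section.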