Custom Data Schemas

@DataSchema
interface Person {
    val name: String
    val age: Int
}

fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
fun DataFrame<Person>.adults() = filter { age > 18 }

In Jupyter, these functions work automatically for any DataFrame that matches the Person schema:

val df = dataFrameOf("name", "age", "weight")(
    "Merton, Alice", 15, 60.0,
    "Marley, Bob", 20, 73.5,
)

The schema of df is compatible with Person, so the auto-generated schema interface inherits from it:

@DataSchema(isOpen = false)
interface DataFrameType : Person

val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double

Although df has an additional column, weight, the functions previously defined for DataFrame<Person> still work on it:

df.splitName()

firstName lastName age weight
   Merton    Alice  15 60.000
   Marley      Bob  20 73.500

df.adults()

name        age weight
Marley, Bob  20   73.5

In a JVM project (outside Jupyter), cast the DataFrame explicitly to the target interface:

df.cast<Person>().splitName()
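
The explicit cast can be put together into a small standalone program. The following is a minimal sketch, not the library's prescribed setup: it assumes the Kotlin DataFrame library is on the classpath and that no code-generation plugin is applied, so the `age` row accessor is written by hand as a stand-in for the generated extension property (with generation enabled, that hand-written line would be redundant and should be removed).

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema
interface Person {
    val name: String
    val age: Int
}

// Assumption: hand-written stand-in for the accessor that code generation
// would normally provide for the Person schema.
val DataRow<Person>.age: Int get() = this["age"] as Int

// Same function as above; it only needs the columns declared in Person.
fun DataFrame<Person>.adults() = filter { age > 18 }

fun main() {
    // dataFrameOf returns an untyped DataFrame<*>, so the Person-specific
    // extension functions are not available on it yet.
    val df = dataFrameOf("name", "age", "weight")(
        "Merton, Alice", 15, 60.0,
        "Marley, Bob", 20, 73.5,
    )

    // cast<Person>() re-types the frame; the extra weight column is kept.
    val adults = df.cast<Person>().adults()
    adults.print()
}
```

The cast changes only the compile-time type of the frame, which is why the extra weight column survives and still appears in the result.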
