Dataframe 0.15 Help

Custom Data Schemas

You can define your own DataSchema interfaces and use them in functions and classes to represent a DataFrame with a specific set of columns:

@DataSchema
interface Person {
    val name: String
    val age: Int
}

After this cell is executed in Jupyter, or after annotation processing runs in IDEA, extension properties for data access are generated. These properties can then be used to write functions for a typed DataFrame:

fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")
fun DataFrame<Person>.adults() = filter { age > 18 }

In Jupyter, these functions work automatically for any DataFrame that matches the Person schema:

val df = dataFrameOf("name", "age", "weight")(
    "Merton, Alice", 15, 60.0,
    "Marley, Bob", 20, 73.5,
)

The schema of df is compatible with Person, so the auto-generated schema interface inherits from it:

@DataSchema(isOpen = false)
interface DataFrameType : Person

val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
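The compatibility rule described above can be pictured with a small standard-library model. This is only a conceptual sketch, not the library's actual schema-matching logic: `Schema`, `isCompatibleWith`, `personSchema`, and `dfSchema` are hypothetical names introduced for illustration, and the model assumes that a schema is compatible when it contains every column of the target with the same type.

```kotlin
import kotlin.reflect.KClass

// Hypothetical model: a schema maps each column name to its value type.
typealias Schema = Map<String, KClass<*>>

// Assumption: a schema is compatible with a target when it contains every
// target column with the same type. Extra columns are allowed.
fun Schema.isCompatibleWith(target: Schema): Boolean =
    target.all { (name, type) -> this[name] == type }

val personSchema: Schema = mapOf(
    "name" to String::class,
    "age" to Int::class,
)

val dfSchema: Schema = mapOf(
    "name" to String::class,
    "age" to Int::class,
    "weight" to Double::class,
)

fun main() {
    println(dfSchema.isCompatibleWith(personSchema)) // true: df has name and age
    println(personSchema.isCompatibleWith(dfSchema)) // false: Person has no weight
}
```

In this model the extra weight column does not break compatibility, which mirrors why the generated DataFrameType can extend Person and only add the weight property.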

Although df has an additional weight column, the previously defined functions for DataFrame<Person> work on it:

df.splitName()
firstName  lastName  age  weight
Merton     Alice      15  60.000
Marley     Bob        20  73.500

df.adults()
name         age  weight
Marley, Bob   20  73.5

In a JVM project, you have to cast the DataFrame explicitly to the target interface:

df.cast<Person>().splitName()
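For a JVM project, the pieces above might fit together roughly as follows. This is a minimal sketch that assumes the Kotlin DataFrame library is on the classpath. The `DataRow<Person>.age` property is written by hand here to stand in for the property that annotation processing would normally generate (it follows the pattern of the generated weight property shown earlier), and `sampleFrame` is a hypothetical helper name.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.cast
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.filter
import org.jetbrains.kotlinx.dataframe.api.print

@DataSchema
interface Person {
    val name: String
    val age: Int
}

// Hand-written stand-in for the generated extension property (assumption:
// remove this if annotation processing already generates it in your project).
val DataRow<Person>.age: Int get() = this["age"] as Int

// Typed function from the text: it only compiles against DataFrame<Person>.
fun DataFrame<Person>.adults() = filter { age > 18 }

// Hypothetical helper that builds the untyped sample frame from the text.
fun sampleFrame(): DataFrame<*> = dataFrameOf("name", "age", "weight")(
    "Merton, Alice", 15, 60.0,
    "Marley, Bob", 20, 73.5,
)

fun main() {
    val df = sampleFrame()
    // df is DataFrame<*>, so adults() is not available until the explicit cast.
    df.cast<Person>().adults().print()
}
```

Without the cast, the call to adults() does not resolve, because the compiler only knows df as an untyped frame.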
Last modified: 09 December 2024