Data Schemas in Kotlin Notebook
After a cell that declares a DataFrame variable df is executed, the following actions take place:
1. Columns in df are analyzed to extract the data schema.
2. An empty interface with the DataSchema annotation is generated.
3. Extension properties for this DataSchema are generated. Every column produces two extension properties:
   - The property for ColumnsContainer<DataFrameType> returns the column.
   - The property for DataRow<DataFrameType> returns the cell value.
4. The df variable is typed by the schema interface.
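The steps above can be sketched for a small two-column frame. This is an illustration only: it assumes the Kotlin DataFrame library was loaded with %use dataframe, the sample data is made up, the interface name DataFrameType follows the text (the name actually generated may differ between library versions), and the generated code is shown in simplified form. The name typedDf is hypothetical; in the notebook the typed frame is bound to the name df again.

```kotlin
// Cell executed by the user (assumes `%use dataframe` ran in an earlier cell).
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", null
)

// --- Simplified view of what the notebook then generates and runs ---

// Step 2: an empty marker interface for the extracted schema.
@DataSchema
interface DataFrameType

// Step 3: two extension properties per column.
// On a columns container the property returns the column...
val ColumnsContainer<DataFrameType>.name: DataColumn<String> get() = this["name"] as DataColumn<String>
val ColumnsContainer<DataFrameType>.age: DataColumn<Int?> get() = this["age"] as DataColumn<Int?>

// ...and on a row it returns the cell value.
val DataRow<DataFrameType>.name: String get() = this["name"] as String
val DataRow<DataFrameType>.age: Int? get() = this["age"] as Int?

// Step 4: the frame is cast to the schema type; the object itself is unchanged.
// `typedDf` is a hypothetical name used here to avoid redeclaring `df`.
val typedDf: DataFrame<DataFrameType> = df.cast<DataFrameType>()
```

The age column is nullable because one of the sample values is null, so its properties are typed as Int? rather than Int.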
To log all these additional code executions, use the %trackExecution cell magic.
Custom Data Schemas
You can define your own DataSchema interfaces and use them in functions and classes to represent a DataFrame with a specific set of columns.
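A minimal sketch of such a schema is shown here. The interface name Person matches the one used later in the text; the two columns and their types are assumptions chosen for illustration.

```kotlin
// Notebook cell; assumes `%use dataframe` ran earlier, which brings
// the DataSchema annotation into scope.
@DataSchema
interface Person {
    val name: String // column "name" holding String values
    val age: Int     // column "age" holding non-null Int values
}
```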
After this cell is executed in a notebook, or after annotation processing in IDEA, extension properties for data access are generated. These properties can then be used to write functions for a typed DataFrame.
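As a sketch of such functions, assuming a @DataSchema interface Person with name: String and age: Int has already been processed so that the name and age extension properties exist. The function names splitName and adults are illustrative, and the comma separator assumes names stored as "Last, First".

```kotlin
// Assumes `%use dataframe` and a processed @DataSchema interface Person
// with `name: String` and `age: Int`.

// Splits the "name" column on commas into two new columns.
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")

// Keeps only the rows whose age is strictly greater than 18.
fun DataFrame<Person>.adults() = filter { age > 18 }
```

Inside split { ... } the name property resolves to the column, while inside filter { ... } the age property resolves to the cell value of the current row.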
In Kotlin Notebook these functions work automatically for any DataFrame that matches the Person schema.
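For instance, a frame like the following would match a Person schema with name: String and age: Int. The data is made up for illustration, and the extra weight column is included to show that additional columns do not prevent a match.

```kotlin
// Notebook cell; assumes `%use dataframe` ran earlier.
// Values are listed row by row, in the order of the column names.
val df = dataFrameOf("name", "age", "weight")(
    "Merton, Alice", 15, 60.0,
    "Marley, Bob", 20, 73.5
)
```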
The schema of df is compatible with Person, so the auto-generated schema interface inherits from it.
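The generated code might look roughly like the sketch below, assuming df has the Person columns plus one extra weight column of type Double. This is code the notebook produces, not code you write; the interface name DataFrameType and the exact annotation arguments are assumptions and may differ between library versions.

```kotlin
// Generated by the notebook (simplified); assumes a Person schema is already defined.
@DataSchema(isOpen = false)
interface DataFrameType : Person

// Only the column that Person does not declare needs new properties;
// `name` and `age` are inherited through the Person schema.
val ColumnsContainer<DataFrameType>.weight: DataColumn<Double> get() = this["weight"] as DataColumn<Double>
val DataRow<DataFrameType>.weight: Double get() = this["weight"] as Double
```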
Although df has an additional column weight, the previously defined functions for DataFrame<Person> work for it.
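A usage sketch, assuming df holds the columns name, age and weight, and that two hypothetical extension functions exist on DataFrame<Person>: splitName(), which splits name into firstName and lastName, and adults(), which keeps rows with age above 18.

```kotlin
// Both calls compile because the generated type of df derives from Person.
val withSplitNames = df.splitName() // "name" replaced by "firstName" and "lastName"
val onlyAdults = df.adults()        // rows with age > 18; the weight column is kept
```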
Use external Data Schemas
Sometimes it is convenient to extract reusable code from Kotlin Notebook into a Kotlin JVM library. Schema interfaces should also be extracted if this code uses Custom Data Schemas.
To enable support for them in the notebook, register them in the library integration class with the useSchema function.
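A sketch of what such a library could contain is shown below. The Person schema, the countAdults function and the class name Integration are illustrative, and the import paths are assumptions that should be checked against the versions of Kotlin DataFrame and the Kotlin Jupyter API you depend on.

```kotlin
// Import paths are assumptions; verify them against your library versions.
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.count
import org.jetbrains.kotlinx.dataframe.jupyter.useSchema
import org.jetbrains.kotlinx.jupyter.api.annotations.JupyterLibrary
import org.jetbrains.kotlinx.jupyter.api.libraries.JupyterIntegration

@DataSchema
interface Person {
    val name: String
    val age: Int
}

// Reads the column through a property reference, so it does not rely on
// generated extension properties being available inside the library.
fun DataFrame<Person>.countAdults() = count { it[Person::age] > 18 }

@JupyterLibrary
internal class Integration : JupyterIntegration() {
    override fun Builder.onLoaded() {
        onLoaded {
            // Makes the notebook consider Person when typing DataFrame variables.
            useSchema<Person>()
        }
    }
}
```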
After this library is loaded into the notebook, schema interfaces for all DataFrame variables that match the Person schema derive from Person.
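For example, a cell such as the following creates a frame with exactly the columns of a Person schema declaring name: String and age: Int. The data is made up for illustration.

```kotlin
// Notebook cell; assumes `%use dataframe` and the library that
// registers the Person schema have been loaded.
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20
)
```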
Now df is assignable to DataFrame<Person>, and countAdults is available.
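A usage sketch, assuming df was typed by the notebook against a Person schema and that the loaded library provides a hypothetical countAdults() extension on DataFrame<Person> that counts rows with age above 18.

```kotlin
// The assignment compiles because the generated type of df derives from Person.
val people: DataFrame<Person> = df

// Calls the library function directly on the notebook variable.
val adultCount = df.countAdults()
```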