Custom Data Schemas

Last modified: 14 January 2025

You can define your own DataSchema interfaces and use them in functions and classes to represent a DataFrame with a specific set of columns:
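A minimal sketch of such a declaration follows. The columns `name: String` and `age: Int` are assumptions chosen for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema

// Each property describes one column: its name and its value type.
// The columns `name` and `age` are assumed here for illustration.
@DataSchema
interface Person {
    val name: String
    val age: Int
}
```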

After this cell is executed in Jupyter, or after annotation processing runs in IDEA, extension properties for data access are generated. Now we can use these properties to create functions for a typed DataFrame:
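As a sketch, two such functions might look as follows. It assumes a `Person` schema with `name: String` and `age: Int` columns whose extension properties have already been generated. The function names `splitName` and `adults` and the target column names `firstName` and `lastName` are illustrative.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Assumes the generated extension properties `name` and `age` for Person are in scope.

// Splits the "name" column on commas into two new columns.
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")

// Keeps only the rows whose age is greater than 18.
fun DataFrame<Person>.adults() = filter { age > 18 }
```

Declaring the receiver as DataFrame<Person> means the compiler rejects calls on frames whose static type is not known to contain these columns.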

In Jupyter, these functions work automatically for any DataFrame that matches the Person schema:
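For instance, a frame such as the one below could be used. The sample values are made up for illustration. The `name` and `age` columns are assumed to have the types the Person schema expects, and `weight` is an extra column.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf

// Column names first, then the values row by row (sample data for illustration only).
val df = dataFrameOf("name", "age", "weight")(
    "Merton, Alice", 15, 60.0,
    "Marley, Bob", 20, 73.5,
)
```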

The schema of df is compatible with Person, so the auto-generated schema interface inherits from it:
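The generated declarations are not written by hand. The following is a rough sketch of their shape, assuming `df` has the Person columns plus an extra `weight` column of type `Double`. The interface name `DataFrameType` is a placeholder, and the actual generated names and code may differ between library versions.

```kotlin
import org.jetbrains.kotlinx.dataframe.ColumnsContainer
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema

// Illustrative shape of code generated by the notebook integration; not meant to be typed in.
@DataSchema(isOpen = false)
interface DataFrameType : Person

// Only the column that Person lacks needs new accessors.
val ColumnsContainer<DataFrameType>.weight: DataColumn<Double>
    get() = this["weight"] as DataColumn<Double>
val DataRow<DataFrameType>.weight: Double
    get() = this["weight"] as Double
```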

Although df has an additional column weight, the previously defined functions for DataFrame<Person> work for it:
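In a notebook this might look as follows, assuming `df` was created in an earlier cell and that extension functions on DataFrame<Person> exist under the hypothetical names `splitName` and `adults`.

```kotlin
// Both calls compile without a cast because the generated type of `df` inherits from Person.
// `splitName` and `adults` are illustrative names for functions declared on DataFrame<Person>.
df.splitName()
df.adults()
```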

In a JVM project, you have to cast the DataFrame explicitly to the target interface:
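A minimal sketch of the JVM case is shown below. It assumes the project is set up so that extension properties are generated for `@DataSchema` interfaces. The columns of `Person`, the sample data, and the function name `splitName` are illustrative.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.annotations.DataSchema
import org.jetbrains.kotlinx.dataframe.api.*

@DataSchema
interface Person {
    val name: String
    val age: Int
}

// Relies on the generated extension property `name` for Person.
fun DataFrame<Person>.splitName() = split { name }.by(",").into("firstName", "lastName")

// Outside a notebook, the frame built here has no Person type attached to it.
val df = dataFrameOf("name", "age", "weight")(
    "Merton, Alice", 15, 60.0,
    "Marley, Bob", 20, 73.5,
)

// cast<Person>() changes the static type argument; the data, including weight, is kept.
val result = df.cast<Person>().splitName()
```

Without the cast, `df.splitName()` would not compile, because the function is declared on DataFrame<Person> and the static type of `df` carries no schema.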