Extension Properties API
Auto-generated extension properties are the safest and easiest way to access columns in a DataFrame. They are generated from the dataframe schema: the name and type of each property are inferred from the name and type of the corresponding column.
With these properties, you can work with your dataframe like this:
val peopleDf /* : DataFrame<Person> */ = DataFrame.read("people.csv").cast<Person>()
val nameColumn /* : DataColumn<String> */ = peopleDf.name
val ageColumn /* : DataColumn<Int> */ = peopleDf.personData.age
The same properties are available inside the lambdas of DataFrame operations:
peopleDf.add("lastName") { name.split(",").last() }
    .dropNulls { personData.age }
    .filter { survived && home.endsWith("NY") && personData.age in 10..20 }
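The typed access in the examples above rests on ordinary Kotlin extension properties declared on a dataframe parameterized by its schema type. The following is a minimal sketch of that mechanism only. `Frame`, `Column`, and `samplePeople` are hypothetical stand-ins invented for illustration, not the library's classes, and the code the library actually generates may differ in detail.

```kotlin
// Hypothetical, simplified stand-ins for a dataframe and a column.
class Column<T>(val name: String, val values: List<T>)

class Frame<T>(private val columns: Map<String, Column<*>>) {
    // Untyped, string-based access: a misspelled name fails only at runtime.
    operator fun get(name: String): Column<*> =
        columns[name] ?: throw IllegalArgumentException("Column not found: $name")
}

// Schema marker: it only describes the expected columns and is never instantiated.
interface Person {
    val name: String
    val age: Int
}

// Roughly what a generated extension property could look like:
// one property per schema entry, with the column type taken from the schema.
@Suppress("UNCHECKED_CAST")
val Frame<Person>.name: Column<String> get() = this["name"] as Column<String>

@Suppress("UNCHECKED_CAST")
val Frame<Person>.age: Column<Int> get() = this["age"] as Column<Int>

// Hypothetical sample data standing in for a file read plus a cast to the schema.
fun samplePeople(): Frame<Person> = Frame(
    mapOf(
        "name" to Column("name", listOf("Alice", "Bob")),
        "age" to Column("age", listOf(15, 45)),
    )
)
```

In this sketch, `samplePeople().name` has the static type `Column<String>`, and a misspelled property such as `samplePeople().nmae` does not compile, whereas `samplePeople()["nmae"]` compiles and fails only at runtime.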
To find out how to use this API in your environment, check out Working with Data Schemas, or jump straight to Data Schemas in Gradle projects or Data Schemas in Jupyter notebooks.
Last modified: 09 December 2024