Interop with Collections
Kotlin DataFrame and Kotlin Collection represent two different approaches to data storage:
DataFrame stores data by fields/columns
Collection stores data by records/rows
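The difference between the two layouts can be sketched with plain Kotlin collections. This is only an illustration of row-oriented versus column-oriented storage, not a description of how DataFrame is implemented internally; the helper names toColumns and toRows are hypothetical and exist only for this example.

```kotlin
// Row-oriented: one object per record.
data class Input(val a: Int, val b: Int)

val rows: List<Input> = listOf(Input(1, 2), Input(3, 4))

// Column-oriented: one list per field, all lists having the same length.
// Hypothetical helper, written for this illustration only.
fun toColumns(rows: List<Input>): Map<String, List<Int>> = mapOf(
    "a" to rows.map { it.a },
    "b" to rows.map { it.b },
)

// Rebuild the records by pairing up values that share the same index.
// Hypothetical helper, written for this illustration only.
fun toRows(columns: Map<String, List<Int>>): List<Input> {
    val a = columns.getValue("a")
    val b = columns.getValue("b")
    return a.zip(b) { x, y -> Input(x, y) }
}
```

Here toColumns(rows) yields a map with the entries a=[1, 3] and b=[2, 4], and toRows applied to that map returns the original two Input records.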
Although DataFrame doesn't implement the Collection or Iterable interface, it has many similar operations, such as filter, take, first, map, groupBy etc.
DataFrame has two-way compatibility with Map and List:
List<T> -> DataFrame<T>: toDataFrame
DataFrame<T> -> List<T>: toList
Map<String, List<*>> -> DataFrame<*>: toDataFrame
DataFrame<*> -> Map<String, List<*>>: toMap
List<List<T>> -> DataFrame<*>: toDataFrame
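As a minimal sketch of the Map conversions listed above, the following round trip goes from a map of columns to a DataFrame and back. It assumes the Kotlin DataFrame library is on the classpath; the sample data and the roundTrip helper are hypothetical and serve only as illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Column name -> column values; every list must have the same size.
val columns: Map<String, List<Any?>> = mapOf(
    "a" to listOf(1, 3),
    "b" to listOf(2, 4),
)

// Map -> DataFrame -> Map. Hypothetical helper, written for this illustration only.
fun roundTrip(source: Map<String, List<Any?>>): Map<String, List<Any?>> =
    source.toDataFrame().toMap()
```

Under these assumptions, roundTrip(columns) is expected to return a map equal to the original, with the same column names and values.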
Columns, rows, and values of DataFrame can be accessed as List, Iterable, and Sequence, respectively:
df.columns() // List<DataColumn>
df.rows() // Iterable<DataRow>
df.values() // Sequence<Any?>
DataFrame can be used as an intermediate object for transformation from one data structure to another.
Assume you have a list of instances of some data class that you need to transform into some other format.
data class Input(val a: Int, val b: Int)
val list = listOf(Input(1, 2), Input(3, 4))
You can convert this list into a DataFrame using the toDataFrame() extension:
val df = list.toDataFrame()
Mark the original data class with the DataSchema annotation to get extension properties and perform data transformations.
@DataSchema
data class Input(val a: Int, val b: Int)
val df2 = df.add("c") { a + b }
Tip: To enable extension properties generation, you should use the DataFrame plugin for Gradle or the Kotlin Jupyter kernel.
After your data is transformed, DataFrame instances can be exported eagerly into List of another data class using toList or toListOf extensions:
data class Output(val a: Int, val b: Int, val c: Int)
val result = df2.toListOf<Output>()
Alternatively, one can create lazy Sequence objects. This avoids holding the entire list of objects in memory, as objects are created on the fly as needed.
val df = dataFrameOf("name", "lastName", "age")("John", "Doe", 21)
.group("name", "lastName").into("fullName")
data class FullName(val name: String, val lastName: String)
data class Person(val fullName: FullName, val age: Int)
val persons = df.toListOf<Person>() // [Person(fullName = FullName(name = "John", lastName = "Doe"), age = 21)]
unfold can be used as an analogue of toDataFrame() for specific columns inside existing dataframes.