Dataframe 0.15 Help

Create DataFrame

This section describes ways to create a DataFrame instance.

emptyDataFrame

Returns a DataFrame with no rows and no columns.

val df = emptyDataFrame<Any>()
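An empty DataFrame can act as a neutral fallback value, for example when there is nothing to load. The sketch below uses a hypothetical helper, `frameOrEmpty`, which is not part of the library. It returns a two-column frame built from (name, age) pairs, or `emptyDataFrame` when the input is empty.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: builds a two-column frame from (name, age) pairs,
// or falls back to an empty DataFrame (no rows, no columns) when there is no input.
fun frameOrEmpty(rows: List<Pair<String, Int>>): AnyFrame =
    if (rows.isEmpty()) {
        emptyDataFrame<Any>()
    } else {
        dataFrameOf(
            "name" to rows.map { it.first },
            "age" to rows.map { it.second },
        )
    }

fun main() {
    val empty = frameOrEmpty(emptyList())
    println("${empty.rowsCount()} rows, ${empty.columnsCount()} columns")
}
```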

dataFrameOf

Returns a DataFrame with the given column names and values.

// DataFrame with 2 columns and 3 rows
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Charlie", 100,
)
// DataFrame with 2 columns and 3 rows
val df = dataFrameOf(
    "name" to listOf("Alice", "Bob", "Charlie"),
    "age" to listOf(15, 20, 100),
)
val name by columnOf("Alice", "Bob", "Charlie")
val age by columnOf(15, 20, 22)
// DataFrame with 2 columns
val df = dataFrameOf(name, age)
val names = listOf("name", "age")
val values = listOf(
    "Alice", 15,
    "Bob", 20,
    "Charlie", 22,
)
val df = dataFrameOf(names, values)
// Multiplication table
dataFrameOf(1..10) { x -> (1..10).map { x * it } }
// 5 columns filled with 7 random double values:
val names = (1..5).map { "column$it" }
dataFrameOf(names).randomDouble(7).print()
// 5 columns filled with 7 random double values between 0 and 1 (inclusive)
dataFrameOf(names).randomDouble(7, 0.0..1.0).print()
// 5 columns filled with 7 random int values between 0 and 100 (inclusive)
dataFrameOf(names).randomInt(7, 0..100).print()
val names = listOf("first", "second", "third")
// DataFrame with 3 columns, fill each column with 15 `true` values
val df = dataFrameOf(names).fill(15, true)
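In the vararg form, values are consumed row by row: each group of as many values as there are column names forms one row. The sketch below builds the same data both ways and compares the resulting columns. The function names `rowMajor` and `columnMajor` are hypothetical and only used for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Values listed row by row: ("Alice", 15) is row 0, ("Bob", 20) is row 1, and so on.
fun rowMajor(): AnyFrame = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Charlie", 100,
)

// The same data, given column by column.
fun columnMajor(): AnyFrame = dataFrameOf(
    "name" to listOf("Alice", "Bob", "Charlie"),
    "age" to listOf(15, 20, 100),
)

fun main() {
    val a = rowMajor()
    val b = columnMajor()
    // Compare each column's underlying values.
    for (name in a.columnNames()) {
        println("$name equal: ${a[name].values().toList() == b[name].values().toList()}")
    }
}
```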
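In the multiplication-table overload, each element of the header range appears to become a column named after its `toString()`, and the lambda returns that column's values. A sketch that builds the table and reads one cell; `multiplicationTable` is a hypothetical name.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Column "x" holds x * 1, x * 2, ..., x * 10.
fun multiplicationTable(): AnyFrame = dataFrameOf(1..10) { x -> (1..10).map { x * it } }

fun main() {
    val table = multiplicationTable()
    // Row index 4 corresponds to the multiplier 5, so this cell holds 3 * 5.
    println(table["3"][4])
}
```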
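When `dataFrameOf` receives only a list of names, the fill functions decide how many rows the columns get and what they contain. A minimal sketch with hypothetical function names; the dice range relies on the inclusive bounds described in the comments above.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Three columns, each filled with 15 copies of `true`.
fun flags(): AnyFrame = dataFrameOf(listOf("first", "second", "third")).fill(15, true)

// Two columns of 5 random ints each, simulating dice rolls.
fun dice(): AnyFrame = dataFrameOf(listOf("a", "b")).randomInt(5, 1..6)

fun main() {
    flags().print()
    dice().print()
}
```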

toDataFrame

Creates a DataFrame from an Iterable<DataColumn>:

val name by columnOf("Alice", "Bob", "Charlie")
val age by columnOf(15, 20, 22)
listOf(name, age).toDataFrame()
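Because any Iterable of columns works, the columns can also be generated programmatically. In this sketch the column names are set with `rename` instead of delegated properties; the names `col1`..`col3` and the function `generatedColumns` are illustrative only.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Builds three columns named "col1", "col2", "col3"; column i holds i and i * 10.
fun generatedColumns(): AnyFrame =
    (1..3).map { i -> columnOf(i, i * 10).rename("col$i") }.toDataFrame()

fun main() {
    generatedColumns().print()
}
```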

Creates a DataFrame from a Map<String, List<*>>:

val map = mapOf("name" to listOf("Alice", "Bob", "Charlie"), "age" to listOf(15, 20, 22))
// DataFrame with 2 columns
map.toDataFrame()
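A map is a natural fit when data already arrives grouped by field. The sketch below assumes hypothetical raw input of comma-separated lines, splits it by hand into one list per field, and converts the map. Map keys become column names; `mapOf` keeps insertion order.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical raw input: one "name,age" record per line.
fun fromLines(lines: List<String>): AnyFrame {
    val parts = lines.map { it.split(",") }
    // One list per field; each key becomes a column name.
    val map = mapOf(
        "name" to parts.map { it[0] },
        "age" to parts.map { it[1].toInt() },
    )
    return map.toDataFrame()
}

fun main() {
    fromLines(listOf("Alice,15", "Bob,20", "Charlie,22")).print()
}
```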

Creates a DataFrame from an Iterable of basic types (except arrays):

The return type of these overloads is a typed DataFrame. Its data schema, ValueProperty<T>, defines a single column named value that can be used right after the conversion for additional computations.

val names = listOf("Alice", "Bob", "Charlie")
val df: DataFrame<ValueProperty<String>> = names.toDataFrame()
df.add("length") { value.length }
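Because the schema is known at compile time, the typed `value` accessor can be used right away in row expressions such as `add` and `filter`. A sketch with hypothetical function names that follows the snippet above.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// The single column is named "value" and holds Strings.
fun names(): DataFrame<ValueProperty<String>> = listOf("Alice", "Bob", "Charlie").toDataFrame()

// `value` is the typed accessor provided by the ValueProperty schema.
fun withLength() = names().add("length") { value.length }

// Keeps only rows whose value is longer than 3 characters.
fun longNames() = names().filter { value.length > 3 }

fun main() {
    withLength().print()
    longNames().print()
}
```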

Creates a DataFrame from an Iterable<T> with a single column, columnName: DataColumn<T>. This is an easy way to create a DataFrame when you have a list of Files, URLs, or other objects you want to extract data from. In a notebook, it can be convenient to start from a column of these values to see the number of rows and their toString() representations in a table, and then iteratively add columns with the parts of the data you're interested in: a File's content, a specific section of an HTML document, some metadata, etc.

import java.io.File
val files = listOf(File("data.csv"), File("data1.csv"))
val df = files.toDataFrame(columnName = "data")
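The iterative workflow described above might look like the sketch below: start from the single "data" column and derive new columns from the stored File objects. Column names `fileName` and `exists` are hypothetical. The `exists` values depend on the working directory at runtime.

```kotlin
import java.io.File
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Start from one column, "data", holding the File objects.
fun filesFrame(): AnyFrame {
    val files = listOf(File("data.csv"), File("data1.csv"))
    return files.toDataFrame(columnName = "data")
}

// Derive new columns from the stored objects, one step at a time.
fun withDetails(): AnyFrame = filesFrame()
    .add("fileName") { (it["data"] as File).name }
    .add("exists") { (it["data"] as File).exists() }

fun main() {
    withDetails().print()
}
```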

Creates a DataFrame from an Iterable of objects:

data class Person(val name: String, val age: Int)
val persons = listOf(Person("Alice", 15), Person("Bob", 20), Person("Charlie", 22))
val df = persons.toDataFrame()

Scans object properties using reflection and creates a ValueColumn for every property. The set of scanned properties is determined at compile time by the formal element type of the Iterable, so properties declared only in implementation classes are not scanned.
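The effect of the formal type can be seen by converting the same object once as an interface type and once as its implementation class. `Named` and `Dog` are hypothetical types; the expected column sets follow from the rule stated above.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

interface Named {
    val name: String
}

// `breed` exists only on the implementation class.
data class Dog(override val name: String, val breed: String) : Named

// Formal element type is Named, so only `name` is scanned.
fun asNamed(): AnyFrame = listOf<Named>(Dog("Rex", "Labrador")).toDataFrame()

// Formal element type is Dog, so both properties are scanned.
fun asDogs(): AnyFrame = listOf(Dog("Rex", "Labrador")).toDataFrame()

fun main() {
    println(asNamed().columnNames())
    println(asDogs().columnNames())
}
```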

Specify the maxDepth parameter to perform deep object graph traversal and convert nested objects into ColumnGroups and FrameColumns:

data class Name(val firstName: String, val lastName: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)
val students = listOf(
    Student(Name("Alice", "Cooper"), 15, listOf(Score("math", 4), Score("biology", 3))),
    Student(Name("Bob", "Marley"), 20, listOf(Score("music", 5))),
)
val df = students.toDataFrame(maxDepth = 1)
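To check which kind of column each property became, inspect `kind()` on the resulting columns. A sketch reusing the classes above; following the description, `name` (a nested object) is expected to be a column group and `scores` (a list of objects) a frame column, while `age` stays a value column.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.columns.ColumnKind

data class Name(val firstName: String, val lastName: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)

fun studentsFrame(): AnyFrame {
    val students = listOf(
        Student(Name("Alice", "Cooper"), 15, listOf(Score("math", 4), Score("biology", 3))),
        Student(Name("Bob", "Marley"), 20, listOf(Score("music", 5))),
    )
    return students.toDataFrame(maxDepth = 1)
}

fun main() {
    val df = studentsFrame()
    // Print the kind of every top-level column.
    for (name in df.columnNames()) println("$name: ${df[name].kind()}")
}
```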

For detailed control over object graph transformations, use the configuration DSL. It allows you to exclude particular properties or classes from the object graph traversal, compute additional columns, and configure column grouping.

val df = students.toDataFrame {
    // add column
    "year of birth" from { 2021 - it.age }
    // scan all properties
    properties(maxDepth = 1) {
        exclude(Score::subject) // `subject` property will be skipped from object graph traversal
        preserve<Name>() // `Name` objects will be stored as-is without transformation into DataFrame
    }
    // add column group
    "summary" {
        "max score" from { it.scores.maxOf { it.value } }
        "min score" from { it.scores.minOf { it.value } }
    }
}
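A smaller use of the same DSL combines reflected properties with one computed column. The sketch assumes `properties()` can be called without arguments, since its parameters have defaults in the example above. The column name `name length` is illustrative.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

data class Person(val name: String, val age: Int)

fun personsFrame(): AnyFrame {
    val persons = listOf(Person("Alice", 15), Person("Bob", 20), Person("Charlie", 22))
    return persons.toDataFrame {
        // Scan all properties of Person ("name" and "age").
        properties()
        // Add a computed column next to them.
        "name length" from { it.name.length }
    }
}

fun main() {
    personsFrame().print()
}
```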

DynamicDataFrameBuilder

The DataFrame constructors described above throw an exception when column names are duplicated. When implementing a custom operation that involves multiple DataFrame objects or computed columns, or when parsing third-party data, it might be preferable to disambiguate duplicate column names instead of throwing an exception. DynamicDataFrameBuilder collects columns one by one and disambiguates duplicate names when building the result.

fun peek(vararg dataframes: AnyFrame): AnyFrame {
    val builder = DynamicDataFrameBuilder()
    for (df in dataframes) {
        df.columns().firstOrNull()?.let { builder.add(it) }
    }
    return builder.toDataFrame()
}
val col by columnOf(1, 2, 3)
peek(dataFrameOf(col), dataFrameOf(col))
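The contrast can be shown directly: building a frame from two columns with the same name fails, while DynamicDataFrameBuilder produces a frame whose column names are unique. The helpers `duplicated`, `strictFails`, and `lenient` are hypothetical; the sketch does not assume any particular naming scheme for the disambiguated columns.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Two columns that share the name "col".
fun duplicated() = listOf(columnOf(1, 2, 3).rename("col"), columnOf(4, 5, 6).rename("col"))

// A regular constructor rejects duplicate names with an exception.
fun strictFails(): Boolean = runCatching { duplicated().toDataFrame() }.isFailure

// The dynamic builder keeps both columns under distinct names.
fun lenient(): AnyFrame {
    val builder = DynamicDataFrameBuilder()
    duplicated().forEach { builder.add(it) }
    return builder.toDataFrame()
}

fun main() {
    println(strictFails())
    println(lenient().columnNames())
}
```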
Last modified: 09 December 2024