Create DataFrame
This section describes ways to create a DataFrame instance.
emptyDataFrame
Returns a DataFrame with no rows and no columns.
val df = emptyDataFrame<Any>()
dataFrameOf
Returns a DataFrame with the given column names and values.
// DataFrame with 2 columns and 3 rows
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Charlie", 100,
)
// DataFrame with 2 columns and 3 rows
val df = dataFrameOf(
    "name" to listOf("Alice", "Bob", "Charlie"),
    "age" to listOf(15, 20, 100),
)
val name by columnOf("Alice", "Bob", "Charlie")
val age by columnOf(15, 20, 22)
// DataFrame with 2 columns and 3 rows
val df = dataFrameOf(name, age)
val names = listOf("name", "age")
val values = listOf(
    "Alice", 15,
    "Bob", 20,
    "Charlie", 22,
)
val df = dataFrameOf(names, values)
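In the examples above, the flat values list is laid out row by row: the first names.size values form the first row, the next names.size values the second row, and so on. The plain-Kotlin sketch below, which does not use the DataFrame library, models that layout by regrouping a flat list into one list per column. The helper splitRowMajor is hypothetical and only illustrates the idea; it is not the library's implementation.

```kotlin
// Plain-Kotlin sketch (no DataFrame dependency): regroups a flat, row-major
// list of values into one list per column, keyed by column name.
fun <T> splitRowMajor(names: List<String>, values: List<T>): Map<String, List<T>> {
    // The sketch requires the number of values to be a multiple of the number of columns
    require(names.isNotEmpty() && values.size % names.size == 0) {
        "values.size (${values.size}) must be a multiple of names.size (${names.size})"
    }
    // chunked(n) groups consecutive values into rows of n cells
    val rows = values.chunked(names.size)
    // Column i collects the i-th cell of every row
    return names.withIndex().associate { (i, name) -> name to rows.map { it[i] } }
}

val columns = splitRowMajor(
    listOf("name", "age"),
    listOf("Alice", 15, "Bob", 20, "Charlie", 22),
)
// columns: {name=[Alice, Bob, Charlie], age=[15, 20, 22]}
```

The resulting column-oriented map has the same shape as the name-to-list pairs used in the second dataFrameOf example.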
// Multiplication table
dataFrameOf(1..10) { x -> (1..10).map { x * it } }
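In the multiplication-table example, each element of 1..10 supplies one column, and the lambda returns that column's values. The plain-Kotlin model below builds the same content as a map from column name to values, without the DataFrame library. Naming each column by x.toString() is an assumption about how the header elements become column names, and `table` is a hypothetical name.

```kotlin
// Plain-Kotlin model of the multiplication table (no DataFrame dependency).
// Each header element x becomes a column, assumed here to be named x.toString(),
// and the lambda's result (x * 1, x * 2, ..., x * 10) becomes that column's values.
val table: Map<String, List<Int>> = (1..10).associate { x ->
    x.toString() to (1..10).map { x * it }
}
// table["3"]: [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
```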
// 5 columns, each filled with 7 random double values
val names = (1..5).map { "column$it" }
dataFrameOf(names).randomDouble(7)
// 5 columns filled with 7 random double values between 0 and 1 (inclusive)
dataFrameOf(names).randomDouble(7, 0.0..1.0).print()
// 5 columns filled with 7 random int values between 0 and 100 (inclusive)
dataFrameOf(names).randomInt(7, 0..100).print()
val names = listOf("first", "second", "third")
// DataFrame with 3 columns, fill each column with 15 `true` values
val df = dataFrameOf(names).fill(15, true)
toDataFrame
Creates a DataFrame from an Iterable<DataColumn>:
val name by columnOf("Alice", "Bob", "Charlie")
val age by columnOf(15, 20, 22)
listOf(name, age).toDataFrame()
Creates a DataFrame from a Map<String, List<*>>:
val map = mapOf("name" to listOf("Alice", "Bob", "Charlie"), "age" to listOf(15, 20, 22))
// DataFrame with 2 columns
map.toDataFrame()
Creates a DataFrame from an Iterable of basic types (except arrays).
The return type of these overloads is a typed DataFrame. Its data schema defines the `value` column, which can be used right after the conversion for further computations:
val names = listOf("Alice", "Bob", "Charlie")
val df: DataFrame<ValueProperty<String>> = names.toDataFrame()
df.add("length") { value.length }
Creates a DataFrame from an Iterable<T> with a single column, columnName: DataColumn<T>. This is an easy way to create a DataFrame when you have a list of Files, URLs, or a structure you want to extract data from. In a notebook, it can be convenient to start from a column of these values to see the number of rows and their toString() in a table, and then iteratively add columns with the parts of the data you're interested in: a File's content, a specific section of an HTML document, some metadata, etc.
val files = listOf(File("data.csv"), File("data1.csv"))
val df = files.toDataFrame(columnName = "data")
Creates a DataFrame from an Iterable of objects:
data class Person(val name: String, val age: Int)
val persons = listOf(Person("Alice", 15), Person("Bob", 20), Person("Charlie", 22))
val df = persons.toDataFrame()
Scans object properties using reflection and creates a ValueColumn for every property. The set of scanned properties is determined at compile time by the formal type of the objects in the Iterable, so properties declared only in implementation classes are not scanned.
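The scoping rule can be pictured in plain Kotlin: code written against the formal element type can only see that type's properties. In the sketch below, Named, Employee, staff and staffColumns are hypothetical names, and the column map is built by hand without the DataFrame library. Following the rule above, staff.toDataFrame() would likewise produce only a name column, because the element type of staff is Named.

```kotlin
// Plain-Kotlin illustration (no DataFrame dependency) of the scoping rule:
// only properties declared on the formal element type are visible.
interface Named {
    val name: String
}

// Hypothetical implementation class with an extra property
data class Employee(override val name: String, val salary: Int) : Named

// The formal element type of this list is Named, not Employee
val staff: List<Named> = listOf(Employee("Alice", 100), Employee("Bob", 120))

// A column builder written against Named can only read `name`;
// `salary` is not part of the formal type, so it does not become a column
val staffColumns: Map<String, List<Any>> = mapOf(
    "name" to staff.map { it.name },
)
```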
Specify the maxDepth parameter to perform a deep object graph traversal and convert nested objects into ColumnGroups and FrameColumns:
data class Name(val firstName: String, val lastName: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)
val students = listOf(
    Student(Name("Alice", "Cooper"), 15, listOf(Score("math", 4), Score("biology", 3))),
    Student(Name("Bob", "Marley"), 20, listOf(Score("music", 5))),
)
val df = students.toDataFrame(maxDepth = 1)
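As we read the description above, with maxDepth = 1 the nested Name becomes a column group (firstName, lastName), age stays a value column, and the List<Score> becomes a frame column whose cells are small tables with subject and value columns. The plain-Kotlin model below spells out that expected shape with nested maps and lists; it does not use the DataFrame library, and nameGroup, ageColumn and scoresColumn are hypothetical names.

```kotlin
// Plain-Kotlin model (no DataFrame dependency) of the expected column layout.
data class Name(val firstName: String, val lastName: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)

val students = listOf(
    Student(Name("Alice", "Cooper"), 15, listOf(Score("math", 4), Score("biology", 3))),
    Student(Name("Bob", "Marley"), 20, listOf(Score("music", 5))),
)

// Column group "name": one nested column per Name property
val nameGroup: Map<String, List<String>> = mapOf(
    "firstName" to students.map { it.name.firstName },
    "lastName" to students.map { it.name.lastName },
)

// Value column "age": one Int per student
val ageColumn: List<Int> = students.map { it.age }

// Frame column "scores": each cell is a table of its own, one row per Score
val scoresColumn: List<Map<String, List<Any>>> = students.map { s ->
    mapOf(
        "subject" to s.scores.map { it.subject },
        "value" to s.scores.map { it.value },
    )
}
```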
For detailed control over object graph transformations, use the configuration DSL. It allows you to exclude particular properties or classes from the object graph traversal, compute additional columns, and configure column grouping.
val df = students.toDataFrame {
    // add column
    "year of birth" from { 2021 - it.age }
    // scan all properties
    properties(maxDepth = 1) {
        exclude(Score::subject) // the `subject` property is excluded from object graph traversal
        preserve<Name>() // `Name` objects are stored as-is, without transformation into a DataFrame
    }
    // add column group
    "summary" {
        "max score" from { it.scores.maxOf { it.value } }
        "min score" from { it.scores.minOf { it.value } }
    }
}
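Inside the DSL, `it` in each `from { ... }` lambda refers to the current Student. The plain-Kotlin sketch below evaluates the same expressions with ordinary collection calls, to show which values the computed columns are expected to hold for the two students above. It does not use the DataFrame library; yearOfBirth, maxScore and minScore are hypothetical names.

```kotlin
// Plain-Kotlin evaluation (no DataFrame dependency) of the DSL's column expressions.
data class Name(val firstName: String, val lastName: String)
data class Score(val subject: String, val value: Int)
data class Student(val name: Name, val age: Int, val scores: List<Score>)

val students = listOf(
    Student(Name("Alice", "Cooper"), 15, listOf(Score("math", 4), Score("biology", 3))),
    Student(Name("Bob", "Marley"), 20, listOf(Score("music", 5))),
)

// "year of birth" from { 2021 - it.age }
val yearOfBirth = students.map { 2021 - it.age } // [2006, 2001]

// "summary" group: "max score" and "min score" over each student's scores
val maxScore = students.map { s -> s.scores.maxOf { it.value } } // [4, 5]
val minScore = students.map { s -> s.scores.minOf { it.value } } // [3, 5]
```

Note that maxOf and minOf throw NoSuchElementException on an empty collection, so the "summary" expressions as written would likely fail for a student with no scores.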
DynamicDataFrameBuilder
The DataFrame constructors described above throw an exception when column names are duplicated. When implementing a custom operation that involves multiple DataFrame objects or computed columns, or when parsing third-party data, it might be desirable to disambiguate column names instead of throwing an exception.
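// Collects the first column of every given DataFrame into a single DataFrame
fun peek(vararg dataframes: AnyFrame): AnyFrame {
    val builder = DynamicDataFrameBuilder()
    for (df in dataframes) {
        // duplicated column names are disambiguated instead of causing an exception
        df.columns().firstOrNull()?.let { builder.add(it) }
    }
    return builder.toDataFrame()
}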
val col by columnOf(1, 2, 3)
peek(dataFrameOf(col), dataFrameOf(col))
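Both DataFrames passed to peek have a column named col, so one of the two must receive a different name. The exact names DynamicDataFrameBuilder generates are not specified here; the plain-Kotlin sketch below assumes a simple numeric-suffix scheme (col, col1, col2, ...) purely to illustrate how such disambiguation can work. UniqueNames is a hypothetical helper, not part of the DataFrame library.

```kotlin
// Plain-Kotlin sketch of column-name disambiguation (no DataFrame dependency).
// The suffix scheme used here is an assumption for illustration.
class UniqueNames {
    private val used = mutableSetOf<String>()

    // Returns `preferred` if it is still free, otherwise the first free "preferred<k>", k = 1, 2, ...
    fun add(preferred: String): String {
        var candidate = preferred
        var k = 1
        while (candidate in used) {
            candidate = "$preferred$k"
            k++
        }
        used += candidate
        return candidate
    }
}

val generator = UniqueNames()
val assigned = listOf("col", "col", "col").map { generator.add(it) }
// assigned: [col, col1, col2]
```

Tracking used names in a set keeps each lookup cheap, and returning the chosen name lets the caller refer to the renamed column afterwards.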
Last modified: 09 December 2024