Dataframe 0.13 Help

DataColumn

DataColumn represents a column of values. It can store objects of primitive or reference types, or other DataFrame objects.

See how to create columns

Properties

  • name: String — name of the column; should be unique within containing dataframe

  • path: ColumnPath — path to the column; depends on the way column was retrieved from dataframe

  • type: KType — type of elements in the column

  • hasNulls: Boolean — flag indicating whether column contains null values

  • values: Iterable<T> — column data

  • size: Int — number of elements in the column
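The properties above can be inspected on any column taken from a dataframe. The following is a minimal sketch, not a canonical usage: the sample data and the helper describe are hypothetical, and it assumes the function-style accessors name(), type(), hasNulls(), size() and values() are available on DataColumn alongside the properties listed above.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical helper: collects the basic properties of a column into one line.
fun describe(col: DataColumn<*>): String =
    "name=${col.name()}, type=${col.type()}, hasNulls=${col.hasNulls()}, size=${col.size()}"

fun main() {
    // Hypothetical sample data; the null makes the "age" column nullable.
    val df = dataFrameOf("name", "age")(
        "Alice", 15,
        "Bob", null,
    )

    println(describe(df["name"]))
    println(describe(df["age"]))

    // values() exposes the column data as an Iterable.
    println(df["age"].values().toList())
}
```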

Column kinds

A DataColumn instance is one of three subtypes: ValueColumn, ColumnGroup or FrameColumn.

ValueColumn

Represents a sequence of values.

It can store values of primitive (integers, strings, decimals, etc.) or reference types. Currently, it uses List as the underlying data storage.

ColumnGroup

Container for nested columns. It is used to create a column hierarchy.

FrameColumn

Special case of ValueColumn that stores other DataFrame objects as elements.

DataFrames stored in a FrameColumn may have different schemas.

FrameColumn may appear after reading from JSON or other hierarchical data structures, or after grouping operations such as groupBy or pivot.
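To see the three kinds side by side, the sketch below builds a small dataframe, nests two columns into a ColumnGroup, and converts a groupBy result into a dataframe so that a FrameColumn appears. The sample data and the helper kindsOf are hypothetical, and the sketch assumes that kind() reports the column kind and that group(...).into(...) and groupBy(...).toDataFrame() behave as described in the comments.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.columns.ColumnKind

// Hypothetical helper: maps every top-level column name to its kind.
fun kindsOf(df: AnyFrame): Map<String, ColumnKind> =
    df.columns().associate { it.name() to it.kind() }

fun main() {
    // Hypothetical sample data.
    val df = dataFrameOf("firstName", "lastName", "city", "age")(
        "Alice", "Cooper", "London", 15,
        "Bob", "Dylan", "Dubai", 45,
        "Charlie", "Daniels", "London", 20,
    )

    // Moving two columns under "name" should produce a ColumnGroup;
    // "city" and "age" stay ValueColumns.
    val nested = df.group("firstName", "lastName").into("name")
    println(kindsOf(nested))

    // Converting a GroupBy into a dataframe should keep the rows of
    // each group as a nested dataframe inside a FrameColumn.
    val byCity = df.groupBy("city").toDataFrame()
    println(kindsOf(byCity))
}
```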

Column accessors

ColumnAccessors are used for typed data access in DataFrame. A ColumnAccessor stores the column name (for top-level columns) or the column path (for nested columns) and has a type argument that corresponds to the type of the column, but it doesn't contain any actual data.

val age by column<Int>()

// Access fourth cell in the "age" column of dataframe `df`.
// This expression returns `Int` because variable `age` has `ColumnAccessor<Int>` type.
// If dataframe `df` has no column "age" or column "age" has type which is incompatible with `Int`,
// runtime exception will be thrown.
df[age][3] + 5

// Access first cell in the "age" column of dataframe `df`.
df[0][age] * 2

// Returns new dataframe sorted by age column (ascending)
df.sortBy(age)

// Returns new dataframe with the column "year of birth" added
df.add("year of birth") { 2021 - age }

// Returns new dataframe containing only rows with age > 30
df.filter { age > 30 }

Column accessors are created by the property delegate column. The column type should be passed as a type argument; the column name is taken from the variable name.

val name by column<String>()

To assign column name explicitly, pass it as an argument.

val accessor by column<String>("complex column name")

You can also create column accessors for ColumnGroups and FrameColumns:

val columns by columnGroup()
val frames by frameColumn()

To reference nested columns inside ColumnGroups, invoke column<T>() on accessor to parent ColumnGroup:

val name by columnGroup()
val firstName by name.column<String>()

You can also create a virtual accessor that doesn't point to a real column but computes some expression on every data access:

// Using generated extension properties
val fullName by column(df) { name.firstName + " " + name.lastName }
df[fullName]

// Using column accessors
val name by columnGroup()
val firstName by name.column<String>()
val lastName by name.column<String>()
val fullName by column { firstName() + " " + lastName() }
df[fullName]

// Using column names as strings
val fullName by column { "name"["firstName"]<String>() + " " + "name"["lastName"]<String>() }
df[fullName]

If the expression depends on only one column, you can also use map:

val age by column<Int>()
val year by age.map { 2021 - it }
df.filter { year > 2000 }

To convert a ColumnAccessor into a DataColumn, add values using the withValues function:

val age by column<Int>()
val ageCol1 = age.withValues(15, 20)
val ageCol2 = age.withValues(1..10)

Last modified: 18 July 2024