DataColumn
DataColumn represents a column of values. It can store objects of primitive or reference types, or other DataFrame objects.
Properties
name: String - name of the column; should be unique within the containing dataframe
path: ColumnPath - path to the column; depends on how the column was retrieved from the dataframe
type: KType - type of the elements in the column
hasNulls: Boolean - flag indicating whether the column contains null values
values: Iterable<T> - column data
size: Int - number of elements in the column
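As a minimal sketch of reading these properties, the snippet below builds a small column and queries it. It assumes the `org.jetbrains.kotlinx:dataframe` library is on the classpath and uses the function forms of the accessors (`name()`, `size()`, and so on); the column name "age" and its values are made up for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample column containing one null value.
fun sampleAgeColumn(): DataColumn<Int?> = listOf(15, null, 22).toColumn("age")

fun main() {
    val age = sampleAgeColumn()
    println(age.name())     // column name
    println(age.path())     // column path
    println(age.type())     // element type as KType
    println(age.hasNulls()) // whether the column contains nulls
    println(age.values())   // column data
    println(age.size())     // number of elements
}
```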
Column kinds
DataColumn instances can be one of three subtypes: ValueColumn, ColumnGroup or FrameColumn.
ValueColumn
Represents a sequence of values.
It can store values of primitive (integers, strings, decimals, etc.) or reference types. Currently, it uses List as the underlying data storage.
ColumnGroup
A container for nested columns, used to create a column hierarchy.
FrameColumn
A special case of ValueColumn that stores other DataFrame objects as elements.
DataFrame objects stored in a FrameColumn may have different schemas.
A FrameColumn may appear after reading from JSON or other hierarchical data structures, or after grouping operations such as groupBy or pivot.
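The three kinds can be observed on a small dataframe. The sketch below is an illustration only: it assumes the `org.jetbrains.kotlinx:dataframe` library is available, and the column names and rows are hypothetical. Grouping columns with `group ... into` yields a ColumnGroup, and materializing a `groupBy` yields a FrameColumn.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.columns.ColumnKind

// Hypothetical sample data.
fun people() = dataFrameOf("firstName", "lastName", "city")(
    "Alice", "Cooper", "London",
    "Bob", "Dylan", "London"
)

// Kinds after nesting firstName and lastName under a "name" ColumnGroup.
fun nestedKinds(): List<ColumnKind> =
    people().group("firstName", "lastName").into("name").columns().map { it.kind() }

// Kinds after grouping by city and converting the result to a dataframe.
fun groupedKinds(): List<ColumnKind> =
    people().groupBy("city").toDataFrame().columns().map { it.kind() }

fun main() {
    println(nestedKinds())
    println(groupedKinds())
}
```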
Column accessors
ColumnAccessors are used for typed data access in DataFrame. A ColumnAccessor stores the column name (for top-level columns) or the column path (for nested columns) and has a type argument that corresponds to the type of the column, but it doesn't contain any actual data.
Column accessors are created by the property delegate column. The column type should be passed as a type argument; the column name will be taken from the variable name.
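A minimal sketch of the delegate form, assuming a version of the `org.jetbrains.kotlinx:dataframe` library in which the column accessors API is available. The sample dataframe is hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// The accessor takes its column name ("name") from the property name.
val name by column<String>()

// Hypothetical sample data, used only for illustration.
fun sampleFrame() = dataFrameOf("name", "age")("Alice", 15, "Bob", 20)

fun main() {
    val df = sampleFrame()
    println(name.name()) // the name stored in the accessor
    println(df[name])    // typed access: the result is a DataColumn<String>
}
```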
To assign the column name explicitly, pass it as an argument.
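For instance, an accessor for a column whose name is not a valid identifier might be declared as follows. This is a sketch under the same library assumption; the column name is an arbitrary example.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// The column name is passed explicitly, so no property delegate is needed.
val accessor = column<String>("complex column name")

fun main() {
    println(accessor.name())
}
```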
You can also create column accessors for ColumnGroups and FrameColumns.
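A short sketch of such accessors, assuming the `columnGroup` and `frameColumn` delegates of the `org.jetbrains.kotlinx:dataframe` accessors API. The property names are illustrative.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

val columns by columnGroup() // accessor for a ColumnGroup named "columns"
val frames by frameColumn()  // accessor for a FrameColumn named "frames"

fun main() {
    println(columns.name())
    println(frames.name())
}
```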
To reference nested columns inside a ColumnGroup, invoke column<T>() on the accessor of the parent ColumnGroup.
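The nested case might look like the sketch below, which assumes the same accessors API. The group name "name", the nested column "firstName", and the sample rows are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Accessor for the parent ColumnGroup.
val name by columnGroup()

// Accessor for a nested column; its path is built from the parent's path.
val firstName = name.column<String>("firstName")

fun main() {
    // Hypothetical data: two flat columns nested under a "name" group.
    val df = dataFrameOf("firstName", "lastName")("Alice", "Cooper", "Bob", "Dylan")
        .group("firstName", "lastName").into("name")
    println(firstName.path())
    println(df[firstName])
}
```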
You can also create a virtual accessor that doesn't point to a real column but computes some expression on every data access.
If the expression depends on only one column, you can also use map.
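As a sketch of the map form, the accessor below derives a value from a single "age" column each time it is read. It assumes the accessors API of the `org.jetbrains.kotlinx:dataframe` library; the reference year 2021 and the sample rows are arbitrary choices for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

val age by column<Int>()

// Derived accessor: no column is stored, the value is computed from "age" on access.
val year = age.map { 2021 - it }

// Hypothetical sample data; keeps the rows whose computed year is after 2000.
fun bornAfter2000() =
    dataFrameOf("name", "age")("Alice", 15, "Bob", 45).filter { year() > 2000 }

fun main() {
    println(bornAfter2000())
}
```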
To convert a ColumnAccessor into a DataColumn, add values using the withValues function.
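A minimal sketch of the conversion, again assuming the accessors API of the `org.jetbrains.kotlinx:dataframe` library. The values are arbitrary; the resulting columns keep the accessor's name.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

val age by column<Int>()

// Values passed one by one.
val ageCol1 = age.withValues(15, 20)

// Values passed as an Iterable (here an IntRange).
val ageCol2 = age.withValues(1..10)

fun main() {
    println(ageCol1)
    println(ageCol2.size())
}
```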