split
This operation splits every value in the given columns into several values and optionally spreads them horizontally or vertically.
df.split { columns }
[.cast<Type>()]
[.by(delimiters|regex [,trim=true][,ignoreCase=true][,limit=0]) | .by { splitter } | .match(regex)] // how to split cell value
[.default(value)] // how to fill nulls
.into(columnNames) [ { columnNamesGenerator } ] | .inward(columnNames) [ { columnNamesGenerator } ] | .inplace() | .intoRows() | .intoColumns() // where to store results
splitter = DataRow.(T) -> Iterable<Any>
columnNamesGenerator = DataColumn.(columnIndex: Int) -> String
The following types of columns can be split easily:
String: for instance, by ","
List: splits into elements, no by required!
DataFrame: splits into rows, no by required!
See column selectors for how to select the columns for this operation.
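As a rough illustration of the column types listed above, the following sketch splits a String column by a delimiter and a List column without any by call. The sample data and the column names (id, csv, tags, first, second) are invented for illustration, and the snippet assumes the Kotlin DataFrame library is on the classpath and that the code runs as a script or notebook cell.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: one comma-separated String column and one List column.
val sample = dataFrameOf("id", "csv", "tags")(
    1, "a,b", listOf("x", "y"),
    2, "c,d", listOf("z"),
)

// String column: a delimiter tells split how to break each cell apart.
val byComma = sample.split { "csv"<String>() }.by(",").into("first", "second")

// List column: the elements are already separate, so no `by` step is used.
val tagRows = sample.split { "tags"<List<String>>() }.intoRows()
```

Here byComma should carry the two new columns in place of csv, while tagRows should contain one row per list element, with id repeated.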
Splitting in place stores split values as lists in their original columns.
Use the .inplace() terminal operation in your split configuration to spread split values in place:
df.split { name.firstName }.by { it.asIterable() }.inplace()
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.inplace()
Splitting horizontally stores split values in new columns.
into(col1, col2, ...): stores split values in new top-level columns
inward(col1, col2, ...): stores split values in new columns nested inside the original column
intoColumns: splits FrameColumns into ColumnGroups, storing in every cell a List of the original values per column
Reverse operation: merge
columnNamesGenerator is used to generate names for additional columns when the list of explicitly specified columnNames is not long enough. columnIndex starts with 1 for the first additional column name.
The default columnNamesGenerator generates column names like split1, split2, etc.
Some examples:
df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")
df.split { "name"["lastName"]<String>() }.by { it.asIterable() }.into("char1", "char2")
df.split { name.lastName }
.by { it.asIterable() }.default(' ')
.inward { "char$it" }
df.split { "name"["lastName"]<String>() }
.by { it.asIterable() }.default(' ')
.inward { "char$it" }
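To make the interaction between explicit column names and the columnNamesGenerator more concrete, here is a small sketch that names only the first column explicitly and leaves the rest to a generator. The data frame letters, the column word, and the prefix extra are hypothetical; the snippet assumes the Kotlin DataFrame library is available and is run as a script or notebook cell.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: words of different lengths.
val letters = dataFrameOf("word")("abc", "de")

// Only one explicit name is given; the lambda names any further columns.
// `it` is the index of the additional column, starting at 1.
val spread = letters.split { "word"<String>() }
    .by { it.asIterable() }
    .into("first") { "extra$it" }
```

The longest word has three characters, so three columns are expected in total; the shorter word leaves its last cell empty unless a default value is supplied.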
String columns can also be split into group matches of Regex patterns:
val name by column<String>()
merged.split { name }
.match("""(.*) \((.*)\)""")
.inward("firstName", "lastName")
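The regex example above can be tried on concrete data. The sketch below is one way to do that: mergedSample and its two rows are made up for illustration, and the column is addressed by its string name. It assumes the Kotlin DataFrame library is on the classpath and that the code runs as a script or notebook cell.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical data in the "first (last)" shape that the pattern expects.
val mergedSample = dataFrameOf("name")("Alice (Cooper)", "Bob (Dylan)")

// Each capture group of the pattern becomes one nested column.
val parts = mergedSample.split { "name"<String>() }
    .match("""(.*) \((.*)\)""")
    .inward("firstName", "lastName")
```

Because inward is used, the two captured values should end up as firstName and lastName nested under the original name column rather than as top-level columns.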
FrameColumn can be split into columns:
val df1 = dataFrameOf("a", "b", "c")(
1, 2, 3,
4, 5, 6,
)
val df2 = dataFrameOf("a", "b")(
5, 6,
7, 8,
9, 10,
)
val group by columnOf(df1, df2)
val id by columnOf("x", "y")
val df = dataFrameOf(id, group)
df.split { group }.intoColumns()
Splitting vertically stores split values in new rows, duplicating values in other columns.
Reverse operation: implode
Use the .intoRows() terminal operation in your split configuration to spread split values vertically:
df.split { name.firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.intoRows()
df.split { colGroup("name") }.by { it.values() }.intoRows()
Equivalent to split { column }...inplace().explode { column }. See explode for details.
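The equivalence stated above can be checked on a small example. The following is a minimal sketch that assumes the Kotlin DataFrame library is on the classpath and that the code runs as a script or notebook cell; the data frame people and its contents are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: two short names, two characters each.
val people = dataFrameOf("id", "name")(
    1, "Al",
    2, "Bo",
)

// Direct vertical split: one row per character.
val viaIntoRows = people.split { "name"<String>() }
    .by { it.asIterable() }
    .intoRows()

// Two-step variant: keep lists in place, then explode them into rows.
val viaExplode = people.split { "name"<String>() }
    .by { it.asIterable() }
    .inplace()
    .explode("name")
```

Both variants should yield four rows, with id duplicated for every character of the corresponding name.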