split
This operation splits every value in the given columns into several values and optionally spreads them horizontally or vertically.
df.split { columns }
    [.cast<Type>()]
    [.by(delimiters|regex [,trim=true][,ignoreCase=true][,limit=0]) | .by { splitter } | .match(regex)] // how to split cell value
    [.default(value)] // how to fill nulls
    .into(columnNames) [ { columnNamesGenerator } ] | .inward(columnNames) [ { columnNamesGenerator } ] | .inplace() | .intoRows() | .intoColumns() // where to store results

splitter = DataRow.(T) -> Iterable<Any>
columnNamesGenerator = DataColumn.(columnIndex: Int) -> String

The following types of columns can be split easily:
String: for instance, by ","
List: splits into elements, no by required!
DataFrame: splits into rows, no by required!
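As a minimal sketch of the String case, the snippet below splits a comma-separated column into two new columns. The sample data and the names id, tags, tag1 and tag2 are made up for illustration; the calls follow the syntax shown above.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: "tags" holds two comma-separated values per row.
fun splitTags(): DataFrame<*> {
    val df = dataFrameOf("id", "tags")(
        1, "red,green",
        2, "blue,black",
    )
    // Split each String cell by "," and store the parts in two new top-level columns.
    return df.split { "tags"<String>() }.by(",").into("tag1", "tag2")
}

fun main() {
    splitTags().print()
}
```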
Related operations: Split / merge columns
See column selectors for how to select the columns for this operation.
Splitting in place stores split values as lists in their original columns.
Use the .inplace() terminal operation in your split configuration to spread split values in place:
df.split { name.firstName }.by { it.asIterable() }.inplace()
// The same call using String API column access:
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.inplace()

Splitting horizontally stores split values in new columns.
into(col1, col2, ...): stores split values in new top-level columns
inward(col1, col2, ...): stores split values in new columns nested inside the original column
intoColumns: splits FrameColumns into ColumnGroups, storing in every cell a List of the original values per column
Reverse operation: merge
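To make the difference between into and inward concrete, here is a small sketch on made-up data. The column names id, fullName, firstName and lastName are illustrative assumptions, not part of the API.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data with one String column to split.
fun people(): DataFrame<*> = dataFrameOf("id", "fullName")(
    1, "Alice Cooper",
    2, "Bob Dylan",
)

// into: the parts become new top-level columns.
fun splitInto(): DataFrame<*> =
    people().split { "fullName"<String>() }.by(" ").into("firstName", "lastName")

// inward: the parts become columns nested inside the original "fullName" column.
fun splitInward(): DataFrame<*> =
    people().split { "fullName"<String>() }.by(" ").inward("firstName", "lastName")

fun main() {
    splitInto().print()
    splitInward().print()
}
```

Printing both results side by side should show firstName and lastName at the top level in the first frame and grouped under fullName in the second.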
columnNamesGenerator is used to generate names for additional columns when the list of explicitly specified columnNames is not long enough. columnIndex starts with 1 for the first additional column name.
The default columnNamesGenerator generates column names like split1, split2, etc.
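The sketch below shows one way a custom columnNamesGenerator might be combined with an explicit name. The data and the names word, first and extra are assumptions for illustration. Going by the description above, the longest value ("abc") needs two additional columns, so the generator would be asked for names with columnIndex 1 and 2.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: words of different lengths.
fun splitLetters(): DataFrame<*> {
    val df = dataFrameOf("word")("abc", "de")
    return df.split { "word"<String>() }
        .by { it.asIterable() }        // split each String into its characters
        .default(' ')                  // fill cells that have no character
        .into("first") { "extra$it" }  // one explicit name, the rest are generated
}

fun main() {
    // Inspect the resulting column names and values.
    splitLetters().print()
}
```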
Some examples:
df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")
// The same call using String API column access:
df.split { "name"["lastName"]<String>() }.by { it.asIterable() }.into("char1", "char2")

df.split { name.lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }
// The same call using String API column access:
df.split { "name"["lastName"]<String>() }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

String columns can also be split into group matches of Regex patterns:
merged.split { "name"<String>() }
    .match("""(.*) \((.*)\)""")
    .inward("firstName", "lastName")

FrameColumn can be split into columns:
val df1 = dataFrameOf("a", "b", "c")(
    1, 2, 3,
    4, 5, 6,
)
val df2 = dataFrameOf("a", "b")(
    5, 6,
    7, 8,
    9, 10,
)
val df = dataFrameOf(
    "id" to columnOf("x", "y"),
    "group" to columnOf(df1, df2)
)
df.split { "group"<AnyFrame>() }.intoColumns()

Splitting vertically stores split values in new rows, duplicating values in other columns.
Reverse operation: implode
Use the .intoRows() terminal operation in your split configuration to spread split values vertically:
df.split { name.firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()
// The same calls using String API column access:
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.intoRows()
df.split { colGroup("name") }.by { it.values() }.intoRows()

This is equivalent to split { column }...inplace().explode { column }. See explode for details.
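The relationship between intoRows and explode can be tried out on a small frame. This is a sketch on made-up data (the names id and tags are assumptions); it builds the result both ways so they can be compared by eye.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: a different number of comma-separated tags per row.
fun tagged(): DataFrame<*> = dataFrameOf("id", "tags")(
    1, "red,green",
    2, "blue",
)

// One step: split and spread the parts vertically.
fun direct(): DataFrame<*> =
    tagged().split { "tags"<String>() }.by(",").intoRows()

// Two steps: split in place into lists, then explode the list column into rows.
fun twoStep(): DataFrame<*> =
    tagged().split { "tags"<String>() }.by(",").inplace()
        .explode { "tags"<List<String>>() }

fun main() {
    direct().print()
    twoStep().print()
}
```

In both variants the id value of a row is repeated for every tag that the row produced.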