split
This operation splits every value in the given columns into several values, and optionally spreads them horizontally or vertically.
df.split { columns }
    [.cast<Type>()]
    [.by(delimiters|regex [,trim=true][,ignoreCase=true][,limit=0]) | .by { splitter } | .match(regex)] // how to split cell value
    [.default(value)] // how to fill nulls
    .into(columnNames) [ { columnNamesGenerator } ] | .inward(columnNames) [ { columnNamesGenerator } ] | .inplace() | .intoRows() | .intoColumns() // where to store results

splitter = DataRow.(T) -> Iterable<Any>
columnNamesGenerator = DataColumn.(columnIndex: Int) -> String
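As a minimal sketch of how the pieces of this syntax fit together, the snippet below splits a String column by a delimiter and stores the parts in new top-level columns. It assumes the Kotlin DataFrame library is on the classpath; the sample data, the column names, and the helper function splitFullName are hypothetical and exist only for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: column names and values are invented for illustration.
fun splitFullName(): DataFrame<*> {
    val df = dataFrameOf("id", "fullName")(
        1, "Alice Cooper",
        2, "Bob Dylan",
    )
    // .by(" ") splits each cell on a space;
    // .into(...) stores the resulting parts in new top-level columns.
    return df.split { "fullName"<String>() }
        .by(" ")
        .into("firstName", "lastName")
}

fun main() {
    println(splitFullName())
}
```

The same chain can end with .inward(...), .inplace(), or .intoRows() instead of .into(...), depending on where the split values should be stored.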
The following types of columns can be split without any splitter configuration:
String: split by "," and trim
List: split into elements
DataFrame: split into rows
Split in place
Stores split values as lists in their original columns.
Use the .inplace() terminal operation in your split configuration to spread split values in place:
// Extension properties API
df.split { name.firstName }.by { it.asIterable() }.inplace()

// Column accessors API
val name by columnGroup()
val firstName by name.column<String>()
df.split { firstName }.by { it.asIterable() }.inplace()

// String API
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.inplace()
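For a self-contained illustration of in-place splitting, the following sketch builds a small frame and splits a comma-separated String column so that each cell becomes a list. It assumes the Kotlin DataFrame library is available; the "tags" column, its values, and the helper splitTagsInPlace are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: a comma-separated "tags" column invented for illustration.
fun splitTagsInPlace(): DataFrame<*> {
    val df = dataFrameOf("id", "tags")(
        1, "kotlin, jvm",
        2, "dataframe",
    )
    // Each "tags" cell is split on "," (parts are trimmed by default)
    // and the resulting list is stored back in the same column.
    return df.split { "tags"<String>() }.by(",").inplace()
}

fun main() {
    println(splitTagsInPlace())
}
```

The number of rows and columns stays the same; only the cell type of the split column changes from String to a list of strings.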
Split horizontally
Stores split values in new columns.
into(col1, col2, ...): stores split values in new top-level columns
inward(col1, col2, ...): stores split values in new columns nested inside the original column
intoColumns: splits a FrameColumn into a ColumnGroup, storing in every cell a List of the original values per column
Reverse operation: merge
columnNamesGenerator is used to generate names for additional columns when the list of explicitly specified columnNames is not long enough. columnIndex starts with 1 for the first additional column name.
The default columnNamesGenerator generates column names like split1, split2, etc.
Some examples:
// Extension properties API
df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")

// Column accessors API
val name by columnGroup()
val lastName by name.column<String>()
df.split { lastName }.by { it.asIterable() }.into("char1", "char2")

// String API
df.split { "name"["lastName"]<String>() }.by { it.asIterable() }.into("char1", "char2")
// Extension properties API
df.split { name.lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

// Column accessors API
val name by columnGroup()
val lastName by name.column<String>()
df.split { lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

// String API
df.split { "name"["lastName"]<String>() }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }
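The sketch below illustrates how explicitly named columns and a columnNamesGenerator can be combined when rows split into different numbers of parts. It assumes the Kotlin DataFrame library is available; the "path" column, its values, the "level" prefix, and the helper splitPath are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: slash-separated paths of different depth.
fun splitPath(): DataFrame<*> {
    val df = dataFrameOf("path")(
        "a/b/c",
        "d/e",
    )
    // Only the first part gets an explicit name ("root").
    // Names for the remaining parts come from the generator lambda,
    // whose argument is the index of the additional column.
    return df.split { "path"<String>() }
        .by("/")
        .into("root") { "level$it" }
}

fun main() {
    println(splitPath())
}
```

Since the longest row yields three parts, the result should contain three columns; rows with fewer parts are left without a value in the trailing columns unless .default(value) is specified.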
String columns can also be split into group matches of Regex patterns:
val name by column<String>()
merged.split { name }
    .match("""(.*) \((.*)\)""")
    .inward("firstName", "lastName")
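A self-contained variant of the regex example might look like the sketch below. It assumes the Kotlin DataFrame library is available and uses a hypothetical frame in place of merged; the sample names and the helper splitByRegex are invented for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical stand-in for the `merged` frame: names in the form "first (last)".
fun splitByRegex(): DataFrame<*> {
    val df = dataFrameOf("name")(
        "Alice (Cooper)",
        "Bob (Dylan)",
    )
    // Each capture group of the pattern becomes one split value;
    // .inward(...) nests the new columns inside the original "name" column.
    return df.split { "name"<String>() }
        .match("""(.*) \((.*)\)""")
        .inward("firstName", "lastName")
}

fun main() {
    println(splitByRegex())
}
```

Here the first capture group matches the text before the parentheses and the second matches the text inside them.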
FrameColumn can be split into columns:
val df1 = dataFrameOf("a", "b", "c")(
    1, 2, 3,
    4, 5, 6,
)
val df2 = dataFrameOf("a", "b")(
    5, 6,
    7, 8,
    9, 10,
)
val group by columnOf(df1, df2)
val id by columnOf("x", "y")
val df = dataFrameOf(id, group)
df.split { group }.intoColumns()
Split vertically
Stores split values in new rows, duplicating values in other columns.
Reverse operation: implode
Use the .intoRows() terminal operation in your split configuration to spread split values vertically:
// Extension properties API
df.split { name.firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()

// Column accessors API
val name by columnGroup()
val firstName by name.column<String>()
df.split { firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()

// String API
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.intoRows()
df.split { colGroup("name") }.by { it.values() }.intoRows()
This is equivalent to split { column }...inplace().explode { column }. See explode for details.
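To make the row duplication concrete, the following sketch splits a comma-separated column vertically. It assumes the Kotlin DataFrame library is available; the sample data and the helper splitTagsIntoRows are hypothetical.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: a comma-separated "tags" column invented for illustration.
fun splitTagsIntoRows(): DataFrame<*> {
    val df = dataFrameOf("id", "tags")(
        1, "kotlin, jvm",
        2, "dataframe",
    )
    // Every split value gets its own row; the "id" value of the
    // source row is repeated for each of them.
    return df.split { "tags"<String>() }.by(",").intoRows()
}

fun main() {
    println(splitTagsIntoRows())
}
```

The first source row produces two output rows that share id 1, and the second produces a single row, so the result has three rows in total.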
Last modified: 09 December 2024