Dataframe 1.0 Help

split

This operation splits every value in the given columns into several values and optionally spreads them horizontally or vertically.

df.split { columns }
    [.cast<Type>()]
    [.by(delimiters|regex [,trim=true][,ignoreCase=true][,limit=0]) | .by { splitter } | .match(regex)] // how to split cell value
    [.default(value)] // how to fill nulls
    .into(columnNames) [ { columnNamesGenerator } ] | .inward(columnNames) [ { columnNamesGenerator } | .inplace() | .intoRows() | .intoColumns() ] // where to store results

splitter = DataRow.(T) -> Iterable<Any>
columnNamesGenerator = DataColumn.(columnIndex: Int) -> String

The following types of columns can be split easily:

  • String: for instance, by ","

  • List: splits into elements, no .by call required

  • DataFrame: splits into rows, no .by call required

See column selectors for how to select the columns for this operation.
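As a rough plain-Kotlin model of what splitting a String cell by a delimiter does, the sketch below uses only the standard library. The helper name splitCell and its default values are hypothetical, and it assumes the delimiter form of by behaves like String.split followed by optional trimming. It is not the library's implementation.

```kotlin
// Hypothetical helper: models splitting one String cell by delimiters.
// Assumption: mirrors stdlib String.split, then trims each part if requested.
// The default values below are choices made for this sketch only.
fun splitCell(
    value: String,
    vararg delimiters: String,
    trim: Boolean = true,
    ignoreCase: Boolean = false,
    limit: Int = 0, // 0 means "no limit" in String.split
): List<String> {
    val parts = value.split(*delimiters, ignoreCase = ignoreCase, limit = limit)
    return if (trim) parts.map { it.trim() } else parts
}

val allParts = splitCell("a, b ,c", ",")            // three parts: "a", "b", "c"
val firstTwo = splitCell("a, b ,c", ",", limit = 2) // two parts: "a", "b ,c"
```

With limit = 2 the remainder of the string stays in the last part, which is why the second result keeps the inner comma.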

Split in place

Stores split values as lists in their original columns.

Use the .inplace() terminal operation in your split configuration to spread split values in place:

df.split { name.firstName }.by { it.asIterable() }.inplace()
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.inplace()

Split horizontally

Stores split values in new columns.

  • into(col1, col2, ... ) — stores split values in new top-level columns

  • inward(col1, col2, ...) — stores split values in new columns nested inside the original column

  • intoColumns — splits a FrameColumn into a ColumnGroup; each cell of the resulting columns holds a List of the original values for that column

Reverse operation: merge

columnNamesGenerator generates names for additional columns when the explicitly specified columnNames list is too short. columnIndex starts at 1 for the first additional column.

The default columnNamesGenerator generates column names like split1, split2, etc.
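The naming rule above can be modeled in plain Kotlin. This is only an illustrative sketch: resolveColumnNames is a hypothetical helper, and the generator here is a simple (Int) -> String without the DataColumn receiver that the real columnNamesGenerator has.

```kotlin
// Hypothetical helper: resolves names for `count` result columns.
// Explicit names are used first; the generator names the remaining columns
// and receives 1 for the first additional column.
fun resolveColumnNames(
    explicit: List<String>,
    count: Int,
    generator: (Int) -> String = { "split$it" }, // models the default naming
): List<String> =
    List(count) { i ->
        if (i < explicit.size) explicit[i] else generator(i - explicit.size + 1)
    }

val names = resolveColumnNames(listOf("char1", "char2"), count = 4)
// names == [char1, char2, split1, split2]
```

Passing a custom generator such as { "char$it" } with no explicit names would yield char1, char2, and so on, which matches the inward { "char$it" } usage in the examples.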

Some examples:

df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")
df.split { "name"["lastName"]<String>() }.by { it.asIterable() }.into("char1", "char2")
df.split { name.lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

df.split { "name"["lastName"]<String>() }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }
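To illustrate what default(' ') contributes in the examples above, here is a plain-Kotlin sketch of the padding idea. It assumes that rows with fewer split values than the widest row would otherwise get nulls in the extra columns and that the default value fills those positions. padRows is a hypothetical helper, not part of the library.

```kotlin
// Hypothetical helper: pads every split row to the width of the widest row,
// using `default` where a row has no value for a column.
fun <T> padRows(rows: List<List<T>>, default: T): List<List<T>> {
    val width = rows.maxOfOrNull { it.size } ?: 0
    return rows.map { row -> row + List(width - row.size) { default } }
}

val padded = padRows(listOf("Cooper".toList(), "Li".toList()), default = ' ')
// padded[0] is unchanged (6 chars); padded[1] is 'L', 'i' followed by four ' '
```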

String columns can also be split into group matches of Regex patterns:

val name by column<String>()

merged.split { name }
    .match("""(.*) \((.*)\)""")
    .inward("firstName", "lastName")
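The group extraction in this pattern can be tried with the Kotlin standard library alone. The sketch below assumes that the pattern has to match the whole cell value and that each capture group becomes one split value. matchGroups is a hypothetical helper and the sample input is made up for illustration.

```kotlin
// Hypothetical helper: returns the capture groups of a whole-value match,
// or null when the value does not match the pattern.
// Assumption: the pattern must match the entire cell value.
fun matchGroups(value: String, pattern: String): List<String>? =
    Regex(pattern).matchEntire(value)?.groupValues?.drop(1) // drop(1) skips the full match

val parts = matchGroups("Alice (Cooper)", """(.*) \((.*)\)""")
// parts == [Alice, Cooper]
```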

FrameColumn can be split into columns:

val df1 = dataFrameOf("a", "b", "c")(
    1, 2, 3,
    4, 5, 6,
)
val df2 = dataFrameOf("a", "b")(
    5, 6,
    7, 8,
    9, 10,
)
val group by columnOf(df1, df2)
val id by columnOf("x", "y")
val df = dataFrameOf(id, group)

df.split { group }.intoColumns()

Split vertically

Stores split values in new rows, duplicating values in other columns.

Reverse operation: implode

Use the .intoRows() terminal operation in your split configuration to spread split values vertically:

df.split { name.firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()

df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.intoRows()
df.split { colGroup("name") }.by { it.values() }.intoRows()

Equivalent to split { column }...inplace().explode { column }. See explode for details.
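The two-step view of a vertical split (first split in place, then explode) can be sketched in plain Kotlin. The Person type, the sample data, and splitIntoRows are hypothetical and only model the row duplication described in this section. They do not use the DataFrame API.

```kotlin
// Hypothetical row type for this sketch.
data class Person(val firstName: String, val age: Int)

// Step 1 (like inplace()): each firstName becomes a list of chars.
// Step 2 (like explode): one output row per list element,
// with the other value (age) duplicated.
fun splitIntoRows(rows: List<Person>): List<Pair<Char, Int>> =
    rows
        .map { it.firstName.toList() to it.age }
        .flatMap { (chars, age) -> chars.map { it to age } }

val exploded = splitIntoRows(listOf(Person("Al", 15), Person("Bo", 45)))
// exploded == [(A, 15), (l, 15), (B, 45), (o, 45)]
```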

16 June 2025