Dataframe 0.15 Help

split

This operation splits every value in the given columns into several values, and optionally spreads them horizontally or vertically.

df.split { columns }
    [.cast<Type>()]
    [.by(delimiters|regex [,trim=true][,ignoreCase=true][,limit=0]) | .by { splitter } | .match(regex)] // how to split cell value
    [.default(value)] // how to fill nulls
    .into(columnNames) [ { columnNamesGenerator } ] | .inward(columnNames) [ { columnNamesGenerator } ] | .inplace() | .intoRows() | .intoColumns() // where to store results

splitter = DataRow.(T) -> Iterable<Any>
columnNamesGenerator = DataColumn.(columnIndex: Int) -> String
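The delimiter form .by(delimiters, trim, ignoreCase, limit) can be pictured in plain Kotlin. This is a minimal sketch and not the library's implementation. splitCell is a hypothetical helper. It assumes the delimiter split behaves like the standard library's String.split(vararg delimiters, ignoreCase, limit) and that trimming is applied afterwards when trim is true. The sketch defaults ignoreCase to false, as String.split does.

```kotlin
// Hypothetical helper modeling .by(delimiters, trim, ignoreCase, limit)
// on top of kotlin.text.split; not the DataFrame library's own code.
fun splitCell(
    value: String,
    vararg delimiters: String,
    trim: Boolean = true,
    ignoreCase: Boolean = false,
    limit: Int = 0,
): List<String> {
    // limit = 0 means "no limit", exactly as in String.split.
    val parts = value.split(*delimiters, ignoreCase = ignoreCase, limit = limit)
    // Optionally strip surrounding whitespace from every part.
    return if (trim) parts.map { it.trim() } else parts
}
```

For instance, splitCell("a;b;c", ";", limit = 2) keeps the remainder together as "b;c".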

The following types of columns can be split without any splitter configuration:

  • String: split by ',' and trim each part

  • List: split into elements

  • DataFrame: split into rows
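The defaults above can be modeled in plain Kotlin as a when over the cell type. This is an illustration only. defaultSplit is hypothetical, and a nested DataFrame is represented by a list of rows. Throwing for any other type is the sketch's own way of marking cells that would need an explicit splitter.

```kotlin
// Hypothetical model of splitting without any splitter configuration.
// A nested DataFrame is represented as a List of rows, so it takes the
// List branch and is split into its rows.
fun defaultSplit(value: Any?): List<Any?> = when (value) {
    is String -> value.split(",").map { it.trim() } // split by ',' and trim
    is List<*> -> value.toList()                    // split into elements
    else -> throw IllegalArgumentException("no default splitter for $value")
}
```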

Split in place

Stores split values as lists in their original columns.

Use the .inplace() terminal operation in your split configuration to spread split values in place:

// extension properties API
df.split { name.firstName }.by { it.asIterable() }.inplace()

// column accessors API
val name by columnGroup()
val firstName by name.column<String>()

df.split { firstName }.by { it.asIterable() }.inplace()

// String API
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.inplace()
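Here is a plain-Kotlin sketch of the in-place behavior. It assumes a data frame can be modeled as an ordered map from column name to values. splitColumnInPlace is a hypothetical function. It keeps every column's name and position and replaces the cells of one column with the lists returned by the splitter.

```kotlin
// Hypothetical model: a data frame as an ordered map from column name to values.
fun splitColumnInPlace(
    frame: Map<String, List<Any?>>,
    column: String,
    splitter: (Any?) -> Iterable<Any?>,
): Map<String, List<Any?>> =
    frame.mapValues { (name, values) ->
        // Only the target column changes; each of its cells becomes a list of parts.
        if (name == column) values.map { splitter(it).toList() } else values
    }
```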

Split horizontally

Stores split values in new columns.

  • into(col1, col2, ...) — stores split values in new top-level columns

  • inward(col1, col2, ...) — stores split values in new columns nested inside the original column

  • intoColumns() — turns a FrameColumn into a ColumnGroup with one nested column per column of the nested frames; each cell holds a List of that column's values from the corresponding frame

Reverse operation: merge

columnNamesGenerator generates names for the additional columns when the explicitly specified columnNames list is too short to hold all split values. columnIndex starts at 1 for the first additional column.

The default columnNamesGenerator generates column names like split1, split2, etc.
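The naming rule can be written as a small function. resolveColumnNames is a hypothetical helper. Its generator takes only the index, while the real columnNamesGenerator also has a DataColumn receiver.

```kotlin
// Hypothetical helper: explicit names are used first, and the generator is
// called for each additional column with an index that starts at 1.
fun resolveColumnNames(
    explicitNames: List<String>,
    columnCount: Int,
    generator: (Int) -> String = { "split$it" }, // mirrors the default split1, split2, ...
): List<String> = List(columnCount) { i ->
    if (i < explicitNames.size) explicitNames[i] else generator(i - explicitNames.size + 1)
}
```

Under this rule, into("char1", "char2") applied to values that split into four parts would name the columns char1, char2, split1 and split2.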

Some examples:

// extension properties API
df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")

// column accessors API
val name by columnGroup()
val lastName by name.column<String>()

df.split { lastName }.by { it.asIterable() }.into("char1", "char2")

// String API
df.split { "name"["lastName"]<String>() }.by { it.asIterable() }.into("char1", "char2")
// extension properties API
df.split { name.lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

// column accessors API
val name by columnGroup()
val lastName by name.column<String>()

df.split { lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

// String API
df.split { "name"["lastName"]<String>() }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }
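A horizontal split with .default(value) can be sketched in plain Kotlin. The sketch assumes the number of new columns equals the length of the longest split result. splitIntoColumns is hypothetical. It writes default directly into positions that a shorter split leaves empty. Per the "how to fill nulls" comment in the grammar, those positions would otherwise be null.

```kotlin
// Hypothetical model of a horizontal split with .default(value).
fun <T> splitIntoColumns(
    values: List<String>,
    default: T,
    splitter: (String) -> List<T>,
    names: (Int) -> String,
): Map<String, List<T>> {
    val parts = values.map(splitter)
    // The longest split result decides how many new columns are created.
    val width = parts.maxOfOrNull { it.size } ?: 0
    return (0 until width).associate { i ->
        // Indices passed to the name generator start at 1; missing parts get the default.
        names(i + 1) to parts.map { it.getOrElse(i) { default } }
    }
}
```

For example, splitting "Ab" and "Cde" into characters creates three columns, and the first row gets ' ' in the third.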

String columns can also be split into the groups matched by a Regex pattern:

val name by column<String>()

merged.split { name }
    .match("""(.*) \((.*)\)""")
    .inward("firstName", "lastName")
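The regex form can be illustrated with Kotlin's Regex API, where the captured groups of a match become the split values. matchGroups is a hypothetical helper. It requires the whole string to match (matchEntire) and returns null otherwise. The library may treat non-matching values differently.

```kotlin
// Hypothetical model of .match(regex): each capturing group becomes one value.
fun matchGroups(value: String, pattern: Regex): List<String>? =
    // groupValues[0] is the whole match, so drop it and keep only the groups.
    pattern.matchEntire(value)?.groupValues?.drop(1)
```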

A FrameColumn can be split into columns:

val df1 = dataFrameOf("a", "b", "c")(
    1, 2, 3,
    4, 5, 6,
)
val df2 = dataFrameOf("a", "b")(
    5, 6,
    7, 8,
    9, 10,
)
val group by columnOf(df1, df2)
val id by columnOf("x", "y")
val df = dataFrameOf(id, group)

df.split { group }.intoColumns()
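The shape that intoColumns() produces for the example above can be sketched in plain Kotlin. The sketch models each nested frame as a map from column name to its values. intoColumns here is a hypothetical function. Putting null in a cell whose frame lacks that column (as df2 lacks c) is an assumption of the sketch, not documented behavior.

```kotlin
// Hypothetical model of .intoColumns() on a FrameColumn.
// Each nested frame is a map from column name to that column's values.
fun intoColumns(frames: List<Map<String, List<Any?>>>): Map<String, List<List<Any?>?>> {
    // One output column per distinct nested column name, in first-seen order.
    val names = frames.flatMap { it.keys }.distinct()
    // Each cell holds the List of values of that column from the matching frame.
    return names.associateWith { name -> frames.map { it[name] } }
}
```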

Split vertically

Stores split values in new rows, duplicating values in other columns.

Reverse operation: implode

Use the .intoRows() terminal operation in your split configuration to spread split values vertically:

// extension properties API
df.split { name.firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()

// column accessors API
val name by columnGroup()
val firstName by name.column<String>()

df.split { firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()

// String API
df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.intoRows()
df.split { colGroup("name") }.by { it.values() }.intoRows()

This is equivalent to split { column }...inplace().explode { column }. See explode for details.
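The equivalence above can be sketched in plain Kotlin, with rows modeled as maps. All names here (Row, splitInPlace, explode, splitIntoRows) are hypothetical stand-ins, not the library's functions. In this sketch, a row whose list is empty produces no output rows.

```kotlin
// Hypothetical model: a row is a map from column name to value.
typealias Row = Map<String, Any?>

// Step 1, like .inplace(): replace the cell with the list of split values.
fun splitInPlace(rows: List<Row>, column: String, splitter: (Any?) -> Iterable<Any?>): List<Row> =
    rows.map { row -> row + (column to splitter(row[column]).toList()) }

// Step 2, like explode: one output row per list element; the values of
// all other columns are copied unchanged.
fun explode(rows: List<Row>, column: String): List<Row> =
    rows.flatMap { row ->
        (row[column] as List<*>).map { element -> row + (column to element) }
    }

// .intoRows() modeled as the composition of both steps.
fun splitIntoRows(rows: List<Row>, column: String, splitter: (Any?) -> Iterable<Any?>): List<Row> =
    explode(splitInPlace(rows, column, splitter), column)
```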

Last modified: 09 December 2024