Dataframe 0.13 Help

split

Splits every value in the given columns into several values and optionally spreads them horizontally or vertically.

df.split { columns }
    [.cast<Type>()]
    [.by(delimiters) | .by { splitter } | .match(regex)] // how to split cell value
    [.default(value)] // how to fill nulls
    .into(columnNames) [ { columnNamesGenerator } ] | .inward(columnNames) [ { columnNamesGenerator } ] | .inplace() | .intoRows() | .intoColumns() // where to store results

splitter = DataRow.(T) -> Iterable<Any>
columnNamesGenerator = DataColumn.(columnIndex: Int) -> String

The following types of columns can be split without any splitter configuration:

  • String: split by ',' and trim each part

  • List: split into elements

  • DataFrame: split into rows
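
As a rough illustration of the default String behavior listed above, the following plain-Kotlin sketch splits one cell value by ',' and trims each part. It uses only the standard library to model the described semantics. `splitDefault` is a hypothetical helper name chosen for this sketch, not a confirmed part of the library's public API.

```kotlin
// Hypothetical helper modeling the default String split described above:
// split the value by ',' and trim each resulting part.
fun splitDefault(value: String): List<String> =
    value.split(",").map { it.trim() }

fun main() {
    println(splitDefault("Alice, Bob ,Charlie")) // [Alice, Bob, Charlie]
}
```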

Split in place

Stores split values as lists in the original columns.

Use the .inplace() terminal operation in the split configuration to store split values in place:

df.split { name.firstName }.by { it.asIterable() }.inplace()

val name by columnGroup()
val firstName by name.column<String>()
df.split { firstName }.by { it.asIterable() }.inplace()

df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.inplace()

Split horizontally

Stores split values in new columns.

  • into(col1, col2, ...): stores split values in new top-level columns

  • inward(col1, col2, ...): stores split values in new columns nested inside the original column

  • intoColumns(): splits a FrameColumn into a ColumnGroup, storing in every cell a List of the original values for each column

Reverse operation: merge

columnNamesGenerator generates names for additional columns when the explicitly specified list of columnNames is not long enough. columnIndex starts at 1 for the first additional column name.

The default columnNamesGenerator generates the column names split1, split2, and so on.
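
The naming rule described above can be sketched in plain Kotlin. This is a simplified model, not the library's implementation. `resolveColumnNames` is a hypothetical helper, and the generator is assumed to be a plain `(Int) -> String` rather than a function with a DataColumn receiver.

```kotlin
// Simplified model of column naming: explicit names are used first, then the
// generator is asked for each additional column, with an index starting at 1.
// Assumption: the generator is a plain (Int) -> String; the real one has a
// DataColumn receiver.
fun resolveColumnNames(
    explicitNames: List<String>,
    totalColumns: Int,
    generator: (Int) -> String = { "split$it" }, // default naming as described above
): List<String> {
    val extraCount = (totalColumns - explicitNames.size).coerceAtLeast(0)
    return explicitNames + (1..extraCount).map(generator)
}

fun main() {
    println(resolveColumnNames(listOf("char1", "char2"), 4)) // [char1, char2, split1, split2]
    println(resolveColumnNames(emptyList(), 3) { "char$it" }) // [char1, char2, char3]
}
```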

df.split { name.lastName }.by { it.asIterable() }.into("char1", "char2")

val name by columnGroup()
val lastName by name.column<String>()
df.split { lastName }.by { it.asIterable() }.into("char1", "char2")

df.split { "name"["lastName"]<String>() }.by { it.asIterable() }.into("char1", "char2")

df.split { name.lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

val name by columnGroup()
val lastName by name.column<String>()
df.split { lastName }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

df.split { "name"["lastName"]<String>() }
    .by { it.asIterable() }.default(' ')
    .inward { "char$it" }

String columns can also be split into the group matches of a Regex pattern:

val merged = df.merge { name.lastName and name.firstName }
    .by { it[0] + " (" + it[1] + ")" }
    .into("name")

val name by column<String>()
merged.split { name }
    .match("""(.*) \((.*)\)""")
    .inward("lastName", "firstName") // group 1 is lastName, group 2 is firstName, matching the merge order
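
To show what the group matches of this pattern look like for a single value, here is a stand-alone sketch using only the Kotlin standard library. `groupMatches` is a hypothetical helper, the input string is an illustrative value, and the use of `matchEntire` is an assumption about how the pattern is applied to each cell.

```kotlin
// Stand-alone illustration of extracting regex group matches from one value.
// Assumption: the pattern must match the entire value (matchEntire).
fun groupMatches(value: String, pattern: Regex): List<String> =
    pattern.matchEntire(value)?.groupValues?.drop(1) ?: emptyList()

fun main() {
    val pattern = Regex("""(.*) \((.*)\)""")
    println(groupMatches("Cooper (Alice)", pattern)) // [Cooper, Alice]
}
```

Each group becomes one split value, so the order of the target column names should follow the order of the groups in the pattern.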

FrameColumn can be split into columns:

val df1 = dataFrameOf("a", "b", "c")(
    1, 2, 3,
    4, 5, 6
)
val df2 = dataFrameOf("a", "b")(
    5, 6,
    7, 8,
    9, 10
)
val group by columnOf(df1, df2)
val id by columnOf("x", "y")
val df = dataFrameOf(id, group)

df.split { group }.intoColumns()
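
The effect of intoColumns on one nested frame can be modeled in plain Kotlin: the rows of the frame are regrouped into one List of values per column. This is only a sketch of the described result, with rows represented as maps. `frameToColumnLists` is a hypothetical helper, not a library function.

```kotlin
// Plain-Kotlin model of intoColumns for a single nested frame: the rows of the
// frame are regrouped into one List of values per column name.
fun frameToColumnLists(rows: List<Map<String, Int>>): Map<String, List<Int>> =
    rows.flatMap { it.entries }
        .groupBy({ it.key }, { it.value })

fun main() {
    // Same values as df1 in the example above.
    val df1Rows = listOf(
        mapOf("a" to 1, "b" to 2, "c" to 3),
        mapOf("a" to 4, "b" to 5, "c" to 6),
    )
    println(frameToColumnLists(df1Rows)) // {a=[1, 4], b=[2, 5], c=[3, 6]}
}
```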

Split vertically

Stores split values in new rows, duplicating the values in other columns.

Reverse operation: implode

Use the .intoRows() terminal operation in the split configuration to spread split values vertically:

df.split { name.firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()

val name by columnGroup()
val firstName by name.column<String>()
df.split { firstName }.by { it.asIterable() }.intoRows()
df.split { name }.by { it.values() }.intoRows()

df.split { "name"["firstName"]<String>() }.by { it.asIterable() }.intoRows()
df.split { colGroup("name") }.by { it.values() }.intoRows()

Equivalent to split { column }...inplace().explode { column }. See explode for details.
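
The two-step view of a vertical split can be modeled in plain Kotlin: first store the split values as a list in each row, then emit one row per list element while repeating the other values. This is a sketch of the semantics only. `Person`, `splitInPlace`, and `explode` are hypothetical names defined here with the standard library, not the library's own functions.

```kotlin
// Hypothetical row type standing in for a two-column data frame.
data class Person(val firstName: String, val age: Int)

// Step 1 (like inplace): keep one row per person, store the split values as a list.
fun splitInPlace(rows: List<Person>): List<Pair<List<Char>, Int>> =
    rows.map { it.firstName.toList() to it.age }

// Step 2 (like explode): emit one row per list element, repeating the other value.
fun explode(rows: List<Pair<List<Char>, Int>>): List<Pair<Char, Int>> =
    rows.flatMap { (chars, age) -> chars.map { it to age } }

fun main() {
    val rows = listOf(Person("Al", 15), Person("Bo", 45))
    println(explode(splitInPlace(rows))) // [(A, 15), (l, 15), (B, 45), (o, 45)]
}
```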

Last modified: 29 March 2024