A join creates a new dataframe by combining rows from two input dataframes according to one or more key columns. Rows are merged when the values in the join columns match. If there is no match, whether the row is included and how missing values are filled depends on the type of join (e.g., inner, left, right, full).
Returns a new DataFrame that contains the merged rows and columns from both inputs.
join(otherDf, type = JoinType.Inner) [ { joinColumns } ]
joinColumns: JoinDsl.(LeftDataFrame) -> Columns
interface JoinDsl: LeftDataFrame {
val right: RightDataFrame
fun DataColumn.match(rightColumn: DataColumn)
}
joinColumns is a special case of columns selector that defines column mapping for join.
Use the Join DSL when the key column names differ:
access the right DataFrame via right;
define the join condition with match.
dfAges
dfCities
// INNER JOIN on differently named keys:
// Merge a row when dfAges.firstName == dfCities.name.
// With the given data all 3 names match → all rows merge.
dfAges.join(dfCities) { firstName match right.name }
Join with explicit keys (with the same names)
If mapped columns have the same name, just select join columns (one or several) from the left DataFrame:
dfLeft
dfRight
// INNER JOIN on "name" only:
// Merge when left.name == right.name.
// Duplicate keys produce multiple merged rows (one per pairing).
dfLeft.join(dfRight) { name }
Join with implicit keys (all columns with the same name)
If joinColumns is not specified, columns with the same name from both DataFrame objects will be used as join columns:
dfLeft
dfRight
// INNER JOIN on all same-named columns ("name" and "city"):
// Merge when BOTH name AND city are equal; otherwise the row is dropped.
dfLeft.join(dfRight)
Join types
Supported join types:
Inner (default) — only matched rows from left and right DataFrame objects
// INNER JOIN:
// Combines columns from the left and right dataframes
// and keep only rows where (name, city) matches on both sides.
dfLeft.innerJoin(dfRight) { name and city }
Filter
dfLeft
dfRight
// FILTER JOIN:
// Keep ONLY left rows that have ANY match on (name, city).
// No right-side columns are added.
dfLeft.filterJoin(dfRight) { name and city }
Left
dfLeft
dfRight
// LEFT JOIN:
// Keep ALL left rows and add columns from the right dataframe.
// If (name, city) matches, attach right columns values from
// the corresponding row in the right dataframe;
// if not (e.g. ("Bob", "Dubai") row), fill them with `null`.
dfLeft.leftJoin(dfRight) { name and city }
Right
dfLeft
dfRight
// RIGHT JOIN:
// Keep ALL right rows and add columns from the left dataframe.
// If (name, city) matches, attach left columns values from
// the corresponding row in the left dataframe;
// if not (e.g. ("Bob", "Tokyo") row), fill them with `null`.
dfLeft.rightJoin(dfRight) { name and city }
Full
dfLeft
dfRight
// FULL JOIN:
// Keep ALL rows from both sides. Where there's no match on (name, city),
// the other side is filled with nulls.
dfLeft.fullJoin(dfRight) { name and city }
Exclude
dfLeft
dfRight
// EXCLUDE JOIN:
// Keep ONLY left rows that have NO match on (name, city).
// Useful to find "unpaired" left rows.
dfLeft.excludeJoin(dfRight) { name and city }