distinct
Removes duplicate rows. The rows in the resulting DataFrame are in the same order as they were in the original DataFrame.
df.distinct()
If columns are specified, the resulting DataFrame contains only those columns, and each distinct combination of their values appears once.
df.distinct { age and name }
// same as
df.select { age and name }.distinct()
df.distinct("age", "name")
// same as
df.select("age", "name").distinct()
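To make the difference between the two forms concrete, here is a minimal sketch on a small hand-made frame. The sample data, the `city` column and the `sampleFrame` helper are assumptions introduced for illustration only; the sketch assumes the Kotlin DataFrame library is on the classpath.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: rows 1 and 3 are identical,
// row 4 differs from them only in "city".
fun sampleFrame(): DataFrame<*> = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 20, "Paris",
    "Alice", 15, "London",
    "Alice", 15, "Oslo",
)

fun main() {
    val df = sampleFrame()

    // Whole-row comparison: only the exact duplicate (row 3) is dropped.
    val uniqueRows = df.distinct()
    println(uniqueRows.rowsCount()) // 3

    // Column-restricted form: the result keeps only "name" and "age",
    // so the two remaining Alice rows collapse into one.
    val uniquePairs = df.distinct("name", "age")
    println(uniquePairs.columnNames()) // [name, age]
    println(uniquePairs.rowsCount())   // 2
}
```

Note that the column-restricted form drops `city` entirely, which is why it can return fewer rows than the whole-row form.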
distinctBy
Keeps only the first row of each group, where rows are grouped by the values of the specified columns. All columns of that first row are retained.
df.distinctBy { age and name }
// same as
df.groupBy { age and name }.mapToRows { group.first() }
df.distinctBy("age", "name")
// same as
df.groupBy("age", "name").mapToRows { group.first() }
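The following sketch illustrates that distinctBy keeps whole rows rather than only the key columns. The sample data, the `city` column and the `people` helper are hypothetical names introduced for this example; it assumes the Kotlin DataFrame library is available.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample data: two rows share the same ("name", "age") key
// but differ in "city".
fun people(): DataFrame<*> = dataFrameOf("name", "age", "city")(
    "Alice", 15, "London",
    "Bob", 20, "Paris",
    "Alice", 15, "Oslo",
)

fun main() {
    // One row per distinct ("name", "age") key; the first row of each group wins.
    val firstPerKey = people().distinctBy("name", "age")

    println(firstPerKey.columnNames())              // [name, age, city]
    println(firstPerKey.rowsCount())                // 2
    println(firstPerKey["city"].values().toList())  // [London, Paris]
}
```

Compared with `distinct("name", "age")`, the `city` column survives here, and its value comes from the first row encountered for each key.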
20 May 2025