DataFrame 1.0 Help

countDistinct

Counts distinct rows or distinct combinations of values in selected columns.

When countDistinct is used on a DataFrame, it returns the number of distinct rows in this DataFrame.

df
df.countDistinct() // the result is 4

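You can also specify which columns to use when counting distinct combinations of values. The two calls below are equivalent: the first refers to columns through extension properties, the second refers to them by name.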

df.countDistinct { name.firstName and city } // the result is 3
df.countDistinct { "name"["firstName"] and "city" } // the result is 3
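The difference between counting distinct rows and counting distinct combinations of selected values can be sketched with plain Kotlin collections. This is only an illustration of the semantics, not how the library implements it. The Person class, the sample data, and the helper functions distinctRowCount and distinctCountBy are hypothetical, and the data is not the df used on this page (it was chosen so that the counts happen to be 4 and 3).

```kotlin
// Hypothetical row type and sample data; not the df from this page.
data class Person(val firstName: String, val lastName: String, val city: String)

val people = listOf(
    Person("Alice", "Cooper", "London"),
    Person("Alice", "Cooper", "London"), // exact duplicate of the previous row
    Person("Alice", "Smith", "London"),  // same firstName and city, different lastName
    Person("Bob", "Dylan", "Paris"),
    Person("Charlie", "Daniels", "Paris"),
)

// Distinct whole rows: data classes compare by value, so exact duplicates collapse.
fun distinctRowCount(rows: List<Person>): Int = rows.distinct().size

// Distinct combinations of the selected values only (hypothetical helper).
fun <K> distinctCountBy(rows: List<Person>, selector: (Person) -> K): Int =
    rows.map(selector).distinct().size

fun main() {
    println(distinctRowCount(people))                            // 4
    println(distinctCountBy(people) { it.firstName to it.city }) // 3
}
```

Selecting fewer columns can only keep the count the same or lower it, because rows that differ only in unselected columns collapse into one combination.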

When countDistinct is used on a GroupBy, it counts distinct rows within each group and returns a DataFrame with one row per group. The result contains the original group key columns and a new column holding the number of distinct rows (or distinct combinations of values in the selected columns) in each group.

Let's take this GroupBy as an example:

df.groupBy { city }

Applying countDistinct to this GroupBy yields one row per city with the number of distinct rows in that city's group:

df.groupBy { city }.countDistinct()
df.groupBy("city").countDistinct()

You can also specify which columns in the groups should be used to determine distinctness.

df.groupBy { city }.countDistinct { name.firstName }
df.groupBy("city").countDistinct { "name"["firstName"] }
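The per-group behavior can be modeled the same way with plain Kotlin collections: group first, then count distinct values inside each group. This is a minimal sketch of the semantics only. The Person class, the sample data, and the helper distinctCountPerCity are hypothetical, and the result is a Map rather than the DataFrame that countDistinct returns.

```kotlin
// Hypothetical row type and sample data; not the df from this page.
data class Person(val firstName: String, val lastName: String, val city: String)

val people = listOf(
    Person("Alice", "Cooper", "London"),
    Person("Alice", "Cooper", "London"), // exact duplicate of the previous row
    Person("Alice", "Smith", "London"),
    Person("Bob", "Dylan", "Paris"),
    Person("Charlie", "Daniels", "Paris"),
)

// One entry per group key (city), holding the number of distinct
// selected values found inside that group (hypothetical helper).
fun <K> distinctCountPerCity(rows: List<Person>, selector: (Person) -> K): Map<String, Int> =
    rows.groupBy { it.city }
        .mapValues { (_, group) -> group.map(selector).distinct().size }

fun main() {
    // Whole rows are compared: analogous to countDistinct() on the GroupBy.
    println(distinctCountPerCity(people) { it })           // {London=2, Paris=2}
    // Only firstName is compared: analogous to countDistinct { name.firstName }.
    println(distinctCountPerCity(people) { it.firstName }) // {London=1, Paris=2}
}
```

Distinctness is evaluated inside each group independently, so a value that appears in two different groups is counted once in each of them.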

The default name of the new column is countDistinct, but you can choose a different one.

df.groupBy { city }.countDistinct("uniqueFirstNames") { name.firstName }
df.groupBy("city").countDistinct("uniqueFirstNames") { "name"["firstName"] }
11 June 2026