countApproxDistinctByKey
Return approximate number of distinct values for each key in this RDD.
The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".
Parameters
relativeSD - Relative accuracy. Smaller values create counters that require more space. It must be greater than 0.000017.
partitioner - Partitioner of the resulting RDD.
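To make the returned quantity concrete, here is a minimal plain-Java sketch of the exact per-key distinct count that this method approximates. It does not use Spark or HyperLogLog; the class `ExactDistinctByKey` and its method `countDistinctByKey` are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Spark: computes the exact number of
// distinct values per key, i.e. the quantity that the approximate
// method estimates.
class ExactDistinctByKey {

    static <K, V> Map<K, Long> countDistinctByKey(List<Map.Entry<K, V>> pairs) {
        // Collect the set of distinct values seen for each key.
        Map<K, Set<V>> valuesByKey = new HashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            valuesByKey
                .computeIfAbsent(pair.getKey(), k -> new HashSet<>())
                .add(pair.getValue());
        }
        // Reduce each set to its size.
        Map<K, Long> counts = new HashMap<>();
        for (Map.Entry<K, Set<V>> entry : valuesByKey.entrySet()) {
            counts.put(entry.getKey(), (long) entry.getValue().size());
        }
        return counts;
    }

    public static void main(String[] args) {
        // Key "a" has the distinct values {1, 2}; key "b" has {7}.
        List<Map.Entry<String, Integer>> pairs = List.of(
            Map.entry("a", 1), Map.entry("a", 1), Map.entry("a", 2),
            Map.entry("b", 7));
        System.out.println(countDistinctByKey(pairs));
    }
}
```

The exact version keeps every distinct value in memory, which is the cost the approximate counters avoid: they trade a bounded relative error for a fixed-size summary per key.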
Return approximate number of distinct values for each key in this RDD.
The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".
Parameters
relativeSD - Relative accuracy. Smaller values create counters that require more space. It must be greater than 0.000017.
numPartitions - Number of partitions of the resulting RDD.
Return approximate number of distinct values for each key in this RDD.
The algorithm used is based on streamlib's implementation of "HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm".
Parameters
relativeSD - Relative accuracy. Smaller values create counters that require more space. It must be greater than 0.000017.
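The space cost of a smaller relative accuracy can be sketched numerically. The code below assumes, rather than quotes from this page, that the accuracy is converted to a HyperLogLog precision p using an error model of roughly 1.054 / sqrt(2^p), with p kept between 4 and 32. Under that assumption the 0.000017 lower bound is the accuracy at which p would exceed 32. The class `HllPrecisionSketch` and its methods are hypothetical names, not Spark API.

```java
// Hypothetical illustration, not Spark code. Assumes the relative error of
// a HyperLogLog counter with 2^p registers is about 1.054 / sqrt(2^p).
class HllPrecisionSketch {

    // Smallest precision p (at least 4) whose assumed error is <= relativeSD.
    static int precisionFor(double relativeSD) {
        if (relativeSD <= 0.000017) {
            throw new IllegalArgumentException(
                "accuracy (" + relativeSD + ") must be greater than 0.000017");
        }
        // Solve 1.054 / sqrt(2^p) <= relativeSD for p and round up.
        int p = (int) Math.ceil(2.0 * Math.log(1.054 / relativeSD) / Math.log(2));
        return Math.max(p, 4);
    }

    // Number of registers the counter would hold at that precision.
    static long registersFor(double relativeSD) {
        return 1L << precisionFor(relativeSD);
    }

    public static void main(String[] args) {
        // 0.05 -> p = 9 (512 registers); 0.01 -> p = 14 (16384 registers).
        for (double sd : new double[] {0.05, 0.01, 0.001}) {
            System.out.println(sd + " -> p = " + precisionFor(sd)
                + ", registers = " + registersFor(sd));
        }
    }
}
```

Under this model, halving the relative accuracy roughly quadruples the number of registers kept per key, which is why tight accuracies on RDDs with many keys can become memory-heavy.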