sampleByKeyExact

fun <K, V> JavaRDD<Tuple2<K, V>>.sampleByKeyExact(withReplacement: Boolean, fractions: Map<K, Double>, seed: Long = Random.nextLong()): JavaRDD<Tuple2<K, V>>

Return a subset of this RDD sampled by key (via stratified sampling) containing exactly math.ceil(numItems * samplingRate) items for each stratum (group of pairs with the same key).
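As a rough illustration of the size formula above, the following sketch computes math.ceil(numItems * samplingRate) for each stratum in plain Kotlin, without Spark. The helper name expectedSampleSizes and the sample counts are hypothetical and are not part of this API.

```kotlin
import kotlin.math.ceil

// Hypothetical helper (not part of the API): the target sample size for each
// stratum, computed as ceil(numItems * samplingRate).
fun expectedSampleSizes(
    counts: Map<String, Long>,
    fractions: Map<String, Double>,
): Map<String, Long> =
    counts.mapValues { (key, numItems) ->
        val rate = fractions.getValue(key) // throws if a key has no sampling rate
        ceil(numItems * rate).toLong()
    }

fun main() {
    // Illustrative stratum sizes and sampling rates.
    val counts = mapOf("a" to 7L, "b" to 10L, "c" to 4L)
    val fractions = mapOf("a" to 0.5, "b" to 0.25, "c" to 1.0)

    val sizes = expectedSampleSizes(counts, fractions)
    println(sizes)              // {a=4, b=3, c=4}
    println(sizes.values.sum()) // 11
}
```

Because of the ceiling, a stratum with 7 items and a rate of 0.5 contributes 4 items rather than 3.5, so the total sample size can be slightly larger than the overall fraction would suggest.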

This method differs from sampleByKey in that we make additional passes over the RDD to create a sample whose size is exactly equal to the sum of math.ceil(numItems * samplingRate) over all key values, with 99.99% confidence. Sampling without replacement requires one additional pass over the RDD to guarantee the sample size; sampling with replacement requires two additional passes.

Return

RDD containing the sampled subset

Parameters

withReplacement

whether to sample with or without replacement

fractions

map of specific keys to sampling rates

seed

seed for the random number generator
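Usage sketch

The following is a minimal sketch of a call, not a definitive recipe. It assumes Spark and this library are on the classpath, and that the extension can be imported from org.jetbrains.kotlinx.spark.api (the package name is an assumption; adjust the import to wherever sampleByKeyExact is declared in your version). The data, the function name exactSampleCounts, and the local master setting are illustrative. Every key in the RDD is given a sampling rate in fractions.

```kotlin
import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import org.jetbrains.kotlinx.spark.api.sampleByKeyExact // assumed package of the extension
import scala.Tuple2

// Hypothetical helper: samples an illustrative pair RDD and counts the sampled pairs per key.
fun exactSampleCounts(): Map<String, Int> {
    val conf = SparkConf().setMaster("local[*]").setAppName("sampleByKeyExact-sketch")
    return JavaSparkContext(conf).use { sc ->
        // 100 pairs under key "a" and 40 pairs under key "b".
        val pairs = (1..100).map { Tuple2("a", it) } + (1..40).map { Tuple2("b", it) }
        val rdd = sc.parallelize(pairs)

        val sample = rdd.sampleByKeyExact(
            withReplacement = false,
            fractions = mapOf("a" to 0.1, "b" to 0.25),
            seed = 42L, // fixed seed so repeated runs draw the same sample
        )

        // Count how many sampled pairs belong to each key.
        sample.collect().groupingBy { it._1() }.eachCount()
    }
}

fun main() {
    println(exactSampleCounts())
}
```

Given the size guarantee described above, each key here should end up with math.ceil(100 * 0.1) = 10 and math.ceil(40 * 0.25) = 10 sampled pairs respectively.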