sampleByKeyExact
fun <K, V> JavaRDD<Tuple2<K, V>>.sampleByKeyExact(withReplacement: Boolean, fractions: Map<K, Double>, seed: Long = Random.nextLong()): JavaRDD<Tuple2<K, V>>
Return a subset of this RDD sampled by key (via stratified sampling) containing exactly math.ceil(numItems * samplingRate) items for each stratum (group of pairs with the same key).
This method differs from sampleByKey in that it makes additional passes over the RDD to produce a sample whose size is, with 99.99% confidence, exactly equal to the sum of math.ceil(numItems * samplingRate) over all key values. Sampling without replacement needs one additional pass over the RDD to guarantee the sample size; sampling with replacement needs two additional passes.
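The target sizes described above can be worked out by hand. The following is a minimal sketch in plain Kotlin with no Spark dependency; the helper name exactSampleSizes, the String key type, and the stratum counts are assumptions made for illustration and are not part of this API.

```kotlin
import kotlin.math.ceil

// Hypothetical helper (not part of the API): computes the per-key sample size
// that sampleByKeyExact targets, ceil(numItems * samplingRate).
fun exactSampleSizes(
    countsByKey: Map<String, Long>,
    fractions: Map<String, Double>
): Map<String, Long> =
    countsByKey.mapValues { (key, numItems) ->
        // Assumption for this sketch: every key has an entry in fractions.
        val samplingRate = fractions.getValue(key)
        ceil(numItems * samplingRate).toLong()
    }

fun main() {
    val counts = mapOf("a" to 10L, "b" to 7L)       // illustrative stratum sizes
    val fractions = mapOf("a" to 0.25, "b" to 0.5)  // sampling rate per key
    val sizes = exactSampleSizes(counts, fractions)
    println(sizes)               // {a=3, b=4}
    println(sizes.values.sum())  // 7
}
```

Here stratum "a" yields ceil(10 * 0.25) = 3 items and stratum "b" yields ceil(7 * 0.5) = 4, so the sample as a whole would hold 7 pairs.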
Return
RDD containing the sampled subset
Parameters
withReplacement
whether to sample with or without replacement
fractions
map of specific keys to sampling rates
seed
seed for the random number generator