sampleByKey

fun <K, V> JavaRDD<Tuple2<K, V>>.sampleByKey(
    withReplacement: Boolean,
    fractions: Map<K, Double>,
    seed: Long = Random.nextLong()
): JavaRDD<Tuple2<K, V>>

Return a subset of this RDD sampled by key (via stratified sampling).

Creates a sample of this RDD using a different sampling rate for each key, as given by fractions (a map from key to sampling rate). Sampling is simple random sampling done in a single pass over the RDD. As a result, the sample size is only approximately equal to the sum of math.ceil(numItems * samplingRate) over all keys, where numItems is the number of items with that key and samplingRate is that key's rate in fractions.
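The following Kotlin sketch illustrates one-pass, per-key sampling on an in-memory List<Pair<K, V>> instead of an RDD, so it runs without Spark. The names sampleByKeySketch, poisson and approximateSampleSize are hypothetical. The sketch assumes the common one-pass approach: without replacement, each item is kept independently with probability equal to its key's rate (Bernoulli sampling); with replacement, each item is emitted a Poisson(rate)-distributed number of times. It illustrates the technique and is not the library's implementation.

```kotlin
import kotlin.math.ceil
import kotlin.math.exp
import kotlin.random.Random

// Hypothetical helper: draws from Poisson(lambda) using Knuth's
// multiplication method (fine for small lambda, slow for large ones).
fun poisson(lambda: Double, rng: Random): Int {
    val limit = exp(-lambda)
    var k = 0
    var p = 1.0
    do {
        k++
        p *= rng.nextDouble()
    } while (p > limit)
    return k - 1
}

// Hypothetical in-memory analogue of sampleByKey: a single pass over the
// data, where each item is handled independently using its key's rate.
fun <K, V> sampleByKeySketch(
    data: List<Pair<K, V>>,
    withReplacement: Boolean,
    fractions: Map<K, Double>,
    seed: Long = Random.nextLong()
): List<Pair<K, V>> {
    val rng = Random(seed)
    val out = mutableListOf<Pair<K, V>>()
    for (item in data) {
        // getValue throws NoSuchElementException if a key has no rate;
        // this sketch requires every key to appear in fractions.
        val rate = fractions.getValue(item.first)
        if (withReplacement) {
            // With replacement: emit the item a Poisson(rate) number of times.
            repeat(poisson(rate, rng)) { out.add(item) }
        } else if (rng.nextDouble() < rate) {
            // Without replacement: keep the item with probability `rate`.
            out.add(item)
        }
    }
    return out
}

// Target size from the description: sum over keys of ceil(numItems * samplingRate).
fun <K, V> approximateSampleSize(data: List<Pair<K, V>>, fractions: Map<K, Double>): Long =
    data.groupingBy { it.first }.eachCount()
        .entries.sumOf { (key, count) -> ceil(count * fractions.getValue(key)).toLong() }

fun main() {
    val data = List(1000) { "a" to it } + List(100) { "b" to it }
    val fractions = mapOf("a" to 0.1, "b" to 0.5)
    val sample = sampleByKeySketch(data, withReplacement = false, fractions = fractions, seed = 42L)
    println("target ~ ${approximateSampleSize(data, fractions)}, actual = ${sample.size}")
}
```

Because every item is decided independently, the actual size varies with the seed around the target. That variation is the trade-off for needing only one pass over the data.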

Return

RDD containing the sampled subset

Parameters

withReplacement

whether to sample with or without replacement

fractions

map of specific keys to sampling rates

seed

seed for the random number generator