sampleByKey
fun <K, V> JavaRDD<Tuple2<K, V>>.sampleByKey(withReplacement: Boolean, fractions: Map<K, Double>, seed: Long = Random.nextLong()): JavaRDD<Tuple2<K, V>>
Return a subset of this RDD sampled by key (via stratified sampling).
Create a sample of this RDD using a separate sampling rate for each key, as specified by fractions, a map from key to sampling rate. Sampling is simple random sampling in a single pass over the RDD, and the resulting sample size is approximately equal to the sum of math.ceil(numItems * samplingRate) over all keys.
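The size formula in the description can be checked by hand with a small sketch. The helper below is hypothetical and not part of the Spark API; it assumes the per-key item counts are already known and that every key has an entry in fractions.

```kotlin
import kotlin.math.ceil

// Hypothetical helper (not part of the Spark API): approximate target size of
// a stratified sample, i.e. the sum of ceil(numItems * samplingRate) over keys.
fun <K> expectedSampleSize(countsByKey: Map<K, Long>, fractions: Map<K, Double>): Long =
    countsByKey.entries.sumOf { (key, numItems) ->
        // Assumption: every key has a rate; getValue throws if one is missing.
        val samplingRate = fractions.getValue(key)
        ceil(numItems * samplingRate).toLong()
    }

fun main() {
    val countsByKey = mapOf("a" to 1000L, "b" to 251L, "c" to 10L)
    val fractions = mapOf("a" to 0.25, "b" to 0.5, "c" to 1.0)
    // 250 + ceil(125.5) + 10 = 250 + 126 + 10
    println(expectedSampleSize(countsByKey, fractions)) // prints 386
}
```

The actual sample returned by sampleByKey is random, so its size will generally differ somewhat from this figure.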
Return
RDD containing the sampled subset
Parameters
withReplacement
whether to sample with or without replacement
fractions
map of specific keys to sampling rates
seed
seed for the random number generator
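To illustrate how fractions and seed interact, here is a minimal in-memory sketch of per-key sampling without replacement. It is not Spark's implementation: sampleByKeyLocal is a hypothetical stand-in that works on a plain List of Pair values rather than a JavaRDD, and it assumes every key appears in fractions.

```kotlin
import kotlin.random.Random

// Hypothetical in-memory stand-in for the RDD operation: one pass over the
// pairs, keeping each pair with the probability assigned to its key.
// Covers only the withReplacement = false case.
fun <K, V> sampleByKeyLocal(
    pairs: List<Pair<K, V>>,
    fractions: Map<K, Double>,
    seed: Long
): List<Pair<K, V>> {
    val rng = Random(seed) // same seed, same draws, same sample
    return pairs.filter { (key, _) ->
        // Assumption: every key has a rate; getValue throws if one is missing.
        rng.nextDouble() < fractions.getValue(key)
    }
}

fun main() {
    val pairs = (1..1000).map { i -> (if (i % 2 == 0) "even" else "odd") to i }
    val fractions = mapOf("even" to 0.5, "odd" to 0.1)

    val first = sampleByKeyLocal(pairs, fractions, seed = 42L)
    val second = sampleByKeyLocal(pairs, fractions, seed = 42L)

    println(first == second) // prints true
    // Per-key counts of the sample; roughly half of the "even" pairs and
    // a tenth of the "odd" pairs are expected to survive.
    println(first.groupingBy { it.first }.eachCount())
}
```

Passing an explicit seed makes a sample reproducible across runs; the default Random.nextLong() in the signature above gives a different sample each time.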