reduceByKey

fun <K, V> JavaRDD<Tuple2<K, V>>.reduceByKey(partitioner: Partitioner, func: (V, V) -> V): JavaRDD<Tuple2<K, V>>

Merge the values for each key using an associative and commutative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce.
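A minimal usage sketch for this overload, assuming the reduceByKey extension documented above is in scope; the word-count pairs and the four-partition HashPartitioner are illustrative, not prescribed by the API:

import org.apache.spark.HashPartitioner
import org.apache.spark.api.java.JavaRDD
import scala.Tuple2

// Illustrative helper: sum the counts per word, letting an explicit
// HashPartitioner with four partitions control the layout of the result.
// Assumes the reduceByKey extension documented above is in scope.
fun sumPerWord(pairs: JavaRDD<Tuple2<String, Int>>): JavaRDD<Tuple2<String, Int>> =
    pairs.reduceByKey(HashPartitioner(4)) { a, b -> a + b }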


fun <K, V> JavaRDD<Tuple2<K, V>>.reduceByKey(numPartitions: Int, func: (V, V) -> V): JavaRDD<Tuple2<K, V>>

Merge the values for each key using an associative and commutative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce. Output will be hash-partitioned with numPartitions partitions.
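A sketch of the numPartitions overload under the same assumptions; the helper name is illustrative:

import org.apache.spark.api.java.JavaRDD
import scala.Tuple2

// Illustrative helper: same merge, but only the number of output partitions
// is given; the result is hash-partitioned into that many partitions.
// Assumes the reduceByKey extension documented above is in scope.
fun sumPerWordInto(pairs: JavaRDD<Tuple2<String, Int>>, numPartitions: Int): JavaRDD<Tuple2<String, Int>> =
    pairs.reduceByKey(numPartitions) { a, b -> a + b }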


fun <K, V> JavaRDD<Tuple2<K, V>>.reduceByKey(func: (V, V) -> V): JavaRDD<Tuple2<K, V>>

Merge the values for each key using an associative and commutative reduce function. This will also perform the merging locally on each mapper before sending results to a reducer, similarly to a "combiner" in MapReduce. Output will be hash-partitioned with the existing partitioner/parallelism level.
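An end-to-end sketch using the default-partitioning overload; the local master, app name, and sample data are illustrative:

import org.apache.spark.SparkConf
import org.apache.spark.api.java.JavaSparkContext
import scala.Tuple2

// Illustrative word-count: merge the 1s per word with the existing
// parallelism level. Assumes the reduceByKey extension documented above is in scope.
fun main() {
    val sc = JavaSparkContext(SparkConf().setAppName("reduceByKeyExample").setMaster("local[*]"))
    val pairs = sc.parallelize(listOf(Tuple2("a", 1), Tuple2("b", 1), Tuple2("a", 1)))
    val counts = pairs.reduceByKey { a, b -> a + b }
    counts.collect().forEach { println("${it._1()} -> ${it._2()}") }
    sc.stop()
}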


fun <K, V> JavaDStream<Tuple2<K, V>>.reduceByKey(numPartitions: Int = dstream().ssc().sc().defaultParallelism(), reduceFunc: (V, V) -> V): JavaDStream<Tuple2<K, V>>

Return a new DStream by applying reduceByKey to each RDD. The values for each key are merged using the supplied reduce function. Hash partitioning is used to generate the RDDs with numPartitions partitions.
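A streaming sketch under the same assumptions; the socket source, host/port, batch interval, and partition count are placeholders:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.api.java.JavaStreamingContext
import scala.Tuple2

// Illustrative streaming word-count over a local socket source.
// Assumes the reduceByKey extension documented above is in scope.
fun main() {
    val jssc = JavaStreamingContext(
        SparkConf().setAppName("dstreamReduceByKey").setMaster("local[2]"),
        Durations.seconds(1)
    )
    val pairs = jssc.socketTextStream("localhost", 9999).map { word -> Tuple2(word, 1) }
    // Merge counts per key in every batch, producing two partitions per RDD.
    val counts = pairs.reduceByKey(2) { a, b -> a + b }
    counts.print()
    jssc.start()
    jssc.awaitTermination()
}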


fun <K, V> JavaDStream<Tuple2<K, V>>.reduceByKey(partitioner: Partitioner, reduceFunc: (V, V) -> V): JavaDStream<Tuple2<K, V>>

Return a new DStream by applying reduceByKey to each RDD. The values for each key are merged using the supplied reduce function. org.apache.spark.Partitioner is used to control the partitioning of each RDD.
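A sketch of the partitioner-based streaming overload; the helper name and the two-partition HashPartitioner are illustrative:

import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.api.java.JavaDStream
import scala.Tuple2

// Illustrative helper: per-batch sum of counts with an explicit partitioner
// controlling the partitioning of each generated RDD.
// Assumes the reduceByKey extension documented above is in scope.
fun sumPerBatch(pairs: JavaDStream<Tuple2<String, Int>>): JavaDStream<Tuple2<String, Int>> =
    pairs.reduceByKey(HashPartitioner(2)) { a, b -> a + b }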