fun <K, V, C> JavaRDD<Tuple2<K, V>>.combineByKey( createCombiner: (V) -> C, mergeValue: (C, V) -> C, mergeCombiners: (C, C) -> C, partitioner: Partitioner, mapSideCombine: Boolean = true, serializer: Serializer? = null): JavaRDD<Tuple2<K, C>> Generic function to combine the elements for each key using a custom set of aggregation functions. This method is here for backward compatibility. It does not provide combiner classtag information to the shuffle.
fun <K, V, C> JavaRDD<Tuple2<K, V>>.combineByKey( createCombiner: (V) -> C, mergeValue: (C, V) -> C, mergeCombiners: (C, C) -> C, numPartitions: Int): JavaRDD<Tuple2<K, C>> Simplified version of combineByKeyWithClassTag that hash-partitions the output RDD. This method is here for backward compatibility. It does not provide combiner classtag information to the shuffle.
fun <K, V, C> JavaRDD<Tuple2<K, V>>.combineByKey( createCombiner: (V) -> C, mergeValue: (C, V) -> C, mergeCombiners: (C, C) -> C): JavaRDD<Tuple2<K, C>> Simplified version of combineByKeyWithClassTag that hash-partitions the resulting RDD using the existing partitioner/parallelism level. This method is here for backward compatibility. It does not provide combiner classtag information to the shuffle.
fun <K, V, C> JavaDStream<Tuple2<K, V>>.combineByKey( createCombiner: (V) -> C, mergeValue: (C, V) -> C, mergeCombiner: (C, C) -> C, numPartitions: Int = dstream().ssc().sc().defaultParallelism(), mapSideCombine: Boolean = true): JavaDStream<Tuple2<K, C>>

fun <K, V, C> JavaDStream<Tuple2<K, V>>.combineByKey( createCombiner: (V) -> C, mergeValue: (C, V) -> C, mergeCombiner: (C, C) -> C, partitioner: Partitioner, mapSideCombine: Boolean = true): JavaDStream<Tuple2<K, C>>

Combine elements of each key in the DStream's RDDs using custom functions. This is similar to combineByKey for RDDs. Please refer to combineByKey in org.apache.spark.rdd.PairRDDFunctions in the Spark core documentation for more information.