kotlin-spark-api_3.3.2_2.13/org.jetbrains.kotlinx.spark.api/reduceByKeyAndWindow

reduceByKeyAndWindow

fun <K, V> JavaDStream<Tuple2<K, V>>.reduceByKeyAndWindow(windowDuration: Duration, slideDuration: Duration = dstream().slideDuration(), numPartitions: Int = dstream().ssc().sc().defaultParallelism(), reduceFunc: (V, V) -> V): JavaDStream<Tuple2<K, V>>

Return a new DStream by applying reduceByKey over a sliding window. This is similar to DStream.reduceByKey() but applies it over a sliding window. Hash partitioning is used to generate the RDDs with numPartitions partitions.

Parameters

reduceFunc

associative and commutative reduce function

windowDuration

width of the window; must be a multiple of this DStream's batching interval

slideDuration

sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream's batching interval

numPartitions

number of partitions of each RDD in the new DStream.
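
The window semantics described above can be illustrated without Spark. The following is a minimal sketch in plain Kotlin, assuming each batch is modeled as a list of key/value pairs and both durations are given as whole numbers of batches. `windowedReduceByKey`, `windowBatches` and `slideBatches` are hypothetical names for illustration and are not part of kotlin-spark-api.

```kotlin
// Plain-Kotlin model of reduceByKey over a sliding window (no Spark required).
// windowBatches and slideBatches stand in for windowDuration and slideDuration,
// expressed as multiples of the batch interval (an assumption of this sketch).
fun <K, V : Any> windowedReduceByKey(
    batches: List<List<Pair<K, V>>>,
    windowBatches: Int,
    slideBatches: Int,
    reduceFunc: (V, V) -> V,
): List<Map<K, V>> {
    require(windowBatches > 0 && slideBatches > 0) { "durations must be positive" }
    val results = mutableListOf<Map<K, V>>()
    // One output is produced every slideBatches batches.
    var end = slideBatches
    while (end <= batches.size) {
        // The window covers up to windowBatches batches ending at `end`;
        // early windows only see the batches that exist so far.
        val start = maxOf(0, end - windowBatches)
        val reduced = mutableMapOf<K, V>()
        for (batch in batches.subList(start, end)) {
            for ((key, value) in batch) {
                val previous = reduced[key]
                reduced[key] = if (previous == null) value else reduceFunc(previous, value)
            }
        }
        results += reduced
        end += slideBatches
    }
    return results
}

fun main() {
    val batches = listOf(
        listOf("a" to 1, "b" to 1),
        listOf("a" to 2),
        listOf("b" to 3),
        listOf("a" to 4),
    )
    // Window of 2 batches, sliding by 1 batch, summing values per key.
    val windows = windowedReduceByKey(batches, windowBatches = 2, slideBatches = 1) { x, y -> x + y }
    windows.forEach(::println)
}
```

Because the per-key values are combined in whatever order they arrive, the reduce function has to be associative and commutative for the result to be well defined.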

fun <K, V> JavaDStream<Tuple2<K, V>>.reduceByKeyAndWindow(windowDuration: Duration, slideDuration: Duration = dstream().slideDuration(), partitioner: Partitioner, reduceFunc: (V, V) -> V): JavaDStream<Tuple2<K, V>>

Return a new DStream by applying reduceByKey over a sliding window. Similar to DStream.reduceByKey(), but applies it over a sliding window.

Parameters

reduceFunc

associative and commutative reduce function

windowDuration

width of the window; must be a multiple of this DStream's batching interval

slideDuration

sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream's batching interval

partitioner

partitioner for controlling the partitioning of each RDD in the new DStream.
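
As a rough illustration of what a partitioner controls, the sketch below models the key-to-partition mapping in plain Kotlin. `KeyPartitioner` and `HashKeyPartitioner` are hypothetical stand-ins written for this example; they only mirror the idea of a partition count plus a mapping function and are not the Spark `Partitioner` class itself.

```kotlin
// Plain-Kotlin model of a partitioner (no Spark required).
// KeyPartitioner is a hypothetical interface: a partition count and a
// function that maps each key to a partition index.
interface KeyPartitioner {
    val numPartitions: Int
    fun getPartition(key: Any?): Int
}

// Hash-based assignment: the non-negative remainder of the key's hash code.
class HashKeyPartitioner(override val numPartitions: Int) : KeyPartitioner {
    init {
        require(numPartitions > 0) { "numPartitions must be positive" }
    }

    override fun getPartition(key: Any?): Int =
        if (key == null) 0 else key.hashCode().mod(numPartitions)
}

fun main() {
    val partitioner = HashKeyPartitioner(numPartitions = 3)
    val keys = listOf("spark", "kotlin", "stream", "window")
    // Equal keys always land in the same partition, so all values for a
    // key can be reduced together.
    val byPartition = keys.groupBy { partitioner.getPartition(it) }
    byPartition.toSortedMap().forEach { (index, members) ->
        println("partition $index: $members")
    }
}
```

Supplying a custom partitioner is mainly useful when the hash-based default would spread keys unevenly or when downstream stages expect a specific layout.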

fun <K, V> JavaDStream<Tuple2<K, V>>.reduceByKeyAndWindow(invReduceFunc: (V, V) -> V, windowDuration: Duration, slideDuration: Duration = dstream().slideDuration(), numPartitions: Int = dstream().ssc().sc().defaultParallelism(), filterFunc: (Tuple2<K, V>) -> Boolean? = null, reduceFunc: (V, V) -> V): JavaDStream<Tuple2<K, V>>

Return a new DStream by applying incremental reduceByKey over a sliding window. The reduced value over a new window is calculated incrementally from the old window's reduced value:

reduce the new values that entered the window (e.g., adding new counts)
"inverse reduce" the old values that left the window (e.g., subtracting old counts)

This is more efficient than reduceByKeyAndWindow without an "inverse reduce" function. However, it is applicable only to "invertible reduce functions". Hash partitioning is used to generate the RDDs with numPartitions partitions, which defaults to Spark's default parallelism.
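
The two incremental steps can be sketched in plain Kotlin for a single key. This is a minimal model, assuming per-batch counts with addition as the reduce function and subtraction as its inverse; `incrementalWindow` and its parameter names are hypothetical and not part of the library.

```kotlin
// Plain-Kotlin model of the incremental window update (no Spark required).
// All names are hypothetical; values are per-batch counts for one key.
fun <V> incrementalWindow(
    previousWindow: V,
    entering: List<V>,
    leaving: List<V>,
    reduceFunc: (V, V) -> V,
    invReduceFunc: (V, V) -> V,
): V {
    // Step 1: reduce in the values that entered the window.
    val withNew = entering.fold(previousWindow, reduceFunc)
    // Step 2: "inverse reduce" the values that left the window.
    return leaving.fold(withNew, invReduceFunc)
}

fun main() {
    val counts = listOf(3, 1, 4, 1, 5, 9)    // one count per batch
    val window = 3                            // window width in batches
    var current = counts.take(window).sum()   // first full window, computed directly
    for (end in window until counts.size) {
        current = incrementalWindow(
            previousWindow = current,
            entering = listOf(counts[end]),
            leaving = listOf(counts[end - window]),
            reduceFunc = { x, y -> x + y },
            invReduceFunc = { x, y -> x - y },
        )
        // Recompute from scratch to confirm both approaches agree.
        val recomputed = counts.subList(end - window + 1, end + 1).sum()
        println("window ending at batch $end: incremental=$current recomputed=$recomputed")
    }
}
```

Each update touches only the batches that entered or left, rather than every batch in the window, which is where the efficiency gain comes from when the window is much wider than the slide.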

Parameters

reduceFunc

associative and commutative reduce function

invReduceFunc

inverse reduce function; such that for all y, invertible x: invReduceFunc(reduceFunc(x, y), x) = y

windowDuration

width of the window; must be a multiple of this DStream's batching interval

slideDuration

sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream's batching interval

numPartitions

number of partitions of each RDD in the new DStream.

filterFunc

Optional function to filter expired key-value pairs; only pairs that satisfy the function are retained
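
To show why a filter can be useful with an inverse reduce function, here is a minimal plain-Kotlin sketch. It assumes a counting workload in which a key whose values have all left the window is left behind with a count of 0; `applyFilter` is a hypothetical helper, and `Pair` stands in for `Tuple2`.

```kotlin
// Plain-Kotlin model of filterFunc (no Spark required); names are hypothetical.
fun <K, V> applyFilter(
    windowState: Map<K, V>,
    filterFunc: ((Pair<K, V>) -> Boolean)? = null,
): Map<K, V> =
    if (filterFunc == null) {
        // No filter supplied: every key is kept, including expired ones.
        windowState
    } else {
        // Only pairs that satisfy the function are retained.
        windowState.filter { (key, value) -> filterFunc(key to value) }
    }

fun main() {
    // State after inverse-reducing counts: "b" has fully left the window.
    val state = mapOf("a" to 2, "b" to 0, "c" to 5)
    println(applyFilter(state))                    // "b" is still present with count 0
    println(applyFilter(state) { it.second > 0 })  // "b" is dropped
}
```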

fun <K, V> JavaDStream<Tuple2<K, V>>.reduceByKeyAndWindow(invReduceFunc: (V, V) -> V, windowDuration: Duration, slideDuration: Duration = dstream().slideDuration(), partitioner: Partitioner, filterFunc: (Tuple2<K, V>) -> Boolean? = null, reduceFunc: (V, V) -> V): JavaDStream<Tuple2<K, V>>

Return a new DStream by applying incremental reduceByKey over a sliding window. The reduced value over a new window is calculated incrementally from the old window's reduced value:

reduce the new values that entered the window (e.g., adding new counts)
"inverse reduce" the old values that left the window (e.g., subtracting old counts)

This is more efficient than reduceByKeyAndWindow without an "inverse reduce" function. However, it is applicable only to "invertible reduce functions".

Parameters

reduceFunc

associative and commutative reduce function

invReduceFunc

inverse reduce function; such that for all y, invertible x: invReduceFunc(reduceFunc(x, y), x) = y

windowDuration

width of the window; must be a multiple of this DStream's batching interval

slideDuration

sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream's batching interval

partitioner

partitioner for controlling the partitioning of each RDD in the new DStream.

filterFunc

Optional function to filter expired key-value pairs; only pairs that satisfy the function are retained