groupByKeyAndWindow

fun <K, V> JavaDStream<Tuple2<K, V>>.groupByKeyAndWindow(
    windowDuration: Duration,
    slideDuration: Duration = dstream().slideDuration(),
    numPartitions: Int = dstream().ssc().sc().defaultParallelism()
): JavaDStream<Tuple2<K, Iterable<V>>>

Return a new DStream by applying groupByKey over a sliding window on this DStream; it works like DStream.groupByKey(), but the grouping is computed over each window rather than over a single batch. Hash partitioning is used to generate the RDDs with numPartitions partitions.

Parameters

windowDuration

width of the window; must be a multiple of this DStream's batching interval

slideDuration

sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream's batching interval

numPartitions

number of partitions of each RDD in the new DStream; if not specified then Spark's default number of partitions will be used
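
A minimal usage sketch of this overload, assuming a hypothetical pair DStream named pairs built from a JavaStreamingContext with a 1-second batch interval; the import path of the extension function depends on your project setup.

import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.api.java.JavaDStream
import scala.Tuple2

// Group all values seen per key over the last 60 seconds, recomputing every
// 10 seconds. Both durations are multiples of the assumed 1-second batch interval.
fun groupLastMinute(pairs: JavaDStream<Tuple2<String, Int>>): JavaDStream<Tuple2<String, Iterable<Int>>> =
    pairs.groupByKeyAndWindow(
        windowDuration = Durations.seconds(60),
        slideDuration = Durations.seconds(10)
        // numPartitions is omitted, so Spark's default parallelism is used,
        // as documented above.
    )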


fun <K, V> JavaDStream<Tuple2<K, V>>.groupByKeyAndWindow(
    windowDuration: Duration,
    slideDuration: Duration = dstream().slideDuration(),
    partitioner: Partitioner
): JavaDStream<Tuple2<K, Iterable<V>>>

Create a new DStream by applying groupByKey over a sliding window on this DStream; it works like DStream.groupByKey(), but the grouping is computed over each window rather than over a single batch, and the supplied partitioner controls how the resulting RDDs are partitioned.

Parameters

windowDuration

width of the window; must be a multiple of this DStream's batching interval

slideDuration

sliding interval of the window (i.e., the interval after which the new DStream will generate RDDs); must be a multiple of this DStream's batching interval

partitioner

partitioner for controlling the partitioning of each RDD in the new DStream
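
A sketch of the partitioner overload under the same assumptions as the example above; a HashPartitioner is used here, but any org.apache.spark.Partitioner can be supplied.

import org.apache.spark.HashPartitioner
import org.apache.spark.streaming.Durations
import org.apache.spark.streaming.api.java.JavaDStream
import scala.Tuple2

// Same 60-second window sliding every 10 seconds, but the partitioning of each
// windowed RDD is controlled explicitly; 8 partitions is an arbitrary choice.
fun groupWithExplicitPartitioner(pairs: JavaDStream<Tuple2<String, Int>>): JavaDStream<Tuple2<String, Iterable<Int>>> =
    pairs.groupByKeyAndWindow(
        windowDuration = Durations.seconds(60),
        slideDuration = Durations.seconds(10),
        partitioner = HashPartitioner(8)
    )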