kotlin-spark-api_3.3.2_2.13/org.jetbrains.kotlinx.spark.api/updateStateByKey

updateStateByKey

@JvmName(name = "updateStateByKeyNullable")

fun <K, V, S> JavaDStream<Tuple2<K, V>>.updateStateByKey(numPartitions: Int = dstream().ssc().sc().defaultParallelism(), updateFunc: (List<V>, S?) -> S?): JavaDStream<Tuple2<K, S>>

@JvmName(name = "updateStateByKey")

fun <K, V, S> JavaDStream<Tuple2<K, V>>.updateStateByKey(numPartitions: Int = dstream().ssc().sc().defaultParallelism(), updateFunc: (List<V>, Optional<S>) -> Optional<S>): JavaDStream<Tuple2<K, S>>

Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of each key. In every batch the updateFunc will be called for each state even if there are no new values. Hash partitioning is used to generate the RDDs with Spark's default number of partitions. Note: Needs checkpoint directory to be set.

Parameters

updateFunc

State update function. If this function returns null, then corresponding state key-value pair will be eliminated.

@JvmName(name = "updateStateByKeyNullable")

fun <K, V, S> JavaDStream<Tuple2<K, V>>.updateStateByKey(partitioner: Partitioner, updateFunc: (List<V>, S?) -> S?): JavaDStream<Tuple2<K, S>>

fun <K, V, S> JavaDStream<Tuple2<K, V>>.updateStateByKey(partitioner: Partitioner, updateFunc: (List<V>, Optional<S>) -> Optional<S>): JavaDStream<Tuple2<K, S>>

Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of each key. In every batch the updateFunc will be called for each state even if there are no new values. [org.apache.spark.Partitioner] is used to control the partitioning of each RDD. Note: Needs checkpoint directory to be set.

Parameters

updateFunc

State update function. Note, that this function may generate a different tuple with a different key than the input key. Therefore keys may be removed or added in this way. It is up to the developer to decide whether to remember the partitioner despite the key being changed.

partitioner

Partitioner for controlling the partitioning of each RDD in the new DStream

@JvmName(name = "updateStateByKeyNullable")

fun <K, V, S> JavaDStream<Tuple2<K, V>>.updateStateByKey(partitioner: Partitioner, initialRDD: JavaRDD<Tuple2<K, S>>, updateFunc: (List<V>, S?) -> S?): JavaDStream<Tuple2<K, S>>

fun <K, V, S> JavaDStream<Tuple2<K, V>>.updateStateByKey(partitioner: Partitioner, initialRDD: JavaRDD<Tuple2<K, S>>, updateFunc: (List<V>, Optional<S>) -> Optional<S>): JavaDStream<Tuple2<K, S>>

Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values of the key. org.apache.spark.Partitioner is used to control the partitioning of each RDD. Note: Needs checkpoint directory to be set.

Parameters

updateFunc

State update function. If this function returns null, then corresponding state key-value pair will be eliminated.

partitioner

Partitioner for controlling the partitioning of each RDD in the new DStream.

initialRDD

initial state value of each key.