kotlin-spark-api_3.3.2_2.13/org.jetbrains.kotlinx.spark.api/withSparkStreaming

withSparkStreaming

fun withSparkStreaming(
    batchDuration: Duration = Durations.seconds(1L),
    checkpointPath: String? = null,
    hadoopConf: Configuration = SparkHadoopUtil.get().conf(),
    createOnError: Boolean = false,
    props: Map<String, Any> = emptyMap(),
    master: String = SparkConf().get("spark.master", "local[*]"),
    appName: String = "Kotlin Spark Sample",
    timeout: Long = -1L,
    startStreamingContext: Boolean = true,
    func: KSparkStreamingSession.() -> Unit
)

Wrapper for setting up Spark Streaming. spark: SparkSession and ssc: JavaStreamingContext are provided inside func, and the streaming context is started, awaited, and stopped automatically. Using a checkpoint directory is optional. If checkpoint data exists in the provided checkpointPath, the StreamingContext is recreated from that data. If it does not exist, a new JavaStreamingContext is created from the other provided parameters.
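A minimal usage sketch, assuming the kotlin-spark-api and Spark Streaming dependencies are on the classpath and that a text source is listening on localhost:9999 (for example, one started with `nc -lk 9999`). The host, port, batch interval, and timeout are illustrative values, not part of this API.

```kotlin
import org.apache.spark.streaming.Durations
import org.jetbrains.kotlinx.spark.api.*

fun main() = withSparkStreaming(
    batchDuration = Durations.seconds(1), // one batch per second
    timeout = 10_000L,                    // stop waiting after 10 seconds
) { // this: KSparkStreamingSession
    // Assumed input source: a plain-text socket on localhost:9999.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Split every incoming line into words.
    val words = lines.flatMap { it.split(" ").iterator() }

    // Print the first elements of each batch to stdout.
    words.print()
}
```

The block only declares the stream. With startStreamingContext left at its default, the wrapper starts the context after the block returns and waits for termination or for the timeout.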

Parameters

batchDuration

The time interval at which streaming data will be divided into batches. Defaults to 1 second.

checkpointPath

If checkpoint data exists in the provided checkpointPath, then StreamingContext will be recreated from the checkpoint data. If the data does not exist (or null is provided), then the streaming context will be built using the other provided parameters.

hadoopConf

Only used if checkpointPath is given. Hadoop configuration to use, if needed, when reading checkpoint data from an HDFS-compatible file system.

createOnError

Only used if checkpointPath is given. Whether to create a new JavaStreamingContext if there is an error in reading checkpoint data.

props

Spark options. Value types are checked for type-correctness at runtime.

master

Sets the Spark master URL to connect to, such as "local" to run locally, "local[4]" to run locally with 4 cores, or "spark://master:7077" to run on a Spark standalone cluster. By default, it tries to get the system value "spark.master"; otherwise it uses "local[*]".

appName

Sets a name for the application, which will be shown in the Spark web UI. Defaults to "Kotlin Spark Sample".

timeout

The time in milliseconds to wait for the stream to terminate without input. Defaults to -1, which means no timeout.

startStreamingContext

Defaults to true. If set to false, then the streaming context will not be started.

func

Function that is executed in the context of KSparkStreamingSession; inside the block, this refers to the KSparkStreamingSession.
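The parameters above can be combined as in the following sketch, which enables checkpoint recovery. The checkpoint directory, the property in props, the master URL, the application name, and the socket source are all hypothetical values chosen for illustration. It assumes the same dependencies as any other use of this function.

```kotlin
import org.apache.spark.streaming.Durations
import org.jetbrains.kotlinx.spark.api.*

fun main() = withSparkStreaming(
    batchDuration = Durations.seconds(2),
    // Hypothetical directory; if checkpoint data exists here, the context is recreated from it.
    checkpointPath = "/tmp/streaming-checkpoint",
    // Create a fresh context if the checkpoint data cannot be read.
    createOnError = true,
    // Illustrative Spark option, passed as a String value.
    props = mapOf("spark.ui.enabled" to "false"),
    // Two local threads: one for the socket receiver, one for processing.
    master = "local[2]",
    appName = "Checkpointed stream",
    timeout = 30_000L, // milliseconds
) { // this: KSparkStreamingSession
    // Assumed input source: a plain-text socket on localhost:9999.
    ssc.socketTextStream("localhost", 9999).print()
}
```

hadoopConf is left at its default here; it would only need to be supplied when the checkpoint directory lives on a file system that requires custom Hadoop settings.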