With the broader adoption of message brokers like Apache Kafka, as well as distributed, message-sending architectures, the need for tools that can quickly process vast amounts of data has become critical. Several competing products fill this need, including Spark Streaming. In this talk, we will look at the use cases for stream processing and how Spark's concept of distributed batch processing reduces to micro-batches in the streaming case. We will cover the two streaming models for Spark, DStreams and Structured Streaming with DataFrames, and will see examples of streaming applications in Scala and F#.
ADDITIONAL MEDIA
No recordings or additional media are available for this talk.
Sarfaraz Hussain has a four-part series on Spark Structured Streaming. Part one is an introduction to the topic. Part two covers some of the basics of query structure and checkpointing. Part three introduces the concept of stateful streaming. Part four covers late-arriving data.
Ligh-rain explains the two window types for Spark Streaming: tumbling and sliding. This paper treats sliding windows as the same thing as hopping windows, but there is a minor difference between the two and, properly speaking, Spark Streaming windows are sliding, not hopping.
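To make the tumbling versus sliding distinction concrete, here is a minimal sketch in plain Python of how a single event timestamp might be assigned to windows. It is not Spark's API: the function names `tumbling_window` and `sliding_windows` are made up for illustration, timestamps are assumed to be whole seconds, and windows are assumed to be half-open intervals `[start, end)` aligned to multiples of the window size or slide interval.

```python
def tumbling_window(ts, size):
    """Return the single (start, end) window containing ts.

    Tumbling windows are fixed-size and do not overlap, so every
    event lands in exactly one window.
    """
    start = (ts // size) * size
    return (start, start + size)


def sliding_windows(ts, size, slide):
    """Return every (start, end) window containing ts.

    Windows start at multiples of `slide` and are `size` long, so when
    slide < size the windows overlap and one event lands in several.
    """
    windows = []
    # Latest window start at or before ts.
    start = (ts // slide) * slide
    # Walk backwards while the window still covers ts (ts < start + size).
    while start > ts - size:
        windows.append((start, start + size))
        start -= slide
    return sorted(windows)


if __name__ == "__main__":
    # An event at t=7s with 10-second tumbling windows.
    print(tumbling_window(7, 10))
    # The same event with 10-second windows sliding every 5 seconds.
    print(sliding_windows(7, 10, 5))
```

Under these assumptions, an event at second 7 falls into one tumbling window, `(0, 10)`, but into two overlapping sliding windows, `(0, 10)` and `(5, 15)`. When the slide interval equals the window size, the sliding case collapses to the tumbling case.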
Microsoft explains different window types. This is specifically for Azure Stream Analytics, so Spark Streaming doesn't support all of these, but it does give you a good idea of the sorts of windows you might find in products.