By the end of this project, you will use the Apache Spark Structured Streaming API with Python to stream data from two different sources, store a dataset in the MongoDB database, and join two datasets. The Apache Spark Structured Streaming API is used to continuously stream data from various sources including the file system or a TCP/IP socket. One application is to continuously capture data from weather stations for historical purposes.
Apache Spark SQL
Apache Spark Structured Streaming API
Apache Spark Schema