The aim of the project is to create a streaming application with Spark that connects to a Kafka cluster, then reads and processes the data.
Requirements:
- Java 1.8.x
- Scala 2.11.x
- Spark 2.4.x
- Kafka
- Python 3.6 or above
To run the application, run the following in order:
- Zookeeper:
/usr/bin/zookeeper-server-start config/zookeeper.properties
- Kafka server:
/usr/bin/kafka-server-start config/server.properties
- Insert data into topic:
python kafka_server.py
- Kafka consumer:
kafka-console-consumer --topic "topic-name" --from-beginning --bootstrap-server localhost:9092
- Run Spark job:
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.4 --master local[*] data_stream.py
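
The steps above do not show what the Spark job itself looks like. As a rough illustration only, a Structured Streaming script that reads a Kafka topic could be sketched as follows. This is not the project's actual `data_stream.py`: the function names `kafka_source_options` and `run`, the application name, and the console sink are assumptions made for the example, and the broker address and topic simply reuse the placeholders from the consumer command.

```python
"""Minimal sketch of a Structured Streaming job that reads from Kafka.

Hypothetical example: names and defaults are assumptions, not the
project's real data_stream.py.
"""


def kafka_source_options(bootstrap_servers="localhost:9092", topic="topic-name"):
    """Build the options passed to Spark's Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Read the topic from its earliest offset, similar in spirit to
        # --from-beginning on the console consumer.
        "startingOffsets": "earliest",
    }


def run():
    """Start the stream and print each micro-batch to the console."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    # The Kafka source exposes key and value as binary columns.
    raw = (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options())
        .load()
    )

    # Cast the payload to a string; real processing would parse it further.
    messages = raw.selectExpr("CAST(value AS STRING) AS value")

    query = (
        messages.writeStream
        .format("console")
        .outputMode("append")
        .start()
    )
    query.awaitTermination()


# To submit this file with spark-submit, add a call to run() at the end.
```

The Kafka source is not bundled with Spark, which is why the `spark-submit` command passes the connector through `--packages`. The connector version is normally chosen to match the installed Spark version.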