
Udacity Data Streaming Nanodegree SF Crime Statistics with Spark Streaming


rubengura/SF_Crime_Statistics


SF Crime Statistics with Spark Streaming Project

Introduction

The aim of the project is to create a streaming application with Spark that connects to a Kafka cluster, reads the data from a topic, and processes it.
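As a rough illustration of that flow, the following is a minimal sketch of a Structured Streaming job that subscribes to a Kafka topic, parses each message as JSON, and prints a running count to the console. It is not the repository's data_stream.py: the field names in SCHEMA_FIELDS, the function name build_query, and the choice to group by crime_type are all hypothetical, and the broker address and topic simply reuse the placeholders from the console consumer command.

```python
# Sketch only: assumes PySpark plus the spark-sql-kafka package are available
# at run time. Field names and the grouping column are hypothetical.

KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "localhost:9092",  # broker used by the console consumer
    "subscribe": "topic-name",                    # placeholder topic name
    "startingOffsets": "earliest",                # read the topic from the beginning
}

# Hypothetical message fields; replace with the fields of the real records.
SCHEMA_FIELDS = ["crime_id", "crime_type", "call_time"]


def build_query(spark):
    """Start a streaming query that counts records per crime_type."""
    # Imported lazily so this module can be loaded without PySpark installed.
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    schema = StructType(
        [StructField(name, StringType(), True) for name in SCHEMA_FIELDS]
    )

    # Connect to Kafka and subscribe to the topic.
    reader = spark.readStream.format("kafka")
    for key, value in KAFKA_OPTIONS.items():
        reader = reader.option(key, value)
    raw = reader.load()

    # Kafka delivers the payload as bytes: cast to string, then parse the JSON.
    parsed = (
        raw.selectExpr("CAST(value AS STRING) AS value")
        .select(from_json(col("value"), schema).alias("record"))
        .select("record.*")
    )

    # Running count per (hypothetical) crime type, printed on every trigger.
    counts = parsed.groupBy("crime_type").count()
    return counts.writeStream.outputMode("complete").format("console").start()


# Usage (needs a running broker and the Kafka connector on the classpath):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.master("local[*]").appName("sketch").getOrCreate()
#   build_query(spark).awaitTermination()
```

The "complete" output mode is used here because the aggregation has no watermark, so the console sink reprints the full count table on each trigger.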

Requirements

  • Java 1.8.x
  • Scala 2.11.x
  • Spark 2.4.x
  • Kafka
  • Python 3.6 or above
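Before starting the services, it can help to confirm that the tools are reachable. The snippet below is a small sketch that checks the Python version and looks for a few executables on the PATH. The tool names in REQUIRED_TOOLS are assumptions based on the commands used in this README, and the check only confirms presence; it does not verify the Java, Scala, Spark, or Kafka versions.

```python
import shutil
import sys

# Assumed executable names, taken from the commands used in this README.
REQUIRED_TOOLS = ["java", "spark-submit", "kafka-server-start"]


def python_ok(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.6 or above."""
    return tuple(version_info[:2]) >= (3, 6)


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


print("Python >= 3.6:", python_ok())
print("Missing from PATH:", missing_tools() or "none")
```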

How to use the application

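To run the application, start the following components in order, each in its own terminal:

  1. Zookeeper:

/usr/bin/zookeeper-server-start config/zookeeper.properties

  2. Kafka server:

/usr/bin/kafka-server-start config/server.properties

  3. Insert data into topic:

python kafka_server.py

  4. Kafka consumer:

kafka-console-consumer --topic "topic-name" --from-beginning --bootstrap-server localhost:9092

  5. Run Spark job:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.4 --master local[*] data_stream.py

Note: the version at the end of the --packages coordinate (2.3.4 here) should match the installed Spark version, and the _2.11 suffix should match the Scala version. With Spark 2.4.x, use the corresponding 2.4.x artifact.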
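Because the Kafka connector coordinate encodes both the Scala and the Spark version, one way to keep it in step with the local installation is to build the spark-submit command from those two values. The helper below is a sketch: the function names are hypothetical, and the default spark_version of 2.4.0 is an assumption standing in for whichever 2.4.x release is actually installed.

```python
import shlex


def kafka_package(scala_version="2.11", spark_version="2.4.0"):
    """Build the Maven coordinate of the Spark Kafka connector."""
    return "org.apache.spark:spark-sql-kafka-0-10_{}:{}".format(
        scala_version, spark_version
    )


def spark_submit_command(script="data_stream.py", master="local[*]",
                         scala_version="2.11", spark_version="2.4.0"):
    """Return the spark-submit invocation as an argument list."""
    return [
        "spark-submit",
        "--packages", kafka_package(scala_version, spark_version),
        "--master", master,
        script,
    ]


# Print a shell-safe version of the command (local[*] gets quoted).
print(" ".join(shlex.quote(arg) for arg in spark_submit_command()))
```

The argument list can be passed directly to subprocess.run, which avoids shell globbing of local[*].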

Kafka Consumer Console Output

[Screenshot: Kafka consumer output]

Progress Reporter

[Screenshot: progress reporter]

Count Output

[Screenshot: count output]
