
This Python-based project extracts data from Wikipedia using Apache Airflow, cleans it, and pushes it to Azure Data Lake. Further processing and visualization are done with Azure Data Factory, Azure Synapse, and Tableau.


Punam918/FootballAnalysis_DataEngineering


Football Data Engineering

This project leverages Python and Apache Airflow to extract data from Wikipedia. The extracted data is cleaned and stored in Azure Data Lake for further processing. Subsequent data transformation and analysis are performed using Azure Data Factory and Azure Synapse Analytics. Finally, the processed data is visualized using Tableau, enabling actionable insights and decision-making.

Requirements

  • Python 3.10
  • Docker
  • MySQL
  • Azure
  • Apache Airflow 2.6

System Architecture

[Figure: system architecture diagram]

Fetching And Processing Data

Football stadium data, ranked by capacity, was fetched from the Wikipedia page https://en.wikipedia.org/wiki/List_of_association_football_stadiums_by_capacity using Apache Airflow. The fetched data was transformed, stored in Azure Data Lake, and then queried using Azure Synapse.
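The extract-and-clean step might look roughly like the sketch below. This is not the project's actual code. It uses only the Python standard library, and the names `TableParser`, `clean_rows`, `to_csv`, and `SAMPLE_HTML` are hypothetical. The sample table contains made-up rows and assumes the columns are named Stadium, Capacity, and Country. In the real pipeline the HTML would presumably be downloaded from the Wikipedia page inside an Airflow task and the resulting CSV written to Azure Data Lake; neither step is shown here.

```python
import csv
import io
import re
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of the first <table> on a page, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None     # cells of the row currently being read
        self._cell = None    # text fragments of the cell currently being read
        self._depth = 0      # nesting depth of <table> tags
        self._done = False   # set once the first table has closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done:
            return
        if tag == "table" and self._depth > 0:
            self._depth -= 1
            self._done = self._depth == 0
        elif self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so each cell is a single clean string.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def strip_footnotes(text):
    """Remove bracketed footnote markers such as [1] or [note 2]."""
    return re.sub(r"\[[^\]]*\]", "", text).strip()


def clean_rows(rows):
    """Turn raw rows (header first) into dicts with an integer capacity."""
    if not rows:
        return []
    header = [strip_footnotes(h).lower().replace(" ", "_") for h in rows[0]]
    cleaned = []
    for row in rows[1:]:
        if len(row) != len(header):
            continue  # skip malformed or merged rows
        record = {key: strip_footnotes(value) for key, value in zip(header, row)}
        if "capacity" in record:
            digits = re.sub(r"\D", "", record["capacity"])
            record["capacity"] = int(digits) if digits else None
        cleaned.append(record)
    return cleaned


def to_csv(records):
    """Serialize the cleaned records to CSV text, ready to upload."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample standing in for the downloaded Wikipedia HTML.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Stadium</th><th>Capacity</th><th>Country</th></tr>
  <tr><td>Example Stadium A</td><td>100,000<sup>[1]</sup></td><td>Exampleland</td></tr>
  <tr><td>Example Stadium B</td><td>85,500</td><td>Samplestan</td></tr>
</table>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)
records = clean_rows(parser.rows)
print(to_csv(records))
```

Splitting the work into parse, clean, and serialize functions keeps each one small enough to wrap in its own Airflow task, so a failure in one stage can be retried without repeating the others.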

Visualization

Tableau was used to visualize the data.

