Probabilistic_Forecasting-Temperature

Kaggle Community Challenge 04/02/2024-07/21/2024

https://www.kaggle.com/competitions/probabilistic-forecasting-i-temperature

This project is a solution to the Kaggle open community challenge on probabilistic temperature forecasting, which asks for the likelihood of a range of possible future temperatures rather than a single deterministic forecast. The approach is an ensemble that combines a traditional time series model with a machine learning model. Holt-Winters, with ARIMA(9,1,0) fitted to its residuals, generates the initial forecasts. LightGBM then produces the final forecast along with a range of possible temperature outcomes, each with an associated probability, which reflects the inherent uncertainty in weather predictions. The solution achieved a Continuous Ranked Probability Score (CRPS) of 1.2035 and placed 10th on the final leaderboard as Team Naive Forecasters.

Dataset

The training data has 64,320 rows spanning 2016-07-01 to 2018-05-01. The test data has 5,360 rows spanning 2018-05-02 to 2018-06-26. Each row consists of a timestamp and 6 anonymized features. The target is the Temperature column. The temperature is recorded once every 15 minutes, i.e. 96 times per day.
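As a quick sanity check on the sampling arithmetic above, the sketch below counts how many 15-minute readings a date range should contain. It assumes both end dates are inclusive and that no readings are missing; `expected_rows` is a hypothetical helper, not something taken from the project notebook.

```python
from datetime import date

STEPS_PER_DAY = 24 * 60 // 15  # 96 readings per day at 15-minute spacing


def expected_rows(first_day, last_day, steps_per_day=STEPS_PER_DAY):
    """Number of readings if every day from first_day to last_day is complete."""
    n_days = (last_day - first_day).days + 1  # +1 because both ends are inclusive
    return n_days * steps_per_day


train_rows = expected_rows(date(2016, 7, 1), date(2018, 5, 1))
test_rows = expected_rows(date(2018, 5, 2), date(2018, 6, 26))
print(train_rows)  # 64320
print(test_rows)   # 5376
```

Under these assumptions the training range matches the stated 64,320 rows exactly (670 days at 96 readings each). The test range would hold 5,376 readings if all 56 days were complete, so the stated 5,360 rows is 16 readings (four hours) short of that.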

Methodology

Exploratory Data-Analysis

The autocorrelation chart for 96 lags shows that the autocorrelation of the temperature at every lag lies well above the horizontal significance threshold (and well above zero). Consequently, we generate 96 time-lagged temperature features in our approach. Additionally, we create cosine and sine features to capture cyclic patterns.
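The two kinds of features described above can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's code: the helper names are hypothetical, and encoding the time of day with a 96-step period is an assumption (the source does not say which cycles were encoded).

```python
import math

STEPS_PER_DAY = 96  # one reading every 15 minutes


def cyclic_features(step_of_day, period=STEPS_PER_DAY):
    """Map a position in the cycle to a (sin, cos) pair so that the end of
    one cycle sits next to the start of the following one."""
    angle = 2 * math.pi * step_of_day / period
    return math.sin(angle), math.cos(angle)


def lag_features(series, n_lags=STEPS_PER_DAY):
    """Build (lags, target) rows, where lags[0] is the value one step back
    and lags[-1] is the value n_lags steps back."""
    rows = []
    for t in range(n_lags, len(series)):
        lags = [series[t - k] for k in range(1, n_lags + 1)]
        rows.append((lags, series[t]))
    return rows


# Toy usage with 3 lags instead of 96 to keep the output readable.
toy_rows = lag_features([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], n_lags=3)
print(toy_rows[0])  # ([2, 1, 0], 3)
```

The first `n_lags` readings produce no row because they lack a full lag history. With a pandas DataFrame the same lag columns could be built with `shift`.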

HeatMap

WorkFlow

1. Holt-Winters + ARIMA(9,1,0)

2. LightGBM
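To make the first step of the workflow above more concrete, here is a small pure-Python sketch of additive Holt-Winters smoothing that returns one-step-ahead fitted values and their residuals. It is an illustration under assumptions, not the notebook's implementation: the additive form, the initialization, and the smoothing parameters `alpha`, `beta`, `gamma` are all hypothetical choices. The residuals are the series that an ARIMA(9,1,0) model would then be fitted to; that fit is not shown here.

```python
def holt_winters_additive(y, m, alpha=0.3, beta=0.1, gamma=0.2):
    """Additive Holt-Winters with seasonal period m.

    Returns (fitted, residuals), where fitted[t] is the one-step-ahead
    prediction made before observing y[t]. Requires len(y) >= 2 * m.
    """
    # Simple initialization from the first two seasons (an assumption).
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]

    fitted = []
    for t, obs in enumerate(y):
        s = season[t % m]
        fitted.append(level + trend + s)
        # Update level, trend and the seasonal term for this position.
        new_level = alpha * (obs - s) + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - new_level) + (1 - gamma) * s
        level = new_level

    residuals = [obs - fit for obs, fit in zip(y, fitted)]
    return fitted, residuals


# Toy usage: a perfectly periodic series with period 4 leaves no residual.
toy_series = [10.0, 12.0, 15.0, 11.0] * 5
toy_fitted, toy_residuals = holt_winters_additive(toy_series, m=4)
```

In practice, statsmodels offers `ExponentialSmoothing` and `ARIMA` classes for these two models, so a hand-written loop like this is mainly useful for seeing what the residual series is.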

Performance on Validation Data

The Continuous Ranked Probability Score (CRPS) on the validation data using the above workflow is 0.25934600174955524.
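For readers unfamiliar with the metric, the sketch below computes CRPS for a single observation from a set of forecast samples, using the standard identity CRPS = E|X - y| - 0.5 * E|X - X'|. This is a generic illustration: `crps_from_samples` is a hypothetical helper, and the competition's scorer may be formulated differently (for example, from quantiles), so it should not be read as the code that produced the number above.

```python
def crps_from_samples(samples, observed):
    """Sample-based CRPS for one observation (lower is better).

    The first term rewards forecasts close to the observation, the second
    term credits the forecast for its own spread.
    """
    n = len(samples)
    accuracy = sum(abs(x - observed) for x in samples) / n
    spread = sum(abs(a - b) for a in samples for b in samples) / (n * n)
    return accuracy - 0.5 * spread


# With a single sample the score reduces to the absolute error.
point_score = crps_from_samples([5.0], 3.0)
spread_score = crps_from_samples([1.0, 2.0, 3.0], 2.0)
```

A dataset-level score would be the average of this quantity over all timestamps.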

Coverage-Report

The figure below shows the median prediction, the target, and the quantiles (shown as prediction intervals) for the validation data:

According to the coverage report, the ensemble model met all the coverage criteria on the validation data. The model was therefore retrained on the entire training dataset and then used to forecast the test data. The Continuous Ranked Probability Score (CRPS) on the test data is 1.2035.
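A coverage check of the kind referred to above can be sketched as the fraction of observed values that fall inside a prediction interval. The exact criteria used in the report are not stated here, so this is an assumption about its general shape, and `empirical_coverage` is a hypothetical helper.

```python
def empirical_coverage(actual, lower, upper):
    """Fraction of observations inside their [lower, upper] interval."""
    hits = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return hits / len(actual)


# Toy usage: two of the four observations fall inside their intervals.
toy_coverage = empirical_coverage(
    actual=[1.0, 2.0, 3.0, 4.0],
    lower=[0.0, 0.0, 4.0, 3.0],
    upper=[2.0, 1.0, 5.0, 5.0],
)
print(toy_coverage)  # 0.5
```

For a well-calibrated forecast, an interval built from, say, the 5% and 95% quantiles should cover roughly 90% of the observations.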

Contributors

Sai Harsha Vardhan Reddy, Kolan- skolan@horizon.csueastbay.edu, harsha62334@gmail.com
Veera Venkata Sai Kalyan, Kaparaju- vkaparaju@horizon.csueastbay.edu, kvvsaikalyan01@gmail.com

Thanks for reading!
