Measuring solar radiation directly is difficult and expensive. The goal of this project is to build a regression model that predicts solar radiation as accurately as possible.
- pandas: data processing
- numpy: linear algebra
- seaborn: data visualization
- matplotlib: data visualization
- datetime: manipulating date and time data
- sklearn: machine learning
- The original dataset
- Obtained from Kaggle, the most recent stop in the data's provenance
- Original source: NASA
- Variables in the dataset: solar radiation, temperature, humidity, barometric pressure, wind direction, wind speed, and sunrise/sunset times (Hawaii time).
- A Jupyter Notebook to clean and wrangle the data
- A look at the data: data types, distributions of numerical fields
- Data cleaning
- Feature engineering: add dummy variables and higher-order terms
- Correlation matrix
- The output file from DataWrangling.ipynb
- A Jupyter Notebook to build linear regression models
- Model 1: Linear Regression (No Higher-Order Terms)
- Model 2: Linear Regression (With Higher-Order Terms)
- Model 3: Ridge Regression
- Model 4: Lasso Regression
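
The four models listed above could be compared along the following lines. This is only a minimal sketch, not the notebook's actual code: it uses synthetic stand-in data rather than the project's dataset, treats "higher-order terms" as degree-2 polynomial features, assumes Ridge and Lasso are fit on those same terms, and picks the `alpha` values arbitrarily for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic stand-in data (NOT the project's dataset): three made-up
# features and a target that depends on one of them quadratically.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] ** 2 - X[:, 2] + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def with_higher_order_terms(estimator):
    """Hypothetical helper: add degree-2 terms, scale, then fit the estimator."""
    return make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        estimator,
    )


# The alpha values below are illustrative assumptions, not tuned settings.
models = {
    "Model 1: linear, no higher-order terms": make_pipeline(
        StandardScaler(), LinearRegression()
    ),
    "Model 2: linear, with higher-order terms": with_higher_order_terms(
        LinearRegression()
    ),
    "Model 3: ridge": with_higher_order_terms(Ridge(alpha=1.0)),
    "Model 4: lasso": with_higher_order_terms(Lasso(alpha=0.01)),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Wrapping each estimator in a pipeline keeps the scaling and polynomial expansion fitted on the training split only, which avoids leaking test-set statistics into the comparison.
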
This project is licensed under the MIT License.