- Elham Amini
- Shikhar Bahadur Bhandari
- JP Burgess
- Tyler Lentini
- Stan Misina
- Tony Prahtchaya Poonsombaht
A study of how natural factors such as solar irradiance, solar angle, cloud cover, rainfall and more effect solar panel power generation.
The reason we selected this topic is to hopefully encourage state governments and buisnesses to invest in solar energy stations and solar technology. We would like to show how location and weather patterns can inform the potential for solar panel electrical generation. We propose building a machine learning model that takes solar irradiance, temperature, wind, cloud cover and solar angle into account to provide an estimate of solar panel generation potential. Hopefully we could use this model to encourage state governments to invest in solar power by showing the predicted amount of power generation. We hope to show promising results, and enthuse our government into writing and approving more policies and bills in regards to the production of solar stations and solar technology.
- Electrical output
- Irradiance
- Temperature
- Wind
- Solar Angle
A database covering minute to minute solar panel output and weather data from 2015 to 2018. Export to CSV.
- Temperature
- Wind
- Solar Irradiance
- Solar Zenith Angle
Database covering hourly global solar irradiance. We can customize the api attributes based on features we need. Database converted to csv.
- Can we find a relation between weather patterns and solar power generation to inform state governments of the potential for solar power in their area?
- Can we find a relation between weather patterns and solar power generation?
- Do weather patterns have an effect on solar power generation?
- Could these predictions lead to policy adoption and changes?
- Can the study be used to promote investment in solar power generation facilities?
We will use a postgreSQL database to house our data. Sample data is pulled from https://pvdata.nist.gov/ and loaded to the database. We will merge data from our weather table with the features for our linear regression. Our machine learning model takes features from the solarData table and performs linear regression.
We plan on using a linear regression model with weather data as features (solar angle, solar irradiance, temperature, wind) and electrical output of the solar panels as the target. We can use this model to predict the potential energy of a given location based on their climate and latitude using widely available solar irradiance and weather data. We will utilize a PostgreSQL Database connected to our machine learning model which will be run using Python and it's data analytics libraries such as Pandas, SkLearn, and many others. Another possible model we could use is a multi-layer neural network with sigmoid and non-sigmoid activations. The models will be built with static and dynamic configurations for differential testing.
Our data was obtained from two sources: The National Institute of Standards and Technology (NIST) and the National Solar Radiation Database (NREL). The NIST data contained data on three solar arrays located on the NIST campus. The data was downloaded from the NIST website in CSV format with separate folders for each month and each year. A python script was made to extract data from all the CSVs and distill them into one dataframe. Additionally the NIST dataset contained minute to minute data from 2015 to 2018 while the NREL data was in half-hour intervals. The NIST dataset had to be trimmed down to align with the NREL timestamps. The NREL data was obtained through an API and a script that pulled all data from 2015 to 2018.
Description of preliminary feature engineering and preliminary feature selection, including their decision-making process:
The NIST data had many features about the arrays and about the local weather but we decided that we would just use DC voltage output from NIST and use the weather data from NREL. We decided on this approach because the NREL dataset covers the whole globe. This way we can use any dataset which has the dc voltage output of solar panels and timestamps. The NREL dataset contains a number of significant weather features covering solar irradiance, surface weather conditions, and solar zenith angle.
The data was split into training and testing sets using the python library scikit-learn’s function train_test_split. We used the default setting of splitting 25% of the data off for making a test set.
In this project we are looking to use various weather features to predict the voltage output of solar panel arrays. Because of this we knew we would want some kind of regression model. We have a group of weather features but we are not certain which ones will be significant so we knew supervised machine learning would be the route we would take. The combination of regression and supervised learning made us decide that a neural network was the best fit for our analysis. Using a neural net means our model will be able to handle noise in the training data well and may provide a higher degree of accuracy than standard linear regression if we are able to build the model properly. A limitation of neural nets is that the path to optimizing the model is unclear and is mostly dependent on trial and error. This is the main limitation of neural nets. Although they are able to approximate nearly any function it is very difficult to use the model to actually give you the function that is being approximated. This means that we cannot distil our model into a function and we are reliant on the model to provide us with the functions output. Despite these drawbacks we still believe that a neural net is the right tool for our analysis.
Explanation of changes in model choice (if changes occurred between the Segment 2 and Segment 3 deliverables)
In an effort to circumvent the “black box” problem of neural nets where the significance of features is unintelligible we ran our data through other models to get a sense of the importance of the features. Using the random forest model we were able to clearly quantify the correlations between our features and the target. We found that in the random forest model the feature Solar Zenith Angle was by far the most important. The random forest helped us to discover what our seaborn charts and Tableau graphs were visually informing us.
We built a MLPRegressor model based on its success with accurate predictions for vehicle movement in a chaotic environment. We were able to train the model to an R2 test score of 0.958. Incorporating all the database features, it gives our primary model credibility as this second model shows very accurate predictions produced from a different machine learning model.
After trying these two other models we decided to continue using a neural network as our main model.
Being able to look at other models brought us great value in terms of interpretability of our features that would be impossible to find using only our neural network model.
Description of how they have trained the model thus far, and any additional training that will take place
Initially our model was unable to make any predictions. We found, through experimentation, that we needed to add more neurons in order to make the model converge. Once we increased the number of neurons we were left with a highly accurate model with an R2 score of 0.976. Continuing to experiment with different configurations of the model we found that adding more hidden layers and more neurons did not lead to a significant increase in R2 score or MSE loss while becoming much more computationally expensive.
The lowest MSE score at the end of our current run of test phases is 617.92. We found the loss to be a decent figure in comparison to the starting point of approximately 2,300 and initially over 4,000 in our first few run trials. To lower our losses, we have optimized our model through the Keras Optimization feature, trying a variety of additional layers as well as adding more and less neurons. We have also tweaked our optimization technique from “Adam” to “Adamax” and found our losses to have dropped by a considerable margin. After a few trial runs, and seeing our model output a loss of 617.92 at only 50 epochs, we decided to run our model into 250 epochs and saw a decrease in MSE. However, this did lower our R2 score as well. At the moment, we are analyzing an acceptable impact of our loss ratio versus our high R2 score.
Database interfaces with the project in some format (e.g., scraping updates the database, or database connects to the model)
- Tableau
- JavaScript/CSS
- SQL
- Buttons click through to detail data
- Charts allow filtering
- This is our landing page on the web. Our main dashboard shows the data analyis.
- First, our machine learning dashboard shows key features modeled against the target feature. This graph illustrates the relationships between key features and the target, Voltage Output. We used this modeling to guide our testing of the machine learning models.
- Second, our MLM's predictions are modeled against the actual values in the bar and line chart. The orange line represents the predictions and the blue bars are the actual values.
- Filtering is available on multiple features. In addition, we set up the first line chart as a filter for the other charts and graphs. This way you can focus in on a particular month or point of data.