Analyze power usage data from three submeters over a period of four years.
The premise of the initial analysis is that a law firm has contacted our data science team to determine whether its client was occupying a residence during an undisclosed period in the summer of 2008. The client claims not to have been occupying the residence during that period. The data was retrieved from a SQL database and contains the date/time of each reading, as well as the power usage for Sub_metering_1, Sub_metering_2, and Sub_metering_3. The meters take a reading every minute, and power usage is measured in watt-hours. The data spans December 2006 to November 2010. Because the years 2006 and 2010 are incomplete, they were removed, and only the years 2007-2009 were analyzed. For this part, we were asked to perform a preliminary statistical analysis to answer the law firm's question. A report detailing the results is included, named "Report.pdf".
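The filtering step described above can be sketched as follows. This is a minimal illustration, not the code used for the report: the `DateTime` field name, the helper functions `complete_years` and `daily_totals`, and the sample rows are all hypothetical, and only the three `Sub_metering_*` names come from the data itself. It keeps readings from the complete years and sums the three submeters per day, which is one plausible way to look for days with unusually low usage.

```python
from collections import defaultdict
from datetime import date, datetime


def complete_years(rows, first=2007, last=2009):
    """Keep only readings whose timestamp falls in the fully covered years."""
    return [r for r in rows if first <= r["DateTime"].year <= last]


def daily_totals(rows):
    """Sum the three submeters per calendar day (watt-hours)."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["DateTime"].date()] += (
            r["Sub_metering_1"] + r["Sub_metering_2"] + r["Sub_metering_3"]
        )
    return dict(totals)


def reading(ts, s1, s2, s3):
    """Build one hypothetical row; 'DateTime' is an assumed field name."""
    return {"DateTime": ts, "Sub_metering_1": s1,
            "Sub_metering_2": s2, "Sub_metering_3": s3}


# Synthetic rows for illustration only, not values from the real data set.
rows = [
    reading(datetime(2006, 12, 31, 23, 59), 1.0, 0.0, 17.0),  # dropped: 2006
    reading(datetime(2008, 7, 1, 0, 0), 0.0, 1.0, 1.0),
    reading(datetime(2008, 7, 1, 0, 1), 0.0, 2.0, 1.0),
    reading(datetime(2010, 1, 1, 0, 0), 0.0, 0.0, 18.0),      # dropped: 2010
]

kept = complete_years(rows)
totals = daily_totals(kept)
print(totals)
```

Days whose totals stay near the baseline drawn by always-on appliances would be candidates for an unoccupied residence, though the threshold for "near baseline" is a judgment call that this sketch does not make.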
In the second part of the analysis, the same data was used with a different premise. Here, the client has provided us with the data above in order to learn how they use their power and where they can cut usage to become more environmentally friendly and reduce their carbon footprint. A more detailed visualization is produced for each submeter, and the month of September 2008 is analyzed thoroughly. Methods include linear regression forecasting, time series decomposition, and Holt-Winters forecasting. Although the data originally contains an energy reading every minute, the granularity was reduced to one reading every 30 minutes for clarity. A report detailing the results is included, named "Time Series Analysis.pdf".
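As a rough illustration of two of these steps, the sketch below reduces one-minute readings to 30-minute bins and fits a straight-line trend that can be extrapolated forward. It makes several assumptions: the reduction is done by summing each 30-minute window (sampling every 30th reading is an equally plausible choice), the function names are hypothetical, the input series is synthetic, and the regression is a plain least-squares fit against the bin index rather than the report's actual model. Decomposition and Holt-Winters forecasting are not shown.

```python
from datetime import datetime, timedelta


def downsample_30min(readings):
    """Sum (timestamp, watt_hours) pairs into 30-minute bins keyed by bin start."""
    bins = {}
    for ts, wh in readings:
        start = ts.replace(minute=(ts.minute // 30) * 30, second=0, microsecond=0)
        bins[start] = bins.get(start, 0.0) + wh
    return sorted(bins.items())


def fit_linear_trend(values):
    """Least-squares fit of values against their index 0..n-1 (needs n >= 2)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(slope, intercept, n, steps):
    """Extrapolate the fitted line for the next `steps` bins after index n-1."""
    return [intercept + slope * (n + k) for k in range(steps)]


# Synthetic two-hour series: usage per minute steps up every half hour.
start = datetime(2008, 9, 1)
readings = [(start + timedelta(minutes=i), 1.0 + (i // 30)) for i in range(120)]

binned = downsample_30min(readings)
series = [wh for _, wh in binned]
slope, intercept = fit_linear_trend(series)
predicted = forecast(slope, intercept, len(series), steps=2)
print(series, slope, intercept, predicted)
```

A straight line ignores the daily and weekly cycles that household power data typically shows, which is why seasonal methods such as decomposition and Holt-Winters are usually applied alongside it.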