Independent extension of the IB Data Science coursework referenced in my CV. The three notebooks prior to this were more about filling in gaps or plotting already-processed data. Everything in this notebook is entirely my own work (including the bullet-point objective lists) and was completed of my own volition.
I was keen to have a go at fitting a model to a set of "training data" and then testing the goodness of fit on new data. This was done by deliberately training the model on only the first 70-80% of the dataset and keeping the remaining data for testing, as sketched below. The data used is a time series of atmospheric carbon dioxide concentrations.
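A minimal sketch of that chronological split, using a synthetic stand-in for the CO2 record (the array names and the 75% cut-off here are illustrative assumptions, not the notebook's actual values):

```python
import numpy as np

# Hypothetical stand-in for the CO2 time series used in the notebook:
# 't' is time in years, 'co2' is concentration in ppm (synthetic here).
rng = np.random.default_rng(0)
t = np.arange(1980, 2020, 1 / 12)                      # monthly samples
co2 = (340 + 1.8 * (t - 1980)                          # slow upward trend
       + 3 * np.sin(2 * np.pi * t)                     # yearly fluctuation
       + rng.normal(0, 0.3, t.size))                   # measurement noise

# Train on the first ~75% of the record, keep the rest for testing.
split = int(0.75 * t.size)
t_train, co2_train = t[:split], co2[:split]
t_test, co2_test = t[split:], co2[split:]
```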
Python libraries NumPy and matplotlib were used for processing and plotting the data respectively. The training data was first fitted to a linear and a quadratic model by linear regression (least squares). Next, I looked at the residuals of these models (a residual being the difference between an observed datapoint and the model's prediction at that point). I took an FFT (Fast Fourier Transform) of these residuals, which, in short, reveals the frequencies of any periodic components in the data (the most obvious being the yearly fluctuations). Knowing what frequency of sinusoid to use, I then found the coefficients of the sinusoids by least squares, giving models for the residuals. Making the combined models is then as simple as adding the residual models to the linear/quadratic trends. From here I extended the models over all the data (including the test data) and compared the two; the sketch below walks through these steps.
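The following is a compact sketch of that pipeline, not the notebook's actual code: it reuses the hypothetical synthetic data and split from above, fits each trend with `np.polyfit`, picks the dominant residual frequency from an FFT, fits a sin/cos pair at that frequency by least squares, and scores the combined models on the held-out data via RMSE (the choice of RMSE as the comparison metric is an assumption).

```python
import numpy as np

# Recreate the hypothetical data and 75% split from the earlier sketch.
rng = np.random.default_rng(0)
t = np.arange(1980, 2020, 1 / 12)
co2 = (340 + 1.8 * (t - 1980) + 3 * np.sin(2 * np.pi * t)
       + rng.normal(0, 0.3, t.size))
split = int(0.75 * t.size)
t_train, co2_train, t_test, co2_test = t[:split], co2[:split], t[split:], co2[split:]

def fit_seasonal(time, resid, dt=1 / 12):
    """Find the dominant frequency in the residuals via an FFT, then fit a
    sinusoid at that frequency by least squares (sin + cos + offset basis)."""
    spectrum = np.abs(np.fft.rfft(resid - resid.mean()))
    freqs = np.fft.rfftfreq(resid.size, d=dt)           # cycles per year
    f0 = freqs[np.argmax(spectrum[1:]) + 1]             # skip the zero-frequency bin
    A = np.column_stack([np.sin(2 * np.pi * f0 * time),
                         np.cos(2 * np.pi * f0 * time),
                         np.ones_like(time)])
    coef, *_ = np.linalg.lstsq(A, resid, rcond=None)
    return f0, coef

def seasonal(time, f0, coef):
    """Evaluate the fitted sinusoidal residual model at the given times."""
    return (coef[0] * np.sin(2 * np.pi * f0 * time)
            + coef[1] * np.cos(2 * np.pi * f0 * time) + coef[2])

# Fit linear and quadratic trends by least squares, model the residuals of
# each, then compare the combined (trend + seasonal) models on the test data.
for name, degree in [("linear", 1), ("quadratic", 2)]:
    trend_coef = np.polyfit(t_train, co2_train, degree)
    resid = co2_train - np.polyval(trend_coef, t_train)
    f0, sin_coef = fit_seasonal(t_train, resid)
    pred_test = np.polyval(trend_coef, t_test) + seasonal(t_test, f0, sin_coef)
    rmse = np.sqrt(np.mean((co2_test - pred_test) ** 2))
    print(f"{name} + seasonal: peak at {f0:.2f} cycles/yr, test RMSE {rmse:.2f} ppm")
```

On the synthetic data the FFT peak lands at roughly one cycle per year, mirroring the yearly fluctuation mentioned above; with the real CO2 record the same steps apply, only the arrays change.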