- Copyright (c) 2019-2021 Sherman Lo
- MIT LICENSE
Concept code for predicting precipitation using model fields (temperature, geopotential, wind velocity, etc.) as predictors for sub-areas across the British Isles. Bayesian inference is used to quantify the uncertainty in the forecasts.
The documentation is designed for Linux users only.
Please see LICENCE for further information on how you can use and modify this repository for your own purpose.
Keywords: Compound Poisson, MCMC sampling, time series, precipitation, weather forecasting
- At least 16 GB of RAM
numpy
pandas
scipy
matplotlib
statsmodels
joblib
cartopy
netCDF4
cftime
Must be version 1.0.4.2 because later releases change how num2date is used; see the cftime change log for further details.
gdal
mpi4py
- Requires MPICH
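Because cftime is pinned to one exact version, it can help to check the installed version before running anything. The following is a minimal sketch using only the standard library; the function names are hypothetical and are not part of this repository.

```python
from importlib import metadata

# Version pinned in the requirements list above.
REQUIRED_CFTIME = "1.0.4.2"


def cftime_version_ok(installed, required=REQUIRED_CFTIME):
    """Return True when the installed version string matches the pin exactly."""
    return installed.strip() == required


def installed_cftime_version():
    """Return the installed cftime version, or None when it is not installed."""
    try:
        return metadata.version("cftime")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_cftime_version()
    if version is None:
        print("cftime is not installed")
    elif cftime_version_ok(version):
        print("cftime", version, "matches the required version")
    else:
        print("cftime", version, "found, but", REQUIRED_CFTIME, "is required")
```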
The data is available on FigShare. Please download and extract the files so that the directories are as follows:
Data/ana/
Data/eobs/
Data/era5/
Data/topo/
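A quick way to confirm that the data was extracted into the layout above is to test for each directory. This is a minimal sketch, assuming the Data directory sits at the root of the local copy of the repository; the helper name is hypothetical.

```python
from pathlib import Path

# Sub-directories of Data/ listed above.
EXPECTED_SUBDIRS = ("ana", "eobs", "era5", "topo")


def missing_data_dirs(repository_root):
    """Return the names of the expected Data/ sub-directories that do not exist."""
    data_dir = Path(repository_root) / "Data"
    return [name for name in EXPECTED_SUBDIRS if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    missing = missing_data_dirs(".")
    if missing:
        print("Missing data directories:", ", ".join(missing))
    else:
        print("All data directories found")
```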
All code which reads and processes the data is stored in the package dataset. All statistical code is in the package compound_poisson. Scripts to reproduce results or to play with a toy model are in the directory script.
The packages dataset and compound_poisson need to be importable in order to run the scripts. This can be done, for example, by adding the following to the ~/.bashrc file:
PYTHONPATH=[path to repository]:$PYTHONPATH
export PYTHONPATH
where [path to repository] is to be replaced with the path to a local copy of this repository. This makes the repository accessible to Python whenever a new Bash shell is started.
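To check that the PYTHONPATH change has taken effect, one can ask Python whether it can locate the two packages without actually importing them. The sketch below uses only the standard library; the helper name is hypothetical.

```python
import importlib.util


def importable(package_names):
    """Map each top-level package name to True if Python can locate it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in package_names
    }


if __name__ == "__main__":
    for name, found in importable(["dataset", "compound_poisson"]).items():
        print(name, "found" if found else "NOT found, check PYTHONPATH")
```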
It is recommended to run scripts from the location of the scripts, e.g.
cd script
cd cardiff_5_20
python3 hyper_slice.py
so that all relevant results and figures are stored in script/cardiff_5_20/.
Please see the README.md files in the packages for further information.
script/london
- Training set: 1980-1989 inclusive
- Test set: 1990-1999 inclusive
script/cardiff_1_20
- Training set: 1999
- Test set: 2000-2019 inclusive
script/cardiff_5_20
- Training set: 1995-1999 inclusive
- Test set: 2000-2019 inclusive
script/cardiff_10_20
- Training set: 1990-1999 inclusive
- Test set: 2000-2019 inclusive
script/cardiff_20_20
- Training set: 1979-1999 inclusive
- Test set: 2000-2019 inclusive
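The training and test sets above are defined by inclusive year ranges. As an illustration only, the sketch below splits a list of dates in the same way for script/cardiff_5_20; it assumes daily data and does not use the dataset package, whose actual interface is not shown here.

```python
from datetime import date, timedelta


def daily_dates(first, last):
    """Return every date from first to last, both inclusive."""
    n_days = (last - first).days + 1
    return [first + timedelta(days=i) for i in range(n_days)]


def split_by_year(dates, train_years, test_years):
    """Split dates into training and test sets using inclusive year ranges."""
    first_train, last_train = train_years
    first_test, last_test = test_years
    train = [d for d in dates if first_train <= d.year <= last_train]
    test = [d for d in dates if first_test <= d.year <= last_test]
    return train, test


if __name__ == "__main__":
    # Year ranges of script/cardiff_5_20: train 1995-1999, test 2000-2019.
    dates = daily_dates(date(1995, 1, 1), date(2019, 12, 31))
    train, test = split_by_year(dates, (1995, 1999), (2000, 2019))
    print(len(train), "training days,", len(test), "test days")
```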
To reproduce the results, run the script hyper_slice.py followed by hyper_slice_forecast.py. Figures are plotted and saved in the figure directory.
Results are saved in the result directory. When results from a previous run are detected, the script resumes the run up to the provided number of samples. Delete the directory if you wish to restart the sampling process from the start.
Options are provided which may be useful for development, debugging or check-pointing purposes.
- python3 hyper_slice.py [--sample [nsample]]
  - --sample nsample: Run MCMC until nsample posterior samples are obtained.
- python3 hyper_slice_forecast.py [--sample [nsample]] [--burnin [burnin]] [--noprint]
  - --sample nsample: Run posterior predictions until nsample predictive samples are obtained.
  - --burnin burnin: Set the burn-in to burnin. Otherwise, the default value is used.
  - --noprint: Do not print figures.
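For readers who want to write their own script with the same options, the command line described above could be parsed roughly as follows. This is a sketch with argparse and is not the parser used by the scripts in this repository; here None stands for "use the script's default value", which is an assumption.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented forecast options."""
    parser = argparse.ArgumentParser(description="Forecast script options")
    parser.add_argument("--sample", type=int, default=None, metavar="nsample",
                        help="number of samples to obtain")
    parser.add_argument("--burnin", type=int, default=None, metavar="burnin",
                        help="burn-in to use instead of the default")
    parser.add_argument("--noprint", action="store_true",
                        help="do not print figures")
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list so the sketch runs without a command line.
    args = build_parser().parse_args(["--sample", "500", "--burnin", "100"])
    print(args.sample, args.burnin, args.noprint)
```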
The following examples are provided:
python3 hyper_slice.py
- Does the default number of MCMC samples
- If MCMC samples are detected from a previous run, only print out figures
python3 hyper_slice.py --sample 10000
- Does 10000 MCMC samples
python3 hyper_slice.py --sample 10000 followed by python3 hyper_slice.py --sample 20000
- Does 10000 MCMC samples, saves the samples, then does 20000 more MCMC samples
python3 hyper_slice_forecast.py
- Does the default number of forecast samples
- If forecast samples are detected from a previous run, only print out figures
python3 hyper_slice_forecast.py --sample 500
- Does 500 forecast samples
python3 hyper_slice_forecast.py --sample 500 followed by python3 hyper_slice_forecast.py --sample 1000
- Does 500 forecast samples, saves the samples, then does 1000 more forecast samples
python3 hyper_slice_forecast.py --sample 500 --burnin 100
- Does 500 forecast samples with a burn-in of 100. If --burnin is not provided, the default burn-in is used.
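In the usual MCMC sense, a burn-in of 100 means that the first 100 posterior samples are discarded before the remaining ones are used for forecasting. The sketch below illustrates that convention on a plain list; it assumes the scripts follow this convention and it is not the repository's code.

```python
def discard_burn_in(samples, burnin):
    """Return the samples that remain after dropping the first burnin of them."""
    if burnin < 0:
        raise ValueError("burnin must be non-negative")
    return samples[burnin:]


if __name__ == "__main__":
    # A toy chain of 10 samples with a burn-in of 3 keeps samples 3 to 9.
    chain = list(range(10))
    print(discard_burn_in(chain, 3))
```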
script/cardiff_era5/era5.py
- IFS prediction on the test set: 2000-2019 inclusive
script/cardiff_plot/plot.py
- Compare the AUC, bias loss and residuals of the MCMC forecast with IFS.
script/cardiff_plot/plot_dist.py
- Compare the distribution of the MCMC forecast with IFS.
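As background for the AUC comparison, and assuming AUC here means the area under the ROC curve for a yes/no event (for example, whether precipitation exceeds a threshold), the quantity can be computed as the probability that a randomly chosen event day receives a higher forecast score than a randomly chosen non-event day. The function below is an illustrative pure-Python sketch and is not how the plotting scripts compute it.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons.

    scores: forecast scores, higher means the event is judged more likely.
    labels: truthy where the event was observed, falsy otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # event day ranked above non-event day
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(auc([0.9, 0.4, 0.6, 0.1], [1, 0, 1, 0]))
```

The double loop is quadratic in the number of days, which is acceptable for a small illustration; a rank-based formula would be preferable for long series.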
script/isle_of_man
- Training set: 1990-1999 inclusive
- Test set: 2000-2009 inclusive
script/wales_5_20
- Training set: 1995-1999 inclusive
- Test set: 2000-2019 inclusive
To reproduce the results, run the script multiseries.py to do MCMC sampling. 20,000 MCMC samples are required to reproduce the results. This is the programmed default, but keep it in mind if you want to checkpoint during a run. Afterwards, run the script multiseries_forecast.py to do forecasting. Figures are plotted and saved in the figure directory.
The scripts multiseriesgp.py and multiseriesgp_forecast.py are versions with GP smoothing when forecasting.
Results are saved in the result directory. When results from a previous run are detected, the script resumes the run up to the provided number of samples. Delete the directory if you wish to restart the sampling process from the start.
These scripts use multiple processes, so a multi-core processor is recommended. By default, multiprocessing.Pool is used. The parallel computation package can be changed by modifying the code. For example, one can modify
pool = multiprocess.Pool
to
pool = multiprocess.MPIPoolExecutor
so that mpi4py.futures.MPIPoolExecutor is used. In that case, the script must also be run with the option -m mpi4py.futures under an MPI launcher, for example mpiexec or srun.
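Swapping one pool for another works because both multiprocessing.Pool and mpi4py.futures.MPIPoolExecutor can be used as context managers and both provide a map method. The sketch below shows that idea with two thread-based stand-ins from the standard library so that it runs without MPI; run_parallel and square are hypothetical names and not part of this repository.

```python
from concurrent.futures import ThreadPoolExecutor
from multiprocessing.pool import ThreadPool


def square(x):
    """Toy task standing in for one unit of parallel work."""
    return x * x


def run_parallel(pool_factory, func, items):
    """Create a pool, map func over items, and return the results as a list."""
    with pool_factory() as pool:
        # list() is needed because executor-style map returns an iterator.
        return list(pool.map(func, items))


if __name__ == "__main__":
    # Pool-style interface (same method names as multiprocessing.Pool).
    print(run_parallel(lambda: ThreadPool(2), square, range(5)))
    # Executor-style interface (same method names as MPIPoolExecutor).
    print(run_parallel(lambda: ThreadPoolExecutor(max_workers=2), square, range(5)))
```

Only the factory passed to run_parallel changes between the two calls, which mirrors the one-line change described above.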
Options are provided which may be useful for development, debugging or check-pointing purposes.
- [mpiexec] python3 [-m mpi4py.futures] multiseries.py [--sample [nsample]]
  - --sample nsample: Run MCMC until nsample posterior samples are obtained.
  - [mpiexec] and -m mpi4py.futures: required only if using MPIPoolExecutor.
- [mpiexec] python3 [-m mpi4py.futures] multiseries_forecast.py [--sample [nsample]] [--burnin [burnin]] [--noprint]
  - --sample nsample: Run posterior predictions until nsample predictive samples are obtained.
  - --burnin burnin: Set the burn-in to burnin. Otherwise, the default value is used.
  - --noprint: Do not print figures.
  - [mpiexec] and -m mpi4py.futures: required only if using MPIPoolExecutor.
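When launching these scripts from another Python program, for example a job submission helper, the command line described above can be assembled as a list of arguments. This is a hypothetical helper, not part of the repository; it follows the documented syntax only and leaves out launcher options such as the number of MPI processes, which are not specified here.

```python
def build_command(script, use_mpi=False, nsample=None, burnin=None, noprint=False):
    """Assemble the documented command line as a list of arguments."""
    command = []
    if use_mpi:
        command.append("mpiexec")
    command.append("python3")
    if use_mpi:
        # Needed only when the script uses MPIPoolExecutor.
        command.extend(["-m", "mpi4py.futures"])
    command.append(script)
    if nsample is not None:
        command.extend(["--sample", str(nsample)])
    if burnin is not None:
        command.extend(["--burnin", str(burnin)])
    if noprint:
        command.append("--noprint")
    return command


if __name__ == "__main__":
    print(" ".join(build_command("multiseries.py", use_mpi=True, nsample=20000)))
```

The returned list can be passed directly to subprocess.run.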
The following examples are provided:
python3 multiseries.py
- Does the default number of MCMC samples
- If MCMC samples are detected from a previous run, only print out figures
python3 multiseries.py --sample 10000
- Does 10000 MCMC samples
python3 multiseries.py --sample 10000 followed by python3 multiseries.py --sample 20000
- Does 10000 MCMC samples, saves the samples, then does 20000 more MCMC samples
python3 multiseries_forecast.py
- Does the default number of forecast samples
- If forecast samples are detected from a previous run, only print out figures
python3 multiseries_forecast.py --sample 500
- Does 500 forecast samples
python3 multiseries_forecast.py --sample 500 followed by python3 multiseries_forecast.py --sample 1000
- Does 500 forecast samples, saves the samples, then does 1000 more forecast samples
python3 multiseries_forecast.py --sample 500 --burnin 100
- Does 500 forecast samples with a burn-in of 100. If --burnin is not provided, the default burn-in is used.
script/wales_era5/era5.py
- IFS prediction on the test set: 2000-2019 inclusive
script/wales_5_20/multiseries_compare.py
- Compare the AUC, bias loss and residuals of the MCMC forecast with IFS.
- Compare the distribution of the MCMC forecast with IFS.
- Compare the cross correlation of the MCMC and IFS forecast as well as the observed.
- Please see the packages compound_poisson and dataset for further documentation.
- The code shall be lightly maintained with more small examples added.
- Legacy code where the parameters have a Gaussian process prior is left in here (see for example compound_poisson/downscale.py).
- The author has a background in Java so there is a frequent use of classes and inheritance structure.
- The author attempted to keep to the Google Python style guide. There are a few omissions, such as underscores for denoting private and protected methods, variables and classes.