This repo provides the code to run simple benchmarks for signature features on datasets from the UCR/UEA Time Series Classification repository. (Work in progress)
Some of our benchmark results are uploaded in this GitHub repository under `results/`.
Currently, there are two Jupyter notebooks that analyze the results:

- `compare_ensembles.ipynb` draws a comparison between two ensemble classifiers using signature features.
- `result_analysis.ipynb` draws a comparison between the XGBoost classifier and classical logistic regression on signature features.
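For orientation, here is a minimal sketch of what a signature-feature classifier looks like, using `iisignature` and `sklearn`. The toy data and the time-augmentation step are illustrative assumptions, not the repo's exact pipeline:

```python
import numpy as np
import iisignature
from sklearn.linear_model import LogisticRegression

# Toy stand-in for a UCR dataset: 20 univariate series of length 50.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 50))
y = rng.integers(0, 2, size=20)

def sig_features(series, level=3):
    # Lift the 1-D series to a 2-D path (time, value); the signature of a
    # plain 1-D path only sees the total increment, so time-augmentation
    # is a common choice.
    t = np.linspace(0.0, 1.0, len(series))
    path = np.column_stack([t, series])
    return iisignature.sig(path, level)  # truncated signature up to `level`

features = np.stack([sig_features(s) for s in X])
clf = LogisticRegression(max_iter=1000).fit(features, y)
print(features.shape, clf.score(features, y))
```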
This project uses the Python packages `iisignature`, `sklearn`, `numpy`, `pandas`, and `sktime`.
Furthermore, `luigi` is used for orchestration of the different tasks (slight overkill currently).
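For readers unfamiliar with Luigi, the tasks roughly follow the standard shape sketched below. The class and target names here are hypothetical, not the repo's actual code in `tasks.py`:

```python
import luigi

class RunDataset(luigi.Task):
    # Hypothetical task; the real tasks live in tasks.py.
    dataset = luigi.Parameter()

    def output(self):
        # Luigi treats the task as complete once this target exists,
        # so re-runs skip datasets that already have results.
        return luigi.LocalTarget(f"pipeline/{self.dataset}.json")

    def run(self):
        with self.output().open("w") as f:
            f.write("{}")  # benchmark results would be written here

if __name__ == "__main__":
    luigi.build([RunDataset(dataset="ECG200")], local_scheduler=True)
```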
First clone the repo:

```bash
git clone https://github.com/zhy0/sig-tsc.git
```
Create a Python 3 virtual env and install the dependencies:

```bash
cd sig-tsc
python3 -m venv env
source env/bin/activate

# Install numpy and cython separately; this works around an issue
# with installing sktime.
pip install numpy
pip install cython

# Install the rest of the dependencies.
pip install -r requirements.txt
```
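To check that the environment is set up, a quick import test should succeed inside the virtualenv (exact versions will vary):

```python
# All of these should import without errors after the install above.
import iisignature
import luigi
import numpy
import pandas
import sklearn
import sktime

print("numpy", numpy.__version__)
print("pandas", pandas.__version__)
print("sktime", sktime.__version__)
```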
Running the benchmarks (see below) will create a `pipeline` folder under the `sig-tsc` folder which contains the results of the classification.
Enter the `sig-tsc` folder and run the following commands to download the datasets:

```bash
# UCR datasets (univariate)
wget http://www.timeseriesclassification.com/Downloads/Archives/Univariate2018_arff.zip
unzip Univariate2018_arff.zip

# UEA datasets (multivariate)
wget http://www.timeseriesclassification.com/Downloads/Archives/Multivariate2018_arff.zip
unzip Multivariate2018_arff.zip
```
Unzipping will create two folders, `NewTSCProblems` and `MultivariateTSCProblems`. The location of these two folders can be changed in `tasks.py`; by default they are assumed to be in the `sig-tsc` folder.
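If you want to inspect a dataset outside the pipeline, the ARFF files can be read directly, for example with `scipy.io.arff` (SciPy is pulled in as a dependency of the packages above). The file path below assumes the archive's usual `<Dataset>/<Dataset>_TRAIN.arff` layout:

```python
from scipy.io import arff
import pandas as pd

# Path is an assumption based on the standard layout of the archive.
data, meta = arff.loadarff("NewTSCProblems/ECG200/ECG200_TRAIN.arff")
df = pd.DataFrame(data)
print(df.shape)              # rows = series; last column = class label
print(df.iloc[:, -1].unique())
```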
To run a benchmark of a single dataset, you can use the `luigi` command line:

```bash
python -m luigi --module tasks RunUnivariate --levels '[2,3,4]' --dataset ECG200 --model-type sklearn.svm.LinearSVC --sig-type sig --local-scheduler
```
See `tasks.py` for the individual parameters.
(The `--local-scheduler` flag tells Luigi to use the local scheduler.)
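The same run can also be launched programmatically, which is convenient from a notebook. This sketch assumes the Luigi parameter names mirror the CLI flags above (`model_type` for `--model-type`, and so on):

```python
import luigi
from tasks import RunUnivariate

# Programmatic equivalent of the CLI invocation above,
# assuming matching parameter names and types.
luigi.build(
    [RunUnivariate(levels=[2, 3, 4],
                   dataset="ECG200",
                   model_type="sklearn.svm.LinearSVC",
                   sig_type="sig")],
    local_scheduler=True,
)
```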
Running benchmarks on all datasets can be done with

```bash
python tasks.py
```

You can edit the contents of `main()` to change the behavior.