Examples

There are many ways to call SPA for automatic model training and testing. The simplest is to pass just the path to the training data (or two paths, one to the training data and the other to the test data). SPA will automatically select a suitable cross-validation method and model architecture, then cross-validate combinations of standard hyperparameter values for that architecture. The process can be customized; some examples are shown below, and checking the documentation of main_SPA() (in the SPA.py file) for the full list of options is recommended.
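That documentation can also be viewed directly from a Python session, for example (assuming SPA.py is importable from the working directory):

import SPA; help(SPA.main_SPA)  # displays the signature and docstring of main_SPA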

Simplest use case; no testing data

import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv')

Manually select a CV method

import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', cv_method = 'KFold')

Manually select a model (or models)

import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', model_name = ['LCEN'])
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', model_name = ['LCEN', 'EN', 'PLS'])
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', model_name = ['MLP'])

Asking SPA to use dynamic models

import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', dynamic_model = True)

Restrict what values will be tested for some hyperparameter(s)

import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', model_name = ['LCEN'], l1_ratio = [0, 0.5, 0.99])
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', cv_method = 'KFold', model_name = ['LCEN'], degree = list(range(1, 6)))
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', cv_method = 'KFold', model_name = ['MLP'], activation = ['relu', 'tanh'], weight_decay = 1e-2)
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', cv_method = 'KFold', lag = [0, 1, 2], model_name = ['LCEN'])

Plotting the data interrogation results (relevant only when model_name is not passed to SPA)

import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', plot_interrogation = True)

Real example: investigating the Concrete Strength dataset of I. Yeh

First, split Concrete_data.csv into cross-validation and test sets (for example, with sklearn's train_test_split)
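A minimal sketch of that split (assuming pandas for file handling and an 80/20 holdout, both illustrative choices; the output file names match those used below):

import pandas as pd; from sklearn.model_selection import train_test_split
data = pd.read_csv('Concrete_data.csv')  # full Concrete Strength dataset
data_train, data_test = train_test_split(data, test_size = 0.2, random_state = 0)  # 80/20 split; adjust as desired
data_train.to_csv('Concrete_data_train.csv', index = False); data_test.to_csv('Concrete_data_test.csv', index = False)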
import SPA; _ = SPA.main_SPA('Concrete_data_train.csv', test_data = 'Concrete_data_test.csv', cv_method = 'KFold', model_name = ['LCEN'], degree = [4], LCEN_cutoff = 4e-2)
import SPA; _ = SPA.main_SPA('Concrete_data_train.csv', test_data = 'Concrete_data_test.csv', cv_method = 'KFold', model_name = ['MLP'], learning_rate = [0.001, 0.005, 0.01], activation = ['relu', 'tanhshrink'], scheduler = 'cosine')

Note that most of the examples above do not pass a path to testing data (via the test_data keyword). In practice, independent testing data are essential for properly evaluating a model. SPA keeps the testing set isolated from the models during cross-validation and reports both training and testing results in the output files.

Sample artificial datasets may be created with the create_random_data.py file. A comparison between the LCEN and the ALVEN algorithms using artificial data is available in the SPA_paper_comparison.py file.

Sample datasets

| Dataset name | Source |
| --- | --- |
| poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv | Artificial data generated by create_random_data.py |
| Kepler_3rd_law.csv | J. Kepler, "The Harmony of the World", 1619 |
| Kepler_3rd_law_modern.csv | Wolfram Alpha |
| Concrete_data.csv | I. Yeh, "Modeling of strength of high-performance concrete using artificial neural networks", 1998; link to data |