There are many ways to call SPA for automatic model training and testing. The simplest is to pass only the path to the training data (or two paths: one to the training data and one to the test data). SPA will automatically select a suitable cross-validation method and model architecture, then cross-validate a set of standard hyperparameters for that architecture. The process can be customized; some examples are shown below, but checking the documentation of main_SPA() (in the SPA.py file) is a good idea.
# Fully automatic run: SPA selects the cross-validation method, model architecture, and hyperparameters
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv')
# Forcing a specific cross-validation method (K-fold)
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', cv_method = 'KFold')
# Forcing a specific model architecture (LCEN)
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', model_name = ['LCEN'])
# Cross-validating multiple model architectures and keeping the best
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', model_name = ['LCEN', 'EN', 'PLS'])
# Training a multilayer perceptron (MLP)
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', model_name = ['MLP'])
# Treating the data as dynamic (time-dependent)
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', dynamic_model = True)
# Customizing the l1_ratio values cross-validated by LCEN
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', model_name = ['LCEN'], l1_ratio = [0, 0.5, 0.99])
# Customizing the polynomial degrees cross-validated by LCEN
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', cv_method = 'KFold', model_name = ['LCEN'], degree = list(range(1, 6)))
# Customizing the MLP's activation functions and weight decay
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', cv_method = 'KFold', model_name = ['MLP'], activation = ['relu', 'tanh'], weight_decay = 1e-2)
# Cross-validating lagged versions of the data (lags 0 to 2) with LCEN
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', cv_method = 'KFold', lag = [0, 1, 2], model_name = ['LCEN'])
# Generating model interrogation plots
import SPA; _ = SPA.main_SPA('poly_1000x5-data_1to10-range_1-degree_123456789-seed_(0,0)-noise.csv', plot_interrogation = True)
First, split Concrete_data.csv into cross-validation and test sets (for example, with sklearn's train_test_split).
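The split described above can be sketched as follows. This is a minimal illustration: the column names and 80/20 split ratio are assumptions, and a synthetic DataFrame stands in for the real Concrete_data.csv, which you would load with pd.read_csv instead.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset; in practice, load it instead:
# data = pd.read_csv('Concrete_data.csv')
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 9)),
                    columns=[f'x{i}' for i in range(8)] + ['y'])

# Hold out 20% of the rows as an independent test set
train, test = train_test_split(data, test_size=0.2, random_state=0)
train.to_csv('Concrete_data_train.csv', index=False)
test.to_csv('Concrete_data_test.csv', index=False)
```

The two CSV files written here are the ones passed to main_SPA() below.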
# LCEN on the concrete data with degree-4 features and a custom coefficient cutoff
import SPA; _ = SPA.main_SPA('Concrete_data_train.csv', test_data = 'Concrete_data_test.csv', cv_method = 'KFold', model_name = ['LCEN'], degree = [4], LCEN_cutoff = 4e-2)
# MLP on the concrete data with custom learning rates, activation functions, and a cosine learning-rate scheduler
import SPA; _ = SPA.main_SPA('Concrete_data_train.csv', test_data = 'Concrete_data_test.csv', cv_method = 'KFold', model_name = ['MLP'], learning_rate = [0.001, 0.005, 0.01], activation = ['relu', 'tanhshrink'], scheduler = 'cosine')
Note that most of the examples above do not pass a path to testing data (via the test_data keyword). In practice, independent testing data are essential to properly evaluate a model. SPA keeps the testing set isolated from the models during cross-validation, and reports both training and testing results in the output files.
Sample artificial datasets may be created with the create_random_data.py file. A comparison between the LCEN and the ALVEN algorithms using artificial data is available in the SPA_paper_comparison.py file.