The data leakage in this project is serious; it is hard to see how the paper passed academic peer review...
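For context, the leaky pattern in question looks like the following (an illustrative sketch of normalising train and test data together, not a quote of the project's actual code):

```python
from sklearn.preprocessing import MinMaxScaler

# Fitting the scaler on the FULL series lets the test period's
# min/max shape how the training data is scaled -- look-ahead leakage.
scaler = MinMaxScaler(feature_range=(-1, 1))
dataset_scaled = scaler.fit_transform(dataset)  # fit sees test rows too
train_scaled = dataset_scaled[:train_size]
test_scaled = dataset_scaled[train_size:]
```

The corrected version below fits the scalers on the training rows only: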
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Chronological 70/30 split (no shuffling for time series)
train_size = round(len(dataset) * 0.7)
print(f'Training Data Size: {train_size}')
train_data = dataset[0:train_size]
test_data = dataset[train_size:]

X_train = pd.DataFrame(train_data)
X_test = pd.DataFrame(test_data)
y_train = pd.DataFrame(train_data['Close'])
y_test = pd.DataFrame(test_data['Close'])

# Normalise the data: fit the scalers on the training set only,
# then apply the same fitted transform to the test set
X_scaler = MinMaxScaler(feature_range=(-1, 1))
y_scaler = MinMaxScaler(feature_range=(-1, 1))
X_train = X_scaler.fit_transform(X_train)
y_train = y_scaler.fit_transform(y_train)
X_test = X_scaler.transform(X_test)
y_test = y_scaler.transform(y_test)
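One consequence of fitting y_scaler on the training prices only: model predictions come out in the scaled space and must be mapped back with that same scaler. A minimal sketch, assuming a fitted estimator held in a placeholder variable `model` (hypothetical name, not from the original code):

```python
import numpy as np

# Predictions live in the scaler's (-1, 1) space; invert with the
# training-fitted y_scaler to get back to price units.
y_pred_scaled = np.asarray(model.predict(X_test)).reshape(-1, 1)
y_pred = y_scaler.inverse_transform(y_pred_scaled)
```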
This snippet generates the normalised data without data leakage.
After this correction, however, the scaler will not work properly, which makes the model useless: prices in the testing period are far higher than in the training period, so common methods like z-score or MinMax scaling, once fitted on the training set alone, map the test data outside the fitted range.
So it either needs adaptive normalisation, or the model target (y_value) should be changed to a percentage delta change or a trend classification; both options are sketched below.
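A minimal sketch of both options, assuming dataset is a pandas DataFrame with a Close column (the column name is taken from the snippet above; the window length and variable names are illustrative):

```python
import pandas as pd

# Option 1: change the target to the next step's percentage change.
# Returns are roughly level-independent, so statistics learned on the
# training period stay meaningful even when test prices are much higher.
target = dataset['Close'].pct_change().shift(-1)

# Option 2: adaptive normalisation via a rolling z-score, where each
# point is scaled only by its own trailing window (no look-ahead and
# no dependence on the absolute price level).
window = 60  # illustrative window length
rolling_mean = dataset['Close'].rolling(window).mean()
rolling_std = dataset['Close'].rolling(window).std()
close_norm = (dataset['Close'] - rolling_mean) / rolling_std
```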
Data leakage when normalising the train data and test data together?