Wrong predictions while testing new data #47
Hi @SivaNagendra-sn, Thanks for reporting your problem here. Did you change the identifiers when initializing the model and making inferences with it? The Madelon …
Hi @madelonhulsebos …
Hi @SivaNagendra-sn, To use the model retrained with the new paragraph vector files, the … No changes should be made to the feature identifiers in the …
Yeah, I have actually done that. While training the Sherlock model I set the model_id to "retrained_sherlock", and when calling the predict function I also pass model_id as "retrained_sherlock". On the test data it gives results with good accuracy. But when testing on new data (i.e., extracting features with the 'extract_features' function and then calling the predict function, again with model_id set to "retrained_sherlock"), the predictions are totally wrong.
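For context, here is a minimal sketch of the retrain-and-predict flow being described, loosely following the repository's retraining and out-of-the-box notebooks. The module paths, names (`SherlockModel`, `extract_features`), the `temporary.csv` intermediate file, and the placeholder training matrices are assumptions about the notebook code and may differ in your version:

```python
import numpy as np
import pandas as pd

from sherlock.deploy.model import SherlockModel
from sherlock.features.preprocessing import extract_features

# X_train, y_train, X_validation, y_validation: feature matrices and labels
# prepared exactly as in the training notebook (placeholders in this sketch).
model = SherlockModel()
model.fit(X_train, y_train, X_validation, y_validation, model_id="retrained_sherlock")
model.store_weights(model_id="retrained_sherlock")

# New, unseen columns: one entry per column, each entry holding that column's values.
new_columns = pd.Series(
    [["Jane Smith", "Lute Ahorn", "Anna James"],
     ["2012-06-01", "2013-01-15", "2014-03-02"]],
    name="values",
)

# Extract the same features that were used during training, then predict
# with the SAME model_id that was used when fitting.
extract_features("temporary.csv", new_columns)
feature_vectors = pd.read_csv("temporary.csv", dtype=np.float32)
predicted_labels = model.predict(feature_vectors, "retrained_sherlock")
print(predicted_labels)
```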
I have retraining and prediction working on new data, but only if the fields are mostly text. For numeric fields and values of length 12 or more it does not work well: the prediction vector returned is null even though the classification score and output for the test data look good. Do you have any suggestions? @madelonhulsebos
OK, that should be alright then, @SivaNagendra-sn. Is your training data formatted exactly like the original training data (as downloaded through the data download)? The feature extraction pipeline expects "stringified" lists. The input data may be wrong in your case as well, @iganand.
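To make the expected input format concrete: in the downloaded training data each row represents one column, and its values are stored as a "stringified" list (a string that encodes a Python list). A small illustrative example with hypothetical values, parsed back into real lists before feature extraction:

```python
from ast import literal_eval

import pandas as pd

# Each cell is a string that encodes the list of values of one column.
values = pd.Series(
    ["['Paris', 'London', 'Berlin']",
     "['2012-06-01', '2013-01-15', '2014-03-02']"],
    name="values",
)
labels = pd.Series(["city", "date"], name="type")

# Parse the stringified lists back into real Python lists.
parsed_values = values.apply(literal_eval)
print(parsed_values[0])  # ['Paris', 'London', 'Berlin']
```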
I am getting null in the prediction vector, although the classification report for that specific field shows an F1 score of 0.87. What might be the reason?
I have trained a Sherlock model and it performs well on the test data. But when I try testing the model by passing data to it as per the 'Sherlock-out-of-the-box' notebook, it gives wrong predictions (even passing the training data in the same way also results in wrong predictions). Does any separate approach need to be taken for testing the data?
Note: I have created my own paragraph vectors with respect to the data I have and am using them for training the Sherlock model as well.
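Since custom paragraph vectors are mentioned here as part of the setup, below is a minimal sketch of training column-level paragraph vectors with gensim's Doc2Vec. The 400-dimensional size matches the dimensionality of Sherlock's pretrained paragraph-vector features; the file name and the other hyperparameters are illustrative assumptions, not the project's exact training script:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# One TaggedDocument per training column; the column's cell values act as the "words".
columns = [
    ["Paris", "London", "Berlin"],
    ["2012-06-01", "2013-01-15", "2014-03-02"],
]
documents = [
    TaggedDocument(words=[str(v) for v in col], tags=[i])
    for i, col in enumerate(columns)
]

# 400 dimensions to match Sherlock's pretrained paragraph-vector features;
# the remaining settings are placeholders.
doc2vec = Doc2Vec(documents, vector_size=400, window=2, min_count=1, workers=4, epochs=20)
doc2vec.save("par_vec_retrained_400.pkl")

# The same trained model must be used to embed new columns at inference time,
# otherwise the paragraph-vector features will not line up with the retrained classifier.
new_column_vector = doc2vec.infer_vector(["Amsterdam", "Rotterdam", "Utrecht"])
```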