Recreating training model #24
Hi @bnpapas - Are you segmenting the fast5?
I am following the instructions posted here: https://psy-fer.github.io/deeplexicon/train/
You may need to segment the data a priori, e.g. by running python3 deeplexicon.py dmux
This will split the signal to separate the barcodes from the RNA.
Then train on the segmented barcode output.
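For intuition, the segmentation step just isolates the stretch of raw signal that contains the barcode before it is turned into an image for the classifier. A minimal toy sketch (the fixed boundary index is hypothetical; deeplexicon's segmenter estimates it from the signal itself):

```python
import numpy as np

def split_barcode_region(raw_signal, boundary):
    # Toy split: in direct RNA sequencing the adapter/barcode stretch sits
    # at the start of the raw signal, so segmentation amounts to finding a
    # boundary and keeping the barcode portion for training/classification.
    barcode = raw_signal[:boundary]
    rna = raw_signal[boundary:]
    return barcode, rna

signal = np.arange(3000, dtype=np.float32)  # stand-in for a raw read
barcode, rna = split_barcode_region(signal, boundary=1200)
```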
… On Mar 13, 2023, at 10:15 AM, bnpapas wrote: I'm not sure which step would be segmentation?
The goal here is to be able to train a new model with an eye towards possibly adding new barcodes - I won't be able to use dmux first in a real use case. The truth table files I've assembled are based on mapping information, as was done in the publication. The match between these truth tables and the dmux results from "resnet20-final.h5" is very good. Edit: To make sure it is clear, I am using the python version of the training code, which uses the "dRNA_segmenter" function to segment reads prior to image generation and subsequent training.
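As a side note on how such a truth table can be assembled (a hypothetical sketch, not the actual code used here): if each barcode's reads map to a distinct reference, the alignment target determines the true label.

```python
import csv, io

# Hypothetical two-column mapping summary: read_id <tab> aligned reference.
mapping_tsv = "read1\tbc1_ref\nread2\tbc3_ref\n"
ref_to_barcode = {"bc1_ref": "bc_1", "bc3_ref": "bc_3"}  # assumed pairing

truth = {}  # read_id -> true barcode label
for read_id, ref in csv.reader(io.StringIO(mapping_tsv), delimiter="\t"):
    truth[read_id] = ref_to_barcode[ref]
```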
When dmux assigns barcodes, it uses the "classify" function, which applies a transform to the data.
The training subcommand, however, skips this step and trains directly on the images. I've removed the transform from "classify", and now my freshly trained models produce sensible results with dmux. I assume I could get similar behavior by instead adding the transform into the train subroutine.
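The underlying principle, sketched with a hypothetical transform (the real one in "classify" may differ): whatever preprocessing is applied at demultiplexing time must also be applied to the training images, otherwise the model sees differently scaled inputs at inference.

```python
import numpy as np

def transform(img):
    # Hypothetical stand-in for the classify() transform: shift values so
    # the minimum is 1, which also guards any later division against zero.
    return img - img.min() + 1.0

rng = np.random.default_rng(0)
img = rng.random((224, 224)).astype(np.float32)

x_train = transform(img)  # same path at training time...
x_infer = transform(img)  # ...and at dmux/inference time
```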
I think that transform was added (it was meant to be applied in both places) to avoid a divide-by-zero error by making the values 1-indexed. Sorry, it's been a while since I wrote that.
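A one-line illustration of that failure mode (the actual expression in deeplexicon may differ):

```python
import numpy as np

img = np.array([0.0, 2.0, 4.0])

# Normalising by the raw minimum would divide by zero:
#   img / img.min()  ->  0.0 / 0.0

shifted = img + 1.0               # "1-indexed": minimum becomes 1.0
scaled = shifted / shifted.min()  # now safe
```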
Would you mind sharing the code? I see
I have been attempting to use the fast5 data provided with the manuscript to train a model to call the same 4 barcodes as "resnet20-final.h5". I've used mapping information to assign barcodes, and if I use the given model with deeplexicon the agreement with my truth table is excellent.
I've tried taking 40k reads from each barcode as a training set, with 10k from each as test and validation sets. The training runs, seemingly without issue; however, it shows some behavior I don't understand.
Note: I have been using the Docker image pulled from lpryszcz/deeplexicon:1.2.0-gpu, running "deeplexicon_multi.py train" with default options. Do you have any suggestions for how I can improve the model training results?