Recreating training model #24

bnpapas · 2023-03-10T18:14:37Z

I have been attempting to use the fast5 data provided with the manuscript to train a model to call the same 4 barcodes as "resnet20-final.h5". I've used mapping information to assign barcodes, and if I use the given model with deeplexicon the agreement with my truth table is excellent.
I've tried taking 40k reads from each barcode as a training set, with 10k from each as test and validation sets. The training runs, seemingly without issue; however, it shows some behavior I don't understand.
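
The per-barcode split described above can be sketched roughly as follows. This is only an illustration: it assumes the truth table is available as (read_id, barcode) pairs, and `split_per_barcode` is a hypothetical helper, not part of deeplexicon.

```python
import random
from collections import defaultdict

def split_per_barcode(truth, n_train, n_val, n_test, seed=0):
    """Split reads into disjoint train/val/test sets, balanced per barcode.

    `truth` is an iterable of (read_id, barcode) pairs (assumed layout).
    """
    by_barcode = defaultdict(list)
    for read_id, barcode in truth:
        by_barcode[barcode].append(read_id)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for barcode, reads in sorted(by_barcode.items()):
        needed = n_train + n_val + n_test
        if len(reads) < needed:
            raise ValueError(f"barcode {barcode}: {len(reads)} reads, need {needed}")
        rng.shuffle(reads)
        # Consecutive, non-overlapping slices keep the three sets disjoint
        splits["train"] += [(r, barcode) for r in reads[:n_train]]
        splits["val"] += [(r, barcode) for r in reads[n_train:n_train + n_val]]
        splits["test"] += [(r, barcode) for r in reads[n_train + n_val:needed]]
    return splits

# Toy example: 4 barcodes with 6 reads each, split 4/1/1 per barcode
toy_truth = [(f"read_{b}_{i}", b) for b in range(1, 5) for i in range(6)]
toy_splits = split_per_barcode(toy_truth, n_train=4, n_val=1, n_test=1)
```

For the setup in this post, the call would use n_train=40000, n_val=10000 and n_test=10000.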

Even when the reported accuracy on the training set crosses 0.9, the validation accuracy hovers around 0.5. I've even completed a test where I've had my validation set be a subset of the training reads, and this still occurs.
Using the final model output by the training yields terrible results, even when used on the training set.

Note: I have been using the docker image provided by pulling lpryszcz/deeplexicon:1.2.0-gpu, with "deeplexicon_multi.py train" having default options. Do you have any suggestions how I can improve the model training results?

enovoa · 2023-03-10T21:11:33Z

Hi @bnpapas - Are you segmenting the fast5?

bnpapas · 2023-03-13T14:15:42Z

I am following the instructions posted here: https://psy-fer.github.io/deeplexicon/train/
I'm not sure which step would be segmentation?

noncodo · 2023-03-13T17:22:32Z

You may need to segment the data a priori, e.g. by running "python3 deeplexicon.py dmux". This will split the signal to separate the barcodes from the RNA. Then train on the segmented barcode output.


bnpapas · 2023-03-14T13:31:43Z

The goal here is to be able to train a new model with an eye towards possibly adding new barcodes - I won't be able to use dmux first in a real use case. The truth table files I've assembled are based on mapping information, as was done in the publication. The match between these truth tables and the dmux results from "resnet20-final.h5" is very good.

Edit: To make sure it is clear, I am using the python version of the training code, which uses the "dRNA_segmenter" function to segment reads prior to image generation and subsequent training.

bnpapas · 2023-05-26T13:40:21Z

When dmux is assigning barcodes, it uses the "classify" function. This function does a transform of the data:

  x = image.astype('float32') + 1
  x = x / 2

The training subcommand, however, does not take this step and trains directly on the images. I've removed the transform from "classify" and now my freshly-trained models produce sensible results with dmux. I assume I can get similar behavior by adding the transform into the train subroutine.
Is there a reason to think having this transformation is better than not?
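
To make the mismatch concrete, here is a minimal sketch, assuming the images are NumPy arrays. `preprocess` is a hypothetical helper (not a function in deeplexicon) that wraps the two lines quoted from "classify"; the sample values are made up. Calling the same helper in both training and inference would keep the inputs consistent.

```python
import numpy as np

def preprocess(image):
    """Transform quoted from "classify": shift by 1, then halve."""
    x = np.asarray(image).astype("float32") + 1
    return x / 2

# Made-up stand-in for one generated image
raw = np.array([[0, 1], [127, 255]], dtype=np.uint8)

# What "classify" feeds the model at inference time
infer_input = preprocess(raw)

# What training currently sees if the transform is skipped
train_input_without = raw.astype("float32")

# Largest per-pixel gap between the two input conventions
max_gap = float(np.abs(train_input_without - infer_input).max())
print(max_gap)
```

A model trained on one convention and evaluated on the other sees inputs on a different scale, which would be consistent with the poor dmux results reported earlier in this thread.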

Psy-Fer · 2023-05-26T13:44:00Z

I think that was added (and was meant to be on both) to avoid a divide-by-zero error by making the values 1-indexed. Sorry, it's been a while since I wrote that.

fulaibaowang · 2023-08-09T11:34:30Z

> You may need to segment the data a priori, e.g. by running "python3 deeplexicon.py dmux". This will split the signal to separate the barcodes from the RNA. Then train on the segmented barcode output.

Would you mind sharing the code? I see "deeplexicon_multi.py squig" for getting the segmentation, but how would you "split the signal to separate the barcodes from the RNA"?
