About _epoch_train and _epoch_val #7

fireholder · 2019-08-05T01:52:15Z

While I was training, progress came to a standstill. I found that the functions `_epoch_train` and `_epoch_val` stopped it by raising NotImplementedError. I wonder why this happens and how to fix it.
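
One common way to end up with this symptom, sketched here under the assumption that trainer.py follows the usual base-class pattern: a base trainer declares `_epoch_train` and `_epoch_val` as placeholders that raise `NotImplementedError`, and only a subclass supplies real implementations. The names `BaseTrainer` and `ToyTrainer` are hypothetical and are not taken from the repository.

```python
class BaseTrainer:
    """Hypothetical base class: owns the loop, leaves the epoch hooks abstract."""

    def train(self, epochs):
        history = []
        for _ in range(epochs):
            # Each hook must be provided by a subclass.
            history.append((self._epoch_train(), self._epoch_val()))
        return history

    def _epoch_train(self):
        raise NotImplementedError("subclass must implement _epoch_train")

    def _epoch_val(self):
        raise NotImplementedError("subclass must implement _epoch_val")


class ToyTrainer(BaseTrainer):
    """Hypothetical subclass that overrides both hooks with dummy losses."""

    def _epoch_train(self):
        return 1.0  # placeholder training loss

    def _epoch_val(self):
        return 0.5  # placeholder validation loss
```

In this pattern, `BaseTrainer().train(1)` raises `NotImplementedError` while `ToyTrainer().train(1)` runs, so it may be worth checking which class is actually instantiated at the bottom of the script.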

Ike-yang · 2019-08-05T08:39:28Z

Hi, I am trying to run trainer.py, but I don't understand the "--load_model_path" argument. There is nothing in the current folder, and I am not sure what kind of pretrained model needs to be loaded here. Any advice?

fireholder · 2019-08-05T08:47:04Z

I think '--load_model_path' is only needed when you start from a pretrained model, but log.txt shows errors when no model file is loaded.

Ike-yang · 2019-08-05T10:18:25Z

Exactly, I got something like this in the logs.txt file:
Vocab Size:1173
[Load Model Failed] [Errno 2] No such file or directory: ''
[Load Model Failed] [Errno 21] Is a directory: '.'
[Load MLC Failed [Errno 21] Is a directory: '.'!]
[Load Co-attention Failed [Errno 21] Is a directory: '.'!]
[Load Sentence model Failed [Errno 21] Is a directory: '.'!]
[Load Word model Failed [Errno 21] Is a directory: '.'!]
Namespace(attention_version='v4', batch_size=16, caption_json='./data/new_data/.......

I thought the program had stopped here because of these error messages.
So can I just ignore the messages and keep training?
Are there other places that need to be modified?
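
The two error codes in the log are what Python reports when opening an empty path (`[Errno 2]` for `''`) or a directory (`[Errno 21]` for `'.'`). A minimal sketch of a guard that treats both cases as "no checkpoint given" is shown below; `resolve_checkpoint` is a hypothetical helper, not a function from the repository.

```python
import os


def resolve_checkpoint(path):
    """Return `path` if it names an existing regular file, else None.

    Hypothetical helper: an empty string or a directory such as '.' is
    treated as "no checkpoint", so the caller can train from scratch.
    """
    if path and os.path.isfile(path):
        return path
    return None


for candidate in ("", "."):
    if resolve_checkpoint(candidate) is None:
        print(f"no checkpoint at {candidate!r}; starting from scratch")
```

With a guard like this, the load is attempted only when the argument points at a real file, which keeps the "Load ... Failed" lines out of the log for fresh runs.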

fireholder · 2019-08-05T11:09:37Z

I found that it has not stopped; it just isn't printing anything.

Ike-yang · 2019-08-06T02:24:05Z

Yeah, I left it running all night, but val_loss is always 0 in logs.txt. Something must be wrong and needs to be modified.

fireholder · 2019-08-06T02:34:24Z

That is because every validation loss is set to 0 in '_epoch_val'; you can try uncommenting the code in '_epoch_val'. But my train loss is very large. Is it the same for you? By the way, have you tried the tester?

Ike-yang · 2019-08-06T03:35:32Z

Yes, extremely large train loss. Haven't tried the tester yet

Ike-yang · 2019-08-07T04:26:31Z

I have tried tester.py and it does not work; some places need a tensor.cpu() conversion. Have you managed to run tester.py to completion?

fireholder · 2019-08-07T08:29:13Z

Yes, just convert to tensor.cpu() as the error suggested.

fireholder · 2019-08-09T13:47:05Z

However, my test results are all the same. All my predicted captions are identical.

Cao-Shuang · 2019-08-10T00:32:15Z

I get the same caption too. Have you found the reason?

fireholder · 2019-08-10T01:55:35Z

Not yet.

ShivamPanchal · 2019-09-08T17:29:35Z

When I run `python tester.py`, I get:

FileNotFoundError: [Errno 2] No such file or directory: './data/new_data/debug_vocab.pkl'

CinKKKyo · 2019-11-11T13:52:19Z

Did any of you run into a problem like this?

WARNING:tensorflow:From /content/drive/Shared drives/shared drive-zma/ACL18/utils/logger.py:15: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

Traceback (most recent call last):
File "/content/drive/Shared drives/shared drive-zma/ACL18/trainer.py", line 662, in <module>
debugger.train()
File "/content/drive/Shared drives/shared drive-zma/ACL18/trainer.py", line 60, in train
train_tag_loss, train_stop_loss, train_word_loss, train_loss = self._epoch_train() #???
File "/content/drive/Shared drives/shared drive-zma/ACL18/trainer.py", line 402, in _epoch_train
batch_tag_loss = self.mse_criterion(tags, self._to_var(label, requires_grad=False)).sum() # ???
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 431, in forward
return F.mse_loss(input, target, reduction=self.reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2203, in mse_loss
expanded_input, expanded_target = torch.broadcast_tensors(input, target)
File "/usr/local/lib/python3.6/dist-packages/torch/functional.py", line 52, in broadcast_tensors
return torch._C._VariableFunctions.broadcast_tensors(tensors)

RuntimeError: The size of tensor a (210) must match the size of tensor b (0) at non-singleton dimension 1

This really confuses me. Could anyone help? Thanks!
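
The message says that the two tensors handed to the MSE loss disagree at dimension 1: one has 210 columns and the other has 0. Assuming the zero-width one is the label batch (it is the second argument in the failing call), a quick check of the label vectors before the loss can surface the problem earlier. The sketch below uses plain lists, and `check_label_width` is a hypothetical helper.

```python
def check_label_width(labels, expected_width):
    """Raise early if any per-sample label vector has the wrong length.

    `labels` is a list of per-sample tag vectors (plain lists here);
    `expected_width` is the number of tags the model predicts.
    """
    for index, row in enumerate(labels):
        if len(row) != expected_width:
            raise ValueError(
                f"sample {index}: label has {len(row)} entries, "
                f"expected {expected_width}"
            )
    return True
```

Calling `check_label_width(batch_labels, 210)` right after loading a batch would then point at the first sample whose tag vector is empty, which narrows the search to the data files rather than the model.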

mfilipav · 2019-12-03T11:39:27Z

> However, my test results are all the same. All my predicted captions are the same.

Hi @fireholder! Did you eventually give up trying to solve the issue? Were all the predicted captions always identical?

yangyan22 · 2020-04-26T04:33:01Z

My train loss is also very large, and all my predicted captions are the same: "No acute cardiopulmonary abnormality". Could anyone help? Thanks! Could it be a Python 2 versus Python 3 issue, since I used Python 3?

AnkitMalviya · 2020-05-17T00:33:33Z

> Yes, extremely large train loss. Haven't tried the tester yet

Hi, were you able to decrease the loss? I am also facing the same issue.

AnkitMalviya · 2020-05-17T00:35:01Z

> I have the same caption too. Can you find the reason?

I am also facing the same issue. Were you able to solve this?

Alsalivan · 2020-09-04T14:46:28Z

> My train loss is also very large. And all my predicted captions are the same: "No acute cardiopulmonary abnormality", could anyone do me a favor? Thx! Is it because of Python2 and Python3, since I used python3.

My guess is that the train loss is large because the author uses MSELoss for predicting tags. With 156 different tags, the squared term can be on the order of (156 - 0)^2 = 24336, which would explain such a large loss.

You can change it to L1Loss or decrease the lambda argument for the tag loss (if you find that reasonable).
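
To get a feel for how the reduction and the lambda weight affect the reported number, here is a small pure-Python sketch. It is not the repository's loss code; `tag_losses` and the toy values are assumptions made for illustration, with the width of 210 borrowed from the traceback earlier in this thread.

```python
def tag_losses(predictions, targets, lambda_tag=1.0):
    """Compare sum and mean reductions of squared and absolute error."""
    diffs = [p - t for p, t in zip(predictions, targets)]
    n = len(diffs)
    squared_sum = sum(d * d for d in diffs)
    absolute_sum = sum(abs(d) for d in diffs)
    return {
        "mse_sum": lambda_tag * squared_sum,
        "mse_mean": lambda_tag * squared_sum / n,
        "l1_sum": lambda_tag * absolute_sum,
        "l1_mean": lambda_tag * absolute_sum / n,
    }


# Toy row: 210 tag scores, all 0.5, against a multi-hot target
# with three positive tags (values chosen only for illustration).
toy_predictions = [0.5] * 210
toy_targets = [1.0] * 3 + [0.0] * 207
toy = tag_losses(toy_predictions, toy_targets)
print(toy)
```

In this toy case every error is 0.5, so summing the squared errors over 210 tags gives 52.5 while the mean is only 0.25. The absolute error is not smaller here (each term is 0.5 instead of 0.25), which suggests that switching to L1Loss mainly helps when individual errors exceed 1, whereas averaging or a smaller lambda scales the value in every case.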

Hareem1997 · 2020-09-20T08:40:22Z

In the debugger.py and tester.py files of this project, I get an error at the third-to-last line of the following section of code.
` tag_loss += self.args.lambda_tag * batch_tag_loss.data
stop_loss += self.args.lambda_stop * batch_stop_loss.data
word_loss += self.args.lambda_word * batch_word_loss.data
loss += batch_loss.data

return tag_loss, stop_loss, word_loss, loss`

The error is:
File "D:/Hareem/Auto_report/debugger.py", line 61, in train
train_tag_loss, train_stop_loss, train_word_loss, train_loss = self._epoch_train()
File "D:/Hareem/Auto_report/debugger.py", line 424, in _epoch_train
word_loss += self.args.lambda_word * batch_word_loss.data
AttributeError: 'int' object has no attribute 'data'
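
The error means `batch_word_loss` was still a plain Python `int` when `.data` was read. That would happen if it starts at 0 and the inner loop never replaces it with a tensor, which is an assumption here because the loop is not shown. A minimal defensive sketch follows; `as_number` and `FakeScalar` are hypothetical names used only for illustration.

```python
def as_number(value):
    """Return a plain float from a tensor-like object or a Python number.

    Tensor-like objects are assumed to expose `.item()` (as one-element
    PyTorch tensors do); plain ints and floats are passed through.
    """
    item = getattr(value, "item", None)
    if callable(item):
        return float(item())
    return float(value)


class FakeScalar:
    """Stand-in for a one-element tensor, used only in this illustration."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value
```

The accumulation line could then read `word_loss += self.args.lambda_word * as_number(batch_word_loss)`, which works whether or not the loop produced a tensor. It hides the symptom only, so it is still worth checking why no word loss was computed for that batch.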

domyown · 2021-12-08T12:23:36Z

Has anybody solved the problem of all predicted captions being the same?
