When trying to train for BiLSTM ValueError: all the input arrays must have same number of dimensions from line 162 of data_utils.py #10

Open
monajalal opened this issue Apr 20, 2018 · 2 comments

monajalal commented Apr 20, 2018

When I run:
[jalal@goku CharLSTM]$ python main.py bidirectional_lstm --train
I get the following error:

@buddhaqueen077    this is just not your day
@brwneyedbabe83  We got a last minute invite........alas not kid sitters 
Traceback (most recent call last):
  File "main.py", line 33, in <module>
    network.train()
  File "/scratch2/debate_tweets/sentiment/CharLSTM/lib_model/bidirectional_lstm.py", line 170, in train
    for minibatch in reader.iterate_minibatch(BATCH_SIZE, dataset=TRAIN_SET):
  File "/scratch2/debate_tweets/sentiment/CharLSTM/lib/data_utils.py", line 196, in iterate_minibatch
    inputs, targets = self.make_minibatch(self.data)
  File "/scratch2/debate_tweets/sentiment/CharLSTM/lib/data_utils.py", line 166, in make_minibatch
    minibatch_x = numpy_fillna(minibatch_x)
  File "/scratch2/debate_tweets/sentiment/CharLSTM/lib/data_utils.py", line 162, in numpy_fillna
    out[mask] = np.concatenate(data)
ValueError: all the input arrays must have same number of dimensions


How should this be fixed? I saw @andresiggesjo's question #8 and expected the current repo to include the fix for it. Could you please advise?
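
For reference, the message in the last line of the traceback is raised by `np.concatenate`, which requires every input array to have the same number of dimensions. Below is a minimal sketch that reproduces the same kind of `ValueError` outside the repo. The arrays are made up for illustration and are not the repo's data; the traceback does not show how `data_utils.py` ends up with a mismatched row.

```python
import numpy as np

# Hypothetical rows (not the repo's data): two encoded texts as 1-D arrays
# and one malformed row that is 2-D.
rows = [np.array([1, 2, 3]), np.array([4, 5]), np.array([[6, 7]])]

# Listing the number of dimensions per row makes the odd one out easy to spot.
ndims = [np.ndim(r) for r in rows]
print(ndims)

try:
    np.concatenate(rows)
    message = ""
except ValueError as err:
    message = str(err)

print(message)
```

Printing the per-row dimensions just before the failing `np.concatenate(data)` call is one way to locate the offending sample in a minibatch.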

@RyanOngAI

Hi monajalal,

Not sure if you are still working on this, but I believe the issue is due to the training dataset rather than the code itself. The code should work fine. After cleaning the data as suggested in question #8, I realised that there are more NaN values (on the text side), which I believe is what is causing the error message above. There should be 5 more NaN values on the text side after the cleaning process suggested in #8.

@RyanOngAI

Also watch for any texts that contain only weird, unreadable symbols, which are effectively equivalent to NaN as well.
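
The cleanup described in the two comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `(sentiment, text)` row layout, the helper names `is_readable` and `drop_unreadable_rows`, and the rule that a text counts as readable when it contains at least one ASCII letter or digit are all hypothetical and not taken from the repo.

```python
import re

# Assumption: a text is "readable" if it has at least one ASCII letter or digit.
READABLE = re.compile(r"[A-Za-z0-9]")


def is_readable(text):
    """Return True only for strings that contain a readable character.

    NaN values loaded from a CSV are floats, so the isinstance check rejects them.
    """
    return isinstance(text, str) and bool(READABLE.search(text))


def drop_unreadable_rows(rows):
    """Keep only the (sentiment, text) pairs whose text is readable."""
    return [(label, text) for label, text in rows if is_readable(text)]


# Hypothetical sample rows; the real dataset layout may differ.
sample = [
    (0, "this is just not your day"),
    (4, float("nan")),   # missing text
    (0, "¿¿¿ ½½"),       # symbols only, treated like NaN
    (4, "We got a last minute invite"),
]

cleaned = drop_unreadable_rows(sample)
print(len(sample) - len(cleaned), "rows dropped")
```

Counting how many rows are dropped gives a quick check against the number of extra NaN entries mentioned above.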
