Train model & Tensorflow version #6

Open
alex-df opened this issue Feb 11, 2018 · 9 comments
@alex-df

alex-df commented Feb 11, 2018

I get an error while training the model:

lib/data_utils.py", line 160, in numpy_fillna
out[mask] = np.concatenate(data)
ValueError: all the input arrays must have same number of dimensions

By the way, can you tell us which version of TensorFlow you use?

Thanks for your great work!

@charlesashby
Owner

Hello Alex! I could not replicate the error. I added a requirements.txt file that worked for running the pretrained model; I hope it helps. Note that I did not use the GPU version of TensorFlow for that, so you may want to switch to it if you are going to train a model.

@alex-df
Author

alex-df commented Feb 11, 2018

Hi Charles, thank you so much for your fast answer. It's better with a requirements.txt :)

This is the error I get when using the pre-trained model. I tried from my Mac and from my RHEL server and got the same error.

Using model: lstm
Training: False
processing sentence: test sentence
Loading model /tmp/checkpoints/lstm...
Traceback (most recent call last):
  File "main.py", line 25, in <module>
    network.predict_sentences(args.sentences)
  File "/home/ec2-user/bitmood-core/SentimentAnalysis/DeepLearning/lib_model/char_lstm.py", line 292, in predict_sentences
    saver.restore(sess, SAVE_PATH)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1560, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key LSTM/rnn/basic_lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2_4 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_4/tensor_names, save/RestoreV2_4/shape_and_slices)]]

Caused by op u'save/RestoreV2_4', defined at:
  File "main.py", line 25, in <module>
    network.predict_sentences(args.sentences)
  File "/home/ec2-user/bitmood-core/SentimentAnalysis/DeepLearning/lib_model/char_lstm.py", line 288, in predict_sentences
    saver = tf.train.Saver()
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1140, in __init__
    self.build()
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 1172, in build
    filename=self._filename)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 688, in build
    restore_sequentially, reshape)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/training/saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_io_ops.py", line 663, in restore_v2
    dtypes=dtypes, name=name)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/ec2-user/Project1/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

NotFoundError (see above for traceback): Key LSTM/rnn/basic_lstm_cell/bias not found in checkpoint
	 [[Node: save/RestoreV2_4 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_4/tensor_names, save/RestoreV2_4/shape_and_slices)]]

@charlesashby
Owner

Hello Alex, this is weird: I get the same error. I can confirm the checkpoint files were working when I uploaded them, so something must be wrong with the requirements.txt file (I uploaded the one I found on my server, but I have not touched this model in a while). My guess is that the TensorFlow version is not the right one. If I have some time this week I'll try to fix it; meanwhile, you can try different versions of TensorFlow and see if you can get it to work (see https://pypi.python.org/pypi/yolk/0.4.3).
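
Note on the version guess: TensorFlow 1.2 renamed the variables of its RNN cells from "weights" and "biases" to "kernel" and "bias", so a checkpoint written by an older release would not contain a key such as LSTM/rnn/basic_lstm_cell/bias. Whether that is what happened here is an assumption; the thread does not show the keys stored in the checkpoint. The sketch below only illustrates the comparison one could make. The name lists are hard-coded and hypothetical, as are the helpers `rename_old_key` and `missing_keys`.

```python
# Hypothetical illustration: these name lists are NOT read from the real
# checkpoint; they only mimic what an old-style checkpoint might contain.
OLD_TO_NEW_SUFFIX = {"weights": "kernel", "biases": "bias"}


def rename_old_key(name):
    """Map an old-style RNN variable name to the newer naming scheme."""
    prefix, _, suffix = name.rpartition("/")
    new_suffix = OLD_TO_NEW_SUFFIX.get(suffix, suffix)
    return prefix + "/" + new_suffix if prefix else new_suffix


def missing_keys(graph_names, checkpoint_names):
    """Return the variable names the graph expects but the checkpoint lacks."""
    return sorted(set(graph_names) - set(checkpoint_names))


# Names the newer graph would ask for (the "bias" key is taken from the traceback).
graph_names = [
    "LSTM/rnn/basic_lstm_cell/kernel",
    "LSTM/rnn/basic_lstm_cell/bias",
]
# Assumed contents of a checkpoint saved by an older TensorFlow release.
checkpoint_names = [
    "LSTM/rnn/basic_lstm_cell/weights",
    "LSTM/rnn/basic_lstm_cell/biases",
]

print(missing_keys(graph_names, checkpoint_names))  # keys that would fail to restore
renamed = [rename_old_key(n) for n in checkpoint_names]
print(missing_keys(graph_names, renamed))           # nothing missing after renaming
```

In a real model the rename should be restricted to the RNN cell variables, since other layers may legitimately use names ending in "weights" or "biases".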

@alex-df
Author

alex-df commented Feb 11, 2018

Thanks, I will also run some tests in the coming week.

FYI, this is the error I get after training the model for a few minutes:

....
Only a week left at apple  gonna miss ya allll!:0
"#musicmonday I'm gonna choose two, just cos I can  Don't Stop Believin' by Journey"
@rose_janice is it my turn on the dishes today? 
"It's Friday night, done work and going out with the girls!  life is fabulous ."
"Hurt my lip last night, it's swollen and slightly infected. "
Traceback (most recent call last):
  File "main.py", line 23, in <module>
    network.train()
  File "/home/ec2-user/Project1/SentimentAnalysis/DeepLearning/lib_model/char_lstm.py", line 173, in train
    for minibatch in reader.iterate_minibatch(BATCH_SIZE, dataset=TRAIN_SET):
  File "/home/ec2-user/Project1/SentimentAnalysis/DeepLearning/lib/data_utils.py", line 194, in iterate_minibatch
    inputs, targets = self.make_minibatch(self.data)
  File "/home/ec2-user/Project1/SentimentAnalysis/DeepLearning/lib/data_utils.py", line 164, in make_minibatch
    minibatch_x = numpy_fillna(minibatch_x)
  File "/home/ec2-user/Project1/SentimentAnalysis/DeepLearning/lib/data_utils.py", line 160, in numpy_fillna
    out[mask] = np.concatenate(data)
ValueError: all the input arrays must have same number of dimensions
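
Note on this traceback: `np.concatenate` raises this kind of ValueError when the arrays it receives do not all have the same number of dimensions, for example when one row of a minibatch is a flat sequence and another is nested. The repository's `numpy_fillna` is not shown in this thread, so the function below is not its implementation. It is a minimal sketch of the usual mask-based padding pattern, with a hypothetical name (`pad_rows`) and the assumption that rows hold integer character ids.

```python
import numpy as np


def pad_rows(rows, fill_value=0):
    """Pad variable-length rows into one rectangular 2-D integer array."""
    # Flatten every row to 1-D so np.concatenate sees a uniform ndim.
    flat = [np.asarray(r, dtype=np.int64).reshape(-1) for r in rows]
    lens = np.array([len(r) for r in flat])
    # mask[i, j] is True where row i has a real value in column j.
    mask = np.arange(lens.max()) < lens[:, None]
    out = np.full(mask.shape, fill_value, dtype=np.int64)
    out[mask] = np.concatenate(flat)
    return out


# Mixing a 1-D and a 2-D array reproduces the failure mode.
try:
    np.concatenate([np.array([1, 2]), np.array([[3, 4]])])
except ValueError as err:
    print("concatenate failed:", err)

print(pad_rows([[1, 2, 3], [4]]))
```

The sketch assumes at least one non-empty row; a completely empty batch would need separate handling.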

@charlesashby
Owner

Yeah, it has to be a problem with the requirements.txt file. I can't believe I forgot to add it initially. I'll try to fix it this week, thanks for the heads-up! By the way, if you manage to find the right packages before I do, please let me know, hah!

@alex-df
Author

alex-df commented Feb 14, 2018

I ran some tests and it seems that the Stanford dataset is faulty. I took a small batch out of it, saved it as UTF-8 in a new file (with BBEdit), and then launched the training, with a few modifications since the batch was smaller. Training completed and I was able to run sentiment analysis using --sentences.

I have now done the same with the full dataset and the training is running. I'll let you know if it works.
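
For anyone without BBEdit, the same re-encoding can be sketched in a few lines of Python. The source encoding is an assumption: the thread does not say what the original file uses, and `latin-1` is only a common guess for older CSV exports. Note that `latin-1` accepts any byte sequence, so a wrong guess produces wrong characters rather than an error. The helper name `convert_to_utf8` and the sample file names are hypothetical.

```python
import io
import os
import tempfile


def convert_to_utf8(src_path, dst_path, src_encoding="latin-1"):
    """Re-encode a text file as UTF-8 and return the number of lines written."""
    count = 0
    # newline="" leaves the original line endings untouched on both sides.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        with io.open(dst_path, "w", encoding="utf-8", newline="") as dst:
            for line in src:
                dst.write(line)
                count += 1
    return count


# Small demonstration on a throwaway file rather than the real dataset.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "sample_latin1.csv")
dst_path = os.path.join(tmp_dir, "sample_utf8.csv")
with open(src_path, "wb") as f:
    f.write(b'"0","caf\xe9 closed today"\n')  # 0xE9 is "e acute" in latin-1

lines_written = convert_to_utf8(src_path, dst_path)
with open(dst_path, "rb") as f:
    converted_bytes = f.read()
print(lines_written, converted_bytes)
```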

@charlesashby
Owner

Hello Alex! I fixed the issue; you can clone the new version now. Keep in mind that I only tested it using tensorflow-gpu==1.1.0. I also changed a few things in data_utils.py to fix the problem when loading the datasets.
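
Since only tensorflow-gpu==1.1.0 is reported as tested, a small guard at startup can make a version mismatch visible before a checkpoint fails to load. This is a sketch, not part of the repository: the helper names are hypothetical, and the version string would normally come from `tf.__version__`, which is not imported here so that the example runs without TensorFlow installed.

```python
import re

# The only version reported as tested in this thread.
TESTED_VERSION = (1, 1, 0)


def parse_version(version_string):
    """Extract (major, minor, patch) from a string such as '1.1.0' or '1.2.0-rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version_string)
    return tuple(int(group) for group in match.groups())


def matches_tested_version(version_string):
    """Return True when the installed version equals the tested one."""
    return parse_version(version_string) == TESTED_VERSION


# In the project this would be matches_tested_version(tf.__version__).
for candidate in ("1.1.0", "1.4.1"):
    status = "tested" if matches_tested_version(candidate) else "untested"
    print(candidate, "->", status)
```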

@andresiggesjo

andresiggesjo commented Mar 3, 2018

Hello, I am also getting this error. I am using the CPU build of TensorFlow; I don't know whether that is the problem or whether the dataset causes it.

out[mask] = np.concatenate(data)
ValueError: all the input arrays must have same number of dimensions

@charlesashby
Owner

@andresiggesjo Hey, can you open a new issue with the complete output of the error you're getting? Thanks!
