
Meaning of local state, global inputs, global state #4

Open
LeiBAI opened this issue Nov 17, 2018 · 3 comments

Comments

@LeiBAI

LeiBAI commented Nov 17, 2018

Hi yoshall,

Thanks for your contribution. I have some questions about the code and the work, and I hope you can give me a hand.

Part 1: According to the sample_data, I think there are 35 nodes, and each node generates 19 time series. However, I am a little confused about the meanings of all the input files:

  1. I notice that the code reads 7 files and processes them to generate the inputs for training. While the meanings of local_inputs, external_inputs and decoder_gts are obvious, I don't know what global_atten_state and local_atten_state are. Why not generate them from the raw data (for example, from the local inputs and global inputs)?
  2. What is global_inputs.npy? Why is its shape (500, 35)?
  3. Is global_attn_state_indics the same as global_inputs_indics? I noticed that in the get_batch_feed_dict() function, train_global_inp = training_data[1], which is in fact global_attn_index. (A quick check of this is sketched right after this list.)
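(A minimal way to check point 3 directly from the files; the 'sample_data/' path prefix here is my assumption, the file names are the ones shipped with the repo:)

```python
import numpy as np

# Load the two index files from sample_data.
inp_idx = np.load('sample_data/global_input_indics.npy')
attn_idx = np.load('sample_data/global_attn_state_indics.npy')

# If the arrays match element-wise, the two names index the same windows.
print(inp_idx.shape, attn_idx.shape)      # expect (100,) and (100,)
print(np.array_equal(inp_idx, attn_idx))  # True if they coincide
```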

Part 2: Besides, I think the model generates predictions for each node separately. Do you train a model for each node separately, or do you train one unified model?

Part 3: A suggestion: I hope you could add some explanation of the input files, as they differ from the raw inputs, and perhaps also publish the code that processes your raw data (http://urban-computing.com/data/Data-1.zip).

Looking forward to your answer, and thanks for your patience.

@CastleLiang
Contributor

Hi LeiBAI,

Since I have already answered the same question from a PhD student, I will simply copy that answer as follows:

For Q1 and Q2:

  1. First, as for what each of these inputs represents, I have commented on all of them in the code. Both global_input and global_attn_state are inputs to the global spatial attention; they are the variables of the unnumbered equation between Equations 3 and 4 in the paper. The first dimension of global_input is not batch_size but time; it is a global concept. For example, in our code the shape of global_input is (28752, 35): the first dimension is time, with 28752 slices (more than 3 years), and the second dimension is the 35 sensors. Since our data cannot be released yet, providing the full global_input would leak the whole dataset. In the sample_data you saw, I only provided the first 500 time slices, which is also what causes the out-of-bounds indexing.
  2. Second, why do we define a global_input and a global_attn at all? You will find that the global inputs for the train, validation and test sets are all the same. Storing a global_input and a global_attn for every sample would take a lot of space, so we use a global-index scheme to save space.
  3. Regarding global_inputs[j: j + n_steps_encoder, :] (line 165): the value of j is taken from global_attn_index (it should really be global_input_index here). Why is global_inputs sliced as [j : j + n_steps_encoder]? Combining this with point 2: for each sample, since my encoder length is 12, I only need to use the index to fetch those 12 slices of the target series from the global global_input. (A minimal sketch of this lookup follows this list.)
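A minimal sketch of that index-based lookup (my illustration of the scheme, assuming the sample_data file names and the shapes above, not the exact repo code):

```python
import numpy as np

n_steps_encoder = 12  # encoder length used in the paper/code

# The global series shared by train/validation/test: (time slices, sensors).
global_inputs = np.load('sample_data/global_inputs.npy')             # e.g. (500, 35)
# One starting index per sample into that shared series.
global_input_index = np.load('sample_data/global_input_indics.npy')  # (100,)

# Instead of storing a (12, 35) window per sample, slice it on the fly:
batch_global_inp = np.stack(
    [global_inputs[j: j + n_steps_encoder, :] for j in global_input_index]
)
print(batch_global_inp.shape)  # (n_samples, 12, 35)
```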

For Q3: A nice suggestion. Since I am no longer at my previous company, I will ask my colleagues for help to publish the code.

Thanks for the questions above!

@LeiBAI
Author

LeiBAI commented Nov 23, 2018

@CastleLiang Hi Yuxuan,

Thanks for your patience. Based on your reply and my own understanding, I have reached the following conclusions:

  • local_inputs.npy: shape is (100, 12, 19). This is the data of the target sensor: 100 means there are 100 samples, 12 is the number of encoder time steps, and 19 is N_l in your paper, meaning one sensor generates 19 kinds of time series. The local inputs are x^{i,k} in the paper.
  • global_input_indics.npy: shape is (100,). These are the indices of the input samples in the global data.
  • global_attn_state_indics.npy: shape is (100,). In fact this one is the same as global_input_indics.
  • external_inputs.npy: shape is (100, 6, 83), where 6 is the number of decoder time steps and 83 is the number of external features.
  • decoder_gts.npy: shape is (100, 6), where 6 is the number of decoder time steps. These are the ground-truth values of the target time series.
  • global_inputs.npy: shape is (500, 35), where 35 is the number of sensors (N_g in the paper). In fact 500 is largely arbitrary: given that there are only 100 local inputs, only 100 (+12) time slices are useful, and these 100 samples are extracted via global_input_indics to form the true global inputs of shape (100, 12, 35). The global inputs are y^l in the paper.
  • global_attn_state.npy: shape is (500, 35, 19, 12). This is all the data sensed by the 35 sensors, where 12 is the number of unrolled encoder time steps as above. The global attention states are X^l in the paper. (A quick check of all these shapes is sketched right after this list.)
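(For reference, a quick way to confirm these shapes from sample_data; the 'sample_data/' prefix is my assumption:)

```python
import numpy as np

files = ['local_inputs', 'global_input_indics', 'global_attn_state_indics',
         'external_inputs', 'decoder_gts', 'global_inputs', 'global_attn_state']
for name in files:
    arr = np.load('sample_data/{}.npy'.format(name))
    print(name, arr.shape)
# expected: (100, 12, 19), (100,), (100,), (100, 6, 83),
#           (100, 6), (500, 35), (500, 35, 19, 12)
```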

So I think global_attn_state.npy alone is enough as input; all the others can be generated from this file (a rough sketch of what I mean is below).
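A rough sketch of what that derivation could look like (my own illustration, not code from the repo; the target sensor position i and the index k of its target series among the 19 are hypothetical placeholders, and external_inputs is left aside since the 83 external features are not part of the sensed data):

```python
import numpy as np

# All sensed data, pre-windowed: (windows, sensors, series, encoder steps).
global_attn_state = np.load('sample_data/global_attn_state.npy')  # (500, 35, 19, 12)
indices = np.load('sample_data/global_input_indics.npy')          # (100,)

i, k = 0, 0  # hypothetical: target sensor i, its target time series k

# local_inputs (100, 12, 19): all 19 series of sensor i for each sample window.
local_inputs = global_attn_state[indices, i, :, :].transpose(0, 2, 1)

# Per-sample global input windows (100, 12, 35): series k of every sensor.
global_windows = global_attn_state[indices, :, k, :].transpose(0, 2, 1)
```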

Is my above understanding correct?

Thanks again.
Lei

@mapleyuen96

@LeiBAI Hi LeiBAI
I have also spent some time studying this project. Your words helped me a lot, thank you very much.
But I still wonder about the processing of the raw data. As the sample_data shows, each sensor provides 19 attributes. So what are these 19 attributes in the raw data (http://urban-computing.com/data/Data-1.zip)? And how should the raw data be processed? If you have any ideas, please help me.
Looking forward to your reply, and thank you again!
Maple
