For the same sentence, the results look like this:
first inference:
0 11:1 18:1 32:1 63:1 69:1 75:1 91:2 110:1 172:1 174:2 218:1 269:1 347:2 359:2
the next inference:
0 13:1 28:2 66:1 110:1 135:2 151:1 181:1 235:1 240:1 261:1 284:1 317:1 353:1 355:1 360:2
The results are not the same on every inference run.
But when the number of topics is small, there is no problem:
0 1:8 8:6 15:1
0 1:9 7:1 8:4 15:1
That makes me confused.
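(Editor's aside: each output line above is a document id followed by sparse `topic:count` pairs. A minimal parsing sketch for that format, assuming well-formed input; `ParseDocTopic` is a hypothetical helper, not part of the repository:)

```cpp
#include <cstdint>
#include <map>
#include <sstream>
#include <string>

// Parse one output line of the form "docid topic:count topic:count ...".
// Returns the document's sparse topic histogram.
std::map<int32_t, int32_t> ParseDocTopic(const std::string& line,
                                         int32_t* doc_id) {
  std::istringstream in(line);
  in >> *doc_id;
  std::map<int32_t, int32_t> histogram;
  std::string pair;
  while (in >> pair) {
    const std::size_t colon = pair.find(':');   // assumes "topic:count"
    const int32_t topic = std::stoi(pair.substr(0, colon));
    const int32_t count = std::stoi(pair.substr(colon + 1));
    histogram[topic] = count;
  }
  return histogram;
}
```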
@feiga
Thank you very much. You are right! I fixed two things in my latest commit:
1. To keep doc_topic_counter intact, inference now proceeds slice by slice per iteration, as you mentioned above.
2. When sampling in the inference phase, the word-related terms of the proposal, i.e., n_sw_beta, n_s_beta_sum, n_tw_beta, and n_t_beta_sum, SHOULD BE FIXED, which our previous discussion overlooked (see the sketch after this list).
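To illustrate fix (2): during inference the word-topic statistics learned at training time must be treated as read-only, and only the per-document counter may change. Below is a minimal sketch of that idea, using a plain collapsed Gibbs conditional rather than LightLDA's Metropolis-Hastings proposals to keep it short; the names `Model`, `InferSweep`, `word_topic`, and `doc_topic` are illustrative, not the repository's actual identifiers.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Illustrative model statistics learned at training time. At inference
// they are frozen: word_topic is n_tw and topic_sum is n_t above.
struct Model {
  std::vector<std::vector<int32_t>> word_topic;  // [word][topic], read-only
  std::vector<int64_t> topic_sum;                // [topic], read-only
  double alpha = 0.1, beta = 0.01;
  int32_t num_topics = 0, vocab_size = 0;
};

// One sweep over a document. The model is passed as const, so the
// word-related terms cannot be modified; only doc_topic is updated.
// Caller must keep `assignment` and `doc_topic` mutually consistent.
void InferSweep(const Model& m, const std::vector<int32_t>& words,
                std::vector<int32_t>& assignment,
                std::vector<int32_t>& doc_topic, std::mt19937& rng) {
  std::vector<double> cdf(m.num_topics);
  for (std::size_t i = 0; i < words.size(); ++i) {
    const int32_t w = words[i];
    --doc_topic[assignment[i]];                  // exclude current token
    double sum = 0.0;
    for (int32_t t = 0; t < m.num_topics; ++t) { // full conditional
      sum += (doc_topic[t] + m.alpha) *
             (m.word_topic[w][t] + m.beta) /
             (m.topic_sum[t] + m.beta * m.vocab_size);
      cdf[t] = sum;
    }
    const double u =
        std::uniform_real_distribution<double>(0.0, sum)(rng);
    int32_t t = 0;
    while (cdf[t] < u) ++t;                      // inverse-CDF draw
    ++doc_topic[t];                              // doc-side counts do update
    assignment[i] = t;
  }
}
```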
After doing so, the results get much better. Here are the first two documents:
============training phase=============
0 260:1 549:2 778:1 1178:2 1309:1 1789:1 1843:2 2131:2 2390:3 2886:1
1 93:1 140:1 204:1 278:4 320:2 404:1 814:1 856:1 1164:2 1496:1 1627:4 1629:1 2059:1 2122:1 2177:1 2430:1 2686:1 2818:1 2880:1
==============inference phase=========
0 47:1 559:1 778:1 1178:2 1345:2 1843:1 2131:4 2390:3 2886:1
1 93:1 204:1 278:4 320:2 404:2 600:1 711:1 856:1 1164:2 1461:1 1496:1 1627:4 2059:1 2122:1 2144:1 2430:1 2518:1 2818:1
I think it is almost correct.
However, I think there are some defects in the current logic. First, it is unnecessary to rebuild the alias table per slice/block/iteration. Second, it is unnecessary to build alias tables for every word in the large training vocabulary. Maybe it would be better to limit the user's input to just one block, and generate just one slice for that block without vocabulary splitting. What do you think?
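On the first point: since the word-topic counts are frozen at inference, each word's proposal distribution never changes, so its alias table only needs to be built once, and only for words that actually appear in the inference input. A minimal sketch of alias-table construction (Vose's method), assuming normalized-enough weights; this is illustrative, not the repository's implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Alias table for O(1) sampling from a fixed discrete distribution.
// Built once per inference-vocabulary word, never rebuilt per slice.
struct AliasTable {
  std::vector<double> prob;
  std::vector<int32_t> alias;
};

AliasTable BuildAlias(const std::vector<double>& weights) {
  const std::size_t n = weights.size();
  AliasTable table{std::vector<double>(n, 1.0), std::vector<int32_t>(n, 0)};
  double total = 0.0;
  for (double w : weights) total += w;
  std::vector<double> scaled(n);
  std::vector<std::size_t> small, large;
  for (std::size_t i = 0; i < n; ++i) {
    scaled[i] = weights[i] * n / total;          // mean-normalize to 1.0
    (scaled[i] < 1.0 ? small : large).push_back(i);
  }
  while (!small.empty() && !large.empty()) {
    const std::size_t s = small.back(); small.pop_back();
    const std::size_t l = large.back(); large.pop_back();
    table.prob[s] = scaled[s];                   // keep s with prob scaled[s]
    table.alias[s] = static_cast<int32_t>(l);    // otherwise redirect to l
    scaled[l] += scaled[s] - 1.0;                // give leftover mass to l
    (scaled[l] < 1.0 ? small : large).push_back(l);
  }
  // Remaining buckets (numerical leftovers) keep their default prob of 1.0.
  return table;
}

// Draw: pick a bucket uniformly; keep it with prob[i], else take alias[i].
int32_t Sample(const AliasTable& t, std::mt19937& rng) {
  std::uniform_int_distribution<std::size_t> pick(0, t.prob.size() - 1);
  const std::size_t i = pick(rng);
  return std::bernoulli_distribution(t.prob[i])(rng)
             ? static_cast<int32_t>(i)
             : t.alias[i];
}
```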
Thanks
Originally posted by @hiyijian in #14 (comment)