This repository has been archived by the owner on Jan 26, 2021. It is now read-only.

The topics don't match when every infer.@feiga #83

Open
RyanPeking opened this issue Dec 20, 2020 · 0 comments

RyanPeking commented Dec 20, 2020

When I run inference on the same sentence twice, the results look like this:

First inference:
0 11:1 18:1 32:1 63:1 69:1 75:1 91:2 110:1 172:1 174:2 218:1 269:1 347:2 359:2

Next inference:
0 13:1 28:2 66:1 110:1 135:2 151:1 181:1 235:1 240:1 261:1 284:1 317:1 353:1 355:1 360:2
The topic assignments are different every time I run inference.

However, when the number of topics is small, there is no such problem:
0 1:8 8:6 15:1
0 1:9 7:1 8:4 15:1

This confuses me.
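One way to quantify how much two runs disagree is to compare their per-topic histograms. The Python sketch below assumes each output line has the form `doc_id topic:count topic:count ...`; `parse_doc_topic` and `overlap` are hypothetical helper names, not part of LightLDA. `overlap` is the histogram intersection, i.e. the number of tokens that can be paired topic-for-topic between the two runs.

```python
from typing import Dict, Tuple


def parse_doc_topic(line: str) -> Tuple[int, Dict[int, int]]:
    """Parse 'doc_id topic:count topic:count ...' (assumed output format)."""
    fields = line.split()
    counts: Dict[int, int] = {}
    for pair in fields[1:]:
        topic, count = pair.split(":")
        counts[int(topic)] = int(count)
    return int(fields[0]), counts


def overlap(a: Dict[int, int], b: Dict[int, int]) -> int:
    """Histogram intersection: tokens that can be matched topic-for-topic across two runs."""
    return sum(min(count, b.get(topic, 0)) for topic, count in a.items())


if __name__ == "__main__":
    pairs = [
        ("0 11:1 18:1 32:1 63:1 69:1 75:1 91:2 110:1 172:1 174:2 218:1 269:1 347:2 359:2",
         "0 13:1 28:2 66:1 110:1 135:2 151:1 181:1 235:1 240:1 261:1 284:1 317:1 353:1 355:1 360:2"),
        ("0 1:8 8:6 15:1", "0 1:9 7:1 8:4 15:1"),
    ]
    for first, second in pairs:
        _, a = parse_doc_topic(first)
        _, b = parse_doc_topic(second)
        print(f"{overlap(a, b)} of {sum(a.values())} tokens agree")
```

On the lines from this issue, the large-topic runs share only 1 of 18 tokens (topic 110), while the small-topic runs share 13 of 15.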

@feiga
Thank you very much, you are right! I fixed two things in my latest commit:

  1. To keep doc_topic_counter intact, inference now runs slice by slice per iteration, as you mentioned above.
  2. When sampling in the inference phase, the word-related terms of Pi, i.e., n_sw_beta, n_s_beta_sum, n_tw_beta and n_t_beta_sum, SHOULD BE FIXED. This was overlooked in our previous discussion.
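The effect of fix 2 can be seen from the acceptance ratio. Assuming Pi here denotes LightLDA's Metropolis-Hastings acceptance ratio, write φ_kw = (n_kw + β) / (n_k + Vβ) for the word-related factor (the quantity built from n_tw_beta and n_t_beta_sum). For the word proposal q_w(t) ∝ φ_tw, the ratio for moving a token from topic s to t is [(n_td + α) · φ_tw · φ_sw] / [(n_sd + α) · φ_sw · φ_tw], with doc counts excluding the current token. If φ is frozen from the trained model, the φ factors cancel and the ratio reduces to (n_td + α) / (n_sd + α), so only doc_topic_counter changes during inference. The following is a minimal sketch of that idea, not LightLDA's code: `infer_doc` and the `phi` layout are hypothetical, and `random.choices` stands in for the alias-table sampler.

```python
import random
from typing import List, Sequence, Tuple


def infer_doc(words: Sequence[int], phi: List[List[float]], alpha: float,
              num_iters: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Sample topics for one document while phi stays frozen.

    phi[k][w] = (n_kw + beta) / (n_k + V * beta), computed once from the trained
    model (hypothetical layout). Returns (n_td, z): doc-topic counts and token topics.
    """
    K, n = len(phi), len(words)
    z = [rng.randrange(K) for _ in range(n)]  # random initial topics
    n_td = [0] * K                            # the only counts updated here
    for k in z:
        n_td[k] += 1

    def doc_term(k: int, s: int) -> float:
        # n_kd with the current token (topic s) removed, plus alpha
        return n_td[k] - (1 if k == s else 0) + alpha

    def move(i: int, s: int, t: int) -> None:
        # reassign token i from topic s to topic t
        n_td[s] -= 1
        n_td[t] += 1
        z[i] = t

    for _ in range(num_iters):
        for i, w in enumerate(words):
            # Doc proposal: q_d(t) proportional to n_td + alpha (token i included).
            s = z[i]
            if rng.random() < n / (n + K * alpha):
                t = z[rng.randrange(n)]
            else:
                t = rng.randrange(K)
            if t != s:
                ratio = (doc_term(t, s) * phi[t][w] * (n_td[s] + alpha)) / (
                    doc_term(s, s) * phi[s][w] * (n_td[t] + alpha))
                if rng.random() < ratio:
                    move(i, s, t)
            # Word proposal: q_w(t) proportional to phi[t][w]. phi is frozen, so the
            # phi factors cancel and only the doc terms remain in the ratio.
            s = z[i]
            t = rng.choices(range(K), weights=[phi[k][w] for k in range(K)])[0]
            if t != s and rng.random() < doc_term(t, s) / doc_term(s, s):
                move(i, s, t)
    return n_td, z
```

With the same seed the sketch returns identical assignments; with different seeds the assignments can still differ, because inference is itself a sampling process.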

After these changes, the results are much better. Here are the first 2 documents:
============training phase=============
0 260:1 549:2 778:1 1178:2 1309:1 1789:1 1843:2 2131:2 2390:3 2886:1
1 93:1 140:1 204:1 278:4 320:2 404:1 814:1 856:1 1164:2 1496:1 1627:4 1629:1 2059:1 2122:1 2177:1 2430:1 2686:1 2818:1 2880:1
==============inference phase=========
0 47:1 559:1 778:1 1178:2 1345:2 1843:1 2131:4 2390:3 2886:1
1 93:1 204:1 278:4 320:2 404:2 600:1 711:1 856:1 1164:2 1461:1 1496:1 1627:4 2059:1 2122:1 2144:1 2430:1 2518:1 2818:1

I think it is almost correct now.

However, I think the current logic still has some defects. First, it is unnecessary to rebuild the alias table per slice/block/iteration. Second, it is unnecessary to build an alias table for every word in the large training vocabulary. Maybe it would be better to limit the user's input to a single block and generate a single slice for that block, without splitting the vocabulary. What do you think?
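Because the word-related terms stay fixed during inference, each word's proposal distribution never changes. That supports both suggestions above: build each word's alias table once, reuse it across slices, blocks and iterations, and build it only for words that actually appear in the input. The following is a minimal sketch of that idea using Vose's alias method, assuming a frozen `phi` with K rows and one column per word; `build_alias`, `sample_alias` and `build_tables_for_input` are hypothetical names, not LightLDA functions.

```python
import random
from typing import Dict, Iterable, List, Tuple

AliasTable = Tuple[List[float], List[int]]


def build_alias(weights: List[float]) -> AliasTable:
    """Vose's alias method: O(K) construction for O(1) sampling."""
    k = len(weights)
    total = sum(weights)
    scaled = [w * k / total for w in weights]
    prob, alias = [1.0] * k, list(range(k))
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, g = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], g
        scaled[g] -= 1.0 - scaled[s]  # g fills the rest of column s
        (small if scaled[g] < 1.0 else large).append(g)
    # Anything left over has scaled weight 1 up to rounding, so prob stays 1.0.
    return prob, alias


def sample_alias(table: AliasTable, rng: random.Random) -> int:
    prob, alias = table
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def build_tables_for_input(docs: Iterable[List[int]],
                           phi: List[List[float]]) -> Dict[int, AliasTable]:
    """One table per distinct word in the inference input, built once and reused.

    phi[k][w] is frozen during inference, so these tables never go stale.
    """
    K = len(phi)
    vocab = {w for doc in docs for w in doc}
    return {w: build_alias([phi[k][w] for k in range(K)]) for w in vocab}
```

Construction costs O(K) per distinct input word and each draw is O(1), so the total cost scales with the input vocabulary rather than the full training vocabulary.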

Thanks

Originally posted by @hiyijian in #14 (comment)

@RyanPeking RyanPeking changed the title The topics don't match when everty infer.@feiga The topics don't match when every infer.@feiga Dec 20, 2020