Chat Transcript Crawling No Longer Possible #25

Open
ghost opened this issue Nov 14, 2014 · 6 comments

ghost commented Nov 14, 2014

Search keywords like tag:cv-pls tag:delv-pls tag:flag-pls tag:reopen-pls tag:ro-pls tag:rov-pls tag:review-pls tag:rv-pls can no longer be searched via the chat transcript search form. I don't know when this change happened, and I can't find a reliable way to search the transcript for any close-vote tags. If this does not get fixed, or if nobody finds a workaround, chat support will have to be dropped from the backlog.

ghost added the bug label Nov 14, 2014

hakre commented Nov 14, 2014

I've noticed for some time that searching for Imgur no longer finds URLs in the chat log, which is a pity. I suspect that Stack Overflow's chat indexing now skips certain parts to keep the index small. Perhaps it's worth reporting this as a problem on SO? Personally, I think it's a pity that URLs can no longer be searched. Perhaps the reason keyword searches no longer work is similar?

haneytron commented

I investigated this with our chat devs and we determined that we never specifically supported tag searches in chat. This bot worked by happenstance, and the search query it relies on now returns results polluted by mentions of the tag names that aren't actually tags. If you'd like a tag search feature in chat, I'd suggest posting a feature request on meta.

PeeHaa (Member) commented Nov 15, 2014

ghost (Author) commented Dec 26, 2014

If the answer to this is "no", I will remove chat support in the next release. We can always cherry-pick the code from the last commit and re-add it later if they decide to support tag searching.

Wes0617 commented Dec 26, 2014

You can always make a crawler that navigates through transcript pages...
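A minimal sketch of such a crawler in Python, under stated assumptions: `fetch_page(n)` is a hypothetical callable that returns the plain-text messages on transcript page `n` (the HTTP request and HTML parsing are not shown, and no real Stack Exchange endpoint is assumed), matching is a simple whole-word regex over the tag names listed in the original report, and `delay_s` stands in for the per-request throttle. It illustrates why the cost grows with the number of pages crawled; it is not how the project's bot actually works.

```python
import re
import time
from typing import Callable, List

# Request tags listed in the original report (assumed to be the ones of interest).
REQUEST_TAGS = ("cv-pls", "delv-pls", "flag-pls", "reopen-pls",
                "ro-pls", "rov-pls", "review-pls", "rv-pls")

# Match a tag name only as a whole token: "cv-pls" matches, "xcv-pls" does not.
TAG_RE = re.compile(
    r"(?<![\w-])(?:" + "|".join(map(re.escape, REQUEST_TAGS)) + r")(?![\w-])"
)


def crawl_transcript(fetch_page: Callable[[int], List[str]],
                     wanted: int = 250,
                     max_pages: int = 100,
                     delay_s: float = 10.0) -> List[str]:
    """Walk transcript pages in order (fetch_page is hypothetical), keep
    messages mentioning a request tag, and stop once `wanted` matches are
    collected or an empty page signals the end of the transcript."""
    found: List[str] = []
    for page in range(max_pages):
        if page > 0:
            time.sleep(delay_s)  # throttle every request after the first
        messages = fetch_page(page)
        if not messages:
            break  # no more transcript pages
        for msg in messages:
            if TAG_RE.search(msg):
                found.append(msg)
                if len(found) >= wanted:
                    return found
    return found
```

Stopping as soon as `wanted` matches are collected keeps the request count as low as the filter allows; the only real lever is a sharper filter, i.e. more legitimate hits per page.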

ghost (Author) commented Dec 26, 2014

@WesNetmo With the time limit imposed on querying each transcript page, and with very little control over filtering out junk messages, crawling the pages can take a very long time. The current dev version lets you set the number of entries to find and process, defaulting to 250. Even if the current search abilities gave us 10 legitimate results per page, that would still take 25 page requests, each with its ~10 s throttle, and this runs every 15 minutes by default. Each cache update would therefore take over 4 minutes (25 × 10 s = 250 s). Previously we only needed to access 3-4 pages.
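The estimate in the comment above can be checked with a few lines of Python. The function name and parameters are illustrative; the figure of 10 legitimate results per page is the comment's own assumption, not a measurement.

```python
import math


def crawl_cost(entries_wanted: int, legit_per_page: int, throttle_s: float):
    """Return (page requests, total throttle seconds) for one cache update,
    assuming every request waits the full throttle."""
    pages = math.ceil(entries_wanted / legit_per_page)
    return pages, pages * throttle_s


pages, seconds = crawl_cost(250, 10, 10.0)
print(f"{pages} requests, {seconds:.0f} s = {seconds / 60:.1f} min")
# prints "25 requests, 250 s = 4.2 min"
```

Rounding up with `math.ceil` matters when the wanted count is not a multiple of the per-page yield: 251 entries would need a 26th request.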
