gaby: exclude stale(?) web document #63
Comments
Change https://go.dev/cl/633395 mentions this issue.
Change https://go.dev/cl/635176 mentions this issue.
gopherbot pushed a commit that referenced this issue on Dec 15, 2024
This page was temporarily added to help spec revision. It will be removed at the start of go1.25. Until then, ignore this page. (We have two entries for this page in our DB)

For #63

Change-Id: Ibf369100ca25f47ca487bb87f7327388ef8dcef3
Reviewed-on: https://go-review.googlesource.com/c/oscar/+/633395
Reviewed-by: Tatiana Bradley <tatianabradley@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit that referenced this issue on Dec 15, 2024
Gaby splits each crawled webpage into docs for embedding, computes their embeddings, and stores them in the vector db. Delete all the docs and their embeddings. This is meant to be run after the webpage is excluded from crawling with Crawler.Deny.

For #63

Change-Id: I095a65b9a834ccf48062facc3654f40b43562e15
Reviewed-on: https://go-review.googlesource.com/c/oscar/+/635176
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jonathan Amsterdam <jba@google.com>
From golang/go#67901 (comment)
Docs like https://go.dev/doc/go1.17_spec#Package_initialization are kept for historical purposes.
We may come up with a workaround for this specific issue. I am not sure about general solutions.
Some approaches I am thinking of:
Label such docs manually in the document source and exclude them
Label such docs using an LLM (e.g. "obsolete"?) and exclude them
(we can also do the same for issues that we don't want to appear in the related info by labelling/classifying them appropriately)
Before posting, drop near-duplicates (e.g. by pairwise similarity comparison)
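For the last approach, here is a rough sketch of what a pairwise similarity filter could look like. It is not Gaby's actual code: the `result` type, the `dropNearDuplicates` helper, and the threshold are hypothetical names and values chosen for illustration, and it assumes each candidate already carries its embedding vector and that cosine similarity is an acceptable measure.

```go
package main

import "math"

// result is a hypothetical search hit: a document ID plus its embedding.
// It is an illustration only, not a type from the oscar repository.
type result struct {
	ID  string
	Vec []float64
}

// cosine returns the cosine similarity of two equal-length vectors.
// It returns 0 if either vector has zero magnitude.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// dropNearDuplicates walks rs in order and keeps a result only if its
// similarity to every already-kept result is below threshold.
// Earlier (presumably higher-ranked) results win over later ones.
func dropNearDuplicates(rs []result, threshold float64) []result {
	var kept []result
	for _, r := range rs {
		dup := false
		for _, k := range kept {
			if cosine(r.Vec, k.Vec) >= threshold {
				dup = true
				break
			}
		}
		if !dup {
			kept = append(kept, r)
		}
	}
	return kept
}
```

The comparison is quadratic in the number of candidates, which should be fine for the handful of related documents shown in a single post. The threshold would need tuning against real cases such as the current spec versus the go1.17 spec.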