-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: added tests and redactPrivateRepoComments #24
fix: added tests and redactPrivateRepoComments #24
Conversation
@Keyrxng I’ve added some tests for Deduplication and Task Matching that you can pull into your PR. The tests seem to be working well, and the CI is passing with them. |
That's awesome dude taking initiative like that, love to see it. Is this ready for review I guess? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to test with our real specifications.
Tip
You can look through "closed as not complete" issues that contain comments like "duplicate." We generally link to the correct issue when closing as not complete.
You might need to use GraphQL because GitHub search UI does not clearly show that it is "closed as not complete" example.
I originally located the above using this comment.
Perhaps there is a way to "train" the system that all of our issues that exist open now are unique? We have been quite good about this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps there is a way to "train" the system that all of our issues that exist open now are unique?
ubiquity-os/plugins-wishlist#50 (comment)
I believe using this system would be effective. Docs would contain the right context anchors uniformly, then when we fine-tune/train our own model with these docs we train it around these context anchors specifically although they'll be pretty effective without it imo.
If we only have docs for closed tasks then this already effectively treats open tasks as unique and a spec contains the why and typically the how also, which our docs contain for each task, we'll have improved performance in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can utilize https://openpipe.ai/. Once we have our embeddings and text corpus ready, we can fine-tune an LLM for this use case. I can elaborate further, but I'm not quite clear on what you mean by 'all of our current issues are unique.' There should be some common themes in the tickets and workflow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unique means that the plugin should not flag them as redundant.
This is a tangible test with a simple boolean result that we can use to tweak our approach.
If the test says any are a match, then keep adjusting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could tokenize the issues and use the Dice coefficient for comparison. If the coefficient falls below a certain threshold, we can permit the creation of that issue; otherwise, we won't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull actually looks fine whats the status?
@sshivaditya @sshivaditya2019 |
Fixed Merge conflicts, should be working now. |
@0x4007 I think this can be merged, this would add tests for the issue matching and issue deduplication feature |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, you should separate into two different pulls, one adding tests and one solving the problem because larger pulls obviously take a lot more effort to review.
@sshivaditya2019 there are merge conflicts now so after you fix those, you can merge this in. Lastly, occasionally these fall through the cracks as this one did. You should request other viewers to process it if it's taking too long. |
weird |
Resolves #23
Issue-Deduplication
Task matching