Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: added tests and redactPrivateRepoComments #24

Closed

Conversation

shiv810
Copy link
Collaborator

@shiv810 shiv810 commented Oct 1, 2024

Resolves #23

  • Adds tests for Issue-Deduplication
  • Adds tests for Task matching
  • Adds redactPrivateRepoComments in the configuration

@shiv810 shiv810 changed the title fix: added tests fix: added tests and redactPrivateRepoComments Oct 1, 2024
@shiv810
Copy link
Collaborator Author

shiv810 commented Oct 1, 2024

QA

Private Repository
Screenshot 2024-10-01 at 3 34 26 AM

Supabase with redactPrivateRepoComments as False:
Screenshot 2024-10-01 at 3 33 40 AM

Supabase with redactPrivateRepoComments as True:
Screenshot 2024-10-01 at 3 31 57 AM

@shiv810
Copy link
Collaborator Author

shiv810 commented Oct 1, 2024

@Keyrxng I’ve added some tests for Deduplication and Task Matching that you can pull into your PR. The tests seem to be working well, and the CI is passing with them.

@Keyrxng
Copy link
Contributor

Keyrxng commented Oct 2, 2024

@Keyrxng I’ve added some tests for Deduplication and Task Matching that you can pull into your PR. The tests seem to be working well, and the CI is passing with them.

That's awesome dude taking initiative like that, love to see it. Is this ready for review I guess?

Copy link
Member

@0x4007 0x4007 Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to test with our real specifications.

Tip

You can look through "closed as not complete" issues that contain comments like "duplicate." We generally link to the correct issue when closing as not complete.

You might need to use GraphQL because GitHub search UI does not clearly show that it is "closed as not complete" example.

I originally located the above using this comment.

Perhaps there is a way to "train" the system that all of our issues that exist open now are unique? We have been quite good about this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps there is a way to "train" the system that all of our issues that exist open now are unique?

ubiquity-os/plugins-wishlist#50 (comment)

I believe using this system would be effective. Docs would contain the right context anchors uniformly, then when we fine-tune/train our own model with these docs we train it around these context anchors specifically although they'll be pretty effective without it imo.

If we only have docs for closed tasks then this already effectively treats open tasks as unique and a spec contains the why and typically the how also, which our docs contain for each task, we'll have improved performance in my opinion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can utilize https://openpipe.ai/. Once we have our embeddings and text corpus ready, we can fine-tune an LLM for this use case. I can elaborate further, but I'm not quite clear on what you mean by 'all of our current issues are unique.' There should be some common themes in the tickets and workflow

Copy link
Member

@0x4007 0x4007 Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unique means that the plugin should not flag them as redundant.

This is a tangible test with a simple boolean result that we can use to tweak our approach.

If the test says any are a match, then keep adjusting.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could tokenize the issues and use the Dice coefficient for comparison. If the coefficient falls below a certain threshold, we can permit the creation of that issue; otherwise, we won't.

Copy link
Member

@0x4007 0x4007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull actually looks fine whats the status?

@0x4007
Copy link
Member

0x4007 commented Nov 18, 2024

@sshivaditya @sshivaditya2019

@shiv810
Copy link
Collaborator Author

shiv810 commented Nov 20, 2024

Fixed Merge conflicts, should be working now.

@shiv810 shiv810 requested a review from 0x4007 November 22, 2024 14:36
@shiv810
Copy link
Collaborator Author

shiv810 commented Nov 25, 2024

@0x4007 I think this can be merged, this would add tests for the issue matching and issue deduplication feature

Copy link
Member

@0x4007 0x4007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, you should separate into two different pulls, one adding tests and one solving the problem because larger pulls obviously take a lot more effort to review.

@0x4007
Copy link
Member

0x4007 commented Dec 12, 2024

@sshivaditya2019 there are merge conflicts now so after you fix those, you can merge this in.

Lastly, occasionally these fall through the cracks as this one did. You should request other viewers to process it if it's taking too long.

@0x4007 0x4007 closed this Dec 19, 2024
@Keyrxng
Copy link
Contributor

Keyrxng commented Dec 20, 2024

weird

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Redact switch
4 participants