"Exclude Forks" Additional Filter gives a result still containing some forks #166
Comments
The two example repositories also show stats in GHS that are different from the ones in the actual repo on GitHub. Examples:
Given that the former fork is still reachable if you look for it by its old name, this leads me to believe that the project started under the old name as a non-fork, was deleted by the owner, and then re-created as a fork under the same name, before finally being renamed. The deletion would explain the drop in stars. However, this is all guesswork on my part, and it might even be better to reach out to @anurodhp directly for a timeline of events. A clear understanding of the project's lifecycle will help us avoid instances of this in the future.
The latter project was in all likelihood deleted and then re-created as a fork. Given that it never went over 10 stars and was not updated for a long time, we could never update the information.
This does open a new can of worms: What should we do with projects that go below the star threshold after they were mined? I think the best course of action would be to devise a new "maintenance" job that periodically checks repositories that have not been updated in a very long time, refreshing stale information, and removing the repository if it no longer satisfies the star criteria.
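To make the proposed "maintenance" job a bit more concrete, here is a minimal sketch of its decision logic. It is not GHS code: `StoredRepo`, `maintain`, `STALE_AFTER` and the 180-day window are hypothetical names and values chosen for illustration, and the threshold of 10 stars is only assumed from the discussion above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumed values: the thread only hints at a threshold of 10 stars;
# the staleness window is invented for illustration.
STAR_THRESHOLD = 10
STALE_AFTER = timedelta(days=180)


@dataclass
class StoredRepo:
    """Hypothetical shape of a mined repository record."""
    name: str
    stars: int
    is_fork: bool
    last_refreshed: datetime


def is_stale(repo: StoredRepo, now: datetime) -> bool:
    """True when the stored record has not been refreshed within STALE_AFTER."""
    return now - repo.last_refreshed > STALE_AFTER


def maintain(repo: StoredRepo, fresh: Optional[StoredRepo], now: datetime) -> Optional[StoredRepo]:
    """Return the record to keep, or None if the repository should be removed.

    `fresh` is the state re-fetched from GitHub, or None if the repository
    no longer exists there.
    """
    if not is_stale(repo, now):
        return repo  # refreshed recently: nothing to do
    if fresh is None or fresh.stars < STAR_THRESHOLD:
        return None  # deleted, or dropped below the star criterion
    return fresh  # keep, with refreshed name, stars and fork flag
```

Because the refreshed record replaces the stored one, a repository that was re-created as a fork would pick up the correct fork flag on the next maintenance pass.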
By the way, regarding the naming mismatches: GitHub does not discriminate casing in repository names. What I mean by this is that given a repository Example:
All of these API links point to the same repository. My point is that there is no difference between the actual name and its lower-case variant. We used to store all names in lowercase (due to a misunderstanding by one of the maintainers), but I have since changed this so that names are stored as they are displayed on GitHub. As a result, some repositories that have not been updated in a long time may still show a case mismatch between the stored and the actual name. I expect this to be rectified by the proposed "maintenance" job as well.
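For anyone comparing stored names against names reported by another tool, a small sketch of a case-insensitive comparison follows. The helper `same_repo` is a hypothetical name, not something that exists in GHS; it only illustrates that a pure casing difference should not be treated as a rename.

```python
def same_repo(stored_name: str, actual_name: str) -> bool:
    """Compare two "owner/name" strings the way GitHub resolves them,
    that is, ignoring letter case."""
    return stored_name.casefold() == actual_name.casefold()


def is_rename(stored_name: str, actual_name: str) -> bool:
    """A real rename is any difference that remains after ignoring case."""
    return not same_repo(stored_name, actual_name)
```

With this, `same_repo("anurodhp/vaxproj", "anurodhp/VaxProj")` holds, while `is_rename("anurodhp/monal", "anurodhp/VaxProj")` flags the one genuine rename mentioned in this issue.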
No issue with the names, indeed GitHub is case insensitive for repo names. I reported them as coming from my tool (didn't mean to imply a difference with the casing) to check if renames are possibly linked to the problem. The only rename example seems to be
Description
I got from GHS a list of projects with at least 10 contributors, 100 stars, 1000 commits, and explicitly requested GHS to exclude forks with the filter checkbox in the UI.
When checking projects in the list, there are still some that are forks (9 out of about 12,000).
Replication
These are two projects that were in the list and that you can use to reproduce the bug.
If you search for them in GHS with Exclude Forks checked, they are returned, but if clicked, they clearly show as forks on GitHub (and are also forks according to the REST API):
To replicate: search for "qaprosoft" or "anurodhp" with Exclude Forks checked. Both repositories are returned. Click the link to the repo on GitHub and note the "forked from:" line at the top.
I included two examples because anurodhp/VaxProj was renamed from anurodhp/monal, but qaprosoft/carina was not.
Both these projects had their last commit in 2021 (March and November respectively).
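For reference, this is roughly how the fork status can be checked against the REST API. It is a sketch using only the Python standard library and the public `GET /repos/{owner}/{repo}` endpoint, whose response carries a boolean `fork` field and, for forks, a `parent` object. The function names `fetch_repo` and `describe_fork_status` are my own, and the request is unauthenticated, so it is subject to rate limiting.

```python
import json
import urllib.request


def fetch_repo(full_name: str) -> dict:
    """Fetch repository metadata from the GitHub REST API (unauthenticated)."""
    url = f"https://api.github.com/repos/{full_name}"
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def describe_fork_status(payload: dict) -> str:
    """Summarise the `fork` flag of a repository payload."""
    name = payload["full_name"]
    if payload.get("fork"):
        # `parent` is present for forks; fall back gracefully if it is missing.
        parent = payload.get("parent", {}).get("full_name", "unknown parent")
        return f"{name} is a fork of {parent}"
    return f"{name} is not a fork"
```

Calling `describe_fork_status(fetch_repo("qaprosoft/carina"))` performs the same check I did for the two examples above.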
Other Info
Other projects I didn't manually check but that were marked as forks by my analysis (if you need more cases to investigate):
Casing is not relevant for renames: the names are reported as they come from my tool, and GitHub is case insensitive for repo names. The only rename example is anurodhp/monal -> anurodhp/VaxProj.
The list might be outdated, since I am taking it from old data that I had already analyzed, but the two examples I manually checked are definitely still exhibiting the problem.