weird scoring / wrong identification for Apache-2.0 license text without appendix since spdx-license-list data v3.23 #94

Open
decathorpe opened this issue Mar 14, 2024 · 2 comments

@decathorpe
Contributor

see also spdx/license-list-XML#2418

The spdx-license-list v3.23 update added the "Pixar" license, which is a variant of Apache-2.0.

With this version of the SPDX data, an Apache-2.0 license file without the appendix (like the one from the rust-lang/rust repo) is now a closer match to "Pixar" than to "Apache-2.0", even though it is a perfect copy of Apache-2.0 apart from the missing appendix.

Is it possible that this is because the appendix that is marked as optional is not missing entirely?
see spdx/license-list-XML#2418 (comment)

@jpeddicord
Owner

Apologies for the incredibly slow reply here! I'm seeing that SPDX might have split out the optional sections of this license, which could help. I'm pulling in updates for that now and am running into other scoring issues (BSD-3-Clause, this time) that I need to debug; hopefully nothing too crazy.

For what it's worth, regression tests can be added to tests/data/real-licenses; if a particular license causes trouble in the future, a test case there can help narrow down the problem. But because of the way text-matching works in this library, there is only so much that will do.
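To illustrate the general effect being discussed, here is a toy sketch in Python. It is not this library's actual algorithm: the `bigrams` and `dice` helpers, the similarity metric (a Dice coefficient over word bigrams), and all of the sample texts are assumptions made up for the example. It only shows how a candidate that lacks a long trailing section can score higher against a shorter variant template than against the full template it was copied from.

```python
def bigrams(text):
    """Return the set of adjacent word pairs in a lowercased text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def dice(a, b):
    """Dice coefficient over word-bigram sets (toy metric, an assumption)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# Made-up stand-ins for license text; not real SPDX templates.
terms = "grant of copyright license subject to the terms and conditions of this license"
appendix = "appendix how to apply the license to your work attach the following boilerplate notice"

# Hypothetical "full" template: terms, end marker, then the appendix.
full_template = terms + " end of terms and conditions " + appendix
# Hypothetical variant template: slightly modified terms, no appendix.
variant_template = "modified " + terms
# Candidate file: exact terms plus the end marker, but no appendix.
candidate = terms + " end of terms and conditions"

score_full = dice(candidate, full_template)
score_variant = dice(candidate, variant_template)

# With this toy metric, the missing appendix costs more than the variant's
# small modification, so the variant template scores higher.
print(f"full: {score_full:.3f}  variant: {score_variant:.3f}")
```

A real matcher may normalize text or treat optional sections differently, so this says nothing definite about where the scoring goes wrong here; it only shows why a regression test with the exact problematic file is useful.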

@decathorpe
Contributor Author

Thank you for taking a look! Yeah, I reported this issue to the SPDX people, and they split the optional parts of the appendix further to try to help with this.

But I tried the latest SPDX license data version, and the issue is still there. This license text (without the appendix but with the "END OF TERMS AND CONDITIONS" line) is used by many Rust projects because they just copy the files from the rust-lang/rust repo, and it still gets misclassified as "Pixar":

https://github.com/rust-lang/rust/blob/master/LICENSE-APACHE
