Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reuse spdx: also output the full SPDX-License-Identifier #586

Closed
tenzap opened this issue Sep 4, 2022 · 11 comments · Fixed by #623
Closed

reuse spdx: also output the full SPDX-License-Identifier #586

tenzap opened this issue Sep 4, 2022 · 11 comments · Fixed by #623

Comments

@tenzap
Copy link

tenzap commented Sep 4, 2022

The output of reuse spdx has the LicenseInfoInFile field.
However, with this field it is not always possible to determine the original value of SPDX-License-Identifier

For example if the input file has

// Copyright (C) 2016 Paul Lemire <paul.lemire350@gmail.com>
// SPDX-License-Identifier: LicenseRef-Qt-Commercial OR GPL-3.0-only WITH Qt-GPL-exception-1.0

The output will be

FileName: ./qt3d/tests/auto/input/utils/tst_utils.cpp
SPDXID: SPDXRef-ec9701a4a6e83e5c346e9f4377005153
FileChecksum: SHA1: 78c032c19e50ba890da9609caca80ca3edea1166
LicenseConcluded: NOASSERTION
LicenseInfoInFile: GPL-3.0-only
LicenseInfoInFile: LicenseRef-Qt-Commercial
LicenseInfoInFile: Qt-GPL-exception-1.0
FileCopyrightText: <text>Copyright (C) 2016 Paul Lemire <paul.lemire350@gmail.com></text>

Which is not sufficient to determine the SPDX-License-Identifier

This information would be useful when using the reuse spdx output as input of another script (for example to build a DEP5 file)

@rpavlik
Copy link

rpavlik commented Sep 6, 2022

I think this is just a straight mis-parse. I'm not 100% sure what the spdx file should look like, but I'm pretty sure that GPL-3.0-only WITH Qt-GPL-exception-1.0 is one license, while the other is LicenseRef-Qt-Commercial, and the "or" is between them. So I'd expected two "LicenseInfoInFile" lines.

Would be nice to have a "round-trip" test - make sure that the parsed license info turns back into an identical spdx-license-identifier line as the input, though I don't remember if there's any license-expression-formatting in the codebase yet.

@rpavlik
Copy link

rpavlik commented Sep 6, 2022

Looks like the relevant code is either https://github.com/fsfe/reuse-tool/blob/master/src/reuse/_util.py#L260 or called by it

@silverhook
Copy link
Contributor

In the SPDX community it is understood that:

LicenseInfoInFile: GPL-3.0-only
LicenseInfoInFile: LicenseRef-Qt-Commercial
LicenseInfoInFile: Qt-GPL-exception-1.0

would equate to:

GPL-3.0-only AND LicenseRef-Qt-Commercial AND Qt-GPL-exception-1.0, which would indeed be a pretty annoying bug compared to the input.

I would say, it should simply directly translate SPDX-License-Identifier to LicenseInfoInFile and keep the string following the tag intact. That is what was found in the file, and it’s a valid SPDX statement.

@rpavlik
Copy link

rpavlik commented Sep 15, 2022

Oh, so it's AND by default, not OR, that makes sense. (Think my tool or the Rust spdx-expression crate has a bug...) Yeah, the reuse tool is definitely doing too much work here and losing the details. (I think it's parsing further so it can check for the presence of the right license files, but that code probably shouldn't be used when generating spdx output)

@mxmehl
Copy link
Member

mxmehl commented Sep 30, 2022

Yes, we're reinventing the wheel here and should instead use standard libs for that. See #394

@pietroalbini
Copy link
Contributor

Also got bitten by this when trying to integrate REUSE with Rust. The Rust standard license is MIT OR Apache-2.0, and there is currently no way to get machine-readable output from REUSE that distinguishes between OR and AND (since we'll need NCSA AND (MIT OR Apache-2.0) in some places, licensing is fun!).

I would say, it should simply directly translate SPDX-License-Identifier to LicenseInfoInFile and keep the string following the tag intact. That is what was found in the file, and it’s a valid SPDX statement.

Would this something that would be accepted while we wait for the next generation of output?

@pietroalbini
Copy link
Contributor

So, I looked into this issue more, and it turns out the behavior of REUSE adheres to the SPDX 2.1 specification:

If license information for more than one license is contained in the file or if the license information offers the package recipient a choice of licenses, then each of the choices should be listed as a separate entry.

Still, SPDX assumes someone would also populate LicenseConcluded, which REUSE intentionally sets to NOASSERTION:

# IMPORTANT: Make no assertion about concluded license. This tool
# cannot, with full certainty, determine the license of a file.
out.write("LicenseConcluded: NOASSERTION\n")

The options I could see are:

  • Still include the full SPDX expression in LicenseInformationInFile. This would not be according to the narrative part of the spec, even though the "syntax" part says SpdxExpression for the contents of the field.
  • Always include the expressions in LicenseConcluded.
  • Add a flag reuse spdx --add-concluded-licenses so that people can opt into the behavior, noting in the documentation that REUSE cannot be sure the conclusion is correct.

@carmenbianca
Copy link
Member

I'm not exactly sure anymore why the behaviour is what it is.

For LicenseInfoInFile, I think it's reasonable to do the following:

# SPDX-License-Identifier: MIT OR 0BSD
# SPDX-License-Identifier: Apache-2.0

->

LicenseInfoInFile: MIT OR 0BSD
LicenseInfoInFile: Apache-2.0

I think --add-concluded-licenses is a good idea. Maybe @silverhook has some good input on concluding licensing using technology instead of humans.

The concluded licence in the above case would probably be (MIT OR 0BSD) AND Apache-2.0 or some such.

@pietroalbini
Copy link
Contributor

Opened a PR populating LicenseConcluded, with or without (by dropping the last commit) a CLI flag: #623

@silverhook
Copy link
Contributor

That would indeed be useful, and super thanks for the PR @pietroalbini.

There is just one caveat I would like us to address – namely that (even if perhaps the spec’s might not be super explicit about it; and if so, that is likely a bug in the spec), within SPDX it is understood that:

  • a tool (e.g. a scanner like ScanCode, FOSSology, reuse tool) is typically responsible for finding and as such populating the LicenseInfoInFile and other LicenseInfoIn*
  • but a human is the only one able to make conclusions, and as such responsible for the LicenseConcluded (and any other *Concluded fields)

As such, if we implement reuse spdx to include LicenseConcluded, it should be both:

  • optional, intentionally triggered by a flag; and
  • explicitly require the user of the tool to state the Person (and if applicable Organization) who made the conclusions that reuse tool’s output is “the truth” and not just a hint

see:

8.5 Concluded license field
[…]
8.5.2 Intent

Here, the intent is for the SPDX document creator to analyze the License Information in File (8.6) and other objective information, e.g., “COPYING FILE,” along with the results from any scanning tools, to arrive at a reasonably objective conclusion as to what license governs the file.

https://spdx.github.io/spdx-spec/v2.3/file-information/#85-concluded-license-field
https://spdx.github.io/spdx-spec/v2.3/document-creation-information/#68-creator-field

@pietroalbini
Copy link
Contributor

Sounds good @silverhook, thanks for the prompt feedback! I updated the PR accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants