Hint about extra fields in conda-forge.yml #1920

ytausch · 2024-05-03T18:08:38Z

Checklist

Added a news entry

This PR contains the functional changes of #1900, to be merged separately. See also parts of the discussion there.

TL;DR: recipe-lint now hints about fields in conda-forge.yml that are not part of the Pydantic schema.

ytausch · 2024-05-03T18:15:22Z

conda_smithy/lint_recipe.py

@@ -1,6 +1,11 @@
 # -*- coding: utf-8 -*-

 from collections.abc import Sequence, Mapping
+from typing import List
+
+from pydantic import BaseModel


@isuruf Let's continue the discussion about adding pydantic here.

~~By the way: pydantic is already listed as a project dependency in environment.yml~~
Edit: This comment does not contribute to the discussion, my mistake.

For clarity, pydantic is a development-only dependency, so it only affects development, not installation. We generate a JSON schema for runtime use. For context, it was added in PR #1756.

Is it possible to use jsonschema for this case too?

I understand that pydantic is not currently among the dependencies in https://github.com/conda-forge/conda-smithy-feedstock/blob/main/recipe/meta.yaml.

I am not aware of a good way of doing this with jsonschema.

Also, I want to repeat my argument:

I would like to keep pydantic here and strongly propose doing so. Pydantic provides an elegant way to find additional fields that are not part of the schema (see my implementation). I also think the pydantic model should be used in more parts of the smithy code anyway, since it provides a type-safe way to access fields.
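To illustrate the kind of lookup meant here, a minimal sketch, assuming pydantic v2 and two made-up models (`ForgeConfig` and `BotConfig` are hypothetical and are not the real conda-smithy schema; `find_extra_fields` is likewise an illustrative helper, not the implementation in this PR). With `extra="allow"`, validation succeeds and the undeclared keys remain available via `model_extra`:

```python
from pydantic import BaseModel, ConfigDict, Field


class BotConfig(BaseModel):
    # Hypothetical nested section; extra="allow" keeps unknown keys around.
    model_config = ConfigDict(extra="allow")
    automerge: bool = False


class ForgeConfig(BaseModel):
    # Hypothetical top-level model, not the real conda-forge.yml schema.
    model_config = ConfigDict(extra="allow")
    bot: BotConfig = Field(default_factory=BotConfig)


def find_extra_fields(model: BaseModel, prefix: str = "") -> list[str]:
    """Return dotted paths of keys that are not declared on the model."""
    extras = [f"{prefix}{name}" for name in (model.model_extra or {})]
    for name in type(model).model_fields:
        value = getattr(model, name)
        if isinstance(value, BaseModel):
            # Recurse into nested models so typos in sub-sections are found too.
            extras.extend(find_extra_fields(value, prefix=f"{prefix}{name}."))
    return extras


config = ForgeConfig.model_validate(
    {"bot": {"automerge": True, "autmerge": True}, "unknown_key": 1}
)
hints = find_extra_fields(config)
```

The sketch only descends into fields that are themselves models; lists or dicts of models would need extra handling.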

Why is adding pydantic as a new dependency so bad?

Why is adding pydantic as a new dependency so bad?

Please see my comment on the other thread.

I am not aware of a good way of doing this with jsonschema.

This is easy. Load the json schema and see if "additionalProperties": false is there for the attribute.

Please see my comment on the other thread.

@isuruf I just scrolled through every comment you left on my conda-smithy PRs but could not find an answer to my question. You said:

This adds a new dependency pydantic on users. Previously it was only a dev dependency. Is it possible to remove this?

and

I strongly disagree. This PR uses pydantic to check for which fields to hint about. If we can't do that using the json schema, we are essentially making a distinction between pydantic model and json schema.

I do not see how this explains why we want to avoid pydantic as a dependency. To your latter comment, I replied:

What do you mean by "distinction"? The pydantic model should be equivalent to the JSON schema anyway - if it's not, we have a bug.

This is easy. Load the json schema and see if "additionalProperties": false is there for the attribute.

This requires us to have a separate JSON schema that actually has additionalProperties set to false for the fields we want to warn about since we have agreed to allow additional properties in the main schema. Is this what you propose?
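For reference, the check suggested above could look roughly like the following. This is only a sketch under assumptions: `SCHEMA` is a made-up fragment in which some objects are closed with `"additionalProperties": false` (the main schema currently allows additional properties), `hint_extra_fields` is a hypothetical helper, and `$ref` resolution is left out entirely.

```python
import json

# Hypothetical schema fragment, not the real conda-forge.yml schema.
SCHEMA = json.loads("""
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "bot": {
      "type": "object",
      "additionalProperties": false,
      "properties": {"automerge": {"type": "boolean"}}
    },
    "azure": {
      "type": "object",
      "properties": {"store_build_artifacts": {"type": "boolean"}}
    }
  }
}
""")


def hint_extra_fields(config, schema, path=""):
    """Collect hints for keys that a closed object schema does not declare."""
    hints = []
    if not isinstance(config, dict) or not isinstance(schema, dict):
        return hints
    properties = schema.get("properties", {})
    closed = schema.get("additionalProperties") is False
    for key, value in config.items():
        if key in properties:
            # Known key: descend with the matching sub-schema.
            hints.extend(hint_extra_fields(value, properties[key], f"{path}{key}."))
        elif closed:
            hints.append(f"Unexpected key {path}{key}")
    return hints


config = {"bot": {"autmerge": True}, "azure": {"anything": 1}, "typo": 0}
hints = hint_extra_fields(config, SCHEMA)
```

Here the `azure` section stays open, so its unknown key produces no hint, while the closed top level and `bot` section do.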

@isuruf I would really like to resolve this debate, as I want this PR to get merged.

ytausch · 2024-05-05T10:15:20Z

Note: The signature of lintify_forge_yaml changed, but it makes a lot more sense now, since not all lints or hints have to be related to JSON validation. Formally, this would require a deprecation process, but I want to raise the question of who actually uses this method.

We had a discussion about deprecation in #1906

ytausch · 2024-05-10T11:51:02Z

What is the status of this PR? @conda-forge/core @isuruf

conda_smithy/lint_recipe.py

ytausch · 2024-05-27T16:24:33Z

@isuruf

ytausch · 2024-06-07T12:42:56Z

@isuruf There are still some open discussions about pydantic

don't warn for AzureRunnerSettings and CondaBuildConfig

ytausch · 2024-07-10T10:07:06Z

@isuruf You have not replied here for almost 2 months, although I pinged you multiple times. Is there anything I need to do? It's fine of course if you're currently busy; I'll wait here.

h-vetinari · 2024-11-21T13:48:22Z

Sorry for the very long delay here @ytausch! I stumbled over this due to #2152 and would be very interested to help you unblock this work, as it's necessary for a lint I'd like to add (#2155).

CC @beckermr

beckermr

The issue here is requiring pydantic at run time IIRC. We need to resolve this sticking point before we can merge.

ytausch · 2024-11-21T15:25:45Z

What exactly is the reason that we don't want to have pydantic as a runtime dependency?

isuruf · 2024-11-21T20:19:38Z

I thought I made my reasons clear in #1900. Let me know if you need more clarification.

h-vetinari · 2024-11-21T21:10:58Z

@isuruf: I thought I made my reasons clear in #1900. Let me know if you need more clarification.

Looking at that PR, the only relevant-looking thread I see is

@isuruf: This adds a new dependency pydantic on users. Previously it was only a dev dependency. Is it possible to remove this?

@ytausch: I would like to keep pydantic here and strongly propose doing so. Pydantic provides an elegant way to find additional fields that are not part of the schema (see my implementation). I also think the pydantic model should be used in more parts of the smithy code anyway, since it provides a type-safe way to access fields.

@isuruf: I strongly disagree. This PR uses pydantic to check for which fields to hint about. If we can't do that using the json schema, we are essentially making a distinction between pydantic model and json schema.

@ytausch: What do you mean by "distinction"? The pydantic model should be equivalent to the JSON schema anyway - if it's not, we have a bug. I am not aware of a good way of doing this with JSON schema.

@ytausch: @isuruf Let's continue this in #1920.

So the sticking point is not having a runtime dependence on pydantic? Why is that an issue? I don't care much, but I've found that #2155 without this PR doesn't actually work. I definitely think fixing linting on wrong distro-values is more important than whether we depend on pydantic, but I'd also be happy if there's a fix for #2152/#2155 that works without the runtime dependence.

isuruf · 2024-11-21T21:29:47Z

So the sticking point is not having a runtime dependence on pydantic?

No, the main point is that I want the json schema to be complete.

If we can't do that using the json schema we are essentially making a distinction between pydantic model and json schema.

h-vetinari · 2024-11-21T21:39:34Z

No, the main point is that I want the json schema to be complete.

How do you imagine we encode the fact that additional fields are forbidden in the json schema? AFAICT json has no way to specify that, so whatever we do, we'd have to introduce a marker field that says "this JSON schema cannot be extended". Which is what this PR is doing with "additionalProperties" AFAIU (naming can be bikeshedded of course, e.g. __no_extend / __closed, etc.).

To me saying "json schema needs to be complete" thus becomes equivalent to "we shouldn't lint on unknown fields", which I strongly disagree with.

isuruf · 2024-11-21T21:45:25Z

How do you imagine we encode the fact that additional fields are forbidden in the json schema?

additionalProperties: false

h-vetinari · 2024-11-21T21:58:06Z

So the issue is just the choice of default? I.e. whether we need to add "additionalProperties": false to non-extensible schemas, or whether - as this PR seems to be doing - we set "additionalProperties": true (?!) and build on that?

It'd be completely fine for me to opt into extra="forbid" with "additionalProperties": false; that would be a natural equivalence IMO. Is this what you had in mind?

isuruf · 2024-11-21T22:02:31Z

It'd be completely fine for me to opt into extra="forbid" with "additionalProperties": false; that would be a natural equivalence IMO. Is this what you had in mind?

Yes, but with a hint in the linter instead of a lint.

h-vetinari · 2024-11-22T09:22:58Z

@ytausch, now that we've elucidated Isuru's objection, do you think you'd be able to implement this request?

Also, @beckermr, in case you're still opposed to the runtime dependence on pydantic, could you explain the problem with that please?

beckermr · 2024-11-22T10:50:18Z

I don't care personally. I was trying to make sure more discussion was had since others had objected.

ytausch · 2024-11-22T15:10:04Z

I appreciate that this PR is moving forward!

It'd be completely fine for me to opt into extra="forbid" with "additionalProperties": false;

Yes, but with a hint in the linter instead of a lint.

The problem with combining these 3 requirements is that we cannot partially validate a JSON schema with jsonschema, ignoring additionalProperties: false attributes. Similarly, if a Pydantic model has extra=forbid set, we cannot validate data with additional attributes. I think there is also a good reason: A schema or model is built to either validate or fail to validate; there should be nothing in between.

However, if we want to emit a hint rather than a lint on extra fields (this PR already does that!), we need to differentiate between extra fields and other schema violations. The way I did it here was to allow extra fields in the Pydantic model (which is also reflected in the JSON schema) and then manually traverse the config to identify additional fields in a second hint step.

Doing it like this also preserves the following invariant: "All conda-forge.yml files in feedstocks can be validated with the Pydantic model of conda-smithy". This is very useful because it allows us to parse all feedstocks with the Pydantic model, which I see as a potential future use case for the autotickbot. Hints can be overridden and do not preserve this invariant.

If I understand correctly, the reason why you are not happy with the current solution is that you want to forbid extra fields in the JSON schema - which I also understand because of, for example, IDE support.

Because of the two reasons I mentioned above (we need extra=allow to separate the check for extra fields from the rest of the schema check; Pydantic model invariant), I propose solving this with the following approach:

1. The current Pydantic model stays as-is.
2. We add a method that returns a new, "strict" Pydantic model, dynamically changing all relevant instances of extra=allow to extra=forbid. I have never done this before, but it should be similar to adding new fields dynamically.
3. The JSON schema is generated from the strict Pydantic model, so it has additionalProperties: false in all relevant cases.
4. The non-strict Pydantic model is used for generating lints, and we generate the hints for extra fields the way this PR already does.

The strict Pydantic model (which is dynamically accessible via the added method) exactly reflects what is present in the JSON schema. And the non-strict (original) Pydantic model represents the minimum schema every feedstock satisfies.
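To make the strict/non-strict relationship concrete, here is a rough sketch of the same idea applied to a schema dict instead of the Pydantic model (this is a stand-in, not the proposed method itself). `make_strict` and the `loose` fragment are hypothetical; the sketch closes every object that declares properties, whereas the proposal would only touch the relevant ones.

```python
import copy


def make_strict(schema):
    """Return a copy of the schema in which every object with declared properties is closed."""
    strict = copy.deepcopy(schema)

    def visit(node):
        if not isinstance(node, dict):
            return
        if "properties" in node:
            node["additionalProperties"] = False
        # Walk only the keywords that hold sub-schemas, so that a property
        # that happens to be named "properties" is not mistaken for one.
        for keyword in ("properties", "$defs"):
            for child in node.get(keyword, {}).values():
                visit(child)
        for keyword in ("anyOf", "oneOf", "allOf"):
            for child in node.get(keyword, []):
                visit(child)
        visit(node.get("items"))

    visit(strict)
    return strict


# Hypothetical non-strict schema fragment.
loose = {
    "type": "object",
    "properties": {"bot": {"$ref": "#/$defs/Bot"}},
    "$defs": {
        "Bot": {"type": "object", "properties": {"automerge": {"type": "boolean"}}}
    },
}
strict = make_strict(loose)
```

The original dict is left untouched, which mirrors the invariant that the non-strict model keeps describing the minimum every feedstock satisfies.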

What do you think?

isuruf · 2024-11-22T18:56:46Z

That sounds good. It wouldn't require a runtime dependence on pydantic, right?

h-vetinari · 2024-11-22T18:56:48Z

The problem with combining these 3 requirements is that we cannot partially validate a JSON schema with jsonschema, ignoring additionalProperties: false attributes.

I don't understand that problem yet. How does jsonschema come into it, if the models between JSON and pydantic are equivalent, and we can just use pydantic to enforce extra="forbid"?

ytausch · 2024-11-22T19:45:15Z

That sounds good. It wouldn't require a runtime dependence on pydantic, right?

If absolutely necessary, we could avoid the runtime dependency by generating a strict and a non-strict JSON schema. The non-strict schema would allow extra fields while the strict schema wouldn't. If the non-strict schema passes and the strict schema fails, we know the issue lies in the extra fields.

However, we probably couldn't format the lint messages as nicely as we currently can. And there will be compromises if a file violates the non-strict JSON schema, and on top of that, has extra fields since we cannot really detect that with a JSON-schema-only approach. This is why I propose keeping pydantic as a runtime dependency to have a more elegant way of extracting extra fields.
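A rough sketch of the two-schema idea, to show what "non-strict passes, strict fails" would mean in practice. Everything here is an assumption for illustration: `validate` is a toy validator covering only `type`, `properties` and `additionalProperties` (it stands in for jsonschema and does not reproduce its behavior or error messages), and `LOOSE`, `STRICT` and `classify` are made-up names.

```python
# Only the JSON types used in this sketch are mapped.
TYPES = {"object": dict, "boolean": bool, "string": str}


def validate(data, schema, path="$"):
    """Toy validator for a tiny subset of JSON Schema; returns a list of error strings."""
    expected = schema.get("type")
    if expected and not isinstance(data, TYPES[expected]):
        return [f"{path}: expected {expected}"]
    errors = []
    if isinstance(data, dict):
        properties = schema.get("properties", {})
        for key, value in data.items():
            if key in properties:
                errors += validate(value, properties[key], f"{path}.{key}")
            elif schema.get("additionalProperties") is False:
                errors.append(f"{path}.{key}: additional property")
    return errors


def classify(data, loose, strict):
    """Errors of the loose schema become lints; strict-only errors become hints."""
    lints = validate(data, loose)
    hints = [error for error in validate(data, strict) if error not in lints]
    return lints, hints


BOT = {"type": "object", "properties": {"automerge": {"type": "boolean"}}}
LOOSE = {"type": "object", "properties": {"bot": BOT}}
STRICT = {
    "type": "object",
    "additionalProperties": False,
    "properties": {"bot": {**BOT, "additionalProperties": False}},
}

lints, hints = classify({"bot": {"automerge": "yes", "autmerge": True}}, LOOSE, STRICT)
```

The toy validator reports every error it finds, so it separates the two kinds of problems cleanly even when both occur; a real JSON-schema-only setup may not be that convenient, which is the compromise mentioned above.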

I don't understand that problem yet. How does jsonschema come into it, if the models between JSON and pydantic are equivalent, and we can just use pydantic to enforce extra="forbid"?

1. If the models between JSON and pydantic are equivalent and extra=forbid is set in pydantic, additionalProperties: false is set in the JSON schema.
2. This makes the JSON schema useless for generating schema lints (and hints) because jsonschema cannot ignore the additionalProperties JSON schema definitions. (Exception: there are separate strict and non-strict JSON schemas; see above for why I oppose that.)
3. Thus, we need to use pydantic to (a) enforce the non-strict model to generate lints and (b) enforce the strict model to generate hints for extra fields.
4. We could just validate the strict model on the input yaml and process the ValidationError that might occur. Depending on the type of errors, we create lints or hints. While this would work for pure schema checking, it makes the Pydantic model useless for the future use case of parsing conda-forge.yml with it. This is because we do not get a Pydantic object if ValidationErrors are raised on validation.
5. To ensure that the model is good for parsing too, we have to set extra=allow or extra=ignore.
6. Because we wanted to have additionalProperties: false in the JSON schema, this means we need to dynamically modify the model to generate it. This is what I proposed above.

Hopefully, this clarified a bit? :)

h-vetinari · 2024-11-22T20:02:56Z

Hopefully, this clarified a bit? :)

Thank you for the elaboration, though it's the second time where you build a longer argument and I already stumble on one of the first assumptions/statements.

This makes the JSON schema useless for generating schema lints (and hints) because jsonschema cannot ignore the additionalProperties JSON schema definitions.

How does it become useless? The fact that it cannot seems irrelevant to me because it shouldn't ignore additionalProperties / extra fields (and we turn them into lints or hints).

ytausch · 2024-11-22T20:25:39Z

Thank you for the elaboration, though it's the second time where you build a longer argument and I already stumble on one of the first assumptions/statements.

Good that we are discussing this here then!

How does it become useless? The fact that it cannot seems irrelevant to me because it shouldn't ignore additionalProperties / extra fields (and we turn them into lints or hints).

Counter-question: How do you want to differentiate between schema violations that turn into a lint vs. violations that turn into a hint?

h-vetinari · 2024-11-22T20:33:10Z

How do you want to differentiate between schema violations that turn into a lint vs. violations that turn into a hint?

Counter-counter-question: Is the extra complexity that's necessary to make that distinction actually worth it, or should we just lint on any violation, including otherwise benign extra fields?

I haven't been involved in this discussion as long as you and Isuru, but so far my impression is that a simpler approach would be clearly preferable, and so what if wrong fields get lints instead of hints? 🤷‍♂️

ytausch · 2024-11-22T20:41:34Z

How do you want to differentiate between schema violations that turn into a lint vs. violations that turn into a hint?

Counter-counter-question: Is the extra complexity that's necessary to make that distinction actually worth it, or should we just lint on any violation, including otherwise benign extra fields?

I haven't been involved in this discussion as long as you and Isuru, but so far my impression is that a simpler approach would be clearly preferable, and so what if wrong fields get lints instead of hints? 🤷‍♂️

I'm all for classifying extra fields as lints instead of hints. This is also what I initially proposed a few months ago in #1865 :D
It has to do with Isuru wanting to add new platforms to conda-forge "without touching conda-smithy", see the discussion in #1865.

h-vetinari · 2024-11-22T20:47:12Z

It has to do with Isuru wanting to add new platforms to conda-forge "without touching conda-smithy".

I understand the sentiment, but given all the overengineered complexity that flows from it, this requirement must be put in question.

New platforms get added extremely rarely, while we can do smithy releases as often as we want, so it's really not an unreasonable ask to make a smithy update for rolling out a new platform.

ytausch · 2024-11-22T21:28:59Z

I absolutely agree.

h-vetinari · 2024-11-23T20:48:57Z

New platforms get added extremely rarely, while we can do smithy releases as often as we want, so it's really not an unreasonable ask to make a smithy update for rolling out a new platform.

Even more than that: all the platform-related values get populated with

>>> from conda.base.constants import KNOWN_SUBDIRS
>>> [subdir.replace("-", "_") for subdir in KNOWN_SUBDIRS if "-" in subdir]
['emscripten_wasm32', 'wasi_wasm32', 'freebsd_64', 'linux_32', 'linux_64', 'linux_aarch64', 'linux_armv6l', 'linux_armv7l', 'linux_ppc64', 'linux_ppc64le', 'linux_riscv64', 'linux_s390x', 'osx_64', 'osx_arm64', 'win_32', 'win_64', 'win_arm64', 'zos_z']

so all conda-build-supported platforms are already part of the schema information, and won't need further changes to smithy even if we start rolling out a new platform in conda-forge.

ytausch requested a review from a team as a code owner May 3, 2024 18:08

ytausch mentioned this pull request May 3, 2024

refactor conda-forge.yml linting logic, hint about extra fields #1900

Closed

1 task

ytausch commented May 3, 2024

isuruf reviewed May 11, 2024

ytausch force-pushed the hint-extra-fields branch from 036d3df to 155b56e Compare May 13, 2024 14:44

ytausch requested a review from isuruf June 7, 2024 12:43

ytausch added 6 commits July 10, 2024 12:02

hint about extra fields in conda-forge.yml

8a6d442

add news file

0d5c730

use outdated List type

d6307ea

lintify_forge_yaml returns lists of str

294ce7e

use pytest, add tests for AzureRunnerSettings and CondaBuildConfig

1068125

make extra fields hint configurable

8a429ed

don't warn for AzureRunnerSettings and CondaBuildConfig

ytausch force-pushed the hint-extra-fields branch from 155b56e to 8a429ed Compare July 10, 2024 10:03

ytausch mentioned this pull request Jul 26, 2024

fix: regenerate schema #1993

Merged

h-vetinari added 2 commits November 22, 2024 00:26

Merge remote-tracking branch 'upstream/main' into hint-extra-fields

8b6efc0

fix inaccurate tests

6c0a9db

h-vetinari mentioned this pull request Nov 21, 2024

Linting conda-forge.yml does not diagnose (some) schema violations #2152

Open

h-vetinari force-pushed the hint-extra-fields branch from aca42c0 to 1a129f6 Compare November 21, 2024 13:44

h-vetinari mentioned this pull request Nov 21, 2024

ENH: lint incorrect distro names in os_version #2155

Merged

appease linters

bfc6a4b

h-vetinari force-pushed the hint-extra-fields branch from 1a129f6 to bfc6a4b Compare November 21, 2024 13:49

beckermr requested changes Nov 21, 2024
