-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: ensure variable finding works properly and add unit tests for variable finding #5535
Conversation
CodSpeed Performance ReportMerging #5535 will not alter performanceComparing Summary
|
@conda/builds-tools @jezdez @kenodegard @beeankha @jaimergp This request is unusual, but can we consider merging this PR before the next conda-build release? This PR fixes a very non-trivial bug that has been present in the code base for at least five years. conda-build is misdetecting which variants are being used, which is not great. :/ |
Hm, I wonder if this is behind conda-forge/conda-smithy#2100 and conda-forge/conda-smithy#2071 |
Looks like it very likely could be. Currently |
Co-authored-by: jaimergp <jaimergp@users.noreply.github.com>
And side note, I think these regexes (which have received several contributions lately) would benefit from |
That is a good idea, but I'd like to defer to another PR. Also, I am not sure I could actually describe what it is doing. I use regex101.com to build and test regex strings because they are so confusing. :/ |
Heh, same here but with regexr.com 😂
Let me give it a try, but yes, non-blocking. |
Hm, let's see: variant_regex = re.compile(
# This regex matches stuff in Jinja expressions and functions; e.g. {{ python }} and {{ pin_compatible('python') }}
rf"""
\{{\{{ # Double curly brace opens Ninja expression
\s* # Zero or more spaces
(?: # Opens non-capturing group
pin_[a-z]+ # Matches any lowercase pin_* function name (e.g. pin_compatible)
\( # Matches the opening parenthesis of the pin_* function
\s*? # Non-greedy glob for any number of spaces (JRG: not sure why we need the non greedy version)
['"] # Single or double quote, required once
)? # Closes non-capturing group and makes it optional; i.e. the whole pin_compatible(' part can be ommitted
{v_regex} # The regex from above, should match a variant name
[^_0-9a-zA-Z] # One instance of any character that is NOT a number, a letter or an underscore; i.e. not part of the variant name
.*? # Any number of any character, in a non-greedy way; aka "whatever comes before the closing curly braces"
\}}\}} # Double curly brace closes Ninja expression
""",
re.VERBOSE
)
selector_regex = re.compile(
# This regex matches selectors; e.g. # [py>38]
rf"""
^ # Starts with... (one thing, I don't know why 🎵)
[^#\[]*? # Any character that is not a # or [, any number of times, non-greedy
\#? # A hashtag, optionally
\s # A space
\[ # An opening square bracket
[^\]]*? # Any character that is not a ], any number of times, non-greedy
(?<! # Negative lookbehind assertion ✨ aka "NOT preceded by..."
[_\w\d] # ... underscores, letters or digits
) # Closes the negative lookbehind assertion
{v_regex} # The variant name regex
[ # One of
=\s<>!\] # equals, spaces, angle brackets, exclamation mark or closing square bracket
# ... These are all characters that "terminate" a variant name, like operators
# ... or the end of the selector
]
""",
re.VERBOSE
) |
@jaimergp my concern is that the conda-build test suite is not robust enough to catch bugs that might be introduced when we change the regex strings to the more verbose style. It'd be better to defer to future PRs that change the strings and introduce more robust tests to ensure nothing breaks. |
That's ok with me. Just updated my comment because I was already "parsing" the regexes for the review, so might as well write it down. |
@@ -749,7 +749,9 @@ def find_used_variables_in_text(variant, recipe_text, selectors_only=False): | |||
continue | |||
v_regex = re.escape(v) | |||
v_req_regex = "[-_]".join(map(re.escape, v.split("_"))) | |||
variant_regex = rf"\{{\s*(?:pin_[a-z]+\(\s*?['\"])?{v_regex}[^'\"]*?\}}\}}" | |||
variant_regex = ( | |||
rf"\{{\s*(?:pin_[a-z]+\(\s*?['\"])?{v_regex}[^_0-9a-zA-Z].*?\}}\}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding of the change:
- The previous regex was non-greedy-matching any characters after the variant name that were NOT a single or double quote. That meant that partial matches were possible (e.g. it found
pari
inpin_compatible('napari')
. - The new regex tries to non-greedy-match any character after the variant name that is NOT a valid variant name character (underscores, numbers and letters).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. We assert the variable name is terminated by a non-allowed character for variables. Then we get the rest of the characters up until the jinja2 statement is closed, asserting it is closed.
shall we merge @jaimergp? no pressure, but I am contemplating adding a patch to the conda-build feedstock if this PR is going to sit for a while. |
Let's give @beeankha and @ForgottenProgramme a few hours to take a look. The release branch has not been created yet so there's no rush to merge just yet. It does LGTM though. |
Description
Currently, conda-build will say the variant variable
python
is used for a statement likeThis is incorrect and only
python_min
is used in this case AFAIK.It also says this statement does NOT use
python
when it doesThis bug was introduced in #2838.
The issue appears to be a buggy regex for searching for variants.
This PR proposes a new regex and adds unit tests that assert the correct behavior.
Checklist - did you ...
news
directory (using the template) for the next release's release notes?