reduce input variants to only _used_ input variants #1968

minrk · 2024-06-28T12:04:31Z

Reduce the input variants to only those that are actually used before recomputing output metadata, which involves a number of expensive $O(\texttt{len(input\_variants)})$ operations.

This reduces get_used_vars and Metadata.copy() time dramatically, because input_variants can be several thousand items long in order to produce tens of variants: it is the Cartesian product of all possible variants across conda-forge, including every single unused combination.
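To make the scale concrete, here is a minimal sketch of how the matrix grows. The pinning keys and values are made up for illustration (they are not the real conda-forge pinning), and `expand` is a hypothetical helper, not a conda-build or conda-smithy function.

```python
from itertools import product

# Hypothetical pinning config; keys and values are illustrative only.
config = {
    "python": ["3.10", "3.11", "3.12"],
    "numpy": ["1.26", "2.0"],
    "mpi": ["mpich", "openmpi"],
    "boost": ["1.82", "1.84"],
    "hdf5": ["1.14.2", "1.14.3"],
}


def expand(config, keys=None):
    """Return one dict per combination of values for the selected keys."""
    keys = sorted(config if keys is None else keys)
    return [
        dict(zip(keys, combo))
        for combo in product(*(config[k] for k in keys))
    ]


# Every key contributes a factor, whether or not the recipe uses it.
all_variants = expand(config)
# Assume the recipe only references python and mpi.
used_variants = expand(config, keys={"python", "mpi"})

print(len(all_variants), len(used_variants))  # 48 6
```

Each unused key with more than one value multiplies the length of the list, so any per-variant operation pays for combinations that can never change the result.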

Combining this PR with the following upcoming optimizations to conda-build 24.7:

- reduce cost of large variant matrix conda/conda-build#5392
- perf: only set level if needed conda/conda-build#5384
- Use pickle's loads/dumps as a faster deepcopy conda/conda-build#5281

(this also requires "HashableDict is removed in conda_build 24.7" #1967 for compatibility with conda-build 24.7)

the render time for the petsc4py feedstock, which has 13,824 input variants to produce 72 linux-64 builds, is reduced from over 30 minutes to 55 seconds, producing the same result.

Checklist

Added a news entry

beckermr · 2024-06-28T12:16:01Z

Thanks for this. I don't quite follow it and need to think on it a bit so I understand. I'm concerned the test suite may not catch all of the edge cases.

minrk · 2024-06-28T13:17:24Z

Looks like it doesn't quite work in every case and tests catch it, despite working in all of the recipes I tested.

I think perhaps conda-smithy needs to do its own detection of used vars and reduce them before passing them to conda-build's render, because the vast majority of variants are never used in any given conda-forge recipe. Every recipe computes every variant for every input parameter across all of conda-forge before checking whether anything will be used, and then carries this exploded matrix through the whole render process, which accounts for ~90% of the remaining render time.

I think the upstream PR is probably not going to work either: conda/conda-build#5392

In general, conda-build seems to assume that every input variant will be used, which doesn't match conda-forge where the input config is used for every package and has numerous unused dimensions which should eventually be dropped.
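As a rough sketch of what dropping unused dimensions could look like (not the code in this PR), the hypothetical helper below removes only keys that are both unused and multi-valued, then de-duplicates the remaining variants. It assumes every variant is a flat dict with hashable values and that the set of used keys is already known.

```python
def drop_unused_dimensions(variants, used_keys):
    """Drop unused keys that take more than one value, then de-duplicate.

    `variants` is a list of dicts with identical keys; `used_keys` is the
    set of keys the recipe is assumed to reference. Hypothetical helper.
    """
    # Collect the distinct values seen for each key across all variants.
    values = {}
    for variant in variants:
        for key, value in variant.items():
            values.setdefault(key, set()).add(value)

    # Only unused keys with several values inflate the matrix.
    to_drop = {
        key
        for key, seen_values in values.items()
        if len(seen_values) > 1 and key not in used_keys
    }

    reduced, seen = [], set()
    for variant in variants:
        kept = {k: v for k, v in variant.items() if k not in to_drop}
        fingerprint = tuple(sorted(kept.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            reduced.append(kept)
    return reduced


# Illustrative input: boost is unused and multi-valued, so it is dropped;
# channel_targets is unused but single-valued, so it is left alone.
variants = [
    {"python": py, "boost": boost, "channel_targets": "conda-forge main"}
    for py in ("3.11", "3.12")
    for boost in ("1.82", "1.84")
]
reduced = drop_unused_dimensions(variants, used_keys={"python"})
print(len(variants), len(reduced))  # 4 2
```

Leaving single-valued keys untouched is the conservative choice: they add no dimensionality, and anything downstream that reads them still finds them.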

beckermr · 2024-06-28T14:11:04Z

Thanks. The detection of used variables is itself expensive and so we'll need to think on this a bit.

beckermr

Some questions.

conda_smithy/configure_feedstock.py

minrk · 2024-07-01T10:22:32Z

The tests pass now, but reading carefully, I think this reduction must do exactly what _collapse_subpackage_variants does before computing the metadata objects, but _collapse_subpackage_variants indicates the sub-metadata used_vars are necessary to get the right answer every time (I don't know how to concoct a test where the result differs).

But it's not all keys where it matters. The only case where this should be able to get the wrong answer requires a key that:

- appears in an output's get_used_vars, but not the top-level, and
- has multiple values, so it contributes to dimensionality

The consequences of getting it wrong, however, are a missing pin and missing matrix dimension. There is a workaround where recipes can artificially ensure that the key is consumed at the top-level. I've actually had to do exactly this a number of times over the years in multi-output feedstocks, so I know it's doable, but it's not very nice when it is required. I'm not sure it happens anymore, so reintroducing a potential source of it isn't great.
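The two conditions can be written down as a small check. This is only a sketch under assumed inputs: `keys_at_risk` is a hypothetical helper, and the per-output used-key sets are supplied by hand here rather than computed by conda-build.

```python
def keys_at_risk(config, top_level_used, output_used):
    """Return keys that a top-level-only reduction could drop incorrectly.

    config: key -> list of values
    top_level_used: set of keys used by the top-level recipe
    output_used: output name -> set of keys used by that output
    All three structures are illustrative assumptions.
    """
    used_in_outputs = set()
    for keys in output_used.values():
        used_in_outputs |= set(keys)

    return {
        key
        for key in used_in_outputs - set(top_level_used)
        # Single-valued keys add no dimension, so they cannot be lost this way.
        if len(config.get(key, [])) > 1
    }


config = {
    "python": ["3.11", "3.12"],
    "mpi": ["mpich", "openmpi"],
    "zlib": ["1.3"],
}
at_risk = keys_at_risk(
    config,
    top_level_used={"python"},
    output_used={"core": {"python"}, "parallel": {"python", "mpi", "zlib"}},
)
print(at_risk)  # {'mpi'}
```

In this example `zlib` is output-only but single-valued, so only `mpi` could go missing from the matrix.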

minrk · 2024-07-01T10:31:32Z

The upstream PR conda/conda-build#5392 no longer reduces the variant list because I don't understand what conda-build promises to do with unused variants. Instead it reduces the cost of a very large variant list, so I think it has a better chance of being acceptable.

beckermr

Looking good. Do we have a test that covers a multidimensional key only in an output? This seems like the relevant corner case given the discussion.

minrk · 2024-07-01T11:42:13Z

> Do we have a test that covers a multidimensional key only in an output?

Yes, test_conda_build_api_render_for_smithy exercises this with multiple_outputs which has multiple variants for each of two outputs.

I believe I have found a case where this will miss a variant: if a variable is only used in the build script of an output in a multi-output recipe, but nowhere in meta.yaml, it will be missed by get_used_vars(global=True). I'm not sure if/when it makes sense to write a recipe like that and I'm not sure conda-smithy needs to handle it. If we do need to handle it, it may be tricky to do without constructing all of the outputs to get build.script (which is what we want to avoid doing until after we've reduced the variant matrix), but calling find_used_variables_in_shell_script on our own glob of recipe/*.sh (and equivalent for bat) might do it. The difference would be that it would search all scripts, while conda-build's logic only searches the ones that are actually output.script for an output. I'm not sure if the .sh extension can be assumed, even though it's ~always used.
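A minimal sketch of the "search every script" idea follows. It does not call conda-build; `keys_used_in_scripts` is a hypothetical stand-in with a deliberately simple regex, and the `*.sh` / `*.bat` glob patterns are an assumption about how scripts are named.

```python
import re
import tempfile
from pathlib import Path


def keys_used_in_scripts(recipe_dir, variant_keys, patterns=("*.sh", "*.bat")):
    """Return variant keys referenced as variables in any matching script.

    Simplified, hypothetical stand-in: it looks for $key, ${key} or %key%
    and ignores which output, if any, a script belongs to.
    """
    used = set()
    for pattern in patterns:
        for script in Path(recipe_dir).glob(pattern):
            text = script.read_text(encoding="utf-8", errors="replace")
            for key in variant_keys:
                name = re.escape(key)
                # Shell form ($key or ${key}) or batch form (%key%).
                regex = r"\$\{?" + name + r"\b|%" + name + "%"
                if re.search(regex, text):
                    used.add(key)
    return used


# Usage with a throwaway recipe directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "build-output.sh").write_text("cmake -DMPI=${mpi} .\n", encoding="utf-8")
    Path(tmp, "bld.bat").write_text("echo %python%\n", encoding="utf-8")
    found = keys_used_in_scripts(tmp, ["mpi", "python", "boost"])

print(sorted(found))  # ['mpi', 'python']
```

Searching all scripts can only over-report used keys relative to searching the scripts that belong to an output, which errs on the side of keeping a dimension rather than dropping it.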

beckermr

Looking at this again, I think in order to merge we need to do one of the following two things:

- Fix the code getting all used vars check so it is reliable
- Add a test showing exactly what fails and ensure we are ok with ignoring that case

minrk · 2024-09-18T11:13:03Z

Updated timings rerendering petsc4py (220 variants):

| conda-build | conda-smithy | time (s) |
| --- | --- | --- |
| 24.7.1 | 3.39.1 | 79 |
| 5392 | 3.39.1 | 41 |
| 24.7.1 | this PR | 19 |
| 5392 | this PR | 18 |

(The savings in conda/conda-build#5392 are mostly redundant with this PR, which applies the same reduction earlier, getting the benefit in more places). So this saves a lot less now that other optimizations have landed, if that enters into the decision for whether this is worth doing.

beckermr · 2024-09-18T13:19:48Z

Given the latest change, I think we can discard the len(variant) > 1 check.

Also, we need to add a .bat file to the tests if we don't have one.

minrk · 2024-09-18T17:26:52Z

I haven't removed the len(variant) > 1 check in this PR, and when I tried it, it removed the mpi provider pins in petsc4py. I.e. having:

host:
  - {{ mpi }}

does not result in the pin for the value of mpi (mpich or openmpi) being in the used_vars found by reduce_variants. That does mean if mpich itself had multiple version values to build, it would render incorrectly. The following pattern would render correctly, if we had done that:

host:
  - mpich  # [mpi == 'mpich']

I can try to add a test for that tomorrow, if you think we should, or if that means we should hold off on this. One minute is a lot less dire to optimize than 30.
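For illustration only, one way to account for this kind of indirect use would be to treat the values of a used key as candidate keys themselves. The helper below is hypothetical and is not what this PR does; it assumes the config maps each key to a list of string values.

```python
def expand_indirect_keys(config, used_keys):
    """Add keys that are named by the values of already-used keys.

    Example: if `mpi` is used and can take the value "mpich", and "mpich"
    is itself a key in the config, keep "mpich" as well.
    Hypothetical helper, shown only to illustrate the idea.
    """
    expanded = set(used_keys)
    pending = list(used_keys)
    while pending:
        key = pending.pop()
        for value in config.get(key, []):
            if value in config and value not in expanded:
                expanded.add(value)
                pending.append(value)
    return expanded


# Illustrative config: the values of mpi are also pin keys.
config = {
    "mpi": ["mpich", "openmpi"],
    "mpich": ["4"],
    "openmpi": ["4", "5"],
    "boost": ["1.84"],
}
kept = expand_indirect_keys(config, {"mpi"})
print(sorted(kept))  # ['mpi', 'mpich', 'openmpi']
```

This over-approximates (it keeps `openmpi` even for builds where mpi is mpich), which trades some matrix size for not losing a pin.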

beckermr

Yes. Let's add more tests.

minrk requested a review from a team as a code owner June 28, 2024 12:04

reduce input variants to only _used_ input variants

d4e0c9b

before recomputing output metadata reduces get_used_vars time dramatically because input_variants can be several thousand items long to produce 10s of variants

minrk force-pushed the reduce-input-variants branch from 4601c1e to d4e0c9b Compare June 28, 2024 12:06

minrk added 3 commits June 28, 2024 19:08

reduce variants before passing to conda_render

c5a90cb

saves a lot of time in render_recipe

Merge from main

cd68c42

try a simpler reduction

cd985dd

just remove unused variants that add dimensionality, don't try to remove anything else

minrk force-pushed the reduce-input-variants branch from 468c631 to cd985dd Compare June 28, 2024 20:46

top-level always_keep_keys since it's used in more than one place

ffc2a07

minrk force-pushed the reduce-input-variants branch from 9b1a9a6 to ffc2a07 Compare July 1, 2024 07:57

beckermr requested changes Jul 1, 2024

More comments in reduce_variants

15e4e0f

minrk force-pushed the reduce-input-variants branch from 8520223 to 15e4e0f Compare July 1, 2024 11:14

beckermr requested changes Jul 1, 2024

minrk mentioned this pull request Jul 4, 2024

vtk update missing migrator regro/cf-scripts#2805

Open

minrk added 4 commits July 30, 2024 13:50

Merge branch 'main' into reduce-input-variants

2f56c96

Merge branch 'main' into reduce-input-variants

b898252

logging lint

dc2c062

Merge branch 'main' into reduce-input-variants

5cbf9c7

beckermr requested changes Sep 18, 2024

capture used_vars in scripts when reducing variants

3306275

update multi-output matrix test to cover script vars

471104c

minrk force-pushed the reduce-input-variants branch from df502f9 to 471104c Compare September 18, 2024 15:41

Merge branch 'main' into reduce-input-variants

a667c4a

beckermr approved these changes Sep 18, 2024

beckermr requested changes Sep 18, 2024
