Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support LLVM stable (v19) #23311

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Jan 6, 2025

Run the majority of the other under llvm stable. There are still
some features that don't seem to work.

Generate and experimental warning for now, at least until we can fix
the known issues.

Fixes: #11362

@sbc100 sbc100 force-pushed the llvm_stable branch 10 times, most recently from 208a081 to d26814e Compare January 6, 2025 20:33
@sbc100 sbc100 changed the title Add some minimal testing with LLVM stable (v19) Add initial support LLVM stable (v19) Jan 6, 2025
Run the majority of the `other` under llvm stable.  There are still
some features that don't seem to work.

Generate and experimental warning for now, at least until we can fix
the known issues.

Fixes: emscripten-core#11362
@kripken
Copy link
Member

kripken commented Jan 6, 2025

This is quite a lot of additional testing. Do we know we actually want this?

From our quick discussion before, I wasn't certain of the benefit here. If the goal is to support linux distros, being able to tell them "use this emscripten with your stable LLVM version N", then that is possible without this. This PR allows us to say that a particular emscripten commit works with two LLVM verisons - but why do we want that specifically?

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 7, 2025

This is quite a lot of additional testing. Do we know we actually want this?

From our quick discussion before, I wasn't certain of the benefit here. If the goal is to support linux distros, being able to tell them "use this emscripten with your stable LLVM version N", then that is possible without this. This PR allows us to say that a particular emscripten commit works with two LLVM verisons - but why do we want that specifically?

I feel like this way I would feel a lot more confident recommending using a stable LLVM release.

The proposal of "use this emscripten N with your stable LLVM version M" doesn't actually test that M and N are compatiable, its just somewhat likely due to branch point closeness.

This new approach has the advantage that we actually continuously the test that all emscripten releases are compatible with at least one stable LLVM release. This is a much higher bar IMHO and it means that we know it true, at least for the subset of features we test. It also means that we have an entire revision range that we can recomment. i.e. "emscripten version A-B is known to work with LLVM M".

@kripken
Copy link
Member

kripken commented Jan 7, 2025

The proposal of "use this emscripten N with your stable LLVM version M" doesn't actually test that M and N are compatiable, its just somewhat likely due to branch point closeness.

I think it can, as a slightly modified version of previous ideas. In the past, yeah, we said "get the closest version", and you're right that without explicit testing that seems risky. But if we pull from LLVM say 20 times a year (assuming we are moving to a model with less frequent rolls), then if the LLVM stable releases from the year are among those 20, then we get an "emscripten with stable LLVM release" for free.

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 7, 2025

The proposal of "use this emscripten N with your stable LLVM version M" doesn't actually test that M and N are compatiable, its just somewhat likely due to branch point closeness.

I think it can, as a slightly modified version of previous ideas. In the past, yeah, we said "get the closest version", and you're right that without explicit testing that seems risky. But if we pull from LLVM say 20 times a year (assuming we are moving to a model with less frequent rolls), then if the LLVM stable releases from the year are among those 20, then we get an "emscripten with stable LLVM release" for free.

One problem with that approach is that LLVM releases are never made from the trunk. They are always on a branch with a whole bunch of backports. I agree however testing against the branch point (where trunk and the release branch share common ancestor) would offer a pretty good guarantee.

I guess there is nothing stopping us from choosing actual release branches?

However I'm not sure was what to link our continuous updates to LLVM releases like that. I like the idea of the LLVM roller being flexible. Are you suggesting that we have emscripten follow each new release branch up until the point where we actaully need something from tot, and which point we switch it back to tot?

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 7, 2025

Also, we them end up deviding emscripten releases into those that we llvm-stable compatible and those that are not. With this approach all emscripten releases are packagable with at least one llvm stable release.

I don't really want to be a in the position of saying its simple not feasible to package certain emscripten releases since they didn't happen to conincide with an llvm release. I think it could be good if all emscripten versions should be packagable.

@kripken
Copy link
Member

kripken commented Jan 7, 2025

I guess there is nothing stopping us from choosing actual release branches?

Good questions, I'm not sure of the details. But I hope we could use the release branches, so we would have an emscripten release that literally uses the same LLVM as the LLVM release that came out that week. And the next update after would be from the main branch once more, etc.

@dschuff
Copy link
Member

dschuff commented Jan 7, 2025

@kripken I think what you are suggesting is that in the emscripten code we essentially do exactly what we are doing now (where every revision of emscripten supports exactly one revision of LLVM), but then just ensure that some revision of emscripten is released/tested with a release version of LLVM?

If we do that, then anyone who wants to use a release version of LLVM will only ever be able to use exactly one version of emscripten. This would be only a slight improvement over what we have now (i.e. an emscripten version that works with each LLVM release probably does exist today if you find the right one, but this would guarantee and test it). But there would be no way to update Emscripten for such users. This PR as written here would at least allow anyone with the most recent release of LLVM to get emscripten updates until the next stable release.
I would be interested in knowing how this would work in the distro use case, and what other use cases users might have.

The required change in this PR is quite minimaI. I would also be curious to see what happens if you try LLVM 18.

In terms of implementation, a more flexible option (compared to having just is_llvm_stable would be to just have something like

if get_clang_version() >= 19:
  use_new_feature
else:
  error  # or use old feature

Then whenever we wanted to drop support for LLVM 19, we could just find all the places we check for that version specifically, and delete them. We could also use a mechanism like this for newer revisions that aren't part of a release yet, if we wanted to use e.g. Google's tested revisions. It would need to be slightly more sophisticated than just the major version number to handle unreleased revisions but I'm sure we could figure something out.

@dschuff
Copy link
Member

dschuff commented Jan 7, 2025

Technically with our DEPS system nothing is stopping us from shipping an LLVM revision on a release branch.

@dschuff
Copy link
Member

dschuff commented Jan 7, 2025

Also I do think it would be OK to run these tests on the Chromium post-commit bots that don't block the roll, especially if we keep the autorollers turned on. Breakages probably won't be too frequent, and we'll find out about them pretty quickly. And we can run as many tests as we want.

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 7, 2025

While this is extra testing burden, I hope it won't be too much extra maintenance burden.

Since the tests run in parallel it doesn't actually cause PR to have to wait longer (since this is not the slowest job).

The the concern about testing time is really a concern about using more circleci budget I guess? In that can we could consider (as Derek says) running these tests on the chromium CI.. or we could use github actions to run them instead of circleci? Alternatively we could look into self hosted test bots as a way to reduce the financial impact of adding new bots.

@kripken
Copy link
Member

kripken commented Jan 7, 2025

To be clear, I'm not opposed to this PR or this feature. I'm just still not sure what it gains for us.

In particular, why is it important for people to be able to use more than one Emscripten version with a particular LLVM stable? For Linux distros I imagine one is enough, similarly to how a single Binaryen version is enough for a particular LLVM as well.

However, note that you can use multiple Emscriptens even with what I proposed: if Emscripten 4.18 uses LLVM stable, and the next time we update LLVM is 4.33, then 4.18-4.32 are all using LLVM stable. (This assumes we are going to a slower cadence of updates from LLVM, as discussed earlier.)

Btw, another way to look at it is this: Imagine we are starting from scratch when thinking about LLVM updates. Then a natural thing to do is to update LLVM on each stable release (as e.g. wasi-sdk does). Emscripten can do just that, but also roll in incremental checkpoints in the middle, at manually chosen times, whenever we have some update that we want to pull in.

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 7, 2025

It sounds like you are proposing a world were with "pin to stable when possible" and "jump to head when needed", in the hope that that later will be possible for a lot of the time?

I'm not necessarily opposed to that idea, but my idea is more like the best of both worlds. It allows us to be both always be pinned to stable and always be forward looking at the same time. There is of course a cost to supporting both which is twofold:

  1. Complexity cost in the driver
  2. CI/testing cost

I think if we can show that both (1) and (2) are small enough then it could be worth it. The questions remains as to is how to define "small enough" and how to measure these costs.

@kripken
Copy link
Member

kripken commented Jan 7, 2025

I just don't see yet the benefit from supporting both. Which users specifically do you see as being helped by this?

But I agree the complexity cost is not huge, so the bar is not very high.

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 7, 2025

Any user wanting to package the current version of emscripten today would benefit. Package maintainers try to package the current versions of software and there is an expectation that current version of X to be compatible with the current versions of its dependencies X, Y and Z.

Imagine you are packaging emscripten today and you want it to be compatible with the llvm package which your colleagues are working on (v19). You would not expect to be forced to go back in time to some older version of emscripten from last yeat, especially if there are features in the latest emscripten version that your users will want.

What is more, I'm not just convinced that emscripten will want to track LLVM stable for any significant length of time. This meaning that most of the time emscripten will be tracking tot, which means most emscripten versions will be "unpackagable with llvm stable" regardless of which version of llvm stable you choose.

@kripken
Copy link
Member

kripken commented Jan 7, 2025

Package maintainers try to package the current versions of software

In my experience, and this is where we might be seeing different things, package maintainers want the latest stable version, and they are ok if the stable version is a bit older. It's common for them to package a version that is several months old - e.g. LLVM is often 6 months old in Linux distros, if not more.

In my proposal, the Emscripten version they use would be just as old as the LLVM version. So I think it would be acceptable.

But, again, we may be thinking of different users here. I have in mind Linux distro packagers. What kind of user do you see that needs the very latest Emscripten and also wants an older LLVM? (I do agree that for such a user, what you propose would be ideal.)

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 7, 2025

I would argue that at any given time the "stable" version of LLVM is up to 6 months old but the "stable" version of emscripten is only a couple of weeks old. This is because emscripten does much more frequent releases than llvm.

Perhaps you are correct and packagers would ok going back 6 months when packaging emscripten. My other point still stands: We should not have some version of emscripten that are unpackagable. Something certain emscripten versions a shown to have issues and package maintainers should be allowed to pick whichever emscripten release suites them. You are correct, we could "bless" certain emscripten releases as being packaging-compatible, but that doesn't seem like a great outcome.

Also we have already missed the window so I guess that first time we could execute the plan your are suggesting is when v21 branch is created?

@kripken
Copy link
Member

kripken commented Jan 7, 2025

We should not have some version of emscripten that are unpackagable. Something certain emscripten versions a shown to have issues and package maintainers should be allowed to pick whichever emscripten release suites them. You are correct, we could "bless" certain emscripten releases as being packaging-compatible, but that doesn't seem like a great outcome.

I see, thanks for explaining. You are right that this might help sometimes.

But with that said, if we update somewhat rarely, I am not that worried here. First, if say LLVM 20 is compatible with Emscripten versions 4.20-4.24, but not 4.25-4.40, then any bug in the first release 4.20 would have had a few releases to get fixed, and packagers could use those. Second, AFAIK it is standard practice for packagers to do such backporting themselves in Linux distros, when there is a justified reason (like a security patch).

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 7, 2025

I'm still feel like the solution you are proposing means its hard to make strong guarantees about compatibility though.

Even if emscripten tracks a release branch for a while while its first created, how can we be sure we will track it long enough that we will actually get to the released version. IIRC release branches exist for many weeks (months?) before the release is created. Can we commit to tracking each new release branch at least that long? It sounds like this alternative means forces a certain llvm rolling schedule on us, which is quite a big change from what we have today.

@dschuff
Copy link
Member

dschuff commented Jan 7, 2025

I think I'd say that my intuition is about like Sam, in that it seems not great that some (probably most?) versions of Emscripten wouldn't be packageable, and that it would be better if Emscripten could always support some stable version. But also I don't have a clear picture yet of exactly who that would serve (e.g. are there people trying to maintain their own updated emscripten package on top of an LLVM that they can't update, such as from their distro? Or do distro packagers want to update Emscripten more often than however often they update LLVM?)

Regardless of whether we go to updating LLVM less often, I do think we'll want to be able to get changes into EMSDK more often than every 6 months, and we'll want to maintain the ability to update LLVM to the tip of trunk on-demand to bring in important improvements or fixes. So I think that would mean that most Emscripten versions wouldn't just accidentally correspond primarily to a release.

@kripken
Copy link
Member

kripken commented Jan 7, 2025

@sbc100

IIRC release branches exist for many weeks (months?) before the release is created. Can we commit to tracking each new release branch at least that long?

Good point, if those can take months it seems impractical. Out of curiosity, why is it that long, do you know?

Overall you and @dschuff make good points, so I'm ok with your approach.

@dschuff
Copy link
Member

dschuff commented Jan 7, 2025

For LLVM 19 it looks like it was branched on July 23, and marked final on September 17, with 4 RCs in between, so there was quite a bit of stabilization work in that time. That of course is the value of the release process, with all those patches on the branch reducing the number of bugs in the release. I've been thinking of "6 months" as the upper bound on the time for a change to reach stable, but maybe 2-8 is a more accurate range.

@curiousdannii
Copy link
Contributor

I'm not sure if this PR would help with my situation, but for another use case: I'm linking Rust and Emscripten. They each bundle their own LLVM, but so far I haven't run into any problems even though they have different major versions of LLVM. I guess that's still a possibility in the future (if they diverge on what a WASM is meant to look like?) However the one thing that would need the same LLVM version would be cross-language LTO.

My impression is that Emscripten gets new LLVM versions quickly, so that when Emscripten first switches to "19" that it's not necessarily going to be the same as the final version 19 - does it start essentially the same as 18 and then over time move to what will be stabilised as 19? So I think that simply watching for LLVM PRs here doesn't reliably indicate what the actual LLVM version a release of Emscripten is using, and I'm not sure how else to check. So greater clarity on the LLVM status with each release would be helpful.

The possibility of switching to a system version of stable LLVM could potentially make it easier to get cross-language LTO working. But because Emscripten seems to switch to new LLVM versions much quicker than Rust I'm not sure it's ever going to be feasible, unless I stick with quite old Emscripten releases, which I often can't do because I run into bugs.

@sbc100
Copy link
Collaborator Author

sbc100 commented Jan 8, 2025

My impression is that Emscripten gets new LLVM versions quickly, so that when Emscripten first switches to "19" that it's not necessarily going to be the same as the final version 19 - does it start essentially the same as 18 and then over time move to what will be stabilised as 19? So I think that simply watching for LLVM PRs here doesn't reliably indicate what the actual LLVM version a release of Emscripten is using, and I'm not sure how else to check. So greater clarity on the LLVM status with each release would be helpful.

Yes that is basically correct. We track tip-of-tree LLVM today, which means the LLVM version reported is always + higher than the current release branch. When LLVM create a new branch for vN then update the main branch to be vN+1 .. this is why emscripten currently requires v20 even though v20 has not be released and doesn't even have a release branch yet.

@dschuff
Copy link
Member

dschuff commented Jan 14, 2025

I think it makes sense to have a functionality like this, maybe as an undocumented experiment for now. I think we'd want to try to see how many places we end up accumulating in the code where we have a conditional on the LLVM version, or to disable a test, and we can see what kind of testing matrix we end up with. I'd also be interesting in finding out from users who use this, what they need in terms of how far back we'd need support. Do we just immediately drop support for LLVM 19 as soon as LLVM 20.1.0 (which is the first non-RC release, despite the 1 in the minor version number) releases? Or does there need to be some overlap?

wrt what I said above though: what do you think of making it more granular than "tot" vs "the most recent stable"? That would allow for a period of overlap if we want it, and also allow for having "stable" vs "current release" vs "future tot" if we want that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support llvm stable releases
4 participants