-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Add an "init BB" invariant #110404
JIT: Add an "init BB" invariant #110404
Conversation
This adds an invariant that there always exists an "init BB" throughout the JIT's phases. The init BB has the following properties: - It is only executed once, so it does not have any predecessors - It is not inside a try region, hence it dominates all other blocks in the function There are no further requirements on the BB. The init BB does not have to be `BBJ_ALWAYS` (unlike the old "scratch BB" concept). This is mainly because it philosophically does not make sense to insert IR at the end of the init BB, since earlier phases can have inserted arbitrary IR in them.
@jakobbotsch turns out I also need something like this so I can compute dominators early. |
I really want to get this in too, and have been investigating the diffs on and off. A large number of them come from loop inversion that now matches the pattern in more cases -- seems like the loop being the entry is blocking inversion. Even with loop inversion disabled in both the base and diff there are a fair amount of diffs. Almost all of these are in OSR functions. If I filter out diffs in OSR functions then the diffs of this PR look like https://gist.github.com/jakobbotsch/6f5907231d7999edb7de2598df36fe05. There are a large amount of zero sized diffs. Likely that's GC info changes. I need to look into that and ensure we aren't inflating the size of GC info unnecessarily. Could also be debug info changes. One of the causes of diffs is this code in LSRA: runtime/src/coreclr/jit/lsra.cpp Lines 2429 to 2460 in f377332
With the init block, we sometimes end up picking that init block as the pred of some loop exits. Previously we would just pick the loop pred as the pred. Maybe an acceptable solution is to just remove the init block in the backend, if it is empty at that point. |
OSR diffs are likely less interesting, so I think its ok to focus on the others. That stretch of LSRA logic always has seemed a bit fishy to me. Maybe it's just the inherent clunkiness of the linear world view. But seems like there might be better heuristics, esp with PGO. |
The zero diffs had to do with reusing the first non-internal BB as the init BB; that would mean insertion of JIT code (like calls to I think the codegen diffs are acceptable now to take as is. From spot checking the main causes of the diffs seem to be:
Here are diffs with loop inversion disabled, and with diffs in OSR functions filtered out. I'm going to kick off some stress legs. |
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress |
Azure Pipelines successfully started running 2 pipeline(s). |
Looks like tests aren't completely happy:
|
{ | ||
compiler->fgEnsureFirstBBisScratch(); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should keep this or its moral equivalent, Tier0 instr needs to insert some execute once code at method entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would've thought the init BB would be sufficient. It should have been created before this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the issue is that the bbNum
"is backedge" approximation is placing patchpoints in the init BB. Seems like we can teach the logic to never place a patchpoint in the init BB (or, more ideally, switch to a DFS tree based approach – but that's maybe too expensive for tier0).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The importer places the patchpoint in the init BB because of stress.
What is the point of the assert? Is it simply used as an attempt to verify "is not part of a cycle"? Seems like the assert should just be removed if so, given the stress mode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ended up removing the assert and also changing code below to make sure to insert the initialization of the patchpoint at the beginning of the init BB.
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr pgo, runtime-coreclr pgostress |
Azure Pipelines successfully started running 4 pipeline(s). |
/azp run runtime-coreclr superpmi-diffs |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr superpmi-replay |
Azure Pipelines successfully started running 1 pipeline(s). |
cc @dotnet/jit-contrib PTAL @AndyAyersMS Diffs. See above for some notes on these. Stress failures look unrelated. |
src/coreclr/jit/fgprofile.cpp
Outdated
{ | ||
firstILBlock = firstILBlock->Next(); | ||
} | ||
firstILBlock = fgGetFirstILBlock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to submit a change to compute fgCalledCount
earlier before I merge this PR. fgGetFirstILBlock
added by this PR will follow BBJ_ALWAYS
to find the first IL block, but it seems dangerous to have a requirement that we can find it that way this late (after morph).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to revert this back to what it was to get this PR in quicker. Can look at cleaning this up separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this, it is something I've thought about doing for a while but never got around to it.
// Create a block for the start of the try region, where the monitor enter call | ||
// will go. | ||
BasicBlock* const tryBegBB = fgSplitBlockAtEnd(fgFirstBB); | ||
BasicBlock* const tryBegBB = fgSplitBlockAtBeginning(fgFirstBB); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we split at the beginning, we may put already jit-inserted IR into the try. I suppose that's ok as none of it should ever cause an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree it looks a bit odd, but I think it's preferable over a solution that tries to differentiate between IR created because of the user IL and IR created due to the JIT. I really want us to be able to freely move IR and merge/duplicate blocks without breaking subtle invariants about what kind of IR can go where.
So far it seems like most of these cases come down to careful consideration of phase ordering. For example, this try-finally insertion is very much like IR created because of IL, so it would make sense to do it in (or close to) import. Another example is reverse pinvoke enter/exit – most of the JIT will naturally assume inserted IR is executed in cooperative mode, so this one makes sense to insert as the last thing in the JIT. Currently it's inserted quite early.
Co-authored-by: Austin Wise <AustinWise@gmail.com>
This adds an invariant that there always exists an "init BB" throughout the JIT's phases. The init BB has the following properties: - It is only executed once, so it does not have any predecessors - It is not inside a try region, hence it dominates all other blocks in the function There are no further requirements on the BB. The init BB does not have to be `BBJ_ALWAYS` (unlike the old "scratch BB" concept). This is mainly because it philosophically does not make sense to insert IR at the end of the init BB, since earlier phases can have inserted arbitrary IR in them.
This adds an invariant that there always exists an "init BB" throughout the JIT's phases. The init BB has the following properties:
There are no further requirements on the BB. The init BB does not have to be
BBJ_ALWAYS
(unlike the old "scratch BB" concept). This is mainly because it philosophically does not make sense to insert IR at the end of the init BB, since earlier phases can have inserted arbitrary IR in them.