JIT: Add an "init BB" invariant #110404

jakobbotsch · 2024-12-04T18:06:42Z

This adds an invariant that there always exists an "init BB" throughout the JIT's phases. The init BB has the following properties:

It is only executed once, so it does not have any predecessors
It is not inside a try region, hence it dominates all other blocks in the function
(For debug code only) it is an internal block (not corresponding to any user IL)

There are no further requirements on the BB. The init BB does not have to be BBJ_ALWAYS (unlike the old "scratch BB" concept). This is mainly because it philosophically does not make sense to insert IR at the end of the init BB, since earlier phases can have inserted arbitrary IR in them.

This adds an invariant that there always exists an "init BB" throughout the JIT's phases. The init BB has the following properties: - It is only executed once, so it does not have any predecessors - It is not inside a try region, hence it dominates all other blocks in the function There are no further requirements on the BB. The init BB does not have to be `BBJ_ALWAYS` (unlike the old "scratch BB" concept). This is mainly because it philosophically does not make sense to insert IR at the end of the init BB, since earlier phases can have inserted arbitrary IR in them.

AndyAyersMS · 2024-12-10T16:25:01Z

@jakobbotsch turns out I also need something like this so I can compute dominators early.

jakobbotsch · 2024-12-10T17:18:41Z

I really want to get this in too, and have been investigating the diffs on and off. A large number of them come from loop inversion that now matches the pattern in more cases -- seems like the loop being the entry is blocking inversion.

Even with loop inversion disabled in both the base and diff there are a fair amount of diffs. Almost all of these are in OSR functions. If I filter out diffs in OSR functions then the diffs of this PR look like https://gist.github.com/jakobbotsch/6f5907231d7999edb7de2598df36fe05.

There are a large amount of zero sized diffs. Likely that's GC info changes. I need to look into that and ensure we aren't inflating the size of GC info unnecessarily. Could also be debug info changes.

One of the causes of diffs is this code in LSRA:

runtime/src/coreclr/jit/lsra.cpp

Lines 2429 to 2460 in f377332

    
           // This is the case when we have a conditional branch where one target has already 
        
           // been visited.  It would be best to use the same incoming regs as that block, 
        
           // so that we have less likelihood of having to move registers. 
        
           // For example, in determining the block to use for the starting register locations for 
        
           // "block" in the following example, we'd like to use the same predecessor for "block" 
        
           // as for "otherBlock", so that both successors of predBlock have the same locations, reducing 
        
           // the likelihood of needing a split block on a backedge: 
        
           // 
        
           //   otherPred 
        
           //       | 
        
           //   otherBlock <-+ 
        
           //     . . .      | 
        
           //                | 
        
           //   predBlock----+ 
        
           //       | 
        
           //     block 
        
           // 
        
           if (blockInfo[otherBlock->bbNum].hasEHBoundaryIn) 
        
           { 
        
               return nullptr; 
        
           } 
        
           else 
        
           { 
        
               for (BasicBlock* const otherPred : otherBlock->PredBlocks()) 
        
               { 
        
                   if (otherPred->bbNum == blockInfo[otherBlock->bbNum].predBBNum) 
        
                   { 
        
                       predBlock = otherPred; 
        
                       break; 
        
                   } 
        
               } 
        
           }

With the init block, we sometimes end up picking that init block as the pred of some loop exits. Previously we would just pick the loop pred as the pred.

Maybe an acceptable solution is to just remove the init block in the backend, if it is empty at that point.

AndyAyersMS · 2024-12-10T18:08:50Z

OSR diffs are likely less interesting, so I think its ok to focus on the others.

That stretch of LSRA logic always has seemed a bit fishy to me. Maybe it's just the inherent clunkiness of the linear world view. But seems like there might be better heuristics, esp with PGO.

jakobbotsch · 2024-12-11T14:45:17Z

The zero diffs had to do with reusing the first non-internal BB as the init BB; that would mean insertion of JIT code (like calls to CORINFO_HELP_DBG_IS_JUST_MY_CODE) in a BB marked with an IL range, which would change some IL/var mappings reported for debug code.
I've added another requirement to the init BB: when generating debug-friendly code, it must be an internal block.

I think the codegen diffs are acceptable now to take as is. From spot checking the main causes of the diffs seem to be:

In OSR functions we frequently hit the case I described above, where LSRA picks another pred for some loop exits. I looked a bit further at this and the code seems well-intentioned and is meant to reduce number of split backedges, so even with larger code size this might actually be beneficial
Loop inversion and early jump-threading now kicks in more often (again, larger code size here is not necessarily bad)
We sometimes see different block layouts

Here are diffs with loop inversion disabled, and with diffs in OSR functions filtered out.

I'm going to kick off some stress legs.

jakobbotsch · 2024-12-11T14:45:52Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress

azure-pipelines · 2024-12-11T14:46:15Z

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch · 2024-12-11T17:04:06Z

Looks like tests aren't completely happy:

Assert failure(PID 5932 [0x0000172c], Thread: 6412 [0x190c]): Assertion failed '!block->HasFlag(BBF_PATCHPOINT)' in 'System.AppContext:Setup(ulong,ulong,int)' during 'Expand patchpoints' (IL size 86; hash 0x16ec513f; Instrumented Tier0)

AndyAyersMS · 2024-12-11T22:05:20Z

src/coreclr/jit/patchpoint.cpp

-        {
-            compiler->fgEnsureFirstBBisScratch();
-        }
-


You should keep this or its moral equivalent, Tier0 instr needs to insert some execute once code at method entry

I would've thought the init BB would be sufficient. It should have been created before this.

Perhaps the issue is that the bbNum "is backedge" approximation is placing patchpoints in the init BB. Seems like we can teach the logic to never place a patchpoint in the init BB (or, more ideally, switch to a DFS tree based approach – but that's maybe too expensive for tier0).

The importer places the patchpoint in the init BB because of stress.

What is the point of the assert? Is it simply used as an attempt to verify "is not part of a cycle"? Seems like the assert should just be removed if so, given the stress mode.

I ended up removing the assert and also changing code below to make sure to insert the initialization of the patchpoint at the beginning of the init BB.

jakobbotsch · 2024-12-12T12:55:26Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr pgo, runtime-coreclr pgostress

azure-pipelines · 2024-12-12T12:55:56Z

Azure Pipelines successfully started running 4 pipeline(s).

jakobbotsch · 2024-12-12T14:51:07Z

/azp run runtime-coreclr superpmi-diffs

azure-pipelines · 2024-12-12T14:51:27Z

Azure Pipelines successfully started running 1 pipeline(s).

jakobbotsch · 2024-12-12T15:24:12Z

/azp run runtime-coreclr superpmi-replay

azure-pipelines · 2024-12-12T15:24:32Z

Azure Pipelines successfully started running 1 pipeline(s).

jakobbotsch · 2024-12-12T17:57:17Z

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Diffs. See above for some notes on these. Stress failures look unrelated.

jakobbotsch · 2024-12-12T18:01:21Z

src/coreclr/jit/fgprofile.cpp

-        {
-            firstILBlock = firstILBlock->Next();
-        }
+        firstILBlock = fgGetFirstILBlock();


I want to submit a change to compute fgCalledCount earlier before I merge this PR. fgGetFirstILBlock added by this PR will follow BBJ_ALWAYS to find the first IL block, but it seems dangerous to have a requirement that we can find it that way this late (after morph).

I decided to revert this back to what it was to get this PR in quicker. Can look at cleaning this up separately.

AndyAyersMS

Thanks for doing this, it is something I've thought about doing for a while but never got around to it.

AndyAyersMS · 2024-12-12T20:59:25Z

src/coreclr/jit/flowgraph.cpp

    // Create a block for the start of the try region, where the monitor enter call
    // will go.
-    BasicBlock* const tryBegBB  = fgSplitBlockAtEnd(fgFirstBB);
+    BasicBlock* const tryBegBB  = fgSplitBlockAtBeginning(fgFirstBB);


If we split at the beginning, we may put already jit-inserted IR into the try. I suppose that's ok as none of it should ever cause an exception.

Yeah, I agree it looks a bit odd, but I think it's preferable over a solution that tries to differentiate between IR created because of the user IL and IR created due to the JIT. I really want us to be able to freely move IR and merge/duplicate blocks without breaking subtle invariants about what kind of IR can go where.

So far it seems like most of these cases come down to careful consideration of phase ordering. For example, this try-finally insertion is very much like IR created because of IL, so it would make sense to do it in (or close to) import. Another example is reverse pinvoke enter/exit – most of the JIT will naturally assume inserted IR is executed in cooperative mode, so this one makes sense to insert as the last thing in the JIT. Currently it's inserted quite early.

src/coreclr/jit/compiler.h

Co-authored-by: Austin Wise <AustinWise@gmail.com>

This adds an invariant that there always exists an "init BB" throughout the JIT's phases. The init BB has the following properties: - It is only executed once, so it does not have any predecessors - It is not inside a try region, hence it dominates all other blocks in the function There are no further requirements on the BB. The init BB does not have to be `BBJ_ALWAYS` (unlike the old "scratch BB" concept). This is mainly because it philosophically does not make sense to insert IR at the end of the init BB, since earlier phases can have inserted arbitrary IR in them.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 4, 2024

dotnet-policy-service bot assigned jakobbotsch Dec 4, 2024

Fix build

7109a14

jakobbotsch added 2 commits December 5, 2024 19:16

Minor cleanups

cff80d1

Avoid picking first BB as pred if it is empty

78da43f

jakobbotsch added 3 commits December 11, 2024 15:04

Ensure init BB is internal for debug codegen

6896e2d

Merge branch 'main' of github.com:dotnet/runtime into init-bb

be25acc

Revert LSRA change

8eefb31

AndyAyersMS reviewed Dec 11, 2024

View reviewed changes

jakobbotsch added 3 commits December 12, 2024 13:03

Remove assert, place init IR at the beginning

6541270

Nit

4d2f9aa

Expand fgDebugCheckInitBB

b1af59f

jakobbotsch mentioned this pull request Dec 12, 2024

JIT: Repair profile after tail-merging #110656

Open

jakobbotsch marked this pull request as ready for review December 12, 2024 17:56

jakobbotsch requested a review from AndyAyersMS December 12, 2024 17:57

jakobbotsch commented Dec 12, 2024

View reviewed changes

AndyAyersMS approved these changes Dec 12, 2024

View reviewed changes

AustinWise reviewed Dec 12, 2024

View reviewed changes

src/coreclr/jit/compiler.h Outdated Show resolved Hide resolved

jakobbotsch and others added 2 commits December 13, 2024 01:18

Update src/coreclr/jit/compiler.h

b2c65ba

Co-authored-by: Austin Wise <AustinWise@gmail.com>

Revert fgComputeCalledCount first IL block code

d538236

This was referenced Dec 13, 2024

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

jakobbotsch merged commit 5a9362f into dotnet:main Dec 13, 2024
103 of 108 checks passed

jakobbotsch deleted the init-bb branch December 13, 2024 13:28

LoopedBard3 mentioned this pull request Dec 17, 2024

[Perf] Linux/x64: System.Numerics.Tests.Perf_Matrix3x2 Regression on 12/13/2024 1:28:01 PM #110789

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

JIT: Add an "init BB" invariant #110404

JIT: Add an "init BB" invariant #110404

jakobbotsch commented Dec 4, 2024 •

edited

Loading

AndyAyersMS commented Dec 10, 2024

jakobbotsch commented Dec 10, 2024 •

edited

Loading

AndyAyersMS commented Dec 10, 2024

jakobbotsch commented Dec 11, 2024

jakobbotsch commented Dec 11, 2024

azure-pipelines bot commented Dec 11, 2024

jakobbotsch commented Dec 11, 2024

AndyAyersMS Dec 11, 2024

jakobbotsch Dec 12, 2024

jakobbotsch Dec 12, 2024

jakobbotsch Dec 12, 2024

jakobbotsch Dec 12, 2024

jakobbotsch commented Dec 12, 2024

azure-pipelines bot commented Dec 12, 2024

jakobbotsch commented Dec 12, 2024

azure-pipelines bot commented Dec 12, 2024

jakobbotsch commented Dec 12, 2024

azure-pipelines bot commented Dec 12, 2024

jakobbotsch commented Dec 12, 2024

jakobbotsch Dec 12, 2024 •

edited

Loading

jakobbotsch Dec 13, 2024

AndyAyersMS left a comment

AndyAyersMS Dec 12, 2024

jakobbotsch Dec 13, 2024 •

edited

Loading

JIT: Add an "init BB" invariant #110404

JIT: Add an "init BB" invariant #110404

Conversation

jakobbotsch commented Dec 4, 2024 • edited Loading

AndyAyersMS commented Dec 10, 2024

jakobbotsch commented Dec 10, 2024 • edited Loading

AndyAyersMS commented Dec 10, 2024

jakobbotsch commented Dec 11, 2024

jakobbotsch commented Dec 11, 2024

azure-pipelines bot commented Dec 11, 2024

jakobbotsch commented Dec 11, 2024

AndyAyersMS Dec 11, 2024

Choose a reason for hiding this comment

jakobbotsch Dec 12, 2024

Choose a reason for hiding this comment

jakobbotsch Dec 12, 2024

Choose a reason for hiding this comment

jakobbotsch Dec 12, 2024

Choose a reason for hiding this comment

jakobbotsch Dec 12, 2024

Choose a reason for hiding this comment

jakobbotsch commented Dec 12, 2024

azure-pipelines bot commented Dec 12, 2024

jakobbotsch commented Dec 12, 2024

azure-pipelines bot commented Dec 12, 2024

jakobbotsch commented Dec 12, 2024

azure-pipelines bot commented Dec 12, 2024

jakobbotsch commented Dec 12, 2024

jakobbotsch Dec 12, 2024 • edited Loading

Choose a reason for hiding this comment

jakobbotsch Dec 13, 2024

Choose a reason for hiding this comment

AndyAyersMS left a comment

Choose a reason for hiding this comment

AndyAyersMS Dec 12, 2024

Choose a reason for hiding this comment

jakobbotsch Dec 13, 2024 • edited Loading

Choose a reason for hiding this comment

jakobbotsch commented Dec 4, 2024 •

edited

Loading

jakobbotsch commented Dec 10, 2024 •

edited

Loading

jakobbotsch Dec 12, 2024 •

edited

Loading

jakobbotsch Dec 13, 2024 •

edited

Loading