-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[JIT] Enable EGPRs in JIT by adding REX2 encoding to the backend. #106557
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
@dotnet/avx512-contrib can we reopen this as a PR ready to review? |
@anthonycanino I re-opened it (it wasn't clear to me if your question implied you did not have permission to do so). Either you or @Ruihan-Yin need to update to latest main and resolve the conflicts, then mark it ready-for-review. |
Hi @tannergooding, thanks for the reviews in #104637, it seems like the CPUID changes are just pending merge and there should be no major changes expected, so while waiting, I wonder if we can start the conversion on this PR? |
@Ruihan-Yin, just got #104637 merged. If we could get this PR updated so it contains just the new changes, that should make it a lot simpler to review and get in. |
Thanks! I will work on it soon. |
Hi Bruce. Looks like I had bumped the Do you have plans to bump the LLVM version for .NET 10? It looks like that LLVM 19 will cover APX, but LLVM 20 will be required for AVX10.2 after further discussions internally (which should be released early 2025 I believe). |
#109939 (do note that only covers official builds, distro builds will probably still use older Clang/GCC) |
866253d
to
42c6cfc
Compare
Yes. Also, I expect to re-tool the cordistools build process to make it easier than it is currently to update the dependent LLVM version. @MichalPetryka thanks for that link. That might make it easier to build cordistools with our AzureLinux containers (currently, the libc in Ubuntu 16.04 I think is limiting our ability to build new LLVM). However, note that coredistools version of LLVM and .NET 10 version of LLVM are not currently related, and hopefully can still be chosen independently, in the future. |
Hi @tannergooding, I tried to refactor the stress mode for REX2, can you please check if it is as expected? Plus, there are a few questions regarding to the comments, would appreciate it if you can take a look. Thanks! |
@tannergooding ping on @Ruihan-Yin 's latest questions |
src/coreclr/jit/compiler.cpp
Outdated
@@ -2297,6 +2297,10 @@ void Compiler::compSetProcessor() | |||
codeGen->GetEmitter()->SetUseEvexEncoding(true); | |||
// TODO-XArch-AVX512 : Revisit other flags to be set once avx512 instructions are added. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to this PR, but this comment is likely irrelevant now.
src/coreclr/jit/emitxarch.cpp
Outdated
// TODO-apx: | ||
// there are duplicated stress logics here and in HasExtendedGPReg() | ||
// need to clean up later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment looks outdated now
src/coreclr/jit/emitxarch.cpp
Outdated
// TODO-apx: It would be better to have stress mode on LSRA to forcely allocate EGPRs, | ||
// instead of stressing here. | ||
#if defined(DEBUG) | ||
if (emitComp->DoJitStressRex2Encoding()) | ||
{ | ||
return true; | ||
} | ||
#endif // DEBUG |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make sure a tracking issue exists for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm not quite sure why this check is needed. We don't have/need such a path for EVEX, we simply emit the EVEX encoding always under stress mode.
We then use the JitStressReg
knobs to preference different register sets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I agree that this part is now not needed. I will remove it.
src/coreclr/jit/emitxarch.cpp
Outdated
} | ||
#endif // DEBUG | ||
|
||
if (UseRex2Encoding()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this check actually necessary?
IsExtendedReg
freely returns true
for XMM16-XMM31
without a corresponding EVEX check. So I would have expected this can simply be (reg >= REG_R16) && (reg <= REG_R31)
-- with a temporary path doing || (DoJitStressRex2Encoding() && (reg >= REG_RAX) && (reg <= REG_R15))
until LSRA is properly updated, but as per the above I'm not sure that's really needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsExtendedGPReg
is only used in HasExtendedGPReg
, which is then used in TakesRex2Prefix
, I think it should be fine to leave a simple check like (reg >= REG_R16) && (reg <= REG_R31)
there. If we stress REX2 encoding, TakesRex2Prefix
will return true by DoJitStressRex2Encoding
check and never reach the register check.
For now, since the EGPR definition is still missing, I will simply return false there, and come back when EGPRs are defined. I will document this with an issue. (I'd expect this could be addressed within the PR for register allocator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes generally look good/correct to me now for a foundational PR.
I think we need at least one tracking issue covering the TODOs in here and ensuring they're cleaned up as the other work gets completed.
This needs an additional review from @dotnet/jit-contrib
Thanks for the review! TODOs are now tracked in #110414. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few somewhat minor suggestions and comments.
src/coreclr/jit/codegenxarch.cpp
Outdated
genDefineTempLabel(genCreateTempLabel()); | ||
|
||
// This test suite needs REX2 enabled. | ||
assert(theEmitter->UseRex2Encoding() || theEmitter->emitComp->DoJitStressRex2Encoding()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that these are all "normal" non-APX / non-REX2 instructions, that only test REX2 if REX2 stress is enabled? So to use this, you need to set DOTNET_JitStressRex2Encoding=1
and DOTNET_JitEmitUnitTestsSections=apx
?
Note that these asserts will cause DOTNET_JitEmitUnitTestsSections=all
(without DOTNET_JitStressRex2Encoding=1
set) to assert, which is undesirable. Instead of assert, maybe just return if REX2 is not being stressed? Or maybe it's possible that if you get here, to temporarily "turn on" REX2 stress/encoding for the duration of this function?
Presumably there will be additional cases, with EGPR high registers, where the REX2 encodings will be required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we temporarily turn on REX2 stress only in this function, then only the code generator will do its work using REX2 encoding, after we exit the function and turn off REX2 stress, the corresponding emitter will emit code under REX2 off.
I can make it an early return if REX2 is not stressed.
And yes, currently the emitter unit test can only perform encoding on GPR0~15, while we are still waiting for the EGPR definition, that will come with the PR for LSRA. I can add a TODO item for more unit tests with EGPRs to #110414 if needed.
src/coreclr/jit/codegenxarch.cpp
Outdated
// it might fail due to stack value unavailable/mismatch, since these tests are mainly for | ||
// encoding correctness check, this kind of failures may be considered as not harmful. | ||
|
||
GenTree* stkNum = theEmitter->emitComp->stackState.esStack[0].val; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the relationship to the stackState
variable.
Couldn't you:
GenTreePhysReg physReg(REG_EDX);
GenTreeIndir load(TYP_INT, &physReg);
to create [EDX]
addressing mode?
src/coreclr/jit/compiler.h
Outdated
{ | ||
// we should make sure EVEX is also stressed when REX2 is stressed, as we will need to guarantee EGPR | ||
// functionality is properly turned on for every instructions when REX2 is stress. | ||
assert(JitConfig.JitStressEvexEncoding()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that if you set DOTNET_JitStressRex2Encoding=1
but don't set DOTNET_JitStressEvexEncoding=1
you will get an assert. That's annoying. Is it really necessary? That is, can you stress REX2 without also stressing EVEX?
If it's required to also stress EVEX, I suggest creating a small helper function:
bool JitStressEvexEncoding() const
{
return JitConfig.JitStressEvexEncoding() || JitConfig.JitStressRex2Encoding();
}
and replace all occurrences of calls to JitConfig.JitStressEvexEncoding()
to calls to this new helper function. Then you can remove this assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had more discussion on this here: #106557 (comment)
I will replace the EVEX encoding check with the suggested helper, thanks for pointing out.
src/coreclr/jit/emitxarch.cpp
Outdated
// 2-byte | ||
return true; | ||
} | ||
if ((code & 0xFF0000) == 0x0F0000) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be:
if ((code & 0xFFFF0000) == 0x000F0000)
(that is, check that high byte 4 is zero)? If we don't check this, couldn't a 4-byte code with 0F second byte be caught?
src/coreclr/jit/emitxarch.cpp
Outdated
|
||
if ((code & 0xFF000000) == 0x0F000000) | ||
{ | ||
// 4-byte, need to check if PP is prefixs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
// 4-byte, need to check if PP is prefixs | |
// 4-byte, need to check if PP is a prefix |
src/coreclr/jit/emitxarch.cpp
Outdated
// TODO-xarch-apx: | ||
// At this stage, we are only using REX2 in the case that non-simd integer instructions | ||
// with EGPRs being used in its operands, it could be either direct register uses, or | ||
// memory addresssig operands, i.e. index and base. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo
// memory addresssig operands, i.e. index and base. | |
// memory addressing operands, i.e. index and base. |
src/coreclr/jit/emitxarch.cpp
Outdated
return false; | ||
} | ||
|
||
#if defined(DEBUG) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't really matter, but it seems like the stress check should be last, just before the final return false;
src/coreclr/jit/emitxarch.cpp
Outdated
@@ -1657,6 +1785,36 @@ bool emitter::HasHighSIMDReg(const instrDesc* id) const | |||
return false; | |||
} | |||
|
|||
//------------------------------------------------------------------------ | |||
// HasExtendedGPReg: Checks if an instruction uses a extended general purpose registers - EGPRs (r16-r31) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
// HasExtendedGPReg: Checks if an instruction uses a extended general purpose registers - EGPRs (r16-r31) | |
// HasExtendedGPReg: Checks if an instruction uses an extended general-purpose register - EGPR (r16-r31) |
src/coreclr/jit/emitxarch.cpp
Outdated
if ((code & 0xFF) == 0x0F) | ||
{ | ||
// some map-1 instructions have opcode in forms like: | ||
// XX0F, remove the leading 0x0F byte as it have been recoreded in REX2. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// XX0F, remove the leading 0x0F byte as it have been recoreded in REX2. | |
// XX0F, remove the leading 0x0F byte as it has been recorded in REX2. |
src/coreclr/jit/emitxarch.cpp
Outdated
@@ -3233,12 +3554,18 @@ inline unsigned emitter::insEncodeReg012(const instrDesc* id, regNumber reg, emi | |||
{ | |||
*code = AddRexBPrefix(id, *code); // REX.B | |||
} | |||
if (false /*reg >= REG_R16 && reg <= REG_R31*/) | |||
{ | |||
// seperate the encoding for REX2.B3/B4, REX2.B3 will be handled in `AddRexBPrefix`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// seperate the encoding for REX2.B3/B4, REX2.B3 will be handled in `AddRexBPrefix`. | |
// Separate the encoding for REX2.B3/B4, REX2.B3 will be handled in `AddRexBPrefix`. |
Thanks all for the reviews and suggestions! |
Overview
This PR is the follow-up PR after #104637, which added the initial CPUID and XSAVE updates for APX.
This PR adds REX2 encoding functionality for legacy instructions which enables the use of EGPR for
add
,sub
, etc. Note that this PR focuses on REX2 encoding only: a follow up PR will enable EGPR support via the register allocator.Specification
REX2 is a 2-byte prefix with a leading byte of
0xD5
, detailed format below:Similar to REX prefix, it provides the extended bits for the MODRM.REG field, REX2.R4/R3, and MODRM.R/M field, REX2.B4/B3, and the index register in SIB byte, REX2.X4/X3, those bits will act as the higher 5th/4th bits and combine with the field in MODRM and SIB byte as a 5-bit binary to access up to 32 registers.
REX2 prefix is generally available for legacy-map-0 and legacy-map-1 instructions, say 1-byte opcode or 2-byte opcode with escape byte 0x0F, with some exceptions.
Like VEX/EVEX, REX2 is considered as the last prefix before the main opcode, so it can not co-exist with REX/VEX/EVEX.
Design
The bulk of the changes occur in the backend emitter.
As there is no existing hardware that has APX support yet, we had some hacks to bypass the CPUID checks. In this PR,
DOTNET_JitStressRex2Encoding
will force all the eligible instructions to be encoded in REX2, regardless the presence of EGPRs in the operand. We had another switchDOTNET_JitBypassAPXCheck
, with which will only bypass the APX CPUID check but JIT will encode REX2 only if needed, this is more useful when the LSRA changes come.Testing
We followed a multi-step testing plan to verify the encoding correctness and the semantic correctness.
Testing results will be presented below.
1. Emitter unit tests
In
codgenxarch.cpp
, similar togenAmd64EmitterUnitTestsSse2
, we used theJitLateDisasm
feature to insert instructions to encode as unit tests for emitter, andLateDisasm
will invoke LLVM to disasm the code stream, this gave us the chance to cross validate the disassembly from JIT and LLVM. The output of this step is to verify the emit paths are generating "correct" code that would not trigger #UD or have wrong semantics.Note that we are using a custom
coredistools.dll
which uses a recent LLVM that supports APX decoding.2. SuperPMI
In this step, we would run the SuperPMI pipeline to get the asmdiffs with REX2 on and off, the inputs are all the MCH files. This step will give us the chance to check if there is any assertion failure or internal error within JIT and since the pipeline will invoke
coredistools.dll
as well, so we can verify the encoding correctness in a larger scope.To ensure the new changes will not hit the existing code path in terms of throughput, we ran tpdiff with base JIT to be the main branch where changes are based on, and diff JIT to be the one with all the REX2 changes.
3. JIT unit tests
The 2 steps mentioned above are mainly verifying the encoding correctness of the generated binary code. Then the last will examine the semantic correctness of the generated code, say since we are simply forcing all the compatible instructions to be encoded in REX2, so the original semantics should not change, so we expect exactly the same output with REX2 on/off.
We used the existing CoreCLR unit test set:
JIT
and run it in the Intel SDE emulator.Follow-up plans
This PR is only intended to provide the REX2 encoding functionality to the JIT backend, in terms of how to properly use it, we are preparing another PR that includes the updates on LSRA such that JIT will be able to allocate EGPRs only when needed, and generate optimal code.