[JIT] Enable EGPRs in JIT by adding REX2 encoding to the backend. #106557

Merged
merged 41 commits into from
Dec 17, 2024
41 commits
1820567
Ruihan: POC with REX2
Ruihan-Yin Mar 25, 2024
d1afc68
resolve comments
Ruihan-Yin May 17, 2024
2335aa3
refactor register encoding for REX2
Ruihan-Yin May 20, 2024
6578c58
merge REX2 path to legacy path
Ruihan-Yin May 21, 2024
01eeb80
Enable REX2 in more instructions.
Ruihan-Yin May 30, 2024
690aee3
Avoid repeatedly estimate the size of REX2 prefix
Ruihan-Yin Jun 3, 2024
31d7fb4
Enable REX2 encoding on RI and SV path
Ruihan-Yin Jun 5, 2024
a995878
Add rex2 support to rotate and shift.
Ruihan-Yin Jun 6, 2024
74aacf6
CR session.
Ruihan-Yin Jun 7, 2024
c330927
Testing infra updates: assert REX2 is enabled.
Ruihan-Yin Jun 11, 2024
fbf20d1
revert rcl_N and rcr_N, tp and latency data for these instructions is…
Ruihan-Yin Jun 11, 2024
ea02e70
partially enable REX2 on emitOutputAM, case covered: R_AR and AR_R.
Ruihan-Yin Jun 12, 2024
c74b801
Adding unit tests.
Ruihan-Yin Jun 13, 2024
34980b4
push, pop, inc, dec, neg, not, xadd, shld, shrd, cmpxchg, setcc, bswap.
Ruihan-Yin Jun 26, 2024
2ffdbeb
bug fix for bswap
Ruihan-Yin Jun 27, 2024
3a729bb
bt
Ruihan-Yin Jun 28, 2024
d943b03
xchg, idiv
Ruihan-Yin Jul 1, 2024
c8fee9c
Make sure add REX2 prefix if register encoding for EGPRs are being ca…
Ruihan-Yin Jul 2, 2024
6ec0e97
Ensure code size is correctly computed in R_R_I path.
Ruihan-Yin Jul 8, 2024
1d01003
clean up
Ruihan-Yin Jul 9, 2024
1acc219
Change all AddSimdPrefix to AddX86Prefix
Ruihan-Yin Jul 15, 2024
87ad443
div, mulEAX
Ruihan-Yin Jul 16, 2024
bb9905a
filter out test from REX2 encoding when using ACC form.
Ruihan-Yin Jul 19, 2024
86083b2
Make sure REX prefix will not be added when emitting with REX2.
Ruihan-Yin Jul 24, 2024
dfe8760
resolve comments.
Ruihan-Yin Aug 5, 2024
64761cd
make sure the APX debug knob is only available under debug build.
Ruihan-Yin Oct 24, 2024
f1aba62
clean up some out-dated code.
Ruihan-Yin Nov 12, 2024
f5cc5a8
enable movsxd
Ruihan-Yin Nov 12, 2024
7ca8433
Enable "Call"
Ruihan-Yin Nov 13, 2024
bc4d225
Enable "JMP"
Ruihan-Yin Nov 15, 2024
deb3814
resolve merge errors
Ruihan-Yin Nov 18, 2024
0d63230
formatting
Ruihan-Yin Nov 18, 2024
13b8076
remote coredistools.dll for internal tests only
Ruihan-Yin Nov 18, 2024
42c6cfc
bug fix
Ruihan-Yin Nov 19, 2024
2e2eb01
resolve comments
Ruihan-Yin Nov 20, 2024
3d298b7
add more emitter tests.
Ruihan-Yin Nov 22, 2024
25a54d3
resolve comments.
Ruihan-Yin Dec 2, 2024
791b505
clean up some comments and tweak the REX2 stress logic
Ruihan-Yin Dec 4, 2024
094e76b
clean up
Ruihan-Yin Dec 4, 2024
6502ae1
formatting.
Ruihan-Yin Dec 4, 2024
5d3cca2
resolve comments.
Ruihan-Yin Dec 16, 2024
1 change: 1 addition & 0 deletions src/coreclr/jit/codegen.h
@@ -647,6 +647,7 @@ class CodeGen final : public CodeGenInterface

#if defined(TARGET_AMD64)
void genAmd64EmitterUnitTestsSse2();
void genAmd64EmitterUnitTestsApx();
Member

Not related to this PR really, but it'd be nice if we had similar tests for other ISAs/encodings (VEX, EVEX, etc). Sse2 itself is, afair, really just SimdLegacyEncoding.

#endif

#endif // defined(DEBUG)
4 changes: 4 additions & 0 deletions src/coreclr/jit/codegenlinear.cpp
@@ -2702,6 +2702,10 @@ void CodeGen::genEmitterUnitTests()
{
genAmd64EmitterUnitTestsSse2();
}
if (unitTestSectionAll || (strstr(unitTestSection, "apx") != nullptr))
{
genAmd64EmitterUnitTestsApx();
}

#elif defined(TARGET_ARM64)
if (unitTestSectionAll || (strstr(unitTestSection, "general") != nullptr))
219 changes: 219 additions & 0 deletions src/coreclr/jit/codegenxarch.cpp
@@ -9053,6 +9053,225 @@ void CodeGen::genAmd64EmitterUnitTestsSse2()
GetEmitter()->emitIns_R_R_R(INS_cvtsd2ss, EA_8BYTE, REG_XMM0, REG_XMM1, REG_XMM2);
}

/*****************************************************************************
* Unit tests for the APX instructions.
*/

void CodeGen::genAmd64EmitterUnitTestsApx()
{
emitter* theEmitter = GetEmitter();

genDefineTempLabel(genCreateTempLabel());

// This test suite needs REX2 enabled.
if (!theEmitter->UseRex2Encoding() && !theEmitter->emitComp->DoJitStressRex2Encoding())
{
return;
}

theEmitter->emitIns_R_R(INS_add, EA_1BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_2BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_add, EA_8BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_or, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_adc, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_sbb, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_and, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_sub, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_xor, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_cmp, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_test, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_bsf, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_bsr, EA_4BYTE, REG_EAX, REG_ECX);

theEmitter->emitIns_R_R(INS_cmovo, EA_4BYTE, REG_EAX, REG_ECX);

theEmitter->emitIns_Mov(INS_mov, EA_4BYTE, REG_EAX, REG_ECX, false);
theEmitter->emitIns_Mov(INS_movsx, EA_2BYTE, REG_EAX, REG_ECX, false);
theEmitter->emitIns_Mov(INS_movzx, EA_2BYTE, REG_EAX, REG_ECX, false);

theEmitter->emitIns_R_R(INS_popcnt, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_lzcnt, EA_4BYTE, REG_EAX, REG_ECX);
theEmitter->emitIns_R_R(INS_tzcnt, EA_4BYTE, REG_EAX, REG_ECX);

theEmitter->emitIns_R_I(INS_add, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_add, EA_2BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_or, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_adc, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_sbb, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_and, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_sub, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_xor, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_cmp, EA_4BYTE, REG_EAX, 0x05);
theEmitter->emitIns_R_I(INS_test, EA_4BYTE, REG_EAX, 0x05);

theEmitter->emitIns_R_I(INS_mov, EA_4BYTE, REG_EAX, 0xE0);

// The JIT tends to compress an imm64 to an imm32 when the upper half is all zeros, so use a value that forces the imm64 path.
theEmitter->emitIns_R_I(INS_mov, EA_8BYTE, REG_RAX, 0xFFFF000000000000);

// shf reg, cl
theEmitter->emitIns_R(INS_rol, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_ror, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_rcl, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_rcr, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_shl, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_shr, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_sar, EA_4BYTE, REG_EAX);

// shf reg, 1
theEmitter->emitIns_R(INS_rol_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_ror_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_rcl_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_rcr_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_shl_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_shr_1, EA_4BYTE, REG_EAX);
theEmitter->emitIns_R(INS_sar_1, EA_4BYTE, REG_EAX);

// shf reg, imm8
theEmitter->emitIns_R_I(INS_shl_N, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_shr_N, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_sar_N, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_rol_N, EA_4BYTE, REG_ECX, 0x05);
theEmitter->emitIns_R_I(INS_ror_N, EA_4BYTE, REG_ECX, 0x05);
// TODO-xarch-apx: these two are not enabled for now.
// theEmitter->emitIns_R_I(INS_rcl_N, EA_4BYTE, REG_ECX, 0x05);
// theEmitter->emitIns_R_I(INS_rcr_N, EA_4BYTE, REG_ECX, 0x05);
Comment on lines +9137 to +9139
Member

What's the reason for these ones being skipped? Can we open tracking issues and list the issue number as part of the comment?

Contributor Author

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/emitxarch.cpp#L18695

It seems that the latency/throughput information is missing for rcl_N/rcr_N, so I assumed these two instructions are not used in the JIT. I can add that information if it is needed.

Member

It's not required in this PR, but it would be good to ensure it's all handled or tracked long term. I imagine this is representative of a potentially missing optimization.

Contributor Author

Thanks, I can submit an issue accordingly.


theEmitter->emitIns_R(INS_neg, EA_2BYTE, REG_EAX);
theEmitter->emitIns_R(INS_not, EA_2BYTE, REG_EAX);

theEmitter->emitIns_R_AR(INS_lea, EA_4BYTE, REG_ECX, REG_EAX, 4);

theEmitter->emitIns_R_AR(INS_mov, EA_1BYTE, REG_ECX, REG_EAX, 4);
theEmitter->emitIns_R_AR(INS_mov, EA_2BYTE, REG_ECX, REG_EAX, 4);
theEmitter->emitIns_R_AR(INS_mov, EA_4BYTE, REG_ECX, REG_EAX, 4);
theEmitter->emitIns_R_AR(INS_mov, EA_8BYTE, REG_ECX, REG_EAX, 4);

theEmitter->emitIns_R_AR(INS_add, EA_1BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_add, EA_2BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_add, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_add, EA_8BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_or, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_adc, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_sbb, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_and, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_sub, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_xor, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_cmp, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_test, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_bsf, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_bsr, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_popcnt, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_lzcnt, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_tzcnt, EA_4BYTE, REG_EAX, REG_ECX, 4);

theEmitter->emitIns_AR_R(INS_add, EA_1BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_add, EA_2BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_add, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_add, EA_8BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_or, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_adc, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_sbb, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_and, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_sub, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_xor, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_cmp, EA_4BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_AR_R(INS_test, EA_4BYTE, REG_EAX, REG_ECX, 4);

theEmitter->emitIns_R_AR(INS_movsx, EA_2BYTE, REG_ECX, REG_EAX, 4);
theEmitter->emitIns_R_AR(INS_movzx, EA_2BYTE, REG_EAX, REG_ECX, 4);
theEmitter->emitIns_R_AR(INS_cmovo, EA_4BYTE, REG_EAX, REG_ECX, 4);

theEmitter->emitIns_AR_R(INS_xadd, EA_4BYTE, REG_EAX, REG_EDX, 2);

theEmitter->emitIns_R_R_I(INS_shld, EA_4BYTE, REG_EAX, REG_ECX, 5);
theEmitter->emitIns_R_R_I(INS_shrd, EA_2BYTE, REG_EAX, REG_ECX, 5);
// TODO-XArch-apx: the S_R_I path only accepts SSE or VEX instructions,
// so I assume shld/shrd will not take the first operand from the stack.
// theEmitter->emitIns_S_R_I(INS_shld, EA_2BYTE, 1, 2, REG_EAX, 5);
// theEmitter->emitIns_S_R_I(INS_shrd, EA_2BYTE, 1, 2, REG_EAX, 5);

theEmitter->emitIns_AR_R(INS_cmpxchg, EA_2BYTE, REG_EAX, REG_EDX, 2);

theEmitter->emitIns_R(INS_seto, EA_1BYTE, REG_EDX);

theEmitter->emitIns_R(INS_bswap, EA_8BYTE, REG_EDX);

// INS_bt only has reg-to-reg form.
theEmitter->emitIns_R_R(INS_bt, EA_2BYTE, REG_EAX, REG_EDX);

theEmitter->emitIns_R(INS_idiv, EA_8BYTE, REG_EDX);

theEmitter->emitIns_R_R(INS_xchg, EA_8BYTE, REG_EAX, REG_EDX);

theEmitter->emitIns_R(INS_div, EA_8BYTE, REG_EDX);
theEmitter->emitIns_R(INS_mulEAX, EA_8BYTE, REG_EDX);

GenTreePhysReg physReg(REG_EDX);
physReg.SetRegNum(REG_EDX);
GenTreeIndir load = indirForm(TYP_INT, &physReg);

theEmitter->emitIns_R_A(INS_add, EA_1BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_add, EA_2BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_add, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_add, EA_8BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_or, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_adc, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_sbb, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_and, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_sub, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_xor, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_cmp, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_test, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_bsf, EA_4BYTE, REG_EAX, &load);
theEmitter->emitIns_R_A(INS_bsr, EA_4BYTE, REG_EAX, &load);

// Note:
// All the tests below depend on the runtime state of the stack frame these unit tests are attached to.
// They might fail because a stack value is unavailable or mismatched. Since these tests mainly check
// encoding correctness, such failures may be considered harmless.

theEmitter->emitIns_R_S(INS_add, EA_1BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_add, EA_2BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_add, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_add, EA_8BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_or, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_adc, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_sbb, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_and, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_sub, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_xor, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_cmp, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_test, EA_4BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_S_R(INS_xadd, EA_2BYTE, REG_EAX, 0, 0);

theEmitter->emitIns_S_I(INS_shl_N, EA_4BYTE, 0, 0, 4);
theEmitter->emitIns_S(INS_shl_1, EA_4BYTE, 0, 4);

theEmitter->emitIns_R_S(INS_movsx, EA_2BYTE, REG_ECX, 0, 0);
theEmitter->emitIns_R_S(INS_movzx, EA_2BYTE, REG_EAX, 0, 0);
theEmitter->emitIns_R_S(INS_cmovo, EA_4BYTE, REG_EAX, 0, 0);

theEmitter->emitIns_R(INS_pop, EA_PTRSIZE, REG_EAX);
theEmitter->emitIns_R(INS_push, EA_PTRSIZE, REG_EAX);
theEmitter->emitIns_R(INS_pop_hide, EA_PTRSIZE, REG_EAX);
theEmitter->emitIns_R(INS_push_hide, EA_PTRSIZE, REG_EAX);

theEmitter->emitIns_S(INS_pop, EA_PTRSIZE, 0, 0);
theEmitter->emitIns_I(INS_push, EA_PTRSIZE, 50);

theEmitter->emitIns_R(INS_inc, EA_4BYTE, REG_EAX);
theEmitter->emitIns_AR(INS_inc, EA_2BYTE, REG_EAX, 2);
theEmitter->emitIns_S(INS_inc, EA_2BYTE, 0, 0);
theEmitter->emitIns_R(INS_dec, EA_4BYTE, REG_EAX);
theEmitter->emitIns_AR(INS_dec, EA_2BYTE, REG_EAX, 2);
theEmitter->emitIns_S(INS_dec, EA_2BYTE, 0, 0);

theEmitter->emitIns_S(INS_neg, EA_2BYTE, 0, 0);
theEmitter->emitIns_S(INS_not, EA_2BYTE, 0, 0);
}

#endif // defined(DEBUG) && defined(TARGET_AMD64)

#ifdef PROFILING_SUPPORTED
5 changes: 4 additions & 1 deletion src/coreclr/jit/compiler.cpp
@@ -2295,7 +2295,10 @@ void Compiler::compSetProcessor()
if (canUseEvexEncoding())
{
codeGen->GetEmitter()->SetUseEvexEncoding(true);
// TODO-XArch-AVX512 : Revisit other flags to be set once avx512 instructions are added.
}
if (canUseApxEncoding())
{
codeGen->GetEmitter()->SetUseRex2Encoding(true);
}
}
#endif // TARGET_XARCH
50 changes: 48 additions & 2 deletions src/coreclr/jit/compiler.h
@@ -9945,6 +9945,17 @@ class Compiler
return (compOpportunisticallyDependsOn(InstructionSet_EVEX));
}

//------------------------------------------------------------------------
// canUseApxEncoding - Answer the question: Is APX (REX2) encoding supported on this target.
//
// Returns:
// `true` if Rex2 encoding is supported, `false` if not.
//
bool canUseApxEncoding() const
{
return compOpportunisticallyDependsOn(InstructionSet_APX);
}

private:
//------------------------------------------------------------------------
// DoJitStressEvexEncoding- Answer the question: Do we force EVEX encoding.
@@ -9959,7 +9970,7 @@ class Compiler
// otherwise use VEX encoding but can be EVEX encoded to use EVEX encoding
// This requires AVX512F, AVX512BW, AVX512CD, AVX512DQ, and AVX512VL support

- if (JitConfig.JitStressEvexEncoding() && IsBaselineVector512IsaSupportedOpportunistically())
+ if (JitStressEvexEncoding() && IsBaselineVector512IsaSupportedOpportunistically())
{
assert(compIsaSupportedDebugOnly(InstructionSet_AVX512F));
assert(compIsaSupportedDebugOnly(InstructionSet_AVX512F_VL));
@@ -9972,14 +9983,49 @@ class Compiler

return true;
}
- else if (JitConfig.JitStressEvexEncoding() && compOpportunisticallyDependsOn(InstructionSet_AVX10v1))
+ else if (JitStressEvexEncoding() && compOpportunisticallyDependsOn(InstructionSet_AVX10v1))
{
return true;
}
#endif // DEBUG

return false;
}

//------------------------------------------------------------------------
// DoJitStressRex2Encoding- Answer the question: Do we force REX2 encoding.
//
// Returns:
// `true` if user requests REX2 encoding.
//
bool DoJitStressRex2Encoding() const
{
#ifdef DEBUG
if (JitConfig.JitStressRex2Encoding() && compOpportunisticallyDependsOn(InstructionSet_APX))
{
// We should make sure EVEX is also stressed when REX2 is stressed, because we need to guarantee that EGPR
// functionality is properly turned on for every instruction when REX2 is stressed.
return true;
}
#endif // DEBUG

return false;
}

//------------------------------------------------------------------------
// JitStressEvexEncoding- Answer the question: Is the EVEX stress knob set?
//
// Returns:
// `true` if user requests EVEX encoding, either directly or through the REX2 stress knob.
//
bool JitStressEvexEncoding() const
{
#ifdef DEBUG
return JitConfig.JitStressEvexEncoding() || JitConfig.JitStressRex2Encoding();
#endif // DEBUG

return false;
}
#endif // TARGET_XARCH

/*
1 change: 1 addition & 0 deletions src/coreclr/jit/emit.h
@@ -470,6 +470,7 @@ class emitter
#ifdef TARGET_XARCH
SetUseVEXEncoding(false);
SetUseEvexEncoding(false);
SetUseRex2Encoding(false);
#endif // TARGET_XARCH

emitDataSecCur = nullptr;