[AutoBump] Merge with 1211d979 (Sep 11) (1) #415

jorickert · 2024-12-12T09:24:39Z

No description provided.

The call chain to `Mutex::lock` can be polluted by the stack protector. To be completely safe, postpone tearing down the main thread's TLS to a separate phase. Fixes llvm#108030.

HLSL 202x inherits from C++11, which generates additional loop hint information for loops that must progress. Since HLSL 202x is going to be the default for Clang, we want to make sure all our tests pass with it. Required for llvm#108044.

…ard (llvm#106861) Since llvm#87832, unnamed identifiers are excluded from being diagnosed. As a result, the tests that were supposed to check that deleted functions are correctly ignored were being ignored because of the unnamed identifiers instead of the deleted functions. This change simply introduces names for the parameters of the deleted functions.

…llvm#107943) GNU ld will error when encountering a pcrel_lo whose corresponding pcrel_hi is in a different section. [1] introduced a check that avoids this issue by disabling outlining in a few circumstances. However, we can also hit the same issue when outlining from functions with prefixes ("hot"/"unlikely"/"unknown" from profile information, for example), as the outlined function might not have the same prefix, possibly resulting in a "paired" pcrel_lo and pcrel_hi ending up in different sections. To prevent this issue, take a similar approach to [1] and additionally prevent outlining when we see a pcrel_lo and the function has a prefix. [1] llvm@96c85f8 Fixes llvm#107520

Convert BUILD_VECTOR nodes with FP16x8 to I16x8, since there's no FP16 scalar value to initialize v128.const.
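
As a rough illustration of the idea (not the WebAssembly backend's code), eight f16 lanes can be emitted as i16 lanes by writing each lane's IEEE half-precision bit pattern, little-endian, into the 16 bytes of a `v128.const`. The helper name `f16x8ToV128Bytes` is hypothetical, and the lanes are assumed to arrive already encoded as binary16 bit patterns.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: lays out eight f16 lanes, given as IEEE binary16 bit
// patterns, as the 16 little-endian bytes of a v128.const read as i16x8.
std::array<std::uint8_t, 16>
f16x8ToV128Bytes(const std::array<std::uint16_t, 8> &HalfBits) {
  std::array<std::uint8_t, 16> Bytes{};
  for (std::size_t I = 0; I < HalfBits.size(); ++I) {
    // No f16 scalar is needed: each lane's bits are reused verbatim as an i16.
    Bytes[2 * I] = static_cast<std::uint8_t>(HalfBits[I] & 0xFF);
    Bytes[2 * I + 1] = static_cast<std::uint8_t>(HalfBits[I] >> 8);
  }
  return Bytes;
}
```

For example, 1.0 in binary16 is `0x3C00`, so a lane holding 1.0 contributes the bytes `0x00, 0x3C`.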

…on. (llvm#108167) Treat WTFReportBacktrace, which prints out the backtrace, as trivial.

…vm#108238) Extend the lowering of atomic.fadd to support the v2f16 variant available on some AMDGPU chips. Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>

…fadd (llvm#108238)" (llvm#108256) This reverts commit 0d48d4d. Mistakenly landed without approval

On newer GPUs, where the `cvta.param` instruction is available, we can avoid making copies of byval arguments when their pointers are used in a few more cases, even when `__grid_constant__` is not specified:
- phi
- select
- memcpy from the parameter

Switched pointer traversal from a DIY implementation to PtrUseVisitor.

@jasonmolenda

Recently, in llvm#107731, this change was reverted due to excess memory size in `TestSkinnyCore`. This was due to a bug where a range's end was being passed as its size, creating massive memory ranges. Additionally, and requiring additional review, I added more unit tests and more verbose logic to the merging of save-core memory regions. @jasonmolenda as an FYI.
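
The fix concerns merging memory regions described by a base address and a size. A minimal sketch of such a merge, assuming a hypothetical `MemRange {Base, Size}` type rather than LLDB's actual range classes, recomputes the size from the merged end so that an end address is never stored where a size is expected:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical range type: a base address plus a size (not an end address).
struct MemRange {
  std::uint64_t Base;
  std::uint64_t Size;
  std::uint64_t end() const { return Base + Size; }
};

// Sorts by base and merges overlapping or adjacent ranges.
std::vector<MemRange> mergeRanges(std::vector<MemRange> Ranges) {
  std::sort(Ranges.begin(), Ranges.end(),
            [](const MemRange &A, const MemRange &B) { return A.Base < B.Base; });
  std::vector<MemRange> Merged;
  for (const MemRange &R : Ranges) {
    if (!Merged.empty() && R.Base <= Merged.back().end()) {
      // Convert the merged end back into a size relative to the base.
      std::uint64_t NewEnd = std::max(Merged.back().end(), R.end());
      Merged.back().Size = NewEnd - Merged.back().Base;
    } else {
      Merged.push_back(R);
    }
  }
  return Merged;
}
```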

llvm#108199) Change comment command emitter to const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…#108201) Change HTMLNamedCharacterReferenceEmitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…lvm#108202) Change HTML Tags emitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…m#108203) Change DataCollectors Emitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…vm#108211) Change Opcode Emitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…vm#108213) Change OpenCL builtins emitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

…lvm#108216) Change OptionDoc Emitter to use const RecordKeeper. This is a part of effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089

Currently, the nan* functions dereference nullptr to crash with SIGSEGV if the input is nullptr. Both `nan(nullptr)` and `nullptr` dereferencing are undefined behavior according to the C standard. Employing a `nullptr` dereference in the `nan` implementation is OK if users only link against the pre-built library, but it might be removed entirely by compiler optimizations if libc is built from source together with the users' code. See for instance: https://godbolt.org/z/fd8KcM9bx This PR uses a volatile load to prevent the undefined behavior if libc is built without sanitizers, and leaves the current undefined behavior in place if libc is built with sanitizers, so that sanitizers can catch the undefined behavior in users' code.
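
The shape of such a guard can be sketched as follows. This is not the LLVM libc source: `crash_if_null` and the `EXAMPLE_HAS_SANITIZER` macro are hypothetical stand-ins. The idea it shows is that a `volatile` read is an access compilers keep in practice, so the fault survives inlining and optimization.

```cpp
#include <cassert>

// Hypothetical configuration macro: set to 1 when libc is built with sanitizers.
#ifndef EXAMPLE_HAS_SANITIZER
#define EXAMPLE_HAS_SANITIZER 0
#endif

// Hypothetical guard for the argument of a nan*-style function.
inline const char *crash_if_null(const char *arg) {
#if !EXAMPLE_HAS_SANITIZER
  if (arg == nullptr) {
    // A volatile read is not dropped by the optimizer, so a null argument
    // still faults (typically with SIGSEGV) even when inlined into user code.
    volatile const char *probe = arg;
    char unused = *probe;
    (void)unused;
  }
#endif
  // With sanitizers enabled no guard is added: the later plain dereference
  // of arg is left for the sanitizer to report.
  return arg;
}
```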

…is used (llvm#108263) This matches the behaviour of GNU ld and the ELF version of lld.

…ation is laid out (llvm#105714) In `User::operator new`, a single allocation is created to store the `User` object itself, "intrusive" operands or a pointer for "hung off" operands, and the descriptor. After allocation, details about the layout (number of operands, how the operands are stored, whether there is a descriptor) are stored in the `User` object by setting its fields. The `Value` and `User` constructors are then very careful not to initialize these fields, so that the values set during allocation can be subsequently read. However, when the `User` object is returned from `operator new`, [its value is technically "indeterminate", so reading a field without first initializing it is undefined behavior (and will be erroneous in C++26)](https://en.cppreference.com/w/cpp/language/default_initialization#Indeterminate_and_erroneous_values). We discovered this issue when trying to build LLVM using MSVC's [`/sdl` flag](https://learn.microsoft.com/en-us/cpp/build/reference/sdl-enable-additional-security-checks?view=msvc-170), which clears class fields after allocation (the docs say that this feature shouldn't be turned on for custom allocators and should only clear pointers, but that doesn't seem to match the implementation). MSVC's behavior both with and without the `/sdl` flag is standards-conforming, since a program is supposed to initialize storage before reading from it; thus a compiler implementation changing any values will never be observed in a well-formed program. The standard also makes no provision for making storage bytes not indeterminate by setting them during allocation or `operator new`. The fix is to create a set of types that encode the layout and provide these to both `operator new` and the constructor:

* The `AllocMarker` types are used to select which `operator new` to use.
* `AllocMarker` can then be implicitly converted to an `AllocInfo`, which tells the constructor how the type was laid out.
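
The pattern can be reduced to a small, self-contained sketch. It is not LLVM's `User` implementation: `MiniUser`, `Operand`, `IntrusiveOperandsAllocMarker`, and the fields of this `AllocInfo` are simplified, hypothetical stand-ins. What it mirrors is that layout information reaches the object through a constructor argument instead of being written into raw storage by `operator new` and read back later.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical layout descriptor consumed by the constructor.
struct AllocInfo {
  unsigned NumOps;
  bool HasDescriptor;
};

// Hypothetical marker selecting the operator new overload below; it converts
// implicitly to the AllocInfo that the constructor needs.
struct IntrusiveOperandsAllocMarker {
  unsigned NumOps;
  operator AllocInfo() const { return AllocInfo{NumOps, false}; }
};

struct Operand {
  int Val = 0;
};

class MiniUser {
public:
  // Storage layout: [Operand x NumOps][MiniUser]. Both types are 4-byte
  // aligned here; a real implementation must handle alignment explicitly.
  static void *operator new(std::size_t Size, IntrusiveOperandsAllocMarker M) {
    std::size_t OpBytes = sizeof(Operand) * M.NumOps;
    char *Storage = static_cast<char *>(::operator new(Size + OpBytes));
    return Storage + OpBytes;
  }
  // Matching placement delete, used only if the constructor throws.
  static void operator delete(void *P, IntrusiveOperandsAllocMarker M) {
    ::operator delete(static_cast<char *>(P) - sizeof(Operand) * M.NumOps);
  }

  // Fields are initialized from the argument, never read from raw storage.
  explicit MiniUser(AllocInfo Info)
      : NumOps(Info.NumOps), HasDescriptor(Info.HasDescriptor) {
    for (unsigned I = 0; I < NumOps; ++I)
      new (operandStorage() + I * sizeof(Operand)) Operand();
  }

  Operand &op(unsigned I) {
    assert(I < NumOps && "operand index out of range");
    return *std::launder(
        reinterpret_cast<Operand *>(operandStorage() + I * sizeof(Operand)));
  }
  unsigned getNumOperands() const { return NumOps; }
  bool hasDescriptor() const { return HasDescriptor; }

  // Destroys the operands and the object, then frees the whole allocation.
  static void destroy(MiniUser *U) {
    char *Base = U->operandStorage();
    for (unsigned I = 0; I < U->NumOps; ++I)
      U->op(I).~Operand();
    U->~MiniUser();
    ::operator delete(Base);
  }

private:
  char *operandStorage() {
    return reinterpret_cast<char *>(this) - sizeof(Operand) * NumOps;
  }

  unsigned NumOps;
  bool HasDescriptor;
};
```

A call such as `new (M) MiniUser(M)` passes the same marker twice: once to choose the allocation layout, and once, converted to `AllocInfo`, to tell the constructor what that layout is.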

Summary: There's an extern weak symbol for this; we should just factor these into a more common interface. Stub them temporarily to make the bots happy. PTXAS does not handle extern weak.

This patch adds a benchmark for ReplaceUsesOfWith().

Make the `if` an `if constexpr`, since its condition is a constant expression.
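
A minimal sketch of the pattern, assuming a hypothetical `Slot` bucket type rather than `DenseMap`'s real bucket layout: the branch depends only on a type trait, so `if constexpr` lets the compiler discard the branch that does not apply to `ValueT`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical bucket: an occupancy flag plus a value.
template <typename ValueT> struct Slot {
  bool Occupied = false;
  ValueT Value{};
};

// Marks every slot empty. The condition is a constant expression, so
// `if constexpr` instantiates only the branch that applies to ValueT.
template <typename ValueT> void clearSlots(std::vector<Slot<ValueT>> &Slots) {
  if constexpr (std::is_trivially_destructible_v<ValueT>) {
    // Nothing to release: just reset the flags.
    for (Slot<ValueT> &S : Slots)
      S.Occupied = false;
  } else {
    // Release resources held by live values before marking them empty.
    for (Slot<ValueT> &S : Slots) {
      if (S.Occupied)
        S.Value = ValueT();
      S.Occupied = false;
    }
  }
}
```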

Otherwise we fail to build with modules in C++03 mode once we migrate to a single top-level module, because those headers get pulled in but they don't compile as C++03.

It doesn't serve much of a purpose since we can easily put its contents inside __config. Removing it simplifies the modulemap once we are trying to create a single top-level module.

…8257) We should allow singleton and fooSingleton as singleton function names.
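
The accepted names can be sketched as a simple suffix check. `isSingletonFunctionName` is a hypothetical helper, not the WebKit checker's actual code, and the exact set of accepted spellings is an assumption based on the two examples in the commit message.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical helper (std::string_view has no ends_with before C++20).
constexpr bool endsWith(std::string_view Name, std::string_view Suffix) {
  return Name.size() >= Suffix.size() &&
         Name.substr(Name.size() - Suffix.size()) == Suffix;
}

// Accepts a plain "singleton" suffix and its camelCased form, so both
// "singleton" and "fooSingleton" qualify.
constexpr bool isSingletonFunctionName(std::string_view Name) {
  return endsWith(Name, "singleton") || endsWith(Name, "Singleton");
}
```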

The implementation would crash with unloaded dialects.

…do-probes mode (llvm#106365) Implement selective probe parsing for profiled functions only when emitting probe information to YAML profile, as suggested in llvm#102904 (review). For a large binary, this reduces probe parsing:
- processing time from 10.5925s to 5.6295s,
- peak RSS from 10.54 to 7.98 GiB.

Align BAT YAML (DataAggregator) to YAMLProfileWriter, which drops blocks without a profile: https://github.com/llvm/llvm-project/blob/61372fc5db9b14fd612be8a58a76edd7f0ee38aa/bolt/lib/Profile/YAMLProfileWriter.cpp#L162-L176 Test Plan: NFCI

llvm#108107) …tting" (llvm#108104)" This recommits 0f56ba1 (reverted by 6007ad7). In the original patch llvm/utils/lit/tests/escape-color.py failed on Windows because it diffed llvm-lit output with a file containing '\n' newlines rather than '\r\n'. This issue is avoided by calling 'diff --strip-trailing-cr'. Original description below: Test output that carried color across newlines previously resulted in the formatting around the output also being colored. Detect the current ANSI color and reset it when printing formatting, and then reapply it. As an added bonus an unterminated color code is also detected, preventing it from leaking out into the rest of the terminal. Fixes llvm#106633
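
lit itself is written in Python; the following C++ sketch only illustrates the detect, reset, and reapply idea. `activeColor` and `formatAfter` are hypothetical names, and the scanner is deliberately simplified: it tracks only the most recent SGR escape sequence (`ESC [ ... m`), treats `ESC[0m`/`ESC[m` as a reset, and does not accumulate combined attributes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the SGR sequence still in effect at the end of Text, or "" if the
// color was reset or never set.
std::string activeColor(const std::string &Text) {
  std::string Active;
  std::size_t Pos = 0;
  while ((Pos = Text.find("\x1b[", Pos)) != std::string::npos) {
    std::size_t End = Text.find('m', Pos + 2);
    if (End == std::string::npos)
      break; // incomplete escape sequence: ignore it
    std::string Seq = Text.substr(Pos, End - Pos + 1);
    Active = (Seq == "\x1b[0m" || Seq == "\x1b[m") ? std::string() : Seq;
    Pos = End + 1;
  }
  return Active;
}

// Emits Formatting without the test output's color: reset first, then
// reapply the color so the following output continues unchanged.
std::string formatAfter(const std::string &OutputSoFar,
                        const std::string &Formatting) {
  std::string Color = activeColor(OutputSoFar);
  if (Color.empty())
    return Formatting;
  return "\x1b[0m" + Formatting + Color;
}
```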

llvm#108311) …de loaded from different modules (llvm#104512)" This reverts commit d778689.

) This adds VL patterns for vfwmaccbf16.vv so that we can handle fixed-length vectors. It does this by teaching combineOp_VLToVWOp_VL to emit RISCVISD::VFWMADD_VL for bf16. The change in getOrCreateExtendedOp is needed because getNarrowType is based on the bit width and so returns f16; we need to explicitly check for bf16. Note that the .vf patterns don't work yet, since the build_vector splat gets lowered to a (vmv_v_x_vl (fmv_x_anyexth x)) instead of a vfmv.v.f, which SplatFP doesn't pick up; see llvm#106637.
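
For reference, a scalar model of what one element of a widening bf16 multiply-accumulate computes (`vd[i] = vd[i] + f32(vs1[i]) * f32(vs2[i])`) is sketched below. The helper names are hypothetical, and round-to-nearest-even is assumed. Because bf16 is the upper half of an IEEE binary32, widening is a 16-bit shift, and the product of two 8-bit significands is exact in binary32 (barring overflow or underflow), so only the add rounds.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Widens a bf16 bit pattern to float: bf16 is the top 16 bits of a binary32.
inline float bf16ToFloat(std::uint16_t Bits) {
  std::uint32_t Wide = static_cast<std::uint32_t>(Bits) << 16;
  float F;
  std::memcpy(&F, &Wide, sizeof F);
  return F;
}

// Scalar model of one element of a widening bf16 multiply-accumulate.
inline float fwmaccBF16(float Acc, std::uint16_t A, std::uint16_t B) {
  return Acc + bf16ToFloat(A) * bf16ToFloat(B);
}
```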

Previously they were legal by default, so the truncstore/extload test cases would get combined and crash during selection. These are set to expand for f16, so do the same for bf16.

…lvm#108041) This patch fixes the attribute type of out_shape, which is an i64 dense array attribute with exactly 4 elements.
- Fix description of DenseArrayMaxCt
- Add DenseArrayMinCt and move it to CommonAttrConstraints.td
- Change type of out_shape to Tosa_IntArrayAttr4

Fixes llvm#107804.

If the value we're replacing has a name, we might as well preserve it.

…07871) Fixes: llvm#107846

This patch implements sandboxir::ConstantTokenNone mirroring llvm::ConstantTokenNone.

Refactor the current consumer fusion based on `addInitOperandsToLoopNest` to support a single nested `scf.for`, e.g.:

```
%0 = scf.for() {
  %1 = scf.for() {
    tiledProducer
  }
  yield %1
}
%2 = consumer ins(%0)
```

…lvm#94190)" This reverts commit 2d4bdfb. A build breakage is reported at: https://lab.llvm.org/buildbot/#/builders/138/builds/3524

Fixes llvm#107401. Fixes llvm#107574.

Building with -DLLVM_ENABLE_EXPORTED_SYMBOLS_IN_EXECUTABLES=Off should not prevent use of opt plugins. This fix uses the approach implemented in llvm#101741. rdar://135841478

…#107648) Hello Arjun! Please allow me to contribute this patch, as it helps me debug significantly! When the 1's and 0's don't line up while debugging Farkas' lemma on numerous polyhedra using the simplex lexmin solver, it is truly straining on the eyes. Hopefully this patch can help others!

The unfortunate part is the lack of a test case, as I'm not sure how to add a test case for debug dumps. :) However, you can add this test case to SimplexTest.cpp to witness the nice printing!

```c++
TEST(SimplexTest, DumpTest) {
  int COLUMNS = 2;
  int ROWS = 2;
  LexSimplex simplex(COLUMNS * 2);
  IntMatrix m1(ROWS, COLUMNS * 2 + 1);
  // Adding LHS columns.
  for (int i = 0; i < ROWS; i++) {
    // an arbitrary formula to test all kinds of integers
    for (int j = 0; j < COLUMNS; j++)
      m1(i, j) = i + (2 << (i % 3)) * (-1 * ((i + j) % 2));
  }
  // Adding RHS columns.
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLUMNS; j++)
      m1(i, j + COLUMNS) = j - (3 << (j % 4)) * (-1 * ((i + j * 2) % 2));
  }
  for (unsigned i = 0; i < m1.getNumRows(); i++) {
    ArrayRef<DynamicAPInt> curRow = m1.getRow(i);
    simplex.addInequality(curRow);
  }
  IntegerRelation rel = parseRelationFromSet(
      "(x, y, z)[] : (z - x - 17 * y == 0, x - 11 * z >= 1)", 2);
  simplex.dump();
  m1.dump();
  rel.dump();
}
```

```
rows = 2, columns = 7
var: c3, c4, c5, c6
con: r0 [>=0], r1 [>=0]
r0: -1, r1: -2
c0: denom, c1: const, c2: 2147483647, c3: 0, c4: 1, c5: 2, c6: 3
1 0 1 0 -2 0 1
1 0 -8 -3 1 3 7
0 -2 0 1 0
-3 1 3 7 0
Domain: 2, Range: 1, Symbols: 0, Locals: 0
2 constraints
-1 -17 1 0 = 0
1 0 -11 -1 >= 0
```

…indows target (llvm#104676) This PR first adds osutils for Windows, and changes some libc code to make libc and its tests build on the Windows target. It then temporarily disables some libc tests that are currently problematic on Windows. Specifically, the changes besides the addition of osutils include:
- Macro `LIBC_TYPES_HAS_FLOAT16` is disabled on Windows. `clang-cl` generates calls to functions in `compiler-rt` to handle float16 arithmetic, and these functions are currently not linked in on Windows.
- Macro `LIBC_TYPES_HAS_INT128` is disabled on Windows.
- The invocation of `::aligned_malloc` is changed to an invocation of `::_aligned_malloc`.
- The following unit tests are temporarily disabled because they currently fail on Windows:
  - `test.src.__support.big_int_test`
  - `test.src.__support.arg_list_test`
  - `test.src.fenv.getenv_and_setenv_test`
  - Tests involving `__m128i`, `__m256i`, and `__m512i` in `test.src.string.memory_utils.op_tests.cpp`
  - `test_range_errors` in `libc/test/src/math/smoke/AddTest.h` and `libc/test/src/math/smoke/SubTest.h`

…-range-compare (NFC)

/llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:320:26: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0;
             ~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:335:28: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0;
             ~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
2 errors generated.
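
The diagnostic fires because `std::string::npos` is the maximum `std::size_t` value (18446744073709551615 on 64-bit targets), which an `unsigned int` can never hold, so the comparison is always true. One way to keep such a check meaningful is to hold the position in `std::size_t`; whether the actual fix did exactly this is not stated here. The helper `indentStart` and its use of `rfind('\n')` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper mirroring the shape of the flagged `preIndent` line:
// returns the index just past the last newline, or 0 if there is none.
std::size_t indentStart(const std::string &Text) {
  // std::string::size_type (std::size_t here) can represent npos, so this
  // comparison is not tautological, unlike with `unsigned int` on targets
  // where size_t is 64 bits.
  std::size_t Pos = Text.rfind('\n');
  return Pos != std::string::npos ? Pos + 1 : 0;
}
```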

) Refactor the current consumer fusion based on `addInitOperandsToLoopNest` to support a single nested `scf.for`, e.g.:

```
%0 = scf.for() {
  %1 = scf.for() {
    tiledProducer
  }
  yield %1
}
%2 = consumer ins(%0)
```

Compared with llvm#94190, this PR fixes the build failure by making the code compile as C++17.

Update ISDOpcodes.h documentation according to commit ad9d13d ("SelectionDAG: Swap operands of atomic_store") for less confusion.

…from int to FP. (llvm#108284) selectFPImm previously handled cases where an FPImm could be materialized in an integer register. We can generalize this to cases where a value was in an integer register and then copied to a scalar FP register to be used by a vector instruction. In the affected test, the call lowering code used up all of the FP argument registers and started using GPRs. Now we use integer vector instructions to consume those GPRs instead of moving them to scalar FP first.

SSE & AVX do not include instructions for shifting i8 vectors. Instead, these shifts must be synthesized from other instructions. If pairs of i8 elements share a shift amount, we can use SWAR techniques to substantially reduce the amount of code generated.

Say we were going to execute this shift right:

    x >> {0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, ...}

LLVM would previously generate:

    vpxor %xmm1, %xmm1, %xmm1
    vpunpckhbw %ymm0, %ymm1, %ymm2
    vpunpckhbw %ymm1, %ymm0, %ymm3
    vpsllw $4, %ymm3, %ymm3
    vpblendd $204, %ymm3, %ymm2, %ymm2
    vpsrlw $8, %ymm2, %ymm2
    vpunpcklbw %ymm0, %ymm1, %ymm3
    vpunpcklbw %ymm1, %ymm0, %ymm0
    vpsllw $4, %ymm0, %ymm0
    vpblendd $204, %ymm0, %ymm3, %ymm0
    vpsrlw $8, %ymm0, %ymm0
    vpackuswb %ymm2, %ymm0, %ymm0

Instead, we can reinterpret a pair of i8 elements as an i16 and shift it using the same shift amount. The only thing we need to do is mask out any bits which crossed the boundary from the top i8 to the bottom i8.

This SWAR-style technique achieves:

    vpsrlw $4, %ymm0, %ymm1
    vpblendd $170, %ymm1, %ymm0, %ymm0
    vpand .LCPI0_0(%rip), %ymm0, %ymm0

This is implemented for both left and right logical shift operations. Arithmetic shifts are less well behaved here because the shift cannot also perform the sign extension for the lower 8 bits.
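
The SWAR idea can be shown on a plain 64-bit integer holding eight u8 lanes that all share one shift amount: shift the whole word, then mask off the bits that crossed into a neighboring lane. This is a scalar sketch of the technique with hypothetical helper names (`splatByte`, `srlBytes`, `shlBytes`), not the X86 lowering itself, which applies the same trick to i16 pairs inside vector registers.

```cpp
#include <cassert>
#include <cstdint>

// Replicates an 8-bit value into all eight byte lanes of a 64-bit word.
inline std::uint64_t splatByte(std::uint8_t B) {
  return 0x0101010101010101ULL * B;
}

// Logical right shift of every u8 lane by the same amount S.
inline std::uint64_t srlBytes(std::uint64_t X, unsigned S) {
  assert(S < 8 && "shift amount must fit in a lane");
  // Shifting the whole word moves each lane's low bits into the top of the
  // lane below; the mask keeps only bits that started in the same lane.
  return (X >> S) & splatByte(static_cast<std::uint8_t>(0xFFu >> S));
}

// Logical left shift of every u8 lane by the same amount S.
inline std::uint64_t shlBytes(std::uint64_t X, unsigned S) {
  assert(S < 8 && "shift amount must fit in a lane");
  // Bits pushed out of the top of a lane land in the lane above and are
  // cleared by the mask.
  return (X << S) & splatByte(static_cast<std::uint8_t>((0xFFu << S) & 0xFFu));
}
```

As the commit message notes, arithmetic shifts do not fit this scheme, because the bits entering a lane from above would have to be copies of that lane's own sign bit.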

SchrodingerZhu and others added 30 commits September 11, 2024 12:22

[libc] fix tls teardown while being used (llvm#108229)

779a444

The call chain to `Mutex::lock` can be polluted by the stack protector. To be completely safe, postpone tearing down the main thread's TLS to a separate phase. Fixes llvm#108030.

[libc] implement vdso (llvm#91572)

d8e124d

[WebAssembly] Add load and store patterns for V8F16. (llvm#108119)

415288a

[WebAssembly] Support BUILD_VECTOR with F16x8. (llvm#108117)

c076638

Convert BUILD_VECTOR nodes with FP16x8 to I16x8, since there's no FP16 scalar value to initialize v128.const.

[WebKit Static Analyzer] Treat WTFReportBacktrace as a trivial functi…

7721db4

…on. (llvm#108167) Treat WTFReportBacktrace, which prints out the backtrace, as trivial.

[mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (ll…

0d48d4d

…vm#108238) Extend the lowering of atomic.fadd to support the v2f16 variant available on some AMDGPU chips. Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>

Revert "[mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic …

cb03126

…fadd (llvm#108238)" (llvm#108256) This reverts commit 0d48d4d. Mistakenly landed without approval

[gn build] Port 96b7c64

f02c72f

[lld][WebAssembly] Reject shared libraries when -static/-Bstatic …

be770ed

…is used (llvm#108263) This matches the behaviour of GNU ld and the ELF version of lld.

[libc] Stub TLS functions on the GPU temporarily (llvm#108267)

666a3f4

Summary: There's an extern weak symbol for this; we should just factor these into a more common interface. Stub them temporarily to make the bots happy. PTXAS does not handle extern weak.

[SandboxIR][Bench] Benchmark RUOW (llvm#107456)

bd4e0df

This patch adds a benchmark for ReplaceUsesOfWith().

[ADT][NFC] Constexpr-ify if in DenseMap::clear (llvm#108243)

c3d39cb

Make the `if` an `if constexpr`, since its condition is a constant expression.

[libc++] Guard PSTL headers with >= C++17 (llvm#108234)

bbff52b

Otherwise we fail to build with modules in C++03 mode once we migrate to a single top-level module, because those headers get pulled in but they don't compile as C++03.

[libc++] Get rid of experimental/__config (llvm#108233)

118f120

It doesn't serve much of a purpose since we can easily put its contents inside __config. Removing it simplifies the modulemap once we are trying to create a single top-level module.

[WebKit Checkers] Allow "singleton" suffix to be camelCased. (llvm#10…

882f21e

…8257) We should allow singleton and fooSingleton as singleton function names.

[mlir][bufferization] Fix OpFilter::denyDialect (llvm#108249)

aabb012

The implementation would crash with unloaded dialects.

aaupov and others added 24 commits September 11, 2024 16:33

Revert "[RFC][C++20][Modules] Fix crash when function and lambda insi… (

3cd0137

llvm#108311) …de loaded from different modules (llvm#104512)" This reverts commit d778689.

[RISCV] Expand bf16 vector truncstores and extloads (llvm#108235)

44d1221

Previously they were legal by default, so the truncstore/extload test cases would get combined and crash during selection. These are set to expand for f16, so do the same for bf16.

[RISCV] Allow -mcmodel= to accept large for RV64 (llvm#107817)

757d8b3

[DirectX] Preserve value names in DXILOpLowering. NFC (llvm#108089)

3d12901

If the value we're replacing has a name, we might as well preserve it.

[clang-tidy][NFC] fix add_new_check python3.8 incompatibility (llvm#1…

39751e7

…07871) Fixes: llvm#107846

[SandboxIR] Implement ConstantTokenNone (llvm#108106)

c9ab697

This patch implements sandboxir::ConstantTokenNone mirroring llvm::ConstantTokenNone.

[mlir][scf] Extend consumer fuse to single nested scf.for (llvm#94190)

2d4bdfb

Refactor the current consumer fusion based on `addInitOperandsToLoopNest` to support a single nested `scf.for`, e.g.:

```
%0 = scf.for() {
  %1 = scf.for() {
    tiledProducer
  }
  yield %1
}
%2 = consumer ins(%0)
```

Revert "[mlir][scf] Extend consumer fuse to single nested scf.for (l…

335538c

…lvm#94190)" This reverts commit 2d4bdfb. A build breakage is reported at: https://lab.llvm.org/buildbot/#/builders/138/builds/3524

[clang-format] Fix regressions in BAS_AlwaysBreak (llvm#107506)

8168088

Fixes llvm#107401. Fixes llvm#107574.

[opt] Fix opt for LLVM_ENABLE_EXPORTED_SYMBOLS_IN_EXECUTABLES=Off.

5e80fc8

Building with -DLLVM_ENABLE_EXPORTED_SYMBOLS_IN_EXECUTABLES=Off should not prevent use of opt plugins. This fix uses the approach implemented in llvm#101741. rdar://135841478

[clang-format][NFC] Minor clean of TokenAnnotatorTest

9469836

[CodeGen] Fix documentation for ISD::ATOMIC_STORE. NFC (llvm#108126)

08740a6

Update ISDOpcodes.h documentation according to commit ad9d13d ("SelectionDAG: Swap operands of atomic_store") for less confusion.

[AutoBump] Merge with 1211d979 (Sep 11)

ad9406e

jorickert requested a review from mgehre-amd December 12, 2024 09:25

Fix format

45e07e7

mgehre-amd approved these changes Dec 12, 2024


jorickert enabled auto-merge December 12, 2024 11:11

jorickert merged commit eee1900 into feature/fused-ops Dec 12, 2024
11 checks passed

jorickert deleted the bump_to_1211d979 branch December 12, 2024 11:14
