forked from llvm/llvm-project
[AutoBump] Merge with 1211d979 (Sep 11) (1) #415
Merged
The call chain to `Mutex::lock` can be polluted by the stack protector. To be completely safe, let's postpone tearing down the main TLS to a separate phase. Fixes llvm#108030
HLSL 202x inherits from C++11, which generates additional loop hint information for loops that must progress. Since HLSL 202x is going to be the default for Clang we want to make sure all our tests pass with it. Required for llvm#108044
…ard (llvm#106861) Since llvm#87832, unnamed identifiers are excluded from being diagnosed. As a result, the tests that were supposed to test that deleted functions are correctly ignored, are ignored because of the unnamed identifiers instead of the deleted function. This change simply introduces names for the parameters of the deleted functions.
…llvm#107943) GNU ld will error when encountering a pcrel_lo whose corresponding pcrel_hi is in a different section. [1] introduced a check to help prevent this issue by preventing outlining in a few circumstances. However, we can also hit this same issue when outlining from functions with prefixes ("hot"/"unlikely"/"unknown" from profile information, for example) as the outlined function might not have the same prefix, possibly resulting in a "paired" pcrel_lo and pcrel_hi ending up in different sections. To prevent this issue, take a similar approach as [1] and additionally prevent outlining when we see a pcrel_lo and the function has a prefix. [1] llvm@96c85f8 Fixes llvm#107520
Convert BUILD_VECTORS with FP16x8 to I16x8 since there's no FP16 scalar value to initialize v128.const.
…on. (llvm#108167) Treat WTFReportBacktrace, which prints out the backtrace, as trivial.
…vm#108238) Extend the lowering of atomic.fadd to support the v2f16 variant available on some AMDGPU chips. Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
…fadd (llvm#108238)" (llvm#108256) This reverts commit 0d48d4d. Mistakenly landed without approval
On newer GPUs, where the `cvta.param` instruction is available, we can avoid making byval arguments when their pointers are used in a few more cases, even when `__grid_constant__` is not specified:
- phi
- select
- memcpy from the parameter.

Switched pointer traversal from a DIY implementation to PtrUseVisitor.
Recently in llvm#107731 this change was reverted due to excess memory size in `TestSkinnyCore`. This was due to a bug where a range's end was being passed as its size, creating massive memory ranges. Additionally, and requiring additional review, I added more unit tests and more verbose logic to the merging of save core memory regions. @jasonmolenda as an FYI.
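The end-versus-size mix-up described above can be illustrated with a small sketch. `MemoryRange` and `rangeFromBounds` are hypothetical names chosen for illustration; they are not the LLDB types touched by the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical region type that stores a base address and a size.
struct MemoryRange {
  std::uint64_t Base;
  std::uint64_t Size;
  std::uint64_t end() const { return Base + Size; }
};

// Builds a range from the half-open interval [Begin, End). The size is
// End - Begin; passing End itself as the size would produce a range that is
// Begin bytes too large.
inline MemoryRange rangeFromBounds(std::uint64_t Begin, std::uint64_t End) {
  assert(Begin <= End && "range bounds are reversed");
  return MemoryRange{Begin, End - Begin};
}
```

For a region mapped high in the address space, the difference between `End` and `End - Begin` is enormous, which is consistent with the "massive memory ranges" the message mentions.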
llvm#108199) Change comment command emitter to const RecordKeeper. This is part of an effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…#108201) Change HTMLNamedCharacterReferenceEmitter to use const RecordKeeper. This is part of an effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…lvm#108202) Change HTML Tags emitter to use const RecordKeeper. This is part of an effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…m#108203) Change DataCollectors Emitter to use const RecordKeeper. This is part of an effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…vm#108211) Change Opcode Emitter to use const RecordKeeper. This is part of an effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…vm#108213) Change OpenCL builtins emitter to use const RecordKeeper. This is part of an effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…lvm#108216) Change OptionDoc Emitter to use const RecordKeeper. This is part of an effort to have better const correctness in TableGen backends: https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
Currently the nan* functions use nullptr dereferencing to crash with SIGSEGV if the input is nullptr. Both `nan(nullptr)` and `nullptr` dereferencing are undefined behavior according to the C standard. Employing a `nullptr` dereference in the `nan` function implementation is OK if users only link against the pre-built library, but it might be completely removed by compiler optimizations if libc is built from source together with the users' code. See for instance: https://godbolt.org/z/fd8KcM9bx This PR uses a volatile load to prevent the undefined behavior if libc is built without sanitizers, and leaves the current undefined behavior in place if libc is built with sanitizers, so that the undefined behavior can be caught in users' code.
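A minimal sketch of the volatile-load idea, assuming a free-standing helper; `forceRead` is a hypothetical name and this is not the libc implementation. A volatile access counts as an observable side effect, so the compiler has to emit the load even when the result is unused.

```cpp
// Hypothetical helper: reads the first byte of Arg through a volatile
// lvalue. The load cannot be optimized away, so a null Arg is expected to
// fault at run time on typical platforms instead of being silently removed.
inline char forceRead(const char *Arg) {
  return *static_cast<const volatile char *>(Arg);
}
```

With a valid pointer the helper simply returns the first character, so it adds one load on the normal path.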
…is used (llvm#108263) This matches the behaviour of GNU ld and the ELF version of lld.
…ation is laid out (llvm#105714) In `User::operator new` a single allocation is created to store the `User` object itself, "intrusive" operands or a pointer for "hung off" operands, and the descriptor. After allocation, details about the layout (number of operands, how the operands are stored, if there is a descriptor) are stored in the `User` object by setting its fields. The `Value` and `User` constructors are then very careful not to initialize these fields so that the values set during allocation can be subsequently read. However, when the `User` object is returned from `operator new` [its value is technically "indeterminate" and so reading a field without first initializing it is undefined behavior (and will be erroneous in C++26)](https://en.cppreference.com/w/cpp/language/default_initialization#Indeterminate_and_erroneous_values).

We discovered this issue when trying to build LLVM using MSVC's [`/sdl` flag](https://learn.microsoft.com/en-us/cpp/build/reference/sdl-enable-additional-security-checks?view=msvc-170) which clears class fields after allocation (the docs say that this feature shouldn't be turned on for custom allocators and should only clear pointers, but that doesn't seem to match the implementation). MSVC's behavior both with and without the `/sdl` flag is standards conforming since a program is supposed to initialize storage before reading from it, thus the compiler implementation changing any values will never be observed in a well-formed program. The standard also provides no provisions for making storage bytes not indeterminate by setting them during allocation or `operator new`.

The fix for this is to create a set of types that encode the layout and provide these to both `operator new` and the constructor:
* The `AllocMarker` types are used to select which `operator new` to use.
* `AllocMarker` can then be implicitly converted to an `AllocInfo` which tells the constructor how the type was laid out.
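The marker-plus-info pattern can be sketched as follows. This is a heavily simplified illustration, not the LLVM code: `FixedOpsMarker`, `Node`, and the `int` operand slots are hypothetical stand-ins, and only the `AllocInfo` name is borrowed from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Simplified layout description handed to the constructor.
struct AllocInfo {
  unsigned NumOps; // how many trailing operand slots were allocated
};

// Hypothetical marker type: selects an operator new overload and converts
// implicitly to AllocInfo so the same value can feed the constructor.
template <unsigned N> struct FixedOpsMarker {
  constexpr operator AllocInfo() const { return AllocInfo{N}; }
};

class Node {
  unsigned NumOps;

  int *slots() { return reinterpret_cast<int *>(this + 1); }

public:
  // The constructor receives the layout explicitly and initializes every
  // field itself, instead of reading values stashed by operator new.
  explicit Node(AllocInfo Info) : NumOps(Info.NumOps) {
    for (unsigned I = 0; I != NumOps; ++I)
      new (slots() + I) int(0);
  }

  // One allocation holds the object followed by N operand slots. Nothing is
  // written into the storage here.
  template <unsigned N>
  static void *operator new(std::size_t Size, FixedOpsMarker<N>) {
    return ::operator new(Size + N * sizeof(int));
  }
  // Matching placement delete, used only if the constructor throws.
  template <unsigned N>
  static void operator delete(void *Ptr, FixedOpsMarker<N>) {
    ::operator delete(Ptr);
  }
  static void operator delete(void *Ptr) { ::operator delete(Ptr); }

  unsigned getNumOps() const { return NumOps; }
  int &op(unsigned I) {
    assert(I < NumOps && "operand index out of range");
    return slots()[I];
  }
};
```

An object is then created with `new (FixedOpsMarker<2>{}) Node(FixedOpsMarker<2>{})`, so the allocation function and the constructor see the same layout description and no field is read before it is initialized.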
Summary: There's an extern weak symbol for this, we should just factor these into a more common interface. Stub them temporarily to make the bots happy. PTXAS does not handle extern weak.
This patch adds a benchmark for ReplaceUsesOfWith().
Make the `if` an `if constexpr`, since its condition is a constant expression.
Otherwise we fail to build with modules in C++03 mode once we migrate to a single top-level module, because those headers get pulled in but they don't compile as C++03.
It doesn't serve much of a purpose since we can easily put its contents inside __config. Removing it simplifies the modulemap once we are trying to create a single top-level module.
…8257) We should allow singleton and fooSingleton as singleton function names.
The implementation would crash with unloaded dialects.
…do-probes mode (llvm#106365) Implement selective probe parsing for profiled functions only when emitting probe information to YAML profile as suggested in llvm#102904 (review)

For a large binary, this reduces probe parsing:
- processing time from 10.5925s to 5.6295s,
- peak RSS from 10.54 to 7.98 GiB.
Align BAT YAML (DataAggregator) to YAMLProfileWriter which drops blocks without profile: https://github.com/llvm/llvm-project/blob/61372fc5db9b14fd612be8a58a76edd7f0ee38aa/bolt/lib/Profile/YAMLProfileWriter.cpp#L162-L176 Test Plan: NFCI
llvm#108107) …tting" (llvm#108104)" This recommits 0f56ba1 (reverted by 6007ad7). In the original patch llvm/utils/lit/tests/escape-color.py failed on Windows because it diffed llvm-lit output with a file containing '\n' newlines rather than '\r\n'. This issue is avoided by calling 'diff --strip-trailing-cr'. Original description below: Test output that carried color across newlines previously resulted in the formatting around the output also being colored. Detect the current ANSI color and reset it when printing formatting, and then reapply it. As an added bonus an unterminated color code is also detected, preventing it from leaking out into the rest of the terminal. Fixes llvm#106633
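The detect, reset, and reapply steps described above can be sketched in a few lines. This is an illustrative C++ sketch rather than the lit implementation (which is written in Python); `lastColorCode` and `wrapFormatting` are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Returns the last complete "ESC [ digits-and-semicolons m" sequence found
// in Text, or an empty string if there is none. An unterminated sequence
// (no final 'm') is ignored.
inline std::string lastColorCode(const std::string &Text) {
  std::string Last;
  std::size_t Pos = 0;
  while ((Pos = Text.find("\x1b[", Pos)) != std::string::npos) {
    std::size_t End = Pos + 2;
    while (End < Text.size() &&
           (std::isdigit(static_cast<unsigned char>(Text[End])) ||
            Text[End] == ';'))
      ++End;
    if (End < Text.size() && Text[End] == 'm')
      Last = Text.substr(Pos, End - Pos + 1);
    Pos = End;
  }
  return Last;
}

// Surrounds a formatting line with a reset and the previously active color,
// so the formatting itself is printed uncolored.
inline std::string wrapFormatting(const std::string &ActiveColor,
                                  const std::string &Line) {
  if (ActiveColor.empty())
    return Line;
  return "\x1b[0m" + Line + ActiveColor;
}
```

A caller would track `lastColorCode` over the test output seen so far and pass the result to `wrapFormatting` whenever it prints its own separators.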
llvm#108311) …de loaded from different modules (llvm#104512)" This reverts commit d778689.
) This adds VL patterns for vfwmaccbf16.vv so that we can handle fixed length vectors. It does this by teaching combineOp_VLToVWOp_VL to emit RISCVISD::VFWMADD_VL for bf16. The change in getOrCreateExtendedOp is needed because getNarrowType is based off of the bitwidth so returns f16. We need to explicitly check for bf16. Note that the .vf patterns don't work yet, since the build_vector splat gets lowered to a (vmv_v_x_vl (fmv_x_anyexth x)) instead of a vfmv.v.f, which SplatFP doesn't pick up, see llvm#106637.
Previously they were legal by default, so the truncstore/extload test cases would get combined and crash during selection. These are set to expand for f16 so do the same for bf16.
…lvm#108041) This patch fixes attr type of out_shape, which is i64 dense array attribute with exactly 4 elements.
- Fix description of DenseArrayMaxCt
- Add DenseArrayMinCt and move it to CommonAttrConstraints.td
- Change type of out_shape to Tosa_IntArrayAttr4

Fixes llvm#107804.
If the value we're replacing has a name, we might as well preserve it.
This patch implements sandboxir::ConstantTokenNone mirroring llvm::ConstantTokenNone.
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, e.g.
```
%0 = scf.for() {
  %1 = scf.for() {
     tiledProducer
  }
  yield %1
}
%2 = consumer ins(%0)
```
…lvm#94190)" This reverts commit 2d4bdfb. A build breakage is reported at: https://lab.llvm.org/buildbot/#/builders/138/builds/3524
Building with -DLLVM_ENABLE_EXPORTED_SYMBOLS_IN_EXECUTABLES=Off should not prevent use of opt plugins. This fix uses the approach implemented in llvm#101741. rdar://135841478
…#107648) Hello Arjun! Please allow me to contribute this patch as it helps me debugging significantly! When the 1's and 0's don't line up when debugging farkas lemma of numerous polyhedrons using simplex lexmin solver, it is truly straining on the eyes. Hopefully this patch can help others! The unfortunate part is the lack of testcase as I'm not sure how to add testcase for debug dumps. :) However, you can add this testcase to the SimplexTest.cpp to witness the nice printing!
```c++
TEST(SimplexTest, DumpTest) {
  int COLUMNS = 2;
  int ROWS = 2;
  LexSimplex simplex(COLUMNS * 2);
  IntMatrix m1(ROWS, COLUMNS * 2 + 1);
  // Adding LHS columns.
  for (int i = 0; i < ROWS; i++) {
    // an arbitrary formula to test all kinds of integers
    for (int j = 0; j < COLUMNS; j++)
      m1(i, j) = i + (2 << (i % 3)) * (-1 * ((i + j) % 2));
  }
  // Adding RHS columns.
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLUMNS; j++)
      m1(i, j + COLUMNS) = j - (3 << (j % 4)) * (-1 * ((i + j * 2) % 2));
  }
  for (int i = 0; i < m1.getNumRows(); i++) {
    ArrayRef<DynamicAPInt> curRow = m1.getRow(i);
    simplex.addInequality(curRow);
  }
  IntegerRelation rel = parseRelationFromSet("(x, y, z)[] : (z - x - 17 * y == 0, x - 11 * z >= 1)", 2);
  simplex.dump();
  m1.dump();
  rel.dump();
}
```
```
rows = 2, columns = 7
var: c3, c4, c5, c6
con: r0 [>=0], r1 [>=0]
r0: -1, r1: -2
c0: denom, c1: const, c2: 2147483647, c3: 0, c4: 1, c5: 2, c6: 3
1 0  1  0 -2 0 1
1 0 -8 -3  1 3 7

 0 -2 0 1 0
-3  1 3 7 0

Domain: 2, Range: 1, Symbols: 0, Locals: 0
2 constraints
-1 -17   1  0  = 0
 1   0 -11 -1 >= 0
```
…indows target (llvm#104676) This PR first adds osutils for Windows, and changes some libc code to make libc and its tests build on the Windows target. It then temporarily disables some libc tests that are currently problematic on Windows. Specifically, the changes besides the addition of osutils include:
- Macro `LIBC_TYPES_HAS_FLOAT16` is disabled on Windows. `clang-cl` generates calls to functions in `compiler-rt` to handle float16 arithmetic and these functions are currently not linked in on Windows.
- Macro `LIBC_TYPES_HAS_INT128` is disabled on Windows.
- The invocation to `::aligned_malloc` is changed to an invocation to `::_aligned_malloc`.
- The following unit tests are temporarily disabled because they currently fail on Windows:
  - `test.src.__support.big_int_test`
  - `test.src.__support.arg_list_test`
  - `test.src.fenv.getenv_and_setenv_test`
- Tests involving `__m128i`, `__m256i`, and `__m512i` in `test.src.string.memory_utils.op_tests.cpp`
- `test_range_errors` in `libc/test/src/math/smoke/AddTest.h` and `libc/test/src/math/smoke/SubTest.h`
…-range-compare (NFC)

    /llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:320:26: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
      preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0;
                   ~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~
    /llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:335:28: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
        preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0;
                     ~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~
    2 errors generated.
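The diagnostic fires because a value already narrowed to `unsigned int` can never equal `std::string::npos`, which is the maximum `std::size_t`. One common way to avoid it is to keep the search result in `std::size_t` until after the comparison. The sketch below shows that approach with a hypothetical helper, `afterLastSpace`; the actual change in the commit may differ.

```cpp
#include <cstddef>
#include <string>

// Returns the index just past the last space in S, or 0 if S has no space.
inline std::size_t afterLastSpace(const std::string &S) {
  // Keep the result in std::size_t: storing it in an unsigned int first
  // would truncate npos on 64-bit targets, and the comparison below would
  // then always be true.
  std::size_t Pos = S.rfind(' ');
  return Pos != std::string::npos ? Pos + 1 : 0;
}
```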
) Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, e.g.
```
%0 = scf.for() {
  %1 = scf.for() {
     tiledProducer
  }
  yield %1
}
%2 = consumer ins(%0)
```
Compared with llvm#94190, this PR fixes the build failure by making C++17 happy.
Update ISDOpcodes.h documentation according to commit ad9d13d ("SelectionDAG: Swap operands of atomic_store") for less confusion.
…from int to FP. (llvm#108284) selectFPImm previously handled cases where an FPImm could be materialized in an integer register. We can generalize this to cases where a value was in an integer register and then copied to a scalar FP register to be used by a vector instruction. In the affected test, the call lowering code used up all of the FP argument registers and started using GPRs. Now we use integer vector instructions to consume those GPRs instead of moving them to scalar FP first.
SSE & AVX do not include instructions for shifting i8 vectors. Instead, they must be synthesized via other instructions. If pairs of i8 vectors share a shift amount, we can use SWAR techniques to substantially reduce the amount of code generated.

Say we were going to execute this shift right:

    x >> {0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, ...}

LLVM would previously generate:

    vpxor %xmm1, %xmm1, %xmm1
    vpunpckhbw %ymm0, %ymm1, %ymm2
    vpunpckhbw %ymm1, %ymm0, %ymm3
    vpsllw $4, %ymm3, %ymm3
    vpblendd $204, %ymm3, %ymm2, %ymm2
    vpsrlw $8, %ymm2, %ymm2
    vpunpcklbw %ymm0, %ymm1, %ymm3
    vpunpcklbw %ymm1, %ymm0, %ymm0
    vpsllw $4, %ymm0, %ymm0
    vpblendd $204, %ymm0, %ymm3, %ymm0
    vpsrlw $8, %ymm0, %ymm0
    vpackuswb %ymm2, %ymm0, %ymm0

Instead, we can reinterpret a pair of i8 elements as an i16 and shift it using the same shift amount. The only thing we need to do is mask out any bits which crossed the boundary from the top i8 to the bottom i8.

This SWAR-style technique achieves:

    vpsrlw $4, %ymm0, %ymm1
    vpblendd $170, %ymm1, %ymm0, %ymm0
    vpand .LCPI0_0(%rip), %ymm0, %ymm0

This is implemented for both left and right logical shift operations. Arithmetic shifts are less well behaved here because the shift cannot also perform the sign extension for the lower 8 bits.
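The masking step can be illustrated on a single pair of bytes in scalar code. This is only a sketch of the idea, not the X86 lowering itself: the function names are hypothetical, and the real transform works on whole vectors and blends lanes whose pairs use different shift amounts.

```cpp
#include <cassert>
#include <cstdint>

// Logical right shift of two packed i8 lanes by the same amount, done as one
// 16-bit shift followed by a mask.
inline std::uint16_t shiftRightPairedBytes(std::uint16_t Pair,
                                           unsigned Amount) {
  assert(Amount < 8 && "shift amount must fit in an i8 lane");
  std::uint16_t Shifted = static_cast<std::uint16_t>(Pair >> Amount);
  // Bits of the high byte that slid into the low byte now occupy the top
  // Amount bits of the low lane; the mask clears them in both lanes.
  unsigned LaneMask = 0xFFu >> Amount;
  return static_cast<std::uint16_t>(Shifted & (LaneMask | (LaneMask << 8)));
}

// Logical left shift of two packed i8 lanes by the same amount.
inline std::uint16_t shiftLeftPairedBytes(std::uint16_t Pair,
                                          unsigned Amount) {
  assert(Amount < 8 && "shift amount must fit in an i8 lane");
  std::uint16_t Shifted = static_cast<std::uint16_t>(Pair << Amount);
  // Bits of the low byte that slid into the high byte occupy the bottom
  // Amount bits of the high lane; the mask clears them in both lanes.
  unsigned LaneMask = (0xFFu << Amount) & 0xFFu;
  return static_cast<std::uint16_t>(Shifted & (LaneMask | (LaneMask << 8)));
}
```

Each function returns the same bits as shifting the two bytes independently, which is why a single wide shift plus one `and` can replace the unpack, shift, and pack sequence.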
mgehre-amd
approved these changes
Dec 12, 2024