[AutoBump] Merge with 50b15341 (Jun 27) (85) #349

mgehre-amd · 2024-09-13T16:22:54Z

No description provided.

…llvm#95224) To enable function multi-versioning (FMV), the current checks that rely on command-line options or global macros to detect whether a target feature is present need to be removed. This patch removes those checks for NEON and also implements the changes to the NEON header file proposed in [ACLE](ARM-software/acle#321).

This patch implements an improvement introduced in P3029R1 that was missed in llvm#87873. It adds a deduction of static extents if integral_constant-like constants are passed to `std::extents`.

This reverts commit bb5ab1f.

The GCC build has gotten to the point where it's often hard to find the actual error in the build log. We should look into enabling these warnings again in the future, but it looks like a lot of them are bogus.

The builtin causes the program to stop its execution abnormally and shows a human-readable description of the reason for the termination when a debugger is attached or in a symbolicated crash log. The motivation for the builtin is explained in the following RFC: https://discourse.llvm.org/t/rfc-adding-builtin-verbose-trap-string-literal/75845 clang's CodeGen lowers the builtin to `llvm.trap` and emits debugging information that represents an artificial inline frame whose name encodes the category and reason strings passed to the builtin.
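A minimal sketch of how a bounds check might use the builtin. It assumes a compiler that provides `__builtin_verbose_trap(category, reason)` with two string-literal arguments, as described above. `CHECKED_TRAP` and `checked_at` are hypothetical names, and compilers without the builtin fall back to the plain `__builtin_trap()`, which stops the program without the human-readable reason.

```cpp
#include <cstddef>

// Hypothetical wrapper: use __builtin_verbose_trap when the compiler has it,
// otherwise fall back to __builtin_trap() (no category/reason strings).
#if defined(__has_builtin)
#if __has_builtin(__builtin_verbose_trap)
#define CHECKED_TRAP(category, reason) __builtin_verbose_trap(category, reason)
#endif
#endif
#ifndef CHECKED_TRAP
#define CHECKED_TRAP(category, reason) __builtin_trap()
#endif

// Hypothetical bounds-checked accessor: an out-of-range index stops the
// program; with the verbose builtin, a debugger or symbolicated crash log
// shows "Bounds check" and the reason string in an artificial inline frame.
inline int checked_at(const int* data, std::size_t size, std::size_t index) {
  if (index >= size) {
    CHECKED_TRAP("Bounds check", "index out of range in checked_at");
  }
  return data[index];
}
```

The category and reason are passed as literals because the text describes them as strings encoded into the emitted debug information.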

Mark LWG3382 as "Nothing To Do" and add tests.

Annoyingly gfx90a/940 support this for global/flat but not buffer.

New fixes:
- properly init the `std::optional<std::vector>` to an empty vector as opposed to `{}` (which was effectively `std::nullopt`).

Co-authored-by: Vy Nguyen <oontvoo@users.noreply.github.com>

https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#6212stack-constraints-at-a-public-interface mentions that the stack on ARM32 is double word aligned. Remove confused comments around ArgcType. argc is always an int, passed on the stack, so we need to store a pointer to it (regardless of ILP32 or LP64).

Alastair Robertson reported a huge compilation time increase without -g for the bpf target compared to x86 ([1]). In my setup, with '-O0', a large basic block takes 0.19s to compile for x86 but 2.46s for the bpf target. The top contributor to the compile time is eliminateFrameIndex(). This long compilation time without -g is caused by commit 05de2e4 ("[bpf] error when BPF stack size exceeds 512 bytes"): to obtain a debug location for the warning issued when the stack size exceeds 512 bytes, the compiler iterates over all instructions in the basic block. This iteration happens even without -g, which causes an unnecessary compile time increase. To fix the issue, let us move the related code to the point where the compiler is about to warn about the stack limit violation. This fixes the compile time regression; on my system, the compile time is reduced from 2.46s to 0.35s.

[1] bpftrace/bpftrace#3257

Co-authored-by: Yonghong Song <yonghong.song@linux.dev>
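The fix amounts to deferring an expensive lookup until the rare path that needs it. A minimal sketch of that pattern, assuming a hypothetical `Insn` type that stands in for machine instructions and a counter that records how often the block is scanned; only the 512-byte limit comes from the text above.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction with an optional debug line.
struct Insn {
  std::optional<int> debugLine;  // empty when compiled without -g
};

constexpr std::size_t kBpfStackLimit = 512;  // bytes, per the commit message

// Counts full scans of a basic block, to show the cost is only paid when needed.
inline std::size_t scanCount = 0;

// Expensive part: walk every instruction looking for any debug location.
std::optional<int> findDebugLine(const std::vector<Insn>& block) {
  ++scanCount;
  for (const Insn& insn : block)
    if (insn.debugLine) return insn.debugLine;
  return std::nullopt;
}

// Returns a warning only when the limit is exceeded; the block is scanned
// only on that path instead of on every call.
std::optional<std::string> checkStackSize(const std::vector<Insn>& block,
                                          std::size_t stackSize) {
  if (stackSize <= kBpfStackLimit) return std::nullopt;
  std::string msg = "stack size " + std::to_string(stackSize) + " exceeds " +
                    std::to_string(kBpfStackLimit) + " bytes";
  if (std::optional<int> line = findDebugLine(block))
    msg += " (near line " + std::to_string(*line) + ")";
  return msg;
}
```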

llvm#96593) Many state-of-the-art models and quantization operations now work directly on vector.contract with integers. This commit generalizes ext-contraction folding so that we can emit more performant vector.contracts in codegen pipelines.

Signed-off-by: Stanley Winata <stanley.winata@amd.com>

) The `getDroppedDims` utility function does not follow the convention of dropping outermost unit dimensions first when inferring a rank reduction mask for a slice. This PR updates the implementation to match this convention.
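To make the convention concrete: when several unit dimensions could have been dropped, matching the kept dimensions greedily from the innermost end leaves the outermost unit dimensions as the dropped ones. A minimal sketch on plain integer shapes; `droppedDims` is a hypothetical helper illustrating the convention, not the MLIR utility itself.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Returns the indices of `original` dropped to obtain `reduced`, preferring
// outermost unit dimensions, or std::nullopt if `reduced` cannot be obtained
// by dropping only size-1 dimensions.
std::optional<std::vector<std::size_t>> droppedDims(
    const std::vector<int64_t>& original, const std::vector<int64_t>& reduced) {
  if (reduced.size() > original.size()) return std::nullopt;
  std::vector<std::size_t> dropped;
  std::size_t j = reduced.size();  // reduced dims not yet matched
  // Walk from the innermost dimension outward, keeping a dimension whenever
  // it matches the next unmatched reduced dimension.
  for (std::size_t i = original.size(); i-- > 0;) {
    if (j > 0 && original[i] == reduced[j - 1]) {
      --j;                    // keep this dimension
    } else if (original[i] == 1) {
      dropped.push_back(i);   // drop a unit dimension
    } else {
      return std::nullopt;    // a non-unit dimension cannot be dropped
    }
  }
  if (j != 0) return std::nullopt;  // some reduced dims were never matched
  std::reverse(dropped.begin(), dropped.end());  // ascending order
  return dropped;
}
```

For `[1, 1, 4] -> [1, 4]` either leading unit dimension could be the dropped one; the greedy inner-first match reports dimension 0, the outermost.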

N2843 was subsumed by N2882; we could probably consider removing subsumed entries, but I've been leaving them to help folks looking at the editor's report from various working drafts and wondering about the changes.

…lang (llvm#96555) Updates the install path for clang-doc to share/clang-doc instead of share/clang to avoid confusion.

Removes `stdexcept` from the clang-doc test introduced in llvm#93928, since it violates the rule that tests must be freestanding.

We use REAL() calls in interceptors, but DEFINE_REAL_PTHREAD_FUNCTIONS has nothing to do with them and is only used for internal maintenance threads. This is done to avoid confusion like in llvm#96456.

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>

…lvm#91327)

This paper only matters for TS18661-3 integration.

According to https://reviews.llvm.org/D114250 this was to handle a Mac-specific issue; however, the test is Linux-only. The test effectively prevents locking the main allocator on fork, but we have done that on Linux for other sanitizers for years, and we need to do the same for TSAN to avoid deadlocks.

r7 is reserved in thumb2 (typically for the frame pointer, as opposed to r11 in ARM mode), so assigning to a variable with explicit register storage in r7 will produce an error. But r7 is where the Linux kernel expects the syscall number to be placed. We can use a temporary variable so that the register allocator picks a temporary register, in which we save and restore the previous value of r7. Fixes: llvm#93738

These functions are used only for `fork`. The currently unused parameter `child` will be used in follow-up patches.

Cap the alignment to 128 bytes, as that is the maximum alignment supported by PTX. The restriction is mentioned in the parameter passing section (Note D) of the [PTX Writer's Guide to Interoperability](https://docs.nvidia.com/cuda/ptx-writers-guide-to-interoperability/index.html#parameter-passing):

> D. The alignment must be 1, 2, 4, 8, 16, 32, 64, or 128 bytes.
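A minimal sketch of the cap, assuming the incoming alignment is already a power of two as the PTX rule requires; `capParamAlignment` is a hypothetical name, not the NVPTX backend's function. Clamping a power of two to 128 always yields one of the eight permitted values.

```cpp
#include <algorithm>
#include <cstdint>

// Largest parameter alignment PTX accepts (1, 2, 4, ..., 128 bytes).
constexpr uint64_t kMaxPtxParamAlign = 128;

constexpr bool isPowerOf2(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Clamp a power-of-two alignment to the PTX maximum; the result stays a power of two.
constexpr uint64_t capParamAlignment(uint64_t align) {
  return std::min(align, kMaxPtxParamAlign);
}
```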

Reland: llvm#95456

This patch improves the ROCDL gpu serialization API by:
- Introducing the enum `AMDGCNLibraries` for specifying the AMD GCN device code libraries to use during linking.
- Removing `getCommonBitcodeLibs` in favor of `AMDGCNLibraries`. Previously `getCommonBitcodeLibs` would try to load all AMD GCN bitcode libraries; now it will only load the requested libraries.
- Exposing the `compileToBinary` method and making it virtual, allowing downstream users to re-use this method.
- Exposing `moduleToObjectImpl`; this method provides a prototype flow for compiling to binary, allowing downstream users to re-use this method.
- Avoiding constructing the control variables if no device libraries are being used.
- Changing the style of the error messages to be composable, i.e. no full stops.
- Adding an error message for when the ROCm toolkit can't be found but it was required.
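The switch from "load every bitcode library" to "load only the requested ones" follows the usual bit-flag enum pattern. A minimal sketch; `DeviceLibs`, its enumerators, and the `.bc` file names are hypothetical placeholders and do not reproduce the actual members of `AMDGCNLibraries`.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical bit-flag enum in the spirit of AMDGCNLibraries.
enum class DeviceLibs : uint32_t {
  None = 0,
  Math = 1u << 0,
  Kernel = 1u << 1,
  Runtime = 1u << 2,
  All = Math | Kernel | Runtime,
};

constexpr DeviceLibs operator|(DeviceLibs a, DeviceLibs b) {
  return static_cast<DeviceLibs>(static_cast<uint32_t>(a) |
                                 static_cast<uint32_t>(b));
}

constexpr bool has(DeviceLibs set, DeviceLibs lib) {
  return (static_cast<uint32_t>(set) & static_cast<uint32_t>(lib)) != 0;
}

// Collect only the requested libraries for linking; None yields nothing.
std::vector<std::string> librariesToLink(DeviceLibs requested) {
  std::vector<std::string> files;
  if (has(requested, DeviceLibs::Math)) files.push_back("math.bc");
  if (has(requested, DeviceLibs::Kernel)) files.push_back("kernel.bc");
  if (has(requested, DeviceLibs::Runtime)) files.push_back("runtime.bc");
  return files;
}
```

An empty result is also the natural signal for skipping any library-specific setup, which matches the point above about not constructing control variables when no device libraries are used.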

We do that for other sanitizers, and we should do the same for TSAN. There are known deadlock reports here.

This reduces code size, as discussed in llvm#92707.

…llvm#95180) These instructions can be generated using regular LL intrinsics. Specified at: https://github.com/WebAssembly/half-precision/blob/29a9b9462c9285d4ccc1a5dc39214ddfd1892658/proposals/half-precision/Overview.md

…lFeatures` Unify parts of ASI and Prefetch tag matching in `parseASITag` and `parsePrefetchTag` to use a common function to parse any immediate expressions. This introduces a slight regression in error messages, but is needed so we can enable `ParseForAllFeatures` in `MatchOperandParserImpl` in a future patch.
Reviewers: jrtc27, brad0, rorth, s-barannikov
Reviewed By: s-barannikov
Pull Request: llvm#96020

…e add nsw

This PR fixes the following failure by adjusting the calculation of maximum displacement from Stack Pointer. `LLVM ERROR: Error while trying to spill R5D from class ADDR64Bit: Cannot scavenge register without an emergency spill slot! `

These have been replaced with atomicrmw.

XFAIL in llvm#96894 was too wide. This one actually passes.

There are only a handful of changes, and now the entire file can be kept clang-formatted.

…ll rounding modes. (llvm#96719) Sharing the same algorithm as double precision sin: llvm#95736 and cos: llvm#96591

It fails downstream now that llvm#95237 removed flushing the output stream on printing every instruction.

@ZequanWu

This is a regression from llvm#96484 caught by @ZequanWu. Note that we will still create separate enum types for types parsed from two definitions. This is different from how we handle classes, but it is not a regression. I'm also adding the DieToType check to the class type parsing code, although in this case, the type uniqueness should be enforced by the UniqueDWARFASTType map.

…96902) This is a helper to avoid writing `getModule()->getDataLayout()`. I regularly try to use this method only to remember it doesn't exist... `getModule()->getDataLayout()` is also a common (the most common?) reason why code has to include the Module.h header.

…llvm#96666)

…lvm#96771) If this is not set the fact that the dynamic channel is in-bounds cannot be inferred automatically (like it can for static sizes), which eventually leads to it being marked as out-of-bounds (which prevents some rewrites).

This patch moves some functions out of `SemaChecking.cpp`. ObjC-, HLSL-, OpenCL-related functions are affected. This patch continues the effort of splitting `Sema` into parts. Additional context can be found in llvm#84184 and llvm#92682.

…e any registers (llvm#96907) Fixes llvm#92541. When e69a3d1 added fallback register layouts, it assumed that the choices were target XML with registers, or no target XML at all. In the linked issue, a user has a debug stub that does have target XML, but it's missing register information. This caused us to finalize the register information using an empty set of registers obtained from target XML, then fail an assert when we attempted to add the fallback set, since we thought we had already completed the register information. This change adds a check to prevent that first call and expands the existing tests to check each architecture without target XML and with target XML missing register information.

When erasing elements in small mode, we currently leave behind tombstones. This means that insertion into the SmallPtrSet also has to check for these, making the operation more expensive than it really should be. We don't really need the tombstones in small mode, because we can just replace with the last element in the set instead. This changes the order, but SmallPtrSet order is fundamentally unstable anyway. However, not leaving tombstones means that the erase() operation now invalidates iterators. This means that consumers that want to remove elements while iterating over the set have to use remove_if() instead. If they fail to do so, there will be an assertion failure thanks to debug epochs, so any such cases are easy to detect (and I have already fixed all cases inside llvm at least).
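A minimal sketch of the small-mode idea: a fixed-capacity linear array in which `erase()` moves the last element into the freed slot instead of leaving a tombstone. `SmallPtrSetSketch` is a hypothetical class, not LLVM's `SmallPtrSet` (which also switches to a hash table when it outgrows its inline storage). Its `remove_if()` re-examines the slot it just refilled, which is why removal during iteration has to go through it rather than through `erase()`.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

template <std::size_t N>
class SmallPtrSetSketch {
 public:
  bool insert(const void* ptr) {
    if (contains(ptr)) return false;  // no tombstones to skip over
    if (size_ == N) throw std::length_error("sketch does not grow beyond N");
    slots_[size_++] = ptr;
    return true;
  }

  bool contains(const void* ptr) const {
    for (std::size_t i = 0; i < size_; ++i)
      if (slots_[i] == ptr) return true;
    return false;
  }

  // Swap-with-last erase: order changes and outstanding iterators are invalid.
  bool erase(const void* ptr) {
    for (std::size_t i = 0; i < size_; ++i) {
      if (slots_[i] == ptr) {
        slots_[i] = slots_[--size_];
        return true;
      }
    }
    return false;
  }

  // Removal during traversal: after refilling slot i from the back,
  // slot i is examined again instead of advancing.
  template <typename Pred>
  bool remove_if(Pred pred) {
    bool removed = false;
    for (std::size_t i = 0; i < size_;) {
      if (pred(slots_[i])) {
        slots_[i] = slots_[--size_];
        removed = true;
      } else {
        ++i;
      }
    }
    return removed;
  }

  std::size_t size() const { return size_; }

 private:
  std::array<const void*, N> slots_{};
  std::size_t size_ = 0;
};
```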

This avoids the indirection through MCID when just accessing the opcode. This uses two of the four padding bytes at the end of MachineInstr.

…ializations (llvm#96699) With the recent fix for this situation in class members (llvm#93873) (for which the fixed code is invalid prior to this patch - making migrating code difficult as it must be in lock-step with the compiler migration, if building with -Werror) it'd be really useful to be able to disable this warning during the compiler migration/decouple the compiler migration from the source fixes. In theory this approach will regress the codebase to the previous non-member cases of this issue that were already being held back by the warning (as opposed to if we carved out the new cases into a separate warning from the existing cases) but I think this'll be so rare and the cleanup so simple, that the extra regressions of disabling the warning broadly won't be too much of a problem. (but if folks disagree, I'm open to making the warning more fine-grained)

These extensions had their version numbers bumped and are still experimental (under public review). I didn't see anything in the [commit history](https://github.com/riscv/riscv-j-extension/commits/master/) since llvm#79929 that would warrant a change to the implementation of pointer masking in the compiler.

Dereferencing a pointer variable without a running process does not work on every arch/OS. Restrict the test to x86-linux, where it is known to work.

Split from llvm#91572

Co-authored-by: Nick Desaulniers (paternity leave) <nickdesaulniers@users.noreply.github.com>

Lukacma and others added 30 commits June 25, 2024 17:19

[libc++] P3029R1: Better mdspan's CTAD - std::extents (llvm#89015)

8c11d37

This patch implements an improvement introduced in P3029R1 that was missed in llvm#87873. It adds a deduction of static extents if integral_constant-like constants are passed to `std::extents`.

Revert "[𝘀𝗽𝗿] initial version"

902952a

This reverts commit bb5ab1f.

[libc++] Get the GCC build mostly clean of warnings (llvm#96604)

731db06

The GCC build has gotten to the point where it's often hard to find the actual error in the build log. We should look into enabling these warnings again in the future, but it looks like a lot of them are bogus.

[libc++] LWG3382: NTTP for pair and array (llvm#85811)

bb075ee

Mark LWG3382 as "Nothing To Do" and add tests.

AMDGPU: Handle legal v2bf16 atomicrmw fadd for gfx12 (llvm#95930)

889f3c5

Annoyingly gfx90a/940 support this for global/flat but not buffer.

[clang][Interp][NFC] Use delegate() to delegate to only initlist item

b7768c5

Reapply PR/87550 (again) (llvm#95571)

e951bd0

New fixes:
- properly init the `std::optional<std::vector>` to an empty vector as opposed to `{}` (which was effectively `std::nullopt`).

Co-authored-by: Vy Nguyen <oontvoo@users.noreply.github.com>
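The pitfall behind that fix, in isolation: brace-initializing a `std::optional<std::vector<T>>` from `{}` (or assigning `{}` to it) yields an empty optional, not an optional holding an empty vector. The function names below are illustrative only.

```cpp
#include <optional>
#include <utility>
#include <vector>

// `{}` value-initializes the optional itself, so it is disengaged (std::nullopt).
std::optional<std::vector<int>> makeWithBraces() { return {}; }

// Constructing from a vector engages the optional with an empty vector.
std::optional<std::vector<int>> makeEmptyVector() { return std::vector<int>{}; }

// std::in_place also constructs the contained (empty) vector.
std::optional<std::vector<int>> makeInPlace() {
  return std::optional<std::vector<int>>(std::in_place);
}
```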

[clang][Interp][NFC] Destroy InitMap when moving contents to DeadBlock

580343d

[C23] Update status page regarding FLT_MAX_EXP

05ca207

N2843 was subsumed by N2882; we could probably consider removing subsumed entries, but I've been leaving them to help folks looking at the editor's report from various working drafts and wondering about the changes.

[clang-doc] update install path to share/clang-doc instead of share/c…

d7dd778

…lang (llvm#96555) Updates the install path for clang-doc to share/clang-doc instead of share/clang to avoid confusion.

[clang-doc] Remove stdexecpt from clang-doc test (llvm#96552)

dbd5c78

Removes `stdexcept` from the clang-doc test introduced in llvm#93928, since it violates the rule that tests must be freestanding.

[sanitizer] Rename DEFINE_REAL_PTHREAD_FUNCTIONS (llvm#96527)

f0f774e

We use REAL() calls in interceptors, but DEFINE_REAL_PTHREAD_FUNCTIONS has nothing to do with them and is only used for internal maintenance threads. This is done to avoid confusion like in llvm#96456.

[SLP] NFC. Refactor and add getAltInstrMask help function. (llvm#94709)

de7c139

Co-authored-by: Alexey Bataev <a.bataev@gmx.com>

[AMDGPU] Disallow negative s_load offsets in isLegalAddressingMode (l…

aaf50bf

…lvm#91327)

[C23] Move WG14 N2931 to the TS18661 section

5e2beed

This paper only matters for TS18661-3 integration.

[nfc][tsan] Better name for locking functions (llvm#96598)

cd2bac8

These functions are used only for `fork`. The currently unused parameter `child` will be used in follow-up patches.

[nfc][tsan] Clang format includes (llvm#96599)

0258a60

[tsan] Lock/Unlock allocator and stacks on fork (llvm#96600)

c0dc134

We do that for other sanitizers, and we should do the same for TSAN. There are known deadlock reports here.
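The general technique is to take the relevant locks in a `pthread_atfork` prepare handler and release them in both the parent and the child. The child then never inherits a lock held by a thread that does not exist in it. A minimal POSIX sketch; the mutex and handler names are hypothetical, and this is not the TSan runtime code.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical lock guarding an allocator-like shared structure.
static pthread_mutex_t g_allocator_mutex = PTHREAD_MUTEX_INITIALIZER;

// Runs in the forking thread before fork(): waits until no other thread
// holds the lock, so fork() never snapshots it mid-update.
static void lock_before_fork() { pthread_mutex_lock(&g_allocator_mutex); }

// Run after fork() in the parent and in the child, respectively.
static void unlock_in_parent() { pthread_mutex_unlock(&g_allocator_mutex); }
static void unlock_in_child() { pthread_mutex_unlock(&g_allocator_mutex); }

bool install_fork_handlers() {
  return pthread_atfork(lock_before_fork, unlock_in_parent, unlock_in_child) == 0;
}

// Forks a child that takes the lock; returns true if the child exited cleanly.
bool child_can_take_lock() {
  pid_t pid = fork();
  if (pid < 0) return false;
  if (pid == 0) {
    // In the child the lock was released by unlock_in_child.
    int rc = pthread_mutex_lock(&g_allocator_mutex);
    pthread_mutex_unlock(&g_allocator_mutex);
    _exit(rc == 0 ? 0 : 1);
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return false;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```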

[SelectionDAG] Lower llvm.ldexp.f32 to ldexp() on Windows. (llvm#95301)

39a0aa5

This reduces code size, as discussed in llvm#92707.
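For context, `ldexp(x, n)` computes x·2^n, and the C `ldexp()` function takes a `double`. One plausible shape for lowering the `f32` intrinsic to it is: widen the argument, call the double-precision function, and narrow the result. Scaling a float by a power of two is exact in double while the result stays within float's range, so in that case the only rounding is the final conversion. The helper name below is hypothetical.

```cpp
#include <cmath>

// Hypothetical helper mirroring the widen / call ldexp(double) / narrow shape.
float ldexp_f32_via_double(float x, int exp) {
  return static_cast<float>(std::ldexp(static_cast<double>(x), exp));
}
```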

koachan and others added 25 commits June 27, 2024 19:45

[X86] Add test case to check computeKnownBitsForPMADDWD doesn't assum…

9e7defc

…e add nsw

clang/AMDGPU: Use atomicrmw for ds fmin/fmax builtins (llvm#96738)

8f63d15

AMDGPU: Remove ds_fmin/ds_fmax intrinsics (llvm#96739)

4477ff6

These have been replaced with atomicrmw.

[AMDGPU] Add some gfx1200 test coverage

4e70720

[lldb] Un-XFAIL TestStepScripted.test_misspelled_plan_name

5da6f64

XFAIL in llvm#96894 was too wide. This one actually passes.

[clang][OpenMP] clang-format SemaOpenMP.cpp, NFC

4ed8796

There are only a handful of changes, and now the entire file can be kept clang-formatted.

[libc][math] Implement double precision sincos correctly rounded to a…

4080f17

…ll rounding modes. (llvm#96719) Sharing the same algorithm as double precision sin: llvm#95736 and cos: llvm#96591

[AMDGPU] Fix MC/Disassembler/AMDGPU/decode-err.txt. (llvm#96621)

2b6e3f3

It fails downstream now that llvm#95237 removed flushing the output stream on printing every instruction.

[bazel] Port e035ef0

43953af

[AMDGPU] Only reinitialize disassembler Bytes array when needed. NFC. (…

bb97378

…llvm#96666)

[CodeGen] Cache Opcode in MachineInstr (llvm#96797)

aa24e36

This avoids the indirection through MCID when just accessing the opcode. This uses two of the four padding bytes at the end of MachineInstr.

[lldb/test] Fix enum-declaration-uniqueness.cpp

2fefc04

Dereferencing a pointer variable without a running process does not work on every arch/OS. Restrict the test to x86-linux, where it is known to work.

[libc] inline fast path of callonce (llvm#96226)

6d61d83

Split from llvm#91572

Co-authored-by: Nick Desaulniers (paternity leave) <nickdesaulniers@users.noreply.github.com>
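The commit title refers to the common "inline fast path" shape for call-once: an inlined acquire load of a completion flag, with the locking slow path kept out of line. A minimal C++ sketch of that general pattern using `std::atomic` and `std::mutex`; `OnceFlag`, `call_once_slow`, and `call_once_sketch` are hypothetical names, not LLVM libc's `callonce` implementation. The `[[gnu::noinline]]` attribute assumes GCC or Clang.

```cpp
#include <atomic>
#include <mutex>
#include <utility>

struct OnceFlag {
  std::atomic<bool> done{false};
  std::mutex mutex;
};

// Out-of-line slow path: serialize callers and run the function at most once.
template <typename F>
[[gnu::noinline]] void call_once_slow(OnceFlag& flag, F&& func) {
  std::lock_guard<std::mutex> lock(flag.mutex);
  // Relaxed is enough here: the mutex orders this load after the writer's store.
  if (!flag.done.load(std::memory_order_relaxed)) {
    func();
    flag.done.store(true, std::memory_order_release);
  }
}

// Inline fast path: once initialized, callers pay for a single acquire load.
template <typename F>
inline void call_once_sketch(OnceFlag& flag, F&& func) {
  if (flag.done.load(std::memory_order_acquire)) return;
  call_once_slow(flag, std::forward<F>(func));
}
```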

[clang][Interp] Don't diagnose non-const reads from the evaluating decl

50b1534

[AutoBump] Merge with 50b1534 (Jun 27)

6d8ed84

cferry-AMD approved these changes Sep 16, 2024


Base automatically changed from bump_to_f1e0657d to feature/fused-ops September 16, 2024 13:43

An error occurred while trying to automatically change base from bump_to_f1e0657d to feature/fused-ops September 16, 2024 13:43

mgehre-amd merged commit 658586f into feature/fused-ops Sep 16, 2024
5 checks passed

mgehre-amd deleted the bump_to_50b15341 branch September 16, 2024 13:43
