[AutoBump] Merge with 12f77e81 (Jun 13) (74) #338

mgehre-amd · 2024-09-12T09:10:20Z

No description provided.

…/EXTRACT_VECTOR_ELT (llvm#93357) DAGTypeLegalizer::SplitVecRes_INSERT_VECTOR_ELT and DAGTypeLegalizer::SplitVecRes_EXTRACT_VECTOR_ELT did not handle non-byte-sized elements properly. In fact, they only dealt with elements smaller than 8 bits (as well as byte-sized elements). This patch generalizes the support for non-byte-sized elements by always widening the vector elements to the next "round integer type" (a power-of-2 bit size). This should ensure that we can access a single element via a simple byte-addressed scalar load/store. Also remove a suspicious CustomLowerNode call from SplitVecRes_INSERT_VECTOR_ELT. Considering that it did not reset the Lo/Hi out arguments before returning, I think DAGTypeLegalizer::SplitVectorResult could be fooled into registering the input vector as the result. This should, however, not have caused any problems, since DAGTypeLegalizer::SplitVectorResult makes the same CustomLowerNode call, making the code removed by this patch redundant.
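
A minimal sketch of the widening arithmetic, assuming the "round integer type" of an N-bit element is the next power of two that is at least 8 bits. The helper names `roundIntegerBits` and `widenedElementByteOffset` are hypothetical; this is not the DAGTypeLegalizer code, only the offset calculation that makes a byte-addressed scalar load/store possible.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: round a bit width up to the next power of two,
// but never below 8 bits, so every element occupies whole bytes.
unsigned roundIntegerBits(unsigned Bits) {
  unsigned Round = 8;
  while (Round < Bits)
    Round *= 2;
  return Round;
}

// Byte offset of element Idx once every element has been widened to the
// round integer type; a scalar load/store at this offset touches one element.
uint64_t widenedElementByteOffset(unsigned EltBits, uint64_t Idx) {
  return Idx * (roundIntegerBits(EltBits) / 8);
}
```

For example, 12-bit elements widen to 16 bits, so element 3 starts at byte 6.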

…llvm#94911) An 8 x i16 raw load was incorrectly using a 64-bit memory type, which would assert in the MachineMemOperand constructor. This is preparation for a cleanup which will make the buffer intrinsics work for all legal types.

…lvm#94889) Fixes two issues in two ways:
1) The `braced-init-list` consists of `initializer-list` and `designated-initializer-list`, and thus the designated initializer is subject to [over.match.class.deduct]p1.8, which means brace elision also applies to it for CTAD deduction guides.
2) When forming a deduction guide where brace elision is applicable, we should also consider the presence of braces within the initializer. For example, given `template <class T, class U> struct X { T t[2]; U u[3]; };` and `X x = {{1, 2}, 3, 4, 5};`, we should establish this deduction guide AFAIU: `X(T (&&)[2], U, U, U) -> X<T, U>`.
Fixes llvm#64625
Fixes llvm#83368

…#84443) This commit implements the Window Scheduler as described in the RFC: https://discourse.llvm.org/t/rfc-window-scheduling-algorithm-for-machinepipeliner-in-llvm/74718
This Window Scheduler implements the window algorithm designed by Steven Muchnick in the book "Advanced Compiler Design And Implementation", with some improvements:
1. Copy the loop kernel 3 times and construct the corresponding DAG to identify dependencies between MIs;
2. Use a heuristic algorithm to obtain a set of window offsets.
The window algorithm is equivalent to a modulo scheduling algorithm with a stage count of 2. It is mainly applied to targets where hardware resource conflicts are severe, where the SMS algorithm often fails. On our own DSA, this window algorithm typically achieves a performance improvement of over 10%.
Co-authored-by: Kai Yan <aklkaiyan@tencent.com>
Co-authored-by: Ran Xiao <lennyxiao@tencent.com>
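
To make the "copy the kernel 3 times" step concrete, here is a toy sketch, assuming a kernel is just a list of instruction names and that the heuristic is replaced by an exhaustive search over offsets with a caller-supplied cost function. `windowAt`, `bestOffset`, and the cost model are hypothetical; the real scheduler builds a DAG over the three copies and schedules each window, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Kernel = std::vector<std::string>;

// Lay the kernel out three times back to back and take the kernel-sized
// window starting at Offset. Instructions past the end of the first copy
// come from the next iteration, so the window is the kernel rotated.
Kernel windowAt(const Kernel &K, std::size_t Offset) {
  assert(Offset <= 2 * K.size() && "window must fit in the tripled kernel");
  Kernel Tripled;
  for (int Copy = 0; Copy < 3; ++Copy)
    Tripled.insert(Tripled.end(), K.begin(), K.end());
  auto First = Tripled.begin() + static_cast<std::ptrdiff_t>(Offset);
  return Kernel(First, First + static_cast<std::ptrdiff_t>(K.size()));
}

// Toy stand-in for the heuristic: try every offset within one kernel
// length and keep the one whose window has the lowest cost.
std::size_t bestOffset(const Kernel &K,
                       const std::function<int(const Kernel &)> &Cost) {
  std::size_t Best = 0;
  int BestCost = Cost(windowAt(K, 0));
  for (std::size_t Off = 1; Off < K.size(); ++Off) {
    int C = Cost(windowAt(K, Off));
    if (C < BestCost) {
      BestCost = C;
      Best = Off;
    }
  }
  return Best;
}
```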

Due to alignment, the first two fields of MCEncodedFragment are currently at bytes 40 and 41, so 1 byte over the 8-byte boundary, causing 7 bytes of padding to be inserted before the following pointer. Fold two bools of MCFragment into bitfields to move the two fields of MCEncodedFragment one byte earlier and remove the padding bytes. This works because, in the Itanium ABI, a derived class can place its fields in the tail padding of a non-POD base class. This reduces the size of MCDataFragment from 224 to 216 bytes.
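
A toy model of the layout effect described above, with hypothetical names standing in for MCFragment and MCEncodedFragment. It assumes a 64-bit target using the Itanium C++ ABI (for example x86-64 Linux), where we expect `DerivedBefore` to be 32 bytes and `DerivedAfter` to be 24 bytes; other ABIs may lay these out differently.

```cpp
#include <cassert>

// The user-provided constructors make the bases non-POD, so under the
// Itanium C++ ABI a derived class may place its fields in the base's
// tail padding (starting at the base's data size, not its sizeof).
struct BaseBefore {
  BaseBefore() : Ptr(nullptr), FlagA(false), FlagB(false) {}
  void *Ptr;   // bytes 0-7
  bool FlagA;  // byte 8
  bool FlagB;  // byte 9 -> data size 10 bytes
};

struct BaseAfter {
  BaseAfter() : Ptr(nullptr), FlagA(false), FlagB(false) {}
  void *Ptr;       // bytes 0-7
  bool FlagA : 1;  // both flags share byte 8
  bool FlagB : 1;  // -> data size 9 bytes
};

struct DerivedBefore : BaseBefore {
  char Small[7];  // bytes 10-16: one byte past the 16-byte boundary
  void *Next;     // aligned up to byte 24 after 7 bytes of padding
};

struct DerivedAfter : BaseAfter {
  char Small[7];  // bytes 9-15
  void *Next;     // starts at byte 16, no padding needed
};
```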

To detect features, we either use HWCAPs or directly extract system register bitfields and compare them with a value. In many cases, equality comparisons give wrong results; for example, FEAT_SVE is not set if SVE2 is available (see issue llvm#93651). I am also making the access to __aarch64_cpu_features atomic. The corresponding PR for the ACLE specification is ARM-software/acle#322.
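
A minimal sketch of why `==` is the wrong test for most ID register fields, using a hypothetical 4-bit field at bit 4 where 1 means a base feature and 2 means an extended feature that implies it. Most ID register fields are unsigned and grow as features are added, so a `>=` comparison is the natural check; signed fields would need separate handling, and the shift and values here are illustrative, not taken from the compiler-rt code.

```cpp
#include <cassert>
#include <cstdint>

// Extract a 4-bit ID register field starting at bit Shift.
unsigned idField(uint64_t Reg, unsigned Shift) {
  return static_cast<unsigned>((Reg >> Shift) & 0xF);
}

// Equality check: misses the base feature when a higher level is reported.
bool hasFeatureExact(uint64_t Reg, unsigned Shift, unsigned Value) {
  return idField(Reg, Shift) == Value;
}

// Minimum-level check: a larger field value also implies the smaller feature.
bool hasFeatureAtLeast(uint64_t Reg, unsigned Shift, unsigned MinValue) {
  return idField(Reg, Shift) >= MinValue;
}
```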

By storing possible test vectors instead of combinations of conditions, the restriction on the number of conditions in a decision is dramatically relaxed. This introduces two options to `cc1`:
* `-fmcdc-max-conditions=32767`
* `-fmcdc-max-test-vectors=2147483646`

This change makes coverage mapping, profraw, and profdata incompatible with Clang-18.
- Bitmap semantics changed. It is incompatible with the previous format.
- `BitmapIdx` in `Decision` points to the end of the bitmap.
- Bitmap is packed per function.
- `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`.

RFC: https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798
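
A toy illustration of why counting test vectors is cheaper than counting condition combinations, assuming a decision is modeled as a small graph of short-circuit evaluation steps. A test vector here is one path from the first condition to a point where the decision's value is known; each full assignment of N conditions follows exactly one such path, so there are never more than 2^N of them and often far fewer. The types and function are hypothetical, not clang's MC/DC implementation.

```cpp
#include <cassert>
#include <vector>

// Successor marker meaning "the decision's value is known here".
constexpr int kTerminal = -1;

// One condition: where evaluation goes when it is true or false.
struct CondNode {
  int OnTrue;
  int OnFalse;
};

// Count distinct evaluation paths (test vectors) starting at node 0.
// Nodes only point to higher indices, so a reverse sweep suffices.
long long countTestVectors(const std::vector<CondNode> &Nodes) {
  std::vector<long long> Paths(Nodes.size(), 0);
  for (int I = static_cast<int>(Nodes.size()) - 1; I >= 0; --I) {
    auto Count = [&](int Succ) {
      return Succ == kTerminal ? 1LL : Paths[Succ];
    };
    Paths[I] = Count(Nodes[I].OnTrue) + Count(Nodes[I].OnFalse);
  }
  return Nodes.empty() ? 0 : Paths[0];
}
```

For `a && b && c` this gives 4 test vectors and for `(a && b) || c` it gives 5, versus 2^3 = 8 combinations in each case.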

…#95329) MLIR's LLVM dialect does not internally support debug records, only converting to/from debug intrinsics. To smooth the transition from intrinsics to records, there is a step prior to IR->MLIR translation that switches the IR module to intrinsic-form; this patch adds the equivalent conversion to record-form at MLIR->IR translation. This is a partial reapply of llvm#95098 which can be landed once the flang frontend has been updated by llvm#95306. This is the counterpart to the earlier patch llvm#89735 which handled the IR->MLIR conversion.

This isn't used by clang and isn't in the rvv-intrinsic-doc. The instruction requires Zvfh. If the F register passed to the instruction isn't nan-boxed correctly, the instruction will generate the wrong nan. So the instruction isn't a generic move FPR16 to vector register instruction.
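
A minimal sketch of the NaN-boxing rule mentioned above, assuming a 64-bit F register (FLEN = 64). Per the RISC-V spec, a narrower value is valid only if all bits above it are ones; otherwise reads treat it as the canonical NaN, which for binary16 is 0x7E00. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Canonical quiet NaN for IEEE binary16.
constexpr uint16_t kCanonicalHalfNaN = 0x7E00;

// Box a 16-bit value into a 64-bit F register: upper 48 bits all ones.
uint64_t nanBoxHalf(uint16_t H) {
  return 0xFFFFFFFFFFFF0000ULL | H;
}

// Read a 16-bit value back; an improperly boxed register reads as NaN.
uint16_t unboxHalf(uint64_t FReg) {
  if ((FReg >> 16) != 0xFFFFFFFFFFFFULL)
    return kCanonicalHalfNaN;
  return static_cast<uint16_t>(FReg);
}
```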

The custom lowering converts to f32, splats as f32, then narrows the vector to bf16. None of that requires Zvfhmin. Add new bf16 test files without Zvfh/Zvfhmin in their RUN lines. I will remove the bf16 tests from other files in a follow-up patch.
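
A scalar sketch of the lowering described above: widen bf16 to f32, splat the f32 value, then narrow each lane back to bf16. It assumes round-to-nearest-even narrowing and uses plain `std::vector` lanes in place of RVV registers; the function names are hypothetical. Because widening bf16 to f32 is exact, the round trip returns the original bits for every non-NaN input.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bf16 is the upper 16 bits of an IEEE binary32, so widening is exact.
float bf16ToF32(uint16_t H) {
  uint32_t Bits = static_cast<uint32_t>(H) << 16;
  float F;
  std::memcpy(&F, &Bits, sizeof F);
  return F;
}

// Narrow binary32 to bf16 with round-to-nearest-even; NaNs stay quiet NaNs.
uint16_t f32ToBf16(float F) {
  uint32_t Bits;
  std::memcpy(&Bits, &F, sizeof Bits);
  if ((Bits & 0x7FFFFFFFu) > 0x7F800000u)  // NaN: keep payload, set quiet bit
    return static_cast<uint16_t>((Bits >> 16) | 0x0040u);
  uint32_t Bias = 0x7FFFu + ((Bits >> 16) & 1u);
  return static_cast<uint16_t>((Bits + Bias) >> 16);
}

// Splat as described: widen once, splat as f32, narrow every lane.
std::vector<uint16_t> splatBf16ViaF32(uint16_t H, std::size_t N) {
  std::vector<float> Wide(N, bf16ToF32(H));
  std::vector<uint16_t> Out;
  Out.reserve(N);
  for (float F : Wide)
    Out.push_back(f32ToBf16(F));
  return Out;
}
```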

…llvm#93361) LLVM's Vector Predication Intrinsics require an explicit vector length parameter: https://llvm.org/docs/LangRef.html#vector-predication-intrinsics. For a scalable vector type, this should be calculated as VectorScaleOp multiplied by the base vector length, e.g. for <[4]xf32> we should return vscale * 4.
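
A minimal sketch of the explicit-vector-length rule, assuming vscale is already available as an integer (in the MLIR patch it is the runtime value produced by VectorScaleOp). `VectorTypeDesc` and `explicitVectorLength` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a 1-D vector type such as <4xf32> or <[4]xf32>.
struct VectorTypeDesc {
  uint64_t BaseLength;  // 4 for both <4xf32> and <[4]xf32>
  bool Scalable;        // true for <[4]xf32>
};

// Fixed vectors use their static length; scalable vectors use
// vscale * base length, where vscale is only known at run time.
uint64_t explicitVectorLength(const VectorTypeDesc &T, uint64_t VScale) {
  return T.Scalable ? VScale * T.BaseLength : T.BaseLength;
}
```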

Create additional helper functions for the ValueObject class, for: - returning the value as an APSInt or APFloat - additional type casting options - additional ways to create ValueObjects from various types of data - dereferencing a ValueObject These helper functions are needed for implementing the Data Inspection Language, described in https://discourse.llvm.org/t/rfc-data-inspection-language/69893

…fdump (llvm#93289) This patch adds a new set of statistics to llvm-dwarfdump that provide additional information about .debug_line regarding the number of bytes covered by the line table (and how many of those are covered by line 0 entries), and the number of entries within the table and how many of those are is_stmt, unique, or unique and non-line-0 (where "uniqueness" is based on file, line, and column only). Collectively these give a little more insight into the state of debug line information, rather than variables (as most of the dwarfdump statistics are currently oriented towards). I've added all of the stats that were useful to some degree, but I think the most generally useful stat is "unique line entries", since it gives the most straightforward indication of regressions, i.e. when the number goes down it means that fewer source lines are reachable in the program.
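
A simplified sketch of some of the entry-count statistics, assuming a line table is a flat list of rows and that uniqueness is keyed on (file, line, column) only, as described above. The row and stats types are hypothetical and the byte-coverage statistics are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical, simplified line-table row.
struct LineRow {
  uint32_t File;
  uint32_t Line;
  uint32_t Column;
  bool IsStmt;
};

struct LineStats {
  std::size_t Entries = 0;
  std::size_t IsStmtEntries = 0;
  std::size_t UniqueEntries = 0;          // distinct (file, line, column)
  std::size_t UniqueNonLine0Entries = 0;  // distinct and line != 0
};

LineStats computeLineStats(const std::vector<LineRow> &Rows) {
  LineStats S;
  std::set<std::tuple<uint32_t, uint32_t, uint32_t>> Seen;
  for (const LineRow &R : Rows) {
    ++S.Entries;
    if (R.IsStmt)
      ++S.IsStmtEntries;
    // Uniqueness ignores every field except file, line, and column.
    if (Seen.insert(std::make_tuple(R.File, R.Line, R.Column)).second) {
      ++S.UniqueEntries;
      if (R.Line != 0)
        ++S.UniqueNonLine0Entries;
    }
  }
  return S;
}
```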

…lvm#83301) If a function requires any streaming-mode change, the vector granule value must be stored to the stack and unwind info must also describe the save of VG to this location. This patch adds VG to the list of callee-saved registers and increases the callee-saved stack size if the function requires streaming-mode changes. A new type is added to RegPairInfo, which is also used to skip restoring the register used to spill the VG value in the epilogue. See https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst

The default debug info format for newer versions of Darwin is DWARF 5. https://developer.apple.com/documentation/xcode-release-notes/xcode-16-release-notes rdar://110925733 (relanding 8f6acd9 with the bridgeOS platform check removed)

…llvm#95326) When an uninstrumented libatomic is used with a TSan-instrumented memcpy, TSan may report a data race in circumstances where writes are arguably safe. This occurs because __atomic_compare_exchange won't be instrumented in an uninstrumented libatomic, so TSan doesn't know that the subsequent memcpy is race-free. On the other hand, pthread_mutex_(un)lock will be intercepted by TSan, meaning an uninstrumented libatomic will not report this false positive. pthread_mutexes may also try a number of different strategies to acquire the lock, which may bound the amount of time a thread has to wait for a lock during contention. While pthread_mutex_lock has a larger overhead (due to the function call and some dispatching), a dispatch to libatomic already implies a lack of performance guarantees.
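
A minimal sketch of lock-based generic atomics in the spirit of the change, with `std::mutex` standing in for `pthread_mutex_t` (on POSIX systems the standard library typically implements it on top of pthreads). The lock table, its size, and the address hash are hypothetical; this is not the compiler-rt code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <mutex>

// Hypothetical lock table: an object's address picks the lock guarding it.
// Unrelated objects may share a lock; that is safe, just slower.
constexpr std::size_t kLockCount = 64;
std::mutex LockTable[kLockCount];

std::mutex &lockFor(const void *Ptr) {
  auto Addr = reinterpret_cast<std::uintptr_t>(Ptr);
  return LockTable[(Addr >> 4) % kLockCount];
}

// Every access goes through a mutex, so a tool that intercepts the lock
// calls can see the ordering between accesses.
void lockedAtomicStore(void *Dst, const void *Src, std::size_t Size) {
  std::lock_guard<std::mutex> Guard(lockFor(Dst));
  std::memcpy(Dst, Src, Size);
}

void lockedAtomicLoad(const void *Src, void *Dst, std::size_t Size) {
  std::lock_guard<std::mutex> Guard(lockFor(Src));
  std::memcpy(Dst, Src, Size);
}

// On mismatch, copy the current value into Expected and fail.
bool lockedCompareExchange(void *Obj, void *Expected, const void *Desired,
                           std::size_t Size) {
  std::lock_guard<std::mutex> Guard(lockFor(Obj));
  if (std::memcmp(Obj, Expected, Size) == 0) {
    std::memcpy(Obj, Desired, Size);
    return true;
  }
  std::memcpy(Expected, Obj, Size);
  return false;
}
```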

jayfoad and others added 30 commits June 13, 2024 10:04

[CodeGenTypes] Remove unused ElSz argument from generated GET_VT_VECATTR (llvm#95258)

71a5b37

[flang][Semantics][OpenMP] Check type of reduction variables (llvm#94596)

f440239

Fixes llvm#92440. I had to delete part of reduction09.f90 because I don't think that should have ever worked.

[KnownBits] avgCompute - don't create on-the-fly Carry. NFC.

fa9301f

Use the internal computeForAddCarry directly since we know the exact values of the carry bit.
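
A concrete-value sketch of the averaging arithmetic, assuming 8-bit unsigned operands: the operands are added in a wider type with a carry-in that is known exactly (0 for the floor average, 1 for the ceiling average), then shifted right by one. KnownBits applies the same idea to partially known bits; the function names here are hypothetical and signed averages are not covered.

```cpp
#include <cassert>
#include <cstdint>

// Floor average: wide add with carry-in 0, then shift right by one.
uint8_t avgFloorU8(uint8_t A, uint8_t B) {
  unsigned Sum = unsigned(A) + unsigned(B) + 0u;
  return static_cast<uint8_t>(Sum >> 1);
}

// Ceiling average: same add with carry-in 1. The wider type means the
// intermediate sum never overflows.
uint8_t avgCeilU8(uint8_t A, uint8_t B) {
  unsigned Sum = unsigned(A) + unsigned(B) + 1u;
  return static_cast<uint8_t>(Sum >> 1);
}
```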

[llvm-dwp] Remove incorrect std::move()

00bb18a

DWOName is still used afterwards. The only reason this works out right now is that SmallString does not actually have a constructor that can take advantage of the move.

[clang][NFC] Add a test for CWG2685 (llvm#95206)

3475116

I believe it has been implemented since D139837 "Implements CTAD for aggregates P1816R0 and P2082R1", so this just claims we have already supported it. Plus an update on the dr status page.

DAG: Replace bitwidth with type in suffix in atomic tablegen ops (llvm#94845)

5c9352e

[gn build] Port b6bf402

d70d326

[NFC][clang-tidy] fix typo in readability-else-after-return

3f9e2e1

[CodeGen] ExpandLargeFpConvert - don't dereference a dyn_cast result

f991a16

dyn_cast is allowed to return NULL - use cast<> to assert that the cast type is valid Fixes static analysis warning.

[X86] avg.ll - add common CHECK prefix

1cbafb3

[X86] Add test coverage for llvm#95284

33a24b7

AMDGPU: Fix buffer intrinsic handling for various 16-bit elements. (llvm#95376)

c0ff36e

Mostly fixes handling of bfloat vectors, but also some missing i16 cases.

[clang][ExprConst][NFC] Replace typecheck+castAs with getAs

e439d22

[clang][Interp] Prepare return value for composite InitListExprs

5563d91

AMDGPU: Fix buffer intrinsic store of bfloat (llvm#95377)

5e8cf0b

Revert "[clang][NFC] Add a test for CWG2685" (llvm#95389)

b53e085

I was wrong: The purpose of CWG2685 is to avoid brace elision on string literals and we should be rejecting the case. Reverts llvm#95206

[M68k] Fix atomic_cmp_swap patterns after llvm#94845

04a4254

[Sema] IsVectorConversion - merge isExtVectorType() and getAs<VectorType> calls. NFC.

7ca52cd

Use getAs<ExtVectorType> directly to avoid duplicate getAs<> calls and static analyser warnings about dereferencing potentially null pointers.

[clang][NFC] Update CWG issues list

71e4d70

[DAG] combineShiftToAVG - don't create avgfloor with scalar constant operands unless legal.

76c5158

Converting to avgfloor and then expanding it back to shift+add later is likely to prevent other folds (re-association and value-tracking in particular) in the meantime. Fixes llvm#95284

Reapply "[Flang] Use PrintModulePass to print LLVM IR from the frontend (llvm#95142)" (llvm#95306)

9b46838

Fixed the link error that previously occurred on buildbots by adding IRPrinter to the linked components of the Flang frontend. This reverts commit 1d45235.

Fix off-by-one issue found by post-commit review

4f09ac7

[clang][Interp] Handle BooleanToSignedIntegral casts

ffab938

arsenm and others added 25 commits June 13, 2024 17:07

AMDGPU: Cleanup FP atomicrmw tests and cover fmin/fmax (llvm#95131)

444dd9b

We apparently are missing codegen support for atomicrmw fmin/fmax. Also clean up FP atomicrmw tests to be more consistent and comprehensively test the relevant cases

[llvm] Use llvm::is_contained (NFC) (llvm#95362)

5dc99af

[SPIRV] Fix warning in SPIRVMergeRegionExitTargets.cpp (llvm#95283)

525c25a

Fix warning in SPIRVMergeRegionExitTargets.cpp about "non-void function does not return a value in all control paths" by changing assert to llvm_unreachable.
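
A minimal illustration of the warning and the fix, using the GCC/Clang builtin `__builtin_unreachable()` in place of `llvm_unreachable` (which additionally reports a message in assertion-enabled builds). The enum and function are hypothetical, not the SPIRV pass code.

```cpp
#include <cassert>

enum class Kind { A, B };

// Every enumerator is handled, but the compiler cannot assume K holds only
// those values. An assert() at the end compiles away under NDEBUG, leaving a
// path that falls off the end of a non-void function, which is what the
// warning reports. An unreachable marker tells the compiler that path
// cannot happen.
int weight(Kind K) {
  switch (K) {
  case Kind::A:
    return 1;
  case Kind::B:
    return 2;
  }
  __builtin_unreachable();
}
```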

[X86][MC] Not decode 0xf3 as rep prefix if it's right before REX2

91a55cf

This fixes llvm#95412

Revert "[libc] fix aarch64 linux full build (llvm#95358)" (llvm#95419)

9e5428e

Adding a couple of pointer components tests (llvm#95287)

a8f8070

Ahead of llvm#94242 and as requested in the technical call, I am adding a couple of tests for pointer components that I would like to make sure are covered.

Add mad support for v_pk_* 16 bit integer (llvm#95104)

1fb1fcf

[PowerPC][NFC] Pre-commit test case to prepare for patch to merge internal and private global data

19b43e1

[KnownBits] Remove commonBits (llvm#95430)

890ab28

commonBits has been deprecated since: commit d8229e2 Author: Jay Foad <jay.foad@amd.com> Date: Wed May 10 16:50:33 2023 +0100

[lldb] Support case-insensitive regex matches (llvm#95350)

8f2a4e8

Support case-insensitive regex matches for `SBTarget::FindGlobalFunctions` and `SBTarget::FindGlobalVariables`.
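
A sketch of what case-insensitive matching means for a symbol search, using `std::regex` with the `icase` flag rather than lldb's own regex wrapper. `findMatching` is hypothetical and is not the `SBTarget` API.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return the names that contain a match for Pattern, optionally ignoring case.
std::vector<std::string> findMatching(const std::vector<std::string> &Names,
                                      const std::string &Pattern,
                                      bool IgnoreCase) {
  auto Flags = std::regex::ECMAScript;
  if (IgnoreCase)
    Flags |= std::regex::icase;
  std::regex Re(Pattern, Flags);
  std::vector<std::string> Out;
  for (const std::string &N : Names)
    if (std::regex_search(N, Re))
      Out.push_back(N);
  return Out;
}
```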

Revert "[VPlan] First step towards VPlan cost modeling. (llvm#92555)"

46080ab

This reverts commit 0079835. Causes crashes, see comments on llvm#92555.

Fix typos in comment

0f53a59

[HWASan] add test for hwasan_symbolize of stack uas (llvm#95186)

389142e

[libc][math][c23] Add f16sqrtf C23 math function (llvm#95251)

a239343

Part of llvm#95250.

Revert "[hwasan] Add fixed_shadow_base flag" (llvm#95435)

12f77e8

Reverts llvm#73980 This broke static hwasan binaries in Android, for some reason the fixed_shadow_base branch gets taken

[AutoBump] Merge with 12f77e8 (Jun 13)

ad8dd99

cferry-AMD approved these changes Sep 12, 2024

Base automatically changed from bump_to_705f8581 to feature/fused-ops September 13, 2024 06:27

An error occurred while trying to automatically change base from bump_to_705f8581 to feature/fused-ops September 13, 2024 06:27

mgehre-amd merged commit d847318 into feature/fused-ops Sep 13, 2024
7 checks passed

mgehre-amd deleted the bump_to_12f77e81 branch September 13, 2024 06:28
