[AutoBump] Merge with fixes of 9b78ddf3b2ab (Jun 21, needs torch & onnx bump) (82) #346

mgehre-amd · 2024-09-12T20:24:13Z

Needs Xilinx/onnx-mlir#183
Needs Xilinx/torch-mlir#321

There are only three actual uses of the section kind in MCSection: isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and store other info in the actual section variants where required. ELF and COFF flags also encode all relevant information, so for these two section variants, remove the SectionKind parameter entirely. This makes it possible to remove the string switch (which is unnecessary and inaccurate) from createELFSectionImpl. This was introduced in [D133456](https://reviews.llvm.org/D133456), but apparently it was never hit for non-writable sections anyway, and the resulting kind was never used.

…ues (llvm#89944)

The ABI mandates two things related to function calls:
- Function arguments must be sign- or zero-extended to the register size by the caller.
- Return values must be sign- or zero-extended to the register size by the callee.

As a consequence, callees can assume that function arguments have been extended, and callers can assume the same of return values.

Here lies the problem: Nonsecure code might deliberately ignore this mandate in an attempt at an exploit. It might pass values that lie outside the expected type's value range in order to trigger undefined behaviour, e.g. out-of-bounds access.

With the mitigation implemented, Secure code always performs extension of values passed by Nonsecure code.

This addresses the vulnerability described in CVE-2024-0151.

Patches by Victor Campos.

---------

Co-authored-by: Victor Campos <victor.campos@arm.com>
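To make the hazard in the commit message above concrete, here is a minimal sketch. It is not the LLVM implementation: the register is modeled as a plain `uint32_t`, and `kTableSize`, `trustingIndexInBounds`, `reextendU8` and `reextendS8` are hypothetical names chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: a 32-bit register carries an 8-bit argument. The ABI says the
// upper 24 bits are the extension of the low 8 bits, but a hostile Nonsecure
// caller can leave anything there.
constexpr std::size_t kTableSize = 256;

// Trusting callee: believes the caller extended the value, so it treats the
// whole register as the index. Returns whether that index is in bounds.
bool trustingIndexInBounds(std::uint32_t reg) {
  return reg < kTableSize;
}

// Mitigated callee: zero-extends the low 8 bits itself before use.
std::uint32_t reextendU8(std::uint32_t reg) {
  return static_cast<std::uint8_t>(reg);
}

// Mitigated callee for a signed 8-bit argument: sign-extends the low 8 bits.
std::int32_t reextendS8(std::uint32_t reg) {
  return static_cast<std::int8_t>(static_cast<std::uint8_t>(reg));
}
```

With a register value of `0x12345678`, the trusting check fails (the index is far outside the table), while the re-extended value `0x78` is always in range.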

…vm#96066)

Example:

```mlir
%mask = vector.create_mask %a, %b : vector<[4]x[8]xi1>
%slice = vector.extract %mask[%index] : vector<[8]xi1> from vector<[4]x[8]xi1>
```

Becomes:

```mlir
%mask_rows = vector.create_mask %a : vector<[4]xi1>
%mask_cols = vector.create_mask %b : vector<[8]xi1>
%slice = arm_sve.psel %mask_cols, %mask_rows[%index] : vector<[8]xi1>, vector<[4]xi1>
```

Note: While psel is under ArmSVE it requires SME (or SVE 2.1), so this is currently the most logical place for this lowering.

…on-creation (llvm#94226) This patch simplifies instruction creation by replacing all overloads of instruction constructors/Create methods that are identical other than the Instruction *InsertBefore/BasicBlock *InsertAtEnd/BasicBlock::iterator InsertBefore argument with a single version that takes an InsertPosition argument. The InsertPosition class can be implicitly constructed from any of the above, internally converting them to the appropriate BasicBlock::iterator value which can then be used to insert the instruction (or to not insert it if an invalid iterator is passed). The upshot of this is that code will be deduplicated, and all callsites will switch to calling the new unified version without any changes needed to make the compiler happy. There is at least one exception to this; the construction of InsertPosition is a user-defined conversion, so any caller that was already relying on a different user-defined conversion won't work. In all of LLVM and Clang this happens exactly once: at clang/lib/CodeGen/CGExpr.cpp:123 we try to construct an alloca with an AssertingVH<Instruction> argument, which must now be cast to an Instruction* by using `&*`. If this is more common elsewhere, it could be fixed by adding an appropriate constructor to InsertPosition.
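The pattern described above can be sketched with toy types. This is only an illustration of the idea, not the LLVM classes: `Inst`, `Block`, `InsertPos`, `BlockHandle` and `makeInst` are hypothetical stand-ins, and unlike LLVM's iterators a `std::list` iterator does not know its parent, so the toy position takes the block explicitly.

```cpp
#include <cassert>
#include <list>
#include <type_traits>

// Toy stand-ins (assumptions for this sketch only).
struct Inst { int id; };
using Block = std::list<Inst>;
using BlockIt = Block::iterator;

// One position type that is implicitly constructible from the different
// "where to insert" arguments the old overloads accepted.
class InsertPos {
  Block *block = nullptr;
  BlockIt it;

public:
  InsertPos() = default;                                      // do not insert
  InsertPos(Block &atEnd) : block(&atEnd), it(atEnd.end()) {} // append
  InsertPos(Block &b, BlockIt before) : block(&b), it(before) {}

  bool isValid() const { return block != nullptr; }
  Block *getBlock() const { return block; }
  BlockIt getIterator() const { return it; }
};

// A single creation function replaces the former overload set.
Inst makeInst(int id, InsertPos pos = InsertPos()) {
  Inst inst{id};
  if (pos.isValid())
    pos.getBlock()->insert(pos.getIterator(), inst);
  return inst;
}

// A wrapper that itself relies on a user-defined conversion. Only one
// user-defined conversion is allowed in an implicit conversion sequence,
// so this type does not convert to InsertPos implicitly.
struct BlockHandle {
  Block *b;
  operator Block &() const { return *b; }
};
static_assert(!std::is_convertible<BlockHandle, InsertPos>::value,
              "two user-defined conversions are not chained implicitly");
```

A caller holding a `BlockHandle` has to convert explicitly, for example `makeInst(3, static_cast<Block &>(handle))`, which plays the same role as the `&*` cast mentioned in the commit message.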

Previously, a symbol insertion required (at least) three hash table operations:
- Lookup/create entry in Symbols (main symbol table)
- Lookup NextUniqueID to deduplicate identical temporary labels
- Add entry to UsedNames, which is also used to serve as storage for the symbol name in the MCSymbol.

All three lookups are done with the same name, so combining these into a single table reduces the number of lookups to one. Thus, a pointer to a symbol table entry can be passed to createSymbol to avoid a duplicate lookup of the same name.

The new symbol table entry value is placed in a separate header to avoid including MCContext in MCSymbol or vice versa.
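The single-table idea can be illustrated with a small sketch. It is an assumption-laden toy, not MCContext: `SymbolEntry`, `SymbolTable`, `getOrCreate` and `makeTemp` are hypothetical names, an `int` stands in for the symbol pointer, and the `lookups` counter exists only to make the number of hash-table operations visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// One value type holds what the three separate tables used to track for a
// given name.
struct SymbolEntry {
  int symbolId = -1;         // stand-in for the symbol pointer; -1 = none yet
  unsigned nextUniqueId = 0; // counter for deduplicating temporary labels
  bool used = false;         // the name has been handed out
};

class SymbolTable {
  // The map key doubles as storage for the symbol name.
  std::unordered_map<std::string, SymbolEntry> table;
  int nextSymbolId = 0;

public:
  unsigned lookups = 0; // hash-table operations performed so far

  // Get or create a symbol with a single hash-table operation.
  int getOrCreate(const std::string &name) {
    ++lookups;
    SymbolEntry &entry = table[name];
    if (entry.symbolId < 0) {
      entry.symbolId = nextSymbolId++;
      entry.used = true;
    }
    return entry.symbolId;
  }

  // Produce a unique temporary name from a prefix, again with one operation.
  std::string makeTemp(const std::string &prefix) {
    ++lookups;
    SymbolEntry &entry = table[prefix];
    return prefix + std::to_string(entry.nextUniqueId++);
  }

  std::size_t size() const { return table.size(); }
};
```

Because the entry is found once and then updated in place, each insertion costs one lookup instead of three.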

llvm#96057) This re-uses reduction declarations from intrinsic operators to add support for reductions of allocatables, pointers, and arrays with procedure designators (e.g. min/max). I have split this into two commits to make it easier to review. The first one makes the functional change. The second cleans things up now that we can share much more code between intrinsic operators and procedure designators.

…g / trailing dimensions. (llvm#92934) Generalizes `DropUnitDimFromElementwiseOps` to support inner unit dimensions. This change stems from improving the lowering of contractionOps for Arm SME, where we end up with inner unit dimensions on MulOp, BroadcastOp and TransposeOp, preventing the generation of outer products. Discussed [here](https://discourse.llvm.org/t/on-improving-arm-sme-lowering-resilience-in-mlir/78543/17?u=nujaa). --------- Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>

Internal label names never occur in the symbol table, so when using an object streamer, there's no point in constructing these names and then adding them to hash tables -- they are never visible in the output. It's not possible to reuse createTempSymbol, because BPF has a different prefix for globals and basic blocks right now.

…6342) Otherwise the startup objects will fail to link since they were cross compiled, but the linker is not informed of the intent to cross compile, which results in linker errors when the host architecture does not match the target architecture.

…lvm#96337) SubtargetPredicate should be the primary "does this instruction exist" predicate, with OtherPredicates used for other side pieces of information. Changes like 856d1c4 were backwards. The problematic usage is how GFX12 is using HasRestrictedOffset. The multiclasses for buffers should probably be split up instead of hiding OtherPredicates inside the buffer atomic multiclasses. The two cases are mutually exclusive and really need a negated predicate for the not-gfx12 case. It's pretty terrible we have to manage this in the first place. TableGen should be able to figure out the required predicates from any instructions that appear in the pattern output.

#3) (llvm#93315)

The ThreadLocalCache implementation is used by the MLIRContext (among other things) to try to manage thread contention in the StorageUniquers. There is a bunch of fancy shared pointer/weak pointer setups that basically keep everything alive across threads at the right time, but a huge bottleneck is the `weak_ptr::lock` call inside the `::get` method.

This is because the `lock` method has to hit the atomic refcount several times, and this is bottlenecking performance across many threads. However, all this is doing is checking whether the storage is initialized.

Importantly, when the `PerThreadInstance` goes out of scope, it does not remove all of its associated entries from the thread-local hash map (it contains dangling `PerThreadInstance *` keys). The `weak_ptr` also allows the thread local cache to synchronize with the `PerThreadInstance`'s destruction:

1. if `ThreadLocalCache` destructs, the `weak_ptr`s that reference its contained values are immediately invalidated
2. if `CacheType` destructs within a thread, any entries still live are removed from the owning `PerThreadInstance`, and it locks the `weak_ptr` first to ensure it's kept alive long enough for the removal.

This PR changes the TLC entries to contain a `shared_ptr<ValueT*>` and a `weak_ptr<PerInstanceState>`. It gives the `PerInstanceState` entries a `weak_ptr<ValueT*>` on top of the `unique_ptr<ValueT>`. This enables `ThreadLocalCache::get` to check if the value is initialized by dereferencing the `shared_ptr<ValueT*>` and checking if the contained pointer is null. When `PerInstanceState` destructs, the values inside the TLC are written to nullptr. The TLC uses the `weak_ptr<PerInstanceState>` to satisfy (2).

(1) is no longer the case. When `ThreadLocalCache` begins destruction, the `weak_ptr<PerInstanceState>` are invalidated, but not the `shared_ptr<ValueT*>`. This is OK: because the overall object is being destroyed, `::get` cannot get called and because the `shared_ptr<PerInstanceState>` finishes destruction before freeing the pointer, it cannot get reallocated to another `ThreadLocalCache` during destruction. I.e. the values inside the TLC associated with a `PerInstanceState` cannot be read during destruction. The most important thing is to make sure destruction of the TLC doesn't race with the destructor of `PerInstanceState`. Because `PerInstanceState` carries `weak_ptr` references into the TLC, we guarantee to not have any use-after-frees.
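The fast-path check described above can be sketched in a single-threaded toy. This is not the MLIR code: `Owner` loosely plays the role of `PerInstanceState`, `CacheEntry` the role of one TLC entry, `attach` is a hypothetical helper, and the `weak_ptr<PerInstanceState>` half of the design (as well as all threading) is deliberately left out.

```cpp
#include <cassert>
#include <memory>

// Owns the value and observes the cache's pointer cell (assumed layout).
struct Owner {
  std::unique_ptr<int> value;
  std::weak_ptr<int *> slot;

  ~Owner() {
    // On destruction, write nullptr into the cache entry if it still exists.
    // Members are destroyed after this body runs, so `value` is still alive.
    if (std::shared_ptr<int *> cell = slot.lock())
      *cell = nullptr;
  }
};

// Cache side: a shared cell holding a raw pointer to the owner's value.
struct CacheEntry {
  std::shared_ptr<int *> cell = std::make_shared<int *>(nullptr);

  // The initialization check is a plain pointer read; no weak_ptr::lock and
  // therefore no atomic refcount traffic on this path.
  bool initialized() const { return *cell != nullptr; }
};

// Wire an owner and a cache entry together and create the value.
void attach(CacheEntry &entry, Owner &owner, int v) {
  owner.value.reset(new int(v));
  *entry.cell = owner.value.get();
  owner.slot = entry.cell;
}
```

The only `lock` call left in this sketch sits in the destructor, which runs once per owner rather than on every lookup.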

jeanPerier and others added 30 commits June 20, 2024 09:01

[flang] lower assumed-rank TARGET to intent(in) POINTER (llvm#96082)

fa08e97

The only special thing to do is to use fir.rebox_assumed_rank when reboxing the target to properly set the POINTER attribute inside the descriptor.

[mlir][vector] Disable Gather1DToConditionalLoads for scalable vectors (llvm#96049)

cc145f4

Pattern scalarizes vector.gather operations and is incorrect for scalable vectors.

Fix bazel build past abd9534 (llvm#96143)

1d1d007

Update ExternalPreprocessorSource.h (llvm#96144)

1134424

Add missing includes.

[nsan] Fix style issue

ef83c25

The initial check-in of compiler-rt/lib/nsan llvm#94322 has a lot of style issues. Fix them before the history becomes more useful. Pull Request: llvm#96142

mmapForContinuousMode: Align Linux's impl to __APPLE__'s more. NFC. (llvm#95702)

7cf84d3

[clang] Fix static_cast to array of unknown bound (llvm#96041)

b9ad0b6

Per P1975R0 an expression like static_cast<U[]>(...) defines the type of the expression as U[1]. Fixes llvm#62863

[LLD] [MinGW] Interpret an empty -entry option as no entry point (llvm#96055)

acf675b

This fixes llvm#93309, and seems to match how GNU ld handles this case.

[MachineLICM] Work-around Incomplete RegUnits (llvm#95926)

f0897ea

Reverts the behavior introduced by 770393b while keeping the refactored code. Fixes a miscompile on AArch64, at the cost of a small regression on AMDGPU. llvm#96146 opened to investigate the issue

[AMDGPU] Fix typo "GXF" in check prefix

81e8f01

[AMDGPU] Tweak comment to fix warning from filecheck_lint.py

2ccfa93

[AMDGPU] Fix GFX90A/GFX940 check prefix typos

70748dc

[AMDGPU] Add a RUN line to test the OSABI-PAL-ERR prefix

d594d9f

[AMDGPU] Add ALL prefix to all RUN lines for better diagnostics

94fdfc1

[X86] Fix indention in X86InstrArithmetic.td, NFCI

919c547

[LV] Remove loads from null from pr73894.ll test.

ffc51b9

Load from null is UB, load from pointer arg instead.

[AArch64] Remove -debug flag from mlicm-csr-mask.mir

105a9f3

[ValueTracking] Support gep nuw in isKnownNonZero()

6012de2

gep nuw can be null if and only if both the base pointer and offset are null. Unlike the inbounds case this does not depend on whether the null pointer is valid. Proofs: https://alive2.llvm.org/ce/z/PLoqK5
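The claim can be checked exhaustively on a reduced model. The sketch below is a simplification, not ValueTracking code: pointers and offsets are modeled as 8-bit unsigned integers, "nuw" is taken to mean that `base + offset` does not wrap, and `addWrapsU8` and `nuwNullIffBothNull` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// True if base + off overflows 8 bits, i.e. the nuw condition is violated.
bool addWrapsU8(std::uint8_t base, std::uint8_t off) {
  return static_cast<unsigned>(base) + off > 0xFFu;
}

// Enumerates every (base, offset) pair and checks that, whenever the
// addition does not wrap, the result is zero exactly when both inputs are.
bool nuwNullIffBothNull() {
  for (unsigned b = 0; b <= 0xFFu; ++b) {
    for (unsigned o = 0; o <= 0xFFu; ++o) {
      if (addWrapsU8(static_cast<std::uint8_t>(b),
                     static_cast<std::uint8_t>(o)))
        continue; // nuw violated: the result is poison, nothing to check
      bool resultNull = static_cast<std::uint8_t>(b + o) == 0;
      bool bothNull = (b == 0 && o == 0);
      if (resultNull != bothNull)
        return false;
    }
  }
  return true;
}
```

The wrapping case shows why the flag matters: without nuw, `0xFF + 1` wraps to zero even though neither operand is zero.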

Fix bazel build past e2296d8 (llvm#96166)

7977249

[NewPM] Move PassManager::run() into Impl.h (NFC)

b18bf8f

We use explicit template instantiation for these classes, so there is no need to have the definition in the header. The places that instantiate the method will include the PassManagerImpl.h file.
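The header/Impl split relies on a standard C++ mechanism, sketched below in a single file. The names are hypothetical (`Manager`, `Module`, and the file names in the comments are assumptions, not the LLVM sources); the comments mark where each part would live in a real split.

```cpp
#include <cassert>

// --- would live in the public header (hypothetical Manager.h) ---
template <typename UnitT> class Manager {
  int passes = 0;

public:
  void addPass() { ++passes; }
  int run(UnitT &unit); // declared only; the body is not in this header
};

struct Module {
  int visited = 0;
};

// Explicit instantiation declaration: users of the header do not
// instantiate Manager<Module> implicitly.
extern template class Manager<Module>;

// --- would live in the implementation header (hypothetical ManagerImpl.h) ---
template <typename UnitT> int Manager<UnitT>::run(UnitT &unit) {
  unit.visited += passes;
  return passes;
}

// --- would live in one .cpp file that includes ManagerImpl.h ---
// Explicit instantiation definition: emits Manager<Module>::run once.
template class Manager<Module>;
```

Translation units that only call `run` need the declaration, while the one file that instantiates the template is the only place that has to see the definition.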

[MC] Fix compilation

84428da

[AMDGPU] Preserve chain when selecting llvm.amdgcn.pops.exiting.wave.id (llvm#96167)

90779fd

Without this SelectionDAG could fail assertions when using the intrinsic in a non-entry BB.

fabio-d and others added 22 commits June 21, 2024 21:45

[scudo] Add TEST_SKIP macro to skip the current test (llvm#96192)

513644b

Enable ASAN in amdgpu toolchain for OpenCL (llvm#96262)

60fa7c7

Revert "[clang-doc] Add --asset option to clang-doc" (llvm#96354)

bf824d9

Reverts llvm#94717 This breaks on some buildbots: http://45.33.8.238/linux/141118/step_7.txt

[llvm] format and terminate namespaces with closing comment (llvm#94917)

7b57a1b

Namespaces are terminated with a closing comment in the majority of the codebase so do the same here for consistency. Also format code within some namespaces to make clang-format happy.

[Clang] Replace emitXXXBuiltin with a unified interface (llvm#96313)

e52016a

[libc][stdlib] Only use freelist_malloc for baremetal targets. (llvm#96355)

09bc1e8

[gn build] Port 5ece35d

31bbaf4

[gn build] Port 7c814c1

3984e58

[gn build] Port b8f0ca0

df54be4

AMDGPU: Fix overriding SubtargetPredicate in MUBUF_Real_gfx90a (llvm#96351)

5d6d2fc

[libc][startup] check that we're cross compiling and using LLD (llvm#96357)

781d5cf

We only need to set `--target=` for LLD when cross compiling. This should fix the host build using BFD or targeting the host. Fixes: llvm#96342

[AutoBump] Merge with c091dd4 (Jun 15)

5006bfc

[AutoBump] Merge with fixes of 93ffe17 (Jun 15)

ab22c35

[AutoBump] Merge with 770393b (Jun 17)

7e79490

[AutoBump] Merge with fixes of 3cead57 (Jun 17)

86fd3f7

[AutoBump] Merge with 21ba91c (Jun 17)

75073a8

[AutoBump] Merge with fixes of 13d983e (Jun 17)

41f4ee0

Fix emitc tests

c0235f0

Merge commit '9b78ddf3b2ab' into bump_to_13d983e7

cca70a5

mgehre-amd changed the title ~~[AutoBump] Merge with fixes of 13d983e7 (Jun 17) (82)~~ [AutoBump] Merge with fixes of 9b78ddf3b2ab (Jun 21) (82) Sep 13, 2024

cferry-AMD approved these changes Sep 13, 2024

mgehre-amd mentioned this pull request Sep 13, 2024

[AutoBump] Merge with fixes of d5c1c586 (June 27) (12) (needs LLVM bump Jun 20) Xilinx/onnx-mlir#183

Merged

mgehre-amd changed the title ~~[AutoBump] Merge with fixes of 9b78ddf3b2ab (Jun 21) (82)~~ [AutoBump] Merge with fixes of 9b78ddf3b2ab (Jun 21, needs torch & onnx bump) (82) Sep 13, 2024

mgehre-amd mentioned this pull request Sep 13, 2024

[AutoBump] Merge with fixes of 1f73895f (Jun 28, needs LLVM) (81) Xilinx/torch-mlir#321

Merged

mgehre-amd changed the base branch from bump_to_21ba91c4 to feature/fused-ops September 16, 2024 10:59

mgehre-amd merged commit b309613 into feature/fused-ops Sep 16, 2024
6 checks passed

mgehre-amd deleted the bump_to_13d983e7 branch September 16, 2024 10:59
