Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with fixes of 9b78ddf3b2ab (Jun 21, needs torch & onnx bump) (82) #346

Merged
merged 687 commits into from
Sep 16, 2024

Conversation

mgehre-amd
Copy link
Collaborator

@mgehre-amd mgehre-amd commented Sep 12, 2024

jeanPerier and others added 30 commits June 20, 2024 09:01
The only special thing to do is to use fir.rebox_assumed_rank when
reboxing the target to properly set the POINTER attribute inside the
descriptor.
llvm#96049)

Pattern scalarizes vector.gather operations and is incorrect for
scalable vectors.
The initial check-in of compiler-rt/lib/nsan llvm#94322 has a lot of style
issues. Fix them before the history becomes more useful.

Pull Request: llvm#96142
Per P1975R0 an expression like static_cast<U[]>(...) defines the type of
the expression as U[1].

Fixes llvm#62863
…m#96055)

This fixes llvm#93309, and seems
to match how GNU ld handles this case.
There are only three actual uses of the section kind in MCSection:
isText(), XCOFF, and WebAssembly. Store isText() in the MCSection, and
store other info in the actual section variants where required.

ELF and COFF flags also encode all relevant information, so for these
two section variants, remove the SectionKind parameter entirely.

This allows to remove the string switch (which is unnecessary and
inaccurate) from createELFSectionImpl. This was introduced in
[D133456](https://reviews.llvm.org/D133456), but apparently, it was
never hit for non-writable sections anyway and the resulting kind was
never used.
Reverts the behavior introduced by 770393b while keeping the refactored
code.

Fixes a miscompile on AArch64, at the cost of a small regression on
AMDGPU.
llvm#96146 opened to investigate the issue
…ues (llvm#89944)

The ABI mandates two things related to function calls:
 - Function arguments must be sign- or zero-extended to the register
   size by the caller.
 - Return values must be sign- or zero-extended to the register size by
   the callee.

As consequence, callees can assume that function arguments have been
extended and so can callers with regards to return values.

Here lies the problem: Nonsecure code might deliberately ignore this
mandate with the intent of attempting an exploit. It might try to pass
values that lie outside the expected type's value range in order to
trigger undefined behaviour, e.g. out of bounds access.

With the mitigation implemented, Secure code always performs extension
of values passed by Nonsecure code.

This addresses the vulnerability described in CVE-2024-0151.

Patches by Victor Campos.

---------

Co-authored-by: Victor Campos <victor.campos@arm.com>
…vm#96066)

Example:
```mlir
%mask = vector.create_mask %a, %b : vector<[4]x[8]xi1>
%slice = vector.extract %mask[%index]
           : vector<[8]xi1> from vector<[4]x[8]xi1>
```
Becomes:
```mlir
%mask_rows = vector.create_mask %a : vector<[4]xi1>
%mask_cols = vector.create_mask %b : vector<[8]xi1>
%slice = arm_sve.psel %mask_cols, %mask_rows[%index]
           : vector<[8]xi1>, vector<[4]xi1>
```

Note: While psel is under ArmSVE it requires SME (or SVE 2.1), so this
is currently the most logical place for this lowering.
…on-creation (llvm#94226)

This patch simplifies instruction creation by replacing all overloads of
instruction constructors/Create methods that are identical other than
the Instruction *InsertBefore/BasicBlock *InsertAtEnd/BasicBlock::iterator
InsertBefore argument with a single version that takes an InsertPosition
argument. The InsertPosition class can be implicitly constructed from
any of the above, internally converting them to the appropriate
BasicBlock::iterator value which can then be used to insert the
instruction (or to not insert it if an invalid iterator is passed).

The upshot of this is that code will be deduplicated, and all callsites
will switch to calling the new unified version without any changes
needed to make the compiler happy. There is at least one exception to
this; the construction of InsertPosition is a user-defined conversion,
so any caller that was already relying on a different user-defined
conversion won't work. In all of LLVM and Clang this happens exactly
once: at clang/lib/CodeGen/CGExpr.cpp:123 we try to construct an alloca
with an AssertingVH<Instruction> argument, which must now be cast to an
Instruction* by using `&*`. If this is more common elsewhere, it could
be fixed by adding an appropriate constructor to InsertPosition.
Previously, a symbol insertion requires (at least) three hash table
operations:

- Lookup/create entry in Symbols (main symbol table)
- Lookup NextUniqueID to deduplicate identical temporary labels
- Add entry to UsedNames, which is also used to serve as storage for the
symbol name in the MCSymbol.

All three lookups are done with the same name, so combining these into a
single table reduces the number of lookups to one. Thus, a pointer to a
symbol table entry can be passed to createSymbol to avoid a duplicate
lookup of the same name.

The new symbol table entry value is placed in a separate header to avoid
including MCContext in MCSymbol or vice versa.
llvm#96057)

This re-uses reduction declarations from intrinsic operators to add
support for reductions of allocatables, pointers, and arrays with
procedure designators (e.g. min/max).

I have split this into two commits to make it easier to review. The
first one makes the functional change. The second cleans things up now
that we can share much more code between intrinsic operators and
procedure designators.
…g / trailing dimensions. (llvm#92934)

Generalizes `DropUnitDimFromElementwiseOps` to support inner unit
dimensions.
This change stems from improving lowering of contractionOps for Arm SME.
Where we end up with inner unit dimensions on MulOp, BroadcastOp and
TransposeOp, preventing the generation of outerproducts.
discussed
[here](https://discourse.llvm.org/t/on-improving-arm-sme-lowering-resilience-in-mlir/78543/17?u=nujaa).

---------

Co-authored-by: Benjamin Maxwell <macdue@dueutil.tech>
Load from null is UB, load from pointer arg instead.
gep nuw can be null if and only if both the base pointer and offset
are null. Unlike the inbounds case this does not depend on whether
the null pointer is valid.

Proofs: https://alive2.llvm.org/ce/z/PLoqK5
We use explicit template instantiation for these classes, so there
is no need to have the definition in the header. The places that
instantiate the method will include the PassManagerImpl.h file.
Internal label names never occur in the symbol table, so when using an
object streamer, there's no point in constructing these names and then
adding them to hash tables -- they are never visible in the output.

It's not possible to reuse createTempSymbol, because on BPF has a
different prefix for globals and basic blocks right now.
…id (llvm#96167)

Without this SelectionDAG could fail assertions when using the intrinsic
in a non-entry BB.
fabio-d and others added 22 commits June 21, 2024 21:45
Namespaces are terminated with a closing comment in the majority of the
codebase so do the same here for consistency. Also format code within
some namespaces to make clang-format happy.
…6342)

Otherwise the startup objects will fail to link since they were cross compiled,
but the linker is not informed of the intent to cross compile, which results in
linker errors when the host architecture does not match the target
architecture.
…lvm#96337)

SubtargetPredicate should be the primary "does this instruction exist"
predicate, with OtherPredicates used for other side pieces of information.

Changes like 856d1c4 were backwards. The problematic usage is how
GFX12 is using HasRestrictedOffset. The multiclasses for buffers
should probably be split up instead of hiding OtherPredicates inside
the buffer atomic multiclasses. The two cases are mutually exclusive
and really need a negated predicate for the not-gfx12 case.

It's pretty terrible we have to manage this in the first place.
TableGen should be able to figure out the required predicates
from any instructions that appear in the pattern output.
…96357)

We only need to set `--target=` for LLD when cross compiling. This should fix
the host build using BFD or targeting the host.

Fixes: llvm#96342
#3) (llvm#93315)

The ThreadLocalCache implementation is used by the MLIRContext (among
other things) to try to manage thread contention in the StorageUniquers.
There is a bunch of fancy shared pointer/weak pointer setups that
basically keeps everything alive across threads at the right time, but a
huge bottleneck is the `weak_ptr::lock` call inside the `::get` method.

This is because the `lock` method has to hit the atomic refcount several
times, and this is bottlenecking performance across many threads.
However, all this is doing is checking whether the storage is
initialized. Importantly, when the `PerThreadInstance` goes out of
scope, it does not remove all of its associated entries from the
thread-local hash map (it contains dangling `PerThreadInstance *` keys).
The `weak_ptr` also allows the thread local cache to synchronize with
the `PerThreadInstance`'s destruction:

1. if `ThreadLocalCache` destructs, the `weak_ptr`s that reference its
contained values are immediately invalidated
2. if `CacheType` destructs within a thread, any entries still live are
removed from the owning `PerThreadInstance`, and it locks the `weak_ptr`
first to ensure it's kept alive long enough for the removal.

This PR changes the TLC entries to contain a `shared_ptr<ValueT*>` and a
`weak_ptr<PerInstanceState>`. It gives the `PerInstanceState` entries a
`weak_ptr<ValueT*>` on top of the `unique_ptr<ValueT>`. This enables
`ThreadLocalCache::get` to check if the value is initialized by
dereferencing the `shared_ptr<ValueT*>` and check if the contained
pointer is null. When `PerInstanceState` destructs, the values inside
the TLC are written to nullptr. The TLC uses the
`weak_ptr<PerInstanceState>` to satisfy (2).

(1) is no longer the case. When `ThreadLocalCache` begins destruction,
the `weak_ptr<PerInstanceState>` are invalidated, but not the
`shared_ptr<ValueT*>`. This is OK: because the overall object is being
destroyed, `::get` cannot get called and because the
`shared_ptr<PerInstanceState>` finishes destruction before freeing the
pointer, it cannot get reallocated to another `ThreadLocalCache` during
destruction. I.e. the values inside the TLC associated with a
`PerInstanceState` cannot be read during destruction. The most important
thing is to make sure destruction of the TLC doesn't race with the
destructor of `PerInstanceState`. Because `PerInstanceState` carries
`weak_ptr` references into the TLC, we guarantee to not have any
use-after-frees.
@mgehre-amd mgehre-amd changed the title [AutoBump] Merge with fixes of 13d983e7 (Jun 17) (82) [AutoBump] Merge with fixes of 9b78ddf3b2ab (Jun 21) (82) Sep 13, 2024
@mgehre-amd mgehre-amd changed the title [AutoBump] Merge with fixes of 9b78ddf3b2ab (Jun 21) (82) [AutoBump] Merge with fixes of 9b78ddf3b2ab (Jun 21, needs torch & onnx bump) (82) Sep 13, 2024
@mgehre-amd mgehre-amd changed the base branch from bump_to_21ba91c4 to feature/fused-ops September 16, 2024 10:59
@mgehre-amd mgehre-amd merged commit b309613 into feature/fused-ops Sep 16, 2024
6 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_13d983e7 branch September 16, 2024 10:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.