
[AutoBump] Merge with 507e59aa (13) #270

Merged
cferry-AMD merged 301 commits into feature/fused-ops from bump_to_507e59aa on Aug 19, 2024

cferry-AMD
Collaborator

No description provided.

paperchalice and others added 30 commits March 23, 2024 12:53
Add Passes in dependency list
Prepare for dag-isel, also migrate some test cases
The zero-value DT_JMPREL is benign but not needed.
This is also a code simplification made possible by https://reviews.llvm.org/D65651
Here we introduce three new GMIR instructions to cover a set of trap
intrinsics. The idea behind this is that generic intrinsics shouldn't be
used with the G_INTRINSIC opcode.

These new instructions match the existing trap ISD nodes exactly.
This allows X86, AArch64, RISCV and Mips to reuse SelectionDAG patterns for
selection and avoid manual selection. AMDGPU, however, is an exception: it
selects traps during legalization, regardless of whether SelectionDAG or
GlobalISel is used.

Since traps are not used in many places, this change attempts to clean up
all uses of G_INTRINSIC with trap intrinsics, so there is no stage at which
both G_TRAP and G_INTRINSIC_W_SIDE_EFFECTS(@llvm.trap) are allowed.
…odules (llvm#85917)

Clang modules take a significant compile time hit when pushing and
popping diagnostics. Since all the headers are marked as system headers
in the modulemap, we can simply disable this pushing and popping when
building with clang modules.
Remove the getSizeOrUnknown call when a MachineMemOperand is created. For a
scalable TypeSize, the resulting MemoryType becomes a scalable_vector.

Two MMOs with scalable memory accesses can then use the updated BasicAA,
which understands scalable LocationSize.

Original Patch by Harvin Iriawan
Co-authored-by: David Green <david.green@arm.com>
```
---------------------------------------------------
Benchmark                           old         new
---------------------------------------------------
bm_mismatch<char>/1           0.835 ns      2.37 ns
bm_mismatch<char>/2            1.44 ns      2.60 ns
bm_mismatch<char>/3            2.06 ns      2.83 ns
bm_mismatch<char>/4            2.60 ns      3.29 ns
bm_mismatch<char>/5            3.15 ns      3.77 ns
bm_mismatch<char>/6            3.82 ns      4.17 ns
bm_mismatch<char>/7            4.29 ns      4.52 ns
bm_mismatch<char>/8            4.78 ns      4.86 ns
bm_mismatch<char>/16           9.06 ns      7.54 ns
bm_mismatch<char>/64           31.7 ns      19.1 ns
bm_mismatch<char>/512           249 ns      8.16 ns
bm_mismatch<char>/4096         1956 ns      44.2 ns
bm_mismatch<char>/32768       15498 ns       501 ns
bm_mismatch<char>/262144     123965 ns      4479 ns
bm_mismatch<char>/1048576    495668 ns     21306 ns
bm_mismatch<short>/1          0.710 ns      2.12 ns
bm_mismatch<short>/2           1.03 ns      2.66 ns
bm_mismatch<short>/3           1.29 ns      3.56 ns
bm_mismatch<short>/4           1.68 ns      4.29 ns
bm_mismatch<short>/5           1.96 ns      5.18 ns
bm_mismatch<short>/6           2.59 ns      5.91 ns
bm_mismatch<short>/7           2.86 ns      6.63 ns
bm_mismatch<short>/8           3.19 ns      7.33 ns
bm_mismatch<short>/16          5.48 ns      13.0 ns
bm_mismatch<short>/64          16.6 ns      4.06 ns
bm_mismatch<short>/512          130 ns      13.8 ns
bm_mismatch<short>/4096         985 ns      93.8 ns
bm_mismatch<short>/32768       7846 ns      1002 ns
bm_mismatch<short>/262144     63217 ns     10637 ns
bm_mismatch<short>/1048576   251782 ns     42471 ns
bm_mismatch<int>/1            0.716 ns      1.91 ns
bm_mismatch<int>/2             1.21 ns      2.49 ns
bm_mismatch<int>/3             1.38 ns      3.46 ns
bm_mismatch<int>/4             1.71 ns      4.04 ns
bm_mismatch<int>/5             2.00 ns      4.98 ns
bm_mismatch<int>/6             2.43 ns      5.67 ns
bm_mismatch<int>/7             3.05 ns      6.38 ns
bm_mismatch<int>/8             3.22 ns      7.09 ns
bm_mismatch<int>/16            5.18 ns      12.8 ns
bm_mismatch<int>/64            16.6 ns      5.28 ns
bm_mismatch<int>/512            129 ns      25.2 ns
bm_mismatch<int>/4096          1009 ns       201 ns
bm_mismatch<int>/32768         7776 ns      2144 ns
bm_mismatch<int>/262144       62371 ns     20551 ns
bm_mismatch<int>/1048576     254750 ns     90097 ns
```
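
The table shows the new `std::mismatch` losing on very short ranges (for example `bm_mismatch<char>/1`: 0.835 ns vs 2.37 ns) but winning by more than an order of magnitude on long ones. As an illustration of why comparing fixed-width chunks pays off on long inputs, here is a minimal word-at-a-time sketch for byte buffers. The name `chunked_mismatch` is hypothetical, and this is not the libc++ implementation, which the benchmark output does not show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: returns the index of the first differing byte,
// or n if the buffers are equal over [0, n).
std::size_t chunked_mismatch(const unsigned char* a, const unsigned char* b,
                             std::size_t n) {
  constexpr std::size_t kChunk = sizeof(std::uint64_t);
  std::size_t i = 0;
  // Compare 8 bytes at a time; memcpy avoids unaligned/aliasing UB.
  for (; i + kChunk <= n; i += kChunk) {
    std::uint64_t wa, wb;
    std::memcpy(&wa, a + i, kChunk);
    std::memcpy(&wb, b + i, kChunk);
    if (wa != wb)
      break;  // the first difference lies somewhere in this chunk
  }
  // Scalar pass: handles leftover bytes and pinpoints the differing byte
  // inside a chunk that compared unequal.
  for (; i < n; ++i)
    if (a[i] != b[i])
      return i;
  return n;
}
```

The extra setup and tail handling that chunking needs is one plausible, unverified explanation for the small-size slowdowns in the table.
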
Use `LINK_COMPONENTS` parameter of `add_llvm_library` rather than
passing LLVM components directly to `target_link_libraries`, in order to
ensure that LLVM dylib is linked correctly when used. Otherwise, CMake
insists on linking to static libraries that aren't present on
distributions doing pure dylib installs, such as Gentoo.

This fixes a regression introduced
in dcbddc2.
If any call to overwriteChangedFiles returns true, some fixes were not
applied.
Commit f44db24 (2015) enabled this
simplification.
MIPS is different and is better off using separate code.
…lvm#84464)

Instead of keeping a mapping of Inst->VPValues (of their corresponding
recipes) in VPlan's Value2VPValue mapping, keep it in VPRecipeBuilder.
Now that the last user of this mapping after initial construction has
recently been replaced, the mapping is only needed during recipe
construction (to map IR operands to VPValue operands).

By moving the mapping, VPlan's VPValue tracking can be simplified and
limited only to live-ins. It also allows removing disableValue2VPValue
and associated machinery & asserts.
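
A minimal sketch of the ownership split described above, using toy types: `Instr`, `VPVal`, `Plan` and `RecipeBuilder` are hypothetical stand-ins, not the real VPlan classes. The builder owns the Inst-to-VPValue map while recipes are created, and the plan only remembers live-ins.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for IR instructions and VPValues.
struct Instr {
  std::string Name;
  std::vector<const Instr*> Operands;
};
struct VPVal {
  std::string Name;
};

// The plan owns every VPValue but only maps IR values for live-ins.
struct Plan {
  std::vector<std::unique_ptr<VPVal>> Storage;
  std::unordered_map<const Instr*, VPVal*> LiveIns;

  VPVal* make(const std::string& Name) {
    Storage.push_back(std::unique_ptr<VPVal>(new VPVal{Name}));
    return Storage.back().get();
  }
  VPVal* getOrAddLiveIn(const Instr* I) {
    auto It = LiveIns.find(I);
    if (It != LiveIns.end())
      return It->second;
    VPVal* V = make("ir<" + I->Name + ">");
    LiveIns[I] = V;
    return V;
  }
};

// The builder holds the Inst -> VPValue map, needed only while recipes
// are being constructed to turn IR operands into VPValue operands.
struct RecipeBuilder {
  Plan& P;
  std::unordered_map<const Instr*, VPVal*> Inst2VPValue;

  VPVal* getVPValueOrAddLiveIn(const Instr* I) {
    auto It = Inst2VPValue.find(I);
    return It != Inst2VPValue.end() ? It->second : P.getOrAddLiveIn(I);
  }

  // Creates a "recipe" for I: maps its operands, then records its result.
  std::vector<VPVal*> buildRecipe(const Instr* I) {
    std::vector<VPVal*> Ops;
    for (const Instr* Op : I->Operands)
      Ops.push_back(getVPValueOrAddLiveIn(Op));
    Inst2VPValue[I] = P.make("vp<" + I->Name + ">");
    return Ops;
  }
};
```

Once construction is done, the `RecipeBuilder` and its map can be discarded, leaving the plan with only its live-in tracking.
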

PR: llvm#84464
…ion (llvm#86386)

Move the code adding top-level cmake/Modules directory to
CMAKE_MODULE_PATH prior to including `GetDarwinLinkerVersion`, in order
to fix standalone builds.

Fixes a regression introduced by
3bc71c2.
Hide the implementations of `FuncHashes` and `BBHashMap` classes,
getting rid of `at` accessors that could throw an exception.

Test Plan: NFC

Reviewers: ayermolo, maksfb, dcci, rafaelauler

Reviewed By: rafaelauler

Pull Request: llvm#86353
Attach branch counters to YAML profile, covering intra-function control
flow.

Depends on: llvm#86353

Test Plan: Updated bolt/test/X86/bolt-address-translation-yaml.test

Reviewers: rafaelauler, dcci, ayermolo, maksfb

Reviewed By: rafaelauler

Pull Request: llvm#76911
…non-splat vector SREM expansion when we aren't hitting the special case. (llvm#86238)

Fixes llvm#84830
Introduced in llvm#82706
The indexed MemProf file has a huge amount of redundancy.  In a large
internal application, 82% of call stacks, stored in
IndexedAllocationInfo::CallStack, are duplicates.

We should work toward deduplicating call stacks by referring to them
by unique IDs, with the actual call stacks stored in a separate data
structure, much like we refer to memprof::Frame with memprof::FrameId.

At the same time, we need to facilitate a graceful transition from the
current version of the MemProf format to the next. We should be able
to read (but not write) the current version of the MemProf file even
after we move on to the next one.

With those goals in mind, I propose to have an integer ID next to
CallStack in IndexedAllocationInfo to refer to a call stack in a
succinct manner.  We'll gradually increase the areas of the compiler
where IDs and call stacks have one-to-one correspondence and
eventually remove the existing CallStack field.

This patch adds call stack ID, named CSId, to IndexedAllocationInfo
and teaches the raw profile reader to compute unique call stack IDs
and store them in the new field.  It does not introduce any user of
the call stack IDs yet, except in verifyFunctionProfileData.
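
To illustrate the deduplication idea, the sketch below interns each distinct call stack once and hands out a compact ID, while a transitional record keeps both the full call stack and its ID. `CallStackTable`, `AllocInfo` and `verifyAlloc` are hypothetical names, and sequential IDs are an assumption made for simplicity; the description above does not say how the patch computes its IDs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using FrameId = std::uint64_t;
using CallStackId = std::uint64_t;  // plays the role of CSId

// Stores each distinct call stack once and hands out compact IDs.
class CallStackTable {
 public:
  CallStackId intern(const std::vector<FrameId>& Stack) {
    auto Result = Ids.try_emplace(Stack, Stacks.size());
    if (Result.second)  // first time this stack has been seen
      Stacks.push_back(Stack);
    return Result.first->second;
  }
  const std::vector<FrameId>& lookup(CallStackId Id) const {
    return Stacks.at(Id);
  }
  std::size_t size() const { return Stacks.size(); }

 private:
  std::map<std::vector<FrameId>, CallStackId> Ids;
  std::vector<std::vector<FrameId>> Stacks;
};

// Transitional record: the full CallStack kept next to its ID.
struct AllocInfo {
  std::vector<FrameId> CallStack;
  CallStackId CSId;
};

// In the spirit of the check in verifyFunctionProfileData: the ID must
// refer to the same frames as the CallStack stored next to it.
bool verifyAlloc(const CallStackTable& T, const AllocInfo& A) {
  return A.CSId < T.size() && T.lookup(A.CSId) == A.CallStack;
}
```
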
This commit adds the `BufferViewFlowOpInterface` to the bufferization
dialect. This interface can be implemented by ops that operate on
buffers to indicate that a buffer op result and/or region entry block
argument may be the same buffer as a buffer operand (or a view thereof).
This interface is queried by the `BufferViewFlowAnalysis`.

The new interface has two interface methods:
* `populateDependencies`: Implementations use the provided callback to
declare dependencies between operands and op results/region entry block
arguments. E.g., for `%r = arith.select %c, %m1, %m2 : memref<5xf32>`,
the interface implementation should declare two dependencies: %m1 -> %r
and %m2 -> %r.
* `mayBeTerminalBuffer`: An SSA value is a terminal buffer if the buffer
view flow analysis stops at the specified value. E.g., because the value
is a newly allocated buffer or because no further information is
available about the origin of the buffer.
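
A toy model of the two interface methods, assuming SSA values are plain strings: `SelectOp` registers the `%m1 -> %r` and `%m2 -> %r` dependencies through a callback, and the backward walk stops at values marked as terminal buffers. `ViewFlow`, `resolve`, `origins` and `registrar` are hypothetical names, not the MLIR API.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <vector>

using Value = std::string;  // SSA names stand in for mlir::Value
using RegisterDependency =
    std::function<void(const Value& From, const Value& To)>;

// Hypothetical op modelled on `%r = arith.select %c, %m1, %m2`.
struct SelectOp {
  Value Cond, TrueValue, FalseValue, Result;
  // Like populateDependencies: either operand may flow into the result.
  void populateDependencies(const RegisterDependency& Register) const {
    Register(TrueValue, Result);
    Register(FalseValue, Result);
  }
};

struct ViewFlow {
  std::map<Value, std::set<Value>> Forward;   // operand -> dependent values
  std::map<Value, std::set<Value>> Backward;  // dependent value -> operands
  std::set<Value> Terminal;                   // where the backward walk stops

  RegisterDependency registrar() {
    return [this](const Value& From, const Value& To) {
      Forward[From].insert(To);
      Backward[To].insert(From);
    };
  }

  // Every value that may be V or a view of it.
  std::set<Value> resolve(const Value& V) const {
    std::set<Value> Seen;
    std::vector<Value> Work{V};
    while (!Work.empty()) {
      Value Cur = Work.back();
      Work.pop_back();
      if (!Seen.insert(Cur).second)
        continue;
      auto It = Forward.find(Cur);
      if (It != Forward.end())
        Work.insert(Work.end(), It->second.begin(), It->second.end());
    }
    return Seen;
  }

  // The buffers V may originate from, stopping at terminal buffers.
  std::set<Value> origins(const Value& V) const {
    std::set<Value> Result, Seen;
    std::vector<Value> Work{V};
    while (!Work.empty()) {
      Value Cur = Work.back();
      Work.pop_back();
      if (!Seen.insert(Cur).second)
        continue;
      auto It = Backward.find(Cur);
      if (Terminal.count(Cur) || It == Backward.end()) {
        Result.insert(Cur);  // no further information about the origin
        continue;
      }
      Work.insert(Work.end(), It->second.begin(), It->second.end());
    }
    return Result;
  }
};
```

Because the taken branch of the select is unknown, both `%m1` and `%m2` remain possible origins, which is the "maybe" behavior the description mentions.
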

Ops that implement the `RegionBranchOpInterface` or `BranchOpInterface`
do not have to implement the `BufferViewFlowOpInterface`. The buffer
dependencies can be inferred from those two interfaces.

This commit makes the `BufferViewFlowAnalysis` more accurate. For
unknown ops, it used to conservatively declare all combinations of
operands and op results/region entry block arguments as dependencies
(false positives). This is no longer the case. While the analysis is
still a "maybe" analysis with false positives (e.g., when analyzing ops
such as `arith.select` or `scf.if` where the taken branch is not known
at compile time), results and region entry block arguments of unknown
ops are now marked as terminal buffers.

This commit addresses a TODO in `BufferViewFlowAnalysis.cpp`:
```
// TODO: We should have an op interface instead of a hard-coded list of
// interfaces/ops.
```
Ops no longer need to be hard-coded.
…lvm#86416)

Reverting llvm#85188 along with its follow-up patches.

This reverts commit 362d263.
This reverts commit c9bdeab.
This reverts commit 6bc6e1a.
This reverts commit 01fa550.
This reverts commit ddcbab3.
RKSimon and others added 25 commits March 26, 2024 10:43
Fixes llvm#83561.

When a thread is blocked on a mutex and we send an async signal to that
thread, the signal never arrives because tsan thinks that
`pthread_mutex_lock` is not a blocking function. This patch marks
`pthread_*_lock` functions as blocking so we can successfully deliver
async signals like `SIGPROF` when the thread is blocked on them.
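
A simplified model of the mechanism, assuming (as this description implies) that tsan defers async signals unless the thread is inside a call it knows to be blocking. `ThreadState`, `ScopedBlockingCall` and the flush-on-entry behavior are hypothetical simplifications, not tsan's actual code. The point is that a signal sent while a thread waits in an unmarked lock stays pending, whereas marking the lock as blocking lets it be delivered.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-thread state.
struct ThreadState {
  bool InBlockingCall = false;
  std::vector<int> Pending;    // signals deferred until a safe point
  std::vector<int> Delivered;  // signals handed to the user's handler

  // Called when an async signal (e.g. SIGPROF) arrives for this thread.
  void onAsyncSignal(int Sig) {
    if (InBlockingCall)
      Delivered.push_back(Sig);  // safe to run the handler right away
    else
      Pending.push_back(Sig);    // defer until a safe point
  }

  void processPending() {
    Delivered.insert(Delivered.end(), Pending.begin(), Pending.end());
    Pending.clear();
  }
};

// RAII marker for the duration of a blocking call such as
// pthread_mutex_lock.
class ScopedBlockingCall {
 public:
  explicit ScopedBlockingCall(ThreadState& State) : TS(State) {
    TS.processPending();  // flush signals that arrived before blocking
    TS.InBlockingCall = true;
  }
  ~ScopedBlockingCall() { TS.InBlockingCall = false; }
  ScopedBlockingCall(const ScopedBlockingCall&) = delete;
  ScopedBlockingCall& operator=(const ScopedBlockingCall&) = delete;

 private:
  ThreadState& TS;
};
```
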

See the issue also for more details. I also added a test, which is a
simplified version of the compiler explorer example I posted in the
issue.

Please let me know if you have any other ideas or things to improve!
Happy to work on them.

Also I filed llvm#83844 which is more tricky because we don't have a libc
wrapper for `SYS_futex`. I'm not sure how to intercept this yet. Please
let me know if you have ideas on that as well. Thanks!
…tsan (llvm#86537)

Fixes llvm#83844.

This PR adds callbacks to mark futex syscalls as blocking. Unfortunately,
there was previously no mechanism for marking syscalls as blocking calls,
so I had to implement one, but it mostly reuses the `BlockingCall`
implementation
[here](https://github.com/llvm/llvm-project/blob/96819daa3d095cf9f662e0229dc82eaaa25480e8/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp#L362-L380).

The issue includes more information; the problem was discovered
because Rust uses futexes directly, so Rust will most likely need to be
updated to use these callbacks as well.

Also see the latest comments in llvm#85188 for some context.
I also sent another PR llvm#84162 to mark `pthread_*_lock` calls as
blocking.
The latest ACLE allows it and further clarifies the following
with regard to the combination of the two attributes:

"If the `default` matches with another explicitly provided
 version in the same translation unit, then the compiler can
 emit only one function instead of the two. The explicitly
 provided version shall be preferred."

("default" refers to the default clone here)

ARM-software/acle#310
…aming functions. (llvm#85388)

Similar to how we protected FP/fixed-vector arguments and results from
calls, we should do the same for arguments/results from locally-streaming
functions such that those are not spilled/filled as ZPR registers.

This may cause a small regression (additional spills/fills), which is
addressed by llvm#85386.
…86535)

Testing with MSVC link.exe showed that it respects such options, while
LLD currently discards them.
…es (llvm#86595)

Summary:
We have a plugin singleton that implements the Plugin interface. This
then spawns separate devices and kernels. Previously, when these needed to
reach into the global singleton, they would use the `PluginTy::get`
routine to get access to it. In the future we will move away from this
as the lifetime of the plugin will be handled by `libomptarget`
directly. This patch removes uses of this inside of the plugin
implementations themselves by simply keeping a reference to the plugin
inside of the device.

The external `__tgt_rtl` functions still use the global method, but will
be removed later.
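
A minimal sketch of the pattern, with hypothetical `Plugin` and `Device` types: each device stores a reference to the plugin that created it, so device code no longer needs a global accessor such as `PluginTy::get`. A side effect of dropping the singleton lookup is that two plugin instances can coexist.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Plugin;  // forward declaration so Device can refer to it

// A device keeps a reference to the plugin that created it instead of
// reaching for a global singleton.
struct Device {
  Plugin& P;
  int Id;
  Device(Plugin& Owner, int DeviceId) : P(Owner), Id(DeviceId) {}
  std::string pluginName() const;  // defined once Plugin is complete
};

struct Plugin {
  std::string Name;
  std::vector<std::unique_ptr<Device>> Devices;

  explicit Plugin(std::string N) : Name(std::move(N)) {}

  Device& createDevice() {
    int NextId = static_cast<int>(Devices.size());
    Devices.push_back(std::make_unique<Device>(*this, NextId));
    return *Devices.back();
  }
};

std::string Device::pluginName() const { return P.Name; }
```
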
…ed .def files. (llvm#86564)

It's similar to llvm#86535, but for exports specified in .def files.
llvm#86486)

This attribute tells the compiler that the variable must have its exit-time
destructor run, so it makes sense that it would silence the warning telling
users that an exit-time destructor is required.

Fixes llvm#68686
```
llvm-project/clang/lib/Sema/SemaDecl.cpp:11653:20:
error: unused variable 'OldMVKind' [-Werror,-Wunused-variable]
  MultiVersionKind OldMVKind = OldFD->getMultiVersionKind();
                   ^
1 error generated.
```
This patch removes APIs for creating NUW neg. It is a trivial case
because `sub nuw 0, X` always gets simplified to zero.
I believe there are no optimization opportunities in real-world
applications that could take advantage of the nuw flag.
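
The claim that `sub nuw 0, X` always folds to zero follows from the flag's semantics: `nuw` makes any unsigned wrap produce poison, and `0 - X` wraps for every nonzero X. The exhaustive 8-bit check below is a standalone illustration (not LLVM code; `NegCase` and `definedNuwNegCases` are hypothetical names) confirming that X = 0 is the only input for which the operation is defined, and that it then yields 0.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct NegCase {
  unsigned X;           // the operand of `sub nuw 0, X`
  std::uint8_t Result;  // the 8-bit result when the operation is defined
};

// Exhaustively checks i8: collects every X for which `0 - X` does not wrap
// in unsigned arithmetic. With `nuw`, any wrap produces poison.
std::vector<NegCase> definedNuwNegCases() {
  std::vector<NegCase> Defined;
  for (unsigned X = 0; X <= UINT8_MAX; ++X) {
    int Exact = 0 - static_cast<int>(X);  // mathematically exact difference
    if (Exact < 0)
      continue;  // negative result means an unsigned wrap: poison
    Defined.push_back({X, static_cast<std::uint8_t>(Exact)});
  }
  return Defined;
}
```
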

Motivated by
llvm#84792 (comment).

Compile-time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=d1f182c895728d89c5c3d198b133e212a5d9d4a3&to=da7b7478b7cbb32c09d760f6b8d0e67901e0d533&stat=instructions:u
Fixed the printing of templated argument list and added test case.
…NFC. (llvm#86259)

This patch passes APInt by const reference in m_SpecificInt instead of
by value. Specifically, it refactors `m_SpecificInt(uint64_t V)` to
avoid APInt construction and dangling reference.

I believe it is safe to pass the APInt by const reference into
`m_SpecificInt` even if it is a temporary.
See also https://en.cppreference.com/w/cpp/language/lifetime
> All temporary objects are destroyed as the last step in evaluating the
[full-expression](https://en.cppreference.com/w/cpp/language/expressions#Full-expressions)
that (lexically) contains the point where they were created
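
A small self-contained illustration of that lifetime rule, using a hypothetical `BigInt` stand-in for APInt and a hypothetical matcher that stores a const reference. The temporary passed to `m_specific` stays alive until the end of the full-expression that contains the `matchValue` call, so the reference is valid while matching; storing the matcher in a variable for later use would leave it dangling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for APInt that records whether it is still alive.
struct BigInt {
  std::uint64_t V;
  bool* Alive;
  BigInt(std::uint64_t Val, bool* AliveFlag) : V(Val), Alive(AliveFlag) {
    *Alive = true;
  }
  ~BigInt() { *Alive = false; }
  BigInt(const BigInt&) = delete;
  BigInt& operator=(const BigInt&) = delete;
};

// Matcher holding a const reference, like a by-reference m_SpecificInt.
struct SpecificIntMatcher {
  const BigInt& Expected;
  bool match(std::uint64_t Actual) const {
    assert(*Expected.Alive && "referenced temporary already destroyed");
    return Actual == Expected.V;
  }
};

SpecificIntMatcher m_specific(const BigInt& I) {
  return SpecificIntMatcher{I};
}

template <typename MatcherT>
bool matchValue(std::uint64_t Actual, const MatcherT& M) {
  return M.match(Actual);
}
```
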

Compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=d1f182c895728d89c5c3d198b133e212a5d9d4a3&to=7edf459b95ab2be33b70ec67faf87b3b8cc84f09&stat=instructions:u
…5911)

Currently patchpoints can only have two result types, `void` and `i64`.
This limits the result to general purpose registers.
This patch makes `patchpoint.i64` an overloadable intrinsic, allowing
result values that can fit in a single register (e.g. integers,
pointers, floats).
Need to include the initial sext/zext/trunc nodes in the list of demoted
root values to correctly calculate the cost and handle the
vectorization.
Added a new variant of the CHECK() function that takes a custom message
as a parameter. This allows more meaningful error messages when
the compiler is expected to crash.
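
A sketch of what a message-taking overload can look like. The `check` functions and the replaceable `failureHandler` hook are hypothetical (the hook only exists so the example can be exercised without aborting); in practice a macro would usually supply `__FILE__` and `__LINE__`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

using FailureHandler = void (*)(const std::string& Message);

// Hypothetical hook; the default prints the message and aborts.
inline FailureHandler& failureHandler() {
  static FailureHandler Handler = [](const std::string& Message) {
    std::fprintf(stderr, "%s\n", Message.c_str());
    std::abort();
  };
  return Handler;
}

inline std::string location(const char* File, int Line) {
  return std::string("CHECK failed at ") + File + ":" + std::to_string(Line);
}

// Original form: reports only where the check failed.
inline void check(bool Cond, const char* File, int Line) {
  if (!Cond)
    failureHandler()(location(File, Line));
}

// New variant: appends a caller-supplied message.
inline void check(bool Cond, const std::string& Message, const char* File,
                  int Line) {
  if (!Cond)
    failureHandler()(location(File, Line) + ": " + Message);
}
```
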

Fixes llvm#78931
…lvm#84864)

This change:
- Updates the existing Clang User's Manual section on SPGO so that it
describes how to use llvm-profgen to perform SPGO on Windows. This is
new functionality implemented in llvm#83972.
- Fixes a minor typo in the existing llvm-profgen invocation example.
- Adds an LLVM release note on this new functionality in llvm-profgen.
…#86655)

This matches the CMake targets and reduces the number of headers that
need to be included in multiple targets.
…5378)

Using remove() on DeclContext::lookup_result list invalidates iterators.

This assertion failure was one (fortunate) symptom:
```
clang/include/clang/AST/DeclBase.h:1337: reference clang::DeclListNode::iterator::operator*() const: Assertion `Ptr && "dereferencing end() iterator"' failed.
```
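
The general hazard and a common fix, shown with a `std::vector` stand-in rather than clang's `DeclContext::lookup_result`: erasing while iterating invalidates the loop's iterator, whereas collecting the entries first and removing them afterwards is safe. `DeclList` and `removeIf` are hypothetical names, and this is not necessarily how the patch fixed the problem.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a lookup result list; erasing from it invalidates iterators.
using DeclList = std::vector<std::string>;

// Unsafe pattern (do not use): erasing inside the range-for invalidates the
// iterator the loop is still using.
//   for (const auto& D : Decls)
//     if (ShouldRemove(D)) Decls.erase(std::find(...));

// Safe pattern: first collect the entries to remove, then remove them.
template <typename Pred>
void removeIf(DeclList& Decls, Pred ShouldRemove) {
  std::vector<std::string> ToRemove;
  for (const auto& D : Decls)
    if (ShouldRemove(D))
      ToRemove.push_back(D);
  for (const auto& D : ToRemove)
    Decls.erase(std::find(Decls.begin(), Decls.end(), D));
}
```
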
…ich are needed to authenticate signed pointers (llvm#67454)" (llvm#86674)

This reverts commit 8bd1f91.

It appears that the commit broke msan bots.
These tests show invalid tbaa.struct metadata that is currently accepted
in preparation for a change to the IR Verifier that will then reject it.

PR: llvm#86167
cferry-AMD requested a review from cmcgirr-amd August 16, 2024 12:16
Base automatically changed from bump_to_72c729f3 to feature/fused-ops August 19, 2024 06:35
cferry-AMD merged commit 647a14d into feature/fused-ops Aug 19, 2024
9 checks passed
cferry-AMD deleted the bump_to_507e59aa branch August 19, 2024 07:21