[AutoBump] Merge with fixes of 2d50029f (Aug 15, needs torch-mlir bump) (5) #358

Merged
merged 445 commits into from
Nov 29, 2024

Conversation

mgehre-amd
Collaborator

@mgehre-amd mgehre-amd commented Sep 20, 2024

stefankoncarevic and others added 30 commits August 16, 2024 11:19
Defined an AMDGPU DPP operation in MLIR to represent its semantics. Introduced
a new enumeration attribute for different permutations and allowed for
different types of arguments. Implemented constant attribute handling
for ROCDL::DPPMovOp operation. The operation now correctly accepts
constant attributes for dppCtrl, rowMask, bankMask, boundCtrl, and
passes them to the corresponding LLVM intrinsic.
… a few places. (llvm#104555)

PR llvm#80309 proposes to have users of APInt's uint64_t
constructor opt in to implicit truncation. Currently, that patch
requires SelectionDAG::getConstant to opt in.

This patch adds getSignedConstant so we can start fixing some of the
cases that require implicit truncation.
This PR is a continuation of the [previous
one](llvm#101478). As a result, the
`emitc::SwitchOp` op was developed, inspired by `scf::IndexSwitchOp`.

Main points of PR:

- Added the `emitc::SwitchOp` op to the EmitC dialect + CppEmitter
- Corresponding tests were added
- Conversion from the SCF dialect to the EmitC dialect for the op
CodeGenIntrinsic changes:
  - Use `const` Record pointers, and `StringRef` when possible.
  - Default initialize several fields with their definition instead of in
    the constructor.
  - Simplify various string checks in the constructor using StringRef
    starts_with()/ends_with() functions.
  - Eliminate first argument to `setDefaultProperties` and use `TheDef`
    class member instead.

IntrinsicEmitter changes:
  - Emit `namespace llvm::Intrinsic` instead of nested namespaces.
  - End generated comments with a `.`
  - Use range based for loops, and early continue within loops.
  - Emit `static constexpr` instead of `static const` for arrays.
  - Change `compareFnAttributes` to use std::tie() to compare intrinsic
    attributes and return a default value when all attributes are equal.

STLExtras:
  - Add std::replace wrapper which takes a range.
…eGen/bit-int-ubsan.c (llvm#104607)

Add the missing `-triple x86_64-pc-linux-gnu` to the RUN line, where it should have been.

---------

Co-authored-by: Eänolituri Lómitaurë <vladislav.aranov@ericsson.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: Paul Kirth <paulkirth@google.com>
Co-authored-by: Vitaly Buka <vitalybuka@gmail.com>
Pass a flag to the `PGOInstrumentationGen` pass indicating whether it needs to produce contextual profiling instrumentation, and in the process restructure the places that need to be aware of that (some were unnecessarily in `PGOInstrumentationUse`).
llvm#100367)

This is split off from llvm#71764, and moves only the vmv.v.v part of
performCombineVMergeAndVOps to work on MachineInstrs.

In retrospect trying to handle PseudoVMV_V_V and PseudoVMERGE_VVM in the
same function makes the code quite hard to read, so this just does it in
a separate peephole.

This turns out to be simpler since for PseudoVMV_V_V we don't need to
convert the Src instruction to a masked variant, and we don't need to
create a fake all ones mask.
This patch implements sandboxir::AtomicRMWInst mirroring
llvm::AtomicRMWInst.
Old Headergen needed extra build rules to ensure that it worked in
runtimes mode. This patch disables those checks if new headergen is
enabled. Also some new headers were not being properly built with
new headergen, and that's also fixed.
Similar to llvm#104481. Replace
more "Utility" dependencies with "UtilityHeaders" to avoid cyclic
dependency when building on macOS.
An HLSL function has internal linkage by default unless it is:
1. a shader entry point function
2. marked with the `export` keyword (llvm#92812)
3. a patch constant function (not implemented yet)

This PR adds a link-time pass `DXILFinalizeLinkage` that updates the
linkage of functions to make sure only shader entry points and exported
functions are visible from the module (have _program linkage_). All
other functions will be updated to have internal linkage.
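
The linkage rule described above can be sketched roughly as follows. This is only an illustration in Python, not the `DXILFinalizeLinkage` implementation: the `Function` record, its field names, and the `"external"` string (standing in for program linkage) are all hypothetical, and patch constant functions are left out because the message says they are not implemented yet.

```python
from dataclasses import dataclass


@dataclass
class Function:
    # Hypothetical stand-in for a function in the module.
    name: str
    is_entry_point: bool = False
    is_exported: bool = False
    linkage: str = "external"  # stand-in for "program linkage"


def finalize_linkage(functions):
    """Give internal linkage to everything that is neither a shader
    entry point nor marked with `export`."""
    for fn in functions:
        if not (fn.is_entry_point or fn.is_exported):
            fn.linkage = "internal"
```

With an entry point, an unmarked helper, and an exported function, only the helper would end up internal.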

Related spec update: microsoft/hlsl-specs#295

Fixes llvm#92071
This reverts commit e592c2d.

We can finally reland the PR since the issue that caused the PR to be
reverted has been resolved in
llvm#104051.
This allows annotating fields of C/C++ structs using API Notes.

Previously API Notes supported Objective-C properties, but not fields.

rdar://131548377
…vm#102986)

When a test case inside of a gtest suite fails, we report a failure
which causes the entire `ninja check-lldb` invocation to fail, even if
the outer test case is marked as XFAIL - each test case result is
reported as its own lit test run. This PR updates lit so it checks
whether each test case's parent test suite is XFAIL before setting the
status to FAIL.
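
As a rough sketch of the check described above (not lit's actual code): the function name and the status strings are hypothetical, and reporting the child as XFAIL when its parent suite is expected to fail is an assumption about the resulting status.

```python
def report_status(child_failed: bool, parent_suite_is_xfail: bool) -> str:
    """Decide the status of one gtest test case reported as its own lit test."""
    if not child_failed:
        return "PASS"
    # Assumption: a failing child inherits the expected-failure marking
    # of its parent suite instead of being reported as a plain FAIL.
    return "XFAIL" if parent_suite_is_xfail else "FAIL"
```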

This is especially problematic because the failing tests can't manually
be marked as XFAIL, due to
llvm#102264.

Fixes llvm#102265

### Repro instructions

1. Modify any gtest test case to generate a failure.
2. Mark the outer lit test with XFAIL using either `--xfail-tests` flag
or `LIT_XFAIL` env var.
3. Run the tests
4. Observe the lit test is XFAIL as expected, but the failed child test
cases show up as separate failures.

Co-authored-by: kendal <kendal@thebrowser.company>
…llvm#104519)

This patch makes `-objc_relative_method_lists` the default on macOS
10.16+/iOS 14+. A manual override still works if the command-line argument is
provided.

To test this change, many explicit arguments are removed from the test
files. Some explicit `-objc_no_objc_relative_method_lists` arguments are also added
for tests that don't support this yet.

This commit tries to revive llvm#101360, which exposed a bug that broke CI.
llvm#104081 has fixed that bug.
This feature provided CPM_IOACC_CTL_EL3, a lone system register that has
been carried over since the original ARM64 implementation, where it was
the only processor-specific register in a long list of architectural
sysregs. We don't need it here.

It's been used as a generic processor-specific sysreg in tests, but the
functionality they target is now better covered in other more exhaustive
tests.
This analysis can't be used with other analyses if this isn't set.

Pull Request: llvm#104244
…or buffers" (llvm#104517)

Some build configs allow `llvm_unreachable` in a constexpr context, but
not all, so these functions that map a fully covered enum to a string
can't be constexpr. This version fixes that by dropping constexpr from
those functions.

This reverts commit fcc318f, reapplying
28d577e.

Original message follows:

This implements the DXILResourceAnalysis pass for `dx.TypedBuffer` and
`dx.RawBuffer` types. This should be sufficient to lower
`dx.handle.fromBinding` for this set of types, but it leaves a number of
TODOs around for other resource types.

This also includes a straightforward `print` method in `ResourceInfo` to
make the analysis testable. This is deliberately different than the
printer in `lib/Target/DirectX/DXILResource.cpp`, which attempts to
print bindings in a format compatible with the comments `dxc` prints. We
will eventually want to make that functionality driven by this analysis
pass, but it isn't sufficient for testing so we need both.
…es to be treated as loads (llvm#99999)

This change avoids deleting `!willReturn` intrinsics for which the
return value is unused when building the SDAG. Currently, calls to
read-only intrinsics not marked with `IntrWillReturn` cannot be deleted
at the LLVM IR level but may be deleted when building the SDAG.
These calls are unsafe to remove from the IR because the functions are
`!willReturn` and should also be unsafe to remove from the SDAG for
the same reason. This change aligns the behavior of the SDAG to that
of LLVM IR. This change also requires that intrinsics not have the
`Throws` attribute to be treated as loads for the same reason.
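
The condition described above can be summarized as a small predicate. This is an illustrative Python sketch, not the SelectionDAG builder code; the function and parameter names are hypothetical and simply mirror the three properties the message mentions.

```python
def can_treat_as_load(read_only: bool, will_return: bool, throws: bool) -> bool:
    """A read-only intrinsic call is handled like a load (and so may be
    dropped when its result is unused) only if it is known to return
    and is not marked as throwing."""
    return read_only and will_return and not throws
```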
Summary:
This used an old name that I forgot to fix. The linter didn't catch it
because it was behind an `ifdef`, and I forgot to update the branch I
landed to match the one I tested.
Some new headers were not being properly built with
new headergen, since they were using the old "add_gen_header" instead of
the new "add_header_macro". This patch fixes the issue.
…4613)

Flang switches to cc1 when we use `-x cuda`. Make sure we can use fc1
with CUDA Fortran input.

The current pipeline will fail at MLIR level for the moment. 

llvm#104483
This adds MachO support for emission of authenticated pointer
relocations.

We already support AArch64AuthMCExpr, to represent assembly expressions
such as:
  .quad <symbol>@AUTH(<key>, <discriminator> [, addr])
For example:
  .quad _g3@AUTH(ib, 1234, addr)

These @AUTH expressions lower to a new kind of MachO relocation:
  ARM64_RELOC_AUTHENTICATED_POINTER (11)

The relocation points to the referenced symbol.
The other data, describing the signing scheme and original addend
(only 32 bits instead of 64), is encoded into the addend (in the
relocated location):

  |63|62|61-51|50-49|  48  |47     -     32|31  -  0|
  | 1| 0|  0  | key | addr | discriminator | addend |
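
To make the bit layout above concrete, here is a small Python sketch that packs and unpacks the 64-bit value. The helper names are hypothetical, the bit positions are taken from the table, and treating the 32-bit addend as a signed value is an assumption. The usage note also assumes that key `ib` is numbered 1, which the text above does not state.

```python
def encode_auth_addend(key: int, addr: bool, discriminator: int, addend: int) -> int:
    """Pack the fields into the 64-bit value stored at the relocated location."""
    if not 0 <= key <= 0b11:
        raise ValueError("key must fit in 2 bits")
    if not 0 <= discriminator <= 0xFFFF:
        raise ValueError("discriminator must fit in 16 bits")
    if not -(1 << 31) <= addend < (1 << 31):
        raise ValueError("addend must fit in 32 bits (assumed signed)")
    value = 1 << 63                   # bit 63 is set; bit 62 and bits 61-51 stay 0
    value |= key << 49                # bits 50-49
    value |= int(addr) << 48          # bit 48: address diversity
    value |= discriminator << 32      # bits 47-32
    value |= addend & 0xFFFFFFFF      # bits 31-0
    return value


def decode_auth_addend(value: int):
    """Inverse of encode_auth_addend: returns (key, addr, discriminator, addend)."""
    key = (value >> 49) & 0b11
    addr = bool((value >> 48) & 1)
    discriminator = (value >> 32) & 0xFFFF
    addend = value & 0xFFFFFFFF
    if addend >= 1 << 31:             # sign-extend (assumption: addend is signed)
        addend -= 1 << 32
    return key, addr, discriminator, addend
```

Under those assumptions, `_g3@AUTH(ib, 1234, addr)` with a zero addend would correspond to `encode_auth_addend(key=1, addr=True, discriminator=1234, addend=0)`.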
…lvm#94059)

This patch prevents thread-local constants from being merged within
PPCMergeStringPool.cpp.

The PPCMergeStringPool pass primarily merges non-thread-local constants
together, and thread-local constants should not be mixed together with
other (non-thread-local) constants. In the event that thread-local and
other non-thread-local constants are pooled together, the
llvm.threadlocal.address intrinsic can fail as it expects its argument
to be a thread-local global value, but the merged string structure
created by the PPCMergeStringPool pass is not thread-local as a whole.
…98764)

Implement VPWidenRecipe::computeCost for most cases (except
UDiv, SDiv, URem, SRem which require additional logic).

Note that this specializes `::computeCost` instead of `::cost`, as
`VPRecipeBase::cost` is responsible for skipping cost-computations
for pre-computed recipes for now.

The most recent version of the VPlan-based cost model introduction
was committed on Jul 10 (b841e2e) and we should
probably give it at least a week in case additional mismatches surface.

PR: llvm#98764
jurahul and others added 19 commits August 19, 2024 15:31
- When an unterminated open { is detected in the format string, instead
of asserting and ignoring the error, replace that string with another to
indicate the error, and remove the assert as well.
- This will make the error evident in both assert and release builds and
make observing the error more convenient (as several uses of this
function are in TableGen and it is often built in release mode even in
debug builds)
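
A minimal sketch of the behavior described above, written in Python rather than taken from the LLVM sources: the function name, the piece representation, and the error text are all hypothetical, and replacing the whole format string (rather than only the offending part) is one reading of the description.

```python
# Hypothetical error text; the real message is not given in the description.
ERROR_TEXT = "<unterminated brace in format string>"


def split_format(fmt: str):
    """Split a format string into ("literal", text) and ("field", spec) pieces.

    "{{" is an escaped brace. An opening "{" with no closing "}" makes the
    result a single literal carrying the error text, so the problem is
    visible in the output instead of tripping an assert.
    """
    pieces = []
    i = 0
    while i < len(fmt):
        if fmt.startswith("{{", i):
            pieces.append(("literal", "{"))
            i += 2
        elif fmt[i] == "{":
            close = fmt.find("}", i)
            if close == -1:
                return [("literal", ERROR_TEXT)]
            pieces.append(("field", fmt[i + 1:close]))
            i = close + 1
        else:
            nxt = fmt.find("{", i)
            if nxt == -1:
                nxt = len(fmt)
            pieces.append(("literal", fmt[i:nxt]))
            i = nxt
    return pieces
```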
…KnownNonEqual`; NFC

Downstream hit this assert. Since it doesn't really make any
difference, just change the code to return false.
Error: CommandLine Error: Option 'attributor-manifest-internal'
registered more than once

During the standalone debug build of offload the above error is seen at
app runtime when using a prebuilt llvm with LLVM_LINK_LLVM_DYLIB=ON.
This is caused by linking both libLLVM.so and various archives that are
found via llvm_map_components_to_libnames for jit support.
… on LLVM Dialect and LLVM Core in CMake build (llvm#104832)

This change removes dependencies declared as either 'LINK_LIBS' or
'LINK_COMPONENTS' across several MLIR libraries. The removed
dependencies appear to be incorrect and may have been required in older
versions of the project. These dependencies cause many high level
dialects to have transitive dependence on the LLVM dialect and the LLVM
'Core' library ('llvm/lib/IR').

Note that if using the 'Ninja' CMake generator, one can inspect the
dependencies (including all transitive libraries) of any given MLIR
target by using the command `ninja -C <build dir> -t browse` and
navigating to the library of interest in a web browser.
)

Previously the secondary cache retrieval algorithm would not allow
retrievals of memory chunks where the number of unused bytes would be
greater than `MaxUnusedCachePages * PageSize` bytes. This meant
that even if a memory chunk satisfied the requirements of the optimal
fit algorithm, it might not be returned. This remains true if memory
tagging is enabled. However, if memory tagging is disabled, a new
heuristic has been put in place. Specifically, if a memory chunk is a
non-optimal fit, the cache retrieval algorithm will attempt to release
the excess memory to force a cache hit while keeping RSS down.

In the event that a memory chunk is a non-optimal fit, the retrieval
algorithm will release excess memory as long as the amount of memory to
be released is less than or equal to 16 KB. If the amount of memory to
be released exceeds 16 KB, the retrieval algorithm will not consider
that cached memory chunk valid for retrieval.
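
The retrieval rule above can be sketched as a small decision function. This is an illustration in Python, not the allocator code: the function and parameter names are hypothetical, and computing the amount to release as the unused bytes beyond `MaxUnusedCachePages * PageSize` is an assumption about how the excess is measured.

```python
MAX_RELEASABLE_BYTES = 16 * 1024  # the 16 KB limit from the description


def try_retrieve(unused_bytes: int, max_unused_cache_pages: int,
                 page_size: int, memory_tagging: bool):
    """Return (is_valid_for_retrieval, bytes_to_release) for a cached chunk."""
    limit = max_unused_cache_pages * page_size
    if unused_bytes <= limit:
        return True, 0            # within the original bound, nothing to release
    if memory_tagging:
        return False, 0           # previous behavior still applies with tagging
    excess = unused_bytes - limit  # assumption: this is the memory to release
    if excess <= MAX_RELEASABLE_BYTES:
        return True, excess       # release the excess to force a cache hit
    return False, 0               # too much to release; chunk is not considered
```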
Inverse mapping needs to be updated for the result that was remapped (it
was previously only updated halfway).
Fix list formatting, improve the wording, and fix the description when
both options (note: prefer "option" to "flag" when arguments are
supported) are specified.

Pull Request: llvm#104886
D57497 added -msmall-data-limit= as an alias for -G and defaulted it to 8 for
-fno-pic/-fpie.

The behavior is already different from GCC in a few ways:

* GCC doesn't accept -G.
* GCC -fpie doesn't seem to use -msmall-data-limit=.
* GCC emits .srodata.cst* that we don't use (llvm#82214). Writable contents
  caused confusion (https://bugs.chromium.org/p/llvm/issues/detail?id=61)

In addition,

* claiming `-shared` means we don't get a desired `-Wunused-command-line-argument` for `clang --target=riscv64-linux-gnu -fpic -c -shared a.c`.
* -mcmodel=large doesn't work for RISC-V yet, so the special case is strange.
* It's quite unusual to emit a warning when an option (unrelated to relocation model) is used with -fpic.
* We don't want future configurations (Android) to continue adding customization to `SetRISCVSmallDataLimit`.

I believe the extra code just doesn't pull its weight and should be
cleaned up. This patch also changes the default to 0. GP relaxation
users are encouraged to specify these customization options explicitly.

Pull Request: llvm#83093
A quick follow-up fix for
llvm#99403

Buildbot
[reported](https://lab.llvm.org/buildbot/#/builders/168/builds/2330) an
error:

```
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ADT/FunctionExtrasTest.cpp:320:8: error: variable 'ptr' is uninitialized when used here [-Werror,-Wuninitialized]
  320 |       [ptr](void *self) {
      |        ^~~
/home/buildbots/llvm-external-buildbots/workers/ppc64le-lld-multistage-test/ppc64le-lld-multistage-test/llvm-project/llvm/unittests/ADT/FunctionExtrasTest.cpp:318:12: note: initialize the variable 'ptr' to silence this warning
  318 |   void *ptr;
      |            ^
      |             = nullptr
1 error generated.
```

So this PR does exactly what's suggested.
…dialects on LLVM Dialect and LLVM Core in CMake build (llvm#104832)"

This reverts commit 43b5085 since it
caused the build to break with BUILD_SHARED_LIBS=ON.
I started out by adding a new pointer type for blocks, and I was fully
prepared to compile their AST to bytecode and later call them.

... then I found out that the current interpreter doesn't support
calling blocks at all. So we reuse `Function` to support sources other
than `FunctionDecl`s and classify `BlockPointerType` as `PT_FnPtr`.
…s. (llvm#104876)

This was broken back in llvm#78658 when we transitioned away from archive
indexes to parsing lazy object files.

Fixes: llvm#94077
Fixes: emscripten-core/emscripten#22008
…able files. (llvm#102978)

This change is enough to allow `--strip-debug` to work on object files,
without breaking the relocation information or symbol table.

A more complete version of this change would instead reconstruct the
symbol table and relocation sections, but that is a much larger change.

Bug: llvm#102002
We used integer comparisons instead of floating point comparisons
resulting in very odd behavior.
We would crash on sufficiently old NV hardware (Volta or so) due to
incorrectly marking certain operations legal.
@mgehre-amd mgehre-amd changed the title [AutoBump] Merge with fixes of 2d50029f (Aug 15) (5) [AutoBump] Merge with fixes of 2d50029f (Aug 15, needs torch-mlir bump) (5) Sep 20, 2024
@mgehre-amd mgehre-amd force-pushed the bump_to_2d50029f branch 2 times, most recently from 141a45a to d840dbb Compare September 20, 2024 09:03
Base automatically changed from bump_to_894d3eeb to feature/fused-ops November 29, 2024 10:15
@mgehre-amd mgehre-amd merged commit 09f9db8 into feature/fused-ops Nov 29, 2024
5 of 6 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_2d50029f branch November 29, 2024 15:49