Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with fixes of 85278611 (Needs onnx/torch bump)(Sep 21) (10) #424

Merged
merged 608 commits into from
Jan 6, 2025

Conversation

jorickert
Copy link

@jorickert jorickert commented Dec 13, 2024

cjappl and others added 30 commits September 24, 2024 17:08
Ran:

    git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format

No behavior change.
The accumulateUsedDefed() was missing if block prologue interference
check does not pass. This would cause incorrect register dependency,
which cause incorrect sinking.
… G_IMPLICIT_DEF (llvm#101331)"

This reverts commit 63b2595.
(llvmorg-20-init-6782-g63b2595846b8)
A few bots have been failing on `inst-select-unmerge-values.mir`
…lvm#109088)

This patch improves error message, and fixes a copy-paste
mistake in GET_REPORT_OPTIONS argument.

Address comment
llvm#104741 (comment).

---------

Co-authored-by: Vitaly Buka <vitalybuka@google.com>
Previously, this pass would not generate traps if `NoTrapAfterNoreturn`
was set and would generate traps even if the instruction directly before
the `UnreachableInst` was `llvm.trap()`.

Fix both of these problems and add some tests.
llvm#109882)

This adds to the assert the implicit global module case as in module
purview.

Fixes llvm#109879
…lvm#97121)

When an initializer is provided to a variable, the Linux kernel relied
on the compiler to zero-initialize unspecified fields, as clarified in
https://www.spinics.net/lists/netdev/msg1007244.html.

But clang doesn't guarantee this:
1. For a union type, if an empty initializer is given, clang only
   initializes bytes for the first field, left bytes for other (larger)
   fields are marked as undef. Accessing those undef bytes can lead
   to undefined behaviors.
2. For a union type, if an initializer explicitly sets a field, left
   bytes for other (larger) fields are marked as undef.
3. When an initializer is given, clang doesn't zero initialize padding.

So this patch makes the following change:
1. In C, when an initializer is provided for a variable, zero-initialize
   undef and padding fields in the initializer.
2. Document the change in LanguageExtensions.rst.

As suggested in
llvm#78034 (comment),
the change isn't required by C23, but it's standards conforming to do
so.

Fixes: llvm#97459
It should not be implied form mapping settings.
No longer disable frame records for fixed offset.
…9867)

These will be used to translate simple cuf.alloc/cuf.free and
cuf.data_transfer on scalar and constant size arrays.
## Purpose
Running the LLDB API tests against a remote Android target with NDK
version r22 or later fails to compile the test inferiors. NDK r21 from
2021 is the most recent NDK that still works with the LLDB API tests.
This PR updates the Android make rules to support newer Android NDK
versions (r19 and later).

## Overview
* Updates and simplifies `Android.rules` to match the newer Android NDK
unified toolchain layout introduced in NDK r19
* Sets `OBJCOPY` and `ARCHIVER` env vars, required by a few test cases,
to their `llvm-` versions in the unified toolchain
* Drops support for pre-2019 Android NDK versions to keep the rules
simple
* Provides an error message if the tests are run using an incompatible
NDK layout

## Problem Details
Android introduced a unified tools layout in NDK r19 (2019) and removed
the old layout in r22 (2021). Releases r19, r20, and r21 support both
the old and new layout side-by-side. More details are in llvm#106270.

## Validation
Ran a sub-set of the LLDB API tests against remote Android targets for
the four primary architectures i386, x86_64, arm, and aarch64. No
validation was done against riscv targets.

For each case, ran the copy of `lldb-server` from the Android NDK on the
device with the latest LLDB test cases in llvm-project

Ran tests with both r19 (the oldest supported) and r26 (more recent,
unified layout only) NDK versions.

Example test command for aarch64:
```
./build/bin/lldb-dotest --out-of-tree-debugserver --arch aarch64 --platform-name remote-android --platform-url connect://localhost:5432 --platform-working-dir /data/local/tmp --compiler=$ANDROID_NDK_ROOT/toolchains/llvm/prebuilt/linux-x86_64/bin/clang lldb/test/API/android/
```
**NOTE: there are a lot of test failures when running the full suite
(especially against 32-bit ARM target). These failures occur independent
of this change.**

Verified the expected error message appears when attempting to run using
NDK r18
```
Build Command Output:
make: Entering directory '/home/andrew/src/llvm/llvm-project/build/lldb-test-build.noindex/android/platform/TestDefaultCacheLineSize.test_cache_line_size'
/home/andrew/src/llvm/llvm-project/lldb/packages/Python/lldbsuite/test/make/Android.rules:16: *** "No unified toolchain sysroot found in /home/andrew/Android/Sdk/ndk/18.1.5063045/toolchains/llvm/prebuilt/linux-x86_64/bin/../../../../... NDK must be r19 or later.".  Stop.
make: Leaving directory '/home/andrew/src/llvm/llvm-project/build/lldb-test-build.noindex/android/platform/TestDefaultCacheLineSize.test_cache_line_size'
```

## Impact
**This change explicitly removes support for the pre-2019 NDK
structure.** Only NDK r19 (from 2019) and later can be used when running
the LLDB API tests. If the maintainers object, we can easily support
both the old and new NDK toolchain layouts side-by-side at the cost of
readability/maintainability. Since this change only impacts tests, I
don't see much value in supporting NDKs that are over 5 years old.

## Guidance to Reviewers
* I am not an expert on `clang` arguments so if anything looks off let
me know.
* While I personally thing supporting 5+ year old NDKs for testing seems
unnecessary, please chime-in if you are concerned with dropping that
support. I can easily revise to support both old and new layouts
side-by-side.
* If there are any specific tests you'd like me to run I will do my best
to accommodate. It doesn't look like there's much (any?) Android LLDB CI
coverage.
… in C" (llvm#109898)

Reverts llvm#97121

Causing failures on LNT bots; log shows a crash in
ConstStructBuilder::BuildStruct.
…m#108516)

This PR adds semantic checks to ensure the atomic capture construct
conforms to one of the valid forms:
[capture-stmt, update-stmt], [capture-stmt, write-stmt] or [update-stmt,
capture-stmt].

Functions checkForSymbolMatch and checkForSingleVariableOnRHS are moved
from flang/lib/Lower/DirectivesCommon.h to flang/Semantics/tools.h for
reuse.

---------

Co-authored-by: Kiran Chandramohan <kiranchandramohan@gmail.com>
llvm#109878)

This change is part of this proposal:
https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294

This preliminary work adds the intrinsic to llvm and expands using atan
intrinsic for DXIL backend, since DXIL has no atan2 op.

Part 1 for Implement the atan2 HLSL Function llvm#70096.

(reland llvm#108865 reverted in llvm#109842 due to doc build break)
Flags "-hwasan-mapping-offset" and
"-hwasan-mapping-offset-dynamic" are mutually
exclusive, use the last one.
… intrinsics with Zvfhmin. (llvm#109889)

These intrinsics don't produce any instructions so don't require Zvfh.

This makes Zvfhmin consistent with Zvfbfmin. See also
riscv-non-isa/rvv-intrinsic-doc#351
…Reg. NFC (llvm#109848)

I think the 8 here represents RVVBitsPerBlock / 8.
Adopt scaled indent in PredicateExpander.
Added pre/post inc/dec operators to `indent` and related unit tests.
Verified by comparing *.inc files generated by LLVM build with/without
the change.
The extension has been ratified for some time, but we kept it
experimental (see llvm#99898) due to
<riscv-non-isa/riscv-elf-psabi-doc#444>. The
ABI issue has been resolved by llvm#101023 so I believe there's no known
barrier to moving Zacas to non-experimental.
**Description:** 

`OneShotModuleBufferize` deals with the bufferization of `FuncOp`,
`CallOp` and `ReturnOp` but they are hard-coded. Any custom
function-like operations will not be handled. The PR replaces a part of
`FuncOp` and `CallOp` with `FunctionOpInterface` and `CallOpInterface`
in `OneShotModuleBufferize` so that custom function ops and call ops can
be bufferized.

**Related Discord Discussion:**
[Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900)

---------

Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>
…09829)

So we don't accidentally try to use those with the wrong type.
try_emplace can default-construct the value, so:

  try_emplace(CSId, CallTargetMapTy())
  try_emplace(CSId)

are equivalent to each other.  We can further simplify the function
using the fact that Map.try_emplace(Key).first->second is the same as
Map[Key].
…86483)

This patch adds basic constant range support for floating-point types to enable range-based optimizations.
This patch extends [D34590](https://reviews.llvm.org/D34590) to check
assumption violations.

---------

Co-authored-by: Vitaly Buka <vitalybuka@google.com>
…lvm#109828)

We can't make the assumption that types are always fine in std
functions.
artagnon and others added 15 commits September 26, 2024 16:08
There is an underlying bug in KnownBits, and we should theoretically be
able to determine the high-bits of an srem as shown in the test, just
like urem. In preparation to fix this bug, add pre-commit tests testing
high-bits of srem and urem.
This PR fixes how broadcast dims (identified as "zero" results in
permutation maps) corresponding to a reduction iterator are vectorised
in the case of generic Ops. Here's an example:

```mlir
  #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
  #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>

  func.func @generic_with_reduction_and_broadcast(%arg0: tensor<1x12x197x197xf32>) -> (tensor<1x12x197x1xf32>) {
    %0 = tensor.empty() : tensor<1x12x197x1xf32>

    %1 = linalg.generic {indexing_maps = [#map, #map1],
                        iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
      ins(%arg0 : tensor<1x12x197x197xf32>)
      outs(%0 : tensor<1x12x197x1xf32>) {

    ^bb0(%in: f32, %out: f32):
      %818 = arith.addf %in, %out : f32
      linalg.yield %818 : f32
    } -> tensor<1x12x197x1xf32>
    return %1 : tensor<1x12x197x1xf32>
  }
```

This is a perfectly valid Generic Op, but currently triggers two issues
in the vectoriser. The root cause is this map:

```mlir
  #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>
```

This map triggers an assert in `reindexIndexingMap` -  this hook
incorrectly assumes that every result in the input map is a `dim`
expression and that there are no constants. That's not the case in this
example. `reindexIndexingMap` is extended to allow maps like the one
above. For now, only constant "zero" results are allowed. This can be
extended in the future once a good motivating example is available.

Separately, the permutation map highlighted above "breaks" mask
calculation (ATM masks are always computed, even in the presence of
static shapes). When applying the following permutation:
```mlir
  (d0, d1, d2, d3) -> (d0, d1, d2, 0)
```

to these canonical shapes (corresponding to the example above):
```
  (1, 12, 197, 197)
```
we end up with the following error:
```bash
error: vector types must have positive constant sizes but got 1, 12, 197, 0
```

The error makes sense and indicates that we should update the
permutation map above to:
```
  (d0, d1, d2, d3) -> (d0, d1, d2)
```

This would correctly give the following vector type:
```
  vector<1x12x197xi1>
```

Fixes llvm#97247
Refine `createPadHighOp` so that the output tensor is required to be
statically shaped. This is to prevent the current behaviour, which is
incorrect:

>  // If `type` has dynamic dimensions the padding width is set to zero.

The actual padding width should be set to: `%new_dim - %old_dim`, where
%new_dim` and `%old_dim` are defined via e.g. `tensor.dim` Op applied to
output and input tensors, respectively.

This PR is an attempt to clarify the semantics surrounding dynamic
shapes in preparation for adding support for scalable vectors to the
pack/unpack logic in Tensor/Linalg (dynamic shapes is what we use to
model scalable (*) sizes at the Tensor/MemRef level).

(*) Scalable as in Arm's Scalable Vector Extension (SVE)
This patch implements following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)	
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
This patch fixes failure of  acle_neon_fscale.c in non-aarch64 targets.
Don't call raw_string_ostream::flush(), which is essentially a no-op. As
specified in the docs, raw_string_ostream is always unbuffered. (
65b1361 for further reference )
…110138)

We recently added versioning support to Flang's OpenMP, which restricts
and enables certain things based on the OpenMP specification version.
Currently one of the check-offload tests makes use of a feature that's
at a slightly higher version than the current default causing it to
fail.

This PR basically applies the highest current OpenMP version number as a
default argument for the lit.cfg, if we need more fine grained control
in the future we can expand it to different lit commands for each
relevant version than can then be added in each test. But for now, to
keep it simple, just set the max level version.
…es (llvm#107638)

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
The `NodeCounts` parameter of `calcExtTspScore()` is unused, so remove
it.
Use `SmallVector` since arrays are expected to be small since they
represent MBBs.
# Why?

In real-time programming, you often have a process or dispatch loop that
is called many, many, many times. Without de-duplication the user will
be drowning in errors.

Introduce a way to only print the stacks one time only, if they have
been seen before
…10147)

Fix typo which should be "at least" instead of "at lease".
@jorickert jorickert changed the title [AutoBump] Merge with fixes of 85278611 (Sep 21) (10) [AutoBump] Merge with fixes of 85278611 (Needs torch PR)(Sep 21) (10) Dec 13, 2024
Base automatically changed from bump_to_c57b9f5a to feature/fused-ops January 2, 2025 07:31
[AutoBump] Merge with fixes of 852b648 (Needs torch and onnx bump) (Sep 26) (12)
[AutoBump] Merge with f0162fcd (Sep 26) (11)
@mgehre-amd mgehre-amd changed the title [AutoBump] Merge with fixes of 85278611 (Needs torch PR)(Sep 21) (10) [AutoBump] Merge with fixes of 85278611 (Needs onnx/torch bump)(Sep 21) (10) Jan 2, 2025
@mgehre-amd mgehre-amd merged commit b3562f3 into feature/fused-ops Jan 6, 2025
4 of 5 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_85278611 branch January 6, 2025 10:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment