Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with 1211d979 (Sep 11) (1) #415

Merged
merged 96 commits into from
Dec 12, 2024

Conversation

jorickert
Copy link

No description provided.

SchrodingerZhu and others added 30 commits September 11, 2024 12:22
The call chain to `Mutex:lock` can be polluted by stack protector. For
completely safe, let's postpone the main TLS tearing down to a separate
phase.

fix llvm#108030
HLSL 202x inherits from C++11, which generates additional loop hint
information for loops that must progress. Since HLSL 202x is going to be
the default for Clang we want to make sure all our tests pass with it.

Required for llvm#108044
…ard (llvm#106861)

Since llvm#87832, unnamed identifiers are excluded from being diagnosed. As
a result, the tests that were supposed to test that deleted functions
are correctly ignored, are ignored because of the unnamed identifiers
instead of the deleted function. This change simply introduces names for
the parameters of the deleted functions.
…llvm#107943)

GNU ld will error when encountering a pcrel_lo whose corresponding
pcrel_hi is in a different section. [1] introduced a check to help
prevent this issue by preventing outlining in a few circumstances.
However, we can also hit this same issue when outlining from functions
with prefixes ("hot"/"unlikely"/"unknown" from profile information, for
example) as the outlined function might not have the same prefix,
possibly resulting in a "paired" pcrel_lo and pcrel_hi ending up in
different sections.

To prevent this issue, take a similar approach as [1] and additionally
prevent outlining when we see a pcrel_lo and the function has a prefix.

[1]
llvm@96c85f8

Fixes llvm#107520
Convert BUILD_VECTORS with FP16x8 to I16x8 since there's no FP16 scalar
value to intialize v128.const.
…on. (llvm#108167)

Treat WTFReportBacktrace, which prints out the backtrace, as trivial.
…vm#108238)

Extend the lowering of atomic.fadd to support the v2f16 variant
avaliable on some AMDGPU chips.

Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
…fadd (llvm#108238)" (llvm#108256)

This reverts commit 0d48d4d.

Mistakenly landed without approval
On newer GPUs, where `cvta.param` instruction is available we can avoid
making byval arguments when their pointers are used in a few more cases, 
even when `__grid_constant__` is not specified.

- phi
- select
- memcpy from the parameter.

Switched pointer traversal from a DIY implementation to PtrUseVisitor.
Recently in llvm#107731 this change was revereted due to excess memory size
in `TestSkinnyCore`. This was due to a bug where a range's end was being
passed as size. Creating massive memory ranges.

Additionally, and requiring additional review, I added more unit tests
and more verbose logic to the merging of save core memory regions.

@jasonmolenda as an FYI.
…#108201)

Change HTMLNamedCharacterReferenceEmitter to use const RecordKeeper.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…m#108203)

Change DataCollectors Emitter to use const RecordKeeper.

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
…vm#108213)

Change OpenCL builtins emitter to use const RecordKeeper

This is a part of effort to have better const correctness in TableGen
backends:


https://discourse.llvm.org/t/psa-planned-changes-to-tablegen-getallderiveddefinitions-api-potential-downstream-breakages/81089
Currently the nan* functions use nullptr dereferencing to crash with
SIGSEGV if the input is nullptr. Both `nan(nullptr)` and `nullptr`
dereferencing are undefined behaviors according to the C standard.
Employing `nullptr` dereference in the `nan` function implementation is
ok if users only linked against the pre-built library, but it might be
completely removed by the compilers' optimizations if it is built from
source together with the users' code.

See for instance:  https://godbolt.org/z/fd8KcM9bx

This PR uses volatile load to prevent the undefined behavior if libc is
built without sanitizers, and leave the current undefined behavior if
libc is built with sanitizers, so that the undefined behavior can be
caught for users' codes.
…is used (llvm#108263)

This matches the behaviour of GNU ld and the ELF version of lld.
…ation is laid out (llvm#105714)

In `User::operator new` a single allocation is created to store the
`User` object itself, "intrusive" operands or a pointer for "hung off"
operands, and the descriptor. After allocation, details about the layout
(number of operands, how the operands are stored, if there is a
descriptor) are stored in the `User` object by settings its fields. The
`Value` and `User` constructors are then very careful not to initialize
these fields so that the values set during allocation can be
subsequently read. However, when the `User` object is returned from
`operator new` [its value is technically "indeterminate" and so reading
a field without first initializing it is undefined behavior (and will be
erroneous in
C++26)](https://en.cppreference.com/w/cpp/language/default_initialization#Indeterminate_and_erroneous_values).

We discovered this issue when trying to build LLVM using MSVC's [`/sdl`
flag](https://learn.microsoft.com/en-us/cpp/build/reference/sdl-enable-additional-security-checks?view=msvc-170)
which clears class fields after allocation (the docs say that this
feature shouldn't be turned on for custom allocators and should only
clear pointers, but that doesn't seem to match the implementation).
MSVC's behavior both with and without the `/sdl` flag is standards
conforming since a program is supposed to initialize storage before
reading from it, thus the compiler implementation changing any values
will never be observed in a well-formed program. The standard also
provides no provisions for making storage bytes not indeterminate by
setting them during allocation or `operator new`.

The fix for this is to create a set of types that encode the layout and
provide these to both `operator new` and the constructor:
* The `AllocMarker` types are used to select which `operator new` to
use.
* `AllocMarker` can then be implicitly converted to a `AllocInfo` which
tells the constructor how the type was laid out.
Summary:
There's an extern weak symbol for this, we should just factor these into
a more common interface. Stub them temporarily to make the bots happy.
PTXAS does not handle extern weak.
This patch adds a benchmark for ReplaceUsesOfWith().
Make if constexpr due to constexpr condition.
Otherwise we fail to build with modules in C++03 mode once we migrate to
a single top-level module, because those headers get pulled in but they
don't compile as C++03.
It doesn't serve much of a purpose since we can easily put its contents
inside __config. Removing it simplifies the modulemap once we are trying
to create a single top-level module.
…8257)

We should allow singleton and fooSingleton as singleton function names.
The implementation would crash with unloaded dialects.
aaupov and others added 24 commits September 11, 2024 16:33
…do-probes mode (llvm#106365)

Implement selective probe parsing for profiled functions only when
emitting probe information to YAML profile as suggested in

llvm#102904 (review)

For a large binary, this reduces probe parsing 
- processing time from 10.5925s to 5.6295s,
- peak RSS from 10.54 to 7.98 GiB.
llvm#108107)

…tting" (llvm#108104)"

This recommits 0f56ba1 (reverted by
6007ad7). In the original patch
llvm/utils/lit/tests/escape-color.py failed on Windows because it diffed
llvm-lit output with a file containing '\n' newlines rather than '\r\n'.
This issue is avoided by calling 'diff --strip-trailing-cr'.

Original description below:
Test output that carried color across newlines previously resulted in
the formatting around the output also being colored. Detect the current
ANSI color and reset it when printing formatting, and then reapply it.
As an added bonus an unterminated color code is also detected,
preventing it from leaking out into the rest of the terminal.

Fixes llvm#106633
)

This adds VL patterns for vfwmaccbf16.vv so that we can handle fixed
length vectors.

It does this by teaching combineOp_VLToVWOp_VL to emit
RISCVISD::VFWMADD_VL for bf16. The change in getOrCreateExtendedOp is
needed because getNarrowType is based off of the bitwidth so returns
f16. We need to explicitly check for bf16.

Note that the .vf patterns don't work yet, since the build_vector splat
gets lowered to a (vmv_v_x_vl (fmv_x_anyexth x)) instead of a vfmv.v.f,
which SplatFP doesn't pick up, see llvm#106637.
Previously they were legal by default, so the truncstore/extload test
cases would get combined and crash during selection.
These are set to expand for f16 so do the same for bf16.
…lvm#108041)

This patch fixes attr type of out_shape, which is i64 dense array
attribute with exactly 4 elements.

- Fix description of DenseArrayMaxCt
- Add DenseArrayMinCt and move it to CommonAttrConstraints.td
- Change type of out_shape to Tosa_IntArrayAttr4

Fixes llvm#107804.
If the value we're replacing has a name, we might as well preserve it.
This patch implements sandboxir::ConstantTokenNone mirroring
llvm::ConstantTokenNone.
Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g.

```
%0 = scf.for() {
  %1 = scf.for() {
     tiledProducer
  }
  yield %1
}
%2 = consumer ins(%0)
```
Building with -DLLVM_ENABLE_EXPORTED_SYMBOLS_IN_EXECUTABLES=Off should not
prevent use of opt plugins.

This fix uses the approach implemented in
llvm#101741.

rdar://135841478
…#107648)

Hello Arjun! Please allow me to contribute this patch as it helps me
debugging significantly! When the 1's and 0's don't line up when
debugging farkas lemma of numerous polyhedrons using simplex lexmin
solver, it is truly straining on the eyes. Hopefully this patch can help
others!

The unfortunate part is the lack of testcase as I'm not sure how to add
testcase for debug dumps. :) However, you can add this testcase to the
SimplexTest.cpp to witness the nice printing!

```c++
TEST(SimplexTest, DumpTest) {
  int COLUMNS = 2;
  int ROWS = 2;
  LexSimplex simplex(COLUMNS * 2);
  IntMatrix m1(ROWS, COLUMNS * 2 + 1);
  // Adding LHS columns.
  for (int i = 0; i < ROWS; i++) {
    // an arbitrary formula to test all kinds of integers
    for (int j = 0; j < COLUMNS; j++) 
      m1(i, j) = i + (2 << (i % 3)) * (-1 * ((i + j) % 2));
  }
  // Adding RHS columns.
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLUMNS; j++)
      m1(i, j + COLUMNS) = j - (3 << (j % 4)) * (-1 * ((i + j * 2) % 2));
  }
  for (int i = 0; i < m1.getNumRows(); i++) {
    ArrayRef<DynamicAPInt> curRow = m1.getRow(i);
    simplex.addInequality(curRow);
  }
  IntegerRelation rel =
      parseRelationFromSet("(x, y, z)[] : (z - x - 17 * y == 0, x - 11 * z >= 1)",2);
  simplex.dump();
  m1.dump();
  rel.dump();
}
```

```
rows = 2, columns = 7
var: c3, c4, c5, c6
con: r0 [>=0], r1 [>=0]
r0: -1, r1: -2
c0: denom, c1: const, c2: 2147483647, c3: 0, c4: 1, c5: 2, c6: 3
  1  0  1  0 -2  0  1
  1  0 -8 -3  1  3  7

  0 -2  0  1  0
 -3  1  3  7  0
Domain: 2, Range: 1, Symbols: 0, Locals: 0
2 constraints
 -1  -17  1   0   = 0
  1   0  -11 -1  >= 0

```
…indows target (llvm#104676)

This PR first adds osutils for Windows, and changes some libc code to
make libc and its tests build on the Windows target. It then temporarily
disables some libc tests that are currently problematic on Windows.

Specifically, the changes besides the addition of osutils include:

- Macro `LIBC_TYPES_HAS_FLOAT16` is disabled on Windows. `clang-cl`
generates calls to functions in `compiler-rt` to handle float16
arithmetic and these functions are currently not linked in on Windows.
- Macro `LIBC_TYPES_HAS_INT128` is disabled on Windows.
- The invocation to `::aligned_malloc` is changed to an invocation to
`::_aligned_malloc`.
- The following unit tests are temporarily disabled because they
currently fail on Windows:
  - `test.src.__support.big_int_test`
  - `test.src.__support.arg_list_test`
  - `test.src.fenv.getenv_and_setenv_test`
- Tests involving `__m128i`, `__m256i`, and `__m512i` in
`test.src.string.memory_utils.op_tests.cpp`
- `test_range_errors` in `libc/test/src/math/smoke/AddTest.h` and
`libc/test/src/math/smoke/SubTest.h`
…-range-compare (NFC)

/llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:320:26:
error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
  preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0;
               ~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~
/llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:335:28:
error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
    preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0;
                 ~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~
2 errors generated.
)

Refactor current consumer fusion based on `addInitOperandsToLoopNest` to support single nested `scf.for`, E.g.

```
%0 = scf.for() {
  %1 = scf.for() {
     tiledProducer
  }
  yield %1
}
%2 = consumer ins(%0)
```

Compared with llvm#94190, this PR fix build failure by making C++17 happy.
Update ISDOpcodes.h documentation according to commit ad9d13d
("SelectionDAG: Swap operands of atomic_store") for less confusion.
…from int to FP. (llvm#108284)

selectFPImm previously handled cases where an FPImm could be
materialized in an integer register.

We can generalize this to cases where a value was in an integer register
and then copied to a scalar FP register to be used by a vector
instruction.

In the affected test, the call lowering code used up all of the FP
argument registers and started using GPRs. Now we use integer vector
instructions to consume those GPRs instead of moving them to scalar FP
first.
SSE & AVX do not include instructions for shifting i8 vectors. Instead,
they must be synthesized via other instructions.

If pairs of i8 vectors share a shift amount, we can use SWAR techniques
to substantially reduce the amount of code generated.

Say we were going to execute this shift right:
  x >> {0, 0, 0, 0, 4, 4, 4, 4, 0, 0, 0, 0, ...}

LLVM would previously generate:
        vpxor   %xmm1, %xmm1, %xmm1
        vpunpckhbw      %ymm0, %ymm1, %ymm2
        vpunpckhbw      %ymm1, %ymm0, %ymm3
        vpsllw  $4, %ymm3, %ymm3
        vpblendd        $204, %ymm3, %ymm2, %ymm2
        vpsrlw  $8, %ymm2, %ymm2
        vpunpcklbw      %ymm0, %ymm1, %ymm3
        vpunpcklbw      %ymm1, %ymm0, %ymm0
        vpsllw  $4, %ymm0, %ymm0
        vpblendd        $204, %ymm0, %ymm3, %ymm0
        vpsrlw  $8, %ymm0, %ymm0
        vpackuswb       %ymm2, %ymm0, %ymm0

Instead, we can reinterpret a pair of i8 elements as an i16 and shift
use the same shift amount. The only thing we need to do is mask out any
bits which crossed the boundary from the top i8 to the bottom i8.

This SWAR-style technique achieves:
        vpsrlw  $4, %ymm0, %ymm1
        vpblendd        $170, %ymm1, %ymm0, %ymm0
        vpand   .LCPI0_0(%rip), %ymm0, %ymm0

This is implemented for both left and right logical shift operations.
Arithmetic shifts are less well behaved here because the shift cannot
also perform the sign extension for the lower 8 bits.
@jorickert jorickert requested a review from mgehre-amd December 12, 2024 09:25
@jorickert jorickert enabled auto-merge December 12, 2024 11:11
@jorickert jorickert merged commit eee1900 into feature/fused-ops Dec 12, 2024
11 checks passed
@jorickert jorickert deleted the bump_to_1211d979 branch December 12, 2024 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.