[AutoBump] Merge with fixes of 85278611 (Needs onnx/torch bump)(Sep 21) (10) #424

jorickert · 2024-12-13T16:13:06Z

Required torch commit: [AutoBump] Merge with fixes of 2374b9e0 (Oct 04, needs LLVM bump) (68) torch-mlir#435
Required onnx commit: [AutoBump] Merge with fixes of ec314b7c (Needs llvm bump) (Oct 02) (18) onnx-mlir#240

Ran: git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format No behavior change.

The accumulateUsedDefed() call was missing when the block prologue interference check did not pass. This caused incorrect register dependencies, which in turn caused incorrect sinking.

…7257)

… G_IMPLICIT_DEF (llvm#101331)" This reverts commit 63b2595. (llvmorg-20-init-6782-g63b2595846b8) A few bots have been failing on `inst-select-unmerge-values.mir`

…lvm#109088) This patch improves error message, and fixes a copy-paste mistake in GET_REPORT_OPTIONS argument. Address comment llvm#104741 (comment). --------- Co-authored-by: Vitaly Buka <vitalybuka@google.com>

Previously, this pass would not generate traps if `NoTrapAfterNoreturn` was set and would generate traps even if the instruction directly before the `UnreachableInst` was `llvm.trap()`. Fix both of these problems and add some tests.

llvm#109882) This adds to the assert the implicit global module case as in module purview. Fixes llvm#109879

…lvm#97121) When an initializer is provided to a variable, the Linux kernel relied on the compiler to zero-initialize unspecified fields, as clarified in https://www.spinics.net/lists/netdev/msg1007244.html.

But clang doesn't guarantee this:

1. For a union type, if an empty initializer is given, clang only initializes the bytes of the first field; the remaining bytes of other (larger) fields are marked as undef. Accessing those undef bytes can lead to undefined behavior.
2. For a union type, if an initializer explicitly sets a field, the remaining bytes of other (larger) fields are marked as undef.
3. When an initializer is given, clang doesn't zero-initialize padding.

So this patch makes the following changes:

1. In C, when an initializer is provided for a variable, zero-initialize undef and padding fields in the initializer.
2. Document the change in LanguageExtensions.rst.

As suggested in llvm#78034 (comment), the change isn't required by C23, but it's standards conforming to do so.

Fixes: llvm#97459

It should not be implied from mapping settings. No longer disable frame records for fixed offset.

…9867) These will be used to translate simple cuf.alloc/cuf.free and cuf.data_transfer on scalar and constant size arrays.

## Purpose

Running the LLDB API tests against a remote Android target with NDK version r22 or later fails to compile the test inferiors. NDK r21 from 2021 is the most recent NDK that still works with the LLDB API tests. This PR updates the Android make rules to support newer Android NDK versions (r19 and later).

## Overview

* Updates and simplifies `Android.rules` to match the newer Android NDK unified toolchain layout introduced in NDK r19
* Sets `OBJCOPY` and `ARCHIVER` env vars, required by a few test cases, to their `llvm-` versions in the unified toolchain
* Drops support for pre-2019 Android NDK versions to keep the rules simple
* Provides an error message if the tests are run using an incompatible NDK layout

## Problem Details

Android introduced a unified tools layout in NDK r19 (2019) and removed the old layout in r22 (2021). Releases r19, r20, and r21 support both the old and new layout side-by-side. More details are in llvm#106270.

## Validation

Ran a subset of the LLDB API tests against remote Android targets for the four primary architectures i386, x86_64, arm, and aarch64. No validation was done against riscv targets.

For each case, ran the copy of `lldb-server` from the Android NDK on the device with the latest LLDB test cases in llvm-project.

Ran tests with both r19 (the oldest supported) and r26 (more recent, unified layout only) NDK versions.

Example test command for aarch64:

```
./build/bin/lldb-dotest --out-of-tree-debugserver --arch aarch64 --platform-name remote-android --platform-url connect://localhost:5432 --platform-working-dir /data/local/tmp --compiler=$ANDROID_NDK_ROOT/toolchains/llvm/prebuilt/linux-x86_64/bin/clang lldb/test/API/android/
```

**NOTE: there are a lot of test failures when running the full suite (especially against 32-bit ARM target). These failures occur independent of this change.**

Verified the expected error message appears when attempting to run using NDK r18:

```
Build Command Output:
make: Entering directory '/home/andrew/src/llvm/llvm-project/build/lldb-test-build.noindex/android/platform/TestDefaultCacheLineSize.test_cache_line_size'
/home/andrew/src/llvm/llvm-project/lldb/packages/Python/lldbsuite/test/make/Android.rules:16: *** "No unified toolchain sysroot found in /home/andrew/Android/Sdk/ndk/18.1.5063045/toolchains/llvm/prebuilt/linux-x86_64/bin/../../../../... NDK must be r19 or later.". Stop.
make: Leaving directory '/home/andrew/src/llvm/llvm-project/build/lldb-test-build.noindex/android/platform/TestDefaultCacheLineSize.test_cache_line_size'
```

## Impact

**This change explicitly removes support for the pre-2019 NDK structure.** Only NDK r19 (from 2019) and later can be used when running the LLDB API tests. If the maintainers object, we can easily support both the old and new NDK toolchain layouts side-by-side at the cost of readability/maintainability. Since this change only impacts tests, I don't see much value in supporting NDKs that are over 5 years old.

## Guidance to Reviewers

* I am not an expert on `clang` arguments so if anything looks off let me know.
* While I personally think supporting 5+ year old NDKs for testing seems unnecessary, please chime in if you are concerned with dropping that support. I can easily revise to support both old and new layouts side-by-side.
* If there are any specific tests you'd like me to run I will do my best to accommodate. It doesn't look like there's much (any?) Android LLDB CI coverage.

… in C" (llvm#109898) Reverts llvm#97121 Causing failures on LNT bots; log shows a crash in ConstStructBuilder::BuildStruct.

…m#108516) This PR adds semantic checks to ensure the atomic capture construct conforms to one of the valid forms: [capture-stmt, update-stmt], [capture-stmt, write-stmt] or [update-stmt, capture-stmt]. Functions checkForSymbolMatch and checkForSingleVariableOnRHS are moved from flang/lib/Lower/DirectivesCommon.h to flang/Semantics/tools.h for reuse. --------- Co-authored-by: Kiran Chandramohan <kiranchandramohan@gmail.com>

llvm#109878) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This preliminary work adds the intrinsic to llvm and expands using atan intrinsic for DXIL backend, since DXIL has no atan2 op. Part 1 for Implement the atan2 HLSL Function llvm#70096. (reland llvm#108865 reverted in llvm#109842 due to doc build break)

Flags "-hwasan-mapping-offset" and "-hwasan-mapping-offset-dynamic" are mutually exclusive, use the last one.

… intrinsics with Zvfhmin. (llvm#109889) These intrinsics don't produce any instructions so don't require Zvfh. This makes Zvfhmin consistent with Zvfbfmin. See also riscv-non-isa/rvv-intrinsic-doc#351

…Reg. NFC (llvm#109848) I think the 8 here represents RVVBitsPerBlock / 8.

Adopt scaled indent in PredicateExpander. Added pre/post inc/dec operators to `indent` and related unit tests. Verified by comparing *.inc files generated by LLVM build with/without the change.

The extension has been ratified for some time, but we kept it experimental (see llvm#99898) due to <riscv-non-isa/riscv-elf-psabi-doc#444>. The ABI issue has been resolved by llvm#101023 so I believe there's no known barrier to moving Zacas to non-experimental.

**Description:** `OneShotModuleBufferize` deals with the bufferization of `FuncOp`, `CallOp` and `ReturnOp` but they are hard-coded. Any custom function-like operations will not be handled. The PR replaces a part of `FuncOp` and `CallOp` with `FunctionOpInterface` and `CallOpInterface` in `OneShotModuleBufferize` so that custom function ops and call ops can be bufferized. **Related Discord Discussion:** [Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900) --------- Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>

…09829) So we don't accidentally try to use those with the wrong type.

try_emplace can default-construct the value, so: try_emplace(CSId, CallTargetMapTy()) try_emplace(CSId) are equivalent to each other. We can further simplify the function using the fact that Map.try_emplace(Key).first->second is the same as Map[Key].
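The equivalence described above can be sketched with standard containers. This is only an illustration under stated assumptions: `std::map` stands in for whatever map type the patch touches, and `CallSiteMap` and the three helper functions are hypothetical names that do not come from the patch.

```cpp
#include <map>
#include <string>

// Hypothetical stand-ins for the types named in the commit message.
using CallTargetMapTy = std::map<std::string, unsigned>;
using CallSiteMap = std::map<unsigned, CallTargetMapTy>;

// Form 1: pass an explicitly default-constructed value.
CallTargetMapTy &viaExplicitDefault(CallSiteMap &Map, unsigned CSId) {
  return Map.try_emplace(CSId, CallTargetMapTy()).first->second;
}

// Form 2: pass no value; try_emplace value-initializes the mapped type.
CallTargetMapTy &viaImplicitDefault(CallSiteMap &Map, unsigned CSId) {
  return Map.try_emplace(CSId).first->second;
}

// Form 3: operator[] inserts a value-initialized entry if the key is absent
// and returns a reference to the mapped value either way.
CallTargetMapTy &viaSubscript(CallSiteMap &Map, unsigned CSId) {
  return Map[CSId];
}
```

All three forms leave the map in the same state and return a reference to the same entry, which is why the shortest one can replace the others.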

…86483) This patch adds basic constant range support for floating-point types to enable range-based optimizations.

This patch extends [D34590](https://reviews.llvm.org/D34590) to check assumption violations. --------- Co-authored-by: Vitaly Buka <vitalybuka@google.com>

…lvm#109828) We can't make the assumption that types are always fine in std functions.

There is an underlying bug in KnownBits, and we should theoretically be able to determine the high-bits of an srem as shown in the test, just like urem. In preparation to fix this bug, add pre-commit tests testing high-bits of srem and urem.

This PR fixes how broadcast dims (identified as "zero" results in permutation maps) corresponding to a reduction iterator are vectorised in the case of generic Ops. Here's an example:

```mlir
#map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>
func.func @generic_with_reduction_and_broadcast(%arg0: tensor<1x12x197x197xf32>) -> (tensor<1x12x197x1xf32>) {
  %0 = tensor.empty() : tensor<1x12x197x1xf32>
  %1 = linalg.generic {indexing_maps = [#map, #map1],
                       iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
    ins(%arg0 : tensor<1x12x197x197xf32>)
    outs(%0 : tensor<1x12x197x1xf32>) {
  ^bb0(%in: f32, %out: f32):
    %818 = arith.addf %in, %out : f32
    linalg.yield %818 : f32
  } -> tensor<1x12x197x1xf32>
  return %1 : tensor<1x12x197x1xf32>
}
```

This is a perfectly valid Generic Op, but currently triggers two issues in the vectoriser. The root cause is this map:

```mlir
#map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)>
```

This map triggers an assert in `reindexIndexingMap` - this hook incorrectly assumes that every result in the input map is a `dim` expression and that there are no constants. That's not the case in this example. `reindexIndexingMap` is extended to allow maps like the one above. For now, only constant "zero" results are allowed. This can be extended in the future once a good motivating example is available.

Separately, the permutation map highlighted above "breaks" mask calculation (ATM masks are always computed, even in the presence of static shapes). When applying the following permutation:

```mlir
(d0, d1, d2, d3) -> (d0, d1, d2, 0)
```

to these canonical shapes (corresponding to the example above):

```
(1, 12, 197, 197)
```

we end up with the following error:

```bash
error: vector types must have positive constant sizes but got 1, 12, 197, 0
```

The error makes sense and indicates that we should update the permutation map above to:

```
(d0, d1, d2, d3) -> (d0, d1, d2)
```

This would correctly give the following vector type:

```
vector<1x12x197xi1>
```

Fixes llvm#97247
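The shape arithmetic in the description above can be sketched in a few lines. This is a simplified model under stated assumptions, not the MLIR `AffineMap` API: `MapResult`, `applyNaive`, and `applyDroppingZeros` are hypothetical names, and a map result is modelled as either an input dimension index or the constant zero.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// One result of a simplified permutation map: an index into the input dims
// (d0, d1, ...), or std::nullopt for the constant 0 that marks a broadcast.
// Hypothetical model for illustration only.
using MapResult = std::optional<unsigned>;

// Naive application: a constant-zero result contributes a size of 0,
// which is not a valid vector dimension.
std::vector<int64_t> applyNaive(const std::vector<MapResult> &results,
                                const std::vector<int64_t> &shape) {
  std::vector<int64_t> out;
  for (const MapResult &r : results) {
    if (r) {
      assert(*r < shape.size() && "dim index out of range");
      out.push_back(shape[*r]);
    } else {
      out.push_back(0);
    }
  }
  return out;
}

// Adjusted application: constant-zero results are dropped from the map
// before it is applied, so only real dims contribute to the mask shape.
std::vector<int64_t> applyDroppingZeros(const std::vector<MapResult> &results,
                                        const std::vector<int64_t> &shape) {
  std::vector<int64_t> out;
  for (const MapResult &r : results) {
    if (!r)
      continue; // skip the broadcast (constant 0) result
    assert(*r < shape.size() && "dim index out of range");
    out.push_back(shape[*r]);
  }
  return out;
}
```

With the map `(d0, d1, d2, 0)` and the shape `(1, 12, 197, 197)`, the naive version yields `(1, 12, 197, 0)`, matching the reported error, while the adjusted version yields `(1, 12, 197)`, matching the expected mask type.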

Refine `createPadHighOp` so that the output tensor is required to be statically shaped. This is to prevent the current behaviour, which is incorrect:

> // If `type` has dynamic dimensions the padding width is set to zero.

The actual padding width should be set to: `%new_dim - %old_dim`, where `%new_dim` and `%old_dim` are defined via e.g. `tensor.dim` Op applied to output and input tensors, respectively.

This PR is an attempt to clarify the semantics surrounding dynamic shapes in preparation for adding support for scalable vectors to the pack/unpack logic in Tensor/Linalg (dynamic shapes is what we use to model scalable (*) sizes at the Tensor/MemRef level).

(*) Scalable as in Arm's Scalable Vector Extension (SVE)

This patch implements the following intrinsics:

```
float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm)
float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm)
float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm)
float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm)
float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm)
```

as defined in ARM-software/acle#323

Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>

This patch fixes failure of acle_neon_fscale.c in non-aarch64 targets.

Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b1361 for further reference )

Similar to previous cleanup.

…110138) We recently added versioning support to Flang's OpenMP, which restricts and enables certain things based on the OpenMP specification version. Currently one of the check-offload tests makes use of a feature that's at a slightly higher version than the current default, causing it to fail. This PR applies the highest current OpenMP version number as a default argument for the lit.cfg. If we need more fine-grained control in the future, we can expand it to different lit commands for each relevant version that can then be added in each test. But for now, to keep it simple, just set the max level version.

…es (llvm#107638) This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules. This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup:

    // in libc++'s include/c++/v1/module.modulemap
    module std {
      header stddef.h
      header stdlib.h
    }

    // in the C library's include/module.modulemap
    module clib {
      header stddef.h
      header stdlib.h
    }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module. However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this.

Fixes llvm#86193

The `NodeCounts` parameter of `calcExtTspScore()` is unused, so remove it. Use `SmallVector` since arrays are expected to be small since they represent MBBs.

# Why?

In real-time programming, you often have a process or dispatch loop that is called many, many, many times. Without de-duplication the user will be drowning in errors.

Introduce a way to print each stack only once: a stack that has already been seen is not printed again.
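The de-duplication idea can be sketched as a "report only on first sight" check. This is a minimal illustration under stated assumptions and not the sanitizer's actual implementation: `Stack` and `StackDeduplicator` are hypothetical names, a stack is modelled as a plain vector of return addresses, and the sketch ignores thread safety.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical model of a call stack: a sequence of return addresses.
using Stack = std::vector<std::uintptr_t>;

// Remembers every stack it has been asked about and reports each one once.
class StackDeduplicator {
public:
  // Returns true the first time a given stack is seen, false afterwards.
  bool shouldReport(const Stack &stack) {
    // std::set::insert returns {iterator, inserted}; `inserted` is false
    // when an equal stack is already stored.
    return seen_.insert(stack).second;
  }

private:
  std::set<Stack> seen_; // stores full stacks, so there are no hash collisions
};
```

A dispatch loop that hits the same violation on every iteration would call `shouldReport` each time and print only when it returns true. Storing whole stacks keeps the check exact at the cost of memory; a real runtime would more likely store a hash or an ID.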

…10147) Fix typo which should be "at least" instead of "at lease".

Full revamp of the 'quant' dialect. This is an implementation for the RFC at https://discourse.llvm.org/t/rfc-improvements-in-the-quant-dialect/79942

[AutoBump] Merge with fixes of 852b648 (Needs torch and onnx bump) (Sep 26) (12)

[AutoBump] Merge with f0162fcd (Sep 26) (11)

…85278611

cjappl and others added 30 commits September 24, 2024 17:08

[rtsan] Introduce halt_on_error flag (llvm#109832)

9ef9acb

[gn] Reformat build files

4a2d24e

Ran: git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format No behavior change.

[MachineSink] Update register dependency correctly (llvm#109763)

e33e087

The accumulateUsedDefed() call was missing when the block prologue interference check did not pass. This caused incorrect register dependencies, which in turn caused incorrect sinking.

[compiler-rt] Add missing carry to 128x128->256 wide multiply (llvm#9…

2495130

…7257)

Revert "[AMDGPU][GlobalIsel] Use isRegisterClassType for G_FREEZE and…

4fc08b6

… G_IMPLICIT_DEF (llvm#101331)" This reverts commit 63b2595. (llvmorg-20-init-6782-g63b2595846b8) A few bots have been failing on `inst-select-unmerge-values.mir`

[Clang][compiler-rt][UBSan] Improve __ubsan_handle_invalid_builtin (l…

642bfd8

…lvm#109088) This patch improves error message, and fixes a copy-paste mistake in GET_REPORT_OPTIONS argument. Address comment llvm#104741 (comment). --------- Co-authored-by: Vitaly Buka <vitalybuka@google.com>

[clang] fix assert in ADL finding entity in the implicit global module (

0a42c7c

llvm#109882) This adds to the assert the implicit global module case as in module purview. Fixes llvm#109879

[NVPTX][NFC] Refactor utilities to use std::optional (llvm#109883)

489acb2

[hwasan] Add "-hwasan-with-frame-record" (llvm#109620)

4ca4460

It should not be implied from mapping settings. No longer disable frame records for fixed offset.

[flang][cuda] Add entry point for alloc/free and simple copy (llvm#10…

fa627d9

…9867) These will be used to translate simple cuf.alloc/cuf.free and cuf.data_transfer on scalar and constant size arrays.

Revert "[clang][CodeGen] Zero init unspecified fields in initializers…

d50eaac

… in C" (llvm#109898) Reverts llvm#97121 Causing failures on LNT bots; log shows a crash in ConstStructBuilder::BuildStruct.

[hwasan] Consider order of mapping copts (llvm#109621)

b218048

Flags "-hwasan-mapping-offset" and "-hwasan-mapping-offset-dynamic" are mutually exclusive, use the last one.

[RISCV] Enable f16 vget/vset/vcreate/vlmul_ext/vlmul_trunc/vundefined…

3b8c78a

… intrinsics with Zvfhmin. (llvm#109889) These intrinsics don't produce any instructions so don't require Zvfh. This makes Zvfhmin consistent with Zvfbfmin. See also riscv-non-isa/rvv-intrinsic-doc#351

[RISCV] Use RVVBitsPerBlock in assignRVVStackObjectOffsets and adjust…

d0878f1

…Reg. NFC (llvm#109848) I think the 8 here represents RVVBitsPerBlock / 8.

[llvm] Use std::optional::value_or (NFC) (llvm#109890)

74d9f7c

[NFC][TableGen] Adopt scaled indent in PredicateExpander (llvm#109801)

c92137e

Adopt scaled indent in PredicateExpander. Added pre/post inc/dec operators to `indent` and related unit tests. Verified by comparing *.inc files generated by LLVM build with/without the change.

[bazel] Port f586b1e (llvm#109908)

470e5af

[clang][bytecode][NFC] Add type assertions to ArrayElem{,Pop} (llvm#1…

e1365ce

…09829) So we don't accidentally try to use those with the wrong type.

[LLVM][IR] Add constant range support for floating-point types (llvm#…

fa824dc

…86483) This patch adds basic constant range support for floating-point types to enable range-based optimizations.

[UBSan] Diagnose assumption violation (llvm#104741)

d8f555d

This patch extends [D34590](https://reviews.llvm.org/D34590) to check assumption violations. --------- Co-authored-by: Vitaly Buka <vitalybuka@google.com>

[clang][bytecode] Fix diagnosing std::construct_at with wrong type (l…

4bd3a62

…lvm#109828) We can't make the assumption that types are always fine in std functions.

[clang] Use std::optional::value_or (NFC) (llvm#109894)

416f101

artagnon and others added 15 commits September 26, 2024 16:08

Fix "[AArch64] Implement NEON vscale intrinsics" (llvm#110136)

24d707e

This patch fixes failure of acle_neon_fscale.c in non-aarch64 targets.

[lldb] Don't flush llvm::raw_string_ostream (NFC) (llvm#110128)

f35719f

Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b1361 for further reference )

[Driver][test] Replace legacy -target with --target=

784e0cf

Similar to previous cleanup.

[mlir] Use std::optional::value_or (NFC) (llvm#109893)

b52885b

[NFC][CodeLayout] Remove unused parameter (llvm#110145)

fbec1c2

The `NodeCounts` parameter of `calcExtTspScore()` is unused, so remove it. Use `SmallVector` since arrays are expected to be small since they represent MBBs.

[NFC] [Flang] [Semantics] [OpenMP] Fix typo in error message. (llvm#1…

f0162fc

…10147) Fix typo which should be "at least" instead of "at lease".

[mlir] Improvements to the 'quant' dialect (llvm#100667)

852b648

Full revamp of the 'quant' dialect. This is an implementation for the RFC at https://discourse.llvm.org/t/rfc-improvements-in-the-quant-dialect/79942

[AutoBump] Merge with fixes of 8527861 (Sep 21)

18bca5d

jorickert changed the title ~~[AutoBump] Merge with fixes of 85278611 (Sep 21) (10)~~ [AutoBump] Merge with fixes of 85278611 (Needs torch PR)(Sep 21) (10) Dec 13, 2024

jorickert mentioned this pull request Dec 13, 2024

Update tests to match changed op order Xilinx/torch-mlir#416

Closed

jorickert added 2 commits December 13, 2024 12:18

[AutoBump] Merge with f0162fcd (Sep 26)

e7cd77b

[AutoBump] Merge with fixes of 852b648 (Sep 26)

69364a9

mgehre-amd approved these changes Dec 16, 2024

Base automatically changed from bump_to_c57b9f5a to feature/fused-ops January 2, 2025 07:31

mgehre-amd added 2 commits January 2, 2025 10:41

Merge pull request #426 from Xilinx/bump_to_852b6486

f141579

[AutoBump] Merge with fixes of 852b648 (Needs torch and onnx bump) (Sep 26) (12)

Merge pull request #425 from Xilinx/bump_to_f0162fcd

ddc0879

[AutoBump] Merge with f0162fcd (Sep 26) (11)

mgehre-amd changed the title ~~[AutoBump] Merge with fixes of 85278611 (Needs torch PR)(Sep 21) (10)~~ [AutoBump] Merge with fixes of 85278611 (Needs onnx/torch bump)(Sep 21) (10) Jan 2, 2025

Merge remote-tracking branch 'origin/feature/fused-ops' into bump_to_…

b51a5a5

…85278611

mgehre-amd merged commit b3562f3 into feature/fused-ops Jan 6, 2025
4 of 5 checks passed

mgehre-amd deleted the bump_to_85278611 branch January 6, 2025 10:36
