forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AutoBump] Merge with fixes of 85278611 (Needs onnx/torch bump)(Sep 21) (10) #424
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
jorickert
commented
Dec 13, 2024
•
edited
Loading
edited
- Required torch commit: [AutoBump] Merge with fixes of 2374b9e0 (Oct 04, needs LLVM bump) (68) torch-mlir#435
- Required onnx commit: [AutoBump] Merge with fixes of ec314b7c (Needs llvm bump) (Oct 02) (18) onnx-mlir#240
Ran: git ls-files '*.gn' '*.gni' | xargs llvm/utils/gn/gn.py format No behavior change.
The accumulateUsedDefed() was missing if block prologue interference check does not pass. This would cause incorrect register dependency, which cause incorrect sinking.
… G_IMPLICIT_DEF (llvm#101331)" This reverts commit 63b2595. (llvmorg-20-init-6782-g63b2595846b8) A few bots have been failing on `inst-select-unmerge-values.mir`
…lvm#109088) This patch improves error message, and fixes a copy-paste mistake in GET_REPORT_OPTIONS argument. Address comment llvm#104741 (comment). --------- Co-authored-by: Vitaly Buka <vitalybuka@google.com>
Previously, this pass would not generate traps if `NoTrapAfterNoreturn` was set and would generate traps even if the instruction directly before the `UnreachableInst` was `llvm.trap()`. Fix both of these problems and add some tests.
llvm#109882) This adds to the assert the implicit global module case as in module purview. Fixes llvm#109879
…lvm#97121) When an initializer is provided to a variable, the Linux kernel relied on the compiler to zero-initialize unspecified fields, as clarified in https://www.spinics.net/lists/netdev/msg1007244.html. But clang doesn't guarantee this: 1. For a union type, if an empty initializer is given, clang only initializes bytes for the first field, left bytes for other (larger) fields are marked as undef. Accessing those undef bytes can lead to undefined behaviors. 2. For a union type, if an initializer explicitly sets a field, left bytes for other (larger) fields are marked as undef. 3. When an initializer is given, clang doesn't zero initialize padding. So this patch makes the following change: 1. In C, when an initializer is provided for a variable, zero-initialize undef and padding fields in the initializer. 2. Document the change in LanguageExtensions.rst. As suggested in llvm#78034 (comment), the change isn't required by C23, but it's standards conforming to do so. Fixes: llvm#97459
It should not be implied form mapping settings. No longer disable frame records for fixed offset.
…9867) These will be used to translate simple cuf.alloc/cuf.free and cuf.data_transfer on scalar and constant size arrays.
## Purpose Running the LLDB API tests against a remote Android target with NDK version r22 or later fails to compile the test inferiors. NDK r21 from 2021 is the most recent NDK that still works with the LLDB API tests. This PR updates the Android make rules to support newer Android NDK versions (r19 and later). ## Overview * Updates and simplifies `Android.rules` to match the newer Android NDK unified toolchain layout introduced in NDK r19 * Sets `OBJCOPY` and `ARCHIVER` env vars, required by a few test cases, to their `llvm-` versions in the unified toolchain * Drops support for pre-2019 Android NDK versions to keep the rules simple * Provides an error message if the tests are run using an incompatible NDK layout ## Problem Details Android introduced a unified tools layout in NDK r19 (2019) and removed the old layout in r22 (2021). Releases r19, r20, and r21 support both the old and new layout side-by-side. More details are in llvm#106270. ## Validation Ran a sub-set of the LLDB API tests against remote Android targets for the four primary architectures i386, x86_64, arm, and aarch64. No validation was done against riscv targets. For each case, ran the copy of `lldb-server` from the Android NDK on the device with the latest LLDB test cases in llvm-project Ran tests with both r19 (the oldest supported) and r26 (more recent, unified layout only) NDK versions. Example test command for aarch64: ``` ./build/bin/lldb-dotest --out-of-tree-debugserver --arch aarch64 --platform-name remote-android --platform-url connect://localhost:5432 --platform-working-dir /data/local/tmp --compiler=$ANDROID_NDK_ROOT/toolchains/llvm/prebuilt/linux-x86_64/bin/clang lldb/test/API/android/ ``` **NOTE: there are a lot of test failures when running the full suite (especially against 32-bit ARM target). These failures occur independent of this change.** Verified the expected error message appears when attempting to run using NDK r18 ``` Build Command Output: make: Entering directory '/home/andrew/src/llvm/llvm-project/build/lldb-test-build.noindex/android/platform/TestDefaultCacheLineSize.test_cache_line_size' /home/andrew/src/llvm/llvm-project/lldb/packages/Python/lldbsuite/test/make/Android.rules:16: *** "No unified toolchain sysroot found in /home/andrew/Android/Sdk/ndk/18.1.5063045/toolchains/llvm/prebuilt/linux-x86_64/bin/../../../../... NDK must be r19 or later.". Stop. make: Leaving directory '/home/andrew/src/llvm/llvm-project/build/lldb-test-build.noindex/android/platform/TestDefaultCacheLineSize.test_cache_line_size' ``` ## Impact **This change explicitly removes support for the pre-2019 NDK structure.** Only NDK r19 (from 2019) and later can be used when running the LLDB API tests. If the maintainers object, we can easily support both the old and new NDK toolchain layouts side-by-side at the cost of readability/maintainability. Since this change only impacts tests, I don't see much value in supporting NDKs that are over 5 years old. ## Guidance to Reviewers * I am not an expert on `clang` arguments so if anything looks off let me know. * While I personally thing supporting 5+ year old NDKs for testing seems unnecessary, please chime-in if you are concerned with dropping that support. I can easily revise to support both old and new layouts side-by-side. * If there are any specific tests you'd like me to run I will do my best to accommodate. It doesn't look like there's much (any?) Android LLDB CI coverage.
… in C" (llvm#109898) Reverts llvm#97121 Causing failures on LNT bots; log shows a crash in ConstStructBuilder::BuildStruct.
…m#108516) This PR adds semantic checks to ensure the atomic capture construct conforms to one of the valid forms: [capture-stmt, update-stmt], [capture-stmt, write-stmt] or [update-stmt, capture-stmt]. Functions checkForSymbolMatch and checkForSingleVariableOnRHS are moved from flang/lib/Lower/DirectivesCommon.h to flang/Semantics/tools.h for reuse. --------- Co-authored-by: Kiran Chandramohan <kiranchandramohan@gmail.com>
llvm#109878) This change is part of this proposal: https://discourse.llvm.org/t/rfc-all-the-math-intrinsics/78294 This preliminary work adds the intrinsic to llvm and expands using atan intrinsic for DXIL backend, since DXIL has no atan2 op. Part 1 for Implement the atan2 HLSL Function llvm#70096. (reland llvm#108865 reverted in llvm#109842 due to doc build break)
Flags "-hwasan-mapping-offset" and "-hwasan-mapping-offset-dynamic" are mutually exclusive, use the last one.
… intrinsics with Zvfhmin. (llvm#109889) These intrinsics don't produce any instructions so don't require Zvfh. This makes Zvfhmin consistent with Zvfbfmin. See also riscv-non-isa/rvv-intrinsic-doc#351
…Reg. NFC (llvm#109848) I think the 8 here represents RVVBitsPerBlock / 8.
Adopt scaled indent in PredicateExpander. Added pre/post inc/dec operators to `indent` and related unit tests. Verified by comparing *.inc files generated by LLVM build with/without the change.
The extension has been ratified for some time, but we kept it experimental (see llvm#99898) due to <riscv-non-isa/riscv-elf-psabi-doc#444>. The ABI issue has been resolved by llvm#101023 so I believe there's no known barrier to moving Zacas to non-experimental.
**Description:** `OneShotModuleBufferize` deals with the bufferization of `FuncOp`, `CallOp` and `ReturnOp` but they are hard-coded. Any custom function-like operations will not be handled. The PR replaces a part of `FuncOp` and `CallOp` with `FunctionOpInterface` and `CallOpInterface` in `OneShotModuleBufferize` so that custom function ops and call ops can be bufferized. **Related Discord Discussion:** [Link](https://discord.com/channels/636084430946959380/642426447167881246/1280556809911799900) --------- Co-authored-by: erick-xanadu <110487834+erick-xanadu@users.noreply.github.com>
…09829) So we don't accidentally try to use those with the wrong type.
try_emplace can default-construct the value, so: try_emplace(CSId, CallTargetMapTy()) try_emplace(CSId) are equivalent to each other. We can further simplify the function using the fact that Map.try_emplace(Key).first->second is the same as Map[Key].
…86483) This patch adds basic constant range support for floating-point types to enable range-based optimizations.
This patch extends [D34590](https://reviews.llvm.org/D34590) to check assumption violations. --------- Co-authored-by: Vitaly Buka <vitalybuka@google.com>
…lvm#109828) We can't make the assumption that types are always fine in std functions.
There is an underlying bug in KnownBits, and we should theoretically be able to determine the high-bits of an srem as shown in the test, just like urem. In preparation to fix this bug, add pre-commit tests testing high-bits of srem and urem.
This PR fixes how broadcast dims (identified as "zero" results in permutation maps) corresponding to a reduction iterator are vectorised in the case of generic Ops. Here's an example: ```mlir #map = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)> #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)> func.func @generic_with_reduction_and_broadcast(%arg0: tensor<1x12x197x197xf32>) -> (tensor<1x12x197x1xf32>) { %0 = tensor.empty() : tensor<1x12x197x1xf32> %1 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "parallel", "parallel", "reduction"]} ins(%arg0 : tensor<1x12x197x197xf32>) outs(%0 : tensor<1x12x197x1xf32>) { ^bb0(%in: f32, %out: f32): %818 = arith.addf %in, %out : f32 linalg.yield %818 : f32 } -> tensor<1x12x197x1xf32> return %1 : tensor<1x12x197x1xf32> } ``` This is a perfectly valid Generic Op, but currently triggers two issues in the vectoriser. The root cause is this map: ```mlir #map1 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, 0)> ``` This map triggers an assert in `reindexIndexingMap` - this hook incorrectly assumes that every result in the input map is a `dim` expression and that there are no constants. That's not the case in this example. `reindexIndexingMap` is extended to allow maps like the one above. For now, only constant "zero" results are allowed. This can be extended in the future once a good motivating example is available. Separately, the permutation map highlighted above "breaks" mask calculation (ATM masks are always computed, even in the presence of static shapes). When applying the following permutation: ```mlir (d0, d1, d2, d3) -> (d0, d1, d2, 0) ``` to these canonical shapes (corresponding to the example above): ``` (1, 12, 197, 197) ``` we end up with the following error: ```bash error: vector types must have positive constant sizes but got 1, 12, 197, 0 ``` The error makes sense and indicates that we should update the permutation map above to: ``` (d0, d1, d2, d3) -> (d0, d1, d2) ``` This would correctly give the following vector type: ``` vector<1x12x197xi1> ``` Fixes llvm#97247
Refine `createPadHighOp` so that the output tensor is required to be statically shaped. This is to prevent the current behaviour, which is incorrect: > // If `type` has dynamic dimensions the padding width is set to zero. The actual padding width should be set to: `%new_dim - %old_dim`, where %new_dim` and `%old_dim` are defined via e.g. `tensor.dim` Op applied to output and input tensors, respectively. This PR is an attempt to clarify the semantics surrounding dynamic shapes in preparation for adding support for scalable vectors to the pack/unpack logic in Tensor/Linalg (dynamic shapes is what we use to model scalable (*) sizes at the Tensor/MemRef level). (*) Scalable as in Arm's Scalable Vector Extension (SVE)
This patch implements following intrinsics: ``` float16x4_t vscale_f16(float16x4_t vn, int16x4_t vm) float16x8_t vscaleq_f16(float16x8_t vn, int16x8_t vm) float32x2_t vscale_f32(float32x2_t vn, int32x2_t vm) float32x4_t vscaleq_f32(float32x4_t vn, int32x4_t vm) float64x2_t vscaleq_f64(float64x2_t vn, int64x2_t vm) ``` as defined in ARM-software/acle#323 Co-authored-by: Hassnaa Hamdi <hassnaa.hamdi@arm.com>
This patch fixes failure of acle_neon_fscale.c in non-aarch64 targets.
Don't call raw_string_ostream::flush(), which is essentially a no-op. As specified in the docs, raw_string_ostream is always unbuffered. ( 65b1361 for further reference )
Similar to previous cleanup.
…110138) We recently added versioning support to Flang's OpenMP, which restricts and enables certain things based on the OpenMP specification version. Currently one of the check-offload tests makes use of a feature that's at a slightly higher version than the current default causing it to fail. This PR basically applies the highest current OpenMP version number as a default argument for the lit.cfg, if we need more fine grained control in the future we can expand it to different lit commands for each relevant version than can then be added in each test. But for now, to keep it simple, just set the max level version.
…es (llvm#107638) This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules. This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement. To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup: // in libc++'s include/c++/v1/module.modulemap module std { header stddef.h header stdlib.h } // in the C library's include/module.modulemap module clib { header stddef.h header stdlib.h } Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module. However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle. To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>. As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction. A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle. Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this. Fixes llvm#86193
The `NodeCounts` parameter of `calcExtTspScore()` is unused, so remove it. Use `SmallVector` since arrays are expected to be small since they represent MBBs.
# Why? In real-time programming, you often have a process or dispatch loop that is called many, many, many times. Without de-duplication the user will be drowning in errors. Introduce a way to only print the stacks one time only, if they have been seen before
…10147) Fix typo which should be "at least" instead of "at lease".
Full revamp of the 'quant' dialect. This is an implementation for the RFC at https://discourse.llvm.org/t/rfc-improvements-in-the-quant-dialect/79942
jorickert
changed the title
[AutoBump] Merge with fixes of 85278611 (Sep 21) (10)
[AutoBump] Merge with fixes of 85278611 (Needs torch PR)(Sep 21) (10)
Dec 13, 2024
mgehre-amd
approved these changes
Dec 16, 2024
[AutoBump] Merge with fixes of 852b648 (Needs torch and onnx bump) (Sep 26) (12)
[AutoBump] Merge with f0162fcd (Sep 26) (11)
This was referenced Jan 2, 2025
Merged
mgehre-amd
changed the title
[AutoBump] Merge with fixes of 85278611 (Needs torch PR)(Sep 21) (10)
[AutoBump] Merge with fixes of 85278611 (Needs onnx/torch bump)(Sep 21) (10)
Jan 2, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.