[AutoBump] Merge with e99755d4 (Sep 16) (3) #417

jorickert · 2024-12-12T15:07:10Z

No description provided.

Adds a python script to automatically take output from a failed clang -verify test and update the test case(s) to expect the new behaviour.

* To create custom ABIs plugin libraries need access to CoroShape.
* As a step in enabling plugin libraries, move Shape into its own header
* The header will eventually be moved into include/llvm/Transforms/Coroutines

See RFC for more info: https://discourse.llvm.org/t/rfc-abi-objects-for-coroutines/81057

…izationRewrite` (llvm#108359) The dialect conversion maintains a set of unresolved materializations (`UnrealizedConversionCastOp`). Turn that set into a `DenseMap` that maps from ops to `UnresolvedMaterializationRewrite *`. This improves efficiency a bit, because an iteration over `ConversionPatternRewriterImpl::rewrites` can be avoided. Also delete some dead code.

This patch implements sandboxir::GlobalObject mirroring llvm::GlobalObject.

This patch implements a new empty pass for the Bottom-up vectorizer and creates a pass pipeline that includes it. The SandboxVectorizer LLVM pass runs the Sandbox IR pass pipeline.

Summary: We can include `stdint.h` just fine as long as we don't allow it to find system headers. Passing `-nostdlibinc` and `-nogpuinc` suppresses these extra paths, so we will just use the clang resource headers for `stdint.h` and `stddef.h`.

… NFC

Should not return the original phi vector instruction, need to return actual vectorized value as a result.

…iple `ucmp`/`scmp` operands and a constant with `phi` of individual comparisons of original intrinsic's arguments (llvm#107769) When we have a `phi` instruction with more than one of its incoming values being a call to `ucmp` or `scmp`, which is then compared with an integer constant, we can move the comparison through the `phi` into the incoming basic blocks because we know that a comparison of `ucmp`/`scmp` with a constant will be simplified by the next iteration of InstCombine. There's a high chance that other similar patterns can be identified, in which case they can be easily handled by the same code by moving the check for "simplifiable" instructions into a lambda.
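The equivalence this fold relies on can be checked by brute force. The sketch below is not InstCombine code: `scmp`, `select_phi`, and the `PREDICATES` table are hypothetical Python stand-ins for the intrinsic, a two-input `phi`, and `icmp`. It shows that comparing the `phi` result with a constant gives the same answer as a `phi` of the per-block comparisons, each of which is a comparison of `scmp` with a constant.

```python
import itertools
import operator


def scmp(a: int, b: int) -> int:
    """Three-way signed comparison returning -1, 0 or 1."""
    return (a > b) - (a < b)


def select_phi(from_first_block: bool, first, second):
    """Stand-in for a two-input phi: pick the value of the taken edge."""
    return first if from_first_block else second


# Hypothetical mapping from icmp-style predicate names to Python operators.
PREDICATES = {
    "eq": operator.eq,
    "ne": operator.ne,
    "slt": operator.lt,
    "sle": operator.le,
    "sgt": operator.gt,
    "sge": operator.ge,
}


def compare_after_phi(taken, a, b, c, d, pred, const):
    """Original shape: icmp pred (phi [scmp(a, b)], [scmp(c, d)]), const."""
    return PREDICATES[pred](select_phi(taken, scmp(a, b), scmp(c, d)), const)


def compare_before_phi(taken, a, b, c, d, pred, const):
    """Folded shape: phi of the comparisons done in the incoming blocks."""
    first = PREDICATES[pred](scmp(a, b), const)
    second = PREDICATES[pred](scmp(c, d), const)
    return select_phi(taken, first, second)


def fold_holds(values=range(-2, 3)) -> bool:
    """Exhaustively check both shapes agree on a small input domain."""
    for taken, a, b, c, d in itertools.product(
        (True, False), values, values, values, values
    ):
        for pred, const in itertools.product(PREDICATES, (-1, 0, 1)):
            before = compare_before_phi(taken, a, b, c, d, pred, const)
            after = compare_after_phi(taken, a, b, c, d, pred, const)
            if before != after:
                return False
    return True
```

Once the comparison sits next to the intrinsic, it reduces to a plain comparison of the arguments; for example `scmp(a, b) < 0` holds exactly when `a < b`.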

Inline VPBuilder::createICmp in the header, in line with the other VPBuilder functions.

Reverts llvm#97369

Create `eh-assembly.s` that contains EH tests and remove EH tests from `basic-assembly.s`, given that it's easier to manage. (We can have many different tests, including the legacy EH and the new exnref, and with nesting for readability)

) This is a follow-up to llvm#92289 that adds lowering of the new `@llvm.experimental.vector.compress` intrinsic on x86 with AVX512 instructions. This intrinsic maps directly to `vpcompress`.

Fixes llvm@39f2d2f

This patch implements sandboxir::GlobalIFunc mirroring llvm::GlobalIFunc.

This patch adds support for a user-defined pass-pipeline that overrides the default pipeline of the vectorizer. This will commonly be used by lit tests.

This allows the clang driver to know which tool is meant to be executed, which allows the clang driver to load the right clang config files, and allows clang to find colocated sysroots. This makes sure that doing `clang-scan-deps -- <tool> ...` looks up things in the same way as if one just would execute `<tool> ...`, when `<tool>` isn't an absolute or relative path.

Emit signpost intervals for progress events so that when users report an operation takes a long time, we can investigate the issue with Instruments.app.

…n libc warnings (llvm#108308) For `snprintf(a, sizeof a, ...)`, the first two arguments form a safe pattern if `a` is a constant array. In such a case, this commit will suppress the warning. (rdar://117182250)

This change adds a new HLSL 202y language mode. Currently HLSL 202y is planned to add `auto` and `constexpr`. This change updates extension diagnostics to state that lambdas are a "clang HLSL" extension (since we have no planned release yet to include them), and that `auto` is an HLSL 202y extension when used in earlier language modes. Note: This PR does temporarily work around some differences between HLSL 2021 and 202x in Clang by changing test cases to explicitly specify 202x. A subsequent PR will update 2021's language flags to match 202x.

Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe and VPScalarIVStepsRecipe. Use them to simplify the code to create scalar IV steps slightly.

This is a quick fix to unbreak Bazel build. The right solution would probably add vdso.cpp in the support library which includes circular dependency and needs more restructuring.

…cs section of RISCVUsage.rst. NFC (llvm#108718) These are no longer experimental after 051054e. I left the section because we will be adding intrinsics for Zvkgs and Zvbc32e.

…llvm#107905) That missing space was causing the whole sentence to be rendered incorrectly in the resulting HTML.

Breaks the [Solaris/sparcv9](https://lab.llvm.org/buildbot/#/builders/13/builds/2219) and [Solaris/amd64](https://lab.llvm.org/staging/#/builders/120/builds/1770) builds. This reverts commit c21909a.

…st (llvm#108524) This extends the fix in llvm#106242 for other derived class types.

…#108852) In the process of adding scalarization support for DirectX target intrinsics I found that intrinsics that weren't marked with `IntrNoMem` did not get removed by `RecursivelyDeleteTriviallyDeadInstructionsPermissive`. This change makes it clearer that our intrinsics don't have side effects. I only added `IntrNoMem` to the intrinsics in `IntrinsicsDirectX.td` that I was involved with. There are potentially a few other cases that might warrant this attribute, but I will need input on the others.

It is possible for `static_tls_begin` to be set while `static_tls_end` is not yet set. The test reproduces the case. The stack trace looks like this:

* `MsanThread::Init`
* `SetThreadStackAndTls`
* `GetThreadStackAndTls`
* `GetThreadStackTopAndBottom`
* `pthread_getattr_np`
* `realloc`
* `__sanitizer_malloc_hook`
* TLS access
* `___interceptor___tls_get_addr`
* `DTLS_on_tls_get_addr`

The issue is that the `SetThreadStackAndTls` implementation stores `tls_begin` before `GetThreadStackTopAndBottom`, and `tls_end` after. So we have partially initialized state in `DTLS_on_tls_get_addr`.

This is an implementation of `ctime` and includes `ctime_r`. According to documentation, `ctime` and `ctime_r` are defined as the following:

```c
char *ctime(const time_t *timep);
char *ctime_r(const time_t *restrict timep, char buf[restrict 26]);
```

closes llvm#86567
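The `26` in the `ctime_r` prototype matches the fixed-width result string: 24 visible characters, a newline, and the terminating NUL. A minimal Python sketch of that layout follows. `ctime_like` is a hypothetical helper, not part of the patch, and it formats in UTC so the output is reproducible, whereas `ctime` itself converts to local time.

```python
import time

# Python's tm_wday counts from Monday = 0.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def ctime_like(timestamp: int) -> str:
    """Format a timestamp as 'Www Mmm dd hh:mm:ss yyyy\\n' (UTC, by assumption)."""
    tm = time.gmtime(timestamp)
    return "%s %s %2d %02d:%02d:%02d %d\n" % (
        DAYS[tm.tm_wday],
        MONTHS[tm.tm_mon - 1],
        tm.tm_mday,
        tm.tm_hour,
        tm.tm_min,
        tm.tm_sec,
        tm.tm_year,
    )


def required_buffer_size(timestamp: int) -> int:
    """Bytes a C caller would need: the string plus one byte for the NUL."""
    return len(ctime_like(timestamp)) + 1
```

For four-digit years the result always needs exactly 26 bytes, which is why the buffer parameter can carry a fixed size.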

Given that the instructions here are all control flow instructions, adding indentations seems to make it easier to read.

The plan was to make `eh-assembly.s` contain both the legacy and the new tests, but the new tests require `--no-type-check` because the type checker for the new EH is in progress. In case this drags on further than expected, this renames the current file to `-legacy.s` in order to follow the current naming scheme in `test/CodeGen/WebAssembly`. After landing this first, `eh-assembly-new.s` in llvm#108668 will be renamed to `eh-assembly.s`.

Secondary cache entries are now released to the OS from least recent to most recent entries. This helps to avoid unnecessary scans of the cache since entries ready to be released (specifically, entries that are considered old relative to the configurable release interval) will always be at the tail of the list of committed entries by the LRU ordering. For this same reason, the `OldestTime` variable is no longer needed to indicate when releases are necessary so it has been removed.
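The release order described above can be sketched as follows. This is an illustrative model, not the allocator's code: `LruCache`, `Entry`, and `release_old_entries` are hypothetical names, timestamps are plain integers, and released entries are simply dropped from the list. Because the list is kept in LRU order, the scan starts at the tail and stops at the first entry that is still too recent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    last_used: int  # timestamp of the most recent use


class LruCache:
    def __init__(self, release_interval: int):
        self.release_interval = release_interval
        # Left end = most recently used, right end (tail) = least recently used.
        self.committed = deque()

    def insert(self, name: str, now: int) -> None:
        """Record a newly cached entry as the most recent one."""
        self.committed.appendleft(Entry(name, now))

    def release_old_entries(self, now: int) -> list:
        """Release entries from least to most recent; stop at the first young one."""
        released = []
        while (self.committed
               and now - self.committed[-1].last_used >= self.release_interval):
            released.append(self.committed.pop().name)
        return released
```

With entries inserted at times 0, 5, and 12 and an interval of 10, a call at time 16 releases the first two and never inspects anything past the third, so no separate "oldest time" bookkeeping is needed in this model.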

This PR is implementing `asfloat` for HLSL. Fixes: llvm#70098 Co-authored-by: Joao Saffran <jderezende@microsoft.com>

This patch adds a large number of missing includes in the libc++ headers and the test suite. Those were found as part of the effort to move towards a mostly monolithic top-level std module.

This PR adds `f6E2M3FN` type to mlir. `f6E2M3FN` type is proposed in [OpenCompute MX Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf). It defines a 6-bit floating point number with bit layout S1E2M3. Unlike IEEE-754 types, there are no infinity or NaN values.

```c
f6E2M3FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.000
- Max normal number: S.11.111 = ±2^(2) x (1 + 0.875) = ±7.5
- Min normal number: S.01.000 = ±2^(0) = ±1.0
- Max subnormal number: S.00.111 = ±2^(0) x 0.875 = ±0.875
- Min subnormal number: S.00.001 = ±2^(0) x 0.125 = ±0.125
```

Related PRs:
- [PR-94735](llvm#94735) [APFloat] Add APFloat support for FP6 data types
- [PR-105573](llvm#105573) [MLIR] Add f6E3M2FN type - was used as a template for this PR
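The S1E2M3 layout and the limits listed for `f6E2M3FN` can be reproduced with a small decoder. The function below is a sketch for checking the arithmetic only; `decode_f6e2m3fn` is a hypothetical name and is not the MLIR or APFloat API.

```python
def decode_f6e2m3fn(bits: int) -> float:
    """Decode a 6-bit S1E2M3 pattern (exponent bias 1, no infinity, no NaN)."""
    if not 0 <= bits < 64:
        raise ValueError("expected a 6-bit value")
    sign = -1.0 if bits & 0b100000 else 1.0
    exponent = (bits >> 3) & 0b11   # 2 stored exponent bits
    mantissa = bits & 0b111         # 3 mantissa bits
    bias = 1
    if exponent == 0:
        # Subnormal: no implicit leading 1, exponent fixed at 1 - bias = 0.
        magnitude = (mantissa / 8.0) * 2.0 ** (1 - bias)
    else:
        # Normal: implicit leading 1, unbiased exponent = stored - bias.
        magnitude = (1.0 + mantissa / 8.0) * 2.0 ** (exponent - bias)
    return sign * magnitude
```

Every one of the 64 bit patterns decodes to a finite value, which matches the statement that the format has no infinity or NaN encodings.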

This reverts commit 69f3244. Reason: buildbot breakage because Android doesn't have <gnu/libc-version.h> https://lab.llvm.org/buildbot/#/builders/186/builds/2381 (It's probably easy to fix but I don't readily have an Android device to test.)

…cxx.rst` (llvm#108714) This makes it easier for readers to locate how to build the library.

The paper was implemented by commit b0386a5 (https://reviews.llvm.org/D46845) in LLVM 7.0. But it would be nice to have test coverage for desired properties of `insert_return_type`. Closes llvm#99944

Fixes asan, msan crash on check added in llvm#108684. The llvm#108684 includes reproducer of the issue. Change interface of `GetThreadStackAndTls` to set `tls_begin` and `tls_end` at the same time.

…-zbc.mir. NFC The IR used loads instead of stores.

…tion signature (llvm#107644) When there is a function signature mismatch between a call instruction and the callee, lower the call to an indirect call. The current behavior is to produce direct calls that may or may not be valid PTX. Consider the following example with mismatching return types:

```
%struct.1 = type <{i64}>
%struct.2 = type <{i64}>
declare %struct.1 @callee()
...
%call1 = call %struct.2 @callee()
%call2 = call i64 @callee()
```

The return type of `callee` in PTX is `.b8 _[8]`. The return type of `%call1` will be the same and so the PTX has no problems. The return type of `%call2` will be `.b64`, so the types will not match and PTX will be unacceptable to ptxas. This despite all the types having the same size. The same is true for mismatching parameter types.

If we instead convert these calls to indirect calls, we will generate functional PTX when the types have the same size. If they do not have the same size then the PTX will likely be incorrect, though this will not necessarily be caught by ptxas. Also, even if the sizes are the same, if the types differ then it is technically undefined behavior.

This change allows for more flexibility in the bitcode that can be lowered to functioning PTX, at the cost of sometimes producing PTX that is less clearly wrong than it would have been previously (i.e. incorrect indirect calls are not as obviously wrong as incorrect direct calls). We consider it okay to generate PTX with undefined behavior as the behavior of calls with mismatching types is not explicitly defined.

When both CREL and the experimental lld partitions feature are enabled, the relocation section may look like .crel.llvm_sympart.f1, and `rels.relas` is empty. While here, support relocation sections with zero entries.

The test expects a hex float format of `0x0p+0`, but AIX prints `0x0.0p+0`. This change adjusts the test to accept both.

hnrklssn and others added 30 commits September 13, 2024 10:47

[Utils] add update-verify-tests.py (llvm#97369)

d4f41be

Adds a python script to automatically take output from a failed clang -verify test and update the test case(s) to expect the new behaviour.

[AMDGPU] Error on non-global pointer with s_prefetch_data (llvm#107624)

d0e7714

[SandboxIR] Implement GlobalObject (llvm#108604)

9f738c8

This patch implements sandboxir::GlobalObject mirroring llvm::GlobalObject.

[SandboxVec] Boilerplate for vectorization passes (llvm#108603)

39f2d2f

This patch implements a new empty pass for the Bottom-up vectorizer and creates a pass pipeline that includes it. The SandboxVectorizer LLVM pass runs the Sandbox IR pass pipeline.

[SLP][NFC]Test with incorrect value for phi node with reused scalars,…

98b1d01

… NFC

[SLP]Return proper value for phi vectorized node

c13bf6d

Should not return the original phi vector instruction, need to return actual vectorized value as a result.

[Xtensa] Lowering FRAMEADDR/RETURNADDR operations. (llvm#107363)

0ba8b24

[RISCV] Add Zvfhmin to RISCVUsage.rst. NFC (llvm#108574)

1fc3ca1

[gn build] Port 39f2d2f

29e5fe7

[libc] Fix vdso VER_FLG_BASE redefinition in overlay mod. (llvm#108628)

b659abe

VPlan/Builder: inline VPBuilder::createICmp (NFC) (llvm#105650)

75a57ed

Inline VPBuilder::createICmp in the header, in line with the other VPBuilder functions.

Revert "[Utils] add update-verify-tests.py" (llvm#108630)

b7e585b

Reverts llvm#97369

[x86] Add lowering for @llvm.experimental.vector.compress (llvm#104904

b74e779

) This is a follow-up to llvm#92289 that adds lowering of the new `@llvm.experimental.vector.compress` intrinsic on x86 with AVX512 instructions. This intrinsic maps directly to `vpcompress`.

[Bazel] Fix build break for SandboxVectorizer (llvm#108638)

aca226c

Fixes llvm@39f2d2f

[SandboxIR] Implement GlobalIFunc (llvm#108622)

ae3e825

This patch implements sandboxir::GlobalIFunc mirroring llvm::GlobalIFunc.

[SandboxVec] User-defined pass pipeline (llvm#108625)

5130f32

This patch adds support for a user-defined pass-pipeline that overrides the default pipeline of the vectorizer. This will commonly be used by lit tests.

[gn build] Port a26ec54

1b4aea6

[lldb] Emit signpost intervals for progress events (NFC) (llvm#108498)

90f077c

Emit signpost intervals for progress events so that when users report an operation takes a long time, we can investigate the issue with Instruments.app.

[-Wunsafe-buffer-usage] Reduce false positives with constant arrays i…

ebf25d9

…n libc warnings (llvm#108308) For `snprintf(a, sizeof a, ...)`, the first two arguments form a safe pattern if `a` is a constant array. In such a case, this commit will suppress the warning. (rdar://117182250)

[VPlan] Use VPBuilder to create scalar IV steps and derived IV (NFCI).

c3fda44

Extend VPBuilder to allow creating VPDerivedIVRecipe, VPScalarCastRecipe and VPScalarIVStepsRecipe. Use them to simplify the code to create scalar IV steps slightly.

[SandboxIR] Implement missng Instruction::comesBefore() (llvm#108635)

6cbb245

[bazel] add vdso dependency to time_linux lib (llvm#108647)

a592e4b

This is a quick fix to unbreak Bazel build. The right solution would probably add vdso.cpp in the support library which includes circular dependency and needs more restructuring.

[bazel] Port a953982 (llvm#108651)

ee3f5c2

llvmgnsyncbot and others added 24 commits September 16, 2024 18:01

[gn build] Port 5c348f6

a40b36f

[RISCV][Docs] Remove Zvbb, Zvbc and Zvk* from experimental C intrinsi…

aaa0f4d

…cs section of RISCVUsage.rst. NFC (llvm#108718) These are no longer experimental after 051054e. I left the section because we will be adding intrinsics for Zvkgs and Zvbc32e.

[clang][NFC] Add missing space in -Wunsafe-buffer-usage documentation (…

83bb731

…llvm#107905) That missing space was causing the whole sentence to be rendered incorrectly in the resulting HTML.

Revert "[NFC][sanitizer] Simplify ifdef"

9ec1f65

Breaks the [Solaris/sparcv9](https://lab.llvm.org/buildbot/#/builders/13/builds/2219) and [Solaris/amd64](https://lab.llvm.org/staging/#/builders/120/builds/1770) builds. This reverts commit c21909a.

[Format] Dont treat LBrace after extends/implements as initializer li…

04d71ea

…st (llvm#108524) This extends the fix in llvm#106242 for other derived class types.

[WebAssembly] Add indentations to annotations.s (llvm#108790)

d8ee96c

Given that the instructions here are all control flow instructions, adding indentations seems to make it easier to read.

Implementing asfloat using bit_cast (llvm#108686)

1bfc3d0

This PR is implementing `asfloat` for HLSL. Fixes: llvm#70098 Co-authored-by: Joao Saffran <jderezende@microsoft.com>

[libc++][modules] Fix missing and incorrect includes (llvm#108850)

09e3a36

This patch adds a large number of missing includes in the libc++ headers and the test suite. Those were found as part of the effort to move towards a mostly monolithic top-level std module.

[libc++][docs] Add link to VendorDocumentation.rst from `TestingLib…

397e4dc

…cxx.rst` (llvm#108714) This makes it easier for readers to locate how to build the library.

[libc++][test] Confirm that P0508R0 has been implemented (llvm#108172)

2d13302

The paper was implemented by commit b0386a5 (https://reviews.llvm.org/D46845) in LLVM 7.0. But it would be nice to have test coverage for desired properties of `insert_return_type`. Closes llvm#99944

[RISCV] Add coverage for select C, C1, C2 where (C1-C2)*[0,1] is cheap

8f023ec

[sanitizer] Fix partially initialized static TLS range (llvm#108685)

b7c9ebe

Fixes asan, msan crash on check added in llvm#108684. The llvm#108684 includes reproducer of the issue. Change interface of `GetThreadStackAndTls` to set `tls_begin` and `tls_end` at the same time.
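The ordering problem behind this fix can be modelled outside the sanitizer runtime. The Python sketch below is illustrative only: `ThreadState`, `query_stack_bounds`, `init_buggy`, and `init_fixed` are hypothetical stand-ins, not sanitizer APIs. The buggy variant stores `tls_begin`, then makes a call that can re-enter code reading the TLS range, then stores `tls_end`. The fixed variant finishes the re-entrant call first and publishes both bounds together.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ThreadState:
    tls_begin: int = 0
    tls_end: int = 0
    # What a re-entrant reader saw while initialization was in progress.
    observed: List[Tuple[int, int]] = field(default_factory=list)


def query_stack_bounds(state: ThreadState) -> None:
    """Stand-in for a call that may allocate and so run a hook touching TLS."""
    state.observed.append((state.tls_begin, state.tls_end))


def init_buggy(state: ThreadState, begin: int, size: int) -> None:
    state.tls_begin = begin          # first half of the range is published
    query_stack_bounds(state)        # hook observes begin set, end still 0
    state.tls_end = begin + size


def init_fixed(state: ThreadState, begin: int, size: int) -> None:
    query_stack_bounds(state)        # hook observes a consistent empty range
    state.tls_begin, state.tls_end = begin, begin + size  # set both together
```

In the buggy model the observer records `(begin, 0)`, a range whose end precedes its start; in the fixed model it only ever sees `(0, 0)` or the complete range.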

[RISCV] Fix IR for store_large_offset_no_opt_i16 in make-compressible…

4eb9780

…-zbc.mir. NFC The IR used loads instead of stores.

[ELF] .llvm.sympart: support CREL

cf70a1e

When both CREL and the experimental lld partitions feature are enabled, the relocation section may look like .crel.llvm_sympart.f1, and `rels.relas` is empty. While here, support relocation sections with zero entries.

[libc++][test] Adjust expected hexfloat format (llvm#95011)

e99755d

The test expects a hex float format of `0x0p+0`, but AIX prints `0x0.0p+0`. This change adjusts the test to accept both.
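Accepting both spellings of a zero hexfloat amounts to making the `.0` fraction optional. The sketch below shows that idea in Python rather than in the libc++ test itself; `is_zero_hexfloat` and the pattern are assumptions for illustration, not the code from the commit.

```python
import re

# "0x0p+0" (the form the test expected) or "0x0.0p+0" (the form AIX prints).
ZERO_HEXFLOAT = re.compile(r"0x0(?:\.0)?p\+0")


def is_zero_hexfloat(text: str) -> bool:
    """Return True if text is one of the two accepted zero spellings."""
    return ZERO_HEXFLOAT.fullmatch(text) is not None
```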

[AutoBump] Merge with e99755d (Sep 16)

45c9757

jorickert requested a review from mgehre-amd December 12, 2024 15:07

Base automatically changed from bump_to_d5f0969c to feature/fused-ops December 12, 2024 15:08

jorickert enabled auto-merge December 12, 2024 15:43

mgehre-amd approved these changes Dec 13, 2024

jorickert merged commit 1700d25 into feature/fused-ops Dec 13, 2024
38 checks passed

jorickert deleted the bump_to_e99755d4 branch December 13, 2024 13:23
