[AutoBump] Merge with f284af48 (May 28) (52) #311

mgehre-amd · 2024-08-26T08:36:59Z

No description provided.

…out of bounds shift amounts SHL/SRL are guaranteed to fold to zero, SRA is guaranteed to fold to 'all sign bits'
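The folding behaviour described above can be illustrated with a small model of a single vector lane. This is only a sketch: `sse_shift_lane` and its parameters are hypothetical names for illustration, not part of LLVM. It shows why such shifts never produce undef or poison: every shift amount has a defined result.

```python
def sse_shift_lane(value, amount, bits=16, op="shl"):
    """Model one lane of an SSE-style vector shift (hypothetical helper)."""
    mask = (1 << bits) - 1
    value &= mask
    if op == "sra":
        # Arithmetic shift: clamping the amount makes out-of-range counts
        # replicate the sign bit across the whole lane.
        signed = value - (1 << bits) if value >> (bits - 1) else value
        return (signed >> min(amount, bits - 1)) & mask
    if op not in ("shl", "srl"):
        raise ValueError(f"unknown op: {op}")
    if amount >= bits:
        # Logical shifts by an out-of-range amount yield zero.
        return 0
    if op == "shl":
        return (value << amount) & mask
    return value >> amount
```

For example, `sse_shift_lane(0x8000, 99, op="sra")` gives `0xFFFF` (all sign bits), while the same amount with `op="srl"` gives `0`.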

This PR adds the type conversion support for fixed size arrays. Mostly mechanical changes converting dimension values to subrange fields. A limitation is that the lower bound is always one for the moment as that information is missing in `SequenceType`. With this change in place, I can evaluate fixed size arrays in the debugger.

```
(gdb) p x
$1 = ((2, 3, 4, 5) (3, 4, 5, 6) (4, 5, 6, 7) (5, 6, 7, 8) (6, 7, 8, 9))
(gdb) ptype x
type = integer (4,5)
```

---------

Co-authored-by: Tom Eccles <t@freedommail.info>

…vm#92579) Prior to this patch, for "selective" DLL import/export, the vtable & typeinfo would be imported/exported on the condition that all non-inline virtual methods are imported/exported. This condition was based upon MS guidelines related to "selective" DLL import/export. However, in reality, this condition is too rigid and can result in undefined vtable & typeinfo symbols for code that builds fine with MSVC. Therefore, relax this condition to be if any non-inline method is imported/exported.
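The difference between the old and the relaxed condition can be sketched as two predicates. This is only an illustration under assumptions: the `Method` record and both function names are hypothetical, and the requirement that the old rule needs at least one non-inline virtual method is a guess, not something the commit message states.

```python
from dataclasses import dataclass


@dataclass
class Method:
    # Hypothetical record describing one member function of a class.
    name: str
    is_virtual: bool
    is_inline: bool
    has_dll_attr: bool  # marked dllimport/dllexport


def vtable_follows_dll_attr_old(methods):
    """Old rule: every non-inline virtual method must be imported/exported."""
    relevant = [m for m in methods if m.is_virtual and not m.is_inline]
    # Assumption: an empty set of relevant methods does not qualify.
    return bool(relevant) and all(m.has_dll_attr for m in relevant)


def vtable_follows_dll_attr_new(methods):
    """Relaxed rule: any non-inline method being imported/exported suffices."""
    return any(m.has_dll_attr for m in methods if not m.is_inline)
```

A class with one exported and one non-exported non-inline virtual method fails the old predicate but passes the relaxed one, which is the case the patch describes as building fine with MSVC.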

This PR introduces support for inline assembly calls for the SPIR-V Backend in general, and support for the SPV_INTEL_inline_assembly [1] extension in particular. The former part of the PR is agnostic towards vendor-specific requirements and resolves the task of supporting successful transformation of inline assembly as long as it's possible without specific SPIR-V instruction codes. As a part of the PR there appears an opportunity to bring coherent inline assembly information up to the latest passes of the transformation process (emitting final SPIR-V instructions), so that the PR makes it easy to add any other required flavor of inline assembly, other than the one supported by the vendor-specific SPV_INTEL_inline_assembly extension, if/when needed. At the moment, however, SPV_INTEL_inline_assembly is the only implemented way to bring LLVM IR inline assembly calls up to valid SPIR-V instructions and also the default one. This means that inline assembly calls will generate an error message if such an extension is not used, to prevent LLVM-generated error messages at the final stages of translation. When the SPV_INTEL_inline_assembly extension is listed among the supported extensions, translation of inline assembly is intercepted by this extension implementation on a pre-legalizer step, and this is a place where support for a new inline assembly extension may be added if needed. This PR also extends support for register classes, improves type inference during the pre-legalizer pass, and fixes a minor bug with asm-printing of string literals. [1] https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc

Only with primitive fields for now.

Validation succeeds on this test since SPIRV-Tools commit `e2646f5e ("spirv-val: Consider target env for OpReadClockKHR scope", 2024-05-21)`.

Add do02.f90 and taskloop03.f90 that were removed in llvm#92739. Replace shell script tests with Python.

…m#93168) This PR updates docs to describe support of SPV_KHR_shader_clock extension added by llvm#92771.

One of the previous patches introduced initial support for a non-power-of-2 number of elements, but some parts of the SLP vectorizer were still not adjusted to handle the costs correctly. This patch fixes that by improving the analysis of the non-power-of-2 number of elements and by fixing the cost of the extractelement instructions.

Reviewers: RKSimon
Reviewed By: RKSimon
Pull Request: llvm#93213

… coverage VBMI2 has legal FSHL/FSHR operations which makes it easier to test non-uniform shift amounts as it won't get expanded

This patch adds the support for `ballot_sync` in ompx.

…he template differ (llvm#93265) This was not implemented in llvm#78041 when StructuralValue TemplateArguments were originally added. This patch does not implement this functionality, it just falls back to the expression when possible. Otherwise, such as when dealing with canonical types to begin with, this will just ignore the argument as if it wasn't even there. Fixes llvm#93068

…lvm#92632) When looking for missing frames due to tail calls, we were not checking the output parameter of the recursive call in the correct place. Make sure we check for the case when that recursive call returned false due to multiple possible callee chains. Extended the existing test a bit to catch this case.

…lueDecl (llvm#93266)

Incremental change here, but a step in the right direction. Before, an assignment to a dummy variable was diagnosed as a "read of a non-const variable".

Use different -verify prefixes and make sure the tests really break when fixing the eval order.

llvm#67174 added the `__prefetch` intrinsic, however it used the wrong signature: the argument should be `const void*`, not `void*`. Docs: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#:~:text=__prefetch Unfortunately, this can't be backported (there are no more 18.x releases, and this change is a breaking change), so I'll see if I can get a workaround added on MSVC's side for Clang 18.

A precommit test case to show vector loops generated from EVL transform - This is a precommit test for llvm#92092

…k" (llvm#93306) Reverts llvm#93270. This was found to have a race and the forward fix was reverted, so this is reverted until a forward fix is available.

… existing BITCASTs and limit recursion depth Add XOR + constant handling to allow us to detect NOT patterns. If a recursive combineBitcastToBoolVector call finds an existing BITCAST node then use that. As combineBitcastToBoolVector is recursive, ensure we limit the maximum recursion depth. Fixes llvm#93000

…vm#93272) Specified at: https://github.com/WebAssembly/half-precision/blob/29a9b9462c9285d4ccc1a5dc39214ddfd1892658/proposals/half-precision/Overview.md Note: the current spec has f16x8.extract_lane as opcode 0x124, but this is incorrect and will be changed to 0x121 soon.

llvm#93008) LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX was enabled when building LLVM. Use this instead of manually defining MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).

…d if available (llvm#93205)

This change expands the existing instrumentation that prints the IR before/after each pass to an output stream (usually stderr). It adds a new configuration that will print the output of each pass to a separate file. The files will be organized into a directory tree rooted at a specified directory. For existing tools, a CL option `-mlir-print-ir-tree-dir` is added to specify this directory and activate the new printing config.

The created directory tree mirrors the nesting structure of the IR. For example, if the IR is congruent to the pass-pipeline "builtin.module(pass1,pass2,func.func(pass3,pass4),pass5)", and `-mlir-print-ir-tree-dir=/tmp/pipeline_output`, then the file tree created will look like:

```
/tmp/pipeline_output
├── builtin_module_the_symbol_name
│   ├── 0_pass1.mlir
│   ├── 1_pass2.mlir
│   ├── 2_pass5.mlir
│   ├── func_func_my_func_name
│   │   ├── 1_0_pass3.mlir
│   │   ├── 1_1_pass4.mlir
│   ├── func_func_my_other_func_name
│   │   ├── 1_0_pass3.mlir
│   │   ├── 1_1_pass4.mlir
```

The subdirectories are named by concatenating the relevant parent operation names and symbol name (if present). The printer keeps a counter associated with ops that are targeted by passes and their isolated-from-above parents. Each filename is given a numeric prefix using the counter value for the op that the pass is targeting and then prepending the counter values for each parent. This gives a naming where it is easy to distinguish which passes may have run concurrently vs. which have a clear ordering. In the above example, for both `1_1_pass4.mlir` files, the first `1` refers to the counter for the parent op, and the second refers to the counter for the respective function.
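The naming scheme described above can be sketched in a few lines. This is a rough illustration, not the MLIR implementation: `tree_dir_name` and `pass_file_name` are hypothetical helpers, and replacing `.` with `_` in operation names is inferred from the example tree rather than stated in the text.

```python
def tree_dir_name(op_name, symbol_name=None):
    """Directory name for an op: operation name plus symbol name, if present."""
    parts = [op_name.replace(".", "_")]
    if symbol_name:
        parts.append(symbol_name)
    return "_".join(parts)


def pass_file_name(parent_counters, op_counter, pass_name):
    """File name: parent counters first, then the targeted op's own counter."""
    prefix = "_".join(str(c) for c in [*parent_counters, op_counter])
    return f"{prefix}_{pass_name}.mlir"
```

With these helpers, `pass_file_name([1], 1, "pass4")` yields `1_1_pass4.mlir`, and `tree_dir_name("func.func", "my_func_name")` yields `func_func_my_func_name`, matching the entries in the example tree.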

…vm#93221) This maintains consistency with the non-VP ISD opcodes.

The test select-dependence.ll can be eliminated completely by dce, as it returns a constant, and doesn't write any arguments. Lift out the local allocas into arguments, so that it is less nonsensical. While at it, rename the variables for greater readability, and regenerate the test with UpdateTestChecks.

…inearize (llvm#92370) Building on top of [llvm#88204](llvm#88204), this PR adds support for converting `vector.insert` into an equivalent `vector.shuffle` operation that operates on linearized (1-D) vectors.
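One way to picture the conversion: once both vectors are flattened to 1-D, an insert becomes a shuffle whose mask takes most elements from the destination and a contiguous run from the source. The sketch below assumes the usual shuffle convention (mask indices address the concatenation of the two operands); `insert_shuffle_mask` and `apply_shuffle` are hypothetical helpers, not the code in the PR.

```python
def insert_shuffle_mask(dest_size, src_size, offset):
    """Mask for shuffle(dest, src) that overwrites dest[offset:offset+src_size]."""
    if offset < 0 or offset + src_size > dest_size:
        raise ValueError("source does not fit into destination at this offset")
    mask = list(range(dest_size))  # default: keep destination elements
    for i in range(src_size):
        # Indices >= dest_size select elements of the second operand (src).
        mask[offset + i] = dest_size + i
    return mask


def apply_shuffle(v1, v2, mask):
    """Reference semantics: pick elements from the concatenation of v1 and v2."""
    combined = list(v1) + list(v2)
    return [combined[i] for i in mask]
```

For instance, inserting a 2-element vector at position `[1]` of a 3x2 vector corresponds to a linear offset of 2 in a 6-element vector, giving the mask `[0, 1, 6, 7, 4, 5]`.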

…93539) The pass constructor can be generated automatically. This pass is module-level and then runs on all relevant intrinsic operations inside of the module, no matter what top level operation they are inside of.

@main

…nd in dropUnitDims pass. (llvm#93317) Running the `mlir-opt --linalg-fold-unit-extent-dims` pass on the following IR

```
#map = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 + d4, d2 + d5, d6)>
#map1 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>
#map2 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>
module {
  func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
    %cst = arith.constant dense<1.000000e+00> : tensor<1x1x1x1xf32>
    %0 = tensor.empty(%arg1) : tensor<?x1x61x1xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %2 = arith.mulf %in, %in_0 : f32
      %3 = arith.addf %out, %2 : f32
      linalg.yield %3 : f32
    } -> tensor<?x1x61x1xf32>
    return %1 : tensor<?x1x61x1xf32>
  }
}
```

produces an incorrect tensor.expand_shape operation:

```
error: 'tensor.expand_shape' op expected dimension 0 of collapsed type to be dynamic since one or more of the corresponding dimensions in the expanded type is dynamic
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
         ^
/mathworks/devel/sandbox/sayans/geckWorks/g3294570/repro.mlir:8:10: note: see current operation: %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
// -----// IR Dump After LinalgFoldUnitExtentDimsPass Failed (linalg-fold-unit-extent-dims) //----- //
#map = affine_map<(d0) -> (0, d0)>
#map1 = affine_map<(d0) -> ()>
#map2 = affine_map<(d0) -> (d0)>
"builtin.module"() ({
  "func.func"() <{function_type = (tensor<1x?x?x1xf32>, index) -> tensor<?x1x61x1xf32>, sym_name = "main"}> ({
  ^bb0(%arg0: tensor<1x?x?x1xf32>, %arg1: index):
    %0 = "arith.constant"() <{value = dense<1.000000e+00> : tensor<f32>}> : () -> tensor<f32>
    %1 = "tensor.collapse_shape"(%arg0) <{reassociation = [[0, 1], [2, 3]]}> : (tensor<1x?x?x1xf32>) -> tensor<?x?xf32>
    %2 = "tensor.empty"() : () -> tensor<61xf32>
    %3 = "tensor.empty"() : () -> tensor<61xf32>
    %4 = "linalg.generic"(%1, %0, %2, %3) <{indexing_maps = [#map, #map1, #map2, #map2], iterator_types = [#linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 3, 1>}> ({
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32, %arg5: f32):
      %6 = "arith.mulf"(%arg2, %arg3) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      %7 = "arith.addf"(%arg4, %6) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      "linalg.yield"(%7) : (f32) -> ()
    }) : (tensor<?x?xf32>, tensor<f32>, tensor<61xf32>, tensor<61xf32>) -> tensor<61xf32>
    %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
    "func.return"(%5) : (tensor<?x1x61x1xf32>) -> ()
  }) : () -> ()
}) : () -> ()
```

This happens because the dimension `d0` is determined to be a unit-dim that can be dropped based on the dimensions of operand `arg0` to `linalg.generic`. Later on, when iterating over the `outs` operand, the dimension `d0` is again determined to be a unit-dim even though the shape corresponding to it is `Shape::kDynamic`. For the `linalg.generic` to be valid, `d0` of `outs` does need to be `1`, but the current implementation does not handle this properly and drops the dimension, so the `outs` operand becomes `tensor<61xf32>` in the example. The fix is to also check that the dimension shape is actually `1` before dropping the dimension.
The IR after the fix is:

```
#map = affine_map<()[s0, s1] -> (s0 * s1)>
#map1 = affine_map<(d0) -> (0, d0)>
#map2 = affine_map<(d0) -> ()>
module {
  func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %cst = arith.constant dense<1.000000e+00> : tensor<f32>
    %collapsed = tensor.collapse_shape %arg0 [[0, 1], [2, 3]] : tensor<1x?x?x1xf32> into tensor<?x?xf32>
    %0 = tensor.empty(%arg1) : tensor<?x61xf32>
    %1 = affine.apply #map()[%arg1, %c1]
    %2 = tensor.empty(%1) : tensor<?x61xf32>
    %3 = linalg.generic {indexing_maps = [#map1, #map2, #map1, #map1], iterator_types = ["parallel"]} ins(%collapsed, %cst, %0 : tensor<?x?xf32>, tensor<f32>, tensor<?x61xf32>) outs(%2 : tensor<?x61xf32>) {
    ^bb0(%in: f32, %in_0: f32, %in_1: f32, %out: f32):
      %4 = arith.mulf %in, %in_0 : f32
      %5 = arith.addf %in_1, %4 : f32
      linalg.yield %5 : f32
    } -> tensor<?x61xf32>
    %expanded = tensor.expand_shape %3 [[0, 1], [2, 3]] output_shape [%c0, 1, 61, 1] : tensor<?x61xf32> into tensor<?x1x61x1xf32>
    return %expanded : tensor<?x1x61x1xf32>
  }
}
```
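The extra check can be sketched as follows. This is a simplified model, not the pass itself: `K_DYNAMIC` stands in for `Shape::kDynamic`, the two helper names are hypothetical, and `candidate_dims` is assumed to already hold the operand dimension positions that the loop analysis flagged as unit-dims.

```python
K_DYNAMIC = None  # stand-in for Shape::kDynamic (assumption for this sketch)


def droppable_unit_dims(operand_shape, candidate_dims):
    """Keep only candidates whose static extent on this operand is exactly 1."""
    return [d for d in candidate_dims if operand_shape[d] == 1]


def shape_after_drop(operand_shape, dropped_dims):
    """Operand shape once the dropped dimensions are removed."""
    dropped = set(dropped_dims)
    return [s for d, s in enumerate(operand_shape) if d not in dropped]
```

For the `outs` operand `tensor<?x1x61x1xf32>` with candidates `[0, 1, 3]`, only dimensions 1 and 3 pass the check, so the dynamic dimension survives and the operand becomes `?x61` instead of `61`.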

Clang has some unwritten rules about diagnostic wording regarding things like punctuation and capitalization. This patch documents those rules and adds some tablegen support for checking diagnostics follow the rules. Specifically: tablegen now checks that a diagnostic does not start with a capital letter or end with punctuation, except for the usual exceptions like proper nouns or ending with a question. Now that the code base is clean of such issues, the diagnostics are emitted as an error rather than a warning to ensure that failure to follow these rules is either addressed by an author, or a new exception is added to the checking logic.
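A check along these lines might look like the sketch below. It is an illustration only: the real check lives in tablegen, and the function name, the punctuation set, and the way proper-noun exceptions are passed in are all assumptions made for this example.

```python
def wording_problems(text, proper_nouns=frozenset()):
    """Return a list of rule violations for one diagnostic message."""
    problems = []
    words = text.split()
    first_word = words[0] if words else ""
    # Rule 1: no leading capital letter, unless the first word is an exception.
    if first_word[:1].isupper() and first_word not in proper_nouns:
        problems.append("starts with a capital letter")
    # Rule 2: no trailing punctuation; a question mark is allowed.
    last_char = text.rstrip()[-1:]
    if last_char and last_char in ".!,;:":  # assumed punctuation set
        problems.append("ends with punctuation")
    return problems
```

Calling `wording_problems("Expected ';' after expression.")` reports both problems, while a message that ends with a question mark, or starts with a listed proper noun, passes.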

Fixes llvm#90941. Add support for ``[[msvc::noinline]]`` attribute, which is actually an alias of ``[[clang::noinline]]``.

And as an extension in older language modes. Per https://eel.is/c++draft/lex.string#nt:d-char Fixes llvm#93130

…lvm#91599) This reverts commit aa9d467.

…le::makeUniqueName()`. (llvm#89057) E.g. during inlining, a new symbol name can be duplicated, and then `ValueSymbolTable::makeUniqueName()` will add a unique suffix, exceeding the `non-global-value-max-name-size` restriction. Also changed the type of the option from `unsigned` to `int`, since the `ValueSymbolTable` constructor can use the value `-1`, which means unrestricted name size.
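The interaction between the unique suffix and the size limit can be sketched like this. It is not LLVM's algorithm: `make_unique_name` is a hypothetical helper, the plain numeric suffix is a simplification, and the sketch assumes the limit is large enough to hold the suffix itself.

```python
def make_unique_name(base, taken, max_size=-1):
    """Pick a name not in `taken`; max_size == -1 means unrestricted."""
    if max_size >= 0 and len(base) > max_size:
        base = base[:max_size]
    if base not in taken:
        return base
    suffix_id = 0
    while True:
        suffix_id += 1
        suffix = str(suffix_id)
        candidate = base + suffix
        if max_size >= 0 and len(candidate) > max_size:
            # Shorten the base so that base + suffix still fits the limit.
            keep = max(max_size - len(suffix), 0)
            candidate = base[:keep] + suffix
        if candidate not in taken:
            return candidate
```

With `max_size=6`, a clash on `abcdef` resolves to `abcde1` rather than the over-long `abcdef1`; with the default `-1`, the suffix is simply appended.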

…93415) "const" being removed in this patch prevents the move semantics from being used in: AI.CallStack = Callback(IndexedAI.CSId); With this patch on an indexed MemProf Version 2 profile, the cycle count and instruction count go down by 13.3% and 26.3%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.

There was existing support for constant folding a `linalg.generic` that was actually a transpose. This commit adds support for the named op, `linalg.transpose`, as well by making use of the `LinalgOp` interface.

…2127) This change updates the dataLayout string to ensure alignment with the latest LLVM TargetMachine configuration. The aim is to maintain consistency and prevent potential compilation issues related to memory address space handling.

fir.box_rank codegen was invalid: it assumed the rank field in the descriptor was an i32, which is not correct. Do not hard-code the type; use the named position to find the type, and convert as needed in the patterns.

Rename things in a couple of places to make the code a bit clearer.

…ing when parsing declaration DIEs. (llvm#92328) This reapplies llvm@9a7262c (llvm#90663) and adds llvm#91808 as a fix. It was causing tests on macOS to fail because `SymbolFileDWARF::GetForwardDeclCompilerTypeToDIE` returned the map owned by this symbol file. When there were two symbol files, two different maps were created for caching from compiler type to DIE even if they were for the same module. The solution is to do the same as `SymbolFileDWARF::GetUniqueDWARFASTTypeMap`: query `SymbolFileDWARFDebugMap` first to get the shared underlying SymbolFile so the map is shared among multiple SymbolFileDWARF instances.

…ounding ops. (llvm#93356) The elements that aren't sNans need to get passed through this fadd instruction unchanged. With the agnostic mask policy they might be forced to all ones.
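The difference between the two mask policies can be modelled on raw lane values. This is a simplified sketch: `masked_result` is a hypothetical helper, lanes are treated as 32-bit patterns, and the agnostic case models the worst outcome (all ones), although hardware may also leave such lanes unchanged.

```python
ALL_ONES_32 = 0xFFFFFFFF


def masked_result(passthru, computed, mask, policy):
    """Per-lane result of a masked op under a given mask policy."""
    if policy not in ("undisturbed", "agnostic"):
        raise ValueError(f"unknown policy: {policy}")
    out = []
    for old, new, active in zip(passthru, computed, mask):
        if active:
            out.append(new)          # active lane: take the computed value
        elif policy == "undisturbed":
            out.append(old)          # inactive lane keeps its previous value
        else:
            out.append(ALL_ONES_32)  # worst case allowed for agnostic lanes
    return out
```

Here the mask would mark the sNaN lanes: under the undisturbed policy the other lanes pass through unchanged, which is what the fix relies on.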

Summary: There was a bug here where we would initialize the plugin multiple times when there were multiple images. Fix it by putting the `is_initialized` check later.

RKSimon and others added 30 commits May 24, 2024 12:45

[X86] canCreateUndefOrPoisonForTargetNode - SSE vector shifts handle …

fcd086c

…out of bounds shift amounts SHL/SRL are guaranteed to fold to zero, SRA is guaranteed to fold to 'all sign bits'

[clang][Interp] Fix zero-initializing unions

f35aac6

Only with primitive fields for now.

[SPIR-V] Enable spirv-val in SPV_KHR_shader_clock test (llvm#93292)

fb9f5aa

Validation succeeds on this test since SPIRV-Tools commit `e2646f5e ("spirv-val: Consider target env for OpReadClockKHR scope", 2024-05-21)`.

[Flang][OpenMP] Reenable and fix final few tests 6/6 (llvm#93295)

879b726

Add do02.f90 and taskloop03.f90 that were removed in llvm#92739. Replace shell script tests with Python.

[SPIR-V] Update docs to describe support of SPV_KHR_shader_clock (llv…

0f26aa5

…m#93168) This PR updates docs to describe support of SPV_KHR_shader_clock extension added by llvm#92771.

[X86] funnel-shifts.ll - add VBMI2 and non-uniform shift amounts test…

1430405

… coverage VBMI2 has legal FSHL/FSHR operations which makes it easier to test non-uniform shift amounts as it won't get expanded

[DAG] visitFunnelShift - pull out repeated SDLoc.

729fdb6

[OpenMP][OMPX] Add ballot_sync (llvm#91297)

7eeec8e

This patch adds the support for `ballot_sync` in ompx.

[clang] add fallback to expr in the template differ when comparing Va…

ad190fc

…lueDecl (llvm#93266)

[clang][Interp] Diagnose dummy assignments differently

d8c8c8c

Incremental change here, but a step in the right direction. Before, an assignment to a dummy variable was diagnosed as a "read of a non-const variable".

[X86] Add test case for llvm#93000

abc4c21

[clang][Interp][NFC] Make eval-order test more useful

82a5d0d

Use different -verify prefixes and make sure the tests really break when fixing the eval order.

[LV][NFC] precommit test for EVL transform (llvm#92203)

b008a2d

A precommit test case to show vector loops generated from EVL transform - This is a precommit test for llvm#92092

Revert "[mlir] Optimize ThreadLocalCache by removing atomic bottlenec…

fab234a

…k" (llvm#93306) Reverts llvm#93270. This was found to have a race and the forward fix was reverted, so this is reverted until a forward fix is available.

[clang][ExtractAPI] Ensure TemplateArgumentLocations are only accesse…

ab7e6b6

…d if available (llvm#93205)

[memprof] Use a SetVector (NFC) (llvm#93312)

15135af

[SelectionDAG][RISCV][VE] Rename VP_ASHR->VP_SRA VP_LSHR->VP_SRL. (ll…

a1c9b96

…vm#93221) This maintains consistency with the non-VP ISD opcodes.

[libc++][test] Close LWG3045 (llvm#93053)

96af54b

kkwli and others added 24 commits May 28, 2024 08:50

[flang] Fix typos PPC intrinsics tests (NFC) (llvm#92943)

1da52ca

[mlir][vector] Add support for linearizing Insert VectorOp in VectorL…

01fbc56

…inearize (llvm#92370) Building on top of [llvm#88204](llvm#88204), this PR adds support for converting `vector.insert` into an equivalent `vector.shuffle` operation that operates on linearized (1-D) vectors.

[bazel] Port 17ecd23

bdd4e8b

[X86][tablgen] Add assertions when emitting NF transform table

5988c79

[gn] port 17ecd23 (-gen-x86-instr-mapping)

2c7c9df

Fix failure after d46e373

6e1a042

[Clang] Add support for [[msvc::noinline]] attribute. (llvm#91720)

8995ccc

Fixes llvm#90941. Add support for ``[[msvc::noinline]]`` attribute, which is actually an alias of ``[[clang::noinline]]``.

[Clang] allow `` `@$ `` in raw string delimiters in C++26 (llvm#93216)

2ace7bd

And as an extension in older language modes. Per https://eel.is/c++draft/lex.string#nt:d-char Fixes llvm#93130

[gn build] Port 23e1ed6

57790db

Reland "[AArch64] NFC: Add RUN lines for streaming-compatible code." (l…

46a30df

…lvm#91599) This reverts commit aa9d467.

[mlir][linalg] Add linalg.transpose constant folding (llvm#92589)

74ed79f

There was existing support for constant folding a `linalg.generic` that was actually a transpose. This commit adds support for the named op, `linalg.transpose`, as well by making use of the `LinalgOp` interface.

[lldb][NativePDB] Fix uninitialized values found by msan.

cde1ae4

[Frontend][OpenMP] Rename some variables, NFC

8890214

Rename things in a couple of places to make the code a bit clearer.

[RISCV] Use mask undisturbed policy when silencing sNans for strict r…

d490ce2

…ounding ops. (llvm#93356) The elements that aren't sNans need to get passed through this fadd instruction unchanged. With the agnostic mask policy they might be forced to all ones.

[Offload][Fix] Fix lazy initialization with multiple images

f284af4

Summary: There was a bug here where we would initialize the plugin multiple times when there were multiple images. Fix it by putting the `is_initialized` check later.

[AutoBump] Merge with f284af4 (May 28)

b4715ec

mgehre-amd requested a review from cferry-AMD August 26, 2024 09:08

cferry-AMD approved these changes Aug 26, 2024


Base automatically changed from bump_to_76303791 to feature/fused-ops September 4, 2024 05:02


mgehre-amd merged commit a7c393d into feature/fused-ops Sep 4, 2024
6 checks passed

mgehre-amd deleted the bump_to_f284af48 branch September 4, 2024 05:02
