Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with f284af48 (May 28) (52) #311

Merged
merged 495 commits into from
Sep 4, 2024

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

RKSimon and others added 30 commits May 24, 2024 12:45
…out of bounds shift amounts

SHL/SRL are guaranteed to fold to zero, SRA is guaranteed to fold to 'all sign bits'
This PR adds the type conversion support for fixed size arrays. Mostly
mechanical changes converting dimension values to subrange fields. A
limitation is that lower bound is always one for the moment as that
information is missing in `SequenceType`.

With this change in place, I can evaluate fixed size arrays in debugger.
```

(gdb) p x
$1 = ((2, 3, 4, 5) (3, 4, 5, 6) (4, 5, 6, 7) (5, 6, 7, 8) (6, 7, 8, 9))
(gdb) ptype x
type = integer (4,5)
```

---------

Co-authored-by: Tom Eccles <t@freedommail.info>
…vm#92579)

Prior to this patch, for "selective" DLL import/export, the vtable &
typeinfo would be imported/exported on the condition that all non-inline
virtual methods are imported/exported. This condition was based upon MS
guidelines related to "selective" DLL import/export.

However, in reality, this condition is too rigid and can result in
undefined vtable & typeinfo symbols for code that builds fine with MSVC.
Therefore, relax this condition to be if any non-inline method is
imported/exported.
This PR introduces support for inline assembly calls for SPIR-V Backend
in general, and support for SPV_INTEL_inline_assembly [1] extension in
particular. The former part of the PR is agnostic towards
vendor-specific requirements and resolves the task of supporting
successful transformation of inline assembly as long as it's possible
without specific SPIR-V instruction codes.

As a part of the PR there appears an opportunity to bring coherent
inline assembly information up to latest passes of the transformation
process (emitting final SPIR-V instructions), so that PR makes it easy
to add any another required flavor of inline assembly, other then
supported by the vendor specific SPV_INTEL_inline_assembly extension,
if/when needed.

At the moment, however, SPV_INTEL_inline_assembly is the only
implemented way to bring LLVM IR inline assembly calls up to valid
SPIR-V instructions and also the default one. This means that inline
assembly calls will generate an error message of such extension is not
used to prevent LLVM-generated error messages at the final stages of
translation. When the SPV_INTEL_inline_assembly extension is mentioned
among supported, translation of inline assembly is intercepted by this
extension implementation on a pre-legalizer step, and this is a place
where support for a new inline assembly extension may be added if
needed.

This PR also extends support for register classes, improves type
inference during pre-legalizer pass, and fixes a minor bug with
asm-printing of string literals.

[1]
https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc
Only with primitive fields for now.
Validation succeeds on this test since SPIRV-Tools commit `e2646f5e
("spirv-val: Consider target env for OpReadClockKHR scope",
2024-05-21)`.
Add do02.f90 and taskloop03.f90 that were removed in
llvm#92739
Replace shell script tests with python.
…m#93168)

This PR updates docs to describe support of SPV_KHR_shader_clock
extension added by llvm#92771.
One of the previous patches introduced initial support for non-power-of-2
number of elements but some parts of the SLP vectorizer still were not
adjusted to handle the costs correctly. Patch fixes it by improving
analysis of the non-power-of-2 number of elements and fixes in the cost
of the extractelements instructions.

Reviewers: RKSimon

Reviewed By: RKSimon

Pull Request: llvm#93213
… coverage

VBMI2 has legal FSHL/FSHR operations which makes it easier to test non-uniform shift amounts as it won't get expanded
This patch adds the support for `ballot_sync` in ompx.
…he template differ (llvm#93265)

This was not implemented in
llvm#78041 when StructuralValue
TemplateArguments were originally added.

This patch does not implement this functionality, it just falls back to
the expression when possible.

Otherwise, such as when dealing with canonical types to begin with, this
will just ignore the argument as if it wasn't even there.

Fixes llvm#93068
…lvm#92632)

When looking for missing frames due to tail calls, we were not checking
the output parameter of the recursive call in the correct place.
Make sure we check for the case when that recursive call returned false
due to multiple possible callee chains.

Extended the existing test a bit to catch this case.
Incremental change here, but a step in the right direction. Before,
an assignment to a dummy variable was diagnosed as a "read of a
non-const variable".
Use different -verify prefixes and make sure the tests really break
when fixing the eval order.
llvm#67174 added the `__prefetch` intrinsic, however it used the wrong
signature: the argument should be `const void*`, not `void*`.

Docs:
https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#:~:text=__prefetch

Unfortunately, this can't be backported (there are no more 18.x
releases, and this change is a breaking change), so I'll see if I can
get a workaround added on MSVC's side for Clang 18.
A precommit test case to show vector loops generated from EVL transform
- This is a precommit test for
llvm#92092
…k" (llvm#93306)

Reverts llvm#93270

This was found to have a race and the forward fix was reverted,
reverting this until can forward fix.
… existing BITCASTs and limit recursion depth

Add XOR + constant handling to allow us to detect NOT patterns.

If a recursive combineBitcastToBoolVector call finds an existing BITCAST node then use that.

As combineBitcastToBoolVector is recursive, ensure we limit the maximum recursion depth.

Fixes llvm#93000
llvm#93008)

LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX
was enabled when building LLVM. Use this instead of manually defining
MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).
This change expands the existing instrumentation that prints the IR
before/after each pass to an output stream (usually stderr). It adds
a new configuration that will print the output of each pass to a
separate file. The files will be organized into a directory tree
rooted at a specified directory. For existing tools, a CL option
`-mlir-print-ir-tree-dir` is added to specify this directory and
activate the new printing config.

The created directory tree mirrors the nesting structure of the IR. For
example,
if the IR is congruent to the pass-pipeline
"builtin.module(pass1,pass2,func.func(pass3,pass4),pass5)", and
`-mlir-print-ir-tree-dir=/tmp/pipeline_output`, then then the tree file
tree
created will look like:

```
/tmp/pass_output
├── builtin_module_the_symbol_name
│   ├── 0_pass1.mlir
│   ├── 1_pass2.mlir
│   ├── 2_pass5.mlir
│   ├── func_func_my_func_name
│   │   ├── 1_0_pass3.mlir
│   │   ├── 1_1_pass4.mlir
│   ├── func_func_my_other_func_name
│   │   ├── 1_0_pass3.mlir
│   │   ├── 1_1_pass4.mlir
```

The subdirectories are named by concatenating the relevant parent
operation names and symbol name (if present). The printer keeps a
counter associated with ops that are targeted by passes and their
isolated-from-above parents. Each filename is given a numeric prefix
using the counter value for the op that the pass is targeting and then
prepending the counter values for each parent. This gives a naming
where it is easy to distinguish which passes may have run concurrently
vs. which have a clear ordering. In the above example, for both
`1_1_pass4.mlir` files, the first `1` refers to the counter for the
parent op, and the second refers to the counter for the respective
function.
…vm#93221)

This maintains consistency with the non-VP ISD opcodes.
The test select-dependence.ll can be eliminated completely by dce, as it
returns a constant, and doesn't write any arguments. Lift out the local
allocas into arguments, so that it is less nonsensical. While at it,
rename the variables for greater readability, and regenerate the test
with UpdateTestChecks.
kkwli and others added 24 commits May 28, 2024 08:50
…inearize (llvm#92370)

Building on top of
[llvm#88204](llvm#88204), this PR adds
support for converting `vector.insert` into an equivalent
`vector.shuffle` operation that operates on linearized (1-D) vectors.
…93539)

The pass constructor can be generated automatically.

This pass is module-level and then runs on all relevant intrinsic
operations inside of the module, no matter what top level operation they
are inside of.
…nd in dropUnitDims pass. (llvm#93317)

`mlir-opt --linalg-fold-unit-extent-dims` pass on the following IR

```
#map = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 + d4, d2 + d5, d6)>
#map1 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>
#map2 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>
module {
  func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
    %cst = arith.constant dense<1.000000e+00> : tensor<1x1x1x1xf32>
    %0 = tensor.empty(%arg1) : tensor<?x1x61x1xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %2 = arith.mulf %in, %in_0 : f32
      %3 = arith.addf %out, %2 : f32
      linalg.yield %3 : f32
    } -> tensor<?x1x61x1xf32>
    return %1 : tensor<?x1x61x1xf32>
  }
}
```

produces an incorrect tensor.expand_shape operation:

```
error: 'tensor.expand_shape' op expected dimension 0 of collapsed type to be dynamic since one or more of the corresponding dimensions in the expanded type is dynamic
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
         ^
/mathworks/devel/sandbox/sayans/geckWorks/g3294570/repro.mlir:8:10: note: see current operation: %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
// -----// IR Dump After LinalgFoldUnitExtentDimsPass Failed (linalg-fold-unit-extent-dims) //----- //
#map = affine_map<(d0) -> (0, d0)>
#map1 = affine_map<(d0) -> ()>
#map2 = affine_map<(d0) -> (d0)>
"builtin.module"() ({
  "func.func"() <{function_type = (tensor<1x?x?x1xf32>, index) -> tensor<?x1x61x1xf32>, sym_name = "main"}> ({
  ^bb0(%arg0: tensor<1x?x?x1xf32>, %arg1: index):
    %0 = "arith.constant"() <{value = dense<1.000000e+00> : tensor<f32>}> : () -> tensor<f32>
    %1 = "tensor.collapse_shape"(%arg0) <{reassociation = [[0, 1], [2, 3]]}> : (tensor<1x?x?x1xf32>) -> tensor<?x?xf32>
    %2 = "tensor.empty"() : () -> tensor<61xf32>
    %3 = "tensor.empty"() : () -> tensor<61xf32>
    %4 = "linalg.generic"(%1, %0, %2, %3) <{indexing_maps = [#map, #map1, #map2, #map2], iterator_types = [#linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 3, 1>}> ({
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32, %arg5: f32):
      %6 = "arith.mulf"(%arg2, %arg3) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      %7 = "arith.addf"(%arg4, %6) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      "linalg.yield"(%7) : (f32) -> ()
    }) : (tensor<?x?xf32>, tensor<f32>, tensor<61xf32>, tensor<61xf32>) -> tensor<61xf32>
    %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
    "func.return"(%5) : (tensor<?x1x61x1xf32>) -> ()
  }) : () -> ()
}) : () -> ()
```

The reason of this is because the dimension `d0` is determined to be an
unit-dim that can be dropped based on the dimensions of operand `arg0`
to `linalg.generic`. Later on when iterating over operand `outs` the
dimension `d0` is determined to be an unit-dim even though the shape
corresponding to it is `Shape::kDynamic`. For the `linalg.generic` to be
valid `d0` of `outs` does need to be `1` but that isn't properly
processed in the current implementation and the dimension is dropped
resulting in `outs` operand to be `tensor<61xf32>` in the example.

The fix is to also check that the dimension shape is actually `1` before
dropping the dimension. The IR after the fix is:

```
#map = affine_map<()[s0, s1] -> (s0 * s1)>
#map1 = affine_map<(d0) -> (0, d0)>
#map2 = affine_map<(d0) -> ()>
module {
  func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %cst = arith.constant dense<1.000000e+00> : tensor<f32>
    %collapsed = tensor.collapse_shape %arg0 [[0, 1], [2, 3]] : tensor<1x?x?x1xf32> into tensor<?x?xf32>
    %0 = tensor.empty(%arg1) : tensor<?x61xf32>
    %1 = affine.apply #map()[%arg1, %c1]
    %2 = tensor.empty(%1) : tensor<?x61xf32>
    %3 = linalg.generic {indexing_maps = [#map1, #map2, #map1, #map1], iterator_types = ["parallel"]} ins(%collapsed, %cst, %0 : tensor<?x?xf32>, tensor<f32>, tensor<?x61xf32>) outs(%2 : tensor<?x61xf32>) {
    ^bb0(%in: f32, %in_0: f32, %in_1: f32, %out: f32):
      %4 = arith.mulf %in, %in_0 : f32
      %5 = arith.addf %in_1, %4 : f32
      linalg.yield %5 : f32
    } -> tensor<?x61xf32>
    %expanded = tensor.expand_shape %3 [[0, 1], [2, 3]] output_shape [%c0, 1, 61, 1] : tensor<?x61xf32> into tensor<?x1x61x1xf32>
    return %expanded : tensor<?x1x61x1xf32>
  }
}
```
Clang has some unwritten rules about diagnostic wording regarding things
like punctuation and capitalization. This patch documents those rules
and adds some tablegen support for checking diagnostics follow the
rules.

Specifically: tablegen now checks that a diagnostic does not start with
a capital letter or end with punctuation, except for the usual
exceptions like proper nouns or ending with a question.

Now that the code base is clean of such issues, the diagnostics are
emitted as an error rather than a warning to ensure that failure to
follow these rules is either addressed by an author, or a new exception
is added to the checking logic.
Fixes llvm#90941.
Add support for ``[[msvc::noinline]]`` attribute, which is actually an
alias of ``[[clang::noinline]]``.
…le::makeUniqueName()`. (llvm#89057)

E.g. during inlining new symbol name can be duplicated and then
`ValueSymbolTable::makeUniqueName()` will add unique suffix, exceeding
the `non-global-value-max-name-size` restriction.

Also fixed `unsigned` type of the option to `int` since `ValueSymbolTable`'
constructor can use `-1` value that means unrestricted name size.
…93415)

"const" being removed in this patch prevents the move semantics from
being used in:

  AI.CallStack = Callback(IndexedAI.CSId);

With this patch on an indexed MemProf Version 2 profile, the cycle
count and instruction count go down by 13.3% and 26.3%, respectively,
with "llvm-profdata show" modified to deserialize all MemProfRecords.
There was existing support for constant folding a `linalg.generic` that
was actually a transpose. This commit adds support for the named op,
`linalg.transpose`, as well by making use of the `LinalgOp` interface.
…2127)

This change updates the dataLayout string to ensure alignment with the
latest LLVM TargetMachine configuration. The aim is to
maintain consistency and prevent potential compilation issues related to
memory address space handling.
fir.box_rank codegen was invalid, it was assuming the rank field in the
descriptor was an i32. This is not correct. Do not hard code the type,
use the named position to find the type, and convert as needed in the
patterns.
Rename things in a couple of places to make the code a bit clearer.
…ing when parsing declaration DIEs. (llvm#92328)

This reapplies
llvm@9a7262c
(llvm#90663) and added llvm#91808 as a
fix.

It was causing tests on macos to fail because
`SymbolFileDWARF::GetForwardDeclCompilerTypeToDIE` returned the map
owned by this symol file. When there were two symbol files, two
different maps were created for caching from compiler type to DIE even
if they are for the same module. The solution is to do the same as
`SymbolFileDWARF::GetUniqueDWARFASTTypeMap`: inquery
SymbolFileDWARFDebugMap first to get the shared underlying SymbolFile so
the map is shared among multiple SymbolFileDWARF.
…ounding ops. (llvm#93356)

The elements that aren't sNans need to get passed through this fadd
instruction unchanged. With the agnostic mask policy they might be
forced to all ones.
Summary:
There was a bug here where we would initialize the plugin multiple times
when there were multiple images. Fix it by putting the `is_initliaized`
check later.
@mgehre-amd mgehre-amd requested a review from cferry-AMD August 26, 2024 09:08
Base automatically changed from bump_to_76303791 to feature/fused-ops September 4, 2024 05:02
An error occurred while trying to automatically change base from bump_to_76303791 to feature/fused-ops September 4, 2024 05:02
@mgehre-amd mgehre-amd merged commit a7c393d into feature/fused-ops Sep 4, 2024
6 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_f284af48 branch September 4, 2024 05:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment