forked from llvm/llvm-project
[AutoBump] Merge with f284af48 (May 28) (52) #311
Merged
…out of bounds shift amounts SHL/SRL are guaranteed to fold to zero, SRA is guaranteed to fold to 'all sign bits'
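The folding rule stated above can be illustrated on scalars. This is a minimal sketch, assuming 32-bit lanes; the helper names (`foldShl`, `foldSrl`, `foldSra`) are hypothetical and only model the stated semantics, not LLVM's actual folding code:

```cpp
#include <cstdint>

const unsigned kLaneBits = 32; // assumed lane width for this sketch

// Logical shifts: an out-of-bounds amount folds to zero.
inline uint32_t foldShl(uint32_t v, unsigned amt) {
  return amt >= kLaneBits ? 0u : v << amt;
}

inline uint32_t foldSrl(uint32_t v, unsigned amt) {
  return amt >= kLaneBits ? 0u : v >> amt;
}

// Arithmetic shift: an out-of-bounds amount folds to "all sign bits",
// i.e. 0 for non-negative inputs and -1 (all ones) for negative inputs.
inline int32_t foldSra(int32_t v, unsigned amt) {
  if (amt >= kLaneBits)
    return v < 0 ? -1 : 0;
  // Shift the complement for negative inputs so the result does not rely on
  // how the compiler implements right shifts of negative values.
  return v < 0 ? static_cast<int32_t>(~(~v >> amt)) : (v >> amt);
}
```

In-range amounts behave as ordinary shifts; only the out-of-bounds case takes the special folded value.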
This PR adds the type conversion support for fixed size arrays. Mostly mechanical changes converting dimension values to subrange fields. A limitation is that lower bound is always one for the moment as that information is missing in `SequenceType`. With this change in place, I can evaluate fixed size arrays in debugger.
```
(gdb) p x
$1 = ((2, 3, 4, 5) (3, 4, 5, 6) (4, 5, 6, 7) (5, 6, 7, 8) (6, 7, 8, 9))
(gdb) ptype x
type = integer (4,5)
```
---------
Co-authored-by: Tom Eccles <t@freedommail.info>
…vm#92579) Prior to this patch, for "selective" DLL import/export, the vtable & typeinfo would be imported/exported on the condition that all non-inline virtual methods are imported/exported. This condition was based upon MS guidelines related to "selective" DLL import/export. However, in reality, this condition is too rigid and can result in undefined vtable & typeinfo symbols for code that builds fine with MSVC. Therefore, relax this condition to be if any non-inline method is imported/exported.
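The difference between the old and the relaxed condition can be sketched as "all" versus "any" over the non-inline virtual methods. This is a simplified model, not Clang's implementation: `VirtualMethod`, `oldRule`, and `newRule` are hypothetical names, and treating a class with no non-inline virtual methods as "do not import/export" is an assumption of the sketch.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical model of one virtual method of a class.
struct VirtualMethod {
  bool isInline;   // defined inline in the class
  bool hasDllAttr; // carries dllimport/dllexport
};

// Collect the non-inline virtual methods, which are the only ones both
// rules look at.
inline std::vector<VirtualMethod>
nonInlineMethods(const std::vector<VirtualMethod> &methods) {
  std::vector<VirtualMethod> out;
  std::copy_if(methods.begin(), methods.end(), std::back_inserter(out),
               [](const VirtualMethod &m) { return !m.isInline; });
  return out;
}

// Old rule: vtable and typeinfo follow the attribute only if every
// non-inline virtual method carries it.
inline bool oldRule(const std::vector<VirtualMethod> &methods) {
  const auto candidates = nonInlineMethods(methods);
  return !candidates.empty() &&
         std::all_of(candidates.begin(), candidates.end(),
                     [](const VirtualMethod &m) { return m.hasDllAttr; });
}

// Relaxed rule: one attributed non-inline virtual method is enough.
inline bool newRule(const std::vector<VirtualMethod> &methods) {
  const auto candidates = nonInlineMethods(methods);
  return std::any_of(candidates.begin(), candidates.end(),
                     [](const VirtualMethod &m) { return m.hasDllAttr; });
}
```

A class with one attributed and one unattributed non-inline virtual method fails the old rule but passes the relaxed one, which is the case that previously left the vtable and typeinfo symbols undefined.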
This PR introduces support for inline assembly calls for SPIR-V Backend in general, and support for SPV_INTEL_inline_assembly [1] extension in particular. The former part of the PR is agnostic towards vendor-specific requirements and resolves the task of supporting successful transformation of inline assembly as long as it's possible without specific SPIR-V instruction codes. As a part of the PR there appears an opportunity to bring coherent inline assembly information up to the latest passes of the transformation process (emitting final SPIR-V instructions), so that the PR makes it easy to add any other required flavor of inline assembly, other than the one supported by the vendor-specific SPV_INTEL_inline_assembly extension, if/when needed. At the moment, however, SPV_INTEL_inline_assembly is the only implemented way to bring LLVM IR inline assembly calls up to valid SPIR-V instructions and also the default one. This means that inline assembly calls will generate an error message if this extension is not used, to prevent LLVM-generated error messages at the final stages of translation. When the SPV_INTEL_inline_assembly extension is mentioned among the supported ones, translation of inline assembly is intercepted by this extension's implementation on a pre-legalizer step, and this is the place where support for a new inline assembly extension may be added if needed. This PR also extends support for register classes, improves type inference during the pre-legalizer pass, and fixes a minor bug with asm-printing of string literals. [1] https://github.com/intel/llvm/blob/sycl/sycl/doc/design/spirv-extensions/SPV_INTEL_inline_assembly.asciidoc
Only with primitive fields for now.
Validation succeeds on this test since SPIRV-Tools commit `e2646f5e ("spirv-val: Consider target env for OpReadClockKHR scope", 2024-05-21)`.
Add do02.f90 and taskloop03.f90 that were removed in llvm#92739. Replace shell script tests with Python.
…m#93168) This PR updates docs to describe support of SPV_KHR_shader_clock extension added by llvm#92771.
One of the previous patches introduced initial support for non-power-of-2 numbers of elements, but some parts of the SLP vectorizer were still not adjusted to handle the costs correctly. This patch fixes that by improving the analysis of non-power-of-2 numbers of elements and correcting the cost of the extractelement instructions. Reviewers: RKSimon Reviewed By: RKSimon Pull Request: llvm#93213
… coverage VBMI2 has legal FSHL/FSHR operations which makes it easier to test non-uniform shift amounts as it won't get expanded
This patch adds the support for `ballot_sync` in ompx.
…he template differ (llvm#93265) This was not implemented in llvm#78041 when StructuralValue TemplateArguments were originally added. This patch does not implement this functionality, it just falls back to the expression when possible. Otherwise, such as when dealing with canonical types to begin with, this will just ignore the argument as if it wasn't even there. Fixes llvm#93068
…lvm#92632) When looking for missing frames due to tail calls, we were not checking the output parameter of the recursive call in the correct place. Make sure we check for the case when that recursive call returned false due to multiple possible callee chains. Extended the existing test a bit to catch this case.
Incremental change here, but a step in the right direction. Before, an assignment to a dummy variable was diagnosed as a "read of a non-const variable".
Use different -verify prefixes and make sure the tests really break when fixing the eval order.
llvm#67174 added the `__prefetch` intrinsic, however it used the wrong signature: the argument should be `const void*`, not `void*`. Docs: https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#:~:text=__prefetch Unfortunately, this can't be backported (there are no more 18.x releases, and this change is a breaking change), so I'll see if I can get a workaround added on MSVC's side for Clang 18.
A precommit test case to show vector loops generated from EVL transform - This is a precommit test for llvm#92092
…k" (llvm#93306) Reverts llvm#93270. This was found to have a race, and the forward fix was reverted; reverting this until a forward fix is ready.
… existing BITCASTs and limit recursion depth Add XOR + constant handling to allow us to detect NOT patterns. If a recursive combineBitcastToBoolVector call finds an existing BITCAST node then use that. As combineBitcastToBoolVector is recursive, ensure we limit the maximum recursion depth. Fixes llvm#93000
…vm#93272) Specified at: https://github.com/WebAssembly/half-precision/blob/29a9b9462c9285d4ccc1a5dc39214ddfd1892658/proposals/half-precision/Overview.md Note: the current spec has f16x8.extract_lane as opcode 0x124, but this is incorrect and will be changed to 0x121 soon.
llvm#93008) LLVM_HAS_NVPTX_TARGET is automatically set depending on whether NVPTX was enabled when building LLVM. Use this instead of manually defining MLIR_ENABLE_CUDA_CONVERSIONS (whose name is a bit misleading btw).
This change expands the existing instrumentation that prints the IR before/after each pass to an output stream (usually stderr). It adds a new configuration that will print the output of each pass to a separate file. The files will be organized into a directory tree rooted at a specified directory. For existing tools, a CL option `-mlir-print-ir-tree-dir` is added to specify this directory and activate the new printing config.

The created directory tree mirrors the nesting structure of the IR. For example, if the IR is congruent to the pass-pipeline "builtin.module(pass1,pass2,func.func(pass3,pass4),pass5)", and `-mlir-print-ir-tree-dir=/tmp/pipeline_output`, then the file tree created will look like:
```
/tmp/pipeline_output
├── builtin_module_the_symbol_name
│   ├── 0_pass1.mlir
│   ├── 1_pass2.mlir
│   ├── 2_pass5.mlir
│   ├── func_func_my_func_name
│   │   ├── 1_0_pass3.mlir
│   │   ├── 1_1_pass4.mlir
│   ├── func_func_my_other_func_name
│   │   ├── 1_0_pass3.mlir
│   │   ├── 1_1_pass4.mlir
```
The subdirectories are named by concatenating the relevant parent operation names and symbol name (if present). The printer keeps a counter associated with ops that are targeted by passes and their isolated-from-above parents. Each filename is given a numeric prefix using the counter value for the op that the pass is targeting and then prepending the counter values for each parent. This gives a naming where it is easy to distinguish which passes may have run concurrently vs. which have a clear ordering. In the above example, for both `1_1_pass4.mlir` files, the first `1` refers to the counter for the parent op, and the second refers to the counter for the respective function.
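The naming scheme described above can be sketched as two small string helpers. This is an illustration only: `dirNameFor` and `fileNameFor` are hypothetical names, and replacing `.` with `_` in the operation name is inferred from the example tree rather than taken from the MLIR sources.

```cpp
#include <string>
#include <vector>

// Directory name for an op: the op name (dots replaced by underscores, as
// the example tree suggests) followed by the symbol name, if there is one.
inline std::string dirNameFor(std::string opName, const std::string &symbol) {
  for (char &c : opName)
    if (c == '.')
      c = '_';
  return symbol.empty() ? opName : opName + "_" + symbol;
}

// File name for one pass run: the counters of the parent ops (outermost
// first), then the counter of the targeted op, then the pass name.
inline std::string fileNameFor(const std::vector<unsigned> &counters,
                               const std::string &passName) {
  std::string name;
  for (unsigned counter : counters)
    name += std::to_string(counter) + "_";
  return name + passName + ".mlir";
}
```

With these helpers, `fileNameFor({1, 1}, "pass4")` yields `1_1_pass4.mlir` and `dirNameFor("func.func", "my_func_name")` yields `func_func_my_func_name`, matching the entries in the example tree.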
…vm#93221) This maintains consistency with the non-VP ISD opcodes.
The test select-dependence.ll can be eliminated completely by dce, as it returns a constant, and doesn't write any arguments. Lift out the local allocas into arguments, so that it is less nonsensical. While at it, rename the variables for greater readability, and regenerate the test with UpdateTestChecks.
…inearize (llvm#92370) Building on top of [llvm#88204](llvm#88204), this PR adds support for converting `vector.insert` into an equivalent `vector.shuffle` operation that operates on linearized (1-D) vectors.
…93539) The pass constructor can be generated automatically. This pass is module-level and then runs on all relevant intrinsic operations inside of the module, no matter what top level operation they are inside of.
…nd in dropUnitDims pass. (llvm#93317) `mlir-opt --linalg-fold-unit-extent-dims` pass on the following IR
```
#map = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1 + d4, d2 + d5, d6)>
#map1 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d4, d5, d6, d3)>
#map2 = affine_map<(d0, d1, d2, d3, d4, d5, d6) -> (d0, d1, d2, d3)>
module {
  func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
    %cst = arith.constant dense<1.000000e+00> : tensor<1x1x1x1xf32>
    %0 = tensor.empty(%arg1) : tensor<?x1x61x1xf32>
    %1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
    ^bb0(%in: f32, %in_0: f32, %out: f32):
      %2 = arith.mulf %in, %in_0 : f32
      %3 = arith.addf %out, %2 : f32
      linalg.yield %3 : f32
    } -> tensor<?x1x61x1xf32>
    return %1 : tensor<?x1x61x1xf32>
  }
}
```
produces an incorrect tensor.expand_shape operation:
```
error: 'tensor.expand_shape' op expected dimension 0 of collapsed type to be dynamic since one or more of the corresponding dimensions in the expanded type is dynamic
%1 = linalg.generic {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "parallel", "parallel", "reduction", "reduction", "reduction"]} ins(%arg0, %cst : tensor<1x?x?x1xf32>, tensor<1x1x1x1xf32>) outs(%0 : tensor<?x1x61x1xf32>) {
^
/mathworks/devel/sandbox/sayans/geckWorks/g3294570/repro.mlir:8:10: note: see current operation: %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
// -----// IR Dump After LinalgFoldUnitExtentDimsPass Failed (linalg-fold-unit-extent-dims) //----- //
#map = affine_map<(d0) -> (0, d0)>
#map1 = affine_map<(d0) -> ()>
#map2 = affine_map<(d0) -> (d0)>
"builtin.module"() ({
  "func.func"() <{function_type = (tensor<1x?x?x1xf32>, index) -> tensor<?x1x61x1xf32>, sym_name = "main"}> ({
  ^bb0(%arg0: tensor<1x?x?x1xf32>, %arg1: index):
    %0 = "arith.constant"() <{value = dense<1.000000e+00> : tensor<f32>}> : () -> tensor<f32>
    %1 = "tensor.collapse_shape"(%arg0) <{reassociation = [[0, 1], [2, 3]]}> : (tensor<1x?x?x1xf32>) -> tensor<?x?xf32>
    %2 = "tensor.empty"() : () -> tensor<61xf32>
    %3 = "tensor.empty"() : () -> tensor<61xf32>
    %4 = "linalg.generic"(%1, %0, %2, %3) <{indexing_maps = [#map, #map1, #map2, #map2], iterator_types = [#linalg.iterator_type<parallel>], operandSegmentSizes = array<i32: 3, 1>}> ({
    ^bb0(%arg2: f32, %arg3: f32, %arg4: f32, %arg5: f32):
      %6 = "arith.mulf"(%arg2, %arg3) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      %7 = "arith.addf"(%arg4, %6) <{fastmath = #arith.fastmath<none>}> : (f32, f32) -> f32
      "linalg.yield"(%7) : (f32) -> ()
    }) : (tensor<?x?xf32>, tensor<f32>, tensor<61xf32>, tensor<61xf32>) -> tensor<61xf32>
    %5 = "tensor.expand_shape"(%4) <{reassociation = [[0, 1, 2, 3]]}> : (tensor<61xf32>) -> tensor<?x1x61x1xf32>
    "func.return"(%5) : (tensor<?x1x61x1xf32>) -> ()
  }) : () -> ()
}) : () -> ()
```
This happens because the dimension `d0` is determined to be a unit-dim that can be dropped, based on the dimensions of operand `arg0` to `linalg.generic`. Later, when iterating over operand `outs`, the dimension `d0` is still treated as a unit-dim even though the shape corresponding to it is `Shape::kDynamic`. For the `linalg.generic` to be valid, `d0` of `outs` does need to be `1`, but the current implementation does not handle that properly and drops the dimension, so the `outs` operand becomes `tensor<61xf32>` in the example. The fix is to also check that the dimension shape is actually `1` before dropping the dimension.

The IR after the fix is:
```
#map = affine_map<()[s0, s1] -> (s0 * s1)>
#map1 = affine_map<(d0) -> (0, d0)>
#map2 = affine_map<(d0) -> ()>
module {
  func.func @main(%arg0: tensor<1x?x?x1xf32>, %arg1: index) -> tensor<?x1x61x1xf32> {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %cst = arith.constant dense<1.000000e+00> : tensor<f32>
    %collapsed = tensor.collapse_shape %arg0 [[0, 1], [2, 3]] : tensor<1x?x?x1xf32> into tensor<?x?xf32>
    %0 = tensor.empty(%arg1) : tensor<?x61xf32>
    %1 = affine.apply #map()[%arg1, %c1]
    %2 = tensor.empty(%1) : tensor<?x61xf32>
    %3 = linalg.generic {indexing_maps = [#map1, #map2, #map1, #map1], iterator_types = ["parallel"]} ins(%collapsed, %cst, %0 : tensor<?x?xf32>, tensor<f32>, tensor<?x61xf32>) outs(%2 : tensor<?x61xf32>) {
    ^bb0(%in: f32, %in_0: f32, %in_1: f32, %out: f32):
      %4 = arith.mulf %in, %in_0 : f32
      %5 = arith.addf %in_1, %4 : f32
      linalg.yield %5 : f32
    } -> tensor<?x61xf32>
    %expanded = tensor.expand_shape %3 [[0, 1], [2, 3]] output_shape [%c0, 1, 61, 1] : tensor<?x61xf32> into tensor<?x1x61x1xf32>
    return %expanded : tensor<?x1x61x1xf32>
  }
}
```
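The extra check described in the fix can be sketched outside MLIR. This is a simplified model under stated assumptions: `kDynamic` is a stand-in sentinel for a dynamic extent, `dropUnitDims` is a hypothetical helper, and `flaggedAsUnit` stands for the result of the earlier analysis that marked an iteration dimension as a unit-dim based on another operand.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in sentinel for a dynamic extent (an assumption of this sketch).
const int64_t kDynamic = std::numeric_limits<int64_t>::min();

// Returns `shape` with unit dimensions removed. A dimension is dropped only
// when it was flagged as a unit-dim AND its static extent is actually 1, so
// a flagged but dynamic dimension is kept. Assumes both vectors have the
// same length.
inline std::vector<int64_t>
dropUnitDims(const std::vector<int64_t> &shape,
             const std::vector<bool> &flaggedAsUnit) {
  std::vector<int64_t> result;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    const bool drop = flaggedAsUnit[i] && shape[i] == 1; // the added check
    if (!drop)
      result.push_back(shape[i]);
  }
  return result;
}
```

For the `outs` shape `?x1x61x1` with dimensions 0, 1, and 3 flagged, the sketch keeps the dynamic dimension and returns `?x61`, which corresponds to the `tensor<?x61xf32>` seen in the IR after the fix.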
Clang has some unwritten rules about diagnostic wording regarding things like punctuation and capitalization. This patch documents those rules and adds some tablegen support for checking that diagnostics follow the rules. Specifically: tablegen now checks that a diagnostic does not start with a capital letter or end with punctuation, except for the usual exceptions like proper nouns or ending with a question. Now that the code base is clean of such issues, the diagnostics are emitted as an error rather than a warning to ensure that failure to follow these rules is either addressed by an author, or a new exception is added to the checking logic.
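The two wording rules can be sketched as a small predicate. This is not the tablegen check itself: `followsWordingRules` is a hypothetical name, the allow-list of capitalized first words stands in for the "proper nouns" exception, and treating only `.` and `!` as disallowed trailing punctuation is an assumption of the sketch.

```cpp
#include <cctype>
#include <set>
#include <string>

// Returns true when `text` follows the two rules: it does not start with a
// capital letter (unless its first word is in `allowedFirstWords`) and it
// does not end with punctuation (a trailing '?' is allowed).
inline bool followsWordingRules(const std::string &text,
                                const std::set<std::string> &allowedFirstWords) {
  if (text.empty())
    return true;

  // Rule 1: no leading capital letter, except for allow-listed first words.
  if (std::isupper(static_cast<unsigned char>(text.front()))) {
    const std::string firstWord = text.substr(0, text.find(' '));
    if (allowedFirstWords.count(firstWord) == 0)
      return false;
  }

  // Rule 2: no trailing punctuation; questions may end with '?'.
  const char last = text.back();
  if (last == '.' || last == '!')
    return false;

  return true;
}
```

Keeping the exceptions in an explicit list mirrors the description above, where a new exception has to be added to the checking logic deliberately rather than slipping through as a warning.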
Fixes llvm#90941. Add support for ``[[msvc::noinline]]`` attribute, which is actually an alias of ``[[clang::noinline]]``.
And as an extension in older language modes. Per https://eel.is/c++draft/lex.string#nt:d-char Fixes llvm#93130
…lvm#91599) This reverts commit aa9d467.
…le::makeUniqueName()`. (llvm#89057) E.g. during inlining a new symbol name can be duplicated, and then `ValueSymbolTable::makeUniqueName()` will add a unique suffix, exceeding the `non-global-value-max-name-size` restriction. Also changed the type of the option from `unsigned` to `int`, since the `ValueSymbolTable` constructor can use a value of `-1`, which means unrestricted name size.
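One way to keep a uniquified name within a size limit is to shorten the base name to make room for the suffix. This is a minimal sketch of that idea, not the code in `ValueSymbolTable`: the function signature, the purely numeric suffix, and the use of a `std::set` for existing names are all assumptions. As in the description above, a limit of `-1` means unrestricted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Returns a name not present in `taken` whose length does not exceed
// `maxSize` (negative means no limit). The base name is truncated as needed
// so that base + numeric suffix still fits.
inline std::string makeUniqueName(std::string base, int maxSize,
                                  const std::set<std::string> &taken) {
  const bool limited = maxSize >= 0;
  const std::size_t limit = limited ? static_cast<std::size_t>(maxSize) : 0;

  if (limited && base.size() > limit)
    base.resize(limit);
  if (taken.count(base) == 0)
    return base;

  for (unsigned id = 1;; ++id) {
    const std::string suffix = std::to_string(id);
    // The limit must leave room for at least the suffix itself.
    assert((!limited || suffix.size() <= limit) && "limit too small");
    std::size_t keep = base.size();
    if (limited)
      keep = std::min(keep, limit - suffix.size());
    const std::string candidate = base.substr(0, keep) + suffix;
    if (taken.count(candidate) == 0)
      return candidate;
  }
}
```

With `taken = {"value"}` and a limit of 5, the sketch returns `valu1` instead of the over-long `value1`; with a limit of `-1` it returns `value1`.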
…93415) "const" being removed in this patch prevents the move semantics from being used in: AI.CallStack = Callback(IndexedAI.CSId); With this patch on an indexed MemProf Version 2 profile, the cycle count and instruction count go down by 13.3% and 26.3%, respectively, with "llvm-profdata show" modified to deserialize all MemProfRecords.
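Why a `const` can silently disable move semantics in an assignment like the one above can be shown with a small counter. This is a sketch under an assumption: it places the `const` on a by-value return type, which is one common way this happens; the types and functions here (`Tracker`, `makeConst`, `makeMutable`) are hypothetical and unrelated to the MemProf code.

```cpp
// Counters for which assignment operator was selected.
int copyAssigns = 0;
int moveAssigns = 0;

struct Tracker {
  Tracker() = default;
  Tracker(const Tracker &) = default;
  Tracker(Tracker &&) = default;
  Tracker &operator=(const Tracker &) {
    ++copyAssigns;
    return *this;
  }
  Tracker &operator=(Tracker &&) noexcept {
    ++moveAssigns;
    return *this;
  }
};

// Returning a const value: the temporary cannot bind to Tracker&&, so an
// assignment from it falls back to the copy assignment operator.
const Tracker makeConst() { return Tracker(); }

// Returning a non-const value: the temporary binds to Tracker&&, so the
// move assignment operator is selected.
Tracker makeMutable() { return Tracker(); }
```

Assigning `dst = makeConst();` increments `copyAssigns`, while `dst = makeMutable();` increments `moveAssigns`. For a type that owns heap storage, that difference is a full copy versus a pointer swap.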
There was existing support for constant folding a `linalg.generic` that was actually a transpose. This commit adds support for the named op, `linalg.transpose`, as well by making use of the `LinalgOp` interface.
…2127) This change updates the dataLayout string to ensure alignment with the latest LLVM TargetMachine configuration. The aim is to maintain consistency and prevent potential compilation issues related to memory address space handling.
fir.box_rank codegen was invalid: it assumed that the rank field in the descriptor was an i32, which is not correct. Do not hard code the type; use the named position to find the type, and convert as needed in the patterns.
Rename things in a couple of places to make the code a bit clearer.
…ing when parsing declaration DIEs. (llvm#92328) This reapplies llvm@9a7262c (llvm#90663) and adds llvm#91808 as a fix. It was causing tests on macOS to fail because `SymbolFileDWARF::GetForwardDeclCompilerTypeToDIE` returned the map owned by this symbol file. When there were two symbol files, two different maps were created for caching from compiler type to DIE, even if they were for the same module. The solution is to do the same as `SymbolFileDWARF::GetUniqueDWARFASTTypeMap`: query SymbolFileDWARFDebugMap first to get the shared underlying SymbolFile, so the map is shared among multiple SymbolFileDWARF instances.
…ounding ops. (llvm#93356) The elements that aren't sNans need to get passed through this fadd instruction unchanged. With the agnostic mask policy they might be forced to all ones.
Summary: There was a bug here where we would initialize the plugin multiple times when there were multiple images. Fix it by putting the `is_initialized` check later.
cferry-AMD
approved these changes
Aug 26, 2024
An error occurred while trying to automatically change base from
bump_to_76303791
to
feature/fused-ops
September 4, 2024 05:02