First batch post March 7th [Nr 0] #237

josel-amd · 2024-08-07T13:57:21Z

No description provided.

This adds a `CExpression` trait and replaces the `isCExpression()` function.

…nyext load. This load isn't selected by tablegen due to the anyext, but wasn't generating a subreg_to_reg. Maybe it shouldn't be formed at all during the combiner, but to stop crashes later in codegen, select it manually for now.

This reverts commit 2a13422. It was causing test failures on the expensive check builders.

Changing SPIR-V backend meeting day and removing my office hours

llvm#77753) Per https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2448r2.html, a function, constructor, or destructor can be marked `constexpr` even though it never produces a constant expression. Non-literal types as return types and parameter types of functions marked `constexpr` are also allowed. Since this is not a DR, the diagnostic messages are still preserved for C++ standards older than C++23.

This is the concat_vector equivalent of llvm#81312, in that we recursively split concat_vectors with more than two operands into smaller concat_vectors. This allows us to break up the chain of vslideups, as well as perform the vslideups at a smaller LMUL, which in turn reduces register pressure as the previous lowering performed N vslideups at the highest result LMUL. For now, it stops splitting past MF2. This is done as a DAG combine so that any undef operands are combined away: If we do this during lowering then we end up with unnecessary vslideups of undefs.
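The recursive split described above can be sketched in Python. This is only a toy model, not the RISC-V DAG combine itself: `split_concat` and the tuple representation are hypothetical, the operands are plain strings rather than SelectionDAG nodes, and the MF2 stopping rule and undef folding are not modelled.

```python
def split_concat(operands):
    """Rewrite an N-ary concat as nested concats with at most two operands each."""
    if len(operands) == 1:
        # A single operand needs no concat node at all.
        return operands[0]
    if len(operands) == 2:
        return ("concat", operands[0], operands[1])
    # Split the operand list in half and recurse on each half, so each
    # inner concat produces a smaller vector than the final result.
    mid = len(operands) // 2
    return ("concat", split_concat(operands[:mid]), split_concat(operands[mid:]))


print(split_concat(["v0", "v1", "v2", "v3"]))
```

In this model the inner `("concat", "v0", "v1")` node is half the width of the final result, which is the property the commit message relies on to perform the vslideups at a smaller LMUL.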

…ences within trivial statements (llvm#82229) This PR makes alpha.webkit.UncountedLocalVarsChecker ignore raw references and pointers to a ref counted type which appear within "trivial" statements. To do this, this PR extends TrivialFunctionAnalysis so that it can also analyze the "triviality" of statements as well as that of functions. Each Visit* function is now augmented with withCachedResult, which is responsible for looking up and updating the cache for each Visit* function. As this PR dramatically improves the false positive rate of the checker, it also deletes the code to ignore raw pointers and references within if and for statements.
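The caching pattern described for the Visit* functions can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the class `TrivialityAnalysis`, the tuple-based statement model, and the rule that calls are non-trivial are all hypothetical, and the real analysis is C++ code keyed on AST node pointers.

```python
class TrivialityAnalysis:
    """Toy analysis over nested tuples of the form (kind, *children)."""

    def __init__(self):
        self.cache = {}    # maps a statement to its computed triviality
        self.computed = 0  # number of uncached computations, for illustration

    def with_cached_result(self, stmt, compute):
        # Look the statement up first; only run the real check on a miss.
        if stmt in self.cache:
            return self.cache[stmt]
        result = compute()
        self.cache[stmt] = result
        return result

    def visit(self, stmt):
        kind, *children = stmt
        handler = getattr(self, "visit_" + kind)
        return self.with_cached_result(stmt, lambda: handler(children))

    def visit_literal(self, children):
        self.computed += 1
        return True

    def visit_call(self, children):
        # Assumption for this toy model: every call counts as non-trivial.
        self.computed += 1
        return False

    def visit_compound(self, children):
        # A compound statement is trivial only if all of its children are.
        self.computed += 1
        return all(self.visit(child) for child in children)
```

Routing every visit through `with_cached_result` means a statement that appears more than once is analyzed only once, no matter which visit method handles it.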

Try to make this editable by using functions for the number of wait states as a function of the number of passes. I'm assuming the current hazard test coverage is comprehensive. This could probably use another round to further simplify it. Alternatively, I believe this could all be expressed in a constant table indexed by an instruction classify function and number of passes.

…vm#84292) This trips the verifier changes added in llvm#83251 Stimulated by llvm/test/MC/WebAssembly/extern-functype-intrinsic.ll

Currently, the SLS hardening pass is run before the machine outliner, which means that the outliner creates new functions and calls which do not have the SLS hardening applied. The fix for this is to move the SLS passes to after the outliner, as has recently been done for the return address signing pass. This also avoids a bug where the SLS outliner emits code with instructions after a return, which the outliner doesn't correctly handle.

Clang returns an error when compiling this file with c++20

```
error: ISO C++20 does not permit initialization of char array with UTF-8 string literal
```

It seems like c++20 treats u8strings differently than strings (probably needs char8_t). Make this a string to fix the error.

) The standard specifies that it is UB to specialize the following traits:

- `std::is_integral`
- `std::is_unsigned`
- `std::make_unsigned`
- `std::make_signed`

This patch:

- Removes specializations for `BigInt`
- Transforms SFINAE for `bit.h` functions from template parameter to return type (this makes specialization easier).
- Adds `BigInt` specialization for `bit.h` functions.
- Fixes code depending on previous specializations.

…llvm#84297) Reverts llvm#84035 Several bots are failing: - https://lab.llvm.org/buildbot/#/builders/223/builds/37522 - https://lab.llvm.org/buildbot/#/builders/162/builds/51978 - https://lab.llvm.org/buildbot/#/builders/163/builds/52560 - https://lab.llvm.org/buildbot/#/builders/250/builds/19619

Following the pattern used for SOP instructions, we can use the same multiclass with a default argument to define renamed and non-renamed instructions.

This allows us to configure the pass to emit linalg.copy instead of memref.copy. This is consistent with one-shot-bufferize, which also allows configuring the `memCpyFn`, see https://discord.com/channels/636084430946959380/642426447167881246/1211698722438783087

ACLE suggests: ARM-software/acle#308. GCC emits diagnostics for attribute strings which contain duplicate features, but for now let's follow the SPEC in regards to mangling rules and we can change the semantic behavior of the compiler later if there's value to it.

This adds patterns and a pass to convert the Arith dialect to EmitC. For now, this covers arithmetic binary ops operating on floating point types. It is not checked within the patterns whether the types, such as the Tensor type, are supported in the respective EmitC operations. If unsupported types should be converted, the conversion will fail anyway because no legal EmitC operation can be created. This can clearly be improved in a follow up, also resulting in better error messages. Functions for such checks should not solely be used in the conversions and should also be (re)used in the verifier.

…lvm#84299) Note: This is a reland of llvm#84035.

The standard specifies that it is UB to specialize the following traits:

- `std::is_integral`
- `std::is_unsigned`
- `std::make_unsigned`
- `std::make_signed`

This patch:

- Removes specializations for `BigInt`
- Transforms SFINAE for `bit.h` functions from template parameter to return type (this makes specialization easier).
- Adds `BigInt` specialization for `bit.h` functions.
- Fixes code depending on previous specializations.

…th changing shuffles (llvm#84156) Fix gap in the cost estimation for length changing shuffles, by adjusting the shuffle mask and either widening the shuffle inputs or extracting the lower elements of the result. A small step towards moving some of this implementation inside improveShuffleKindFromMask and/or target getShuffleCost handlers (and reduce the diffs in cost estimation depending on whether coming from a ShuffleVectorInst or the raw operands / mask components)

Reverts llvm#84291 The patch broke Windows builds.

Pass the Pseudo (instead of its name) into EXP_Real_Row and EXP_Real_ComprVM since it is already available in all subclasses.

…lvm#83747)

Rename getNumVGPRBlocks to getEncodedNumVGPRBlocks, to clarify that it's using the encoding granule. This is used to program the hardware. In practice, the hardware will use the alloc granule instead, so this patch also adds a new helper, getAllocatedNumVGPRBlocks, which can be useful when driving heuristics.
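To make the difference between the two granules concrete, here is a rough Python sketch. It is not the AMDGPU implementation: `num_blocks` is a hypothetical helper, the granule values 4 and 8 are made-up placeholders rather than values for any real subtarget, and any extra adjustment the hardware encoding may apply is deliberately left out.

```python
def num_blocks(num_vgprs, granule):
    """Number of granule-sized blocks needed to hold num_vgprs registers, rounded up."""
    # Ceiling division; at least one register is always assumed to be in use.
    return -(-max(1, num_vgprs) // granule)


ENCODING_GRANULE = 4  # hypothetical value, for illustration only
ALLOC_GRANULE = 8     # hypothetical value, for illustration only

# The same register count maps to different block counts per granule.
blocks_by_encoding_granule = num_blocks(20, ENCODING_GRANULE)
blocks_by_alloc_granule = num_blocks(20, ALLOC_GRANULE)
print(blocks_by_encoding_granule, blocks_by_alloc_granule)
```

A heuristic that reasons about how many registers are really occupied would want the second number, which is the motivation the commit message gives for the new helper.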

See also discussion on llvm#84291.

…l-expression. (llvm#82611) In llvm#72985, I made a change to discard expression state (`ExprToLoc` and `ExprToVal`) at the beginning of each basic block. I did so with the claim that "we never need to access entries from these maps outside of the current basic block", noting that there are exceptions to this claim when control flow happens inside a full-expression (the operands of `&&`, `||`, and the conditional operator live in different basic blocks than the operator itself) but that we already have a mechanism for retrieving the values of these operands from the environment for the block they are computed in.

It turns out, however, that the operands of these operators aren't the only expressions whose values can be accessed from a different basic block; when control flow happens within a full-expression, that control flow can be "interposed" between an expression and its parent. Here is an example:

```cxx
void f(int*, int);
bool cond();

void target() {
  int i = 0;
  f(&i, cond() ? 1 : 0);
}
```

([godbolt](https://godbolt.org/z/hrbj1Mj3o))

In the CFG[^1], note how the expression for `&i` is computed in block B4, but the parent of this expression (the `CallExpr`) is located in block B1. The argument expression `&i` and the `CallExpr` are essentially "torn apart" into different basic blocks by the conditional operator in the second argument. In other words, the edge between the `CallExpr` and its argument `&i` straddles the boundary between two blocks.

I used to think that this scenario -- where an edge between an expression and one of its children straddles a block boundary -- could only happen between the expression that triggers the control flow (`&&`, `||`, or the conditional operator) and its children, but the example above shows that other expressions can be affected as well; the control flow is still triggered by `&&`, `||` or the conditional operator, but the expressions affected lie outside these operators.
Discarding expression state too soon is harmful. For example, an analysis that checks the arguments of the `CallExpr` above would not be able to retrieve a value for the `&i` argument.

This patch therefore ensures that we don't discard expression state before the end of a full-expression. In other cases -- when the evaluation of a full-expression is complete -- we still want to discard expression state for the reasons explained in llvm#72985 (avoid performing joins on boolean values that are no longer needed, which unnecessarily extends the flow condition; improve debuggability by removing clutter from the expression state).

The impact on performance from this change is about a 1% slowdown in the Crubit nullability check benchmarks:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     71.9µs ± 1%  71.9µs ± 2%    ~     (p=0.987 n=15+20)
BM_PointerAnalysisIntLoop          190µs ± 1%   192µs ± 2%  +1.06%  (p=0.000 n=14+16)
BM_PointerAnalysisPointerLoop      325µs ± 5%   324µs ± 4%    ~     (p=0.496 n=18+20)
BM_PointerAnalysisBranch           193µs ± 0%   192µs ± 4%    ~     (p=0.488 n=14+18)
BM_PointerAnalysisLoopAndBranch    521µs ± 1%   525µs ± 3%  +0.94%  (p=0.017 n=18+19)
BM_PointerAnalysisTwoLoops         337µs ± 1%   341µs ± 3%  +1.19%  (p=0.004 n=17+19)
BM_PointerAnalysisJoinFilePath    1.62ms ± 2%  1.64ms ± 3%  +0.92%  (p=0.021 n=20+20)
BM_PointerAnalysisCallInLoop      1.14ms ± 1%  1.15ms ± 4%    ~     (p=0.135 n=16+18)
```

[^1]:
```
[B5 (ENTRY)]
  Succs (1): B4

[B1]
  1: [B4.9] ? [B2.1] : [B3.1]
  2: [B4.4]([B4.6], [B1.1])
  Preds (2): B2 B3
  Succs (1): B0

[B2]
  1: 1
  Preds (1): B4
  Succs (1): B1

[B3]
  1: 0
  Preds (1): B4
  Succs (1): B1

[B4]
  1: 0
  2: int i = 0;
  3: f
  4: [B4.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int *, int))
  5: i
  6: &[B4.5]
  7: cond
  8: [B4.7] (ImplicitCastExpr, FunctionToPointerDecay, _Bool (*)(void))
  9: [B4.8]()
  T: [B4.9] ? ... : ...
  Preds (1): B5
  Succs (2): B2 B3

[B0 (EXIT)]
  Preds (1): B1
```

…llvm#84296) I did not know how the `-mmlir` flag works and was deferring the addition of `--openmp-enable-delayed-privatization` until later because I thought some work needed to be done first. This commit just adds some extra `RUN` lines to delayed privatization tests to run them from `flang` as well.

…llvm#79818) This patch adds support for parsing the proposed non-instruction debug info ("RemoveDIs") from textual IR, and adds a test for the parser as well as a set of verifier tests that are dependent on parsing to fire. An important detail of this patch is the fact that although we can now parse in the RemoveDIs (new) and Intrinsic (old) debug info formats, we will always convert back to the old format at the end of parsing - this is done for two reasons: firstly to ensure that every tool is able to process IR printed in the new format, regardless of whether that tool has had RemoveDIs support added, and secondly to maintain the effect of the existing flags: for the tools where support for the new format has been added, we will run LLVM passes in the new format iff `--try-experimental-debuginfo-iterators=true`, and we will print in the new format iff `--write-experimental-debuginfo-iterators=true`; the format of the textual IR input should have no effect on either of these features.

…are local to the inner context (llvm#84150) Make TopLevelStmtDecl a DeclContext so that variables defined in statements are attached to the TopLevelDeclContext. This fixes redefinition errors from variables declared in if conditions and for-init statements. These must be local to the inner context (C++ 3.3.2p4), but they had generated definitions on global scope instead. This PR makes the TopLevelStmtDecl look more like a FunctionDecl and that's fine because the FunctionDecl is very close in terms of semantics. Additionally, ActOnForStmt() requires a CompoundScope when processing a NullStmt body.

Co-authored-by: Vassil Vassilev <v.g.vassilev@gmail.com>

Fixes llvm#84216

This reverts commit fb02f9a. Looks like some Python version incompatibility, will investigate.

Resubmitting this after previous revert with the following changes:

- Split table into table_rhs_idx and table_candidate_idx so that bisect.bisect_left can be used without the `key` argument, which was introduced in Python 3.10
- Remove a re.Pattern type annotation

Original commit message:

Prior to this change, running UTC on larger tests, especially tests with unnamed IR values, often resulted in a spuriously large diff because e.g. TMPnn variables in the CHECK lines were renumbered. This change attempts to reduce the diff by keeping those variable names the same.

There are cases in which this "drift" of variable names can end up being more confusing. The old behavior can be re-enabled with the --reset-variable-names command line argument.

The improvement may not be immediately apparent in the diff of this change. The point is that the diff of stable_ir_values.ll against stable_ir_values.ll.expected after this change is smaller.

Ideally, we'd also keep meta variables for "global" objects stable, e.g. for attributes (#nn) and metadata (!nn). However, that would require a much more substantial refactoring of how we generate check lines, so I left it for future work.
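The workaround for the missing `key` argument can be sketched as follows. This is a minimal illustration rather than the actual UTC code: the list names `table_rhs_idx` and `table_candidate_idx` come from the commit message, but the sample data and the helper `first_candidate_at_or_after` are hypothetical.

```python
import bisect

# Hypothetical data: (rhs_idx, candidate_idx) pairs sorted by rhs_idx.
table = [(1, 10), (3, 11), (3, 12), (7, 13)]

# Split into two parallel lists so bisect can search plain integers
# and no `key` argument (Python 3.10+) is needed.
table_rhs_idx = [rhs for rhs, _ in table]
table_candidate_idx = [cand for _, cand in table]


def first_candidate_at_or_after(rhs_idx):
    """Return the candidate of the first entry whose rhs index is >= rhs_idx, or None."""
    pos = bisect.bisect_left(table_rhs_idx, rhs_idx)
    if pos == len(table_rhs_idx):
        return None
    # The same position indexes the parallel list.
    return table_candidate_idx[pos]


print(first_candidate_at_or_after(3))
```

Keeping the search keys in their own list is the usual way to get `key`-like behaviour from `bisect` on older Python versions, at the cost of having to keep the two lists in sync.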

Older versions of clang do not have __builtin_complex, but they may define `__GNUC__`.

…s. (llvm#84187) With the many pseudos used in SVE codegen it can be too easy to miss instructions. This enables the existing test we have for checking the scheduling info of the pseudos matches the real instructions, and adjusts the scheduling info in the NeoverseV1 model to make sure all are handled. In the cases I could I opted to use the same info as in the NeoverseV2 model, to keep the differences smaller.

…this` with values. (llvm#84164) This is the constructor's job, and we want to be able to test that it does this.

…y value. (llvm#84317) I'm making some changes to `Environment::getResultObjectLocation()`, with the ultimate goal of eliminating `RecordValue` entirely, and I'd like to make sure I don't break this behavior (and I've realized we don't have a test for it yet).

These checks have been broken since 6afe972. The check_cxx_compiler_flag macro only takes two arguments and passing three to it ends up calling `cmake_check_compiler_flag(CXX "${_FLAG}" ${_RESULT})` with ${_FLAG} equal to `-Werror` and the result variable being the actually tested compiler flag.

I noticed this because some of the flags that I know should be supported were being flagged as not supported. `--debug-trycompile` shows the following surprising line in the generated CMakeLists.txt: `add_definitions([==[-D-Wempty-body]==] [==[-Werror]==])` which then results in the following error while running the check:

```
FAILED: CMakeFiles/cmTC_72736.dir/src.cxx.o
tmp/upstream-llvm-readonly/bin/clang++ -nodefaultlibs -std=c++17 -fcolor-diagnostics -D-Wempty-body -Werror -MD -MT CMakeFiles/cmTC_72736.dir/src.cxx.o -MF CMakeFiles/cmTC_72736.dir/src.cxx.o.d -o CMakeFiles/cmTC_72736.dir/src.cxx.o -c .../cmake-build-all-sanitizers/CMakeFiles/CMakeScratch/TryCompile-nyh3QR/src.cxx
In file included from <built-in>:450:
<command line>:1:9: error: macro name must be an identifier
    1 | #define -Wempty-body 1
      |         ^
1 error generated.
```

It would be great if CMake could be a bit more helpful here so I've filed https://gitlab.kitware.com/cmake/cmake/-/issues/25735. See also https://reviews.llvm.org/D146920.

Reviewed By: nikic

Pull Request: llvm#83779

…t_march_7

…t_march_7_3

Bump to bfa6444 [Nr 6]

Bump to f355cd6 [Nr 5]

Bump to e4d4cfa [Nr 4]

Bump to c40146c [Nr 3]

Bump to afac64c [Nr 2]

Bump to 469c5e3 [Nr 1]

josel-amd · 2024-08-08T12:34:56Z

Auto-merge enabled. Should merge itself soon. Thanks for the approval!

marbre and others added 30 commits March 7, 2024 08:37

[mlir][EmitC] Introduce a CExpression trait (llvm#84177)

7c63431

This adds a `CExpression` trait and replaces the `isCExpression()` function.

Revert "Convert many LivePhysRegs uses to LiveRegUnits (llvm#83905)"

7a0e222

This reverts commit 2a13422. It was causing test failures on the expensive check builders.

[docs] Change SPIR-V backend meeting day (llvm#84286)

91808c8

Changing SPIR-V backend meeting day and removing my office hours

[RemoveDIs][wasm] Apply current debug mode to new function protos (ll…

812c22b

…vm#84292) This trips the verifier changes added in llvm#83251 Stimulated by llvm/test/MC/WebAssembly/extern-functype-intrinsic.ll

[AMDGPU] Make use of Mnem_gfx11_gfx12. NFC.

bf7f62a

[AMDGPU] Simplify definition of renamed DS instructions. NFC.

469c5e3

Following the pattern used for SOP instructions, we can use the same multiclass with a default argument to define renamed and non-renamed instructions.

Revert "[dataflow][nfc] Fix u8 string usage with c++20" (llvm#84301)

5830d1a

Reverts llvm#84291 The patch broke Windows builds.

[AMDGPU] Simplify EXP Real instruction definitions. NFC.

4119042

Pass the Pseudo (instead of its name) into EXP_Real_Row and EXP_Real_ComprVM since it is already available in all subclasses.

[libc++] Remove unused includes from __type_traits/is_convertible.h (l…

937a539

…lvm#83747)

[clang][dataflow] Fix u8 string error with C++20. (llvm#84302)

a11ab13

See also discussion on llvm#84291.

[Clang] Update value for __cpp_implicit_move (llvm#84216) (llvm#84228)

2acccf6

Fixes llvm#84216

nhaehnle and others added 8 commits March 8, 2024 04:25

Revert "update_test_checks: keep meta variables stable by default"

b565126

This reverts commit fb02f9a. Looks like some Python version incompatibility, will investigate.

[flang] Fixed build issues after f20ea05. (llvm#84377)

d9c8550

Older versions of clang do not have __builtin_complex, but they may define `__GNUC__`.

[clang][dataflow] When analyzing ctors, don't initialize fields of `*…

2d539db

…this` with values. (llvm#84164) This is the constructor's job, and we want to be able to test that it does this.

Merge commit '99500e8c08a4d941acb8a7eb00523296fb2acf7a' into jose.pos…

4466520

…t_march_7

mgehre-amd approved these changes Aug 7, 2024


Base automatically changed from jose.post_pdll_xilinx_3 to feature/fused-ops August 7, 2024 15:14

josel-amd added 9 commits August 7, 2024 16:23

Merge commit '469c5e3da46115b9625e2d4771bd19d4968e3fa9' into jose.pos…

4889a58

…t_march_7_3

Merge commit 'afac64cef40c77320cc49808be30f3e5ef7f7357' into jose.bump_4

6b9eab2

Merge commit '6f54a54c6f5f644b4f4c79882154fd9737568c8e' into jose.bump_5

5aac423

Merge commit 'c40146c214a705a232848144d9412c8a7c73f0fe' into jose.bump_5

768d339

Fix merge problem

88698cc

Merge branch 'jose.bump_4' into jose.bump_5

4399998

Merge commit 'e4d4cfa5a0111372dff2b01126545cf3139ee40b' into jose.bump_6

bc3916d

Merge commit 'f355cd6f6c51580316e1e88ef5534bd2f8cfa498' into jose.bump_7

9c0a083

Merge commit 'bfa6444a332f82843f9fa44821d68fcc772e0272' into jose.bump_8

18addab

josel-amd changed the title ~~First batch post March 7th~~ First batch post March 7th [Nr 0] Aug 8, 2024

josel-amd added 6 commits August 8, 2024 14:32

Merge pull request #243 from Xilinx/jose.bump_8

826a499

Bump to bfa6444 [Nr 6]

Merge pull request #242 from Xilinx/jose.bump_7

334b09f

Bump to f355cd6 [Nr 5]

Merge pull request #241 from Xilinx/jose.bump_6

c9901d0

Bump to e4d4cfa [Nr 4]

Merge pull request #240 from Xilinx/jose.bump_5

202aa38

Bump to c40146c [Nr 3]

Merge pull request #239 from Xilinx/jose.bump_4

48c15f5

Bump to afac64c [Nr 2]

Merge pull request #238 from Xilinx/jose.post_march_7_3

a9423f9

Bump to 469c5e3 [Nr 1]

josel-amd enabled auto-merge August 8, 2024 12:34

josel-amd merged commit 91d4461 into feature/fused-ops Aug 8, 2024
4 checks passed

josel-amd deleted the jose.post_march_7 branch August 8, 2024 13:49
