Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First batch post March 7th [Nr 0] #237

Merged
merged 173 commits into from
Aug 8, 2024
Merged

Conversation

josel-amd
Copy link
Collaborator

No description provided.

marbre and others added 30 commits March 7, 2024 08:37
This adds a `CExpression` trait and replaces the `isCExpression()`
function.
…nyext load.

This load isn't selected by tablegen due to the anyext, but wasn't generating
a subreg_to_reg. Maybe it shouldn't be formed at all during the combiner but to stop
crashes later in codegen select it manually for now.
This reverts commit 2a13422.

It was causing test failures on the expensive check builders.
Changing SPIR-V backend meeting day and removing my office hours
llvm#77753)

Per
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2448r2.html
function/constructor/destructor can be marked `constexpr` even though it
never produces a constant expression.
Non-literal types as return types and parameter types of functions
marked `constexpr` are also allowed.
Since this is not a DR, the diagnostic messages are still preserved for
C++ standards older than C++23.
This is the concat_vector equivalent of llvm#81312, in that we recursively
split concat_vectors with more than two operands into smaller
concat_vectors.

This allows us to break up the chain of vslideups, as well as perform
the vslideups at a smaller LMUL, which in turn reduces register pressure
as the previous lowering performed N vslideups at the highest result
LMUL. For now, it stops splitting past MF2.

This is done as a DAG combine so that any undef operands are combined
away: If we do this during lowering then we end up with unnecessary
vslideups of undefs.
…ences within trivial statements (llvm#82229)

This PR makes alpha.webkit.UncountedLocalVarsChecker ignore raw
references and pointers to a ref counted type which appears within
"trival" statements. To do this, this PR extends TrivialFunctionAnalysis
so that it can also analyze "triviality" of statements as well as that
of functions Each Visit* function is now augmented with
withCachedResult, which is responsible for looking up and updating the
cache for each Visit* functions.

As this PR dramatically improves the false positive rate of the checker,
it also deletes the code to ignore raw pointers and references within if
and for statements.
Try to make this editable by using functions for the number of wait
states as a function of the number of passes. I'm assuming the current
hazard test coverage is comprehensive.

This could probably use another round to further simplify it.
Alternatively, I believe this could all be expressed in a constant table
indexed by an instruction classify function and number of passes.
…vm#84292)

This trips the verifier changes added in llvm#83251

Stimulated by llvm/test/MC/WebAssembly/extern-functype-intrinsic.ll
Currently, the SLS hardening pass is run before the machine outliner,
which means that the outliner creates new functions and calls which do
not have the SLS hardening applied.

The fix for this is to move the SLS passes to after the outliner, as has
recently been done for the return address signing pass.

This also avoids a bug where the SLS outliner emits code with
instructions after a return, which the outliner doesn't correctly
handle.
Clang returns an error when compiling this file with c++20
```
error: ISO C++20 does not permit initialization of char array with UTF-8 string literal
```
It seems like c++20 treats u8strings differently than strings (probably
needs char8_t).
Make this a string to fix the error.
)

The standard specifies that it it UB to specialize the following traits:
 - `std::is_integral`
 - `std::is_unsigned`
 - `std::make_unsigned`
 - `std::make_signed`

This patch:
 - Removes specializations for `BigInt`
 - Transforms SFINAE for `bit.h` functions from template parameter to
   return type (This makes specialization easier).
 - Adds `BigInt` specialization for `bit.h` functions.
 - Fixes code depending on previous specializations.
Following the pattern used for SOP instructions, we can use the same
multiclass with a default argument to define renamed and non-renamed
instructions.
This allows us to configure the pass to emit
linalg.copy instead of memref.copy.

This is consistent with one-shot-bufferize, which also allows to configure the `memCpyFn`,
see https://discord.com/channels/636084430946959380/642426447167881246/1211698722438783087
ACLE suggests: ARM-software/acle#308. GCC emits
diagnostics for attribute strings which contain duplicate features, but
for now let's follow the SPEC in regards to mangling rules and we can
change the semantic behavior of the compiler later if there's value to
it.
This adds patterns and a pass to convert the Arith dialect to EmitC. For
now, this covers arithemtic binary ops operating on floating point
types.

It is not checked within the patterns whether the types, such as the
Tensor type, are supported in the respective EmitC operations. If
unsupported types should be converted, the conversion will fail anyway
because no legal EmitC operation can be created. This can clearly be
improved in a follow up, also resulting in better error messages.
Functions for such checks should not solely be used in the conversions
and should also be (re)used in the verifier.
…lvm#84299)

Note: This is a reland of llvm#84035.

The standard specifies that it it UB to specialize the following traits:
 - `std::is_integral`
 - `std::is_unsigned`
 - `std::make_unsigned`
 - `std::make_signed`

This patch:
 - Removes specializations for `BigInt`
 - Transforms SFINAE for `bit.h` functions from template parameter to
   return type (This makes specialization easier).
 - Adds `BigInt` specialization for `bit.h` functions.
 - Fixes code depending on previous specializations.
…th changing shuffles (llvm#84156)

Fix gap in the cost estimation for length changing shuffles, by adjusting the shuffle mask and either widening the shuffle inputs or extracting the lower elements of the result.

A small step towards moving some of this implementation inside improveShuffleKindFromMask and/or target getShuffleCost handlers (and reduce the diffs in cost estimation depending on whether coming from a ShuffleVectorInst or the raw operands / mask components)
Pass the Pseudo (instead of its name) into EXP_Real_Row and
EXP_Real_ComprVM since it is already available in all subclasses.
Rename getNumVGPRBlocks to getEncodedNumVGPRBlocks, to clarify that it's
using the encoding granule. This is used to program the hardware. In
practice, the hardware will use the alloc granule instead, so this patch
also adds a new helper, getAllocatedNumVGPRBlocks, which can be useful
when driving heuristics.
…l-expression. (llvm#82611)

In llvm#72985, I made a change to
discard
expression state (`ExprToLoc` and `ExprToVal`) at the beginning of each
basic
block. I did so with the claim that "we never need to access entries
from these
maps outside of the current basic block", noting that there are
exceptions to
this claim when control flow happens inside a full-expression (the
operands of
`&&`, `||`, and the conditional operator live in different basic blocks
than the
operator itself) but that we already have a mechanism for retrieving the
values
of these operands from the environment for the block they are computed
in.

It turns out, however, that the operands of these operators aren't the
only
expressions whose values can be accessed from a different basic block;
when
control flow happens within a full-expression, that control flow can be
"interposed" between an expression and its parent. Here is an example:

```cxx
void f(int*, int);
bool cond();

void target() {
  int i = 0;
  f(&i, cond() ? 1 : 0);
}
```

([godbolt](https://godbolt.org/z/hrbj1Mj3o))

In the CFG[^1] , note how the expression for `&i` is computed in block
B4,
but the parent of this expression (the `CallExpr`) is located in block
B1.
The the argument expression `&i` and the `CallExpr` are essentially
"torn apart"
into different basic blocks by the conditional operator in the second
argument.
In other words, the edge between the `CallExpr` and its argument `&i`
straddles
the boundary between two blocks.

I used to think that this scenario -- where an edge between an
expression and
one of its children straddles a block boundary -- could only happen
between the
expression that triggers the control flow (`&&`, `||`, or the
conditional
operator) and its children, but the example above shows that other
expressions
can be affected as well; the control flow is still triggered by `&&`,
`||` or
the conditional operator, but the expressions affected lie outside these
operators.

Discarding expression state too soon is harmful. For example, an
analysis that
checks the arguments of the `CallExpr` above would not be able to
retrieve a
value for the `&i` argument.

This patch therefore ensures that we don't discard expression state
before the
end of a full-expression. In other cases -- when the evaluation of a
full-expression is complete -- we still want to discard expression state
for the
reasons explained in llvm#72985
(avoid
performing joins on boolean values that are no longer needed, which
unnecessarily extends the flow condition; improve debuggability by
removing
clutter from the expression state).

The impact on performance from this change is about a 1% slowdown in the
Crubit nullability check benchmarks:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     71.9µs ± 1%  71.9µs ± 2%    ~     (p=0.987 n=15+20)
BM_PointerAnalysisIntLoop          190µs ± 1%   192µs ± 2%  +1.06%  (p=0.000 n=14+16)
BM_PointerAnalysisPointerLoop      325µs ± 5%   324µs ± 4%    ~     (p=0.496 n=18+20)
BM_PointerAnalysisBranch           193µs ± 0%   192µs ± 4%    ~     (p=0.488 n=14+18)
BM_PointerAnalysisLoopAndBranch    521µs ± 1%   525µs ± 3%  +0.94%  (p=0.017 n=18+19)
BM_PointerAnalysisTwoLoops         337µs ± 1%   341µs ± 3%  +1.19%  (p=0.004 n=17+19)
BM_PointerAnalysisJoinFilePath    1.62ms ± 2%  1.64ms ± 3%  +0.92%  (p=0.021 n=20+20)
BM_PointerAnalysisCallInLoop      1.14ms ± 1%  1.15ms ± 4%    ~     (p=0.135 n=16+18)
```

[^1]:
```
 [B5 (ENTRY)]
   Succs (1): B4

 [B1]
   1: [B4.9] ? [B2.1] : [B3.1]
   2: [B4.4]([B4.6], [B1.1])
   Preds (2): B2 B3
   Succs (1): B0

 [B2]
   1: 1
   Preds (1): B4
   Succs (1): B1

 [B3]
   1: 0
   Preds (1): B4
   Succs (1): B1

 [B4]
   1: 0
   2: int i = 0;
   3: f
   4: [B4.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int *, int))
   5: i
   6: &[B4.5]
   7: cond
   8: [B4.7] (ImplicitCastExpr, FunctionToPointerDecay, _Bool (*)(void))
   9: [B4.8]()
   T: [B4.9] ? ... : ...
   Preds (1): B5
   Succs (2): B2 B3

 [B0 (EXIT)]
   Preds (1): B1
```
…llvm#84296)

I did not know how `-mmlir` flag works and was deferring the addition of
`--openm-enabled-delayed-privatization` until later because I thought
some work needs to be done to do that. This commit just adds some extra
`RUN` lines to delayed privatization tests to run them from `flang` as
well.
…llvm#79818)

This patch adds support for parsing the proposed non-instruction debug
info ("RemoveDIs") from textual IR, and adds a test for the parser as well
as a set of verifier tests that are dependent on parsing to fire.

An important detail of this patch is the fact that although we can now
parse in the RemoveDIs (new) and Intrinsic (old) debug info formats, we
will always convert back to the old format at the end of parsing - this
is done for two reasons: firstly to ensure that every tool is able to
process IR printed in the new format, regardless of whether that tool
has had RemoveDIs support added, and secondly to maintain the effect of
the existing flags: for the tools where support for the new format has
been added, we will run LLVM passes in the new format iff
`--try-experimental-debuginfo-iterators=true`, and we will print in the
new format iff `--write-experimental-debuginfo-iterators=true`; the
format of the textual IR input should have no effect on either of these
features.
…are local to the inner context (llvm#84150)

Make TopLevelStmtDecl a DeclContext so that variables defined in statements
are attached to the TopLevelDeclContext. This fixes redefinition errors
from variables declared in if conditions and for-init statements. These
must be local to the inner context (C++ 3.3.2p4), but they had generated
definitions on global scope instead.

This PR makes the TopLevelStmtDecl looking more like a FunctionDecl and
that's fine because the FunctionDecl is very close in terms of semantics.

Additionally, ActOnForStmt() requires a CompoundScope when processing a
NullStmt body.

---------

Co-authored-by: Vassil Vassilev <v.g.vassilev@gmail.com>
nhaehnle and others added 8 commits March 8, 2024 04:25
This reverts commit fb02f9a.

Looks like some Python version incompatibility, will investigate.
Resubmitting this after previous revert with the following changes:

- Split table into table_rhs_idx and table_candidate_idx so that
  bisect.bisect_left can be used without the `key` argument, which
  was introduced in Python 3.10
- Remove a re.Pattern type annotation

Original commit message:

Prior to this change, running UTC on larger tests, especially tests
with unnamed IR values, often resulted in a spuriously large diff
because e.g. TMPnn variables in the CHECK lines were renumbered. This
change attempts to reduce the diff by keeping those variable names the
same.

There are cases in which this "drift" of variable names can end up being
more confusing. The old behavior can be re-enabled with the
--reset-variable-names command line argument.

The improvement may not be immediately apparent in the diff of this change.
The point is that the diff of stable_ir_values.ll against
stable_ir_values.ll.expected after this change is smaller.

Ideally, we'd also keep meta variables for "global" objects stable, e.g.
for attributes (#nn) and metadata (!nn). However, that would require a
much more substantial refactoring of how we generate check lines, so I
left it for future work.
Older versions of clang do not have __builtin_complex, but they
may define `__GNUC__`.
…s. (llvm#84187)

With the many pseudos used in SVE codegen it can be too easy to miss
instructions. This enables the existing test we have for checking the
scheduling info of the pseudos matches the real instructions, and
adjusts the scheduling info in the NeoverseV1 model to make sure all are
handled. In the cases I could I opted to use the same info as in the
NeoverseV2 model, to keep the differences smaller.
…this` with values. (llvm#84164)

This is the constructor's job, and we want to be able to test that it
does this.
…y value. (llvm#84317)

I'm making some changes to `Environment::getResultObjectLocation()`,
with the
ultimate goal of eliminating `RecordValue` entirely, and I'd like to
make sure
I don't break this behavior (and I've realized we don't have a test for
it yet).
These checks have been broken since 6afe972.
The check_cxx_compiler_flag macro only takes two arguments and passing
three to it ends up calling
`cmake_check_compiler_flag(CXX "${_FLAG}" ${_RESULT})` with ${_FLAG}
equal to `-Werror` and the result variable being the actually tested
compiler flag.

I noticed this because some of the flags that I know should be supported
were being flagged as not supported. `--debug-trycompile` shows the
following surprising line in the generated CMakeLists.txt:
`add_definitions([==[-D-Wempty-body]==] [==[-Werror]==])` which then
results in the following error while running the check:
```
FAILED: CMakeFiles/cmTC_72736.dir/src.cxx.o
tmp/upstream-llvm-readonly/bin/clang++   -nodefaultlibs -std=c++17 -fcolor-diagnostics   -D-Wempty-body -Werror -MD -MT CMakeFiles/cmTC_72736.dir/src.cxx.o -MF CMakeFiles/cmTC_72736.dir/src.cxx.o.d -o CMakeFiles/cmTC_72736.dir/src.cxx.o -c .../cmake-build-all-sanitizers/CMakeFiles/CMakeScratch/TryCompile-nyh3QR/src.cxx
In file included from <built-in>:450:
<command line>:1:9: error: macro name must be an identifier
    1 | #define -Wempty-body 1
      |         ^
   1 error generated.
```

It would be great if CMake could be a bit more helpful here so I've
filed https://gitlab.kitware.com/cmake/cmake/-/issues/25735.

See also https://reviews.llvm.org/D146920.

Reviewed By: nikic

Pull Request: llvm#83779
Base automatically changed from jose.post_pdll_xilinx_3 to feature/fused-ops August 7, 2024 15:14
@josel-amd josel-amd changed the title First batch post March 7th First batch post March 7th [Nr 0] Aug 8, 2024
@josel-amd josel-amd enabled auto-merge August 8, 2024 12:34
@josel-amd
Copy link
Collaborator Author

Auto-merged enabled. Should merge itself soon. Thanks for the approval!

@josel-amd josel-amd merged commit 91d4461 into feature/fused-ops Aug 8, 2024
4 checks passed
@josel-amd josel-amd deleted the jose.post_march_7 branch August 8, 2024 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.