[AutoBump] Merge with 12f77e81 (Jun 13) (74) #338

Merged: mgehre-amd merged 73 commits into feature/fused-ops from bump_to_12f77e81 on Sep 13, 2024.

mgehre-amd (Collaborator):

No description provided.

jayfoad and others added 30 commits June 13, 2024 10:04
)

Fixes llvm#92440

I had to delete part of reduction09.f90 because I don't think that
should have ever worked.
…/EXTRACT_VECTOR_ELT (llvm#93357)

DAGTypeLegalizer::SplitVecRes_INSERT_VECTOR_ELT and
DAGTypeLegalizer::SplitVecRes_EXTRACT_VECTOR_ELT did not handle
non-byte-sized elements properly. In fact, they only dealt with
elements smaller than 8 bits (as well as byte-sized elements).

This patch generalizes the support for non-byte-sized elements by
always widening the vector elements to the next "round integer type"
(a power-of-2 bit size). This should ensure that we can access a
single element via a simple byte-addressed scalar load/store.
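A minimal sketch of the widening rule described above, assuming the round integer type is the next power-of-two bit width with a floor of 8 bits (so the widened element is always byte-addressable). The helper name `roundIntegerBits` is hypothetical and not part of the patch.

```cpp
#include <cassert>

// Hypothetical helper: round an element bit width up to the next power of
// two, never below 8 bits, so each element can be loaded or stored as a
// whole, byte-addressed scalar.
unsigned roundIntegerBits(unsigned bits) {
  if (bits <= 8)
    return 8;
  unsigned rounded = 16;
  while (rounded < bits)
    rounded *= 2;
  return rounded;
}
```

For example, under this rule an i12 element would be accessed as an i16 and an i24 element as an i32.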

This also removes a suspicious CustomLowerNode call from
SplitVecRes_INSERT_VECTOR_ELT. Since it did not reset the Lo/Hi out
arguments before returning, I think DAGTypeLegalizer::SplitVectorResult
could be fooled into registering the input vector as the result. However,
this should not have caused any problems, since
DAGTypeLegalizer::SplitVectorResult makes the same CustomLowerNode call,
which makes the code removed by this patch redundant.
…llvm#94911)

An 8 x i16 raw load was incorrectly using a 64-bit memory type, which
would assert in the MachineMemOperand constructor.

This is preparation for a cleanup which will make the buffer intrinsics
work for all legal types.
Use the internal computeForAddCarry directly since we know the exact values of the carry bit.
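To illustrate why an exactly known carry helps, here is an 8-bit sketch modeled on the formulation LLVM's `KnownBits::computeForAddCarry` uses. The `Known8` struct and `addWithCarry` function are hypothetical simplifications (fixed 8-bit width instead of `APInt`); which call site the commit changed is not shown here.

```cpp
#include <cassert>
#include <cstdint>

// Simplified 8-bit known-bits value: a bit set in `zero` is known to be 0,
// a bit set in `one` is known to be 1; a bit set in neither is unknown.
struct Known8 {
  uint8_t zero;
  uint8_t one;
};

// carryZero / carryOne state whether the incoming carry is known 0 / known 1.
Known8 addWithCarry(Known8 lhs, Known8 rhs, bool carryZero, bool carryOne) {
  assert(!(carryZero && carryOne) && "carry cannot be both 0 and 1");
  // Largest and smallest possible sums (wrapping at 8 bits).
  uint8_t maxSum = static_cast<uint8_t>(static_cast<uint8_t>(~lhs.zero) +
                                        static_cast<uint8_t>(~rhs.zero) +
                                        (carryZero ? 0 : 1));
  uint8_t minSum =
      static_cast<uint8_t>(lhs.one + rhs.one + (carryOne ? 1 : 0));
  // Carry into each bit position, where it can be determined.
  uint8_t carryKnownZero =
      static_cast<uint8_t>(~(maxSum ^ lhs.zero ^ rhs.zero));
  uint8_t carryKnownOne = static_cast<uint8_t>(minSum ^ lhs.one ^ rhs.one);
  // A result bit is known only if both operand bits and its carry-in are.
  uint8_t known = static_cast<uint8_t>((lhs.zero | lhs.one) &
                                       (rhs.zero | rhs.one) &
                                       (carryKnownZero | carryKnownOne));
  return {static_cast<uint8_t>(~maxSum & known),
          static_cast<uint8_t>(minSum & known)};
}
```

With both operands fully known (3 and 5), a known carry of 0 or 1 yields a fully known 8 or 9, whereas an unknown carry leaves bit 0 unknown.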
DWOName is still used afterwards. The only reason this works out
right now is that SmallString does not actually have a constructor
that can take advantage of the move.
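A small sketch of why the use-after-move happened to work: when a type has no usable move constructor, `std::move` just selects the copy constructor and the source is left intact. `CopyOnlyName` is a hypothetical stand-in for that situation, not SmallString itself; with a type that does move (such as `std::string`), the source would be left in a valid but unspecified state and reading it afterwards would be a bug.

```cpp
#include <cassert>
#include <string>
#include <utility>

// A type with a user-declared copy constructor and no move constructor.
// Declaring the copy constructor suppresses the implicit move constructor,
// so std::move(x) binds to the copy constructor and leaves x unchanged.
struct CopyOnlyName {
  std::string text;
  explicit CopyOnlyName(std::string s) : text(std::move(s)) {}
  CopyOnlyName(const CopyOnlyName &other) : text(other.text) {}
};

// Returns the source's text after it has been "moved" into a new object.
std::string textAfterMove() {
  CopyOnlyName source("out.dwo");
  CopyOnlyName dest(std::move(source)); // actually a copy
  (void)dest;
  return source.text; // still "out.dwo", but only because nothing moved
}
```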
I believe it has been implemented since D139837 "Implements CTAD for
aggregates P1816R0 and P2082R1", so this just marks it as already
supported.

Plus an update to the DR status page.
…lvm#94889)

Fixes two issues in two ways:

1) The `braced-init-list` consists of `initializer-list` and
`designated-initializer-list`, so the designated initializer is also
subject to [over.match.class.deduct]p1.8, which means brace elision
applies to it as well when forming CTAD deduction guides.

2) When forming a deduction guide where brace elision is applicable,
we should also consider the presence of braces within the initializer.
For example, given

template <class T, class U> struct X {
  T t[2];
  U u[3];
};

X x = {{1, 2}, 3, 4, 5};

as far as I understand, we should form this deduction guide: `X(T (&&)[2], U, U, U) -> X<T, U>`.

Fixes llvm#64625
Fixes llvm#83368
…#84443)

This commit implements the Window Scheduler as described in the RFC:

https://discourse.llvm.org/t/rfc-window-scheduling-algorithm-for-machinepipeliner-in-llvm/74718

This Window Scheduler implements the window algorithm designed by
Steven Muchnick in the book "Advanced Compiler Design And
Implementation", with some improvements:

1. Copy the loop kernel three times and construct the corresponding DAG
   to identify dependencies between MIs;
2. Use a heuristic algorithm to obtain a set of window offsets.

The window algorithm is equivalent to a modulo scheduling algorithm with
a stage count of 2. It is mainly applied to targets where hardware
resource conflicts are severe and where the SMS algorithm often fails.
On our own DSA, this window algorithm typically achieves a performance
improvement of over 10%.
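A simplified sketch of the window idea: choosing a window offset rotates the kernel so that the instructions before the offset are issued one iteration later, which is why the result behaves like a two-stage modulo schedule. The `windowAt`/`bestWindowOffset` names and the caller-supplied `cost` function are hypothetical; the real pass builds a DAG over three kernel copies and uses its own heuristic to pick candidate offsets rather than trying every one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using Kernel = std::vector<std::string>;

// Rotate the kernel so it starts at `offset`: the instructions before the
// offset move to the end, i.e. they belong to the next iteration.
Kernel windowAt(const Kernel &kernel, std::size_t offset) {
  Kernel window;
  for (std::size_t i = 0; i < kernel.size(); ++i)
    window.push_back(kernel[(offset + i) % kernel.size()]);
  return window;
}

// Try every offset and keep the one whose rotated kernel has the lowest
// cost. `cost` is a placeholder for scheduling the window and measuring it.
std::size_t bestWindowOffset(const Kernel &kernel,
                             const std::function<int(const Kernel &)> &cost) {
  std::size_t best = 0;
  int bestCost = cost(windowAt(kernel, 0));
  for (std::size_t off = 1; off < kernel.size(); ++off) {
    int c = cost(windowAt(kernel, off));
    if (c < bestCost) {
      bestCost = c;
      best = off;
    }
  }
  return best;
}
```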

Co-authored-by: Kai Yan <aklkaiyan@tencent.com>
Co-authored-by: Ran Xiao <lennyxiao@tencent.com>

Due to alignment, the first two fields of MCEncodedFragment are
currently at bytes 40 and 41, so 1 byte over the 8-byte boundary,
causing 7 bytes of padding to be inserted before the following pointer.

Fold two bools of MCFragment into bitfields to move the two fields of
MCEncodedFragment one byte earlier and remove the padding bytes. This
works because, in the Itanium ABI, there is no padding after base
classes.

This gives a space reduction of MCDataFragment from 224 to 216 bytes.
dyn_cast is allowed to return NULL; use cast<> to assert that the cast type is valid.

Fixes static analysis warning.
…lvm#95376)

Mostly fixes handling of bfloat vectors, but also some missing
i16 cases.
To detect features we either use HWCAPs or directly extract system
register bitfields and compare with a value. In many cases equality
comparisons give wrong results for example FEAT_SVE is not set if SVE2
is available (see the issue llvm#93651). I am also making the access to
__aarch64_cpu_features atomic.
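A sketch of the equality-versus-ordering problem, assuming (as the commit implies) that a larger value in an ID register field indicates a superset of the features of a smaller value. The 4-bit field at bit 32 and the helper names are hypothetical and do not correspond to a specific Arm register layout.

```cpp
#include <cassert>
#include <cstdint>

// Extract the 4-bit field that starts at bit `shift` of a register value.
unsigned field4(uint64_t reg, unsigned shift) {
  return static_cast<unsigned>((reg >> shift) & 0xF);
}

// Buggy style of check: recognises exactly one feature level.
bool hasFeatureEq(uint64_t reg, unsigned shift, unsigned level) {
  return field4(reg, shift) == level;
}

// Fixed style of check: a higher field value implies every lower level too.
bool hasFeatureAtLeast(uint64_t reg, unsigned shift, unsigned level) {
  return field4(reg, shift) >= level;
}
```

If the field reports level 2, the equality check for level 1 wrongly fails, which mirrors FEAT_SVE not being set when SVE2 is present.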

The corresponding PR for the ACLE specification is
ARM-software/acle#322.
By storing possible test vectors instead of combinations of conditions,
the restriction on the number of conditions in a decision is
dramatically relaxed.
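Because conditions are short-circuited, a decision usually has far fewer reachable test vectors than the 2^N combinations of its N conditions. This hypothetical sketch models a decision as a small graph of condition nodes and counts the distinct evaluation paths; it is only an illustration of the counting argument, not Clang's implementation, which additionally assigns each test vector an index into the bitmap.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Each condition node has a successor for its true and false outcome:
// either another node index (>= 0) or a final result (kFalse / kTrue).
constexpr int kFalse = -1;
constexpr int kTrue = -2;

struct CondNode {
  int onTrue;
  int onFalse;
};

// Count the distinct evaluation paths (test vectors) starting at `node`.
uint64_t countTestVectors(const std::vector<CondNode> &nodes, int node) {
  if (node < 0)
    return 1; // reached a final result
  return countTestVectors(nodes, nodes[node].onTrue) +
         countTestVectors(nodes, nodes[node].onFalse);
}
```

For `a && b && c` this gives 4 test vectors instead of 8 combinations, and for `(a || b) && c` it gives 5; a chain of N conditions joined by `&&` has N + 1.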

This introduces two options to `cc1`:

* `-fmcdc-max-conditions=32767`
* `-fmcdc-max-test-vectors=2147483646`

This change makes coverage mapping, profraw, and profdata incompatible
with Clang-18.

- Bitmap semantics changed. It is incompatible with previous format.
- `BitmapIdx` in `Decision` points to the end of the bitmap.
- Bitmap is packed per function.
- `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`.

RFC:
https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798
I was wrong: The purpose of CWG2685 is to avoid brace elision on string
literals and we should be rejecting the case.

Reverts llvm#95206
…ype> calls. NFC.

Use getAs<ExtVectorType> directly to avoid duplicate getAs<> calls and static analyser warnings about dereferencing potentially null pointers.
…operands unless legal.

Converting to avgfloor and then expanding it back to shift+add later is likely to prevent other folds (re-association and value-tracking in particular) in the meantime.

Fixes llvm#95284
…nd (llvm#95142)" (llvm#95306)

Fixed the link error that previously occurred on buildbots by adding
IRPrinter to the linked components of the Flang frontend.

This reverts commit 1d45235.
arsenm and others added 25 commits June 13, 2024 17:07
We are apparently missing codegen support for atomicrmw fmin/fmax. Also
clean up the FP atomicrmw tests to be more consistent and to test the
relevant cases comprehensively.
Fix warning in SPIRVMergeRegionExitTargets.cpp about "non-void function
does not return a value in all control paths" by changing assert to
llvm_unreachable.
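A sketch of why the warning goes away: in NDEBUG builds an `assert` compiles to nothing, so control can appear to fall off the end of a non-void function, whereas a call the compiler knows never returns closes that path. `unreachableSketch` is a hypothetical stand-in; the real `llvm_unreachable` is a macro from `llvm/Support/ErrorHandling.h`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Stand-in for llvm_unreachable: marked [[noreturn]], so the compiler knows
// control never continues past a call to it.
[[noreturn]] void unreachableSketch(const char *msg) {
  std::fprintf(stderr, "UNREACHABLE: %s\n", msg);
  std::abort();
}

enum class Shape { Square, Triangle };

int corners(Shape s) {
  switch (s) {
  case Shape::Square:
    return 4;
  case Shape::Triangle:
    return 3;
  }
  // An assert(false && "...") here would vanish under NDEBUG and leave a
  // path without a return value; a noreturn call does not.
  unreachableSketch("unknown shape");
}
```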
…#95329)

MLIR's LLVM dialect does not internally support debug records, only
converting to/from debug intrinsics. To smooth the transition from
intrinsics to records, there is a step prior to IR->MLIR translation
that switches the IR module to intrinsic-form; this patch adds the
equivalent conversion to record-form at MLIR->IR translation.

This is a partial reapply of
llvm#95098 which can be landed once
the flang frontend has been updated by
llvm#95306. This is the counterpart
to the earlier patch llvm#89735
which handled the IR->MLIR conversion.
This isn't used by clang and isn't in the rvv-intrinsic-doc.

The instruction requires Zvfh.

If the F register passed to the instruction isn't NaN-boxed correctly,
the instruction will generate the wrong NaN. So the instruction isn't a
generic FPR16-to-vector-register move instruction.
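A sketch of the NaN-boxing rule that makes this instruction unsuitable as a generic move, assuming FLEN = 64: a 16-bit value held in an F register is valid only if all upper 48 bits are ones, and otherwise it is treated as the canonical NaN (0x7E00 for half precision). The `unboxHalf` helper is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint16_t kCanonicalHalfNaN = 0x7E00;

// Read a half-precision value out of a 64-bit F register image.
uint16_t unboxHalf(uint64_t freg) {
  const uint64_t upperMask = 0xFFFFFFFFFFFF0000ull;
  if ((freg & upperMask) == upperMask) // correctly NaN-boxed
    return static_cast<uint16_t>(freg & 0xFFFF);
  return kCanonicalHalfNaN; // improperly boxed: treated as canonical NaN
}
```

So an arbitrary 16-bit bit pattern that was not NaN-boxed would come out as 0x7E00 rather than being copied unchanged.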
The custom lowering converts to f32, splats as f32, then narrows the
vector to bf16. None of that requires Zvfhmin.
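A scalar model of the lowering described above: widen the bf16 value to f32, splat it as f32, then narrow each lane back to bf16. Every bf16 value is exactly representable in f32, so the round trip is exact and no half-precision support is involved. The 4-lane width and the helper names are hypothetical; the real lowering operates on RVV vectors, and NaN inputs are not handled by this narrowing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// bf16 is the upper 16 bits of an IEEE-754 binary32 value.
float bf16ToF32(uint16_t b) {
  uint32_t bits = static_cast<uint32_t>(b) << 16;
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Narrow f32 to bf16 with round-to-nearest-even (NaN inputs not handled).
uint16_t f32ToBF16(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  uint32_t bias = 0x7FFF + ((bits >> 16) & 1);
  return static_cast<uint16_t>((bits + bias) >> 16);
}

// Widen to f32, splat as f32, then narrow every lane back to bf16.
std::array<uint16_t, 4> splatBF16(uint16_t scalar) {
  std::array<float, 4> wide;
  wide.fill(bf16ToF32(scalar));
  std::array<uint16_t, 4> out;
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = f32ToBF16(wide[i]);
  return out;
}
```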

Add new bf16 test files without Zvfh/Zvfhmin in their RUN lines. I will
remove the bf16 tests from other files in a follow-up patch.
Ahead of llvm#94242 and as
requested in the technical call, I am adding a couple of tests for
pointer components that I would like to make sure are covered.
…llvm#93361)

LLVM's Vector Predication Intrinsics require an explicit vector length
parameter:
https://llvm.org/docs/LangRef.html#vector-predication-intrinsics.

For a scalable vector type, this should be calculated as VectorScaleOp
multiplied by the base vector length, e.g. for <[4]xf32> we should return
vscale * 4.
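The rule as a tiny function: the explicit vector length is the base length, multiplied by vscale only when the type is scalable (written `vector<[4]xf32>` in MLIR). The `VecType` struct and `explicitVectorLength` name are hypothetical; in the actual lowering, vscale is a runtime value produced by VectorScaleOp rather than a function argument.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a 1-D vector type.
struct VecType {
  uint64_t baseLength;
  bool scalable;
};

// Explicit vector length to pass to a vector-predication intrinsic.
uint64_t explicitVectorLength(VecType t, uint64_t vscale) {
  return t.scalable ? vscale * t.baseLength : t.baseLength;
}
```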
Create additional helper functions for the ValueObject class, for:
  - returning the value as an APSInt or APFloat
  - additional type casting options
  - additional ways to create ValueObjects from various types of data
  - dereferencing a ValueObject

These helper functions are needed for implementing the Data Inspection
Language, described in
https://discourse.llvm.org/t/rfc-data-inspection-language/69893
…fdump (llvm#93289)

This patch adds a new set of statistics to llvm-dwarfdump that provide
additional information about .debug_line regarding the number of bytes
covered by the line table (and how many of those are covered by line 0
entries), and the number of entries within the table and how many of
those are is_stmt, unique, or unique and non-line-0 (where "uniqueness"
is based on file, line, and column only).

Collectively these give a little more insight into the state of debug
line information, rather than variables (as most of the dwarfdump
statistics are currently oriented towards). I've added all of the stats
that were useful to some degree, but I think the most generally useful
stat is "unique line entries", since it gives the most straightforward
indication of regressions, i.e. when the number goes down it means that
fewer source lines are reachable in the program.
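A simplified sketch of how such statistics could be computed from decoded line-table rows: each row covers the bytes up to the next row's address, and uniqueness is judged on (file, line, column) only, as described above. The `LineRow`/`LineStats` types and `computeLineStats` are hypothetical and do not reproduce llvm-dwarfdump's code or its output keys; here an end-of-sequence row only terminates coverage and is not counted as an entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

struct LineRow {
  uint64_t address;
  unsigned file, line, column;
  bool isStmt;
  bool endSequence;
};

struct LineStats {
  uint64_t bytes = 0, line0Bytes = 0;
  unsigned entries = 0, isStmtEntries = 0, unique = 0, uniqueNonLine0 = 0;
};

LineStats computeLineStats(const std::vector<LineRow> &rows) {
  LineStats s;
  std::set<std::tuple<unsigned, unsigned, unsigned>> seen;
  for (std::size_t i = 0; i < rows.size(); ++i) {
    const LineRow &r = rows[i];
    if (r.endSequence)
      continue;
    ++s.entries;
    if (r.isStmt)
      ++s.isStmtEntries;
    if (i + 1 < rows.size()) { // bytes covered until the next row
      uint64_t size = rows[i + 1].address - r.address;
      s.bytes += size;
      if (r.line == 0)
        s.line0Bytes += size;
    }
    if (seen.insert(std::make_tuple(r.file, r.line, r.column)).second) {
      ++s.unique;
      if (r.line != 0)
        ++s.uniqueNonLine0;
    }
  }
  return s;
}
```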
commonBits has been deprecated since:

  commit d8229e2
  Author: Jay Foad <jay.foad@amd.com>
  Date:   Wed May 10 16:50:33 2023 +0100
Support case-insensitive regex matches for
`SBTarget::FindGlobalFunctions` and `SBTarget::FindGlobalVariables`.
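As a generic illustration of case-insensitive regex matching over symbol names, here is a sketch using the C++ standard library's `std::regex::icase`. The `findMatching` helper is hypothetical and is not the LLDB SB API; it only shows the effect of the flag.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Filter symbol names with a regular expression, optionally ignoring case.
std::vector<std::string> findMatching(const std::vector<std::string> &names,
                                      const std::string &pattern,
                                      bool ignoreCase) {
  std::regex re(pattern, ignoreCase
                             ? std::regex::ECMAScript | std::regex::icase
                             : std::regex::ECMAScript);
  std::vector<std::string> out;
  for (const auto &n : names)
    if (std::regex_search(n, re))
      out.push_back(n);
  return out;
}
```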
…lvm#83301)

If a function requires any streaming-mode change, the vector granule
value must be stored to the stack and unwind info must also describe the
save of VG to this location.

This patch adds VG to the list of callee-saved registers and increases
the callee-saved stack size if the function requires streaming-mode
changes.
A new type is added to RegPairInfo, which is also used to skip restoring
the register used to spill the VG value in the epilogue.

See
https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
The default debug info format for newer versions of Darwin is DWARF 5.

https://developer.apple.com/documentation/xcode-release-notes/xcode-16-release-notes

rdar://110925733

(relanding 8f6acd9 with the bridgeOS platform check removed)
…llvm#95326)

When an uninstrumented libatomic is used with a TSan instrumented
memcpy, TSan may report a data race in circumstances where writes are
arguably safe.

This occurs because __atomic_compare_exchange won't be instrumented in
an uninstrumented libatomic, so TSan doesn't know that the subsequent
memcpy is race-free.

On the other hand, pthread_mutex_(un)lock will be intercepted by TSan,
meaning that an uninstrumented libatomic will not trigger this false
positive.

pthread_mutexes may also try a number of different strategies to acquire
the lock, which may bound the amount of time a thread has to wait for a
lock during contention.

While pthread_mutex_lock has a larger overhead (due to the function call
and some dispatching), a dispatch to libatomic already implies a lack
of performance guarantees.
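A sketch of a lock-based generic compare-exchange of the kind discussed above, assuming `std::mutex` is backed by `pthread_mutex` (as on common POSIX implementations) so a race detector that intercepts the lock and unlock calls sees the `memcpy` as synchronized. The single global lock and the `lockedCompareExchange` name are hypothetical simplifications; a real runtime would typically pick a lock from a table keyed by the object's address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>

std::mutex gAtomicLock; // one lock for the whole sketch

// Lock-based compare-exchange on an n-byte object. On failure, the current
// value is copied into *expected, mirroring __atomic_compare_exchange.
bool lockedCompareExchange(void *obj, void *expected, const void *desired,
                           std::size_t n) {
  std::lock_guard<std::mutex> guard(gAtomicLock);
  if (std::memcmp(obj, expected, n) == 0) {
    std::memcpy(obj, desired, n);
    return true;
  }
  std::memcpy(expected, obj, n);
  return false;
}
```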
Reverts llvm#73980

This broke static hwasan binaries in Android; for some reason, the
fixed_shadow_base branch gets taken.
Base automatically changed from bump_to_705f8581 to feature/fused-ops September 13, 2024 06:27
An error occurred while trying to automatically change base from bump_to_705f8581 to feature/fused-ops September 13, 2024 06:27
mgehre-amd merged commit d847318 into feature/fused-ops on Sep 13, 2024 (7 checks passed).
mgehre-amd deleted the bump_to_12f77e81 branch September 13, 2024 06:28