forked from llvm/llvm-project
[AutoBump] Merge with 12f77e81 (Jun 13) (74) #338
Merged
) Fixes llvm#92440 I had to delete part of reduction09.f90 because I don't think that should have ever worked.
…/EXTRACT_VECTOR_ELT (llvm#93357) DAGTypeLegalizer::SplitVecRes_INSERT_VECTOR_ELT and DAGTypeLegalizer::SplitVecRes_EXTRACT_VECTOR_ELT did not handle non byte-sized elements properly. In fact, they only dealt with elements smaller than 8 bits (as well as byte-sized elements). This patch generalizes the support for non byte-sized elements by always widening the vector elements to the next "round integer type" (a power of 2 bit size). This should make sure that we can access a single element via a simple byte-addressed scalar load/store. Also removing a suspicious CustomLowerNode call from SplitVecRes_INSERT_VECTOR_ELT. Considering that it did not reset the Lo/Hi out arguments before the return, I think that DAGTypeLegalizer::SplitVectorResult could be fooled into registering the input vector as being the result. This should however not have caused any problems, since DAGTypeLegalizer::SplitVectorResult is doing the same CustomLowerNode call, making the code removed by this patch redundant.
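The widening rule described above can be illustrated with a minimal standalone sketch. The helper names `roundIntegerBits` and `elementByteOffset` are hypothetical and are not the functions the patch touches; the sketch assumes "round integer type" means the smallest power-of-two bit width that is at least 8 and at least the element width.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for "next round integer type": the smallest
// power-of-two bit width that is >= 8 and >= the element width.
inline unsigned roundIntegerBits(unsigned ElemBits) {
  assert(ElemBits > 0 && "element width must be non-zero");
  unsigned Bits = 8;
  while (Bits < ElemBits)
    Bits *= 2;
  return Bits;
}

// Byte offset of element Idx once every element has been widened to a whole
// number of bytes, so a plain byte-addressed load/store can reach it.
inline std::uint64_t elementByteOffset(unsigned ElemBits, std::uint64_t Idx) {
  return Idx * (roundIntegerBits(ElemBits) / 8);
}
```

Under this assumption an i12 element would be widened to 16 bits, so element 3 would sit at byte offset 6.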
…llvm#94911) An 8 x i16 raw load was incorrectly using a 64-bit memory type, which would assert in the MachineMemOperand constructor. This is preparation for a cleanup which will make the buffer intrinsics work for all legal types.
Use the internal computeForAddCarry directly since we know the exact values of the carry bit.
DWOName is still used afterwards. The only reason this works out right now is that SmallString does not actually have a constructor that can take advantage of the move.
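The hazard can be sketched with `std::string`, which, unlike the `SmallString` case described above, does have a move constructor. `UnitRecord` and `registerAndDescribe` are hypothetical names for illustration only; the point is that a value read again later should be copied rather than moved.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical record that takes ownership of a name.
struct UnitRecord {
  std::string Name;
  explicit UnitRecord(std::string N) : Name(std::move(N)) {}
};

// The caller's string is read again after the record is built, so it is
// copied here. Moving from it instead would leave it in a valid but
// unspecified state before the second use.
inline std::string registerAndDescribe(const std::string &DWOName,
                                       UnitRecord &Out) {
  assert(!DWOName.empty() && "expected a non-empty name");
  Out = UnitRecord(DWOName);      // copy, not move
  return "registered " + DWOName; // still safe to read
}
```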
I believe it has been implemented since D139837 "Implements CTAD for aggregates P1816R0 and P2082R1", so this just claims we have already supported it. Plus an update on the dr status page.
…lvm#94889) Fixes two issues in two ways: 1) The `braced-init-list` consisted of `initializer-list` and `designated-initializer-list`, and thus the designated initializer is subject to [over.match.class.deduct]p1.8, which means the brace elision is also applicable on it for CTAD deduction guides. 2) When forming a deduction guide where the brace elision is applicable, we should also consider the presence of braces within the initializer. For example, given template <class T, class U> struct X { T t[2]; U u[3]; }; X x = {{1, 2}, 3, 4, 5}; we should establish such deduction guide AFAIU: `X(T (&&)[2], U, U, U) -> X<T, U>`. Fixes llvm#64625 Fixes llvm#83368
…#84443) This commit implements the Window Scheduler as described in the RFC: https://discourse.llvm.org/t/rfc-window-scheduling-algorithm-for-machinepipeliner-in-llvm/74718 This Window Scheduler implements the window algorithm designed by Steven Muchnick in the book "Advanced Compiler Design And Implementation", with some improvements: 1. Copy the loop kernel three times and construct the corresponding DAG to identify dependencies between MIs; 2. Use a heuristic algorithm to obtain a set of window offsets. The window algorithm is equivalent to the modulo scheduling algorithm with a stage of 2. It is mainly applied in targets where hardware resource conflicts are severe, and the SMS algorithm often fails in such cases. On our own DSA, this window algorithm can typically achieve a performance improvement of over 10%. Co-authored-by: Kai Yan <aklkaiyan@tencent.com> Co-authored-by: Ran Xiao <lennyxiao@tencent.com>
Due to alignment, the first two fields of MCEncodedFragment are currently at bytes 40 and 41, so 1 byte over the 8-byte boundary, causing 7 bytes of padding to be inserted for the following pointer. Fold two bools of MCFragment into bitfields to move the two fields of MCEncodedFragment one byte earlier and remove the padding bytes. This works because, in the Itanium ABI, there is no padding after base classes. This reduces the size of MCDataFragment from 224 to 216 bytes.
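The effect of folding bools into bitfields can be sketched with two hypothetical layouts. These are not the real MCFragment classes and they leave out the base-class aspect; they only show how one saved byte can pull a following pointer back across an alignment boundary.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: seven one-byte fields plus two bools occupy nine
// bytes, so the pointer is pushed to the next alignment boundary.
struct WithBools {
  std::uint8_t Fields[7];
  bool FlagA;
  bool FlagB;
  void *Ptr;
};

// Folding the two bools into one-bit bitfields lets them share one byte,
// so the pointer can start directly at the boundary.
struct WithBitfields {
  std::uint8_t Fields[7];
  bool FlagA : 1;
  bool FlagB : 1;
  void *Ptr;
};

static_assert(sizeof(WithBitfields) <= sizeof(WithBools),
              "bitfields must not grow the struct");
```

On a typical 64-bit target the first struct would be expected to take 24 bytes and the second 16, although exact sizes depend on the ABI.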
dyn_cast is allowed to return NULL - use cast<> to assert that the cast type is valid Fixes static analysis warning.
…lvm#95376) Mostly fixes handling of bfloat vectors, but also some missing i16 cases.
To detect features we either use HWCAPs or directly extract system register bitfields and compare with a value. In many cases equality comparisons give wrong results for example FEAT_SVE is not set if SVE2 is available (see the issue llvm#93651). I am also making the access to __aarch64_cpu_features atomic. The corresponding PR for the ACLE specification is ARM-software/acle#322.
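Why an equality comparison can give the wrong answer is easier to see in a small sketch. The helpers below are hypothetical and use an arbitrary field position; they assume an unsigned 4-bit ID field in which higher encodings imply the lower ones. Fields that reserve a value such as 0xF for "not implemented" would need separate handling.

```cpp
#include <cassert>
#include <cstdint>

// Extract a 4-bit ID field starting at bit Shift (hypothetical helper).
inline unsigned idField(std::uint64_t Reg, unsigned Shift) {
  assert(Shift <= 60 && "field must fit in 64 bits");
  return static_cast<unsigned>((Reg >> Shift) & 0xF);
}

// Equality matches only the exact encoding and misses later revisions.
inline bool hasFeatureExact(std::uint64_t Reg, unsigned Shift, unsigned Min) {
  return idField(Reg, Shift) == Min;
}

// Greater-or-equal also accepts higher encodings that imply the base feature.
inline bool hasFeatureAtLeast(std::uint64_t Reg, unsigned Shift,
                              unsigned Min) {
  return idField(Reg, Shift) >= Min;
}
```

With a field value of 2 and a required minimum of 1, the exact check reports the feature as missing while the greater-or-equal check reports it as present.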
By storing possible test vectors instead of combinations of conditions, the restriction is dramatically relaxed. This introduces two options to `cc1`: * `-fmcdc-max-conditions=32767` * `-fmcdc-max-test-vectors=2147483646` This change makes coverage mapping, profraw, and profdata incompatible with Clang-18. - Bitmap semantics changed. It is incompatible with previous format. - `BitmapIdx` in `Decision` points to the end of the bitmap. - Bitmap is packed per function. - `llvm-cov` can understand `profdata` generated by `llvm-profdata-18`. RFC: https://discourse.llvm.org/t/rfc-coverage-new-algorithm-and-file-format-for-mc-dc/76798
I was wrong: The purpose of CWG2685 is to avoid brace elision on string literals and we should be rejecting the case. Reverts llvm#95206
…ype> calls. NFC. Use getAs<ExtVectorType> directly to avoid duplicate getAs<> calls and static analyser warnings about dereferencing potentially null pointers.
…operands unless legal. Converting to avgfloor and then expanding it back to shift+add later is likely to prevent other folds (re-association and value-tracking in particular) in the meantime. Fixes llvm#95284
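For readers unfamiliar with the operation, here is a minimal sketch of what an unsigned floor-average computes and one common overflow-free expansion of it. The function names are hypothetical, and the sketch assumes the unsigned variant on 8-bit values; it is not the DAG combine itself.

```cpp
#include <cassert>
#include <cstdint>

// Reference: floor((A + B) / 2), computed in a wider type so the sum
// cannot overflow.
inline std::uint8_t avgFloorWide(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((static_cast<unsigned>(A) + B) >> 1);
}

// Same value without widening: the bits both operands share, plus half of
// the bits in which they differ.
inline std::uint8_t avgFloorNarrow(std::uint8_t A, std::uint8_t B) {
  return static_cast<std::uint8_t>((A & B) + ((A ^ B) >> 1));
}
```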
…nd (llvm#95142)" (llvm#95306) Fixed the link error that previously occurred on buildbots by adding IRPrinter to the linked components of the Flang frontend. This reverts commit 1d45235.
We apparently are missing codegen support for atomicrmw fmin/fmax. Also clean up FP atomicrmw tests to be more consistent and comprehensively test the relevant cases
Fix warning in SPIRVMergeRegionExitTargets.cpp about "non-void function does not return a value in all control paths" by changing assert to llvm_unreachable.
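The pattern behind this warning can be sketched without LLVM headers. The `unreachable` helper below is a hypothetical stand-in for `llvm_unreachable`, and `Kind` and `weight` are made up for illustration. An `assert(false)` disappears when `NDEBUG` is defined, so the function could fall off its end, whereas a `[[noreturn]]` call tells the compiler that control never continues.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for llvm_unreachable: marked [[noreturn]] so the
// compiler knows control never continues past the call.
[[noreturn]] inline void unreachable(const char * /*Msg*/) { std::abort(); }

enum class Kind { A, B };

// With assert(false && "unhandled Kind") as the last statement, NDEBUG
// builds would reach the end of this non-void function without returning.
inline int weight(Kind K) {
  switch (K) {
  case Kind::A:
    return 1;
  case Kind::B:
    return 2;
  }
  unreachable("unhandled Kind");
}
```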
…#95329) MLIR's LLVM dialect does not internally support debug records, only converting to/from debug intrinsics. To smooth the transition from intrinsics to records, there is a step prior to IR->MLIR translation that switches the IR module to intrinsic-form; this patch adds the equivalent conversion to record-form at MLIR->IR translation. This is a partial reapply of llvm#95098 which can be landed once the flang frontend has been updated by llvm#95306. This is the counterpart to the earlier patch llvm#89735 which handled the IR->MLIR conversion.
This isn't used by clang and isn't in the rvv-intrinsic-doc. The instruction requires Zvfh. If the F register passed to the instruction isn't nan-boxed correctly, the instruction will generate the wrong nan. So the instruction isn't a generic move FPR16 to vector register instruction.
The custom lowering converts to f32, splats as f32, then narrows the vector to bf16. None of that requires Zvfhmin. Add new bf16 test files without Zvfh/Zvfhmin in their RUN lines. I will remove the bf16 tests from other files in a follow-up patch.
Ahead of llvm#94242 and as requested in the technical call, I am adding a couple of tests for pointer components that I would like to make sure are covered.
…ernal and private global data
…llvm#93361) LLVM's Vector Predication Intrinsics require an explicit vector length parameter: https://llvm.org/docs/LangRef.html#vector-predication-intrinsics. For a scalable vector type, this should be calculated as VectorScaleOp multiplied by the base vector length, e.g. for <[4]xf32> we should return vscale * 4.
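As plain arithmetic, the rule above amounts to the following sketch. The function name is hypothetical, and the runtime `vscale` value is passed in as an ordinary integer rather than obtained from a VectorScaleOp.

```cpp
#include <cassert>
#include <cstdint>

// Explicit vector length for a scalable vector type such as <[4]xf32>:
// the runtime vscale multiplied by the base element count.
inline std::uint64_t scalableVectorLength(std::uint64_t VScale,
                                          std::uint64_t BaseLen) {
  assert(BaseLen > 0 && "base vector length must be non-zero");
  return VScale * BaseLen;
}
```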
Create additional helper functions for the ValueObject class, for: - returning the value as an APSInt or APFloat - additional type casting options - additional ways to create ValueObjects from various types of data - dereferencing a ValueObject These helper functions are needed for implementing the Data Inspection Language, described in https://discourse.llvm.org/t/rfc-data-inspection-language/69893
…fdump (llvm#93289) This patch adds a new set of statistics to llvm-dwarfdump that provide additional information about .debug_line regarding the number of bytes covered by the line table (and how many of those are covered by line 0 entries), and the number of entries within the table and how many of those are is_stmt, unique, or unique and non-line-0 (where "uniqueness" is based on file, line, and column only). Collectively these give a little more insight into the state of debug line information, rather than variables (as most of the dwarfdump statistics are currently oriented towards). I've added all of the stats that were useful to some degree, but I think the most generally useful stat is "unique line entries", since it gives the most straightforward indication of regressions, i.e. when the number goes down it means that fewer source lines are reachable in the program.
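The counting rule described above, where uniqueness is keyed on file, line, and column only, can be sketched as follows. `LineRow`, `LineStats`, and `computeLineStats` are hypothetical simplifications and not the llvm-dwarfdump implementation; the byte-coverage statistics are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical simplified line-table row.
struct LineRow {
  std::uint32_t File, Line, Column;
  bool IsStmt;
};

struct LineStats {
  std::size_t Entries = 0, IsStmtEntries = 0, Unique = 0, UniqueNonZero = 0;
};

// Count entries, is_stmt entries, unique entries, and unique entries whose
// line is not 0. Uniqueness ignores every field except file, line, column.
inline LineStats computeLineStats(const std::vector<LineRow> &Rows) {
  LineStats S;
  std::set<std::tuple<std::uint32_t, std::uint32_t, std::uint32_t>> Seen;
  for (const LineRow &R : Rows) {
    ++S.Entries;
    if (R.IsStmt)
      ++S.IsStmtEntries;
    if (Seen.insert(std::make_tuple(R.File, R.Line, R.Column)).second) {
      ++S.Unique;
      if (R.Line != 0)
        ++S.UniqueNonZero;
    }
  }
  return S;
}
```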
commonBits has been deprecated since: commit d8229e2 Author: Jay Foad <jay.foad@amd.com> Date: Wed May 10 16:50:33 2023 +0100
Support case-insensitive regex matches for `SBTarget::FindGlobalFunctions` and `SBTarget::FindGlobalVariables`.
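The general idea of case-insensitive regex lookup can be sketched with the standard library. `findByRegex` is a hypothetical helper over a plain list of names and does not reflect the SBTarget API or its signatures; it only shows how the `std::regex::icase` flag changes which names match.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Return the names that contain a match for Pattern, optionally ignoring case.
inline std::vector<std::string>
findByRegex(const std::vector<std::string> &Names, const std::string &Pattern,
            bool IgnoreCase) {
  std::regex::flag_type Flags = std::regex::ECMAScript;
  if (IgnoreCase)
    Flags |= std::regex::icase;
  const std::regex Re(Pattern, Flags);

  std::vector<std::string> Matches;
  for (const std::string &Name : Names)
    if (std::regex_search(Name, Re))
      Matches.push_back(Name);
  return Matches;
}
```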
This reverts commit 0079835. Causes crashes, see comments on llvm#92555.
…lvm#83301) If a function requires any streaming-mode change, the vector granule value must be stored to the stack and unwind info must also describe the save of VG to this location. This patch adds VG to the list of callee-saved registers and increases the callee-saved stack size if the function requires streaming-mode changes. A new type is added to RegPairInfo, which is also used to skip restoring the register used to spill the VG value in the epilogue. See https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
The default debug info format for newer versions of Darwin is DWARF 5. https://developer.apple.com/documentation/xcode-release-notes/xcode-16-release-notes rdar://110925733 (relanding 8f6acd9 with the bridgeOS platform check removed)
…llvm#95326) When an uninstrumented libatomic is used with a TSan instrumented memcpy, TSan may report a data race in circumstances where writes are arguably safe. This occurs because __atomic_compare_exchange won't be instrumented in an uninstrumented libatomic, so TSan doesn't know that the subsequent memcpy is race-free. On the other hand, pthread_mutex_(un)lock will be intercepted by TSan, meaning an uninstrumented libatomic will not report this false-positive. pthread_mutexes also may try a number of different strategies to acquire the lock, which may bound the amount of time a thread has to wait for a lock during contention. While pthread_mutex_lock has a larger overhead (due to the function call and some dispatching), a dispatch to libatomic already predicates a lack of performance guarantees.
Reverts llvm#73980. This broke static hwasan binaries in Android; for some reason the fixed_shadow_base branch gets taken.
cferry-AMD approved these changes on Sep 12, 2024
An error occurred while trying to automatically change base from bump_to_705f8581 to feature/fused-ops (September 13, 2024 06:27)
No description provided.