forked from llvm/llvm-project
[AutoBump] Merge with cc04bbb2 (Jun 11) (70) #334
Merged
…reTable (llvm#95082) The wasm backend fetches the tan runtime lib call in `llvm/include/llvm/IR/RuntimeLibcalls.def` via `StaticLibcallNameMap()`, but ignores the runtime function because a function signature mapping is not specified in RuntimeLibcallSignatureTable(). The fix is to specify the function signatures for float32-128. This is a fix for a build break reported on PR llvm#94559 (comment).
Following a rather direct approach to expose PDL usage from C and then Python. This doesn't yet plumb through adding support for custom matchers through this interface, so it is constrained to the basics initially. This also exposes the greedy rewrite driver. The only way currently to define patterns is via PDL (just to keep it small). The creation of the PDL pattern module could be improved to avoid folks potentially accessing the module used to construct it post construction. No ergonomic work done yet. --------- Signed-off-by: Jacques Pienaar <jpienaar@google.com>
Otherwise this would fail when using gnuwin32.
…EG + SETO/SETNO (llvm#94948) For i64 this avoids loading a 64-bit value into register, for smaller registers this just avoids an immediate operand. For i8+i16, limit to one use case as we save fewer bytes and these can be wasted entirely on extra register moves. Fixes llvm#67709
Ignore the base and visit the Member decl like a regular DeclRefExpr.
Fragments are allocated with `operator new` and stored in an ilist with Prev/Next/Parent pointers. A more efficient representation would be an array of fragments without the overhead of Prev/Next pointers.

As the first step, replace ilist with singly-linked lists.

* `getPrevNode` uses have been eliminated by previous changes.
* The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and the current insertion point is stored at `CurInsertionPoint`.
* `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all fragments within `Frags`. Hexagon programs are usually small, and the performance does not matter that much.

To eliminate `Prev`, change the subsection representation to singly-linked lists for subsections and a pointer to the active singly-linked list. The fragments from all subsections will be chained together at layout time.

Since fragment lists are disconnected before layout time, we can remove `MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411).

The current implementation of `AttemptToFoldSymbolOffsetDifference` requires future improvement for robustness.

Pull Request: llvm#95077
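The subsection scheme described above can be illustrated with a small standalone sketch. It is not the MC code: `Fragment`, `FragList`, `Section`, and `layoutOrder` are hypothetical names, and ownership via a `std::vector` of `unique_ptr` is an assumption made only to keep the example self-contained.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for MCFragment: a Next pointer only, no Prev.
struct Fragment {
  int Id = 0;
  Fragment *Next = nullptr;
};

// One singly-linked list per subsection.
struct FragList {
  Fragment *Head = nullptr;
  Fragment *Tail = nullptr;
};

class Section {
  std::vector<std::unique_ptr<Fragment>> Storage; // owns every fragment
  std::map<unsigned, FragList> Subsections;       // ordered by subsection number
  FragList *Cur = nullptr;                        // the active list

public:
  void switchSubsection(unsigned N) { Cur = &Subsections[N]; }

  // Appending needs only the tail of the active list, so no Prev pointer.
  void append(int Id) {
    assert(Cur && "call switchSubsection first");
    auto Owned = std::make_unique<Fragment>();
    Owned->Id = Id;
    Fragment *F = Owned.get();
    Storage.push_back(std::move(Owned));
    if (Cur->Tail)
      Cur->Tail->Next = F;
    else
      Cur->Head = F;
    Cur->Tail = F;
  }

  // At layout time, chain the per-subsection lists in subsection order.
  Fragment *layout() {
    Fragment *Head = nullptr, *Tail = nullptr;
    for (auto &KV : Subsections) {
      FragList &L = KV.second;
      if (!L.Head)
        continue;
      if (Tail)
        Tail->Next = L.Head;
      else
        Head = L.Head;
      Tail = L.Tail;
    }
    return Head;
  }
};

// Collects fragment ids in final layout order.
inline std::vector<int> layoutOrder(Section &S) {
  std::vector<int> Ids;
  for (Fragment *F = S.layout(); F; F = F->Next)
    Ids.push_back(F->Id);
  return Ids;
}
```

Because the lists stay disconnected until `layout()` runs, a fragment does not need to record which subsection it belongs to in this sketch.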
Like many other tests, this one times out when run under the address sanitizer. To reduce noise, this commit skips it in those builds.
These tests pass on Linux using lit's internal shell.
…5040) Co-authored-by: David Parks <dparks@nvidia.com>
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC and the call graph updates done as a result of the inliner take a long time. This is due to RefSCC::removeInternalRefEdges() getting called many times, each time removing one function from the RefSCC, but each call to removeInternalRefEdges() is proportional to the size of the RefSCC.

There are two places that call removeInternalRefEdges(), in updateCGAndAnalysisManagerForPass() and LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can deal with spurious (edges that exist in the graph but not in the IR) ref edges, we can simply not call removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of compile time with the above change for the original reason. So instead we batch all the dead function removals so we can call removeInternalRefEdges() just once. This requires some changes to callers of removeDeadFunction() to not actually erase the function from the module, but defer it to when we batch delete dead functions at the end of the CGSCC run, leaving the function body as "unreachable" in the meantime. We still need to ensure that call edges are accurate. I had also tried deleting dead functions after visiting a RefSCC, but deleting them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an SCC (the CGSCC infrastructure deems ref edge refinements as unimportant when it comes to revisiting SCCs, although that seems to not be consistently true given these changes) because we don't remove some ref edges.

Specifically for devirt-invalidated.ll this seems to expose an inlining order issue with the inliner. Probably unimportant for this type of intentionally weird call graph.

Compile time: https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
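The cost argument for batching can be shown with a toy model. This is only a sketch, not LazyCallGraph: `Component`, `removeOne`, `removeBatch`, and `makeComponent` are hypothetical, and a RefSCC is reduced to a flat list of function ids where every removal has to scan the whole list.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a RefSCC: just a list of function ids.
struct Component {
  std::vector<int> Nodes;
  std::size_t Visits = 0; // how many nodes were scanned in total

  // Removing a single node costs one scan of the whole component.
  void removeOne(int Dead) {
    std::vector<int> Kept;
    for (int N : Nodes) {
      ++Visits;
      if (N != Dead)
        Kept.push_back(N);
    }
    Nodes = std::move(Kept);
  }

  // Removing a whole batch also costs one scan of the component.
  void removeBatch(const std::unordered_set<int> &Dead) {
    std::vector<int> Kept;
    for (int N : Nodes) {
      ++Visits;
      if (Dead.count(N) == 0)
        Kept.push_back(N);
    }
    Nodes = std::move(Kept);
  }
};

// Builds a component holding ids 0..Size-1.
inline Component makeComponent(int Size) {
  Component C;
  for (int I = 0; I < Size; ++I)
    C.Nodes.push_back(I);
  return C;
}
```

In this model, removing k nodes one at a time from a component of size n scans on the order of k times n nodes, while the batched form scans n nodes once.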
We hit this downstream and the only evidence of the mistake was that the results of `Find` on `SubtargetFeatureKV` were corrupted.
…5076) The character reduce runtime functions expect a pointer to a scalar character of the correct length for the result of character reduce. A descriptor was passed so far. Fix the lowering so a proper temporary is created and passed to the runtime.
… check (llvm#94920) Before this PR, clangd forcefully disabled misc-const-correctness in disableUnusableChecks(). Now we have a FastCheckFilter configuration whose default value (Strict) also disables it. This patch removes misc-const-correctness from disableUnusableChecks() so it's possible to enable by setting FastCheckFilter to None. Fixes llvm#89758
Remove old usages of GDB Index functions after replacing them with new ones.
`GetDeclContextDIEs` and `DIEDeclContextsMatch` are unused (possibly since we added support for simplified template names, but I haven't checked). `GetDeclContextDIEs` is also very similar (but subtly different) from `GetDeclContext` and `GetTypeLookupContext`. I am keeping `GetParentDeclContextDIE` as that one still has some callers, but I want to look into the possibility of merging it with at least one of the functions mentioned above.
The .altinstructions section contains a list of structures whose fields can have different sizes, while other fields may or may not be present depending on the kernel version. Add automatic detection of such variations and use it by default. The user can still override the automatic detection with the `--alt-inst-has-padlen` and `--alt-inst-feature-size` options.
This change makes sure the preferred switch condition int type size remains the same throughout CodeGen optimizations. The change fixes running several OpenCL applications with -O2 or higher opt levels, and fixes Basic/stream/stream_max_stmt_exceed.cpp DPC++ E2E test with -O2.
`convertCallToIndirectCall` applies the PLTCall optimization and returns an iterator (updated if needed) to the converted call instruction. Since AArch64 requires injecting additional instructions to implement this pass, the relevant BasicBlock and an iterator are passed to `convertCallToIndirectCall`. `NumCallsOptimized` is updated only on successful application of the pass.

Tests:
- Inputs/plt-tailcall.c: an example of a tail call optimized PLT call.
- AArch64/plt-call.test: the actual A64 test, which runs the PLTCall optimization on the above input file and verifies that the pass is applied to the calls to 'printf' and 'puts'.
Setting CanVecMem in several places in analyzeLoop is wasteful and complicates the logic. Instead, make the function return a bool and set CanVecMem in the caller.
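The shape of this refactoring can be sketched in isolation. Everything below is hypothetical: `LoopAccessSketch` is not the real analysis class, and the "all strides are 1" rule is a made-up placeholder for the actual legality checks.

```cpp
#include <cassert>
#include <vector>

class LoopAccessSketch {
  bool CanVecMem = false;

  // Returns the verdict instead of writing the member on every exit path.
  static bool analyzeLoop(const std::vector<int> &Strides) {
    if (Strides.empty())
      return false; // nothing to vectorize
    for (int S : Strides)
      if (S != 1)
        return false; // placeholder rule: only unit strides are accepted
    return true;
  }

public:
  // The caller is the single place where CanVecMem is assigned.
  explicit LoopAccessSketch(const std::vector<int> &Strides)
      : CanVecMem(analyzeLoop(Strides)) {}

  bool canVectorizeMemory() const { return CanVecMem; }
};
```

With a single assignment in the caller, each early return in the analysis only has to state its verdict.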
std::list default-constructs itself as an empty list, so we don't need to call ValueData.clear() in the constructor.
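A minimal check of that guarantee, using a hypothetical `ValueHolder` type in place of the class touched by the commit:

```cpp
#include <cassert>
#include <list>

// Hypothetical holder type; only the ValueData member name comes from the commit.
struct ValueHolder {
  std::list<int> ValueData;

  // No ValueData.clear() needed: the member is default-constructed empty.
  ValueHolder() = default;
};
```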
…pointee types (llvm#94952) This PR is a tweak to ensure that DuplicatesTracker works with TypedPointers pointee types rather than with LLVM's original untyped pointers. This enforces DuplicatesTracker's promise to avoid emitting several identical OpTypePointer instructions.
…tions (llvm#95055) This PR implements insertion of OpGenericCastToPtr using builtin functions (both opencl `to_global|local|private` and `__spirv_` wrappers), and improves type inference.
…#95054) As stated in `UnwindInfoSectionImpl::prepareRelocations`'s comments, the unwind info uses section+addend relocations for personality functions defined in the same file as the function itself. As personality functions are always accessed via the GOT, we need to resolve those to a symbol. Previously, we did this by keeping a map which resolves these to symbols, creating a synthetic symbol if we didn't find it in the map. This approach has an issue: if we process the object file containing the personality function before any external uses, the entry in the map remains unpopulated, so we create a synthetic symbol and a corresponding GOT entry. If we encounter a relocation to it in a later file which requires GOT (such as in `__eh_frame`), we add that symbol to the GOT, too, effectively creating two entries which point to the same piece of code. This commit fixes that by searching the personality function's section for a symbol at that offset which already has a GOT entry, and only creating a synthetic symbol if there is none. As all non-unwind sections are already processed by this point, it ensures no duplication. This should only really affect our tests (and make them clearer), as personality functions are usually defined in platform runtime libraries. Or even if they are local, they are likely not in the first object file to be linked.
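The lookup order described above (reuse a symbol at that offset that already has a GOT entry, and synthesize one only as a fallback) can be sketched as follows. This is not lld's code: `Symbol`, `SectionSketch`, `resolvePersonality`, and the `"<synthetic>"` name are all hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical symbol record: a name, an offset in its section, and a GOT flag.
struct Symbol {
  std::string Name;
  std::uint64_t Offset;
  bool InGot;
};

struct SectionSketch {
  std::vector<std::unique_ptr<Symbol>> Symbols;

  // Prefer an existing symbol at Offset that already has a GOT entry;
  // create a synthetic one only if no such symbol exists.
  Symbol *resolvePersonality(std::uint64_t Offset) {
    for (auto &S : Symbols)
      if (S->Offset == Offset && S->InGot)
        return S.get();
    Symbols.push_back(
        std::make_unique<Symbol>(Symbol{"<synthetic>", Offset, true}));
    return Symbols.back().get();
  }
};
```

Searching first means a personality function that already received a GOT entry is never given a second one in this sketch.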
This PR improves the legalization process for SPIR-V instructions. Namely, it introduces validation and fixing of the bit width of scalar registers as part of the pre-legalizer. A test case is added that demonstrates the ability to legalize instructions with bit widths other than 8/16/32/64, both with and without the vendor-specific SPIR-V extension (SPV_INTEL_arbitrary_precision_integers). In the absence of the extension, the generated SPIR-V code will fall back to 8/16/32/64 bit widths in OpTypeInt, but the SPIR-V Backend is still able to legalize operations with the original integer sizes.
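One way to picture the fallback is a helper that widens an integer type to the next standard width when the extension is unavailable. This is a sketch under assumptions: `legalizedIntWidth` is a hypothetical name, and both rounding up (rather than some other choice) and leaving widths above 64 unchanged are guesses not stated in the commit message.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical helper: choose the OpTypeInt width for a scalar of Bits bits.
inline unsigned legalizedIntWidth(unsigned Bits, bool HasArbitraryPrecisionExt) {
  if (HasArbitraryPrecisionExt)
    return Bits; // the extension permits the original width
  for (unsigned W : {8u, 16u, 32u, 64u})
    if (Bits <= W)
      return W; // assumed policy: round up to the next standard width
  return Bits; // wider than 64: left unchanged in this sketch
}
```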
…ithTypeAndScope (llvm#95146) `thread step-in` (and other step commands) take a `<thread-index>`, not a `<thread-id>`.
…m#94996) This avoids breaking code that should arguably be valid but technically isn't after enforcing the constraints on shared_ptr's constructors. A new LWG issue was filed to fix this in the Standard. This patch applies the expected resolution of this issue to avoid flip-flopping users whose code should always be considered valid. See llvm#93071 for more context.
Instead of hardcoding a loop for small strings, always call char_traits::compare which ends up desugaring to __builtin_memcmp. Note that the original code dates back 11 years, when we didn't lower to intrinsics in `char_traits::compare`. Fixes llvm#94222
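As an illustration of the approach (not the libc++ source), an equality check can delegate to `std::char_traits<char>::compare` for every length instead of special-casing short strings. `equalViaTraits` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: equality through char_traits::compare for all sizes.
inline bool equalViaTraits(const std::string &A, const std::string &B) {
  if (A.size() != B.size())
    return false;
  // compare returns 0 when the first A.size() characters are equal.
  return std::char_traits<char>::compare(A.data(), B.data(), A.size()) == 0;
}
```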
It looks like the last references got removed in c747bd0. It removed a __zero() function, which was probably created at some point in the ancient past to optimize copying the string representation. The __zero() function got simplified to an assignment as part of making string constexpr, rendering this code unnecessary.
llvm#94846) The function that calculated the declaration context for a DIE was incorrectly traversing transparently across DW_TAG_subprogram DIEs when climbing the parent DIE chain. This meant that types defined in functions would appear to have the declaration context of anything above the function. I fixed the GetTypeLookupContextImpl(...) function in DWARFDIE.cpp to not transparently skip over functions, lexical blocks, inlined functions, compile units, and type units. Added a test to verify things are working.
These are the HLSL specific fixes from llvm#93193. Thanks klensy!
Some of the options only fed into the full sparse pipeline. However, some backends prefer to use the sparse minipipeline. This change exposes some important optimization flags to the pass as well. This prepares some SIMDization of PyTorch sparsified code.
…ereferencing pointer to pointers. llvm#94100" (llvm#95174) The option is causing the binary output to be different when compiled under `-O0`, because it introduces dbg.declare on pseudovariables. Going to change this implementation to use dbg.value instead.
BytesInBG is always greater than or equal to BG->BytesInBGAtLastCheckpoint. Note that the bug led to unnecessary page-release attempts and has no critical impact on correctness.
cferry-AMD
approved these changes
Sep 11, 2024
No description provided.