Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AutoBump] Merge with cc04bbb2 (Jun 11) (70) #334

Merged
merged 39 commits into from
Sep 11, 2024

Conversation

mgehre-amd
Copy link
Collaborator

No description provided.

farzonl and others added 30 commits June 11, 2024 10:43
…reTable (llvm#95082)

The wasm backend fetches the tan runtime lib call in
`llvm/include/llvm/IR/RuntimeLibcalls.def` via `StaticLibcallNameMap()`,
but ignores the runtime function because a function sinature mapping is
not specified in RuntimeLibcallSignatureTable(). The fix is to specify
the function signatures for float32-128.

This is a fix for a build break reported on PR
llvm#94559 (comment).
Following a rather direct approach to expose PDL usage from C and then
Python. This doesn't yes plumb through adding support for custom
matchers through this interface, so constrained to basics initially.

This also exposes greedy rewrite driver. Only way currently to define
patterns is via PDL (just to keep small). The creation of the PDL
pattern module could be improved to avoid folks potentially accessing
the module used to construct it post construction. No ergonomic work
done yet.

---------

Signed-off-by: Jacques Pienaar <jpienaar@google.com>
Otherwise this would fail when using gnuwin32.
…EG + SETO/SETNO (llvm#94948)

For i64 this avoids loading a 64-bit value into register, for smaller registers this just avoids an immediate operand.

For i8+i16, limit to one use case as we save fewer bytes and these can be wasted entirely on extra register moves.

Fixes llvm#67709
Ignore the base and visit the Member decl like a regular DeclRefExpr.
Fragments are allocated with `operator new` and stored in an ilist with
Prev/Next/Parent pointers. A more efficient representation would be an
array of fragments without the overhead of Prev/Next pointers.

As the first step, replace ilist with singly-linked lists.

* `getPrevNode` uses have been eliminated by previous changes.
* The last use of the `Prev` pointer remains: for each subsection, there is an insertion point and
  the current insertion point is stored at `CurInsertionPoint`.
* `HexagonAsmBackend::finishLayout` needs a backward iterator. Save all
  fragments within `Frags`. Hexagon programs are usually small, and the
  performance does not matter that much.

To eliminate `Prev`, change the subsection representation to
singly-linked lists for subsections and a pointer to the active
singly-linked list. The fragments from all subsections will be chained
together at layout time.

Since fragment lists are disconnected before layout time, we can remove
`MCFragment::SubsectionNumber` (https://reviews.llvm.org/D69411). The
current implementation of `AttemptToFoldSymbolOffsetDifference` requires
future improvement for robustness.

Pull Request: llvm#95077
Like many other tests, this one times out when run under the address sanitizer.
To reduce noise, this commit skips it in those builds.
These tests pass on Linux using lit's internal shell.
In some modules, e.g. Kotlin-generated IR, we end up with a huge RefSCC
and the call graph updates done as a result of the inliner take a long
time. This is due to RefSCC::removeInternalRefEdges() getting called
many times, each time removing one function from the RefSCC, but each
call to removeInternalRefEdges() is proportional to the size of the
RefSCC.

There are two places that call removeInternalRefEdges(), in
updateCGAndAnalysisManagerForPass() and
LazyCallGraph::removeDeadFunction().

1) Since LazyCallGraph can deal with spurious (edges that exist in the
graph but not in the IR) ref edges, we can simply not call
removeInternalRefEdges() in updateCGAndAnalysisManagerForPass().

2) LazyCallGraph::removeDeadFunction() still ends up taking the brunt of
compile time with the above change for the original reason. So instead
we batch all the dead function removals so we can call
removeInternalRefEdges() just once. This requires some changes to
callers of removeDeadFunction() to not actually erase the function from
the module, but defer it to when we batch delete dead functions at the
end of the CGSCC run, leaving the function body as "unreachable" in the
meantime. We still need to ensure that call edges are accurate. I had
also tried deleting dead functions after visiting a RefSCC, but deleting
them all at once at the end was simpler.

Many test changes are due to not performing unnecessary revisits of an
SCC (the CGSCC infrastructure deems ref edge refinements as unimportant
when it comes to revisiting SCCs, although that seems to not be
consistently true given these changes) because we don't remove some ref
edges. Specifically for devirt-invalidated.ll this seems to expose an
inlining order issue with the inliner. Probably unimportant for this
type of intentionally weird call graph.

Compile time:
https://llvm-compile-time-tracker.com/compare.php?from=6f2c61071c274a1b5e212e6ad4114641ec7c7fc3&to=b08c90d05e290dd065755ea776ceaf1420680224&stat=instructions:u
We hit this downstream and the only evidence of the mistake was that the
results of `Find` on `SubtargetFeatureKV` were corrupted.
…5076)

The character reduce runtime functions expect a pointer to a scalar
character of the correct length for the result of character reduce. A
descriptor was passed so far. Fix the lowering so a proper temporary is
created and passed to the runtime.
… check (llvm#94920)

Before this PR, clangd forcefully disabled misc-const-correctness in
disableUnusableChecks().

Now we have a FastCheckFilter configuration whose default value
(Strict) also disables it. This patch removes misc-const-correctness
from disableUnusableChecks() so it's possible to enable by setting
FastCheckFilter to None.

Fixes llvm#89758
Remove old usages of GDB Index functions after replacing them with new
ones.
`GetDeclContextDIEs` and `DIEDeclContextsMatch` are unused (possibly
since we added support for simplified template names, but I haven't
checked). `GetDeclContextDIEs` is also very similar (but subtly
different) from `GetDeclContext` and `GetTypeLookupContext`.

I am keeping `GetParentDeclContextDIE` as that one still has some
callers, but I want to look into the possibility of merging it with at
least one of the functions mentioned above.
.altinstructions section contains a list of structures where fields can
have different sizes while other fields could be present or not
depending on the kernel version. Add automatic detection of such
variations and use it by default. The user can still overwrite the
automatic detection with `--alt-inst-has-padlen` and
`--alt-inst-feature-size` options.
This change makes sure the preferred switch condition int type size
remains the same throughout CodeGen optimizations.

The change fixes running several OpenCL applications with -O2 or higher
opt levels, and fixes Basic/stream/stream_max_stmt_exceed.cpp DPC++ E2E
test with -O2.
`convertCallToIndirectCall` applies the PLTCall optimization and returns
an (updated if needed) iterator to the converted call instruction. Since
AArch64 requires to inject additional instructions to implement this
pass, the relevant BasicBlock and an iterator was passed to the
`convertCallToIndirectCall`.

`NumCallsOptimized` is updated only on successful application of the
pass.

Tests:
- Inputs/plt-tailcall.c: an example of a tail call optimized PLT call.
- AArch64/plt-call.test: it is the actual A64 test, that runs the
PLTCall optimization on the above input file and verifies the
application of the pass to the calls: 'printf' and 'puts'.
Avoid wastefully setting CanVecMem in several places in analyzeLoop,
complicating the logic, to get the function to return a bool, and set
CanVecMem in the caller.
std::list default-constructs itself as an empty list, so we don't need
to call ValueData.clear() in the constructor.
…pointee types (llvm#94952)

This PR is a tweak to ensure that DuplicatesTracker is working with
TypedPointers pointee types rather than with original llvm's untyped
pointers. This enforces DuplicatesTracker promise to avoid emission of
several identical OpTypePointer instructions.
…tions (llvm#95055)

This PR implements insertion of OpGenericCastToPtr using builtin
functions (both opencl `to_global|local|private` and `__spirv_`
wrappers), and improves type inference.
…#95054)

As stated in `UnwindInfoSectionImpl::prepareRelocations`'s comments, the
unwind info uses section+addend relocations for personality functions
defined in the same file as the function itself. As personality
functions are always accessed via the GOT, we need to resolve those to a
symbol. Previously, we did this by keeping a map which resolves these to
symbols, creating a synthetic symbol if we didn't find it in the map.

This approach has an issue: if we process the object file containing the
personality function before any external uses, the entry in the map
remains unpopulated, so we create a synthetic symbol and a corresponding
GOT entry. If we encounter a relocation to it in a later file which
requires GOT (such as in `__eh_frame`), we add that symbol to the GOT,
too, effectively creating two entries which point to the same piece of
code.

This commit fixes that by searching the personality function's section
for a symbol at that offset which already has a GOT entry, and only
creating a synthetic symbol if there is none. As all non-unwind sections
are already processed by this point, it ensures no duplication.

This should only really affect our tests (and make them clearer), as
personality functions are usually defined in platform runtime libraries.
Or even if they are local, they are likely not in the first object file
to be linked.
This PR improves legalization process of SPIR-V instructions. Namely, it
introduces validation and fixing of bit width of scalar registers as a
part of pre-legalizer. A test case is added that demonstrates ability to
legalize instructions with non 8/16/32/64 bit width both with and
without vendor-specific SPIR-V extension
(SPV_INTEL_arbitrary_precision_integers). In the case of absence of the
extension, a generated SPIR-V code will fallback to 8/16/32/64 bit width
in OpTypeInt, but SPIR-V Backend still is able to legalize operations
with original integer sizes.
…ithTypeAndScope (llvm#95146)

`thread step-in` (and other step commands) take a `<thread-index>`, not a `<thread-id>`.
…m#94996)

This avoids breaking code that should arguably be valid but technically
isn't after enforcing the constraints on shared_ptr's constructors. A
new LWG issue was filed to fix this in the Standard.

This patch applies the expected resolution of this issue to avoid
flip-flopping users whose code should always be considered valid.

See llvm#93071 for more context.
ldionne and others added 9 commits June 11, 2024 16:48
Instead of hardcoding a loop for small strings, always call
char_traits::compare which ends up desugaring to __builtin_memcmp.

Note that the original code dates back 11 years, when we didn't lower to
intrinsics in `char_traits::compare`.

Fixes llvm#94222
It looks like the last references got removed in c747bd0.
It removed a __zero() function, which was probably created at
some point in the ancient past to optimize copying the string 
representation. The __zero() function got simplified to an 
assignment as part of making string constexpr, rendering this 
code unnecessary.
llvm#94846)

The function that calculated the declaration context for a DIE was incorrectly transparently traversing acrosss DW_TAG_subprogram dies when climbing the parent DIE chain. This meant that types defined in functions would appear to have the declaration context of anything above the function. I fixed the GetTypeLookupContextImpl(...) function in DWARFDIE.cpp to not transparently skip over functions, lexical blocks and inlined functions and compile and type units. Added a test to verify things are working.
These are the HLSL specific fixes from llvm#93193. Thanks klensy!
Some of the options only fed into the full sparse pipeline. However,
some backends prefer to use the sparse minipipeline. This change exposes
some important optimization flags to the pass as well. This prepares
some SIMDization of PyTorch sparsified code.
…ereferencing pointer to pointers. llvm#94100" (llvm#95174)

The option is causing the binary output to be different when compiled
under `-O0`, because it introduce dbg.declare on pseudovariables. Going
to change this implementation to use dbg.value instead.
BytesInBG is always greater or equal to BG->BytesInBGAtLastCheckpoint.

Note that the bug led to unnecessary attempts of page releasing and
doesn't have critical impact on the correctness.
Base automatically changed from bump_to_d5863721 to feature/fused-ops September 11, 2024 12:07
@mgehre-amd mgehre-amd merged commit e6eae35 into feature/fused-ops Sep 11, 2024
11 checks passed
@mgehre-amd mgehre-amd deleted the bump_to_cc04bbb2 branch September 11, 2024 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.