forked from llvm/llvm-project
[AutoBump] Merge with 4b7f07a0 (Aug 27) (12) #365
Merged
Comparison operations regression tests, from the original larger PR that has been broken down: llvm#92272 --------- Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Co-authored-by: Andrzej Warzyński <andrzej.warzynski@gmail.com>
S.substr(N) is simpler than S.slice(N, StringRef::npos) and S.slice(N, S.size()). Also, substr is probably better recognizable than slice thanks to std::string_view::substr.
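The equivalence this commit relies on can be sketched with `std::string_view`, which the message itself names as the familiar analogue. This is only an illustration: the helper names are hypothetical, `std::string_view` stands in for `StringRef`, and both helpers assume `n <= s.size()` (`std::string_view::substr` throws `std::out_of_range` otherwise).

```cpp
#include <cstddef>
#include <string_view>

// Slice-style spelling: the end of the range is written out explicitly.
// Assumes n <= s.size().
inline std::string_view tailWithExplicitEnd(std::string_view s, std::size_t n) {
  return s.substr(n, s.size() - n);
}

// Shorter spelling: substr(n) already means "from n to the end".
// Assumes n <= s.size().
inline std::string_view tailWithSubstr(std::string_view s, std::size_t n) {
  return s.substr(n);
}
```

Both helpers return the same view for any valid `n`, which is why the shorter form is a safe mechanical replacement.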
Use existing helper.
llvm#106000) This reverts commit 33f3ebc.
The helper can simply use VPRecipeBuilder::Plan.
I'm planning to change the inner loop to a range-based for loop.
…#105845)" (llvm#106000)" (llvm#106001) This reverts commit 4b6c064. Add a requirement for an amdgpu target in the test.
…/store on RV32. (llvm#105874) In order to support -unaligned-scalar-mem properly, we need to be more careful with immediates of global variables. We need to guarantee that adding 4 in RISCVExpandingPseudos won't overflow simm12. Since we don't know what the simm12 is until link time, the only way to guarantee this is to make sure the base address is at least 8 byte aligned. There were also several corner cases bugs in immediate folding where we would fold an immediate in the range [2044,2047] where adding 4 would overflow. These are not related to unaligned-scalar-mem.
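The corner case in the range [2044,2047] comes down to simple arithmetic on the signed 12-bit immediate. The sketch below is not the RISC-V backend code; `fitsSimm12` and `canFoldForPairedAccess` are hypothetical names used only to show the check.

```cpp
#include <cstdint>

// True if v is encodable as a signed 12-bit immediate, i.e. in [-2048, 2047].
constexpr bool fitsSimm12(std::int64_t v) { return v >= -2048 && v <= 2047; }

// Hypothetical helper: when an access is later split into two accesses at
// offset and offset + 4, both immediates must be encodable.
constexpr bool canFoldForPairedAccess(std::int64_t offset) {
  return fitsSimm12(offset) && fitsSimm12(offset + 4);
}

// 2043 + 4 = 2047 still fits; 2044 + 4 = 2048 does not.
static_assert(canFoldForPairedAccess(2043));
static_assert(!canFoldForPairedAccess(2044));
```

For offsets that are only known at link time, no such check is possible at compile time, which is why the commit falls back to requiring an 8-byte-aligned base address.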
…I.liveins(). NFC MachineRegisterInfo::liveins returns std::pair<MCRegister, Register>. Don't convert to std::pair<unsigned, unsigned>.
…5554) The stub class for `FloatType` is present in `ir.pyi`, but it is missing from the `__all__` export list.
This is a fix forward for the issue introduced in llvm#104523.
…en constructing the debug variable for __coro_frame (llvm#105626) As the title says, do not search for the DILocalVariable for __promise when constructing the debug variable for __coro_frame. This makes sense because the debug variable of `__coro_frame` shouldn't depend on the debug variable of `__promise`, and in fact it does not. Currently, we search for the debug variable of `__promise` only because we want to get its debug location and debug scope. However, we can construct the debug location directly from the debug scope of the function being compiled, so it is no longer necessary to search for the `__promise` variable. This patch also makes the code that constructs the debug variable for `__coro_frame` more stable: we will now always be able to construct the debug variable for the coroutine frame, whether or not we found the debug variable for __promise. This patch is not strictly NFC, but it is intended not to affect any end users.
…ltiple-modules' As the title shows.
…lvm#87265) This PR introduces a new pass, "amdgpu-sw-lower-lds". The pass lowers uses of the local data store (LDS) in kernel and non-kernel functions in the module to use dynamically allocated global memory. A packed LDS layout is emulated in global memory. The memory instructions lowered from LDS to global memory are then instrumented for address sanitizer to catch addressing errors. This pass only works when address sanitizer has been enabled and has instrumented the IR. It identifies that the IR has been instrumented using the "nosanitize_address" module flag.

For a kernel, LDS accesses can be static or dynamic, and each can be direct (accessed within the kernel) or indirect (accessed through non-kernels).

**Replacement of Kernel LDS accesses:**
- All the LDS accesses corresponding to a kernel will be packed together, where all static LDS accesses are allocated first and the dynamic LDS follows. The total size with alignment is calculated. A new LDS global called "SW LDS" will be created for the kernel, and it will have the attribute "amdgpu-lds-size" attached with the calculated size as its value. All the LDS accesses in the module will be replaced by a GEP with an offset into the "SW LDS".
- A new "llvm.amdgcn.<kernel>.dynlds" is created per kernel accessing the dynamic LDS. This will be marked as used by the kernel and will have MD_absolute_symbol metadata set to the total static LDS size, since dynamic LDS allocation starts after all static LDS allocation.
- Device global memory equal to the total LDS size will be allocated. At the prologue of the kernel, a single work-item from the work-group does a "malloc" and stores the pointer to the allocation in "SW LDS". To store the offsets corresponding to all LDS accesses, another global variable is created, which is called "SW LDS metadata" in this pass.
  - **SW LDS:** An LDS global of ptr type with the name "llvm.amdgcn.sw.lds.<kernel-name>".
  - **SW LDS Metadata:** A global of struct type with n members, where n equals the number of LDS globals accessed by the kernel (direct and indirect). Each member of the struct is another struct of type {i32, i32, i32}. The first member corresponds to the offset, the second to the size of the LDS global being replaced, and the third to the total aligned size. It will have the name "llvm.amdgcn.sw.lds.<kernel-name>.md". This global will have an initializer with the static LDS offsets and sizes initialized. For dynamic LDS entries, offsets will be initialized to the end offset of the previous static LDS allocation, and their sizes will initially be zero. These dynamic LDS offset and size values will be updated within the kernel, since the kernel can read the dynamic LDS size allocated at runtime by querying the "hidden_dynamic_lds_size" hidden kernel argument.
- At the epilogue of the kernel, the allocated memory is freed by the same single work-item.

**Replacement of non-kernel LDS accesses:**
- Multiple kernels can access the same non-kernel function. All the kernels accessing LDS through non-kernels are sorted and assigned a kernel-id. All the LDS globals accessed by non-kernels are sorted.
- This information is used to build two tables:
  - **Base table:** The base table has a single row, with the elements of the row placed according to kernel ID. Each element in the row corresponds to the ptr of the "SW LDS" variable created for that kernel.
  - **Offset table:** The offset table has multiple rows and columns. Rows are numbered from 0 to (n-1), where n is the total number of kernels accessing the LDS through non-kernels. Each row has m elements, where m is the total number of unique LDS globals accessed by all non-kernels. Each element in the row corresponds to the ptr of the replacement of the LDS global done by that particular kernel.
- An LDS variable in a non-kernel will be replaced based on the information from the base and offset tables. Based on a kernel-id query, the ptr of the "SW LDS" for the corresponding kernel is obtained from the base table. The offset into the base "SW LDS" is obtained from the corresponding element in the offset table. With this information, the replacement value is obtained.
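The packed layout recorded in "SW LDS metadata" can be illustrated with a small host-side sketch. It is not the pass's implementation: the `LdsEntry` struct mirrors the {i32, i32, i32} triple described above, but `alignUp`, `buildMetadata`, and the single uniform alignment are assumptions made for illustration, and the real pass may align and pad entries differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the {offset, size, aligned size} triple from the description.
struct LdsEntry {
  std::uint32_t offset;
  std::uint32_t size;
  std::uint32_t alignedSize;
};

// Round v up to the next multiple of a (a must be non-zero).
constexpr std::uint32_t alignUp(std::uint32_t v, std::uint32_t a) {
  return (v + a - 1) / a * a;
}

// Hypothetical helper: static entries are packed first; dynamic entries start
// at the end of the static allocation with size zero until known at runtime.
std::vector<LdsEntry> buildMetadata(const std::vector<std::uint32_t> &staticSizes,
                                    std::size_t numDynamic,
                                    std::uint32_t alignment) {
  std::vector<LdsEntry> md;
  std::uint32_t next = 0;
  for (std::uint32_t size : staticSizes) {
    std::uint32_t aligned = alignUp(size, alignment);
    md.push_back({next, size, aligned});
    next += aligned;
  }
  for (std::size_t i = 0; i < numDynamic; ++i)
    md.push_back({next, 0, 0});
  return md;
}
```

With static sizes 4 and 10, one dynamic entry, and an assumed alignment of 8, the entries come out as {0, 4, 8}, {8, 10, 16}, and {24, 0, 0}.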
/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: error: moving a local object in a return statement prevents copy elision [-Werror,-Wpessimizing-move] return std::move(OrderedKernels); ^ /llvm-project/llvm/lib/Target/AMDGPU/AMDGPUSwLowerLDS.cpp:260:10: note: remove std::move call here return std::move(OrderedKernels); ^~~~~~~~~~ ~ 1 error generated.
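The fix the diagnostic asks for is to return the local by name. A minimal sketch of the pattern follows; the function name and contents are hypothetical and not taken from AMDGPUSwLowerLDS.cpp.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a function that builds and returns a local vector.
std::vector<std::string> orderedKernelNames() {
  std::vector<std::string> ordered = {"kernel_a", "kernel_b"};
  // Returning the local by name permits copy elision (NRVO) and otherwise
  // falls back to a move. Writing `return std::move(ordered);` would rule out
  // elision, which is what -Wpessimizing-move reports.
  return ordered;
}
```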
…mbdas (llvm#105999) Fixes llvm#104722. Missed handling `decltype(auto)` trailing return types for lambdas. This was a mistake and regression on my part with my PR, llvm#104722. Added some missing unit tests to test for the various placeholder trailing return types in lambdas.
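For readers unfamiliar with the construct, the two placeholder trailing return types behave differently on a lambda. The sketch below only illustrates the language feature and is not one of the unit tests added by the patch; all names are hypothetical.

```cpp
#include <type_traits>

int counter = 0;

// `-> auto` deduces by value, so the parenthesized lvalue yields int.
auto byValue = []() -> auto { return (counter); };

// `-> decltype(auto)` keeps the value category, so `(counter)` yields int&.
auto byRef = []() -> decltype(auto) { return (counter); };

static_assert(std::is_same_v<decltype(byValue()), int>);
static_assert(std::is_same_v<decltype(byRef()), int &>);
```

Because `byRef` returns a reference, assigning through its result changes `counter` itself.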
… in compiler-rt with lit internal shell (llvm#105917) There are several files in the compiler-rt subproject that have command not found errors. This patch uses the `env` command to set the environment variables correctly when using the lit internal shell. fixes: llvm#102395 This change is relevant for [[RFC] Enabling the lit internal shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179)
…ce. NFC This matches copyPhysReg.
…arameter packs (llvm#102131) We established an instantiation scope in order for constraint equivalence checking to properly map the uninstantiated parameters. That mechanism mapped even packs to themselves. Consequently, parameter packs e.g. appearing in a function call, were not expanded. So they would end up becoming `SubstTemplateTypeParmPackType`s that circularly depend on the canonical declaration of the function template, which is not yet determined, hence the spurious error. No release note as I plan to backport it to 19. Fixes llvm#101735 --------- Co-authored-by: cor3ntin <corentinjabot@gmail.com>
…6039) Fixes linking error in llvm CI: "AMDGPUSwLowerLDS::run()': AMDGPUSwLowerLDS.cpp:(.text._ZN12_GLOBAL__N_116AMDGPUSwLowerLDS3runEv+0x164): undefined reference to `llvm::getAddressSanitizerParams(llvm::Triple const&, int, bool, unsigned long*, int*, bool*)'" llvm#87265 amdgpu-sw-lower-lds pass uses getAddressSanitizerParams method from AddressSanitizer pass. It misses linking of LLVMInstrumentation to AMDGPUCodegen. This PR adds it.
Take the intersection of the existing range attribute for the return value and the inferred range.
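The idea of intersecting two ranges can be sketched with plain closed intervals. This is a simplification: it does not model LLVM's range representation (no wrapping, no half-open bounds), and `Range` and `intersect` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Closed interval [lo, hi]; assumes lo <= hi.
struct Range {
  std::int64_t lo;
  std::int64_t hi;
};

// Returns the values present in both ranges, or nullopt if they are disjoint.
std::optional<Range> intersect(Range a, Range b) {
  Range r{std::max(a.lo, b.lo), std::min(a.hi, b.hi)};
  if (r.lo > r.hi)
    return std::nullopt;
  return r;
}
```

The result is never wider than either input, so combining an existing range with an inferred one this way can only tighten it.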
…lvm#104941) getMaskedTypeForICmpPair() tries to model non-and operands as x & -1. However, this can end up confusing the matching logic, by picking the -1 operand as the "common" operand, resulting in a successful, but useless, match. This is what causes commutation failures for some of the optimizations driven by this function. Fix this by treating a match against -1 as a non-match.
…lvm#104788) This is a followup to llvm#104579 to remove the limitation on sinking loads/stores of allocas entirely, even if this would introduce a phi node. Nowadays, SROA supports speculating load/store over select/phi. Additionally, SimplifyCFG with sinking only runs at the end of the function simplification pipeline, after SROA. I checked that the two tests modified here still successfully SROA after the SimplifyCFG transform. We should, however, keep the limitation on lifetime intrinsics. SROA does not have speculation support for these, and I've also found that the way these are handled in the backend is very problematic (llvm#104776), so I think we should leave them alone.
The legacy cost model in some parts checks if any of the operands are constants via SCEV. Update VPlan construction to replace live-ins that are constants via SCEV with such constants. This means VPlans (and codegen) reflect what we are computing the cost of and removes another case where the legacy and VPlan cost models diverged. Fixes llvm#105722.
This PR enables the "amdgpu-sw-lower-lds" pass in the pipeline. It also introduces the "amdgpu-enable-sw-lower-lds" command-line flag to enable/disable the pass.
The renamable flag is useful during MachineCopyPropagation, but it will be dropped after lowerCopy in some cases. This patch introduces extra arguments to pass the renamable flag to copyPhysReg.
While working on a MIR unittest, I noticed that parseMIR includes an unused argument that sets a function name. This is not only redundant but also irrelevant, as parseMIR is designed to parse an entire module, not specific functions, even though most unittests contain a single function per module. To streamline the API, I have removed this unnecessary argument from parseMIR. However, if this argument was originally included to enhance readability or for any other purpose, please let me know.
add f8E5M2 and tests for np_to_memref --------- Co-authored-by: Zhicheng Xiong <zhichengx@dc2-sim-c01-215.nvidia.com>
TSAN warns that `ptr` is read and written without protection in `clearExpiredEntries` and in the destructor of `Owner`. Add an atomic bool to synchronize these without incurring a cost when calling `get`.
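One general way an atomic flag can remove such a race is to have the cleanup path consult the flag instead of the unprotected pointer. The sketch below shows that pattern only; it is an assumption about the approach, not the actual change, and `Entry`, `release`, and `isExpired` are hypothetical names.

```cpp
#include <atomic>

struct Entry {
  int *ptr = nullptr;               // touched only by the owning thread
  std::atomic<bool> expired{false}; // safe to read from any thread

  // Owner side: drop the value, then publish that the entry is dead.
  void release() {
    ptr = nullptr;
    expired.store(true, std::memory_order_release);
  }

  // Cleanup side: check the flag rather than reading ptr directly.
  bool isExpired() const { return expired.load(std::memory_order_acquire); }
};
```

The fast path that reads `ptr` on the owning thread does not touch the atomic, which is the property the commit message describes as not incurring a cost when calling `get`.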
…rt command in lit's internal shell (llvm#105961) This patch fixes the incorrect usage of lit's built-in `export` command. There is a typo in raising the error itself where the error being raised had the wrong number of parameters passed in. Fixes llvm#102386.
… tests (llvm#105754) This patch rewrites tests in clang and compiler-rt that use the bash command substitution syntax $() to execute the dirname command. This is done so that the tests can be run using lit's internal shell. Fixes llvm#102384.
…sts with lit internal shell (llvm#105729) This patch addresses compatibility issues with the lit internal shell by removing the use of subshell execution (parentheses and subshell syntax) in the `merge-posix.test` and `vptr.cpp` tests. The lit internal shell does not support parentheses, so the tests have been refactored to use separate command invocations. This change is relevant for enabling the lit internal shell by default, as outlined in [[RFC] Enabling the Lit Internal Shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179) fixes: llvm#102401
…ompatibility (llvm#106115) This patch addresses compatibility issues with the lit internal shell by expanding and rewriting test scripts in the compiler-rt subproject. These changes were prompted by the FileNotFound unresolved errors encountered when running the command `LIT_USE_INTERNAL_SHELL=1 ninja check compiler-rt`. **Why the error occurred:** The original test scripts used process substitution (`<(...)`) in their diff commands. Process substitution creates temporary files or FIFOs to hold command output, and these are then passed to `diff`. However, the lit internal shell, which is more limited than a typical shell like `bash`, does not support process substitution. When lit tries to execute these commands, it is unable to create or access the temporary files or FIFOs generated by process substitution. As a result, lit attempts to open a file or directory that doesn't exist, leading to the `FileNotFoundError`. **Changes Made:** - Instead of using process substitution, the commands now explicitly redirect the output of `llvm-profdata show` to temporary files before performing the `diff` comparison. This ensures that the lit internal shell can correctly find and open these files, resolving the `FileNotFoundError`. This change is relevant for [[RFC] Enabling the lit internal shell by Default](https://discourse.llvm.org/t/rfc-enabling-the-lit-internal-shell-by-default/80179) fixes: llvm#106111
…or-loop (llvm#106150) This patch adds `REQUIRES: shell` to `focus-function.test` because the lit internal shell does not support the for loop syntax. This will make the test file unsupported when running llvm-lit with its internal shell implementation, which is enabled by setting `LIT_USE_INTERNAL_SHELL=1`. fixes: llvm#106111
By default, type legalization will try to promote the build_vector, but the generic type legalizer doesn't support that. Bitcast to vXi16 instead. Same as what we do for vXf16 without Zfhmin. Fixes llvm#100846.
Add NotConstant(Null) roots for nonnull arguments and then propagate them through nuw/inbounds GEPs. Having this functionality in SCCP is useful because it allows reliably eliminating null comparisons, independently of how deeply nested they are in selects/phis. This handles cases that would hit a cutoff in ValueTracking otherwise. The implementation is something of a MVP, there are a number of obvious extensions (e.g. allocas are also non-null).
…icit-bool-conversion (llvm#104882) When readability-implicit-bool-conversion and readability-uppercase-literal-suffix are both enabled, a fix has to be applied twice: first from (!i) to (i == 0u), and then from (i == 0u) to (i == 0U). With this option the check emits the uppercase suffix directly and skips the intermediate step. Adding this option allows this check to be in sync with readability-uppercase-literal-suffix, avoiding duplicate warnings. Fixes llvm#40544
…lvm#104785) There was some inconsistency in the ConvertVectorToLLVM pass builder, files and option names. This patch aims to move all occurrences to ConvertVectorToLLVM.
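The three spellings involved are semantically identical, which is what makes the intermediate fix redundant. The functions below are hypothetical and only restate the before, intermediate, and final forms mentioned in the message.

```cpp
// Original form: implicit conversion of an integer to bool.
bool isZeroImplicit(unsigned i) { return !i; }

// Intermediate form with a lowercase literal suffix.
bool isZeroLowerSuffix(unsigned i) { return i == 0u; }

// Final form with an uppercase literal suffix.
bool isZeroUpperSuffix(unsigned i) { return i == 0U; }
```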
OpenCL's vload_half builtin expects two arguments, but the current TableGen definition expects three. This change fixes the mismatch and adds a test to check this.
…lvm#105663) Use SmallVectorImpl instead of SmallVector for function arguments to give the caller greater flexibility in choice of initial size.
cferry-AMD
approved these changes
Sep 30, 2024