Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Is this the proposed fix for the missing lowering of the LDS in the following?
@lds = addrspace(3) global float undef, align 8 @gptr = addrspace(1) global i64* addrspacecast (float addrspace(3)* @lds to i64*), align 8 @llvm.used = appending global [2 x i8*] [i8* addrspacecast (i8 addrspace(3)* bitcast (float addrspace(3)* @lds to i8 addrspace(3)*) to i8*), i8* addrspacecast (i8 addrspace(1)* bitcast (i64* addrspace(1)* @gptr to i8 addrspace(1)*) to i8*)], section "llvm.metadata" define void @f0() { %ld = load i64*, i64* addrspace(1)* @gptr ret void }
For that test, I note that commenting out the llvm.used statement results in the variable being lowered. I.e. the following works fine:
@lds = addrspace(3) global float undef, align 8 @gptr = addrspace(1) global i64* addrspacecast (float addrspace(3)* @lds to i64*), align 8 define void @f0() { %ld = load i64*, i64* addrspace(1)* @gptr ret void }
It's therefore likely that the missed handling is related to skipping uses from the .used list. This change seems to be a complete rewrite of the algorithm in shouldLowerLDSToStruct. I'm not sure we have the test coverage to be confident that a different algorithm will work, can we fix the .used oversight directly instead?
If the current logic of shouldLowerLDSToStruct() is more robust compere to earlier ones, then, why cannot we use it, instead of patching-up earlier ones?
I spent dedicated time to understand the shortcomings of shouldLowerLDSToStruct() and tried to overcome it in this patch. Please take a look at newly added tests.
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
64 | .equals(). I don't think == does what you want. | |
124–125 | It seems to miss the case when a same constant used by multiple kernels if I am not missing something. We could fix it by converting constant to an instruction but not doing it yet. You even had a utility function for that in some other patch. | |
132 | .equals(). |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
124–125 | That is correct - The intention of this patch is to fix bug within shouldLowerLDSToStruct() by making sure that it will properly analyse the uses of LDS to decide if it is to be lowered or not (for both module lds lowering and kernel lds lowering), and also to make the code bit programmer freindly for understanding it. I implemented a separate patch to handle constants users of LDS for kernel lds lowering (not yet uploaded). I was trying resolve a minor bug where in some cases, LDS was not getting erased from the module. But, now looking at your new patch https://reviews.llvm.org/D103655, looks like I was missing the statement - GV->removeDeadConstantUsers(); But, anyway, let's review the patch https://reviews.llvm.org/D103655. |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
124–125 | Yeah, it took me a good couple of hours to come up with this statement to fix that. |
By current, you mean the code in this patch? It's not obvious to me that this patch is correct. I'm suspicious of branchy code with lots of comments. It looks like other people are already reviewing it, so hopefully they'll catch any other shortcomings.
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
64 | I do not see StringRef::operator==() in the documentation? |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
64 | @rampitec Here is the StringRef::operator==() - https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/StringRef.h#L897 It infact internally calls .equals() only. Anyway, I think, it is fine either way except that .equals() is bit verbose, but, I kept .equals() only for now. |
Rebased, and tried my best to minimize the code changes within shouldLowerLDSToStruct()
based on recent patch https://reviews.llvm.org/D103655.
If the user of LDS is neither global variable nor instruction, then it should be a constant.
Can you add tests for GlobalAlias? It is not immediately clear if these are handled properly.
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
76 | You do not need to cast here, just append U->users(). | |
100–106 | stripPointerCasts() still makes sense. | |
126 | No need to cast, you are adding all users anyway. You might want an assert(isa<Constant>(V)) though. | |
137 | How can you get here? You should have inspected all the users by this time and met Instruction uses if any. |
I have not yet verified the GlobalAlias. Let me look into it.
GlobalAlias is handled and new lit test is added. Please have a look at lower-module-lds-global-alias.ll.
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
100–106 | No, as I understand it, we should be very carefull when using stripPointerCasts(), otherwise, it will lead to unexpected and surprising behavior. For exmaple, in this case, let's try below simple example. Here, only user of "@lds.1" is bitcast inst within kernel. Now, when we are analysing this user, we apply stripPointerCasts(), which will return back "@lds.1" itself, and since it is a global variable, the IF condition satisfies, and we incorrectly land within IF block, where we did not want, which leads to incorrect transormation. @lds.1 = internal unnamed_addr addrspace(3) global [1 x i8] undef, align 1 define amdgpu_kernel void @k0() { %bc = bitcast [1 x i8] addrspace(3)* @lds.1 to i8 addrspace(3)* store i8 1, i8 addrspace(3)* %bc, align 1 ret void } | |
137 | Please note that the global(s) that we are trying to inspect here are not lds (GV) itself, but, they are the ones which use lds (GV) as an initializer in the global scope or they are special llvm.used. We collect such globals which use lds (GV), and decide whether to lower or not based on the use of those globals. |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
100–106 | Ack. | |
137 |
Ack. Makes sense. |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
65 | For future reference, SetVector can be useful in cases like this, so that you don't need to declare both a set and a vector and keep them in sync. | |
137 | Could "return true" early here if any G has instruction users, or use std::any_of which will return early for you. |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
137 | Correct. Since it does not break any semantics, I will keep it for now, and make suggested code change when I submit any other related patch in near future (we definetely need few more patches sooner than later). |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
127 | After this patch, all users are added. Before it, only those that were not already in the Visited set were added. On the face of it, this seems to discard the guard against cycles. This patch also leaves the Visited set dead (values are inserted into it, but never read). I don't see any discussion in this patch about removing that guard and it's strange that the set was left in place if that was intentional. Is there a reason cycles cannot occur here? |
llvm/lib/Target/AMDGPU/Utils/AMDGPULDSUtils.cpp | ||
---|---|---|
127 | We might be OK - uses can definitely be cyclic, e.g. |
.equals(). I don't think == does what you want.