While initializing the LDS pointers within entry basic block of kernel(s), make
sure that the entry basic block is split after alloca instructions.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp | ||
---|---|---|
196 | Wouldn't this break if the entry block contains something that needs the lds pointer initialized, followed by an alloca? |
llvm/lib/Target/AMDGPU/AMDGPUReplaceLDSUseWithPointer.cpp | ||
---|---|---|
196 | Only such possibility for breaking is - there is a call to non-kernel function foo() before alloca, and foo() uses LDS. But, the assumption here is that usually won't happen, because allocas are usually put at the beggining of the entry block. |
Only such possibility for breaking is - there is a call to non-kernel function foo() before alloca, and foo() uses LDS. But, the assumption here is that usually won't happen, because allocas are usually put at the beggining of the entry block.
I've seen alloca in blocks other than the entry block after inlining. I think I've seen a function call followed by an alloca as well. I can't think of a reason why that would be invalid IR, and I think it would be possible to set up a series of passes that create it. Could you add a (handwritten) test case with the pattern that is miscompiled?
In such a case we could move the alloca to the start of the basic block. We might actually want to move alloca to the (start of the) entry block in general for amdgpu as (I think, it's been a few months) we can only lower them in the entry block.
I think we discussed these items in our last Monday's internal weekly meeting. I am recapping it here again:
- First, the entry block splitting here should happen after all the allocas which are inserted at the beginning of the block before any other non-alloca instructions which actually make sense. Note down the word beginning of the block here. But, this will not guarentee that split is always happened after alloca. As Jon mentioned, there could be a call to function foo(), and foo() has allocas, and which is inlined so that these allocas do appear after split and which is perfectly legal. But, AMDGPU back-end is not handling it correctly at the moment which is a bug in AMDGPU back-end.
- Second, root-cause why AMDGPU back-end is not able to handle allocas which are not inserted at the beginning of the block and fix it.
Allocas can appear anywhere in the block. They do not have to be clustered at the beginning. As long as it's in the entry block, it doesn't look like a dynamic alloca
It's legal for an alloca to appear anywhere but this patch doesn't need to worry about it. If you want optimal code, you're expected to put them at the start of the entry block. Since this is just an optimization, this only needs to concern itself with these allocas. Any problems with allocas in other positions is an unrelated problem
Wouldn't this break if the entry block contains something that needs the lds pointer initialized, followed by an alloca?