Frame-base materialization may insert vector instructions before EXEC is initialised.
Fix this by moving lowering of llvm.amdgcn.init.exec later in backend.
Also remove SI_INIT_EXEC_LO pseudo as this is not necessary.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1887–1893 ↗ | (On Diff #316551) | We shouldn't need to do ad-hoc use/def searches. Can you do this pre-RA, post-SSA like SILowerControlFlow? |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1887–1893 ↗ | (On Diff #316551) | We definitely could do. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1916 ↗ | (On Diff #316551) | In case the FirstMI is not null, which means a COPY instruction to copy from input register was not eliminated for some reason, we need to move the COPY instruction to the beginning of the block and insert exec initialization right after that. Because the frame-base materialization instructions may be inserted before the COPY instruction. We need to make sure the exec initialization always be inserted at the entry of the block. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1916 ↗ | (On Diff #316551) | I only partially agree with this. |
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1916 ↗ | (On Diff #316551) | what I said is the most common case at MachineIR. a Copy from funtion input, then init.exec.from.input with the Copy destination. I think no frontend will generate the argument to init.exec.with.input with arbitray operation. I am not saying moving operation before the instruction that generates the input to the operation. The general idea is we need to ensure the instructions to set the EXEC should be before any vector instruction. Based on this, we should move all instructions in the define-use chain of the input to init.exec.with.input to block entry to be before all vector instructions. In practice, your suggestion to use the Copy' source should work because that's the only usage the frontend will generate. The existing implementation before your change already assume Copy is the only possible operation that generates input to this intrinsic. |
BTW I felt the input in the name of llvm.amdgcn.init.exec.from.input some kind of means the argument is the function input argument in LLVM IR. I think we can clarify that the value to this intrinsic should be an function input argument. That is the only requirement I can see currently from frontend. So that we don't need to consider various hard to solve situations. @mareko What do you think?
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
1887–1893 ↗ | (On Diff #316551) |
Can we just implement this in SILowerControlFlow? |
- Add text refining definition of llvm.amdgcn.init.exec.from.input.
- Move code to SILowerControlFlow, this is not really a good place for this, but sufficient for now.
There is still an issue if the SGPR used to hold the input llvm.amdgcn.init.exec.from.input is spilt; however, this is not a new issue.
From my testing llvm.amdgcn.init.exec.from.input actually only worked in the entry block previous to this change, so we could tighten its description even further.
There is still an issue if the SGPR used to hold the input llvm.amdgcn.init.exec.from.input is spilt; however, this is not a new issue.
From my testing llvm.amdgcn.init.exec.from.input actually only worked in the entry block previous to this change, so we could tighten its description even further.
I am not sure what the problem is. May be we can fix it later. But I don't want to restrict it can only be used in the entry block for now unless we later prove that is really hard to make it correct. We may possibly use it for the second part of the merged shader.
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp | ||
---|---|---|
690–715 | I think this piece of code can be simplified something like below: Register InputReg = MI.getOperand(0).getReg(); MachineInstr *FirstMI = &*MBB->begin(); if (InputReg.isVirtual()) { MachineInstr *DefInstr = MRI->getVRegDef(InputReg); assert(DefInstr && DefInstr->isCopy()); if (DefInstr->getParent() == MBB && DefInstr != FirstMI) { // If the `InputReg` is defined in current block, we also need to // move that instruction to the beginning of the block. DefInstr->removeFromParent(); MBB->insert(FirstMI, DefInstr); if (LIS) LIS->handleMove(*DefInstr); } } | |
754 | Maybe also LIS->removeAllRegUnitsForPhysReg(AMDGPU::EXEC);? This also applies to SI_INIT_EXEC. |
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp | ||
---|---|---|
690–715 | Yes, that need to be fixed. I have not tested the code. |
Thanks for the patch. Basically LGTM with some minor comments.
Fix this by moving lowering of llvm.amdgcn.init.exec post-RA.
The commit message need slightly updated as this is not post-RA anymore.
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp | ||
---|---|---|
746 | I am not sure whether you like to put this under an if (InputReg.isVirtual()) condition. I guess we may only meet issue when moving this piece of code or the pass around. | |
llvm/test/CodeGen/AMDGPU/llvm.amdgcn.init.exec.ll | ||
171–172 | There are some testing failure for Windows, seems dynamic alloca not correctly handled for GlobalISel path under Windows. I think we can simply move these two allocas to entry block. The case will be used to only check llvm.amdgcn.init.exec.from.input works correctly when placing in non-entry block. Do you have any other idea? |
LGTM, Let's wait some time to see if anybody else has more comments. And make sure to update the commit message before push.
Fix this by moving lowering of llvm.amdgcn.init.exec post-RA
I think this piece of code can be simplified something like below: