Ensure we only put the exec modification in a terminator instruction.
The trickiest part of this was dealing with SI_KILL_CLEANUP, which I
don't fully understand. It tries to preserve it as a terminator and
avoids removing it.
Paths
| Differential D87543
AMDGPU: Always split si_end_cf blocks AbandonedPublic Authored by arsenm on Sep 11 2020, 12:35 PM.
Details
Diff Detail Event Timelinearsenm added a parent revision: D87542: AMDGPU: Don't sometimes allow instructions before lowered si_end_cf.Sep 11 2020, 12:35 PM Comment Actions LGTM. SI_KILL_CLEANUP pseudos are inserted to mark points where control flow merges and hence the exec mask can be evaluated for early termination of a pixel shader. This revision is now accepted and ready to land.Sep 24 2020, 2:08 AM Comment Actions It's unclear to me what this is trying to achieve. If it is to prevent bb: <-- reload inserted here during live range splitting $exec = S_OR_B64 $exec, %other ... rest of code ... ... then this change only replaces it by: bb: <-- reload inserted here during live range splitting $exec = S_OR_B64_term $exec, %other // fallthrough bb.new: ... rest of code ... The inserted reload code is as incorrect as it was before. This revision now requires changes to proceed.Sep 30 2020, 7:55 AM Comment Actions
I'm not trying to fully solve the live range splitting problem greedy regalloc hits. I'm trying to eliminate the isBasicBlockPrologue concept that fastregalloc trips over when inserting spills at the beginning of the block Comment Actions
What if the concept of a basic block prolog *is* the correct long term solution? Comment Actions
I don't think it is a well formed concept. The operations the prolog inputs depend on transitively turn into prolog instructions, which generally breaks down. Comment Actions
I still not not see how you can get away without isBasicBlockPrologue. I can see how splitting can help with it, but not without. You can split everything and have S_OR the only instruction, but that does not prevent RA from inserting a reload right before it into the same BB.
Revision Contents
Diff 291313 llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
llvm/lib/Target/AMDGPU/SIInstructions.td
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.div.fmas.ll
llvm/test/CodeGen/AMDGPU/atomic_optimizations_pixelshader.ll
llvm/test/CodeGen/AMDGPU/collapse-endcf.mir
llvm/test/CodeGen/AMDGPU/multilevel-break.ll
llvm/test/CodeGen/AMDGPU/sdiv64.ll
llvm/test/CodeGen/AMDGPU/srem64.ll
llvm/test/CodeGen/AMDGPU/transform-block-with-return-to-epilog.ll
llvm/test/CodeGen/AMDGPU/udiv64.ll
llvm/test/CodeGen/AMDGPU/urem64.ll
|