Sinking must check for interference between the block prologue
and the instruction being sunk.
Specifically check for clobbering of uses by the prologue, and
overwrites to prologue defined registers by the sunk instruction.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
What do you mean by "block prologue"? Can you tell me which of the 2 COPYs in your test is falsely moved with your test and which instruction clobbers the register?
(Intuitively I would expect a live-in value to be available after all PHI instructions so I don't see why there can be a clobber when something is inserted at the beginning of the basic block...)
I have pre-committed the test, so the offending sink should now be obvious.
This occurs because SkipPHIsAndLabels skips block prologue instructions as determined by the target, something which is not obvious from the function name.
On AMDGPU the S_AND_SAVEEXEC_B64 is considered part of the block prologue because it is setting up the correct execution mask for the instructions in the block; however RA has assigned the register $sgpr4 to the destination of this instruction, effectively clobbering the source of the COPY being sunk.
I agree this issue should be fixed here. Machine sinking should check for register dependency between the sunk instruction and the prologue instruction in the successor block.
But I think there are two kinds of register dependency need to be checked:
a.) the definition of a register which would be used in successor prologue instruction. b.) the instruction which has a source physical register being overwritten by successor prologue instruction.
I think you are fixing the second kind of dependency currently. Maybe check the first kind of dependency in the same change?
The first kind of def-use dependency should also apply to the pre-RA sinking.
I am not sure whether other uses of SkipPHIsXXX() in the backend would also need such kind of register dependency check, but I think that needs to be investigated separately.
Aside from this, whether we should treat a S_AND_XXX exec as prologue might change in the future. IMO, it is only S_OR_XXX exec like instructions which would activating more threads should really be treated as prologue instruction.
| llvm/lib/CodeGen/MachineSink.cpp | ||
|---|---|---|
| 1855–1870 | Is there not a helper for this somewhere? | |
Presumably (a) only applies to pre-RA sinking? As post-RA everything should be physical registers?
At the moment I think we are getting away without checking because most virtual registers are still unique from SSA form.
However, I agree the risk exists.
I am not sure whether other uses of SkipPHIsXXX() in the backend would also need such kind of register dependency check, but I think that needs to be investigated separately.
Agreed.
Aside from this, whether we should treat a S_AND_XXX exec as prologue might change in the future. IMO, it is only S_OR_XXX exec like instructions which would activating more threads should really be treated as prologue instruction.
Yes, it would be preferable to narrow the definition of prologue instructions in the long run.
Try this. I don't know what would be the most likely pattern in real cases, this is just to show that sinking $sgpr0_sgpr1 definition instruction into successor block would break the register dependency of type (a).
---
name:            _amdgpu_ps_main
alignment:       1
tracksRegLiveness: true
registers:       []
liveins:
  - { reg: '$sgpr4', virtual-reg: '' }
body:             |
  bb.0:
    successors: %bb.1(0x80000000)
    liveins: $sgpr4, $sgpr5
    renamable $sgpr9 = COPY $sgpr4
    renamable $vgpr5 = IMPLICIT_DEF
    renamable $sgpr0_sgpr1 = COPY $sgpr4_sgpr5
    S_BRANCH %bb.1
  bb.1:
    successors: %bb.2(0x40000000), %bb.8(0x40000000)
    liveins: $sgpr6, $sgpr9, $sgpr0_sgpr1
    $sgpr4_sgpr5 = S_AND_SAVEEXEC_B64 $sgpr0_sgpr1, implicit-def $exec, implicit-def $scc, implicit $exec
    renamable $sgpr14_sgpr15 = S_XOR_B64 $exec, killed renamable $sgpr4_sgpr5, implicit-def dead $scc
    S_CBRANCH_EXECZ %bb.8, implicit $exec
    S_BRANCH %bb.2
  bb.2:
    successors: %bb.8(0x40000000)
    liveins: $sgpr6
    $m0 = COPY killed renamable $sgpr6
    S_BRANCH %bb.8
  bb.8:
    S_ENDPGM 0
...- Generalize to any target defined block prologue instruction interfering with a sink candidate
- Extend to pre-RA machine sink
Thanks for this example. I used this as a basis for a test case, but used separate sources for the two COPY instructions.
| llvm/lib/CodeGen/MachineSink.cpp | ||
|---|---|---|
| 1855–1870 | I could not see immediately find one. | |
Thanks for the change, mostly looks good to me. with only a few minor comments.
| llvm/lib/CodeGen/MachineSink.cpp | ||
|---|---|---|
| 1307 | maybe start from BB->getFirstNonPHI()? so you don't need to pass PHI to isBasicBlockPrologue(). | |
| 1324 | Could you also return true if PI->modifiesRegister()? This is mainly for case like the sunk instruction define whole piece of the register, but there is some prologue instruction overwrite the sub-register. With this check, we would be much like what hasRegisterDependency() has done. | |
| 1851 | May be better to move this log after the interference check? | |
- Address case of sunk instruction clobbering register partially defined in block prologue
To the best of my knowledge, this should work. But please wait one or two days in case others have more comment. Please also update commit message before push.
| llvm/lib/CodeGen/MachineSink.cpp | ||
|---|---|---|
| 1332 | auto *DefOp = PI->findRegisterDefOperand(...); if (DefOp && !DefOp->isDead()) return true; | |
Thanks!
I won't submit this for at least 72 hours.
| llvm/lib/CodeGen/MachineSink.cpp | ||
|---|---|---|
| 1324 | Yes, I guess this edge case exists. | |
| 1851 | I don't think that makes sense, then the debug output will not indicate what was rejected due to interference. The pre-RA sink outputs this information before performing its checks. | |
maybe start from BB->getFirstNonPHI()? so you don't need to pass PHI to isBasicBlockPrologue().