"Rename Disconnected Subregister Components " pass marks register operands as dead but does not eliminate them.
Machine Copy Propagation does not expect dead stuff to appear so far downstairs. So, it mistakenly replaces live copy with upper one that is marked as dead.
This patch is temporary workaround until Rename Disconnected Subregister Components is fixed.
Details
- Reviewers
rampitec MatzeB - Commits
- rG28da06778fb1: [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the…
rG3828242c7e65: [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the…
rL317884: [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the…
rL315908: [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the…
Diff Detail
Event Timeline
LGTM, but please also wait for @MatzeB.
In general I believe that "Rename Disconnected Subregister Components" should either eliminate what it has marked dead or have a pass running after it to do so, as MachineDCE will not handle it.
Machine Copy Propagation does not expect dead stuff to appear so far downstairs. So, it mistakenly replaces live copy with upper one that is marked as dead.
This patch is temporary workaround until Rename Disconnected Subregister Components is fixed.
This is first a bug in machine copy propagation, there is simply no rule that dead copies are not allowed. (If you really want to make it a rule then you really should write a MachineVerifier check for it!)
Do you have any experience how common it is for RenameIndependentSubregs to produce dead code? I'd rather keep that code simpler and purely a renaming operation as the name suggests. Do you know whether the situation is already detected by DetectDeadLanes? I'd be more open to add dead code elimination to that pass (as we could run it instead of dead code elimination I believe).
I think a better fix here would be to not add dead copies into the maps in the first place. Changing earlier code to:
if (!MI->getOperand(0).isDead() && !MI->getOperand(1).isUndef()) { // Remember Def is defined by the copy. for (MCSubRegIterator SR(Def, TRI, /*IncludeSelf=*/true); SR.isValid(); ++SR) { CopyMap[*SR] = MI; AvailCopyMap[*SR] = MI; } // Remember source that's copied to Def. Once it's clobbered, then // it's no longer available for copy propagation. RegList &DestList = SrcMap[Src]; if (!is_contained(DestList, Def)) DestList.push_back(Def); }
I also added a check for copy from undef here as that could provoke a similar problem.
This is first a bug in machine copy propagation
Yes. This is a bug. Nevertheless, MCP relies on DCE running before and no dead-producers in between :)
Do you have any experience how common it is for RenameIndependentSubregs to produce dead code?
I have the trash copied in the register instead of actual data and related bug in AMD OCL conformance suite. That's all.
I'd rather keep that code simpler and purely a renaming operation as the name suggests.
In fact you not only rename but change the MOs :
// After assigning the new vreg we may not have any other sublanes living // in and out of the instruction anymore. We need to add new dead and // undef flags in these cases. if (!MO.isUndef()) { SlotIndex Pos = LIS->getInstructionIndex(*MO.getParent()); if (!subRangeLiveAt(LI, Pos)) MO.setIsUndef(); } if (!MO.isDead()) { SlotIndex Pos = LIS->getInstructionIndex(*MO.getParent()).getDeadSlot(); if (!subRangeLiveAt(LI, Pos)) MO.setIsDead(); } }
but you change them inconsistently. You set'em dead but no DCE expected to run after this.
Do you know whether the situation is already detected by DetectDeadLanes? I'd be more open to add dead code elimination to that pass (as we could run it instead of dead code elimination I believe).
No. Thanks for suggestion. I'll check this.
No. It does not detect the operand in question as dead.
I think a better fix here would be to not add dead copies into the maps in the first place. Changing earlier code to:
if (!MI->getOperand(0).isDead() && !MI->getOperand(1).isUndef()) { // Remember Def is defined by the copy. for (MCSubRegIterator SR(Def, TRI, /*IncludeSelf=*/true); SR.isValid(); ++SR) { CopyMap[*SR] = MI; AvailCopyMap[*SR] = MI; } // Remember source that's copied to Def. Once it's clobbered, then // it's no longer available for copy propagation. RegList &DestList = SrcMap[Src]; if (!is_contained(DestList, Def)) DestList.push_back(Def); }I also added a check for copy from undef here as that could provoke a similar problem.
My fix was deliberately done as an explicit workaround. It's main goal was to attract your attention so that you can advise the proper approach to this problem.
It is a good idea to not gather dead copies instead of check them later if we're speaking about the permanent fix given that it should be done in Machine Copy Propagation pass.
Matthias,
The fix you suggested breaks x86 backend on SingleSource/Benchmarks/Adobe-C++/CMakeFiles/simple_types_loop_invariant.dir/simple_types_loop_invariant.cpp
with the following:
Bad machine code: Using an undefined physical register ***
function: main
basic block: BB#4 if.end10 (0x7f92910866e8)
instruction: DIV8r
operand 4: %AX<imp-use>
Here is the excerpt from the asm dump:
%RAX<def> = LEA64r %RIP, 1, %noreg, <ga:@.str.28>, %noreg
MOV64mr %RSP, 1, %noreg, 0, %noreg, %RAX<kill>; mem:ST8[Stack] -- // %RAX is dead now
// a lot of code...
DIV8r %R14B, %AL<imp-def>, %AH<imp-def,dead>, %EFLAGS<imp-def,dead>, %AX<imp-use> -- / / use of the subreg !
// a lot of code ...
%RAX<def> = LEA64r %RIP, 1, %noreg, <ga:@.str.31>, %noreg -- // def that kills the upper def of %RAX
if (!MI->getOperand(0).isDead() && !MI->getOperand(1).isUndef()) { // Remember Def is defined by the copy. f**or (MCSubRegIterator SR(Def, TRI, /*IncludeSelf=*/true); SR.isValid(); ++SR) { CopyMap[*SR] = MI;** AvailCopyMap[*SR] = MI; }
Please note, we're collecting sub-registers of the register. If COPY destination has subregs we cannot rely on the dead flag.
I reverted my change.
I tend to return to my initial approach - not remove the copy if the upper with same dest/src is dead.
In this case subregs/superregs are explicit.
I'm not sure I understand what exactly is going on. dead and kill flags work just as well for subregisters; I don't see a single COPY in your MI excerpt so I'm not sure what is going on or how copy propagation could get it wrong.
I only noted that the dead flag is the property of MachineOperand not register. We have operands as the data structures and the registers represented by the unsigned numbers per target. DetectDeadlanes pass was intended to map operand flags changes to the corresponding changes of those operands representing subregisters. In the place we are discussing we check the dead flag on operand representing super-register but prevent recording all it's subregisters.
Unfortunately I'm currently assigned another urgent task and have no enough time to debug x86 backend and MCP and DetectDeadLanes to pick the exact scenario of the failure.
This MCP misbehavior breaks AMD OCL conformance suite and we need to fix this somehow.
Is it acceptable to submit my workaround (explicitly marked as TODO i.e. with the comment that this is a temporary workaround) and get to this later?
It should be fixed since DetectDeadLanes is specialized pass just to handle MO to regs state changes.
In other words:
We set MO dead in case it is not used before the next definition or it is not live-out and not used to the end of the block.
It NOT USED as a whole register unit! Same time there may be uses of it's sub-registers. And they are still live.
MachineRegisterInfo does not consider the use of sub-register as use of all it's super-registers. And I suspect that same approach is everywhere across the llvm code.
DetectDeadLanes is a good attempt to handle this. Apparently it misses something in this particular case.
My workaround prevent replacing the concrete COPY with concrete register operands. That's why it more conservative and does not hit this problem.