This is an archive of the discontinued LLVM Phabricator instance.

[UnreachableBlockElim] Don't remove LCSSA phi nodes
Needs ReviewPublic

Authored by foad on May 2 2023, 7:33 AM.

Details

Reviewers
resistor
nhaehnle
Petar.Avramovic
sameerds
Group Reviewers
Restricted Project
Summary

Revert svn r54432:

SDISel's constant branch folding can fold away self-loops, which doesn't result in any dead blocks, but
rather an incorrect phi input.  Add code to UnreachableMachineBlockElim to get rid of these entries.

The effect of the reversion is not to remove single-input phi nodes as
created by LCSSA. Instead, phi nodes are only removed if they started
with multiple inputs but were reduced to a single input because some of
their predecessor blocks were unreachable.

This makes a significant difference on AMDGPU where we deliberately go
into LCSSA form before instruction selection. I have not noticed any
effect on other targets.

Diff Detail

Event Timeline

foad created this revision.May 2 2023, 7:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2023, 7:33 AM
foad requested review of this revision.May 2 2023, 7:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 2 2023, 7:33 AM
foad added a comment.May 2 2023, 7:35 AM

SDISel's constant branch folding can fold away self-loops, which doesn't result in any dead blocks, but
rather an incorrect phi input. Add code to UnreachableMachineBlockElim to get rid of these entries.

This description from r54432 is pretty unclear to me. It sounds like maybe it was using UnreachableBlockElim to fix MIR that was broken by a previous pass? Anyway reverting does not appear to cause any failures of that sort.

foad added a comment.May 2 2023, 7:39 AM

I forgot a FIXME: this patches currently causes CodeGen/AMDGPU/control-flow-optnone.ll to fail with *** Bad machine code: Kill missing from LiveVariables ***.

foad updated this revision to Diff 518740.May 2 2023, 8:12 AM

Add a small fix for AMDGPU SILowerControlFlow test failure.

foad added inline comments.May 2 2023, 8:13 AM
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
430–431 ↗(On Diff #518740)

I can submit this separately, but I guess I would have to come up with a MIR test case. I am not sure why the change to UnreachableBlockElim exposed this problem.

Pierre-vh added inline comments.
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
430–431 ↗(On Diff #518740)

No strong opinion on leaving this in this patch or splitting it up, but it would be nice to have a MIR testcase just for this fix IMO (with -run-pass or -start-before).

Maybe you can quickly get one by using -stop-before + llvm-reduce & some manual cleanup on the test that was previously failing?

foad added inline comments.May 3 2023, 7:54 AM
llvm/lib/Target/AMDGPU/SILowerControlFlow.cpp
430–431 ↗(On Diff #518740)

It was too hard. I can dump the MIR after UnreachableBlockElim to generate a test case, but to show the bug I would need to run something like llc -run-pass=livevars,phi-node-elimination,si-lower-control-flow. But the LiveVariables pass adds a run of UnreachableBlockElim as a prerequisite, which is enough to mess up the test case. I think I'll just commit this separately as an obvious fix.

foad updated this revision to Diff 519082.May 3 2023, 8:11 AM

Rebase.

ruiling added a subscriber: ruiling.EditedMay 5 2023, 12:30 AM

I guess you are observing code generation bug for AMDGPU? Is it replacing a phi like %1:vgpr = phi %0:sgpr with a %1:vgpr = COPY %0:sgpr? If that is the case, I think this does not sound like a root-fix.

If we have some MachineIR like below (bb1 is a self-loop):

 entry
  |
[ bb1  bb2
   \    /
     bb3

The unreachable block bb2 will be eliminated and the phi in bb3 will then be further simplified the same way as in the LCSSA case you are seeing.

I think the problem is for AMDGPU, we depends on the sgpr to vgpr copy lowered from phi should be in predecessor block, which is the way PHIElimination lowers phi. This is mainly because when the predecessor block is inside a loop, the COPY in predecessor block would be executed totally different from a COPY in the successor block.
For this specific issue, I think we can teach the pass here to insert COPY in the predecessor block as PHIElimination. This should not hurt other target as this is the standard way to lower phi and COPY between coalescable register classes would be coalesced away later.

foad added a comment.May 5 2023, 12:58 AM

I guess you are observing code generation bug for AMDGPU? Is it replacing a phi like %1:vgpr = phi %0:sgpr with a %1:vgpr = COPY %0:sgpr? If that is the case, I think this does not sound like a root-fix.

Right, I was investigating codegen problems on AMDGPU, but this patch is not supposed to fix them all. The rationale for this patch is:

  • simplify the pass
  • revert a "fix" which is apparently not required
  • do not *deliberately* remove all LCSSA phi nodes

If we have some MachineIR like below (bb1 is a self-loop):

 entry
  |
[ bb1  bb2
   \    /
     bb3

The unreachable block bb2 will be eliminated and the phi in bb3 will then be further simplified the same way as in the LCSSA case you are seeing.

That is a good point. I guess this pass is still going to remove *some* LCSSA phi nodes.

arsenm added a comment.Jun 6 2023, 3:11 PM

I thought the deliberate LCSSA thing was just a hack for DAG divergence. If you're on MIR it's no longer necessary?