Page MenuHomePhabricator

[AMDGPU] Run '' after isel to simplify PHIs.
ClosedPublic

Authored by hliao on Jul 8 2019, 9:43 AM.

Details

Summary
  • As LCSSA is turned on just before isel, it may create PHI of the flow, which is consumed by pseudo structurized CFG instructions. When that PHIs are eliminated in O0, COPY may be placed wrongly as the these pseudo structurized CFG instructions are considering prologue of MBB.
  • Run extra unreachable-mbb-elimination at the end of isel to clean up PHIs.

Diff Detail

Repository
rL LLVM

Event Timeline

hliao created this revision.Jul 8 2019, 9:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 8 2019, 9:43 AM
arsenm added a comment.Jul 8 2019, 9:49 AM

I don't follow how eliminating unreachable blocks fixes this.

In general expecting code placement with the block "prolog" instructions isn't reliable. I had fixed this in rL357634, except it was reverted

llvm/test/CodeGen/AMDGPU/lcssa-optnone.ll
1–3 ↗(On Diff #208446)

Generate check lines

hliao updated this revision to Diff 208449.Jul 8 2019, 9:50 AM

revise commit message.

hliao added a comment.Jul 8 2019, 9:53 AM

I don't follow how eliminating unreachable blocks fixes this.

In general expecting code placement with the block "prolog" instructions isn't reliable. I had fixed this in rL357634, except it was reverted

LCSSA created PHI of flow used by cf.end. These PHIs (due to LCSSA) are unnecessary. unreachable-mbb-elimination could remove them.

hliao added a comment.Jul 8 2019, 9:55 AM

I don't follow how eliminating unreachable blocks fixes this.

In general expecting code placement with the block "prolog" instructions isn't reliable. I had fixed this in rL357634, except it was reverted

BTW, the flow created by CFG structurizer should not have PHI at all by design.

arsenm added a comment.Jul 8 2019, 9:57 AM

I'm still confused. When I try your testcase, I see an earlier iterator crash in SILowerControlFlow

hliao marked an inline comment as done.Jul 8 2019, 9:58 AM
hliao added inline comments.
llvm/test/CodeGen/AMDGPU/lcssa-optnone.ll
1–3 ↗(On Diff #208446)

that test case crash with the fix. I added CHECK-LABEL to check codegen could pass. Checking the code generated seems unnecessary to me.

hliao added a comment.Jul 8 2019, 10:10 AM

I'm still confused. When I try your testcase, I see an earlier iterator crash in SILowerControlFlow

SILowerControlFlow runs after PHIElim, which converts PHI to COPY and could generate code like this in the last MBB

END_CF %v0, ....
%v0 = COPY %vX

the use (END_CF) is before the def (COPY).
the iterator in SILowereControlFlow starts from that def (COPY) to its use (END_CF). But, as they misplaced, it reaches the end of MBB first and triggers the assertion of illegal iterator access.

I think you should probably fix SILowerControlFlow

llvm/test/CodeGen/AMDGPU/lcssa-optnone.ll
1–3 ↗(On Diff #208446)

Tests with no checks are bad form. I think any tests related to control flow handling should probably just be generated

hliao added a comment.Jul 8 2019, 11:00 AM

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

hliao updated this revision to Diff 208460.Jul 8 2019, 11:07 AM

revise test case

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

This isn't a property the pass should rely on. SILowerControlFlow should defend against unexpected iteration order and unreachable blocks

hliao added a comment.Jul 8 2019, 11:11 AM

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

This isn't a property the pass should rely on. SILowerControlFlow should defend against unexpected iteration order and unreachable blocks

any use before def is invalid MIR, right?

hliao added a comment.Jul 8 2019, 11:20 AM

cf.end and related MBB prologue instructions should not take PHIs as input. Before LCSSA is turned on, we won't have that IR. After LCSSA, the only possible PHI input is the one from non-latch loop single exit. That PHI is unnecessary and should be removed early to assure that the later passes could relies on the fact they cf.end and etc. won't take PHI as input.

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

This isn't a property the pass should rely on. SILowerControlFlow should defend against unexpected iteration order and unreachable blocks

any use before def is invalid MIR, right?

So the MIR is valid before phi elimination, and then becomes invalid after. Removing unreachable blocks earlier to avoid this happening is a workaround. It seems to me like we need to fix PHI elimination or avoid relying on the prolog instructions

hliao added a comment.Jul 8 2019, 11:53 AM

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

This isn't a property the pass should rely on. SILowerControlFlow should defend against unexpected iteration order and unreachable blocks

any use before def is invalid MIR, right?

So the MIR is valid before phi elimination, and then becomes invalid after. Removing unreachable blocks earlier to avoid this happening is a workaround. It seems to me like we need to fix PHI elimination or avoid relying on the prolog instructions

my understanding is that MBB prologue instructions are designed NOT to take PHI as input. Like these pseudo instructions for structurized CFG, by design, they don't take any PHI as input. As an alternative approach (not workaround), we could ensure these instructions not taking PHI as input before PHIElim instead of teaching PHIElim to under them, more or less very target-specific stuff.

No matter what, running UnreachableMachineBlockElim is not a fix for this.

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

This isn't a property the pass should rely on. SILowerControlFlow should defend against unexpected iteration order and unreachable blocks

any use before def is invalid MIR, right?

So the MIR is valid before phi elimination, and then becomes invalid after. Removing unreachable blocks earlier to avoid this happening is a workaround. It seems to me like we need to fix PHI elimination or avoid relying on the prolog instructions

my understanding is that MBB prologue instructions are designed NOT to take PHI as input. Like these pseudo instructions for structurized CFG, by design, they don't take any PHI as input. As an alternative approach (not workaround), we could ensure these instructions not taking PHI as input before PHIElim instead of teaching PHIElim to under them, more or less very target-specific stuff.

The verifier should probably check this. We definitely should not hack around this by deleting unreachable blocks that happens to run into this

No matter what, running UnreachableMachineBlockElim is not a fix for this.

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

This isn't a property the pass should rely on. SILowerControlFlow should defend against unexpected iteration order and unreachable blocks

any use before def is invalid MIR, right?

So the MIR is valid before phi elimination, and then becomes invalid after. Removing unreachable blocks earlier to avoid this happening is a workaround. It seems to me like we need to fix PHI elimination or avoid relying on the prolog instructions

my understanding is that MBB prologue instructions are designed NOT to take PHI as input. Like these pseudo instructions for structurized CFG, by design, they don't take any PHI as input. As an alternative approach (not workaround), we could ensure these instructions not taking PHI as input before PHIElim instead of teaching PHIElim to under them, more or less very target-specific stuff.

The verifier should probably check this. We definitely should not hack around this by deleting unreachable blocks that happens to run into this

But, the verifier so far doesn't change that. For your concerns, I'd like to take the following approach:

  • Add target checker to assert those instructions with direct PHI input.
  • Add a special pass after isel to canonicalize these pseudo CFG instructions by removing PHI inputs.

Does that sound to be acceptable?

I think PHIElimination::LowerPHINode needs to be taught to look for prolog instructions

No matter what, running UnreachableMachineBlockElim is not a fix for this.

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

This isn't a property the pass should rely on. SILowerControlFlow should defend against unexpected iteration order and unreachable blocks

any use before def is invalid MIR, right?

So the MIR is valid before phi elimination, and then becomes invalid after. Removing unreachable blocks earlier to avoid this happening is a workaround. It seems to me like we need to fix PHI elimination or avoid relying on the prolog instructions

my understanding is that MBB prologue instructions are designed NOT to take PHI as input. Like these pseudo instructions for structurized CFG, by design, they don't take any PHI as input. As an alternative approach (not workaround), we could ensure these instructions not taking PHI as input before PHIElim instead of teaching PHIElim to under them, more or less very target-specific stuff.

The verifier should probably check this. We definitely should not hack around this by deleting unreachable blocks that happens to run into this

But, the verifier so far doesn't change that. For your concerns, I'd like to take the following approach:

  • Add target checker to assert those instructions with direct PHI input.
  • Add a special pass after isel to canonicalize these pseudo CFG instructions by removing PHI inputs.

    Does that sound to be acceptable?
hliao added a comment.Jul 18 2019, 1:32 PM

Please review https://reviews.llvm.org/D64946 following your suggestion.

I think PHIElimination::LowerPHINode needs to be taught to look for prolog instructions

No matter what, running UnreachableMachineBlockElim is not a fix for this.

I think you should probably fix SILowerControlFlow

the misplacement of def/use is not the fault of SILowerControlFlow but PHIEliminate. However, even PHIEliminate should not be blamed as cf.end is part of MBB prologue. cf.end should not take any input from PHI.

In O2 compilation, there won't be such issue as, in optimized RA, livevariable is used to eliminate PHI better. LiveVariables depends on unreachable-mbb-elimin, which remove the PHI to cf.end.

This isn't a property the pass should rely on. SILowerControlFlow should defend against unexpected iteration order and unreachable blocks

any use before def is invalid MIR, right?

So the MIR is valid before phi elimination, and then becomes invalid after. Removing unreachable blocks earlier to avoid this happening is a workaround. It seems to me like we need to fix PHI elimination or avoid relying on the prolog instructions

my understanding is that MBB prologue instructions are designed NOT to take PHI as input. Like these pseudo instructions for structurized CFG, by design, they don't take any PHI as input. As an alternative approach (not workaround), we could ensure these instructions not taking PHI as input before PHIElim instead of teaching PHIElim to under them, more or less very target-specific stuff.

The verifier should probably check this. We definitely should not hack around this by deleting unreachable blocks that happens to run into this

But, the verifier so far doesn't change that. For your concerns, I'd like to take the following approach:

  • Add target checker to assert those instructions with direct PHI input.
  • Add a special pass after isel to canonicalize these pseudo CFG instructions by removing PHI inputs.

    Does that sound to be acceptable?
arsenm added inline comments.Jul 24 2019, 9:20 AM
llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
885 ↗(On Diff #208460)

Can you add a fixme that this is a workaround and should be removed

Given that D65141 may avoid the source of this problem, I think we can go with this as a quick workaround for now. Once LCSSA is no longer required, we can add a verifier check that a phi is not used as the intrinsic source

hliao updated this revision to Diff 211658.Jul 24 2019, 7:15 PM

Add fixme.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 25 2019, 7:50 AM
This revision was automatically updated to reflect the committed changes.
foad added a subscriber: foad.EditedWed, Oct 9, 7:32 AM

Can we revert this now that D67101 has landed? I have tried locally reverting your change to addInstSelector and your lcssa-optnone.ll test case still passes.

foad added a comment.Thu, Oct 10, 2:22 AM

Can we revert this now that D67101 has landed? I have tried locally reverting your change to addInstSelector and your lcssa-optnone.ll test case still passes.

See D68769.