This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Optimize compare by using record form in post-RA.
ClosedPublic

Authored by Esme on Aug 14 2022, 10:07 PM.

Details

Summary

We currently optimize the comparison only in SSA, therefore we will miss some optimization opportunities where the input of comparison is lowered from COPY in post-RA.
Ie. ExpandPostRA::LowerCopy is called after PPCInstrInfo::optimizeCompareInstr.

This patch optimizes the comparison in post-RA and only the cases that compare against zero can be handled.
D131374 converts the comparison and its user to a compare against zero with the appropriate predicate on the branch, which creates additional opportunities for this patch.

Diff Detail

Event Timeline

Esme created this revision.Aug 14 2022, 10:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2022, 10:07 PM
Esme requested review of this revision.Aug 14 2022, 10:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 14 2022, 10:07 PM
Esme edited the summary of this revision. (Show Details)Aug 14 2022, 11:23 PM

This patch misses MIR cases for this transformation. See example in llvm/test/CodeGen/PowerPC/fold-frame-offset-using-rr.mir

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2778

Better to add comments to explicitly specify the condition for optimizing, eg:
1: this should be a compare instruction between a reg and an imm
2: the imm should be 0

2779

nit: isVirtual() is not needed since this is for post RA.

2782

Add a MIR case for this?

2788

getDefMIPostRA() would not return a def in another block, so is SrcMI->getParent() != CmpMI.getParent() needed?

SrcReg = op1
xx1 = op2 SrcReg
xx2= cmp SrcReg

Intermediate use of SrcReg should not stop the compare optimization?

2801

If there is any use of cr0 between SrcMI and CmpMI, we should also bail out? We will define the cr0 in the new record form or SrcMI?

2805

I can not image a case where OldOpc == NewOpC, could you please give an example here?

2808

I don't understand the logic here, and what result will MRI->use_instructions() produce after RA?

2823

RegMO is the def of CmpMI, it will not have kill flag. And CR0 will be be killed between SrcMI and CmpMI, otherwise we can not do the transformation. No use of CR0 should exist between SrcMI and CmpMI.

Here I think we should handle the killed flag for SrcReg if it is killed in CmpMI and there is other use between SrcMI and CmpMI?

Esme updated this revision to Diff 456242.Aug 28 2022, 10:12 PM

Addressed Zheng's comments for the code part and the MIR case will added soon.

Esme updated this revision to Diff 456249.Aug 28 2022, 10:14 PM
Esme updated this revision to Diff 457174.Aug 31 2022, 10:11 PM

Added mir test cases.

lkail added inline comments.Sep 2 2022, 12:18 AM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2792

nit: OtherUse_SrcReg -> SrcRegHasOtherUse

2804

This check can be moved after

if (CmpMI.hasImplicitDef())
  return false;
2808

nit: OtherUse_CRReg -> SeenUseOfCRReg.

2823

Here I think we should handle the killed flag for SrcReg if it is killed in CmpMI and there is other use between SrcMI and CmpMI?

Look no need. We are folding

L1: a = op1 b, c
...
L2: b = op2 a, 0

->

L1: a, b = op3 b, c
...

The constraint here is b can't be def/use between [L1,L2). Use of a between [L1, L2] is still the version defined at L1. And call of getDefMIPostRA should guarantee L1 is the nearest point defined a.

lkail added inline comments.Sep 2 2022, 12:20 AM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2823

edit:

L1: a = op1 b, c
...
L2: d = op2 a, 0

->

L1: a, d = op3 b, c
...

The constraint here is d can't be def/use between [L1,L2).

Look almost good to me now. Thanks for adding this folding.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2797

Have you seen any perf impact due to this pattern? If so, there are many other record form rotate and shift instructions on PowerPC target, like RLDICL/RLDICR(not complete.)

2828

Can we add a MIR case for this?

Esme updated this revision to Diff 463760.Sep 28 2022, 10:54 PM

Apologize for the late update due to some compiler-rt errors occurred. I'm working on fixing it.

Esme updated this revision to Diff 470032.Oct 23 2022, 7:11 PM

The bootstrap test is clean.

lkail added inline comments.Oct 23 2022, 8:21 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2791

Is

if (Opc != PPC::CMPWI && Opc != PPC::CMPDI)
  return false;

// <Explain why CMPWI doesn't apply on PPC64>
if (Subtarget.isPPC64() && Opc == PPC::CMPWI)
  return false;

more clear?

Esme updated this revision to Diff 470360.Oct 24 2022, 8:19 PM

Update comments.

lkail accepted this revision as: lkail.Oct 24 2022, 9:20 PM

LGTM thanks.

This revision is now accepted and ready to land.Oct 24 2022, 9:20 PM
shchenz added a comment.EditedOct 24 2022, 10:35 PM

Compared with previous versions, to me, seems some valid transformations are gone now because of the new bail out case Subtarget.isPPC64() && Opc == PPC::CMPWI. Maybe we can have a further check why PPC64 generates CMPWI 0 for some cases. For these cases, I guess CMPDI 0 should also be ok, so that there will be more record form opportunities here. But we can do that in another patch.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2779

We don't need the check for isVirtual()? we are post-ra here.

2791

Maybe we don't need Subtarget.isPPC64() for PPC::CMPLDI because this should be impossible.

2839

hmm, another caller of fixupIsDeadOrKill(), we refactor this in D133103. FYI @nemanjai

shchenz added inline comments.Oct 24 2022, 10:51 PM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2791

I meant isPPC32 && PPC::CMPLDI is impossible, so that we don't need check Subtarget.isPPC64().

Esme updated this revision to Diff 470994.Oct 26 2022, 8:07 PM

Compared with previous versions, to me, seems some valid transformations are gone now because of the new bail out case Subtarget.isPPC64() && Opc == PPC::CMPWI. Maybe we can have a further check why PPC64 generates CMPWI 0 for some cases. For these cases, I guess CMPDI 0 should also be ok, so that there will be more record form opportunities here. But we can do that in another patch.

Currently PPC64 generates CMPWI 0 for int32 operands, which is reasonable.
CMPDI 0 will be ok if the use of the comparison is an equality check, however it's not easy to find the reliable use in Post-RA.
More investigation is required to exploit other opportunities for optimization comparison.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
2839

D133103 seems in progress, so this patch will not be rebase on it. Maybe D133103 can refactor this after this patch lands.

shchenz accepted this revision as: shchenz.Oct 26 2022, 8:27 PM

More investigation is required to exploit other opportunities for optimization comparison.

Yes, I agree, this is definitely worth further investigation.

This patch LGTM too. Thanks for adding this.

This revision was landed with ongoing or failed builds.Oct 30 2022, 10:35 PM
This revision was automatically updated to reflect the committed changes.