Page MenuHomePhabricator

[PowerPC] support register pressure reduction in machine combiner.
AcceptedPublic

Authored by shchenz on Nov 24 2020, 7:58 PM.

Details

Reviewers
jsji
nemanjai
steven.zhang
Group Reviewers
Restricted Project
Summary

This patch tries to transform following patterns:

// Pattern 1:
//   A = FSUB  X,    Y        (Leaf)
//   C = FMA   A31,  M31,  A  (Root)
// M31 is const -->
//   A = FMA   A31,  Y,  -M31
//   C = FMA   A,    X,  M31
//
// Pattern 2:
//   A = FSUB  X,    Y        (Leaf)
//   C = FMA   A31,  A,  M32  (Root)
// M32 is const -->
//   A = FMA   A31,  Y,  -M32
//   C = FMA   A,    X,  M32
//

On PowerPC target, fma instructions are destructive, its def is always assigned with the same physical register with one of its operands. We could use this characteristic to generate more fma instructions to generate friendly code for register allocation.

For example, for the below case:

T = A * B + Const1 * (C - D) + Const2 * (E - F)

Without this patch, llvm generates:

t0 = mul A, B
t1 = sub C, D
t2 = sub E, F
....
t3 = FMA t0, Const1, t1
T = FMA t3, Const2, t2

t0 & t1 & t2 must be assigned with different registers.
With this patch, we get

t0 = mul A, B
t1 = FMA t0, Const1, C
t2 = FMA t1, -Const1, D
t3 = FMA t2, Const2, E
T = FMA t3, -Const2, F

Now, t0 & t1 & t2 & t3 & T must be assigned with same physical register.

We only do this transformation when the register is high as the transformation will reduce ILP.
We saw some obvious improvement for some cpu2017 benchmarks.

Diff Detail

Event Timeline

shchenz created this revision.Nov 24 2020, 7:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2020, 7:58 PM
shchenz requested review of this revision.Nov 24 2020, 7:58 PM
shchenz updated this revision to Diff 310830.Dec 10 2020, 3:58 AM
shchenz edited the summary of this revision. (Show Details)

1: don't require LiveIntervals analysis pass to estimate register pressure.

shchenz updated this revision to Diff 311122.Dec 10 2020, 11:40 PM

1: update according to parent patch https://reviews.llvm.org/D92068 changes

jsji added inline comments.Dec 15 2020, 8:03 PM
llvm/include/llvm/CodeGen/MachineCombinerPattern.h
34

REASSOC_XY_BCA,
REASSOC_XY_BAC,

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
80

"ppc-fma-rp-factor"

Since we multiple this value, this is more a factor than a threshold, then the default value should be slightly more than 1.0?

83

EnableFMARegPressureReduction

84

"ppc-fma-rp-reduction"

350

A = FSUB X, Y (Leaf)
D = FMA B, C, A (Root)
->
A = FMA B, Y, -C
C = FMA A, X, C

355

This is the same pattern as above, as we can commute the operands of FMA?

A = FSUB X, Y (Leaf)
D = FMA B, A, C (Root)
->
A = FMA B, Y, -C
C = FMA A, X, C

443

Do all candidate checks in isRegisterPressureReduceCandidate, so that we have simple logic like:

if (DoRegPressureReduce && isRegisterPressureReduceCandidate(Root, MULInstrL,MULInstrR )) {
     if (isLoadFromConstantPool(MULInstrL) &&
         IsReassociableAddOrSub(*MULInstrR, InfoArrayIdxFSubInst)) {
          LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
          Patterns.push_back(MachineCombinerPattern::REASSOC_XY_BCA);
          return true;
     }
     if (isLoadFromConstantPool(MULInstrR) &&
         IsReassociableAddOrSub(*MULInstrL, InfoArrayIdxFSubInst)) {
          LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
          Patterns.push_back(MachineCombinerPattern::REASSOC_XY_BCA);
          return true;
     }
}
446

This check should be in isRegisterPressureReduceCandidate

448

Uninitialized variable.

450

This should be in isRegisterPressureReduceCandidate as well.

567

Why this can not be inlined into next line of ConstantFP::get?

586

What if we are in non-assert version and we can't find Placeholder ?

619

Split this out into a lamda or function , just call something like

currMBBPresure = getMBBPressure (MBB...);
VSSRCRPSetLimit= ...

return currMBBPresure >= VSSRCRPSetLimit * RegPressureFactor
654

This should check the Root, and look through copies, isVirtualRegister.
Only return true when all the requirements are met.

722

Add document about function and argument assumptions.

804

I think a switch table here would be clearer.

switch (Pattern){
case: MachineCombinerPattern::REASSOC_XY_AMM_BMM:
case :MachineCombinerPattern::REASSOC_XY_AMM_BMM:
     IsILPReassociate = true;
    Prev = MRI.getUniqueVRegDef(Root.getOperand(AddOpIdx).getReg());
    Leaf = MRI.getUniqueVRegDef(Prev->getOperand(AddOpIdx).getReg());
    IntersectedFlags = Root.getFlags() & Prev->getFlags() & Leaf->getFlags();
break;
case: MachineCombinerPattern::REASSOC_XY_BAC:
...
break;
}
820

Uninitialized variable.

903–907

Switch table please as well.

llvm/lib/Target/PowerPC/PPCInstrInfo.h
255

isRPReductionCandidate

352

DoRPReduction

361

/// Return true when ...

367

Fixup the placeholders we put in genAlternativeCodeSequence() for MachineCombiner.

llvm/test/CodeGen/PowerPC/register-pressure-reduction.ll
10

Can we add test to see whether we will be able to reuse const?
eg: If there is already negative const in the const pool.

46

If we have 2 patterns, we should test both patterns..

shchenz updated this revision to Diff 312386.Dec 16 2020, 11:50 PM
shchenz marked 22 inline comments as done.

update according to @jsji comments

Thanks very much for your careful review. I updated or replied accordingly.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
80

The logic here is if the calculated register pressure(RPTracker.getPressure()) exceeds the limits(get`RegPressureSetLimit`), we should do the reassociation. So I used 1.0. Does changing the compare from >= to > make sense to you?

567

return type of F1.changeSign(); is void

586

if we are in non-assert version, we should get SEGV, we use Placeholder like Placeholder->setReg(LoadNewConst);. If we come here, Placeholder must be not null, we only handle register reduction patterns here.

Do we need to add if(!Placeholder) return;? Seems we do not do this kind of protection in current llvm code base, for example:

ICmpInst::Predicate Loop::LoopBounds::getCanonicalPredicate() const {
  BasicBlock *Latch = L.getLoopLatch();
  assert(Latch && "Expecting valid latch");

  BranchInst *BI = dyn_cast_or_null<BranchInst>(Latch->getTerminator());
654

new IsRPReductionCandidate already does more than this.

llvm/test/CodeGen/PowerPC/register-pressure-reduction.ll
46

I tried this before, seems in ISel, constant operand is put as the second mul operand in FMA instruction. I also can not create a MIR case, as there are const pool inputs, met some syntax errors.

shchenz updated this revision to Diff 313627.Wed, Dec 23, 4:48 PM

fix lint warnings

gentle ping

jsji added inline comments.Wed, Dec 30, 6:51 AM
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
80

Yes, I know it is exceed the limit, but I don't think we should invoke this optimization immediately when we exceed the limit, shouldn't we do such reg pressure reduction when the reg pressure is really high enough?

shchenz updated this revision to Diff 314158.Wed, Dec 30, 6:07 PM

1: set the register pressure factor to 1.5

shchenz marked an inline comment as done.Wed, Dec 30, 6:08 PM
shchenz added inline comments.
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
80

Make sense. I changed the factor to 1.5 in the updated patch.

jsji accepted this revision as: jsji.Mon, Jan 4, 8:36 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Mon, Jan 4, 8:36 AM
shchenz marked an inline comment as done.EditedMon, Jan 4, 6:04 PM

Appreciate your review @jsji . Could you please help to have a look at this patch's parent https://reviews.llvm.org/D92069. Thanks again.