Page MenuHomePhabricator

[PowerPC] support register pressure reduction in machine combiner.

Authored by shchenz on Nov 24 2020, 7:58 PM.



This patch tries to transform following patterns:

// Pattern 1:
//   A = FSUB  X,    Y        (Leaf)
//   C = FMA   A31,  M31,  A  (Root)
// M31 is const -->
//   A = FMA   A31,  Y,  -M31
//   C = FMA   A,    X,  M31
// Pattern 2:
//   A = FSUB  X,    Y        (Leaf)
//   C = FMA   A31,  A,  M32  (Root)
// M32 is const -->
//   A = FMA   A31,  Y,  -M32
//   C = FMA   A,    X,  M32

On PowerPC target, fma instructions are destructive, its def is always assigned with the same physical register with one of its operands. We could use this characteristic to generate more fma instructions to generate friendly code for register allocation.

For example, for the below case:

T = A * B + Const1 * (C - D) + Const2 * (E - F)

Without this patch, llvm generates:

t0 = mul A, B
t1 = sub C, D
t2 = sub E, F
t3 = FMA t0, Const1, t1
T = FMA t3, Const2, t2

t0 & t1 & t2 must be assigned with different registers.
With this patch, we get

t0 = mul A, B
t1 = FMA t0, Const1, C
t2 = FMA t1, -Const1, D
t3 = FMA t2, Const2, E
T = FMA t3, -Const2, F

Now, t0 & t1 & t2 & t3 & T must be assigned with same physical register.

We only do this transformation when the register is high as the transformation will reduce ILP.
We saw some obvious improvement for some cpu2017 benchmarks.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2020, 7:58 PM
shchenz requested review of this revision.Nov 24 2020, 7:58 PM
shchenz updated this revision to Diff 310830.Dec 10 2020, 3:58 AM
shchenz edited the summary of this revision. (Show Details)

1: don't require LiveIntervals analysis pass to estimate register pressure.

shchenz updated this revision to Diff 311122.Dec 10 2020, 11:40 PM

1: update according to parent patch changes

jsji added inline comments.Dec 15 2020, 8:03 PM




Since we multiple this value, this is more a factor than a threshold, then the default value should be slightly more than 1.0?






A = FSUB X, Y (Leaf)
D = FMA B, C, A (Root)
A = FMA B, Y, -C
C = FMA A, X, C


This is the same pattern as above, as we can commute the operands of FMA?

A = FSUB X, Y (Leaf)
D = FMA B, A, C (Root)
A = FMA B, Y, -C
C = FMA A, X, C


Do all candidate checks in isRegisterPressureReduceCandidate, so that we have simple logic like:

if (DoRegPressureReduce && isRegisterPressureReduceCandidate(Root, MULInstrL,MULInstrR )) {
     if (isLoadFromConstantPool(MULInstrL) &&
         IsReassociableAddOrSub(*MULInstrR, InfoArrayIdxFSubInst)) {
          LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
          return true;
     if (isLoadFromConstantPool(MULInstrR) &&
         IsReassociableAddOrSub(*MULInstrL, InfoArrayIdxFSubInst)) {
          LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
          return true;

This check should be in isRegisterPressureReduceCandidate


Uninitialized variable.


This should be in isRegisterPressureReduceCandidate as well.


Why this can not be inlined into next line of ConstantFP::get?


What if we are in non-assert version and we can't find Placeholder ?


Split this out into a lamda or function , just call something like

currMBBPresure = getMBBPressure (MBB...);
VSSRCRPSetLimit= ...

return currMBBPresure >= VSSRCRPSetLimit * RegPressureFactor

This should check the Root, and look through copies, isVirtualRegister.
Only return true when all the requirements are met.


Add document about function and argument assumptions.


I think a switch table here would be clearer.

switch (Pattern){
case: MachineCombinerPattern::REASSOC_XY_AMM_BMM:
case :MachineCombinerPattern::REASSOC_XY_AMM_BMM:
     IsILPReassociate = true;
    Prev = MRI.getUniqueVRegDef(Root.getOperand(AddOpIdx).getReg());
    Leaf = MRI.getUniqueVRegDef(Prev->getOperand(AddOpIdx).getReg());
    IntersectedFlags = Root.getFlags() & Prev->getFlags() & Leaf->getFlags();
case: MachineCombinerPattern::REASSOC_XY_BAC:

Uninitialized variable.


Switch table please as well.






/// Return true when ...


Fixup the placeholders we put in genAlternativeCodeSequence() for MachineCombiner.


Can we add test to see whether we will be able to reuse const?
eg: If there is already negative const in the const pool.


If we have 2 patterns, we should test both patterns..

shchenz updated this revision to Diff 312386.Dec 16 2020, 11:50 PM
shchenz marked 22 inline comments as done.

update according to @jsji comments

Thanks very much for your careful review. I updated or replied accordingly.


The logic here is if the calculated register pressure(RPTracker.getPressure()) exceeds the limits(get`RegPressureSetLimit`), we should do the reassociation. So I used 1.0. Does changing the compare from >= to > make sense to you?


return type of F1.changeSign(); is void


if we are in non-assert version, we should get SEGV, we use Placeholder like Placeholder->setReg(LoadNewConst);. If we come here, Placeholder must be not null, we only handle register reduction patterns here.

Do we need to add if(!Placeholder) return;? Seems we do not do this kind of protection in current llvm code base, for example:

ICmpInst::Predicate Loop::LoopBounds::getCanonicalPredicate() const {
  BasicBlock *Latch = L.getLoopLatch();
  assert(Latch && "Expecting valid latch");

  BranchInst *BI = dyn_cast_or_null<BranchInst>(Latch->getTerminator());

new IsRPReductionCandidate already does more than this.


I tried this before, seems in ISel, constant operand is put as the second mul operand in FMA instruction. I also can not create a MIR case, as there are const pool inputs, met some syntax errors.

shchenz updated this revision to Diff 313627.Dec 23 2020, 4:48 PM

fix lint warnings

gentle ping

jsji added inline comments.Dec 30 2020, 6:51 AM

Yes, I know it is exceed the limit, but I don't think we should invoke this optimization immediately when we exceed the limit, shouldn't we do such reg pressure reduction when the reg pressure is really high enough?

shchenz updated this revision to Diff 314158.Dec 30 2020, 6:07 PM

1: set the register pressure factor to 1.5

shchenz marked an inline comment as done.Dec 30 2020, 6:08 PM
shchenz added inline comments.

Make sense. I changed the factor to 1.5 in the updated patch.

jsji accepted this revision as: jsji.Jan 4 2021, 8:36 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Jan 4 2021, 8:36 AM
shchenz marked an inline comment as done.EditedJan 4 2021, 6:04 PM

Appreciate your review @jsji . Could you please help to have a look at this patch's parent Thanks again.

tpopp added a subscriber: tpopp.Jan 18 2021, 3:00 AM

I'm going to roll this back as it's causing build bot failures (in progress example here: Sanitizers show the following issues:

==9097==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5628a1bd3d10 in llvm::PPCInstrInfo::getFMAPatterns(llvm::MachineInstr&, llvm::SmallVectorImpl<llvm::MachineCombinerPattern>&, bool) const third_party/llvm/llvm-project/llvm/lib/Target/PowerPC/PPCInstrIn
    #1 0x5628a1bd7e6c in llvm::PPCInstrInfo::getMachineCombinerPatterns(llvm::MachineInstr&, llvm::SmallVectorImpl<llvm::MachineCombinerPattern>&, bool) const third_party/llvm/llvm-project/llvm/lib/Target/PowerP
    #2 0x5628a3291840 in combineInstructions third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp:593:15
    #3 0x5628a3291840 in (anonymous namespace)::MachineCombiner::runOnMachineFunction(llvm::MachineFunction&) third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp:736:16                            
    #4 0x5628a32f91a2 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:13                                                    
    #5 0x5628a5e0dce7 in llvm::FPPassManager::runOnFunction(llvm::Function&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1439:27                                                               
    #6 0x5628a5e22f60 in llvm::FPPassManager::runOnModule(llvm::Module&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1485:16                                                                   
    #7 0x5628a5e0f1d5 in runOnModule third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1554:27
    #8 0x5628a5e0f1d5 in llvm::legacy::PassManagerImpl::run(llvm::Module&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:44                                                                  
    #9 0x56289f8dcb66 in compileModule(char**, llvm::LLVMContext&) third_party/llvm/llvm-project/llvm/tools/llc/llc.cpp:658:8


==1709==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f155364a060 at pc 0x555a067755c4 bp 0x7fff9b909050 sp 0x7fff9b909048
READ of size 8 at 0x7f155364a060 thread T0
    #0 0x555a067755c3 in operator[] third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/vector:1550:18
    #1 0x555a067755c3 in llvm::PPCInstrInfo::shouldReduceRegisterPressure(llvm::MachineBasicBlock*, llvm::RegisterClassInfo*) const third_party/llvm/llvm-project/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:655:10
    #2 0x555a07df5618 in combineInstructions third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp:561:12
    #3 0x555a07df5618 in (anonymous namespace)::MachineCombiner::runOnMachineFunction(llvm::MachineFunction&) third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp:736:16
    #4 0x555a07e54479 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:13
    #5 0x555a0b360710 in llvm::FPPassManager::runOnFunction(llvm::Function&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1439:27
    #6 0x555a0b374470 in llvm::FPPassManager::runOnModule(llvm::Module&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1485:16
    #7 0x555a0b3616e0 in runOnModule third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1554:27
    #8 0x555a0b3616e0 in llvm::legacy::PassManagerImpl::run(llvm::Module&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:44
    #9 0x555a0461db8b in compileModule(char**, llvm::LLVMContext&) third_party/llvm/llvm-project/llvm/tools/llc/llc.cpp:658:8
    #10 0x555a046183bf in main third_party/llvm/llvm-project/llvm/tools/llc/llc.cpp:363:22               
Address 0x7f155364a060 is located in stack of thread T0 at offset 96 in frame                                                                                                                                      
    #0 0x555a067742ff in llvm::PPCInstrInfo::shouldReduceRegisterPressure(llvm::MachineBasicBlock*, llvm::RegisterClassInfo*) const third_party/llvm/llvm-project/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:598
  This frame has 7 object(s):                                                                            
    [32, 40) '__x.i.i'                                                                                                                                                                                             
    [64, 72) 'retval.i.i'                                                                                                                                                                                          
    [96, 424) 'Pressure.i' (line 624) <== Memory access at offset 96 is inside this variable                                                                                                                       
    [496, 848) 'RPTracker.i' (line 625)                                                                                                                                                                            
    [912, 920) 'MII.i' (line 631)                                                                                                                                                                                  
    [944, 952) 'MIE.i' (line 631)                                                                                                                                                                                  
    [976, 1408) 'RegOpers.i' (line 637)                                                                                                                                                                            
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork                                                                                                   
      (longjmp and C++ exceptions *are* supported)
shchenz updated this revision to Diff 317334.Jan 18 2021, 5:32 AM

1: fix sanitizer warning

Hi @tpopp , thanks for reverting. Do you happen to know how to run the Sanitizers tests on X86-64 target? I don't have an X86 testing environment.

I addressed one issue related to stack-use-after-scope, but I can not find out the reason for failure use-of-uninitialized-value. Maybe the use-of-uninitialized-value failure is also the same issue as there is one variable be assigned with returning value of function shouldReduceRegisterPressure(This is where failure stack-use-after-scope happens.). But I am not sure.

If you can rerun the checks, could you please help to have a verify with the updated patch? Thanks.

shchenz updated this revision to Diff 317335.Jan 18 2021, 5:35 AM

full context

Confirmed that use-of-uninitialized-value can be reproduced on the PowerPC target and it can be fixed by the new patch.

Thanks for addressing the issues shchenz! I'm glad you figured out the use-of-uninitialized-value because that one confused me. I'm finishing my workday for the day, so I can rerun the patch tomorrow. Alternatively, all I know is that I had errors in my environment and that's what sanitizers showed, so you reproducing and fixing the issues is plenty I think.

shchenz updated this revision to Diff 317460.Jan 18 2021, 9:17 PM

update according to api change in D92069

This revision was landed with ongoing or failed builds.Jan 24 2021, 6:28 PM
foad added a subscriber: foad.Feb 2 2021, 3:26 AM
foad added inline comments.

Typo "floating".


Is it true that these patterns only improve register pressure if X and Y are "live out" of the pattern? Otherwise A could be allocated the same register as X or Y and there would be no increase.


Typo "attribute".

shchenz marked 3 inline comments as done.Feb 2 2021, 6:06 PM
shchenz added inline comments.

Thanks for pointing this out. NFC patch b0869a7d72f121b77d48d6496c6c3f00dd4731da fix this.


One condition for this transformation is FMA is the only user of FSUB, which means we can delete the original FSUB to save its definition, the original A. After the transformation, A must be assigned with the same register with B and D