This is an archive of the discontinued LLVM Phabricator instance.

Differential D92071

[PowerPC] support register pressure reduction in machine combiner.
ClosedPublic

Authored by shchenz on Nov 24 2020, 7:58 PM.

Details

Reviewers

jsji
nemanjai
steven.zhang

Group Reviewers

Restricted Project

Commits

rG0ed4cf4bf3b6: [PowerPC] support register pressure reduction in machine combiner.
rG26a396c4ef48: [PowerPC] support register pressure reduction in machine combiner.

Summary

This patch tries to transform the following patterns:

// Pattern 1:
//   A = FSUB  X,    Y        (Leaf)
//   C = FMA   A31,  M31,  A  (Root)
// M31 is const -->
//   A = FMA   A31,  Y,  -M31
//   C = FMA   A,    X,  M31
//
// Pattern 2:
//   A = FSUB  X,    Y        (Leaf)
//   C = FMA   A31,  A,  M32  (Root)
// M32 is const -->
//   A = FMA   A31,  Y,  -M32
//   C = FMA   A,    X,  M32
//

On the PowerPC target, FMA instructions are destructive: the def is always assigned the same physical register as one of the operands. We can use this characteristic to generate more FMA instructions and produce code that is friendlier to register allocation.

For example, consider the following case:

T = A * B + Const1 * (C - D) + Const2 * (E - F)

Without this patch, llvm generates:

t0 = mul A, B
t1 = sub C, D
t2 = sub E, F
....
t3 = FMA t0, Const1, t1
T = FMA t3, Const2, t2

t0, t1 and t2 must be assigned different registers.
With this patch, we get:

t0 = mul A, B
t1 = FMA t0, Const1, C
t2 = FMA t1, -Const1, D
t3 = FMA t2, Const2, E
T = FMA t3, -Const2, F

Now t0, t1, t2, t3 and T must all be assigned the same physical register.

We only do this transformation when the register pressure is high, because the transformation reduces ILP.
We saw obvious improvements on some cpu2017 benchmarks.
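
As a quick sanity check of the example in the summary, the two instruction sequences can be written out with `std::fma`. This is only a sketch: the function names are made up for illustration, and `FMA acc, m1, m2` from the summary is read as `acc + m1 * m2`. In general the two sequences can round differently, so the comparison is only exact for inputs like the ones in the usage note below.

```cpp
#include <cmath>

// Sequence generated without the patch:
//   T = A * B + Const1 * (C - D) + Const2 * (E - F)
// std::fma(x, y, z) computes x * y + z, so "FMA acc, m1, m2" becomes
// std::fma(m1, m2, acc).
double originalSequence(double A, double B, double C, double D, double E,
                        double F, double Const1, double Const2) {
  double t0 = A * B;
  double t1 = C - D;
  double t2 = E - F;
  double t3 = std::fma(Const1, t1, t0);
  return std::fma(Const2, t2, t3);
}

// Sequence generated with the patch: each FSUB is folded into two FMAs that
// accumulate into the same value chain.
double reassociatedSequence(double A, double B, double C, double D, double E,
                            double F, double Const1, double Const2) {
  double t0 = A * B;
  double t1 = std::fma(Const1, C, t0);
  double t2 = std::fma(-Const1, D, t1);
  double t3 = std::fma(Const2, E, t2);
  return std::fma(-Const2, F, t3);
}
```

With A=2, B=3, C=5, D=1, E=7, F=4, Const1=0.5 and Const2=0.25, every intermediate value is exactly representable and both functions return 8.75.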

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

shchenz created this revision. (Nov 24 2020, 7:58 PM)

Herald added a project: Restricted Project. (Nov 24 2020, 7:58 PM)

Herald added subscribers: llvm-commits, kbarton, hiraditya.

shchenz requested review of this revision. (Nov 24 2020, 7:58 PM)

Harbormaster completed remote builds in B80043: Diff 307504. (Nov 24 2020, 7:59 PM)

shchenz edited the summary of this revision. (Nov 24 2020, 7:59 PM)

shchenz added parent revisions: D92070: [PowerPC] [NFC] code refactor: split IsReassociable to fma and add., D92069: [NFC] [TargetRegisterInfo] add one use check to lookThruCopyLike., D92068: [MachineCombiner] [NFC]Add MustReduceRegisterPressure goal.

shchenz mentioned this in D92068: [MachineCombiner] [NFC]Add MustReduceRegisterPressure goal.

shchenz mentioned this in D92069: [NFC] [TargetRegisterInfo] add one use check to lookThruCopyLike. (Nov 24 2020, 8:04 PM)

1: don't require LiveIntervals analysis pass to estimate register pressure.

Harbormaster completed remote builds in B81807: Diff 310830. (Dec 10 2020, 4:40 AM)

1: update according to parent patch https://reviews.llvm.org/D92068 changes

Harbormaster completed remote builds in B81974: Diff 311122. (Dec 11 2020, 12:15 AM)

jsji added inline comments. (Dec 15 2020, 8:03 PM)

llvm/include/llvm/CodeGen/MachineCombinerPattern.h
34	REASSOC_XY_BCA, REASSOC_XY_BAC,
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
80	"ppc-fma-rp-factor" Since we multiply by this value, this is more a factor than a threshold, so shouldn't the default value be slightly more than 1.0?
83	EnableFMARegPressureReduction
84	"ppc-fma-rp-reduction"
350	A = FSUB X, Y (Leaf)
	D = FMA B, C, A (Root)
	->
	A = FMA B, Y, -C
	C = FMA A, X, C
355	This is the same pattern as above, as we can commute the operands of FMA?
	A = FSUB X, Y (Leaf)
	D = FMA B, A, C (Root)
	->
	A = FMA B, Y, -C
	C = FMA A, X, C
443	Do all candidate checks in isRegisterPressureReduceCandidate, so that we have simple logic like:
	if (DoRegPressureReduce &&
	    isRegisterPressureReduceCandidate(Root, MULInstrL, MULInstrR)) {
	  if (isLoadFromConstantPool(MULInstrL) &&
	      IsReassociableAddOrSub(MULInstrR, InfoArrayIdxFSubInst)) {
	    LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
	    Patterns.push_back(MachineCombinerPattern::REASSOC_XY_BCA);
	    return true;
	  }
	  if (isLoadFromConstantPool(MULInstrR) &&
	      IsReassociableAddOrSub(MULInstrL, InfoArrayIdxFSubInst)) {
	    LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
	    Patterns.push_back(MachineCombinerPattern::REASSOC_XY_BCA);
	    return true;
	  }
	}
446	This check should be in `isRegisterPressureReduceCandidate`
448	Uninitialized variable.
450	This should be in `isRegisterPressureReduceCandidate` as well.
567	Why this can not be inlined into next line of ConstantFP::get?
586	What if we are in non-assert version and we can't find `Placeholder` ?
619	Split this out into a lambda or function, and just call something like:
	currMBBPresure = getMBBPressure(MBB...);
	VSSRCRPSetLimit = ...
	return currMBBPresure >= VSSRCRPSetLimit * RegPressureFactor
654	This should check the Root, and look through copies, isVirtualRegister. Only return true when all the requirements are met.
722	Add document about function and argument assumptions.
804	I think a switch table here would be clearer.
	switch (Pattern) {
	case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
	case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
	  IsILPReassociate = true;
	  Prev = MRI.getUniqueVRegDef(Root.getOperand(AddOpIdx).getReg());
	  Leaf = MRI.getUniqueVRegDef(Prev->getOperand(AddOpIdx).getReg());
	  IntersectedFlags = Root.getFlags() & Prev->getFlags() & Leaf->getFlags();
	  break;
	case MachineCombinerPattern::REASSOC_XY_BAC:
	  ...
	  break;
	}
820	Uninitialized variable.
903–907	Switch table please as well.
llvm/lib/Target/PowerPC/PPCInstrInfo.h
255	isRPReductionCandidate
352	DoRPReduction
361	/// Return true when ...
367	Fixup the placeholders we put in genAlternativeCodeSequence() for MachineCombiner.
llvm/test/CodeGen/PowerPC/register-pressure-reduction.ll
10	Can we add a test to see whether we are able to reuse a const, e.g. if there is already a negative const in the const pool?
46	If we have 2 patterns, we should test both patterns.

update according to @jsji comments

Thanks very much for your careful review. I updated or replied accordingly.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
80	The logic here is that if the calculated register pressure (`RPTracker.getPressure()`) exceeds the limit (`getRegPressureSetLimit`), we should do the reassociation. So I used 1.0. Does changing the compare from `>=` to `>` make sense to you?
567	return type of `F1.changeSign();` is `void`
586	In a non-assert build we would get a SEGV, because we use `Placeholder` like `Placeholder->setReg(LoadNewConst);`. If we get here, `Placeholder` must be non-null, since we only handle register pressure reduction patterns here. Do we need to add `if (!Placeholder) return;`? We do not seem to add this kind of protection in the current LLVM code base, for example:
	ICmpInst::Predicate Loop::LoopBounds::getCanonicalPredicate() const {
	  BasicBlock *Latch = L.getLoopLatch();
	  assert(Latch && "Expecting valid latch");
	  BranchInst *BI = dyn_cast_or_null<BranchInst>(Latch->getTerminator());
654	new `IsRPReductionCandidate` already does more than this.
llvm/test/CodeGen/PowerPC/register-pressure-reduction.ll
46	I tried this before; it seems that in ISel the constant operand is always put as the second mul operand of the FMA instruction. I also could not create a MIR case: because there are const pool inputs, I ran into syntax errors.
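
To make the line 586 discussion concrete, here is a minimal sketch of the two options being weighed: relying on `assert` alone (which compiles away when `NDEBUG` is defined) versus adding an explicit null guard. `Operand` and `fixupPlaceholder` are hypothetical stand-ins, not types or functions from the patch.

```cpp
#include <cassert>

// Hypothetical stand-in for a machine operand that holds a register number.
struct Operand {
  unsigned Reg = 0;
  void setReg(unsigned R) { Reg = R; }
};

// Updates the placeholder operand with the new register.
// The assert documents the invariant in debug builds; the explicit check
// keeps a release (NDEBUG) build from dereferencing a null pointer.
bool fixupPlaceholder(Operand *Placeholder, unsigned NewReg) {
  assert(Placeholder && "Placeholder does not exist!");
  if (!Placeholder)
    return false;
  Placeholder->setReg(NewReg);
  return true;
}
```

The author's position in the thread is that the assert alone matches existing LLVM practice; the guard is shown only to illustrate what the reviewer's question would amount to.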

Harbormaster completed remote builds in B82741: Diff 312386. (Dec 17 2020, 12:28 AM)

fix lint warnings

gentle ping

Harbormaster completed remote builds in B83437: Diff 313627. (Dec 23 2020, 5:34 PM)

jsji added inline comments. (Dec 30 2020, 6:51 AM)

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
80	Yes, I know this means the limit is exceeded, but I don't think we should invoke this optimization as soon as we exceed the limit. Shouldn't we do this reg pressure reduction only when the reg pressure is really high?

1: set the register pressure factor to 1.5

shchenz marked an inline comment as done. (Dec 30 2020, 6:08 PM)

shchenz added inline comments.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
80	Makes sense. I changed the factor to 1.5 in the updated patch.
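
The threshold discussed in this thread can be sketched as a small predicate. The function name and parameters below are illustrative only, and the strict `>` comparison is an assumption based on the author's question about changing `>=` to `>`. The thread itself settles only the default factor of 1.5.

```cpp
// Hypothetical helper: returns true when the measured pressure of a register
// pressure set is well above its limit, scaled by a tunable factor.
// With Factor = 1.0 the transformation would fire as soon as the limit is
// exceeded; with Factor = 1.5 it fires only under much higher pressure.
bool pressureIsHigh(unsigned CurrentPressure, unsigned SetLimit, float Factor) {
  return CurrentPressure > SetLimit * Factor;
}
```

For a limit of 64, a factor of 1.0 triggers at a pressure of 65, while a factor of 1.5 does not trigger until the pressure reaches 97.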

Harbormaster completed remote builds in B83772: Diff 314158. (Dec 30 2020, 6:57 PM)

LGTM. Thanks.

This revision is now accepted and ready to land. (Jan 4 2021, 8:36 AM)

I appreciate your review, @jsji. Could you please also have a look at this patch's parent, https://reviews.llvm.org/D92069? Thanks again.

Closed by commit rG26a396c4ef48: [PowerPC] support register pressure reduction in machine combiner. (authored by shchenz). (Jan 17 2021, 8:56 PM)

This revision was automatically updated to reflect the committed changes.

shchenz added a commit: rG26a396c4ef48: [PowerPC] support register pressure reduction in machine combiner..

I'm going to roll this back as it's causing build bot failures (in progress example here: http://45.33.8.238/win/31506/). Sanitizers show the following issues:

==9097==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x5628a1bd3d10 in llvm::PPCInstrInfo::getFMAPatterns(llvm::MachineInstr&, llvm::SmallVectorImpl<llvm::MachineCombinerPattern>&, bool) const third_party/llvm/llvm-project/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:474:27
    #1 0x5628a1bd7e6c in llvm::PPCInstrInfo::getMachineCombinerPatterns(llvm::MachineInstr&, llvm::SmallVectorImpl<llvm::MachineCombinerPattern>&, bool) const third_party/llvm/llvm-project/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:751:7
    #2 0x5628a3291840 in combineInstructions third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp:593:15
    #3 0x5628a3291840 in (anonymous namespace)::MachineCombiner::runOnMachineFunction(llvm::MachineFunction&) third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp:736:16                            
    #4 0x5628a32f91a2 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:13                                                    
    #5 0x5628a5e0dce7 in llvm::FPPassManager::runOnFunction(llvm::Function&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1439:27                                                               
    #6 0x5628a5e22f60 in llvm::FPPassManager::runOnModule(llvm::Module&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1485:16                                                                   
    #7 0x5628a5e0f1d5 in runOnModule third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1554:27
    #8 0x5628a5e0f1d5 in llvm::legacy::PassManagerImpl::run(llvm::Module&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:44                                                                  
    #9 0x56289f8dcb66 in compileModule(char**, llvm::LLVMContext&) third_party/llvm/llvm-project/llvm/tools/llc/llc.cpp:658:8

and

==1709==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f155364a060 at pc 0x555a067755c4 bp 0x7fff9b909050 sp 0x7fff9b909048
READ of size 8 at 0x7f155364a060 thread T0
    #0 0x555a067755c3 in operator[] third_party/crosstool/v18/stable/toolchain/bin/../include/c++/v1/vector:1550:18
    #1 0x555a067755c3 in llvm::PPCInstrInfo::shouldReduceRegisterPressure(llvm::MachineBasicBlock*, llvm::RegisterClassInfo*) const third_party/llvm/llvm-project/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:655:10
    #2 0x555a07df5618 in combineInstructions third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp:561:12
    #3 0x555a07df5618 in (anonymous namespace)::MachineCombiner::runOnMachineFunction(llvm::MachineFunction&) third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineCombiner.cpp:736:16
    #4 0x555a07e54479 in llvm::MachineFunctionPass::runOnFunction(llvm::Function&) third_party/llvm/llvm-project/llvm/lib/CodeGen/MachineFunctionPass.cpp:72:13
    #5 0x555a0b360710 in llvm::FPPassManager::runOnFunction(llvm::Function&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1439:27
    #6 0x555a0b374470 in llvm::FPPassManager::runOnModule(llvm::Module&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1485:16
    #7 0x555a0b3616e0 in runOnModule third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:1554:27
    #8 0x555a0b3616e0 in llvm::legacy::PassManagerImpl::run(llvm::Module&) third_party/llvm/llvm-project/llvm/lib/IR/LegacyPassManager.cpp:541:44
    #9 0x555a0461db8b in compileModule(char**, llvm::LLVMContext&) third_party/llvm/llvm-project/llvm/tools/llc/llc.cpp:658:8
    #10 0x555a046183bf in main third_party/llvm/llvm-project/llvm/tools/llc/llc.cpp:363:22               
                                                    
Address 0x7f155364a060 is located in stack of thread T0 at offset 96 in frame                                                                                                                                      
    #0 0x555a067742ff in llvm::PPCInstrInfo::shouldReduceRegisterPressure(llvm::MachineBasicBlock*, llvm::RegisterClassInfo*) const third_party/llvm/llvm-project/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp:598
                                                    
  This frame has 7 object(s):                                                                            
    [32, 40) '__x.i.i'                                                                                                                                                                                             
    [64, 72) 'retval.i.i'                                                                                                                                                                                          
    [96, 424) 'Pressure.i' (line 624) <== Memory access at offset 96 is inside this variable                                                                                                                       
    [496, 848) 'RPTracker.i' (line 625)                                                                                                                                                                            
    [912, 920) 'MII.i' (line 631)                                                                                                                                                                                  
    [944, 952) 'MIE.i' (line 631)                                                                                                                                                                                  
    [976, 1408) 'RegOpers.i' (line 637)                                                                                                                                                                            
HINT: this may be a false positive if your program uses some custom stack unwind mechanism, swapcontext or vfork                                                                                                   
      (longjmp and C++ exceptions *are* supported)

Tres Popp <tpopp@google.com> added a reverting change: rG3bd24574c7d0: Revert "[PowerPC] support register pressure reduction in machine combiner.". (Jan 18 2021, 3:02 AM)

1: fix sanitizer warning

Hi @tpopp, thanks for reverting. Do you happen to know how to run the sanitizer tests on the X86-64 target? I don't have an X86 testing environment.

I addressed one issue related to stack-use-after-scope, but I cannot find the reason for the use-of-uninitialized-value failure. It may be the same issue, since one variable is assigned the return value of shouldReduceRegisterPressure (which is where the stack-use-after-scope failure happens), but I am not sure.

If you can rerun the checks, could you please help verify the updated patch? Thanks.
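
For readers unfamiliar with this class of report, the following is a minimal sketch of how a stack-use-after-scope is typically avoided: the helper hands its result back by value instead of exposing a local that dies when the helper returns. The function names are hypothetical, and this is not necessarily how the updated patch fixed the problem.

```cpp
#include <vector>

// Hypothetical stand-in for a per-block pressure computation.
// Returning by value moves the data out before the local is destroyed.
// Returning `const std::vector<unsigned> &` bound to `Pressure` would leave
// the caller with a dangling reference, which AddressSanitizer reports as
// stack-use-after-scope when the reference is later read.
std::vector<unsigned> computeSetPressure(unsigned NumSets) {
  std::vector<unsigned> Pressure(NumSets, 0);
  for (unsigned I = 0; I < NumSets; ++I)
    Pressure[I] = I * 2; // Placeholder values for illustration.
  return Pressure;
}

// The caller owns its own copy, so indexing it is safe.
unsigned pressureForSet(unsigned NumSets, unsigned SetIdx) {
  std::vector<unsigned> Pressure = computeSetPressure(NumSets);
  return Pressure[SetIdx];
}
```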

full context

Harbormaster completed remote builds in B85589: Diff 317334. (Jan 18 2021, 6:06 AM)

Harbormaster completed remote builds in B85590: Diff 317335. (Jan 18 2021, 6:10 AM)

Confirmed that use-of-uninitialized-value can be reproduced on the PowerPC target and it can be fixed by the new patch.

Thanks for addressing the issues shchenz! I'm glad you figured out the use-of-uninitialized-value because that one confused me. I'm finishing my workday for the day, so I can rerun the patch tomorrow. Alternatively, all I know is that I had errors in my environment and that's what sanitizers showed, so you reproducing and fixing the issues is plenty I think.

update according to api change in D92069

Harbormaster completed remote builds in B85669: Diff 317460. (Jan 18 2021, 9:32 PM)

This revision was landed with ongoing or failed builds. (Jan 24 2021, 6:28 PM)

shchenz added a commit: rG0ed4cf4bf3b6: [PowerPC] support register pressure reduction in machine combiner..

foad added a subscriber: foad. (Feb 2 2021, 3:26 AM)

foad added inline comments.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
347	Typo "floating".
362	Is it true that these patterns only improve register pressure if X and Y are "live out" of the pattern? Otherwise A could be allocated the same register as X or Y and there would be no increase.
365	Typo "attribute".

shchenz mentioned this in rGb0869a7d72f1: [PowerPC] [NFC] fix wording typos. (Feb 2 2021, 6:03 PM)

shchenz marked 3 inline comments as done. (Feb 2 2021, 6:06 PM)

shchenz added inline comments.

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
347	Thanks for pointing this out. NFC patch b0869a7d72f121b77d48d6496c6c3f00dd4731da fixes this.
362	One condition for this transformation is that `FMA` is the only user of `FSUB`, which means we can delete the original `FSUB` and so save its definition, the original `A`. After the transformation, `A` must be assigned the same register as `B` and `D`.
365	fixed in commit b0869a7d72f121b77d48d6496c6c3f00dd4731da

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/MachineCombinerPattern.h	5 lines
llvm/lib/CodeGen/MachineCombiner.cpp	3 lines
llvm/lib/Target/PowerPC/PPCInstrInfo.h	22 lines
llvm/lib/Target/PowerPC/PPCInstrInfo.cpp	501 lines
llvm/test/CodeGen/PowerPC/register-pressure-reduction.ll	135 lines

Diff 317335

llvm/include/llvm/CodeGen/MachineCombinerPattern.h

 enum class MachineCombinerPattern {
   [...]
   REASSOC_AX_YB,
   REASSOC_XA_BY,
   REASSOC_XA_YB,

   // These are patterns matched by the PowerPC to reassociate FMA chains.
   REASSOC_XY_AMM_BMM,
   REASSOC_XMM_AMM_BMM,

+  // These are patterns matched by the PowerPC to reassociate FMA and FSUB to
+  // reduce register pressure.
+  REASSOC_XY_BCA,
+  REASSOC_XY_BAC,
+
   // These are multiply-add patterns matched by the AArch64 machine combiner.
   MULADDW_OP1,
   MULADDW_OP2,
   MULSUBW_OP1,
   MULSUBW_OP2,
   MULADDWI_OP1,
   MULSUBWI_OP1,
   MULADDX_OP1,
   [...]

llvm/lib/CodeGen/MachineCombiner.cpp

 [...]
 static CombinerObjective getCombinerObjective(MachineCombinerPattern P) {
   switch (P) {
   case MachineCombinerPattern::REASSOC_AX_BY:
   case MachineCombinerPattern::REASSOC_AX_YB:
   case MachineCombinerPattern::REASSOC_XA_BY:
   case MachineCombinerPattern::REASSOC_XA_YB:
   case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
   case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:
     return CombinerObjective::MustReduceDepth;
+  case MachineCombinerPattern::REASSOC_XY_BCA:
+  case MachineCombinerPattern::REASSOC_XY_BAC:
+    return CombinerObjective::MustReduceRegisterPressure;
   default:
     return CombinerObjective::Default;
   }
 }

 /// Estimate the latency of the new and original instruction sequence by summing
 /// up the latencies of the inserted and deleted instructions. This assumes
 /// that the inserted and deleted instructions are dependent instruction chains,
 [...]

llvm/lib/Target/PowerPC/PPCInstrInfo.h

 [...]
 class PPCInstrInfo : public PPCGenInstrInfo {
   [...]
   const unsigned *getStoreOpcodesForSpillArray() const;
   const unsigned *getLoadOpcodesForSpillArray() const;
   unsigned getSpillIndex(const TargetRegisterClass *RC) const;
   int16_t getFMAOpIdxInfo(unsigned Opcode) const;
   void reassociateFMA(MachineInstr &Root, MachineCombinerPattern Pattern,
                       SmallVectorImpl<MachineInstr *> &InsInstrs,
                       SmallVectorImpl<MachineInstr *> &DelInstrs,
                       DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const;
+  bool isLoadFromConstantPool(MachineInstr *I) const;
+  Register
+  generateLoadForNewConst(unsigned Idx, MachineInstr *MI, Type *Ty,
+                          SmallVectorImpl<MachineInstr *> &InsInstrs) const;
+  const Constant *getConstantFromConstantPool(MachineInstr *I) const;
   virtual void anchor();

 protected:
   /// Commutes the operands in the given instruction.
   /// The commutable operands are specified by their indices OpIdx1 and OpIdx2.
   ///
   /// Do not call this method for a non-commutable instruction or for
   /// non-commutable pair of operand indices OpIdx1 and OpIdx2.
   [...]
   void genAlternativeCodeSequence(
       [...]
       SmallVectorImpl<MachineInstr *> &InsInstrs,
       SmallVectorImpl<MachineInstr *> &DelInstrs,
       DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const override;

   /// Return true when there is potentially a faster code sequence for a fma
   /// chain ending in \p Root. All potential patterns are output in the \p
   /// P array.
   bool getFMAPatterns(MachineInstr &Root,
-                      SmallVectorImpl<MachineCombinerPattern> &P) const;
+                      SmallVectorImpl<MachineCombinerPattern> &P,
+                      bool DoRegPressureReduce) const;

   /// Return true when there is potentially a faster code sequence
   /// for an instruction chain ending in <Root>. All potential patterns are
   /// output in the <Pattern> array.
   bool getMachineCombinerPatterns(MachineInstr &Root,
                                   SmallVectorImpl<MachineCombinerPattern> &P,
                                   bool DoRegPressureReduce) const override;

+  /// On PowerPC, we leverage machine combiner pass to reduce register pressure
+  /// when the register pressure is high for one BB.
+  /// Return true if register pressure for \p MBB is high and ABI is supported
+  /// to reduce register pressure. Otherwise return false.
+  bool
+  shouldReduceRegisterPressure(MachineBasicBlock *MBB,
+                               RegisterClassInfo *RegClassInfo) const override;
+
+  /// Fixup the placeholders we put in genAlternativeCodeSequence() for
+  /// MachineCombiner.
+  void
+  finalizeInsInstrs(MachineInstr &Root, MachineCombinerPattern &P,
+                    SmallVectorImpl<MachineInstr *> &InsInstrs) const override;
+
   bool isAssociativeAndCommutative(const MachineInstr &Inst) const override;

   /// On PowerPC, we try to reassociate FMA chain which will increase
   /// instruction size. Set extension resource length limit to 1 for edge case.
   /// Resource Length is calculated by scaled resource usage in getCycles().
   /// Because of the division in getCycles(), it returns different cycles due to
   /// legacy scaled resource usage. So new resource length may be same with
   /// legacy or 1 bigger than legacy.
   [...]

llvm/lib/Target/PowerPC/PPCInstrInfo.cpp

 [...]
 #include "PPCHazardRecognizers.h"
 #include "PPCInstrBuilder.h"
 #include "PPCMachineFunctionInfo.h"
 #include "PPCTargetMachine.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/CodeGen/LiveIntervals.h"
+#include "llvm/CodeGen/MachineConstantPool.h"
 #include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
 #include "llvm/CodeGen/MachineMemOperand.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/PseudoSourceValue.h"
+#include "llvm/CodeGen/RegisterClassInfo.h"
+#include "llvm/CodeGen/RegisterPressure.h"
 #include "llvm/CodeGen/ScheduleDAG.h"
 #include "llvm/CodeGen/SlotIndexes.h"
 #include "llvm/CodeGen/StackMaps.h"
 #include "llvm/MC/MCAsmInfo.h"
 #include "llvm/MC/MCInst.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 [...]
 static cl::opt<bool> VSXSelfCopyCrash("crash-on-ppc-vsx-self-copy",
 cl::desc("Causes the backend to crash instead of generating a nop VSX copy"),
 cl::Hidden);

 static cl::opt<bool>
 UseOldLatencyCalc("ppc-old-latency-calc", cl::Hidden,
   cl::desc("Use the old (incorrect) instruction latency calculation"));

+static cl::opt<float>
+    FMARPFactor("ppc-fma-rp-factor", cl::Hidden, cl::init(1.5),
+                cl::desc("register pressure factor for the transformations."));
+
+static cl::opt<bool> EnableFMARegPressureReduction(
+    "ppc-fma-rp-reduction", cl::Hidden, cl::init(true),
+    cl::desc("enable register pressure reduce in machine combiner pass."));
+
 // Pin the vtable to this file.
 void PPCInstrInfo::anchor() {}

 PPCInstrInfo::PPCInstrInfo(PPCSubtarget &STI)
     : PPCGenInstrInfo(PPC::ADJCALLSTACKDOWN, PPC::ADJCALLSTACKUP,
                       /* CatchRetOpcode */ -1,
                       STI.isPPC64() ? PPC::BLR8 : PPC::BLR),
       Subtarget(STI), RI(STI.getTargetMachine()) {}
 [...]
 bool PPCInstrInfo::isAssociativeAndCommutative(const MachineInstr &Inst) const {
   [...]
   }
 }

 #define InfoArrayIdxFMAInst 0
 #define InfoArrayIdxFAddInst 1
 #define InfoArrayIdxFMULInst 2
 #define InfoArrayIdxAddOpIdx 3
 #define InfoArrayIdxMULOpIdx 4
+#define InfoArrayIdxFSubInst 5
 // Array keeps info for FMA instructions:
 // Index 0(InfoArrayIdxFMAInst): FMA instruction;
-// Index 1(InfoArrayIdxFAddInst): ADD instruction assoaicted with FMA;
-// Index 2(InfoArrayIdxFMULInst): MUL instruction assoaicted with FMA;
+// Index 1(InfoArrayIdxFAddInst): ADD instruction associated with FMA;
+// Index 2(InfoArrayIdxFMULInst): MUL instruction associated with FMA;
 // Index 3(InfoArrayIdxAddOpIdx): ADD operand index in FMA operands;
 // Index 4(InfoArrayIdxMULOpIdx): first MUL operand index in FMA operands;
-//                                second MUL operand index is plus 1.
-static const uint16_t FMAOpIdxInfo[][5] = {
+//                                second MUL operand index is plus 1;
+// Index 5(InfoArrayIdxFSubInst): SUB instruction associated with FMA.
+static const uint16_t FMAOpIdxInfo[][6] = {
     // FIXME: Add more FMA instructions like XSNMADDADP and so on.
-    {PPC::XSMADDADP, PPC::XSADDDP, PPC::XSMULDP, 1, 2},
-    {PPC::XSMADDASP, PPC::XSADDSP, PPC::XSMULSP, 1, 2},
-    {PPC::XVMADDADP, PPC::XVADDDP, PPC::XVMULDP, 1, 2},
-    {PPC::XVMADDASP, PPC::XVADDSP, PPC::XVMULSP, 1, 2},
-    {PPC::FMADD, PPC::FADD, PPC::FMUL, 3, 1},
-    {PPC::FMADDS, PPC::FADDS, PPC::FMULS, 3, 1}};
+    {PPC::XSMADDADP, PPC::XSADDDP, PPC::XSMULDP, 1, 2, PPC::XSSUBDP},
+    {PPC::XSMADDASP, PPC::XSADDSP, PPC::XSMULSP, 1, 2, PPC::XSSUBSP},
+    {PPC::XVMADDADP, PPC::XVADDDP, PPC::XVMULDP, 1, 2, PPC::XVSUBDP},
+    {PPC::XVMADDASP, PPC::XVADDSP, PPC::XVMULSP, 1, 2, PPC::XVSUBSP},
+    {PPC::FMADD, PPC::FADD, PPC::FMUL, 3, 1, PPC::FSUB},
+    {PPC::FMADDS, PPC::FADDS, PPC::FMULS, 3, 1, PPC::FSUBS}};

 // Check if an opcode is a FMA instruction. If it is, return the index in array
// FMAOpIdxInfo. Otherwise, return -1.		// FMAOpIdxInfo. Otherwise, return -1.
int16_t PPCInstrInfo::getFMAOpIdxInfo(unsigned Opcode) const {		int16_t PPCInstrInfo::getFMAOpIdxInfo(unsigned Opcode) const {
for (unsigned I = 0; I < array_lengthof(FMAOpIdxInfo); I++)		for (unsigned I = 0; I < array_lengthof(FMAOpIdxInfo); I++)
if (FMAOpIdxInfo[I][InfoArrayIdxFMAInst] == Opcode)		if (FMAOpIdxInfo[I][InfoArrayIdxFMAInst] == Opcode)
return I;		return I;
return -1;		return -1;
}		}

		// On PowerPC target, we have two kinds of patterns related to FMA:
		// 1: Improve ILP.
// Try to reassociate FMA chains like below:		// Try to reassociate FMA chains like below:
//		//
// Pattern 1:		// Pattern 1:
// A = FADD X, Y (Leaf)		// A = FADD X, Y (Leaf)
// B = FMA A, M21, M22 (Prev)		// B = FMA A, M21, M22 (Prev)
// C = FMA B, M31, M32 (Root)		// C = FMA B, M31, M32 (Root)
// -->		// -->
// A = FMA X, M21, M22		// A = FMA X, M21, M22
// B = FMA Y, M31, M32		// B = FMA Y, M31, M32
// C = FADD A, B		// C = FADD A, B
//		//
// Pattern 2:		// Pattern 2:
// A = FMA X, M11, M12 (Leaf)		// A = FMA X, M11, M12 (Leaf)
// B = FMA A, M21, M22 (Prev)		// B = FMA A, M21, M22 (Prev)
// C = FMA B, M31, M32 (Root)		// C = FMA B, M31, M32 (Root)
// -->		// -->
// A = FMUL M11, M12		// A = FMUL M11, M12
// B = FMA X, M21, M22		// B = FMA X, M21, M22
// D = FMA A, M31, M32		// D = FMA A, M31, M32
// C = FADD B, D		// C = FADD B, D
//		//
// breaking the dependency between A and B, allowing FMA to be executed in		// breaking the dependency between A and B, allowing FMA to be executed in
// parallel (or back-to-back in a pipeline) instead of depending on each other.		// parallel (or back-to-back in a pipeline) instead of depending on each other.
		//
		// 2: Reduce register pressure.
		// Try to reassociate FMA with FSUB and a constant like below:
		// C is a floatint point const.
		foad: Typo: "floating".
		shchenz (Author): Thanks for pointing this out. NFC patch b0869a7d72f121b77d48d6496c6c3f00dd4731da fixes this.
		//
		// Pattern 1:
		// A = FSUB X, Y (Leaf)
		jsji:
		    // A = FSUB X, Y (Leaf)
		    // D = FMA B, C, A (Root)
		    // ->
		    // A = FMA B, Y, -C
		    // C = FMA A, X, C
		// D = FMA B, C, A (Root)
		// -->
		// A = FMA B, Y, -C
		// D = FMA A, X, C
		//
		jsji: This is the same pattern as above, as we can commute the operands of FMA?
		    // A = FSUB X, Y (Leaf)
		    // D = FMA B, A, C (Root)
		    // ->
		    // A = FMA B, Y, -C
		    // C = FMA A, X, C
		// Pattern 2:
		// A = FSUB X, Y (Leaf)
		// D = FMA B, A, C (Root)
		// -->
		// A = FMA B, Y, -C
		// D = FMA A, X, C
		//
		foad: Is it true that these patterns only improve register pressure if X and Y are "live out" of the pattern? Otherwise A could be allocated the same register as X or Y and there would be no increase.
		shchenz (Author): One condition for this transformation is that `FMA` is the only user of `FSUB`, which means we can delete the original `FSUB` and save its definition, the original `A`. After the transformation, `A` must be assigned the same register as `B` and `D`.
		// Before the transformation, A must be assigned with different hardware
		// register with D. After the transformation, A and D must be assigned with
		// same hardware register due to TIE attricute of FMA instructions.
		foad: Typo: "attribute".
		shchenz (Author): Fixed in commit b0869a7d72f121b77d48d6496c6c3f00dd4731da.
		//
bool PPCInstrInfo::getFMAPatterns(		bool PPCInstrInfo::getFMAPatterns(
MachineInstr &Root,		MachineInstr &Root, SmallVectorImpl<MachineCombinerPattern> &Patterns,
SmallVectorImpl<MachineCombinerPattern> &Patterns) const {		bool DoRegPressureReduce) const {
MachineBasicBlock *MBB = Root.getParent();		MachineBasicBlock *MBB = Root.getParent();
const MachineRegisterInfo &MRI = MBB->getParent()->getRegInfo();		const MachineRegisterInfo *MRI = &MBB->getParent()->getRegInfo();
		const TargetRegisterInfo *TRI = &getRegisterInfo();

auto IsAllOpsVirtualReg = [](const MachineInstr &Instr) {		auto IsAllOpsVirtualReg = [](const MachineInstr &Instr) {
for (const auto &MO : Instr.explicit_operands())		for (const auto &MO : Instr.explicit_operands())
if (!(MO.isReg() && Register::isVirtualRegister(MO.getReg())))		if (!(MO.isReg() && Register::isVirtualRegister(MO.getReg())))
return false;		return false;
return true;		return true;
};		};

auto IsReassociableAdd = [&](const MachineInstr &Instr) {		auto IsReassociableAddOrSub = [&](const MachineInstr &Instr,
		unsigned OpType) {
if (Instr.getOpcode() !=		if (Instr.getOpcode() !=
FMAOpIdxInfo[getFMAOpIdxInfo(Root.getOpcode())][InfoArrayIdxFAddInst])		FMAOpIdxInfo[getFMAOpIdxInfo(Root.getOpcode())][OpType])
return false;		return false;

// Instruction can be reassociated.		// Instruction can be reassociated.
// fast math flags may prohibit reassociation.		// fast math flags may prohibit reassociation.
if (!(Instr.getFlag(MachineInstr::MIFlag::FmReassoc) &&		if (!(Instr.getFlag(MachineInstr::MIFlag::FmReassoc) &&
Instr.getFlag(MachineInstr::MIFlag::FmNsz)))		Instr.getFlag(MachineInstr::MIFlag::FmNsz)))
return false;		return false;

// Instruction operands are virtual registers for reassociation.		// Instruction operands are virtual registers for reassociation.
if (!IsAllOpsVirtualReg(Instr))		if (!IsAllOpsVirtualReg(Instr))
return false;		return false;

		// For register pressure reassociation, the FSub must have only one use as
		// we want to delete the sub to save its def.
		if (OpType == InfoArrayIdxFSubInst &&
		!MRI->hasOneNonDBGUse(Instr.getOperand(0).getReg()))
		return false;

return true;		return true;
};		};

auto IsReassociableFMA = [&](const MachineInstr &Instr, int16_t &AddOpIdx,		auto IsReassociableFMA = [&](const MachineInstr &Instr, int16_t &AddOpIdx,
bool IsLeaf) {		int16_t &MulOpIdx, bool IsLeaf) {
int16_t Idx = getFMAOpIdxInfo(Instr.getOpcode());		int16_t Idx = getFMAOpIdxInfo(Instr.getOpcode());
if (Idx < 0)		if (Idx < 0)
return false;		return false;

// Instruction can be reassociated.		// Instruction can be reassociated.
// fast math flags may prohibit reassociation.		// fast math flags may prohibit reassociation.
if (!(Instr.getFlag(MachineInstr::MIFlag::FmReassoc) &&		if (!(Instr.getFlag(MachineInstr::MIFlag::FmReassoc) &&
Instr.getFlag(MachineInstr::MIFlag::FmNsz)))		Instr.getFlag(MachineInstr::MIFlag::FmNsz)))
return false;		return false;

// Instruction operands are virtual registers for reassociation.		// Instruction operands are virtual registers for reassociation.
if (!IsAllOpsVirtualReg(Instr))		if (!IsAllOpsVirtualReg(Instr))
return false;		return false;

		MulOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxMULOpIdx];
if (IsLeaf)		if (IsLeaf)
return true;		return true;

AddOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxAddOpIdx];		AddOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxAddOpIdx];

const MachineOperand &OpAdd = Instr.getOperand(AddOpIdx);		const MachineOperand &OpAdd = Instr.getOperand(AddOpIdx);
MachineInstr *MIAdd = MRI.getUniqueVRegDef(OpAdd.getReg());		MachineInstr *MIAdd = MRI->getUniqueVRegDef(OpAdd.getReg());
// If 'add' operand's def is not in current block, don't do ILP related opt.		// If 'add' operand's def is not in current block, don't do ILP related opt.
if (!MIAdd || MIAdd->getParent() != MBB)		if (!MIAdd || MIAdd->getParent() != MBB)
return false;		return false;

// If this is not Leaf FMA Instr, its 'add' operand should only have one use		// If this is not Leaf FMA Instr, its 'add' operand should only have one use
// as this fma will be changed later.		// as this fma will be changed later.
return IsLeaf ? true : MRI.hasOneNonDBGUse(OpAdd.getReg());		return IsLeaf ? true : MRI->hasOneNonDBGUse(OpAdd.getReg());
};		};

int16_t AddOpIdx = -1;		int16_t AddOpIdx = -1;
		int16_t MulOpIdx = -1;

		bool IsUsedOnceL = false;
		bool IsUsedOnceR = false;
		jsji: Do all candidate checks in isRegisterPressureReduceCandidate, so that we have simple logic like:
		    if (DoRegPressureReduce && isRegisterPressureReduceCandidate(Root, MULInstrL, MULInstrR)) {
		      if (isLoadFromConstantPool(MULInstrL) && IsReassociableAddOrSub(MULInstrR, InfoArrayIdxFSubInst)) {
		        LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
		        Patterns.push_back(MachineCombinerPattern::REASSOC_XY_BCA);
		        return true;
		      }
		      if (isLoadFromConstantPool(MULInstrR) && IsReassociableAddOrSub(MULInstrL, InfoArrayIdxFSubInst)) {
		        LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
		        Patterns.push_back(MachineCombinerPattern::REASSOC_XY_BCA);
		        return true;
		      }
		    }
		MachineInstr *MULInstrL = nullptr;
		MachineInstr *MULInstrR = nullptr;

		jsji: This check should be in `isRegisterPressureReduceCandidate`.
		auto IsRPReductionCandidate = [&]() {
		// Currently, we only support float and double.
		jsji: Uninitialized variable.
		// FIXME: add support for other types.
		unsigned Opcode = Root.getOpcode();
		jsji: This should be in `isRegisterPressureReduceCandidate` as well.
		if (Opcode != PPC::XSMADDASP && Opcode != PPC::XSMADDADP)
		return false;

// Root must be a valid FMA like instruction.		// Root must be a valid FMA like instruction.
if (!IsReassociableFMA(Root, AddOpIdx, false))		// Treat it as leaf as we don't care its add operand.
		if (IsReassociableFMA(Root, AddOpIdx, MulOpIdx, true)) {
		assert((MulOpIdx >= 0) && "mul operand index not right!");
		Register MULRegL = TRI->lookThruCopyLike(
		Root.getOperand(MulOpIdx).getReg(), MRI, &IsUsedOnceL);
		Register MULRegR = TRI->lookThruCopyLike(
		Root.getOperand(MulOpIdx + 1).getReg(), MRI, &IsUsedOnceR);
		if (!Register::isVirtualRegister(MULRegL) ||
		!Register::isVirtualRegister(MULRegR))
		return false;

		MULInstrL = MRI->getVRegDef(MULRegL);
		MULInstrR = MRI->getVRegDef(MULRegR);
		return true;
		}
		return false;
		};

		// Register pressure fma reassociation patterns.
		if (DoRegPressureReduce && IsRPReductionCandidate()) {
		assert((MULInstrL && MULInstrR) && "wrong register preduction candidate!");
		// Register pressure pattern 1
		if (isLoadFromConstantPool(MULInstrL) && IsUsedOnceR &&
		IsReassociableAddOrSub(*MULInstrR, InfoArrayIdxFSubInst)) {
		LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BCA\n");
		Patterns.push_back(MachineCombinerPattern::REASSOC_XY_BCA);
		return true;
		}

		// Register pressure pattern 2
		if ((isLoadFromConstantPool(MULInstrR) && IsUsedOnceL &&
		IsReassociableAddOrSub(*MULInstrL, InfoArrayIdxFSubInst))) {
		LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_BAC\n");
		Patterns.push_back(MachineCombinerPattern::REASSOC_XY_BAC);
		return true;
		}
		}

		// ILP fma reassociation patterns.
		// Root must be a valid FMA like instruction.
		AddOpIdx = -1;
		if (!IsReassociableFMA(Root, AddOpIdx, MulOpIdx, false))
return false;		return false;

assert((AddOpIdx >= 0) && "add operand index not right!");		assert((AddOpIdx >= 0) && "add operand index not right!");

Register RegB = Root.getOperand(AddOpIdx).getReg();		Register RegB = Root.getOperand(AddOpIdx).getReg();
MachineInstr *Prev = MRI.getUniqueVRegDef(RegB);		MachineInstr *Prev = MRI->getUniqueVRegDef(RegB);

// Prev must be a valid FMA like instruction.		// Prev must be a valid FMA like instruction.
AddOpIdx = -1;		AddOpIdx = -1;
if (!IsReassociableFMA(*Prev, AddOpIdx, false))		if (!IsReassociableFMA(*Prev, AddOpIdx, MulOpIdx, false))
return false;		return false;

assert((AddOpIdx >= 0) && "add operand index not right!");		assert((AddOpIdx >= 0) && "add operand index not right!");

Register RegA = Prev->getOperand(AddOpIdx).getReg();		Register RegA = Prev->getOperand(AddOpIdx).getReg();
MachineInstr *Leaf = MRI.getUniqueVRegDef(RegA);		MachineInstr *Leaf = MRI->getUniqueVRegDef(RegA);
AddOpIdx = -1;		AddOpIdx = -1;
if (IsReassociableFMA(*Leaf, AddOpIdx, true)) {		if (IsReassociableFMA(*Leaf, AddOpIdx, MulOpIdx, true)) {
Patterns.push_back(MachineCombinerPattern::REASSOC_XMM_AMM_BMM);		Patterns.push_back(MachineCombinerPattern::REASSOC_XMM_AMM_BMM);
		LLVM_DEBUG(dbgs() << "add pattern REASSOC_XMM_AMM_BMM\n");
return true;		return true;
}		}
if (IsReassociableAdd(*Leaf)) {		if (IsReassociableAddOrSub(*Leaf, InfoArrayIdxFAddInst)) {
Patterns.push_back(MachineCombinerPattern::REASSOC_XY_AMM_BMM);		Patterns.push_back(MachineCombinerPattern::REASSOC_XY_AMM_BMM);
		LLVM_DEBUG(dbgs() << "add pattern REASSOC_XY_AMM_BMM\n");
return true;		return true;
}		}
return false;		return false;
}		}
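
The register pressure patterns described in the comment before `getFMAPatterns` rely on the identity B + C * (X - Y) = (B + Y * (-C)) + X * C. A small numeric sketch, assuming the operand order `FMA(Add, Mul1, Mul2) = Add + Mul1 * Mul2` used in that comment, is shown below. The function names are hypothetical, and `std::fma` stands in for the PowerPC instruction.

```cpp
#include <cassert>
#include <cmath>

// FMA in the operand order of the comment: FMA(Add, Mul1, Mul2) = Add + Mul1 * Mul2.
double fmaAddFirst(double Add, double Mul1, double Mul2) {
  return std::fma(Mul1, Mul2, Add);
}

// Original sequence: A = FSUB X, Y; D = FMA B, C, A.
double originalSequence(double B, double C, double X, double Y) {
  double A = X - Y;
  return fmaAddFirst(B, C, A);
}

// Rewritten sequence: A = FMA B, Y, -C; D = FMA A, X, C.
double rewrittenSequence(double B, double C, double X, double Y) {
  double A = fmaAddFirst(B, Y, -C);
  return fmaAddFirst(A, X, C);
}

// Usage example with exactly representable values, so both forms agree exactly.
void checkPatternEquivalence() {
  assert(originalSequence(1.0, 2.0, 5.0, 3.0) == 5.0);
  assert(rewrittenSequence(1.0, 2.0, 5.0, 3.0) == 5.0);
}
```

For arbitrary inputs the two sequences round at different points and may differ in the last bits, which is consistent with the patch requiring the `FmReassoc` and `FmNsz` flags before reassociating.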

		void PPCInstrInfo::finalizeInsInstrs(
		MachineInstr &Root, MachineCombinerPattern &P,
		SmallVectorImpl<MachineInstr *> &InsInstrs) const {
		assert(!InsInstrs.empty() && "Instructions set to be inserted is empty!");

		MachineFunction *MF = Root.getMF();
		MachineRegisterInfo *MRI = &MF->getRegInfo();
		const TargetRegisterInfo *TRI = &getRegisterInfo();
		MachineConstantPool *MCP = MF->getConstantPool();

		int16_t Idx = getFMAOpIdxInfo(Root.getOpcode());
		if (Idx < 0)
		return;

		uint16_t FirstMulOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxMULOpIdx];

		// For now we only need to fix up placeholder for register pressure reduce
		// patterns.
		Register ConstReg = 0;
		switch (P) {
		case MachineCombinerPattern::REASSOC_XY_BCA:
		ConstReg =
		TRI->lookThruCopyLike(Root.getOperand(FirstMulOpIdx).getReg(), MRI);
		break;
		case MachineCombinerPattern::REASSOC_XY_BAC:
		ConstReg =
		TRI->lookThruCopyLike(Root.getOperand(FirstMulOpIdx + 1).getReg(), MRI);
		break;
		default:
		// Not register pressure reduce patterns.
		return;
		}

		MachineInstr *ConstDefInstr = MRI->getVRegDef(ConstReg);
		// Get const value from const pool.
		const Constant *C = getConstantFromConstantPool(ConstDefInstr);
		assert(isa<llvm::ConstantFP>(C) && "not a valid constant!");

		// Get negative fp const.
		APFloat F1((dyn_cast<ConstantFP>(C))->getValueAPF());
		F1.changeSign();
		jsji: Why can't this be inlined into the next line, in the call to `ConstantFP::get`?
		shchenz (Author): The return type of `F1.changeSign();` is `void`.
		Constant *NegC = ConstantFP::get(dyn_cast<ConstantFP>(C)->getContext(), F1);
		Align Alignment = MF->getDataLayout().getPrefTypeAlign(C->getType());

		// Put negative fp const into constant pool.
		unsigned ConstPoolIdx = MCP->getConstantPoolIndex(NegC, Alignment);

		MachineOperand *Placeholder = nullptr;
		// Record the placeholder PPC::ZERO8 we add in reassociateFMA.
		for (auto *Inst : InsInstrs) {
		for (MachineOperand &Operand : Inst->explicit_operands()) {
		assert(Operand.isReg() && "Invalid instruction in InsInstrs!");
		if (Operand.getReg() == PPC::ZERO8) {
		Placeholder = &Operand;
		break;
		}
		}
		}

		assert(Placeholder && "Placeholder does not exist!");
		jsji: What if we are in a non-assert build and we can't find `Placeholder`?
		shchenz (Author): In a non-assert build we would get a SEGV, because we use `Placeholder` like `Placeholder->setReg(LoadNewConst);`. If we get here, `Placeholder` must not be null, since we only handle register pressure reduction patterns here. Do we need to add `if (!Placeholder) return;`? It seems we do not add this kind of protection in the current LLVM code base, for example:
		    ICmpInst::Predicate Loop::LoopBounds::getCanonicalPredicate() const {
		      BasicBlock *Latch = L.getLoopLatch();
		      assert(Latch && "Expecting valid latch");
		      BranchInst *BI = dyn_cast_or_null<BranchInst>(Latch->getTerminator());

		// Generate instructions to load the const fp from constant pool.
		// We only support PPC64 and medium code model.
		Register LoadNewConst =
		generateLoadForNewConst(ConstPoolIdx, &Root, C->getType(), InsInstrs);

		// Fill the placeholder with the new load from constant pool.
		Placeholder->setReg(LoadNewConst);
		}
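
The placeholder technique used by `finalizeInsInstrs` (emit a sentinel operand first, patch in the real register later) can be sketched with toy types as below. Everything here is hypothetical: `ToyInstr`, `PlaceholderReg`, and `patchPlaceholder` are stand-ins for illustration, not LLVM APIs. Unlike the patch, this sketch stops at the first placeholder and reports failure through its return value, which is one way to address the reviewer's question about non-assert builds.

```cpp
#include <cassert>
#include <vector>

// Sentinel register number; plays the role PPC::ZERO8 has in the patch.
constexpr unsigned PlaceholderReg = 0xFFFFu;

// Toy instruction: just a list of register operands.
struct ToyInstr {
  std::vector<unsigned> Operands;
};

// Replace the first placeholder operand with NewReg.
// Returns false when no placeholder exists instead of dereferencing null.
bool patchPlaceholder(std::vector<ToyInstr> &InsInstrs, unsigned NewReg) {
  for (ToyInstr &Inst : InsInstrs)
    for (unsigned &Op : Inst.Operands)
      if (Op == PlaceholderReg) {
        Op = NewReg;
        return true;
      }
  return false;
}

// Usage example: the first call patches the operand, the second finds nothing.
void checkPatchPlaceholder() {
  std::vector<ToyInstr> Seq = {ToyInstr{{10, 11, PlaceholderReg}},
                               ToyInstr{{12, 10, 13}}};
  assert(patchPlaceholder(Seq, 42));
  assert(Seq[0].Operands[2] == 42);
  assert(!patchPlaceholder(Seq, 43));
}
```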

		bool PPCInstrInfo::shouldReduceRegisterPressure(
		MachineBasicBlock *MBB, RegisterClassInfo *RegClassInfo) const {

		if (!EnableFMARegPressureReduction)
		return false;

		// Currently, we only enable register pressure reducing in machine combiner
		// for: 1: PPC64; 2: Code Model is Medium; 3: Power9 which also has vector
		// support.
		//
		// So we need following instructions to access a TOC entry:
		//
		// %6:g8rc_and_g8rc_nox0 = ADDIStocHA8 $x2, %const.0
		// %7:vssrc = DFLOADf32 target-flags(ppc-toc-lo) %const.0,
		// killed %6:g8rc_and_g8rc_nox0, implicit $x2 :: (load 4 from constant-pool)
		//
		// FIXME: add more supported targets, like Small and Large code model, PPC32,
		// AIX.
		if (!(Subtarget.isPPC64() && Subtarget.hasP9Vector() &&
		Subtarget.getTargetMachine().getCodeModel() == CodeModel::Medium))
		return false;

		const TargetRegisterInfo *TRI = &getRegisterInfo();
		jsji: Split this out into a lambda or function, and just call something like:
		    currMBBPresure = getMBBPressure (MBB...);
		    VSSRCRPSetLimit= ...
		    return currMBBPresure >= VSSRCRPSetLimit * RegPressureFactor
		MachineFunction *MF = MBB->getParent();
		MachineRegisterInfo *MRI = &MF->getRegInfo();

		auto GetMBBPressure = [&](MachineBasicBlock *MBB) -> std::vector<unsigned> {
		RegionPressure Pressure;
		RegPressureTracker RPTracker(Pressure);

		// Initialize the register pressure tracker.
		RPTracker.init(MBB->getParent(), RegClassInfo, nullptr, MBB, MBB->end(),
		/*TrackLaneMasks*/ false, /*TrackUntiedDefs=*/true);

		for (MachineBasicBlock::iterator MII = MBB->instr_end(),
		MIE = MBB->instr_begin();
		MII != MIE; --MII) {
		MachineInstr &MI = *std::prev(MII);
		if (MI.isDebugValue() || MI.isDebugLabel())
		continue;
		RegisterOperands RegOpers;
		RegOpers.collect(MI, TRI, MRI, false, false);
		RPTracker.recedeSkipDebugValues();
		assert(&*RPTracker.getPos() == &MI && "RPTracker sync error!");
		RPTracker.recede(RegOpers);
		}

		// Close the RPTracker to finalize live ins.
		RPTracker.closeRegion();

		return RPTracker.getPressure().MaxSetPressure;
		};

		// For now we only care about float and double type fma.
		unsigned VSSRCLimit = TRI->getRegPressureSetLimit(
		*MBB->getParent(), PPC::RegisterPressureSets::VSSRC);

		// Only reduce register pressure when pressure is high.
		jsji: This should check the Root, look through copies, and check isVirtualRegister. Only return true when all the requirements are met.
		shchenz (Author): The new `IsRPReductionCandidate` already does more than this.
		return GetMBBPressure(MBB)[PPC::RegisterPressureSets::VSSRC] >
		(float)VSSRCLimit * FMARPFactor;
		}
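
`GetMBBPressure` above walks the block from its last instruction to its first and reports the maximum pressure seen. The idea behind such a bottom-up scan can be sketched with a toy model as follows. This is a simplified illustration, not the `RegPressureTracker` algorithm: `ToyInstr`, `maxPressure`, and the single undifferentiated register set are assumptions, and dead definitions, register classes, and lane masks are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy instruction: one optional def (-1 means none) and the registers it reads.
struct ToyInstr {
  int Def;
  std::vector<int> Uses;
};

// Walk the block bottom-up, maintaining the live set, and return its peak size.
unsigned maxPressure(const std::vector<ToyInstr> &Block,
                     const std::set<int> &LiveOut) {
  std::set<int> Live = LiveOut;
  std::size_t Max = Live.size();
  for (auto It = Block.rbegin(); It != Block.rend(); ++It) {
    if (It->Def >= 0)
      Live.erase(It->Def); // above its def, the register is no longer live
    for (int Use : It->Uses)
      Live.insert(Use); // a use keeps the register live above this point
    Max = std::max(Max, Live.size());
  }
  return static_cast<unsigned>(Max);
}

// Usage example: v2 = add v0, v1; v3 = mul v2, v0; v3 is live out.
void checkMaxPressure() {
  std::vector<ToyInstr> Block = {ToyInstr{2, {0, 1}}, ToyInstr{3, {2, 0}}};
  assert(maxPressure(Block, {3}) == 2);
  assert(maxPressure({}, {7}) == 1);
}
```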

		bool PPCInstrInfo::isLoadFromConstantPool(MachineInstr *I) const {
		// I has only one memory operand which is load from constant pool.
		if (!I->hasOneMemOperand())
		return false;

		MachineMemOperand *Op = I->memoperands()[0];
		return Op->isLoad() && Op->getPseudoValue() &&
		Op->getPseudoValue()->kind() == PseudoSourceValue::ConstantPool;
		}

		Register PPCInstrInfo::generateLoadForNewConst(
		unsigned Idx, MachineInstr *MI, Type *Ty,
		SmallVectorImpl<MachineInstr *> &InsInstrs) const {
		// Now we only support PPC64, Medium code model and P9 with vector.
		// We have immutable pattern to access const pool. See function
		// shouldReduceRegisterPressure.
		assert((Subtarget.isPPC64() && Subtarget.hasP9Vector() &&
		Subtarget.getTargetMachine().getCodeModel() == CodeModel::Medium) &&
		"Target not supported!\n");

		MachineFunction *MF = MI->getMF();
		MachineRegisterInfo *MRI = &MF->getRegInfo();

		// Generate ADDIStocHA8
		Register VReg1 = MRI->createVirtualRegister(&PPC::G8RC_and_G8RC_NOX0RegClass);
		MachineInstrBuilder TOCOffset =
		BuildMI(*MF, MI->getDebugLoc(), get(PPC::ADDIStocHA8), VReg1)
		.addReg(PPC::X2)
		.addConstantPoolIndex(Idx);

		assert((Ty->isFloatTy() || Ty->isDoubleTy()) &&
		"Only float and double are supported!");

		unsigned LoadOpcode;
		// Should be float type or double type.
		if (Ty->isFloatTy())
		LoadOpcode = PPC::DFLOADf32;
		else
		LoadOpcode = PPC::DFLOADf64;

		const TargetRegisterClass *RC = MRI->getRegClass(MI->getOperand(0).getReg());
		Register VReg2 = MRI->createVirtualRegister(RC);
		MachineMemOperand *MMO = MF->getMachineMemOperand(
		MachinePointerInfo::getConstantPool(*MF), MachineMemOperand::MOLoad,
		Ty->getScalarSizeInBits() / 8, MF->getDataLayout().getPrefTypeAlign(Ty));

		// Generate Load from constant pool.
		MachineInstrBuilder Load =
		BuildMI(*MF, MI->getDebugLoc(), get(LoadOpcode), VReg2)
		.addConstantPoolIndex(Idx)
		.addReg(VReg1, getKillRegState(true))
		.addMemOperand(MMO);

		Load->getOperand(1).setTargetFlags(PPCII::MO_TOC_LO);

		// Insert the toc load instructions into InsInstrs.
		InsInstrs.insert(InsInstrs.begin(), Load);
		InsInstrs.insert(InsInstrs.begin(), TOCOffset);
		return VReg2;
		}

		// This function returns the const value in constant pool if the \p I is a load
		// from constant pool.
		const Constant *
		jsji: Add documentation about the function and its argument assumptions.
		PPCInstrInfo::getConstantFromConstantPool(MachineInstr *I) const {
		MachineFunction *MF = I->getMF();
		MachineRegisterInfo *MRI = &MF->getRegInfo();
		MachineConstantPool *MCP = MF->getConstantPool();
		assert(I->mayLoad() && "Should be a load instruction.\n");
		for (auto MO : I->uses()) {
		if (!MO.isReg())
		continue;
		Register Reg = MO.getReg();
		if (Reg == 0 || !Register::isVirtualRegister(Reg))
		continue;
		// Find the toc address.
		MachineInstr *DefMI = MRI->getVRegDef(Reg);
		for (auto MO2 : DefMI->uses())
		if (MO2.isCPI())
		return (MCP->getConstants())[MO2.getIndex()].Val.ConstVal;
		}
		return nullptr;
		}

bool PPCInstrInfo::getMachineCombinerPatterns(		bool PPCInstrInfo::getMachineCombinerPatterns(
MachineInstr &Root, SmallVectorImpl<MachineCombinerPattern> &Patterns,		MachineInstr &Root, SmallVectorImpl<MachineCombinerPattern> &Patterns,
bool DoRegPressureReduce) const {		bool DoRegPressureReduce) const {
// Using the machine combiner in this way is potentially expensive, so		// Using the machine combiner in this way is potentially expensive, so
// restrict to when aggressive optimizations are desired.		// restrict to when aggressive optimizations are desired.
if (Subtarget.getTargetMachine().getOptLevel() != CodeGenOpt::Aggressive)		if (Subtarget.getTargetMachine().getOptLevel() != CodeGenOpt::Aggressive)
return false;		return false;

if (getFMAPatterns(Root, Patterns))		if (getFMAPatterns(Root, Patterns, DoRegPressureReduce))
return true;		return true;

return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns,		return TargetInstrInfo::getMachineCombinerPatterns(Root, Patterns,
DoRegPressureReduce);		DoRegPressureReduce);
}		}

void PPCInstrInfo::genAlternativeCodeSequence(		void PPCInstrInfo::genAlternativeCodeSequence(
MachineInstr &Root, MachineCombinerPattern Pattern,		MachineInstr &Root, MachineCombinerPattern Pattern,
SmallVectorImpl<MachineInstr *> &InsInstrs,		SmallVectorImpl<MachineInstr *> &InsInstrs,
SmallVectorImpl<MachineInstr *> &DelInstrs,		SmallVectorImpl<MachineInstr *> &DelInstrs,
DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {		DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
switch (Pattern) {		switch (Pattern) {
case MachineCombinerPattern::REASSOC_XY_AMM_BMM:		case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:		case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:
		case MachineCombinerPattern::REASSOC_XY_BCA:
		case MachineCombinerPattern::REASSOC_XY_BAC:
reassociateFMA(Root, Pattern, InsInstrs, DelInstrs, InstrIdxForVirtReg);		reassociateFMA(Root, Pattern, InsInstrs, DelInstrs, InstrIdxForVirtReg);
break;		break;
default:		default:
// Reassociate default patterns.		// Reassociate default patterns.
TargetInstrInfo::genAlternativeCodeSequence(Root, Pattern, InsInstrs,		TargetInstrInfo::genAlternativeCodeSequence(Root, Pattern, InsInstrs,
DelInstrs, InstrIdxForVirtReg);		DelInstrs, InstrIdxForVirtReg);
break;		break;
}		}
}		}

// Currently, only handle two patterns REASSOC_XY_AMM_BMM and
// REASSOC_XMM_AMM_BMM. See comments for getFMAPatterns.
void PPCInstrInfo::reassociateFMA(		void PPCInstrInfo::reassociateFMA(
MachineInstr &Root, MachineCombinerPattern Pattern,		MachineInstr &Root, MachineCombinerPattern Pattern,
SmallVectorImpl<MachineInstr *> &InsInstrs,		SmallVectorImpl<MachineInstr *> &InsInstrs,
SmallVectorImpl<MachineInstr *> &DelInstrs,		SmallVectorImpl<MachineInstr *> &DelInstrs,
DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {		DenseMap<unsigned, unsigned> &InstrIdxForVirtReg) const {
MachineFunction *MF = Root.getMF();		MachineFunction *MF = Root.getMF();
MachineRegisterInfo &MRI = MF->getRegInfo();		MachineRegisterInfo &MRI = MF->getRegInfo();
		const TargetRegisterInfo *TRI = &getRegisterInfo();
MachineOperand &OpC = Root.getOperand(0);		MachineOperand &OpC = Root.getOperand(0);
Register RegC = OpC.getReg();		Register RegC = OpC.getReg();
const TargetRegisterClass *RC = MRI.getRegClass(RegC);		const TargetRegisterClass *RC = MRI.getRegClass(RegC);
MRI.constrainRegClass(RegC, RC);		MRI.constrainRegClass(RegC, RC);

unsigned FmaOp = Root.getOpcode();		unsigned FmaOp = Root.getOpcode();
int16_t Idx = getFMAOpIdxInfo(FmaOp);		int16_t Idx = getFMAOpIdxInfo(FmaOp);
assert(Idx >= 0 && "Root must be a FMA instruction");		assert(Idx >= 0 && "Root must be a FMA instruction");

		bool IsILPReassociate =
		(Pattern == MachineCombinerPattern::REASSOC_XY_AMM_BMM) ||
		(Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM);

uint16_t AddOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxAddOpIdx];		uint16_t AddOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxAddOpIdx];
uint16_t FirstMulOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxMULOpIdx];		uint16_t FirstMulOpIdx = FMAOpIdxInfo[Idx][InfoArrayIdxMULOpIdx];
MachineInstr *Prev = MRI.getUniqueVRegDef(Root.getOperand(AddOpIdx).getReg());
MachineInstr *Leaf =		MachineInstr *Prev = nullptr;
MRI.getUniqueVRegDef(Prev->getOperand(AddOpIdx).getReg());		MachineInstr *Leaf = nullptr;
uint16_t IntersectedFlags =		switch (Pattern) {
		jsji: I think a switch table here would be clearer.
		    switch (Pattern) {
		    case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
		    case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:
		      IsILPReassociate = true;
		      Prev = MRI.getUniqueVRegDef(Root.getOperand(AddOpIdx).getReg());
		      Leaf = MRI.getUniqueVRegDef(Prev->getOperand(AddOpIdx).getReg());
		      IntersectedFlags = Root.getFlags() & Prev->getFlags() & Leaf->getFlags();
		      break;
		    case MachineCombinerPattern::REASSOC_XY_BAC:
		      ...
		      break;
		    }
Root.getFlags() & Prev->getFlags() & Leaf->getFlags();		default:
		llvm_unreachable("not recognized pattern!");
		case MachineCombinerPattern::REASSOC_XY_AMM_BMM:
		case MachineCombinerPattern::REASSOC_XMM_AMM_BMM:
		Prev = MRI.getUniqueVRegDef(Root.getOperand(AddOpIdx).getReg());
		Leaf = MRI.getUniqueVRegDef(Prev->getOperand(AddOpIdx).getReg());
		break;
		case MachineCombinerPattern::REASSOC_XY_BAC: {
		Register MULReg =
		TRI->lookThruCopyLike(Root.getOperand(FirstMulOpIdx).getReg(), &MRI);
		Leaf = MRI.getVRegDef(MULReg);
		break;
		}
		case MachineCombinerPattern::REASSOC_XY_BCA: {
		Register MULReg = TRI->lookThruCopyLike(
		Root.getOperand(FirstMulOpIdx + 1).getReg(), &MRI);
		jsji: Uninitialized variable.
		Leaf = MRI.getVRegDef(MULReg);
		break;
		}
		}

		uint16_t IntersectedFlags = 0;
		if (IsILPReassociate)
		IntersectedFlags = Root.getFlags() & Prev->getFlags() & Leaf->getFlags();
		else
		IntersectedFlags = Root.getFlags() & Leaf->getFlags();

auto GetOperandInfo = [&](const MachineOperand &Operand, Register &Reg,		auto GetOperandInfo = [&](const MachineOperand &Operand, Register &Reg,
bool &KillFlag) {		bool &KillFlag) {
Reg = Operand.getReg();		Reg = Operand.getReg();
MRI.constrainRegClass(Reg, RC);		MRI.constrainRegClass(Reg, RC);
KillFlag = Operand.isKill();		KillFlag = Operand.isKill();
};		};

auto GetFMAInstrInfo = [&](const MachineInstr &Instr, Register &MulOp1,		auto GetFMAInstrInfo = [&](const MachineInstr &Instr, Register &MulOp1,
Register &MulOp2, bool &MulOp1KillFlag,		Register &MulOp2, Register &AddOp,
bool &MulOp2KillFlag) {		bool &MulOp1KillFlag, bool &MulOp2KillFlag,
		bool &AddOpKillFlag) {
GetOperandInfo(Instr.getOperand(FirstMulOpIdx), MulOp1, MulOp1KillFlag);		GetOperandInfo(Instr.getOperand(FirstMulOpIdx), MulOp1, MulOp1KillFlag);
GetOperandInfo(Instr.getOperand(FirstMulOpIdx + 1), MulOp2, MulOp2KillFlag);		GetOperandInfo(Instr.getOperand(FirstMulOpIdx + 1), MulOp2, MulOp2KillFlag);
		GetOperandInfo(Instr.getOperand(AddOpIdx), AddOp, AddOpKillFlag);
};		};

Register RegM11, RegM12, RegX, RegY, RegM21, RegM22, RegM31, RegM32;		Register RegM11, RegM12, RegX, RegY, RegM21, RegM22, RegM31, RegM32, RegA11,
		RegA21, RegB;
bool KillX = false, KillY = false, KillM11 = false, KillM12 = false,		bool KillX = false, KillY = false, KillM11 = false, KillM12 = false,
KillM21 = false, KillM22 = false, KillM31 = false, KillM32 = false;		KillM21 = false, KillM22 = false, KillM31 = false, KillM32 = false,
		KillA11 = false, KillA21 = false, KillB = false;

		GetFMAInstrInfo(Root, RegM31, RegM32, RegB, KillM31, KillM32, KillB);

GetFMAInstrInfo(Root, RegM31, RegM32, KillM31, KillM32);		if (IsILPReassociate)
GetFMAInstrInfo(*Prev, RegM21, RegM22, KillM21, KillM22);		GetFMAInstrInfo(*Prev, RegM21, RegM22, RegA21, KillM21, KillM22, KillA21);

if (Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM) {		if (Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM) {
GetFMAInstrInfo(*Leaf, RegM11, RegM12, KillM11, KillM12);		GetFMAInstrInfo(*Leaf, RegM11, RegM12, RegA11, KillM11, KillM12, KillA11);
GetOperandInfo(Leaf->getOperand(AddOpIdx), RegX, KillX);		GetOperandInfo(Leaf->getOperand(AddOpIdx), RegX, KillX);
} else if (Pattern == MachineCombinerPattern::REASSOC_XY_AMM_BMM) {		} else if (Pattern == MachineCombinerPattern::REASSOC_XY_AMM_BMM) {
GetOperandInfo(Leaf->getOperand(1), RegX, KillX);		GetOperandInfo(Leaf->getOperand(1), RegX, KillX);
GetOperandInfo(Leaf->getOperand(2), RegY, KillY);		GetOperandInfo(Leaf->getOperand(2), RegY, KillY);
		} else {
		// Get FSUB instruction info.
		GetOperandInfo(Leaf->getOperand(1), RegX, KillX);
		GetOperandInfo(Leaf->getOperand(2), RegY, KillY);
}		}

// Create new virtual registers for the new results instead of		// Create new virtual registers for the new results instead of
// recycling legacy ones because the MachineCombiner's computation of the		// recycling legacy ones because the MachineCombiner's computation of the
// critical path requires a new register definition rather than an existing		// critical path requires a new register definition rather than an existing
// one.		// one.
		// For register pressure reassociation, we only need to create one virtual
		// register for the new fma.
Register NewVRA = MRI.createVirtualRegister(RC);		Register NewVRA = MRI.createVirtualRegister(RC);
InstrIdxForVirtReg.insert(std::make_pair(NewVRA, 0));		InstrIdxForVirtReg.insert(std::make_pair(NewVRA, 0));

Register NewVRB = MRI.createVirtualRegister(RC);		Register NewVRB = 0;
		if (IsILPReassociate) {
		NewVRB = MRI.createVirtualRegister(RC);
InstrIdxForVirtReg.insert(std::make_pair(NewVRB, 1));		InstrIdxForVirtReg.insert(std::make_pair(NewVRB, 1));
		}

Register NewVRD = 0;		Register NewVRD = 0;
if (Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM) {		if (Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM) {
NewVRD = MRI.createVirtualRegister(RC);		NewVRD = MRI.createVirtualRegister(RC);
InstrIdxForVirtReg.insert(std::make_pair(NewVRD, 2));		InstrIdxForVirtReg.insert(std::make_pair(NewVRD, 2));
}		}

auto AdjustOperandOrder = [&](MachineInstr *MI, Register RegAdd, bool KillAdd,		auto AdjustOperandOrder = [&](MachineInstr *MI, Register RegAdd, bool KillAdd,
Register RegMul1, bool KillRegMul1,		Register RegMul1, bool KillRegMul1,
Register RegMul2, bool KillRegMul2) {		Register RegMul2, bool KillRegMul2) {
MI->getOperand(AddOpIdx).setReg(RegAdd);		MI->getOperand(AddOpIdx).setReg(RegAdd);
MI->getOperand(AddOpIdx).setIsKill(KillAdd);		MI->getOperand(AddOpIdx).setIsKill(KillAdd);
MI->getOperand(FirstMulOpIdx).setReg(RegMul1);		MI->getOperand(FirstMulOpIdx).setReg(RegMul1);
MI->getOperand(FirstMulOpIdx).setIsKill(KillRegMul1);		MI->getOperand(FirstMulOpIdx).setIsKill(KillRegMul1);
MI->getOperand(FirstMulOpIdx + 1).setReg(RegMul2);		MI->getOperand(FirstMulOpIdx + 1).setReg(RegMul2);
MI->getOperand(FirstMulOpIdx + 1).setIsKill(KillRegMul2);		MI->getOperand(FirstMulOpIdx + 1).setIsKill(KillRegMul2);
};		};

if (Pattern == MachineCombinerPattern::REASSOC_XY_AMM_BMM) {		MachineInstrBuilder NewARegPressure, NewCRegPressure;
		switch (Pattern) {
		default:
		llvm_unreachable("not recognized pattern!");
		case MachineCombinerPattern::REASSOC_XY_AMM_BMM: {
		jsji (inline, done): Switch table please as well.
// Create new instructions for insertion.		// Create new instructions for insertion.
MachineInstrBuilder MINewB =		MachineInstrBuilder MINewB =
BuildMI(*MF, Prev->getDebugLoc(), get(FmaOp), NewVRB)		BuildMI(*MF, Prev->getDebugLoc(), get(FmaOp), NewVRB)
.addReg(RegX, getKillRegState(KillX))		.addReg(RegX, getKillRegState(KillX))
.addReg(RegM21, getKillRegState(KillM21))		.addReg(RegM21, getKillRegState(KillM21))
.addReg(RegM22, getKillRegState(KillM22));		.addReg(RegM22, getKillRegState(KillM22));
MachineInstrBuilder MINewA =		MachineInstrBuilder MINewA =
BuildMI(*MF, Root.getDebugLoc(), get(FmaOp), NewVRA)		BuildMI(*MF, Root.getDebugLoc(), get(FmaOp), NewVRA)
[16 lines not shown]
setSpecialOperandAttr(*MINewA, IntersectedFlags);		setSpecialOperandAttr(*MINewA, IntersectedFlags);
setSpecialOperandAttr(*MINewB, IntersectedFlags);		setSpecialOperandAttr(*MINewB, IntersectedFlags);
setSpecialOperandAttr(*MINewC, IntersectedFlags);		setSpecialOperandAttr(*MINewC, IntersectedFlags);

// Record new instructions for insertion.		// Record new instructions for insertion.
InsInstrs.push_back(MINewA);		InsInstrs.push_back(MINewA);
InsInstrs.push_back(MINewB);		InsInstrs.push_back(MINewB);
InsInstrs.push_back(MINewC);		InsInstrs.push_back(MINewC);
} else if (Pattern == MachineCombinerPattern::REASSOC_XMM_AMM_BMM) {		break;
		}
		case MachineCombinerPattern::REASSOC_XMM_AMM_BMM: {
assert(NewVRD && "new FMA register not created!");		assert(NewVRD && "new FMA register not created!");
// Create new instructions for insertion.		// Create new instructions for insertion.
MachineInstrBuilder MINewA =		MachineInstrBuilder MINewA =
BuildMI(*MF, Leaf->getDebugLoc(),		BuildMI(*MF, Leaf->getDebugLoc(),
get(FMAOpIdxInfo[Idx][InfoArrayIdxFMULInst]), NewVRA)		get(FMAOpIdxInfo[Idx][InfoArrayIdxFMULInst]), NewVRA)
.addReg(RegM11, getKillRegState(KillM11))		.addReg(RegM11, getKillRegState(KillM11))
.addReg(RegM12, getKillRegState(KillM12));		.addReg(RegM12, getKillRegState(KillM12));
MachineInstrBuilder MINewB =		MachineInstrBuilder MINewB =
[25 lines not shown]
setSpecialOperandAttr(*MINewD, IntersectedFlags);		setSpecialOperandAttr(*MINewD, IntersectedFlags);
setSpecialOperandAttr(*MINewC, IntersectedFlags);		setSpecialOperandAttr(*MINewC, IntersectedFlags);

// Record new instructions for insertion.		// Record new instructions for insertion.
InsInstrs.push_back(MINewA);		InsInstrs.push_back(MINewA);
InsInstrs.push_back(MINewB);		InsInstrs.push_back(MINewB);
InsInstrs.push_back(MINewD);		InsInstrs.push_back(MINewD);
InsInstrs.push_back(MINewC);		InsInstrs.push_back(MINewC);
		break;
		}
		case MachineCombinerPattern::REASSOC_XY_BAC:
		case MachineCombinerPattern::REASSOC_XY_BCA: {
		Register VarReg;
		bool KillVarReg = false;
		if (Pattern == MachineCombinerPattern::REASSOC_XY_BCA) {
		VarReg = RegM31;
		KillVarReg = KillM31;
		} else {
		VarReg = RegM32;
		KillVarReg = KillM32;
		}
		// We don't want to get the negative const from the constant pool too early,
		// as the created entry will not be deleted even if it has no users. Since
		// all operands of Leaf and Root are virtual registers, we use the zero
		// register here as a placeholder. When the InsInstrs is selected in
		// MachineCombiner, we call finalizeInsInstrs to replace the zero register
		// with a virtual register which is a load from the constant pool.
		NewARegPressure = BuildMI(*MF, Root.getDebugLoc(), get(FmaOp), NewVRA)
		.addReg(RegB, getKillRegState(KillB))
		.addReg(RegY, getKillRegState(KillY))
		.addReg(PPC::ZERO8);
		NewCRegPressure = BuildMI(*MF, Root.getDebugLoc(), get(FmaOp), RegC)
		.addReg(NewVRA, getKillRegState(true))
		.addReg(RegX, getKillRegState(KillX))
		.addReg(VarReg, getKillRegState(KillVarReg));
		// For now, we only support xsmaddadp/xsmaddasp. Their add operands are
		// both at index 1, so no adjustment is needed.
		// FIXME: when support for more fma instructions, like fma/fmas, is added,
		// adjust the operand index here.
		break;
		}
		}

		if (!IsILPReassociate) {
		setSpecialOperandAttr(*NewARegPressure, IntersectedFlags);
		setSpecialOperandAttr(*NewCRegPressure, IntersectedFlags);

		InsInstrs.push_back(NewARegPressure);
		InsInstrs.push_back(NewCRegPressure);
}		}

assert(!InsInstrs.empty() &&		assert(!InsInstrs.empty() &&
"Insertion instructions set should not be empty!");		"Insertion instructions set should not be empty!");

// Record old instructions for deletion.		// Record old instructions for deletion.
DelInstrs.push_back(Leaf);		DelInstrs.push_back(Leaf);
		if (IsILPReassociate)
DelInstrs.push_back(Prev);		DelInstrs.push_back(Prev);
DelInstrs.push_back(&Root);		DelInstrs.push_back(&Root);
}		}
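
The function above stamps every new instruction with IntersectedFlags, computed as Root & Leaf for the register-pressure patterns and Root & Prev & Leaf for the ILP patterns. A minimal standalone sketch of that intersection follows. The flag names and bit values are hypothetical stand-ins for illustration, not LLVM's actual MIFlag encoding.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical fast-math flag bits (illustration only, not LLVM's values).
enum : std::uint16_t {
  FlagReassoc = 1u << 0,
  FlagNoSignedZeros = 1u << 1,
  FlagNoNaNs = 1u << 2,
};

// Register-pressure patterns replace only Leaf and Root, so the new
// instructions may keep only the flags that both of them carried.
constexpr std::uint16_t intersectFlags(std::uint16_t Root, std::uint16_t Leaf) {
  return static_cast<std::uint16_t>(Root & Leaf);
}

// ILP patterns also replace Prev, so its flags join the intersection.
constexpr std::uint16_t intersectFlags(std::uint16_t Root, std::uint16_t Prev,
                                       std::uint16_t Leaf) {
  return static_cast<std::uint16_t>(Root & Prev & Leaf);
}
```

Calling intersectFlags(FlagReassoc | FlagNoNaNs, FlagReassoc | FlagNoSignedZeros) keeps only FlagReassoc: a flag that any replaced instruction lacked is dropped, so the rewritten sequence never claims a guarantee that one of its inputs did not have.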

// Detect 32 -> 64-bit extensions where we may reuse the low sub-register.		// Detect 32 -> 64-bit extensions where we may reuse the low sub-register.
bool PPCInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,		bool PPCInstrInfo::isCoalescableExtInstr(const MachineInstr &MI,
Register &SrcReg, Register &DstReg,		Register &SrcReg, Register &DstReg,
unsigned &SubIdx) const {		unsigned &SubIdx) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
[4,434 lines not shown]

llvm/test/CodeGen/PowerPC/register-pressure-reduction.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -verify-machineinstrs -ppc-asm-full-reg-names -O3 < %s \
				; RUN: -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 | FileCheck %s
				; RUN: llc -verify-machineinstrs -ppc-asm-full-reg-names -O3 < %s \
				; RUN: -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr8 | FileCheck %s --check-prefix=CHECK-P8
				; RUN: llc -verify-machineinstrs -ppc-asm-full-reg-names -ppc-fma-rp-factor=0.0 -O3 < %s \
				; RUN: -mtriple=powerpc64le-unknown-linux-gnu -mcpu=pwr9 | FileCheck %s --check-prefix=CHECK-FMA

				@global_val = external global float, align 4

				jsji (inline, done): Can we add a test to see whether we are able to reuse the constant, e.g. when the negative constant is already in the constant pool?
				define float @foo_float(float %0, float %1, float %2, float %3) {
				; CHECK-LABEL: foo_float:
				; CHECK: # %bb.0:
				; CHECK-NEXT: addis r3, r2, .LCPI0_0@toc@ha
				; CHECK-NEXT: xsmulsp f1, f2, f1
				; CHECK-NEXT: xssubsp f0, f3, f4
				; CHECK-NEXT: lfs f2, .LCPI0_0@toc@l(r3)
				; CHECK-NEXT: xsmaddasp f1, f0, f2
				; CHECK-NEXT: blr
				;
				; CHECK-P8-LABEL: foo_float:
				; CHECK-P8: # %bb.0:
				; CHECK-P8-NEXT: xsmulsp f1, f2, f1
				; CHECK-P8-NEXT: addis r3, r2, .LCPI0_0@toc@ha
				; CHECK-P8-NEXT: xssubsp f0, f3, f4
				; CHECK-P8-NEXT: lfs f2, .LCPI0_0@toc@l(r3)
				; CHECK-P8-NEXT: xsmaddasp f1, f0, f2
				; CHECK-P8-NEXT: blr
				;
				; CHECK-FMA-LABEL: foo_float:
				; CHECK-FMA: # %bb.0:
				; CHECK-FMA-NEXT: addis r3, r2, .LCPI0_0@toc@ha
				; CHECK-FMA-NEXT: xsmulsp f1, f2, f1
				; CHECK-FMA-NEXT: lfs f0, .LCPI0_0@toc@l(r3)
				; CHECK-FMA-NEXT: addis r3, r2, .LCPI0_1@toc@ha
				; CHECK-FMA-NEXT: lfs f2, .LCPI0_1@toc@l(r3)
				; CHECK-FMA-NEXT: xsmaddasp f1, f4, f2
				; CHECK-FMA-NEXT: xsmaddasp f1, f3, f0
				; CHECK-FMA-NEXT: blr
				%5 = fmul reassoc nsz float %1, %0
				%6 = fsub reassoc nsz float %2, %3
				%7 = fmul reassoc nsz float %6, 0x3DB2533FE0000000
				%8 = fadd reassoc nsz float %7, %5
				ret float %8
				}

				jsji (inline, not done): If we have 2 patterns, we should test both patterns.
				shchenz (author, inline, done): I tried this before. It seems that in ISel the constant operand is put as the second mul operand of the FMA instruction. I also could not create a MIR test case: because of the constant-pool inputs, I ran into syntax errors.
				define double @foo_double(double %0, double %1, double %2, double %3) {
				; CHECK-LABEL: foo_double:
				; CHECK: # %bb.0:
				; CHECK-NEXT: xsmuldp f1, f2, f1
				; CHECK-NEXT: xssubdp f0, f3, f4
				; CHECK-NEXT: addis r3, r2, .LCPI1_0@toc@ha
				; CHECK-NEXT: lfd f2, .LCPI1_0@toc@l(r3)
				; CHECK-NEXT: xsmaddadp f1, f0, f2
				; CHECK-NEXT: blr
				;
				; CHECK-P8-LABEL: foo_double:
				; CHECK-P8: # %bb.0:
				; CHECK-P8-NEXT: xsmuldp f1, f2, f1
				; CHECK-P8-NEXT: addis r3, r2, .LCPI1_0@toc@ha
				; CHECK-P8-NEXT: xssubdp f0, f3, f4
				; CHECK-P8-NEXT: lfd f2, .LCPI1_0@toc@l(r3)
				; CHECK-P8-NEXT: xsmaddadp f1, f0, f2
				; CHECK-P8-NEXT: blr
				;
				; CHECK-FMA-LABEL: foo_double:
				; CHECK-FMA: # %bb.0:
				; CHECK-FMA-NEXT: addis r3, r2, .LCPI1_0@toc@ha
				; CHECK-FMA-NEXT: xsmuldp f1, f2, f1
				; CHECK-FMA-NEXT: lfd f0, .LCPI1_0@toc@l(r3)
				; CHECK-FMA-NEXT: addis r3, r2, .LCPI1_1@toc@ha
				; CHECK-FMA-NEXT: lfd f2, .LCPI1_1@toc@l(r3)
				; CHECK-FMA-NEXT: xsmaddadp f1, f4, f2
				; CHECK-FMA-NEXT: xsmaddadp f1, f3, f0
				; CHECK-FMA-NEXT: blr
				%5 = fmul reassoc nsz double %1, %0
				%6 = fsub reassoc nsz double %2, %3
				%7 = fmul reassoc nsz double %6, 0x3DB2533FE68CADDE
				%8 = fadd reassoc nsz double %7, %5
				ret double %8
				}
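
The CHECK-FMA lines for foo_double replace the subtract-then-FMA sequence with two chained xsmaddadp instructions on the same accumulator register f1. A minimal sketch of the two shapes in plain C++, using std::fma as a stand-in for the PowerPC instruction. The function names are hypothetical, and the two forms are only guaranteed to agree bit for bit when every intermediate is exactly representable, which is why the IR in the test carries `reassoc nsz`.

```cpp
#include <cassert>
#include <cmath>

// Default codegen shape: T = A*B + (X - Y)*M, computed as
// one multiply, one subtract, and one FMA. Mul and Sub are live together.
double originalForm(double A, double B, double X, double Y, double M) {
  double Mul = A * B;
  double Sub = X - Y;
  return std::fma(Sub, M, Mul);
}

// Register-pressure shape: the same value as two chained FMAs that keep
// accumulating into one variable, using the negated constant for Y.
double reassociatedForm(double A, double B, double X, double Y, double M) {
  double Acc = A * B;
  Acc = std::fma(Y, -M, Acc); // Acc = A*B - Y*M
  Acc = std::fma(X, M, Acc);  // Acc = A*B - Y*M + X*M
  return Acc;
}
```

With A=2, B=3, X=5, Y=3, M=0.5 every step is exact and both functions return 7.0. The second form needs the extra constant -M, which matches the additional constant-pool load visible in the CHECK-FMA output.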

				define float @foo_float_reuse_const(float %0, float %1, float %2, float %3) {
				; CHECK-LABEL: foo_float_reuse_const:
				; CHECK: # %bb.0:
				; CHECK-NEXT: addis r3, r2, .LCPI2_0@toc@ha
				; CHECK-NEXT: xsmulsp f1, f2, f1
				; CHECK-NEXT: xssubsp f0, f3, f4
				; CHECK-NEXT: lfs f3, .LCPI2_0@toc@l(r3)
				; CHECK-NEXT: addis r3, r2, .LCPI2_1@toc@ha
				; CHECK-NEXT: xsmaddasp f1, f0, f3
				; CHECK-NEXT: lfs f0, .LCPI2_1@toc@l(r3)
				; CHECK-NEXT: addis r3, r2, .LC0@toc@ha
				; CHECK-NEXT: ld r3, .LC0@toc@l(r3)
				; CHECK-NEXT: xsmulsp f0, f2, f0
				; CHECK-NEXT: stfs f0, 0(r3)
				; CHECK-NEXT: blr
				;
				; CHECK-P8-LABEL: foo_float_reuse_const:
				; CHECK-P8: # %bb.0:
				; CHECK-P8-NEXT: xsmulsp f1, f2, f1
				; CHECK-P8-NEXT: addis r3, r2, .LCPI2_0@toc@ha
				; CHECK-P8-NEXT: addis r4, r2, .LCPI2_1@toc@ha
				; CHECK-P8-NEXT: xssubsp f0, f3, f4
				; CHECK-P8-NEXT: lfs f3, .LCPI2_0@toc@l(r3)
				; CHECK-P8-NEXT: lfs f4, .LCPI2_1@toc@l(r4)
				; CHECK-P8-NEXT: addis r3, r2, .LC0@toc@ha
				; CHECK-P8-NEXT: ld r3, .LC0@toc@l(r3)
				; CHECK-P8-NEXT: xsmaddasp f1, f0, f3
				; CHECK-P8-NEXT: xsmulsp f0, f2, f4
				; CHECK-P8-NEXT: stfsx f0, 0, r3
				; CHECK-P8-NEXT: blr
				;
				; CHECK-FMA-LABEL: foo_float_reuse_const:
				; CHECK-FMA: # %bb.0:
				; CHECK-FMA-NEXT: addis r3, r2, .LCPI2_0@toc@ha
				; CHECK-FMA-NEXT: xsmulsp f1, f2, f1
				; CHECK-FMA-NEXT: lfs f0, .LCPI2_0@toc@l(r3)
				; CHECK-FMA-NEXT: addis r3, r2, .LCPI2_1@toc@ha
				; CHECK-FMA-NEXT: lfs f5, .LCPI2_1@toc@l(r3)
				; CHECK-FMA-NEXT: addis r3, r2, .LC0@toc@ha
				; CHECK-FMA-NEXT: ld r3, .LC0@toc@l(r3)
				; CHECK-FMA-NEXT: xsmaddasp f1, f4, f5
				; CHECK-FMA-NEXT: xsmaddasp f1, f3, f0
				; CHECK-FMA-NEXT: xsmulsp f0, f2, f5
				; CHECK-FMA-NEXT: stfs f0, 0(r3)
				; CHECK-FMA-NEXT: blr
				%5 = fmul reassoc nsz float %1, %0
				%6 = fsub reassoc nsz float %2, %3
				%7 = fmul reassoc nsz float %6, 0x3DB2533FE0000000
				%8 = fadd reassoc nsz float %7, %5
				%9 = fmul reassoc nsz float %1, 0xBDB2533FE0000000
				store float %9, float* @global_val, align 4
				ret float %8
				}
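
In foo_float_reuse_const the two literals 0x3DB2533FE0000000 and 0xBDB2533FE0000000 differ only in the top bit. LLVM IR prints float constants as the hexadecimal bit pattern of a double, so the second literal is the exact negation of the first, and the CHECK-FMA output loads it once (into f5) for both the new FMA and the extra multiply. A small sketch that checks this relationship, assuming IEEE-754 binary64 doubles; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "sketch assumes a 64-bit double");

// Reinterpret a 64-bit pattern as a double without violating aliasing rules.
double fromBits(std::uint64_t Bits) {
  double Value;
  std::memcpy(&Value, &Bits, sizeof Value);
  return Value;
}

// Negating an IEEE-754 value flips only the sign bit (bit 63).
std::uint64_t negatedBits(std::uint64_t Bits) {
  return Bits ^ (std::uint64_t{1} << 63);
}
```

Here negatedBits(0x3DB2533FE0000000) yields 0xBDB2533FE0000000, and fromBits of the second pattern equals the negation of fromBits of the first.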


Diff 317335
