This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC][WIP] Provide context-sensitive cost to the Greedy Allocator to favour splitting over CSR first use
Abandoned (Public)

Authored by nemanjai on Dec 2 2016, 3:48 PM.

Details

Reviewers

qcolombet
wschmidt
echristo
kbarton
amehsan
hfinkel

Summary

PowerPC is a target that can see great benefits from shrink-wrapping due to the high overhead of non-leaf calls (spilling SPRs and CSRs). However, the register allocator currently favours allocating a CSR over splitting a region. In effect, this results in parameter registers being copied into CSRs in the entry block whenever the parameter is live across any call in the function. This, of course, disables shrink-wrapping, because the save point must then be the entry block.
Just providing a cost in TargetRegisterInfo::getCSRFirstUseCost() is not all that effective in alleviating this issue because it is a global setting and there are situations where allocating a CSR is better (i.e. less spilling around calls).

As the title mentions, this is a work in progress. The cost function probably needs a fair bit of tuning, both for the actual cost values and for the conditions under which a non-zero cost is returned. The current value is arbitrary (it just seems to work well) and the condition is simple:

The live range of the value spans any blocks that have no calls
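For illustration, here is a standalone sketch of that condition. It models a basic block as a hypothetical `Block` struct that carries only a has-call flag, not LLVM's `MachineBasicBlock`. It reuses the arbitrary `1 << 15` constant from the patch's `costOfFirstCSRForBlocks`. This is a simplified model of the heuristic, not the patch code itself.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a MachineBasicBlock: only records whether the
// block contains a call instruction.
struct Block {
  bool HasCall;
};

// Simplified model of the heuristic: if any block in the live range has no
// call, make the first use of a CSR expensive; otherwise return the default
// cost of 0 (i.e. keep the target-independent behaviour).
int64_t costOfFirstCSRForBlocks(const std::vector<Block> &UseBlocks) {
  // Same arbitrary constant as the patch uses.
  const int64_t CostOnBlockWithoutCall = int64_t(1) << 15;
  for (const Block &B : UseBlocks)
    if (!B.HasCall)
      return CostOnBlockWithoutCall;
  return 0;
}
```

If every use block contains a call, the cost stays 0 and the default CSR behaviour is kept. A single call-free block raises the cost to 32768.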

However, with this very coarse cost function, we see a doubling in the number of shrink-wrapped functions as well as significant performance improvement. There are 4 SPEC INT benchmarks that degrade by 0.07% - 1.07%. Everything else improves by 2.7% - 4.7% (INT) and 0.16% - 15.0% (FP).

So hopefully, through this patch, we can have a productive discussion of how to proceed with a more fine-grained cost model for allocating CSRs.

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai updated this revision to Diff 80139. Dec 2 2016, 3:48 PM

nemanjai retitled this revision from to [PowerPC][WIP] Provide context-sensitive cost to the Greedy Allocator to favour splitting over CSR first use.

nemanjai updated this object.

nemanjai added reviewers: qcolombet, kbarton, echristo, hfinkel, amehsan, wschmidt.

nemanjai set the repository for this revision to rL LLVM.

nemanjai added subscribers: sfertile, lei, syzaara, jtony.

Herald added subscribers: mehdi_amini, MatzeB. Dec 2 2016, 3:48 PM

In general this patch might be a good idea. I'm currently wondering what the impact is on other targets if we just make the implementation the same for all architectures? Can you explore the tradeoffs that we're looking at on, say, aarch64 and x86? Basically I'm not sure why we allocate a CSR rather than split the range - I'm guessing register pressure, but that's less of an issue I think.

Quentin?

-eric

Hi,

In general this patch might be a good idea.

I agree. Moreover, I would do something similar for the spilling case and make sure the cost is different for copies and spills.

I'm currently wondering what the impact is on other targets if we just make the implementation the same for all architectures?

My guess is this will depend on whether or not shrink-wrapping is enabled and on the cost of the spill/split instructions for the target. That being said, it's possible that we could derive something sensible for all architectures.

Basically I'm not sure why we allocate a CSR rather than split the range - I'm guessing register pressure, but that's less of an issue I think.

We allocate a CSR rather than splitting a live-range when the cost model says it is more expensive to split... that helps right :).
To be concrete here, IIRC, the cost model assumes that the split points we add won't be coalesced and thus accounts for the frequencies of the copies at the related spot. If that cost is bigger than a spill of the CSR in the entry block, we spill the CSR. E.g., we trade a spill of a CSR against copies in a loop.

Cheers,
-Quentin
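Here is a rough sketch of the trade-off Quentin describes. It assumes a simplified model in which every split point becomes an uncoalesced copy that costs its block's frequency, and the CSR alternative costs a fixed `CSRCost` (one save in the entry block). The helper names and frequency values are hypothetical. The real logic lives in `calculateRegionSplitCost` and `tryAssignCSRFirstTime` in RegAllocGreedy.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical block-frequency units (entry block == 16 in the usage below).
using Freq = uint64_t;

// Cost of a split candidate: one uncoalesced copy per split point, each
// weighted by the frequency of the block it lands in.
Freq splitCost(const std::vector<Freq> &CopyBlockFreqs) {
  Freq Sum = 0;
  for (Freq F : CopyBlockFreqs)
    Sum += F;
  return Sum;
}

// A split is only taken if it is strictly cheaper than CSRCost; otherwise
// the allocator falls back to using the callee-saved register.
bool preferCSR(const std::vector<Freq> &CopyBlockFreqs, Freq CSRCost) {
  return splitCost(CopyBlockFreqs) >= CSRCost;
}
```

Take `CSRCost = 16`, the cost of one entry-block spill. Two copies in a loop of frequency 128 give a split cost of 256, so the CSR wins. Two copies in cold blocks of frequency 4 cost 8, so splitting wins.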

Also, it's possible that the right (and simple) fix is to have one CSRCost for splitting and one for spilling. Indeed, IIRC, right now the returned cost for both spilling and splitting is the sum of the frequencies where the split/spill happens. Since spills and copies have different costs, we may want different comparisons.

E.g., CSRCostForSpill = 5 (OK to trade against more than 5 executed spills/reloads) but CSRCostForSplit = 20 (OK to trade against more than 20 executed copies).
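Quentin's suggestion could look roughly like this, with two independent budgets. One is compared against executed spills/reloads and the other against executed copies. Only the names `CSRCostForSpill` and `CSRCostForSplit` come from the comment above. The enum, struct, and function are hypothetical and are not an existing LLVM interface.

```cpp
#include <cassert>
#include <cstdint>

// What the allocator would do instead of taking a fresh CSR.
enum class Alternative { Split, Spill };

// Hypothetical per-target budgets, using the values from the comment above.
struct CSRCosts {
  uint64_t CSRCostForSpill = 5;  // trade a CSR for more than 5 spills/reloads
  uint64_t CSRCostForSplit = 20; // trade a CSR for more than 20 copies
};

// Use the CSR only when the alternative executes more instructions than the
// budget for that kind of instruction allows.
bool useCSR(Alternative Alt, uint64_t ExecutedInsts, const CSRCosts &Costs) {
  uint64_t Budget = Alt == Alternative::Spill ? Costs.CSRCostForSpill
                                              : Costs.CSRCostForSplit;
  return ExecutedInsts > Budget;
}
```

With these budgets, 10 executed copies still favour splitting, while 10 executed spills/reloads favour the CSR.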

In D27366#617875, @qcolombet wrote:

We allocate a CSR rather than splitting a live-range when the cost model says it is more expensive to split... that helps right :).
To be concrete here, IIRC, the cost model assumes that the split points we add won't be coalesced and thus accounts for the frequencies of the copies at the related spot. If that cost is bigger than a spill of the CSR in the entry block, we spill the CSR. E.g., we trade a spill of a CSR against copies in a loop.

That's a lot more "heuristic" than I'm comfortable with in general, but OK. :)

Next step:

I think we should work on a more unified interface and then get this committed.

Thanks!

-eric

Ah, raising the CSR cost high enough and using the existing splitting is indeed a simpler approach. I didn't think of the idea even after you hinted at it in the PR. I just tried https://reviews.llvm.org/D27366 and it covers the simple testcases I have. I can help evaluate the performance on x86 and see how the overall performance looks. Thanks!

Ah, very sorry, I intended to reply to https://reviews.llvm.org/D27596.

I proposed a patch in https://reviews.llvm.org/D27596 to address the same issue and Quentin suggested D27366 is simpler. I agree it is a better approach. The patch already covers my simple testcases when I tried it on x86. I can help to evaluate its performance on x86 side with internal benchmarks.

Forgive me if the answer to this was meant to be evident from the comments, but I am not clear on it:
Do we agree that the cost should be context sensitive or are we suggesting that a global cost should be provided by the target for the spill/split (i.e. two separate costs that are properties of the subtarget irrespective of the MachineFunction)?

Just providing a cost in TargetRegisterInfo::getCSRFirstUseCost() is not all that effective in alleviating this issue because it is a global setting and there are situations where allocating a CSR is better (i.e. less spilling around calls).

Could you be more specific about the case where providing a global cost in TargetRegisterInfo::getCSRFirstUseCost() is not enough? I am asking because, if there is a case where using a CSR is better than splitting or spilling, maybe calculateRegionSplitCost or calcSpillCost should be adjusted, not CSRCost. I think CSRCost only relates to the prologue/epilogue, so it is relatively fixed?

In D27366#621609, @wmi wrote:

Could you be more specific about the case when providing a global cost in TargetRegisterInfo::getCSRFirstUseCost() is not enough? I am asking because if there is case using CSR is better than splitting or spilling, maybe calculateRegionSplitCost or calcSpillCost are to be adjusted, not CSRCost. I think CSRCost is only related with prolog/epilog so it is relatively fixed?

In general, yes. If you have a lot of high-register-pressure computation, then you want to use a CSR instead of repeatedly spilling. In other words, shrink-wrapping aside, using the CSR is only cost-equivalent to the first spill; if you would need multiple spills, then using a CSR is better.

Rebasing to current trunk.

gberry added a subscriber: gberry. Jan 20 2017, 9:38 AM

nemanjai, do you have any update on the patch? We also depend on it on the x86 side, so we want to know the status.

Short version is that I am stuck in terms of how to proceed as none of my numerous attempts to do this differently bore fruit.

Long version:
Unfortunately, I have not been able to make much progress either by providing two separate context-insensitive costs or by asking for direction on llvm-dev. I'm not sure if the goal for X86 is the same as it is for us on PPC, but I'll try to clarify what we're after.

The motivation for this patch was to facilitate shrink-wrapping (of course, that much is obvious). On PPC, the most prevalent reason by far that functions don't get shrink-wrapped (and could be) is that a CSR is used too early with respect to blocks that contain calls. From the standpoint of shrink-wrapping, the register allocator would ideally allocate a CSR in a block only if:

It is dominated by a block that contains a call (note: dominated, not necessarily properly dominated, so the block may contain the call itself)
Avoiding CSR use would require a spill
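To make the two conditions concrete, here is a sketch over a toy CFG in which each block stores its immediate dominator and a has-call flag. These are hypothetical stand-ins for `MachineDominatorTree` and a scan for call instructions. Dominance here is non-strict, so a block that itself contains a call counts. Whether avoiding the CSR would require a spill is simply passed in, since computing that is the allocator's job.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy CFG node: the index of its immediate dominator (-1 for
// the entry block) and whether the block contains a call.
struct ToyBlock {
  int IDom;
  bool HasCall;
};

// Non-strict dominance: walk from Idx up the dominator tree to the entry,
// starting with Idx itself, and report whether any block on the way has a
// call.
bool dominatedByCall(const std::vector<ToyBlock> &Blocks, int Idx) {
  for (int B = Idx; B != -1; B = Blocks[B].IDom) {
    assert(B >= 0 && B < static_cast<int>(Blocks.size()));
    if (Blocks[B].HasCall)
      return true;
  }
  return false;
}

// Both conditions must hold; whether avoiding the CSR would need a spill is
// supplied by the caller rather than computed here.
bool csrUseIsShrinkWrapFriendly(const std::vector<ToyBlock> &Blocks, int Idx,
                                bool AvoidingCSRNeedsSpill) {
  return dominatedByCall(Blocks, Idx) && AvoidingCSRNeedsSpill;
}
```

Consider a CFG where the entry block (0) branches to a block with a call (1), which is followed by a block without one (2), and separately to another call-free block (3). Block 2 qualifies because block 1 dominates it. Block 3 never does, because no block on its dominator path contains a call.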

This patch does not quite achieve this, but none of the other approaches I've tried come as close as this patch does. I suppose that the solution would be to add these heuristics into the register allocator somehow, but I don't understand the RA code well enough to do this. I also don't know if anyone agrees with that assessment :).

Another observation I made is that on PPC, this is particularly prevalent in functions that take parameters and have calls. The pattern in those functions is that a parameter register is copied into a CSR in the entry block so that the value is available in the same register in all blocks where it is used (regardless of whether the block contains a call). So perhaps it would be possible to tweak whatever associates a physical (parameter) register with a virtual register, so that the parameter register is associated with different vregs in blocks that have calls and in those that don't.

Hi Nemanja,

I tried your testcase with my experimental patch (available here: http://lists.llvm.org/pipermail/llvm-dev/2017-February/109977.html) and saw that the testcase was not shrink-wrapped (command used: clang -O2 -target powerpc64le-grtev4-linux-gnu -S 1.c).

The existing register splitting for live ranges across function calls took effect. After splitting, the sub-vreg live across the call was assigned a CSR, and the other sub-vreg got a non-CSR register. This all works as we expect. However, during tryHintRecoloring, the two sub-vregs are coalesced again.

I added some simple logic to tryHintRecoloring: if we are about to switch from a non-CSR register to a CSR, we only do the recoloring when the recoloring cost difference is larger than CSRCost. In other words, to justify the planned recoloring, the benefit must be larger than the potential negative impact on shrink-wrapping.

With the change, the testcase is shrink-wrapped. I've attached the changed experimental patch.

Thanks,
Wei.

csrsplit.v2.txt (4 KB)
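Wei's guard in tryHintRecoloring can be sketched as a predicate like the one below. It assumes the recoloring benefit is measured as a reduction in copy cost; that is an assumption about how the cost is measured. The function and parameter names are hypothetical, and this is not the attached patch.

```cpp
#include <cassert>
#include <cstdint>

// Decide whether to recolor a split product to match its copy hint.
// OldCost/NewCost: hypothetical copy costs before/after recoloring.
// ToUnusedCSR: recoloring would move the value from a non-CSR into a CSR
// that is not otherwise used in the function.
bool shouldRecolor(uint64_t OldCost, uint64_t NewCost, bool ToUnusedCSR,
                   uint64_t CSRCost) {
  if (NewCost >= OldCost)
    return false; // recoloring does not remove any copy cost
  uint64_t Benefit = OldCost - NewCost;
  // Moving into a fresh CSR may block shrink-wrapping, so the saving must
  // exceed CSRCost before the earlier split decision is undone.
  if (ToUnusedCSR)
    return Benefit > CSRCost;
  return true;
}
```

Take `CSRCost = 16`. A recoloring that saves 50 still goes ahead even when it lands in an unused CSR. One that saves only 10 is rejected in that case, which preserves the split and the shrink-wrapping opportunity.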

In D27366#674099, @wmi wrote:

With the change, the testcase is shrinkwrap optimized. I attach the changed experimental patch.

Hi Wei,
Sorry for the delay in getting back to you. I think we should try to make some progress here, and the patch you proposed seems less intrusive and a reasonable first step forward. Do you want to post it for review? I can then abandon this patch.
My experiments with your proposed patch definitely increase the number of functions we shrink-wrap and provide a performance improvement (albeit slightly smaller than with this context-sensitive patch). However, I think your patch is definitely needed so we can prevent regalloc from reversing decisions it has made.
So what I propose is that we move forward with your proposed patch and abandon this.

For further improvements, I added a (quick-and-dirty) analysis to output all functions that we could theoretically shrink-wrap but don't. As you may expect, this produces a whole lot of data. I plan to spend some time distilling that data into usable reproducers so we can see if we are missing many opportunities that would actually be profitable to shrink-wrap.

Rebasing to ToT.

thegameg added a subscriber: thegameg. Apr 21 2017, 2:02 PM

Is there any update on the patch? I tried it on AArch64 with SPEC2006 and found an 8% performance improvement in astar, due to shrink-wrapping in one of the hot functions, with no regressions in other benchmarks. I'd be happy to help evaluate this change on the AArch64 side.

Kindly ping.
Is there any plan for the patch? If nobody is actively pursuing this, I can take a closer look, if this is the right fix.

junbuml mentioned this in D34608: [WIP][AArch64] Increase CSR cost when defering use of CSR is preferred. Jun 25 2017, 8:08 PM

We decided not to pursue this further as there are better approaches to aiding shrink wrapping.

Herald added a subscriber: llvm-commits. Feb 6 2018, 4:23 AM

wuzish mentioned this in D32201: [RALLOC] Increase CSR cost in RegAllocGreedy to favour splitting over CSR first use. Aug 8 2019, 2:56 AM

jedilyn added a subscriber: jedilyn. Aug 12 2019, 11:50 PM

Herald added a project: Restricted Project. Aug 12 2019, 11:50 PM

Herald added subscribers: shchenz, jsji, MaskRay.

wuzish mentioned this in D66576: [Regalloc][WIP] Increase CSR cost in RegAllocGreedy to favour splitting/spill over CSR first use. Aug 22 2019, 1:13 AM

Revision Contents

Path                                              Size
include/llvm/Target/TargetLowering.h              16 lines
lib/CodeGen/RegAllocGreedy.cpp                    19 lines
lib/Target/PowerPC/PPCISelLowering.h               3 lines
lib/Target/PowerPC/PPCISelLowering.cpp            30 lines
test/CodeGen/PowerPC/branch-opt.ll                 8 lines
test/CodeGen/PowerPC/ppc-shrink-wrapping.ll       19 lines
test/CodeGen/PowerPC/ppc64-func-desc-hoist.ll      8 lines
test/CodeGen/PowerPC/tail-dup-break-cfg.ll        16 lines
test/CodeGen/PowerPC/tail-dup-layout.ll           35 lines

Diff 92186

include/llvm/Target/TargetLowering.h

@@ 44 lines hidden @@
 #include "llvm/IR/Instructions.h"
 #include "llvm/IR/Type.h"
 #include "llvm/MC/MCRegisterInfo.h"
 #include "llvm/Support/AtomicOrdering.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Target/TargetCallingConv.h"
 #include "llvm/Target/TargetMachine.h"
+#include "llvm/CodeGen/LiveInterval.h"
 #include <algorithm>
 #include <cassert>
 #include <climits>
 #include <cstdint>
 #include <iterator>
 #include <map>
 #include <string>
 #include <utility>
@@ 3,112 lines hidden @@ virtual void AdjustInstrPostInstrSelection(MachineInstr &MI,
                                              SDNode *Node) const;
 
   /// If this function returns true, SelectionDAGBuilder emits a
   /// LOAD_STACK_GUARD node when it is lowering Intrinsic::stackprotector.
   virtual bool useLoadStackGuardNode() const {
     return false;
   }
 
+  /// The target can specify whether a callee-saved register should be used
+  /// rather than spliting the live range. Default behaviour is yes.
+  virtual bool useCSRInsteadOfSplit(const LiveInterval &LI) const {
+    return true;
+  }
+
+  /// Target specific cost of using a callee-saved register for the first time
+  /// when the live range of the value spans the passed blocks. A target should
+  /// only return a value other than zero here if splitting might be preferred
+  /// to a CSR use (i.e. it would return false from useCSRInsteadOfSplit for the
+  /// respective live range).
+  virtual int64_t costOfFirstCSRForBlocks(
+      const SmallVectorImpl<MachineBasicBlock*> &UseMBBs) const {
+    return 0;
+  }
   /// Lower TLS global address SDNode for target independent emulated TLS model.
   virtual SDValue LowerToTLSEmulatedModel(const GlobalAddressSDNode *GA,
                                           SelectionDAG &DAG) const;
 
   // seteq(x, 0) -> truncate(srl(ctlz(zext(x)), log2(#bits)))
   // If we're comparing for equality to zero and isCtlzFast is true, expose the
   // fact that this can be implemented as a ctlz/srl pair, so that the dag
   // combiner can fold the new nodes.
@@ 18 lines hidden @@

lib/CodeGen/RegAllocGreedy.cpp

@@ 42 lines hidden @@
 #include "llvm/Support/BranchProbability.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/Timer.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetInstrInfo.h"
 #include "llvm/Target/TargetSubtargetInfo.h"
+#include "llvm/Target/TargetLowering.h"
 #include <queue>
 
 using namespace llvm;
 
 #define DEBUG_TYPE "regalloc"
 
 STATISTIC(NumGlobalSplits, "Number of split global live ranges");
 STATISTIC(NumLocalSplits, "Number of split local live ranges");
@@ 56 lines hidden @@ class RAGreedy : public MachineFunctionPass,
   typedef SmallSet<unsigned, 16> SmallVirtRegSet;
 
   // context
   MachineFunction *MF;
 
   // Shortcuts to some useful interface.
   const TargetInstrInfo *TII;
   const TargetRegisterInfo *TRI;
+  const TargetLowering *TLI;
   RegisterClassInfo RCI;
 
   // analyses
   SlotIndexes *Indexes;
   MachineBlockFrequencyInfo *MBFI;
   MachineDominatorTree *DomTree;
   MachineLoopInfo *Loops;
   MachineOptimizationRemarkEmitter *ORE;
@@ 2,195 lines hidden @@ if (getStage(VirtReg) == RS_Spill && VirtReg.isSpillable()) {
     CostPerUseLimit = 1;
     return 0;
   }
   if (getStage(VirtReg) < RS_Split) {
     // We choose pre-splitting over using the CSR for the first time if
     // the cost of splitting is lower than CSRCost.
     SA->analyze(&VirtReg);
     unsigned NumCands = 0;
-    BlockFrequency BestCost = CSRCost; // Don't modify CSRCost.
+    BlockFrequency BestCost;
+    bool GetCostFromTarget = !TLI->useCSRInsteadOfSplit(VirtReg);
+    if (GetCostFromTarget) {
+      ArrayRef<SplitAnalysis::BlockInfo> UseBlocks = SA->getUseBlocks();
+      SmallVector<MachineBasicBlock*, 4> UseMBBs;
+      for (int i = 0, e = UseBlocks.size(); i < e; ++i)
+        UseMBBs.push_back(UseBlocks[i].MBB);
+      BestCost = TLI->costOfFirstCSRForBlocks(UseMBBs);
+    } else
+      BestCost = CSRCost;
+
     unsigned BestCand = calculateRegionSplitCost(VirtReg, Order, BestCost,
                                                  NumCands, true /*IgnoreCSR*/);
     if (BestCand == NoCand)
       // Use the CSR if we can't find a region split below CSRCost.
       return PhysReg;
 
     // Perform the actual pre-splitting.
     doRegionSplit(VirtReg, BestCand, false/*HasCompact*/, NewVRegs);
@@ 202 lines hidden @@ unsigned RAGreedy::selectOrSplitImpl(LiveInterval &VirtReg,
                                      unsigned Depth) {
   unsigned CostPerUseLimit = ~0u;
   // First try assigning a free register.
   AllocationOrder Order(VirtReg.reg, *VRM, RegClassInfo, Matrix);
   if (unsigned PhysReg = tryAssign(VirtReg, Order, NewVRegs)) {
     // When NewVRegs is not empty, we may have made decisions such as evicting
     // a virtual register, go with the earlier decisions and use the physical
     // register.
-    if (CSRCost.getFrequency() && isUnusedCalleeSavedReg(PhysReg) &&
-        NewVRegs.empty()) {
+    if ((CSRCost.getFrequency() || !TLI->useCSRInsteadOfSplit(VirtReg)) &&
+        isUnusedCalleeSavedReg(PhysReg) && NewVRegs.empty()) {
       unsigned CSRReg = tryAssignCSRFirstTime(VirtReg, Order, PhysReg,
                                               CostPerUseLimit, NewVRegs);
       if (CSRReg || !NewVRegs.empty())
         // Return now if we decide to use a CSR or create new vregs due to
         // pre-splitting.
         return CSRReg;
     } else
       return PhysReg;
@@ 136 lines hidden @@
 
 bool RAGreedy::runOnMachineFunction(MachineFunction &mf) {
   DEBUG(dbgs() << "******** GREEDY REGISTER ALLOCATION ********\n"
                << "********** Function: " << mf.getName() << '\n');
 
   MF = &mf;
   TRI = MF->getSubtarget().getRegisterInfo();
   TII = MF->getSubtarget().getInstrInfo();
+  TLI = MF->getSubtarget().getTargetLowering();
   RCI.runOnMachineFunction(mf);
 
   EnableLocalReassign = EnableLocalReassignment ||
                         MF->getSubtarget().enableRALocalReassignment(
                             MF->getTarget().getOptLevel());
 
   if (VerifyEnabled)
     MF->verify(this, "Before greedy register allocator");
@@ 38 lines hidden @@

lib/Target/PowerPC/PPCISelLowering.h

@@ 1,007 lines hidden @@ SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
                              int &RefinementSteps) const override;
     unsigned combineRepeatedFPDivisors() const override;
 
     CCAssignFn *useFastISelCCs(unsigned Flag) const;
 
     SDValue
     combineElementTruncationToVectorTruncation(SDNode *N,
                                                DAGCombinerInfo &DCI) const;
+    virtual bool useCSRInsteadOfSplit(const LiveInterval &LI) const override;
+    virtual int64_t costOfFirstCSRForBlocks(
+        const SmallVectorImpl<MachineBasicBlock*> &UseMBBs) const override;
   };
 
   namespace PPC {
 
     FastISel *createFastISel(FunctionLoweringInfo &FuncInfo,
                              const TargetLibraryInfo *LibInfo);
 
   } // end namespace PPC
@@ 28 lines hidden @@

lib/Target/PowerPC/PPCISelLowering.cpp

@@ 12,931 lines hidden @@ default:
     // false. Examples: f16, f80.
     return false;
   case MVT::f32:
   case MVT::f64:
   case MVT::ppcf128:
     return Imm.isPosZero();
   }
 }
 
+// If the live interval can be spilled, we'd prefer to do so (subject to cost
+// function below).
+bool PPCTargetLowering::useCSRInsteadOfSplit(const LiveInterval &LI) const {
+  return !LI.isSpillable();
+}
+
+static bool hasCall(MachineBasicBlock *MBB) {
+  for (MachineInstr &MI : *MBB)
+    if (MI.isCall())
+      return true;
+  return false;
+}
+
+// Return a high cost for the first use of a callee-saved register if the live
+// range of the value spans basic blocks in which we'd prefer not to use one.
+// This will often defer use of a CSR and give shrink-wrapping an opportunity
+// to sink/hoist the save/restore from entry/exit blocks respectively.
+int64_t PPCTargetLowering::costOfFirstCSRForBlocks(
+    const SmallVectorImpl<MachineBasicBlock*> &UseMBBs) const {
+  // Make uses of callee-saved registers expensive if any blocks in the live
+  // range have no calls. The actual cost may need some adjustment - this is a
+  // rather arbitrary number that seems to produce good performance improvement.
+  const int CostOnBlockWithoutCall = 1 << 15;
+  for (MachineBasicBlock *It : UseMBBs) {
+    if (!hasCall(It))
+      return CostOnBlockWithoutCall;
+  }
+  return TargetLowering::costOfFirstCSRForBlocks(UseMBBs);
+}

test/CodeGen/PowerPC/branch-opt.ll

 ; RUN: llc -verify-machineinstrs < %s -march=ppc32 | FileCheck %s
 
 target datalayout = "E-p:32:32"
 target triple = "powerpc-apple-darwin8.7.0"
 
 ;CHECK-LABEL: foo:
 ; There are 4 inner loops (%bb, %bb12, %bb25, %bb38) that all exit to %cond_next48
 ; The last (whichever it is) should have a fallthrough exit, and the other three
 ; need an unconditional branch. No other block should have an unconditional
 ; branch to cond_next48
 ; One of the blocks ends up with a loop exit block that gets a tail-duplicated copy
 ; of %cond_next48, so there should only be two unconditional branches.
 
-;CHECK: b LBB0_13
-;CHECK: b LBB0_13
-;CHECK-NOT: b LBB0_13
-;CHECK: LBB0_13: ; %cond_next48
+;CHECK: b LBB0_20
+;CHECK: b LBB0_20
+;CHECK-NOT: b LBB0_18
+;CHECK: LBB0_18: ; %cond_next48
 
 define void @foo(i32 %W, i32 %X, i32 %Y, i32 %Z) {
 entry:
   %tmp1 = and i32 %W, 1 ; <i32> [#uses=1]
   %tmp1.upgrd.1 = icmp eq i32 %tmp1, 0 ; <i1> [#uses=1]
   br i1 %tmp1.upgrd.1, label %cond_false, label %bb5
 bb: ; preds = %bb5, %bb
   %indvar77 = phi i32 [ %indvar.next78, %bb ], [ 0, %bb5 ] ; <i32> [#uses=1]
@@ 58 lines hidden @@

test/CodeGen/PowerPC/ppc-shrink-wrapping.ll

	Show First 20 Lines • Show All 85 Lines • ▼ Show 20 Lines
	; DISABLE: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]			; DISABLE: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; Loop preheader			; Loop preheader
	; CHECK-DAG: li [[SUM:[0-9]+]], 0			; CHECK-DAG: li [[SUM:[0-9]+]], 0
	; CHECK-DAG: li [[IV:[0-9]+]], 10			; CHECK-DAG: li [[IV:[0-9]+]], 10
	;			;
	; Loop body			; Loop body
	; CHECK: .[[LOOP:LBB[0-9_]+]]: # %for.body			; CHECK: .[[LOOP:LBB[0-9_]+]]: # %for.body
				; CHECK: mr [[NEWSUM:[0-9]+]], [[SUM]]
	; CHECK: bl something			; CHECK: bl something
	; CHECK-DAG: addi [[IV]], [[IV]], -1			; CHECK-DAG: addi [[IV]], [[IV]], -1
	; CHECK-DAG: add [[SUM]], 3, [[SUM]]			; CHECK-DAG: add [[NEWSUM]], 3, [[NEWSUM]]
	; CHECK-NEXT: cmplwi [[IV]], 0			; CHECK: cmplwi [[IV]], 0
	; CHECK-NEXT: bne 0, .[[LOOP]]			; CHECK-NEXT: bne 0, .[[LOOP]]
	;			;
	; Next BB.			; Next BB.
	; CHECK: slwi 3, [[SUM]], 3			; CHECK: slwi 3, [[SUM]], 3
	;			;
	; Jump to epilogue.			; Jump to epilogue.
	; DISABLE: b .[[EPILOG_BB:LBB[0-9_]+]]			; DISABLE: b .[[EPILOG_BB:LBB[0-9_]+]]
	;			;
	▲ Show 20 Lines • Show All 48 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: freqSaveAndRestoreOutsideLoop2:			; CHECK-LABEL: freqSaveAndRestoreOutsideLoop2:
	; Prologue code.			; Prologue code.
	; Make sure we save the link register before the call			; Make sure we save the link register before the call
	; CHECK: mflr {{[0-9]+}}			; CHECK: mflr {{[0-9]+}}
	;			;
	; Loop preheader			; Loop preheader
	; CHECK-DAG: li [[SUM:[0-9]+]], 0			; CHECK-DAG: li [[SUM:[0-9]+]], 0
	; CHECK-DAG: li [[IV:[0-9]+]], 10			; CHECK-DAG: li [[IV:[0-9]+]], 10
	;			;
	; Loop body			; Loop body
	; CHECK: .[[LOOP:LBB[0-9_]+]]: # %for.body			; CHECK: .[[LOOP:LBB[0-9_]+]]: # %for.body
	; CHECK: bl something			; CHECK: bl something
				; CHECK: mr [[NEWRET:[0-9]+]], 3
	; CHECK-DAG: addi [[IV]], [[IV]], -1			; CHECK-DAG: addi [[IV]], [[IV]], -1
	; CHECK-DAG: add [[SUM]], 3, [[SUM]]			; CHECK-DAG: add [[SUM]], [[NEWRET]], [[SUM]]
	; CHECK-NEXT: cmplwi [[IV]], 0			; CHECK-NEXT: cmplwi [[IV]], 0
	; CHECK-NEXT: bne 0, .[[LOOP]]			; CHECK-NEXT: bne 0, .[[LOOP]]
	;			;
	; Next BB			; Next BB
	; CHECK: %for.end			; CHECK: %for.end
	; CHECK: mtlr {{[0-9]+}}			; CHECK: mtlr {{[0-9]+}}
	; CHECK-NEXT: blr			; CHECK-NEXT: blr
	define i32 @freqSaveAndRestoreOutsideLoop2(i32 %cond) {			define i32 @freqSaveAndRestoreOutsideLoop2(i32 %cond) {
	...
	; DISABLE-NEXT: std			; DISABLE-NEXT: std
	; DISABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]			; DISABLE-NEXT: beq 0, .[[ELSE_LABEL:LBB[0-9_]+]]
	;			;
	; CHECK: bl somethingElse			; CHECK: bl somethingElse
	;			;
	; Loop preheader			; Loop preheader
	; CHECK-DAG: li [[SUM:[0-9]+]], 0			; CHECK-DAG: li [[SUM:[0-9]+]], 0
	; CHECK-DAG: li [[IV:[0-9]+]], 10			; CHECK-DAG: li [[IV:[0-9]+]], 10
	;			;
	; Loop body			; Loop body
	; CHECK: .[[LOOP:LBB[0-9_]+]]: # %for.body			; CHECK: .[[LOOP:LBB[0-9_]+]]: # %for.body
				; CHECK: mr [[NEWSUM:[0-9]+]], [[SUM]]
	; CHECK: bl something			; CHECK: bl something
	; CHECK-DAG: addi [[IV]], [[IV]], -1			; CHECK-DAG: addi [[IV]], [[IV]], -1
	; CHECK-DAG: add [[SUM]], 3, [[SUM]]			; CHECK-DAG: add [[NEWSUM]], 3, [[NEWSUM]]
	; CHECK-NEXT: cmplwi [[IV]], 0			; CHECK-NEXT: cmplwi [[IV]], 0
	; CHECK-NEXT: bne 0, .[[LOOP]]			; CHECK: bne 0, .[[LOOP]]
	;			;
	; Next BB.			; Next BB.
	; slwi 3, [[SUM]], 3			; slwi 3, [[SUM]], 3
	;			;
	; DISABLE: b .[[EPILOG_BB:LBB[0-9_]+]]			; DISABLE: b .[[EPILOG_BB:LBB[0-9_]+]]
	;			;
	; DISABLE: .[[ELSE_LABEL]]: # %if.else			; DISABLE: .[[ELSE_LABEL]]: # %if.else
	; Shift second argument by one and store into returned register.			; Shift second argument by one and store into returned register.
	; DISABLE: slwi 3, 4, 1			; DISABLE: slwi 3, 4, 1
	; DISABLE: .[[EPILOG_BB]]: # %if.end			; DISABLE: .[[EPILOG_BB]]: # %if.end
	...

test/CodeGen/PowerPC/ppc64-func-desc-hoist.ll

	...
	; INVFUNCDESC-DAG: mr 11, [[REG2]]			; INVFUNCDESC-DAG: mr 11, [[REG2]]
	; INVFUNCDESC-DAG: mr 2, [[REG1]]			; INVFUNCDESC-DAG: mr 2, [[REG1]]
	; INVFUNCDESC: bctrl			; INVFUNCDESC: bctrl
	; INVFUNCDESC-NEXT: ld 2, 40(1)			; INVFUNCDESC-NEXT: ld 2, 40(1)

	; NONINVFUNCDESC-LABEL: @bar			; NONINVFUNCDESC-LABEL: @bar
	; NONINVFUNCDESC: %for.body			; NONINVFUNCDESC: %for.body
	; NONINVFUNCDESC: std 2, 40(1)			; NONINVFUNCDESC: std 2, 40(1)
	; NONINVFUNCDESC-DAG: ld 3, 0(30)			; NONINVFUNCDESC-DAG: ld 5, 0(3)
	; NONINVFUNCDESC-DAG: ld 11, 16(30)			; NONINVFUNCDESC-DAG: ld 11, 16(3)
	; NONINVFUNCDESC-DAG: ld 2, 8(30)			; NONINVFUNCDESC-DAG: ld 2, 8(3)
	; NONINVFUNCDESC: mtctr 3			; NONINVFUNCDESC: mtctr 5
	; NONINVFUNCDESC: bctrl			; NONINVFUNCDESC: bctrl
	; NONINVFUNCDESC-NEXT: ld 2, 40(1)			; NONINVFUNCDESC-NEXT: ld 2, 40(1)

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ]			%i.02 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
	tail call void %callee.knr.cast() #0			tail call void %callee.knr.cast() #0
	%inc = add nuw nsw i32 %i.02, 1			%inc = add nuw nsw i32 %i.02, 1
	%exitcond = icmp eq i32 %inc, 1600000000			%exitcond = icmp eq i32 %inc, 1600000000
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body

	for.end: ; preds = %for.body			for.end: ; preds = %for.body
	ret void			ret void
	}			}

	attributes #0 = { nounwind }			attributes #0 = { nounwind }
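For readers less familiar with the 64-bit ELFv1 ABI: the three loads in the NONINVFUNCDESC checks read a function descriptor, which is three consecutive doublewords holding the entry-point address (offset 0, moved to CTR for `bctrl`), the callee's TOC base (offset 8, loaded into r2) and an environment pointer (offset 16, loaded into r11). The updated checks expect the descriptor to be addressed through r3 instead of the callee-saved r30. A minimal ctypes sketch of that layout follows; the class and field names are illustrative and are not taken from LLVM.

```python
import ctypes


# Illustrative model of a 64-bit ELFv1 function descriptor.
# The class and field names are hypothetical; only the layout
# (three consecutive doublewords) comes from the ABI.
class FunctionDescriptor(ctypes.Structure):
    _fields_ = [
        ("entry", ctypes.c_uint64),  # code address, moved to CTR before bctrl
        ("toc", ctypes.c_uint64),    # TOC base, loaded into r2
        ("env", ctypes.c_uint64),    # environment pointer, loaded into r11
    ]


# Print the offsets that the ld instructions in the CHECK lines use.
for name, _ in FunctionDescriptor._fields_:
    print(name, getattr(FunctionDescriptor, name).offset)
```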

test/CodeGen/PowerPC/tail-dup-break-cfg.ll

; RUN: llc -O2 -o - %s | FileCheck %s		; RUN: llc -O2 -o - %s | FileCheck %s
target datalayout = "e-m:e-i64:64-n32:64"		target datalayout = "e-m:e-i64:64-n32:64"
target triple = "powerpc64le-grtev4-linux-gnu"		target triple = "powerpc64le-grtev4-linux-gnu"

; Intended layout:		; Intended layout:
; The code for tail-duplication during layout will produce the layout:		; The code for tail-duplication during layout will produce the layout:
; test1		; test1
; test2		; test2
; body1 (with copy of test2)		; body1 (with copy of test2)
; body2		; body2
; exit		; exit

;CHECK-LABEL: tail_dup_break_cfg:		;CHECK-LABEL: tail_dup_break_cfg:
;CHECK: mr [[TAGREG:[0-9]+]], 3		;CHECK: andi. {{[0-9]+}}, 3, 1
;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1		;CHECK: bc 12, 1, [[BODY1LABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: bc 12, 1, [[BODY1LABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: # %test2		;CHECK-NEXT: # %test2
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK-NEXT: rlwinm. {{[0-9]+}}, 3, 0, 30, 30
;CHECK-NEXT: bne 0, [[BODY2LABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: bne 0, [[BODY2LABEL:[._0-9A-Za-z]+]]
;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit		;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit
;CHECK: blr		;CHECK: blr
;CHECK-NEXT: [[BODY1LABEL]]		;CHECK-NEXT: [[BODY1LABEL]]
;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK: rlwinm. {{[0-9]+}}, 3, 0, 30, 30
;CHECK-NEXT: beq 0, [[EXITLABEL]]		;CHECK-NEXT: beq 0, [[EXITLABEL]]
;CHECK-NEXT: [[BODY2LABEL:[._0-9A-Za-z]+]]:		;CHECK-NEXT: [[BODY2LABEL:[._0-9A-Za-z]+]]:
;CHECK: b [[EXITLABEL]]		;CHECK: b [[EXITLABEL]]
define void @tail_dup_break_cfg(i32 %tag) {		define void @tail_dup_break_cfg(i32 %tag) {
entry:		entry:
br label %test1		br label %test1
test1:		test1:
%tagbit1 = and i32 %tag, 1		%tagbit1 = and i32 %tag, 1
...	body2:
call void @b()		call void @b()
br label %exit		br label %exit
exit:		exit:
ret void		ret void
}		}

; The branch weights here hint that we shouldn't tail duplicate in this case.		; The branch weights here hint that we shouldn't tail duplicate in this case.
;CHECK-LABEL: tail_dup_dont_break_cfg:		;CHECK-LABEL: tail_dup_dont_break_cfg:
;CHECK: mr [[TAGREG:[0-9]+]], 3		;CHECK: andi. {{[0-9]+}}, 3, 1
;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1		;CHECK: bc 4, 1, [[TEST2LABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: bc 4, 1, [[TEST2LABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: # %body1		;CHECK-NEXT: # %body1
;CHECK: [[TEST2LABEL]]: # %test2		;CHECK: [[TEST2LABEL]]: # %test2
;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30		;CHECK-NEXT: rlwinm. {{[0-9]+}}, 3, 0, 30, 30
;CHECK-NEXT: beq 0, [[EXITLABEL:[._0-9A-Za-z]+]]		;CHECK-NEXT: beq 0, [[EXITLABEL:[._0-9A-Za-z]+]]
;CHECK-NEXT: # %body2		;CHECK-NEXT: # %body2
;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit		;CHECK: [[EXITLABEL:[._0-9A-Za-z]+]]: # %exit
;CHECK: blr		;CHECK: blr
define void @tail_dup_dont_break_cfg(i32 %tag) {		define void @tail_dup_dont_break_cfg(i32 %tag) {
entry:		entry:
br label %test1		br label %test1
test1:		test1:
	...
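The tail-duplication tests check individual bits of `%tag` with `andi.` and the record form of `rlwinm`. PowerPC numbers bits from the most significant end, so the mask operands `30, 30`, `29, 29` and `28, 28` select the bits with values 2, 4 and 8. The sketch below is a simplified Python model of `rlwinm` that covers only the low 32 bits of the result; the helper names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def rotl32(value: int, shift: int) -> int:
    """Rotate the low 32 bits of value left by shift bits."""
    value &= MASK32
    shift %= 32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def mask32(mb: int, me: int) -> int:
    """Ones from bit mb through bit me, where bit 0 is the most
    significant bit (PowerPC numbering); wraps around when mb > me."""
    mask = 0
    bit = mb
    while True:
        mask |= 1 << (31 - bit)
        if bit == me:
            return mask
        bit = (bit + 1) % 32


def rlwinm(rs: int, sh: int, mb: int, me: int) -> int:
    """Low 32 bits of the result of rlwinm rA, rS, SH, MB, ME."""
    return rotl32(rs, sh) & mask32(mb, me)


# The single-bit masks used by the CHECK lines.
for mb in (30, 29, 28):
    print(mb, hex(mask32(mb, mb)))
```

Because the record form sets CR0 from the result, `bne 0, ...` after `rlwinm.` is taken when the selected bit is set and `beq 0, ...` when it is clear.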

test/CodeGen/PowerPC/tail-dup-layout.ll

	...
	; Tail duplication puts test n+1 at the end of optional n			; Tail duplication puts test n+1 at the end of optional n
	; so optional1 includes a copy of test2 at the end, and branches			; so optional1 includes a copy of test2 at the end, and branches
	; to test3 (at the top) or falls through to optional 2.			; to test3 (at the top) or falls through to optional 2.
	; The CHECK statements check for the whole string of tests			; The CHECK statements check for the whole string of tests
	; and then check that the correct test has been duplicated into the end of			; and then check that the correct test has been duplicated into the end of
	; the optional blocks and that the optional blocks are in the correct order.			; the optional blocks and that the optional blocks are in the correct order.
	;CHECK-LABEL: straight_test:			;CHECK-LABEL: straight_test:
	; test1 may have been merged with entry			; test1 may have been merged with entry
	;CHECK: mr [[TAGREG:[0-9]+]], 3			;CHECK: andi. {{[0-9]+}}, 3, 1
	;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1			;CHECK: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: # %test2			;CHECK-NEXT: # %test2
	;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30			;CHECK-NEXT: rlwinm. {{[0-9]+}}, 3, 0, 30, 30
	;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3			;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3
	;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29			;CHECK-NEXT: rlwinm. {{[0-9]+}}, 3, 0, 29, 29
	;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: .[[TEST4LABEL:[_0-9A-Za-z]+]]: # %test4			;CHECK-NEXT: .[[TEST4LABEL:[_0-9A-Za-z]+]]: # %test4
	;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28			;CHECK-NEXT: rlwinm. {{[0-9]+}}, 3, 0, 28, 28
	;CHECK-NEXT: bne 0, .[[OPT4LABEL:[_0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT4LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: .[[EXITLABEL:[_0-9A-Za-z]+]]: # %exit			;CHECK-NEXT: .[[EXITLABEL:[_0-9A-Za-z]+]]: # %exit
	;CHECK: blr			;CHECK: blr
	;CHECK-NEXT: .[[OPT1LABEL]]:			;CHECK-NEXT: .[[OPT1LABEL]]:
	;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30			;CHECK: rlwinm. {{[0-9]+}}, 3, 0, 30, 30
	;CHECK-NEXT: beq 0, .[[TEST3LABEL]]			;CHECK-NEXT: beq 0, .[[TEST3LABEL]]
	;CHECK-NEXT: .[[OPT2LABEL]]:			;CHECK-NEXT: .[[OPT2LABEL]]:
	;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29			;CHECK: rlwinm. {{[0-9]+}}, 3, 0, 29, 29
	;CHECK-NEXT: beq 0, .[[TEST4LABEL]]			;CHECK-NEXT: beq 0, .[[TEST4LABEL]]
	;CHECK-NEXT: .[[OPT3LABEL]]:			;CHECK-NEXT: .[[OPT3LABEL]]:
	;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28			;CHECK: rlwinm. {{[0-9]+}}, 3, 0, 28, 28
	;CHECK-NEXT: beq 0, .[[EXITLABEL]]			;CHECK-NEXT: beq 0, .[[EXITLABEL]]
	;CHECK-NEXT: .[[OPT4LABEL]]:			;CHECK-NEXT: .[[OPT4LABEL]]:
	;CHECK: b .[[EXITLABEL]]			;CHECK: b .[[EXITLABEL]]

	define void @straight_test(i32 %tag) {			define void @straight_test(i32 %tag) {
	entry:			entry:
	br label %test1			br label %test1
	test1:			test1:
	...
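The layout described in the `straight_test` comments follows a regular pattern: all tests first, then exit, then the optional blocks, where optional n ends with a copy of test n+1. The hypothetical helper below, written only from those comments and the CHECK lines, reproduces the expected block order for n tests; it is a reading aid, not part of the test.

```python
def expected_layout(n: int) -> list[str]:
    """Block order the straight_test CHECK lines expect for n tests
    (hypothetical helper derived from the test comments)."""
    order = [f"test{i}" for i in range(1, n + 1)] + ["exit"]
    for i in range(1, n + 1):
        if i < n:
            # optional i ends with a copy of test i+1: when that bit is
            # clear it branches to test i+2 (or exit), otherwise it falls
            # through to optional i+1.
            target = f"test{i + 2}" if i + 2 <= n else "exit"
            order.append(f"optional{i} [test{i + 1} copy, else -> {target}]")
        else:
            # The last optional block simply branches to exit.
            order.append(f"optional{i} [-> exit]")
    return order


# straight_test uses four tests, straight_test_50 uses three.
for block in expected_layout(4):
    print(block)
```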
	; Tail duplication puts test n+1 at the end of optional n			; Tail duplication puts test n+1 at the end of optional n
	; so optional1 includes a copy of test2 at the end, and branches			; so optional1 includes a copy of test2 at the end, and branches
	; to test3 (at the top) or falls through to optional 2.			; to test3 (at the top) or falls through to optional 2.
	; The CHECK statements check for the whole string of tests			; The CHECK statements check for the whole string of tests
	; and then check that the correct test has been duplicated into the end of			; and then check that the correct test has been duplicated into the end of
	; the optional blocks and that the optional blocks are in the correct order.			; the optional blocks and that the optional blocks are in the correct order.
	;CHECK-LABEL: straight_test_50:			;CHECK-LABEL: straight_test_50:
	; test1 may have been merged with entry			; test1 may have been merged with entry
	;CHECK: mr [[TAGREG:[0-9]+]], 3			;CHECK: andi. {{[0-9]+}}, 3, 1
	;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1			;CHECK: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: bc 12, 1, .[[OPT1LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: # %test2			;CHECK-NEXT: # %test2
	;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30			;CHECK-NEXT: rlwinm. {{[0-9]+}}, 3, 0, 30, 30
	;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT2LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3			;CHECK-NEXT: .[[TEST3LABEL:[_0-9A-Za-z]+]]: # %test3
	;CHECK-NEXT: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29			;CHECK-NEXT: rlwinm. {{[0-9]+}}, 3, 0, 29, 29
	;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT3LABEL:[_0-9A-Za-z]+]]
	;CHECK-NEXT: .[[EXITLABEL:[_0-9A-Za-z]+]]: # %exit			;CHECK-NEXT: .[[EXITLABEL:[_0-9A-Za-z]+]]: # %exit
	;CHECK: blr			;CHECK: blr
	;CHECK-NEXT: .[[OPT1LABEL]]:			;CHECK-NEXT: .[[OPT1LABEL]]:
	;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30			;CHECK: rlwinm. {{[0-9]+}}, 3, 0, 30, 30
	;CHECK-NEXT: beq 0, .[[TEST3LABEL]]			;CHECK-NEXT: beq 0, .[[TEST3LABEL]]
	;CHECK-NEXT: .[[OPT2LABEL]]:			;CHECK-NEXT: .[[OPT2LABEL]]:
	;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29			;CHECK: rlwinm. {{[0-9]+}}, 3, 0, 29, 29
	;CHECK-NEXT: beq 0, .[[EXITLABEL]]			;CHECK-NEXT: beq 0, .[[EXITLABEL]]
	;CHECK-NEXT: .[[OPT3LABEL]]:			;CHECK-NEXT: .[[OPT3LABEL]]:
	;CHECK: b .[[EXITLABEL]]			;CHECK: b .[[EXITLABEL]]

	define void @straight_test_50(i32 %tag) {			define void @straight_test_50(i32 %tag) {
	entry:			entry:
	br label %test1			br label %test1
	test1:			test1:
	...
	; optional3			; optional3
	; optional4			; optional4
	; --- End loop ---			; --- End loop ---
	; exit			; exit
	; The CHECK statements check for the whole string of tests and exit block,			; The CHECK statements check for the whole string of tests and exit block,
	; and then check that the correct test has been duplicated into the end of			; and then check that the correct test has been duplicated into the end of
	; the optional blocks and that the optional blocks are in the correct order.			; the optional blocks and that the optional blocks are in the correct order.
	;CHECK-LABEL: loop_test:			;CHECK-LABEL: loop_test:
	;CHECK: add [[TAGPTRREG:[0-9]+]], 3, 4			;CHECK: extsw [[SEXTREG:[0-9]+]], 4
				;CHECK: add [[TAGPTRREG:[0-9]+]], 3, [[SEXTREG]]
	;CHECK: .[[LATCHLABEL:[._0-9A-Za-z]+]]: # %for.latch			;CHECK: .[[LATCHLABEL:[._0-9A-Za-z]+]]: # %for.latch
	;CHECK: addi			;CHECK: addi
	;CHECK: .[[CHECKLABEL:[._0-9A-Za-z]+]]: # %for.check			;CHECK: .[[CHECKLABEL:[._0-9A-Za-z]+]]: # %for.check
	;CHECK: lwz [[TAGREG:[0-9]+]], 0([[TAGPTRREG]])			;CHECK: lwz [[TAGREG:[0-9]+]], 0([[TAGPTRREG]])
	;CHECK: # %test1			;CHECK: # %test1
	;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1			;CHECK: andi. {{[0-9]+}}, [[TAGREG]], 1
	;CHECK-NEXT: bc 12, 1, .[[OPT1LABEL:[._0-9A-Za-z]+]]			;CHECK: bc 12, 1, .[[OPT1LABEL:[._0-9A-Za-z]+]]
	;CHECK-NEXT: # %test2			;CHECK-NEXT: # %test2
	;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30			;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 30, 30
	;CHECK-NEXT: bne 0, .[[OPT2LABEL:[._0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT2LABEL:[._0-9A-Za-z]+]]
	;CHECK-NEXT: .[[TEST3LABEL:[._0-9A-Za-z]+]]: # %test3			;CHECK-NEXT: .[[TEST3LABEL:[._0-9A-Za-z]+]]: # %test3
	;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29			;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 29, 29
	;CHECK-NEXT: bne 0, .[[OPT3LABEL:[._0-9A-Za-z]+]]			;CHECK-NEXT: bne 0, .[[OPT3LABEL:[._0-9A-Za-z]+]]
	;CHECK-NEXT: .[[TEST4LABEL:[._0-9A-Za-z]+]]: # %{{(test4|optional3)}}			;CHECK-NEXT: .[[TEST4LABEL:[._0-9A-Za-z]+]]: # %{{(test4|optional3)}}
	;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28			;CHECK: rlwinm. {{[0-9]+}}, [[TAGREG]], 0, 28, 28
	...
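In the updated `loop_test` checks, the 32-bit value in r4 is sign-extended with `extsw` before `add` combines it with r3 to form the tag pointer. The following sketch models the two instructions on 64-bit register values (the helper names are hypothetical) to show what the sign extension does when the low word holds a negative number.

```python
MASK64 = (1 << 64) - 1


def extsw(rs: int) -> int:
    """Sign-extend the low 32 bits of rs to a 64-bit register value."""
    word = rs & 0xFFFF_FFFF
    if word & 0x8000_0000:
        word -= 1 << 32
    return word & MASK64


def add64(ra: int, rb: int) -> int:
    """64-bit add that wraps modulo 2**64, like the PowerPC add."""
    return (ra + rb) & MASK64


base = 0x1000
print(hex(add64(base, extsw(0xDEAD_BEEF_FFFF_FFFF))))
```

For example, a low word of 0xFFFFFFFF is treated as -1, so `add64(0x1000, extsw(...))` yields 0xfff regardless of what the upper 32 bits of r4 held.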