This is an archive of the discontinued LLVM Phabricator instance.

Implement machine unroller utility class
Needs Review · Public

Authored by jverma on Oct 8 2018, 4:28 PM.

Details

Reviewers

Summary

This patch implements the target-independent MachineUnroller utility class, which provides APIs to perform loop unrolling at the MI level. Only small innermost loops with a run-time trip count and a single basic block are handled. The software pipeliner invokes the unroller when unrolling is expected to improve the resource usage of the loop. With the increased ILP, the pipeliner often generates better code for small loops with high-latency instructions and underutilized resources.
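
As a source-level analogy only (the patch itself rewrites machine instructions, not C++), run-time unrolling with a remainder loop can be sketched as below. The function names and the unroll factor of 4 are illustrative assumptions and do not come from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop: one element per iteration.
long sumSimple(const std::vector<int> &V) {
  long Sum = 0;
  for (std::size_t I = 0; I < V.size(); ++I)
    Sum += V[I];
  return Sum;
}

// Hand-unrolled analogue with a hypothetical unroll factor of 4.
// The trip count is only known at run time, so it is split into an
// unrolled part (N / Factor) and a remainder part (N % Factor).
long sumUnrolledBy4(const std::vector<int> &V) {
  const std::size_t Factor = 4;
  const std::size_t N = V.size();
  const std::size_t UnrolledTrips = N / Factor;
  const std::size_t RemainderTrips = N % Factor;

  long Sum = 0;
  std::size_t I = 0;

  // Unrolled loop: four copies of the original body per iteration.
  for (std::size_t T = 0; T < UnrolledTrips; ++T, I += Factor) {
    Sum += V[I];
    Sum += V[I + 1];
    Sum += V[I + 2];
    Sum += V[I + 3];
  }

  // Remainder loop: the original body for the leftover iterations.
  for (std::size_t T = 0; T < RemainderTrips; ++T, ++I)
    Sum += V[I];

  return Sum;
}
```

The four independent statements in the unrolled body are what give a scheduler more instructions to overlap per iteration.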

For now, this feature is enabled only for Hexagon. To enable it, other targets must extend the MachineUnroller class and provide their own implementations of the target-specific APIs. The target must also implement the createMachineUnroller function, which creates and returns a pointer to the target's MachineUnroller object.
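
The extension pattern described above can be sketched with simplified stand-in types. Everything in this block (`UnrollerContext`, `Unroller`, `PassConfig`, `MyTargetUnroller`, `MyTargetPassConfig`, and the two count hooks) is a hypothetical placeholder, not the LLVM classes from the patch. The sketch also returns `std::unique_ptr`, whereas the patch returns a raw pointer that the pipeliner deletes.

```cpp
#include <cassert>
#include <memory>

// Stand-in for MachineUnrollerContext (hypothetical, intentionally empty).
struct UnrollerContext {};

// Stand-in for the target-independent base class with target hooks.
class Unroller {
public:
  explicit Unroller(UnrollerContext *C) : Ctx(C) {}
  virtual ~Unroller() = default;

  // Hypothetical hooks: trip counts for the unrolled and remainder loops.
  virtual unsigned unrolledLoopCount(unsigned LC, unsigned Factor) const = 0;
  virtual unsigned remainderLoopCount(unsigned LC, unsigned Factor) const = 0;

protected:
  UnrollerContext *Ctx;
};

// Stand-in for the pass configuration: the default creates no unroller.
class PassConfig {
public:
  virtual ~PassConfig() = default;
  virtual std::unique_ptr<Unroller> createUnroller(UnrollerContext *) const {
    return nullptr;
  }
};

// A target opts in by subclassing the unroller...
class MyTargetUnroller : public Unroller {
public:
  using Unroller::Unroller;

  unsigned unrolledLoopCount(unsigned LC, unsigned Factor) const override {
    assert(Factor != 0 && "unroll factor must be nonzero");
    return LC / Factor;
  }
  unsigned remainderLoopCount(unsigned LC, unsigned Factor) const override {
    assert(Factor != 0 && "unroll factor must be nonzero");
    return LC % Factor;
  }
};

// ...and by overriding the factory in its pass configuration.
class MyTargetPassConfig : public PassConfig {
public:
  std::unique_ptr<Unroller> createUnroller(UnrollerContext *C) const override {
    return std::make_unique<MyTargetUnroller>(C);
  }
};
```

A caller can treat a null result from the factory as "this target does not support machine unrolling", which matches the default `return nullptr` in the patch's TargetPassConfig hook.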

Diff Detail

Event Timeline

jverma created this revision. Oct 8 2018, 4:28 PM

Herald added subscribers: dmgreen, zzheng, mgorny. Oct 8 2018, 4:28 PM

Hello. Very nice. I don't think I can speak to much of the detail here, especially the Hexagon parts, but can you:

Add full context to the patch (-U99999)
Replace the copyright headers to be more "llvmy"

In D53005#1258653, @dmgreen wrote:

Hello. Very nice. I don't think I can speak to much of the detail here, especially the Hexagon parts, but can you:

Add full context to the patch (-U99999)

Replace the copyright headers to be more "llvmy"

Hi Dave,

Sorry, I missed the copyright header. I will bring it in line with the standard LLVM header. I didn't quite get what you meant by 'Add the full context to the patch'. Do you want me to provide some background for the patch?

Thanks,
Jyotsna

Hello, see the part about context in https://llvm.org/docs/Phabricator.html#phabricator-request-review-web. It's easier to review things if we can see the code around the patch as well as the code in the patch.

jverma updated this revision to Diff 168841. Oct 9 2018, 11:37 AM

jverma added a reviewer: kparzysz. Oct 9 2018, 11:40 AM

I'd prefer to rename this to MachineLoopUnroll to match the IR pass name

arsenm added inline comments. Oct 9 2018, 6:51 PM

include/llvm/CodeGen/MachineUnroller.h
1	Missing C++ mode comment
lib/CodeGen/MachinePipeliner.cpp
907	I think this is misleading since it isn't a size. InstrCount?
941–948	Why is this under NDEBUG? This looks problematic
960–961	You should avoid using FP types for this
lib/CodeGen/MachineUnroller.cpp
437–439	Isn't this just terminators()?

thegameg added a subscriber: thegameg. Oct 10 2018, 5:45 AM

jverma added inline comments. Oct 10 2018, 1:51 PM

include/llvm/CodeGen/MachineUnroller.h
1	Will fix.
lib/CodeGen/MachinePipeliner.cpp
941–948	It is just for debugging purposes. There is a command line flag (pipeliner-unroll-max) that can be used to set the unrolling limit, and it is available only in a debug build.
960–961	Sure. In this case, we do need it here. Since the loops handled by the machine unroller are fairly small in size, ResMII happens to be a small value as well, usually in the range of 1 to 5. For this reason, the change in UnrollResMIIRatio from one unroll factor (i) to another is pretty small and can't be captured without it being a float.
lib/CodeGen/MachineUnroller.cpp
437–439	Will fix.

In D53005#1259889, @arsenm wrote:

I'd prefer to rename this to MachineLoopUnroll to match the IR pass name

I don't really have a preference here. I will be happy to change the name if anyone else feels the same way.

arsenm added inline comments. Oct 11 2018, 2:04 AM

lib/CodeGen/MachinePipeliner.cpp
941–948	The flag should always be available or just removed entirely. Flags that disappear under some builds are really annoying

zzheng added inline comments. Oct 11 2018, 1:54 PM

lib/CodeGen/MachinePipeliner.cpp
960–961	You can scale both ratios to get rid of the FP type. Multiply each side by (i * MinUnrollFactor): UnrollResMIIRatio * (i * MinUnrollFactor) = UnrollResMII / i * (i * MinUnrollFactor) = UnrollResMII * MinUnrollFactor; MinResMIIRatio * (i * MinUnrollFactor) = MinResMII / MinUnrollFactor * (i * MinUnrollFactor) = MinResMII * i. The test then becomes if (UnrollResMII * MinUnrollFactor < MinResMII * i) { ... }. Instead of two float divs it now costs two int muls, without losing precision, assuming no mul overflow in real cases.
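
The cross-multiplication suggested in this comment can be sketched as a small helper. The function name `improvesRatio` and its parameter names are hypothetical. Widening to 64 bits is an added precaution against the overflow the comment mentions and is not something the review asked for.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when NewResMII / NewFactor < BestResMII / BestFactor,
// evaluated exactly with integer arithmetic by cross-multiplying.
bool improvesRatio(unsigned NewResMII, unsigned NewFactor,
                   unsigned BestResMII, unsigned BestFactor) {
  assert(NewFactor != 0 && BestFactor != 0 && "unroll factors must be nonzero");
  // a/b < c/d  <=>  a*d < c*b  for positive b and d.
  return static_cast<std::uint64_t>(NewResMII) * BestFactor <
         static_cast<std::uint64_t>(BestResMII) * NewFactor;
}
```

Plain integer division would not work here: with ResMII 5 at factor 4 against ResMII 3 at factor 2, both quotients truncate to 1 and the improvement from 1.5 to 1.25 is lost, while the cross-multiplied form compares 10 against 12 and detects it.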

jverma added inline comments. Oct 12 2018, 8:54 AM

lib/CodeGen/MachinePipeliner.cpp
941–948	That's understandable. I will remove NDEBUG and make it available for all builds.
960–961	Thanks a lot! I will fix it.

jverma updated this revision to Diff 169458. Oct 12 2018, 10:45 AM

steleman added a subscriber: steleman. Oct 15 2018, 11:10 AM

Ping!

Can someone please review this patch?

ramshankar123 added a subscriber: ramshankar123. Oct 25 2018, 8:24 PM

I am working on an out of tree target and this would be useful.
So what is the status of this patch?

In D53005#2064021, @fpichet wrote:

I am working on an out of tree target and this would be useful.
So what is the status of this patch?

Same here, but unfortunately I'm unable to say that I'm capable of reviewing this work.
One suggestion I have, at least from my use-case, is to modify the heuristic in MachinePipeliner to include the target in the decision.

Continuing to unroll until the ResMII is optimal may not be the best option for every target. Depending on the loop, the target might be able to generate better code if unrolled 2x instead of 4x, even if 4x would be more attractive, theoretically.

I believe you might be able to do this with a target-specific implementation of MachineUnroller, but I don't think that's a great answer. The unroller should unroll, not analyze.

Herald added a subscriber: danielkiss. Aug 28 2020, 1:39 PM

Revision Contents

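Path (Size)

include/llvm/CodeGen/MachineUnroller.h (131 lines)
include/llvm/CodeGen/TargetPassConfig.h (8 lines)
lib/CodeGen/CMakeLists.txt (1 line)
lib/CodeGen/MachinePipeliner.cpp (195 lines)
lib/CodeGen/MachineUnroller.cpp (728 lines)
lib/Target/Hexagon/CMakeLists.txt (1 line)
lib/Target/Hexagon/Hexagon.td (2 lines)
lib/Target/Hexagon/HexagonDepInstrInfo.td (8 lines)
lib/Target/Hexagon/HexagonMachineUnroller.h (63 lines)
lib/Target/Hexagon/HexagonMachineUnroller.cpp (471 lines)
lib/Target/Hexagon/HexagonTargetMachine.cpp (12 lines)
test/CodeGen/Hexagon/bit-gen-rseq.ll (3 lines)
test/CodeGen/Hexagon/hwloop4.ll (3 lines)
test/CodeGen/Hexagon/late_instr.ll (3 lines)
test/CodeGen/Hexagon/miunroll-optimize-memrefs1.ll (93 lines)
test/CodeGen/Hexagon/miunroll-optimize-memrefs2.ll (65 lines)
test/CodeGen/Hexagon/miunroll-update-memoperands.ll (64 lines)
test/CodeGen/Hexagon/miunroll-update-offset.ll (53 lines)
test/CodeGen/Hexagon/miunroll.ll (55 lines)
test/CodeGen/Hexagon/no-packets.ll (2 lines)
test/CodeGen/Hexagon/simplify64bitops_7223.ll (5 lines)
test/CodeGen/Hexagon/swp-carried-1.ll (4 lines)
test/CodeGen/Hexagon/swp-change-deps.ll (3 lines)
test/CodeGen/Hexagon/swp-epilog-numphis.ll (2 lines)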

3 lines

3 lines

3 lines

10 lines

3 lines

Diff 168841

include/llvm/CodeGen/MachineUnroller.h

This file was added.

				//===-------- llvm/CodeGen/MachineUnroller.h - Unrolling utilities --------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines loop unrolling utilities used at MI level.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CODEGEN_MACHINEUNROLLER_H
				#define LLVM_CODEGEN_MACHINEUNROLLER_H

				#include "llvm/CodeGen/LiveIntervals.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineLoopInfo.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"

				namespace llvm {

				// This is a utility class for unrolling loops at MI level.
				// It only unroll loops with the run-time trip count and
				// with a single basic block.
				//
				// After unrolling, the loop structure will be the following:
				//
				// Original LoopPreheader
				// Unrolled LoopPreheader
				// Unrolled Loop
				// Unrolled LoopExit
				// Remainder LoopPreheader
				// Remainder Loop
				// Remainder LoopExit
				// Original LoopExit

				struct MachineUnrollerContext {
				MachineFunction *MF = nullptr;
				MachineLoopInfo *MLI = nullptr;
				LiveIntervals *LIS = nullptr;
				const TargetInstrInfo *TII = nullptr;
				MachineUnrollerContext() {}
				MachineUnrollerContext(MachineFunction *mf, MachineLoopInfo *mli,
				LiveIntervals *lis, const TargetInstrInfo *tii)
				: MF(mf), MLI(mli), LIS(lis), TII(tii) {}
				};

				class MachineUnroller {
				protected:
				MachineFunction *MF = nullptr;
				MachineLoopInfo *MLI = nullptr;
				LiveIntervals *LIS = nullptr;
				const TargetInstrInfo *TII = nullptr;
				MachineRegisterInfo *MRI = nullptr;
				MachineLoop *L;
				MachineBasicBlock *OrigHeader;
				MachineBasicBlock *OrigPreheader;
				MachineBasicBlock *ULPreheader;
				MachineBasicBlock *ULHeader;
				MachineBasicBlock *ULExit;
				MachineBasicBlock *RLPreheader;
				MachineBasicBlock *RLHeader;
				MachineBasicBlock *RLExit;
				MachineBasicBlock *OrigLoopExit;
				MachineInstr *LoopIndVar;
				MachineInstr *LoopCmp;
				unsigned UnrollFactor;
				unsigned LC;
				SmallVector<MachineBasicBlock *, 4> LoopBBs;
				SmallVector<unsigned, 4> ExitBBLiveIns;

				typedef SmallDenseMap<MachineBasicBlock *, DenseMap<unsigned, unsigned>, 4>
				ValueMapTy;
				ValueMapTy VRMap;
				DenseMap<unsigned, unsigned> ULPhiVRMap;
				void createUnrolledLoopStruct();
				void updateInstruction(MachineInstr *NewMI, bool FirstIter,
				ValueMapTy &OldVRMap);
				void generateUnrolledLoop();
				unsigned getMappedRegORCreate(unsigned Reg, MachineBasicBlock *BB);
				void generateNewPhis(MachineBasicBlock *BB, MachineBasicBlock *BB1,
				MachineBasicBlock *BB2);
				void generatePhisForRLExit();
				void generatePhisForULExit();
				void getExitBBLiveIns();
				void addBBIntoVRMap(MachineBasicBlock *BB);
				void fixBranchesAndLoopCount(unsigned ULCount, unsigned RLCount);
				unsigned getLatestInstance(unsigned reg, MachineBasicBlock *BB,
				ValueMapTy &VRMap);
				void init(MachineLoop *loop, unsigned unrollFactor);
				bool canUnroll();
				void preprocessPhiNodes(MachineBasicBlock &B);

				public:
				MachineUnroller(MachineUnrollerContext *C)
				: MF(C->MF), MLI(C->MLI), LIS(C->LIS), TII(C->TII) {
				MRI = &MF->getRegInfo();
				}

				virtual ~MachineUnroller() = default;

				bool unroll(MachineLoop *loop, unsigned unrollFactor);

				virtual unsigned getLoopCount(MachineBasicBlock &MBB, MachineInstr *IndVar,
				MachineInstr &Cmp) const = 0;

				/// Add instruction to compute trip count for the unrolled loop.
				virtual unsigned addUnrolledLoopCountMI(MachineBasicBlock &MBB, unsigned LC,
				unsigned UnrollFactor) const = 0;

				/// Add instruction to compute remainder trip count for the unrolled loop.
				virtual unsigned addRemLoopCountMI(MachineBasicBlock &MBB, unsigned LC,
				unsigned UnrollFactor) const = 0;

				virtual void changeLoopCount(MachineBasicBlock &BB,
				MachineBasicBlock &Preheader,
				MachineBasicBlock &Header, unsigned LC,
				MachineInstr *IndVar, MachineInstr &Cmp,
				SmallVectorImpl<MachineOperand> &Cond) const = 0;

				bool computeDelta(MachineInstr &MI, unsigned &Delta) const;
				void updateMemOperands(MachineInstr *NewMI, MachineInstr *OldMI,
				unsigned iter) const;
				virtual void optimize(MachineBasicBlock &BB) const {};
				};
				} // namespace llvm
				#endif

include/llvm/CodeGen/TargetPassConfig.h

Show All 19 Lines
#include <string>		#include <string>

namespace llvm {		namespace llvm {

class LLVMTargetMachine;		class LLVMTargetMachine;
struct MachineSchedContext;		struct MachineSchedContext;
class PassConfigImpl;		class PassConfigImpl;
class ScheduleDAGInstrs;		class ScheduleDAGInstrs;
		class MachineUnroller;
		struct MachineUnrollerContext;

// The old pass manager infrastructure is hidden in a legacy namespace now.		// The old pass manager infrastructure is hidden in a legacy namespace now.
namespace legacy {		namespace legacy {

class PassManagerBase;		class PassManagerBase;

} // end namespace legacy		} // end namespace legacy

▲ Show 20 Lines • Show All 239 Lines • ▼ Show 20 Lines	public:
/// MachineScheduler pass for this function and target at the current		/// MachineScheduler pass for this function and target at the current
/// optimization level.		/// optimization level.
///		///
/// This can also be used to plug a new MachineSchedStrategy into an instance		/// This can also be used to plug a new MachineSchedStrategy into an instance
/// of the standard ScheduleDAGMI:		/// of the standard ScheduleDAGMI:
/// return new ScheduleDAGMI(C, make_unique<MyStrategy>(C), /*RemoveKillFlags=*/false)		/// return new ScheduleDAGMI(C, make_unique<MyStrategy>(C), /*RemoveKillFlags=*/false)
///		///
/// Return NULL to select the default (generic) machine scheduler.		/// Return NULL to select the default (generic) machine scheduler.

virtual ScheduleDAGInstrs *		virtual ScheduleDAGInstrs *
createMachineScheduler(MachineSchedContext *C) const {		createMachineScheduler(MachineSchedContext *C) const {
return nullptr;		return nullptr;
}		}

		virtual MachineUnroller *
		createMachineUnroller(MachineUnrollerContext *C) const {
		return nullptr;
		}

/// Similar to createMachineScheduler but used when postRA machine scheduling		/// Similar to createMachineScheduler but used when postRA machine scheduling
/// is enabled.		/// is enabled.
virtual ScheduleDAGInstrs *		virtual ScheduleDAGInstrs *
createPostMachineScheduler(MachineSchedContext *C) const {		createPostMachineScheduler(MachineSchedContext *C) const {
return nullptr;		return nullptr;
}		}

/// printAndVerify - Add a pass to dump then verify the machine function, if		/// printAndVerify - Add a pass to dump then verify the machine function, if
▲ Show 20 Lines • Show All 138 Lines • Show Last 20 Lines

lib/CodeGen/CMakeLists.txt

Show First 20 Lines • Show All 86 Lines • ▼ Show 20 Lines	add_llvm_library(LLVMCodeGen
MachinePipeliner.cpp		MachinePipeliner.cpp
MachinePostDominators.cpp		MachinePostDominators.cpp
MachineRegionInfo.cpp		MachineRegionInfo.cpp
MachineRegisterInfo.cpp		MachineRegisterInfo.cpp
MachineScheduler.cpp		MachineScheduler.cpp
MachineSink.cpp		MachineSink.cpp
MachineSSAUpdater.cpp		MachineSSAUpdater.cpp
MachineTraceMetrics.cpp		MachineTraceMetrics.cpp
		MachineUnroller.cpp
MachineVerifier.cpp		MachineVerifier.cpp
PatchableFunction.cpp		PatchableFunction.cpp
MIRPrinter.cpp		MIRPrinter.cpp
MIRPrintingPass.cpp		MIRPrintingPass.cpp
MacroFusion.cpp		MacroFusion.cpp
OptimizePHIs.cpp		OptimizePHIs.cpp
ParallelCG.cpp		ParallelCG.cpp
PeepholeOptimizer.cpp		PeepholeOptimizer.cpp
▲ Show 20 Lines • Show All 76 Lines • Show Last 20 Lines

lib/CodeGen/MachinePipeliner.cpp

Show First 20 Lines • Show All 80 Lines • ▼ Show 20 Lines
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"		#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineLoopInfo.h"		#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineMemOperand.h"		#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/MachineOperand.h"		#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/RegisterClassInfo.h"		#include "llvm/CodeGen/RegisterClassInfo.h"
#include "llvm/CodeGen/RegisterPressure.h"		#include "llvm/CodeGen/RegisterPressure.h"
		#include "llvm/CodeGen/MachineUnroller.h"
#include "llvm/CodeGen/ScheduleDAG.h"		#include "llvm/CodeGen/ScheduleDAG.h"
#include "llvm/CodeGen/ScheduleDAGInstrs.h"		#include "llvm/CodeGen/ScheduleDAGInstrs.h"
#include "llvm/CodeGen/ScheduleDAGMutation.h"		#include "llvm/CodeGen/ScheduleDAGMutation.h"
#include "llvm/CodeGen/TargetInstrInfo.h"		#include "llvm/CodeGen/TargetInstrInfo.h"
#include "llvm/CodeGen/TargetOpcodes.h"		#include "llvm/CodeGen/TargetOpcodes.h"
		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/CodeGen/TargetRegisterInfo.h"		#include "llvm/CodeGen/TargetRegisterInfo.h"
#include "llvm/CodeGen/TargetSubtargetInfo.h"		#include "llvm/CodeGen/TargetSubtargetInfo.h"
#include "llvm/Config/llvm-config.h"		#include "llvm/Config/llvm-config.h"
#include "llvm/IR/Attributes.h"		#include "llvm/IR/Attributes.h"
#include "llvm/IR/DebugLoc.h"		#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/MC/LaneBitmask.h"		#include "llvm/MC/LaneBitmask.h"
#include "llvm/MC/MCInstrDesc.h"		#include "llvm/MC/MCInstrDesc.h"
Show All 26 Lines
STATISTIC(NumPipelined, "Number of loops software pipelined");		STATISTIC(NumPipelined, "Number of loops software pipelined");
STATISTIC(NumNodeOrderIssues, "Number of node order issues found");		STATISTIC(NumNodeOrderIssues, "Number of node order issues found");

/// A command line option to turn software pipelining on or off.		/// A command line option to turn software pipelining on or off.
static cl::opt<bool> EnableSWP("enable-pipeliner", cl::Hidden, cl::init(true),		static cl::opt<bool> EnableSWP("enable-pipeliner", cl::Hidden, cl::init(true),
cl::ZeroOrMore,		cl::ZeroOrMore,
cl::desc("Enable Software Pipelining"));		cl::desc("Enable Software Pipelining"));

		/// A command line option to turn unrolling on or off before pipeling the loop.
		static cl::opt<bool>
		EnableSWPUnroll("enable-pipeliner-unroll", cl::Hidden, cl::init(false),
		cl::ZeroOrMore, cl::desc("Enable runtime unrolling before pipelining"));

		/// A command line argument to limit size of the unrolled loop.
		static cl::opt<unsigned> SwpUnrollThres("pipeliner-unroll-threshold",
		cl::desc("Size limit for the unrolled loop."),
		cl::Hidden, cl::init(30));

/// A command line option to enable SWP at -Os.		/// A command line option to enable SWP at -Os.
static cl::opt<bool> EnableSWPOptSize("enable-pipeliner-opt-size",		static cl::opt<bool> EnableSWPOptSize("enable-pipeliner-opt-size",
cl::desc("Enable SWP at Os."), cl::Hidden,		cl::desc("Enable SWP at Os."), cl::Hidden,
cl::init(false));		cl::init(false));

/// A command line argument to limit minimum initial interval for pipelining.		/// A command line argument to limit minimum initial interval for pipelining.
static cl::opt<int> SwpMaxMii("pipeliner-max-mii",		static cl::opt<int> SwpMaxMii("pipeliner-max-mii",
cl::desc("Size limit for the MII."),		cl::desc("Size limit for the MII."),
Show All 16 Lines
/// dependences.		/// dependences.
static cl::opt<bool>		static cl::opt<bool>
SwpPruneLoopCarried("pipeliner-prune-loop-carried",		SwpPruneLoopCarried("pipeliner-prune-loop-carried",
cl::desc("Prune loop carried order dependences."),		cl::desc("Prune loop carried order dependences."),
cl::Hidden, cl::init(true));		cl::Hidden, cl::init(true));

#ifndef NDEBUG		#ifndef NDEBUG
static cl::opt<int> SwpLoopLimit("pipeliner-max", cl::Hidden, cl::init(-1));		static cl::opt<int> SwpLoopLimit("pipeliner-max", cl::Hidden, cl::init(-1));
		static cl::opt<int> SwpUnrollLimit("pipeliner-unroll-max",
		cl::Hidden, cl::init(-1));
#endif		#endif

static cl::opt<bool> SwpIgnoreRecMII("pipeliner-ignore-recmii",		static cl::opt<bool> SwpIgnoreRecMII("pipeliner-ignore-recmii",
cl::ReallyHidden, cl::init(false),		cl::ReallyHidden, cl::init(false),
cl::ZeroOrMore, cl::desc("Ignore RecMII"));		cl::ZeroOrMore, cl::desc("Ignore RecMII"));

namespace {		namespace {

class NodeSet;		class NodeSet;
class SMSchedule;		class SMSchedule;

/// The main class in the implementation of the target independent		/// The main class in the implementation of the target independent
/// software pipeliner pass.		/// software pipeliner pass.
class MachinePipeliner : public MachineFunctionPass {		class MachinePipeliner : public MachineFunctionPass {
public:		public:
		const TargetPassConfig *PassConfig = nullptr;
MachineFunction *MF = nullptr;		MachineFunction *MF = nullptr;
const MachineLoopInfo *MLI = nullptr;		const MachineLoopInfo *MLI = nullptr;
const MachineDominatorTree *MDT = nullptr;		const MachineDominatorTree *MDT = nullptr;
const InstrItineraryData *InstrItins;		const InstrItineraryData *InstrItins;
const TargetInstrInfo *TII = nullptr;		const TargetInstrInfo *TII = nullptr;
		MachineUnroller *Unroller = nullptr;
RegisterClassInfo RegClassInfo;		RegisterClassInfo RegClassInfo;

#ifndef NDEBUG		#ifndef NDEBUG
static int NumTries;		static int NumTries;
#endif		#endif

/// Cache the target analysis information about the loop.		/// Cache the target analysis information about the loop.
struct LoopInfo {		struct LoopInfo {
Show All 14 Lines	#endif
bool runOnMachineFunction(MachineFunction &MF) override;		bool runOnMachineFunction(MachineFunction &MF) override;

void getAnalysisUsage(AnalysisUsage &AU) const override {		void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<AAResultsWrapperPass>();		AU.addRequired<AAResultsWrapperPass>();
AU.addPreserved<AAResultsWrapperPass>();		AU.addPreserved<AAResultsWrapperPass>();
AU.addRequired<MachineLoopInfo>();		AU.addRequired<MachineLoopInfo>();
AU.addRequired<MachineDominatorTree>();		AU.addRequired<MachineDominatorTree>();
AU.addRequired<LiveIntervals>();		AU.addRequired<LiveIntervals>();
		AU.addRequired<TargetPassConfig>();
MachineFunctionPass::getAnalysisUsage(AU);		MachineFunctionPass::getAnalysisUsage(AU);
}		}

private:		private:
void preprocessPhiNodes(MachineBasicBlock &B);		void preprocessPhiNodes(MachineBasicBlock &B);
bool canPipelineLoop(MachineLoop &L);		bool canPipelineLoop(MachineLoop &L);
bool scheduleLoop(MachineLoop &L);		bool scheduleLoop(MachineLoop &L);
bool swingModuloScheduler(MachineLoop &L);		bool swingModuloScheduler(MachineLoop &L);
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines

public:		public:
SwingSchedulerDAG(MachinePipeliner &P, MachineLoop &L, LiveIntervals &lis,		SwingSchedulerDAG(MachinePipeliner &P, MachineLoop &L, LiveIntervals &lis,
const RegisterClassInfo &rci)		const RegisterClassInfo &rci)
: ScheduleDAGInstrs(*P.MF, P.MLI, false), Pass(P), Loop(L), LIS(lis),		: ScheduleDAGInstrs(*P.MF, P.MLI, false), Pass(P), Loop(L), LIS(lis),
RegClassInfo(rci), Topo(SUnits, &ExitSU) {		RegClassInfo(rci), Topo(SUnits, &ExitSU) {
P.MF->getSubtarget().getSMSMutations(Mutations);		P.MF->getSubtarget().getSMSMutations(Mutations);
}		}
		#ifndef NDEBUG
		static int NumUnrollTries;
		#endif

void schedule() override;		void schedule() override;
void finishBlock() override;		void finishBlock() override;

/// Return true if the loop kernel has been scheduled.		/// Return true if the loop kernel has been scheduled.
bool hasNewSchedule() { return Scheduled; }		bool hasNewSchedule() { return Scheduled; }

/// Return the earliest time an instruction may be scheduled.		/// Return the earliest time an instruction may be scheduled.
▲ Show 20 Lines • Show All 65 Lines • ▼ Show 20 Lines	#endif
void addMutation(std::unique_ptr<ScheduleDAGMutation> Mutation) {		void addMutation(std::unique_ptr<ScheduleDAGMutation> Mutation) {
Mutations.push_back(std::move(Mutation));		Mutations.push_back(std::move(Mutation));
}		}

private:		private:
void addLoopCarriedDependences(AliasAnalysis *AA);		void addLoopCarriedDependences(AliasAnalysis *AA);
void updatePhiDependences();		void updatePhiDependences();
void changeDependences();		void changeDependences();
unsigned calculateResMII();		unsigned calculateResMII(unsigned UnrollCount = 1);
unsigned calculateRecMII(NodeSetType &RecNodeSets);		unsigned calculateRecMII(NodeSetType &RecNodeSets);
void findCircuits(NodeSetType &NodeSets);		void findCircuits(NodeSetType &NodeSets);
void fuseRecs(NodeSetType &NodeSets);		void fuseRecs(NodeSetType &NodeSets);
void removeDuplicateNodes(NodeSetType &NodeSets);		void removeDuplicateNodes(NodeSetType &NodeSets);
void computeNodeFunctions(NodeSetType &NodeSets);		void computeNodeFunctions(NodeSetType &NodeSets);
void registerPressureFilter(NodeSetType &NodeSets);		void registerPressureFilter(NodeSetType &NodeSets);
void colocateNodeSets(NodeSetType &NodeSets);		void colocateNodeSets(NodeSetType &NodeSets);
void checkNodeSets(NodeSetType &NodeSets);		void checkNodeSets(NodeSetType &NodeSets);
Show All 19 Lines	void generatePhis(MachineBasicBlock *NewBB, MachineBasicBlock *BB1,
MachineBasicBlock *BB2, MachineBasicBlock *KernelBB,		MachineBasicBlock *BB2, MachineBasicBlock *KernelBB,
SMSchedule &Schedule, ValueMapTy *VRMap,		SMSchedule &Schedule, ValueMapTy *VRMap,
InstrMapTy &InstrMap, unsigned LastStageNum,		InstrMapTy &InstrMap, unsigned LastStageNum,
unsigned CurStageNum, bool IsLast);		unsigned CurStageNum, bool IsLast);
void removeDeadInstructions(MachineBasicBlock *KernelBB,		void removeDeadInstructions(MachineBasicBlock *KernelBB,
MBBVectorTy &EpilogBBs);		MBBVectorTy &EpilogBBs);
void splitLifetimes(MachineBasicBlock *KernelBB, MBBVectorTy &EpilogBBs,		void splitLifetimes(MachineBasicBlock *KernelBB, MBBVectorTy &EpilogBBs,
SMSchedule &Schedule);		SMSchedule &Schedule);
		void removeBB(MachineBasicBlock *RemoveBB, MBBVectorTy &UpdateBBs);
void addBranches(MBBVectorTy &PrologBBs, MachineBasicBlock *KernelBB,		void addBranches(MBBVectorTy &PrologBBs, MachineBasicBlock *KernelBB,
MBBVectorTy &EpilogBBs, SMSchedule &Schedule,		MBBVectorTy &EpilogBBs, MBBVectorTy &UpdateBBs,
ValueMapTy *VRMap);		SMSchedule &Schedule, ValueMapTy *VRMap);
bool computeDelta(MachineInstr &MI, unsigned &Delta);		bool computeDelta(MachineInstr &MI, unsigned &Delta);
void updateMemOperands(MachineInstr &NewMI, MachineInstr &OldMI,		void updateMemOperands(MachineInstr &NewMI, MachineInstr &OldMI,
unsigned Num);		unsigned Num);
MachineInstr *cloneInstr(MachineInstr *OldMI, unsigned CurStageNum,		MachineInstr *cloneInstr(MachineInstr *OldMI, unsigned CurStageNum,
unsigned InstStageNum);		unsigned InstStageNum);
MachineInstr *cloneAndChangeInstr(MachineInstr *OldMI, unsigned CurStageNum,		MachineInstr *cloneAndChangeInstr(MachineInstr *OldMI, unsigned CurStageNum,
unsigned InstStageNum,		unsigned InstStageNum,
SMSchedule &Schedule);		SMSchedule &Schedule);
▲ Show 20 Lines • Show All 281 Lines • ▼ Show 20 Lines
};		};

} // end anonymous namespace		} // end anonymous namespace

unsigned SwingSchedulerDAG::Circuits::MaxPaths = 5;		unsigned SwingSchedulerDAG::Circuits::MaxPaths = 5;
char MachinePipeliner::ID = 0;		char MachinePipeliner::ID = 0;
#ifndef NDEBUG		#ifndef NDEBUG
int MachinePipeliner::NumTries = 0;		int MachinePipeliner::NumTries = 0;
		int SwingSchedulerDAG::NumUnrollTries = 0;
#endif		#endif
char &llvm::MachinePipelinerID = MachinePipeliner::ID;		char &llvm::MachinePipelinerID = MachinePipeliner::ID;

INITIALIZE_PASS_BEGIN(MachinePipeliner, DEBUG_TYPE,		INITIALIZE_PASS_BEGIN(MachinePipeliner, DEBUG_TYPE,
"Modulo Software Pipelining", false, false)		"Modulo Software Pipelining", false, false)
INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)		INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)		INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)		INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
Show All 14 Lines	if (mf.getFunction().getAttributes().hasAttribute(
!EnableSWPOptSize.getPosition())		!EnableSWPOptSize.getPosition())
return false;		return false;

MF = &mf;		MF = &mf;
MLI = &getAnalysis<MachineLoopInfo>();		MLI = &getAnalysis<MachineLoopInfo>();
MDT = &getAnalysis<MachineDominatorTree>();		MDT = &getAnalysis<MachineDominatorTree>();
TII = MF->getSubtarget().getInstrInfo();		TII = MF->getSubtarget().getInstrInfo();
RegClassInfo.runOnMachineFunction(*MF);		RegClassInfo.runOnMachineFunction(*MF);
		PassConfig = &getAnalysis<TargetPassConfig>();
		if (EnableSWPUnroll) {
		MachineUnrollerContext C(MF, &getAnalysis<MachineLoopInfo>(),
		&getAnalysis<LiveIntervals>(), TII);
		Unroller = PassConfig->createMachineUnroller(&C);
		}
for (auto &L : *MLI)		for (auto &L : *MLI)
scheduleLoop(*L);		scheduleLoop(*L);

		delete Unroller;
return false;		return false;
}		}

/// Attempt to perform the SMS algorithm on the specified loop. This function is		/// Attempt to perform the SMS algorithm on the specified loop. This function is
/// the main entry point for the algorithm. The function identifies candidate		/// the main entry point for the algorithm. The function identifies candidate
/// loops, calculates the minimum initiation interval, and attempts to schedule		/// loops, calculates the minimum initiation interval, and attempts to schedule
/// the loop.		/// the loop.
bool MachinePipeliner::scheduleLoop(MachineLoop &L) {		bool MachinePipeliner::scheduleLoop(MachineLoop &L) {
▲ Show 20 Lines • Show All 104 Lines • ▼ Show 20 Lines	bool MachinePipeliner::swingModuloScheduler(MachineLoop &L) {
SMS.enterRegion(MBB, MBB->begin(), MBB->getFirstTerminator(), size);		SMS.enterRegion(MBB, MBB->begin(), MBB->getFirstTerminator(), size);
SMS.schedule();		SMS.schedule();
SMS.exitRegion();		SMS.exitRegion();

SMS.finishBlock();		SMS.finishBlock();
return SMS.hasNewSchedule();		return SMS.hasNewSchedule();
}		}

		static unsigned getNonDebugMBBSize(MachineBasicBlock *MBB) {
		int size = 0;
		for (MachineBasicBlock::iterator I = MBB->getFirstNonPHI(),
		E = MBB->getFirstTerminator();
		I != E; ++I) {
		if (!I->isDebugValue())
		size++;
		}
		return size;
		}

/// We override the schedule function in ScheduleDAGInstrs to implement the		/// We override the schedule function in ScheduleDAGInstrs to implement the
/// scheduling part of the Swing Modulo Scheduling algorithm.		/// scheduling part of the Swing Modulo Scheduling algorithm.
void SwingSchedulerDAG::schedule() {		void SwingSchedulerDAG::schedule() {
AliasAnalysis *AA = &Pass.getAnalysis<AAResultsWrapperPass>().getAAResults();		AliasAnalysis *AA = &Pass.getAnalysis<AAResultsWrapperPass>().getAAResults();
		MachineLoopInfo *MLI = &Pass.getAnalysis<MachineLoopInfo>();

buildSchedGraph(AA);		buildSchedGraph(AA);
addLoopCarriedDependences(AA);		addLoopCarriedDependences(AA);
updatePhiDependences();		updatePhiDependences();
Topo.InitDAGTopologicalSorting();		Topo.InitDAGTopologicalSorting();
postprocessDAG();		postprocessDAG();
changeDependences();		changeDependences();
LLVM_DEBUG(dump());		LLVM_DEBUG(dump());

NodeSetType NodeSets;		NodeSetType NodeSets;
findCircuits(NodeSets);		findCircuits(NodeSets);
NodeSetType Circuits = NodeSets;		NodeSetType Circuits = NodeSets;

// Calculate the MII.		// Calculate the MII.
unsigned ResMII = calculateResMII();		unsigned ResMII = calculateResMII();
unsigned RecMII = calculateRecMII(NodeSets);		unsigned RecMII = calculateRecMII(NodeSets);

		bool UnrollLimitReached = false;
		#ifndef NDEBUG
		// Stop unrolling after reaching the limit (if any).
		int Limit = SwpUnrollLimit;
		if (Limit >= 0) {
		if (NumUnrollTries >= SwpUnrollLimit)
		UnrollLimitReached = true;
		}
		#endif
		arsenm: Why is this under NDEBUG? This looks problematic.
		jverma: It is just for debugging purposes. There is a command line flag (pipeliner-unroll-max) that can be used to set the unrolling limit, and it is available only in a debug build.
		arsenm: The flag should always be available or just removed entirely. Flags that disappear under some builds are really annoying.
		jverma: That's understandable. I will remove NDEBUG and make it available for all builds.
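A minimal sketch of the limit check discussed in this thread, written so that it behaves the same in every build type. The `UnrollBudget` struct and its member names are hypothetical and stand in for `SwpUnrollLimit` and `NumUnrollTries`; a negative limit is assumed to mean "no limit", as in the patch.

```cpp
// Hypothetical stand-in for the pipeliner-unroll-max option and the
// NumUnrollTries counter. Nothing here is guarded by NDEBUG, so the
// limit behaves identically in debug and release builds.
struct UnrollBudget {
  int Limit = -1;     // Negative means "no limit" (assumption from the patch).
  unsigned Tries = 0; // Number of loops unrolled so far.

  // True once the configured limit has been reached.
  bool limitReached() const {
    return Limit >= 0 && Tries >= static_cast<unsigned>(Limit);
  }

  // Call after a loop has actually been unrolled.
  void recordUnroll() { ++Tries; }
};
```

With this shape, the counter update at the end of a successful unroll needs no preprocessor guard either.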

		// Try to unroll the loop only if ResMII >= RecMII.
		if ((ResMII >= RecMII) && EnableSWPUnroll && !UnrollLimitReached) {
		unsigned MinResMII = ResMII;
		unsigned MinUnrollFactor = 1;
		unsigned UnrollThres = 4;
		unsigned LoopHeaderSize = getNonDebugMBBSize(Loop.getHeader());
		for (unsigned i = 2; i <= UnrollThres; i+=2) {
		unsigned UnrollResMII = calculateResMII(i);
		LLVM_DEBUG(dbgs() << "Unroll Factor = " << i << "(res=" << UnrollResMII
		<< ")\n");
		float UnrollResMIIRatio = (float) UnrollResMII / i;
		float MinResMIIRatio = (float) MinResMII / MinUnrollFactor;
		arsenm: You should avoid using FP types for this.
		jverma: Sure. In this case, we do need it here. Since the loops handled by the machine unroller are fairly small in size, ResMII happens to be a small value as well, usually in the range of 1 to 5. For this reason, the change in UnrollResMIIRatio from one unroll factor (i) to another is pretty small and can't be captured without it being a float.
		zzheng: You can scale both ratios to get rid of the FP type.
		UnrollResMIIRatio * (i * MinUnrollFactor) = UnrollResMII / i * (i * MinUnrollFactor) = UnrollResMII * MinUnrollFactor;
		MinResMIIRatio * (i * MinUnrollFactor) = MinResMII / MinUnrollFactor * (i * MinUnrollFactor) = MinResMII * i;
		if (UnrollResMII * MinUnrollFactor < MinResMII * i) { ... }
		Instead of two float divisions it now costs two integer multiplications, with no loss of precision, assuming no multiplication overflow in real cases.
		jverma: Thanks a lot! I will fix it.
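The cross-multiplication suggested in this thread can be sketched as follows. This is an illustration only, not the code that was committed: `ratioLess` and `pickUnrollFactor` are hypothetical names, the `ResMIIFor` callback stands in for `calculateResMII(i)`, and the products are widened to 64 bits to sidestep the overflow caveat.

```cpp
#include <cstdint>
#include <functional>

// Returns true when ResA / FactorA < ResB / FactorB, without floating point.
// Both sides are multiplied by FactorA * FactorB (both positive).
inline bool ratioLess(unsigned ResA, unsigned FactorA, unsigned ResB,
                      unsigned FactorB) {
  return static_cast<uint64_t>(ResA) * FactorB <
         static_cast<uint64_t>(ResB) * FactorA;
}

// Picks the even unroll factor (up to MaxFactor) with the lowest ResMII per
// original iteration, subject to a size limit on the unrolled loop body.
unsigned pickUnrollFactor(unsigned BaseResMII, unsigned MaxFactor,
                          unsigned LoopSize, unsigned SizeLimit,
                          const std::function<unsigned(unsigned)> &ResMIIFor) {
  unsigned BestRes = BaseResMII;
  unsigned BestFactor = 1;
  for (unsigned F = 2; F <= MaxFactor; F += 2) {
    unsigned Res = ResMIIFor(F);
    if (ratioLess(Res, F, BestRes, BestFactor) && LoopSize * F <= SizeLimit) {
      BestRes = Res;
      BestFactor = F;
    }
  }
  return BestFactor;
}
```

For example, a ResMII of 3 at factor 2 compared against a ResMII of 2 at factor 1 checks `3 * 1 < 2 * 2`, which matches the floating-point comparison `1.5 < 2.0`.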
		if (UnrollResMIIRatio < MinResMIIRatio &&
		(LoopHeaderSize * i) <= SwpUnrollThres) {
		MinResMII = UnrollResMII;
		MinUnrollFactor = i;
		}
		}

		LLVM_DEBUG(dbgs() << "Best Unroll Factor = " << MinUnrollFactor
		<< "(res=" << MinResMII << ")\n");

		bool Changed = false;
		if (MinUnrollFactor > 1)
		Changed = Pass.Unroller->unroll(&Loop, MinUnrollFactor);

		if (Changed) {
		#ifndef NDEBUG
		NumUnrollTries++;
		#endif
		this->MLI = MLI;
		Pass.LI.TBB = nullptr;
		Pass.LI.FBB = nullptr;
		Pass.LI.BrCond.clear();
		if (TII->analyzeBranch(*Loop.getHeader(), Pass.LI.TBB, Pass.LI.FBB, Pass.LI.BrCond))
		return;

		Pass.LI.LoopInductionVar = nullptr;
		Pass.LI.LoopCompare = nullptr;
		if (TII->analyzeLoop(Loop, Pass.LI.LoopInductionVar, Pass.LI.LoopCompare))
		return;

		MachineBasicBlock *MBB = Loop.getHeader();
		startBlock(MBB);
		unsigned size = MBB->size();
		enterRegion(MBB, MBB->begin(), MBB->getFirstTerminator(), size);
		buildSchedGraph(AA);
		addLoopCarriedDependences(AA);
		updatePhiDependences();
		Topo.InitDAGTopologicalSorting();
		postprocessDAG();
		changeDependences();
		LLVM_DEBUG(dump());

		NodeSets.clear();
		findCircuits(NodeSets);

		// Recalculate the MII after unrolling.
		ResMII = calculateResMII();
		RecMII = calculateRecMII(NodeSets);
		}
		}

fuseRecs(NodeSets);		fuseRecs(NodeSets);

// This flag is used for testing and can cause correctness problems.		// This flag is used for testing and can cause correctness problems.
if (SwpIgnoreRecMII)		if (SwpIgnoreRecMII)
RecMII = 0;		RecMII = 0;

MII = std::max(ResMII, RecMII);		MII = std::max(ResMII, RecMII);
LLVM_DEBUG(dbgs() << "MII = " << MII << " (rec=" << RecMII		LLVM_DEBUG(dbgs() << "MII = " << MII << " (rec=" << RecMII
[455 lines hidden]
} // end anonymous namespace		} // end anonymous namespace

/// Calculate the resource constrained minimum initiation interval for the		/// Calculate the resource constrained minimum initiation interval for the
/// specified loop. We use the DFA to model the resources needed for		/// specified loop. We use the DFA to model the resources needed for
/// each instruction, and we ignore dependences. A different DFA is created		/// each instruction, and we ignore dependences. A different DFA is created
/// for each cycle that is required. When adding a new instruction, we attempt		/// for each cycle that is required. When adding a new instruction, we attempt
/// to add it to each existing DFA, until a legal space is found. If the		/// to add it to each existing DFA, until a legal space is found. If the
/// instruction cannot be reserved in an existing DFA, we create a new one.		/// instruction cannot be reserved in an existing DFA, we create a new one.
unsigned SwingSchedulerDAG::calculateResMII() {		unsigned SwingSchedulerDAG::calculateResMII(unsigned UnrollFactor) {
SmallVector<DFAPacketizer *, 8> Resources;		SmallVector<DFAPacketizer *, 8> Resources;
MachineBasicBlock *MBB = Loop.getHeader();		MachineBasicBlock *MBB = Loop.getHeader();
Resources.push_back(TII->CreateTargetScheduleState(MF.getSubtarget()));		Resources.push_back(TII->CreateTargetScheduleState(MF.getSubtarget()));

// Sort the instructions by the number of available choices for scheduling,		// Sort the instructions by the number of available choices for scheduling,
// least to most. Use the number of critical resources as the tie breaker.		// least to most. Use the number of critical resources as the tie breaker.
FuncUnitSorter FUS =		FuncUnitSorter FUS =
FuncUnitSorter(MF.getSubtarget().getInstrItineraryData());		FuncUnitSorter(MF.getSubtarget().getInstrItineraryData());
for (MachineBasicBlock::iterator I = MBB->getFirstNonPHI(),		for (MachineBasicBlock::iterator I = MBB->getFirstNonPHI(),
E = MBB->getFirstTerminator();		E = MBB->getFirstTerminator();
I != E; ++I)		I != E; ++I)
FUS.calcCriticalResources(*I);		FUS.calcCriticalResources(*I);
PriorityQueue<MachineInstr *, std::vector<MachineInstr *>, FuncUnitSorter>		PriorityQueue<MachineInstr *, std::vector<MachineInstr *>, FuncUnitSorter>
FuncUnitOrder(FUS);		FuncUnitOrder(FUS);

		// To compute ResMII for the unrolled loop, simply replicate instructions as
		// many times as the unroll factor.
		for (unsigned i = 0; i < UnrollFactor; i++) {
for (MachineBasicBlock::iterator I = MBB->getFirstNonPHI(),		for (MachineBasicBlock::iterator I = MBB->getFirstNonPHI(),
E = MBB->getFirstTerminator();		E = MBB->getFirstTerminator();
I != E; ++I)		I != E; ++I)
FuncUnitOrder.push(&*I);		FuncUnitOrder.push(&*I);
		}
while (!FuncUnitOrder.empty()) {		while (!FuncUnitOrder.empty()) {
MachineInstr *MI = FuncUnitOrder.top();		MachineInstr *MI = FuncUnitOrder.top();
FuncUnitOrder.pop();		FuncUnitOrder.pop();
if (TII->isZeroCost(MI->getOpcode()))		if (TII->isZeroCost(MI->getOpcode()))
continue;		continue;
// Attempt to reserve the instruction in an existing DFA. At least one		// Attempt to reserve the instruction in an existing DFA. At least one
// DFA is needed for each cycle.		// DFA is needed for each cycle.
unsigned NumCycles = getSUnit(MI)->Latency;		unsigned NumCycles = getSUnit(MI)->Latency;
[909 lines hidden]	bool SwingSchedulerDAG::schedulePipeline(SMSchedule &Schedule) {
if (scheduleFound)		if (scheduleFound)
Schedule.finalizeSchedule(this);		Schedule.finalizeSchedule(this);
else		else
Schedule.reset();		Schedule.reset();

return scheduleFound && Schedule.getMaxStageCount() > 0;		return scheduleFound && Schedule.getMaxStageCount() > 0;
}		}

		static void updateLiveness(SmallVector<MachineBasicBlock *, 4> &MBBList,
		LiveIntervals &LIS) {
		for (auto MBB: MBBList) {
		for (MachineInstr &MI : *MBB) {
		if (!LIS.isNotInMIMap(MI))
		LIS.RemoveMachineInstrFromMaps(MI);
		if (MI.isDebugValue())
		continue;
		LIS.InsertMachineInstrInMaps(MI);
		}
		}
		}


/// Given a schedule for the loop, generate a new version of the loop,		/// Given a schedule for the loop, generate a new version of the loop,
/// and replace the old version. This function generates a prolog		/// and replace the old version. This function generates a prolog
/// that contains the initial iterations in the pipeline, and kernel		/// that contains the initial iterations in the pipeline, and kernel
/// loop, and the epilogue that contains the code for the final		/// loop, and the epilogue that contains the code for the final
/// iterations.		/// iterations.
void SwingSchedulerDAG::generatePipelinedLoop(SMSchedule &Schedule) {		void SwingSchedulerDAG::generatePipelinedLoop(SMSchedule &Schedule) {
// Create a new basic block for the kernel and add it to the CFG.		// Create a new basic block for the kernel and add it to the CFG.
MachineBasicBlock *KernelBB = MF.CreateMachineBasicBlock(BB->getBasicBlock());		MachineBasicBlock *KernelBB = MF.CreateMachineBasicBlock(BB->getBasicBlock());

unsigned MaxStageCount = Schedule.getMaxStageCount();		unsigned MaxStageCount = Schedule.getMaxStageCount();

// Remember the registers that are used in different stages. The index is		// Remember the registers that are used in different stages. The index is
// the iteration, or stage, that the instruction is scheduled in. This is		// the iteration, or stage, that the instruction is scheduled in. This is
// a map between register names in the original block and the names created		// a map between register names in the original block and the names created
// in each stage of the pipelined loop.		// in each stage of the pipelined loop.
ValueMapTy *VRMap = new ValueMapTy[(MaxStageCount + 1) * 2];		ValueMapTy *VRMap = new ValueMapTy[(MaxStageCount + 1) * 2];
InstrMapTy InstrMap;		InstrMapTy InstrMap;

SmallVector<MachineBasicBlock *, 4> PrologBBs;		SmallVector<MachineBasicBlock *, 4> PrologBBs;
// Generate the prolog instructions that set up the pipeline.		// Generate the prolog instructions that set up the pipeline.
generateProlog(Schedule, MaxStageCount, KernelBB, VRMap, PrologBBs);		generateProlog(Schedule, MaxStageCount, KernelBB, VRMap, PrologBBs);
MF.insert(BB->getIterator(), KernelBB);		MF.insert(BB->getIterator(), KernelBB);
		LIS.insertMBBInMaps(KernelBB);

// Rearrange the instructions to generate the new, pipelined loop,		// Rearrange the instructions to generate the new, pipelined loop,
// and update register names as needed.		// and update register names as needed.
for (int Cycle = Schedule.getFirstCycle(),		for (int Cycle = Schedule.getFirstCycle(),
LastCycle = Schedule.getFinalCycle();		LastCycle = Schedule.getFinalCycle();
Cycle <= LastCycle; ++Cycle) {		Cycle <= LastCycle; ++Cycle) {
std::deque<SUnit *> &CycleInstrs = Schedule.getInstructions(Cycle);		std::deque<SUnit *> &CycleInstrs = Schedule.getInstructions(Cycle);
// This inner loop schedules each instruction in the cycle.		// This inner loop schedules each instruction in the cycle.
[36 lines hidden]	void SwingSchedulerDAG::generatePipelinedLoop(SMSchedule &Schedule) {

// We need this step because the register allocation doesn't handle some		// We need this step because the register allocation doesn't handle some
// situations well, so we insert copies to help out.		// situations well, so we insert copies to help out.
splitLifetimes(KernelBB, EpilogBBs, Schedule);		splitLifetimes(KernelBB, EpilogBBs, Schedule);

// Remove dead instructions due to loop induction variables.		// Remove dead instructions due to loop induction variables.
removeDeadInstructions(KernelBB, EpilogBBs);		removeDeadInstructions(KernelBB, EpilogBBs);

		// Add PrologBBs, KernelBB and EpilogBBs for the liveness update later.
		SmallVector<MachineBasicBlock *, 4> UpdateBBs;
		UpdateBBs.insert(UpdateBBs.begin(), PrologBBs.begin(), PrologBBs.end());
		UpdateBBs.insert(UpdateBBs.end(), KernelBB);
		UpdateBBs.insert(UpdateBBs.end(), EpilogBBs.begin(), EpilogBBs.end());

// Add branches between prolog and epilog blocks.		// Add branches between prolog and epilog blocks.
addBranches(PrologBBs, KernelBB, EpilogBBs, Schedule, VRMap);		addBranches(PrologBBs, KernelBB, EpilogBBs, UpdateBBs, Schedule, VRMap);

// Remove the original loop since it's no longer referenced.		// Remove the original loop since it's no longer referenced.
for (auto &I : *BB)		for (auto &I : *BB)
LIS.RemoveMachineInstrFromMaps(I);		LIS.RemoveMachineInstrFromMaps(I);
BB->clear();		BB->clear();
BB->eraseFromParent();		BB->eraseFromParent();

		// Update liveness
		updateLiveness(UpdateBBs, LIS);

delete[] VRMap;		delete[] VRMap;
}		}

/// Generate the pipeline prolog code.		/// Generate the pipeline prolog code.
void SwingSchedulerDAG::generateProlog(SMSchedule &Schedule, unsigned LastStage,		void SwingSchedulerDAG::generateProlog(SMSchedule &Schedule, unsigned LastStage,
MachineBasicBlock *KernelBB,		MachineBasicBlock *KernelBB,
ValueMapTy *VRMap,		ValueMapTy *VRMap,
MBBVectorTy &PrologBBs) {		MBBVectorTy &PrologBBs) {
MachineBasicBlock *PreheaderBB = MLI->getLoopFor(BB)->getLoopPreheader();		MachineBasicBlock *PreheaderBB = MLI->getLoopFor(BB)->getLoopPreheader();
assert(PreheaderBB != nullptr &&		assert(PreheaderBB != nullptr &&
"Need to add code to handle loops w/o preheader");		"Need to add code to handle loops w/o preheader");
MachineBasicBlock *PredBB = PreheaderBB;		MachineBasicBlock *PredBB = PreheaderBB;
InstrMapTy InstrMap;		InstrMapTy InstrMap;

// Generate a basic block for each stage, not including the last stage,		// Generate a basic block for each stage, not including the last stage,
// which will be generated in the kernel. Each basic block may contain		// which will be generated in the kernel. Each basic block may contain
// instructions from multiple stages/iterations.		// instructions from multiple stages/iterations.
for (unsigned i = 0; i < LastStage; ++i) {		for (unsigned i = 0; i < LastStage; ++i) {
// Create and insert the prolog basic block prior to the original loop		// Create and insert the prolog basic block prior to the original loop
// basic block. The original loop is removed later.		// basic block. The original loop is removed later.
MachineBasicBlock *NewBB = MF.CreateMachineBasicBlock(BB->getBasicBlock());		MachineBasicBlock *NewBB = MF.CreateMachineBasicBlock(BB->getBasicBlock());
PrologBBs.push_back(NewBB);		PrologBBs.push_back(NewBB);
MF.insert(BB->getIterator(), NewBB);		MF.insert(BB->getIterator(), NewBB);
		LIS.insertMBBInMaps(NewBB);
NewBB->transferSuccessors(PredBB);		NewBB->transferSuccessors(PredBB);
PredBB->addSuccessor(NewBB);		PredBB->addSuccessor(NewBB);
PredBB = NewBB;		PredBB = NewBB;

// Generate instructions for each appropriate stage. Process instructions		// Generate instructions for each appropriate stage. Process instructions
// in original program order.		// in original program order.
for (int StageNum = i; StageNum >= 0; --StageNum) {		for (int StageNum = i; StageNum >= 0; --StageNum) {
for (MachineBasicBlock::iterator BBI = BB->instr_begin(),		for (MachineBasicBlock::iterator BBI = BB->instr_begin(),
[59 lines hidden]	void SwingSchedulerDAG::generateEpilog(SMSchedule &Schedule, unsigned LastStage,
// Generate a basic block for each stage, not including the last stage,		// Generate a basic block for each stage, not including the last stage,
// which was generated for the kernel. Each basic block may contain		// which was generated for the kernel. Each basic block may contain
// instructions from multiple stages/iterations.		// instructions from multiple stages/iterations.
int EpilogStage = LastStage + 1;		int EpilogStage = LastStage + 1;
for (unsigned i = LastStage; i >= 1; --i, ++EpilogStage) {		for (unsigned i = LastStage; i >= 1; --i, ++EpilogStage) {
MachineBasicBlock *NewBB = MF.CreateMachineBasicBlock();		MachineBasicBlock *NewBB = MF.CreateMachineBasicBlock();
EpilogBBs.push_back(NewBB);		EpilogBBs.push_back(NewBB);
MF.insert(BB->getIterator(), NewBB);		MF.insert(BB->getIterator(), NewBB);
		LIS.insertMBBInMaps(NewBB);
PredBB->replaceSuccessor(LoopExitBB, NewBB);		PredBB->replaceSuccessor(LoopExitBB, NewBB);
NewBB->addSuccessor(LoopExitBB);		NewBB->addSuccessor(LoopExitBB);

if (EpilogStart == LoopExitBB)		if (EpilogStart == LoopExitBB)
EpilogStart = NewBB;		EpilogStart = NewBB;

// Add instructions to the epilog depending on the current block.		// Add instructions to the epilog depending on the current block.
// Process instructions in original program order.		// Process instructions in original program order.
[56 lines hidden]	static void replaceRegUsesAfterLoop(unsigned FromReg, unsigned ToReg,
for (MachineRegisterInfo::use_iterator I = MRI.use_begin(FromReg),		for (MachineRegisterInfo::use_iterator I = MRI.use_begin(FromReg),
E = MRI.use_end();		E = MRI.use_end();
I != E;) {		I != E;) {
MachineOperand &O = *I;		MachineOperand &O = *I;
++I;		++I;
if (O.getParent()->getParent() != MBB)		if (O.getParent()->getParent() != MBB)
O.setReg(ToReg);		O.setReg(ToReg);
}		}
if (!LIS.hasInterval(ToReg))
LIS.createEmptyInterval(ToReg);
}		}

/// Return true if the register has a use that occurs outside the		/// Return true if the register has a use that occurs outside the
/// specified loop.		/// specified loop.
static bool hasUseAfterLoop(unsigned Reg, MachineBasicBlock *BB,		static bool hasUseAfterLoop(unsigned Reg, MachineBasicBlock *BB,
MachineRegisterInfo &MRI) {		MachineRegisterInfo &MRI) {
for (MachineRegisterInfo::use_iterator I = MRI.use_begin(Reg),		for (MachineRegisterInfo::use_iterator I = MRI.use_begin(Reg),
E = MRI.use_end();		E = MRI.use_end();
[485 lines hidden]	for (unsigned i = 1, e = MI.getNumOperands(); i != e; i += 2)
if (MI.getOperand(i + 1).getMBB() == Incoming) {		if (MI.getOperand(i + 1).getMBB() == Incoming) {
MI.RemoveOperand(i + 1);		MI.RemoveOperand(i + 1);
MI.RemoveOperand(i);		MI.RemoveOperand(i);
break;		break;
}		}
}		}
}		}

		// Remove the basic block from its parent and also from UpdateBBs, as
		// it is no longer needed for the liveness update.
		void SwingSchedulerDAG::removeBB(MachineBasicBlock *RemoveBB,
		MBBVectorTy &UpdateBBs) {
		for (MBBVectorTy::const_iterator MBB = UpdateBBs.begin(),
		MBE = UpdateBBs.end();
		MBB != MBE; ++MBB) {
		if (*MBB == RemoveBB) {
		UpdateBBs.erase(MBB);
		break;
		}
		}
		RemoveBB->clear();
		RemoveBB->eraseFromParent();
		}
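The search-and-erase step in removeBB can also be written with a standard algorithm. The sketch below uses `std::vector` and a hypothetical helper name `eraseFirst` so that it compiles on its own; it is not taken from the patch, and it makes no claim about which helper the LLVM tree would prefer here.

```cpp
#include <algorithm>
#include <vector>

// Erases the first element equal to X from V. Returns true if an element
// was removed, false if X was not present.
template <typename T>
bool eraseFirst(std::vector<T> &V, const T &X) {
  auto It = std::find(V.begin(), V.end(), X);
  if (It == V.end())
    return false;
  V.erase(It);
  return true;
}
```

Only the first match is removed, which mirrors the `break` after `erase` in the loop above.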

/// Create branches from each prolog basic block to the appropriate epilog		/// Create branches from each prolog basic block to the appropriate epilog
/// block. These edges are needed if the loop ends before reaching the		/// block. These edges are needed if the loop ends before reaching the
/// kernel.		/// kernel.
void SwingSchedulerDAG::addBranches(MBBVectorTy &PrologBBs,		void SwingSchedulerDAG::addBranches(MBBVectorTy &PrologBBs,
MachineBasicBlock *KernelBB,		MachineBasicBlock *KernelBB,
MBBVectorTy &EpilogBBs,		MBBVectorTy &EpilogBBs,
		MBBVectorTy &UpdateBBs,
SMSchedule &Schedule, ValueMapTy *VRMap) {		SMSchedule &Schedule, ValueMapTy *VRMap) {
assert(PrologBBs.size() == EpilogBBs.size() && "Prolog/Epilog mismatch");		assert(PrologBBs.size() == EpilogBBs.size() && "Prolog/Epilog mismatch");
MachineInstr *IndVar = Pass.LI.LoopInductionVar;		MachineInstr *IndVar = Pass.LI.LoopInductionVar;
MachineInstr *Cmp = Pass.LI.LoopCompare;		MachineInstr *Cmp = Pass.LI.LoopCompare;
MachineBasicBlock *LastPro = KernelBB;		MachineBasicBlock *LastPro = KernelBB;
MachineBasicBlock *LastEpi = KernelBB;		MachineBasicBlock *LastEpi = KernelBB;

// Start from the blocks connected to the kernel and work "out"		// Start from the blocks connected to the kernel and work "out"
[27 lines hidden]	if (TargetRegisterInfo::isVirtualRegister(LC)) {
numAdded = TII->insertBranch(*Prolog, Epilog, LastPro, Cond, DebugLoc());		numAdded = TII->insertBranch(*Prolog, Epilog, LastPro, Cond, DebugLoc());
} else if (j >= LCMin) {		} else if (j >= LCMin) {
Prolog->addSuccessor(Epilog);		Prolog->addSuccessor(Epilog);
Prolog->removeSuccessor(LastPro);		Prolog->removeSuccessor(LastPro);
LastEpi->removeSuccessor(Epilog);		LastEpi->removeSuccessor(Epilog);
numAdded = TII->insertBranch(*Prolog, Epilog, nullptr, Cond, DebugLoc());		numAdded = TII->insertBranch(*Prolog, Epilog, nullptr, Cond, DebugLoc());
removePhis(Epilog, LastEpi);		removePhis(Epilog, LastEpi);
// Remove the blocks that are no longer referenced.		// Remove the blocks that are no longer referenced.
if (LastPro != LastEpi) {		if (LastPro != LastEpi)
LastEpi->clear();		removeBB(LastEpi, UpdateBBs);
LastEpi->eraseFromParent();
}		removeBB(LastPro, UpdateBBs);
LastPro->clear();
LastPro->eraseFromParent();
} else {		} else {
numAdded = TII->insertBranch(*Prolog, LastPro, nullptr, Cond, DebugLoc());		numAdded = TII->insertBranch(*Prolog, LastPro, nullptr, Cond, DebugLoc());
removePhis(Epilog, Prolog);		removePhis(Epilog, Prolog);
}		}
LastPro = Prolog;		LastPro = Prolog;
LastEpi = Epilog;		LastEpi = Epilog;
for (MachineBasicBlock::reverse_instr_iterator I = Prolog->instr_rbegin(),		for (MachineBasicBlock::reverse_instr_iterator I = Prolog->instr_rbegin(),
E = Prolog->instr_rend();		E = Prolog->instr_rend();
[1,076 lines hidden]

lib/CodeGen/MachineUnroller.cpp

This file was added.

				//===------- MachineUnroller.cpp - Machine Loop unrolling utilities -------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				// This file implements the loop unrolling functionality at MI level.
				//===----------------------------------------------------------------------===//

				#include "llvm/CodeGen/MachineUnroller.h"
				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/CodeGen/LiveIntervals.h"
				#include "llvm/CodeGen/MachineBasicBlock.h"
				#include "llvm/CodeGen/MachineFunction.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineOperand.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/IR/DebugLoc.h"
				#include "llvm/Support/Debug.h"

				using namespace llvm;

				#define DEBUG_TYPE "mi-loop-unroll"

				// This is a utility class for unrolling loops at MI level.
				// It only unrolls loops with a run-time trip count and
				// a single basic block.
				//
				// After unrolling, the loop structure will be the following:
				//
				// Original LoopPreheader
				// Unrolled LoopPreheader
				// Unrolled Loop
				// Unrolled LoopExit
				// Remainder LoopPreheader
				// Remainder Loop
				// Remainder LoopExit
				// Original LoopExit
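To make the unrolled/remainder structure above concrete, here is a source-level sketch of the same shape. It assumes the run-time trip count N is split as N / Factor iterations of the unrolled loop plus N % Factor iterations of the remainder loop, computed with a shift and a mask because the factor is a power of 2. That split is an assumption for illustration; the names `TripSplit`, `splitTripCount` and `sumUnrolledBy4` are hypothetical and do not appear in the patch.

```cpp
#include <cstddef>
#include <vector>

struct TripSplit {
  unsigned Unrolled;  // Trip count of the unrolled loop.
  unsigned Remainder; // Trip count of the remainder loop.
};

// Splits a run-time trip count N for a power-of-2 unroll factor.
// The caller must pass a Factor that is a power of 2.
TripSplit splitTripCount(unsigned N, unsigned Factor) {
  unsigned Shift = 0;
  while ((1u << Shift) < Factor)
    ++Shift;
  return {N >> Shift, N & (Factor - 1)};
}

// Sums V using an unrolled-by-4 loop followed by a remainder loop.
long sumUnrolledBy4(const std::vector<int> &V) {
  TripSplit S = splitTripCount(static_cast<unsigned>(V.size()), 4);
  long Sum = 0;
  std::size_t I = 0;
  // Unrolled loop: four copies of the original body per iteration.
  for (unsigned K = 0; K < S.Unrolled; ++K, I += 4)
    Sum += V[I] + V[I + 1] + V[I + 2] + V[I + 3];
  // Remainder loop: the original body, run for the leftover iterations.
  for (unsigned K = 0; K < S.Remainder; ++K, ++I)
    Sum += V[I];
  return Sum;
}
```

When the trip count is smaller than the factor, the unrolled loop runs zero times and all the work falls to the remainder loop.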

				void MachineUnroller::init(MachineLoop *loop, unsigned unrollFactor) {
				L = loop;
				UnrollFactor = unrollFactor;
				OrigHeader = L->getHeader();
				OrigPreheader = L->getLoopPreheader();
				OrigLoopExit = L->getExitBlock();
				LoopBBs.clear();
				ExitBBLiveIns.clear();
				}

				bool MachineUnroller::canUnroll() {
				// Only loops with a single basic block are handled. Also, the loop must
				// be analyzable using analyzeBranch. It's the responsibility of the caller of
				// this function to make sure that these requirements are met.
				assert(L->getNumBlocks() == 1 && "Only loops with single basic block can be "
				"unrolled!!");
				if (!isPowerOf2_32(UnrollFactor)) {
				LLVM_DEBUG(dbgs() << "Can't Unroll!! UnrollFactor must be a power of 2.");
				return false;
				}

				LoopIndVar = nullptr;
				LoopCmp = nullptr;
				if (TII->analyzeLoop(*L, LoopIndVar, LoopCmp))
				return false;

				// Get loop trip count. Compile-time trip count is not handled.
				LC = getLoopCount(OrigHeader, LoopIndVar, LoopCmp);
				return TargetRegisterInfo::isVirtualRegister(LC);
				}

				/// Create empty basic blocks for the unrolled/remainder loops and
				/// add them to the CFG. Some BBs from the original loop are reused
				/// and their successors/predecessors are changed as needed.
				void MachineUnroller::createUnrolledLoopStruct() {
				// Create basic blocks for the Unrolled Loop.
				ULPreheader = MF->CreateMachineBasicBlock();
				MF->insert(OrigHeader->getIterator(), ULPreheader);
				LIS->insertMBBInMaps(ULPreheader);

				ULHeader = MF->CreateMachineBasicBlock();
				ULHeader->setAlignment(OrigHeader->getAlignment());
				MF->insert(OrigHeader->getIterator(), ULHeader);
				LIS->insertMBBInMaps(ULHeader);

				ULPreheader->addSuccessor(ULHeader);
				ULHeader->addSuccessor(ULHeader);
				OrigPreheader->replaceSuccessor(OrigHeader, ULPreheader);

				// Create basic blocks for the Remainder Loop. The original loop header
				// is used as the remainder loop header. The loop trip count is adjusted
				// later to the appropriate value.
				RLHeader = OrigHeader;

				ULExit = MF->CreateMachineBasicBlock();
				MF->insert(RLHeader->getIterator(), ULExit);
				LIS->insertMBBInMaps(ULExit);

				RLPreheader = MF->CreateMachineBasicBlock();
				MF->insert(RLHeader->getIterator(), RLPreheader);
				LIS->insertMBBInMaps(RLPreheader);

				RLExit = MF->CreateMachineBasicBlock();
				MF->insert(++RLHeader->getIterator(), RLExit);
				LIS->insertMBBInMaps(RLExit);

				ULExit->addSuccessor(RLPreheader);
				RLPreheader->addSuccessor(RLHeader);

				ULHeader->addSuccessor(ULExit);
				OrigPreheader->addSuccessor(ULExit);
				ULExit->addSuccessor(RLExit);
				RLExit->addSuccessor(OrigLoopExit);
				RLHeader->replaceSuccessor(OrigLoopExit, RLExit);

				LoopBBs.push_back(ULPreheader);
				LoopBBs.push_back(ULHeader);
				LoopBBs.push_back(ULExit);
				LoopBBs.push_back(RLPreheader);
				LoopBBs.push_back(RLHeader);
				LoopBBs.push_back(RLExit);

				// Since the instructions are added/deleted to the basic blocks present
				// in LoopBBs and OrigPreheader, it makes their slot indexes out-of-date.
				// Remove all the instructions currently present in these basic blocks from
				// LIS and insert them later after they have gone through all changes.
				for (auto MBB : LoopBBs) {
				for (MachineInstr &MI : *MBB)
				if (!LIS->isNotInMIMap(MI))
				LIS->RemoveMachineInstrFromMaps(MI);
				}

				for (MachineInstr &MI : *OrigPreheader)
				if (!LIS->isNotInMIMap(MI))
				LIS->RemoveMachineInstrFromMaps(MI);

				// Update the Phis in RLHeader (same as OrigHeader) and
				// OrigLoopExit to use the new predecessors.
				for (MachineBasicBlock::iterator I = RLHeader->instr_begin(),
				E = RLHeader->getFirstNonPHI();
				I != E; ++I) {
				MachineInstr *Phi = &*I;
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2)
				if (Phi->getOperand(i + 1).getMBB() != RLHeader)
				Phi->getOperand(i + 1).setMBB(RLPreheader);
				}

				for (MachineBasicBlock::iterator I = OrigLoopExit->instr_begin(),
				E = OrigLoopExit->getFirstNonPHI();
				I != E; ++I) {
				MachineInstr *Phi = &*I;
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2)
				if (Phi->getOperand(i + 1).getMBB() == RLHeader)
				Phi->getOperand(i + 1).setMBB(RLExit);
				}
				}

				/// Return the Phi Operand that comes from outside the loop.
				static MachineOperand &getInitPhiOp(MachineInstr *Phi,
				MachineBasicBlock *LoopBB) {
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2)
				if (Phi->getOperand(i + 1).getMBB() != LoopBB)
				return Phi->getOperand(i);
				llvm_unreachable("Unexpected Phi structure.");
				}

				/// Return the Phi register value that comes from outside the loop.
				static unsigned getInitPhiReg(MachineInstr *Phi, MachineBasicBlock *LoopBB) {
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2)
				if (Phi->getOperand(i + 1).getMBB() != LoopBB)
				return Phi->getOperand(i).getReg();
				llvm_unreachable("Unexpected Phi structure.");
				}

				/// Return the Phi Operand that comes from the loop block.
				static MachineOperand &getLoopPhiOp(MachineInstr *Phi,
				MachineBasicBlock *LoopBB) {
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2)
				if (Phi->getOperand(i + 1).getMBB() == LoopBB)
				return Phi->getOperand(i);
				llvm_unreachable("Unexpected Phi structure.");
				}

				/// Return the Phi register value that comes from the loop block.
				static unsigned getLoopPhiReg(MachineInstr *Phi, MachineBasicBlock *LoopBB) {
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2)
				if (Phi->getOperand(i + 1).getMBB() == LoopBB)
				return Phi->getOperand(i).getReg();
				llvm_unreachable("Unexpected Phi structure.");
				}

				/// Return the basic block corresponding to the Phi register value.
				static MachineBasicBlock *getPhiRegBB(MachineInstr *Phi, unsigned Reg) {
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2)
				if (Phi->getOperand(i).getReg() == Reg)
				return Phi->getOperand(i + 1).getMBB();
				return 0;
				}

				/// Replace all uses of FromReg that appear within the specified
				/// basic block with ToReg.
				static void replaceRegUses(unsigned FromReg, unsigned ToReg,
				MachineBasicBlock *MBB, MachineRegisterInfo &MRI) {
				for (MachineRegisterInfo::use_iterator I = MRI.use_begin(FromReg),
				E = MRI.use_end();
				I != E;) {
				MachineOperand &O = *I;
				++I;
				MachineInstr *UseMI = O.getParent();
				if (UseMI->isPHI() && getPhiRegBB(UseMI, FromReg) != MBB)
				continue; // Don't change the register name

				if (UseMI->getParent() == MBB)
				O.setReg(ToReg);
				}
				}

				/// Clone the Phi instruction and set all the operands appropriately.
				/// This function assumes the instruction is a Phi.
				static MachineInstr *clonePHI(MachineBasicBlock *BB, MachineBasicBlock *BB1,
				MachineBasicBlock *OrigBB, MachineInstr *Phi) {
				MachineFunction *MF = OrigBB->getParent();
				unsigned InitVal = getInitPhiReg(Phi, OrigBB);
				unsigned LoopVal = getLoopPhiReg(Phi, OrigBB);
				MachineInstr *NewMI = MF->CloneMachineInstr(Phi);
				NewMI->getOperand(1).setReg(InitVal);
				NewMI->getOperand(2).setMBB(BB1);
				NewMI->getOperand(3).setReg(LoopVal);
				NewMI->getOperand(4).setMBB(BB);
				return NewMI;
				}

				static bool isBlockOutsideLoop(SmallVector<MachineBasicBlock *, 4> &LoopBBs,
				MachineBasicBlock *MBB) {
				for (auto TBB : LoopBBs)
				if (TBB == MBB)
				return false;
				return true;
				}

				static void
				replaceRegUsesAfterLoop(unsigned FromReg, unsigned ToReg,
				MachineRegisterInfo &MRI,
				SmallVector<MachineBasicBlock *, 4> &LoopBBs) {
				MachineInstr *DefMI = MRI.getVRegDef(ToReg);
				for (MachineRegisterInfo::use_iterator I = MRI.use_begin(FromReg),
				E = MRI.use_end();
				I != E;) {
				MachineOperand &O = *I;
				++I;
				MachineBasicBlock *UseBB = O.getParent()->getParent();
				if (isBlockOutsideLoop(LoopBBs, UseBB) && DefMI != O.getParent())
				O.setReg(ToReg);
				}
				}

				/// Update liveness information for all the basic blocks that are either
				/// newly added or modified during the transformation.
				static void updateLiveness(SmallVector<MachineBasicBlock *, 4> &MBBList,
				LiveIntervals *LIS) {
				for (auto MBB : MBBList) {
				for (MachineInstr &MI : *MBB) {
				if (!LIS->isNotInMIMap(MI))
				LIS->RemoveMachineInstrFromMaps(MI);
				if (MI.isDebugValue())
				continue;
				LIS->InsertMachineInstrInMaps(MI);
				}
				}
				}

				/// Return the register name for the latest instance of 'reg' as found
				/// in the VRMap. During unrolling, each instance of 'reg' (one per
				/// iteration) is given a new name, which is tracked using VRMap.
				unsigned MachineUnroller::getLatestInstance(unsigned reg, MachineBasicBlock *BB,
				ValueMapTy &VRMap) {
				unsigned LatestReg = reg;
				while (VRMap[BB].count(LatestReg) && LatestReg != VRMap[BB][LatestReg]) {
				LatestReg = VRMap[BB][LatestReg];
				}
				return LatestReg;
				}
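
To make the renaming scheme easier to follow, here is a minimal standalone sketch of the chain walk described in the comment above. It assumes a plain `std::unordered_map` in place of one `VRMap[BB]` entry. The names `RenameMap` and `latestInstance` are illustrative only and are not part of the patch.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for one VRMap[BB] entry: register -> newer name.
using RenameMap = std::unordered_map<unsigned, unsigned>;

// Follow Reg through the map until it has no newer name or maps to itself.
unsigned latestInstance(unsigned Reg, const RenameMap &Map) {
  unsigned Latest = Reg;
  for (auto It = Map.find(Latest); It != Map.end() && It->second != Latest;
       It = Map.find(Latest))
    Latest = It->second;
  return Latest;
}
```

With the map `{1 -> 2, 2 -> 3, 3 -> 3}`, register 1 resolves to 3, and a register that is not in the map resolves to itself. Like the loop in the patch, this sketch assumes the map holds no cycle longer than a self-mapping.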

				/// Update the machine instruction with new virtual registers. This
				/// function is only used to update the instructions in the unrolled
				/// loop header. It may change the definitions and/or uses.
				void MachineUnroller::updateInstruction(MachineInstr *NewMI, bool FirstIter,
				ValueMapTy &OldVRMap) {
				MachineBasicBlock *BB = NewMI->getParent();
				DenseMap<unsigned, unsigned> NewVRMap;
				DenseMap<unsigned, unsigned> &BBVRMap = VRMap[BB];
				for (unsigned i = 0, e = NewMI->getNumOperands(); i != e; ++i) {
				MachineOperand &MO = NewMI->getOperand(i);
				if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
				continue;
				unsigned reg = MO.getReg();
				if (MO.isDef()) {
				// Create a new virtual register for the definition.
				const TargetRegisterClass *RC = MRI->getRegClass(reg);
				unsigned NewReg = MRI->createVirtualRegister(RC);
				MO.setReg(NewReg);
				NewVRMap[reg] = NewReg;
				if (NewMI->isPHI())
				ULPhiVRMap[reg] = NewReg;
				} else if (MO.isUse()) {
				MachineInstr *DefMI = MRI->getVRegDef(reg);
				if (DefMI && DefMI->isPHI()) {
				if (NewMI->isPHI() && FirstIter)
				// Don't change the 'use' yet based on the new def reg. It will be
				// changed later to use the last instance of the value reaching
				// from the loop after it has been unrolled.
				continue;
				else if (!FirstIter) {
				// Get mapped reg:
				// 1) If 'use' is a PHI, use the mapped reg from the previous
				// iteration.
				// 2) If 'use' is a non-PHI, use the mapped reg from the current
				// iteration.
				unsigned LatestReg = NewMI->isPHI()
				? getLatestInstance(reg, BB, OldVRMap)
				: getLatestInstance(reg, BB, VRMap);
				MO.setReg(LatestReg);
				continue;
				}
				}
				if (BBVRMap.count(reg)) {
				unsigned MappedReg = BBVRMap[reg];
				if (MRI->getVRegDef(MappedReg) != NewMI)
				MO.setReg(MappedReg);
				}
				}
				}

				for (auto Val : NewVRMap)
				VRMap[BB][Val.first] = Val.second;
				}

				/// Return true if we can compute the amount the instruction changes
				/// during each iteration. Set Delta to the amount of the change.
				bool MachineUnroller::computeDelta(MachineInstr &MI, unsigned &Delta) const {
				const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
				unsigned BaseReg;
				int64_t Offset;
				if (!TII->getMemOpBaseRegImmOfs(MI, BaseReg, Offset, TRI))
				return false;

				// Check if there is a Phi. If so, get the definition in the loop.
				MachineInstr *BaseDef = MRI->getVRegDef(BaseReg);
				if (BaseDef && BaseDef->isPHI()) {
				if (BaseDef->getParent() != MI.getParent())
				return false;
				BaseReg = getLoopPhiReg(BaseDef, MI.getParent());
				BaseDef = MRI->getVRegDef(BaseReg);
				}
				if (!BaseDef)
				return false;

				int D = 0;
				if (!TII->getIncrementValue(*BaseDef, D) && D >= 0)
				return false;

				Delta = D;
				return true;
				}

				/// Update the memory operand with a new offset when the unroller
				/// generates a new copy of the instruction that refers to a
				/// different memory location.
				void MachineUnroller::updateMemOperands(MachineInstr *NewMI,
				MachineInstr *OldMI, unsigned iter)
				const {
				if (iter == 0)
				return;
				// If the instruction has memory operands, then adjust the offset
				// when the instruction appears in different iterations.
				if (NewMI->memoperands_empty())
				return;
				SmallVector<MachineMemOperand *, 2> NewMMOs;
				for (MachineMemOperand *MMO : NewMI->memoperands()) {
				if (MMO->isVolatile() || (MMO->isInvariant() && MMO->isDereferenceable()) ||
				(!MMO->getValue())) {
				NewMMOs.push_back(MMO);
				continue;
				}
				unsigned Delta;
				if (computeDelta(*OldMI, Delta)) {
				int64_t AdjOffset = Delta * iter;
				NewMMOs.push_back(
				MF->getMachineMemOperand(MMO, AdjOffset, MMO->getSize()));
				} else
				NewMMOs.push_back(
				MF->getMachineMemOperand(MMO, 0, MemoryLocation::UnknownSize));
				}
				NewMI->setMemRefs(*MF, NewMMOs);
				}
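
As a rough illustration of the offset adjustment, the following sketch lists the offsets that the copies of one memory instruction would receive when the per-iteration stride is known. The helper name `copyOffsets` is hypothetical, and the widening to `int64_t` before the multiply is a choice made for this sketch rather than something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Offsets added to the memory operand of each unrolled copy, assuming the
// base address advances by Delta bytes per original loop iteration.
std::vector<int64_t> copyOffsets(unsigned Delta, unsigned UnrollFactor) {
  std::vector<int64_t> Offsets;
  for (unsigned Iter = 0; Iter < UnrollFactor; ++Iter)
    Offsets.push_back(static_cast<int64_t>(Delta) * Iter);
  return Offsets;
}
```

For a 4-byte stride and an unroll factor of 4 this gives 0, 4, 8 and 12. The first copy keeps its original operand, which matches the early return for `iter == 0` above.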

				/// Adjust offset value for the instructions with memory operands when their
				/// copies are generated after first iteration. By adjusting the offset and
				/// using the right base register, we can avoid unnecessary 'add' instructions
				/// that are used to increment the offset for each iteration.

				/// Generate instructions for the unrolled loop header.
				void MachineUnroller::generateUnrolledLoop() {
				for (unsigned iter = 0; iter < UnrollFactor; iter++) {
				ValueMapTy OldVRMap = VRMap;
				for (MachineBasicBlock::iterator I = OrigHeader->instr_begin(),
				E = OrigHeader->getFirstTerminator();
				I != E; ++I) {
				MachineInstr *MI = &*I;
				bool FirstIter = (iter == 0);
				if (MI->isPHI() && !FirstIter) {
				// Just create a new dummy register name for the PHI def and map
				// it to LoopVal reaching from the previous iteration.
				unsigned OrigReg = MI->getOperand(0).getReg();
				const TargetRegisterClass *RC = MRI->getRegClass(OrigReg);
				unsigned NewReg = MRI->createVirtualRegister(RC);
				VRMap[ULHeader][OrigReg] = NewReg;
				unsigned LoopVal = getLoopPhiReg(MI, OrigHeader);
				VRMap[ULHeader][NewReg] =
				getLatestInstance(LoopVal, ULHeader, OldVRMap);
				continue;
				}
				MachineInstr *NewMI =
				MI->isPHI() ? clonePHI(ULHeader, ULPreheader, OrigHeader, MI)
				: MF->CloneMachineInstr(MI);
				ULHeader->push_back(NewMI);
				updateInstruction(NewMI, iter == 0, OldVRMap);
				updateMemOperands(NewMI, MI, iter);
				}
				}

				// Copy any terminator instructions to the unrolled loop header.
				for (MachineBasicBlock::iterator I = OrigHeader->getFirstTerminator(),
				E = OrigHeader->instr_end();
				I != E; ++I) {
				arsenm (inline comment): Isn't this just terminators()?
				jverma (author): Will fix.
				MachineInstr *NewMI = MF->CloneMachineInstr(&*I);
				ULHeader->push_back(NewMI);
				updateInstruction(NewMI, false, VRMap);
				}

				// Update PHIs
				for (MachineBasicBlock::iterator I = ULHeader->instr_begin(),
				E = ULHeader->getFirstNonPHI();
				I != E; ++I) {
				MachineInstr *Phi = &*I;
				MachineOperand &MO = getLoopPhiOp(Phi, ULHeader);
				unsigned reg = MO.getReg();
				MO.setReg(getLatestInstance(reg, ULHeader, VRMap));
				}
				}

				/// Regenerate post-increment load/store instructions. Also, update the offset
				/// value for the load/store instructions that use the same base address as the
				/// newly created post-increment load/store.

				/// Generate Phis for the exit block for the unrolled loop.
				void MachineUnroller::generatePhisForULExit() {
				ValueMapTy OldVRMap = VRMap;
				for (MachineBasicBlock::iterator I = OrigHeader->instr_begin(),
				E = OrigHeader->getFirstNonPHI();
				I != E; ++I) {
				MachineInstr *Phi = &*I;
				assert(Phi->isPHI() && "Expecting a Phi.");
				unsigned DefReg = Phi->getOperand(0).getReg();
				const TargetRegisterClass *RC = MRI->getRegClass(DefReg);
				unsigned InitVal = getInitPhiReg(Phi, OrigHeader);
				unsigned LoopVal = getLoopPhiReg(Phi, OrigHeader);

				assert(InitVal != 0 && LoopVal != 0 && "Unexpected Phi structure.");
				MachineInstr *LoopInst = MRI->getVRegDef(LoopVal);
				unsigned PhiOp1 = InitVal;
				unsigned PhiOp2 = LoopInst->isPHI()
				? getLatestInstance(LoopVal, ULHeader, OldVRMap)
				: getLatestInstance(LoopVal, ULHeader, VRMap);

				unsigned NewReg = MRI->createVirtualRegister(RC);
				MachineInstrBuilder NewPhi =
				BuildMI(*ULExit, ULExit->getFirstNonPHI(), DebugLoc(),
				TII->get(TargetOpcode::PHI), NewReg);
				NewPhi.addReg(PhiOp1).addMBB(OrigPreheader);
				NewPhi.addReg(PhiOp2).addMBB(ULHeader);
				VRMap[ULExit][DefReg] = NewReg;
				replaceRegUses(DefReg, NewReg, ULExit, *MRI);

				// Update Phi in the original loop header to use 'NewReg'
				// as the initial value.
				getInitPhiOp(Phi, OrigHeader).setReg(NewReg);
				}

				// Generate additional PHIs for the values that are live-in for
				// the original loop exit block.
				generateNewPhis(ULExit, OrigPreheader, ULHeader);
				}

				unsigned MachineUnroller::getMappedRegORCreate(unsigned Reg,
				MachineBasicBlock *BB) {
				const TargetRegisterClass *RC = MRI->getRegClass(Reg);
				if (VRMap[BB].count(Reg))
				return getLatestInstance(Reg, BB, VRMap);

				unsigned NewReg = MRI->createVirtualRegister(RC);
				BuildMI(*BB, BB->getFirstNonPHI(), DebugLoc(),
				TII->get(TargetOpcode::IMPLICIT_DEF), NewReg);
				return NewReg;
				}

				void MachineUnroller::generateNewPhis(MachineBasicBlock *BB,
				MachineBasicBlock *BB1,
				MachineBasicBlock *BB2) {
				for (auto Reg : ExitBBLiveIns) {
				unsigned BB1Reg = getMappedRegORCreate(Reg, BB1);
				unsigned BB2Reg = getMappedRegORCreate(Reg, BB2);
				const TargetRegisterClass *RC = MRI->getRegClass(Reg);
				unsigned NewReg = MRI->createVirtualRegister(RC);
				MachineInstrBuilder NewPhi = BuildMI(*BB, BB->getFirstNonPHI(), DebugLoc(),
				TII->get(TargetOpcode::PHI), NewReg);
				NewPhi.addReg(BB1Reg).addMBB(BB1);
				NewPhi.addReg(BB2Reg).addMBB(BB2);
				VRMap[BB][Reg] = NewReg;
				}
				}

				/// Generate Phis for the exit block for the remainder loop.
				void MachineUnroller::generatePhisForRLExit() {
				// Generate PHIs for the values that are live-in for
				// the original loop exit block.
				generateNewPhis(RLExit, ULExit, RLHeader);

				for (MachineBasicBlock::iterator I = RLExit->instr_begin(),
				E = RLExit->getFirstNonPHI();
				I != E; ++I) {
				MachineInstr *Phi = &*I;
				unsigned OrigBBReg = 0;
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2) {
				if (Phi->getOperand(i + 1).getMBB() == OrigHeader)
				OrigBBReg = Phi->getOperand(i).getReg();
				}
				assert(OrigBBReg != 0 && "Unexpected Phi structure.");
				unsigned PhiDefReg = Phi->getOperand(0).getReg();
				replaceRegUsesAfterLoop(OrigBBReg, PhiDefReg, *MRI, LoopBBs);
				}
				}

				void MachineUnroller::getExitBBLiveIns() {
				for (auto I = OrigHeader->instr_begin(), E = OrigHeader->instr_end(); I != E;
				++I) {
				MachineInstr *MI = &*I;
				for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
				MachineOperand &MO = MI->getOperand(i);
				if (!MO.isReg() || !MO.isDef() ||
				!TargetRegisterInfo::isVirtualRegister(MO.getReg()))
				continue;
				unsigned DefReg = MO.getReg();
				for (MachineRegisterInfo::use_iterator I = MRI->use_begin(DefReg),
				E = MRI->use_end();
				I != E;) {
				MachineOperand &O = *I;
				++I;
				if (O.getParent()->getParent() != OrigHeader) {
				ExitBBLiveIns.push_back(DefReg);
				break;
				}
				}
				}
				}
				}

				void MachineUnroller::addBBIntoVRMap(MachineBasicBlock *BB) {
				for (auto I = BB->instr_begin(), E = BB->instr_end(); I != E; ++I) {
				MachineInstr *MI = &*I;
				for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
				MachineOperand &MO = MI->getOperand(i);
				if (!MO.isReg() || !TargetRegisterInfo::isVirtualRegister(MO.getReg()))
				continue;
				if (MO.isDef()) {
				unsigned DefReg = MO.getReg();
				VRMap[BB][DefReg] = DefReg;
				}
				}
				}
				}

				/// Remove all Phi instructions from BB.
				static void cleanUpPHIs(MachineBasicBlock *BB, MachineRegisterInfo &MRI) {
				for (MachineBasicBlock::iterator MII = BB->instr_begin(),
				MIE = BB->getFirstNonPHI();
				MII != MIE;) {
				MachineInstr *Phi = &*MII;
				++MII;
				unsigned InitVal = getInitPhiReg(Phi, BB);
				unsigned PhiDef = Phi->getOperand(0).getReg();
				for (MachineRegisterInfo::use_iterator I = MRI.use_begin(PhiDef),
				E = MRI.use_end();
				I != E;) {
				MachineOperand &O = *I;
				++I;
				O.setReg(InitVal);
				}
				Phi->eraseFromParent();
				}
				}

				/// Fix all the branches for the unrolled and remainder loops. Also, update
				/// the loop count.
				void MachineUnroller::fixBranchesAndLoopCount(unsigned ULCount,
				unsigned RLCount) {
				SmallVector<MachineOperand, 4> Cond;
				MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
				bool checkBranch = TII->analyzeBranch(*ULHeader, TBB, FBB, Cond);
				assert(!checkBranch && "Can't analyze the branch in UnrolledLoop Header");
				(void)checkBranch;

				TII->removeBranch(*ULHeader);
				TII->insertBranch(*ULHeader, ULHeader, ULExit, Cond, DebugLoc());

				// Change loop count for the Unrolled loop and fixup branches.
				SmallVector<MachineOperand, 4> Cond1;
				changeLoopCount(*OrigPreheader, *ULPreheader, *ULHeader, ULCount, LoopIndVar,
				*LoopCmp, Cond1);
				TII->insertBranch(*OrigPreheader, ULExit, ULPreheader, Cond1, DebugLoc());
				Cond1.clear();
				TII->insertBranch(*ULPreheader, ULHeader, nullptr, Cond1, DebugLoc());

				// Copy instructions from the unrolled loop preheader as it may contain
				// loop setup instructions also needed for the Remainder loop.
				for (MachineBasicBlock::iterator I = ULPreheader->instr_begin(),
				E = ULPreheader->getFirstTerminator();
				I != E; ++I) {
				MachineInstr *MI = &*I;
				MachineInstr *NewMI = MF->CloneMachineInstr(MI);
				ULExit->push_back(NewMI);
				}

				// Change loop count for the Remainder loop and fixup branches.
				TII->removeBranch(*RLHeader);
				TII->insertBranch(*RLHeader, RLHeader, RLExit, Cond, DebugLoc());

				Cond1.clear();
				changeLoopCount(*ULExit, *RLPreheader, *RLHeader, RLCount, LoopIndVar,
				*LoopCmp, Cond1);
				TII->insertBranch(*ULExit, RLExit, RLPreheader, Cond1, DebugLoc());

				Cond1.clear();
				TII->insertBranch(*RLPreheader, RLHeader, nullptr, Cond1, DebugLoc());
				TII->insertBranch(*RLExit, OrigLoopExit, nullptr, Cond1, DebugLoc());
				if (RLHeader->succ_size() == 1)
				cleanUpPHIs(RLHeader, *MRI);
				}

				void MachineUnroller::preprocessPhiNodes(MachineBasicBlock &B) {
				SlotIndexes &Slots = *LIS->getSlotIndexes();

				for (MachineInstr &PI : make_range(B.begin(), B.getFirstNonPHI())) {
				MachineOperand &DefOp = PI.getOperand(0);
				assert(DefOp.getSubReg() == 0);
				auto *RC = MRI->getRegClass(DefOp.getReg());

				for (unsigned i = 1, n = PI.getNumOperands(); i != n; i += 2) {
				MachineOperand &RegOp = PI.getOperand(i);
				if (RegOp.getSubReg() == 0)
				continue;

				// If the operand uses a subregister, replace it with a new register
				// without subregisters, and generate a copy to the new register.
				unsigned NewReg = MRI->createVirtualRegister(RC);
				MachineBasicBlock &PredB = *PI.getOperand(i + 1).getMBB();
				MachineBasicBlock::iterator At = PredB.getFirstTerminator();
				const DebugLoc &DL = PredB.findDebugLoc(At);
				auto Copy =
				BuildMI(PredB, At, DL, TII->get(TargetOpcode::COPY), NewReg)
				.addReg(RegOp.getReg(), getRegState(RegOp), RegOp.getSubReg());
				Slots.insertMachineInstrInMaps(*Copy);
				RegOp.setReg(NewReg);
				RegOp.setSubReg(0);
				}
				}
				}

				bool MachineUnroller::unroll(MachineLoop *loop, unsigned unrollFactor) {
				init(loop, unrollFactor);
				if (!canUnroll())
				return false;

				// Remove any subregisters from input to phi nodes.
				preprocessPhiNodes(*loop->getHeader());

				// Add all the def regs in the loop header in VRMap.
				addBBIntoVRMap(OrigHeader);
				getExitBBLiveIns();

				// Create empty basic blocks for the unrolled version of the loop.
				createUnrolledLoopStruct();

				// Add instructions to compute trip counts for the unrolled and
				// remainder loops.
				TII->removeBranch(*OrigPreheader);
				unsigned ULCount = addUnrolledLoopCountMI(*OrigPreheader, LC, UnrollFactor);
				unsigned RLCount = addRemLoopCountMI(*OrigPreheader, LC, UnrollFactor);

				// Add instructions to the Unrolled loop header.
				generateUnrolledLoop();

				// Generate Phis for the unrolled loop exit block and also update
				// Phis in the remainder loop header to use the correct initial values.
				generatePhisForULExit();

				// Generate Phis for the remainder loop exit block.
				generatePhisForRLExit();

				// Optimize unrolled loop header.
				optimize(*ULHeader);

				// Update branches and adjust loop count.
				fixBranchesAndLoopCount(ULCount, RLCount);

				SmallVector<MachineBasicBlock *, 4> UpdateBBs = LoopBBs;
				UpdateBBs.insert(UpdateBBs.begin(), OrigPreheader);
				updateLiveness(UpdateBBs, LIS);

				// Modify existing loop to point to the unrolled loop header.
				L->removeBlockFromLoop(OrigHeader);
				L->addBasicBlockToLoop(ULHeader, MLI->getBase());
				return true;
				}
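
For readers less familiar with the MI-level transformation, the control structure that `unroll()` builds (an unrolled loop followed by a remainder loop, each with its own trip count) has a simple source-level analogue. The sketch below is only that analogue, written for a fixed unroll factor of 4; the function name `sumUnrolledBy4` is made up for illustration and nothing in it is taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum an array with the loop body replicated four times, followed by a
// remainder loop that handles the leftover iterations.
long sumUnrolledBy4(const std::vector<int> &A) {
  const std::size_t N = A.size();
  const std::size_t ULCount = N / 4; // trips of the unrolled loop
  const std::size_t RLCount = N % 4; // trips of the remainder loop
  long Sum = 0;
  std::size_t I = 0;
  for (std::size_t T = 0; T < ULCount; ++T, I += 4)
    Sum += A[I] + A[I + 1] + A[I + 2] + A[I + 3];
  for (std::size_t T = 0; T < RLCount; ++T, ++I)
    Sum += A[I];
  return Sum;
}
```

Either loop is skipped entirely when its trip count is zero, which corresponds to the guard branches inserted in `fixBranchesAndLoopCount`.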

lib/Target/Hexagon/CMakeLists.txt

[37 lines not shown] add_llvm_target(HexagonCodeGen
 HexagonInstrInfo.cpp
 HexagonISelDAGToDAG.cpp
 HexagonISelDAGToDAGHVX.cpp
 HexagonISelLowering.cpp
 HexagonISelLoweringHVX.cpp
 HexagonLoopIdiomRecognition.cpp
 HexagonMachineFunctionInfo.cpp
 HexagonMachineScheduler.cpp
+HexagonMachineUnroller.cpp
 HexagonMCInstLower.cpp
 HexagonNewValueJump.cpp
 HexagonOptAddrMode.cpp
 HexagonOptimizeSZextends.cpp
 HexagonPeephole.cpp
 HexagonRDFOpt.cpp
 HexagonRegisterInfo.cpp
 HexagonSelectionDAGInfo.cpp
[23 lines not shown]

lib/Target/Hexagon/Hexagon.td

[246 lines not shown]
 def changeAddrMode_rr_ur: InstrMapping {
 let FilterClass = "ImmRegShl";
 let RowFields = ["CextOpcode", "PredSense", "PNewValue", "isNVStore"];
 let ColFields = ["addrMode"];
 let KeyCol = ["BaseRegOffset"];
 let ValueCols = [["BaseLongOffset"]];
 }

-def changeAddrMode_ur_rr : InstrMapping {
+def changeAddrMode_ur_rr: InstrMapping {
 let FilterClass = "ImmRegShl";
 let RowFields = ["CextOpcode", "PredSense", "PNewValue", "isNVStore"];
 let ColFields = ["addrMode"];
 let KeyCol = ["BaseLongOffset"];
 let ValueCols = [["BaseRegOffset"]];
 }

 def getRegForm : InstrMapping {
[117 lines not shown]

lib/Target/Hexagon/HexagonDepInstrInfo.td


[10,020 lines not shown]
 let Inst{31-21} = 0b10011011000;
 let isPredicated = 1;
 let isPredicatedFalse = 1;
 let hasNewValue = 1;
 let opNewValue = 0;
 let addrMode = PostInc;
 let accessSize = ByteAccess;
 let mayLoad = 1;
+let CextOpcode = "L2_loadrb";
 let BaseOpcode = "L2_loadrb_pi";
 let Constraints = "$Rx32 = $Rx32in";
 }
 def L2_ploadrbf_zomap : HInst<
 (outs IntRegs:$Rd32),
 (ins PredRegs:$Pt4, IntRegs:$Rs32),
 "if (!$Pt4) $Rd32 = memb($Rs32)",
 tc_ef52ed71, TypeMAPPING> {
[352 lines not shown]
 let Inst{31-21} = 0b10011011010;
 let isPredicated = 1;
 let isPredicatedFalse = 1;
 let hasNewValue = 1;
 let opNewValue = 0;
 let addrMode = PostInc;
 let accessSize = HalfWordAccess;
 let mayLoad = 1;
+let CextOpcode = "L2_loadrh";
 let BaseOpcode = "L2_loadrh_pi";
 let Constraints = "$Rx32 = $Rx32in";
 }
 def L2_ploadrhf_zomap : HInst<
 (outs IntRegs:$Rd32),
 (ins PredRegs:$Pt4, IntRegs:$Rs32),
 "if (!$Pt4) $Rd32 = memh($Rs32)",
 tc_ef52ed71, TypeMAPPING> {
[180 lines not shown]
 let Inst{31-21} = 0b10011011100;
 let isPredicated = 1;
 let isPredicatedFalse = 1;
 let hasNewValue = 1;
 let opNewValue = 0;
 let addrMode = PostInc;
 let accessSize = WordAccess;
 let mayLoad = 1;
+let CextOpcode = "L2_loadri";
 let BaseOpcode = "L2_loadri_pi";
 let Constraints = "$Rx32 = $Rx32in";
 }
 def L2_ploadrif_zomap : HInst<
 (outs IntRegs:$Rd32),
 (ins PredRegs:$Pt4, IntRegs:$Rs32),
 "if (!$Pt4) $Rd32 = memw($Rs32)",
 tc_ef52ed71, TypeMAPPING> {
[180 lines not shown]
 let Inst{31-21} = 0b10011011001;
 let isPredicated = 1;
 let isPredicatedFalse = 1;
 let hasNewValue = 1;
 let opNewValue = 0;
 let addrMode = PostInc;
 let accessSize = ByteAccess;
 let mayLoad = 1;
+let CextOpcode = "L2_loadrub";
 let BaseOpcode = "L2_loadrub_pi";
 let Constraints = "$Rx32 = $Rx32in";
 }
 def L2_ploadrubf_zomap : HInst<
 (outs IntRegs:$Rd32),
 (ins PredRegs:$Pt4, IntRegs:$Rs32),
 "if (!$Pt4) $Rd32 = memub($Rs32)",
 tc_ef52ed71, TypeMAPPING> {
[180 lines not shown]
 let Inst{31-21} = 0b10011011011;
 let isPredicated = 1;
 let isPredicatedFalse = 1;
 let hasNewValue = 1;
 let opNewValue = 0;
 let addrMode = PostInc;
 let accessSize = HalfWordAccess;
 let mayLoad = 1;
+let CextOpcode = "L2_loadruh";
 let BaseOpcode = "L2_loadruh_pi";
 let Constraints = "$Rx32 = $Rx32in";
 }
 def L2_ploadruhf_zomap : HInst<
 (outs IntRegs:$Rd32),
 (ins PredRegs:$Pt4, IntRegs:$Rs32),
 "if (!$Pt4) $Rd32 = memuh($Rs32)",
 tc_ef52ed71, TypeMAPPING> {
[9,657 lines not shown]
 let Inst{13-11} = 0b000;
 let Inst{31-21} = 0b10101011101;
 let addrMode = PostInc;
 let accessSize = ByteAccess;
 let isNVStore = 1;
 let isNewValue = 1;
 let isRestrictNoSlot1Store = 1;
 let mayStore = 1;
+let CextOpcode = "S2_storerb";
 let BaseOpcode = "S2_storerb_pi";
 let isPredicable = 1;
 let isNVStorable = 1;
 let opNewValue = 3;
 let Constraints = "$Rx32 = $Rx32in";
 }
 def S2_storerbnew_pr : HInst<
 (outs IntRegs:$Rx32),
[480 lines not shown]
 let Inst{13-11} = 0b001;
 let Inst{31-21} = 0b10101011101;
 let addrMode = PostInc;
 let accessSize = HalfWordAccess;
 let isNVStore = 1;
 let isNewValue = 1;
 let isRestrictNoSlot1Store = 1;
 let mayStore = 1;
+let CextOpcode = "S2_storerh";
 let BaseOpcode = "S2_storerh_pi";
 let isNVStorable = 1;
 let isPredicable = 1;
 let opNewValue = 3;
 let Constraints = "$Rx32 = $Rx32in";
 }
 def S2_storerhnew_pr : HInst<
 (outs IntRegs:$Rx32),
[256 lines not shown]
 let Inst{13-11} = 0b010;
 let Inst{31-21} = 0b10101011101;
 let addrMode = PostInc;
 let accessSize = WordAccess;
 let isNVStore = 1;
 let isNewValue = 1;
 let isRestrictNoSlot1Store = 1;
 let mayStore = 1;
+let CextOpcode = "S2_storeri";
 let BaseOpcode = "S2_storeri_pi";
 let isPredicable = 1;
 let opNewValue = 3;
 let Constraints = "$Rx32 = $Rx32in";
 }
 def S2_storerinew_pr : HInst<
 (outs IntRegs:$Rx32),
 (ins IntRegs:$Rx32in, ModRegs:$Mu2, IntRegs:$Nt8),
[15,619 lines not shown]

lib/Target/Hexagon/HexagonMachineUnroller.h

This file was added.

				//===------ HexagonMachineUnroller.h - Custom Hexagon Machine Unroller-----===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Custom Hexagon Machine Unroller
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_LIB_TARGET_HEXAGON_HEXAGONMACHINEUNROLLER_H
				#define LLVM_LIB_TARGET_HEXAGON_HEXAGONMACHINEUNROLLER_H

				#include "HexagonInstrInfo.h"
				#include "llvm/CodeGen/MachineUnroller.h"

				namespace llvm {

				class HexagonMachineUnroller : public MachineUnroller {
				const HexagonInstrInfo *HII;

				public:
				HexagonMachineUnroller(MachineUnrollerContext *C) : MachineUnroller(C) {
				HII = static_cast<const HexagonInstrInfo *>(C->TII);
				}

				unsigned getLoopCount(MachineBasicBlock &MBB, MachineInstr *IndVar,
				MachineInstr &Cmp) const override;

				/// Add instruction to compute trip count for the unrolled loop.
				unsigned addUnrolledLoopCountMI(MachineBasicBlock &MBB, unsigned LC,
				unsigned UnrollFactor) const override;

				/// Add instruction to compute remainder trip count for the unrolled loop.
				unsigned addRemLoopCountMI(MachineBasicBlock &MBB, unsigned LC,
				unsigned UnrollFactor) const override;

				void changeLoopCount(MachineBasicBlock &BB, MachineBasicBlock &Preheader,
				MachineBasicBlock &Header, unsigned LC,
				MachineInstr *IndVar, MachineInstr &Cmp,
				SmallVectorImpl<MachineOperand> &Cond) const override;

				void optimize(MachineBasicBlock &BB) const override;

				bool canReplaceWithPostInc(MachineInstr *MI, MachineInstr *AddMI) const;
				void replaceWithPostInc(MachineInstr *MI, MachineInstr *AddMI) const;
				void generatePostInc(MachineBasicBlock *BB) const;
				void replacePostIncWithBaseOffset(MachineBasicBlock *BB) const;
				void replacePostIncWithBaseOffset(MachineInstr *MI) const;
				bool isValidPostIncValue(const MachineInstr &MI, int IncVal) const;
				void updateBaseAndOffset(MachineInstr *MI, MachineInstr *AddMI) const;
				void foldAdds(MachineBasicBlock &BB) const;
				// Remove dead instructions that might have been added during unrolling.
				void removeDeadInstructions(MachineBasicBlock &BB) const;
				bool isValidOffset(const MachineInstr &MI, int64_t Offset,
				const TargetRegisterInfo *TRI) const;
				};
				} // end namespace llvm

				#endif // LLVM_LIB_TARGET_HEXAGON_HEXAGONMACHINEUNROLLER_H

lib/Target/Hexagon/HexagonMachineUnroller.cpp

This file was added.

				//===----- HexagonMachineUnroller.cpp - Custom Hexagon Machine Unroller ---===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// Custom Hexagon Machine Unroller
				//
				//===----------------------------------------------------------------------===//

				#include "HexagonMachineUnroller.h"
				#include "HexagonInstrInfo.h"
				#include "llvm/ADT/SmallVector.h"
				#include "llvm/CodeGen/MachineInstr.h"
				#include "llvm/CodeGen/MachineInstrBuilder.h"
				#include "llvm/CodeGen/MachineRegisterInfo.h"
				#include "llvm/CodeGen/MachineUnroller.h"
				#include "llvm/CodeGen/TargetInstrInfo.h"
				#include "llvm/Support/MathExtras.h"

				using namespace llvm;

				static bool isAddWithImmValue(const MachineInstr &MI) {
				return MI.getOpcode() == Hexagon::A2_addi;
				}

				/// Return true if MIA dominates MIB.
				static bool dominates(MachineInstr *MIA, MachineInstr *MIB) {
				if (MIA->getParent() != MIB->getParent())
				return false; // Don't know since machine dominator tree is out of date.

				MachineBasicBlock *MBB = MIA->getParent();
				MachineBasicBlock::iterator I = MBB->instr_begin();
				// Iterate over the basic block until MIA or MIB is found.
				for (; &*I != MIA && &*I != MIB; ++I)
				;

				// MIA dominates MIB if MIA is found first.
				return &*I == MIA;
				}
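
The same-block dominance test above is a linear scan that stops at whichever of the two instructions appears first. A minimal sketch of that idea, using integers as stand-ins for instructions in a single block, could look like the following. The name `comesFirst` and the integer encoding are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Return true if A is reached no later than B when scanning Block from the
// top. Both values are expected to be present in Block.
bool comesFirst(const std::vector<int> &Block, int A, int B) {
  for (int Inst : Block) {
    if (Inst == A)
      return true;
    if (Inst == B)
      return false;
  }
  return false; // Neither was found; treat as "does not dominate".
}
```

As in the helper above, an instruction is reported as dominating itself. The scan costs time proportional to the block length, which is acceptable only because the unroller targets small single-block loops.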

				/// Return the Phi register value that comes from the loop block.
				static unsigned getLoopPhiReg(MachineInstr *Phi, MachineBasicBlock *LoopBB) {
				for (unsigned i = 1, e = Phi->getNumOperands(); i != e; i += 2)
				if (Phi->getOperand(i + 1).getMBB() == LoopBB)
				return Phi->getOperand(i).getReg();
				llvm_unreachable("Unexpected Phi structure.");
				}

				static bool executesAtMostOnce(MachineInstr *MI) {
				if (MI->getOpcode() != Hexagon::A2_andir)
				return false;
				if (MI->getOperand(2).getImm() == 1)
				return true;
				return false;
				}

				unsigned HexagonMachineUnroller::getLoopCount(MachineBasicBlock &MBB,
				MachineInstr *IndVar,
				MachineInstr &Cmp) const {
				// We expect a hardware loop currently. This means that IndVar is set
				// to null, and the compare is the ENDLOOP instruction.
				assert((!IndVar) && HII->isEndLoopN(Cmp.getOpcode()) &&
				"Expecting a hardware loop");
				DebugLoc DL = Cmp.getDebugLoc();
				SmallPtrSet<MachineBasicBlock *, 8> VisitedBBs;
				MachineInstr *Loop = HII->findLoopInstr(
				&MBB, Cmp.getOpcode(), Cmp.getOperand(0).getMBB(), VisitedBBs);
				if (!Loop)
				return 0;
				// The loop trip count is a compile-time value.
				if (Loop->getOpcode() == Hexagon::J2_loop0i ||
				Loop->getOpcode() == Hexagon::J2_loop1i)
				return Loop->getOperand(1).getImm();

				// The loop trip count is a run-time value.
				assert(Loop->getOpcode() == Hexagon::J2_loop0r && "Unexpected instruction");
				return Loop->getOperand(1).getReg();
				}

				unsigned HexagonMachineUnroller::addUnrolledLoopCountMI(
				MachineBasicBlock &MBB, unsigned LC, unsigned UnrollFactor) const {
				assert(isPowerOf2_32(UnrollFactor) && "UnrollFactor must be a power of 2");
				MachineFunction *MF = MBB.getParent();
				unsigned ShiftBy = Log2_32(UnrollFactor);
				unsigned NewUnrolledLC = HII->createVR(MF, MVT::i32);
				BuildMI(MBB, MBB.instr_end(), DebugLoc(), HII->get(Hexagon::S2_lsr_i_r),
				NewUnrolledLC)
				.addReg(LC)
				.addImm(ShiftBy);
				return NewUnrolledLC;
				}

				unsigned
				HexagonMachineUnroller::addRemLoopCountMI(MachineBasicBlock &MBB, unsigned LC,
				unsigned UnrollFactor) const {
				assert(isPowerOf2_32(UnrollFactor) && "UnrollFactor must be a power of 2");
				MachineFunction *MF = MBB.getParent();
				unsigned RemLC = HII->createVR(MF, MVT::i32);
				BuildMI(MBB, MBB.instr_end(), DebugLoc(), HII->get(Hexagon::A2_andir), RemLC)
				.addReg(LC)
				.addImm(UnrollFactor - 1);
				return RemLC;
				}

				/// For instructions with a base and offset, return true if the new Offset
				/// is a valid value with the correct alignment.
				bool HexagonMachineUnroller::isValidOffset(
				const MachineInstr &MI, int64_t Offset,
				const TargetRegisterInfo *TRI) const {
				if (!HII->isValidOffset(MI.getOpcode(), Offset, TRI, false))
				return false;
				unsigned AlignMask = HII->getMemAccessSize(MI) - 1;
				return (Offset & AlignMask) == 0;
				}

				void HexagonMachineUnroller::changeLoopCount(
				MachineBasicBlock &BB, MachineBasicBlock &Preheader,
				MachineBasicBlock &Header, unsigned LC, MachineInstr *IndVar,
				MachineInstr &Cmp, SmallVectorImpl<MachineOperand> &Cond) const {

				// We expect a hardware loop currently. This means that IndVar is set
				// to null, and the compare is the ENDLOOP instruction.
				assert((!IndVar) && HII->isEndLoopN(Cmp.getOpcode()) &&
				"Expecting a hardware loop");
				MachineFunction *MF = Preheader.getParent();
				DebugLoc DL = Cmp.getDebugLoc();
				SmallPtrSet<MachineBasicBlock *, 8> VisitedBBs;
				MachineInstr *Loop = HII->findLoopInstr(
				&Header, Cmp.getOpcode(), Cmp.getOperand(0).getMBB(), VisitedBBs);
				if (!Loop)
				return;
				// The loop trip count is a run-time value.
				assert(Loop->getOpcode() == Hexagon::J2_loop0r && "Unexpected instruction");
				MachineRegisterInfo &MRI = Cmp.getParent()->getParent()->getRegInfo();
				MachineInstr *LCDefMI = MRI.getVRegDef(LC);
				MachineInstr *NewCmp;
				if (executesAtMostOnce(LCDefMI)) {
				// The loop executes at most once. Therefore, it must be unrolled
				// by removing the loop setup, endloop and back-edge (jump) instructions to
				// avoid stalls due to front-end mispredictions.
				// FYI: the front end predicts that endloop is taken twice and then waits to
				// see which way it goes when it encounters it a third time. Since loop[01]
				// is resolved by the back-end and it takes at least 10 cycles from fetch to
				// commit, for very small loops that execute only once this can result
				// in a lot of stalled cycles.
				unsigned LoopEnd = HII->createVR(MF, MVT::i1);
				NewCmp = BuildMI(&BB, DL, HII->get(Hexagon::C2_cmpgtui), LoopEnd)
				.addReg(LC)
				.addImm(0);
				Cmp.eraseFromParent();
				Header.removeSuccessor(&Header);
				} else {
				unsigned LoopEnd = HII->createVR(MF, MVT::i1);
				NewCmp = BuildMI(&BB, DL, HII->get(Hexagon::C2_cmpgtui), LoopEnd)
				.addReg(LC)
				.addImm(0);
				BuildMI(&Preheader, DL, HII->get(Hexagon::J2_loop0r))
				.addMBB(Loop->getOperand(0).getMBB())
				.addReg(LC);
				}
				// Delete the old loop instruction.
				Loop->eraseFromParent();
				Cond.push_back(MachineOperand::CreateImm(Hexagon::J2_jumpf));
				Cond.push_back(NewCmp->getOperand(0));
				}

				bool HexagonMachineUnroller::isValidPostIncValue(const MachineInstr &MI,
				int IncVal) const {
				unsigned AlignMask = HII->getMemAccessSize(MI) - 1;
				if ((IncVal & AlignMask) != 0)
				return false;
				// Number of total bits in the instruction used to encode Inc value.
				unsigned IncBits = 4;
				IncBits += Log2_32(HII->getMemAccessSize(MI));
				int MinValidVal = -1U << (IncBits - 1);
				int MaxValidVal = ~(-1U << (IncBits - 1));
				return (IncVal >= MinValidVal && IncVal <= MaxValidVal);
				}

				void HexagonMachineUnroller::foldAdds(MachineBasicBlock &BB) const {
				for (MachineBasicBlock::iterator I = BB.getFirstNonPHI(),
				E = BB.getFirstTerminator();
				I != E;) {
				MachineInstr *MI = &*I;
				I++;
				if (!isAddWithImmValue(*MI))
				continue;
				unsigned DefReg = MI->getOperand(0).getReg();
				unsigned AddReg = MI->getOperand(1).getReg();
				int64_t AddImm = MI->getOperand(2).getImm();

				SmallVector<MachineInstr *, 4> UseList;
				for (MachineRegisterInfo::use_iterator RI = MRI->use_begin(DefReg),
				RE = MRI->use_end();
				RI != RE; ++RI) {
				MachineOperand &MO = *RI;
				MachineInstr *UseMI = MO.getParent();
				UseList.push_back(UseMI);
				}
				for (auto UseMI : UseList) {
				if (isAddWithImmValue(*UseMI)) {
				int64_t NewImm = AddImm + UseMI->getOperand(2).getImm();
				UseMI->getOperand(1).setReg(AddReg);
				UseMI->getOperand(2).setImm(NewImm);
				} else if (HII->isBaseImmOffset(*UseMI))
				updateBaseAndOffset(UseMI, MI);
				}
				}
				removeDeadInstructions(BB);
				}

				void HexagonMachineUnroller::updateBaseAndOffset(MachineInstr *MI,
				MachineInstr *AddMI) const {
				const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
				assert(HII->isBaseImmOffset(*MI));
				unsigned BasePos, OffsetPos;
				if (!HII->getBaseAndOffsetPosition(*MI, BasePos, OffsetPos))
				return;

				MachineOperand &OffsetOp = MI->getOperand(OffsetPos);
				MachineOperand &BaseOp = MI->getOperand(BasePos);

				if (BaseOp.getReg() != AddMI->getOperand(0).getReg())
				return;

				unsigned IncBase = AddMI->getOperand(1).getReg();
				int64_t IncValue = AddMI->getOperand(2).getImm();

				int64_t NewOffset = OffsetOp.getImm() + IncValue;
				if (!isValidOffset(*MI, NewOffset, TRI))
				return;

				OffsetOp.setImm(NewOffset);
				BaseOp.setReg(IncBase);
				}

				void HexagonMachineUnroller::replacePostIncWithBaseOffset(
				MachineBasicBlock *BB) const {
				for (MachineBasicBlock::iterator I = BB->getFirstNonPHI(),
				E = BB->getFirstTerminator();
				I != E;) {
				MachineInstr *MI = &*I;
				I++;
				if (!HII->isPostIncrement(*MI))
				continue;

				replacePostIncWithBaseOffset(MI);
				}
				}

				void HexagonMachineUnroller::replacePostIncWithBaseOffset(
				MachineInstr *MI) const {
				if (!HII->isPostIncrement(*MI) || HII->isPredicated(*MI))
				return;
				short NewOpcode = HII->changeAddrMode_pi_io(MI->getOpcode());
				if (NewOpcode < 0)
				return;

				unsigned BasePos = 0, OffsetPos = 0;
				if (!HII->getBaseAndOffsetPosition(*MI, BasePos, OffsetPos))
				return;
				const MachineOperand &IncValue = MI->getOperand(OffsetPos);
				const MachineOperand &IncBase = MI->getOperand(BasePos);

				MachineBasicBlock &MBB = *MI->getParent();
				DebugLoc DL = MI->getDebugLoc();
				MachineOperand *IncDest;
				MachineInstrBuilder MIB;
				if (MI->mayLoad()) {
				IncDest = &MI->getOperand(1);
				const MachineOperand &LDValue = MI->getOperand(0);
				MIB = BuildMI(MBB, *MI, DL, HII->get(NewOpcode));
				MIB.add(LDValue).add(IncBase).addImm(0);
				} else {
				IncDest = &MI->getOperand(0);
				const MachineOperand &STValue = MI->getOperand(3);
				MIB = BuildMI(MBB, *MI, DL, HII->get(NewOpcode));
				MIB.add(IncBase).addImm(0).add(STValue);
				}

				// Transfer memoperands.
				MIB->setMemRefs(*MBB.getParent(), MI->memoperands());

				MachineInstrBuilder MIBA = BuildMI(MBB, *MI, DL, HII->get(Hexagon::A2_addi));
				MIBA.add(*IncDest).add(IncBase).add(IncValue);
				MI->eraseFromParent();
				}


				// Convert post-inc addressing mode into base-offset along with an
				// 'add' instruction that is used to increment the address.
				// This is done to break dependence between post-increment memory operations
				// in the unrolled version of the loop. 'add' instructions are later
				// optimized out.
				// Ex:
				// original loop:
				// v1 = phi(v0, v3)
				// v2,v3 = post_load v1, 4

				// Unrolling without optimizing post-increments:
				// v1 = phi(v0, v3')
				// v2,v3 = post_load v1, 4
				// v2',v3'= post_load v3, 4

				// Instead, we want to have this:
				// v1 = phi(v0, v3')
				// v2,v3' = post_load v1, 8
				// v2' = load v3', -4
				//
				void HexagonMachineUnroller::generatePostInc(MachineBasicBlock *BB) const {
				const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
				MachineBasicBlock::iterator MII = BB->getFirstNonPHI();
				MachineBasicBlock::iterator MIE = BB->instr_begin();
				bool isOK = true;
				while (MII != MIE) {
				MachineInstr *Phi = &*std::prev(MII);
				MII = std::prev(MII);
				unsigned LoopVal = getLoopPhiReg(Phi, BB);
				MachineInstr *LoopInst = MRI->getVRegDef(LoopVal);
				if (!isAddWithImmValue(*LoopInst))
				continue;

				if (LoopInst->getOpcode() != Hexagon::A2_addi)
				continue;

				unsigned AddReg = LoopInst->getOperand(1).getReg();
				int64_t AddImm = LoopInst->getOperand(2).getImm();
				SmallVector<MachineInstr *, 4> UseList;
				MachineInstr *PostIncCandidate = nullptr;

				for (MachineRegisterInfo::use_iterator RI = MRI->use_begin(AddReg),
				RE = MRI->use_end();
				RI != RE; ++RI) {
				MachineOperand &MO = *RI;
				MachineInstr *UseMI = MO.getParent();
				if (UseMI == LoopInst)
				continue;
				if (!dominates(UseMI, LoopInst)) {
				isOK = false;
				break;
				}
				unsigned BaseReg;
				int64_t Offset;
				if (!HII->isBaseImmOffset(*UseMI) ||
				!HII->getMemOpBaseRegImmOfs(*UseMI, BaseReg, Offset, TRI)) {
				isOK = false;
				break;
				}
				int64_t NewOffset = Offset - AddImm;
				if (!isValidOffset(*UseMI, NewOffset, TRI) || BaseReg != AddReg) {
				isOK = false;
				break;
				}
				if (Offset == 0 && !PostIncCandidate) {
				PostIncCandidate = UseMI;
				continue;
				}
				UseList.push_back(UseMI);
				}

				if (!isOK)
				continue;

				// If a candidate is found, replace it with the post-inc instruction.
				// Also, adjust offset for other uses as needed.
				if (!PostIncCandidate || !canReplaceWithPostInc(PostIncCandidate, LoopInst))
				continue;

				for (auto UseMI : UseList) {
				if (!dominates(PostIncCandidate, UseMI))
				continue;
				unsigned BasePos, OffsetPos;
				if (HII->getBaseAndOffsetPosition(*UseMI, BasePos, OffsetPos)) {
				// New offset has already been validated; no need to do it again.
				int64_t NewOffset = UseMI->getOperand(OffsetPos).getImm() - AddImm;
				UseMI->getOperand(OffsetPos).setImm(NewOffset);
				UseMI->getOperand(BasePos).setReg(LoopVal);
				}
				}
				replaceWithPostInc(PostIncCandidate, LoopInst);
				}
				}

				bool HexagonMachineUnroller::canReplaceWithPostInc(MachineInstr *MI,
				MachineInstr *AddMI) const {
				if (HII->changeAddrMode_io_pi(MI->getOpcode()) < 0)
				return false;
				assert(AddMI->getOpcode() == Hexagon::A2_addi);
				return isValidPostIncValue(*MI, AddMI->getOperand(2).getImm());
				}

				void HexagonMachineUnroller::replaceWithPostInc(MachineInstr *MI,
				MachineInstr *AddMI) const {
				short NewOpcode = HII->changeAddrMode_io_pi(MI->getOpcode());
				assert(NewOpcode >= 0 &&
				"Couldn't change base offset to post-increment form");

				MachineBasicBlock &MBB = *MI->getParent();
				DebugLoc DL = MI->getDebugLoc();
				const MachineOperand &IncDest = AddMI->getOperand(0);
				const MachineOperand &IncBase = AddMI->getOperand(1);
				const MachineOperand &IncValue = AddMI->getOperand(2);
				MachineInstrBuilder MIB;
				if (MI->mayLoad()) {
				const MachineOperand &LDValue = MI->getOperand(0);
				MIB = BuildMI(MBB, *MI, DL, HII->get(NewOpcode));
				MIB.add(LDValue).add(IncDest).add(IncBase).add(IncValue);
				} else {
				const MachineOperand &STValue = MI->getOperand(2);
				MIB = BuildMI(MBB, *MI, DL, HII->get(NewOpcode));
				MIB.add(IncDest).add(IncBase).add(IncValue).add(STValue);
				}

				// Transfer memoperands.
				MIB->setMemRefs(*MBB.getParent(), MI->memoperands());

				MI->eraseFromParent();
				AddMI->eraseFromParent();
				}

				/// Remove instructions that generate values with no uses.
				void HexagonMachineUnroller::removeDeadInstructions(
				MachineBasicBlock &BB) const {
				// For BB, check that the value defined by each instruction is used.
				// If not, delete it.
				for (MachineBasicBlock::reverse_instr_iterator MI = BB.instr_rbegin(),
				ME = BB.instr_rend();
				MI != ME;) {
				// From DeadMachineInstructionElem. Don't delete inline assembly.
				if (MI->isInlineAsm()) {
				++MI;
				continue;
				}
				bool SawStore = false;
				// Check if it's safe to remove the instruction due to side effects.
				if (!MI->isSafeToMove(nullptr, SawStore)) {
				++MI;
				continue;
				}
				unsigned Uses = 0;
				for (MachineInstr::mop_iterator MOI = MI->operands_begin(),
				MOE = MI->operands_end();
				MOI != MOE; ++MOI) {
				if (!MOI->isReg() || !MOI->isDef())
				continue;
				unsigned reg = MOI->getReg();
				// Assume physical registers are used.
				if (TargetRegisterInfo::isPhysicalRegister(reg)) {
				Uses++;
				continue;
				}
				if (MRI->use_begin(reg) != MRI->use_end())
				Uses++;
				}
				if (!Uses) {
				MI++->eraseFromParent();
				continue;
				}
				++MI;
				}
				}

				void HexagonMachineUnroller::optimize(MachineBasicBlock &BB) const {
				replacePostIncWithBaseOffset(&BB);
				foldAdds(BB);
				generatePostInc(&BB);
				}
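
The range check in `isValidPostIncValue` above can be read in isolation: the increment must be a multiple of the access size and must fit in a signed field of `4 + log2(size)` bits. The following is a minimal standalone sketch of that rule as it appears in this patch, not an authoritative statement of the Hexagon encoding. `isValidPostIncSketch` is a hypothetical name, and the access size is passed in directly instead of being queried from `HexagonInstrInfo`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the check; not part of the patch.
// AccessSize is the memory access size in bytes and must be a power of 2.
static bool isValidPostIncSketch(int IncVal, unsigned AccessSize) {
  assert(AccessSize != 0 && (AccessSize & (AccessSize - 1)) == 0 &&
         "AccessSize must be a power of 2");

  // The increment must keep the address aligned to the access size.
  if ((IncVal & static_cast<int>(AccessSize - 1)) != 0)
    return false;

  // Number of bits used to encode the increment: 4 + log2(AccessSize).
  unsigned IncBits = 4;
  for (unsigned S = AccessSize; S > 1; S >>= 1)
    ++IncBits;

  // Signed range of an IncBits-wide field.
  int MinValidVal = -(1 << (IncBits - 1));
  int MaxValidVal = (1 << (IncBits - 1)) - 1;
  return IncVal >= MinValidVal && IncVal <= MaxValidVal;
}
```

Under this rule a 4-byte access accepts increments from -32 to 28 in steps of 4, so `isValidPostIncSketch(8, 4)` holds while `isValidPostIncSketch(32, 4)` does not.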

lib/Target/Hexagon/HexagonTargetMachine.cpp

// Implements the info about Hexagon target spec.		// Implements the info about Hexagon target spec.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "HexagonTargetMachine.h"		#include "HexagonTargetMachine.h"
#include "Hexagon.h"		#include "Hexagon.h"
#include "HexagonISelLowering.h"		#include "HexagonISelLowering.h"
#include "HexagonMachineScheduler.h"		#include "HexagonMachineScheduler.h"
		#include "HexagonMachineUnroller.h"
#include "HexagonTargetObjectFile.h"		#include "HexagonTargetObjectFile.h"
#include "HexagonTargetTransformInfo.h"		#include "HexagonTargetTransformInfo.h"
#include "llvm/CodeGen/Passes.h"		#include "llvm/CodeGen/Passes.h"
#include "llvm/CodeGen/TargetPassConfig.h"		#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/LegacyPassManager.h"		#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/TargetRegistry.h"		#include "llvm/Support/TargetRegistry.h"
/// HexagonTargetMachineModule - Note that this is used on hosts that		/// HexagonTargetMachineModule - Note that this is used on hosts that
/// cannot link in a library unless there are references into the		/// cannot link in a library unless there are references into the
/// library. In particular, it seems that it is not possible to get		/// library. In particular, it seems that it is not possible to get
/// things to work on Win32 without this. Though it is unused, do not		/// things to work on Win32 without this. Though it is unused, do not
/// remove it.		/// remove it.
extern "C" int HexagonTargetMachineModule;		extern "C" int HexagonTargetMachineModule;
int HexagonTargetMachineModule = 0;		int HexagonTargetMachineModule = 0;

		static MachineUnroller *
		createHexagonMachineUnroller(MachineUnrollerContext *C) {
		MachineUnroller *U = new HexagonMachineUnroller(C);
		return U;
		}

static ScheduleDAGInstrs *createVLIWMachineSched(MachineSchedContext *C) {		static ScheduleDAGInstrs *createVLIWMachineSched(MachineSchedContext *C) {
ScheduleDAGMILive *DAG =		ScheduleDAGMILive *DAG =
new VLIWMachineScheduler(C, make_unique<ConvergingVLIWScheduler>());		new VLIWMachineScheduler(C, make_unique<ConvergingVLIWScheduler>());
DAG->addMutation(make_unique<HexagonSubtarget::UsrOverflowMutation>());		DAG->addMutation(make_unique<HexagonSubtarget::UsrOverflowMutation>());
DAG->addMutation(make_unique<HexagonSubtarget::HVXMemLatencyMutation>());		DAG->addMutation(make_unique<HexagonSubtarget::HVXMemLatencyMutation>());
DAG->addMutation(make_unique<HexagonSubtarget::CallMutation>());		DAG->addMutation(make_unique<HexagonSubtarget::CallMutation>());
DAG->addMutation(createCopyConstrainDAGMutation(DAG->TII, DAG->TRI));		DAG->addMutation(createCopyConstrainDAGMutation(DAG->TII, DAG->TRI));
return DAG;		return DAG;
HexagonTargetMachine &getHexagonTargetMachine() const {		HexagonTargetMachine &getHexagonTargetMachine() const {
return getTM<HexagonTargetMachine>();		return getTM<HexagonTargetMachine>();
}		}

ScheduleDAGInstrs *		ScheduleDAGInstrs *
createMachineScheduler(MachineSchedContext *C) const override {		createMachineScheduler(MachineSchedContext *C) const override {
return createVLIWMachineSched(C);		return createVLIWMachineSched(C);
}		}

		MachineUnroller*
		createMachineUnroller(MachineUnrollerContext *C) const override {
		return createHexagonMachineUnroller(C);
		}

void addIRPasses() override;		void addIRPasses() override;
bool addInstSelector() override;		bool addInstSelector() override;
void addPreRegAlloc() override;		void addPreRegAlloc() override;
void addPostRegAlloc() override;		void addPostRegAlloc() override;
void addPreSched2() override;		void addPreSched2() override;
void addPreEmitPass() override;		void addPreEmitPass() override;
};		};
} // namespace		} // namespace
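
The `createMachineUnroller` override above is the hook a target uses to hand its own unroller to the pipeliner. As a rough illustration of that factory pattern only, here is a self-contained sketch with hypothetical stand-in classes (`UnrollerContext`, `Unroller`, `PassConfig`, `MyTargetUnroller`, `MyTargetPassConfig`); none of them are the real LLVM types. The null default in the base class is an assumption, and the sketch returns `std::unique_ptr` where the patch returns a raw pointer.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins; none of these are the real LLVM classes.
struct UnrollerContext {};

class Unroller {
public:
  virtual ~Unroller() = default;
  virtual std::string targetName() const = 0;
};

// A target-specific unroller, analogous to HexagonMachineUnroller.
class MyTargetUnroller : public Unroller {
public:
  explicit MyTargetUnroller(UnrollerContext *) {}
  std::string targetName() const override { return "MyTarget"; }
};

// Generic pass configuration with a virtual factory hook.
class PassConfig {
public:
  virtual ~PassConfig() = default;
  // Assumed default: targets that do not opt in provide no unroller.
  virtual std::unique_ptr<Unroller> createUnroller(UnrollerContext *) const {
    return nullptr;
  }
};

// A target opts in by overriding the hook.
class MyTargetPassConfig : public PassConfig {
public:
  std::unique_ptr<Unroller> createUnroller(UnrollerContext *C) const override {
    return std::make_unique<MyTargetUnroller>(C);
  }
};
```

A caller would ask the pass configuration for an unroller and skip unrolling when the result is null, which keeps the feature off for targets that have not implemented it.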

test/CodeGen/Hexagon/bit-gen-rseq.ll

	; RUN: llc -march=hexagon -disable-hsdr -hexagon-subreg-liveness < %s | FileCheck %s			; RUN: llc -march=hexagon -disable-hsdr -hexagon-subreg-liveness \
				; RUN: -enable-pipeliner-unroll=false < %s | FileCheck %s
	; Check that we don't generate any bitwise operations.			; Check that we don't generate any bitwise operations.

	; CHECK-NOT: = or(			; CHECK-NOT: = or(
	; CHECK-NOT: = and(			; CHECK-NOT: = and(

	target triple = "hexagon"			target triple = "hexagon"

	define i32 @fred(i32* nocapture readonly %p, i32 %n) #0 {			define i32 @fred(i32* nocapture readonly %p, i32 %n) #0 {

test/CodeGen/Hexagon/hwloop4.ll

	; RUN: llc -march=hexagon -mcpu=hexagonv5 < %s | FileCheck %s			; RUN: llc -march=hexagon -mcpu=hexagonv5 -enable-pipeliner-unroll=false \
				; RUN: < %s | FileCheck %s
	;			;
	; Remove the unnecessary 'add' instruction used for the hardware loop setup.			; Remove the unnecessary 'add' instruction used for the hardware loop setup.

	; CHECK: [[OP0:r[0-9]+]] = add([[OP1:r[0-9]+]],#-[[OP2:[0-9]+]]			; CHECK: [[OP0:r[0-9]+]] = add([[OP1:r[0-9]+]],#-[[OP2:[0-9]+]]
	; CHECK-NOT: add([[OP0]],#[[OP2]])			; CHECK-NOT: add([[OP0]],#[[OP2]])
	; CHECK: lsr([[OP1]],#{{[0-9]+}})			; CHECK: lsr([[OP1]],#{{[0-9]+}})
	; CHECK: loop0			; CHECK: loop0


test/CodeGen/Hexagon/late_instr.ll

	; RUN: llc -march=hexagon -disable-hsdr < %s | FileCheck %s			; RUN: llc -march=hexagon -disable-hsdr -enable-pipeliner-unroll=false \
				; RUN: < %s | FileCheck %s

	; Check if instruction vandqrt.acc and its predecessor are scheduled in consecutive packets.			; Check if instruction vandqrt.acc and its predecessor are scheduled in consecutive packets.
	; CHECK: or(q{{[0-3]+}},q{{[0-3]+}})			; CHECK: or(q{{[0-3]+}},q{{[0-3]+}})
	; CHECK: }			; CHECK: }
	; CHECK-NOT: }			; CHECK-NOT: }
	; CHECK: |= vand(q{{[0-3]+}},r{{[0-9]+}})			; CHECK: |= vand(q{{[0-3]+}},r{{[0-9]+}})
	; CHECK: endloop0			; CHECK: endloop0


test/CodeGen/Hexagon/miunroll-optimize-memrefs1.ll

This file was added.

				; RUN: llc -O3 -march=hexagon -enable-pipeliner-unroll=false \
				; RUN: < %s | FileCheck --check-prefix=CHECK-NO-UNROLL %s

				; RUN: llc -O3 -march=hexagon -enable-pipeliner-unroll=true \
				; RUN: < %s | FileCheck --check-prefix=CHECK-UNROLL %s

				; Without the machine unroller, make sure that the inner most loop has only one sfmpy instruction.

				; CHECK-NO-UNROLL: loop0(.LBB0_[[LOOP:.]]
				; CHECK-NO-UNROLL: .LBB0_[[LOOP]]:
				; CHECK-NO-UNROLL: {
				; CHECK-NO-UNROLL-DAG: {
				; CHECK-NO-UNROLL-DAG: sfmpy
				; CHECK-NO-UNROLL-NOT: sfmpy
				; CHECK-NO-UNROLL: endloop0
				; CHECK-NO-UNROLL-NOT: loop0

				; When the machine unroller is enabled, the inner most loop in the test
				; gets unrolled by 2. Make sure that there are only 3 packets and
				; 2 sfmpy instructions (one for each loop iteration) in the unrolled loop.

				; CHECK-UNROLL: loop0(.LBB0_[[LOOP:.]]
				; CHECK-UNROLL: .LBB0_[[LOOP]]:
				; CHECK-UNROLL: sfmpy
				; CHECK-UNROLL: sfmpy
				; CHECK-UNROLL-NOT: sfmpy
				; CHECK-UNROLL: } :endloop0


				%struct.loops_params_s = type { i32, i32, i32, i32, i32, i32, i32, [32 x i32], [32 x i32], i32, i32, i32, i32, i32, i8, i32, i32, float, i32, float, float, float, float, i32, i8, %struct.intparts_s, float, float, float, i32 }
				%struct.intparts_s = type { i8, i16, i32, i32 }

				; Function Attrs: nounwind
				define float @inner_product(%struct.loops_params_s* %p) {
				entry:
				%v = getelementptr inbounds %struct.loops_params_s, %struct.loops_params_s* %p, i32 0, i32 17
				%0 = load float, float* %v, align 4
				%1 = load float, float* %0, align 4
				%arrayidx2 = getelementptr inbounds float, float* %0, i32 1
				%2 = load float, float* %arrayidx2, align 4
				%N = getelementptr inbounds %struct.loops_params_s, %struct.loops_params_s* %p, i32 0, i32 5
				%3 = load i32, i32* %N, align 4
				%Loop = getelementptr inbounds %struct.loops_params_s, %struct.loops_params_s* %p, i32 0, i32 9
				%4 = load i32, i32* %Loop, align 4
				%vsize = getelementptr inbounds %struct.loops_params_s, %struct.loops_params_s* %p, i32 0, i32 1
				%5 = load i32, i32* %vsize, align 4
				%call = tail call i32 bitcast (i32 (...)* @reinit_vec to i32 (%struct.loops_params_s*, float*, i32)*)(%struct.loops_params_s* %p, float* %1, i32 %5)
				%6 = load i32, i32* %vsize, align 4
				%call4 = tail call i32 bitcast (i32 (...)* @reinit_vec to i32 (%struct.loops_params_s*, float*, i32)*)(%struct.loops_params_s* %p, float* %2, i32 %6)
				%cmp39 = icmp slt i32 %4, 1
				br i1 %cmp39, label %for.end13, label %for.body.lr.ph

				for.body.lr.ph:
				%cmp636 = icmp sgt i32 %3, 0
				br label %for.body

				for.body:
				%q.042 = phi float [ 0.000000e+00, %for.body.lr.ph ], [ %q.1.lcssa, %for.inc11 ]
				%l.040 = phi i32 [ 1, %for.body.lr.ph ], [ %inc12, %for.inc11 ]
				br i1 %cmp636, label %for.body7.lr.ph, label %for.inc11

				for.body7.lr.ph:
				%arrayidx8.gep = getelementptr float, float* %2, i32 %l.040
				br label %for.body7

				for.body7:
				%q.138 = phi float [ %q.042, %for.body7.lr.ph ], [ %add10, %for.body7 ]
				%arrayidx8.phi = phi float* [ %arrayidx8.gep, %for.body7.lr.ph ], [ %arrayidx8.inc, %for.body7 ]
				%arrayidx9.phi = phi float* [ %1, %for.body7.lr.ph ], [ %arrayidx9.inc, %for.body7 ]
				%k.037 = phi i32 [ 0, %for.body7.lr.ph ], [ %inc, %for.body7 ]
				%7 = load float, float* %arrayidx8.phi, align 4
				%8 = load float, float* %arrayidx9.phi, align 4
				%mul = fmul float %7, %8
				%add10 = fadd float %q.138, %mul
				%inc = add nuw nsw i32 %k.037, 1
				%exitcond = icmp eq i32 %inc, %3
				%arrayidx8.inc = getelementptr float, float* %arrayidx8.phi, i32 32
				%arrayidx9.inc = getelementptr float, float* %arrayidx9.phi, i32 32
				br i1 %exitcond, label %for.inc11, label %for.body7

				for.inc11:
				%q.1.lcssa = phi float [ %q.042, %for.body ], [ %add10, %for.body7 ]
				%inc12 = add nuw nsw i32 %l.040, 1
				%exitcond44 = icmp eq i32 %l.040, %4
				br i1 %exitcond44, label %for.end13, label %for.body

				for.end13:
				%q.0.lcssa = phi float [ 0.000000e+00, %entry ], [ %q.1.lcssa, %for.inc11 ]
				ret float %q.0.lcssa
				}

				declare i32 @reinit_vec(...) local_unnamed_addr #0

test/CodeGen/Hexagon/miunroll-optimize-memrefs2.ll

This file was added.

				; RUN: llc -O3 -march=hexagon -enable-pipeliner-unroll=false \
				; RUN: < %s | FileCheck --check-prefix=CHECK-NO-UNROLL %s

				; RUN: llc -O3 -march=hexagon -enable-pipeliner-unroll=true \
				; RUN: < %s | FileCheck --check-prefix=CHECK-UNROLL %s

				; Without the machine unroller, check that the inner most loop has only one sfmpy instruction.

				; CHECK-NO-UNROLL: loop0(.LBB0_[[LOOP:.]]
				; CHECK-NO-UNROLL: .LBB0_[[LOOP]]:
				; CHECK-NO-UNROLL: {
				; CHECK-NO-UNROLL-DAG: {
				; CHECK-NO-UNROLL-DAG: sfmpy
				; CHECK-NO-UNROLL-NOT: sfmpy
				; CHECK-NO-UNROLL: endloop0
				; CHECK-NO-UNROLL-NOT: loop0

				; When the machine unroller is enabled, the inner most loop in the test
				; gets unrolled by 4. Make sure that there are only 4 packets and
				; 4 sfmpy instructions (one for each loop iteration) in the unrolled loop.

				; CHECK-UNROLL: loop0(.LBB0_[[LOOP:.]]
				; CHECK-UNROLL: .LBB0_[[LOOP]]:
				; CHECK-UNROLL: {
				; CHECK-UNROLL-NOT: {
				; CHECK-UNROLL: sfmpy
				; CHECK-UNROLL: {
				; CHECK-UNROLL-NOT: {
				; CHECK-UNROLL: sfmpy
				; CHECK-UNROLL: {
				; CHECK-UNROLL-NOT: {
				; CHECK-UNROLL: sfmpy
				; CHECK-UNROLL: {
				; CHECK-UNROLL-NOT: {
				; CHECK-UNROLL: sfmpy
				; CHECK-UNROLL-NOT: {
				; CHECK-UNROLL: } :endloop0
				; CHECK-UNROLL: loop0(.LBB0_[[LOOP:.]]

				; Function Attrs: norecurse nounwind readonly
				define float @PolyEval_horner(float %pt, i32 %degree, float* noalias nocapture readonly %coeff) local_unnamed_addr {
				entry:
				%arrayidx = getelementptr inbounds float, float* %coeff, i32 %degree
				%0 = load float, float* %arrayidx, align 4
				%tobool8 = icmp eq i32 %degree, 0
				br i1 %tobool8, label %while.end, label %while.body.preheader

				while.body.preheader:
				br label %while.body

				while.body:
				%sum.010 = phi float [ %add, %while.body ], [ %0, %while.body.preheader ]
				%i.09 = phi i32 [ %sub, %while.body ], [ %degree, %while.body.preheader ]
				%mul = fmul contract float %sum.010, %pt
				%sub = add i32 %i.09, -32
				%arrayidx1 = getelementptr inbounds float, float* %coeff, i32 %sub
				%1 = load float, float* %arrayidx1, align 4
				%add = fadd contract float %mul, %1
				%tobool = icmp eq i32 %sub, 0
				br i1 %tobool, label %while.end, label %while.body

				while.end:
				%sum.0.lcssa = phi float [ %0, %entry ], [ %add, %while.body ]
				ret float %sum.0.lcssa
				}

test/CodeGen/Hexagon/miunroll-update-memoperands.ll

This file was added.

				; RUN: llc -march=hexagon -O3 -enable-pipeliner-unroll=true < %s
				; REQUIRES: asserts

				; This test used to fail with an "UNREACHABLE" executed in Machine Unroller due to a bug
				; in computeDelta function.

				%class.mrObjectRecord = type { i32, i32, %class.mrSurfaceList, i32, i32, i32, i32, i32, i32 }
				%class.mrSurfaceList = type { %class.ggSolidTexture, %class.ggTrain }
				%class.ggSolidTexture = type { i32 (...)** }
				%class.ggTrain = type { %class.ggSolidTexture**, i32, i32 }

				declare i32 @__gxx_personality_v0(...)

				; Function Attrs: nobuiltin
				declare void @_Znaj() local_unnamed_addr

				; Function Attrs: norecurse
				declare dso_local fastcc %class.mrObjectRecord* @_ZN12ggDictionaryI14mrObjectRecordE6lookUpERK8ggString() unnamed_addr align 2

				; Function Attrs: norecurse
				define dso_local fastcc void @_ZN7mrScene9AddObjectEP9mrSurfaceRK8ggStringS4_i() unnamed_addr align 2 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
				entry:
				br i1 undef, label %_ZN12ggDictionaryI10ggMaterialE6lookUpERK8ggString.exit, label %while.body.i.i.lr.ph

				while.body.i.i.lr.ph:
				unreachable

				_ZN12ggDictionaryI10ggMaterialE6lookUpERK8ggString.exit:
				%call5 = tail call fastcc %class.mrObjectRecord* @_ZN12ggDictionaryI14mrObjectRecordE6lookUpERK8ggString()
				br i1 undef, label %if.then7, label %if.end11

				if.then7:
				invoke void @_Znaj()
				to label %invoke.cont unwind label %lpad

				invoke.cont:
				br label %if.end11

				lpad:
				%0 = landingpad { i8*, i32 }
				cleanup
				resume { i8*, i32 } %0

				if.end11:
				%recPtr.0 = phi %class.mrObjectRecord* [ %call5, %_ZN12ggDictionaryI10ggMaterialE6lookUpERK8ggString.exit ], [ undef, %invoke.cont ]
				%surfaces.i.i7 = getelementptr inbounds %class.mrObjectRecord, %class.mrObjectRecord* %recPtr.0, i32 0, i32 2, i32 1
				%data.i.i.i11 = getelementptr inbounds %class.ggTrain, %class.ggTrain* %surfaces.i.i7, i32 0, i32 0
				br label %for.body.i.i.i

				for.cond.cleanup.i.i.i:
				ret void

				for.body.i.i.i:
				%i.0.i.i.i52 = phi i32 [ %inc.i.i.i, %for.body.i.i.i ], [ 0, %if.end11 ]
				%1 = load i32, i32* undef, align 4
				%2 = load %class.ggSolidTexture**, %class.ggSolidTexture*** %data.i.i.i11, align 4
				%arrayidx9.i.i.i = getelementptr inbounds %class.ggSolidTexture*, %class.ggSolidTexture** %2, i32 %i.0.i.i.i52
				%3 = bitcast %class.ggSolidTexture** %arrayidx9.i.i.i to i32*
				store i32 %1, i32* %3, align 4
				%inc.i.i.i = add nuw nsw i32 %i.0.i.i.i52, 1
				%cmp7.i.i.i = icmp slt i32 %inc.i.i.i, undef
				br i1 %cmp7.i.i.i, label %for.body.i.i.i, label %for.cond.cleanup.i.i.i
				}

test/CodeGen/Hexagon/miunroll-update-offset.ll

This file was added.

				; RUN: llc -march=hexagon -O3 -enable-pipeliner-unroll=true < %s | FileCheck %s

				; After machine unrolling the loop, make sure that all base+offset loads
				; use correct base and offset values.

				; CHECK: loop0(.LBB0_[[LOOP:.]],
				; CHECK: .LBB0_[[LOOP]]:
				; CHECK: memh([[REG1:(r[0-9]+)]]+#{{[0-9]+}}) = r{{[0-9]+}}
				; CHECK-DAG: memh([[REG1]]+#{{32|0}}) = r{{[0-9]+}}
				; CHECK-DAG: memh([[REG1]]+#64) = r{{[0-9]+}}
				; CHECK-DAG: memh([[REG1]]+#96) = r{{[0-9]+}}
				; CHECK: endloop0

				%struct.csGroup = type { i32, i32, i32, i16, i16, i16, i16, i16, i16, i16, i16, i16}

				@numRows = external local_unnamed_addr global i32, align 4
				@MPG = common local_unnamed_addr global i32 0, align 4
				@groupArray = common local_unnamed_addr global %struct.csGroup* null, align 4
				@numGroups = common local_unnamed_addr global i32 0, align 4

				; Function Attrs: nounwind
				define i32 @globe() local_unnamed_addr {
				entry:
				%0 = load i32, i32* @numRows, align 4
				%add = shl i32 %0, 1
				%add1 = add i32 %add, 6
				store i32 %add1, i32* @MPG, align 4
				%1 = mul i32 %0, 72
				%mul3 = add i32 %1, 252
				%call = tail call i32 bitcast (i32 (...)* @safe_malloc to i32 (i32)*)(i32 %mul3) #2
				%2 = inttoptr i32 %call to %struct.csGroup*
				store %struct.csGroup* %2, %struct.csGroup** @groupArray, align 4
				%3 = load i32, i32* @numGroups, align 4
				%cmp10 = icmp slt i32 %3, 1
				br i1 %cmp10, label %for.end, label %for.body.preheader

				for.body.preheader: ; preds = %entry
				br label %for.body

				for.body: ; preds = %for.body.preheader, %for.body
				%group.011 = phi i32 [ %inc, %for.body ], [ 1, %for.body.preheader ]
				%conv = trunc i32 %group.011 to i16
				%flag = getelementptr inbounds %struct.csGroup, %struct.csGroup* %2, i32 %group.011, i32 11
				store i16 %conv, i16* %flag, align 4
				%inc = add nuw nsw i32 %group.011, 1
				%cmp = icmp slt i32 %group.011, %3
				br i1 %cmp, label %for.body, label %for.end

				for.end: ; preds = %for.body, %entry
				ret i32 undef
				}

				declare i32 @safe_malloc(...) local_unnamed_addr
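
The test above checks that the stores in the unrolled loop share one base register and differ only in their immediate offsets. That is the effect of `foldAdds` and `updateBaseAndOffset` in HexagonMachineUnroller.cpp: an add-immediate feeding a base+offset access is folded into the access when the new offset is valid. A minimal sketch of that folding step on toy data follows. The `AddImm` and `MemAccess` structs and the `foldAddIntoAccess` name are hypothetical, and the explicit `MinOff`/`MaxOff` range is an assumption standing in for the target's own `isValidOffset` query.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model of "Def = Src + Imm" (an add-immediate).
struct AddImm {
  unsigned Def;
  unsigned Src;
  int64_t Imm;
};

// Hypothetical toy model of a base+offset memory access of Size bytes.
struct MemAccess {
  unsigned Base;
  int64_t Offset;
  unsigned Size; // power of 2
};

// Rewrite "access [A.Def + Offset]" as "access [A.Src + Offset + A.Imm]"
// when the new offset is aligned and within [MinOff, MaxOff].
// Returns true if the access was rewritten; otherwise M is left unchanged.
static bool foldAddIntoAccess(MemAccess &M, const AddImm &A, int64_t MinOff,
                              int64_t MaxOff) {
  if (M.Base != A.Def)
    return false; // The access does not use the value produced by the add.

  int64_t NewOffset = M.Offset + A.Imm;
  int64_t AlignMask = static_cast<int64_t>(M.Size) - 1;
  if ((NewOffset & AlignMask) != 0)
    return false; // Misaligned for this access size.
  if (NewOffset < MinOff || NewOffset > MaxOff)
    return false; // Does not fit the assumed immediate range.

  M.Offset = NewOffset;
  M.Base = A.Src;
  return true;
}
```

Once every user of an add has been rewritten this way, the add has no remaining uses and can be deleted, which is the job of `removeDeadInstructions` in the patch.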

test/CodeGen/Hexagon/miunroll.ll

This file was added.

				; RUN: llc -O3 -march=hexagon -enable-pipeliner-unroll=false \
				; RUN: < %s | FileCheck --check-prefix=CHECK-NO-UNROLL %s

				; RUN: llc -O3 -march=hexagon -enable-pipeliner-unroll=true \
				; RUN: < %s | FileCheck --check-prefix=CHECK-UNROLL %s

				; Make sure that there's only one hardware loop when the machine unroller is disabled.
				; CHECK-NO-UNROLL: loop0(.LBB0_[[LOOP:.]]
				; CHECK-NO-UNROLL: .LBB0_[[LOOP]]:
				; CHECK-NO-UNROLL: sfmpy
				; CHECK-NO-UNROLL-NOT: sfmpy
				; CHECK-NO-UNROLL: endloop0
				; CHECK-NO-UNROLL-NOT: loop0

				; Make sure that there are multiple hardware loops when the machine unroller is enabled, one for the unrolled loop and another for the remainder loop.
				; CHECK-UNROLL: loop0(.LBB0_[[LOOP:.]]
				; CHECK-UNROLL: .LBB0_[[LOOP]]:
				; CHECK-UNROLL: sfmpy
				; CHECK-UNROLL: sfmpy
				; CHECK-UNROLL: endloop0

				define float @test(i32 %n, float %da, float* noalias nocapture readonly %dx, i32 %incx, float* noalias nocapture %dy, i32 %incy) local_unnamed_addr {
				entry:
				%cmp = icmp slt i32 %n, 1
				%cmp1 = fcmp oeq float %da, 0.000000e+00
				%or.cond45 = or i1 %cmp, %cmp1
				br i1 %or.cond45, label %if.then6, label %if.end3

				if.end3:
				%cmp4 = icmp ne i32 %incx, 1
				%cmp5 = icmp ne i32 %incy, 1
				%or.cond = or i1 %cmp4, %cmp5
				br i1 %or.cond, label %if.then6, label %for.body.lr.ph

				if.then6:
				ret float 0.000000e+00

				for.body.lr.ph:
				%0 = load float, float* %dy, align 4
				br label %for.body

				for.body:
				%arrayidx18.phi = phi float* [ %dx, %for.body.lr.ph ], [ %arrayidx18.inc, %for.body ]
				%arrayidx21.phi = phi float* [ %dy, %for.body.lr.ph ], [ %arrayidx21.inc, %for.body ]
				%i.047 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
				%1 = load float, float* %arrayidx18.phi, align 4
				%mul19 = fmul float %1, %da
				%add20 = fadd float %0, %mul19
				store float %add20, float* %arrayidx21.phi, align 4
				%inc = add nuw nsw i32 %i.047, 1
				%exitcond = icmp eq i32 %inc, %n
				%arrayidx18.inc = getelementptr float, float* %arrayidx18.phi, i32 32
				%arrayidx21.inc = getelementptr float, float* %arrayidx21.phi, i32 32
				br i1 %exitcond, label %if.then6, label %for.body
				}
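
The CHECK-UNROLL comment in this test describes an unrolled loop followed by a remainder loop. As a rough source-level picture of that shape, here is a minimal C++ sketch of the loop in @test. The function names are hypothetical, and the unroll factor of 2 is chosen only for illustration; the factor the machine unroller actually picks is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stride taken from the getelementptr increments in the test (32 floats).
constexpr std::size_t kStride = 32;

// Reference loop: mirrors for.body. The IR only reaches the loop when n >= 1.
void axpyReference(int n, float da, const float *dx, float *dy) {
  assert(n >= 1);
  const std::size_t count = static_cast<std::size_t>(n);
  const float base = dy[0]; // %0 is loaded once in for.body.lr.ph
  for (std::size_t i = 0; i < count; ++i)
    dy[i * kStride] = base + dx[i * kStride] * da;
}

// Same computation, unrolled by 2 with a remainder loop for odd trip counts.
void axpyUnrolledBy2(int n, float da, const float *dx, float *dy) {
  assert(n >= 1);
  const std::size_t count = static_cast<std::size_t>(n);
  const float base = dy[0];
  std::size_t i = 0;
  // Unrolled loop: two copies of the body, hence two multiplies per iteration.
  for (; i + 1 < count; i += 2) {
    dy[i * kStride] = base + dx[i * kStride] * da;
    dy[(i + 1) * kStride] = base + dx[(i + 1) * kStride] * da;
  }
  // Remainder loop: runs at most once for an unroll factor of 2.
  for (; i < count; ++i)
    dy[i * kStride] = base + dx[i * kStride] * da;
}
```

Both functions should write identical results for any n >= 1. Because the trip count is only known at run time, the remainder loop is what keeps the unrolled version correct when n is not a multiple of the unroll factor.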

test/CodeGen/Hexagon/no-packets.ll

	; RUN: llc -march=hexagon < %s | FileCheck %s			; RUN: llc -march=hexagon -enable-pipeliner-unroll=false < %s | FileCheck %s
	; Check that there are no packets with two or more instructions, except			; Check that there are no packets with two or more instructions, except
	; for the endloop packet.			; for the endloop packet.

	; This is the expected code:			; This is the expected code:
	;			;
	; p0 = cmp.gt(r3,#0)			; p0 = cmp.gt(r3,#0)
	; if (!p0) jump:nt .LBB0_3			; if (!p0) jump:nt .LBB0_3
	; loop0(.LBB0_2,r3)			; loop0(.LBB0_2,r3)

test/CodeGen/Hexagon/simplify64bitops_7223.ll

	; RUN: llc -march=hexagon -enable-pipeliner=false < %s | FileCheck %s			; RUN: llc -march=hexagon -enable-pipeliner=false \
				; RUN: -enable-pipeliner-unroll=false < %s | FileCheck %s

	; RUN: llc -march=hexagon -enable-pipeliner < %s			; RUN: llc -march=hexagon -enable-pipeliner < %s
	; REQUIRES: asserts			; REQUIRES: asserts

	; CHECK-NOT: and(			; CHECK-NOT: and(
	; CHECK-NOT: or(			; CHECK-NOT: or(
	; CHECK-NOT: combine(0			; CHECK-NOT: combine(0
	; CHECK: add			; CHECK: add
	; CHECK: add(			; CHECK: add(
	; CHECK-NEXT: memuh(			; CHECK-NEXT: memuh(
	; CHECK-NEXT: endloop			; CHECK-NEXT: endloop


test/CodeGen/Hexagon/swp-carried-1.ll

	; RUN: llc -march=hexagon -rdf-opt=0 -disable-hexagon-misched -hexagon-initial-cfg-cleanup=0 < %s | FileCheck %s			; RUN: llc -march=hexagon -rdf-opt=0 -disable-hexagon-misched \
				; RUN: -enable-pipeliner-unroll=false -hexagon-initial-cfg-cleanup=0 \
				; RUN: < %s | FileCheck %s

	; Test that we generate the correct code when a loop carried value			; Test that we generate the correct code when a loop carried value
	; is scheduled one stage earlier than its use. The code in			; is scheduled one stage earlier than its use. The code in
	; isLoopCarried was returning false in this case, and the generated			; isLoopCarried was returning false in this case, and the generated
	; code was missing a copy.			; code was missing a copy.

	; CHECK: loop0(.LBB0_[[LOOP:.]],			; CHECK: loop0(.LBB0_[[LOOP:.]],
	; CHECK: .LBB0_[[LOOP]]:			; CHECK: .LBB0_[[LOOP]]:

test/CodeGen/Hexagon/swp-change-deps.ll

	; RUN: llc -march=hexagon -hexagon-initial-cfg-cleanup=0 < %s | FileCheck %s			; RUN: llc -march=hexagon -enable-pipeliner-unroll=false \
				; RUN: -hexagon-initial-cfg-cleanup=0 < %s | FileCheck %s

	; Test that we generate the correct offsets for loads in the prolog			; Test that we generate the correct offsets for loads in the prolog
	; after removing dependences on a post-increment instructions of the			; after removing dependences on a post-increment instructions of the
	; base register.			; base register.

	; CHECK: memh([[REG0:(r[0-9]+)]]+#0)			; CHECK: memh([[REG0:(r[0-9]+)]]+#0)
	; CHECK: memh([[REG0]]+#2)			; CHECK: memh([[REG0]]+#2)
	; CHECK: loop0			; CHECK: loop0

test/CodeGen/Hexagon/swp-epilog-numphis.ll

	; XFAIL: *			; XFAIL: *
	; Needs some fixes in the pipeliner.			; Needs some fixes in the pipeliner.
	; RUN: llc -march=hexagon < %s | FileCheck %s			; RUN: llc -march=hexagon -enable-pipeliner-unroll=false < %s | FileCheck %s

	; CHECK: endloop0			; CHECK: endloop0
	; CHECK: vmem			; CHECK: vmem
	; CHECK: vmem([[REG:r([0-9]+)]]+#1) =			; CHECK: vmem([[REG:r([0-9]+)]]+#1) =
	; CHECK: vmem([[REG]]+#0) =			; CHECK: vmem([[REG]]+#0) =

	define void @f0(i32 %a0) local_unnamed_addr #0 {			define void @f0(i32 %a0) local_unnamed_addr #0 {
	b0:			b0:

test/CodeGen/Hexagon/swp-epilog-phi9.ll

	; RUN: llc -march=hexagon -hexagon-initial-cfg-cleanup=0 < %s | FileCheck %s			; RUN: llc -march=hexagon -enable-pipeliner-unroll=false \
				; RUN: -hexagon-initial-cfg-cleanup=0 < %s | FileCheck %s

	; Test that we generate the correct Phi name in the last couple of epilog			; Test that we generate the correct Phi name in the last couple of epilog
	; blocks, when there are 3 epilog blocks. The Phi was scheduled in stage			; blocks, when there are 3 epilog blocks. The Phi was scheduled in stage
	; 2, so the computation for the number of Phis needs to be adjusted when			; 2, so the computation for the number of Phis needs to be adjusted when
	; the incoming prolog block is from prolog 0 or prolog 1.			; the incoming prolog block is from prolog 0 or prolog 1.
	; Note: the pipeliner no longer generates a 3 stage pipeline for this test.			; Note: the pipeliner no longer generates a 3 stage pipeline for this test.

	; CHECK: loop0			; CHECK: loop0

test/CodeGen/Hexagon/swp-max.ll

	; RUN: llc -march=hexagon -mcpu=hexagonv5 -enable-pipeliner \			; RUN: llc -march=hexagon -mcpu=hexagonv5 -enable-pipeliner \
	; RUN: -pipeliner-max-stages=2 < %s | FileCheck %s			; RUN: -pipeliner-max-stages=2 -enable-pipeliner-unroll=false \
				; RUN: < %s | FileCheck %s

	@A = global [8 x i32] [i32 4, i32 -3, i32 5, i32 -2, i32 -1, i32 2, i32 6, i32 -2], align 8			@A = global [8 x i32] [i32 4, i32 -3, i32 5, i32 -2, i32 -1, i32 2, i32 6, i32 -2], align 8

	define i32 @test(i32 %Left, i32 %Right) {			define i32 @test(i32 %Left, i32 %Right) {
	entry:			entry:
	%add = add nsw i32 %Right, %Left			%add = add nsw i32 %Right, %Left
	%div = sdiv i32 %add, 2			%div = sdiv i32 %add, 2
	%cmp9 = icmp slt i32 %div, %Left			%cmp9 = icmp slt i32 %div, %Left

test/CodeGen/Hexagon/swp-multi-loops.ll

	; RUN: llc -march=hexagon -mcpu=hexagonv5 -enable-pipeliner < %s | FileCheck %s			; RUN: llc -march=hexagon -mcpu=hexagonv5 -enable-pipeliner \
				; RUN: -enable-pipeliner-unroll=false < %s | FileCheck %s

	; Make sure we attempt to pipeline all inner most loops.			; Make sure we attempt to pipeline all inner most loops.

	; Check if the first loop is pipelined.			; Check if the first loop is pipelined.
	; CHECK: loop0(.LBB0_[[LOOP:.]],			; CHECK: loop0(.LBB0_[[LOOP:.]],
	; CHECK: .LBB0_[[LOOP]]:			; CHECK: .LBB0_[[LOOP]]:
	; CHECK: add(r{{[0-9]+}},r{{[0-9]+}})			; CHECK: add(r{{[0-9]+}},r{{[0-9]+}})
	; CHECK-NEXT: memw(r{{[0-9]+}}++#4)			; CHECK-NEXT: memw(r{{[0-9]+}}++#4)

test/CodeGen/Hexagon/swp-vsum.ll

	; RUN: llc -march=hexagon -mcpu=hexagonv5 -enable-pipeliner < %s | FileCheck %s			; RUN: llc -march=hexagon -mcpu=hexagonv5 -enable-pipeliner \
	; RUN: llc -march=hexagon -mcpu=hexagonv60 -enable-pipeliner < %s | FileCheck %s --check-prefix=CHECKV60			; RUN: -enable-pipeliner-unroll=false < %s | FileCheck %s

				; RUN: llc -march=hexagon -mcpu=hexagonv5 -O3 -enable-pipeliner-unroll=false \
				; RUN: < %s | FileCheck %s

				; RUN: llc -march=hexagon -mcpu=hexagonv60 -enable-pipeliner \
				; RUN: -enable-pipeliner-unroll=false < %s | FileCheck %s --check-prefix=CHECKV60

	; Simple vector total.			; Simple vector total.
	; CHECK: loop0(.LBB0_[[LOOP:.]],			; CHECK: loop0(.LBB0_[[LOOP:.]],
	; CHECK: .LBB0_[[LOOP]]:			; CHECK: .LBB0_[[LOOP]]:
	; CHECK: add(r{{[0-9]+}},r{{[0-9]+}})			; CHECK: add(r{{[0-9]+}},r{{[0-9]+}})
	; CHECK-NEXT: memw(r{{[0-9]+}}++#4)			; CHECK-NEXT: memw(r{{[0-9]+}}++#4)
	; CHECK-NEXT: endloop0			; CHECK-NEXT: endloop0


test/CodeGen/Hexagon/swp-xxh2.ll

	; RUN: llc -march=hexagon -enable-pipeliner -debug-only=pipeliner < %s -o - 2>&1 > /dev/null | FileCheck %s			; RUN: llc -march=hexagon -enable-pipeliner -enable-pipeliner-unroll=false \
				; RUN: -debug-only=pipeliner < %s -o - 2>&1 > /dev/null | FileCheck %s
	; REQUIRES: asserts			; REQUIRES: asserts

	; Fix bug when pipelining xxh benchmark at O3, mv55, and with vectorization.			; Fix bug when pipelining xxh benchmark at O3, mv55, and with vectorization.
	; The problem is choosing the correct name for the Phis in the epilog.			; The problem is choosing the correct name for the Phis in the epilog.

	; CHECK: New block			; CHECK: New block
	; CHECK: %{{.}}, %[[REG:([0-9]+)]]{{.}} = L2_loadri_pi			; CHECK: %{{.}}, %[[REG:([0-9]+)]]{{.}} = L2_loadri_pi
	; CHECK: epilog:			; CHECK: epilog:

Revision Contents

Diff 168841

include/llvm/CodeGen/MachineUnroller.h

include/llvm/CodeGen/TargetPassConfig.h

lib/CodeGen/CMakeLists.txt

lib/CodeGen/MachinePipeliner.cpp

lib/CodeGen/MachineUnroller.cpp

lib/Target/Hexagon/CMakeLists.txt

lib/Target/Hexagon/Hexagon.td

lib/Target/Hexagon/HexagonDepInstrInfo.td

lib/Target/Hexagon/HexagonMachineUnroller.h

lib/Target/Hexagon/HexagonMachineUnroller.cpp

lib/Target/Hexagon/HexagonTargetMachine.cpp

test/CodeGen/Hexagon/bit-gen-rseq.ll

test/CodeGen/Hexagon/hwloop4.ll

test/CodeGen/Hexagon/late_instr.ll

test/CodeGen/Hexagon/miunroll-optimize-memrefs1.ll

test/CodeGen/Hexagon/miunroll-optimize-memrefs2.ll

test/CodeGen/Hexagon/miunroll-update-memoperands.ll

test/CodeGen/Hexagon/miunroll-update-offset.ll

test/CodeGen/Hexagon/miunroll.ll

test/CodeGen/Hexagon/no-packets.ll

test/CodeGen/Hexagon/simplify64bitops_7223.ll

test/CodeGen/Hexagon/swp-carried-1.ll

test/CodeGen/Hexagon/swp-change-deps.ll

test/CodeGen/Hexagon/swp-epilog-numphis.ll

test/CodeGen/Hexagon/swp-epilog-phi9.ll

test/CodeGen/Hexagon/swp-max.ll

test/CodeGen/Hexagon/swp-multi-loops.ll

test/CodeGen/Hexagon/swp-vsum.ll

test/CodeGen/Hexagon/swp-xxh2.ll
