This is an archive of the discontinued LLVM Phabricator instance.

[RFC] [TargetTransformInfo] Introduce isRegisterRich, it returns true if the target architecture is register-rich.
Needs Review · Public

Authored by etherzhhb on Jan 17 2018, 11:13 AM.

Details

Reviewers

sepavloff
hfinkel
efriedma
• dberlin
silvas

Summary

Currently, in the LLVM middle-end, we disable some optimizations (e.g. in GVNHoist and ArgumentPromotion) because we worry about register pressure.

However, on architectures that are register-rich, e.g. FPGAs, we do not need to worry about register pressure at all. For these architectures, we may want to optimize the LLVM IR without worrying about register pressure.

I suggest that we introduce a hook in TargetTransformInfo to tell whether the current target architecture is register-rich. With this hook, we can enable the optimizations that increase register pressure when the current architecture is register-rich.

One problem with introducing this hook: we are not able to test it (on the public buildbots) without a register-rich target architecture in LLVM trunk, and unfortunately, we do not have one today.

To address this problem, I propose adding a command-line option that enables the register-rich mode, so that we can test the corresponding code path as well.
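The testing idea in the summary could look roughly like the following standalone sketch. It is not code from this revision: ToyOptions and ToyTTI are hypothetical stand-ins (in LLVM the flag would presumably be an llvm::cl::opt<bool> and the hook would live in TargetTransformInfo), and the sketch only assumes that the flag forces the hook to return true.

```cpp
#include <cassert>

// Hypothetical stand-in for a command-line flag. In LLVM this would be an
// llvm::cl::opt<bool>; a plain struct keeps the sketch self-contained.
struct ToyOptions {
  bool ForceRegisterRich = false;
};

// Hypothetical, simplified model of the proposed hook (not LLVM's
// TargetTransformInfo).
class ToyTTI {
public:
  ToyTTI(bool TargetIsRegisterRich, const ToyOptions &Opts)
      : TargetIsRegisterRich(TargetIsRegisterRich),
        ForceRegisterRich(Opts.ForceRegisterRich) {}

  // The flag can only turn the mode on; it never overrides a target that
  // already reports itself as register-rich.
  bool isRegisterRich() const {
    return ForceRegisterRich || TargetIsRegisterRich;
  }

private:
  bool TargetIsRegisterRich;
  bool ForceRegisterRich;
};

// Self-check: an ordinary target is "register-rich" only with the flag set.
inline void checkForcedRegisterRich() {
  ToyOptions Default;
  ToyOptions Forced;
  Forced.ForceRegisterRich = true;
  assert(!ToyTTI(false, Default).isRegisterRich());
  assert(ToyTTI(false, Forced).isRegisterRich());
  assert(ToyTTI(true, Default).isRegisterRich());
}
```

With such a flag, a regression test could exercise the register-rich code path on any in-tree target without a real register-rich backend.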

Diff Detail

Repository: rL LLVM

Event Timeline

etherzhhb created this revision. Jan 17 2018, 11:13 AM

Herald added a subscriber: llvm-commits. Jan 17 2018, 11:13 AM

etherzhhb edited the summary of this revision. Jan 17 2018, 11:44 AM

etherzhhb added reviewers: silvas, sepavloff, hfinkel, efriedma.

Herald added a reviewer: • dberlin. Jan 17 2018, 11:44 AM

There is already a function getNumberOfRegisters. Would it be enough to return some large number from it?

Implement isRegisterRich based on getNumberOfRegisters, and introduce a threshold to tell when the number of registers is big enough to be considered "register-rich". Also include the usage of the isRegisterRich function in GVNHoist.

In D42191#979132, @kparzysz wrote:

There is already a function getNumberOfRegisters. Would it be enough to return some large number from it?

Yes, I think getNumberOfRegisters is sufficient. But instead of comparing getNumberOfRegisters with a large number in every pass, maybe we could introduce a function in TargetTransformInfo to do the comparison?
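As a rough illustration of keeping the comparison in one place, here is a standalone sketch. ToyTargetInfo is a hypothetical class, not LLVM's TargetTransformInfo. The threshold of 8192 and the strict greater-than comparison mirror the diff under review and are not an established convention.

```cpp
#include <cassert>

// Hypothetical, simplified model of a target description.
class ToyTargetInfo {
public:
  // Value taken from the diff under review; any cutoff is a judgment call.
  static constexpr unsigned RegisterRichThreshold = 8192;

  explicit ToyTargetInfo(unsigned NumScalarRegs)
      : NumScalarRegs(NumScalarRegs) {}

  // This toy target has no vector registers.
  unsigned getNumberOfRegisters(bool Vector) const {
    return Vector ? 0 : NumScalarRegs;
  }

  // Passes ask this single predicate instead of each comparing the register
  // count against their own constant.
  bool isRegisterRich() const {
    return getNumberOfRegisters(false) > RegisterRichThreshold;
  }

private:
  unsigned NumScalarRegs;
};

// Self-check: a CPU-like register count is not rich, a very large one is.
inline void checkToyTargets() {
  assert(!ToyTargetInfo(16).isRegisterRich());
  assert(!ToyTargetInfo(8192).isRegisterRich()); // comparison is strict
  assert(ToyTargetInfo(1u << 20).isRegisterRich());
}
```

A target would then only need to return a large number from getNumberOfRegisters, as suggested above, and every pass would see a consistent answer.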

In part, it looks like you're looking to modify heuristics that prevent increased spilling around calls. I can imagine that, in general, architectures with lots of registers suffer from less of that, but that's really a statement about the ABI of the call, not the architecture itself.

Also, are you targeting FPGAs? Maybe this is better phrased in terms of architectures that don't have physical registers (we already have backend targets that have only virtual registers), because lacking physical registers, I can imagine register pressure being much less concerning. Arbitrary hoisting might not be great for FPGA HLS either, because of increased routing pressure and the like, but if we imagine that the backend can sink when useful, then maybe we're just making a statement about being reliant on that?

lib/Transforms/Scalar/GVNHoist.cpp
1045	Update this comment?
1101	Would you want to modify this check too?
1127	Is this comment accurate? I don't see why this is talking about calls here when calls are handled in the block above.

In D42191#979308, @hfinkel wrote:

In part, it looks like you're looking to modify heuristics that prevent increased spilling around calls. I can imagine that, in general, architectures with lots of registers suffer from less of that, but that's really a statement about the ABI of the call, not the architecture itself.

The original motivation is about hoisting GEPs. In general, I want to GVN GEPs like these:

bb0:
  idx0 = i + 2
  g0 = gep A, idx0
  br bb2

bb1:
  idx1 = i + 2
  g1 = gep A, idx1
  br bb2

bb2:
  g = phi [g0, bb0], [g1, bb1]

In this example, g0 and g1 are not GVNed because of our concern about register pressure (or addressing modes?).
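To make the example concrete, the following standalone toy sketch value-numbers the two sibling blocks and reports which instructions of bb1 recompute a value already computed in bb0. It is not how GVNHoist is implemented: ToyInst, ToyValueTable and hoistCandidates are hypothetical names, and the only part modeled on the patch is that GEPs are considered when the target is register-rich.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical toy instruction: Def = Opcode LHS, RHS.
struct ToyInst {
  std::string Def, Opcode, LHS, RHS;
};

// Minimal value numbering: equal opcodes applied to operands with equal value
// numbers receive the same number.
class ToyValueTable {
  std::map<std::string, unsigned> NameToVN;
  std::map<std::tuple<std::string, unsigned, unsigned>, unsigned> ExprToVN;
  unsigned Next = 0;

  unsigned lookupName(const std::string &Name) {
    auto It = NameToVN.find(Name);
    if (It != NameToVN.end())
      return It->second;
    unsigned VN = Next++;
    NameToVN[Name] = VN;
    return VN;
  }

public:
  unsigned number(const ToyInst &I) {
    unsigned L = lookupName(I.LHS);
    unsigned R = lookupName(I.RHS);
    auto Key = std::make_tuple(I.Opcode, L, R);
    auto It = ExprToVN.find(Key);
    unsigned VN;
    if (It != ExprToVN.end()) {
      VN = It->second;
    } else {
      VN = Next++;
      ExprToVN[Key] = VN;
    }
    NameToVN[I.Def] = VN; // later uses of Def see the expression's number
    return VN;
  }
};

// Names defined in BB1 whose value is also computed in BB0, i.e. candidates
// for a single computation in the common dominator. GEPs are skipped unless
// the target is treated as register-rich.
std::vector<std::string> hoistCandidates(const std::vector<ToyInst> &BB0,
                                         const std::vector<ToyInst> &BB1,
                                         bool RegisterRich) {
  ToyValueTable VT;
  std::set<unsigned> SeenInBB0;
  for (const ToyInst &I : BB0)
    SeenInBB0.insert(VT.number(I));

  std::vector<std::string> Result;
  for (const ToyInst &I : BB1) {
    unsigned VN = VT.number(I);
    if (I.Opcode == "gep" && !RegisterRich)
      continue;
    if (SeenInBB0.count(VN))
      Result.push_back(I.Def);
  }
  return Result;
}

// Self-check using the blocks from the comment above.
inline void checkGepExample() {
  std::vector<ToyInst> BB0 = {{"idx0", "add", "i", "2"},
                              {"g0", "gep", "A", "idx0"}};
  std::vector<ToyInst> BB1 = {{"idx1", "add", "i", "2"},
                              {"g1", "gep", "A", "idx1"}};
  assert(hoistCandidates(BB0, BB1, false).size() == 1); // only idx1
  assert(hoistCandidates(BB0, BB1, true).size() == 2);  // idx1 and g1
}
```

In the toy model the index computation is always a candidate, while the GEP becomes one only in register-rich mode, which is the behavior the patch is after.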

Also, are you targeting FPGAs?

Yes

Maybe this is better phrased in terms of architectures that don't have physical registers (we already have backend targets that have only virtual registers), because lacking physical registers, I can imagine register pressure being much less concerning.
Arbitrary hoisting might not be great for FPGA HLS either, because of increased routing pressure and the like,

I am not sure about the place-and-route pressure, but too many registers may still have a negative impact on FPGAs.

but if we imagine that the backend can sink when useful, then maybe we're just making a statement about being reliant on that?

Yes.

I am thinking that maybe some of the LLVM IR passes could provide a mode that runs without concern for physical constraints like register pressure, and simplifies the dataflow of the LLVM IR as much as possible.

I am trying to identify the actual physical constraints that prevent simplifications like GVN from happening, make these constraints more explicit, and provide a hook to relax them where possible.

lib/Transforms/Scalar/GVNHoist.cpp
1045	Yes
1101	yes
1127	I guess it is about the addressing mode, since it is related to GEPs

Address Hal's comment

Fix typo

etherzhhb marked 4 inline comments as done. Jan 18 2018, 2:32 PM

silvas resigned from this revision. Mar 25 2020, 6:25 PM

Revision Contents

Path	Size
include/llvm/Analysis/TargetTransformInfo.h	7 lines
include/llvm/Analysis/TargetTransformInfoImpl.h	2 lines
lib/Transforms/Scalar/GVNHoist.cpp	19 lines

Diff 130263

include/llvm/Analysis/TargetTransformInfo.h

[645 lines hidden; context: public:]
   /// \brief Additional properties of an operand's values.
   enum OperandValueProperties { OP_None = 0, OP_PowerOf2 = 1 };

   /// \return The number of scalar or vector registers that the target has.
   /// If 'Vectors' is true, it returns the number of vector registers. If it is
   /// set to false, it returns the number of scalar registers.
   unsigned getNumberOfRegisters(bool Vector) const;

+  static const unsigned RegisterRichThreshold = 8192;
+
+  /// \brief Return true if the target architecture is register-rich
+  bool isRegisterRich() const {
+    return getNumberOfRegisters(false) > RegisterRichThreshold;
+  }
+
   /// \return The width of the largest scalar or vector register type.
   unsigned getRegisterBitWidth(bool Vector) const;

   /// \return The width of the smallest vector register type.
   unsigned getMinVectorRegisterBitWidth() const;

   /// \return True if it should be considered for address type promotion.
   /// \p AllowPromotionWithoutCommonHeader Set true if promoting \p I is
[926 lines hidden]

include/llvm/Analysis/TargetTransformInfoImpl.h

[172 lines hidden; context: unsigned getIntrinsicCost(Intrinsic::ID IID, Type *RetTy,]
     case Intrinsic::coro_suspend:
     case Intrinsic::coro_param:
     case Intrinsic::coro_subfn_addr:
       // These intrinsics don't actually represent code after lowering.
       return TTI::TCC_Free;
     }
   }

+  bool isRegisterRich() { return false; }
+
   bool hasBranchDivergence() { return false; }

   bool isSourceOfDivergence(const Value *V) { return false; }

   bool isAlwaysUniform(const Value *V) { return false; }

   unsigned getFlatAddressSpace () {
     return -1;
[642 lines hidden]

lib/Transforms/Scalar/GVNHoist.cpp

[42 lines hidden]
 #include "llvm/ADT/iterator_range.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/GlobalsModRef.h"
 #include "llvm/Analysis/IteratedDominanceFrontier.h"
 #include "llvm/Analysis/MemoryDependenceAnalysis.h"
 #include "llvm/Analysis/MemorySSA.h"
 #include "llvm/Analysis/MemorySSAUpdater.h"
 #include "llvm/Analysis/PostDominators.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/Analysis/ValueTracking.h"
 #include "llvm/IR/Argument.h"
 #include "llvm/IR/BasicBlock.h"
 #include "llvm/IR/CFG.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/IR/Dominators.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/InstrTypes.h"
[192 lines hidden]
 }

 // This pass hoists common computations across branches sharing common
 // dominator. The primary goal is to reduce the code size, and in some
 // cases reduce critical path (by exposing more ILP).
 class GVNHoist {
 public:
   GVNHoist(DominatorTree *DT, PostDominatorTree *PDT, AliasAnalysis *AA,
-           MemoryDependenceResults *MD, MemorySSA *MSSA)
-      : DT(DT), PDT(PDT), AA(AA), MD(MD), MSSA(MSSA),
+           MemoryDependenceResults *MD, MemorySSA *MSSA,
+           TargetTransformInfo &TTI)
+      : DT(DT), PDT(PDT), AA(AA), MD(MD), MSSA(MSSA), TTI(TTI),
         MSSAUpdater(llvm::make_unique<MemorySSAUpdater>(MSSA)) {}

   bool run(Function &F) {
     NumFuncArgs = F.arg_size();
     VN.setDomTree(DT);
     VN.setAliasAnalysis(AA);
     VN.setMemDep(MD);
     bool Res = false;
[59 lines hidden]

 private:
   GVN::ValueTable VN;
   DominatorTree *DT;
   PostDominatorTree *PDT;
   AliasAnalysis *AA;
   MemoryDependenceResults *MD;
   MemorySSA *MSSA;
+  TargetTransformInfo &TTI;
   std::unique_ptr<MemorySSAUpdater> MSSAUpdater;
   DenseMap<const Value *, unsigned> DFSNumber;
   BBSideEffectsSet BBSideEffects;
   DenseSet<const BasicBlock *> HoistBarrier;
   SmallVector<BasicBlock *, 32> IDFBlocks;
   unsigned NumFuncArgs;
-  const bool HoistingGeps = false;

   enum InsKind { Unknown, Scalar, Load, Store };

   // Return true when there are exception handling in BB.
   bool hasEH(const BasicBlock *BB) {
     auto It = BBSideEffects.find(BB);
     if (It != BBSideEffects.end())
       return It->second;
[684 lines hidden; context: for (const HoistingPointInfo &HP : HPL) {]
       // When we do not find Repl in HoistPt, select the first in the list
       // and move it to HoistPt.
       Repl = InstructionsToHoist.front();

       // We can move Repl in HoistPt only when all operands are available.
       // The order in which hoistings are done may influence the availability
       // of operands.
       if (!allOperandsAvailable(Repl, DestBB)) {
         // When HoistingGeps there is nothing more we can do to make the
[inline] hfinkel: Update this comment?
[inline] etherzhhb: Yes
         // operands available: just continue.
-        if (HoistingGeps)
+        if (TTI.isRegisterRich())
           continue;

         // When not HoistingGeps we need to copy the GEPs.
         if (!makeGepOperandsAvailable(Repl, DestBB, InstructionsToHoist))
           continue;
       }

       // Move the instruction at the end of HoistPt.
[37 lines hidden; context: for (BasicBlock *BB : depth_first(&F.getEntryBlock())) {]
         // If I1 cannot guarantee progress, subsequent instructions
         // in BB cannot be hoisted anyways.
         if (!isGuaranteedToTransferExecutionToSuccessor(&I1)) {
           HoistBarrier.insert(BB);
           break;
         }
         // Only hoist the first instructions in BB up to MaxDepthInBB. Hoisting
         // deeper may increase the register pressure and compilation time.
         if (MaxDepthInBB != -1 && InstructionNb++ >= MaxDepthInBB)
[inline] hfinkel: Would you want to modify this check too?
[inline] etherzhhb: yes
           break;

         // Do not value number terminator instructions.
         if (isa<TerminatorInst>(&I1))
           break;

         if (auto *Load = dyn_cast<LoadInst>(&I1))
           LI.insert(Load, VN);
         else if (auto *Store = dyn_cast<StoreInst>(&I1))
           SI.insert(Store, VN);
         else if (auto *Call = dyn_cast<CallInst>(&I1)) {
           if (auto *Intr = dyn_cast<IntrinsicInst>(Call)) {
             if (isa<DbgInfoIntrinsic>(Intr) ||
                 Intr->getIntrinsicID() == Intrinsic::assume ||
                 Intr->getIntrinsicID() == Intrinsic::sideeffect)
               continue;
           }
           if (Call->mayHaveSideEffects())
             break;

           if (Call->isConvergent())
             break;

           CI.insert(Call, VN);
-        } else if (HoistingGeps || !isa<GetElementPtrInst>(&I1))
+        } else if (TTI.isRegisterRich() || !isa<GetElementPtrInst>(&I1))
           // Do not hoist scalars past calls that may write to memory because
[inline] hfinkel: Is this comment accurate? I don't see why this is talking about calls here when calls are handled in the block above.
[inline] etherzhhb: I guess it is about the address mode since it is related to GEP
           // that could result in spills later. geps are handled separately.
           // TODO: We can relax this for targets like AArch64 as they have more
           // registers than X86.
           II.insert(&I1, VN);
       }
     }

     HoistingPointList HPL;
[18 lines hidden; context: public:]
   bool runOnFunction(Function &F) override {
     if (skipFunction(F))
       return false;
     auto &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
     auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
     auto &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
     auto &MD = getAnalysis<MemoryDependenceWrapperPass>().getMemDep();
     auto &MSSA = getAnalysis<MemorySSAWrapperPass>().getMSSA();
+    auto &TTI = getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

-    GVNHoist G(&DT, &PDT, &AA, &MD, &MSSA);
+    GVNHoist G(&DT, &PDT, &AA, &MD, &MSSA, TTI);
     return G.run(F);
   }

   void getAnalysisUsage(AnalysisUsage &AU) const override {
+    AU.addRequired<TargetTransformInfoWrapperPass>();
     AU.addRequired<DominatorTreeWrapperPass>();
     AU.addRequired<PostDominatorTreeWrapperPass>();
     AU.addRequired<AAResultsWrapperPass>();
     AU.addRequired<MemoryDependenceWrapperPass>();
     AU.addRequired<MemorySSAWrapperPass>();
     AU.addPreserved<DominatorTreeWrapperPass>();
     AU.addPreserved<MemorySSAWrapperPass>();
     AU.addPreserved<GlobalsAAWrapperPass>();
   }
 };

 } // end namespace llvm

 PreservedAnalyses GVNHoistPass::run(Function &F, FunctionAnalysisManager &AM) {
   DominatorTree &DT = AM.getResult<DominatorTreeAnalysis>(F);
   PostDominatorTree &PDT = AM.getResult<PostDominatorTreeAnalysis>(F);
   AliasAnalysis &AA = AM.getResult<AAManager>(F);
   MemoryDependenceResults &MD = AM.getResult<MemoryDependenceAnalysis>(F);
   MemorySSA &MSSA = AM.getResult<MemorySSAAnalysis>(F).getMSSA();
-  GVNHoist G(&DT, &PDT, &AA, &MD, &MSSA);
+  TargetTransformInfo &TTI = AM.getResult<TargetIRAnalysis>(F);
+  GVNHoist G(&DT, &PDT, &AA, &MD, &MSSA, TTI);
   if (!G.run(F))
     return PreservedAnalyses::all();

   PreservedAnalyses PA;
   PA.preserve<DominatorTreeAnalysis>();
   PA.preserve<MemorySSAAnalysis>();
   PA.preserve<GlobalsAA>();
   return PA;
[14 lines hidden]