This is an archive of the discontinued LLVM Phabricator instance.

Differential D75790

[PowerPC] Fix compile time issue in recursive CTR analysis code
ClosedPublic

Authored by tejohnson on Mar 6 2020, 5:53 PM.

Details

Reviewers

hfinkel
MaskRay

Group Reviewers

Restricted Project

Commits

rG8f5e3c74b678: [PowerPC] Fix compile time issue in recursive CTR analysis code

Summary

Avoid re-examining operands on the recursive walk looking for CTR uses.
This was causing huge compile times after an earlier optimization
created a large expression.

The start of the expression (created by IndVarSimplify) looked like:

%469 = lshr i64 trunc (i128 xor (i128 udiv (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), i128 
8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 trunc (i128 xor (i128 lshr (i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011), i128 64), i128 mul (i128 zext (i64 add (i64 ptrtoint (i8 @_ZN4absl13hash_internal13CityHashState5kSeedE to i64), i64 120) to i128), i128 8192506886679785011)) to i64), i64 45) to i128), ...

with the _ZN4absl13hash_internal13CityHashState5kSeedE referenced many times.
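
To illustrate why the repeated operands matter, here is a minimal, self-contained sketch of the same idea outside of LLVM. It is not the patch's code: `Node`, `findNaive`, `findVisited` and `buildSharedChain` are hypothetical names made up for this example, and `std::unordered_set` stands in for `SmallPtrSet`. The chain it builds reuses each sub-expression twice per level, roughly like the nested expression above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an IR value: a node with operands and a flag
// marking the property being searched for (e.g. "refers to a TLS global").
struct Node {
  bool Interesting = false;
  std::vector<const Node *> Operands;
};

// Naive recursive walk: a shared operand is re-examined once for every
// path that reaches it.
bool findNaive(const Node *N, std::size_t &Visits) {
  assert(N && "node must not be null");
  ++Visits;
  if (N->Interesting)
    return true;
  for (const Node *Op : N->Operands)
    if (findNaive(Op, Visits))
      return true;
  return false;
}

// Walk with a visited set: each node is examined at most once.
bool findVisited(const Node *N, std::unordered_set<const Node *> &Visited,
                 std::size_t &Visits) {
  assert(N && "node must not be null");
  // insert() returns a pair; .second is false if N was already in the set.
  if (!Visited.insert(N).second)
    return false;
  ++Visits;
  if (N->Interesting)
    return true;
  for (const Node *Op : N->Operands)
    if (findVisited(Op, Visited, Visits))
      return true;
  return false;
}

// Builds a leaf plus Depth levels, where every level uses the previous
// level as both of its operands. The root is the last element.
std::vector<std::unique_ptr<Node>> buildSharedChain(std::size_t Depth) {
  std::vector<std::unique_ptr<Node>> Nodes;
  Nodes.push_back(std::make_unique<Node>());
  for (std::size_t I = 0; I < Depth; ++I) {
    auto N = std::make_unique<Node>();
    N->Operands.push_back(Nodes.back().get());
    N->Operands.push_back(Nodes.back().get());
    Nodes.push_back(std::move(N));
  }
  return Nodes;
}
```

With a depth of 10 and nothing interesting in the graph, the naive walk examines 2047 nodes while the visited-set walk examines 11, one per distinct node. Returning false for an already-visited node looks safe under the assumption that a hit on the first visit ends the whole search, so a node can only be seen again if it was already found harmless.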

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tejohnson created this revision. Mar 6 2020, 5:53 PM

Herald added a project: Restricted Project. Mar 6 2020, 5:53 PM

Herald added subscribers: shchenz, jsji, kbarton and 2 others.

MaskRay added a subscriber: MaskRay. Mar 6 2020, 6:36 PM

MaskRay added inline comments.

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
508	How about `SmallPtrSet<const Value *, 4>` (or 4 -> another integer)?

Herald added a subscriber: wuzish. Mar 6 2020, 6:36 PM

Harbormaster failed remote builds in B48421: Diff 248872! Mar 6 2020, 6:46 PM

tejohnson marked an inline comment as done. Mar 6 2020, 7:35 PM

Address comment

jsji added a reviewer: Restricted Project. Mar 6 2020, 7:52 PM

jsji added a project: Restricted Project.

Harbormaster failed remote builds in B48425: Diff 248880! Mar 6 2020, 8:22 PM

Generally looks good, but the title needs to be clarified. This piece of code is related to an optimization which uses CTR as the loop count register.

For context, D6786 introduced mightUseCTR. rL251582 made it recurse into the constant.

IIUC, with ThinLTO's ImportConstantsWithRefs optimization, a constant can be very large, and recursing into it for every instruction can make the compilation slow. Is that the case?

tejohnson retitled this revision from "[PowerPC] Fix compile time issue" to "[PowerPC] Fix compile time issue in recursive CTR analysis code". Mar 10 2020, 1:44 PM

In D75790#1915590, @MaskRay wrote:

Generally looks good, but the title needs to be clarified. This piece of code is related to an optimization which uses CTR as the loop count register.

Updated the title.

For context, D6786 introduced mightUseCTR. rL251582 made it recurse into the constant.

IIUC, with ThinLTO's ImportConstantsWithRefs optimization, a constant can be very large, and recursing into it for every instruction can make the compilation slow. Is that the case?

No, the constant itself was quite simple. Importing it simply enabled some additional optimization (in IndVarSimplify) that created a huge instruction expression. It was iterating that expression, not the constant itself, that is so slow.

MaskRay accepted this revision. Mar 10 2020, 2:18 PM

This revision is now accepted and ready to land. Mar 10 2020, 2:18 PM

In D75790#1915671, @tejohnson wrote:

No, the constant itself was quite simple. Importing it simply enabled some additional optimization (in IndVarSimplify) that created a huge instruction expression. It was iterating that expression, not the constant itself, that is so slow.

It'd be nice to give a (reduced) example in the description, even if testing it reliably may be unrealistic.

In D75790#1915719, @MaskRay wrote:

It'd be nice to give a (reduced) example in the description, even if testing it reliably may be unrealistic.

I showed the start of the expression that causes the analysis to re-analyze the same operand many times.

Closed by commit rG8f5e3c74b678: [PowerPC] Fix compile time issue in recursive CTR analysis code (authored by tejohnson). Mar 11 2020, 4:30 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                  Size
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h      3 lines
llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp    14 lines

Diff 249796

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h

[27 lines not shown]
 class PPCTTIImpl : public BasicTTIImplBase<PPCTTIImpl> {
   typedef TargetTransformInfo TTI;
   friend BaseT;

   const PPCSubtarget *ST;
   const PPCTargetLowering *TLI;

   const PPCSubtarget *getST() const { return ST; }
   const PPCTargetLowering *getTLI() const { return TLI; }
-  bool mightUseCTR(BasicBlock *BB, TargetLibraryInfo *LibInfo);
+  bool mightUseCTR(BasicBlock *BB, TargetLibraryInfo *LibInfo,
+                   SmallPtrSetImpl<const Value *> &Visited);

 public:
   explicit PPCTTIImpl(const PPCTargetMachine *TM, const Function &F)
       : BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
         TLI(ST->getTargetLowering()) {}

   /// \name Scalar TTI Implementations
   /// @{
[82 lines not shown]

llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp

[211 lines not shown]
   if (U->getType()->isVectorTy()) {
     // Instructions that need to be split should cost more.
     std::pair<int, MVT> LT = TLI->getTypeLegalizationCost(DL, U->getType());
     return LT.first * BaseT::getUserCost(U, Operands);
   }

   return BaseT::getUserCost(U, Operands);
 }

-bool PPCTTIImpl::mightUseCTR(BasicBlock *BB,
-                             TargetLibraryInfo *LibInfo) {
+bool PPCTTIImpl::mightUseCTR(BasicBlock *BB, TargetLibraryInfo *LibInfo,
+                             SmallPtrSetImpl<const Value *> &Visited) {
   const PPCTargetMachine &TM = ST->getTargetMachine();

   // Loop through the inline asm constraints and look for something that
   // clobbers ctr.
   auto asmClobbersCTR = [](InlineAsm *IA) {
     InlineAsm::ConstraintInfoVector CIV = IA->ParseConstraints();
     for (unsigned i = 0, ie = CIV.size(); i < ie; ++i) {
       InlineAsm::ConstraintInfo &C = CIV[i];
       if (C.Type != InlineAsm::isInput)
         for (unsigned j = 0, je = C.Codes.size(); j < je; ++j)
           if (StringRef(C.Codes[j]).equals_lower("{ctr}"))
             return true;
     }
     return false;
   };

   // Determining the address of a TLS variable results in a function call in
   // certain TLS models.
-  std::function<bool(const Value*)> memAddrUsesCTR =
-    [&memAddrUsesCTR, &TM](const Value *MemAddr) -> bool {
+  std::function<bool(const Value *)> memAddrUsesCTR =
+      [&memAddrUsesCTR, &TM, &Visited](const Value *MemAddr) -> bool {
+    // No need to traverse again if we already checked this operand.
+    if (!Visited.insert(MemAddr).second)
+      return false;
     const auto *GV = dyn_cast<GlobalValue>(MemAddr);
     if (!GV) {
       // Recurse to check for constants that refer to TLS global variables.
       if (const auto *CV = dyn_cast<Constant>(MemAddr))
         for (const auto &CO : CV->operands())
           if (memAddrUsesCTR(CO))
             return true;

[247 lines not shown]
     for (BasicBlock *BB : L->blocks())
       Metrics.analyzeBasicBlock(BB, *this, EphValues);
     // 6 is an approximate latency for the mtctr instruction.
     if (Metrics.NumInsts <= (6 * SchedModel.getIssueWidth()))
       return false;
   }

   // We don't want to spill/restore the counter register, and so we don't
   // want to use the counter register if the loop contains calls.
+  SmallPtrSet<const Value *, 4> Visited;
   for (Loop::block_iterator I = L->block_begin(), IE = L->block_end();
        I != IE; ++I)
-    if (mightUseCTR(*I, LibInfo))
+    if (mightUseCTR(*I, LibInfo, Visited))
       return false;

   SmallVector<BasicBlock*, 4> ExitingBlocks;
   L->getExitingBlocks(ExitingBlocks);

   // If there is an exit edge known to be frequently taken,
   // we should not transform this loop.
   for (auto &BB : ExitingBlocks) {
[475 lines not shown]