This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5.
ClosedPublic

Authored by jlebar on Mar 29 2016, 9:33 AM.


Details

Reviewers

Commits

rGcd5fbea67e76: [NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5.
rL266406: [NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5.

Summary

Calls on NVPTX are unusually expensive (for one thing, lots of state
needs to be saved to memory, which is slow), so make the inliner much
more aggressive.

Diff Detail

Event Timeline

jlebar updated this revision to Diff 51938. Mar 29 2016, 9:33 AM

jlebar retitled this revision to "[NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5."

jlebar updated this object.

jlebar added a reviewer: chandlerc.

jlebar added subscribers: tra, llvm-commits.

Herald added a subscriber: jholewinski. Mar 29 2016, 9:33 AM

The inlining threshold multiplier is now a fraction.

I ended up chatting with Hal about this, and he made a really great point. I had been thinking that it is really brittle for the target-provided inlining threshold to be an absolute number instead of a multiplier / ratio as you have it.

However, Hal pointed out that this creates a coupling that could also be problematic. Consider an out-of-tree target with a carefully tuned inlining threshold multiplier. If lots of targets do this, changing the threshold could become extremely problematic, because small changes would still disturb the target-specific tunings. His suggestion was to use an absolute threshold from the target in the absence of an explicitly specified command-line flag.

Thinking about this more, I think an absolute threshold still presents the same problem. Consider a change to the inliner that significantly changes the rate at which we inline things. When making such a change, it might be useful to be able to adjust the threshold so that most inlining decisions stay neutral across targets.

I'm not 100% sure which mechanism is best here. Or this may point to a problem with all of these mechanisms.

One possible alternative is to not have the size-based inlining be target configurable, and to make this exclusively handled by the proposed runtime cost estimation based inlining when that goes in...

This revision now requires changes to proceed. Apr 11 2016, 12:10 AM

Oh ignore this. I got the wrong revision. I'll go replay this on the right revision. Sorry.

Consider an out-of-tree target with a carefully tuned inlining threshold multiplier. If lots of targets do this, changing the threshold could become extremely problematic, because small changes would still disturb the target-specific tunings.

My thought was, the inlining multiplier is such a coarse knob -- set atop rather coarse heuristics -- that tuning it particularly carefully for a target might not make much sense.

I don't know if that's true or not, but it's at least an argument one could make.

His suggestion was to use an absolute threshold from the target in the absence of an explicitly specified command-line flag. Thinking about this more, I think it still presents the same problem. Consider a change to the inliner that significantly changes the rate at which we inline things. It might be useful to be able to adjust the threshold when making the change to keep most inlining decisions neutral across targets.

I guess this point (and, to an extent, the previous one) comes down to API stability guarantees. To the extent that we don't promise a stable API for out-of-tree targets, we could change the inliner and at the same time break compilation for any out-of-tree targets that set the threshold. They'll have to fix the error and recompute their inlining threshold.

That doesn't seem so unreasonable to me -- it's not like we'd be doing this once a month, or even every release. And as an out-of-tree target, you'd only be broken if you explicitly opted in to this tuning mechanism, which we could make clear comes with this possibility of future intentional breakage.

Oh ignore this. I got the wrong revision. I'll go replay this on the right revision. Sorry.

(Maybe this was posted on the wrong patch? Your comment seemed quite sane to me. :)

In D18561#397125, @jlebar wrote:

Oh ignore this. I got the wrong revision. I'll go replay this on the right revision. Sorry.

(Maybe this was posted on the wrong patch? Your comment seemed quite sane to me. :)

No, my comment should have been on D18560 with the general discussion. You might want to move your response there as well.

Closed by commit rL266406: [NVPTX] Set NVPTXTTI::getInliningThresholdMultiplier to 5. (authored by jlebar). Apr 14 2016, 6:44 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

lib/Target/NVPTX/NVPTXTargetTransformInfo.h (2 lines)

lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp (7 lines)

Diff 51945

lib/Target/NVPTX/NVPTXTargetTransformInfo.h

...
 public:
   NVPTXTTIImpl(NVPTXTTIImpl &&Arg)
       : BaseT(std::move(static_cast<BaseT &>(Arg))), ST(std::move(Arg.ST)),
         TLI(std::move(Arg.TLI)) {}

   bool hasBranchDivergence() { return true; }

   bool isSourceOfDivergence(const Value *V);

+  std::pair<int, int> getInliningThresholdMultiplier(const Function *Caller);
+
   int getArithmeticInstrCost(
       unsigned Opcode, Type *Ty,
       TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
       TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
       TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
       TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None);

   void getUnrollingPreferences(Loop *L, TTI::UnrollingPreferences &UP);
 };

 } // end namespace llvm

 #endif

lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp

...
   switch (II->getIntrinsicID()) {
   default: return false;
   case Intrinsic::nvvm_atomic_load_add_f32:
   case Intrinsic::nvvm_atomic_load_inc_32:
   case Intrinsic::nvvm_atomic_load_dec_32:
     return true;
   }
 }

+std::pair<int, int>
+NVPTXTTIImpl::getInliningThresholdMultiplier(const Function * /* Caller */) {
+  // Increase the inlining cost threshold by a factor of 5, reflecting that
+  // calls are particularly expensive in NVPTX.
+  return {5, 1};
+}
+
 bool NVPTXTTIImpl::isSourceOfDivergence(const Value *V) {
   // Without inter-procedural analysis, we conservatively assume that arguments
   // to __device__ functions are divergent.
   if (const Argument *Arg = dyn_cast<Argument>(V))
     return !isKernelFunction(*Arg->getParent());

   if (const Instruction *I = dyn_cast<Instruction>(V)) {
     // Without pointer analysis, we conservatively assume values loaded from
...