This is an archive of the discontinued LLVM Phabricator instance.

TTI: Use a better default for areInlineCompatible
Abandoned · Public

Authored by arsenm on Aug 7 2017, 8:06 AM.


Details


Summary

AArch64 and X86 implement this hook identically.
Rather than requiring the target-cpu to match exactly,
check whether the callee's subtarget features are a subset
of the caller's when target information is available.

Fixes OpenCL library functions not being inlined on AMDGPU,
since they do not have an explicitly set target-cpu.
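The difference between the two rules can be modeled outside LLVM. The sketch below is a minimal illustration, not LLVM code: `std::bitset` stands in for `FeatureBitset`, the bit positions are hypothetical, and a function with no explicit target-cpu is assumed to have an empty feature set. The subset test uses the same expression as the patch, `(CallerBits & CalleeBits) == CalleeBits`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical feature width; real targets use a TableGen-generated FeatureBitset.
constexpr std::size_t kNumFeatures = 8;
using Features = std::bitset<kNumFeatures>;

// Old-style rule (modeled on feature bits instead of attribute strings):
// caller and callee must agree exactly.
bool exactMatchCompatible(const Features &Caller, const Features &Callee) {
  return Caller == Callee;
}

// Rule from the patch: the callee may only rely on features the caller
// also has, i.e. the callee's bits must be a subset of the caller's bits.
bool subsetCompatible(const Features &Caller, const Features &Callee) {
  return (Caller & Callee) == Callee;
}
```

For example, a caller with bits 1 and 2 set and a callee with no bits set is rejected by `exactMatchCompatible` but accepted by `subsetCompatible`; with the roles swapped, both rules reject the call.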

Diff Detail

Event Timeline

arsenm created this revision. Aug 7 2017, 8:06 AM

Herald added subscribers: kristof.beyls, eraman, Anastasia and 5 others. Aug 7 2017, 8:06 AM

fhahn added a subscriber: fhahn. Aug 7 2017, 8:19 AM

fhahn added inline comments.

include/llvm/CodeGen/BasicTTIImpl.h
Line 1092: I think this implementation is not conservative enough for some targets. For example, the ARM backend has a more conservative implementation (https://github.com/llvm-mirror/llvm/blob/master/lib/Target/ARM/ARMTargetTransformInfo.cpp#L18). Some target-features in the ARM backend affect the generated code (e.g. Thumb mode), so if we allow inlining whenever the callee's features are a subset of the caller's, we could change the "thumbness" of the inlined function. Other backends might have similar tricky target-features to deal with.

I also think this patch is too optimistic. Not comparing the CPU name/feature attributes seems daring but doable to me. But assuming that all features are additive, so that we only need to check whether the callee's flags are a subset of the caller's flags, seems too optimistic to me.
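The concern above can be made concrete with a small model. Under the plain subset rule, a callee without a Thumb-mode bit (i.e. ARM mode) is a subset of a Thumb caller, so it would be inlined and change mode. One possible mitigation, sketched here under assumptions, splits the bits into mode-like bits that must match exactly and additive bits that follow the subset rule. The bit positions and `exactMatchMask` are hypothetical and are not taken from any backend.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

using ArmFeatures = std::bitset<8>;

// Hypothetical bit positions, for illustration only.
constexpr std::size_t kThumbMode = 0;  // changes code generation, not additive
constexpr std::size_t kVFP = 1;        // additive ISA extension
constexpr std::size_t kNEON = 2;       // additive ISA extension

// Bits that must be identical in caller and callee.
ArmFeatures exactMatchMask() {
  ArmFeatures M;
  M.set(kThumbMode);
  return M;
}

bool hybridCompatible(const ArmFeatures &Caller, const ArmFeatures &Callee) {
  const ArmFeatures Exact = exactMatchMask();
  // Mode-like bits: caller and callee must agree.
  const bool ModesMatch = (Caller & Exact) == (Callee & Exact);
  // Additive bits: the callee must not need anything the caller lacks.
  const ArmFeatures CallerAdd = Caller & ~Exact;
  const ArmFeatures CalleeAdd = Callee & ~Exact;
  const bool SubsetOK = (CallerAdd & CalleeAdd) == CalleeAdd;
  return ModesMatch && SubsetOK;
}
```

With a Thumb+VFP caller and a VFP-only (ARM-mode) callee, the plain subset check succeeds while `hybridCompatible` rejects the pair because the modes differ.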

This revision now requires changes to proceed. Aug 15 2017, 11:14 AM

sdardis added a subscriber: sdardis. Aug 21 2017, 2:19 AM

arsenm abandoned this revision. Aug 28 2017, 4:06 PM

Revision Contents

Path                                                   Size
include/llvm/Analysis/TargetTransformInfoImpl.h        1 line
include/llvm/CodeGen/BasicTTIImpl.h                    18 lines
lib/Target/AArch64/AArch64TargetTransformInfo.h        3 lines
lib/Target/AArch64/AArch64TargetTransformInfo.cpp      14 lines
lib/Target/X86/X86TargetTransformInfo.h                2 lines
lib/Target/X86/X86TargetTransformInfo.cpp              16 lines
test/Transforms/Inline/AMDGPU/inline-target-cpu.ll     49 lines
test/Transforms/Inline/AMDGPU/lit.local.cfg            2 lines

Diff 109999

include/llvm/Analysis/TargetTransformInfoImpl.h

@@ ... @@ void getMemcpyLoopResidualLoweringType(SmallVectorImpl<Type *> &OpsOut,
                                          unsigned SrcAlign,
                                          unsigned DestAlign) const {
     for (unsigned i = 0; i != RemainingBytes; ++i)
       OpsOut.push_back(Type::getInt8Ty(Context));
   }

   bool areInlineCompatible(const Function *Caller,
                            const Function *Callee) const {
+    // If there is no target machine, be very conservative.
     return (Caller->getFnAttribute("target-cpu") ==
             Callee->getFnAttribute("target-cpu")) &&
            (Caller->getFnAttribute("target-features") ==
             Callee->getFnAttribute("target-features"));
   }

   unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) const { return 128; }
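The conservative default compares the two string attributes verbatim. The sketch below models that rule with a plain `std::map` (a hypothetical stand-in for `llvm::Function` attributes; C++17 is assumed for `std::optional`) and shows why a caller with `"target-cpu"="fiji"` cannot inline a callee that has no target-cpu at all, which is the AMDGPU OpenCL library situation described in the summary.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a function's string attributes.
using AttrMap = std::map<std::string, std::string>;

// An absent attribute is std::nullopt, so it never equals a present one.
std::optional<std::string> getAttr(const AttrMap &Attrs, const std::string &Key) {
  auto It = Attrs.find(Key);
  if (It == Attrs.end())
    return std::nullopt;
  return It->second;
}

// Conservative default when no target machine is available:
// "target-cpu" and "target-features" must both be identical.
bool defaultInlineCompatible(const AttrMap &Caller, const AttrMap &Callee) {
  return getAttr(Caller, "target-cpu") == getAttr(Callee, "target-cpu") &&
         getAttr(Caller, "target-features") == getAttr(Callee, "target-features");
}
```

Under this rule two functions that both omit target-cpu are compatible, but a present value never matches an absent one.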

include/llvm/CodeGen/BasicTTIImpl.h

@@ ... @@
   /// \param F Called function, might be nullptr.
   /// \param RetTy Return value types.
   /// \param Tys Argument types.
   /// \returns The cost of Call instruction.
   unsigned getCallInstrCost(Function *F, Type *RetTy, ArrayRef<Type *> Tys) {
     return 10;
   }

+  bool areInlineCompatible(const Function *Caller,
+                           const Function *Callee) const {
+    const TargetMachine &TM = getTLI()->getTargetMachine();
+
+    const FeatureBitset &CallerBits =
+        TM.getSubtargetImpl(*Caller)->getFeatureBits();
+    const FeatureBitset &CalleeBits =
+        TM.getSubtargetImpl(*Callee)->getFeatureBits();
+
+    // Inline a callee if its target-features are a subset of the callers
+    // target-features.
+    //
+    // Targets can override if this is too limiting by including subtarget
+    // features that we might not care about for inlining, but it is
+    // conservatively correct.
+    return (CallerBits & CalleeBits) == CalleeBits;
+  }
+
   unsigned getNumberOfParts(Type *Tp) {
     std::pair<unsigned, MVT> LT = getTLI()->getTypeLegalizationCost(DL, Tp);
     return LT.first;
   }

   unsigned getAddressComputationCost(Type *Ty, ScalarEvolution *,
                                      const SCEV *) {
     return 0;

lib/Target/AArch64/AArch64TargetTransformInfo.h

@@ ... @@ class AArch64TTIImpl : public BasicTTIImplBase<AArch64TTIImpl> {
   bool isWideningInstruction(Type *Ty, unsigned Opcode,
                              ArrayRef<const Value *> Args);

 public:
   explicit AArch64TTIImpl(const AArch64TargetMachine *TM, const Function &F)
       : BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
         TLI(ST->getTargetLowering()) {}

-  bool areInlineCompatible(const Function *Caller,
-                           const Function *Callee) const;
-
   /// \name Scalar TTI Implementations
   /// @{

   using BaseT::getIntImmCost;
   int getIntImmCost(int64_t Val);
   int getIntImmCost(const APInt &Imm, Type *Ty);
   int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);
   int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,

lib/Target/AArch64/AArch64TargetTransformInfo.cpp

@@ ... @@
 #include <algorithm>
 using namespace llvm;

 #define DEBUG_TYPE "aarch64tti"

 static cl::opt<bool> EnableFalkorHWPFUnrollFix("enable-falkor-hwpf-unroll-fix",
                                                cl::init(true), cl::Hidden);

-bool AArch64TTIImpl::areInlineCompatible(const Function *Caller,
-                                         const Function *Callee) const {
-  const TargetMachine &TM = getTLI()->getTargetMachine();
-
-  const FeatureBitset &CallerBits =
-      TM.getSubtargetImpl(*Caller)->getFeatureBits();
-  const FeatureBitset &CalleeBits =
-      TM.getSubtargetImpl(*Callee)->getFeatureBits();
-
-  // Inline a callee if its target-features are a subset of the callers
-  // target-features.
-  return (CallerBits & CalleeBits) == CalleeBits;
-}
-
 /// \brief Calculate the cost of materializing a 64-bit value. This helper
 /// method might only calculate a fraction of a larger immediate. Therefore it
 /// is valid to return a cost of ZERO.
 int AArch64TTIImpl::getIntImmCost(int64_t Val) {
   // Check if the immediate can be encoded within an instruction.
   if (Val == 0 || AArch64_AM::isLogicalImmediate(Val, 64))
     return 0;

lib/Target/X86/X86TargetTransformInfo.h

@@ ... @@ public:
   int getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm, Type *Ty);
   int getIntImmCost(Intrinsic::ID IID, unsigned Idx, const APInt &Imm,
                     Type *Ty);
   bool isLegalMaskedLoad(Type *DataType);
   bool isLegalMaskedStore(Type *DataType);
   bool isLegalMaskedGather(Type *DataType);
   bool isLegalMaskedScatter(Type *DataType);
-  bool areInlineCompatible(const Function *Caller,
-                           const Function *Callee) const;
   bool expandMemCmp(Instruction *I, unsigned &MaxLoadSize);
   bool enableInterleavedAccessVectorization();
 private:
   int getGSScalarCost(unsigned Opcode, Type *DataTy, bool VariableMask,
                       unsigned Alignment, unsigned AddressSpace);
   int getGSVectorCost(unsigned Opcode, Type *DataTy, Value *Ptr,
                       unsigned Alignment, unsigned AddressSpace);

   /// @}
 };

 } // end namespace llvm

 #endif

lib/Target/X86/X86TargetTransformInfo.cpp

@@ ... @@ bool X86TTIImpl::isLegalMaskedGather(Type *DataTy) {
   // AVX-512 allows gather and scatter
   return (DataWidth == 32 || DataWidth == 64) && ST->hasAVX512();
 }

 bool X86TTIImpl::isLegalMaskedScatter(Type *DataType) {
   return isLegalMaskedGather(DataType);
 }

-bool X86TTIImpl::areInlineCompatible(const Function *Caller,
-                                     const Function *Callee) const {
-  const TargetMachine &TM = getTLI()->getTargetMachine();
-
-  // Work this as a subsetting of subtarget features.
-  const FeatureBitset &CallerBits =
-      TM.getSubtargetImpl(*Caller)->getFeatureBits();
-  const FeatureBitset &CalleeBits =
-      TM.getSubtargetImpl(*Callee)->getFeatureBits();
-
-  // FIXME: This is likely too limiting as it will include subtarget features
-  // that we might not care about for inlining, but it is conservatively
-  // correct.
-  return (CallerBits & CalleeBits) == CalleeBits;
-}
-
 bool X86TTIImpl::expandMemCmp(Instruction *I, unsigned &MaxLoadSize) {
   // TODO: We can increase these based on available vector ops.
   MaxLoadSize = ST->is64Bit() ? 8 : 4;
   return true;
 }

 bool X86TTIImpl::enableInterleavedAccessVectorization() {
   // TODO: We expect this to be beneficial regardless of arch,
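The FIXME in the removed X86 code notes that a raw subset test can be too limiting, because it also counts subtarget features that do not matter for inlining. A target override could mask such bits out before applying the subset rule. This is a hedged sketch: `std::bitset` stands in for `FeatureBitset`, and both bit positions, including the tuning-only `kSlowUnaligned`, are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

using FeatureBits = std::bitset<8>;

// Hypothetical bit positions, for illustration only.
constexpr std::size_t kAVX = 0;            // ISA feature a callee may rely on
constexpr std::size_t kSlowUnaligned = 1;  // tuning-only flag, irrelevant to legality

// Bits that have no bearing on whether inlining is legal.
FeatureBits ignoredForInlining() {
  FeatureBits M;
  M.set(kSlowUnaligned);
  return M;
}

// Subset rule applied only to the bits that matter for inlining.
bool maskedSubsetCompatible(const FeatureBits &Caller, const FeatureBits &Callee) {
  const FeatureBits Relevant = ~ignoredForInlining();
  const FeatureBits CallerR = Caller & Relevant;
  const FeatureBits CalleeR = Callee & Relevant;
  return (CallerR & CalleeR) == CalleeR;
}
```

A callee that differs from its caller only in the tuning bit is then accepted, while a callee that needs AVX is still rejected in a caller without AVX.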

test/Transforms/Inline/AMDGPU/inline-target-cpu.ll

This file was added.

; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -inline < %s | FileCheck %s
; RUN: opt -mtriple=amdgcn-amd-amdhsa -S -passes='cgscc(inline)' < %s | FileCheck %s

; CHECK-LABEL: @func_no_target_cpu(
define i32 @func_no_target_cpu() #0 {
  ret i32 0
}

; CHECK-LABEL: @target_cpu_call_no_target_cpu(
; CHECK-NEXT: ret i32 0
define i32 @target_cpu_call_no_target_cpu() #1 {
  %call = call i32 @func_no_target_cpu()
  ret i32 %call
}

; CHECK-LABEL: @target_cpu_target_features_call_no_target_cpu(
; CHECK-NEXT: ret i32 0
define i32 @target_cpu_target_features_call_no_target_cpu() #2 {
  %call = call i32 @func_no_target_cpu()
  ret i32 %call
}

; CHECK-LABEL: @fp32_denormals(
define i32 @fp32_denormals() #3 {
  ret i32 0
}

; CHECK-LABEL: @no_fp32_denormals_call_f32_denormals(
; CHECK-NEXT: call i32 @fp32_denormals()
define i32 @no_fp32_denormals_call_f32_denormals() #4 {
  %call = call i32 @fp32_denormals()
  ret i32 %call
}

; Make sure gfx9 can call unspecified functions because of movrel
; feature change.
; CHECK-LABEL: @gfx9_target_features_call_no_target_cpu(
; CHECK-NEXT: ret i32 0
define i32 @gfx9_target_features_call_no_target_cpu() #5 {
  %call = call i32 @func_no_target_cpu()
  ret i32 %call
}

attributes #0 = { nounwind }
attributes #1 = { nounwind "target-cpu"="fiji" }
attributes #2 = { nounwind "target-cpu"="fiji" "target-features"="+fp32-denormals" }
attributes #3 = { nounwind "target-features"="+fp32-denormals" }
attributes #4 = { nounwind "target-features"="-fp32-denormals" }
attributes #5 = { nounwind "target-cpu"="gfx900" }

test/Transforms/Inline/AMDGPU/lit.local.cfg

This file was added.

if not 'AMDGPU' in config.root.targets:
  config.unsupported = True