This is an archive of the discontinued LLVM Phabricator instance.


Differential D116036

[Inline][X86] Avoid inlining if it would create ABI-incompatible calls (PR52660)
ClosedPublic

Authored by nikic on Dec 20 2021, 5:45 AM.

Details

Reviewers

craig.topper
RKSimon
pengfei
tstellar

Commits

rG7c3cf4c2c068: [Inline][X86] Avoid inlining if it would create ABI-incompatible calls (PR52660)

Summary

X86 allows inlining functions if the callee target features are a subset of the caller target features. This ensures that we don't inline something into a caller that does not support it.
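The subset rule can be illustrated with a minimal standalone sketch. The feature names and the three-bit `FeatureSet` below are hypothetical stand-ins for the backend's `FeatureBitset`; only the `(caller & callee) == callee` test mirrors the check visible in the diff.

```cpp
#include <bitset>
#include <cassert>

// Hypothetical feature indices, for illustration only. The real X86 backend
// tracks far more features in its FeatureBitset.
enum Feature { SSE2 = 0, AVX = 1, AVX2 = 2, NumFeatures = 3 };
using FeatureSet = std::bitset<NumFeatures>;

// Returns true if every feature enabled in the callee is also enabled in
// the caller, i.e. the callee's features are a subset of the caller's.
bool calleeFeaturesAreSubset(const FeatureSet &caller,
                             const FeatureSet &callee) {
  return (caller & callee) == callee;
}
```

Under this model, a callee built with only SSE2 is a subset of a caller built with SSE2 and AVX, but not the other way around.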

However, this does not account for possible call ABI mismatches as a result of inlining. Suppose a call passing a vector argument originally sits in a -avx function and calls another -avx function: the vector is passed in xmm. If we now inline the containing function into a +avx function, the vector will be passed in ymm, even though the callee still expects it in xmm.
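To make the mismatch concrete, here is a toy model, not the real calling-convention logic. It assumes only what the summary states: a 256-bit vector argument travels in xmm when AVX is disabled and in ymm when it is enabled. The names `VectorRegClass`, `regClassFor256BitVector`, and `callSiteMatchesCallee` are invented for illustration.

```cpp
#include <cassert>

// Simplified register classes for a 256-bit vector argument.
enum class VectorRegClass { XMM, YMM };

// Assumption taken from the summary: without AVX the argument is passed in
// xmm, with AVX it is passed in ymm.
VectorRegClass regClassFor256BitVector(bool hasAVX) {
  return hasAVX ? VectorRegClass::YMM : VectorRegClass::XMM;
}

// A call is consistent only if the function containing the call site and the
// callee agree on where the argument lives.
bool callSiteMatchesCallee(bool callSiteHasAVX, bool calleeHasAVX) {
  return regClassFor256BitVector(callSiteHasAVX) ==
         regClassFor256BitVector(calleeHasAVX);
}
```

Before inlining, the call site lives in a -avx function, so `callSiteMatchesCallee(false, false)` holds. After inlining into a +avx caller, the same call is compiled with AVX enabled and `callSiteMatchesCallee(true, false)` fails.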

Fix this by scanning over all calls in the function and checking whether ABI incompatibility is possible. Calls that only pass scalar types are excluded, as I believe those always use the same ABI independent of target features (right?)
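The scan can be sketched without any LLVM dependencies. This is a simplified model under stated assumptions, not the patch itself: `CallSite`, `TypeKind`, and the `preciseCheckPasses` flag (standing in for the result of `areTypesABICompatible`) are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class TypeKind { Scalar, Vector, Aggregate };

// Hypothetical summary of one call inside the function being inlined.
struct CallSite {
  std::vector<TypeKind> types; // argument types plus any non-void return type
  bool hasKnownCallee;         // false models an indirect call
  bool isIntrinsic;            // intrinsics are assumed ABI compatible
  bool preciseCheckPasses;     // stand-in for the precise type-based check
};

// Returns true if no call in the inlined body could become ABI incompatible.
bool inliningKeepsCallsAbiCompatible(const std::vector<CallSite> &calls) {
  for (const CallSite &call : calls) {
    // Calls that only pass scalar types are skipped.
    bool allSimple =
        std::all_of(call.types.begin(), call.types.end(),
                    [](TypeKind t) { return t == TypeKind::Scalar; });
    if (allSimple)
      continue;
    // Unknown target: conservatively assume incompatibility.
    if (!call.hasKnownCallee)
      return false;
    if (call.isIntrinsic)
      continue;
    if (!call.preciseCheckPasses)
      return false;
  }
  return true;
}
```

The order of the checks follows the description: scalar-only calls are filtered first, so the more expensive precise check runs only for calls that pass vectors or aggregates.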

Fixes https://github.com/llvm/llvm-project/issues/52660.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision. Dec 20 2021, 5:45 AM

Herald added a subscriber: hiraditya. Dec 20 2021, 5:45 AM

nikic requested review of this revision. Dec 20 2021, 5:45 AM

Herald added a project: Restricted Project. Dec 20 2021, 5:45 AM

Herald added a subscriber: llvm-commits.

nikic added a parent revision: D116031: [ArgPromotion][TTI] Pass types to ABI compatibility hook. Dec 20 2021, 5:45 AM

Harbormaster completed remote builds in B140074: Diff 395423. Dec 20 2021, 5:56 AM

The fix seems good. Just two concerns here:

I wonder what the inlining order is. Given a call chain A->B->C where both B and C can be inlined into A, is B inlined into A first, then C into AB, or is C inlined into B first, then BC into A? I'd worry that we are pessimistic in the former case.
Another concern is compile time. Scanning seems inevitable, but can we avoid repeating the scan if the inlined function is called in many places?

Herald added a subscriber: ChuanqiXu. Dec 21 2021, 11:57 PM

In D116036#3206048, @pengfei wrote:

> The fix seems good. Just two concerns here:
>
> I wonder what the inlining order is. Given a call chain A->B->C where both B and C can be inlined into A, is B inlined into A first, then C into AB, or is C inlined into B first, then BC into A? I'd worry that we are pessimistic in the former case.

With few exceptions, inlining happens bottom-up, so B->C will be inlined first and then A->BC. So calls in the callee will already be eliminated, unless they can't be inlined for some reason (noinline, cost, etc.).

> Another concern is compile time. Scanning seems inevitable, but can we avoid repeating the scan if the inlined function is called in many places?

I think this will not have much practical impact mainly because this kind of target feature mismatch is rare. Usually target features on all functions are the same. Inlining already does multiple scans of the callee, so at least this doesn't change anything asymptotically.
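The bottom-up order mentioned above can be sketched as a post-order walk over a call graph. This is a simplified illustration, not how the LLVM inliner is implemented: `CallGraph` and `bottomUpOrder` are hypothetical names, and the `seen` set only prevents infinite recursion rather than modelling how recursive cycles are really handled.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps each function name to the names of the functions it calls.
using CallGraph = std::map<std::string, std::vector<std::string>>;

// Visits callees before the function itself (post-order).
static void visitPostOrder(const CallGraph &graph, const std::string &fn,
                           std::set<std::string> &seen,
                           std::vector<std::string> &order) {
  if (!seen.insert(fn).second)
    return; // already visited
  auto it = graph.find(fn);
  if (it != graph.end())
    for (const std::string &callee : it->second)
      visitPostOrder(graph, callee, seen, order);
  order.push_back(fn);
}

// Returns the functions reachable from root, leaves first.
std::vector<std::string> bottomUpOrder(const CallGraph &graph,
                                       const std::string &root) {
  std::set<std::string> seen;
  std::vector<std::string> order;
  visitPostOrder(graph, root, seen, order);
  return order;
}
```

For the chain A->B->C, this walk processes C before B and B before A, so by the time A's call to B is considered, the call from B to C has already been looked at.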

Thanks for the explanation! LGTM, but please leave some time for others' comments.

This revision is now accepted and ready to land. Dec 22 2021, 6:03 AM

This revision was landed with ongoing or failed builds. Dec 27 2021, 12:36 AM

Closed by commit rG7c3cf4c2c068: [Inline][X86] Avoid inlining if it would create ABI-incompatible calls (PR52660) (authored by nikic).

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rG7c3cf4c2c068: [Inline][X86] Avoid inlining if it would create ABI-incompatible calls (PR52660).

Revision Contents

llvm/lib/Target/X86/X86TargetTransformInfo.cpp: 42 lines

llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll: 14 lines

Diff 396277

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

[37 lines not shown]
 /// code size, latency and uop count.
 //===----------------------------------------------------------------------===//

 #include "X86TargetTransformInfo.h"
 #include "llvm/Analysis/TargetTransformInfo.h"
 #include "llvm/CodeGen/BasicTTIImpl.h"
 #include "llvm/CodeGen/CostTable.h"
 #include "llvm/CodeGen/TargetLowering.h"
+#include "llvm/IR/InstIterator.h"
 #include "llvm/IR/IntrinsicInst.h"
 #include "llvm/Support/Debug.h"

 using namespace llvm;

 #define DEBUG_TYPE "x86tti"

 //===----------------------------------------------------------------------===//
[5,128 lines not shown]
 bool X86TTIImpl::areInlineCompatible(const Function *Caller,
 [...]
   const TargetMachine &TM = getTLI()->getTargetMachine();

   // Work this as a subsetting of subtarget features.
   const FeatureBitset &CallerBits =
       TM.getSubtargetImpl(*Caller)->getFeatureBits();
   const FeatureBitset &CalleeBits =
       TM.getSubtargetImpl(*Callee)->getFeatureBits();

+  // Check whether features are the same (apart from the ignore list).
   FeatureBitset RealCallerBits = CallerBits & ~InlineFeatureIgnoreList;
   FeatureBitset RealCalleeBits = CalleeBits & ~InlineFeatureIgnoreList;
-  return (RealCallerBits & RealCalleeBits) == RealCalleeBits;
+  if (RealCallerBits == RealCalleeBits)
+    return true;
+
+  // If the features are a subset, we need to additionally check for calls
+  // that may become ABI-incompatible as a result of inlining.
+  if ((RealCallerBits & RealCalleeBits) != RealCalleeBits)
+    return false;
+
+  for (const Instruction &I : instructions(Callee)) {
+    if (const auto *CB = dyn_cast<CallBase>(&I)) {
+      SmallVector<Type *, 8> Types;
+      for (Value *Arg : CB->args())
+        Types.push_back(Arg->getType());
+      if (!CB->getType()->isVoidTy())
+        Types.push_back(CB->getType());
+
+      // Simple types are always ABI compatible.
+      auto IsSimpleTy = [](Type *Ty) {
+        return !Ty->isVectorTy() && !Ty->isAggregateType();
+      };
+      if (all_of(Types, IsSimpleTy))
+        continue;
+
+      if (Function *NestedCallee = CB->getCalledFunction()) {
+        // Assume that intrinsics are always ABI compatible.
+        if (NestedCallee->isIntrinsic())
+          continue;
+
+        // Do a precise compatibility check.
+        if (!areTypesABICompatible(Caller, NestedCallee, Types))
+          return false;
+      } else {
+        // We don't know the target features of the callee,
+        // assume it is incompatible.
+        return false;
+      }
+    }
+  }
+  return true;
 }

 bool X86TTIImpl::areTypesABICompatible(const Function *Caller,
                                        const Function *Callee,
                                        const ArrayRef<Type *> &Types) const {
   if (!BaseT::areTypesABICompatible(Caller, Callee, Types))
     return false;

[526 lines not shown]

llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --function-signature
 ; RUN: opt < %s -mtriple=x86_64-unknown-linux-gnu -S -inline | FileCheck %s

 ; Test for PR52660.

 ; This call should not get inlined, because it would make the callee_not_avx
 ; call ABI incompatible.
-; TODO: Currently gets inlined.
 define void @caller_avx() "target-features"="+avx" {
 ; CHECK-LABEL: define {{[^@]+}}@caller_avx
 ; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @callee_not_avx(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+; CHECK-NEXT: call void @caller_not_avx()
 ; CHECK-NEXT: ret void
 ;
   call void @caller_not_avx()
   ret void
 }

 define internal void @caller_not_avx() {
+; CHECK-LABEL: define {{[^@]+}}@caller_not_avx() {
+; CHECK-NEXT: [[TMP1:%.*]] = call i64 @callee_not_avx(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+; CHECK-NEXT: ret void
+;
   call i64 @callee_not_avx(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
   ret void
 }

 define i64 @callee_not_avx(<4 x i64> %arg) noinline {
 ; CHECK-LABEL: define {{[^@]+}}@callee_not_avx
 ; CHECK-SAME: (<4 x i64> [[ARG:%.*]]) #[[ATTR1:[0-9]+]] {
 ; CHECK-NEXT: [[V:%.*]] = extractelement <4 x i64> [[ARG]], i64 2
 ; CHECK-NEXT: ret i64 [[V]]
 ;
   %v = extractelement <4 x i64> %arg, i64 2
   ret i64 %v
 }

 ; This call also shouldn't be inlined, as we don't know whether callee_unknown
 ; is ABI compatible or not.
-; TODO: Currently gets inlined.
 define void @caller_avx2() "target-features"="+avx" {
 ; CHECK-LABEL: define {{[^@]+}}@caller_avx2
 ; CHECK-SAME: () #[[ATTR0]] {
-; CHECK-NEXT: [[TMP1:%.*]] = call i64 @callee_unknown(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+; CHECK-NEXT: call void @caller_not_avx2()
 ; CHECK-NEXT: ret void
 ;
   call void @caller_not_avx2()
   ret void
 }

 define internal void @caller_not_avx2() {
+; CHECK-LABEL: define {{[^@]+}}@caller_not_avx2() {
+; CHECK-NEXT: [[TMP1:%.*]] = call i64 @callee_unknown(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
+; CHECK-NEXT: ret void
+;
   call i64 @callee_unknown(<4 x i64> <i64 0, i64 1, i64 2, i64 3>)
   ret void
 }

 declare i64 @callee_unknown(<4 x i64>)

 ; This call should get inlined, because we assume that intrinsics are always
 ; ABI compatible.
[34 lines not shown]