This is an archive of the discontinued LLVM Phabricator instance.


Differential D157826

[X86] Allow inlining callees missing VLX feature
Needs Review · Public

Authored by kalcutter on Aug 13 2023, 2:13 PM.

Details

Reviewers

kazu
RKSimon
pengfei

Summary

This patch attempts to fix a regression caused by https://github.com/llvm/llvm-project/commit/d6f994acb3d545b80161e24ab742c9c69d4bbf33. In particular, always_inline should work when a function with VLX calls a callee without VLX. I found this while testing clang-17-rc2. If accepted, please also apply this to the LLVM 17.0.0 branch.

Event Timeline

kalcutter created this revision. Aug 13 2023, 2:13 PM

Herald added a project: Restricted Project. Aug 13 2023, 2:13 PM

Herald added a subscriber: hiraditya.

kalcutter requested review of this revision. Aug 13 2023, 2:13 PM

Herald added a subscriber: llvm-commits.

nikic added a subscriber: nikic. Aug 13 2023, 2:42 PM

nikic added inline comments.

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
6082	I assume that this is the actually failing check? In that case, should the adjustment be in that function?

kalcutter added inline comments. Aug 13 2023, 3:00 PM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
6082	I am not sure. That function checks that the used types are ABI compatible, but what about available target instructions? Is the idea to first check only the types, and then later during CodeGen the target does a more exhaustive check? Fixing areTypesABICompatible() would be more general, I guess. Do you have an idea how much work is involved in properly fixing that function? I am not familiar with this code base at all. Do you have anything against applying this patch as an incremental improvement?

kalcutter added inline comments. Aug 13 2023, 3:23 PM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
6082	I guess the feature subset check covers all cases of available target instructions. So just making sure all the types are compatible, like you suggested, would be the proper fix.

There is also another related issue: when areInlineCompatible() returns false here, a callee with the always_inline attribute is silently not inlined. It seems like clang should emit an error or at least a warning in this case. I only noticed this because my code was running 10x slower when compiled with clang 17.

kalcutter marked an inline comment as not done. Aug 13 2023, 3:38 PM

Harbormaster completed remote builds in B252231: Diff 549755. Aug 13 2023, 3:39 PM

pengfei added inline comments. Aug 13 2023, 4:20 PM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
6082	The function is designed to check whether both caller and callee allow ZMM codegen when passing a 512-bit vector type. ZMM codegen is allowed when the function:
(1) has AVX512F but not AVX512VL, or
(2) has AVX512F and `prefer-vector-width` >= 512, or
(3) has AVX512F and `min-legal-vector-width` > 256.
So whether AVX512VL is present does affect the ABI when `prefer-vector-width` and `min-legal-vector-width` are set to 256 or less, though that shouldn't happen since we guarantee they are set properly in the FE.
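The three conditions listed in this comment can be written down as a small standalone predicate. This is only a sketch of the rule as stated here, not the code in X86TargetTransformInfo.cpp: `FnConfig`, `allowsZmmCodegen`, and `agreeOn512BitPassing` are hypothetical names introduced for illustration, and the two width fields merely stand in for the `prefer-vector-width` and `min-legal-vector-width` function attributes.

```cpp
#include <cassert>

// Hypothetical model of the per-function settings named in the comment.
struct FnConfig {
  bool hasAVX512F;
  bool hasAVX512VL;
  unsigned preferVectorWidth;   // stands in for `prefer-vector-width`
  unsigned minLegalVectorWidth; // stands in for `min-legal-vector-width`
};

// True when at least one of conditions (1)-(3) holds.
bool allowsZmmCodegen(const FnConfig &F) {
  if (!F.hasAVX512F)
    return false;                        // all three conditions need AVX512F
  return !F.hasAVX512VL                  // (1) no AVX512VL
         || F.preferVectorWidth >= 512   // (2) prefers 512-bit vectors
         || F.minLegalVectorWidth > 256; // (3) wide vectors are required
}

// Two functions pass 512-bit vectors the same way only if both allow ZMM
// codegen or neither does.
bool agreeOn512BitPassing(const FnConfig &A, const FnConfig &B) {
  return allowsZmmCodegen(A) == allowsZmmCodegen(B);
}
```

Under this model, a function with AVX512F and AVX512VL whose two widths are both 256 or less does not allow ZMM codegen, while an otherwise identical function without AVX512VL does, which is the mismatch the comment describes.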

kalcutter added inline comments. Aug 14 2023, 12:52 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
6082	I am a bit confused. A function with both AVX512F and AVX512VL features obviously allows ZMM codegen in the function body. Why should enabling AVX512VL change how 512-bit vector types are passed? Even if that were the case for an opaque function, why should that disable inlining? Also, I don't see how (1)-(3) should ever affect `always_inline`.

pengfei added inline comments. Aug 14 2023, 6:34 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
6082	We have an option `-mprefer-vector-width=128/256`, which prefers 128/256-bit vector instructions under AVX512VL. In that situation, a 512-bit vector may be split into 256-bit parts when passed. Not sure if that is the reason we disable inlining.

kalcutter added inline comments. Aug 14 2023, 9:05 AM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
6058	Thinking about this more. Why shouldn't this function just end at this line? ISTM that only features should control if inlining is _possible_ or not. I don't see how checking for ABI compatibility between types is helpful. Once the callee gets inlined, it can take on the ABI of the caller anyway.

pengfei added inline comments. Aug 14 2023, 5:57 PM

llvm/lib/Target/X86/X86TargetTransformInfo.cpp
6058	This is fine when the caller has `FeatureVLX` but the callee doesn't, but maybe we need to check ABI compatibility in both directions. The reason we need to check types is that only vector types are affected by the feature difference.
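The order of the feature checks under discussion, including the early return this patch proposes, can be modeled with a plain `std::bitset`. This is a simplified sketch, not the LLVM implementation: the `Feature` indices, `Verdict`, `makeSet`, and `classify` are hypothetical names, and the masking with `InlineFeatureIgnoreList` is left out.

```cpp
#include <bitset>
#include <cassert>
#include <initializer_list>

// Hypothetical feature indices; the real code uses X86:: enumerators.
enum Feature { AVX512F = 0, VLX = 1, AVX2 = 2, NumFeatures = 8 };
using FeatureSet = std::bitset<NumFeatures>;

enum class Verdict { Compatible, Incompatible, NeedsAbiScan };

// Build a feature set from a list of feature indices.
FeatureSet makeSet(std::initializer_list<Feature> Features) {
  FeatureSet S;
  for (Feature F : Features)
    S.set(F);
  return S;
}

// Mirrors the order of the checks in the diff under review.
Verdict classify(const FeatureSet &Caller, const FeatureSet &Callee) {
  if (Caller == Callee)
    return Verdict::Compatible;
  // Proposed by this patch: a callee that is only missing VLX is compatible.
  if (Caller == (Callee | makeSet({VLX})))
    return Verdict::Compatible;
  // The callee uses a feature the caller lacks: never compatible.
  if ((Caller & Callee) != Callee)
    return Verdict::Incompatible;
  // Callee features are a strict subset: the calls inside the callee still
  // need the per-call ABI check before inlining is allowed.
  return Verdict::NeedsAbiScan;
}
```

In this model, a caller with AVX512F and VLX and a callee with only AVX512F is classified as compatible without reaching the per-call scan. The reverse direction, raised in the last comment, is classified as incompatible by the subset check.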

Matt added a subscriber: Matt. Aug 16 2023, 1:16 PM

Revision Contents

Path	Size
llvm/lib/Target/X86/X86TargetTransformInfo.cpp	4 lines
llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll	18 lines

Diff 549755

llvm/lib/Target/X86/X86TargetTransformInfo.cpp

   const FeatureBitset &CalleeBits =
       TM.getSubtargetImpl(*Callee)->getFeatureBits();

   // Check whether features are the same (apart from the ignore list).
   FeatureBitset RealCallerBits = CallerBits & ~InlineFeatureIgnoreList;
   FeatureBitset RealCalleeBits = CalleeBits & ~InlineFeatureIgnoreList;
   if (RealCallerBits == RealCalleeBits)
     return true;

+  // If the callee is only missing VLX, they are compatible.
+  if (RealCallerBits == (RealCalleeBits | FeatureBitset{X86::FeatureVLX}))
+    return true;
+
   // If the features are a subset, we need to additionally check for calls
   // that may become ABI-incompatible as a result of inlining.
   if ((RealCallerBits & RealCalleeBits) != RealCalleeBits)
     return false;

   for (const Instruction &I : instructions(Callee)) {
     if (const auto *CB = dyn_cast<CallBase>(&I)) {
       SmallVector<Type *, 8> Types;
       for (Value *Arg : CB->args())
         Types.push_back(Arg->getType());
       if (!CB->getType()->isVoidTy())
         Types.push_back(CB->getType());

       // Simple types are always ABI compatible.
       auto IsSimpleTy = [](Type *Ty) {
         return !Ty->isVectorTy() && !Ty->isAggregateType();
       };
       if (all_of(Types, IsSimpleTy))
         continue;

       if (Function *NestedCallee = CB->getCalledFunction()) {
         // Assume that intrinsics are always ABI compatible.
         if (NestedCallee->isIntrinsic())
           continue;

         // Do a precise compatibility check.
         if (!areTypesABICompatible(Caller, NestedCallee, Types))
           return false;
       } else {
         // We don't know the target features of the callee,
         // assume it is incompatible.
         return false;
       }
     }
   }

llvm/test/Transforms/Inline/X86/call-abi-compatibility.ll

 }

 define internal void @caller_not_avx4() {
   call i64 @caller_unknown_simple(i64 0)
   ret void
 }

 declare i64 @caller_unknown_simple(i64)

+
+; This call should get inlined, because the callee is only missing VLX.
+define void @caller_vlx() "target-features"="+avx512f,+avx512vl" {
+; CHECK-LABEL: define {{[^@]+}}@caller_vlx
+; CHECK-SAME: () #[[ATTR2:[0-9]+]] {
+; CHECK-NEXT: call void @callee_not_vlx(<8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>)
+; CHECK-NEXT: ret void
+;
+  call void @caller_not_vlx(<8 x i64> <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7>)
+  ret void
+}
+
+define internal void @caller_not_vlx(<8 x i64> %arg) "target-features"="+avx512f" {
+  call void @callee_not_vlx(<8 x i64> %arg)
+  ret void
+}
+
+declare void @callee_not_vlx(<8 x i64>)