This is an archive of the discontinued LLVM Phabricator instance.

[Inliner] Teach inliner to merge 'min-legal-vector-width' function attribute
ClosedPublic

Authored by craig.topper on Jul 10 2018, 5:08 PM.

Details

Reviewers

Commits

rG1d504f777e10: [Inliner] Teach inliner to merge 'min-legal-vector-width' function attribute
rL337844: [Inliner] Teach inliner to merge 'min-legal-vector-width' function attribute

Summary

When we inline a function with a min-legal-vector-width attribute we need to make sure the caller also ends up with at least that vector width.

In the future we may want to have heuristics to block inlining for different vector widths possibly with another attribute, but we haven't defined that yet.

I've based this entirely on the stack-probe-size merging code.
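The merge rule described above amounts to taking the maximum of the two widths. A minimal standalone sketch follows, assuming a plain `std::map` as a stand-in for a function's string attributes. `FnAttrs`, `getMinLegalVectorWidth`, and `mergeMinLegalVectorWidth` are hypothetical names for illustration and are not LLVM APIs; the patch itself works on `llvm::Function`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical stand-in for a function's string attributes (not an LLVM type).
using FnAttrs = std::map<std::string, std::string>;

static const char *const WidthKey = "min-legal-vector-width";

// Returns true and sets Width if Attrs carries a value that parses fully as
// a base-10 integer.
bool getMinLegalVectorWidth(const FnAttrs &Attrs, std::uint64_t &Width) {
  auto It = Attrs.find(WidthKey);
  if (It == Attrs.end() || It->second.empty())
    return false;
  char *End = nullptr;
  Width = std::strtoull(It->second.c_str(), &End, 10);
  return *End == '\0';
}

// After Callee has been inlined into Caller, make sure Caller ends up with
// at least Callee's width.
void mergeMinLegalVectorWidth(FnAttrs &Caller, const FnAttrs &Callee) {
  std::uint64_t CalleeWidth = 0;
  if (!getMinLegalVectorWidth(Callee, CalleeWidth))
    return; // The callee imposes no requirement.

  std::uint64_t CallerWidth = 0;
  if (!getMinLegalVectorWidth(Caller, CallerWidth) ||
      CallerWidth < CalleeWidth)
    Caller[WidthKey] = Callee.at(WidthKey);

  // Postcondition: the caller now satisfies the callee's requirement.
  bool Ok = getMinLegalVectorWidth(Caller, CallerWidth);
  assert(Ok && CallerWidth >= CalleeWidth);
  (void)Ok;
}
```

Merging a "512" callee into a caller with no attribute, or into a caller with "128", leaves the caller at "512". Merging a "128" callee into a "512" caller leaves the caller unchanged. These are the three cases the test file in this revision exercises.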

Diff Detail

Event Timeline

craig.topper created this revision. Jul 10 2018, 5:08 PM

Herald added a subscriber: eraman. Jul 10 2018, 5:08 PM

Ping

chandlerc added inline comments. Jul 23 2018, 4:26 PM

lib/IR/Attributes.cpp
1685–1686	I feel like we're going to want to do something a bit more nuanced than this... For example, consider a function doing dynamic dispatch based on CPUID detection. It will look like:

    void do_algo() {
      if (has_feature_X())
        do_algo_with_X();
      else if (has_feature_Y())
        do_algo_with_Y();
      else
        do_algo_generic();
    }

I don't think we're going to want to promote the min legal width of this wrapper to be the largest of all the things it calls, even if they are viable for inlining....

Right now, these usually are subtarget selecting and I think we block inlining in that case. But now that we can talk about vector width, I could imagine the above selecting a 256-bit algorithm when running on a Skylake CPU, but a 128-bit algorithm when running on older CPUs, and not needing any target features to differ between the two. Just the vector min length.

If we need a heuristic, the one I would suggest goes along the lines of:

1. Explicit attributes always win, we don't adjust them.
2. An implicit attribute can be promoted iff the callee post dominates the entry of the caller. That is, the callee is not predicated in some way that might select one callee instead of another.

Would #2 still be too restrictive for the use cases you have in mind?

craig.topper added inline comments. Jul 23 2018, 4:49 PM

lib/IR/Attributes.cpp
1685–1686	This is the code that gets called after the inlining decision has been made, right? My immediate goal was just to get always_inline to propagate correctly so the intrinsics get propagated. Heuristics would need to go into something like TTI::areInlineCompatible, but we don't have a CallSite there for #2.

chandlerc added inline comments. Jul 23 2018, 5:12 PM

lib/IR/Attributes.cpp
1685–1686	Ok, that makes more sense. Maybe some comments here explaining that while this may not be desirable, there is a pretty clear semantic transformation to merge the two attributes? (Also, the always_inline thing is a great "0" in my heuristics above! =] So hopefully it will be easy to at least write an initial cut at TTI::areInlineCompatible that gets past basic sanity by inlining intrinsics and such w/o pushing much further.)
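The heuristic discussed in this thread was not implemented in this revision. As a rough sketch only, the three rules (always_inline as rule "0", then the two numbered rules) could be written as a predicate like the one below. All names here (`CallerWidth`, `CallSiteFacts`, `mayPromoteCallerWidth`, `widthAfterInlining`) are hypothetical. Whether the call post-dominates the caller's entry is taken as a precomputed input, since the thread notes that no CallSite is available where such a check would live. Placing rule 0 ahead of rule 1 is one reading of the comments, not something the thread settles.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical summary of the caller's current attribute.
struct CallerWidth {
  std::uint64_t Bits; // current "min-legal-vector-width" value
  bool IsExplicit;    // true if the attribute was written explicitly
};

// Hypothetical facts about the call site being inlined.
struct CallSiteFacts {
  std::uint64_t CalleeBits;  // callee's "min-legal-vector-width" value
  bool CalleeIsAlwaysInline; // rule 0: callee is marked always_inline
  bool PostDominatesEntry;   // rule 2: the call runs on every path from entry
};

// Decides whether the caller's width may be raised to the callee's.
bool mayPromoteCallerWidth(const CallerWidth &Caller,
                           const CallSiteFacts &Site) {
  if (Site.CalleeIsAlwaysInline)
    return true;                  // Rule 0: always_inline always propagates.
  if (Caller.IsExplicit)
    return false;                 // Rule 1: explicit attributes always win.
  return Site.PostDominatesEntry; // Rule 2: only unpredicated callees promote.
}

// Width the caller would carry after inlining this call site.
std::uint64_t widthAfterInlining(const CallerWidth &Caller,
                                 const CallSiteFacts &Site) {
  std::uint64_t Result = Caller.Bits;
  if (mayPromoteCallerWidth(Caller, Site))
    Result = std::max(Caller.Bits, Site.CalleeBits);
  assert(Result >= Caller.Bits && "promotion never narrows the caller");
  return Result;
}
```

Under this sketch, the `do_algo` dispatcher from the earlier comment would keep its own width: each specialized callee sits behind a feature check, so none of the calls post-dominates the entry.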

Improve comment.

As I mentioned on IRC always_inline doesn't go through inline cost analysis including attribute compatibility checking. So this patch is the minimum necessary to make intrinsics propagate their vector width to the function they get inlined into.
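To illustrate why the merge step is what matters for always_inline, here is a simplified model of the flow described in the comment above. It is a sketch under stated assumptions, not the inliner's actual structure: `Fn`, `CompatCheck`, and `inlineCall` are hypothetical names, and the compatibility check is passed in as a callback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical, heavily reduced model of a function.
struct Fn {
  bool AlwaysInline;
  std::uint64_t MinLegalVectorWidth;
};

// Hypothetical stand-in for the attribute compatibility check.
using CompatCheck = std::function<bool(const Fn &, const Fn &)>;

// Returns true if the call was inlined.
bool inlineCall(Fn &Caller, const Fn &Callee,
                const CompatCheck &AreCompatible) {
  // always_inline callees skip the compatibility check entirely.
  if (!Callee.AlwaysInline && !AreCompatible(Caller, Callee))
    return false;

  // The merge runs for every inlined call, however the decision was reached.
  Caller.MinLegalVectorWidth =
      std::max(Caller.MinLegalVectorWidth, Callee.MinLegalVectorWidth);
  assert(Caller.MinLegalVectorWidth >= Callee.MinLegalVectorWidth);
  return true;
}
```

In this model an always_inline callee with width 512 raises a width-128 caller to 512 even when the compatibility callback would reject the pair, which is the intrinsic case the patch targets.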

Add back the test case.

LGTM

This revision is now accepted and ready to land. Jul 24 2018, 10:58 AM

Closed by commit rL337844: [Inliner] Teach inliner to merge 'min-legal-vector-width' function attribute (authored by ctopper). Jul 24 2018, 11:49 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
include/llvm/IR/Attributes.td                              1 line
lib/IR/Attributes.cpp                                      23 lines
test/Transforms/Inline/inline-min-legal-vector-width.ll    29 lines

Diff 154904

include/llvm/IR/Attributes.td

 [...]
 def : MergeRule<"setAND<NoNansFPMathAttr>">;
 def : MergeRule<"setAND<UnsafeFPMathAttr>">;
 def : MergeRule<"setOR<NoImplicitFloatAttr>">;
 def : MergeRule<"setOR<NoJumpTablesAttr>">;
 def : MergeRule<"setOR<ProfileSampleAccurateAttr>">;
 def : MergeRule<"adjustCallerSSPLevel">;
 def : MergeRule<"adjustCallerStackProbes">;
 def : MergeRule<"adjustCallerStackProbeSize">;
+def : MergeRule<"adjustMinLegalVectorWidth">;

lib/IR/Attributes.cpp

 [...] if (Caller.hasFnAttribute("stack-probe-size")) {
         Caller.addFnAttr(Callee.getFnAttribute("stack-probe-size"));
       }
     } else {
       Caller.addFnAttr(Callee.getFnAttribute("stack-probe-size"));
     }
   }
 }

+/// If the inlined function defines a min legal vector width, then ensure
+/// the calling function has the same or larger min legal vector width.
+static void
+adjustMinLegalVectorWidth(Function &Caller, const Function &Callee) {
+  if (Callee.hasFnAttribute("min-legal-vector-width")) {
+    uint64_t CalleeVectorWidth;
+    Callee.getFnAttribute("min-legal-vector-width")
+          .getValueAsString()
+          .getAsInteger(0, CalleeVectorWidth);
+    if (Caller.hasFnAttribute("min-legal-vector-width")) {
+      uint64_t CallerVectorWidth;
+      Caller.getFnAttribute("min-legal-vector-width")
+            .getValueAsString()
+            .getAsInteger(0, CallerVectorWidth);
+      if (CallerVectorWidth < CalleeVectorWidth) {
+        Caller.addFnAttr(Callee.getFnAttribute("min-legal-vector-width"));
+      }
+    } else {
+      Caller.addFnAttr(Callee.getFnAttribute("min-legal-vector-width"));
+    }
+  }
+}

 #define GET_ATTR_COMPAT_FUNC
 #include "AttributesCompatFunc.inc"

 bool AttributeFuncs::areInlineCompatible(const Function &Caller,
                                          const Function &Callee) {
   return hasCompatibleFnAttrs(Caller, Callee);
 }

 void AttributeFuncs::mergeAttributesForInlining(Function &Caller,
                                                 const Function &Callee) {
   mergeFnAttrs(Caller, Callee);
 }

test/Transforms/Inline/inline-min-legal-vector-width.ll

This file was added.

; RUN: opt %s -inline -S | FileCheck %s

define internal void @innerSmall() "min-legal-vector-width"="128" {
  ret void
}

define internal void @innerLarge() "min-legal-vector-width"="512" {
  ret void
}

define void @outerNoAttribute() {
  call void @innerLarge()
  ret void
}

define void @outerConflictingAttributeSmall() "min-legal-vector-width"="128" {
  call void @innerLarge()
  ret void
}

define void @outerConflictingAttributeLarge() "min-legal-vector-width"="512" {
  call void @innerSmall()
  ret void
}

; CHECK: define void @outerNoAttribute() #0
; CHECK: define void @outerConflictingAttributeSmall() #0
; CHECK: define void @outerConflictingAttributeLarge() #0
; CHECK: attributes #0 = { "min-legal-vector-width"="512" }