This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/Target/NVPTX/
-
Target/
-
NVPTX/
5
NVPTXTargetTransformInfo.h

Differential D47691

[NVPTX] Ignore target-cpu and -features for inling
ClosedPublic

Authored by Hahnfeld on Jun 3 2018, 12:14 PM.

Download Raw Diff

Details

Reviewers

tra
jlebar
hfinkel
echristo

Commits

rGc7410ed47aeb: [NVPTX] Ignore target-cpu and -features for inlining
rL334904: [NVPTX] Ignore target-cpu and -features for inlining

Summary

We don't want to prevent inlining because of target-cpu and -features
attributes that were added to newer versions of LLVM/Clang: There are
no incompatible functions in PTX, ptxas will throw errors in such cases.

Diff Detail

Event Timeline

Hahnfeld created this revision.Jun 3 2018, 12:14 PM

Herald added subscribers: llvm-commits, jholewinski. · View Herald TranscriptJun 3 2018, 12:14 PM

Hahnfeld mentioned this in D47070: [CUDA] Upgrade linked bitcode to enable inlining.Jun 3 2018, 12:15 PM

hfinkel added a subscriber: hfinkel.Jun 3 2018, 2:32 PM

hfinkel added inline comments.

lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66	"We can ignore potential problems in inlining because the assembler will generate errors later if we do the wrong thing" is probably not what we mean. Obviously the point of this function is to avoid inlining when that might cause a problem later (either in tools, such as the assembler, or at runtime). My understanding, from the ongoing discussions, is that PTX is backward compatible, and ptxas will generate code for the underlying target for all PTX uniformly during any given compilation (and, thus, regardless of what the attributes say, the generated machine code will use features from the most-recent specified target. This might not be completely true (i.e., ptxas might still generate code in light of different legalization decisions), but it might be true enough to be the desired behavior.

tra accepted this revision.Jun 5 2018, 11:51 AM

tra added inline comments.

lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66	We're talking about two issues here: what should be done with functions that have target-cpu or target-feature that's not compatible with the current compilation mode? Right now the assumption is that the IR we're given is valid (as in - compilable) for the given compilation mode. At the very least, I can successfully compile functions with target-cpu=sm_70 with llc -march=aarch64. If that's not the case, results may vary from success to partial success (suboptimal code due to some instructions being unavailable), to a failure during compilation (unavailable intrinsics) to PTX or, finally, a failure during compilation of PTX->SASS (e.g. invalid instruction in inline assembly)). I'm not aware of a good way to deal with incompatible IR in LLVM and that's not NVPTX-specific. What inlining restrictions should we impose on the functions we deemed acceptable for compilation? If the assumption above is true, then we have no reason (in NVPTX) to prevent linkining based on these attributes, because every one of them is compilable, which in turn means that it's executable on the target GPU. The goal of preventing inlining here was to avoid executing incompatible code. Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

This revision is now accepted and ready to land.Jun 5 2018, 11:51 AM

Hahnfeld added inline comments.Jun 5 2018, 12:42 PM

lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66	(To be honest I don't understand the point of `areInlineCompatible` at all: Either that function can be compiled in which case it doesn't matter if it is called or inlined, or it is invalid which will result in an error...)

tra added inline comments.Jun 5 2018, 1:07 PM

lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66	My understanding is that it's needed to support function multiversioning in clang which will only work if functions are not inlined: https://clang.llvm.org/docs/AttributeReference.html#target-gnu-target https://gcc.gnu.org/wiki/FunctionMultiVersioning

hfinkel added inline comments.Jun 6 2018, 5:40 AM

lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66	(To be honest I don't understand the point of areInlineCompatible at all: Either that function can be compiled in which case it doesn't matter if it is called or inlined, or it is invalid which will result in an error...) Because you might have: foo.c: void foo() { // some basic implementation } foo_avx.c, compiled with -mavx void foo_avx() { // do things with some axv intrinsics } main.c: ... if (cpu_has_avx()) foo_avx(); else foo(); now compile with LTO enabled. You need to prevent the inlining of foo_avx into the caller in main or else you'll lose the avx codegen (which is an optimization problem, and could, moreover, lead to a compilation failure if intrinsics are used that won't work if the target feature is not enabled). Is there a sense in which the same is true for PTX? Are there intrinsics that depend on target features?

Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

In summary, I think that's okay so long as there aren't intrinsics that depend on target features.

echristo added reviewers: hfinkel, echristo.Jun 6 2018, 5:46 PM

In D47691#1123513, @hfinkel wrote:

Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

In summary, I think that's okay so long as there aren't intrinsics that depend on target features.

Does this mean you are ok with landing this patch?

In D47691#1123513, @hfinkel wrote:

Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

In summary, I think that's okay so long as there aren't intrinsics that depend on target features.

There are intrinsics that have *minimum* requirements for GPU variant (SM_XX) and PTX version.
So it is possible to generate instructions for a wrong GPU variant which will cause PTXAS to fail.
AFAICT, we have no good way to implement intended semantics of target-gpu and target-feature in NVPTX because it just does not allow mixing instructions from different GPU variants in the same TU. Well, you can have *older* SM/PTX entities in the IR, but all of them will be compiled for the most recent variant. IMO we can either ignore the attributes (and leave PTXAS deal with potentially invalid instructions) or error out during compilation (which LLVM does not handle particularly well, either).

Whether inline functions with mismatched attributes is actually orthogonal to that. It's the presence of the mismatched attributes that's the problem. Whether we inline those functions or not does not change the situation much. I.e. if mismatched function requires older SM/PTX, it will compile and work just fine, whether it's inlined or not. If the function requires newer SM/PTX, it will fail PTXAS compilation if not inlined and may or may not fail is it is inlined as it may end up not generating anything that would need new functionality. In the end inlining mismatched functions in NVPTX does not make the situation worse than it would be if inlining was not allowed.

The issue with handling target-cpu and target-features in NVPTX in principle does remain open. This patch should let things work the way they did before my clang patch ended up generating the attributes and exposed this issue.

The description you had is basically how and why this stuff works. Just
switch "older cpu" for "minimum gpu" and I think all of this should still
make sense?

I've discussed this with @echristo . Bottom line is that this patch is OK to proceed, and that I'm on the hook to add a NVPTX-specific pass to enforce that all target-cpu and and target-feature attributes in the TU are compatible with the current compilation parameters. I.e. if we're compiling for sm_50, then IR with functions that have target-cpu=sm_60 will result in an error.

Closed by commit rL334904: [NVPTX] Ignore target-cpu and -features for inlining (authored by Hahnfeld). · Explain WhyJun 17 2018, 2:59 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

Target/

NVPTX/

NVPTXTargetTransformInfo.h

8 lines

Diff 149655

lib/Target/NVPTX/NVPTXTargetTransformInfo.h

Show First 20 Lines • Show All 55 Lines • ▼ Show 20 Lines	public:
// LoopVectorizer's unrolling heuristics.		// LoopVectorizer's unrolling heuristics.
unsigned getNumberOfRegisters(bool Vector) const { return 1; }		unsigned getNumberOfRegisters(bool Vector) const { return 1; }

// Only <2 x half> should be vectorized, so always return 32 for the vector		// Only <2 x half> should be vectorized, so always return 32 for the vector
// register size.		// register size.
unsigned getRegisterBitWidth(bool Vector) const { return 32; }		unsigned getRegisterBitWidth(bool Vector) const { return 32; }
unsigned getMinVectorRegisterBitWidth() const { return 32; }		unsigned getMinVectorRegisterBitWidth() const { return 32; }

		// We don't want to prevent inlining because of target-cpu and -features
		// attributes that were added to newer versions of LLVM/Clang: There are
		// no incompatible functions in PTX, ptxas will throw errors in such cases.
		hfinkelUnsubmitted Not Done Reply Inline Actions "We can ignore potential problems in inlining because the assembler will generate errors later if we do the wrong thing" is probably not what we mean. Obviously the point of this function is to avoid inlining when that might cause a problem later (either in tools, such as the assembler, or at runtime). My understanding, from the ongoing discussions, is that PTX is backward compatible, and ptxas will generate code for the underlying target for all PTX uniformly during any given compilation (and, thus, regardless of what the attributes say, the generated machine code will use features from the most-recent specified target. This might not be completely true (i.e., ptxas might still generate code in light of different legalization decisions), but it might be true enough to be the desired behavior. hfinkel: "We can ignore potential problems in inlining because the assembler will generate errors later…
		traUnsubmitted Not Done Reply Inline Actions We're talking about two issues here: what should be done with functions that have target-cpu or target-feature that's not compatible with the current compilation mode? Right now the assumption is that the IR we're given is valid (as in - compilable) for the given compilation mode. At the very least, I can successfully compile functions with target-cpu=sm_70 with llc -march=aarch64. If that's not the case, results may vary from success to partial success (suboptimal code due to some instructions being unavailable), to a failure during compilation (unavailable intrinsics) to PTX or, finally, a failure during compilation of PTX->SASS (e.g. invalid instruction in inline assembly)). I'm not aware of a good way to deal with incompatible IR in LLVM and that's not NVPTX-specific. What inlining restrictions should we impose on the functions we deemed acceptable for compilation? If the assumption above is true, then we have no reason (in NVPTX) to prevent linkining based on these attributes, because every one of them is compilable, which in turn means that it's executable on the target GPU. The goal of preventing inlining here was to avoid executing incompatible code. Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX. tra: We're talking about two issues here: * what should be done with functions that have target…
		HahnfeldAuthorUnsubmitted Not Done Reply Inline Actions (To be honest I don't understand the point of `areInlineCompatible` at all: Either that function can be compiled in which case it doesn't matter if it is called or inlined, or it is invalid which will result in an error...) Hahnfeld: (To be honest I don't understand the point of `areInlineCompatible` at all: Either that…
		traUnsubmitted Not Done Reply Inline Actions My understanding is that it's needed to support function multiversioning in clang which will only work if functions are not inlined: https://clang.llvm.org/docs/AttributeReference.html#target-gnu-target https://gcc.gnu.org/wiki/FunctionMultiVersioning tra: My understanding is that it's needed to support function multiversioning in clang which will…
		hfinkelUnsubmitted Not Done Reply Inline Actions (To be honest I don't understand the point of areInlineCompatible at all: Either that function can be compiled in which case it doesn't matter if it is called or inlined, or it is invalid which will result in an error...) Because you might have: foo.c: void foo() { // some basic implementation } foo_avx.c, compiled with -mavx void foo_avx() { // do things with some axv intrinsics } main.c: ... if (cpu_has_avx()) foo_avx(); else foo(); now compile with LTO enabled. You need to prevent the inlining of foo_avx into the caller in main or else you'll lose the avx codegen (which is an optimization problem, and could, moreover, lead to a compilation failure if intrinsics are used that won't work if the target feature is not enabled). Is there a sense in which the same is true for PTX? Are there intrinsics that depend on target features? hfinkel: > (To be honest I don't understand the point of areInlineCompatible at all: Either that…
		bool areInlineCompatible(const Function *Caller,
		const Function *Callee) const {
		return true;
		}

// Increase the inlining cost threshold by a factor of 5, reflecting that		// Increase the inlining cost threshold by a factor of 5, reflecting that
// calls are particularly expensive in NVPTX.		// calls are particularly expensive in NVPTX.
unsigned getInliningThresholdMultiplier() { return 5; }		unsigned getInliningThresholdMultiplier() { return 5; }

int getArithmeticInstrCost(		int getArithmeticInstrCost(
unsigned Opcode, Type *Ty,		unsigned Opcode, Type *Ty,
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,		TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
Show All 27 Lines