This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Ignore target-cpu and -features for inling
ClosedPublic

Authored by Hahnfeld on Jun 3 2018, 12:14 PM.

Details

Summary

We don't want to prevent inlining because of target-cpu and -features
attributes that were added to newer versions of LLVM/Clang: There are
no incompatible functions in PTX, ptxas will throw errors in such cases.

Diff Detail

Repository
rL LLVM

Event Timeline

Hahnfeld created this revision.Jun 3 2018, 12:14 PM
hfinkel added a subscriber: hfinkel.Jun 3 2018, 2:32 PM
hfinkel added inline comments.
lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66 ↗(On Diff #149655)

"We can ignore potential problems in inlining because the assembler will generate errors later if we do the wrong thing" is probably not what we mean. Obviously the point of this function is to avoid inlining when that might cause a problem later (either in tools, such as the assembler, or at runtime).

My understanding, from the ongoing discussions, is that PTX is backward compatible, and ptxas will generate code for the underlying target for all PTX uniformly during any given compilation (and, thus, regardless of what the attributes say, the generated machine code will use features from the most-recent specified target. This might not be completely true (i.e., ptxas might still generate code in light of different legalization decisions), but it might be true enough to be the desired behavior.

tra accepted this revision.Jun 5 2018, 11:51 AM
tra added inline comments.
lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66 ↗(On Diff #149655)

We're talking about two issues here:

  • what should be done with functions that have target-cpu or target-feature that's not compatible with the current compilation mode?

Right now the assumption is that the IR we're given is valid (as in - compilable) for the given compilation mode. At the very least, I can successfully compile functions with target-cpu=sm_70 with llc -march=aarch64.

If that's not the case, results may vary from success to partial success (suboptimal code due to some instructions being unavailable), to a failure during compilation (unavailable intrinsics) to PTX or, finally, a failure during compilation of PTX->SASS (e.g. invalid instruction in inline assembly)). I'm not aware of a good way to deal with incompatible IR in LLVM and that's not NVPTX-specific.

  • What inlining restrictions should we impose on the functions we deemed acceptable for compilation?

If the assumption above is true, then we have no reason (in NVPTX) to prevent linkining based on these attributes, because every one of them is compilable, which in turn means that it's executable on the target GPU. The goal of preventing inlining here was to avoid executing incompatible code.

Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

This revision is now accepted and ready to land.Jun 5 2018, 11:51 AM
Hahnfeld added inline comments.Jun 5 2018, 12:42 PM
lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66 ↗(On Diff #149655)

(To be honest I don't understand the point of areInlineCompatible at all: Either that function can be compiled in which case it doesn't matter if it is called or inlined, or it is invalid which will result in an error...)

tra added inline comments.Jun 5 2018, 1:07 PM
lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66 ↗(On Diff #149655)

My understanding is that it's needed to support function multiversioning in clang which will only work if functions are not inlined:
https://clang.llvm.org/docs/AttributeReference.html#target-gnu-target https://gcc.gnu.org/wiki/FunctionMultiVersioning

hfinkel added inline comments.Jun 6 2018, 5:40 AM
lib/Target/NVPTX/NVPTXTargetTransformInfo.h
66 ↗(On Diff #149655)

(To be honest I don't understand the point of areInlineCompatible at all: Either that function can be compiled in which case it doesn't matter if it is called or inlined, or it is invalid which will result in an error...)

Because you might have:

foo.c:
void foo() {
  // some basic implementation
}

foo_avx.c, compiled with -mavx
void foo_avx() {
  // do things with some axv intrinsics
}

main.c:
...
if (cpu_has_avx())
  foo_avx();
else
  foo();

now compile with LTO enabled. You need to prevent the inlining of foo_avx into the caller in main or else you'll lose the avx codegen (which is an optimization problem, and could, moreover, lead to a compilation failure if intrinsics are used that won't work if the target feature is not enabled).

Is there a sense in which the same is true for PTX? Are there intrinsics that depend on target features?

Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

In summary, I think that's okay so long as there aren't intrinsics that depend on target features.

Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

In summary, I think that's okay so long as there aren't intrinsics that depend on target features.

Does this mean you are ok with landing this patch?

tra added a comment.Jun 12 2018, 9:57 AM

Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

In summary, I think that's okay so long as there aren't intrinsics that depend on target features.

There are intrinsics that have *minimum* requirements for GPU variant (SM_XX) and PTX version.
So it is possible to generate instructions for a wrong GPU variant which will cause PTXAS to fail.
AFAICT, we have no good way to implement intended semantics of target-gpu and target-feature in NVPTX because it just does not allow mixing instructions from different GPU variants in the same TU. Well, you can have *older* SM/PTX entities in the IR, but all of them will be compiled for the most recent variant. IMO we can either ignore the attributes (and leave PTXAS deal with potentially invalid instructions) or error out during compilation (which LLVM does not handle particularly well, either).

Whether inline functions with mismatched attributes is actually orthogonal to that. It's the presence of the mismatched attributes that's the problem. Whether we inline those functions or not does not change the situation much. I.e. if mismatched function requires older SM/PTX, it will compile and work just fine, whether it's inlined or not. If the function requires newer SM/PTX, it will fail PTXAS compilation if not inlined and may or may not fail is it is inlined as it may end up not generating anything that would need new functionality. In the end inlining mismatched functions in NVPTX does not make the situation worse than it would be if inlining was not allowed.

The issue with handling target-cpu and target-features in NVPTX in principle does remain open. This patch should let things work the way they did before my clang patch ended up generating the attributes and exposed this issue.

The description you had is basically how and why this stuff works. Just
switch "older cpu" for "minimum gpu" and I think all of this should still
make sense?

tra added a comment.Jun 12 2018, 5:33 PM

I've discussed this with @echristo . Bottom line is that this patch is OK to proceed, and that I'm on the hook to add a NVPTX-specific pass to enforce that all target-cpu and and target-feature attributes in the TU are compatible with the current compilation parameters. I.e. if we're compiling for sm_50, then IR with functions that have target-cpu=sm_60 will result in an error.

This revision was automatically updated to reflect the committed changes.