This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Upgrade linked bitcode to enable inlining
AbandonedPublic

Authored by Hahnfeld on May 18 2018, 8:01 AM.

Details

Reviewers
tra
Summary

Revision rC329829 added the GPU architecture to "target-features". This
prevents inlining of previously generated bitcode because the
feature sets no longer match. Thus, duplicate the information from
"target-cpu" into "target-features" when upgrading linked bitcode, to avoid
writing special cases into the inlining analysis.

I'm not sure this will save us in the long term, because inlining
will break again whenever we add new features. Additionally, using a later
CUDA version might raise the PTX version, which is also modeled as a feature...
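
For illustration, a minimal sketch of the idea behind the patch, assuming the upgrade runs over a freshly linked module (the function name and placement are hypothetical, not the actual diff):

    // Hypothetical sketch: append the GPU arch from "target-cpu" (e.g. "sm_60")
    // to "target-features" on functions that lack it, so caller and callee end
    // up with matching feature sets and the inliner's compatibility check passes.
    #include "llvm/ADT/StringRef.h"
    #include "llvm/ADT/Twine.h"
    #include "llvm/IR/Function.h"
    #include "llvm/IR/Module.h"

    static void upgradeNVPTXTargetFeatures(llvm::Module &M) {
      for (llvm::Function &F : M) {
        if (!F.hasFnAttribute("target-cpu"))
          continue;
        llvm::StringRef CPU = F.getFnAttribute("target-cpu").getValueAsString();
        llvm::StringRef Features =
            F.hasFnAttribute("target-features")
                ? F.getFnAttribute("target-features").getValueAsString()
                : "";
        if (Features.contains("+" + CPU.str()))
          continue; // already lists the GPU arch as a feature
        F.addFnAttr("target-features", Features.empty()
                                           ? ("+" + CPU).str()
                                           : (Features + ",+" + CPU).str());
      }
    }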

Diff Detail

Event Timeline

Hahnfeld created this revision. May 18 2018, 8:01 AM
jlebar removed a reviewer: jlebar. May 18 2018, 10:54 AM
jlebar added a subscriber: jlebar.

I defer to Art on this one.

tra added a subscriber: echristo. May 18 2018, 11:03 AM

This was not intended. :-( I was unaware that GetCPUAndFeaturesAttributes() would add any feature that looks like a valid CPU name to the target-cpu attribute.
All I needed was to make builtins available or not. Setting them as function attributes is not what we need here.

I'm not sure what the best way to deal with this is. On one hand, I do need to make some builtins available depending on the combination of GPU arch and PTX version, and the only way to do that is via the features. On the other hand, the features appear to propagate to LLVM IR, which is something I don't need or want.

One way would be to introduce some sort of feature blacklist that would prevent those features from being converted to function attributes.
Or, perhaps, we could change TARGET_BUILTIN or create something similar that would control the availability of builtins without relying on features.

As a short-term fix we can disable feature-to-function attribute propagation for NVPTX until we fix it.

@echristo -- any other suggestions?
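
For reference, builtin availability in clang is tied to target features through the last argument of the TARGET_BUILTIN macro (BuiltinsNVPTX.def). The entry below is illustrative only: the builtin name and prototype are placeholders, and the feature string assumes the post-rC329829 naming such as "ptx60":

    // Illustrative placeholder entry, not a real line from BuiltinsNVPTX.def.
    // The fourth argument names the target feature required for the builtin to
    // be available; after rC329829 the GPU arch and PTX version are modeled as
    // such features, which is what makes them propagate into "target-features".
    TARGET_BUILTIN(__nvvm_example_sync_op, "UiUiUi", "n", "ptx60")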

Hahnfeld added a comment (edited). May 18 2018, 2:20 PM

I think that's intended because the generated code might use instructions based on that feature. If we want to ignore that, we could override TargetTransformInfo::areInlineCompatible for NVPTX to only compare target-cpu - but I'm not sure if that is wise...
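
For context, the check that blocks the inlining is (roughly) the default TargetTransformInfoImplBase::areInlineCompatible, which requires both string attributes to match exactly:

    // Approximate default implementation in TargetTransformInfoImpl.h: a callee
    // is only considered inline-compatible if both "target-cpu" and
    // "target-features" are identical to the caller's, so bitcode generated
    // before rC329829 (without the new features) no longer gets inlined.
    bool areInlineCompatible(const Function *Caller,
                             const Function *Callee) const {
      return (Caller->getFnAttribute("target-cpu") ==
              Callee->getFnAttribute("target-cpu")) &&
             (Caller->getFnAttribute("target-features") ==
              Callee->getFnAttribute("target-features"));
    }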

Looks like this was added as a "temporary solution" in D8984. Meanwhile the attribute whitelist was merged half a year later (D7802), so maybe we can just get rid of comparing target-cpu and target-features entirely?

Looks like this was added as a "temporary solution" in D8984. Meanwhile the attribute whitelist was merged half a year later (D7802), so maybe we can just get rid of comparing target-cpu and target-features entirely?

You don't want to get rid of the comparison because you might have specific code that can't be inlined from one into another.

In D47070#1104846, @tra wrote:

This was not intended. :-( I was unaware that GetCPUAndFeaturesAttributes() would add any feature that looks like a valid CPU name to the target-cpu attribute.
All I needed was to make builtins available or not. Setting them as function attributes is not what we need here.

I'm not sure what the best way to deal with this is. On one hand, I do need to make some builtins available depending on the combination of GPU arch and PTX version, and the only way to do that is via the features. On the other hand, the features appear to propagate to LLVM IR, which is something I don't need or want.

One way would be to introduce some sort of feature blacklist that would prevent those features from being converted to function attributes.
Or, perhaps, we could change TARGET_BUILTIN or create something similar that would control the availability of builtins without relying on features.

As a short-term fix we can disable feature-to-function attribute propagation for NVPTX until we fix it.

@echristo -- any other suggestions?

This is some of what I was talking about when I mentioned how function attributes and targets work. Ideally you'll have a compatible set of features and it won't really cause an issue. The idea is that if you're compiling for a minimum ptx feature of X, then any "compatible" set of ptx should be able to inline into your code. I think you do want the features to propagate in general; specific use cases may not care one way or another - that said, for those use cases you're probably just compiling everything with the same features anyhow.

I guess, ultimately, I'm not seeing what the concern here is for how features are working or not working for the target so it's harder to help. What is the problem you're running into, or can you try a different way of explaining it to me? :)

tra added a comment (edited). May 22 2018, 4:09 PM

As a short-term fix we can disable feature-to-function attribute propagation for NVPTX until we fix it.

@echristo -- any other suggestions?

This is some of what I was talking about when I mentioned how function attributes and targets work. Ideally you'll have a compatible set of features and it won't really cause an issue. The idea is that if you're compiling for a minimum ptx feature of X, then any "compatible" set of ptx should be able to inline into your code. I think you do want the features to propagate in general; specific use cases may not care one way or another - that said, for those use cases you're probably just compiling everything with the same features anyhow.

The thing is that with NVPTX you cannot have incompatible functions in the PTX, period. PTXAS will just throw syntax errors at you. In that regard PTX is very different from Intel, where in the same binary you can have different functions with code for different x86 variants. For PTX, sm_50 and sm_60 mean entirely different GPUs with entirely different instruction sets/encodings. The PTX version is an approximation of a language dialect: you cannot use anything from PTX 4.0 if your file says it's PTX 3.0. It's sort of like not being able to use C++17 features when you're compiling in C++98 mode. The bottom line is that features and target-cpu do not make much sense for NVPTX. Everything we generate in a TU must satisfy a minimum PTX version and a minimum GPU variant, and it will all be compiled for and run on only one specific GPU. There's no mixing and matching.

The question is -- what's the best way to make things work as they were before I broke them?
@Hahnfeld's idea of ignoring features and target-cpu would get us there, but that may be a never-ending source of surprises if/when something else decides to pay attention to those attributes.
I think a better way to deal with the problem would be to
a) figure out how to make builtins available or not on the clang side, and
b) make the target-cpu and target-features attributes explicitly unsupported on NVPTX, as we cannot provide the functionality those attributes imply.

I guess, ultimately, I'm not seeing what the concern here is for how features are working or not working for the target so it's harder to help. What is the problem you're running into, or can you try a different way of explaining it to me? :)

Here's my understanding of what happens:
We've started adding target-features and target-cpu to everything clang generates.
We also need to link with libdevice, or with IR generated by clang, which has functions without those attributes. Or we need to link with IR produced by a clang that used a different CUDA SDK and thus a different PTX version in target-features.
Due to the attribute mismatch we fail to inline some of those functions, and that hurts performance.

In D47070#1108803, @tra wrote:

Here's my understanding of what happens:
We've started adding target-features and target-cpu to everything clang generates.
We also need to link with libdevice, or with IR generated by clang, which has functions without those attributes. Or we need to link with IR produced by a clang that used a different CUDA SDK and thus a different PTX version in target-features.
Due to the attribute mismatch we fail to inline some of those functions, and that hurts performance.

In the case of OpenMP we are linking runtime functions from a bitcode library so that Clang can inline them. This dramatically improves performance, so I'm really interested in making this work again with libraries compiled by older versions of Clang.

Is there a viable path forward? Should I put up a patch that just ignores all target-features in LLVM?

tra added a comment. Jun 1 2018, 10:42 AM

IMO, overriding TargetTransformInfo::areInlineCompatible to always return true on NVPTX is what we want to do instead of upgrading everything else.
AFAICT, on NVPTX there's no reason to prevent inlining due to those attributes -- we'll never generate code for, nor ever execute it on, any GPU other than the one we're currently compiling for.
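
A minimal sketch of that override, assuming it lives in NVPTXTTIImpl (NVPTXTargetTransformInfo.h):

    // Sketch of the suggested NVPTX override: everything in a module is
    // compiled for, and executed on, exactly one GPU, so attribute differences
    // are no reason to block inlining on this target.
    bool areInlineCompatible(const Function *Caller,
                             const Function *Callee) const {
      return true;
    }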

This should get you going until I figure out how to have target-specific builtins without sticking target-cpu and target-features attributes on everything.

Hahnfeld abandoned this revision. Jun 3 2018, 12:15 PM

Superseded by D47691