This is an archive of the discontinued LLVM Phabricator instance.

[CUDA][HIP] support __noinline__ as keyword
ClosedPublic

Authored by yaxunl on May 3 2022, 11:27 AM.

Details

Summary

CUDA/HIP programs use __noinline__ like a keyword e.g.
__noinline__ void foo() {} since __noinline__ is defined
as a macro __attribute__((noinline)) in CUDA/HIP runtime
header files.

However, gcc and clang supports __attribute__((__noinline__))
the same as __attribute__((noinline)). Some C++ libraries
use __attribute__((__noinline__)) in their header files.
When CUDA/HIP programs include such header files,
clang will emit error about invalid attributes.

This patch fixes this issue by supporting __noinline__ as
a keyword, so that CUDA/HIP runtime could remove
the macro definition.

Diff Detail

Event Timeline

yaxunl created this revision.May 3 2022, 11:27 AM
yaxunl requested review of this revision.May 3 2022, 11:27 AM
yaxunl updated this revision to Diff 426794.May 3 2022, 11:48 AM

add feature cuda_noinline_keyword to facilitate CUDA/HIP headers removing noinline macro

I don't know how language extensions come about in CUDA or HIP -- is there an appropriate standards body (or something similar) that's aware of this extension and supports it?

The changes should likely come with a release note entry about the new functionality, and some documentation changes as well.

clang/include/clang/Basic/Attr.td
1777–1780
clang/include/clang/Basic/Features.def
274

Do the CUDA or HIP specs define __noinline__ as a keyword specifically? If not, this isn't a FEATURE, it's an EXTENSION because it's specific to Clang, not the language standard.

clang/lib/Parse/ParseDecl.cpp
902

I think we should we be issuing a pedantic "this is a clang extension" warning here, WDYT?

clang/test/SemaCUDA/noinline.cu
9

I think there should also be a test like:

[[gnu::__noinline__]] void fun4() {}

to verify that the double square bracket syntax also correctly handles this being a keyword now (I expect the test to pass).

tra added a reviewer: rsmith.May 6 2022, 11:04 AM
tra added a subscriber: rsmith.

I don't know how language extensions come about in CUDA or HIP -- is there an appropriate standards body (or something similar) that's aware of this extension and supports it?

Summoning @rsmith for his language lawyer expertise.

yaxunl marked 4 inline comments as done.May 6 2022, 11:12 AM
yaxunl added inline comments.
clang/include/clang/Basic/Attr.td
1777–1780

will do

clang/include/clang/Basic/Features.def
274

CUDA/HIP do not have language spec. In their programming guide, they do not define __noinline__ as a keyword.

Will make it an extension.

clang/lib/Parse/ParseDecl.cpp
902

will do

clang/test/SemaCUDA/noinline.cu
9

will do

yaxunl updated this revision to Diff 427685.May 6 2022, 11:16 AM
yaxunl marked 4 inline comments as done.

revised by Aaron's comments

aaron.ballman added inline comments.May 6 2022, 11:23 AM
clang/include/clang/Basic/Features.def
274

CUDA/HIP do not have language spec.

Then what body of people governs changes to the language? Basically, I'm trying to understand whether this patch meets the community requirements for adding an extension: https://clang.llvm.org/get_involved.html#criteria, specifically #4 (though the rest of the points are worth keeping in mind). I don't want to Clang ending up stepping on toes by defining this extension only to accidentally frustrate the CUDA community.

yaxunl marked an inline comment as done.May 6 2022, 11:46 AM
yaxunl added inline comments.
clang/include/clang/Basic/Features.def
274

specific to __noinline__, it is largely determined by the existing behaviour of CUDA SDK.

The CUDA SDK defines __noinline__ as a macro __attribute__((noinline)). However, it is not compatible with some C++ headers which use __attribute__((__noinline__)).

This patch will not change the usage pattern of __noinline__. It is equivalent to the original behaviour with the benefit of being compatible with C++ headers.

yaxunl updated this revision to Diff 427700.May 6 2022, 11:48 AM
yaxunl marked an inline comment as done.

added release note and documentation

tra added a comment.May 6 2022, 12:07 PM

CUDA/HIP do not have language spec.

Well. It's not completely true. CUDA programming guide does serve as the de-facto spec for CUDA. It's far from perfect, but it does mention __noinline__ and __forceinline__ as function qualifiers: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#noinline-and-forceinline

CUDA/HIP do not have language spec.

Well. It's not completely true. CUDA programming guide does serve as the de-facto spec for CUDA. It's far from perfect, but it does mention __noinline__ and __forceinline__ as function qualifiers: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#noinline-and-forceinline

Thanks for the pointer. I missed that part.

CUDA SDK implements __noinline__ as attribute __attribute__((noinline)) though. Some requirements may not have diagnostics.

CUDA/HIP do not have language spec.

Well. It's not completely true. CUDA programming guide does serve as the de-facto spec for CUDA. It's far from perfect, but it does mention __noinline__ and __forceinline__ as function qualifiers: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#noinline-and-forceinline

Thank you, that's the magic words I was hoping for -- because they're described as function qualifiers, I think it's justifiable to add them as a keyword implementation in Clang and not worry about stepping on the toes of the CUDA spec (it's adhering to what the current spec requires).

Should we do __forceinline__ at the same time so that there's consistency?

clang/lib/Parse/ParseDecl.cpp
902

I'm questioning whether my advice here was good or not -- now that I see the CUDA spec already calls these function qualifiers... it's debatable whether this is a Clang extension or just the way in which Clang implements the CUDA function qualifiers. @tra -- do you have opinions?

I'm sort of leaning towards dropping the extension warning, but the only reason I can think of for keeping it is if Clang is the only CUDA compiler that doesn't require you to include a header before using the function qualifiers. If that's the case, there is a portability concern.

Should we do __forceinline__ at the same time so that there's consistency?

__forceinline__ does not have the issue as __noinline__ has since it is not a GCC attribute. The current CUDA/HIP implementation of __forceinline__ in header files is sufficient. I do not see the benefit of implementing __forceinline__ as a keyword.

Should we do __forceinline__ at the same time so that there's consistency?

__forceinline__ does not have the issue as __noinline__ has since it is not a GCC attribute. The current CUDA/HIP implementation of __forceinline__ in header files is sufficient. I do not see the benefit of implementing __forceinline__ as a keyword.

Primarily to reduce user confusion. It's kind of weird for __noinline__ to be a keyword and __forceinline__ to not be a keyword when they're both defined the same way by the CUDA spec. This means you can #undef one of them but not the other, that sort of thing.

clang/test/CodeGenCUDA/noinline.cu
2

I've asked @erichkeane to weigh in on whether there's a better approach here than specifying an optimization level.

clang/test/SemaCUDA/noinline.cu
9

Ah, I just noticed we also have no tests for the behavior of the keyword in the presence of the macro being defined. e.g.,

#define __noinline__ __attribute__((__noinline__))
__noinline__ void fun5() {}
erichkeane added inline comments.May 9 2022, 10:15 AM
clang/test/CodeGenCUDA/noinline.cu
2

You don't need to do this, it looks like all you're trying to do is keep 'clang' out of O0 mode. However, what you do NOT want is the optimizations to run. The common way to do that is to combine O1/O2/etc like: -O2 -disable-llvm-passes

This will keep clang in O2 mode, but will keep the optimizer from running anything, which might mess with the test later on.

tra added a comment.May 9 2022, 12:21 PM

Should we do __forceinline__ at the same time so that there's consistency?

Primarily to reduce user confusion. It's kind of weird for __noinline__ to be a keyword and __forceinline__ to not be a keyword when they're both defined the same way by the CUDA spec. This means you can #undef one of them but not the other, that sort of thing.

I'm slightly biased towards making them both a keyword. That said, I may be convinced otherwise if we discover that it may break some assumptions in existing C++ code. I just don't know enough.

clang/lib/Parse/ParseDecl.cpp
902

I'm not sure if such a warning would be useful.

the only reason I can think of for keeping it is if Clang is the only CUDA compiler that doesn't require you to include a header before using the function qualifiers. If that's the case, there is a portability concern.

I don't think it's an issue.

We already have similar divergence between nvcc/clang. E.g. built-in variables like threadIdx. Clang implements them in a header, but NVCC provides them by compiler itself.
With both compilers the variables are available by the time we get to compile user code. Virtually all CUDA compilations are done with tons of CUDA headers pre-included by compiler. Those that do not do that are already on their own and have to provide many other 'standard' CUDA things like target attributes. I don't think we need to worry about that.

yaxunl marked 4 inline comments as done.May 9 2022, 12:34 PM

__forceinline__ does not have the issue as __noinline__ has since it is not a GCC attribute. The current CUDA/HIP implementation of __forceinline__ in header files is sufficient. I do not see the benefit of implementing __forceinline__ as a keyword.

Primarily to reduce user confusion. It's kind of weird for __noinline__ to be a keyword and __forceinline__ to not be a keyword when they're both defined the same way by the CUDA spec. This means you can #undef one of them but not the other, that sort of thing.

If we are to add __forceinline__ as a keyword, I feel it better be a separate patch to be cleaner.

clang/lib/Parse/ParseDecl.cpp
902

I can remove the diagnostics since it seems unnecessary.

I tend to treat it as an extension since nvcc is the de facto standard implementation, which does not implement it as a keyword. Compared to that, this is like an extension.

clang/test/CodeGenCUDA/noinline.cu
2

will use -O2 -disable-llvm-passes

clang/test/SemaCUDA/noinline.cu
9

will do

yaxunl updated this revision to Diff 428167.May 9 2022, 12:36 PM
yaxunl marked 2 inline comments as done.

removed diagnostics and added more tests

tra accepted this revision.May 9 2022, 1:17 PM

If we are to add __forceinline__ as a keyword, I feel it better be a separate patch to be cleaner.

Fine with me.

clang/lib/Parse/ParseDecl.cpp
902

I'd argue that NVCC does implement it (as in "documents and makes it available"). Providing the documented functionality using a different implementation does not reach the point of being an extension, IMO. While there are observable differences between implementations, depending on them would be a portability error for the user.

This revision is now accepted and ready to land.May 9 2022, 1:17 PM

If we are to add __forceinline__ as a keyword, I feel it better be a separate patch to be cleaner.

I'm fine with that.

A few nits and a question about the test recently added.

clang/docs/ReleaseNotes.rst
353–358
clang/include/clang/Basic/AttrDocs.td
543
clang/test/SemaCUDA/noinline.cu
9

I missed an important detail -- I think this is now going to generate a warning in -pedantic mode (through -Wkeyword-macro) when compiling for CUDA; is that going to be a problem for CUDA headers, or are those always included as a system header (and so the diagnostics will be suppressed)?

yaxunl marked 4 inline comments as done.May 10 2022, 9:30 AM
yaxunl added inline comments.
clang/docs/ReleaseNotes.rst
353–358

will fix

clang/include/clang/Basic/AttrDocs.td
543

will fix.

clang/lib/Parse/ParseDecl.cpp
902

that makes sense. will change the extension to feature

clang/test/SemaCUDA/noinline.cu
9

I could not find how clang driver adds CUDA include path

https://github.com/llvm/llvm-project/blob/main/clang/lib/Driver/ToolChains/Cuda.cpp#L284

@tra do you know how CUDA include path is added? is it done by CMake?

For HIP the HIP include path is added as a system include path by clang driver.

aaron.ballman added inline comments.May 10 2022, 9:33 AM
clang/test/SemaCUDA/noinline.cu
9

Whatever we find out, we can emulate its behavior here in the test file to see what the diagnostic behavior will be (you can use GNU linemarkers to convince the compiler parts of the source are in a system header).

yaxunl marked 4 inline comments as done.May 10 2022, 10:10 AM
yaxunl added inline comments.
clang/test/SemaCUDA/noinline.cu
9

will add tests for that.

It seems no matter it is system header or normal header, no warnings are emitted even with -pedantic.

aaron.ballman added inline comments.May 10 2022, 10:11 AM
clang/test/SemaCUDA/noinline.cu
9

Excellent, thank you!

yaxunl updated this revision to Diff 428423.May 10 2022, 10:13 AM
yaxunl marked an inline comment as done.

make it a feature, add tests for pedantic, fix release notes and doecumentation

tra added inline comments.May 10 2022, 10:51 AM
clang/test/SemaCUDA/noinline.cu
9
This revision was landed with ongoing or failed builds.May 10 2022, 11:34 AM
This revision was automatically updated to reflect the committed changes.
yaxunl marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptMay 10 2022, 11:34 AM
delcypher added inline comments.
clang/lib/Basic/IdentifierTable.cpp
111

@yaxunl Is it intentional that you didn't update KEYALL here? That means KEYALL doesn't include the bit for KEYCUDA.

If that was your intention then this will break if someone adds a new key. E.g.

KEYCUDA = 0x2000000,
KEYSOMENEWTHING = 0x4000000,
// ...
// KEYALL now includes `KEYCUDA`, whereas it didn't before.
// KEYALL includes KEYSOMENEWTHING 
KEYALL = (0x7ffffff & ~KEYNOMS18 &
              ~KEYNOOPENCL) // KEYNOMS18 and KEYNOOPENCL are used to exclude.
...
  1. Updating the 0x1ffffff constant to 0x3ffffff so that KEYALL includes KEYCUDA
  2. If your intention is to not have KEYCUDA set in KEYALL then amend KEYALL to be.
KEYALL = (0x7ffffff & ~KEYNOMS18 &
              ~KEYNOOPENCL & ~KEYCUDA ) // KEYNOMS18 and KEYNOOPENCL are used to exclude.
// KEYCUDA is not included in KEYALL
yaxunl added inline comments.May 10 2022, 7:52 PM
clang/lib/Basic/IdentifierTable.cpp
111

My intention is not to include KEYCUDA in KEYALL.

Should I change KEYALL to

KEYALL = (0x3ffffff & ~KEYNOMS18 &
              ~KEYNOOPENCL & ~KEYCUDA ) // KEYNOMS18 and KEYNOOPENCL are used to exclude.
// KEYCUDA is not included in KEYALL

instead of

KEYALL = (0x7ffffff & ~KEYNOMS18 &
              ~KEYNOOPENCL & ~KEYCUDA ) // KEYNOMS18 and KEYNOOPENCL are used to exclude.
// KEYCUDA is not included in KEYALL

since the current maximum mask is 0x3ffffff instead of 0x7ffffff

delcypher added inline comments.May 10 2022, 10:28 PM
clang/lib/Basic/IdentifierTable.cpp
111

Oops, you're right it would be 0x3ffffff. I wonder though if we should clean this up so we don't need to manually update the bit mask every time... what if it was written like this?

 enum {
    KEYC99        = 0x1,
    KEYCXX        = 0x2,
    KEYCXX11      = 0x4,
    ....
    KEYSYCL       = 0x1000000,
    KEYCUDA       = 0x2000000,
    KEYMAX = KEYCUDA, // Must be set to the largest KEY enum value
    KEYALLCXX = KEYCXX | KEYCXX11 | KEYCXX20,

    // KEYNOMS18 and KEYNOOPENCL are used to exclude.
    // KEYCUDA is not included in KEYALL because <FIXME add reason here>
    KEYALL = (((KEYMAX & (KEYMAX-1)) & ~KEYNOMS18 & ~KEYNOOPENCL & ~KEYCUDA)
};
yaxunl added inline comments.May 11 2022, 8:18 AM
clang/lib/Basic/IdentifierTable.cpp
111

On second thought, KEYALL does not need to exclude KEYCUDA.

However, it would be good to set KEYALL in a generic approach. I will open a separate review.

yaxunl marked 2 inline comments as done.May 11 2022, 8:48 AM
yaxunl added inline comments.
clang/lib/Basic/IdentifierTable.cpp
111
delcypher added inline comments.May 11 2022, 8:50 AM
clang/lib/Basic/IdentifierTable.cpp
111

Oops that should say

KEYALL = (((KEYMAX | (KEYMAX-1)) & ~KEYNOMS18 & ~KEYNOOPENCL & ~KEYCUDA)