This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Defer operator overloading errors
ClosedPublic

Authored by yaxunl on Jun 17 2021, 7:48 PM.

Details

Summary

nvcc does not diagnose overloading resolution diagnostics
if it happens in functions not emitted, e.g., if a device
function calls a host function, it is not diagnosed in
host compilation. clang implemented a similar feature
under option -fgpu-defer-diags.

Although clang is able to defer overloading resolution
diagnostics for common functions. It does not defer
overloading resolution caused diagnostics for overloaded
operators. For example, the diagnostics of this test
https://godbolt.org/z/P5eGrrnEd is deferred by nvcc
but not by clang.

This patch extends the existing deferred
diagnostic mechanism and defers a diagnostic caused
by overloaded operator.

Diff Detail

Event Timeline

yaxunl requested review of this revision.Jun 17 2021, 7:48 PM
yaxunl created this revision.
yaxunl edited the summary of this revision. (Show Details)Jun 17 2021, 7:52 PM
tra added a comment.Jun 21 2021, 10:14 AM

I don't think we want to do this.

struct S {
    S& operator <<(int x);
};

__device__ void foo() {
    S s;
    s<<1;
}

<source>:7:6: error: invalid operands to binary expression ('S' and 'int')
    s<<1;
    ~^ ~
<source>:2:8: note: candidate function not viable: call to __host__ function from __device__ function
    S& operator <<(int x);

https://godbolt.org/z/Maa6Ed94W
I believe diagnostic issued by clang is completely valid here.

This is the case where clang and NVCC fundamentally differ in their compilation approach. NVCC effectively does not see device-side code during host compilation. Clang does. The code above is wrong regardless of whether we're compiling for the host of for the device and therefore, I believe, the diagnostic is appropriate.

If the operator is intended to be used on GPU, it should have appropriate attributes. If it's not, then it's an error. NVCC not diagnosing it is a deficiency in NVCC, IMO. This particular problem should be fixed in the source code.

I don't think we want to do this.

struct S {
    S& operator <<(int x);
};

__device__ void foo() {
    S s;
    s<<1;
}

<source>:7:6: error: invalid operands to binary expression ('S' and 'int')
    s<<1;
    ~^ ~
<source>:2:8: note: candidate function not viable: call to __host__ function from __device__ function
    S& operator <<(int x);

https://godbolt.org/z/Maa6Ed94W
I believe diagnostic issued by clang is completely valid here.

This is the case where clang and NVCC fundamentally differ in their compilation approach. NVCC effectively does not see device-side code during host compilation. Clang does. The code above is wrong regardless of whether we're compiling for the host of for the device and therefore, I believe, the diagnostic is appropriate.

If the operator is intended to be used on GPU, it should have appropriate attributes. If it's not, then it's an error. NVCC not diagnosing it is a deficiency in NVCC, IMO. This particular problem should be fixed in the source code.

nvcc does see and parse the device functions in host compilation, e.g.

# cat a.cu
struct S {
    S& operator <<(int x);
};

#if !__CUDA_ARCH__
__device__ void foo() {
    S s;
    s<<1;
    s<<;
}
#endif

# nvcc a.cu
a.cu(9): error: expected an expression

If nvcc simply skips all device functions, it will not diagnose the syntax error in the above code. It just chooses not to diagnose overloading resolution related errors.

P.S. To reproduce the above diagnostics you need to execute nvcc directly instead of on goldbolt.org, since glodbolt.org only does device compilation for CUDA.

tra added a comment.Jun 21 2021, 1:45 PM

nvcc does see and parse the device functions in host compilation, e.g.

# cat a.cu
struct S {
    S& operator <<(int x);
};

#if !__CUDA_ARCH__
__device__ void foo() {
    S s;
    s<<1;
    s<<;
}
#endif

# nvcc a.cu
a.cu(9): error: expected an expression

This particular error is reported by cudafe++. It's *not* the host compiler.
If you compile with "-keep" and examine the code that's actually compiled for the host, you will see this:

__attribute__((unused)) void foo() {int volatile ___ = 1;
# 10
::exit(___);}
#if 0 
# 6
{
# 7
S s;
# 8
(s << (1));
# 10
}
#endif

Notice the #if 0 surrounding the original code.

tra added a comment.Jun 21 2021, 1:53 PM

In NVCC's compilation only device-side compilation is aware of "semantics' of __host__/__device__ attributes. HOST-side compilation is not aware of GPU existence. cudafe++ massages the input source to create that fiction for the host compilation, but AFAICT, it does not care about semantics of what can be called from where but just strips deive-side code from the sources passed to the host compilation. Essentially, its diagnostics is limited to syntax errors. AFAICT, semantics is only checked by cicc during device-side compilation.

So, in the end it may look that nvcc decided to ignore errors in device code, but in reality, host-side compilation is a regular GPU-unaware C++ compilation that does not see the code with the error.

In NVCC's compilation only device-side compilation is aware of "semantics' of __host__/__device__ attributes. HOST-side compilation is not aware of GPU existence. cudafe++ massages the input source to create that fiction for the host compilation, but AFAICT, it does not care about semantics of what can be called from where but just strips deive-side code from the sources passed to the host compilation. Essentially, its diagnostics is limited to syntax errors. AFAICT, semantics is only checked by cicc during device-side compilation.

So, in the end it may look that nvcc decided to ignore errors in device code, but in reality, host-side compilation is a regular GPU-unaware C++ compilation that does not see the code with the error.

That explains why nvcc does not allow overloaded functions differing only by __host__ or __device__ attributes, since they all become host functions in host compilations.

However, this does cause source level incompatibilities, i.e. CUDA code that passes nvcc does not pass clang. This patch somehow addresses that without compromising clang's more sophisticated __host__/__device__ overloading resolution capabilities.

tra added a comment.Jun 21 2021, 4:05 PM

However, this does cause source level incompatibilities, i.e. CUDA code that passes nvcc does not pass clang. This patch somehow addresses that without compromising clang's more sophisticated __host__/__device__ overloading resolution capabilities.

Yes. This will not be the first time where clang does diagnose invalid code while NVCC does not.
In general, some amount of source code porting between nvcc and clang is expected. In practice the amount of work required to make CUDA code portable has been relatively minor.

My general rule of thumb is that if both host and device compilation have enough information to diagnose a problem, both should do it.
In this case, there's no ambiguity that the code is invalid. The sooner we diagnose it, the better.

Is there particular use case where this patch would be necessary?

However, this does cause source level incompatibilities, i.e. CUDA code that passes nvcc does not pass clang. This patch somehow addresses that without compromising clang's more sophisticated __host__/__device__ overloading resolution capabilities.

Yes. This will not be the first time where clang does diagnose invalid code while NVCC does not.
In general, some amount of source code porting between nvcc and clang is expected. In practice the amount of work required to make CUDA code portable has been relatively minor.

My general rule of thumb is that if both host and device compilation have enough information to diagnose a problem, both should do it.
In this case, there's no ambiguity that the code is invalid. The sooner we diagnose it, the better.

Is there particular use case where this patch would be necessary?

Such host/device overloading resolution induced issue is not limited to device functions calling host functions. It could also happen to host device functions calling host functions, which is less controversial for deferring.

tra added a comment.Jun 22 2021, 10:43 AM

Such host/device overloading resolution induced issue is not limited to device functions calling host functions.

It does not change the fact that the code in the test above is invalid, regardless of whether we compile it on the host or on the device.

It could also happen to host device functions calling host functions, which is less controversial for deferring.

H/D functions are special, because their overloading is affected whether compilation is done on the host or on the device side and we often can not tell whether the diagnostic is appropriate until codegen.

I still think that deferring diags for unambiguously invalid code is not a good idea. The fact that NVCC can only diagnose such errors during device-side compilation is not a good enough reason, IMO, to make clang ignore real errors, even if we'd still end up eventually failing later, during device-side compilation.

@rsmith, @rjmccall -- any thoughts?

Such host/device overloading resolution induced issue is not limited to device functions calling host functions.

It does not change the fact that the code in the test above is invalid, regardless of whether we compile it on the host or on the device.

It could also happen to host device functions calling host functions, which is less controversial for deferring.

H/D functions are special, because their overloading is affected whether compilation is done on the host or on the device side and we often can not tell whether the diagnostic is appropriate until codegen.

I still think that deferring diags for unambiguously invalid code is not a good idea. The fact that NVCC can only diagnose such errors during device-side compilation is not a good enough reason, IMO, to make clang ignore real errors, even if we'd still end up eventually failing later, during device-side compilation.

@rsmith, @rjmccall -- any thoughts?

We don't defer such diags by default. We only defer them under option -fgpu-defer-diags, which users have to specify explicitly.

tra added a comment.Jun 22 2021, 3:09 PM

We don't defer such diags by default. We only defer them under option -fgpu-defer-diags, which users have to specify explicitly.

Thank you for pointing this out. I've missed that all the tests were run with the flag -- it was hidden in the diff.

While I can't say I'm happy about deferring unambiguous diags, that train is gone. Making -fgpu-defer-diags work more consistently is an improvement.

So, fine. Let's proceed discussing the changes in the patch.

clang/test/SemaCUDA/deferred-oeverload.cu
60–62

If we're allowing to postpone an invalid call of a host function, shouldn't we also allow postponing other errors?
E.g. should we postpone the error on an attempt to call callee4() ?
Similarly, if we were to call a undeclared_func() here, should the error also be postponed?

TBH, I don't quite understand now how to tell what is and isn't supposed to be deferred with -fgpu-defer-diags.
Is there a clear criteria what should and should not be deferred?

yaxunl added inline comments.Jun 23 2021, 11:21 AM
clang/test/SemaCUDA/deferred-oeverload.cu
60–62

We discussed about what diagnostics to be deferred before. We do not want to defer all diagnostics since nvcc apparently only ignores host/device related diagnostics. Our previous conclusion is to defer overloading resolution related diagnostics when the full candidates set include host device functions or wrong-sided candidates. This roughly matches nvcc's behavior.

tra accepted this revision.Jun 23 2021, 11:46 AM
tra added inline comments.
clang/test/SemaCUDA/deferred-oeverload.cu
60–62

Please bear with me. I don't have all the relevant context in my head. Nor would anyone else looking at the patch.

Our previous conclusion is to defer overloading resolution related diagnostics when the full candidates set include host device functions or wrong-sided candidates. This roughly matches nvcc's behavior.

Thank you. This should probably be added somewhere at the top of this test file.

This revision is now accepted and ready to land.Jun 23 2021, 11:46 AM
yaxunl marked 2 inline comments as done.Jun 23 2021, 12:54 PM
yaxunl added inline comments.
clang/test/SemaCUDA/deferred-oeverload.cu
60–62

will do when committing

This revision was landed with ongoing or failed builds.Jun 23 2021, 8:40 PM
This revision was automatically updated to reflect the committed changes.
yaxunl marked an inline comment as done.
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2021, 8:40 PM
ro added a subscriber: ro.Jun 25 2021, 12:00 AM

This patch broke the Solaris/sparcv9 and Solaris/amd64 buildbots:

/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1782:11: error: non-const lvalue reference to type 'clang::Sema' cannot bind to a temporary of type 'int'
        : S(_S), SavedDeferDiags(S.DeferDiags) {
          ^ ~~
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1782:34: warning: reference 'S' is not yet bound to a value when used here [-Wuninitialized]
        : S(_S), SavedDeferDiags(S.DeferDiags) {
                                 ^
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1783:22: error: use of non-static data member 'DeferDiags' of 'Sema' from nested type 'DeferDiagsRAII'
      S.DeferDiags = DeferDiags;
                     ^~~~~~~~~~
1 warning and 3 errors generated.
[173/835] Building CXX object tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o
FAILED: tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o 
/opt/llvm/12/bin/clang++  -DCLANG_ROUND_TRIP_CC1_ARGS=ON -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/lib/Parse -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/lib/Parse -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include -Itools/clang/include -Iinclude -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/include -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/include/llvm/Support/Solaris -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o -MF tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o.d -o tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o -c /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/lib/Parse/ParseDeclCXX.cpp
In file included from /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/lib/Parse/ParseDeclCXX.cpp:13:
In file included from /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Parse/Parser.h:24:
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1781:26: error: expected ')'
    DeferDiagsRAII(Sema &_S, bool DeferDiags)
                         ^
/usr/include/iso/ctype_iso.h:32:12: note: expanded from macro '_S'
#define _S      0x00000008      /* Spacing character */
                ^
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1781:19: note: to match this '('
    DeferDiagsRAII(Sema &_S, bool DeferDiags)
                  ^
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1782:11: error: non-const lvalue reference to type 'clang::Sema' cannot bind to a temporary of type 'int'
        : S(_S), SavedDeferDiags(S.DeferDiags) {
          ^ ~~
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1782:34: warning: reference 'S' is not yet bound to a value when used here [-Wuninitialized]
        : S(_S), SavedDeferDiags(S.DeferDiags) {
                                 ^
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1783:22: error: use of non-static data member 'DeferDiags' of 'Sema' from nested type 'DeferDiagsRAII'
      S.DeferDiags = DeferDiags;
                     ^~~~~~~~~~
1 warning and 3 errors generated.

_S is a reserved identifier in <ctype.h>.

In D104505#2840239, @ro wrote:

This patch broke the Solaris/sparcv9 and Solaris/amd64 buildbots:

/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1782:11: error: non-const lvalue reference to type 'clang::Sema' cannot bind to a temporary of type 'int'
        : S(_S), SavedDeferDiags(S.DeferDiags) {
          ^ ~~
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1782:34: warning: reference 'S' is not yet bound to a value when used here [-Wuninitialized]
        : S(_S), SavedDeferDiags(S.DeferDiags) {
                                 ^
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1783:22: error: use of non-static data member 'DeferDiags' of 'Sema' from nested type 'DeferDiagsRAII'
      S.DeferDiags = DeferDiags;
                     ^~~~~~~~~~
1 warning and 3 errors generated.
[173/835] Building CXX object tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o
FAILED: tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o 
/opt/llvm/12/bin/clang++  -DCLANG_ROUND_TRIP_CC1_ARGS=ON -DGTEST_HAS_RTTI=0 -D_DEBUG -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/clang/lib/Parse -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/lib/Parse -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include -Itools/clang/include -Iinclude -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/include -I/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/llvm/include/llvm/Support/Solaris -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -fdiagnostics-color -ffunction-sections -fdata-sections -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O3 -DNDEBUG    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o -MF tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o.d -o tools/clang/lib/Parse/CMakeFiles/obj.clangParse.dir/ParseDeclCXX.cpp.o -c /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/lib/Parse/ParseDeclCXX.cpp
In file included from /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/lib/Parse/ParseDeclCXX.cpp:13:
In file included from /opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Parse/Parser.h:24:
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1781:26: error: expected ')'
    DeferDiagsRAII(Sema &_S, bool DeferDiags)
                         ^
/usr/include/iso/ctype_iso.h:32:12: note: expanded from macro '_S'
#define _S      0x00000008      /* Spacing character */
                ^
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1781:19: note: to match this '('
    DeferDiagsRAII(Sema &_S, bool DeferDiags)
                  ^
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1782:11: error: non-const lvalue reference to type 'clang::Sema' cannot bind to a temporary of type 'int'
        : S(_S), SavedDeferDiags(S.DeferDiags) {
          ^ ~~
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1782:34: warning: reference 'S' is not yet bound to a value when used here [-Wuninitialized]
        : S(_S), SavedDeferDiags(S.DeferDiags) {
                                 ^
/opt/llvm-buildbot/home/solaris11-amd64/clang-solaris11-amd64/llvm/clang/include/clang/Sema/Sema.h:1783:22: error: use of non-static data member 'DeferDiags' of 'Sema' from nested type 'DeferDiagsRAII'
      S.DeferDiags = DeferDiags;
                     ^~~~~~~~~~
1 warning and 3 errors generated.

_S is a reserved identifier in <ctype.h>.

Thanks for letting me know. I am fixing it.