This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Diagnose aggregate args containing half types
Needs ReviewPublic

Authored by yaxunl on Mar 7 2021, 7:34 AM.

Details

Reviewers
tra
Summary

gcc and clang currently do not have a consistent ABI
for half precision types. Passing aggregate args containing half precision
types between clang and gcc can cause UB.

This patch adds an option -fhip-allow-half-arg. When off, clang
will diagnose aggregate arguments containing half precision
types in host functions.

Diff Detail

Event Timeline

yaxunl created this revision.Mar 7 2021, 7:34 AM
yaxunl requested review of this revision.Mar 7 2021, 7:34 AM
yaxunl updated this revision to Diff 328979.Mar 8 2021, 5:06 AM

fix test and clang-tidy warnings

tra added a comment.Sep 30 2021, 11:42 AM

Is this patch still relevant? Looks like I've missed it.

What exactly is the difference between gcc and clang regarding fp16 and why does it matter for aggregate arguments?
On a trivial example both clang and gcc appear to treat _Float16 similarly: https://godbolt.org/z/8WxK95zTj

In D98143#3034419, @tra wrote:

Is this patch still relevant? Looks like I've missed it.

What exactly is the difference between gcc and clang regarding fp16 and why does it matter for aggregate arguments?
On a trivial example both clang and gcc appear to treat _Float16 similarly: https://godbolt.org/z/8WxK95zTj

On gcc11 and below, since gcc does not support fp16, it is common practice to use short to pass fp16 in struct. Then gcc and clang has different ABI: https://godbolt.org/z/zqhT7x7qo

Basically if one compiles a function with struct type arg containing _Float16 with clang, then compiles a caller with struct type arg containing short with gcc, it will not work.

If use gcc trunk with _Float16, then there will not be such issue since the ABI is the same.

tra added a comment.Sep 30 2021, 3:22 PM

On gcc11 and below, since gcc does not support fp16, it is common practice to use short to pass fp16 in struct. Then gcc and clang has different ABI: https://godbolt.org/z/zqhT7x7qo

Basically if one compiles a function with struct type arg containing _Float16 with clang, then compiles a caller with struct type arg containing short with gcc, it will not work.

If both compilers use the same type they both agree on, then there's no problem. If they pass data using different types but identical names, that's an ODR violation and it's not specific to fp16.

Given that fp16 ABI is not stable, the dignostics provided by the patch may be a useful safeguard to have.

That said, I'm not quite sure how one would use it in practice. Is that purely to check that fp16 is not passed around anywhere in the host code?
If we compile everything with the check enabled that would potentially produce a lot of irrelevant/false-positive results.
E.g. there may be legitimate host-only code passing fp16 parameters and used only from clang-compiled files.
Perhaps the check should be limited to externally visible functions only, and, maybe, make it a warning?
We can not conclusively tell whether use of fp16 by the callee will be a problem if we don't know anything about the caller.

Are there specific cases where this particular issue popped up?
I think I did run into an issue with some ROCm headers that were using different fp16 types depending on how a file was compiled, even though in both cases compiler was clang. It's been a while ago, so I do not recall the exact details.
It might've been this code: https://github.com/ROCmSoftwarePlatform/rocBLAS/blob/c560f436cea24a7c5b775042464bc4c2989744ca/library/include/internal/rocblas-types.h#L73

clang/include/clang/Basic/LangOptions.def
256–258

Nit: Maybe simplify it to Allow using half precision types in host function parameter and return types.

Also, this issue would technically affect CUDA, too, so GPUAllowHalfArg might work better.

clang/lib/Sema/SemaDecl.cpp
8916–8917

I'd split it into separate early exits for the flag and for attribute checks, with an added comment that we only check host functions.

That brings the question of how we should handle HD functions. Right now they would not checked, which makes it possible to run into the issue you're trying to solve if that function is used on the host.

clang/test/SemaCUDA/half-arg.cu
9–10

What happens if I have a lambda that captures an fp16 value?

Granted, it will be inadvisable to pass across compilers, but, nevertheless it's another kind of an aggregate we may end up passing to a function.