This is an archive of the discontinued LLVM Phabricator instance.

[HIP] Fix regressions due to fp contract change
ClosedPublic

Authored by yaxunl on Oct 26 2020, 10:56 AM.

Details

Summary

Recently HIP toolchain made a change to use clang instead of opt/llc to do compilation
(https://reviews.llvm.org/D81861). The intention is to make HIP toolchain canonical like
other toolchains.

However, this change introduced an unintentional change regarding backend fp fuse
option, which caused regressions in some HIP applications.

Basically before the change, HIP toolchain used clang to generate bitcode, then use
opt/llc to optimize bitcode and generate ISA. As such, the amdgpu backend takes
the default fp fuse mode which is 'Standard'. This mode respect contract flag of
fmul/fadd instructions and do not fuse fmul/fadd instructions without contract flag.

However, after the change, HIP toolchain now use clang to generate IR, do optimization,
and generate ISA as one process. Now amdgpu backend fp fuse option is determined
by -ffp-contract option, which is 'fast' by default. And this -ffp-contract=fast language option
is translated to 'Fast' fp fuse option in backend. Suddenly backend starts to fuse fmul/fadd
instructions without contract flag.

This causes wrong result for some device library functions, e.g. tan(-1e20), which should
return 0.8446, now returns -0.933. What is worse is that since backend with 'Fast' fp fuse
option does not respect contract flag, there is no way to use #pragma clang fp contract
directive to enforce fp contract requirements.

This patch fixes the regression by introducing a new value 'fast-honor-pragmas' for -ffp-contract
and use it for HIP by default. 'fast-honor-pragmas' is equivalent to 'fast' in frontend but
let the backend to use 'Standard' fp fuse option. 'fast-honor-pragmas' is useful since 'Fast'
fp fuse option in backend does not honor contract flag, it is of little use to HIP
applications since all code with #pragma STDC FP_CONTRACT or any IR from a
source compiled with -ffp-contract=on is broken.

Diff Detail

Event Timeline

yaxunl created this revision.Oct 26 2020, 10:56 AM
yaxunl requested review of this revision.Oct 26 2020, 10:56 AM

I have objections to the code change here. I'll leave the conceptual question to other people interested in the HIP toolchain.

I have objections to the code change here. I'll leave the conceptual question to other people interested in the HIP toolchain.

Is it OK to introduce a clang codegen option e.g. -fp-contract-backend=x to control the backend fp fuse option? By default it matches -ffp-contract language option, but allows being overridden by explicit option. Then HIP toolchain can use it to override the backend fp fuse option.

tra added inline comments.Oct 26 2020, 11:42 AM
clang/lib/CodeGen/BackendUtil.cpp
502

I don't think it's a good idea to force this.

Perhaps a better way to address this would be to set HIP-specific default to Standard where CUDA does it:
https://github.com/llvm/llvm-project/blob/master/clang/lib/Frontend/CompilerInvocation.cpp#L2415

Currently HIP inherits this setting from CUDA.

yaxunl added inline comments.Oct 26 2020, 11:53 AM
clang/lib/CodeGen/BackendUtil.cpp
502

We want to keep -ffp-contract=fast for frontend so that we can continue emitting fmul/fadd insts with contract flag in IR for HIP programs. We only want to change the backend fp fuse option. Currently there is no separate clang option to set backend fp fuse option.

Argh, sorry! I meant to say "I have *no* objections".

tra added inline comments.Oct 26 2020, 3:47 PM
clang/lib/CodeGen/BackendUtil.cpp
502

I do not see any references to AllowFPOpFusion anywhere under clang/ other than in this function.
Perhaps I'm missing something. How/where does it make a difference in the front-end other than setting the option for the back-end?

tra added a subscriber: scanon.Oct 27 2020, 10:11 AM
tra added inline comments.
clang/lib/CodeGen/BackendUtil.cpp
502

Thank you. I see.
I'm still uncomfortable with growing a target-specific quirk in FP behavior that can't be overridden from command line, but considering that it's limted to HIP only, it may be OK short-term.

I think similar issue (fast-math does not allow changing contraction mode) was discussed last year:
http://lists.llvm.org/pipermail/llvm-dev/2019-July/133787.html

One thing I know about FP is that there are more nuances than I'm aware of. It may be worth asking with more experience with this.

@scanon -- do we have any better options to honor contraction pragmas with implicitly-enabled fast-math?

rjmccall added inline comments.Oct 27 2020, 11:26 AM
clang/lib/CodeGen/BackendUtil.cpp
502

In the abstract, it seems reasonable for pragmas to override a global fast-math setting.

yaxunl added inline comments.Oct 28 2020, 10:05 AM
clang/lib/CodeGen/BackendUtil.cpp
502

I think probably we need to introduce a new value for -ffp-contract= option faststd, which is fast for FE and Standard for BE. In this case, fp add/mult by default can be fused, unless disabled by pragma, and BE respect the restrictions imposed by pragmas in FE. I think this mode is more useful than the original fast mode.

tra added inline comments.Oct 28 2020, 10:13 AM
clang/lib/CodeGen/BackendUtil.cpp
502

This should work, I think.

yaxunl updated this revision to Diff 302099.Oct 31 2020, 9:31 AM
yaxunl edited the summary of this revision. (Show Details)

introduce faststd as value for -ffp-contract and use it for HIP by default.

tra added a comment.Nov 2 2020, 2:30 PM

LGTM, but I'll defer to @rjmccall for the approval.

I agree this is useful. However, you need to update the manual to cover faststd.

Could you also an IRGen test for this rather than only testing CUDA assembly output?

yaxunl added a comment.Nov 3 2020, 4:48 AM

I agree this is useful. However, you need to update the manual to cover faststd.

will update the manual.

clang/test/CodeGenCUDA/fp-contract.cu
203

@rjmccall I have IRGen checks in this test. Are they sufficient? Thanks.

yaxunl updated this revision to Diff 302560.Nov 3 2020, 5:59 AM

updated manual

Hmm. Do we actually want this behavior of fast overriding pragmas? What do other compilers do here? It might be reasonable to just treat this as a bug.

clang/docs/LanguageExtensions.rst
3213

Suggestion:

This can be useful when fast contraction is otherwise enabled for the translation unit
with the ``-ffp-contract=faststd`` flag. Note that ``-ffp-contract=fast`` will override
pragmas to fuse multiply and addition across statements regardless of any controlling
pragmas.
clang/docs/UsersManual.rst
1339–1341

"...with the `FP_CONTRACT and clang fp contract` pragmas. Please refer..."

GCC doesn't respect the pragma, so "what other compilers do" is not a particularly useful metric.

(If you tell GCC to respect the pragma via -std=c17 or similar, then -ffp-contract=fast overrides it just like clang's current behavior: https://godbolt.org/z/5dxxGb)

yaxunl updated this revision to Diff 302662.Nov 3 2020, 12:28 PM
yaxunl marked 7 inline comments as done.

revised manual by John's comments

yaxunl added a comment.Nov 4 2020, 8:39 AM

Hmm. Do we actually want this behavior of fast overriding pragmas? What do other compilers do here? It might be reasonable to just treat this as a bug.

I think clang is just trying to follow gcc's behavior. However, this is undesirable in certain cases. Introducing 'faststd' gives us more choices to avoid the undesirable behavior.

yaxunl added a comment.Nov 4 2020, 9:45 AM

Hmm. Do we actually want this behavior of fast overriding pragmas? What do other compilers do here? It might be reasonable to just treat this as a bug.

I think clang is just trying to follow gcc's behavior. However, this is undesirable in certain cases. Introducing 'faststd' gives us more choices to avoid the undesirable behavior.

For other compilers:

MSVC respects pragma with /fp:fast option:

https://godbolt.org/z/3rja55

Intel compiler also respects fp_contract pragma with -fp-model fast={1|2} option:

https://godbolt.org/z/fez86h

nvcc by default always fuse across statements and does not support pragmas to control fp contract

https://docs.nvidia.com/cuda/floating-point/index.html#controlling-fused-multiply-add

yaxunl added a comment.Nov 5 2020, 7:16 AM

@rjmccall ping. Any further concerns for this patch? Thanks.

Okay. It sounds like strict compatibility with GCC implies ignoring pragmas in fast, and that's what we're most concerned with, since this is originally a GCC option. So the only question I have now is whether faststd is the best name for this. Does anyone want to suggest an alternative?

scanon added a comment.Nov 5 2020, 1:26 PM

I do not much like faststd, as there's nothing "standard" about it. I do not, however, have a better suggestion off the top of my head. Let's pause and consider the name a little bit longer, please?

yaxunl added a comment.Nov 9 2020, 7:49 AM

How about fast-constrained, fast-limited, fast-restricted, or fast-restrained?

fast-strict? Sounds sort of oxymoronic, but it's fast while also being strict about honoring pragmas. I don't have any great ideas here.

Strictly speaking, fp-contract=fast probably should have been a separate flag entirely (since there's no _expression_ being contracted in fast). Unfortunately, that ship has sailed, and it does constrain our ability to choose an accurate name somewhat.

What if we just spell it out? fast-respect-pragma? fast-when-unspecified? I don't think that we really need to try to be as brief as possible with this one.

tra added a comment.Nov 11 2020, 10:24 AM

Strictly speaking, fp-contract=fast probably should have been a separate flag entirely (since there's no _expression_ being contracted in fast). Unfortunately, that ship has sailed, and it does constrain our ability to choose an accurate name somewhat.

What if we just spell it out? fast-respect-pragma? fast-when-unspecified? I don't think that we really need to try to be as brief as possible with this one.

This sounds reasonable. We already have -fhonor-nans and -fhonor-infinities. Should we make it fast-honor-pragma for consistency?

In D90174#2389269, @tra wrote:

Strictly speaking, fp-contract=fast probably should have been a separate flag entirely (since there's no _expression_ being contracted in fast). Unfortunately, that ship has sailed, and it does constrain our ability to choose an accurate name somewhat.

What if we just spell it out? fast-respect-pragma? fast-when-unspecified? I don't think that we really need to try to be as brief as possible with this one.

This sounds reasonable. We already have -fhonor-nans and -fhonor-infinities. Should we make it fast-honor-pragma for consistency?

+1 with fast-honor-pragma

Probably should be pluralized for consistency, fast-honor-pragmas, but yeah, that's fine with me.

yaxunl updated this revision to Diff 304604.Nov 11 2020, 11:32 AM
yaxunl edited the summary of this revision. (Show Details)

rename faststd to fast-honor-pragmas

scanon accepted this revision.Nov 19 2020, 8:44 AM

I'm fine with this.

This revision is now accepted and ready to land.Nov 19 2020, 8:44 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptNov 24 2020, 5:10 AM