This is an archive of the discontinued LLVM Phabricator instance.

[RFC][OpenCL] Set fp contract flag on -cl-mad-enable
Needs Revision · Public

Authored by Anastasia on May 21 2020, 3:34 PM.

Details

Summary

I think setting the contract flag on FP instructions under -cl-mad-enable is sensible given the spec wording:

-cl-mad-enable Allow a * b + c to be replaced by a mad. The mad computes a * b + c with reduced accuracy. For example, some OpenCL devices implement mad as truncate the result of a * b before adding it to c
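
To make the quoted example concrete, here is a minimal C++ sketch of one possible reading of "truncate the result of a * b before adding it to c". The helper names `truncate_to_float` and `mad_truncated` are hypothetical, and this is not a claim about how any particular device behaves. It assumes the default round-to-nearest mode and finite inputs whose product stays within float range.

```cpp
#include <cmath>

// Hypothetical helper: convert a double to float, rounding toward zero.
float truncate_to_float(double x) {
    float f = static_cast<float>(x);  // default conversion rounds to nearest
    if (std::fabs(static_cast<double>(f)) > std::fabs(x)) {
        f = std::nextafter(f, 0.0f);  // rounded away from zero: step one float back
    }
    return f;
}

// Hypothetical emulation of a "mad" that truncates the product before the add.
float mad_truncated(float a, float b, float c) {
    // The product of two floats (24-bit significands) is exact in double.
    double exact_product = static_cast<double>(a) * static_cast<double>(b);
    return truncate_to_float(exact_product) + c;
}
```

With a = 1 + 2^-12, b = 1 + 3 * 2^-13 and c = -(1 + 5 * 2^-13), this sketch returns 0, whereas a single-rounding `std::fma` returns 3 * 2^-25, so the truncating form loses the low bits of the product.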

However, I am unclear how this affects fdiv and other instructions, as I am not sure exactly how they are optimized. The LLVM Language Reference says:

contract

Allow floating-point contraction (e.g. fusing a multiply followed by an addition into a fused multiply-and-add).

Presumably contract has no effect on fdiv?
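
As an illustration of what fusing a multiply and an add changes numerically, here is a small C++ sketch. It uses `std::fma` to stand in for the fused form; the function names are hypothetical and the `volatile` temporary is only there to stop the host compiler from fusing the "unfused" version itself.

```cpp
#include <cmath>

// Unfused: the product is rounded to double before the add (two roundings).
double mul_then_add(double a, double b, double c) {
    volatile double product = a * b;  // volatile blocks compiler-side fusion
    return product + c;
}

// Fused: std::fma rounds once, after computing the exact a * b + c.
double fused_mul_add(double a, double b, double c) {
    return std::fma(a, b, c);
}
```

With a = b = 1 + 2^-30 and c = -(1 + 2^-29), `mul_then_add` returns 0 while `fused_mul_add` returns 2^-60. In this case the fused form is the more accurate one, because the 2^-60 term of the product survives until the final rounding.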

Another question is whether we could remove LessPreciseFPMAD from the CodeGen options, as I don't think it has any actual uses, although I might be misunderstanding this.

TODO: the same applies to -cl-unsafe-math-optimizations

Event Timeline

Anastasia created this revision. May 21 2020, 3:34 PM
Anastasia marked an inline comment as done. May 21 2020, 3:35 PM
Anastasia added inline comments.
clang/test/CodeGenOpenCL/relaxed-fpmath.cl
21

I don't find this behavior "NORMAL". I don't believe we should contract expressions by default in OpenCL...

Anastasia retitled this revision from [RCF][OpenCL] Set fp contract flag on -cl-mad-enable to [RFC][OpenCL] Set fp contract flag on -cl-mad-enable. May 21 2020, 3:40 PM

The langref wording makes me think this isn't quite right. This depends on your definition of floating point contraction. I've always assumed it meant allow FMA, potentially increasing precision. Is contracting into something less precise allowed? If not, that's stricter / the opposite of what -cl-mad-enable implies. My interpretation of the CL spec description would be to use fmuladd with an afn flag (although that still can allow for increasing precision)

For AMDGPU I've thought about interpreting less-precise-fpmad as allowing denormal flushing that would otherwise be illegal. Currently the option doesn't do anything there, but giving the flags some concrete interpretation would be better.

> The langref wording makes me think this isn't quite right. This depends on your definition of floating point contraction. I've always assumed it meant allow FMA, potentially increasing precision. Is contracting into something less precise allowed?

I don't see anything saying that contraction is for higher precision only. According to the LLVM Language Reference, the fast flag implies contract, and fast is what we set for -cl-fast-relaxed-math, which is known to result in lower accuracy.

> If not, that's stricter / the opposite of what -cl-mad-enable implies. My interpretation of the CL spec description would be to use fmuladd with an afn flag (although that still can allow for increasing precision)

Currently fmuladd is produced with LangOptions::FPM_On, which is the mode used with the FP_CONTRACT pragma. According to the documentation, LangOptions::FPM_Fast serves the same purpose but allows more cases to be contracted (i.e. across statements). For that reason it sets the contract flag rather than emitting the intrinsic directly, giving the backend more freedom to optimise and combine fused computations.
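
The within-statement versus across-statement distinction can be sketched at the source level. This is only an illustration with hypothetical function names; whether either function is actually fused depends on the compiler, its contraction mode, and the target.

```cpp
// Contraction candidate in both modes: the multiply and the add are part of
// one expression, which is the case the FP_CONTRACT pragma covers.
double same_statement(double a, double b, double c) {
    return a * b + c;
}

// Contraction candidate only in the "fast" mode: the multiply and the add
// are in separate statements.
double across_statements(double a, double b, double c) {
    double t = a * b;
    return t + c;
}
```

For inputs where no rounding occurs, such as (2, 3, 4), both functions return 10 regardless of whether fusion happens; the two forms can only differ when the product is inexact.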

Anastasia marked an inline comment as done. May 22 2020, 4:56 AM
Anastasia added inline comments.
clang/test/CodeGenOpenCL/relaxed-fpmath.cl
21

I just found that the last entry in table 38 of the OpenCL C spec says:

x * y + z
Implemented either as a correctly rounded fma or as a multiply and an add both of which are correctly rounded.

In table 8 for fma it states:

Returns the correctly rounded floating-point representation of the sum of c with the infinitely precise product of a and b. Rounding of intermediate products shall not occur.

When I check the LLVM documentation for fmuladd, it says:

is equivalent to the expression a * b + c, except that it is unspecified whether rounding will be performed between the multiplication and addition steps. Fusion is not guaranteed, even if the target platform supports it. If a fused multiply-add is required, the corresponding llvm.fma intrinsic function should be used instead. This never sets errno, just as ‘llvm.fma.*’.

Does this mean that rounding of an intermediate product may occur and therefore it is not safe to use it for OpenCL mode by default?

Anastasia marked an inline comment as done. May 22 2020, 5:10 AM
Anastasia added inline comments.
clang/test/CodeGenOpenCL/relaxed-fpmath.cl
21

After more digging I conclude that using fmuladd for x * y + z should be OK per table 38 of the OpenCL C spec, because it allows either a fused or a non-fused operation, i.e. it is fine whether or not the intermediate value is rounded.
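
A rough way to express "either the fused or the non-fused result is acceptable" is a predicate that accepts both. The sketch below is an assumption-laden illustration in C++ with hypothetical names; it uses single precision, relies on `std::fma` for the fused result, and ignores NaN inputs.

```cpp
#include <cmath>

// Unfused candidate: the product is rounded to float, then the add is rounded.
float unfused_result(float a, float b, float c) {
    volatile float product = a * b;  // volatile blocks compiler-side fusion
    return product + c;
}

// Hypothetical check: is r one of the two results the table entry permits?
bool is_permitted_result(float a, float b, float c, float r) {
    return r == std::fma(a, b, c) || r == unfused_result(a, b, c);
}
```

For a = b = 1 + 2^-13 and c = -(1 + 2^-12), the check accepts both 0 (unfused) and 2^-26 (fused) and rejects other values such as 2^-23.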

I think the contract flag needs clarification. I would interpret an instruction with only a contract flag as allowing precision-increasing FMA formation, and contract+afn as allowing combining while reducing precision.

Anastasia marked an inline comment as done. May 22 2020, 8:00 AM
Anastasia added inline comments.
clang/test/CodeGenOpenCL/relaxed-fpmath.cl
21

I changed my mind again. This is now covered by https://reviews.llvm.org/D80440.

arsenm added a comment. Aug 5 2020, 3:40 PM

Is this still necessary?

Was this addressed already? Today it looks like we have:

if (Opts.FastRelaxedMath || Opts.CLUnsafeMath)
  Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
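
For readers following the proposal, the option-to-mode mapping under discussion can be modelled as a standalone sketch. Everything here is a hypothetical stand-in rather than Clang's actual types or code: `ContractMode`, `CLMathFlags` and `default_contract_mode` are invented names, the `mad_enable` branch models what this RFC proposes, and the `On` fallback is an assumption based on the thread's remark that fmuladd is produced by default.

```cpp
// Hypothetical stand-ins; these are not Clang's actual types.
enum class ContractMode { Off, On, Fast };

struct CLMathFlags {
    bool fast_relaxed_math = false;  // -cl-fast-relaxed-math
    bool unsafe_math = false;        // -cl-unsafe-math-optimizations
    bool mad_enable = false;         // -cl-mad-enable
};

ContractMode default_contract_mode(const CLMathFlags &flags) {
    // Mirrors the condition quoted in the thread.
    if (flags.fast_relaxed_math || flags.unsafe_math) {
        return ContractMode::Fast;
    }
    // Assumption: what the RFC proposes for -cl-mad-enable.
    if (flags.mad_enable) {
        return ContractMode::Fast;
    }
    // Assumption: contraction within a statement is the OpenCL default.
    return ContractMode::On;
}
```

Keeping the `mad_enable` branch separate makes it easy to see that the quoted condition alone would leave -cl-mad-enable at the default mode.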
Herald added a project: Restricted Project. Nov 16 2022, 11:58 AM
Herald added a subscriber: Naghasan.
arsenm requested changes to this revision. Dec 14 2022, 6:04 AM

Please rebase if still relevant

This revision now requires changes to proceed. Dec 14 2022, 6:04 AM