This is an archive of the discontinued LLVM Phabricator instance.

Differential D90174

[HIP] Fix regressions due to fp contract change
Closed, Public

Authored by yaxunl on Oct 26 2020, 10:56 AM.


Details

Reviewers

tra
rjmccall
scanon

Commits

rGcb08558caa3b: [HIP] Fix regressions due to fp contract change

Summary

Recently the HIP toolchain changed to use clang instead of opt/llc for compilation
(https://reviews.llvm.org/D81861). The intention was to make the HIP toolchain canonical,
like other toolchains.

However, this introduced an unintended change to the backend fp fuse option,
which caused regressions in some HIP applications.

Before the change, the HIP toolchain used clang to generate bitcode, then used
opt/llc to optimize the bitcode and generate ISA. As such, the amdgpu backend took
the default fp fuse mode, which is 'Standard'. This mode respects the contract flag of
fmul/fadd instructions and does not fuse fmul/fadd instructions that lack the contract flag.

After the change, the HIP toolchain uses clang to generate IR, optimize it,
and generate ISA in one process. The amdgpu backend fp fuse option is now determined
by the -ffp-contract option, which is 'fast' by default for CUDA/HIP. The -ffp-contract=fast
language option is translated to the 'Fast' fp fuse option in the backend, so the backend
suddenly starts to fuse fmul/fadd instructions that lack the contract flag.

This causes wrong results for some device library functions, e.g. tan(-1e20), which should
return 0.8446 but now returns -0.933. Worse, since the backend with the 'Fast' fp fuse
option does not respect the contract flag, there is no way to use the #pragma clang fp contract
directive to enforce fp contract requirements.
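
To illustrate why fusing without the contract flag can change numeric results, here is a minimal C++ sketch. It is not taken from the patch or from the device library, and the function names and input values are chosen for illustration only (they are unrelated to the tan(-1e20) case). It compares a multiply-add whose product is rounded first with std::fma, which rounds once, as a fused instruction would.

```cpp
#include <cassert>
#include <cmath>

// Multiply, round the product to double, then add (no contraction).
// The volatile store keeps the compiler from fusing the two operations itself.
double mulAddSeparate(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// Single-rounding fused multiply-add, as a contracting backend would emit.
double mulAddFused(double a, double b, double c) {
  return std::fma(a, b, c);
}

// Illustrative inputs: a = 1 + 2^-27 and c = -(1 + 2^-26).
void checkExample() {
  const double a = 1.0 + std::ldexp(1.0, -27);
  const double c = -(1.0 + std::ldexp(1.0, -26));
  assert(mulAddSeparate(a, a, c) == 0.0);
  assert(mulAddFused(a, a, c) == std::ldexp(1.0, -54));
}
```

For these inputs the exact product 1 + 2^-26 + 2^-54 rounds to 1 + 2^-26, so the separate version yields 0 while the fused version keeps the 2^-54 term.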

This patch fixes the regression by introducing a new value 'fast-honor-pragmas' for -ffp-contract
and using it for HIP by default. 'fast-honor-pragmas' is equivalent to 'fast' in the frontend but
lets the backend use the 'Standard' fp fuse option. This is useful because the 'Fast'
fp fuse option in the backend does not honor the contract flag, which makes it of little use
to HIP applications: all code with #pragma STDC FP_CONTRACT, or any IR from a
source compiled with -ffp-contract=on, is broken.
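
The relationship between the four -ffp-contract values and the two stages can be sketched as a small standalone model. This is a simplification based on the description above, not clang code: the enums FrontendMode and BackendFusion and the functions settingFor and backendMayFuse are hypothetical stand-ins for LangOptions::FPModeKind, llvm::FPOpFusion, and the real option handling.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the frontend contract mode and backend fuse mode.
enum class FrontendMode { Off, On, Fast };
enum class BackendFusion { Standard, Fast };

struct ContractSetting {
  FrontendMode frontend;
  BackendFusion backend;
};

// Maps an -ffp-contract= value to the (frontend, backend) pair described above.
ContractSetting settingFor(const std::string &value) {
  if (value == "fast")
    return {FrontendMode::Fast, BackendFusion::Fast};
  if (value == "fast-honor-pragmas")
    return {FrontendMode::Fast, BackendFusion::Standard};
  if (value == "on")
    return {FrontendMode::On, BackendFusion::Standard};
  if (value == "off")
    return {FrontendMode::Off, BackendFusion::Standard};
  throw std::invalid_argument("unknown -ffp-contract value: " + value);
}

// Simplified backend rule from the summary: 'Fast' fuses regardless of flags,
// 'Standard' fuses only when both instructions carry the contract flag.
bool backendMayFuse(BackendFusion mode, bool mulHasContract,
                    bool addHasContract) {
  return mode == BackendFusion::Fast || (mulHasContract && addHasContract);
}
```

In this model, fmul/fadd without the contract flag (as in the device library bitcode) are fused under 'fast' but left alone under 'fast-honor-pragmas', which is the behavior the patch aims for.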

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

yaxunl created this revision. Oct 26 2020, 10:56 AM

Herald added a subscriber: tpr. Oct 26 2020, 10:56 AM

yaxunl requested review of this revision. Oct 26 2020, 10:56 AM

I have objections to the code change here. I'll leave the conceptual question to other people interested in the HIP toolchain.

In D90174#2354249, @rjmccall wrote:

I have objections to the code change here. I'll leave the conceptual question to other people interested in the HIP toolchain.

Is it OK to introduce a clang codegen option e.g. -fp-contract-backend=x to control the backend fp fuse option? By default it matches -ffp-contract language option, but allows being overridden by explicit option. Then HIP toolchain can use it to override the backend fp fuse option.

tra added inline comments. Oct 26 2020, 11:42 AM

clang/lib/CodeGen/BackendUtil.cpp
502	I don't think it's a good idea to force this. Perhaps a better way to address this would be to set HIP-specific default to Standard where CUDA does it: https://github.com/llvm/llvm-project/blob/master/clang/lib/Frontend/CompilerInvocation.cpp#L2415 Currently HIP inherits this setting from CUDA.

yaxunl added inline comments. Oct 26 2020, 11:53 AM

clang/lib/CodeGen/BackendUtil.cpp
502	We want to keep -ffp-contract=fast for frontend so that we can continue emitting fmul/fadd insts with contract flag in IR for HIP programs. We only want to change the backend fp fuse option. Currently there is no separate clang option to set backend fp fuse option.

Argh, sorry! I meant to say "I have *no* objections".

tra added inline comments. Oct 26 2020, 3:47 PM

clang/lib/CodeGen/BackendUtil.cpp
502	I do not see any references to `AllowFPOpFusion` anywhere under `clang/` other than in this function. Perhaps I'm missing something. How/where does it make a difference in the front-end other than setting the option for the back-end?

yaxunl added inline comments. Oct 26 2020, 5:29 PM

clang/lib/CodeGen/BackendUtil.cpp
502	-ffp-contract not only sets the backend fp fuse option but also calls setFPContractMode (https://github.com/llvm/llvm-project/blob/2e204e23911b1f8bd1463535da40c6e48747a138/clang/include/clang/Basic/LangOptions.h#L413), which enables allowFPContractAcrossStatement (https://github.com/llvm/llvm-project/blob/2e204e23911b1f8bd1463535da40c6e48747a138/clang/include/clang/Basic/LangOptions.h#L439), which sets llvm::FastMathFlags (https://github.com/llvm/llvm-project/blob/d3205bbca3e0002d76282878986993e7e7994779/clang/lib/CodeGen/CodeGenFunction.cpp#L129), which causes fmul/fadd to have the contract flag.
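
The frontend side of this chain can be sketched as a standalone model. It is an illustration under stated assumptions, not clang's implementation: FPMode mirrors the FPModeKind values in this patch, while the Pragma enum and the functions frontendMode, effectiveMode, and emitsContractFlag are hypothetical names. The model assumes a contract pragma in scope replaces the command-line default for that scope, and that only the across-statement mode puts the contract flag on fmul/fadd.

```cpp
#include <cassert>

// Mirrors the FPModeKind values after this patch.
enum class FPMode { Off, On, Fast, FastHonorPragmas };
// Hypothetical: the innermost active contract pragma, if any.
enum class Pragma { None, Off, On, Fast };

// The patch folds FastHonorPragmas into Fast for frontend purposes.
FPMode frontendMode(FPMode langDefault) {
  return langDefault == FPMode::FastHonorPragmas ? FPMode::Fast : langDefault;
}

// Assumption: a pragma in scope replaces the command-line default.
FPMode effectiveMode(FPMode langDefault, Pragma pragma) {
  switch (pragma) {
  case Pragma::Off:
    return FPMode::Off;
  case Pragma::On:
    return FPMode::On;
  case Pragma::Fast:
    return FPMode::Fast;
  case Pragma::None:
    break;
  }
  return frontendMode(langDefault);
}

// Only the across-statement mode marks fmul/fadd with the contract flag.
bool emitsContractFlag(FPMode langDefault, Pragma pragma) {
  return effectiveMode(langDefault, pragma) == FPMode::Fast;
}
```

Under this model, a region with a contract(off) pragma emits no contract flag even when the default is fast, so whether the backend still fuses there depends entirely on the backend fp fuse option.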

tra added a subscriber: scanon. Oct 27 2020, 10:11 AM

tra added inline comments.

clang/lib/CodeGen/BackendUtil.cpp
502	Thank you. I see. I'm still uncomfortable with growing a target-specific quirk in FP behavior that can't be overridden from the command line, but considering that it's limited to HIP only, it may be OK short-term. I think a similar issue (fast-math does not allow changing contraction mode) was discussed last year: http://lists.llvm.org/pipermail/llvm-dev/2019-July/133787.html One thing I know about FP is that there are more nuances than I'm aware of. It may be worth asking someone with more experience with this. @scanon -- do we have any better options to honor contraction pragmas with implicitly-enabled fast-math?

rjmccall added inline comments. Oct 27 2020, 11:26 AM

clang/lib/CodeGen/BackendUtil.cpp
502	In the abstract, it seems reasonable for pragmas to override a global fast-math setting.

yaxunl added inline comments. Oct 28 2020, 10:05 AM

clang/lib/CodeGen/BackendUtil.cpp
502	I think we probably need to introduce a new value `faststd` for the `-ffp-contract=` option, which is `fast` for FE and `Standard` for BE. In this case, fp add/mult can be fused by default unless disabled by a pragma, and the BE respects the restrictions imposed by pragmas in the FE. I think this mode is more useful than the original `fast` mode.

tra added inline comments. Oct 28 2020, 10:13 AM

clang/lib/CodeGen/BackendUtil.cpp
502	This should work, I think.

introduce faststd as value for -ffp-contract and use it for HIP by default.

Herald added subscribers: dexonsmith, dang. Oct 31 2020, 9:31 AM

LGTM, but I'll defer to @rjmccall for the approval.

I agree this is useful. However, you need to update the manual to cover faststd.

Could you also add an IRGen test for this rather than only testing CUDA assembly output?

In D90174#2370336, @rjmccall wrote:

I agree this is useful. However, you need to update the manual to cover faststd.

will update the manual.

clang/test/CodeGenCUDA/fp-contract.cu
203	@rjmccall I have IRGen checks in this test. Are they sufficient? Thanks.

updated manual

Hmm. Do we actually want this behavior of fast overriding pragmas? What do other compilers do here? It might be reasonable to just treat this as a bug.

clang/docs/LanguageExtensions.rst

3213

Suggestion:

This can be useful when fast contraction is otherwise enabled for the translation unit
with the ``-ffp-contract=faststd`` flag. Note that ``-ffp-contract=fast`` will override
pragmas to fuse multiply and addition across statements regardless of any controlling
pragmas.
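
As a rough illustration of the pragma usage this documentation text describes, the sketch below disables contraction for one function body. The function name mulAddNoContract is hypothetical, and the __clang__ guard is an addition so that other compilers simply skip the pragma. Under -ffp-contract=fast-honor-pragmas the multiply and add in this function are expected to stay separate, whereas under -ffp-contract=fast the discussion in this review indicates the backend may still fuse them.

```cpp
// Computes a * b + c with FP contraction turned off for this function body.
// The pragma has to appear at file scope or at the start of a compound
// statement, so it is placed first in the function.
double mulAddNoContract(double a, double b, double c) {
#if defined(__clang__)
#pragma clang fp contract(off)
#endif
  double product = a * b; // rounded to double before the addition
  return product + c;
}
```

A call such as mulAddNoContract(x, y, z) then behaves like two separately rounded operations wherever the compiler honors the pragma.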

clang/docs/UsersManual.rst

1339–1341

"...with the ``FP_CONTRACT`` and ``clang fp contract`` pragmas. Please refer..."

GCC doesn't respect the pragma, so "what other compilers do" is not a particularly useful metric.

(If you tell GCC to respect the pragma via -std=c17 or similar, then -ffp-contract=fast overrides it just like clang's current behavior: https://godbolt.org/z/5dxxGb)

revised manual by John's comments

In D90174#2371577, @rjmccall wrote:

Hmm. Do we actually want this behavior of fast overriding pragmas? What do other compilers do here? It might be reasonable to just treat this as a bug.

I think clang is just trying to follow gcc's behavior. However, this is undesirable in certain cases. Introducing 'faststd' gives us more choices to avoid the undesirable behavior.

In D90174#2373829, @yaxunl wrote:

In D90174#2371577, @rjmccall wrote:

Hmm. Do we actually want this behavior of fast overriding pragmas? What do other compilers do here? It might be reasonable to just treat this as a bug.

I think clang is just trying to follow gcc's behavior. However, this is undesirable in certain cases. Introducing 'faststd' gives us more choices to avoid the undesirable behavior.

For other compilers:

MSVC respects pragma with /fp:fast option:

https://godbolt.org/z/3rja55

Intel compiler also respects fp_contract pragma with -fp-model fast={1|2} option:

https://godbolt.org/z/fez86h

nvcc by default always fuses across statements and does not support pragmas to control fp contract:

https://docs.nvidia.com/cuda/floating-point/index.html#controlling-fused-multiply-add

@rjmccall ping. Any further concerns for this patch? Thanks.

Okay. It sounds like strict compatibility with GCC implies ignoring pragmas in fast, and that's what we're most concerned with, since this is originally a GCC option. So the only question I have now is whether faststd is the best name for this. Does anyone want to suggest an alternative?

I do not much like faststd, as there's nothing "standard" about it. I do not, however, have a better suggestion off the top of my head. Let's pause and consider the name a little bit longer, please?

How about fast-constrained, fast-limited, fast-restricted, or fast-restrained?

fast-strict? Sounds sort of oxymoronic, but it's fast while also being strict about honoring pragmas. I don't have any great ideas here.

Strictly speaking, fp-contract=fast probably should have been a separate flag entirely (since there's no _expression_ being contracted in fast). Unfortunately, that ship has sailed, and it does constrain our ability to choose an accurate name somewhat.

What if we just spell it out? fast-respect-pragma? fast-when-unspecified? I don't think that we really need to try to be as brief as possible with this one.

In D90174#2387518, @scanon wrote:

Strictly speaking, fp-contract=fast probably should have been a separate flag entirely (since there's no _expression_ being contracted in fast). Unfortunately, that ship has sailed, and it does constrain our ability to choose an accurate name somewhat.

What if we just spell it out? fast-respect-pragma? fast-when-unspecified? I don't think that we really need to try to be as brief as possible with this one.

This sounds reasonable. We already have -fhonor-nans and -fhonor-infinities. Should we make it fast-honor-pragma for consistency?

In D90174#2389269, @tra wrote:

In D90174#2387518, @scanon wrote:

Strictly speaking, fp-contract=fast probably should have been a separate flag entirely (since there's no _expression_ being contracted in fast). Unfortunately, that ship has sailed, and it does constrain our ability to choose an accurate name somewhat.

What if we just spell it out? fast-respect-pragma? fast-when-unspecified? I don't think that we really need to try to be as brief as possible with this one.

This sounds reasonable. We already have -fhonor-nans and -fhonor-infinities. Should we make it fast-honor-pragma for consistency?

+1 with fast-honor-pragma

Probably should be pluralized for consistency, fast-honor-pragmas, but yeah, that's fine with me.

rename faststd to fast-honor-pragmas

ping

I'm fine with this.

This revision is now accepted and ready to land. Nov 19 2020, 8:44 AM

Closed by commit rGcb08558caa3b: [HIP] Fix regressions due to fp contract change (authored by yaxunl). Nov 24 2020, 5:10 AM

This revision was automatically updated to reflect the committed changes.

yaxunl added a commit: rGcb08558caa3b: [HIP] Fix regressions due to fp contract change.

Herald added a project: Restricted Project. Nov 24 2020, 5:10 AM

tra mentioned this in D112760: Require 'contract' fast-math flag for FMA generation. Nov 2 2021, 12:38 PM

awarzynski mentioned this in D136080: [flang] Add -ffp-contract option processing. Oct 24 2022, 1:33 AM

Revision Contents

Path                                         Size
clang/docs/LanguageExtensions.rst            5 lines
clang/docs/UsersManual.rst                   11 lines
clang/include/clang/Basic/LangOptions.h      15 lines
clang/include/clang/Driver/Options.td        10 lines
clang/lib/CodeGen/BackendUtil.cpp            1 line
clang/lib/Frontend/CompilerInvocation.cpp    17 lines
clang/lib/Sema/SemaAttr.cpp                  2 lines
clang/test/CodeGenCUDA/fp-contract.cu        290 lines
clang/test/Driver/autocomplete.c             1 line
Diff 307319

clang/docs/LanguageExtensions.rst

Show First 20 Lines • Show All 3,203 Lines • ▼ Show 20 Lines	for(...) {
#pragma clang fp contract(fast)		#pragma clang fp contract(fast)
a = b[i] * c[i];		a = b[i] * c[i];
d[i] += a;		d[i] += a;
}		}


The pragma can also be used with ``off`` which turns FP contraction off for a		The pragma can also be used with ``off`` which turns FP contraction off for a
section of the code. This can be useful when fast contraction is otherwise		section of the code. This can be useful when fast contraction is otherwise
enabled for the translation unit with the ``-ffp-contract=fast`` flag.		enabled for the translation unit with the ``-ffp-contract=fast-honor-pragmas`` flag.
		Note that ``-ffp-contract=fast`` will override pragmas to fuse multiply and
		addition across statements regardless of any controlling pragmas.

``#pragma clang fp exceptions`` specifies floating point exception behavior. It		``#pragma clang fp exceptions`` specifies floating point exception behavior. It
may take one the the values: ``ignore``, ``maytrap`` or ``strict``. Meaning of		may take one the the values: ``ignore``, ``maytrap`` or ``strict``. Meaning of
these values is same as for `constrained floating point intrinsics <http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics>`_.		these values is same as for `constrained floating point intrinsics <http://llvm.org/docs/LangRef.html#constrained-floating-point-intrinsics>`_.

.. code-block:: c++		.. code-block:: c++

{		{

clang/docs/UsersManual.rst

Show First 20 Lines • Show All 1,330 Lines • ▼ Show 20 Lines	.. option:: -ffp-contract=<value>
Specify when the compiler is permitted to form fused floating-point		Specify when the compiler is permitted to form fused floating-point
operations, such as fused multiply-add (FMA). Fused operations are		operations, such as fused multiply-add (FMA). Fused operations are
permitted to produce more precise results than performing the same		permitted to produce more precise results than performing the same
operations separately.		operations separately.

The C standard permits intermediate floating-point results within an		The C standard permits intermediate floating-point results within an
expression to be computed with more precision than their type would		expression to be computed with more precision than their type would
normally allow. This permits operation fusing, and Clang takes advantage		normally allow. This permits operation fusing, and Clang takes advantage
of this by default. This behavior can be controlled with the		of this by default. This behavior can be controlled with the ``FP_CONTRACT``
``FP_CONTRACT`` pragma. Please refer to the pragma documentation for a		and ``clang fp contract`` pragmas. Please refer to the pragma documentation
description of how the pragma interacts with this option.		for a description of how the pragmas interact with this option.

Valid values are:		Valid values are:

* ``fast`` (everywhere)		* ``fast`` (fuse across statements disregarding pragmas, default for CUDA)
* ``on`` (according to FP_CONTRACT pragma, default)		* ``on`` (fuse in the same statement unless dictated by pragmas, default for languages other than CUDA/HIP)
* ``off`` (never fuse)		* ``off`` (never fuse)
		* ``fast-honor-pragmas`` (fuse across statements unless dictated by pragmas, default for HIP)

.. _opt_fhonor-infinities:		.. _opt_fhonor-infinities:

-f[no-]honor-infinities		-f[no-]honor-infinities

If both ``-fno-honor-infinities`` and ``-fno-honor-nans`` are used,		If both ``-fno-honor-infinities`` and ``-fno-honor-nans`` are used,
has the same effect as specifying ``-ffinite-math-only``.		has the same effect as specifying ``-ffinite-math-only``.


clang/include/clang/Basic/LangOptions.h

Show First 20 Lines • Show All 181 Lines • ▼ Show 20 Lines	public:

enum FPModeKind {		enum FPModeKind {
// Disable the floating point pragma		// Disable the floating point pragma
FPM_Off,		FPM_Off,

// Enable the floating point pragma		// Enable the floating point pragma
FPM_On,		FPM_On,

// Aggressively fuse FP ops (E.g. FMA).		// Aggressively fuse FP ops (E.g. FMA) disregarding pragmas.
FPM_Fast		FPM_Fast,

		// Aggressively fuse FP ops and honor pragmas.
		FPM_FastHonorPragmas
};		};

/// Alias for RoundingMode::NearestTiesToEven.		/// Alias for RoundingMode::NearestTiesToEven.
static constexpr unsigned FPR_ToNearest =		static constexpr unsigned FPR_ToNearest =
static_cast<unsigned>(llvm::RoundingMode::NearestTiesToEven);		static_cast<unsigned>(llvm::RoundingMode::NearestTiesToEven);

/// Possible floating point exception behavior.		/// Possible floating point exception behavior.
enum FPExceptionModeKind {		enum FPExceptionModeKind {
▲ Show 20 Lines • Show All 212 Lines • ▼ Show 20 Lines
public:		public:
FPOptions() : Value(0) {		FPOptions() : Value(0) {
setFPContractMode(LangOptions::FPM_Off);		setFPContractMode(LangOptions::FPM_Off);
setRoundingMode(static_cast<RoundingMode>(LangOptions::FPR_ToNearest));		setRoundingMode(static_cast<RoundingMode>(LangOptions::FPR_ToNearest));
setFPExceptionMode(LangOptions::FPE_Ignore);		setFPExceptionMode(LangOptions::FPE_Ignore);
}		}
explicit FPOptions(const LangOptions &LO) {		explicit FPOptions(const LangOptions &LO) {
Value = 0;		Value = 0;
setFPContractMode(LO.getDefaultFPContractMode());		// The language fp contract option FPM_FastHonorPragmas has the same effect
		// as FPM_Fast in frontend. For simplicity, use FPM_Fast uniformly in
		// frontend.
		auto LangOptContractMode = LO.getDefaultFPContractMode();
		if (LangOptContractMode == LangOptions::FPM_FastHonorPragmas)
		LangOptContractMode = LangOptions::FPM_Fast;
		setFPContractMode(LangOptContractMode);
setRoundingMode(LO.getFPRoundingMode());		setRoundingMode(LO.getFPRoundingMode());
setFPExceptionMode(LO.getFPExceptionMode());		setFPExceptionMode(LO.getFPExceptionMode());
setAllowFPReassociate(LO.AllowFPReassoc);		setAllowFPReassociate(LO.AllowFPReassoc);
setNoHonorNaNs(LO.NoHonorNaNs);		setNoHonorNaNs(LO.NoHonorNaNs);
setNoHonorInfs(LO.NoHonorInfs);		setNoHonorInfs(LO.NoHonorInfs);
setNoSignedZero(LO.NoSignedZero);		setNoSignedZero(LO.NoSignedZero);
setAllowReciprocal(LO.AllowRecip);		setAllowReciprocal(LO.AllowRecip);
setAllowApproxFunc(LO.ApproxFunc);		setAllowApproxFunc(LO.ApproxFunc);

clang/include/clang/Driver/Options.td


	Show First 20 Lines • Show All 1,270 Lines • ▼ Show 20 Lines
	// This option was originally misspelt "infinites" [sic].			// This option was originally misspelt "infinites" [sic].
	def : Flag<["-"], "fhonor-infinites">, Alias<fhonor_infinities>;			def : Flag<["-"], "fhonor-infinites">, Alias<fhonor_infinities>;
	def : Flag<["-"], "fno-honor-infinites">, Alias<fno_honor_infinities>;			def : Flag<["-"], "fno-honor-infinites">, Alias<fno_honor_infinities>;
	def frounding_math : Flag<["-"], "frounding-math">, Group<f_Group>, Flags<[CC1Option]>;			def frounding_math : Flag<["-"], "frounding-math">, Group<f_Group>, Flags<[CC1Option]>;
	def fno_rounding_math : Flag<["-"], "fno-rounding-math">, Group<f_Group>, Flags<[CC1Option]>;			def fno_rounding_math : Flag<["-"], "fno-rounding-math">, Group<f_Group>, Flags<[CC1Option]>;
	def ftrapping_math : Flag<["-"], "ftrapping-math">, Group<f_Group>, Flags<[CC1Option]>;			def ftrapping_math : Flag<["-"], "ftrapping-math">, Group<f_Group>, Flags<[CC1Option]>;
	def fno_trapping_math : Flag<["-"], "fno-trapping-math">, Group<f_Group>, Flags<[CC1Option]>;			def fno_trapping_math : Flag<["-"], "fno-trapping-math">, Group<f_Group>, Flags<[CC1Option]>;
	def ffp_contract : Joined<["-"], "ffp-contract=">, Group<f_Group>,			def ffp_contract : Joined<["-"], "ffp-contract=">, Group<f_Group>,
	Flags<[CC1Option]>, HelpText<"Form fused FP ops (e.g. FMAs): fast (everywhere)"			Flags<[CC1Option]>, HelpText<"Form fused FP ops (e.g. FMAs):"
	" \| on (according to FP_CONTRACT pragma) \| off (never fuse). Default"			" fast (fuses across statements disregarding pragmas)"
	" is 'fast' for CUDA/HIP and 'on' otherwise.">, Values<"fast,on,off">;			" \| on (only fuses in the same statement unless dictated by pragmas)"
				" \| off (never fuses)"
				" \| fast-honor-pragmas (fuses across statements unless diectated by pragmas)."
				" Default is 'fast' for CUDA, 'fast-honor-pragmas' for HIP, and 'on' otherwise.">,
				Values<"fast,on,off,fast-honor-pragmas">;

	defm strict_float_cast_overflow : OptOutFFlag<"strict-float-cast-overflow",			defm strict_float_cast_overflow : OptOutFFlag<"strict-float-cast-overflow",
	"Assume that overflowing float-to-int casts are undefined (default)",			"Assume that overflowing float-to-int casts are undefined (default)",
	"Relax language rules and try to match the behavior of the target's native float-to-int conversion instructions">;			"Relax language rules and try to match the behavior of the target's native float-to-int conversion instructions">;

	def ffor_scope : Flag<["-"], "ffor-scope">, Group<f_Group>;			def ffor_scope : Flag<["-"], "ffor-scope">, Group<f_Group>;
	def fno_for_scope : Flag<["-"], "fno-for-scope">, Group<f_Group>;			def fno_for_scope : Flag<["-"], "fno-for-scope">, Group<f_Group>;


clang/lib/CodeGen/BackendUtil.cpp

Show First 20 Lines • Show All 475 Lines • ▼ Show 20 Lines	static bool initTargetOptions(DiagnosticsEngine &Diags,
// Set FP fusion mode.		// Set FP fusion mode.
switch (LangOpts.getDefaultFPContractMode()) {		switch (LangOpts.getDefaultFPContractMode()) {
case LangOptions::FPM_Off:		case LangOptions::FPM_Off:
// Preserve any contraction performed by the front-end. (Strict performs		// Preserve any contraction performed by the front-end. (Strict performs
// splitting of the muladd intrinsic in the backend.)		// splitting of the muladd intrinsic in the backend.)
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;		Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
break;		break;
case LangOptions::FPM_On:		case LangOptions::FPM_On:
		case LangOptions::FPM_FastHonorPragmas:
Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;		Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
break;		break;
case LangOptions::FPM_Fast:		case LangOptions::FPM_Fast:
Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;		Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
break;		break;
}		}

Options.UseInitArray = CodeGenOpts.UseInitArray;		Options.UseInitArray = CodeGenOpts.UseInitArray;
Options.DisableIntegratedAS = CodeGenOpts.DisableIntegratedAS;		Options.DisableIntegratedAS = CodeGenOpts.DisableIntegratedAS;
Options.CompressDebugSections = CodeGenOpts.getCompressDebugSections();		Options.CompressDebugSections = CodeGenOpts.getCompressDebugSections();
Options.RelaxELFRelocations = CodeGenOpts.RelaxELFRelocations;		Options.RelaxELFRelocations = CodeGenOpts.RelaxELFRelocations;

// Set EABI version.		// Set EABI version.
Options.EABIVersion = TargetOpts.EABIVersion;		Options.EABIVersion = TargetOpts.EABIVersion;

if (LangOpts.SjLjExceptions)		if (LangOpts.SjLjExceptions)
Options.ExceptionModel = llvm::ExceptionHandling::SjLj;		Options.ExceptionModel = llvm::ExceptionHandling::SjLj;
if (LangOpts.SEHExceptions)		if (LangOpts.SEHExceptions)
Options.ExceptionModel = llvm::ExceptionHandling::WinEH;		Options.ExceptionModel = llvm::ExceptionHandling::WinEH;
if (LangOpts.DWARFExceptions)		if (LangOpts.DWARFExceptions)
Options.ExceptionModel = llvm::ExceptionHandling::DwarfCFI;		Options.ExceptionModel = llvm::ExceptionHandling::DwarfCFI;
if (LangOpts.WasmExceptions)		if (LangOpts.WasmExceptions)
Options.ExceptionModel = llvm::ExceptionHandling::Wasm;		Options.ExceptionModel = llvm::ExceptionHandling::Wasm;

Options.NoInfsFPMath = LangOpts.NoHonorInfs;		Options.NoInfsFPMath = LangOpts.NoHonorInfs;
Options.NoNaNsFPMath = LangOpts.NoHonorNaNs;		Options.NoNaNsFPMath = LangOpts.NoHonorNaNs;

clang/lib/Frontend/CompilerInvocation.cpp

Show First 20 Lines • Show All 2,418 Lines • ▼ Show 20 Lines	if (Opts.IncludeDefaultHeader) {
} else {		} else {
PPOpts.Includes.push_back("opencl-c.h");		PPOpts.Includes.push_back("opencl-c.h");
}		}
}		}
}		}

Opts.HIP = IK.getLanguage() == Language::HIP;		Opts.HIP = IK.getLanguage() == Language::HIP;
Opts.CUDA = IK.getLanguage() == Language::CUDA \|\| Opts.HIP;		Opts.CUDA = IK.getLanguage() == Language::CUDA \|\| Opts.HIP;
if (Opts.CUDA)		if (Opts.HIP) {
// Set default FP_CONTRACT to FAST.		// HIP toolchain does not support 'Fast' FPOpFusion in backends since it
		// fuses multiplication/addition instructions without contract flag from
		// device library functions in LLVM bitcode, which causes accuracy loss in
		// certain math functions, e.g. tan(-1e20) becomes -0.933 instead of 0.8446.
		// For device library functions in bitcode to work, 'Strict' or 'Standard'
		// FPOpFusion options in backends is needed. Therefore 'fast-honor-pragmas'
		// FP contract option is used to allow fuse across statements in frontend
		// whereas respecting contract flag in backend.
		Opts.setDefaultFPContractMode(LangOptions::FPM_FastHonorPragmas);
		} else if (Opts.CUDA) {
		// Allow fuse across statements disregarding pragmas.
Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);		Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
		}

Opts.RenderScript = IK.getLanguage() == Language::RenderScript;		Opts.RenderScript = IK.getLanguage() == Language::RenderScript;
if (Opts.RenderScript) {		if (Opts.RenderScript) {
Opts.NativeHalfType = 1;		Opts.NativeHalfType = 1;
Opts.NativeHalfArgsAndReturns = 1;		Opts.NativeHalfArgsAndReturns = 1;
}		}

// OpenCL and C++ both have bool, true, false keywords.		// OpenCL and C++ both have bool, true, false keywords.
▲ Show 20 Lines • Show All 900 Lines • ▼ Show 20 Lines	#include "clang/Basic/LangStandards.def"
if (Arg *A = Args.getLastArg(OPT_ffp_contract)) {		if (Arg *A = Args.getLastArg(OPT_ffp_contract)) {
StringRef Val = A->getValue();		StringRef Val = A->getValue();
if (Val == "fast")		if (Val == "fast")
Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);		Opts.setDefaultFPContractMode(LangOptions::FPM_Fast);
else if (Val == "on")		else if (Val == "on")
Opts.setDefaultFPContractMode(LangOptions::FPM_On);		Opts.setDefaultFPContractMode(LangOptions::FPM_On);
else if (Val == "off")		else if (Val == "off")
Opts.setDefaultFPContractMode(LangOptions::FPM_Off);		Opts.setDefaultFPContractMode(LangOptions::FPM_Off);
		else if (Val == "fast-honor-pragmas")
		Opts.setDefaultFPContractMode(LangOptions::FPM_FastHonorPragmas);
else		else
Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;		Diags.Report(diag::err_drv_invalid_value) << A->getAsString(Args) << Val;
}		}

auto FPRM = llvm::RoundingMode::NearestTiesToEven;		auto FPRM = llvm::RoundingMode::NearestTiesToEven;
if (Args.hasArg(OPT_frounding_math)) {		if (Args.hasArg(OPT_frounding_math)) {
FPRM = llvm::RoundingMode::Dynamic;		FPRM = llvm::RoundingMode::Dynamic;
}		}
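
The option handling above, together with the backend behaviour described in the comments of fp-contract.cu, can be summarised in a small standalone model. This is a minimal sketch rather than the patch's code: `ContractMode`, `BackendFusion`, `parseContractMode`, `backendFusionFor` and `backendMayFuse` are hypothetical names, and the mapping from language option to backend fusion mode is an assumption read off the test comments (only `fast` selects the fast fusion mode, every other value leaves the backend in standard mode).

```cpp
// Illustrative model only: these enums and helpers are hypothetical stand-ins,
// not the real clang::LangOptions or llvm::FPOpFusion declarations.
#include <cassert>
#include <optional>
#include <string_view>

enum class ContractMode { Off, On, Fast, FastHonorPragmas };
enum class BackendFusion { Standard, Fast };

// Mirrors the string comparison chain in the option parser. An unknown value
// yields std::nullopt where the real code reports err_drv_invalid_value.
std::optional<ContractMode> parseContractMode(std::string_view val) {
  if (val == "fast") return ContractMode::Fast;
  if (val == "on") return ContractMode::On;
  if (val == "off") return ContractMode::Off;
  if (val == "fast-honor-pragmas") return ContractMode::FastHonorPragmas;
  return std::nullopt;
}

// Assumed mapping, taken from the comments in fp-contract.cu: only 'fast'
// puts the backend into the fast fusion mode.
BackendFusion backendFusionFor(ContractMode mode) {
  return mode == ContractMode::Fast ? BackendFusion::Fast
                                    : BackendFusion::Standard;
}

// Fast fusion ignores the contract flag on an fmul/fadd pair; standard
// fusion only fuses a pair that carries the flag.
bool backendMayFuse(BackendFusion fusion, bool hasContractFlag) {
  return fusion == BackendFusion::Fast || hasContractFlag;
}
```

Under this model, `backendMayFuse(backendFusionFor(ContractMode::FastHonorPragmas), false)` is false, which is the property the HIP default relies on: an fmul/fadd pair emitted without the contract flag (for example under `#pragma clang fp contract(on)` across statements) is left unfused.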

clang/lib/Sema/SemaAttr.cpp

		case LangOptions::FPM_On:
NewFPFeatures.setAllowFPContractWithinStatement();		NewFPFeatures.setAllowFPContractWithinStatement();
break;		break;
case LangOptions::FPM_Fast:		case LangOptions::FPM_Fast:
NewFPFeatures.setAllowFPContractAcrossStatement();		NewFPFeatures.setAllowFPContractAcrossStatement();
break;		break;
case LangOptions::FPM_Off:		case LangOptions::FPM_Off:
NewFPFeatures.setDisallowFPContract();		NewFPFeatures.setDisallowFPContract();
break;		break;
		case LangOptions::FPM_FastHonorPragmas:
		llvm_unreachable("Should not happen");
}		}
FpPragmaStack.Act(Loc, Sema::PSK_Set, StringRef(), NewFPFeatures);		FpPragmaStack.Act(Loc, Sema::PSK_Set, StringRef(), NewFPFeatures);
CurFPFeatures = NewFPFeatures.applyOverrides(getLangOpts());		CurFPFeatures = NewFPFeatures.applyOverrides(getLangOpts());
}		}

void Sema::ActOnPragmaFPReassociate(SourceLocation Loc, bool IsEnabled) {		void Sema::ActOnPragmaFPReassociate(SourceLocation Loc, bool IsEnabled) {
FPOptionsOverride NewFPFeatures = CurFPFeatureOverrides();		FPOptionsOverride NewFPFeatures = CurFPFeatureOverrides();
NewFPFeatures.setAllowFPReassociateOverride(IsEnabled);		NewFPFeatures.setAllowFPReassociateOverride(IsEnabled);

clang/test/CodeGenCUDA/fp-contract.cu

	// REQUIRES: x86-registered-target			// REQUIRES: x86-registered-target, nvptx-registered-target, amdgpu-registered-target
	// REQUIRES: nvptx-registered-target

	// By default we should fuse multiply/add into fma instruction.			// By default CUDA uses -ffp-contract=fast, HIP uses -ffp-contract=fast-honor-pragmas.
				// we should fuse multiply/add into fma instruction.
				// In IR, fmul/fadd instructions with contract flag are emitted.
				// In backend
				// nvptx - assumes fast fp fuse option, which fuses
				// mult/add insts disregarding contract flag and
				// llvm.fmuladd intrinsics.
				// amdgcn - assumes standard fp fuse option, which only
				// fuses mult/add insts with contract flag and
				// llvm.fmuladd intrinsics.

				// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
				// RUN: -disable-llvm-passes -o - %s \
				// RUN: \| FileCheck -check-prefixes=COMMON,NV-ON %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -target-cpu gfx906 -disable-llvm-passes -o - -x hip %s \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-ON %s
	// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \			// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
	// RUN: -disable-llvm-passes -o - %s \| FileCheck -check-prefix ENABLED %s			// RUN: -O3 -o - %s \
				// RUN: \| FileCheck -check-prefixes=COMMON,NV-OPT-FAST %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x hip %s \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-FASTSTD %s

				// Check separate compile/backend steps corresponding to -save-temps.

				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
				// RUN: -O3 -disable-llvm-passes -target-cpu gfx906 -o %t.ll -x hip %s
				// RUN: cat %t.ll \| FileCheck -check-prefixes=COMMON,AMD-OPT-FAST-IR %s
				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x ir %t.ll \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-FASTSTD %s

	// Explicit -ffp-contract=fast			// Explicit -ffp-contract=fast
				// In IR, fmul/fadd instructions with contract flag are emitted.
				// In backend
				// nvptx/amdgcn - assumes fast fp fuse option, which fuses
				// mult/add insts disregarding contract flag and
				// llvm.fmuladd intrinsics.

	// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \			// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
	// RUN: -ffp-contract=fast -disable-llvm-passes -o - %s \			// RUN: -ffp-contract=fast -disable-llvm-passes -o - %s \
	// RUN: \| FileCheck -check-prefix ENABLED %s			// RUN: \| FileCheck -check-prefixes=COMMON,NV-ON %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -target-cpu gfx906 -disable-llvm-passes -o - -x hip %s \
				// RUN: -ffp-contract=fast \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-ON %s
				// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
				// RUN: -O3 -o - %s \
				// RUN: -ffp-contract=fast \
				// RUN: \| FileCheck -check-prefixes=COMMON,NV-OPT-FAST %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x hip %s \
				// RUN: -ffp-contract=fast \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-FAST %s

				// Check separate compile/backend steps corresponding to -save-temps.
				// When input is IR, -ffp-contract has no effect. Backend uses the
				// default FP fuse option.

				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
				// RUN: -ffp-contract=fast \
				// RUN: -O3 -disable-llvm-passes -target-cpu gfx906 -o %t.ll -x hip %s
				// RUN: cat %t.ll \| FileCheck -check-prefixes=COMMON,AMD-OPT-FAST-IR %s
				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x ir %t.ll \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-FASTSTD %s

				// Explicit -ffp-contract=fast-honor-pragmas
				// In IR, fmul/fadd instructions with contract flag are emitted.
				// In backend
				// nvptx/amdgcn - assumes standard fp fuse option, which only
				// fuses mult/add insts with contract flag or
				// llvm.fmuladd intrinsics.

				// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
				// RUN: -ffp-contract=fast-honor-pragmas -disable-llvm-passes -o - %s \
				// RUN: \| FileCheck -check-prefixes=COMMON,NV-ON %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -target-cpu gfx906 -disable-llvm-passes -o - -x hip %s \
				// RUN: -ffp-contract=fast-honor-pragmas \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-ON %s
				// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
				// RUN: -O3 -o - %s \
				// RUN: -ffp-contract=fast-honor-pragmas \
				// RUN: \| FileCheck -check-prefixes=COMMON,NV-OPT-FASTSTD %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x hip %s \
				// RUN: -ffp-contract=fast-honor-pragmas \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-FASTSTD %s

				// Check separate compile/backend steps corresponding to -save-temps.
				// When input is IR, -ffp-contract has no effect. Backend uses the
				// default FP fuse option.

				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
				// RUN: -ffp-contract=fast-honor-pragmas \
				// RUN: -O3 -disable-llvm-passes -target-cpu gfx906 -o %t.ll -x hip %s
				// RUN: cat %t.ll \| FileCheck -check-prefixes=COMMON,AMD-OPT-FAST-IR %s
				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x ir %t.ll \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-FASTSTD %s

	// Explicit -ffp-contract=on -- fusing by front-end.			// Explicit -ffp-contract=on -- fusing by front-end.
				// In IR,
				// mult/add in the same statement - llvm.fmuladd intrinsic emitted
				// mult/add in different statement - fmul/fadd instructions without
				// contract flag are emitted.
				// In backend
				// nvptx/amdgcn - assumes standard fp fuse option, which only
				// fuses mult/add insts with contract flag or
				// llvm.fmuladd intrinsics.

	// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \			// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
	// RUN: -ffp-contract=on -disable-llvm-passes -o - %s \			// RUN: -ffp-contract=on -disable-llvm-passes -o - %s \
	// RUN: \| FileCheck -check-prefix ENABLED %s			// RUN: \| FileCheck -check-prefixes=COMMON,NV-ON %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -target-cpu gfx906 -disable-llvm-passes -o - -x hip %s \
				// RUN: -ffp-contract=on \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-ON %s
				// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
				// RUN: -O3 -o - %s \
				// RUN: -ffp-contract=on \
				// RUN: \| FileCheck -check-prefixes=COMMON,NV-OPT-ON %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x hip %s \
				// RUN: -ffp-contract=on \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-ON %s

				// Check separate compile/backend steps corresponding to -save-temps.

				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
				// RUN: -ffp-contract=on \
				// RUN: -O3 -disable-llvm-passes -target-cpu gfx906 -o %t.ll -x hip %s
				// RUN: cat %t.ll \| FileCheck -check-prefixes=COMMON,AMD-OPT-ON-IR %s
				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x ir %t.ll \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-ON %s

	// Explicit -ffp-contract=off should disable instruction fusing.			// Explicit -ffp-contract=off should disable instruction fusing.
				// In IR, fmul/fadd instructions without contract flag are emitted.
				// In backend
				// nvptx/amdgcn - assumes standard fp fuse option, which only
				// fuses mult/add insts with contract flag or
				// llvm.fmuladd intrinsics.

	// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \			// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
	// RUN: -ffp-contract=off -disable-llvm-passes -o - %s \			// RUN: -ffp-contract=off -disable-llvm-passes -o - %s \
	// RUN: \| FileCheck -check-prefix DISABLED %s			// RUN: \| FileCheck -check-prefixes=COMMON,NV-OFF %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -target-cpu gfx906 -disable-llvm-passes -o - -x hip %s \
				// RUN: -ffp-contract=off \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OFF %s
				// RUN: %clang_cc1 -fcuda-is-device -triple nvptx-nvidia-cuda -S \
				// RUN: -O3 -o - %s \
				// RUN: -ffp-contract=off \
				// RUN: \| FileCheck -check-prefixes=COMMON,NV-OPT-OFF %s
				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x hip %s \
				// RUN: -ffp-contract=off \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-OFF %s

				// Check separate compile/backend steps corresponding to -save-temps.

				// RUN: %clang_cc1 -fcuda-is-device -triple amdgcn-amd-amdhsa -emit-llvm \
				// RUN: -ffp-contract=off \
				// RUN: -O3 -disable-llvm-passes -target-cpu gfx906 -o %t.ll -x hip %s
				// RUN: cat %t.ll \| FileCheck -check-prefixes=COMMON,AMD-OPT-OFF-IR %s
				// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -S \
				// RUN: -O3 -target-cpu gfx906 -o - -x ir %t.ll \
				// RUN: \| FileCheck -check-prefixes=COMMON,AMD-OPT-OFF %s

	#include "Inputs/cuda.h"			#include "Inputs/cuda.h"

				// Test multiply/add in the same statement, which can be emitted as FMA when
				// fp-contract is on or fast.
	__host__ __device__ float func(float a, float b, float c) { return a + b * c; }			__host__ __device__ float func(float a, float b, float c) { return a + b * c; }
	// ENABLED: fma.rn.f32			// COMMON-LABEL: _Z4funcfff
	// ENABLED-NEXT: st.param.f32			// NV-ON: fma.rn.f32
				// NV-ON-NEXT: st.param.f32
				// AMD-ON: v_fmac_f32_e64
				// AMD-ON-NEXT: s_setpc_b64

				// NV-OFF: mul.rn.f32
				// NV-OFF-NEXT: add.rn.f32
				// NV-OFF-NEXT: st.param.f32
				// AMD-OFF: v_mul_f32_e64
				// AMD-OFF-NEXT: v_add_f32_e64
				// AMD-OFF-NEXT: s_setpc_b64

				// NV-OPT-FAST: fma.rn.f32
				// NV-OPT-FAST-NEXT: st.param.f32
				// NV-OPT-FASTSTD: fma.rn.f32
				// NV-OPT-FASTSTD-NEXT: st.param.f32
				// NV-OPT-ON: fma.rn.f32
				// NV-OPT-ON-NEXT: st.param.f32
				// NV-OPT-OFF: mul.rn.f32
				// NV-OPT-OFF-NEXT: add.rn.f32
				// NV-OPT-OFF-NEXT: st.param.f32

				// AMD-OPT-FAST-IR: fmul contract float
				yaxunl (inline comment): @rjmccall I have IRGen checks in this test. Are they sufficient? Thanks.
				// AMD-OPT-FAST-IR: fadd contract float
				// AMD-OPT-ON-IR: @llvm.fmuladd.f32
				// AMD-OPT-OFF-IR: fmul float
				// AMD-OPT-OFF-IR: fadd float

				// AMD-OPT-FAST: v_fmac_f32_e32
				// AMD-OPT-FAST-NEXT: s_setpc_b64
				// AMD-OPT-FASTSTD: v_fmac_f32_e32
				// AMD-OPT-FASTSTD-NEXT: s_setpc_b64
				// AMD-OPT-ON: v_fmac_f32_e32
				// AMD-OPT-ON-NEXT: s_setpc_b64
				// AMD-OPT-OFF: v_mul_f32_e32
				// AMD-OPT-OFF-NEXT: v_add_f32_e32
				// AMD-OPT-OFF-NEXT: s_setpc_b64

				// Test multiply/add in the different statements, which can be emitted as
				// FMA when fp-contract is fast but not on.
				__host__ __device__ float func2(float a, float b, float c) {
				float t = b * c;
				return t + a;
				}
				// COMMON-LABEL: _Z5func2fff
				// NV-OPT-FAST: fma.rn.f32
				// NV-OPT-FAST-NEXT: st.param.f32
				// NV-OPT-FASTSTD: fma.rn.f32
				// NV-OPT-FASTSTD-NEXT: st.param.f32
				// NV-OPT-ON: mul.rn.f32
				// NV-OPT-ON: add.rn.f32
				// NV-OPT-ON-NEXT: st.param.f32
				// NV-OPT-OFF: mul.rn.f32
				// NV-OPT-OFF: add.rn.f32
				// NV-OPT-OFF-NEXT: st.param.f32

				// AMD-OPT-FAST-IR: fmul contract float
				// AMD-OPT-FAST-IR: fadd contract float
				// AMD-OPT-ON-IR: fmul float
				// AMD-OPT-ON-IR: fadd float
				// AMD-OPT-OFF-IR: fmul float
				// AMD-OPT-OFF-IR: fadd float

				// AMD-OPT-FAST: v_fmac_f32_e32
				// AMD-OPT-FAST-NEXT: s_setpc_b64
				// AMD-OPT-FASTSTD: v_fmac_f32_e32
				// AMD-OPT-FASTSTD-NEXT: s_setpc_b64
				// AMD-OPT-ON: v_mul_f32_e32
				// AMD-OPT-ON-NEXT: v_add_f32_e32
				// AMD-OPT-ON-NEXT: s_setpc_b64
				// AMD-OPT-OFF: v_mul_f32_e32
				// AMD-OPT-OFF-NEXT: v_add_f32_e32
				// AMD-OPT-OFF-NEXT: s_setpc_b64

				// Test multiply/add in different statements, which are forced
				// to be compiled with fp contract on. fmul/fadd without contract
				// flags are emitted in IR. In nvptx, they are emitted as FMA if
				// fp-contract is fast but not on, as the nvptx backend uses the same
				// fp fuse option as the front end, and the fast fp fuse option in the
				// backend fuses fadd/fmul disregarding the contract flag. In amdgcn
				// they are not fused, as amdgcn always uses the standard fp fusion
				// option, which respects the contract flag.
				__host__ __device__ float func3(float a, float b, float c) {
				#pragma clang fp contract(on)
				float t = b * c;
				return t + a;
				}
				// COMMON-LABEL: _Z5func3fff
				// NV-OPT-FAST: fma.rn.f32
				// NV-OPT-FAST-NEXT: st.param.f32
				// NV-OPT-FASTSTD: mul.rn.f32
				// NV-OPT-FASTSTD: add.rn.f32
				// NV-OPT-FASTSTD-NEXT: st.param.f32
				// NV-OPT-ON: mul.rn.f32
				// NV-OPT-ON: add.rn.f32
				// NV-OPT-ON-NEXT: st.param.f32
				// NV-OPT-OFF: mul.rn.f32
				// NV-OPT-OFF: add.rn.f32
				// NV-OPT-OFF-NEXT: st.param.f32

				// AMD-OPT-FAST-IR: fmul float
				// AMD-OPT-FAST-IR: fadd float
				// AMD-OPT-ON-IR: fmul float
				// AMD-OPT-ON-IR: fadd float
				// AMD-OPT-OFF-IR: fmul float
				// AMD-OPT-OFF-IR: fadd float

	// DISABLED: mul.rn.f32			// AMD-OPT-FAST: v_fmac_f32_e32
	// DISABLED-NEXT: add.rn.f32			// AMD-OPT-FAST-NEXT: s_setpc_b64
	// DISABLED-NEXT: st.param.f32			// AMD-OPT-FASTSTD: v_mul_f32_e32
				// AMD-OPT-FASTSTD-NEXT: v_add_f32_e32
				// AMD-OPT-FASTSTD-NEXT: s_setpc_b64
				// AMD-OPT-ON: v_mul_f32_e32
				// AMD-OPT-ON-NEXT: v_add_f32_e32
				// AMD-OPT-ON-NEXT: s_setpc_b64
				// AMD-OPT-OFF: v_mul_f32_e32
				// AMD-OPT-OFF-NEXT: v_add_f32_e32
				// AMD-OPT-OFF-NEXT: s_setpc_b64
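
The checks above distinguish a fused multiply-add from a separate multiply and add because the two forms round differently, which is how a result such as tan(-1e20) can change when the backend fuses instructions the library author did not intend to fuse. The following host-side C++ sketch illustrates the numeric difference. It is not part of the patch and runs no device code; `mulThenAdd` and `fused` are hypothetical helper names.

```cpp
// Host-side illustration of why fusing changes results; not device code.
#include <cassert>
#include <cmath>

// Multiply, round the product to double, then add: this is what separate
// fmul/fadd instructions compute. The volatile store keeps the host compiler
// from contracting the expression into an FMA on its own.
double mulThenAdd(double a, double b, double c) {
  volatile double product = a * b;
  return product + c;
}

// std::fma computes a * b + c with a single rounding of the exact result.
double fused(double a, double b, double c) { return std::fma(a, b, c); }
```

With a = 1 + 2^-30, b = 1 - 2^-30 and c = -1, the exact product is 1 - 2^-60, which rounds to 1.0 in double precision. The unfused form therefore returns 0, while the fused form keeps the low-order term and returns -2^-60.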

clang/test/Driver/autocomplete.c

	// FNOSANICOVERALL-NEXT: trace-bb			// FNOSANICOVERALL-NEXT: trace-bb
	// FNOSANICOVERALL-NEXT: trace-cmp			// FNOSANICOVERALL-NEXT: trace-cmp
	// FNOSANICOVERALL-NEXT: trace-div			// FNOSANICOVERALL-NEXT: trace-div
	// FNOSANICOVERALL-NEXT: trace-gep			// FNOSANICOVERALL-NEXT: trace-gep
	// FNOSANICOVERALL-NEXT: trace-pc			// FNOSANICOVERALL-NEXT: trace-pc
	// FNOSANICOVERALL-NEXT: trace-pc-guard			// FNOSANICOVERALL-NEXT: trace-pc-guard
	// RUN: %clang --autocomplete=-ffp-contract= \| FileCheck %s -check-prefix=FFPALL			// RUN: %clang --autocomplete=-ffp-contract= \| FileCheck %s -check-prefix=FFPALL
	// FFPALL: fast			// FFPALL: fast
				// FFPALL-NEXT: fast-honor-pragmas
	// FFPALL-NEXT: off			// FFPALL-NEXT: off
	// FFPALL-NEXT: on			// FFPALL-NEXT: on
	// RUN: %clang --autocomplete=-flto= \| FileCheck %s -check-prefix=FLTOALL			// RUN: %clang --autocomplete=-flto= \| FileCheck %s -check-prefix=FLTOALL
	// FLTOALL: full			// FLTOALL: full
	// FLTOALL-NEXT: thin			// FLTOALL-NEXT: thin
	// RUN: %clang --autocomplete=-fveclib= \| FileCheck %s -check-prefix=FVECLIBALL			// RUN: %clang --autocomplete=-fveclib= \| FileCheck %s -check-prefix=FVECLIBALL
	// FVECLIBALL: Accelerate			// FVECLIBALL: Accelerate
	// FVECLIBALL-NEXT: libmvec			// FVECLIBALL-NEXT: libmvec


Diff 307319
