This is an archive of the discontinued LLVM Phabricator instance.

Differential D26708

Fix -f[no-]reciprocal-math -ffast-math interaction, including LTO
Abandoned, Public

Authored by wristow on Nov 15 2016, 3:22 PM.

Details

Reviewers

spatel
tstellarAMD
hfinkel
javed.absar

Summary

There are some inconsistencies in the handling of fast-math-flags that
(a) prevent the selective disabling of individual fast-math features and
(b) keep some features from working properly with LTO.
The proposed change fixes an immediate problem where reciprocal-math
isn't handled correctly. It is one small step toward resolving these issues,
which also affect some of the other fast-math-flags.

Here is a simple test-case that illustrates the problem for the
reciprocal-math situation.

extern void use(float x, float y);

void test(float a, float b, float c)
{
  float q1 = a / c;
  float q2 = b / c;
  use(q1, q2);
}

Without -ffast-math, two divisions will be done, and with -ffast-math only
one division will happen, since this will be transformed into:

float tmp = 1.0f / c;
float q1 = a * tmp;
float q2 = b * tmp;
use(q1, q2);

The bug is that with -ffast-math -fno-reciprocal-math, this
reciprocal-transformation is not suppressed.
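As an illustration of why a user might want to keep -ffast-math but turn off just this transformation, here is a minimal sketch (not part of the patch) that writes both forms out by hand. The helper names divDirect, divViaRecip and recipError are made up for this example. The two forms are mathematically equal but are not guaranteed to round to the same float, which is what -fno-reciprocal-math is meant to protect against.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers, named here only for illustration.

// One division per quotient, as written in the original source.
float divDirect(float a, float c) { return a / c; }

// The reciprocal form: one division, then one multiplication.
float divViaRecip(float a, float c) {
  assert(c != 0.0f && "sketch assumes a nonzero divisor");
  float tmp = 1.0f / c;
  return a * tmp;
}

// Absolute difference between the two forms for one input pair.
float recipError(float a, float c) {
  return std::fabs(divDirect(a, c) - divViaRecip(a, c));
}
```

To observe any difference, this would need to be built without -ffast-math (or with -fno-reciprocal-math honored), since otherwise the compiler is free to turn divDirect into the reciprocal form as well.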

tl;dr

The situation is that passing -ffast-math on the command line results in
passing the following 6 lower-level flags to cc1:

-menable-no-infs
-menable-no-nans
-fno-signed-zeros
-freciprocal-math
-fno-trapping-math
-ffp-contract=fast

and also still passing -ffast-math itself to cc1 (the act of passing
-ffast-math to cc1 results in the macro __FAST_MATH__ being defined).

These low level flags can be disabled individually. As an aside, when
-ffast-math is used, and a certain subset of the above are not disabled,
then the switch:

-menable-unsafe-fp-math

is also passed to cc1.

Ultimately, even when -fno-reciprocal-math is passed on the command-line,
the fact that -ffast-math is still passed to cc1 ends up setting the flag
UnsafeAlgebra in LLVM, which ends up over-riding the user's request to
suppress the reciprocal-math transformation. The code-change here is a
fairly simple one to deal with this.

Prior to this change, the reciprocal transformations are enabled when
either the fast or arcp IR-level flags are on. The philosophy of the
approach taken here is to only enable reciprocal transformations when
arcp is on. To put it another way, rather than an "umbrella" flag such
as fast being checked in the back-end (along with an individual flag like
arcp), it seems to me that just checking the individual flag expresses
the need more cleanly. Any fast-math-related transformation that doesn't
have an individual flag (e.g., re-association currently doesn't), should
eventually have an individual flag defined for it, and then that individual
flag should be checked. In the end, if -ffast-math sets the 6
lower-level flags described above, then setting those individual flags
explicitly should have the same effect. That is, ultimately, the following
2 user-commands should produce the same code:

clang -c -O2 -ffast-math foo.c
clang -c -O2 -D__FAST_MATH__ -fno-honor-infinities -fno-honor-nans -fno-signed-zeros -freciprocal-math -fno-trapping-math -ffp-contract=fast foo.c

and this proposed change is a small step in that direction.
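The difference between the existing check and the proposed one can be sketched with a toy flag model. This is a standalone illustration, not LLVM code: the enumerator names echo FastMathFlags, but the bit values and the four helper functions are assumptions made for the example.

```cpp
#include <cstdint>

// Toy model of per-instruction fast-math flags. Names echo LLVM's
// FastMathFlags; the bit values are assumed for illustration only.
enum : std::uint8_t {
  UnsafeAlgebra   = 1 << 0,  // "fast"
  NoNaNs          = 1 << 1,
  NoInfs          = 1 << 2,
  NoSignedZeros   = 1 << 3,
  AllowReciprocal = 1 << 4,  // "arcp"
};

// Existing model: setting "fast" also sets every individual flag.
unsigned makeFastOld() {
  return UnsafeAlgebra | NoNaNs | NoInfs | NoSignedZeros | AllowReciprocal;
}

// Proposed model: "fast" no longer implies arcp; it is added separately.
unsigned makeFastNew(bool reciprocalMath) {
  unsigned flags = UnsafeAlgebra | NoNaNs | NoInfs | NoSignedZeros;
  if (reciprocalMath)
    flags |= AllowReciprocal;
  return flags;
}

// Existing check: a global unsafe-math option or arcp enables the combine.
bool mayUseReciprocalOld(unsigned flags, bool globalUnsafeMath) {
  return globalUnsafeMath || (flags & AllowReciprocal) != 0;
}

// Proposed check: only the individual arcp flag matters.
bool mayUseReciprocalNew(unsigned flags) {
  return (flags & AllowReciprocal) != 0;
}
```

Under this model, mayUseReciprocalNew(makeFastNew(false)) is false, which corresponds to -ffast-math -fno-reciprocal-math suppressing the transformation, whereas the existing check accepts any instruction once the global option is on.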

This is my first venture into this area of LLVM, and I may be
misunderstanding some of the bigger-picture aspects. Maybe the approach of
controlling the reciprocal transformation strictly by arcp (rather than
arcp OR fast) is counter to some other expectations. In which case,
I'll be happy to learn more about how this is intended to work.

Related to this being my first venture into this area, although this fixes
the immediate problem, even with it there are some lurking issues (possibly
in Clang rather than LLVM, or maybe in both, but I'm not sure).
Specifically, if the above test-case (that computes the two quotients q1
and q2) is compiled with -ffast-math producing a .ll file, then there
is no indication in the .ll file that the reciprocal transformation is
enabled. That is, the arcp flag is not on the division instructions
(although the fast flag is, as expected -- but in this model I'm
describing, the fast flag is not to be checked for this). Continuing to
process that .ll file through to an assembly file shows two divisions
happening, confirming that (incorrectly) the reciprocal-transformation
does not happen. For example:

$ clang -S -o test_via_ll.ll -emit-llvm -O2 -ffast-math test.c
$ llc -o test_via_ll.s test_via_ll.ll  # via .ll: -ffast-math did not get the job done
$ grep div test_via_ll.s
        divss   %xmm2, %xmm0
        divss   %xmm2, %xmm1
$

Whereas doing the same compilation via a .bc file does honor the request
for doing the reciprocal transformation:

$ clang -c -o test_via_bc.bc -emit-llvm -O2 -ffast-math test.c
$ llc -o test_via_bc.s test_via_bc.bc
$ grep div test_via_bc.s               # via .bc: -ffast-math did get the job done
        divss   %xmm2, %xmm3
$

To be clear, without this proposed change, the approach via the .ll file
does correctly do the reciprocal transformation (as does the .bc approach).
And with my proposed change, the .bc approach continues to work (and the
bug that's the whole point of this patch is fixed), but the .ll approach shown
above fails. Manually adding the arcp flag to test_via_ll.ll does result
in the reciprocal transformation being done with the updated compiler, as
expected.

I don't think this .ll issue causes any problems when going through normal
compilations (that is, when producing object files or bitcode files), but
it clearly is trouble when producing .ll files.

Diff Detail

Event Timeline

wristow updated this revision to Diff 78085. Nov 15 2016, 3:22 PM

wristow retitled this revision from to Fix -f[no-]reciprocal-math -ffast-math interaction, including LTO.

wristow updated this object.

wristow added reviewers: spatel, hfinkel.

wristow added a subscriber: llvm-commits.

Herald added a reviewer: tstellarAMD. Nov 15 2016, 3:22 PM

Herald added subscribers: mehdi_amini, nhaehnle, nemanjai.

This is related to a discussion in PR27372 (although not the original issue of that PR).

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

If I understand what you wrote correctly, the correct target of your attention would be Clang instead of LLVM.

In D26708#596608, @majnemer wrote:

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

So the change to make -ffast-math -fno-reciprocal-math work should be (a) remove the fast flag, and (b) add all the other ones (except arcp) to each relevant place?

In D26708#596623, @wristow wrote:

In D26708#596608, @majnemer wrote:

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

So the change to make -ffast-math -fno-reciprocal-math work should be (a) remove the fast flag, and (b) add all the other ones (except arcp) to each relevant place?

We might reconsider having 'fast' imply all of the other flags. I believe this was a suboptimal design choice, and so long as someone is willing to do the work to separate out the various semantic requirements, we should allow that work to proceed. We should discuss this on llvm-dev first, however.

In D26708#596664, @hfinkel wrote:

We might reconsider having 'fast' imply all of the other flags. I believe this was a suboptimal design choice, and so long as someone is willing to do the work to separate out the various semantic requirements, we should allow that work to proceed.

I have to admit I felt "funny" about changing the semantics of whether fast should imply all the other flags. Reconsidering that design choice is what I was implicitly suggesting when I said:

To put it another way, rather than an "umbrella" flag such
as fast being checked in the back-end (along with an individual flag like
arcp), it seems to me that just checking the individual flag expresses
the need more cleanly. Any fast-math-related transformation that doesn't
have an individual flag (e.g., re-association currently doesn't), should
eventually have an individual flag defined for it, and then that individual
flag should be checked.

Regarding:

We should discuss this on llvm-dev first, however.

Sounds good. I'll start a discussion on llvm-dev.

We should discuss this on llvm-dev first, however.

Sounds good. I'll start a discussion on llvm-dev.

http://lists.llvm.org/pipermail/llvm-dev/2016-November/107104.html

I think it would be a lot better to have sub-options to fast-math in a similar way to how target features are handled (via -mattr).

In D26708#596811, @kparzysz wrote:

I think it would be a lot better to have sub-options to fast-math in a similar way to how target features are handled (via -mattr).

Let me clarify---not to fix this issue here, but in general. It would make setting/clearing of the individual options much clearer.

In D26708#596623, @wristow wrote:

In D26708#596608, @majnemer wrote:

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

So the change to make -ffast-math -fno-reciprocal-math work should be (a) remove the fast flag, and (b) add all the other ones (except arcp) to each relevant place?

This is the behavior I'd expect clang to have.

In D26708#596832, @kparzysz wrote:

In D26708#596811, @kparzysz wrote:

I think it would be a lot better to have sub-options to fast-math in a similar way to how target features are handled (via -mattr).

Let me clarify---not to fix this issue here, but in general. It would make setting/clearing of the individual options much clearer.

Can you give a straw man example of how such options would look like on the command line?

In D26708#596959, @majnemer wrote:

In D26708#596623, @wristow wrote:

In D26708#596608, @majnemer wrote:

This change seems wrong. Our IR defines fast as implying arcp: http://llvm.org/docs/LangRef.html#fast-math-flags

So the change to make -ffast-math -fno-reciprocal-math work should be (a) remove the fast flag, and (b) add all the other ones (except arcp) to each relevant place?

This is the behavior I'd expect clang to have.

OK, thanks. There's more discussion over on the mailing list, so I'll continue over there.

In D26708#596973, @mehdi_amini wrote:

Let me clarify---not to fix this issue here, but in general. It would make setting/clearing of the individual options much clearer.

Can you give a straw man example of how such options would look like on the command line?

Sure. For example, something like
-ffast-math=+noinf,-nonan -ffast-math=+recip
would enable "no infinities", disable "no NaNs", and enable the use of reciprocals. Each such occurrence of -ffast-math would behave as if it was combined with all the preceding ones, i.e. the above would be equivalent to -ffast-math=+noinf,-nonan,+recip.

There could also be something like -ffast-math=none and -ffast-math=all to disable/enable all available settings respectively.

While the existing options could be handled meaningfully, this scheme has the benefit of being less ambiguous to the user.

The IBM XLC compiler has something similar for -qstrict.
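A rough sketch of how the accumulation rule described above could work is given here. It is purely hypothetical: no such option exists in Clang, only the three feature names mentioned in this comment (noinf, nonan, recip) are modeled, and FeatureSet, defaultFeatures and applyFastMathValue are names invented for the example.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical model of the proposed -ffast-math=<list> value syntax.
using FeatureSet = std::map<std::string, bool>;

// All modeled features start out disabled.
FeatureSet defaultFeatures() {
  return {{"noinf", false}, {"nonan", false}, {"recip", false}};
}

// Apply one option value such as "+noinf,-nonan" on top of the current state.
// Returns false if an item is malformed or names an unknown feature; items
// that precede the bad one remain applied.
bool applyFastMathValue(const std::string &value, FeatureSet &features) {
  if (value == "all" || value == "none") {
    for (auto &entry : features)
      entry.second = (value == "all");
    return true;
  }
  std::istringstream items(value);
  std::string item;
  while (std::getline(items, item, ',')) {
    if (item.size() < 2 || (item[0] != '+' && item[0] != '-'))
      return false;
    auto it = features.find(item.substr(1));
    if (it == features.end())
      return false;
    it->second = (item[0] == '+');
  }
  return true;
}
```

Applying "+noinf,-nonan" and then "+recip" to the same FeatureSet leaves it in the same state as applying "+noinf,-nonan,+recip" once, which is the equivalence the comment describes.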

Neat! Thanks Krzysztof.

Abandoning this very old proposed patch. This patch was about some fundamental issues with the "umbrella" aspect of the FMF fast, and was intended to be a small step in solving those issues. That umbrella aspect of fast was fixed by Sanjay (https://reviews.llvm.org/D39304). With that groundwork done, there have been a handful of additional improvements. With all those improvements, the issue of this patch is no longer a problem.

Herald added a reviewer: javed.absar. May 21 2018, 3:12 PM

Herald added subscribers: inglorion, wdng.

Revision Contents

Path	Size
include/llvm/IR/Operator.h	2 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp	5 lines
test/CodeGen/AArch64/fdiv-combine.ll	24 lines
test/CodeGen/AMDGPU/fdiv.ll	4 lines
test/CodeGen/PowerPC/fdiv-combine.ll	6 lines
test/CodeGen/X86/fdiv-combine.ll	4 lines
test/Transforms/InstCombine/fast-math.ll	2 lines
test/Transforms/SLPVectorizer/X86/propagate_ir_flags.ll	6 lines
unittests/IR/IRBuilderTest.cpp	2 lines
test/LTO/X86/Inputs/fast-with-recip.ll	9 lines
test/LTO/X86/Inputs/fast-without-recip.ll	9 lines
test/LTO/X86/fast-recip.ll	38 lines

Diff 78085

include/llvm/IR/Operator.h

[196 lines not shown]
public:
void setNoInfs() { Flags |= NoInfs; }		void setNoInfs() { Flags |= NoInfs; }
void setNoSignedZeros() { Flags |= NoSignedZeros; }		void setNoSignedZeros() { Flags |= NoSignedZeros; }
void setAllowReciprocal() { Flags |= AllowReciprocal; }		void setAllowReciprocal() { Flags |= AllowReciprocal; }
void setUnsafeAlgebra() {		void setUnsafeAlgebra() {
Flags |= UnsafeAlgebra;		Flags |= UnsafeAlgebra;
setNoNaNs();		setNoNaNs();
setNoInfs();		setNoInfs();
setNoSignedZeros();		setNoSignedZeros();
setAllowReciprocal();
}		}

void operator&=(const FastMathFlags &OtherFlags) {		void operator&=(const FastMathFlags &OtherFlags) {
Flags &= OtherFlags.Flags;		Flags &= OtherFlags.Flags;
}		}
};		};


/// Utility class for floating point operations which can have		/// Utility class for floating point operations which can have
/// information about relaxed accuracy requirements attached to them.		/// information about relaxed accuracy requirements attached to them.
class FPMathOperator : public Operator {		class FPMathOperator : public Operator {
private:		private:
friend class Instruction;		friend class Instruction;

void setHasUnsafeAlgebra(bool B) {		void setHasUnsafeAlgebra(bool B) {
SubclassOptionalData =		SubclassOptionalData =
(SubclassOptionalData & ~FastMathFlags::UnsafeAlgebra) |		(SubclassOptionalData & ~FastMathFlags::UnsafeAlgebra) |
(B * FastMathFlags::UnsafeAlgebra);		(B * FastMathFlags::UnsafeAlgebra);

// Unsafe algebra implies all the others		// Unsafe algebra implies all the others
if (B) {		if (B) {
setHasNoNaNs(true);		setHasNoNaNs(true);
setHasNoInfs(true);		setHasNoInfs(true);
setHasNoSignedZeros(true);		setHasNoSignedZeros(true);
setHasAllowReciprocal(true);
}		}
}		}
void setHasNoNaNs(bool B) {		void setHasNoNaNs(bool B) {
SubclassOptionalData =		SubclassOptionalData =
(SubclassOptionalData & ~FastMathFlags::NoNaNs) |		(SubclassOptionalData & ~FastMathFlags::NoNaNs) |
(B * FastMathFlags::NoNaNs);		(B * FastMathFlags::NoNaNs);
}		}
void setHasNoInfs(bool B) {		void setHasNoInfs(bool B) {
[267 lines not shown]

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[8,881 lines not shown]
	// Combine multiple FDIVs with the same divisor into multiple FMULs by the			// Combine multiple FDIVs with the same divisor into multiple FMULs by the
	// reciprocal.			// reciprocal.
	// E.g., (a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)			// E.g., (a / D; b / D;) -> (recip = 1.0 / D; a * recip; b * recip)
	// Notice that this is not always beneficial. One reason is different target			// Notice that this is not always beneficial. One reason is different target
	// may have different costs for FDIV and FMUL, so sometimes the cost of two			// may have different costs for FDIV and FMUL, so sometimes the cost of two
	// FDIVs may be lower than the cost of one FDIV and two FMULs. Another reason			// FDIVs may be lower than the cost of one FDIV and two FMULs. Another reason
	// is the critical path is increased from "one FDIV" to "one FDIV + one FMUL".			// is the critical path is increased from "one FDIV" to "one FDIV + one FMUL".
	SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {			SDValue DAGCombiner::combineRepeatedFPDivisors(SDNode *N) {
	bool UnsafeMath = DAG.getTarget().Options.UnsafeFPMath;
	const SDNodeFlags *Flags = N->getFlags();			const SDNodeFlags *Flags = N->getFlags();
	if (!UnsafeMath && !Flags->hasAllowReciprocal())			if (!Flags->hasAllowReciprocal())
	return SDValue();			return SDValue();

	// Skip if current node is a reciprocal.			// Skip if current node is a reciprocal.
	SDValue N0 = N->getOperand(0);			SDValue N0 = N->getOperand(0);
	ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);			ConstantFPSDNode *N0CFP = dyn_cast<ConstantFPSDNode>(N0);
	if (N0CFP && N0CFP->isExactlyValue(1.0))			if (N0CFP && N0CFP->isExactlyValue(1.0))
	return SDValue();			return SDValue();

	// Exit early if the target does not want this transform or if there can't			// Exit early if the target does not want this transform or if there can't
	// possibly be enough uses of the divisor to make the transform worthwhile.			// possibly be enough uses of the divisor to make the transform worthwhile.
	SDValue N1 = N->getOperand(1);			SDValue N1 = N->getOperand(1);
	unsigned MinUses = TLI.combineRepeatedFPDivisors();			unsigned MinUses = TLI.combineRepeatedFPDivisors();
	if (!MinUses || N1->use_size() < MinUses)			if (!MinUses || N1->use_size() < MinUses)
	return SDValue();			return SDValue();

	// Find all FDIV users of the same divisor.			// Find all FDIV users of the same divisor.
	// Use a set because duplicates may be present in the user list.			// Use a set because duplicates may be present in the user list.
	SetVector<SDNode *> Users;			SetVector<SDNode *> Users;
	for (auto *U : N1->uses()) {			for (auto *U : N1->uses()) {
	if (U->getOpcode() == ISD::FDIV && U->getOperand(1) == N1) {			if (U->getOpcode() == ISD::FDIV && U->getOperand(1) == N1) {
	// This division is eligible for optimization only if global unsafe math			// This division is eligible for optimization only if global unsafe math
	// is enabled or if this division allows reciprocal formation.			// is enabled or if this division allows reciprocal formation.
	if (UnsafeMath || U->getFlags()->hasAllowReciprocal())			if (U->getFlags()->hasAllowReciprocal())
	Users.insert(U);			Users.insert(U);
	}			}
	}			}

	// Now that we have the actual number of divisor uses, make sure it meets			// Now that we have the actual number of divisor uses, make sure it meets
	// the minimum threshold specified by the target.			// the minimum threshold specified by the target.
	if (Users.size() < MinUses)			if (Users.size() < MinUses)
	return SDValue();			return SDValue();
[6,542 lines not shown]
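The eligibility logic in the right-hand (patched) column can be modeled in isolation. The following is a simplified sketch, not LLVM code: DivUser and shouldCombine are hypothetical names, the struct stands in for an FDIV node that shares the divisor, and duplicate users (which the real code removes with a SetVector) are assumed to be filtered out already.

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for one FDIV user of a shared divisor. This is not
// LLVM's SDNode; it only records the flag the combine inspects.
struct DivUser {
  bool allowReciprocal;  // the arcp flag on this division
};

// Decide whether enough divisions are eligible for the reciprocal rewrite
// under the patched rule (arcp only, no global unsafe-math override).
// A minUses of 0 models a target that opts out of the transform.
bool shouldCombine(const std::vector<DivUser> &users, std::size_t minUses) {
  // Early exit: target opted out, or too few uses to ever reach the minimum.
  if (minUses == 0 || users.size() < minUses)
    return false;
  // Count only the divisions that individually allow reciprocal formation.
  std::size_t eligible = 0;
  for (const DivUser &u : users)
    if (u.allowReciprocal)
      ++eligible;
  return eligible >= minUses;
}
```

With a minimum of two uses, two arcp divisions are combined, while one arcp division paired with one plain division is left alone. Under the unpatched rule the global UnsafeFPMath option would have made both pairs eligible.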

test/CodeGen/AArch64/fdiv-combine.ll

	; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s			; RUN: llc -mtriple=aarch64-unknown-unknown < %s | FileCheck %s

	; Following test cases check:			; Following test cases check:
	; a / D; b / D; c / D;			; a / D; b / D; c / D;
	; =>			; =>
	; recip = 1.0 / D; a * recip; b * recip; c * recip;			; recip = 1.0 / D; a * recip; b * recip; c * recip;
	define void @three_fdiv_float(float %D, float %a, float %b, float %c) #0 {			define void @three_fdiv_float(float %D, float %a, float %b, float %c) #0 {
	; CHECK-LABEL: three_fdiv_float:			; CHECK-LABEL: three_fdiv_float:
	; CHECK: fdiv s			; CHECK: fdiv s
	; CHECK-NOT: fdiv			; CHECK-NOT: fdiv
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	%div = fdiv float %a, %D			%div = fdiv arcp float %a, %D
	%div1 = fdiv float %b, %D			%div1 = fdiv arcp float %b, %D
	%div2 = fdiv float %c, %D			%div2 = fdiv arcp float %c, %D
	tail call void @foo_3f(float %div, float %div1, float %div2)			tail call void @foo_3f(float %div, float %div1, float %div2)
	ret void			ret void
	}			}

	define void @three_fdiv_double(double %D, double %a, double %b, double %c) #0 {			define void @three_fdiv_double(double %D, double %a, double %b, double %c) #0 {
	; CHECK-LABEL: three_fdiv_double:			; CHECK-LABEL: three_fdiv_double:
	; CHECK: fdiv d			; CHECK: fdiv d
	; CHECK-NOT: fdiv			; CHECK-NOT: fdiv
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	%div = fdiv double %a, %D			%div = fdiv arcp double %a, %D
	%div1 = fdiv double %b, %D			%div1 = fdiv arcp double %b, %D
	%div2 = fdiv double %c, %D			%div2 = fdiv arcp double %c, %D
	tail call void @foo_3d(double %div, double %div1, double %div2)			tail call void @foo_3d(double %div, double %div1, double %div2)
	ret void			ret void
	}			}

	define void @three_fdiv_4xfloat(<4 x float> %D, <4 x float> %a, <4 x float> %b, <4 x float> %c) #0 {			define void @three_fdiv_4xfloat(<4 x float> %D, <4 x float> %a, <4 x float> %b, <4 x float> %c) #0 {
	; CHECK-LABEL: three_fdiv_4xfloat:			; CHECK-LABEL: three_fdiv_4xfloat:
	; CHECK: fdiv v			; CHECK: fdiv v
	; CHECK-NOT: fdiv			; CHECK-NOT: fdiv
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	%div = fdiv <4 x float> %a, %D			%div = fdiv arcp <4 x float> %a, %D
	%div1 = fdiv <4 x float> %b, %D			%div1 = fdiv arcp <4 x float> %b, %D
	%div2 = fdiv <4 x float> %c, %D			%div2 = fdiv arcp <4 x float> %c, %D
	tail call void @foo_3_4xf(<4 x float> %div, <4 x float> %div1, <4 x float> %div2)			tail call void @foo_3_4xf(<4 x float> %div, <4 x float> %div1, <4 x float> %div2)
	ret void			ret void
	}			}

	define void @three_fdiv_2xdouble(<2 x double> %D, <2 x double> %a, <2 x double> %b, <2 x double> %c) #0 {			define void @three_fdiv_2xdouble(<2 x double> %D, <2 x double> %a, <2 x double> %b, <2 x double> %c) #0 {
	; CHECK-LABEL: three_fdiv_2xdouble:			; CHECK-LABEL: three_fdiv_2xdouble:
	; CHECK: fdiv v			; CHECK: fdiv v
	; CHECK-NOT: fdiv			; CHECK-NOT: fdiv
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	%div = fdiv <2 x double> %a, %D			%div = fdiv arcp <2 x double> %a, %D
	%div1 = fdiv <2 x double> %b, %D			%div1 = fdiv arcp <2 x double> %b, %D
	%div2 = fdiv <2 x double> %c, %D			%div2 = fdiv arcp <2 x double> %c, %D
	tail call void @foo_3_2xd(<2 x double> %div, <2 x double> %div1, <2 x double> %div2)			tail call void @foo_3_2xd(<2 x double> %div, <2 x double> %div1, <2 x double> %div2)
	ret void			ret void
	}			}

	; Following test cases check we never combine two FDIVs if neither of them			; Following test cases check we never combine two FDIVs if neither of them
	; calculates a reciprocal.			; calculates a reciprocal.
	define void @two_fdiv_float(float %D, float %a, float %b) #0 {			define void @two_fdiv_float(float %D, float %a, float %b) #0 {
	; CHECK-LABEL: two_fdiv_float:			; CHECK-LABEL: two_fdiv_float:
[28 lines not shown]

test/CodeGen/AMDGPU/fdiv.ll

[56 lines not shown]

	; FUNC-LABEL: {{^}}fdiv_fast_denormals_f32:			; FUNC-LABEL: {{^}}fdiv_fast_denormals_f32:
	; SI: v_rcp_f32_e32 [[RCP:v[0-9]+]], s{{[0-9]+}}			; SI: v_rcp_f32_e32 [[RCP:v[0-9]+]], s{{[0-9]+}}
	; SI: v_mul_f32_e32 [[RESULT:v[0-9]+]], s{{[0-9]+}}, [[RCP]]			; SI: v_mul_f32_e32 [[RESULT:v[0-9]+]], s{{[0-9]+}}, [[RCP]]
	; SI-NOT: [[RESULT]]			; SI-NOT: [[RESULT]]
	; SI: buffer_store_dword [[RESULT]]			; SI: buffer_store_dword [[RESULT]]
	define void @fdiv_fast_denormals_f32(float addrspace(1)* %out, float %a, float %b) #2 {			define void @fdiv_fast_denormals_f32(float addrspace(1)* %out, float %a, float %b) #2 {
	entry:			entry:
	%fdiv = fdiv fast float %a, %b			%fdiv = fdiv fast arcp float %a, %b
	store float %fdiv, float addrspace(1)* %out			store float %fdiv, float addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}fdiv_f32_fast_math:			; FUNC-LABEL: {{^}}fdiv_f32_fast_math:
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[2].W			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[2].W
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, PS

	; SI: v_rcp_f32_e32 [[RCP:v[0-9]+]], s{{[0-9]+}}			; SI: v_rcp_f32_e32 [[RCP:v[0-9]+]], s{{[0-9]+}}
	; SI: v_mul_f32_e32 [[RESULT:v[0-9]+]], s{{[0-9]+}}, [[RCP]]			; SI: v_mul_f32_e32 [[RESULT:v[0-9]+]], s{{[0-9]+}}, [[RCP]]
	; SI-NOT: [[RESULT]]			; SI-NOT: [[RESULT]]
	; SI: buffer_store_dword [[RESULT]]			; SI: buffer_store_dword [[RESULT]]
	define void @fdiv_f32_fast_math(float addrspace(1)* %out, float %a, float %b) #0 {			define void @fdiv_f32_fast_math(float addrspace(1)* %out, float %a, float %b) #0 {
	entry:			entry:
	%fdiv = fdiv fast float %a, %b			%fdiv = fdiv fast arcp float %a, %b
	store float %fdiv, float addrspace(1)* %out			store float %fdiv, float addrspace(1)* %out
	ret void			ret void
	}			}

	; FUNC-LABEL: {{^}}fdiv_f32_arcp_math:			; FUNC-LABEL: {{^}}fdiv_f32_arcp_math:
	; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[2].W			; R600-DAG: RECIP_IEEE * T{{[0-9]+\.[XYZW]}}, KC0[2].W
	; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, PS			; R600-DAG: MUL_IEEE {{\** *}}T{{[0-9]+\.[XYZW]}}, KC0[2].Z, PS

[142 lines not shown]

test/CodeGen/PowerPC/fdiv-combine.ll

	; RUN: llc -verify-machineinstrs -mcpu=ppc64 < %s | FileCheck %s			; RUN: llc -verify-machineinstrs -mcpu=ppc64 < %s | FileCheck %s
	target datalayout = "E-m:e-i64:64-n32:64"			target datalayout = "E-m:e-i64:64-n32:64"
	target triple = "powerpc64-unknown-linux-gnu"			target triple = "powerpc64-unknown-linux-gnu"

	; Following test case checks:			; Following test case checks:
	; a / D; b / D; c / D;			; a / D; b / D; c / D;
	; =>			; =>
	; recip = 1.0 / D; a * recip; b * recip; c * recip;			; recip = 1.0 / D; a * recip; b * recip; c * recip;

	define void @three_fdiv_double(double %D, double %a, double %b, double %c) #0 {			define void @three_fdiv_double(double %D, double %a, double %b, double %c) #0 {
	; CHECK-LABEL: three_fdiv_double:			; CHECK-LABEL: three_fdiv_double:
	; CHECK: fdiv {{[0-9]}}			; CHECK: fdiv {{[0-9]}}
	; CHECK-NOT: fdiv			; CHECK-NOT: fdiv
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	; CHECK: fmul			; CHECK: fmul
	%div = fdiv double %a, %D			%div = fdiv arcp double %a, %D
	%div1 = fdiv double %b, %D			%div1 = fdiv arcp double %b, %D
	%div2 = fdiv double %c, %D			%div2 = fdiv arcp double %c, %D
	tail call void @foo_3d(double %div, double %div1, double %div2)			tail call void @foo_3d(double %div, double %div1, double %div2)
	ret void			ret void
	}			}

	define void @two_fdiv_double(double %D, double %a, double %b) #0 {			define void @two_fdiv_double(double %D, double %a, double %b) #0 {
	; CHECK-LABEL: two_fdiv_double:			; CHECK-LABEL: two_fdiv_double:
	; CHECK: fdiv {{[0-9]}}			; CHECK: fdiv {{[0-9]}}
	; CHECK: fdiv {{[0-9]}}			; CHECK: fdiv {{[0-9]}}
[12 lines not shown]

test/CodeGen/X86/fdiv-combine.ll

[83 lines not shown]
	define double @div3_arcp(double %x, double %y, double %z) {			define double @div3_arcp(double %x, double %y, double %z) {
	; CHECK-LABEL: div3_arcp:			; CHECK-LABEL: div3_arcp:
	; CHECK: # BB#0:			; CHECK: # BB#0:
	; CHECK-NEXT: movsd{{.*#+}} xmm2 = mem[0],zero			; CHECK-NEXT: movsd{{.*#+}} xmm2 = mem[0],zero
	; CHECK-NEXT: divsd %xmm1, %xmm2			; CHECK-NEXT: divsd %xmm1, %xmm2
	; CHECK-NEXT: mulsd %xmm2, %xmm0			; CHECK-NEXT: mulsd %xmm2, %xmm0
	; CHECK-NEXT: addsd %xmm2, %xmm0			; CHECK-NEXT: addsd %xmm2, %xmm0
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%div1 = fdiv fast double 1.0, %y			%div1 = fdiv arcp double 1.0, %y
	%div2 = fdiv fast double %x, %y			%div2 = fdiv arcp double %x, %y
	%ret = fadd fast double %div2, %div1			%ret = fadd fast double %div2, %div1
	ret double %ret			ret double %ret
	}			}

	define void @PR24141() {			define void @PR24141() {
	; CHECK-LABEL: PR24141:			; CHECK-LABEL: PR24141:
	; CHECK: callq			; CHECK: callq
	; CHECK-NEXT: divsd			; CHECK-NEXT: divsd
[15 lines not shown]

test/LTO/X86/Inputs/fast-with-recip.ll

This file was added.

				define void @fastWithRecip(float %a, float %b, float %c) {
				entry:
				%div = fdiv fast arcp float %a, %c
				%div1 = fdiv fast arcp float %b, %c
				tail call void @useWithRecip(float %div, float %div1)
				ret void
				}

				declare void @useWithRecip(float, float)

test/LTO/X86/Inputs/fast-without-recip.ll

This file was added.

				define void @fastWithoutRecip(float %a, float %b, float %c) {
				entry:
				%div = fdiv fast float %a, %c
				%div1 = fdiv fast float %b, %c
				tail call void @useWithoutRecip(float %div, float %div1)
				ret void
				}

				declare void @useWithoutRecip(float, float)

test/LTO/X86/fast-recip.ll

This file was added.

				; RUN: llvm-link -o %t.bc %s %p/Inputs/fast-without-recip.ll %p/Inputs/fast-with-recip.ll
				; RUN: opt -inline -instcombine -o %t2.bc %t.bc
				; RUN: llc -disable-tail-calls %t2.bc -o - | FileCheck %s

				; Inlining will be done on fastWithRecip() (built with fast-math leaving the
				; reciprocal-transformation enabled), and fastWithoutRecip() (built with
				; fast-math but disabling the reciprocal-transformation). They both contain
				; two divisions with the same denominator, and so are candidates for the
				; reciprocal-transformation. We verify that in the enabled version, only
				; one division is done (the reciprocal) followed by two multiplications. And
				; in the disabled version, both divisions are done (and no multiplications).

				define void @foo(float %a0, float %a1, float %a2, float %a3, float %a4, float %a5) #0 {
				entry:
				; CHECK: fooEnter
				; CHECK: div
				; CHECK-NOT: div
				; CHECK: mul
				; CHECK: mul
				; CHECK: useWithRecip
				; CHECK: div
				; CHECK: div
				; CHECK-NOT: mul
				; CHECK: useWithoutRecip
				; CHECK: fooExit
				tail call void @fooEnter()
				tail call void @fastWithRecip(float %a0, float %a1, float %a2)
				tail call void @fastWithoutRecip(float %a3, float %a4, float %a5)
				tail call void @fooExit()
				ret void
				}

				declare void @fooEnter()
				declare void @fastWithRecip(float, float, float)
				declare void @fastWithoutRecip(float, float, float)
				declare void @fooExit()

				attributes #0 = { "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "unsafe-fp-math"="true" }

test/Transforms/InstCombine/fast-math.ll

[341 lines not shown]
	;			;
	; Testing-cases about div			; Testing-cases about div
	;			;
	; =========================================================================			; =========================================================================

	; X/C1 / C2 => X * (1/(C2*C1))			; X/C1 / C2 => X * (1/(C2*C1))
	define float @fdiv1(float %x) {			define float @fdiv1(float %x) {
	%div = fdiv float %x, 0x3FF3333340000000			%div = fdiv float %x, 0x3FF3333340000000
	%div1 = fdiv fast float %div, 0x4002666660000000			%div1 = fdiv fast arcp float %div, 0x4002666660000000
	ret float %div1			ret float %div1
	; 0x3FF3333340000000 = 1.2f			; 0x3FF3333340000000 = 1.2f
	; 0x4002666660000000 = 2.3f			; 0x4002666660000000 = 2.3f
	; 0x3FD7303B60000000 = 0.36231884057971014492			; 0x3FD7303B60000000 = 0.36231884057971014492
	; CHECK-LABEL: @fdiv1(			; CHECK-LABEL: @fdiv1(
	; CHECK: fmul fast float %x, 0x3FD7303B60000000			; CHECK: fmul fast float %x, 0x3FD7303B60000000
	}			}

[475 lines not shown]

test/Transforms/SLPVectorizer/X86/propagate_ir_flags.ll

[248 lines not shown]
define void @only_arcp(float* %x) {
%idx3 = getelementptr inbounds float, float* %x, i64 2		%idx3 = getelementptr inbounds float, float* %x, i64 2
%idx4 = getelementptr inbounds float, float* %x, i64 3		%idx4 = getelementptr inbounds float, float* %x, i64 3

%load1 = load float, float* %idx1, align 4		%load1 = load float, float* %idx1, align 4
%load2 = load float, float* %idx2, align 4		%load2 = load float, float* %idx2, align 4
%load3 = load float, float* %idx3, align 4		%load3 = load float, float* %idx3, align 4
%load4 = load float, float* %idx4, align 4		%load4 = load float, float* %idx4, align 4

%op1 = fadd fast float %load1, 1.0		%op1 = fadd fast arcp float %load1, 1.0
%op2 = fadd fast float %load2, 1.0		%op2 = fadd fast arcp float %load2, 1.0
%op3 = fadd fast float %load3, 1.0		%op3 = fadd fast arcp float %load3, 1.0
%op4 = fadd arcp float %load4, 1.0		%op4 = fadd arcp float %load4, 1.0

store float %op1, float* %idx1, align 4		store float %op1, float* %idx1, align 4
store float %op2, float* %idx2, align 4		store float %op2, float* %idx2, align 4
store float %op3, float* %idx3, align 4		store float %op3, float* %idx3, align 4
store float %op4, float* %idx4, align 4		store float %op4, float* %idx4, align 4

ret void		ret void
[135 lines not shown]

unittests/IR/IRBuilderTest.cpp

[157 lines not shown]
TEST_F(IRBuilderTest, FastMathFlags) {

// Now, try it with CreateBinOp		// Now, try it with CreateBinOp
F = Builder.CreateBinOp(Instruction::FAdd, F, F);		F = Builder.CreateBinOp(Instruction::FAdd, F, F);
EXPECT_TRUE(Builder.getFastMathFlags().any());		EXPECT_TRUE(Builder.getFastMathFlags().any());
ASSERT_TRUE(isa<Instruction>(F));		ASSERT_TRUE(isa<Instruction>(F));
FAdd = cast<Instruction>(F);		FAdd = cast<Instruction>(F);
EXPECT_TRUE(FAdd->hasNoNaNs());		EXPECT_TRUE(FAdd->hasNoNaNs());

		FMF.setAllowReciprocal();
		Builder.setFastMathFlags(FMF);
F = Builder.CreateFDiv(F, F);		F = Builder.CreateFDiv(F, F);
EXPECT_TRUE(Builder.getFastMathFlags().any());		EXPECT_TRUE(Builder.getFastMathFlags().any());
EXPECT_TRUE(Builder.getFastMathFlags().UnsafeAlgebra);		EXPECT_TRUE(Builder.getFastMathFlags().UnsafeAlgebra);
ASSERT_TRUE(isa<Instruction>(F));		ASSERT_TRUE(isa<Instruction>(F));
FDiv = cast<Instruction>(F);		FDiv = cast<Instruction>(F);
EXPECT_TRUE(FDiv->hasAllowReciprocal());		EXPECT_TRUE(FDiv->hasAllowReciprocal());

Builder.clearFastMathFlags();		Builder.clearFastMathFlags();
[262 lines not shown]
