This is an archive of the discontinued LLVM Phabricator instance.

Differential D82778

[InstCombine] fma x, y, 0 -> fmul x, y
Closed, Public

Authored by dmgreen on Jun 29 2020, 8:53 AM.

Details

Reviewers

spatel
arsenm
lebedev.ri
fhahn

Commits

rG9e49d1d9b870: [InstCombine] fma x, y, 0 -> fmul x, y

Summary

If the addend of the fma is zero, common sense would suggest that we can convert fma x, y, 0.0 to fmul x, y. This comes up in some user code that expected the first fma in an unrolled loop to simplify to an fmul.

Floating point often does not follow naive common sense though. Alive suggests that this should be guarded by nsz (as fadd -0.0, 0.0 = 0.0, so a -0.0 product becomes +0.0 through the fma but stays -0.0 through a plain fmul). However, Alive also did not complete running, so I have only validated this against fadd nsz (fmul(x,y), 0) -> fmul nsz (x,y), not with fma nsz (x, y, 0.0) -> fmul nsz (x, y).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dmgreen created this revision. Jun 29 2020, 8:53 AM

Herald added a project: Restricted Project. Jun 29 2020, 8:53 AM

Herald added subscribers: hiraditya, wdng.

I don't see why this needs an nsz check (though I think it could also fold -0 in the 3rd operand with one). The fma should only be able to produce a negative 0 if the 3rd operand is also -0:

fadd(+0, -0) -> +0
fadd(-0, -0) -> -0

fmul(-0, +x) -> -0
fmul(+0, -x) -> -0
fmul(-0, -x) -> 0

fma(+x, -0, 0) -> +0
fma(-x, +0, 0) -> +0
fma(-x, -0, 0) -> +0
fma(+x, -0, -0) -> -0

I think it's because of these ones:

fma(+0, -x, 0) -> +0

fmul(+0, -x) -> -0
fadd(+0, -0) -> +0

Removing the fadd would leave it at -0. This is what alive2 comes up with:

Processing fma.txt..                          
                                              
----------------------------------------      
  %ys = fma half %x, half %y, half 0.000000   
  ret half %ys                                
=>                                            
  %ys = fmul half %x, %y                      
  ret half %ys                                
                                              
ERROR: Value mismatch for half %ys            
                                              
Example:                                      
half %x = #xf461 (-17936)                     
half %y = #x0000 (+0.0)                       
Source value: #x0000 (+0.0)                   
Target value: #x8000 (-0.0)
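
The counterexample above can also be reproduced outside of Alive2. The following is a minimal C++ sketch, assuming IEEE-754 arithmetic in the default round-to-nearest mode with no fast-math flags. It uses float rather than half, and the helper names are hypothetical, not part of LLVM.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the value computed by the original fma with a +0.0 addend.
float fma_plus_zero(float x, float y) { return std::fma(x, y, 0.0f); }

// Hypothetical helper: the value computed after an unguarded fold to a multiply.
float plain_fmul(float x, float y) { return x * y; }

// True if both results are zero but carry different signs.
bool sign_of_zero_differs(float x, float y) {
  float a = fma_plus_zero(x, y);
  float b = plain_fmul(x, y);
  return a == 0.0f && b == 0.0f && std::signbit(a) != std::signbit(b);
}

// Checks the inputs from the Alive2 report (x = -17936, y = +0.0).
void check_counterexample() {
  assert(!std::signbit(fma_plus_zero(-17936.0f, 0.0f))); // source: +0.0
  assert(std::signbit(plain_fmul(-17936.0f, 0.0f)));     // target: -0.0
  assert(sign_of_zero_differs(-17936.0f, 0.0f));
}
```

Under those assumptions, sign_of_zero_differs(-17936.0f, 0.0f) should return true: the product is -0.0, and adding +0.0 to it gives +0.0, while the plain multiply keeps -0.0.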

In D82778#2120456, @dmgreen wrote:
> I think it's because of these ones:
> fma(+0, -x, 0) -> +0
>
> fmul(+0, -x) -> -0
> fadd(+0, -0) -> +0
> Removing the fadd would leave it at -0.

Right - we need 'nsz' to fold adding +0.0 (I think this patch is correct as-is).

We can also always fold fma(x, y, -0.0) --> fmul x, y because (this is in InstSimplify):
fadd X, -0 ==> X

But I can't get the online version of Alive2 to confirm that (time-out). Local version should verify it?
If we don't have tests for the add of -0.0 variant, it would be good to add them for the likely follow-on patch.
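
The identity fadd X, -0 ==> X, and the fma fold that follows from it, can be spot-checked in C++. This is only a sketch over a handful of sample values, not a proof. It assumes IEEE-754 binary32 evaluation in round-to-nearest without fast-math, and treats any NaN as matching any other NaN. All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Raw bit pattern of a float, so that +0.0 and -0.0 compare as different.
std::uint32_t bits(float v) {
  std::uint32_t u;
  std::memcpy(&u, &v, sizeof u);
  return u;
}

// Bit-exact equality, except that any NaN matches any NaN.
bool same_value(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::isnan(a) && std::isnan(b);
  return bits(a) == bits(b);
}

// Checks x + (-0.0) == x and fma(x, y, -0.0) == x * y over the samples.
bool neg_zero_addend_is_noop() {
  using lim = std::numeric_limits<float>;
  const float samples[] = {0.0f, -0.0f, 1.0f, -1.0f, 0.5f, -17936.0f,
                           lim::min(), lim::denorm_min(), lim::max(),
                           lim::infinity(), -lim::infinity(), lim::quiet_NaN()};
  for (float x : samples) {
    if (!same_value(x + -0.0f, x))
      return false;
    for (float y : samples)
      if (!same_value(std::fma(x, y, -0.0f), x * y))
        return false;
  }
  return true;
}

void check_neg_zero() { assert(neg_zero_addend_is_noop()); }
```

The key cases are the zero products: (+0) + (-0) is +0 and (-0) + (-0) is -0, so adding -0.0 never changes the sign that the multiply produced.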

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp
Lines 2385–2387: Can reduce this to something like: return BinaryOperator::CreateFMulFMF(Src0, Src1, II);

Thanks for the extra info. I ran things overnight and it said this for -0.0:

Processing fma.txt..

----------------------------------------
  %ys = fma half %x, half %y, half -0.000000
  ret half %ys
=>
  %ys = fmul half %x, %y
  ret half %ys

Done: 1
Transformation seems to be correct!

real    136m22.357s
user    117m17.118s
sys     0m7.495s

Which looks good. I've extended the pattern to match and added some tests.

Unfortunately still this for +0.0:

Processing fma.txt..

----------------------------------------
  %ys = fma nsz half %x, half %y, half 0.000000
  ret half %ys
=>
  %ys = fmul half %x, %y
  ret half %ys

ERROR: SMT Error: smt tactic failed to show goal to be sat/unsat memout

real    152m39.828s
user    147m53.980s
sys     0m6.067s

But I'm fairly confident it's OK given nsz and the result for -0.0.
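
Since Alive2 ran out of memory on the +0.0 case, a sampled check can at least illustrate the claim. The sketch below is not a proof. It assumes IEEE-754 binary32 evaluation in round-to-nearest without fast-math, and models nsz as "the sign of a zero result is insignificant". All names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Equality that ignores the sign of zero, which is the freedom nsz grants.
// Any NaN is treated as matching any NaN.
bool equal_ignoring_zero_sign(float a, float b) {
  if (std::isnan(a) || std::isnan(b))
    return std::isnan(a) && std::isnan(b);
  return a == b; // IEEE comparison: +0.0 == -0.0
}

struct Mismatches {
  int strict; // pairs that differ only in the sign of a zero result
  int nsz;    // pairs that differ even when the sign of zero is ignored
};

// Compares fma(x, y, +0.0) against x * y over a small set of sample values.
Mismatches compare_pos_zero_fold() {
  using lim = std::numeric_limits<float>;
  const float samples[] = {0.0f, -0.0f, 1.0f, -1.0f, 0.5f, -17936.0f,
                           lim::min(), lim::denorm_min(), lim::max(),
                           lim::infinity(), -lim::infinity(), lim::quiet_NaN()};
  Mismatches m{0, 0};
  for (float x : samples) {
    for (float y : samples) {
      float src = std::fma(x, y, 0.0f);
      float tgt = x * y;
      if (!equal_ignoring_zero_sign(src, tgt))
        ++m.nsz;
      else if (!std::isnan(src) && std::signbit(src) != std::signbit(tgt))
        ++m.strict;
    }
  }
  return m;
}

void check_pos_zero() {
  Mismatches m = compare_pos_zero_fold();
  assert(m.nsz == 0);   // no difference once the sign of zero is ignored
  assert(m.strict > 0); // but signed zeros do differ without nsz
}
```

On these samples the only differences are sign-of-zero differences, which matches the reasoning in the thread: the fold is unsafe in general but acceptable when nsz is present.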

LGTM

This revision is now accepted and ready to land. Jun 30 2020, 7:36 AM

Closed by commit rG9e49d1d9b870: [InstCombine] fma x, y, 0 -> fmul x, y (authored by dmgreen). Jun 30 2020, 12:30 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                    Size
llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp    8 lines
llvm/test/Transforms/InstCombine/fma.ll                 12 lines
Diff 274585

llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp

Context: case Intrinsic::fma: {

     if (Value *V = SimplifyFMAFMul(II->getArgOperand(0), II->getArgOperand(1),
                                    II->getFastMathFlags(),
                                    SQ.getWithInstruction(II))) {
       auto *FAdd = BinaryOperator::CreateFAdd(V, II->getArgOperand(2));
       FAdd->copyFastMathFlags(II);
       return FAdd;
     }

+    // fma x, y, 0 -> fmul x, y
+    // This is always valid for -0.0, but requires nsz for +0.0 as
+    // -0.0 + 0.0 = 0.0, which would not be the same as the fmul on its own.
+    if (match(II->getArgOperand(2), m_NegZeroFP()) ||
+        (match(II->getArgOperand(2), m_PosZeroFP()) &&
+         II->getFastMathFlags().noSignedZeros()))
+      return BinaryOperator::CreateFMulFMF(Src0, Src1, II);
+
     break;
   }
   case Intrinsic::copysign: {
     if (SignBitMustBeZero(II->getArgOperand(1), &TLI)) {
       // If we know that the sign argument is positive, reduce to FABS:
       // copysign X, Pos --> fabs X
       Value *Fabs = Builder.CreateUnaryIntrinsic(Intrinsic::fabs,
                                                  II->getArgOperand(0), II);
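
The guard added to the fma case can be modelled in isolation. The following C++ sketch is a hypothetical stand-in that takes the addend as a plain double and the nsz flag as a bool. The real patch instead applies LLVM's m_NegZeroFP and m_PosZeroFP matchers to the third operand, which may also be a vector constant.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical model of the guard: may fma(x, y, addend) become fmul(x, y)?
bool can_fold_fma_to_fmul(double addend, bool nsz) {
  if (addend != 0.0)
    return false; // not a zero at all (this also rejects NaN)
  if (std::signbit(addend))
    return true;  // -0.0: always valid
  return nsz;     // +0.0: only valid when signed zeros may be ignored
}

void check_guard() {
  assert(can_fold_fma_to_fmul(-0.0, false));
  assert(can_fold_fma_to_fmul(-0.0, true));
  assert(!can_fold_fma_to_fmul(0.0, false));
  assert(can_fold_fma_to_fmul(0.0, true));
  assert(!can_fold_fma_to_fmul(1.0, true));
}
```

The four zero cases line up with the expectations in fma.ll: the -0.0 tests fold with or without nsz, while the +0.0 tests fold only in the nsz variants.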

llvm/test/Transforms/InstCombine/fma.ll

 ; CHECK-NEXT: ret float [[FMA]]
 ;
   %fma = call float @llvm.fma.f32(float %x, float %y, float 0.0)
   ret float %fma
 }

 define float @fma_x_y_0_nsz(float %x, float %y) {
 ; CHECK-LABEL: @fma_x_y_0_nsz(
-; CHECK-NEXT: [[FMA:%.*]] = call nsz float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float 0.000000e+00)
+; CHECK-NEXT: [[FMA:%.*]] = fmul nsz float [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT: ret float [[FMA]]
 ;
   %fma = call nsz float @llvm.fma.f32(float %x, float %y, float 0.0)
   ret float %fma
 }

 define <8 x half> @fma_x_y_0_v(<8 x half> %x, <8 x half> %y) {
 ; CHECK-LABEL: @fma_x_y_0_v(
 ; CHECK-NEXT: [[FMA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[X:%.*]], <8 x half> [[Y:%.*]], <8 x half> zeroinitializer)
 ; CHECK-NEXT: ret <8 x half> [[FMA]]
 ;
   %fma = call <8 x half> @llvm.fma.v8f16(<8 x half> %x, <8 x half> %y, <8 x half> zeroinitializer)
   ret <8 x half> %fma
 }

 define <8 x half> @fma_x_y_0_nsz_v(<8 x half> %x, <8 x half> %y) {
 ; CHECK-LABEL: @fma_x_y_0_nsz_v(
-; CHECK-NEXT: [[FMA:%.*]] = call nsz <8 x half> @llvm.fma.v8f16(<8 x half> [[X:%.*]], <8 x half> [[Y:%.*]], <8 x half> zeroinitializer)
+; CHECK-NEXT: [[FMA:%.*]] = fmul nsz <8 x half> [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT: ret <8 x half> [[FMA]]
 ;
   %fma = call nsz <8 x half> @llvm.fma.v8f16(<8 x half> %x, <8 x half> %y, <8 x half> zeroinitializer)
   ret <8 x half> %fma
 }

 define float @fmuladd_x_y_0(float %x, float %y) {
 ; CHECK-LABEL: @fmuladd_x_y_0(
 ; CHECK-NEXT: [[FMA:%.*]] = call float @llvm.fmuladd.f32(float [[X:%.*]], float [[Y:%.*]], float 0.000000e+00)
 ; CHECK-NEXT: ret float [[FMA]]
 ;
   %fma = call float @llvm.fmuladd.f32(float %x, float %y, float 0.0)
   ret float %fma
 }

 define float @fmuladd_x_y_0_nsz(float %x, float %y) {
 ; CHECK-LABEL: @fmuladd_x_y_0_nsz(
-; CHECK-NEXT: [[FMA:%.*]] = call nsz float @llvm.fmuladd.f32(float [[X:%.*]], float [[Y:%.*]], float 0.000000e+00)
+; CHECK-NEXT: [[FMA:%.*]] = fmul nsz float [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT: ret float [[FMA]]
 ;
   %fma = call nsz float @llvm.fmuladd.f32(float %x, float %y, float 0.0)
   ret float %fma
 }

 define float @fma_x_y_m0(float %x, float %y) {
 ; CHECK-LABEL: @fma_x_y_m0(
-; CHECK-NEXT: [[FMA:%.*]] = call float @llvm.fma.f32(float [[X:%.*]], float [[Y:%.*]], float -0.000000e+00)
+; CHECK-NEXT: [[FMA:%.*]] = fmul float [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT: ret float [[FMA]]
 ;
   %fma = call float @llvm.fma.f32(float %x, float %y, float -0.0)
   ret float %fma
 }

 define <8 x half> @fma_x_y_m0_v(<8 x half> %x, <8 x half> %y) {
 ; CHECK-LABEL: @fma_x_y_m0_v(
-; CHECK-NEXT: [[FMA:%.*]] = call <8 x half> @llvm.fma.v8f16(<8 x half> [[X:%.*]], <8 x half> [[Y:%.*]], <8 x half> <half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000, half 0xH8000>)
+; CHECK-NEXT: [[FMA:%.*]] = fmul <8 x half> [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT: ret <8 x half> [[FMA]]
 ;
   %fma = call <8 x half> @llvm.fma.v8f16(<8 x half> %x, <8 x half> %y, <8 x half> <half -0.0, half -0.0, half -0.0, half -0.0, half -0.0, half -0.0, half -0.0, half -0.0>)
   ret <8 x half> %fma
 }

 define float @fmuladd_x_y_m0(float %x, float %y) {
 ; CHECK-LABEL: @fmuladd_x_y_m0(
-; CHECK-NEXT: [[FMA:%.*]] = call float @llvm.fmuladd.f32(float [[X:%.*]], float [[Y:%.*]], float -0.000000e+00)
+; CHECK-NEXT: [[FMA:%.*]] = fmul float [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT: ret float [[FMA]]
 ;
   %fma = call float @llvm.fmuladd.f32(float %x, float %y, float -0.0)
   ret float %fma
 }

 define float @fmuladd_x_1_z_fast(float %x, float %z) {
 ; CHECK-LABEL: @fmuladd_x_1_z_fast(