Details

Reviewers

spatel
Quolyk
efriedma

Commits

rG7736c1936a93: [InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math

Summary

If fast-math is set, LLVM is free to perform the transformation pow(x, y) * pow(x, z) -> pow(x, y + z). This patch adds that transformation and tests for it. For details, see https://bugs.llvm.org/show_bug.cgi?id=47205

It handles two cases:

When the operands of fmul are different instructions:

%4 = call reassoc float @llvm.pow.f32(float %0, float %1)
%5 = call reassoc float @llvm.pow.f32(float %0, float %2)
%6 = fmul reassoc float %5, %4
-->
%3 = fadd reassoc float %1, %2
%4 = call reassoc float @llvm.pow.f32(float %0, float %3)

When the operands of fmul are the same instruction:

%4 = call reassoc float @llvm.pow.f32(float %0, float %1)
%5 = fmul reassoc float %4, %4
-->
%3 = fadd reassoc float %1, %1
%4 = call reassoc float @llvm.pow.f32(float %0, float %3)
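To see why this fold is gated on fast-math flags, here is a minimal standalone C++ sketch that compares the two forms numerically. It is not part of the patch, and the names `powProduct`, `powOfSum` and `relativeDifference` are hypothetical. For ordinary inputs the two forms agree up to rounding. For a negative base with fractional exponents, the unfused form produces NaN while the fused form produces a finite value, so the rewrite can change results under strict floating-point semantics.

```cpp
#include <cassert>
#include <cmath>

// Unfused form: two pow calls and one multiply, as in the IR before the fold.
double powProduct(double x, double y, double z) {
  return std::pow(x, y) * std::pow(x, z);
}

// Fused form: one add and one pow, as in the IR after the fold.
double powOfSum(double x, double y, double z) {
  return std::pow(x, y + z);
}

// Relative difference between the two forms. Only meaningful when the
// fused result is finite and non-zero, which the assert enforces.
double relativeDifference(double x, double y, double z) {
  double fused = powOfSum(x, y, z);
  assert(std::isfinite(fused) && fused != 0.0);
  return std::fabs(powProduct(x, y, z) - fused) / std::fabs(fused);
}
```

For example, `powProduct(-8.0, 0.5, 0.5)` multiplies two NaNs, whereas `powOfSum(-8.0, 0.5, 0.5)` evaluates `pow(-8.0, 1.0)`.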

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vdsered created this revision. · May 16 2021, 4:03 AM

Herald added a subscriber: hiraditya. · May 16 2021, 4:03 AM

vdsered requested review of this revision. · May 16 2021, 4:03 AM

Herald added a subscriber: llvm-commits. · May 16 2021, 4:03 AM

Calling CreateBinaryIntrinsic instead of CreateIntrinsic for better readability

Harbormaster completed remote builds in B104699: Diff 345698. · May 16 2021, 5:23 AM

xbolva00 added a reviewer: efriedma. · May 16 2021, 9:26 AM

I have noticed that there are two similar transformations below, exp2(X) * exp2(Y) -> exp2(X + Y) and exp(X) * exp(Y) -> exp(X + Y), that are done only when each operand of the multiplication has exactly one use. That isn't true if, for example, we have exp2(X) * exp2(X) where exp2(X) is the same instruction, so this can be improved a little. I'm going to update the patch.

In D102574#2763294, @vdsered wrote:

I have noticed that there are two similar transformations below, exp2(X) * exp2(Y) -> exp2(X + Y) and exp(X) * exp(Y) -> exp(X + Y), that are done only when each operand of the multiplication has exactly one use. That isn't true if, for example, we have exp2(X) * exp2(X) where exp2(X) is the same instruction, so this can be improved a little. I'm going to update the patch.

I recommend that you make that use-check logic into a helper function as a preliminary step, so it can be accessed from other places. For example, we miss this transform too because the use check is too restrictive:
https://alive2.llvm.org/ce/z/27ryac

In D102574#2763320, @spatel wrote:

I recommend that you make that use-check logic into a helper function as a preliminary step, so it can be accessed from other places. For example, we miss this transform too because the use check is too restrictive:
https://alive2.llvm.org/ce/z/27ryac

Are you talking about all checks that we do here? (see below)

Op0->hasOneUse() || Op1->hasOneUse() || (Op1 == Op0 && Op1->hasNUses(2))

I ask because I think this condition is going to be the same for many patterns. Before updating this patch, I will create another patch that implements the helper function and refactors the conditions, at least for the transformations below:

exp(X) * exp(Y) -> exp(X + Y)
exp2(X) * exp2(Y) -> exp2(X + Y)
sqrt(X) * sqrt(Y) -> sqrt(X * Y)

There are more cases where this sort of check is useful, for example in InstCombinerImpl::SimplifyAssociativeOrCommutative:

// Transform: "(A op C1) op (B op C2)" ==> "(A op B) op (C1 op C2)"
// if C1 and C2 are constants.
Value *A, *B;
Constant *C1, *C2;
if (Op0 && Op1 &&
    Op0->getOpcode() == Opcode && Op1->getOpcode() == Opcode &&
    match(Op0, m_OneUse(m_BinOp(m_Value(A), m_Constant(C1)))) &&
    match(Op1, m_OneUse(m_BinOp(m_Value(B), m_Constant(C2))))) {

The only difference is how Op0 and Op1 are compared (via opcode), but as I understand it, the use-check can be relaxed there too.

In D102574#2764241, @vdsered wrote:
Are you talking about all checks that we do here? (see below)
Op0->hasOneUse() || Op1->hasOneUse() || (Op1 == Op0 && Op1->hasNUses(2))

Yes - let's give that line a name (not sure what's best - "hasRemovableUser" ?), so we don't have to repeat it.

I ask because I think this condition is going to be the same for many patterns. Before updating this patch, I will create another patch that implements the helper function and refactors the conditions, at least for the transformations below:

exp(X) * exp(Y) -> exp(X + Y)

exp2(X) * exp2(Y) -> exp2(X + Y)

sqrt(X) * sqrt(Y) -> sqrt(X * Y)

Yes - please make a different patch. It's probably only mul/fmul that have a need for this. Most other binops should simplify if both operands are the same.
For example, the sqrt pattern should get folded by instsimplify, so I don't think that needs any changes.
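The use-check discussed in this exchange can be modeled outside LLVM with a toy value type. This is only a sketch: `ToyValue` and `toyIsOnlyUserOfAnyOperand` are hypothetical stand-ins, not LLVM classes, and the function mirrors the condition quoted in the review rather than LLVM's actual helper.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for llvm::Value that only tracks its number of uses.
struct ToyValue {
  std::size_t numUses;
  bool hasOneUse() const { return numUses == 1; }
  bool hasNUses(std::size_t n) const { return numUses == n; }
};

// Mirrors the condition quoted in the review:
//   Op0->hasOneUse() || Op1->hasOneUse() || (Op1 == Op0 && Op1->hasNUses(2))
// The fold is considered worthwhile when at least one operand would become
// dead afterwards, including the case where both operands of the multiply
// are the same instruction and the multiply accounts for both of its uses.
bool toyIsOnlyUserOfAnyOperand(const ToyValue *op0, const ToyValue *op1) {
  assert(op0 != nullptr && op1 != nullptr);
  return op0->hasOneUse() || op1->hasOneUse() ||
         (op0 == op1 && op1->hasNUses(2));
}
```

Under this model, a squared operand with exactly two uses passes the check (as in the `pow_ab_reassoc` test), while the same operand with an extra use does not (as in `pow_ab_reassoc_extra_use`).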

vdsered mentioned this in D102698: [InstCombine] Relaxed constraints of uses for exp(X) * exp(Y) -> exp(X + Y) and exp2(X) * exp2(Y) -> exp2(X + Y). · May 18 2021, 9:51 AM

I created a new patch https://reviews.llvm.org/D102698. I will update the current patch when we close the new one.

spatel mentioned this in rG13140120dcca: [InstCombine] Relax constraints of uses for exp(X) * exp(Y) -> exp(X + Y). · Jun 1 2021, 5:34 AM

Replaced hasOneUse + equality of operands with isOnlyUserOfAnyOperand
FMF are now added to the produced fadd
Added negative tests

@spatel you mentioned one more optimization with sext that we miss because of an overly strict use-constraint here. I'll create another patch for that. I think it's going to be the last one; I don't know of any other transformations for mul/fmul that would benefit from the new method.

Harbormaster completed remote builds in B107065: Diff 349008. · Jun 1 2021, 12:40 PM

spatel mentioned this in rGf03f4944cf82: [InstCombine] add tests for pow() reassociation; NFC. · Jun 4 2021, 7:16 AM

In D102574#2791437, @vdsered wrote:

@spatel you mentioned one more optimization with sext that we miss because of an overly strict use-constraint here. I'll create another patch for that. I think it's going to be the last one; I don't know of any other transformations for mul/fmul that would benefit from the new method.

Yes, that was:
https://alive2.llvm.org/ce/z/27ryac
...and it would be another patch

I pushed the baseline tests for this patch with:
f03f4944cf

Please update/rebase here in Phab, so we'll see just the diffs from this patch.

vdsered updated this revision to Diff 350047. · Jun 5 2021, 2:51 AM

xbolva00 added a subscriber: xbolva00. · Jun 5 2021, 3:06 AM

xbolva00 added inline comments.

llvm/test/Transforms/InstCombine/fmul-pow.ll
147	Restore

Harbormaster completed remote builds in B107809: Diff 350047. · Jun 5 2021, 3:14 AM

@spatel Done

Harbormaster completed remote builds in B107814: Diff 350059. · Jun 5 2021, 9:45 AM

spatel added inline comments. · Jun 6 2021, 4:29 PM

llvm/test/Transforms/InstCombine/fmul-pow.ll
107	Do you know why you are getting this (and similar) test changes? They are cosmetic (value names differ), but I don't see this when testing locally.

Removed name diff in tests

vdsered added inline comments. · Jun 6 2021, 6:48 PM

llvm/test/Transforms/InstCombine/fmul-pow.ll
107	I recently saw update_test_checks warn about using tmps in tests, so I thought of renaming them. It doesn't say anything for these tests, however. I updated the patch to remove the naming diff, so that only the actual changes in the test output are visible.

Harbormaster completed remote builds in B107897: Diff 350162. · Jun 6 2021, 7:20 PM

LGTM

llvm/test/Transforms/InstCombine/fmul-pow.ll
107	Ah - yes, the script will warn if it sees IR like: %tmp1 = fmul float %x, %y ...because the script is creating FileCheck variables for unnamed values in IR (`%1`) with names like `TMP1` (as we see in the examples in this patch). I don't see any potential conflicts with these tests.

This revision is now accepted and ready to land. · Jun 7 2021, 3:29 AM

Closed by commit rG7736c1936a93: [InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math (authored by vdsered, committed by spatel). · Jun 7 2021, 5:10 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG7736c1936a93: [InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math.

vdsered added a comment. · Jun 7 2021, 5:16 AM

This comment was removed by vdsered.

Sorry, I didn't see you have already committed :)

vdsered mentioned this in D104193: [InstCombine] Fold (sext bool X) * (sext bool X) to zext (and X, X). · Jun 13 2021, 6:39 AM

Diff 350253

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

@@ (549 lines hidden) @@ if (I.hasNoNaNs() && I.hasNoSignedZeros() && Op0 == Op1 &&
       // (sqrt(Y) / X) * (sqrt(Y) / X) --> Y / (X * X)
       if (match(Op0, m_FDiv(m_Intrinsic<Intrinsic::sqrt>(m_Value(Y)),
                             m_Value(X)))) {
         Value *XX = Builder.CreateFMulFMF(X, X, &I);
         return BinaryOperator::CreateFDivFMF(Y, XX, &I);
       }
     }

+    if (I.isOnlyUserOfAnyOperand()) {
+      // pow(x, y) * pow(x, z) -> pow(x, y + z)
+      if (match(Op0, m_Intrinsic<Intrinsic::pow>(m_Value(X), m_Value(Y))) &&
+          match(Op1, m_Intrinsic<Intrinsic::pow>(m_Specific(X), m_Value(Z)))) {
+        auto *YZ = Builder.CreateFAddFMF(Y, Z, &I);
+        auto *NewPow = Builder.CreateBinaryIntrinsic(Intrinsic::pow, X, YZ, &I);
+        return replaceInstUsesWith(I, NewPow);
+      }
+
       // exp(X) * exp(Y) -> exp(X + Y)
       if (match(Op0, m_Intrinsic<Intrinsic::exp>(m_Value(X))) &&
-          match(Op1, m_Intrinsic<Intrinsic::exp>(m_Value(Y))) &&
-          I.isOnlyUserOfAnyOperand()) {
+          match(Op1, m_Intrinsic<Intrinsic::exp>(m_Value(Y)))) {
         Value *XY = Builder.CreateFAddFMF(X, Y, &I);
         Value *Exp = Builder.CreateUnaryIntrinsic(Intrinsic::exp, XY, &I);
         return replaceInstUsesWith(I, Exp);
       }

       // exp2(X) * exp2(Y) -> exp2(X + Y)
       if (match(Op0, m_Intrinsic<Intrinsic::exp2>(m_Value(X))) &&
-          match(Op1, m_Intrinsic<Intrinsic::exp2>(m_Value(Y))) &&
-          I.isOnlyUserOfAnyOperand()) {
+          match(Op1, m_Intrinsic<Intrinsic::exp2>(m_Value(Y)))) {
         Value *XY = Builder.CreateFAddFMF(X, Y, &I);
         Value *Exp2 = Builder.CreateUnaryIntrinsic(Intrinsic::exp2, XY, &I);
         return replaceInstUsesWith(I, Exp2);
       }
+    }

     // (X*Y) * X => (X*X) * Y where Y != X
     // The purpose is two-fold:
     // 1) to form a power expression (of X).
     // 2) potentially shorten the critical path: After transformation, the
     // latency of the instruction Y is amortized by the expression of X*X,
     // and therefore Y is in a "less critical" position compared to what it
     // was before the transformation.
@@ (1,015 lines hidden) @@

llvm/test/Transforms/InstCombine/fmul-pow.ll

@@ (73 lines hidden) @@
 ;
   %1 = call double @llvm.pow.f64(double %a, double %b)
   %2 = call double @llvm.pow.f64(double %a, double %c)
   %mul = fmul double %2, %1
   ret double %mul
 }

 define double @pow_ab_x_pow_ac_reassoc(double %a, double %b, double %c) {
 ; CHECK-LABEL: @pow_ab_x_pow_ac_reassoc(
-; CHECK-NEXT: [[TMP1:%.*]] = call double @llvm.pow.f64(double [[A:%.*]], double [[B:%.*]])
-; CHECK-NEXT: [[TMP2:%.*]] = call double @llvm.pow.f64(double [[A]], double [[C:%.*]])
-; CHECK-NEXT: [[MUL:%.*]] = fmul reassoc double [[TMP2]], [[TMP1]]
-; CHECK-NEXT: ret double [[MUL]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc double [[C:%.*]], [[B:%.*]]
+; CHECK-NEXT: [[TMP2:%.*]] = call reassoc double @llvm.pow.f64(double [[A:%.*]], double [[TMP1]])
+; CHECK-NEXT: ret double [[TMP2]]
 ;
   %1 = call double @llvm.pow.f64(double %a, double %b)
   %2 = call double @llvm.pow.f64(double %a, double %c)
   %mul = fmul reassoc double %2, %1
   ret double %mul
 }

 define double @pow_ab_reassoc(double %a, double %b) {
 ; CHECK-LABEL: @pow_ab_reassoc(
-; CHECK-NEXT: [[TMP1:%.*]] = call double @llvm.pow.f64(double [[A:%.*]], double [[B:%.*]])
-; CHECK-NEXT: [[MUL:%.*]] = fmul reassoc double [[TMP1]], [[TMP1]]
-; CHECK-NEXT: ret double [[MUL]]
+; CHECK-NEXT: [[TMP1:%.*]] = fadd reassoc double [[B:%.*]], [[B]]
+; CHECK-NEXT: [[TMP2:%.*]] = call reassoc double @llvm.pow.f64(double [[A:%.*]], double [[TMP1]])
+; CHECK-NEXT: ret double [[TMP2]]
 ;
   %1 = call double @llvm.pow.f64(double %a, double %b)
   %mul = fmul reassoc double %1, %1
   ret double %mul
 }

 define double @pow_ab_reassoc_extra_use(double %a, double %b) {
 ; CHECK-LABEL: @pow_ab_reassoc_extra_use(
 ; CHECK-NEXT: [[TMP1:%.*]] = call double @llvm.pow.f64(double [[A:%.*]], double [[B:%.*]])
 ; CHECK-NEXT: [[MUL:%.*]] = fmul reassoc double [[TMP1]], [[TMP1]]
 ; CHECK-NEXT: call void @use(double [[TMP1]])
 ; CHECK-NEXT: ret double [[MUL]]
 ;
   %1 = call double @llvm.pow.f64(double %a, double %b)
   %mul = fmul reassoc double %1, %1
   call void @use(double %1)
   ret double %mul
 }

 define double @pow_ab_x_pow_ac_reassoc_extra_use(double %a, double %b, double %c) {
 ; CHECK-LABEL: @pow_ab_x_pow_ac_reassoc_extra_use(
 ; CHECK-NEXT: [[TMP1:%.*]] = call double @llvm.pow.f64(double [[A:%.*]], double [[B:%.*]])
-; CHECK-NEXT: [[TMP2:%.*]] = call double @llvm.pow.f64(double [[A]], double [[C:%.*]])
-; CHECK-NEXT: [[MUL:%.*]] = fmul reassoc double [[TMP1]], [[TMP2]]
+; CHECK-NEXT: [[TMP2:%.*]] = fadd reassoc double [[B]], [[C:%.*]]
+; CHECK-NEXT: [[TMP3:%.*]] = call reassoc double @llvm.pow.f64(double [[A]], double [[TMP2]])
 ; CHECK-NEXT: call void @use(double [[TMP1]])
-; CHECK-NEXT: ret double [[MUL]]
+; CHECK-NEXT: ret double [[TMP3]]
 ;
   %1 = call double @llvm.pow.f64(double %a, double %b)
   %2 = call double @llvm.pow.f64(double %a, double %c)
   %mul = fmul reassoc double %1, %2
   call void @use(double %1)
   ret double %mul
 }

 define double @pow_ab_x_pow_ac_reassoc_multiple_uses(double %a, double %b, double %c) {
 ; CHECK-LABEL: @pow_ab_x_pow_ac_reassoc_multiple_uses(
 ; CHECK-NEXT: [[TMP1:%.*]] = call double @llvm.pow.f64(double [[A:%.*]], double [[B:%.*]])
 ; CHECK-NEXT: [[TMP2:%.*]] = call double @llvm.pow.f64(double [[A]], double [[C:%.*]])
 ; CHECK-NEXT: [[MUL:%.*]] = fmul reassoc double [[TMP1]], [[TMP2]]
 ; CHECK-NEXT: call void @use(double [[TMP1]])
 ; CHECK-NEXT: call void @use(double [[TMP2]])
 ; CHECK-NEXT: ret double [[MUL]]
 ;
   %1 = call double @llvm.pow.f64(double %a, double %b)
   %2 = call double @llvm.pow.f64(double %a, double %c)
   %mul = fmul reassoc double %1, %2
   call void @use(double %1)
   call void @use(double %2)
   ret double %mul
 }

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Missed optimization for pow(x, y) * pow(x, z) with fast-math
Closed · Public
