Details

Reviewers

spatel
grandinj
cameron.mcinally
lebedev.ri

Commits

rG626c3738cdfa: [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X)

Summary

Implement the following transformation in InstCombine. These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)".

1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive) fdiv ops in IR. This pattern is the exception to the rule because we always expect the backend to reduce X/sqrt(X) to sqrt(X) if it has the necessary (reassoc) fast-math-flags.

Ref: DAGCombiner optimizes X/sqrt(X) to sqrt(X).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

venkataramanan.kumar.llvm created this revision. Aug 27 2020, 10:34 AM

Herald added a project: Restricted Project. Aug 27 2020, 10:34 AM

Herald added subscribers: llvm-commits, steven.zhang, hiraditya.

venkataramanan.kumar.llvm requested review of this revision. Aug 27 2020, 10:34 AM

Harbormaster completed remote builds in B69815: Diff 288384. Aug 27 2020, 12:59 PM

grandinj added a subscriber: grandinj. Aug 28 2020, 12:39 AM

grandinj added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
552	Just a drive-by commentator: I am surprised there is not an existing transform which does 1 * X ==> X, X * 1 ==> X, and X / 1 ==> X.

venkataramanan.kumar.llvm added inline comments. Aug 28 2020, 1:13 AM

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
552	Yes, they are available. This one specifically targets the pattern where one operand of the multiplication is X and the other operand is a divide whose dividend is 1.0 and whose divisor is sqrt(X): x * 1/sqrt(x) ==> x/sqrt(x). Later on, we try to fold it to sqrt(x) under the associative math option.

grandinj added inline comments. Aug 28 2020, 1:25 AM

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
552	If they are available, then surely it is not necessary to do X * 1.0 / sqrt(X) ==> X/sqrt(X). Surely by the time it gets there, that would already have been folded to X / sqrt(X) ?

Folding does not happen when the divide operand (say 1.0/something) has more than one use.

The test case I added is the same as the one below.
Ref: https://godbolt.org/z/xKh1n1
Here 1/sqrt(x) is used more than once, so the folding does not happen.

As I said, this patch does not have the use-count restriction: it folds the particular case of x * 1/sqrt(x) to x/sqrt(x), and later we will optimize the divide away to sqrt(x).
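
For illustration, a C++ function of roughly the following shape produces the multi-use pattern described above. This is a hedged sketch modeled on the rsqrt_x_reassociate_extra_use test in this patch, not a copy of the linked Godbolt source; the function name rsqrt_times_x is hypothetical.

```cpp
#include <cmath>

// Hypothetical example: the reciprocal square root has two uses
// (a store through `p` and a multiply), so a fold that requires the
// fdiv to have a single use would not apply to it.
double rsqrt_times_x(double x, double *p) {
  double rsqrt = 1.0 / std::sqrt(x); // 1.0 / sqrt(x)
  *p = rsqrt;                        // extra use keeps the fdiv alive
  return rsqrt * x;                  // candidate for x / sqrt(x)
}
```

When such code is built with fast-math enabled, the multiply is expected to carry the fast-math-flags that this fold checks; without them the fold should not fire.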

spatel added reviewers: grandinj, cameron.mcinally, lebedev.ri. Aug 28 2020, 5:17 AM

spatel added inline comments.

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
552	The motivation for this patch is not obvious because it is not stated in the summary, not shown in code comments, and the tests are not pre-committed to show diffs. Please do all 3 of those things. This patch is a result of a decision made in D85709 (and I think that patch should be abandoned now): we are intentionally not folding x/sqrt(x) in IR because it loses information that the backend can't recover (for the case when x==0.0). So the multi-use case of the 1/sqrt(x) factor is the only reason we need this specialized transform. We already handle more general cases, and we are intentionally not creating extra (and likely expensive) fdiv ops in IR. This pattern is the exception to the rule because we always expect the backend to reduce x/sqrt(x) if it has the necessary (reassoc) fast-math-flags. Please see my comments in D86395 for an idea about how to write tests that provide coverage for the code proposed here (we need 2 tests to cover the 2 potential code patterns resulting from commutation).
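
The x==0.0 concern in the comment above can be checked numerically. The sketch below uses plain IEEE-754 double arithmetic in C++ with no fast-math flags; the helper names are hypothetical and are not part of the patch.

```cpp
#include <cmath>

// x / sqrt(x): the form this patch keeps in IR.
double x_over_sqrt(double x) { return x / std::sqrt(x); }

// sqrt(x): the form the backend may produce under fast-math-flags.
double plain_sqrt(double x) { return std::sqrt(x); }

// True when exactly one of the two forms yields NaN for this input.
bool forms_differ_in_nan(double x) {
  return std::isnan(x_over_sqrt(x)) != std::isnan(plain_sqrt(x));
}
```

For x = 0.0 the division is 0.0/0.0, which is NaN, while sqrt(0.0) is 0.0. Once the IR has been rewritten to sqrt(x), that distinction is gone, which is presumably the information loss the comment refers to.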

venkataramanan.kumar.llvm edited the summary of this revision. Aug 31 2020, 1:29 AM

Herald added a subscriber: danielkiss. Aug 31 2020, 1:29 AM

Updated the patch as per the review comments given by Sanjay.

Note that we may still have backend gaps in sqrt codegen, so we will need to watch out for regressions. I tried to squash those better with:
rG716e35a0cf53
rG1c9a09f42e5e

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
553–558	We should predicate this fold using the same FMF that the backend is using. Currently at least, we are requiring "nsz", so that should be checked here.
llvm/test/Transforms/InstCombine/fmul-sqrt.ll
109–112	This test could provide better coverage for the logic as implemented. The only FMF requirement is the "reassoc" (and likely "nsz") on the fmul, and that's typical of our FP folds so far. If we want to make that more conservative by requiring FMF on the fdiv or fsqrt too, this test would then offer a potential counter-example. There's some discussion about the semantics/subtlety of FMF in: https://llvm.org/PR46326

venkataramanan.kumar.llvm added inline comments. Aug 31 2020, 11:13 AM

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
553–558	When x = -0.0, 1.0/sqrt(-0.0) * -0.0 = NaN = -0.0/sqrt(-0.0). So the transform as such doesn't require the "nsz" check, right? But yes, the folding will then not happen in the backend when we have -0.0. I will add the nsz check in the next revision of this patch.
llvm/test/Transforms/InstCombine/fmul-sqrt.ll
109–112	Is there any restriction that FMF should be set on all the operands we try to reassociate? When I read the PR, it seems there is no such rule. For the patch, I will add "nsz" to the fmul. It already has the "fast" setting for other operations. Is that the correct understanding?
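
The -0.0 arithmetic in the first reply above can be traced step by step. This is a minimal sketch assuming strict IEEE-754 double semantics (no fast-math); the struct and function names are hypothetical.

```cpp
#include <cmath>

// Intermediate values of both expression forms for x = -0.0.
struct NegZeroTrace {
  double root;     // sqrt(-0.0)
  double recip;    // 1.0 / sqrt(-0.0)
  double product;  // (1.0 / sqrt(-0.0)) * -0.0
  double quotient; // -0.0 / sqrt(-0.0)
};

NegZeroTrace trace_negative_zero() {
  const double x = -0.0;
  NegZeroTrace t{};
  t.root = std::sqrt(x);   // IEEE-754: sqrt(-0.0) is -0.0
  t.recip = 1.0 / t.root;  // 1.0 / -0.0 is -infinity
  t.product = t.recip * x; // -infinity * -0.0 is NaN
  t.quotient = x / t.root; // -0.0 / -0.0 is NaN
  return t;
}
```

Both forms end in NaN, which matches the observation that the fmul-to-fdiv rewrite itself does not change the result for -0.0; the "nsz" check was added to mirror what the backend requires.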

venkataramanan.kumar.llvm updated this revision to Diff 289077. Aug 31 2020, 11:11 PM

This comment was removed by venkataramanan.kumar.llvm.

Updated the patch with the "nsz" check.

spatel mentioned this in rGd48699e3e89f: [InstCombine] adjust recip sqrt tests for better coverage; NFC. Sep 1 2020, 6:44 AM

spatel added inline comments. Sep 1 2020, 6:52 AM

llvm/test/Transforms/InstCombine/fmul-sqrt.ll
109–112	No, there is no rule requiring FMF on all operands in the expression. The rules are currently unspecified as to exactly how FMF should be predicated/applied. That's what makes this and related transforms potentially controversial and subject to change in the future. I think it is easier to show my suggestion directly, so I updated the tests here: rGd48699e. The 1st test now has the minimum FMF required to trigger the transform; the 2nd test is what we would typically expect -ffast-math code to look like (everything is full 'fast'). (Note: I also changed the 2nd test to use vector types for better test coverage.) Please have a look and rebase. Let me know if it makes sense.

Updated the patch as per comments received from Sanjay.

venkataramanan.kumar.llvm added inline comments. Sep 1 2020, 8:11 AM

llvm/test/Transforms/InstCombine/fmul-sqrt.ll
109–112	Yes, that makes sense. Updated the patch after rebase.

LGTM

This revision is now accepted and ready to land. Sep 1 2020, 9:33 AM

Closed by commit rG626c3738cdfa: [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X) (authored by venkataramanan.kumar.llvm, committed by spatel). Sep 2 2020, 5:25 AM

This revision was automatically updated to reflect the committed changes.

spatel added a commit: rG626c3738cdfa: [InstCombine] Transform 1.0/sqrt(X) * X to X/sqrt(X).

Diff 289407

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp

   if (I.hasAllowReassoc()) {
     ...
     if (I.hasNoNaNs() &&
         match(Op0, m_OneUse(m_Intrinsic<Intrinsic::sqrt>(m_Value(X)))) &&
         match(Op1, m_OneUse(m_Intrinsic<Intrinsic::sqrt>(m_Value(Y))))) {
       Value *XY = Builder.CreateFMulFMF(X, Y, &I);
       Value *Sqrt = Builder.CreateUnaryIntrinsic(Intrinsic::sqrt, XY, &I);
       return replaceInstUsesWith(I, Sqrt);
     }
 
+    // The following transforms are done irrespective of the number of uses
+    // for the expression "1.0/sqrt(X)".
+    //  1) 1.0/sqrt(X) * X -> X/sqrt(X)
+    //  2) X * 1.0/sqrt(X) -> X/sqrt(X)
+    // We always expect the backend to reduce X/sqrt(X) to sqrt(X), if it
+    // has the necessary (reassoc) fast-math-flags.
+    if (I.hasNoSignedZeros() &&
+        match(Op0, (m_FDiv(m_SpecificFP(1.0), m_Value(Y)))) &&
+        match(Y, m_Intrinsic<Intrinsic::sqrt>(m_Value(X))) && Op1 == X)
+      return BinaryOperator::CreateFDivFMF(X, Y, &I);
+    if (I.hasNoSignedZeros() &&
+        match(Op1, (m_FDiv(m_SpecificFP(1.0), m_Value(Y)))) &&
+        match(Y, m_Intrinsic<Intrinsic::sqrt>(m_Value(X))) && Op0 == X)
+      return BinaryOperator::CreateFDivFMF(X, Y, &I);
+
     // Like the similar transform in instsimplify, this requires 'nsz' because
     // sqrt(-0.0) = -0.0, and -0.0 * -0.0 does not simplify to -0.0.
     if (I.hasNoNaNs() && I.hasNoSignedZeros() && Op0 == Op1 &&
         Op0->hasNUses(2)) {
       // Peek through fdiv to find squaring of square root:
       // (X / sqrt(Y)) * (X / sqrt(Y)) --> (X * X) / Y
       if (match(Op0, m_FDiv(m_Value(X),
                             m_Intrinsic<Intrinsic::sqrt>(m_Value(Y))))) {
     ...
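
The matching logic in the hunk above can be modeled outside LLVM. The following is a toy sketch only: Op, Node, isRecipSqrtOf, and foldRecipSqrtMul are hypothetical stand-ins and are not LLVM's Value, PatternMatch, or IRBuilder APIs. It mirrors three properties of the patch: both commuted operand orders are tried, the fmul must carry reassoc and nsz, and the use count of 1.0/sqrt(X) is never consulted.

```cpp
// Toy expression kinds; not LLVM opcodes.
enum class Op { Arg, ConstOne, Sqrt, FDiv, FMul };

// Toy expression node with stand-ins for two fast-math-flags.
struct Node {
  Op op = Op::Arg;
  const Node *lhs = nullptr; // operand 0 (sole operand for Sqrt)
  const Node *rhs = nullptr; // operand 1
  bool reassoc = false;
  bool nsz = false;
};

// Returns true if `n` is 1.0 / sqrt(x) for exactly this `x`;
// on success `sqrtOut` points at the sqrt node.
bool isRecipSqrtOf(const Node *n, const Node *x, const Node *&sqrtOut) {
  if (!n || n->op != Op::FDiv || !n->lhs || n->lhs->op != Op::ConstOne)
    return false;
  const Node *y = n->rhs;
  if (!y || y->op != Op::Sqrt || y->lhs != x)
    return false;
  sqrtOut = y;
  return true;
}

// If `mul` is (1.0/sqrt(x)) * x or x * (1.0/sqrt(x)) with reassoc and
// nsz set, writes x / sqrt(x) into `out` and returns true. The new
// fdiv reuses the existing sqrt node and copies the fmul's flags.
bool foldRecipSqrtMul(const Node &mul, Node &out) {
  if (mul.op != Op::FMul || !mul.reassoc || !mul.nsz)
    return false;
  const Node *sqrtNode = nullptr;
  const Node *x = nullptr;
  if (isRecipSqrtOf(mul.lhs, mul.rhs, sqrtNode))
    x = mul.rhs;
  else if (isRecipSqrtOf(mul.rhs, mul.lhs, sqrtNode))
    x = mul.lhs;
  else
    return false;
  out = Node();
  out.op = Op::FDiv;
  out.lhs = x;
  out.rhs = sqrtNode;
  out.reassoc = mul.reassoc;
  out.nsz = mul.nsz;
  return true;
}
```

Because the toy fold leaves the original 1.0/sqrt(x) node untouched, any other user of it (such as the store in the test case) keeps working, which corresponds to the fdiv that remains in the CHECK lines of the test diff.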

llvm/test/Transforms/InstCombine/fmul-sqrt.ll

 ...
 ;
   %squared = fmul fast double %rsqrt, %rsqrt
   ret double %squared
 }
 
 define double @rsqrt_x_reassociate_extra_use(double %x, double * %p) {
 ; CHECK-LABEL: @rsqrt_x_reassociate_extra_use(
 ; CHECK-NEXT: [[SQRT:%.*]] = call double @llvm.sqrt.f64(double [[X:%.*]])
 ; CHECK-NEXT: [[RSQRT:%.*]] = fdiv double 1.000000e+00, [[SQRT]]
-; CHECK-NEXT: [[RES:%.*]] = fmul reassoc nsz double [[RSQRT]], [[X]]
+; CHECK-NEXT: [[RES:%.*]] = fdiv reassoc nsz double [[X:%.*]], [[SQRT]]
 ; CHECK-NEXT: store double [[RSQRT]], double* [[P:%.*]], align 8
 ; CHECK-NEXT: ret double [[RES]]
 ;
   %sqrt = call double @llvm.sqrt.f64(double %x)
   %rsqrt = fdiv double 1.0, %sqrt
   %res = fmul reassoc nsz double %rsqrt, %x
   store double %rsqrt, double* %p
   ret double %res
 }
 
 define <2 x float> @x_add_y_rsqrt_reassociate_extra_use(<2 x float> %x, <2 x float> %y, <2 x float>* %p) {
 ; CHECK-LABEL: @x_add_y_rsqrt_reassociate_extra_use(
 ; CHECK-NEXT: [[ADD:%.*]] = fadd fast <2 x float> [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT: [[SQRT:%.*]] = call fast <2 x float> @llvm.sqrt.v2f32(<2 x float> [[ADD]])
 ; CHECK-NEXT: [[RSQRT:%.*]] = fdiv fast <2 x float> <float 1.000000e+00, float 1.000000e+00>, [[SQRT]]
-; CHECK-NEXT: [[RES:%.*]] = fmul fast <2 x float> [[ADD]], [[RSQRT]]
+; CHECK-NEXT: [[RES:%.*]] = fdiv fast <2 x float> [[ADD]], [[SQRT]]
 ; CHECK-NEXT: store <2 x float> [[RSQRT]], <2 x float>* [[P:%.*]], align 8
 ; CHECK-NEXT: ret <2 x float> [[RES]]
 ;
   %add = fadd fast <2 x float> %x, %y ; thwart complexity-based canonicalization
   %sqrt = call fast <2 x float> @llvm.sqrt.v2f32(<2 x float> %add)
   %rsqrt = fdiv fast <2 x float> <float 1.0, float 1.0>, %sqrt
   %res = fmul fast <2 x float> %add, %rsqrt
   store <2 x float> %rsqrt, <2 x float>* %p
 ...

This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine]: Transform 1.0/sqrt(X) * X to X/sqrt(X) and X * 1.0/sqrt(X) to X/sqrt(X)
Closed, Public
