Page MenuHomePhabricator

[InstCombine]: Transform 1.0/sqrt(X) * X to X/sqrt(X) and X * 1.0/sqrt(X) to X/sqrt(X)
ClosedPublic

Authored by venkataramanan.kumar.llvm on Aug 27 2020, 10:34 AM.

Details

Summary

Implement the following transformation in InstCombine. These transforms will now be performed irrespective of the number of uses for the expression "1.0/sqrt(X)".

1.0/sqrt(X) * X => X/sqrt(X)
X * 1.0/sqrt(X) => X/sqrt(X)

We already handle more general cases, and we are intentionally not creating extra (and likely expensive) fdiv ops in IR. This pattern is the exception to the rule because we always expect the Backend to reduce X/sqrt(X) to sqrt(X), if it has the necessary (reassoc) fast-math-flags.

Ref: DagCombiner optimizes the X/sqrt(X) to sqrt(X).

Diff Detail

Event Timeline

venkataramanan.kumar.llvm requested review of this revision.Aug 27 2020, 10:34 AM
grandinj added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
552

Just a drive-by commentator: I am surprised there is not an existing transform which does

1 * X ==> X
X * 1 ==> X
X / 1 ==> X
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
552

Yes they are available.

This one is specifically targeting the pattern where one operand for multiplication is X and the other operand is Divide. The dividend is 1.0 and divisor is sqrt(x).

x * 1/sqrt(x) ==> x/sqrt(x) later on we try to fold it to sqrt(x) under associative math option.

grandinj added inline comments.Aug 28 2020, 1:25 AM
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
552

If they are available, then surely it is not necessary to do

X * 1.0 / sqrt(X) ==> X/sqrt(X).

Surely by the time it gets there, that would already have been folded to

X / sqrt(X)

?

Folding is not happening when the number of uses for the divide operand (say 1.0/something) is more than one.

The test case I added is same as below
ref: https://godbolt.org/z/xKh1n1
here 1/sqrt(x) is used more than once and the folding is not happening.

As I said this patch does not have the usage restriction and folds the particular case of x *1/sqrt(x) to x/sqrt(x) and later we will optimize the divide away to sqrt(x).

spatel added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
552

The motivation for this patch is not obvious because it is not stated in the summary, not shown in code comments, and the tests are not pre-committed to show diffs. Please do all 3 of those things.

This patch is a result of a decision made in D85709 (and I think that patch should be abandoned now): we are intentionally not folding x/sqrt(x) in IR because it loses information that the backend can't recover (for the case when x==0.0).

So the multi-use case of the 1/sqrt(x) factor is the only reason we need this specialized transform. We already handle more general cases, and we are intentionally not creating extra (and likely expensive) fdiv ops in IR. This pattern is the exception to the rule because we always expect the backend to reduce x/sqrt(x) if it has the necessary (reassoc) fast-math-flags.

Please see my comments in D86395 for an idea about how to write tests that provide coverage for the code proposed here (we need 2 tests to cover the 2 potential code patterns resulting from commutation.

Update the patch as per the review comments given by Sanjay.

Note that we may still have backend gaps in sqrt codegen, so we will need to watch out for regressions. I tried to squash those better with:
rG716e35a0cf53
rG1c9a09f42e5e

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
553–558

We should predicate this fold using the same FMF that the backend is using. Currently at least, we are requiring "nsz", so that should be checked here.

llvm/test/Transforms/InstCombine/fmul-sqrt.ll
109–112

This test could provide better coverage for the logic as implemented. The only FMF requirement is the "reassoc" (and likely "nsz") on the fmul, and that's typical of our FP folds so far.

If we want to make that more conservative by requiring FMF on the fdiv or fsqrt too, this test would then offer a potential counter-example.

There's some discussion about the semantics/subtlety of FMF in:
https://llvm.org/PR46326

llvm/lib/Transforms/InstCombine/InstCombineMulDivRem.cpp
553–558

When x= -0.0 then 1.0/sqrt(-0,0) * -0.0 = NAN = -0.0/sqrt(-0.0). So the transform as such don't require "nsz" check right?

But yes then the folding will not happen in the back end when we have -0.0 . I will add the nsz check in the next revision of this patch.

llvm/test/Transforms/InstCombine/fmul-sqrt.ll
109–112

is there any restriction that FMF should be set on all the operands we try to re-associate ?

When I read the PR it seems there is no such rule.

For the patch I will add "nsz" to fmul. It already has the "fast" setting for other operations. is that the correct understanding?

This comment was removed by venkataramanan.kumar.llvm.

Updated the patch with "nsz" check .

spatel added inline comments.Sep 1 2020, 6:52 AM
llvm/test/Transforms/InstCombine/fmul-sqrt.ll
109–112

No, there is no rule requiring FMF on all operands in the expression. The rules are currently unspecified as to exactly how FMF should be predicated/applied. That's what makes this and related transforms potentially controversial and subject to change in the future.

I think it is easier to show my suggestion directly, so I updated the tests here:
rGd48699e

The 1st test now has the minimum FMF required to trigger the transform; the 2nd test is what we would typically expect -ffast-math code to look like (everything is full 'fast').

(Note: I also changed the 2nd test to use vector types for better test coverage.)

Please have a look and rebase. Let me know if it makes sense.

Updated the patch as per comments received from Sanjay.

llvm/test/Transforms/InstCombine/fmul-sqrt.ll
109–112

yes that makes sense. Updated the patch after rebase.

spatel accepted this revision.Sep 1 2020, 9:33 AM

LGTM

This revision is now accepted and ready to land.Sep 1 2020, 9:33 AM
This revision was automatically updated to reflect the committed changes.