This is an archive of the discontinued LLVM Phabricator instance.

[X86][X86 intrinsics]Folding cmp(sub(a,b),0) into cmp(a,b) optimization
ClosedPublic

Authored by m_zuckerman on Mar 27 2017, 8:42 AM.

Details

Summary

This patch is a part of two reviews and base on the https://reviews.llvm.org/D31396.

This patch adds new optimization (Folding cmp(sub(a,b),0) into cmp(a,b))
to instCombineCall pass and was written specific for X86 CMP intrinsics.

Diff Detail

Event Timeline

RKSimon edited edge metadata.
RKSimon added a subscriber: llvm-commits.

Please make sure you always include llvm-commits as a subscriber in future patches.

lib/Transforms/InstCombine/InstCombineCalls.cpp
2401

Pull out these repeated calls to getFastMathFlags:

FastMathFlags FMFs = I->getFastMathFlags();
test/Transforms/InstCombine/X86FsubCmpCombine.ll
1 ↗(On Diff #93138)

Regenerate with utils/update_test_checks.py

5 ↗(On Diff #93138)

maybe split these into one test per intrinsic? also do tests for safe algebra (or maybe one of the other fmfs each per test).

12 ↗(On Diff #93138)

clean this up if possible

m_zuckerman marked 4 inline comments as done.
spatel edited edge metadata.Mar 31 2017, 7:25 AM
spatel added a subscriber: scanon.

If this transform is valid (cc'ing @scanon), then should we also do this for general IR?

define i1 @fcmpsub(double %x, double %y) {
  %sub = fsub nnan ninf nsz double %x, %y
  %cmp = fcmp nnan ninf nsz ugt double %sub, 0.0
  ret i1 %cmp
}

define i1 @fcmpsub(double %x, double %y) {
  %cmp = fcmp nnan ninf nsz ugt double %x, %y
  ret i1 %cmp
}
lib/Transforms/InstCombine/InstCombineCalls.cpp
2401

Don't we need to check the FMF of the intrinsic too?

2402–2403

Currently, unsafeAlgebra implies all of the other FMF bits, so checking that bit is redundant here.
If we change the definition of unsafeAlgebra in the future (there was a proposal for this recently), then this check will be wrong. Either way, remove the unsafeAlgebra predicate (unless I'm misunderstanding the constraints of this transform).

(x-y) == 0 --> x == y does not require nsz (zeros of any sign compare equal), nor does it require nnan (if x or y is NaN, both comparisons are false). It *does* require ninf (because inf-inf = NaN), and it also requires that subnormals are not flushed by the CPU. There's been some churn around flushing recently, Hal may have thoughts (+Hal).

m_zuckerman marked an inline comment as done.Apr 1 2017, 6:28 AM

If this transform is valid (cc'ing @scanon), then should we also do this for general IR?

define i1 @fcmpsub(double %x, double %y) {
  %sub = fsub nnan ninf nsz double %x, %y
  %cmp = fcmp nnan ninf nsz ugt double %sub, 0.0
  ret i1 %cmp
}

define i1 @fcmpsub(double %x, double %y) {
  %cmp = fcmp nnan ninf nsz ugt double %x, %y
  ret i1 %cmp
}

You are absolutely right, your transform is valid and we will do it after this patch.
Since the intrinsics are lowered with generic IR, mine patch is still valid and we will need them both for a complete solution.

lib/Transforms/InstCombine/InstCombineCalls.cpp
2401

No, we don't need to check, this is implied from the flags of the sub.
According to http://llvm.org/docs/LangRef.html#fast-math-flags optimizations can assume that the arguments and the result behave as expected from them. Since the compare uses the result and splits them to two arguments (the same arguments as in the sub) we are still working with the early assumption.

We can continue with the assumptions as long we will work with the same arguments or the same result.

m_zuckerman updated this revision to Diff 93747.Apr 1 2017, 7:07 AM
spatel added a comment.Apr 1 2017, 9:02 AM

You are absolutely right, your transform is valid and we will do it after this patch.
Since the intrinsics are lowered with generic IR, mine patch is still valid and we will need them both for a complete solution.

  1. We need to be conservative for the general case...speaking from experience :). As @scanon mentioned, we need some way to tell whether denorms are flushed to zero or not. I think this patch is safe currently because we assume the default FP ENV, and on x86 that would not have DAZ/FTZ.
  2. We don't need nsz or nnan for this fold (see @scanon comment).
  3. I'd prefer to put all of the tests in one file since they are just variants of the same fold.

You are absolutely right, your transform is valid and we will do it after this patch.
Since the intrinsics are lowered with generic IR, mine patch is still valid and we will need them both for a complete solution.

  1. We need to be conservative for the general case...speaking from experience :). As @scanon mentioned, we need some way to tell whether denorms are flushed to zero or not. I think this patch is safe currently because we assume the default FP ENV, and on x86 that would not have DAZ/FTZ.
  2. We don't need nsz or nnan for this fold (see @scanon comment).
  3. I'd prefer to put all of the tests in one file since they are just variants of the same fold.

1.I agree with you, in the general case we need to be caution, but this transformation is feasible.
Regard the X86 this transformation is safe (As you wrote) from all perspectives. We know that status of the flags and we know the target.

3.I am with you on that.

m_zuckerman updated this revision to Diff 93795.Apr 2 2017, 9:15 AM
spatel added inline comments.Apr 3 2017, 7:39 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
2396

You can use 'auto *' with dyn_cast because the type is obvious.

2398–2400

This comment should specify the non-obvious constraints that we've discussed here:

// This fold requires NINF because inf minus inf is nan.
// NSZ is not needed because zeros of any sign are equal for both compares.
// NNAN is not needed because nans compare the same for both compares.
// FMF are not needed on the compare intrinsic because...
test/Transforms/InstCombine/X86FsubCmpCombine.ll
6 ↗(On Diff #93795)

Misspelling in function name. Also, as suggested earlier, I'd really prefer to have one test per intrinsic rather than everything in one function. It makes it a lot easier to see the simple pattern that's getting folded.

m_zuckerman updated this revision to Diff 94199.Apr 5 2017, 4:38 AM
spatel accepted this revision.Apr 10 2017, 4:36 PM

LGTM - see inline comments for a couple of cleanups.

lib/Transforms/InstCombine/InstCombineCalls.cpp
2396

You missed this nit.

2407–2410

Giving local names to the operands doesn't add value here IMO, but if you want to do that, I prefer to use "A" and "B" to match the formula in the comment.

This revision is now accepted and ready to land.Apr 10 2017, 4:36 PM
hfinkel edited edge metadata.Apr 10 2017, 8:05 PM

(x-y) == 0 --> x == y does not require nsz (zeros of any sign compare equal), nor does it require nnan (if x or y is NaN, both comparisons are false). It *does* require ninf (because inf-inf = NaN), and it also requires that subnormals are not flushed by the CPU. There's been some churn around flushing recently, Hal may have thoughts (+Hal).

We still assume standard denormal handling. There's ongoing work to add intrinsics for operations that are sensitive to this behavior (and rounding modes, etc.), but that shouldn't affect this.

Hi,
I did a small modification on the code since I am going to retire from the canonical compare representation review.
I added to the code the ability to fold cmp(0, fsub(a,b)) additnal to the reviewrd cmp(fsub(a,b),0)

Thanks,
Michael Zuckerman

spatel added inline comments.Apr 14 2017, 7:26 AM
Transforms/InstCombine/InstCombineCalls.cpp
2348–2366 ↗(On Diff #95159)

There are many ways to deal with commuted patterns, and a 2-loop is my least favorite. Would you consider using std::swap and matchers instead? Something like:

Value *Arg0 = II->getArgOperand(0);
Value *Arg1 = II->getArgOperand(1);
bool Arg0IsZero = match(Arg0, m_Zero());
if (Arg0IsZero)
  std::swap(Arg0, Arg1);
Value *A, *B;
if ((match(Arg0, m_OneUse(m_FSub(m_Value(A), m_Value(B)))) &&
     match(Arg1, m_Zero()) &&
     cast<Instruction>(Arg0)->getFastMathFlags().noInfs())) {
  if (Arg0IsZero)
    std::swap(A, B);
  II->setArgOperand(0, A);
  II->setArgOperand(1, B);
  return II;
}

I agree with you that your code is more elegant.

This revision was automatically updated to reflect the committed changes.