Constant folding and instcombine for saturating adds
AcceptedPublic

Authored by nikic on Wed, Nov 7, 4:18 PM.

Details

Summary

We'd like to make use of the new saturating add intrinsics in Rust (in part because some obvious patterns don't optimize when implemented via with.overflow intrinsics). However right now saturing adds seems to be completely opaque to the optimizer.

This patch implements some basic optimization support for saturating add intrinsics, namely:

  • Constant folding sat(C1 + C2) -> C3
  • sat(X + 0) -> X
  • sat(X + undef) -> X
  • sat(X uadd MAX) -> MAX
  • sat(sat(X + C1) + C2) -> sat(X + C3) where legal
  • sat(X1 + X2) -> add nuw/nsw where possible

Diff Detail

nikic created this revision.Wed, Nov 7, 4:18 PM

What about subtraction?

leonardchan accepted this revision.Wed, Nov 7, 9:42 PM

LGTM, but you should probably wait for LGTM from @craig.topper since he's likely more familiar with InstCombine than I am.

This revision is now accepted and ready to land.Wed, Nov 7, 9:42 PM
bjope added a comment.Wed, Nov 7, 11:58 PM

I'm no expert on how things are divided between InstructionSimplify and InstCombine, but shouldn't the simple folds be in InstructionSimplify?
For the record, we were planning to upstream something like this in simplifyBinaryIntrinsic in InstructionSimplify.cpp:

case Intrinsic::sadd_sat:
  // X + 0 -> X
  if (match(Op1, m_Zero()))
    return Op0;

  // 0 + X -> X
  if (match(Op0, m_Zero()))
    return Op1;

  // X + undef -> undef
  // undef + X -> undef
  if (match(Op1, m_Undef()) || match(Op0, m_Undef()))
    return UndefValue::get(ReturnType);

  break;
case Intrinsic::ssub_sat:
  // X - 0 -> X
  if (match(Op1, m_Zero()))
    return Op0;

  // X - undef -> undef
  // undef - X -> undef
  if (match(Op1, m_Undef()) || match(Op0, m_Undef()))
    return UndefValue::get(ReturnType);

  break;

The above also fold expressions involving undef.

nit: In our dowstream repo we currently have implemented folds for sadd_sat and ssub_sat. So even if doing add/sub in different patches, it would be really nice to have a patch for ssub_sat and usub_sat ready and land both patches at the same time. That way we can replace things in one merge, without the need to support the old folds of ssub_sat and the new folds for sadd_sat during a transition period.

lib/Support/APInt.cpp
1953 ↗(On Diff #173074)

Coding style: skip all braces here.

1966 ↗(On Diff #173074)

Coding style: skip all braces here.

lib/Transforms/InstCombine/InstCombineCalls.cpp
2027

full stop/period missing

2045

full stop/period missing

2077

full stop/period missing

2081

full stop/period missing

RKSimon requested changes to this revision.Thu, Nov 8, 1:20 AM
RKSimon added a reviewer: RKSimon.
RKSimon added a subscriber: RKSimon.

Please can you add APInt unit tests

This revision now requires changes to proceed.Thu, Nov 8, 1:20 AM
nikic added a comment.Thu, Nov 8, 1:39 AM

Regarding

// X + undef -> undef
// undef + X -> undef
if (match(Op1, m_Undef()) || match(Op0, m_Undef()))
  return UndefValue::get(ReturnType);

I was initially planning to include these simplifications, but ultimately was not certain regarding their legality. In particular, if we have uadd.sat(MaxValue, Y), then the result is fully determined to be MaxValue, regardless of the value of Y. If we have something like sadd.sat(SignedMinValue, Y) then the result is known to be negative. In either case the intrinsic cannot have the full range of results of the result type, regardless of the value of Y. As such, I think folding operations on undef to undef would not be legal in this case.

It should be possible to fold uadd.sat(X, undef) to MaxValue. Not sure how useful that is though.

bjope added a comment.Thu, Nov 8, 1:59 AM

Regarding

// X + undef -> undef
// undef + X -> undef
if (match(Op1, m_Undef()) || match(Op0, m_Undef()))
  return UndefValue::get(ReturnType);

I was initially planning to include these simplifications, but ultimately was not certain regarding their legality. In particular, if we have uadd.sat(MaxValue, Y), then the result is fully determined to be MaxValue, regardless of the value of Y. If we have something like sadd.sat(SignedMinValue, Y) then the result is known to be negative. In either case the intrinsic cannot have the full range of results of the result type, regardless of the value of Y. As such, I think folding operations on undef to undef would not be legal in this case.

It should be possible to fold uadd.sat(X, undef) to MaxValue. Not sure how useful that is though.

Ok, I did not really think about it that way, but you definitely got a point there.
But for example, sadd.sat(undef, undef) can still be folded to undef, right?
Anyway, I'm not sure how important folds involving undef are here at all. I just noticed that undef was handled for sadd_with_overflow etc, and figured we needed it for "completeness".

nikic updated this revision to Diff 173134.Thu, Nov 8, 2:26 AM

Address style comments, move trivial transforms to InstructionSimplify, add APInt unit test, move code for overflow information based checks outside the condition for a constant operand (duh).

nikic marked 5 inline comments as done.Thu, Nov 8, 2:30 AM
nikic added inline comments.
lib/Support/APInt.cpp
1953 ↗(On Diff #173074)

I've left the braces on the overflow branch, otherwise there's a dangling else compiler warning.

bjope added inline comments.Thu, Nov 8, 2:32 AM
lib/Support/APInt.cpp
1953 ↗(On Diff #173074)

Yes, this is ok (not sure why I wrote "skip all", that was confusing).

rkruppe added a subscriber: rkruppe.Thu, Nov 8, 3:28 AM
RKSimon added inline comments.Thu, Nov 8, 5:42 AM
lib/Support/APInt.cpp
1953 ↗(On Diff #173074)

It might be cleaner as:

if (!Overflow)
  return Res;

return isNegative() ? APInt::getSignedMinValue(Res.getBitWidth())
                    : APInt::getSignedMaxValue(Res.getBitWidth());
craig.topper added inline comments.Thu, Nov 8, 10:27 AM
lib/Transforms/InstCombine/InstCombineCalls.cpp
2065

saturing->saturating

nikic updated this revision to Diff 173207.Thu, Nov 8, 11:59 AM

Improve control-flow in APInt implementation, fix typo in comment.

nikic marked 5 inline comments as done.Thu, Nov 8, 2:05 PM

I've added an implementation for saturating subs in D54274. Please tell me if I should combine the changes into this one instead.

nlopes added a subscriber: nlopes.Thu, Nov 8, 2:41 PM

Regarding

// X + undef -> undef
// undef + X -> undef
if (match(Op1, m_Undef()) || match(Op0, m_Undef()))
  return UndefValue::get(ReturnType);

I was initially planning to include these simplifications, but ultimately was not certain regarding their legality. In particular, if we have uadd.sat(MaxValue, Y), then the result is fully determined to be MaxValue, regardless of the value of Y. If we have something like sadd.sat(SignedMinValue, Y) then the result is known to be negative. In either case the intrinsic cannot have the full range of results of the result type, regardless of the value of Y. As such, I think folding operations on undef to undef would not be legal in this case.

It should be possible to fold uadd.sat(X, undef) to MaxValue. Not sure how useful that is though.

You can also assume that undef is 0 and fold X + undef -> X.

nikic updated this revision to Diff 173289.Fri, Nov 9, 2:33 AM
nikic edited the summary of this revision. (Show Details)

Add simplify for X + undef -> X.

Please can you pull out the APInt changes + unit tests from this and D54274 into their own patch - I'll give that a quick check but I think that part is ready. I'll leave the IR changes to everyone else.

bjope added a comment.Fri, Nov 9, 10:49 AM

Regarding

// X + undef -> undef
// undef + X -> undef
if (match(Op1, m_Undef()) || match(Op0, m_Undef()))
  return UndefValue::get(ReturnType);

I was initially planning to include these simplifications, but ultimately was not certain regarding their legality. In particular, if we have uadd.sat(MaxValue, Y), then the result is fully determined to be MaxValue, regardless of the value of Y. If we have something like sadd.sat(SignedMinValue, Y) then the result is known to be negative. In either case the intrinsic cannot have the full range of results of the result type, regardless of the value of Y. As such, I think folding operations on undef to undef would not be legal in this case.

It should be possible to fold uadd.sat(X, undef) to MaxValue. Not sure how useful that is though.

You can also assume that undef is 0 and fold X + undef -> X.

Is it perhaps "better" to fold sadd_sat(X, undef) -> 0 And uadd_sat(X, undef) -> MaxValue` if we want to get rid of undef here? That way we get rid of the X operand as well.

nikic added a comment.Sun, Nov 11, 2:20 AM

Is it perhaps "better" to fold sadd_sat(X, undef) -> 0 And uadd_sat(X, undef) -> MaxValue if we want to get rid of undef here? That way we get rid of the X operand as well.

Folding sadd_sat(X, undef) -> 0 would not be valid, because if X = SignedMinValue, there is no choice of undef for which the result would be zero. The largest value that can be reached is -1.

I think this is a list of the possibilities that we have:

For signed add I think there is only one choice:

sadd_sat(X, undef) -> X   // for undef = 0
sadd_sat(undef, X) -> X   // for undef = 0

For signed sub we have two variants:

ssub_sat(X, undef) -> X   // for undef = 0
ssub_sat(undef, X)        // don't simplify
// or
ssub_sat(X, undef) -> 0   // for undef = X
ssub_sat(undef, X) -> 0   // for undef = X

For unsigned add we also have two:

uadd_sat(X, undef) -> X   // for undef = 0
uadd_sat(undef, X) -> X   // for undef = 0
// or
uadd_sat(X, undef) -> MAX // for undef = MAX
uadd_sat(undef, X) -> MAX // for undef = MAX

For unsigned sub also two:

usub_sat(X, undef) -> X   // for undef = 0
usub_sat(undef, X)        // don't simplify
// or
usub_sat(X, undef) -> 0   // for undef = X
usub_sat(undef, X) -> 0   // for undef = X

Given these possibilities, I would suggest to use the following folds. They don't result in the maximal number of constant results, but they have consistent results/assumptions for the signed&unsigned cases:

sadd_sat(X, undef) -> X   // for undef = 0
sadd_sat(undef, X) -> X   // for undef = 0
uadd_sat(X, undef) -> X   // for undef = 0
uadd_sat(undef, X) -> X   // for undef = 0
ssub_sat(X, undef) -> 0   // for undef = X
ssub_sat(undef, X) -> 0   // for undef = X
usub_sat(X, undef) -> 0   // for undef = X
usub_sat(undef, X) -> 0   // for undef = X
nikic updated this revision to Diff 173549.Sun, Nov 11, 2:31 AM

Rebase on separate APInt patch.

bjope added a comment.Sun, Nov 11, 3:02 AM

Is it perhaps "better" to fold sadd_sat(X, undef) -> 0 And uadd_sat(X, undef) -> MaxValue if we want to get rid of undef here? That way we get rid of the X operand as well.

Folding sadd_sat(X, undef) -> 0 would not be valid, because if X = SignedMinValue, there is no choice of undef for which the result would be zero. The largest value that can be reached is -1.

Ok. Got it! Sorry for the confusion.

Btw, thanks for adding these new folds of saturating add/sub.

Please can you add vector constant folding tests + support?

bjope added a comment.Wed, Nov 14, 6:16 AM

Please can you add vector constant folding tests + support?

Isn't it better to land this one and https://reviews.llvm.org/D54274 first since they basically are ready?
I think they are ok and reviewed (I think the only thing blocking was the @RKSimon request to extract APInt stuff and that is done).

If we need to support vectors also at once, then I think you should merge the add/sub patches into one big pile (otherwise it just complicates review and patch handling for @nikic since the "subtraction" patch basically rewrites lots of the things added in the "addition" patch.

RKSimon accepted this revision.Wed, Nov 14, 6:50 AM

LGTM - happy for vector support to be added as a follow up patch

This revision is now accepted and ready to land.Wed, Nov 14, 6:50 AM

I'm no expert on how things are divided between InstructionSimplify and InstCombine, but shouldn't the simple folds be in InstructionSimplify?

The rule is that if you don't create any new variable values (ie, you are returning an existing value or constant), then it should go in InstSimplify. Note that InstSimplify is used as an analysis by other passes like GVN, so putting transforms in there improves things beyond InstCombine.

I see that this patch already has 2 approvals, so I won't block it, but we really should have the regression tests in the corresponding test directory (and using the appropriate pass in the RUN line). So the constant folding hunk of this patch should really be its own commit with tests in test/Analysis/ConstantFolding/, and the InstSimplify hunk of this patch should be its own commit with tests in test/Transforms/InstSimplify. Please make those changes as a follow-up if this is committed initially as a single patch.

nikic added a comment.Wed, Nov 14, 8:57 AM

I've now split this up in separate patches, separated the tests for constant folding, instruction simplify and instcombine and added tests for vector operations. The patches are now (with the add/sub cases combined):

I did not change any of the code, only the tests. Vectors to the most part work out of the box. The only thing that does not work with vectors are sat(sat(X + C1) + C2) -> sat(X + (C1 + C2)) simplifications where C1 and C2 are not constant splats. It would be possible to support this, but I'm not sure it's worth the extra complexity. From looking at the surrounding codes it seems like nothing else tries to handle the non-splat case either.