Page MenuHomePhabricator

[InstCombine] Add support for saturating add/sub
ClosedPublic

Authored by nikic on Nov 14 2018, 8:53 AM.

Details

Summary

This adds support for saturating add/sub intrinsics to InstCombine. In particular the following folds are supported:

  • sat(X1 + X2) -> add nuw/nsw where known no overflow
  • sat(X1 - X2) -> sub nuw/nsw where known no overflow
  • sat(X1 uadd X2) -> MAX where known overflow
  • sat(X1 usub X2) -> 0 where known overflow
  • sat(X ssub C) -> sat(X sadd C) where C != MIN
  • sat(sat(X + C1) + C2) -> sat(X + (C1 + C2)) where legal
  • sat(sat(X - C1) - C2) -> sat(X + (C1 + C2)) where legal

To properly support this, I've also changed the computeOverflowForUnsignedSub() implementation to return AlwaysOverflows results, for parity with the computeOverflowForUnsignedAdd() implementation.

A possible future improvement for the vector case is to support sat(sat(X + C1) + C2) -> sat(X + (C1 + C2)) style foldings also in the case where C1 and C2 are not constant splats.

Diff Detail

Event Timeline

nikic created this revision.Nov 14 2018, 8:53 AM
nikic updated this revision to Diff 175165.Nov 24 2018, 12:18 PM

Fix formatting, split up test cases, present tests as diff over baseline.

spatel added inline comments.Nov 26 2018, 2:14 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
2061–2068

This was already duplicated 3x, so I added a helper:
rL347604

I might have missed it, but I don't think the tests exercise this possibility? Also, the baseline tests are not checked into trunk yet?

2085–2087

Should we add this (and the later usub_sat fold to zero) to InstSimplify? That would make all of these cases more symmetric:
if (no_ov) { create regular math op }

At that point, it might not be worth using this switch-within-a-switch. Just handle each intrinsic in its own case statement.

spatel added inline comments.Nov 26 2018, 2:23 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
2046–2054

Can we reuse this code for the new intrinsics?

nikic marked 4 inline comments as done.Nov 26 2018, 2:42 PM
nikic added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
2046–2054

This is generally possible. The reason why I did not do this is that for the always overflow case, we'd end up inserting an unused instruction (https://github.com/llvm-mirror/llvm/blob/7c7c0a2cd982d34de3e49dc0ef7a459e1262f306/lib/Transforms/InstCombine/InstCombineCompares.cpp#L3805).

If creating unnecessary instructions as an intermediate state is fine, I can switch this to use OptimizeOverflowCheck.

2061–2068

You're right, a test case for this is missing. I didn't commit the baseline tests yet, in case there are comments.

2085–2087

Adding this to InstSimplify would mean that we have to run the overflow calculation (which requires known bits) twice, once in InstSimplify for the always overflow case and one in InstCombine for the never overflow case. Not sure if that's worthwhile?

2109

One case this code currently does not handle is something like ssub.sat(ssub.add(%foo, -10, 10), i.e. mixes of signed sub and add where the sign ends up being the same if we account for the add/sub.

I think this case could be elegantly handled by canonicalizing ssub.sat(%foo, C) to ssub.add(%foo, -C) (assuming C != MIN), similar to the canonicalization that also happens for normal subs. What do you think about this?

spatel added inline comments.Nov 26 2018, 3:19 PM
lib/Transforms/InstCombine/InstCombineCalls.cpp
2046–2054

No, let's not do that then. Creating a dead instruction could mean that instcombine has to iterate the entire function again just to remove it.

2061–2068

Feel free to add/commit tests as you'd like with NFC patches. We can always adjust them if they don't provide coverage for the code that gets added later.

2085–2087

Good point. There's no clear answer for where we draw the line. Computing known bits is known to be expensive, but adding to instsimplify potentially gets the code optimized faster (and improves other passes).

It comes down to how often we expect to see these intrinsics in real code vs. how often we would expect to succeed at these transforms. My guess is those answers are "rare" and "rarer", but I have no data.

Let's just keep this as-is then.

2109

Yes, if we can convert ssub to sadd, let's do that. The more these are handled like the regular math ops, the better.

nikic updated this revision to Diff 175558.Nov 27 2018, 12:53 PM
nikic edited the summary of this revision. (Show Details)

Use operand swap helper, canonicalize ssub to sadd if possible, add tests for operand swapping and canonicalization, rebase over committed baseline tests.

spatel accepted this revision.Nov 27 2018, 2:45 PM

I think everything's correct here, so LGTM. But this is really several independent changes in 1 patch. Please split this up when committing, so we can reduce the risk of bugs.
It could be split like this or some other sequence if you see a better way:

  1. Canonicalize constants to Op1
  2. NeverOverflows / AlwaysOverflows
  3. ssub.sat(X, C) -> sadd.sat(X, -C)
  4. sat(sat(X + Val2) + Val) -> sat(X + (Val+Val2))
  5. sat(sat(X - Val2) - Val) -> sat(X - (Val+Val2))
This revision is now accepted and ready to land.Nov 27 2018, 2:45 PM
nikic marked 3 inline comments as done.Nov 28 2018, 8:44 AM

Split up and committed as rL347769, rL347770, rL347771, rL347772, rL347773.

Thanks for the reviews!

nikic closed this revision.Nov 28 2018, 8:44 AM