Page MenuHomePhabricator

[InstCombine] Start canonicalizing to uadd.sat and usub.sat
ClosedPublic

Authored by nikic on Mar 2 2019, 4:52 AM.

Details

Summary

At this point our uadd.sat and usub.sat support should be better than expanded IR, so start canonicalizing to these intrinsics. I'm porting the already existing patterns, though we'll also want to convert the currently canonical patterns in addition to that.

Diff Detail

Repository
rL LLVM

Event Timeline

nikic created this revision.Mar 2 2019, 4:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2019, 4:52 AM

Now that it's a single instruction, what about one-use checks there?

nikic added a comment.Mar 2 2019, 5:17 AM

Now that it's a single instruction, what about one-use checks there?

I've left them in place to avoid redundant calculations if the intrinsic needs to be expanded in the backend. If that's not a concern I can drop them (apart from the case that inserts an extra sub).

nikic updated this revision to Diff 189075.Mar 3 2019, 1:58 AM

Use CreateBinaryIntrinsic instead of CreateCall.

spatel added a subscriber: dmgreen.Mar 3 2019, 7:28 AM

There are 2 concerns here:

  1. Is the IR-level analysis for the intrinsics as good/better than the raw IR? The description says yes, so I assume we've filled all the known analysis, value tracking, vectorization, and combining gaps.
  2. Is codegen as good/better than the raw IR? Ie, for all known targets/types, are we DAG combining/lowering the intrinsics optimally?

I want to make sure about #2 because there was some potential fallout from aggressively forming the similar usub overflow intrinsic in CGP in D57789, so we ended up adding a target hook (cc @dmgreen).

nikic added a comment.Mar 3 2019, 8:40 AM

@spatel We're not lowering optimally, e.g. not using UQADD and friends on AArch64. They're not going to be used with expanded IR either though, only via target intrinsics. I can take a look at improving that first.

spatel added a comment.Mar 3 2019, 9:08 AM

@spatel We're not lowering optimally, e.g. not using UQADD and friends on AArch64. They're not going to be used with expanded IR either though, only via target intrinsics. I can take a look at improving that first.

Sounds good. I think this is the right transform in IR, but if there's anything we can do in advance to ensure that codegen doesn't regress, that's always best.

I don't think it will be affected with this patch yet, but I'm looking at 1 more x86 or general vector codegen improvement for min/max in PR14613:
https://bugs.llvm.org/show_bug.cgi?id=14613
...and then I think we can also convert min/max patterns to uadd.sat in IR.

Hello. This one looks OK from the tests I just ran. A few things moved, but only by a small amount, and usually in the right direction.

I think the "with.overflow" intrinsics were causing more problems for us. These saturations, at least in the tests I've ran, seem to be handled OK.

test/Transforms/InstCombine/unsigned_saturated_sub.ll
131 ↗(On Diff #189075)

Do these comments need updating?

I converted umin-based patterns for uaddsat here:
rL357012
That should give us an x86 improvement in PR14613.

I'm not sure if we are holding this patch up for any other changes, but if not and if the above change doesn't cause trouble, we should push ahead with this patch.

test/Transforms/InstCombine/unsigned_saturated_sub.ll
4–7 ↗(On Diff #189075)

Remove/update this comment.

nikic updated this revision to Diff 192313.Mar 26 2019, 12:04 PM
nikic marked 2 inline comments as done.

Update comments.

I'm not sure if we are holding this patch up for any other changes, but if not and if the above change doesn't cause trouble, we should push ahead with this patch.

I originally wanted to do https://bugs.llvm.org/show_bug.cgi?id=41023 before this, but had to give up on it.

spatel accepted this revision.Mar 27 2019, 5:59 AM

LGTM

This revision is now accepted and ready to land.Mar 27 2019, 5:59 AM
This revision was automatically updated to reflect the committed changes.