This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] sub X, sext(bool Y) -> add X, zext(bool Y)
ClosedPublic

Authored by spatel on Sep 30 2016, 3:03 PM.

Details

Summary

I noticed the base case in D24527, and that is hopefully simple enough. Please let me know if I've thought about the nsw/nuw cases correctly.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 73151.Sep 30 2016, 3:03 PM
spatel retitled this revision from to [InstCombine] sub X, sext(bool Y) -> add X, zext(bool Y).
spatel updated this object.
spatel added reviewers: sanjoy, efriedma, majnemer.
spatel added a subscriber: llvm-commits.
spatel updated this revision to Diff 73964.Oct 7 2016, 12:15 PM

Ping.

Minor update:
Not sure if the use of 'auto' was welcome here, so I changed it to 'BinaryOperator *Add'.

efriedma added inline comments.Oct 13 2016, 12:01 PM
lib/Transforms/InstCombine/InstCombineAddSub.cpp
1648 ↗(On Diff #73964)

I would favor dropping the nuw flag over keeping IR in a form that isn't canonical.

test/Transforms/InstCombine/urem.ll
32 ↗(On Diff #73964)

For example, this seems better than "sub nuw i5 %x, %tmp".

spatel updated this revision to Diff 74579.Oct 13 2016, 1:59 PM

Patch updated as suggested by Eli:
Favor the simpler zext+add sequence over keeping an 'nuw' flag.

efriedma accepted this revision.Oct 13 2016, 2:06 PM
efriedma edited edge metadata.

Please make sure we have a test to check that we can correctly reverse the transform in SelectionDAG when we need to (for example, make sure we don't genereate an unnecessary AND for a vector compare on x86).

Otherwise LGTM.

This revision is now accepted and ready to land.Oct 13 2016, 2:06 PM

Please make sure we have a test to check that we can correctly reverse the transform in SelectionDAG when we need to (for example, make sure we don't genereate an unnecessary AND for a vector compare on x86).

How did you see that one coming? :)
Indeed, x86 doesn't deal with this and gets worse. This might be the same as the fixes needed before D25485 can proceed...

define <4 x i32> @foo(<4 x i32> %x, <4 x i32> %y, <4 x i32> %z) {
  %cmp = icmp eq <4 x i32> %x, %y
  %sext = sext <4 x i1> %cmp to <4 x i32>
  %sub = sub <4 x i32> %z, %sext
  ret <4 x i32> %sub
}

We have this:

vpcmpeqd	%xmm1, %xmm0, %xmm0
vpsubd	%xmm0, %xmm2, %xmm0

But with this patch, we get this:

vpcmpeqd	%xmm1, %xmm0, %xmm0
vpsrld	$31, %xmm0, %xmm0
vpaddd	%xmm2, %xmm0, %xmm0
spatel added a subscriber: RKSimon.Oct 13 2016, 3:50 PM

Oh wow, I hadn't updated in a day. @RKSimon just fixed this with D25374 / rL284015. The last test case in that patch is nearly what I posted here. :)

This revision was automatically updated to reflect the committed changes.