This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701, PR26819)
ClosedPublic

Authored by spatel on Mar 3 2016, 10:49 AM.

Details

Summary

This is an update of:
http://reviews.llvm.org/rL262424

That was reverted because it caused failures in an end-to-end clang AArch64 test ( clang/test/CodeGen/aarch64-neon-misc.c ).
That failure no longer happens after:
http://reviews.llvm.org/rL262623
Ie, the AArch64 backend recognizes the pattern produced by this patch and generates the expected code.

There is an end-to-end improvement for AArch64 from this patch that is not currently tested anywhere AFAIK. That is noted in PR26819:
https://llvm.org/bugs/show_bug.cgi?id=26819

This source:

// Are elements of 'a' <= -1? Ie, is 'a' negative?
int32x2_t test_vcle_s32(int32x2_t a) {
  return vcle_s32(a, vceq_s32(a, a) );
}

Becomes:

test_vcle_s32:
  sshr	v0.2s, v0.2s, #31
  ret

Instead of the current:

test_vcle_s32:
  movi	d1, #0xffffffffffffffff
  cmge	v0.2s, v1.2s, v0.2s
  ret

Given the current controversy about end-to-end testing, I have not included a test for that in this patch. But it seems to me that we should have that kind of test *somewhere* to make sure that IR optimizations are correctly handled by a backend. In other words, if that current end-to-end AArch clang test didn't exist, I wouldn't have known that this patch pessimized AArch64. It's entirely possible that some other backend will still be pessimized by this change because it has no equivalent end-to-end test. The difference is sufficiently small that it is unlikely to show up as a perf regression anywhere, but it undoubtedly would be a regression.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 49754.Mar 3 2016, 10:49 AM
spatel retitled this revision from to [InstCombine] convert 'isPositive' and 'isNegative' vector comparisons to shifts (PR26701, PR26819).
spatel updated this object.
spatel added a subscriber: llvm-commits.

Ping.
I hope my FUD-inducing summary didn't shoot this patch in the foot. :)

mehdi_amini added inline comments.Mar 24 2016, 4:26 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

I usually see canonicalization as something that is target independent, and then after optimization the lowering can adjust toward something that the target will "like" (ultimately the SDAG legalizer is doing this). Any comment?

spatel added inline comments.Mar 24 2016, 5:21 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

That sounds right to me in general. But are you suggesting that we should do the comparison canonicalizations for vector types too? Ie, (x <=s -1) --> (x <s 0).

My code comment here is my assumption about why we *don't* do that - vector ISAs may have limited functionality, so some backends (x86 at least) would have to then undo that canonicalization.

I see 2 reasons to justify this patch as written:

  1. The shift transform is an actual IR optimization, not just a canonicalization - it reduces the instruction count and/or complexity of the IR.
  2. We're already doing this (shift) transform for some comparisons, so we might as well include the other, related comparison operators where it can apply.
mehdi_amini edited edge metadata.Apr 7 2016, 5:24 PM

Someone should provide some feedback on the vector canonicalization question (Chandler? David?)

In D17859#395086, @joker.eph wrote:

Someone should provide some feedback on the vector canonicalization question (Chandler? David?)

Adding Chandler to reviewers.

Ping * 6.

For reference, some other questions about where we draw the 'canonical IR' line:
http://lists.llvm.org/pipermail/llvm-dev/2016-April/098154.html
https://llvm.org/bugs/show_bug.cgi?id=27343
https://llvm.org/bugs/show_bug.cgi?id=24766#c2

We try to maintain our target-independent idealism in IR, particularly in InstCombine, but we do make concessions for practical target constraints.

For this (really minor!) case, I think the potentially limited range of vector functionality is reason not to try hard to canonicalize vector comparisons in the same way as integer comparisons.

hfinkel added inline comments.Apr 26 2016, 1:54 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

That sounds right to me in general. But are you suggesting that we should do the comparison canonicalizations for vector types too? Ie, (x <=s -1) --> (x <s 0).

Yes. Having different canonicalization rules for vectors and scalars is confusing, and leads to problems like this. Keeping track of what we do, in general, is hard. Keeping track of different rules for scalars and vectors should be avoided if possible.

My code comment here is my assumption about why we *don't* do that - vector ISAs may have limited functionality, so some backends (x86 at least) would have to then undo that canonicalization.

Often times these things are just oversights, or were done a long time ago when the backend infrastructure was not as mature. I don't know how we ended up in this state on this particular issue, but given that matching this in the backend seems fairly easy (in either form), we should do that.

chandlerc added inline comments.Apr 27 2016, 12:45 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

+1 to what Hal said.

We should canonicalize vectors and scalars aggressively and solve problems that arise in the backend.

We shouldn't add more code to match non-canonical patterns in instcombine, we should fix the canonicalization.

spatel added inline comments.Apr 29 2016, 1:47 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

Some day I'll get canonicalization right. :)

I've drafted a patch to do as suggested here, but now I'm wondering if there's another reason not to treat vectors and scalars the same for the purposes of icmp with constant: there's a fundamental difference between vector and scalar icmp ops.

Example:
icmp uge i32 %x, 0 --> true
icmp uge <2 x i32> %x, <i32 0, i32 42> --> ?

We want to subtract 1 from the constant operand and canonicalize to 'ugt' here. We don't have to worry about the first case in InstCombiner::visitICmpInst() because InstSimplify takes care of those kinds of edge cases. That doesn't exist for vectors (although it probably should for a splatted constant vector).

Is it still worth doing this canonicalization if it doesn't work in general?

spatel updated this revision to Diff 57102.May 12 2016, 2:44 PM
spatel edited edge metadata.

Patch updated:
I didn't see a reply to my latest question: is it really a canonicalization if it only works some of the time?
But here's a semi-canonicalization patch for comparison to the earlier draft.

chandlerc added inline comments.May 12 2016, 3:18 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

This is a great question.

I'm not sure. I can see at least three options:

  1. Canonicalize to ugt when we can, but not always, much as your code does here. Teach passes that assume we canonicalized on ugt to consider the uge case for vectors.
  1. Drop all canonicalizing to ugt, and audit the code that matches on ugt to also match on uge.
  1. Canonicalize the vector case to ugt as well by replacing the boundary elements with undef, and then inserting a 'true' over the resulting i1 lane coming out of the icmp.

While #1 is clearly the simplest and safest approach, I'm not sure what the right long term solution is here. #3 seems really appealing if it works, but I'm worried that it the insertion of the true will end up causing problems downstream. Might be worth some experimentation though.

spatel added inline comments.May 12 2016, 3:46 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

For #3, we would have to add insertelements and/or shufflevectors to the IR, right? Here we reach a strange place where bigger IR is "more canonical" than smaller IR?

I'm still holding out hope for option #2 (the earlier draft of this patch) because:

  1. It's just ~2 lines of code to catch a very specific optimization (turning a compare into a shift).
  2. The odds that the canonicalization is actually helpful for vector integer targets seem slim.

In this draft, I've added a lot of code, and it's really only that one tiny shift transform that I can see as a benefit. It's just not worth the effort IMO.

chandlerc added inline comments.May 12 2016, 4:04 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

For #3, yes, we'd add an instruction which clobbers a lane with a constant. The challenging part of vectors is modeling partial constants. For almost all cases I wouldn't expect it to be worth it but for i1 vectors I think it might be OK. But I'm not sure yet.

Regarding #2, I don't really understand why this is only for turning a compare into a shift. There are a *lot* of places in the code that match on ICMP_UGT only and not on ICMP_UGE... I'm also surprised actually that these are that related because the new patch only seems to touch UGE and UGT not SGE and SGT...

spatel added inline comments.May 12 2016, 4:33 PM
lib/Transforms/InstCombine/InstCombineCasts.cpp
985 ↗(On Diff #49754)

I agree that there's lots of potential ICMP transforms, but I just don't see that in actual vector code. Maybe I'm just not imagining the possibilities well enough.

Another potential problem is that we just don't have enough of the IR plumbing done so that it can trigger follow-on transforms (eg, https://llvm.org/bugs/show_bug.cgi?id=24942 ).

The new patch (doing the canonicalization when possible but ignoring the edge cases) is enabled for all of UGE/ULE/SGE/SLE. Did I miss something? I am experiencing some weirdness in Phab from replying inline to a spot that only exists in the old version of the patch. :)

chandlerc accepted this revision.May 12 2016, 7:53 PM
chandlerc edited edge metadata.

LGTM -- I think this is the best option we have for now.

I think we should add support for the ge/le icmp variants to any transforms that we end up with test cases for where we have a constant at a boundary here. This is no worse than completely removing the canonicalization, but lets us be lazy and allows the easy cases to still collapse to the same predictable structure.

lib/Transforms/InstCombine/InstCombineCompares.cpp
3145–3146 ↗(On Diff #57102)

I'd not have the comment talk about a missing instsimplify pass as we'll likely fail to remove it when adding that...

This revision is now accepted and ready to land.May 12 2016, 7:53 PM
spatel added inline comments.May 13 2016, 7:05 AM
lib/Transforms/InstCombine/InstCombineCompares.cpp
3145–3146 ↗(On Diff #57102)

Good point. I'll replace this with a blurb about the trade-offs that we discussed here. Thanks!

This revision was automatically updated to reflect the committed changes.