This is an update of:
http://reviews.llvm.org/rL262424
That was reverted because it caused failures in an end-to-end clang AArch64 test (clang/test/CodeGen/aarch64-neon-misc.c).
That failure no longer happens after:
http://reviews.llvm.org/rL262623
I.e., the AArch64 backend now recognizes the pattern produced by this patch and generates the expected code.
There is an end-to-end improvement for AArch64 from this patch that is not currently tested anywhere AFAIK. That is noted in PR26819:
https://llvm.org/bugs/show_bug.cgi?id=26819
This source:
// Are elements of 'a' <= -1? Ie, is 'a' negative?
int32x2_t test_vcle_s32(int32x2_t a) {
  return vcle_s32(a, vceq_s32(a, a));
}
Becomes:
test_vcle_s32:
  sshr v0.2s, v0.2s, #31
  ret
Instead of the current:
test_vcle_s32:
  movi d1, #0xffffffffffffffff
  cmge v0.2s, v1.2s, v0.2s
  ret
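For context, here is a minimal scalar C sketch (not part of the patch; the helper names are mine) of why the two sequences are equivalent: vceq_s32(a, a) produces all-ones (-1) in each lane, so the vcle asks "is a <= -1", i.e. "is the sign bit set", and that lane mask is exactly an arithmetic right shift by 31. The sketch assumes '>>' on a negative int32_t is an arithmetic shift, which is implementation-defined in C but holds for the compilers/targets discussed here.

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Scalar model: (a <= -1) as a -1/0 mask equals a >> 31 (arithmetic). */
static int32_t cle_minus_one(int32_t a) { return (a <= -1) ? -1 : 0; }
static int32_t sign_bit_mask(int32_t a) { return a >> 31; }

int main(void) {
  const int32_t tests[] = { INT32_MIN, -2, -1, 0, 1, INT32_MAX };
  for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; ++i)
    assert(cle_minus_one(tests[i]) == sign_bit_mask(tests[i]));
  puts("cle(a, -1) == a >> 31 for all sampled values");
  return 0;
}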
Given the current controversy about end-to-end testing, I have not included a test for that in this patch. But it seems to me that we should have that kind of test *somewhere* to make sure that IR optimizations are handled correctly by a backend. In other words, if the existing end-to-end AArch64 clang test didn't exist, I wouldn't have known that this patch pessimized AArch64. It's entirely possible that some other backend will still be pessimized by this change because it has no equivalent end-to-end test. The difference is small enough that it is unlikely to show up as a perf regression anywhere, but it would undoubtedly be a regression.
I usually see canonicalization as something that is target-independent; after optimization, the lowering can adjust toward something that the target will "like" (ultimately, the SDAG legalizer is doing this). Any comments?