This is an archive of the discontinued LLVM Phabricator instance.

[X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN.
ClosedPublic

Authored by ab on Feb 11 2016, 6:08 PM.

Details

Summary

Currently, we sometimes miscompile this vector pattern:

(c ? -v : v)

We lower it to (because "c" is <4 x i1>, lowered as a vector mask):

(~c & v) | (c & -v)

When we have SSSE3, we incorrectly lower that to PSIGN, which does:

(c < 0 ? -v : c > 0 ? v : 0)

in other words, when c is either all-ones or all-zero:

(c ? -v : 0)

While this is an old bug, it rarely triggers because the PSIGN combine
is too sensitive to operand order. This will be improved separately.

Note that the PSIGN tests are also incorrect.
Consider test/CodeGen/X86/vec-sign.ll:

%b.lobit = ashr <4 x i32> %b, <i32 31, i32 31, i32 31, i32 31>
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = xor <4 x i32> %b.lobit, <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = and <4 x i32> %a, %0
%2 = and <4 x i32> %b.lobit, %sub
%cond = or <4 x i32> %1, %2
ret <4 x i32> %cond

if %b is zero:

%b.lobit = <4 x i32> zeroinitializer
%sub = sub nsw <4 x i32> zeroinitializer, %a
%0 = <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>
%1 = <4 x i32> %a
%2 = <4 x i32> zeroinitializer
%cond = or <4 x i32> %a, zeroinitializer
ret <4 x i32> %a

whereas we currently generate:

psignd %xmm1, %xmm0
retq

which returns 0, as %xmm1 is 0.

Instead of directly using c as a mask, avoid the zero case by setting
any bit (other than the sign bit). This lets PSIGN default to the
positive case, while not changing the "negative" (all ones) case.
With that, the generated sequence correctly implements:

(c ? -v : v)

Fixes PR26110.

Diff Detail

Repository
rL LLVM

Event Timeline

ab updated this revision to Diff 47760.Feb 11 2016, 6:08 PM
ab retitled this revision from to [X86] Don't turn (c?-v:v) into (c?-v:0) by blindly using PSIGN..
ab updated this object.
ab added subscribers: RKSimon, spatel, craig.topper and 3 others.
ab added inline comments.Feb 11 2016, 6:12 PM
test/CodeGen/X86/vec-sign.ll
7 ↗(On Diff #47760)

This isn't very helpful; I'll try to look into getting some CP verbose printing.

(c ? -v : v)

I think that we can improve the SSE2 (no psign) codegen and have the SSSE3 solution avoid psign completely by using a variant of:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate

From what I can tell, the SSE2 savings are one integer logic op + a move. The SSSE3 case would have 3 simple integer logic ops rather than load/or/psign. I don't think it's worth chasing. Side note: it's been 10 years since SSSE3 (Merom) came out...can we change the default x86 subtarget now from Yonah/SSE2?

lib/Target/X86/X86ISelLowering.cpp
26334 ↗(On Diff #47760)

This comment is misleading. We know we have a 'select' kind of operation, but if we don't have SSE4, then we're going to bail out because we don't actually have the x86 blendv .

26354 ↗(On Diff #47760)

getConstant() is magic. There are no code comments to tell you this, but it can do the splat for you. :)

ab updated this revision to Diff 48007.Feb 15 2016, 1:36 PM

Use logic instead of PSIGN.

ab updated this revision to Diff 48009.Feb 15 2016, 1:45 PM

Simplify (srl c, 31) to reuse intermediate result: (srl (sra c, 31), 31).

Noticed after uploading the patch, of course ;)

ab added a comment.Feb 15 2016, 1:50 PM

(c ? -v : v)

I think that we can improve the SSE2 (no psign) codegen and have the SSSE3 solution avoid psign completely by using a variant of:
https://graphics.stanford.edu/~seander/bithacks.html#ConditionalNegate

From what I can tell, the SSE2 savings are one integer logic op + a move. The SSSE3 case would have 3 simple integer logic ops rather than load/or/psign. I don't think it's worth chasing.

You're right! Here's a simple adaptation of that.

This also means that we can completely remove PSIGN, as this was the only user in the codebase. I'll do that when we agree.

Side note: it's been 10 years since SSSE3 (Merom) came out...can we change the default x86 subtarget now from Yonah/SSE2?

Heh, I would love to see that, but I guess that the hardware vendors (Sony, Apple?) changed it already, and most of the others (distros?) want to stick to the minimal meaning of "x86_64".

-Ahmed

spatel added inline comments.Feb 15 2016, 4:07 PM
lib/Target/X86/X86ISelLowering.cpp
26481–26484 ↗(On Diff #48009)

Double-check me to make sure, but we can do one better I think:
((X ^ M) + (M & 1))
((X ^ M) - (M)) <--- since we know that M is all 1s (ie, -1), change the 'add 1' to 'sub -1'

ab updated this revision to Diff 48038.Feb 15 2016, 4:29 PM

Simplify further by subtracting the mask.

ab marked 4 inline comments as done.Feb 15 2016, 4:30 PM
ab added inline comments.
lib/Target/X86/X86ISelLowering.cpp
26475–26478 ↗(On Diff #48038)

Ah yes, beautiful!

In D17181#353035, @ab wrote:

This also means that we can completely remove PSIGN, as this was the only user in the codebase. I'll do that when we agree.

Great! I didn't think it was worth the effort to optimize the codegen, but we did better than I initially thought we could. And deleting an X86-specific node is better still. psign...quite an instruction. :)

Side note: it's been 10 years since SSSE3 (Merom) came out...can we change the default x86 subtarget now from Yonah/SSE2?

Heh, I would love to see that, but I guess that the hardware vendors (Sony, Apple?) changed it already, and most of the others (distros?) want to stick to the minimal meaning of "x86_64".

My memory was off - Yonah had SSE3:
https://en.wikipedia.org/wiki/Yonah_%28microprocessor%29

In any case, it looks like we default to "Core2" as our CPU model which actually is Merom, but then we limit it to SSE2. Seems odd. A Darwin x86 target required SSE3 from the start. There might be something to chase down here.

spatel accepted this revision.Feb 15 2016, 4:52 PM
spatel added a reviewer: spatel.

LGTM.

This revision is now accepted and ready to land.Feb 15 2016, 4:52 PM
This revision was automatically updated to reflect the committed changes.
ab marked an inline comment as done.