This is an archive of the discontinued LLVM Phabricator instance.

[X86] Emit more efficient >= comparisons against 0
ClosedPublic

Authored by majnemer on Aug 18 2015, 11:29 PM.

Details

Summary

We don't do a great job with >= 0 comparisons against zero when the
result is used as an i8.

Given something like:
void f(long long LL, bool *B) {

*B = LL >= 0;

}

We used to generate:

shrq    $63, %rdi
xorb    $1, %dil
movb    %dil, (%rsi)

Now we generate:

testq   %rdi, %rdi
setns   (%rsi)

Diff Detail

Event Timeline

majnemer updated this revision to Diff 32504.Aug 18 2015, 11:29 PM
majnemer retitled this revision from to [X86] Emit more efficient >= comparisons against 0.
majnemer updated this object.
majnemer added reviewers: chandlerc, resistor.
majnemer added a subscriber: llvm-commits.
spatel added a subscriber: spatel.Aug 19 2015, 7:44 AM
mbodart added inline comments.Aug 19 2015, 9:53 AM
lib/Target/X86/X86ISelLowering.cpp
24077

It would be nice to have a brief comment here describing the general pattern.
Something like:

Attempt to transform signedness tests similar to (i8)(X >= 0), appearing in DAG form
as for example (xor i8 trunc(A i32 >> 31), 1), into setcc A SETGT -1.

24078

I would think we also want to suppress this if the xor has more than one user.

24113

This is fine. But just curious, is there a reason you chose SETGT -1 over SETGE 0?

majnemer added inline comments.Aug 19 2015, 10:10 AM
lib/Target/X86/X86ISelLowering.cpp
24077

Sure, will do!

24078

It might be a little early for me (no coffee, etc.), why is it undesirable to use a test/setns sequence if there is more than one user?

24113

We seem to normalize it to that: http://llvm.org/docs/doxygen/html/X86ISelLowering_8cpp_source.html#l03732

Using SETGE 0 works equally well but results in a test/setge sequence. The sequence is equivalent but also requires remembering that test sets OF to 0.

mbodart added inline comments.Aug 19 2015, 10:28 AM
lib/Target/X86/X86ISelLowering.cpp
24078

More than one user likley means the xor is still live.

So the transformation would just add more instructions, without removing any.

I'm a bit new to LLVM, so maybe there's some larger context I'm not seeing.
If so, please shed some light.

24113

OK, thanks for the explanation.

majnemer added inline comments.Aug 19 2015, 12:15 PM
lib/Target/X86/X86ISelLowering.cpp
24078

My understanding is that this would replace all users of the xor to instead use the setcc which would mean no additional instructions would get generated. However, I think it makes sense to check to see if the truncate and shift both have one use.

majnemer updated this revision to Diff 32595.Aug 19 2015, 12:24 PM
  • Address review comments
spatel added inline comments.Aug 19 2015, 12:51 PM
lib/Target/X86/X86ISelLowering.cpp
24077–24120

This function description is too specific now.

24089

typo: "aginst"

24107

typo: "ammount"

test/CodeGen/X86/cmp.ll
215

B is not used in this test.

215–224

Add tests for an i1 output, i16 input, and i64 input?

majnemer updated this revision to Diff 32602.Aug 19 2015, 1:08 PM
  • Address review comments
  • Address Sanjay's review comments.
mbodart added inline comments.Aug 19 2015, 1:26 PM
lib/Target/X86/X86ISelLowering.cpp
24078

Guess my coffee wasn't strong enough either ;-)
Yes, the trunc/shift usage counts were what I meant.

spatel added inline comments.Aug 19 2015, 1:37 PM
lib/Target/X86/X86ISelLowering.cpp
24112

It would be great to add the earlier review comment about the difference between SETGT -1 and SETGE 0 here in the code comment. Save someone a little wondering down the line. :)

I have no further comments.

Thanks, David!

This revision was automatically updated to reflect the committed changes.