This is an archive of the discontinued LLVM Phabricator instance.

[x86] add SBB optimization for SETAE (uge) condition code
ClosedPublic

Authored by spatel on Jun 26 2017, 4:07 PM.

Details

Summary

x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess, with missing optimizations: we handle some patterns but miss logically equivalent variants.

To clean that up, we should convert all select-of-constants to logic/math and enhance the combining for the expected patterns from that. DAGCombiner already has the foundation to allow the transforms, so we just need to fill in the holes for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the optimal code as shown here.
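As a rough illustration (source code not taken from the patch), the kind of select-of-constants this change targets is an unsigned-greater-or-equal (SETAE/uge) compare feeding a select of 0 and -1, which can lower to a cmp followed by an sbb of a register with itself instead of a setcc plus sign-extension:

```c
#include <stdint.h>

/* Illustrative example of the pattern under discussion: a select of the
   constants -1 and 0 on a uge compare. With the SBB optimization, the
   backend can lower this to "cmp a, b ; sbb r, r"-style code rather than
   materializing the condition with setcc and sign-extending it. */
int32_t select_uge(uint32_t a, uint32_t b) {
    return (a >= b) ? -1 : 0;
}
```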

Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs

Earlier steps in this series:
rL306040
rL306072

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision. Jun 26 2017, 4:07 PM
zvi added inline comments. Jun 26 2017, 11:44 PM
test/CodeGen/X86/sbb.ll
Line 149 (On Diff #104035)

AFAIK, the processor does avoid false-dependency stalls for the case 'sbb EAX, EAX' (unlike xor and sub).
So we may want to ensure an 'xor EAX, EAX' is placed before it.

zvi added inline comments. Jun 26 2017, 11:47 PM
test/CodeGen/X86/sbb.ll
Line 149 (On Diff #104035)

Typo fix: the processor does *not* avoid false-dependency stalls for the case 'sbb EAX, EAX'.

spatel added inline comments. Jun 27 2017, 6:43 AM
test/CodeGen/X86/sbb.ll
Line 149 (On Diff #104035)

Yes, I think that might be a concern. I took a quick look at this, but it's not a small fix from what I can tell:

  1. We can't put the xor in the tablegen pat/def because we can't interfere with the EFLAGS result coming from the cmp (the xor must occur before the cmp).
  2. We can't use the existing hasUndefRegUpdate() as-is because it's specifically checking operand 1 (and I'm not sure if it would account for #1 either).

The xor for the setcc code is produced by the x86-fixup-setcc pass. We'd probably need to graft into that or create another similar pass?

spatel added a comment. Jul 3 2017, 9:09 AM

Ping.

According to Agner's latest docs:
http://www.agner.org/optimize/microarchitecture.pdf

All recent AMD microarchitectures *do* recognize sbb with the same register for both operands as a dependency-breaking idiom. See Chapters 18/19/20:
"The following instructions are recognized as independent of the input when both operands are the same register:
XOR, SUB, SBB (depends on carry flag only)..."
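The reason "sbb r, r" can be treated as independent of r follows from its arithmetic: r - r - CF is always -CF, i.e. 0 or -1, regardless of r's prior value. A minimal C model of this identity (names here are illustrative, not from the patch or from LLVM):

```c
#include <stdint.h>

/* Models "sbb r, r": the result is r - r - CF = -CF, so it is 0 or -1
   and depends only on the carry flag, never on the old register value.
   This is why a uarch can safely treat it as a dependency-breaking idiom. */
int32_t sbb_same_reg(uint32_t r, int carry_flag) {
    return (int32_t)(r - r - (uint32_t)carry_flag); /* always 0 or -1 */
}
```

Because the result is the same for any r given the same carry, a CPU that recognizes the idiom need not wait for the producer of r, which is exactly the property Agner's chapters describe.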

That strengthens the case for having a later pass insert the xor based on CPU model. Adding it as part of isel would just be code bloat on AMD targets.

zvi edited edge metadata. Jul 3 2017, 10:07 AM

I would like to run some internal benchmarks on this patch. Are you ok with waiting a couple more days?

In D34652#798420, @zvi wrote:

I would like to run some internal benchmarks on this patch. Are you ok with waiting a couple more days?

Sure - no problem. I haven't been able to see anything above noise level in my testing. I'll start looking at the later fixup passes to see if I can account for this case with the existing machinery.

zvi accepted this revision. Jul 4 2017, 7:01 AM

LGTM. Internal testing did not reveal any performance impact for better or worse. Thanks.

This revision is now accepted and ready to land. Jul 4 2017, 7:01 AM
spatel added a comment. Jul 4 2017, 8:40 AM
In D34652#798894, @zvi wrote:

LGTM. Internal testing did not reveal any performance impact for better or worse. Thanks.

Thanks for checking! FWIW, this patch increases the likelihood of producing the sbb. However, if a 'naked sbb' is a problem, that pattern already exists anywhere we create X86ISD::SETCC_CARRY directly or via materializeSBB(). So we should add the fixup pass independently of this patch, assuming the false-dependency stall is a real problem.

This revision was automatically updated to reflect the committed changes.