This is an archive of the discontinued LLVM Phabricator instance.

[x86] don't blindly transform SETB into SBB
ClosedPublic

Authored by spatel on Mar 4 2017, 1:35 PM.

Details

Summary

I noticed unnecessary 'sbb' instructions in D30472 and while looking at 'ptest' codegen recently. This happens because we were transforming any 'setb' - even when we only wanted a single-bit result.

This patch moves those transforms under visitAdd/visitSub, so we we're only creating sbb/adc when it is a win. I don't know why we need a SETCC_CARRY node type, but I'm not proposing to change that existing behavior in this patch.

[Also, I'm skeptical that sbb/adc are a win for all micro-arches, but I'm not volunteering to perf test a single-cycle difference for adds either!]

The test changes here are all cases where we no longer produce sbb/adc, except for the fake diffs in add-of-carry.ll and peep-setb.ll. I made those changes so we'd have a record of where the moved transforms are still firing (and supposedly doing good).

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Mar 4 2017, 1:35 PM
RKSimon added inline comments.Mar 9 2017, 10:01 AM
lib/Target/X86/X86ISelLowering.cpp
34117 ↗(On Diff #90590)

ScalarSizeInBits - or is that being too pessimistic?

test/CodeGen/X86/add-of-carry.ll
3 ↗(On Diff #90590)

The changes in this file don't look related to the patch.

test/CodeGen/X86/peep-setb.ll
3 ↗(On Diff #90590)

Again - not related to the patch

test/CodeGen/X86/sse42-intrinsics-x86.ll
98 ↗(On Diff #90590)

The stack usage is unfortunate.....

spatel added inline comments.Mar 9 2017, 11:28 AM
lib/Target/X86/X86ISelLowering.cpp
34117 ↗(On Diff #90590)

Not sure how there could be a difference: materializeSBB() only works with i1 or i8 (there's an assert in there). So I don't think there's any chance of vectors here at the moment. Did I understand the comment?

test/CodeGen/X86/add-of-carry.ll
3 ↗(On Diff #90590)

I tried to explain this in the summary - maybe not well enough though. :)
I'd like to have a record of where this transform is still firing along with this patch, so we understand what patterns are actually helped by adc/sbb (in theory). I guess a test comment would do a better job than these phantom diffs. I'll change that.

test/CodeGen/X86/sse42-intrinsics-x86.ll
98 ↗(On Diff #90590)

Agreed - not sure where that goes wrong yet.

spatel updated this revision to Diff 91203.Mar 9 2017, 11:52 AM

Patch updated (no code changes, but...):

  1. Changed phantom test diffs to code comments to better track the cases where this transform still fires.
  2. Rebased - we got more diffs from rL297249 and less from rL297404.
RKSimon accepted this revision.Mar 11 2017, 1:27 PM

LGTM - but there is still a lot to do to improve partial register stalls.....

test/CodeGen/X86/sse42-intrinsics-x86.ll
98 ↗(On Diff #90590)

Seems to be due to pcmpestri being hardwired to use eax/ecx and that xor ebx, ebx needs to be set before pcmpestri to extract the flags correctly. Its beyond the scope of this, but it could be done as:

movl $7, %eax
movl $7, %edx
pcmpestri $7, %xmm1, %xmm0
movl $0, %eax
setb %al
This revision is now accepted and ready to land.Mar 11 2017, 1:27 PM
This revision was automatically updated to reflect the committed changes.
xbolva00 added inline comments.
llvm/trunk/test/CodeGen/X86/setcc.ll
5

The comment needs to removed.

Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2021, 8:54 AM
Herald added a subscriber: pengfei. · View Herald Transcript
spatel marked an inline comment as done.Apr 22 2021, 9:12 AM
spatel added inline comments.
llvm/trunk/test/CodeGen/X86/setcc.ll
5

Thanks - 11232037cc4a