This is an archive of the discontinued LLVM Phabricator instance.

[x86] avoid false dependency stall on 'sbb' with same source reg
ClosedPublic

Authored by spatel on Feb 2 2022, 1:08 PM.

Details

Summary

This is effectively inverting the transform added with D116804 because the downside of the false dependency of something like "sbb %eax, %eax" is much greater than the upside of eliminating a zeroing instruction on (all?) Intel CPUs.

Diff Detail

Event Timeline

spatel created this revision.Feb 2 2022, 1:08 PM
spatel requested review of this revision.Feb 2 2022, 1:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2022, 1:08 PM
spatel added inline comments.Feb 2 2022, 1:10 PM
llvm/test/CodeGen/X86/sbb-false-dep.ll
30

I'm not sure why this is a mov $0, but it is the same as with D116804 reverted.

Is the getSBBZero function you're modifying not in tree yet?

spatel added a comment.Feb 2 2022, 2:00 PM

Is the getSBBZero function you're modifying not in tree yet?

Sorry - yes, I made that a local preliminary/NFC patch. I just pushed it up to main:
f523e83b204ea3e4ab80df6b

I think I prefer the opposite tuning flag polarity. It's not truely a false dep. It's more of a missed special case. The semantics of SBB imply the sources are read.

I think I prefer the opposite tuning flag polarity. It's not truely a false dep. It's more of a missed special case. The semantics of SBB imply the sources are read.

+1 TuningSBBDepBreaking flag makes more sense as an opt-in optimization

spatel updated this revision to Diff 405689.Feb 3 2022, 9:50 AM

Patch updated:

  1. Inverted polarity of the CPU attribute - default assumes we do not have the sbb idiom (we should zero the register).
  2. Added a test file to check that the CPU attribute is on the right set of CPUs (Agner's guide says it is effectively any 64-bit AMD CPU).

We get more diffs than were created by D116804 because this is adding a zero op to every SETCC_CARRY node rather than just the ones that were altered by the earlier patch.

There might be a way to reduce the diffs, but I haven't found the predicate yet. Safer to err on the side of executing an unnecessary zero op rather than missing a false dependency stall?

spatel edited the summary of this revision. (Show Details)Feb 3 2022, 9:52 AM
craig.topper added inline comments.Feb 3 2022, 10:55 AM
llvm/lib/Target/X86/X86.td
448

I'm not sure about the use of the word "zero" in this. I liked TuningSBBDepBreaking

spatel updated this revision to Diff 405743.Feb 3 2022, 12:18 PM

Patch updated:
Changed feature name from "HasSBBZeroIdiom" to "HasSBBDepBreaking".

RKSimon accepted this revision.Feb 6 2022, 4:08 AM

LGTM - cheers

This revision is now accepted and ready to land.Feb 6 2022, 4:08 AM
This revision was landed with ongoing or failed builds.Feb 7 2022, 7:13 AM
This revision was automatically updated to reflect the committed changes.
craig.topper added inline comments.Feb 7 2022, 1:13 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
468

This has a -Wparentheses warning with gcc.

spatel added inline comments.Feb 7 2022, 2:11 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
468

Thanks!
be059a1263c6