This is effectively inverting the transform added with D116804 because the downside of the false dependency of something like "sbb %eax, %eax" is much greater than the upside of eliminating a zeroing instruction on (all?) Intel CPUs.
Details
Diff Detail
Unit Tests
Event Timeline
Sorry - yes, I made that a local preliminary/NFC patch. I just pushed it up to main:
f523e83b204ea3e4ab80df6b
I think I prefer the opposite tuning flag polarity. It's not truely a false dep. It's more of a missed special case. The semantics of SBB imply the sources are read.
Patch updated:
- Inverted polarity of the CPU attribute - default assumes we do not have the sbb idiom (we should zero the register).
- Added a test file to check that the CPU attribute is on the right set of CPUs (Agner's guide says it is effectively any 64-bit AMD CPU).
We get more diffs than were created by D116804 because this is adding a zero op to every SETCC_CARRY node rather than just the ones that were altered by the earlier patch.
There might be a way to reduce the diffs, but I haven't found the predicate yet. Safer to err on the side of executing an unnecessary zero op rather than missing a false dependency stall?
llvm/lib/Target/X86/X86.td | ||
---|---|---|
448 | I'm not sure about the use of the word "zero" in this. I liked TuningSBBDepBreaking |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
468 | This has a -Wparentheses warning with gcc. |
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp | ||
---|---|---|
468 | Thanks! |
I'm not sure about the use of the word "zero" in this. I liked TuningSBBDepBreaking