This is an archive of the discontinued LLVM Phabricator instance.

[X86] Emit SBB instead of SETCC_CARRY from LowerSELECT. Break false dependency on the SBB input.
ClosedPublic

Authored by craig.topper on Dec 6 2018, 11:49 PM.

Details

Summary

I'm hoping we can just replace SETCC_CARRY with SBB. This is another step towards that.

I've explicitly used zero as the input to the setcc to avoid a false dependency that we've had with the SETCC_CARRY. I change one of the patterns that used NEG to instead use an explicit compare with 0 on the LHS. We needed the zero anyway to avoid the false dependency. The negate would clobber its input register. By using a CMP we can avoid that which could be useful.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Dec 6 2018, 11:49 PM

Hi Craig,

Overall the patch looks good for Intel processors. For AMD (at least Jaguar and other modern AMD processors), an SBB with same register operands is a dependency breaking instruction. To be more specific, an SBB with same reg operands depends on the CF only. So we don’t need a zero idiom XOR to explicitly break the data dependency on the input register operands. That means, the old codegen was better for AMD.

I don’t know what is the best approach in this case.
Maybe you can add a FIXME comment and raise a bug for it, so that we can address it later. It would allow you to commit this change in the meantime. Not sure if @RKSimon has a better idea.

test/CodeGen/X86/select.ll
790 ↗(On Diff #177131)

It is a shame that SBB with same registers is not treated specially by Intel hips.
On Jaguar (and other modern AMD processors), an SBB where register operands are the same is a dependency-breaking instruction.
That means, this XOR is redundant on Jaguar, and there is no need for it to break a dependency on EAX.

test/CodeGen/X86/vector-compare-any_of.ll
55 ↗(On Diff #177131)

Same “problem”. The original code was probably slightly better for Jaguar.

90 ↗(On Diff #177131)

same.

RKSimon added inline comments.Dec 10 2018, 4:13 AM
test/CodeGen/X86/pr35972.ll
10 ↗(On Diff #177131)

Now that the scheduler models are getting better at providing info on target-specific dependency breaking instructions, maybe a later pass should just cleanup things like this? I REALLY don't want yet another feature flag for this kind of thing.

How should we proceed here? Can we commit this patch and file a bug for AMD improvements?

How should we proceed here? Can we commit this patch and file a bug for AMD improvements?

LGTM - please can you raise a bug about using scheduler models to handle dependency breaking instructions.

RKSimon accepted this revision.Dec 12 2018, 3:53 AM
This revision is now accepted and ready to land.Dec 12 2018, 3:53 AM
andreadb accepted this revision.Dec 12 2018, 4:07 AM

LGTM too.

This revision was automatically updated to reflect the committed changes.