This is an archive of the discontinued LLVM Phabricator instance.

[X86] Fold AND(SRL(X,Y),1) -> SETCC(BT(X,Y))
ClosedPublic

Authored by RKSimon on Apr 1 2022, 4:02 AM.

Details

Summary

As noticed on PR39174, if we're extracting a single non-constant bit, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.
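
For readers who want the pattern in source form, here is a minimal sketch of code that gives rise to AND(SRL(X,Y),1). The function name `extract_bit` is hypothetical and not taken from the patch. The instruction sequences in the comments are rough illustrations of the two strategies discussed in the summary, not verified compiler output.

```cpp
#include <cassert>
#include <cstdint>

// Extracts bit `y` of `x` as 0 or 1: a variable right shift followed by
// a mask with 1, i.e. the AND(SRL(X,Y),1) shape from the revision title.
//
// Shift-based lowering (roughly): the amount has to live in CL first.
//     mov ecx, <y>
//     shr <x>, cl
//     and <x>, 1
//
// BT-based lowering (roughly): the bit index can be in any register.
//     bt   <x>, <y>     ; CF = selected bit
//     setb al           ; al = CF
//
// Precondition: y < 32 (a larger shift amount is undefined behaviour in C++).
inline std::uint32_t extract_bit(std::uint32_t x, std::uint32_t y) {
    assert(y < 32 && "shift amount must be smaller than the bit width");
    return (x >> y) & 1u;
}
```

Whether a given compiler version actually emits BT+SETCC for this function depends on the target and optimisation level, so the comments describe intent only.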

Event Timeline

RKSimon created this revision. Apr 1 2022, 4:02 AM
Herald added a project: Restricted Project. Apr 1 2022, 4:02 AM
RKSimon requested review of this revision. Apr 1 2022, 4:02 AM

As noticed on PR39174, if we're extracting a single non-constant bit, then try to use BT+SETCC instead to avoid messing around moving the shift amount to the ECX register, using slow x86 shift ops etc.

Why does using the ECX register make the shift slow?

Just that we have to move the shift amount to ECX for the shift ops, which can have side effects on register pressure/allocation.

pengfei accepted this revision. Apr 1 2022, 7:31 AM

LGTM. That's what I thought, but the word "slow" made me wonder whether there was some other issue I didn't know about :)

This revision is now accepted and ready to land. Apr 1 2022, 7:31 AM
This revision was landed with ongoing or failed builds. Apr 1 2022, 8:08 AM
This revision was automatically updated to reflect the committed changes.

What about the case where we want the inverse of the bit? https://godbolt.org/z/sEjq9n9Kn
I would presume we can consume any such NOT by inverting the predicate (b <-> nb).

np - we already do something like that in some of the other X86ISD::BT lowering cases - I'll take a look next week
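
To make the inverted-bit question concrete, here is a small sketch of one way such code can be written. The function name `extract_inverted_bit` is hypothetical, and this spelling is not necessarily the one used in the linked Godbolt example. The idea raised in the comment is that the final NOT need not be a separate instruction: BT leaves the selected bit in the carry flag, so reading it with the "not below" condition (SETAE/SETNB) instead of "below" (SETB) yields the inverse directly.

```cpp
#include <cassert>
#include <cstdint>

// Returns 1 when bit `y` of `x` is clear and 0 when it is set, i.e. the
// logical inverse of a single-bit extraction.
//
// Suggested lowering (rough illustration, not verified compiler output):
//     bt    <x>, <y>    ; CF = selected bit
//     setae al          ; al = !CF  (predicate flipped from b to nb)
//
// Precondition: y < 32 (a larger shift amount is undefined behaviour in C++).
inline std::uint32_t extract_inverted_bit(std::uint32_t x, std::uint32_t y) {
    assert(y < 32 && "shift amount must be smaller than the bit width");
    return ((x >> y) & 1u) ^ 1u;
}
```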