This is an archive of the discontinued LLVM Phabricator instance.

[X86] Disable creating BEXTR from shift and mask operations with BMI. Only do it for TBM.
AbandonedPublic

Authored by craig.topper on Jul 29 2017, 11:25 PM.

Details

Reviewers
zvi
RKSimon
Summary

The BMI version of BEXTR takes a bit count and shift value packed into a register. TBM passes it as an immediate. Currently we have a DAG combine that creates BEXTR from shift and mask if either BMI or TBM is supported.

For the BMI case this means we have to move the immediate into a register first and then do the BEXTR. So its always 2 instructions. The shift and mask we replaced would have also been 2 instructions. On Intel hardware, BEXTR is 2 uops according to Agner Fog's tables so that means we probably went from 2 uops to do shift and mask, to 3 uops to move the immediate and do the BEXTR. So I doubt this is a win.

This patch disables the combine for BMI and leaves it only for TBM.

I'm trying to figure out if we can move the TBM version to isel as we are currently using BEXTR when we could zero extend AH/BH/CH/DH. The latter is handled by isel patterns while the BEXTR is handled as a DAG combine.

Diff Detail

Event Timeline

craig.topper created this revision.Jul 29 2017, 11:25 PM
RKSimon edited edge metadata.Jul 31 2017, 4:16 AM

What about BMI cases that reuse the constant?

That's a fair point. Is there a way we can detect that case?

That's a fair point. Is there a way we can detect that case?

It's tricky - maybe replace X86ISD::BEXTR with a X86ISD::BEXTRI that keeps the constant length/index separate to make it easier to split to SHR+AND in isel - would that help?

TBM usage of BEXTR is curious as well - annoyingly the instruction takes a 32-bit immediate, wasting code size. In some circumstances its better to use the reg form and reuse the value.....

And TBM is also useful to replace AND + low bit mask (index = 0) as it makes folding a load easier and is non-destructive, but if the length % 8 = 0 then its better to fold to movz.

If we create X86ISD::BEXTRI with split immediates. How would we create the combined immediate to select BEXTRI instruction. Maybe a custom inserter?

If we create X86ISD::BEXTRI with split immediates. How would we create the combined immediate to select BEXTRI instruction. Maybe a custom inserter?

Not sure - I'm struggling to think of something that will deal with all the useful cases where we should still try to use BEXTR (foldable loads on BMI + multiple uses on BMI/TBM).

craig.topper abandoned this revision.Aug 8 2017, 9:31 AM