The BMI version of BEXTR takes the bit count and shift value packed into a register, while the TBM version takes them as an immediate. Currently we have a DAG combine that creates BEXTR from a shift and a mask if either BMI or TBM is supported.
For the BMI case this means we have to move the immediate into a register first and then do the BEXTR, so it's always 2 instructions. The shift and mask we replaced would also have been 2 instructions. On Intel hardware, BEXTR is 2 uops according to Agner Fog's tables, so we probably went from 2 uops for the shift and mask to 3 uops for the immediate move plus the BEXTR. So I doubt this is a win.
This patch disables the combine for BMI and leaves it only for TBM.
I'm trying to figure out if we can move the TBM version to isel, since we currently use BEXTR in cases where we could zero extend from AH/BH/CH/DH. The latter is handled by isel patterns, while BEXTR is handled as a DAG combine.