Page MenuHomePhabricator

[X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C))

Authored by craig.topper on Sep 14 2018, 1:19 PM.



MOVMSK only care about the sign bit so we don't need the setcc to fill the whole element with 0s/1s. We can just shift the bit we're looking for into the sign bit. This saves a constant pool load.

Inspired by PR38840.

Diff Detail


Event Timeline

craig.topper created this revision.Sep 14 2018, 1:19 PM
lebedev.ri added inline comments.
38825 ↗(On Diff #165579)

That should probably be &&

Or I failed to get the file in the diff at all.

RKSimon added inline comments.Sep 14 2018, 3:01 PM
1128 ↗(On Diff #165588)

Are these and masks necessary - we're only using the signbit?

1221 ↗(On Diff #165588)

Fix AVX1 case too?

Would D38128 help with this btw?

craig.topper added inline comments.Sep 14 2018, 3:12 PM
1128 ↗(On Diff #165588)

No. I guess SimplifyDemandedBits can't make it through the lowered v16i8 shift. Probably due to the bitcasts or the promotion of and to v2i64?

Not sure how D38128 would help. Isn't that about register allocation?

Punt on vXi8 types for now. Add FIXMEs.

Not sure how D38128 would help. Isn't that about register allocation?

Sorry - that was supposed to be for D52109

RKSimon accepted this revision.Sep 15 2018, 3:51 AM

LGTM with one minor

38823 ↗(On Diff #165601)

Safer to use >= 32 since movmsk vXi16 doesn't exist.

This revision is now accepted and ready to land.Sep 15 2018, 3:51 AM
This revision was automatically updated to reflect the committed changes.