This is an archive of the discontinued LLVM Phabricator instance.

[instcombine][x86] Converted pdep/pext with shifted mask to simple arithmetic
ClosedPublic

Authored by reames on Sep 17 2020, 3:20 PM.

Details

Summary

If the mask of a pdep or pext instruction is a shifted mask (i.e. one contiguous block of ones), we need at most one and and one shift to represent the operation without the intrinsic. On all platforms I know of, this is faster than the pdep/pext.
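As an illustration of the equivalence described above, here is a minimal C++ sketch. It is not the code from this patch. The helper names (`pext_ref`, `pdep_ref`, `trailing_zeros`, `pext_shifted_mask`, `pdep_shifted_mask`) are hypothetical, the width is fixed at 32 bits, and the mask is assumed to be non-zero with a single contiguous run of ones. The reference versions model the bit-by-bit semantics of pext/pdep in portable code, so no BMI2 hardware is needed to compare them against the and-plus-shift forms.

```cpp
#include <cstdint>

// Portable model of pext: gather the bits of src selected by mask
// into the low bits of the result.
uint32_t pext_ref(uint32_t src, uint32_t mask) {
  uint32_t result = 0;
  unsigned out = 0;  // next result bit to fill
  for (unsigned i = 0; i < 32; ++i) {
    if (mask & (1u << i)) {
      if (src & (1u << i))
        result |= (1u << out);
      ++out;
    }
  }
  return result;
}

// Portable model of pdep: scatter the low bits of src to the
// positions selected by mask.
uint32_t pdep_ref(uint32_t src, uint32_t mask) {
  uint32_t result = 0;
  unsigned in = 0;  // next src bit to consume
  for (unsigned i = 0; i < 32; ++i) {
    if (mask & (1u << i)) {
      if (src & (1u << in))
        result |= (1u << i);
      ++in;
    }
  }
  return result;
}

// Number of zero bits below the lowest set bit. Assumes mask != 0.
unsigned trailing_zeros(uint32_t mask) {
  unsigned n = 0;
  while (!(mask & 1u)) {
    mask >>= 1;
    ++n;
  }
  return n;
}

// Assumes mask is one contiguous block of ones (a shifted mask).
uint32_t pext_shifted_mask(uint32_t src, uint32_t mask) {
  return (src & mask) >> trailing_zeros(mask);
}

// Assumes mask is one contiguous block of ones (a shifted mask).
uint32_t pdep_shifted_mask(uint32_t src, uint32_t mask) {
  return (src << trailing_zeros(mask)) & mask;
}
```

For example, with the mask `0x00000FF0` the shift amount is 4: the pext form reduces to `(src & 0xFF0) >> 4` and the pdep form to `(src << 4) & 0xFF0`. When the mask starts at bit 0, the shift is by zero and only the and remains.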

The cost modelling for multiple contiguous blocks might be worth exploring in a follow up, but it's not relevant for my current use case. It would almost certainly be a win on AMDs where these are really really slow though.

Event Timeline

reames created this revision. Sep 17 2020, 3:20 PM
Herald added a project: Restricted Project. Sep 17 2020, 3:20 PM
reames requested review of this revision. Sep 17 2020, 3:20 PM
This revision is now accepted and ready to land. Sep 18 2020, 10:07 AM
anna accepted this revision. Sep 18 2020, 11:36 AM

thanks for this Philip!

This revision was landed with ongoing or failed builds. Sep 18 2020, 2:55 PM
This revision was automatically updated to reflect the committed changes.