This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] SSE4A constant folding and conversion to shuffles.
ClosedPublic

Authored by RKSimon on Oct 1 2015, 8:17 AM.

Details

Summary

This patch improves support for combining the SSE4A EXTRQ(I) and INSERTQ(I) intrinsics:

1 - Converts INSERTQ/EXTRQ calls to INSERTQI/EXTRQI if the 'index' operand is constant
2 - Converts INSERTQI/EXTRQI calls to shufflevector if the bit index/length are both byte aligned (we can already lower shuffles to INSERTQI/EXTRQI if its useful)
3 - Constant folding support
4 - Add zeroinitializer handling

Michael - I've also removed some old INSERTQI 'bundling' code that attempted to merge 2 INSERTQI calls; this doesn't actually work as it assumed that we were inserting from the same source and that ranges could be merged together. This isn't true as we always insert the bottom bits from a source. Technically we could reduce this to a single case where both are inserting to the same destination index and then just have a single insertion of the maximum of the 2 lengths - if you think this is worth it I'll add back the code for that case only.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 36245.Oct 1 2015, 8:17 AM
RKSimon retitled this revision from to [InstCombine] SSE4A constant folding and conversion to shuffles..
RKSimon updated this object.
RKSimon added reviewers: andreadb, Bigcheese, spatel.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
andreadb edited edge metadata.Oct 6 2015, 8:37 AM

Hi Simon,

Overall, the patch seems reasonable.
However, I think it would be better if you send the fix for the wrong merging of pairs of insertqi calls as a separate patch.

I also noticed that your code converts an extrq/insertq with constant width and length into an extrqi/insertqi.
Have you considered the possibility of moving the logic that folds extrqi/insertqi into separate functions?
If you do that, then you could simplify 'extrq'/'insertq' in one step rather than going through the preliminar extra step of converting extrq/insertq to insertqi/extrqi. Basically what I am suggesting (if possible) is to change the code so that we simplify extrq/insertq in one step rather than in two steps.

-Andrea

RKSimon updated this revision to Diff 37265.Oct 13 2015, 10:00 AM
RKSimon edited edge metadata.

Thanks Andrea - I've updated the patch, moving the simplification code into helper functions. I've already committed the bugfix.

andreadb accepted this revision.Oct 13 2015, 10:17 AM
andreadb edited edge metadata.

LGTM. Thanks Simon!

This revision is now accepted and ready to land.Oct 13 2015, 10:17 AM
This revision was automatically updated to reflect the committed changes.