This is an archive of the discontinued LLVM Phabricator instance.

[X86, SSE] instcombine common cases of insertps intrinsics into shuffles
ClosedPublic

Authored by spatel on Apr 4 2015, 2:21 PM.

Details

Summary

This is very similar to D8486 (vperm2). If we treat insertps intrinsics as shufflevectors, we can optimize them better.

I've left all but the full zero case of the zero mask variants out of this patch. I don't think those can be converted into a single shuffle in all cases, but I'd be happy to be proven wrong as I was for vperm2f128.

Either way, we'd need to support whatever sequence we come up with for those cases in the backend before converting them here.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 23251.Apr 4 2015, 2:21 PM
spatel retitled this revision from to [X86, SSE] instcombine common cases of insertps intrinsics into shuffles.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: andreadb, craig.topper, RKSimon.
spatel added a subscriber: Unknown Object (MLST).
majnemer added inline comments.
lib/Transforms/InstCombine/InstCombineCalls.cpp
202 ↗(On Diff #23251)

please make this auto *CInt so that it is more obvious that CInt is a pointer.

spatel updated this revision to Diff 23252.Apr 4 2015, 2:35 PM

Patch updated:
Fixed the 'auto' to 'auto *' to make the pointer-ness explicit. Also fixed the line that I copy-pasted from. :)

RKSimon edited edge metadata.Apr 5 2015, 7:41 AM

This is looking pretty good. If the reason you haven't used the zmask more is to avoid the need for multiple shuffle stages is it worthwhile checking if the zmask (only) overrides the insertion destination, or cases where the 2 operands point to the same variable?

spatel added a comment.Apr 5 2015, 3:40 PM

This is looking pretty good. If the reason you haven't used the zmask more is to avoid the need for multiple shuffle stages is it worthwhile checking if the zmask (only) overrides the insertion destination, or cases where the 2 operands point to the same variable?

I thought about the case where the zmask overrides the insert, but I figured that was pretty far-fetched. I didn't consider the case where both inputs are the same. Let me know if you think those are worth chasing as stand-alone cases or if it's better to just solve the zmask case in general in the backend. Even if the oddball cases are worthy, I'd prefer to solve them in a follow-on patch just for the sake of patch minimalism.

This is looking pretty good. If the reason you haven't used the zmask more is to avoid the need for multiple shuffle stages is it worthwhile checking if the zmask (only) overrides the insertion destination, or cases where the 2 operands point to the same variable?

I thought about the case where the zmask overrides the insert, but I figured that was pretty far-fetched. I didn't consider the case where both inputs are the same. Let me know if you think those are worth chasing as stand-alone cases or if it's better to just solve the zmask case in general in the backend. Even if the oddball cases are worthy, I'd prefer to solve them in a follow-on patch just for the sake of patch minimalism.

I have ended up using both in downcoding, mainly due to register pressure problems:

  • zeroing out lanes using insertps as we don't have a spare register for the xorps/blendps pattern
  • reuse of the first operand - shuffle X00X

Its the second case that I think would be particular useful here but I'd be happy to see any improvements in a follow up patch.

Ping - can't tell if the last comment was an implicit 'LGTM'. :)

RKSimon accepted this revision.Apr 15 2015, 4:50 PM
RKSimon edited edge metadata.

LGTM - getting those alternative shuffles (especially the X00X style) would be nice to have in any future patch but not necessary here.

This revision is now accepted and ready to land.Apr 15 2015, 4:50 PM
This revision was automatically updated to reflect the committed changes.

Thanks, Simon - r235124.

I've put further insertps improvements on my list, so should have more patches in the near future.