This is an archive of the discontinued LLVM Phabricator instance.

[x86] instcombine more cases of insertps into a shufflevector
ClosedPublic

Authored by spatel on Apr 24 2015, 10:44 AM.

Details

Summary

This is a follow-on to D8833 (insertps optimization when the zero mask is not used).

In this patch, we check for the case where the zmask is used, but both input vectors to the insertps intrinsic are the same operand. This lets us replace the 2nd shuffle input operand with the zero vector.

I confirmed that the x86 backend generates the expected insertps instructions for the shuffles created here.

Diff Detail

Event Timeline

spatel updated this revision to Diff 24399.Apr 24 2015, 10:44 AM
spatel retitled this revision from to [x86] instcombine more cases of insertps into a shufflevector.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: RKSimon, andreadb, craig.topper.
spatel added a subscriber: Unknown Object (MLST).
RKSimon edited edge metadata.Apr 24 2015, 11:13 AM

Thanks Sanjay, a minor suggested change to support a (dumb) edge case that zeros out the inserted lane.

lib/Transforms/InstCombine/InstCombineCalls.cpp
232

A minor change would allow (silly) cases where the 2nd operand insertion has been zero'd away:

if ((II.getArgOperand(0) == II.getArgOperand(1)) || (Zmask & (1 << DestLane)))

spatel updated this revision to Diff 24408.Apr 24 2015, 12:43 PM
spatel edited edge metadata.

Thanks, Simon!

Patch updated:
I had the zmask override in my sites, but I didn't recognize that it was just going to be one more check in the 'if'. :)
Added test case to verify the zmask override even with 2 different input vectors.

RKSimon accepted this revision.Apr 25 2015, 6:58 AM
RKSimon edited edge metadata.

LGTM - thanks Sanjay.

This revision is now accepted and ready to land.Apr 25 2015, 6:58 AM
This revision was automatically updated to reflect the committed changes.