This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Combine UNPCKL with vector_shuffle into UNPCKH to save one instruction for sext from v16i8 to v16i16 and v8i16 to v8i32.
ClosedPublic

Authored by congh on Nov 5 2015, 2:52 PM.

Details

Summary

This patch is enabling combining UNPCKL with vector_shuffle that moves the upper half of a vector into the lower half, into a UNPCKH instruction. For example:

t2: v16i8 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t1,
    undef:v16i8
t3: v16i8 = X86ISD::UNPCKL undef:v16i8, t2

will be combined to:

t3: v16i8 = X86ISD::UNPCKH undef:v16i8, t1

Diff Detail

Repository
rL LLVM

Event Timeline

congh updated this revision to Diff 39423.Nov 5 2015, 2:52 PM
congh retitled this revision from to [X86][SSE] Combine UNPCKL with vector_shuffle into UNPCKH to save one instruction for sext from v16i8 to v16i16 and v8i16 to v8i32..
congh updated this object.
congh added reviewers: hfinkel, RKSimon, dexonsmith, davidxl.
congh updated this object.
congh added a subscriber: llvm-commits.
RKSimon edited edge metadata.Nov 6 2015, 3:05 AM

Please can you either add tests for 256-bit shuffles or (if you can't get this to occur) limit the combine to 128-bit vectors only?

congh added a comment.Nov 6 2015, 10:57 AM

Please can you either add tests for 256-bit shuffles or (if you can't get this to occur) limit the combine to 128-bit vectors only?

This combine is not for AVX or couldn't happen on 256-bit: SSE4.1 provides pmovsxbw and pmovsxwd which are better than unpack+shift. As I could not compose test cases for 256-bit vectors, I still added a check of 128-bit vector in this patch. Thank you for the advice!

congh updated this revision to Diff 39563.Nov 6 2015, 10:58 AM
congh edited edge metadata.

Update the patch according to Simon's comment by checking 128-bit vectors.

congh updated this revision to Diff 39564.Nov 6 2015, 10:59 AM

Correct the patch.

RKSimon accepted this revision.Nov 13 2015, 10:10 AM
RKSimon edited edge metadata.

Sorry for the delay, I've been away this week. I have a couple of minor queries but overall I'm happy with this patch (and the code improvements) - thank you.

lib/Target/X86/X86ISelLowering.cpp
22774 ↗(On Diff #39564)

Are you just seeing cases with ISD::VECTOR_SHUFFLE? I'd have expected some to be using X86 shuffle nodes as well.

22782 ↗(On Diff #39564)

auto ShufOp = Op1.getOperand(0);

This revision is now accepted and ready to land.Nov 13 2015, 10:10 AM
congh marked an inline comment as done.Nov 13 2015, 11:05 AM

Thank you very much for the review, Simon!

lib/Target/X86/X86ISelLowering.cpp
22774 ↗(On Diff #39564)

Yes. I could not see X86's shuffle from my experiments in which integers are promoted.

This revision was automatically updated to reflect the committed changes.