This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes.
ClosedPublic

Authored by craig.topper on Oct 2 2019, 11:18 PM.

Details

Summary

This patch recognizes the shuffle pattern we get from a
v8i64->v8i8 truncate when v8i64 isn't a legal type.

With VLX we can use two VTRUNCs, unpckldq, and an insert_subvector.
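As a rough illustration of the pattern described above, the following standalone sketch checks a 32-element byte shuffle mask for the shape this patch looks for: the first 8 indices pick every 8th byte across the two inputs (0, 8, ..., 56), and every remaining lane is undef or zero. It does not use LLVM's types. `ShuffleMask`, `isSequentialOrUndef`, and `matchesTruncConcatPattern` are hypothetical names chosen for this example, and the `std::vector<bool>` stands in for the zeroable-lane bitmask; the actual code in X86ISelLowering.cpp may be structured differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a shuffle mask; -1 marks an undef lane.
using ShuffleMask = std::vector<int>;

// Returns true if Mask[Pos .. Pos+Size) equals Low, Low+Step, Low+2*Step, ...
// Undef lanes (-1) are accepted in any position.
bool isSequentialOrUndef(const ShuffleMask &Mask, std::size_t Pos,
                         std::size_t Size, int Low, int Step) {
  if (Pos + Size > Mask.size())
    return false;
  for (std::size_t I = 0; I < Size; ++I, Low += Step) {
    int M = Mask[Pos + I];
    if (M != -1 && M != Low)
      return false;
  }
  return true;
}

// Assumed model of the check: Zeroable[I] is true when result lane I may be
// treated as zero or undef.
bool matchesTruncConcatPattern(const ShuffleMask &Mask,
                               const std::vector<bool> &Zeroable) {
  constexpr std::size_t NumTruncElts = 8; // two v4i64 inputs, 4 bytes each
  if (Mask.size() != 32 || Zeroable.size() != Mask.size())
    return false;

  // The low 8 result bytes take the low byte of each i64 element:
  // indices 0, 8, 16, 24 from the first input and 32, 40, 48, 56 from the second.
  if (!isSequentialOrUndef(Mask, 0, NumTruncElts, 0, 8))
    return false;

  // Every remaining lane must be zeroable.
  for (std::size_t I = NumTruncElts; I < Mask.size(); ++I)
    if (!Zeroable[I])
      return false;
  return true;
}
```

Expressing the upper-lane bound as `Mask.size()` rather than a literal 32 keeps 8 as the only magic number in the check.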

Event Timeline

craig.topper created this revision. Oct 2 2019, 11:18 PM
RKSimon added inline comments. Oct 3 2019, 4:00 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
15539

isSequentialOrUndefInRange(Mask, 0, 8, 0, 8) ?

15542

Zeroable.extractBits(16, 8).isAllOnesValue() ?

craig.topper marked an inline comment as done. Oct 3 2019, 9:36 AM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
15542

That should have been 8-32. I guess I had 24 bits in my head and wrote the wrong end.

Use simpler checks instead of loops.

craig.topper marked an inline comment as done. Oct 3 2019, 10:14 AM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
15538

I went with an approach that relied less on having two magic numbers mentioned. I wrote it in terms of Mask.size(), even though we know that's 32, so that only the 8 that was already used above is mentioned again.

RKSimon accepted this revision. Oct 3 2019, 10:52 AM

LGTM - cheers

This revision is now accepted and ready to land. Oct 3 2019, 10:52 AM