This patch recognizes the shuffle pattern we get from a
v8i64->v8i8 truncate when v8i64 isn't a legal type.
With VLX we can use two VTRUNCs, an unpckldq, and an insert_subvector.
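As a rough illustration of the pattern described above, here is a minimal standalone sketch, not the code from the patch. It assumes the usual two-input shuffle mask convention (indices 0..31 select bytes of the first input, 32..63 bytes of the second, -1 means undef) and a little-endian byte layout, so truncating each i64 to i8 keeps every 8th byte. The helper name `isTruncConcatMask` is hypothetical, and for simplicity it only accepts undef upper elements, not known-zero ones.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only (not from the patch).
// Returns true if a 32-element byte shuffle mask picks byte 0 of each i64
// element of two v4i64 inputs (viewed as v32i8) into the low 8 result bytes.
// Assumed convention: 0..31 index the first input, 32..63 the second,
// and -1 marks an undef element.
bool isTruncConcatMask(const std::vector<int> &mask) {
  if (mask.size() != 32)
    return false;
  // Low 8 elements: every 8th byte (0, 8, 16, ..., 56), or undef.
  for (std::size_t i = 0; i < 8; ++i)
    if (mask[i] != -1 && mask[i] != static_cast<int>(i * 8))
      return false;
  // Upper elements: this simplified sketch requires undef; the known-zero
  // case mentioned in the summary is not modeled here.
  for (std::size_t i = 8; i < mask.size(); ++i)
    if (mask[i] != -1)
      return false;
  return true;
}
```

A mask of {0, 8, 16, 24, 32, 40, 48, 56} followed by 24 undef elements passes this check; changing any of the first eight indices to a non-multiple of 8 fails it.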
Differential D68374
[X86] Add v32i8 shuffle lowering strategy to recognize two v4i64 vectors truncated to v4i8 and concatenated into the lower 8 bytes with undef/zero upper bytes.
Closed, Public. Authored by craig.topper on Oct 2 2019, 11:18 PM.
craig.topper added inline comments.
This revision is now accepted and ready to land. Oct 3 2019, 10:52 AM
Revision Contents
Diff 223050
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/min-legal-vector-width.ll
llvm/test/CodeGen/X86/shuffle-vs-trunc-512.ll
I went with an approach that relied less on having two magic numbers mentioned. I wrote it in terms of Mask.size(), even though we know that's 32, so that only the 8 that was already used above is mentioned again.
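The idea in this comment can be sketched as follows. This is a standalone illustration, not taken from the diff: the names `kTruncElts` and `upperElementsZeroable` are hypothetical, and a `std::vector<bool>` stands in for whatever zeroable-element bitmask the real lowering code uses. The point is that the bound is derived from the mask size and the single constant 8, rather than hard-coding a second number such as 24.

```cpp
#include <cstddef>
#include <vector>

// Assumed for illustration: number of low result elements that the two
// truncates produce. This is the only magic number in the sketch.
constexpr std::size_t kTruncElts = 8;

// Hypothetical helper: zeroable[i] is true if result element i may be
// undef or zero. Returns true if every element above the low kTruncElts
// is zeroable, with the bound written in terms of the container size.
bool upperElementsZeroable(const std::vector<bool> &zeroable) {
  if (zeroable.size() < kTruncElts)
    return false;
  // Count consecutive zeroable elements starting from the highest index.
  std::size_t leading = 0;
  for (std::size_t i = zeroable.size(); i-- > 0 && zeroable[i];)
    ++leading;
  return leading >= zeroable.size() - kTruncElts;
}
```

With a 32-element mask the comparison is effectively `leading >= 24`, but the 24 never appears in the source, so the check stays tied to the 8 used for the low elements.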