This patch refactors the existing lowerVectorShuffleAsByteShift function to add support for 256-bit vectors on AVX2 targets.
It also fixes a TableGen issue that prevented the lowering of the vpslldq/vpsrldq vec256 instructions.
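The main constraint for the 256-bit case is that the AVX2 forms of vpslldq/vpsrldq shift each 128-bit lane independently and fill with zeros; they do not shift the whole 256-bit register. The following is a minimal, hypothetical sketch (not the code in X86ISelLowering.cpp) of how a single-input v32i8 shuffle mask could be tested for that per-lane pattern. The names matchLaneByteShift, ShiftDir and ByteShift, and the mask encoding (source byte index, -1 for undef, plus a separate per-byte zeroable flag), are assumptions made for illustration.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical types for this sketch; not LLVM's API.
enum class ShiftDir { Left, Right };  // Left ~ vpslldq, Right ~ vpsrldq

struct ByteShift {
  ShiftDir Dir;
  int Bytes;  // shift immediate, 1..15
};

// Mask[i]: source byte index for result byte i, or -1 for undef.
// Zeroable[i]: true if result byte i is known to be zero.
// Works for any multiple of 16 bytes (v16i8 = one lane, v32i8 = two lanes).
std::optional<ByteShift> matchLaneByteShift(const std::vector<int> &Mask,
                                            const std::vector<bool> &Zeroable) {
  assert(Zeroable.size() == Mask.size() && "mask/zeroable size mismatch");
  const int LaneSize = 16;
  const int NumElts = static_cast<int>(Mask.size());
  if (NumElts == 0 || NumElts % LaneSize != 0)
    return std::nullopt;

  for (int Shift = 1; Shift < LaneSize; ++Shift) {
    for (ShiftDir Dir : {ShiftDir::Left, ShiftDir::Right}) {
      bool Matches = true;
      for (int i = 0; i < NumElts && Matches; ++i) {
        int Base = (i / LaneSize) * LaneSize;  // first byte of this lane
        int Pos = i % LaneSize;                // position within the lane
        // Byte within the SAME lane that lands here after the shift.
        int Src = (Dir == ShiftDir::Left) ? Pos - Shift : Pos + Shift;
        if (Src < 0 || Src >= LaneSize)
          Matches = Mask[i] < 0 || Zeroable[i];  // shifted-in zero byte
        else
          Matches = Mask[i] < 0 || Mask[i] == Base + Src;
      }
      if (Matches)
        return ByteShift{Dir, Shift};
    }
  }
  return std::nullopt;
}
```

For example, a mask that moves every byte up by 3 within its lane and zeroes the bottom 3 bytes of both lanes matches as a left shift by 3 (vpslldq $3), whereas a mask that carries bytes 13..15 of the low lane into the upper lane is rejected, since no per-lane shift produces it.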
Differential D7596
[X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2
Closed, Public. Authored by RKSimon on Feb 12 2015, 10:25 AM.
Event Timeline
I haven't looked at the patch in detail, but I'm wondering whether you plan to do anything about the AVX1 lowering of these shuffles. I would expect two 128-bit shifts for these cases on AVX1.
The AVX1 splits fail because the x86 computeZeroableShuffleElements() call isn't recursive, so it can't peek through nodes (mainly shuffles, but concat/subvector etc. crop up as well). I'm hoping to give that code an overhaul in the near future; there is also nothing x86-specific about it, and it would be useful to provide it for other targets, the DAGCombiner etc. to check for zeros/zeroables. If you want, I can do that work first and then update this patch?
No need to fix the AVX1 handling first. I was just making sure it was on your radar. I'd like to see this patch go in so we can remove more of the intrinsics.
Thanks Craig, I've fixed the minor issues; I'll put up a new patch shortly. Chandler has made some changes to computeZeroableShuffleElements() that mean the AVX1 versions now see the zeroable lanes and use vpslldq/vpsrldq (the xmm versions), so that looks like it's sorted too.
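The AVX1 discussion turns on computeZeroableShuffleElements() not being able to look through its operands. As a rough illustration of what "peeking through nodes" means, here is a hypothetical toy model (not SelectionDAG, and not the change Chandler landed): each node reports, per element, whether it is known zero (or undef), and shuffle and concat nodes recurse into their operands up to an assumed depth limit. Node, computeZeroable, the make* helpers and MaxDepth are all invented for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy IR node; this is not LLVM's SDNode.
struct Node {
  enum Kind { Zero, Opaque, Shuffle, Concat } K;
  int NumElts;
  std::vector<std::shared_ptr<Node>> Ops;  // Shuffle/Concat: two operands
  std::vector<int> Mask;                   // Shuffle only; -1 means undef
};
using NodePtr = std::shared_ptr<Node>;

NodePtr makeLeaf(Node::Kind K, int NumElts) {
  return std::make_shared<Node>(Node{K, NumElts, {}, {}});
}

NodePtr makeConcat(NodePtr Lo, NodePtr Hi) {
  int N = Lo->NumElts + Hi->NumElts;
  return std::make_shared<Node>(Node{Node::Concat, N, {Lo, Hi}, {}});
}

// Both shuffle operands have the same width; Mask indexes A first, then B.
NodePtr makeShuffle(NodePtr A, NodePtr B, std::vector<int> Mask) {
  assert(A->NumElts == B->NumElts);
  int N = static_cast<int>(Mask.size());
  return std::make_shared<Node>(Node{Node::Shuffle, N, {A, B}, std::move(Mask)});
}

// Per-element "known zero (or undef)" flags, recursing through shuffles and
// concats so zeros hidden behind other nodes are still found.
std::vector<bool> computeZeroable(const NodePtr &N, int Depth = 0) {
  const int MaxDepth = 6;  // assumed limit to bound the walk
  std::vector<bool> Z(static_cast<std::size_t>(N->NumElts), false);
  if (Depth > MaxDepth)
    return Z;  // give up conservatively: nothing known to be zero
  switch (N->K) {
  case Node::Zero:
    Z.assign(Z.size(), true);
    break;
  case Node::Opaque:
    break;  // unknown contents
  case Node::Concat: {
    const std::vector<bool> Lo = computeZeroable(N->Ops[0], Depth + 1);
    const std::vector<bool> Hi = computeZeroable(N->Ops[1], Depth + 1);
    for (std::size_t i = 0; i < Lo.size(); ++i) Z[i] = Lo[i];
    for (std::size_t i = 0; i < Hi.size(); ++i) Z[Lo.size() + i] = Hi[i];
    break;
  }
  case Node::Shuffle: {
    const std::vector<bool> A = computeZeroable(N->Ops[0], Depth + 1);
    const std::vector<bool> B = computeZeroable(N->Ops[1], Depth + 1);
    const int NumA = N->Ops[0]->NumElts;
    for (int i = 0; i < N->NumElts; ++i) {
      int M = N->Mask[i];
      if (M < 0)
        Z[i] = true;  // undef may be treated as zero
      else if (M < NumA)
        Z[i] = A[M];
      else
        Z[i] = B[M - NumA];
    }
    break;
  }
  }
  return Z;
}
```

With this, a shuffle of concat(X, zeros) reports the elements drawn from the zero half as zeroable, which a check that only asks "is this operand an all-zeros vector?" would miss.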
chandlerc: Feel free to go ahead and submit this, Simon! Thanks for working on it, and glad you saw that I found and squashed the AVX1 nonsense, so all of this should be working now.
This revision is now accepted and ready to land. (Feb 15 2015, 4:28 AM)
Closed by commit rL229311: [X86][AVX2] vpslldq/vpsrldq byte shifts for AVX2 (authored by RKSimon). (Feb 15 2015, 5:21 AM)
This revision was automatically updated to reflect the committed changes.
Thanks guys. For the final commit I moved the v4i64 unpack shuffle matching after the byte-shift detection, because it was interfering with single-input shuffle matching; otherwise we were creating a zero register for no purpose.
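The matcher-ordering issue in the last comment can be illustrated with a single v4i64 mask that two strategies both accept. Per 128-bit lane, vpslldq by 8 bytes turns <X0, X1, X2, X3> into <0, X0, 0, X2>, and vpunpcklqdq(Zero, X) yields the same <0, X0, 0, X2>, but only after a zero vector has been materialized. If the first matching strategy wins, checking the byte shift before the unpack avoids the extra zero register. The mask is a hypothetical example (we don't know which test case motivated the reorder), and the Strategy table, the kZero sentinel and the function names are assumptions, not LLVM code.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical v4i64 mask encoding: 0..3 selects X[i] from the single input.
constexpr int kUndef = -1;
constexpr int kZero = -2;  // assumed sentinel: element known to be zero

bool isZero(int M) { return M == kUndef || M == kZero; }
bool isElt(int M, int Src) { return M == kUndef || M == Src; }

// vpslldq/vpsrldq ymm by 8 bytes (one qword per 128-bit lane):
// left gives <Z, X0, Z, X2>, right gives <X1, Z, X3, Z>.
bool matchQwordByteShift(const std::vector<int> &M) {
  bool Left = isZero(M[0]) && isElt(M[1], 0) && isZero(M[2]) && isElt(M[3], 2);
  bool Right = isElt(M[0], 1) && isZero(M[1]) && isElt(M[2], 3) && isZero(M[3]);
  return Left || Right;
}

// vpunpcklqdq(A, B) = <A0, B0, A2, B2>, with one operand a zero vector.
bool matchUnpackLoWithZero(const std::vector<int> &M) {
  bool ZeroFirst = isZero(M[0]) && isElt(M[1], 0) && isZero(M[2]) && isElt(M[3], 2);
  bool ZeroSecond = isElt(M[0], 0) && isZero(M[1]) && isElt(M[2], 2) && isZero(M[3]);
  return ZeroFirst || ZeroSecond;
}

struct Strategy {
  std::string Name;
  bool NeedsZeroReg;  // must a zero vector be materialized?
  std::function<bool(const std::vector<int> &)> Match;
};

// The first matching strategy wins, so the order decides the emitted code.
const Strategy *lower(const std::vector<Strategy> &Order,
                      const std::vector<int> &M) {
  for (const Strategy &S : Order)
    if (S.Match(M))
      return &S;
  return nullptr;
}
```

With the mask <kZero, 0, kZero, 2>, an order that tries the unpack first selects vpunpcklqdq and needs a zero register, while trying the byte shift first selects vpslldq with no zero register.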
Revision Contents
Diff 19843
lib/Target/X86/X86ISelLowering.cpp
lib/Target/X86/X86InstrSSE.td
test/CodeGen/X86/vector-shuffle-256-v16.ll
test/CodeGen/X86/vector-shuffle-256-v32.ll
test/CodeGen/X86/vector-shuffle-256-v4.ll
test/CodeGen/X86/vector-shuffle-256-v8.ll
LLVM style usually uses pre-increment (++i) for loop iterators. The same applies to a couple of the other loops.