We can see chains of insertelement instructions before SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector.
This should solve PR29126:
https://llvm.org/bugs/show_bug.cgi?id=29126
Differential D23886
[InstCombine] fold insertelement of constant into shuffle with constant operand (PR29126) spatel on Aug 25 2016, 12:03 PM. Authored by
Details We can see chains of insertelement instructions before SSE/AVX scalar intrinsics, so this is a first step towards shrinking that to a single shufflevector. This should solve PR29126:
Diff Detail Event TimelineComment Actions This seems vaguely risky: if the backend isn't smart enough to pattern-match an arbitrary shuffle into a cheap shuffle + insert, you could end up making the code slower. (For example, I'm not sure the x86 backend can decompose an <8 x i16> shuffle into a punpcklwd+pinsrw in general.)
Comment Actions Let me write up some more tests and pass them on to a few different targets. For the x86 cases I looked at, this appeared to always be a win, but I didn't try any i16 tests.
Comment Actions This leads to a question that was raised in D22114. Which is canonical: a shufflevector with a select-equivalent mask or a select with a constant condition operand? Given: define <4 x i8> @hoo(<4 x i8> %x) { %y = shufflevector <4 x i8> %x, <4 x i8> <i8 undef, i8 5, i8 6, i8 7>, <4 x i32><i32 0, i32 7, i32 6, i32 5> ; lane-changing ret <4 x i8> %y } Should we transform to: define <4 x i8> @hoo(<4 x i8> %x) { %y = shufflevector <4 x i8> %x, <4 x i8> <i8 undef, i8 7, i8 6, i8 5>, <4 x i32><i32 0, i32 5, i32 6, i32 7> ; lane-preserving ret <4 x i8> %y } or: define <4 x i8> @hoo(<4 x i8> %x) { %y = select <4 x i1> <i1 true, i1 false, i1 false, i1 false>, <4 x i8> %x, <4 x i8> <i8 undef, i8 7, i8 6, i8 5> ret <4 x i8> %y } IR should be canonical, so we should add transforms to make one of the latter the preferred form? Comment Actions Probably a good idea to bring up the shuffle vs select discussion on llvmdev, to get more visibility; it will have a substantial impact on backends. Comment Actions Patch updated: Even without that step in place, I think we can push this patch forward by limiting the transform to select-equivalent shuffles.
Comment Actions This is looking much better.
|
I think you can write the body of this loop as "int EltVal = Shuf.getMaskValue(i); if (EltVal != -1 && EltVal != i && EltVal != i + Vecsize) return false;".