This mirrors a similar shufflevector transformation so the same
effect is obtained for scalable vectors. The transformation is
only performed when it can be proven the number of resulting
reversals is not increased. By bubbling the reversals from operand
to result this should typically be the case and ideally leads to
back-back shuffles that can be elimitated entirely.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
2340 | Would it be possible to modify one of the existing tests to have flags so we can correctly defend their propagation? We now have code paths where we depend upon the fast-math flags being set correctly on the select by the loop vectoriser. | |
2354 | I don't know how far you want to go with this, but you can support more than just splats for the false and true vals. For example, we can also do the transformation when FVal is an arbitrary fixed-width vector constant, i.e. define <4 x i32> @select_reverse(<4 x i1> %a, <4 x i32> %b, <4 x i32> %c) { %a.rev = tail call <4 x i1> @llvm.experimental.vector.reverse.v4i1(<4 x i1> %a) %b.rev = tail call <4 x i32> @llvm.experimental.vector.reverse.v4i32(<4 x i32> %b) %select = select <4 x i1> %a.rev, <4 x i32> %b.rev, <4 x i32> <i32 0, i32 1, i32 2, i32 3> ret <4 x i32> %select } |
Fixed typo in comment and converted reverse_select_reverse to show flags get preserved.
llvm/lib/Transforms/InstCombine/InstCombineSelect.cpp | ||
---|---|---|
2340 | I've updated reverse_select_reverse. | |
2350 | The only situation where the combine can introduce more reversals than the original input IR is when all operands have more than one use. So if one or more operands have a single use then we can be sure the output will be the same or better. For the case where the number of reversals is the same the hope is that at some point the reversals will bubble to a point where back-back reversals are seen with both then removed. | |
2354 | vector.reverse is not typically used for fixed length vectors as such intrinsics should be converted to shufflevector instructions where the existing combines will kick in. For scalable vectors is there an idiom outside of splats where the operand can be rewritten without introducing an explicit reversal? |
Would it be possible to modify one of the existing tests to have flags so we can correctly defend their propagation? We now have code paths where we depend upon the fast-math flags being set correctly on the select by the loop vectoriser.