This patch adds a TTI hook that lets a target choose a scalarized shuffle-reduction sequence, rather than a vectorized one, for the reduction idiom.
This allows generating:

  %0 = extractelement <4 x float> %bin.rdx, i32 0
  %1 = extractelement <4 x float> %bin.rdx, i32 1
  %res = fadd fast float %0, %1

instead of:

  %rdx.shuf1 = shufflevector <4 x float> %bin.rdx, <4 x float> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx2 = fadd fast <4 x float> %bin.rdx, %rdx.shuf1
  %res = extractelement <4 x float> %bin.rdx2, i32 0
Hi Simon,
This patch implements your suggestion from https://reviews.llvm.org/D45393.