I started drafting a patch for this as an add-on to SimplifyDemandedVectorElts() that would have been similar to SimplifyMultipleUseDemandedBits(), but since I don't know whether any more folds like this exist, a larger patch for this special case seems unnecessary.
The problem arises because we (correctly, I think) bail out of SimplifyDemandedVectorElts() if a value has multiple uses. But then we call recognizeIdentityMask(), which ignores undef shuffle mask elements. That means we might replace a shuffle with an insertelement that doesn't need to exist. At that point, there's no going back: we've lost the information (carried by the shuffle mask) that would have let other transforms know the insertelement was not actually needed. If we don't ignore undefs in recognizeIdentityMask(), we'll currently fail to simplify several other patterns, so chasing down all of those didn't seem like a better option.
This can manifest as bad codegen for things like horizontal ops (PR34111) because the bogus inserts in IR can't be ignored by the SLP vectorizer, and then the backend doesn't have enough pattern matching to see the redundancies:
define <4 x float> @add_ps_002(<4 x float> %a) {
  %a0 = extractelement <4 x float> %a, i32 0
  %a1 = extractelement <4 x float> %a, i32 1
  %a2 = extractelement <4 x float> %a, i32 2
  %a3 = extractelement <4 x float> %a, i32 3
  %a0_again = extractelement <4 x float> %a, i32 0
  %a1_again = extractelement <4 x float> %a, i32 1
  %a2_again = extractelement <4 x float> %a, i32 2
  %a3_again = extractelement <4 x float> %a, i32 3
  %add01 = fadd float %a0, %a1
  %add23 = fadd float %a2, %a3
  %add01_again = fadd float %a0_again, %a1_again
  %add23_again = fadd float %a2_again, %a3_again
  %out0 = insertelement <4 x float> undef, float %add01, i32 0
  %out01 = insertelement <4 x float> %out0, float %add23, i32 1
  %out012 = insertelement <4 x float> %out01, float %add01_again, i32 2
  %out0123 = insertelement <4 x float> %out012, float %add23_again, i32 3
  %shuffle = shufflevector <4 x float> %out0123, <4 x float> %a, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  ret <4 x float> %shuffle
}
$ ./opt -instcombine -slp-vectorizer -instcombine -S hadd_bogus_undef.ll | ./llc -o - -mattr=avx
vmovshdup %xmm0, %xmm1             ## xmm1 = xmm0[1,1,3,3]               <-- didn't need that...
vaddss %xmm1, %xmm0, %xmm1                                               <-- or that...
vhaddps %xmm0, %xmm0, %xmm0
vinsertps $32, %xmm1, %xmm0, %xmm0 ## xmm0 = xmm0[0,1],xmm1[0],xmm0[3]   <-- or that