In most test changes this allows us to drop some broadcasts/shuffles.
I think I got the logic right (at least, I have already caught the obvious bugs I had).
[DAGCombine][X86][ARM] EXTRACT_SUBVECTOR(VECTOR_SHUFFLE(?,?,Mask)) -> VECTOR_SHUFFLE(EXTRACT_SUBVECTOR(?, ?), EXTRACT_SUBVECTOR(?, ?), Mask')
Authored by lebedev.ri on Jun 11 2021, 3:31 PM.
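The transform in the title can be sketched at the level of shuffle masks. This is a hypothetical Python model (`narrow_extract_of_shuffle` is not an LLVM API and not the code in this patch). It assumes the extract index and the number of source lanes are multiples of the extract width. It maps every extracted mask lane to a width-aligned chunk of one of the two original operands and gives up when more than two chunks would be needed. A real combine would also have to check target legality and profitability, which this sketch ignores.

```python
from typing import List, Optional, Tuple

UNDEF = -1  # shuffle mask sentinel for an undefined lane


def narrow_extract_of_shuffle(
    mask: List[int], num_elts: int, extract_idx: int, width: int
) -> Optional[Tuple[List[Tuple[int, int]], List[int]]]:
    """Rewrite extract_subvector(vector_shuffle(A, B, mask), extract_idx) of
    `width` lanes as vector_shuffle(narrow0, narrow1, new_mask).

    Mask values 0..num_elts-1 select lanes of A, num_elts..2*num_elts-1 select
    lanes of B. Each narrow operand is returned as (operand, start), meaning
    extract_subvector(A if operand == 0 else B, start). Returns None when the
    extracted lanes would need more than two narrow operands.
    """
    assert extract_idx % width == 0 and num_elts % width == 0
    sources: List[Tuple[int, int]] = []
    new_mask: List[int] = []
    for m in mask[extract_idx:extract_idx + width]:
        if m == UNDEF:
            new_mask.append(UNDEF)
            continue
        operand, lane = divmod(m, num_elts)
        key = (operand, (lane // width) * width)  # width-aligned chunk holding the lane
        if key not in sources:
            if len(sources) == 2:
                return None  # a two-operand shuffle cannot reach a third chunk
            sources.append(key)
        new_mask.append(sources.index(key) * width + lane % width)
    return sources, new_mask


# t43 = extract_subvector t6, 0 where t6 = vector_shuffle<0,0,0,0,0,0,0,0> t4, undef
print(narrow_extract_of_shuffle([0] * 8, 8, 0, 4))  # ([(0, 0)], [0, 0, 0, 0])
```

When only one narrow operand is returned, the second operand of the new shuffle would simply be undef, as in the splat case above.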
Many remaining cases are rotates, with a pattern like this:
  Combining: t0: ch = EntryToken
  Optimized vector-legalized selection DAG: %bb.0 'splatvar_funnnel_v8i32:'
  SelectionDAG has 26 nodes:
  t0: ch = EntryToken
  t2: v8i32,ch = CopyFromReg t0, Register:v8i32 %0
  t25: v2i64 = zero_extend_vector_inreg t45
  t26: v4i32 = bitcast t25
  t27: v8i32 = X86ISD::VSHL t2, t26
  t40: v4i32 = BUILD_VECTOR Constant:i32<32>, Constant:i32<32>, Constant:i32<32>, Constant:i32<32>
  t38: v4i32 = sub t40, t45
  t30: v2i64 = zero_extend_vector_inreg t38
  t31: v4i32 = bitcast t30
  t32: v8i32 = X86ISD::VSRL t2, t31
  t21: v8i32 = or t27, t32
  t10: ch,glue = CopyToReg t0, Register:v8i32 $ymm0, t21
  t4: v8i32,ch = CopyFromReg t0, Register:v8i32 %1
  t6: v8i32 = vector_shuffle<0,0,0,0,0,0,0,0> t4, undef:v8i32
  t43: v4i32 = extract_subvector t6, Constant:i64<0>
  t47: v4i32 = BUILD_VECTOR Constant:i32<31>, Constant:i32<31>, Constant:i32<31>, Constant:i32<31>
  t45: v4i32 = and t43, t47
  t11: ch = X86ISD::RET_FLAG t10, TargetConstant:i32<0>, Register:v8i32 $ymm0, t10:1
Let's suppose we start at t27: v8i32 = X86ISD::VSHL t2, t26, and demand the 0th element of the shift amount t26.
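The demanded-elements walk from t26 down to t4 can be modelled with a few lane-mapping rules. This is a hypothetical Python sketch, not LLVM's actual demanded-elements code: the helper names are made up, it assumes both sides of a bitcast have the same total bit width, and it models only the node kinds that appear in this chain. Starting from element 0 of t26, the only lane ever needed from t4 is lane 0, which suggests the splat vector_shuffle t6 contributes nothing on this path.

```python
def bitcast_demanded(demanded, dst_elts, src_elts):
    """Lanes of a bitcast's source that overlap the demanded result lanes.

    Assumes both types have the same total width, so lane i of either type
    covers the i-th equal slice of the bits.
    """
    out = set()
    for d in demanded:
        if src_elts >= dst_elts:
            ratio = src_elts // dst_elts  # several narrow source lanes per result lane
            out.update(range(d * ratio, (d + 1) * ratio))
        else:
            ratio = dst_elts // src_elts  # several result lanes per wide source lane
            out.add(d // ratio)
    return out


def zext_inreg_demanded(demanded):
    # Result lane i of zero_extend_vector_inreg is built from source lane i.
    return set(demanded)


def extract_subvector_demanded(demanded, idx):
    # Result lane i of extract_subvector(X, idx) is lane idx + i of X.
    return {idx + d for d in demanded}


def shuffle_demanded(demanded, mask, num_elts):
    # Split demanded result lanes into demanded lanes of the two shuffle operands.
    lhs, rhs = set(), set()
    for d in demanded:
        m = mask[d]
        if m < 0:
            continue  # undef lane: nothing demanded
        (lhs if m < num_elts else rhs).add(m % num_elts)
    return lhs, rhs


# Walk the chain from the dump: t26 <- t25 <- t45 <- t43 <- t6 <- t4.
t26 = {0}
t25 = bitcast_demanded(t26, dst_elts=4, src_elts=2)  # v4i32 result of a v2i64 source
t45 = zext_inreg_demanded(t25)                       # v2i64 result of a v4i32 source
t43 = set(t45)                                       # 'and' is lane-wise
t6 = extract_subvector_demanded(t43, idx=0)
t4, _ = shuffle_demanded(t6, [0] * 8, num_elts=8)
print(t25, t45, t43, t6, t4)  # {0} {0} {0} {0} {0}
```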
It looks like SimplifyMultipleUseDemandedBits() could theoretically deal with it.
What if the ZERO_EXTEND_VECTOR_INREG case is extended to just bitcast the source when we only need the 0th element and the upper source elements aliasing that 0th element are known to be zero?
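The condition in this suggestion can be checked on concrete lanes. This is a hypothetical Python model, assuming the v4i32 to v2i64 case from the dump and a little-endian lane layout (as on x86). zero_extend_vector_inreg and a plain bitcast produce the same i64 lane 0 exactly when the i32 lane aliasing its upper half (lane 1) is zero. It only models values; in SelectionDAG the "known to be zero" part would come from known-bits analysis rather than from concrete values.

```python
MASK32 = (1 << 32) - 1


def zext_inreg_v4i32_to_v2i64(x):
    # zero_extend_vector_inreg: the low two i32 lanes become zero-extended i64 lanes.
    return [x[0] & MASK32, x[1] & MASK32]


def bitcast_v4i32_to_v2i64(x):
    # Little-endian bitcast: i32 lane 2k is the low half of i64 lane k.
    return [(x[2 * k] & MASK32) | ((x[2 * k + 1] & MASK32) << 32) for k in range(2)]


def lane0_agrees(x):
    # Only i64 lane 0 is demanded; the two forms agree there exactly when the
    # i32 lane aliasing its upper half (lane 1) is zero.
    return zext_inreg_v4i32_to_v2i64(x)[0] == bitcast_v4i32_to_v2i64(x)[0]


print(lane0_agrees([31, 0, 7, 9]))  # True: lane 1 is zero
print(lane0_agrees([31, 5, 0, 0]))  # False: lane 1 lands in the upper 32 bits
```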
Sorry for not getting back to this. I've been meaning to take a look at the x86 test changes; the fact that they mostly come from the same funnel-shift/rotation lowering code suggests we've just missed something for splat values in there.