binop (splat X), (splat C) --> splat (binop X, C)
binop (splat C), (splat X) --> splat (binop C, X)
We already do this in IR, and there's a similar fold for the case with two non-constant operands just above the code diff in this patch.
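Why the two rewrites are sound can be sketched with a tiny lane-wise model. This is not LLVM code: `splat`, `vbinop`, and `fold_splat_binop` are hypothetical helpers that model a vector as a Python list of integer lanes. The point is that a lane-wise binop over two splats computes the same scalar in every lane, so one scalar binop followed by one splat gives the same vector. Operand order is preserved (`binop X, C` vs. `binop C, X`), which matters for non-commutative ops such as `sub`.

```python
import operator
from typing import Callable, List

Vec = List[int]  # hypothetical model: one int per vector lane


def splat(x: int, n: int) -> Vec:
    """Broadcast scalar x into every lane of an n-lane vector."""
    return [x] * n


def vbinop(op: Callable[[int, int], int], a: Vec, b: Vec) -> Vec:
    """Vector binop: apply the scalar op lane by lane."""
    assert len(a) == len(b)
    return [op(x, y) for x, y in zip(a, b)]


def fold_splat_binop(op: Callable[[int, int], int], lhs: int, rhs: int, n: int) -> Vec:
    """binop (splat LHS), (splat RHS) --> splat (binop LHS, RHS).

    Only one scalar op is computed; operand order is kept as-is.
    """
    return splat(op(lhs, rhs), n)


if __name__ == "__main__":
    n = 4
    x, c = 7, 3  # X is the variable scalar, C the constant
    for op in (operator.add, operator.sub, operator.mul, operator.and_, operator.xor):
        # binop (splat X), (splat C) --> splat (binop X, C)
        assert vbinop(op, splat(x, n), splat(c, n)) == fold_splat_binop(op, x, c, n)
        # binop (splat C), (splat X) --> splat (binop C, X)
        assert vbinop(op, splat(c, n), splat(x, n)) == fold_splat_binop(op, c, x, n)
    print("all folds agree")
```

Because the folded form only needs lane 0 of the scalar binop before broadcasting, it also illustrates why moving the splat later can expose narrowing of the binop.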
This was discussed in D79718, and the extra shuffle in the test (llvm/test/CodeGen/X86/vector-fshl-128.ll::sink_splatvar) where it was noticed disappears because demanded elements analysis is no longer blocked. The large majority of the test diffs appear to be benign code scheduling changes, but I do see another type of win: moving the splat later allows binop narrowing in some cases. I don't see any obvious regressions; potential regressions on x86 and ARM were avoided with the INSERT_VECTOR_ELT restriction.
I think something weird happened here. We appear to be loading a constant as an immediate and moving it into a vector. Then we AND it with something that was just ANDed with a constant pool value. Could the two ANDs use a single constant pool entry?
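The question rests on AND being associative: `(x & C1) & C2 == x & (C1 & C2)`, so two constant masks applied in sequence can be merged at compile time into one constant, needing only one load. The sketch below assumes (not confirmed from the test output) that the DAG has the shape "AND with a constant-pool vector, then AND with a splatted immediate"; `and_then_and` and `single_and` are hypothetical helpers over a list-of-lanes model.

```python
from typing import List

Vec = List[int]  # hypothetical model: one int per vector lane


def splat(x: int, n: int) -> Vec:
    """Broadcast scalar x into every lane of an n-lane vector."""
    return [x] * n


def and_then_and(v: Vec, c_pool: Vec, c_imm: int) -> Vec:
    """Assumed current shape: AND with a constant-pool vector, then AND with a splatted immediate."""
    imm_vec = splat(c_imm, len(v))
    step1 = [x & c for x, c in zip(v, c_pool)]
    return [x & c for x, c in zip(step1, imm_vec)]


def single_and(v: Vec, c_pool: Vec, c_imm: int) -> Vec:
    """Merge both masks lane-wise at compile time, leaving a single AND with one constant."""
    merged = [c & c_imm for c in c_pool]
    return [x & c for x, c in zip(v, merged)]


if __name__ == "__main__":
    v = [0xFF, 0x0F, 0xF0, 0xAA]
    c_pool = [0x3C, 0xFF, 0x0F, 0xF0]
    assert and_then_and(v, c_pool, 0x55) == single_and(v, c_pool, 0x55)
    print([hex(x) for x in single_and(v, c_pool, 0x55)])
```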