binop (extelt X, C), (extelt Y, C) --> extelt (binop X, Y), C
This is a transform that has been considered for canonicalization (instcombine) in the past because it reduces instruction count. But as shown in the x86 tests, it's impossible to know if it's profitable without a cost model. There are many potential target constraints to consider.
We have implemented similar transforms in the backend (DAGCombiner and target-specific), but I don't think we have this exact fold there either (and if we did it in SDAG, it wouldn't work across blocks).
Note: this patch was intended to handle the more general case where the extract indexes do not match, but it got too big, so I scaled it back to this pattern for now.
Hm, and if there is only one extract, and the other operand is constant,
we consider that vectorizing the constant in some way will always be costlier?
Just thinking out loud, not for this patch.