This patch adds a combine to attempt to reduce the costs of certain
select-shuffle patterns. The form of code it attempts to detect is:
%x = shuffle ... %y = shuffle ... %a = binop %x, %y %b = binop %x, %y shuffle %a, %b, selectmask
A classic select-mask will pick items from each lane of a or b. These
do not always have a great lowering on many architectures. This patch
attempt to pack a and b into the lower elements, creating a different
shuffle for reconstructing the orignal which may be better than the
select mask. This can be better for performance, especially if less
elements of a and b need to be computed and the input shuffles are
cheaper.
Because select-masks are just one form of shuffle, we generalize to any
mask. So long as the backend has decent costmodel for the shuffles, this
can generally improve things when they come up. For more basic cost
models, the folds do not appear to be profitable, not getting past the
cost checks.
nit of -> if