This is a work in progress to see whether we can move toward supporting D36454 and other cases where narrowing gives us smaller operations, enabling EVEX->VEX.
There are really two components to this patch: combining two layers of extract_subvector/insert_subvector, and moving a subvector extract through an operation to its inputs.
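
To illustrate why both components are safe, here is a minimal standalone sketch. It is a toy model, not the patch's SelectionDAG code: `Vec`, `extractSubvector`, `add`, and the two check functions are hypothetical names invented for this example, and a lane-wise add stands in for whatever operation the extract is moved through. It only covers the extract-of-extract half of the first component and assumes the operation is purely element-wise.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Toy stand-in for a fixed-width vector of 32-bit lanes (not an LLVM type).
template <std::size_t N>
using Vec = std::array<std::uint32_t, N>;

// Models extract_subvector: copy M contiguous lanes starting at lane Idx.
template <std::size_t M, std::size_t N>
Vec<M> extractSubvector(const Vec<N> &V, std::size_t Idx) {
  Vec<M> R{};
  for (std::size_t I = 0; I < M; ++I)
    R[I] = V[Idx + I];
  return R;
}

// Lane-wise add; stands in for any element-wise binary operation.
template <std::size_t N>
Vec<N> add(const Vec<N> &A, const Vec<N> &B) {
  Vec<N> R{};
  for (std::size_t I = 0; I < N; ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Component 1: two layers of extract fold into one with a summed index.
bool nestedExtractsFold() {
  Vec<16> V{};
  for (std::size_t I = 0; I < 16; ++I)
    V[I] = static_cast<std::uint32_t>(I * 7 + 3);
  // Upper 8 lanes of V, then upper 4 lanes of that result...
  Vec<4> TwoStep = extractSubvector<4>(extractSubvector<8>(V, 8), 4);
  // ...is the same as extracting lanes 12..15 directly.
  Vec<4> OneStep = extractSubvector<4>(V, 8 + 4);
  return TwoStep == OneStep;
}

// Component 2: extract(op(A, B)) == op(extract(A), extract(B)) for an
// element-wise op, so the op itself can run at the narrower width.
bool extractCommutesWithAdd() {
  Vec<16> A{}, B{};
  for (std::size_t I = 0; I < 16; ++I) {
    A[I] = static_cast<std::uint32_t>(I);
    B[I] = static_cast<std::uint32_t>(100 * I);
  }
  Vec<4> Wide = extractSubvector<4>(add(A, B), 0);
  Vec<4> Narrow = add(extractSubvector<4>(A, 0), extractSubvector<4>(B, 0));
  return Wide == Narrow;
}
```

In the second check the add runs on 4 lanes rather than 16, which is the kind of narrowing that could let the final operation use a smaller vector width.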
So far this successfully narrows the final reduction operation in the sad and madd tests. Ideally we'd narrow some of the earlier operations as well.
We also need to support matching horizontal binops from target shuffles in order to convert the final operation to HADD.
Should this be in DAGCombiner, guarded by an isExtractSubvectorCheap call?