[x86] eliminate redundant shuffle of horizontal math ops when both inputs are the same
This is limited to a set of patterns based on the example in PR34111:
...but as I was investigating this, I see that horizontal patterns can go wrong in many,
many other ways that would not be handled by this patch. Each data type may even go
different in the DAG after starting with the same basic IR pattern, so even proper IR
canonicalization won't fix it all.
Differential Revision: https://reviews.llvm.org/D37357