This is a partial fix for:
...as seen in the integer test, we still need to correct the result when using the existing (old) horizontal-op matching function, because it does not model how x86 256-bit horizontal ops return results: each 128-bit half is computed as its own horizontal op. A potential follow-up change for that is discussed in the bug report.
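To make the lane issue concrete, here is a minimal scalar sketch of the two result layouts for an 8 x i32 horizontal add. It is not code from the patch: the names `V8`, `flatHAdd`, and `x86HAdd256` are hypothetical, and the second function reflects my understanding of how a 256-bit integer horizontal add (for example `vphaddd`) orders its results.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for an 8 x i32 (256-bit) vector.
using V8 = std::array<int32_t, 8>;

// "Flat" layout: all pair sums from A fill the low half of the result and
// all pair sums from B fill the high half. This is the layout a matcher
// would assume if it ignored 128-bit lanes.
V8 flatHAdd(const V8 &A, const V8 &B) {
  V8 R{};
  for (int I = 0; I < 4; ++I) {
    R[I] = A[2 * I] + A[2 * I + 1];
    R[I + 4] = B[2 * I] + B[2 * I + 1];
  }
  return R;
}

// Per-lane layout: each 128-bit half (4 elements) is its own horizontal add,
// so every lane holds two pair sums from A followed by two pair sums from B.
V8 x86HAdd256(const V8 &A, const V8 &B) {
  V8 R{};
  for (int Lane = 0; Lane < 2; ++Lane) {
    const int Base = Lane * 4;
    for (int I = 0; I < 2; ++I) {
      R[Base + I] = A[Base + 2 * I] + A[Base + 2 * I + 1];
      R[Base + 2 + I] = B[Base + 2 * I] + B[Base + 2 * I + 1];
    }
  }
  return R;
}
```

With A = {0..7} and B = {10..17}, the two functions agree on elements 0, 1, 6, and 7 but disagree on the middle four, which is why a result matched with the flat assumption needs to be corrected afterwards.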
This patch duplicates much of the existing matching code, but we can't simply remove the old code without introducing regressions, so it is renamed and used less often. I'm hoping to get some kind of fix for the miscompile bug in before the release.
I have a follow-up patch that addresses the 'TODO' comment about allowing non-matching vector sizes between the extracts and build vector.