[x86] lower extracted fadd/fsub to horizontal vector math
This would show up if we fix horizontal reductions to narrow as they go along,
but it is an improvement for code size and/or for Jaguar (fast-hops), independent of that.
We need to do this late to not interfere with other pattern matching of larger
We can extend this to integer ops in a follow-up patch.
Differential Revision: https://reviews.llvm.org/D56011