This is intended to restore horizontal-op codegen to what it looked like before the IR demanded-elements improvements in:
rL343727
As noted in PR39195:
https://bugs.llvm.org/show_bug.cgi?id=39195
...horizontal ops can be worse for performance than a shuffle plus a regular binop, so I've added a TODO. Ideally, we'd solve that in a machine instruction pass, but a quicker solution may be to add a 'HasSlowHorizontalOp' feature/bug bit and deal with it here in the DAG.
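For context on the trade-off, here is a scalar C sketch of the lane semantics only. It assumes the SSE3 single-precision case (HADDPS) and shows that the horizontal add computes exactly what two shuffles (gathering even and odd lanes) followed by one ordinary vertical add compute. The function names are hypothetical, and the sketch says nothing about latency or throughput. That cost difference is what the TODO is about: on several x86 implementations the horizontal instruction is itself decoded into shuffle and add micro-ops.

```c
#include <assert.h>

/* Scalar model of the x86 SSE3 HADDPS instruction (horizontal add).
   Adjacent lanes within each source are summed:
   out = {a0+a1, a2+a3, b0+b1, b2+b3} */
static void hadd_ps_model(const float a[4], const float b[4], float out[4]) {
    out[0] = a[0] + a[1];
    out[1] = a[2] + a[3];
    out[2] = b[0] + b[1];
    out[3] = b[2] + b[3];
}

/* The alternative lowering: two shuffles gather the even and odd lanes of
   both sources, then one regular (vertical) add combines them.
   even = {a0, a2, b0, b2}, odd = {a1, a3, b1, b3}, out = even + odd */
static void shuffle_add_model(const float a[4], const float b[4], float out[4]) {
    const float even[4] = { a[0], a[2], b[0], b[2] };
    const float odd[4]  = { a[1], a[3], b[1], b[3] };
    for (int i = 0; i < 4; ++i)
        out[i] = even[i] + odd[i];
}

/* Asserts that both lowerings produce identical lanes for the given inputs
   (same operands added in the same order, so results match exactly). */
static void check_lowerings_agree(const float a[4], const float b[4]) {
    float h[4], s[4];
    hadd_ps_model(a, b, h);
    shuffle_add_model(a, b, s);
    for (int i = 0; i < 4; ++i)
        assert(h[i] == s[i]);
}
```

Since both forms are semantically interchangeable, choosing between them is purely a cost decision, which is why a per-subtarget bit (or a later machine pass) is a plausible place to make it.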