This patch changes how LLVM handles the accumulator/start value
in the floating-point vector reductions: the value is now always used,
regardless of whether fast-math flags are present on the call site.
To that end, the patch introduces the following new intrinsics to
replace the existing ones:
  llvm.experimental.vector.reduce.fadd -> llvm.experimental.vector.reduce.v2.fadd
  llvm.experimental.vector.reduce.fmul -> llvm.experimental.vector.reduce.v2.fmul
and adds functionality to auto-upgrade existing LLVM IR and bitcode.
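
For illustration only, the intended semantics can be modelled in scalar C++.
This is a rough sketch, not LLVM code: the function names are hypothetical,
and the pairwise tree is just one possible evaluation order a relaxed
reduction might use. The point is that the start value takes part in the
result in both the ordered and the reassociated case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar model of an ordered (strict) fadd reduction:
// fold left to right, beginning with the start value.
float reduce_fadd_ordered(float start, const std::vector<float> &v) {
  float acc = start;
  for (float x : v)
    acc += x;
  return acc;
}

// Hypothetical scalar model of a relaxed (reassociation allowed) fadd
// reduction: combine the lanes in a tree, then add the start value.
// The start value is never dropped.
float reduce_fadd_reassoc(float start, std::vector<float> v) {
  if (v.empty())
    return start;
  std::size_t n = v.size();
  while (n > 1) {
    std::size_t next = (n + 1) / 2; // lanes that survive this step
    for (std::size_t i = 0; i < n / 2; ++i)
      v[i] += v[i + next];          // fold the upper half onto the lower half
    n = next;
  }
  return start + v[0];
}

// Small check: with exactly representable inputs both orders agree,
// and the start value (10) is part of the result.
void check_examples() {
  assert(reduce_fadd_ordered(10.0f, {1.0f, 2.0f, 3.0f, 4.0f}) == 20.0f);
  assert(reduce_fadd_reassoc(10.0f, {1.0f, 2.0f, 3.0f, 4.0f}) == 20.0f);
}
```

With inputs that are not exactly representable, the two functions may round
differently, which is why the ordering has to be an explicit choice rather
than something inferred from whether the start value is present.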
Regardless of IsPairwiseForm, this computes the cost of an unordered reduction; the flag only selects between two different reduction strategies. Just passing FMF.allowReassoc() here wouldn't be meaningful. We'd need a separate flag to indicate ordered reductions.