The expansions of the llvm.experimental.vector.reduce.fadd/fmul intrinsics were ignoring the scalar accumulator argument, which should be added to (fadd) or multiplied with (fmul) the scalar result of the vector reduction.
For (fast) shuffle reductions we should be able to apply the accumulator at the end of the sequence.
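
As a rough illustration of the intended semantics (not the actual expansion code), the sketch below models a fast shuffle reduction on a plain std::vector and applies the accumulator once, after the last halving step. The name shuffleReduce and its signature are hypothetical, and the model assumes a non-empty, power-of-two element count and that reassociation is permitted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scalar model of a fast shuffle reduction; not LLVM code.
// Each outer iteration mimics one "shuffle the upper half down and combine"
// step, so the live width halves until a single element remains.
template <typename Op>
float shuffleReduce(float acc, std::vector<float> v, Op op) {
  // Assumption: non-empty input with a power-of-two element count.
  assert(!v.empty() && (v.size() & (v.size() - 1)) == 0);

  for (std::size_t half = v.size() / 2; half >= 1; half /= 2)
    for (std::size_t i = 0; i < half; ++i)
      v[i] = op(v[i], v[i + half]);

  // The accumulator is applied once, at the end of the sequence.
  return op(acc, v[0]);
}
```

Called with std::plus<float>() this models the fadd case, and with std::multiplies<float>() the fmul case. Folding the accumulator in last is only equivalent to the ordered result when reassociation is allowed, which is why this applies to the fast variant.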