This patch updates the FADDPv* patterns that only use the lower half of
the result register. For those patterns, the second operand does not
matter because its results won't be used.
Instead of introducing new implicit defs for those operands, just use
the first operand. The problem with using new implicit defs is that
register allocation can introduce unnecessary dependencies by using a
different register than the first operand.
For motivating cases, see the changes in the fadd_reduction_*_in_loop
cases. Without this change, the first faddp in the loop has an
unnecessary additional dependency through v0, which is also used for
a cross-iteration reduction.
This can noticeable impact performance. For slightly bigger loops,
this change can improve performance by 15%.