This is more a discussion item than an actual patch right now. Curious to know what folks think about profitability on various targets.
Our current lowering of a reduction in the vectorizer creates the starting vector value as a splat of the identity element and then inserts the original scalar start as the low element. An alternative would be to start from a pure splat of the identity element and defer folding in the scalar start until after the loop.
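To make the two options concrete, here is a rough scalar model of an integer add reduction in C++. It is only a sketch: the names (`VF`, `Vec`, `sumInsertStart`, `sumDeferredStart`) are made up for illustration and are not vectorizer APIs, the vectorization factor of 4 is arbitrary, and the trip count is assumed to be a multiple of `VF` so there is no scalar remainder loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

constexpr std::size_t VF = 4;     // illustrative vectorization factor (assumption)
using Vec = std::array<int, VF>;  // stand-in for one vector register

// Horizontal add of all lanes; models the final reduction after the loop.
static int reduceAdd(const Vec &v) {
  return std::accumulate(v.begin(), v.end(), 0);
}

// Shared "vector loop" body: lane-wise accumulate data into acc.
static void accumulateLanes(Vec &acc, const std::vector<int> &data) {
  assert(data.size() % VF == 0 && "sketch assumes no scalar remainder");
  for (std::size_t i = 0; i < data.size(); i += VF)
    for (std::size_t lane = 0; lane < VF; ++lane)
      acc[lane] += data[i + lane];
}

// Current lowering: splat the identity, then insert the scalar start in lane 0.
int sumInsertStart(int start, const std::vector<int> &data) {
  Vec acc{};       // splat of the identity element (0 for add)
  acc[0] = start;  // scalar-to-vector insert before the loop
  accumulateLanes(acc, data);
  return reduceAdd(acc);
}

// Alternative: splat the identity only, and fold the start in after the loop.
int sumDeferredStart(int start, const std::vector<int> &data) {
  Vec acc{};       // splat of the identity element, no insert
  accumulateLanes(acc, data);
  return reduceAdd(acc) + start;  // start must stay live across the loop
}
```

Both functions compute the same result; the difference is whether `start` is consumed before the loop (one insert) or after it (one extra use once the loop is done, with a longer live range).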
RISC-V has an interesting quirk in its reduction instructions: the start value of the recurrence must be provided as an operand. As a result, the deferred-add strategy fits slightly better with the ISA, and generally allows us to get rid of one scalar-to-vector insert. It's worth noting that this comes at the cost of extending the live range of the scalar start value.
I glanced at AArch64 for comparison, and all of the unordered reductions appear to take a single vector operand. So in this case the deferred-add strategy would cost us a longer live range and an extra scalar add. This doesn't appear profitable unless I'm missing something.
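The difference between the two targets can be sketched the same way. The helpers below are hypothetical models of the two instruction shapes, not real intrinsics: `reduceWithStart` stands in for a reduction that takes a scalar start operand (the RISC-V shape), and `reduceNoStart` for one that takes only the vector (the AArch64 shape as I read it).

```cpp
#include <array>
#include <numeric>

using Vec4 = std::array<int, 4>;  // stand-in for one vector register

// RISC-V-like shape: the reduction itself consumes a scalar start value.
int reduceWithStart(int start, const Vec4 &v) {
  return std::accumulate(v.begin(), v.end(), start);
}

// AArch64-like shape: the reduction has a single vector operand.
int reduceNoStart(const Vec4 &v) {
  return std::accumulate(v.begin(), v.end(), 0);
}

// Deferred-add epilogue when the reduction takes a start operand:
// the scalar start folds into the reduction at no extra cost.
int epilogueWithStartOperand(int start, const Vec4 &acc) {
  return reduceWithStart(start, acc);
}

// Deferred-add epilogue when the reduction has no start operand:
// an extra scalar add is needed after the reduction.
int epilogueSingleOperand(int start, const Vec4 &acc) {
  return reduceNoStart(acc) + start;
}
```

In this model the first epilogue needs no additional operation for `start`, while the second pays one scalar add on top of the longer live range.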
What do our other targets prefer here? Is this a case where we should just have a target hook, or is there something smarter we can do with a heuristic?