This addresses the "TODO: handle reductions when tail is folded by masking" left by D50480.
When folding the tail (the scalar leftover iterations) by introducing an additional, masked vector iteration at the end, note first that the header phi of a reduction can remain intact: it takes care of accumulating the partial sums of the reduction across all iterations except the last one. Only this last iteration needs to accumulate the partial sums under a mask, which affects the live-out values. This can be accomplished by introducing a Select instruction that chooses between the last and the penultimate (i.e., header phi) values of the partial sums.
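To illustrate why the Select is enough, here is a minimal scalar C++ simulation of a tail-folded sum reduction. It is not LLVM IR or LoopVectorize code. The names `sumTailFolded`, `VF` and `kMaskedOffGarbage` are hypothetical, and it assumes a vectorization factor of 4 and that a masked-off lane yields some arbitrary value (modelled as a fixed stand-in). The backedge keeps the unmasked partial sums, as the header phi does. The live-out is a per-lane select between the new partial sums and the phi. The select is computed in every iteration, but since the mask is all-true except in the last one, only the last iteration's select matters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed vectorization factor for this sketch.
constexpr std::size_t VF = 4;
using Vec = std::array<long, VF>;

// Hypothetical stand-in for what a masked-off lane produces; it must never
// reach the final result.
constexpr long kMaskedOffGarbage = 1000;

// Sums `a` as a vector loop with the tail folded by masking:
// ceil(n / VF) vector iterations, the last of which may be partially masked.
long sumTailFolded(const std::vector<long>& a) {
  const std::size_t n = a.size();
  Vec phi{};      // header phi: per-lane partial sums, starting at 0
  Vec liveOut{};  // value leaving the loop (the selected partial sums)
  for (std::size_t iv = 0; iv < n; iv += VF) {
    std::array<bool, VF> mask{};
    Vec next{};
    for (std::size_t lane = 0; lane < VF; ++lane) {
      mask[lane] = iv + lane < n;  // lane is within the original trip count
      // Masked load: out-of-bounds lanes do not read `a`.
      const long elt = mask[lane] ? a[iv + lane] : kMaskedOffGarbage;
      next[lane] = phi[lane] + elt;  // unmasked add, as in the loop body
    }
    // Select between the last partial sums and the header phi, per lane.
    for (std::size_t lane = 0; lane < VF; ++lane)
      liveOut[lane] = mask[lane] ? next[lane] : phi[lane];
    phi = next;  // backedge stays unmasked; the header phi is left intact
  }
  // Horizontal reduction of the live-out vector after the loop.
  return std::accumulate(liveOut.begin(), liveOut.end(), 0L);
}
```

For `a = {1, ..., 10}` the loop runs three vector iterations, the last with two lanes masked off, and the sketch returns 55. Summing `phi` instead of `liveOut` after the loop would return 2055, because the two masked-off stand-in values would leak into the result.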