When vectorising ordered reductions we call a function
LoopVectorizationPlanner::adjustRecipesForReductions to replace the
existing VPWidenRecipe for the fadd instruction with a new
VPReductionRecipe. We attempt to insert the new recipe in the same
place, but this is wrong because createBlockInMask may have
generated new recipes that VPReductionRecipe now depends upon. I
have changed the insertion code to append the recipe to the
VPBasicBlock instead.
Added a new RUN with tail-folding enabled to the existing test:
Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll
I think the new check lines from for.end here and in the other tests below can be removed, since we will never reach the scalar epilogue?