The iteration order of LoopVectorizationLegality::Reductions matters for the final code generation, so we better use MapVector instead of DenseMap for it to remove the nondeterminacy. reduction-order.ll in the patch is an example reduced from the case we saw. In the output of opt command, the order of the select instructions in the vector.body block keeps changing from run to run currently.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
LGTM thanks. I've added a few nits to the test.
llvm/test/Transforms/LoopVectorize/reduction-order.ll | ||
---|---|---|
5 | If the test is x86 specific, please move to LoopVectorize/X86. But it is probably enough to drop the triple and pass -force-vector-width= and -force-vector-interleave directly. | |
11 | Given that test explicitly cares about ordering I think it would be better to include the operands for the adds as well. | |
35 | Is this chain needed? also the test is probably more robust if you use concrete values instead of undef. Same for the loop condition. | |
46 | Metadata not needed? |
llvm/test/Transforms/LoopVectorize/reduction-order.ll | ||
---|---|---|
35 | Ah, I thought bugpoint should already remove all the unnecessary instructions but seems not. I try and find the chain is not needed. Thanks for pointing that out. |
Out of curiosity - does this test case fail reliably with LLVM_REVERSE_ITERATION enabled (before the patch to fix the iteration order)? (or with it disabled, etc) or is it noisy failing (pretty even % of the time?) even without that?
Culprit is the following from LoopVectorize.cpp:
// Finally, if tail is folded by masking, introduce selects between the phi // and the live-out instruction of each reduction, at the end of the latch. if (CM.foldTailByMasking()) { Builder.setInsertPoint(VPBB); auto *Cond = RecipeBuilder.createBlockInMask(OrigLoop->getHeader(), Plan); for (auto &Reduction : *Legal->getReductionVars()) { VPValue *Phi = Plan->getVPValue(Reduction.first); VPValue *Red = Plan->getVPValue(Reduction.second.getLoopExitInstr()); Builder.createNaryOp(Instruction::Select, {Cond, Red, Phi}); } }
Hence small estimated trip-count based on branch weights in the above test is used to trigger foldTailByMasking. Another, more direct alternative is to specify -prefer-predicate-over-epilog.
If the test is x86 specific, please move to LoopVectorize/X86. But it is probably enough to drop the triple and pass -force-vector-width= and -force-vector-interleave directly.