This is an archive of the discontinued LLVM Phabricator instance.

[LV] Remove nondeterminacy by changing LoopVectorizationLegality::Reductions from DenseMap to MapVector
ClosedPublic

Authored by wmi on Jan 27 2020, 8:46 AM.

Details

Summary

The iteration order of LoopVectorizationLegality::Reductions matters for the final code generation, so we better use MapVector instead of DenseMap for it to remove the nondeterminacy. reduction-order.ll in the patch is an example reduced from the case we saw. In the output of opt command, the order of the select instructions in the vector.body block keeps changing from run to run currently.

Diff Detail

Repository
rL LLVM

Event Timeline

wmi created this revision.Jan 27 2020, 8:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 27 2020, 8:46 AM
Herald added a subscriber: rkruppe. · View Herald Transcript
fhahn accepted this revision.Jan 27 2020, 8:59 AM
fhahn added reviewers: Ayal, gilr.
fhahn added a subscriber: fhahn.

LGTM thanks. I've added a few nits to the test.

llvm/test/Transforms/LoopVectorize/reduction-order.ll
5

If the test is x86 specific, please move to LoopVectorize/X86. But it is probably enough to drop the triple and pass -force-vector-width= and -force-vector-interleave directly.

11

Given that test explicitly cares about ordering I think it would be better to include the operands for the adds as well.

35

Is this chain needed? also the test is probably more robust if you use concrete values instead of undef. Same for the loop condition.

46

Metadata not needed?

This revision is now accepted and ready to land.Jan 27 2020, 8:59 AM
wmi marked 5 inline comments as done.Jan 27 2020, 9:26 AM
wmi added inline comments.
llvm/test/Transforms/LoopVectorize/reduction-order.ll
35

Ah, I thought bugpoint should already remove all the unnecessary instructions but seems not. I try and find the chain is not needed. Thanks for pointing that out.

wmi updated this revision to Diff 240616.Jan 27 2020, 9:28 AM
wmi marked an inline comment as done.

Address Florian's comments.

Out of curiosity - does this test case fail reliably with LLVM_REVERSE_ITERATION enabled (before the patch to fix the iteration order)? (or with it disabled, etc) or is it noisy failing (pretty even % of the time?) even without that?

This revision was automatically updated to reflect the committed changes.
Ayal added a comment.Jan 30 2020, 12:54 AM

Culprit is the following from LoopVectorize.cpp:

// Finally, if tail is folded by masking, introduce selects between the phi
// and the live-out instruction of each reduction, at the end of the latch.
if (CM.foldTailByMasking()) {
  Builder.setInsertPoint(VPBB);
  auto *Cond = RecipeBuilder.createBlockInMask(OrigLoop->getHeader(), Plan);
  for (auto &Reduction : *Legal->getReductionVars()) {
    VPValue *Phi = Plan->getVPValue(Reduction.first);
    VPValue *Red = Plan->getVPValue(Reduction.second.getLoopExitInstr());
    Builder.createNaryOp(Instruction::Select, {Cond, Red, Phi});
  }
}

Hence small estimated trip-count based on branch weights in the above test is used to trigger foldTailByMasking. Another, more direct alternative is to specify -prefer-predicate-over-epilog.