This is an archive of the discontinued LLVM Phabricator instance.

[LV] Remove nondeterminacy by changing LoopVectorizationLegality::Reductions from DenseMap to MapVector
Closed, Public

Authored by wmi on Jan 27 2020, 8:46 AM.

Details

Reviewers

davidxl
fhahn
Ayal
gilr

Commits

rGf60671f049bc: [LV] Remove nondeterminacy by changing LoopVectorizationLegality::Reductions…

Summary

The iteration order of LoopVectorizationLegality::Reductions affects the final code generation, so it should be a MapVector rather than a DenseMap to remove the nondeterminism. reduction-order.ll in the patch is an example reduced from the case we saw. Currently, the order of the select instructions in the vector.body block of the opt output changes from run to run.
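
The difference between the two containers can be sketched without LLVM headers. The `InsertionOrderedMap` below is a hypothetical stand-in, not `llvm::MapVector` itself. It only illustrates the general idea of pairing a hash index with a vector of entries, so that iteration follows insertion order instead of hash order (which, for pointer keys, can vary with the addresses a given run happens to allocate).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the idea behind an insertion-ordered map:
// a hash index for lookup plus a vector that records entries in the
// order they were inserted. Not the actual llvm::MapVector.
template <typename K, typename V>
class InsertionOrderedMap {
  std::unordered_map<K, std::size_t> Index; // key -> position in Entries
  std::vector<std::pair<K, V>> Entries;     // entries in insertion order

public:
  // Inserts (Key, Value) if Key is new; returns true when inserted.
  bool insert(const K &Key, const V &Value) {
    auto Result = Index.emplace(Key, Entries.size());
    if (!Result.second)
      return false;
    Entries.emplace_back(Key, Value);
    return true;
  }

  // Returns a pointer to the mapped value, or nullptr if Key is absent.
  const V *lookup(const K &Key) const {
    auto It = Index.find(Key);
    return It == Index.end() ? nullptr : &Entries[It->second].second;
  }

  // Iteration walks the vector, so it never depends on hash values.
  auto begin() const { return Entries.begin(); }
  auto end() const { return Entries.end(); }
  std::size_t size() const { return Entries.size(); }
};

// Collects the keys in iteration order so the ordering can be observed.
template <typename K, typename V>
std::vector<K> keysInOrder(const InsertionOrderedMap<K, V> &Map) {
  std::vector<K> Keys;
  for (const auto &Entry : Map)
    Keys.push_back(Entry.first);
  return Keys;
}
```

With pointer keys inserted in a fixed order, `keysInOrder` returns them in that same order on every run, regardless of where the pointed-to objects live in memory.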

Diff Detail

Repository: rL LLVM

Event Timeline

wmi created this revision. Jan 27 2020, 8:46 AM

Herald added a project: Restricted Project. Jan 27 2020, 8:46 AM

Herald added a subscriber: rkruppe.

LGTM, thanks. I've added a few nits to the test.

llvm/test/Transforms/LoopVectorize/reduction-order.ll
5	If the test is x86-specific, please move it to LoopVectorize/X86. But it is probably enough to drop the triple and pass -force-vector-width= and -force-vector-interleave= directly.
11	Given that the test explicitly cares about ordering, I think it would be better to include the operands for the adds as well.
35	Is this chain needed? Also, the test is probably more robust if you use concrete values instead of undef. Same for the loop condition.
46	Metadata not needed?

This revision is now accepted and ready to land. Jan 27 2020, 8:59 AM

wmi marked 5 inline comments as done. Jan 27 2020, 9:26 AM

wmi added inline comments.

llvm/test/Transforms/LoopVectorize/reduction-order.ll
35	Ah, I thought bugpoint would already have removed all the unnecessary instructions, but it seems not. I tried and found that the chain is not needed. Thanks for pointing that out.

Address Florian's comments.

Out of curiosity: does this test case fail reliably with LLVM_REVERSE_ITERATION enabled (before the patch that fixes the iteration order), or with it disabled? Or does it fail intermittently, at a roughly even rate, even without that?

Closed by commit rGf60671f049bc: [LV] Remove nondeterminacy by changing LoopVectorizationLegality::Reductions… (authored by wmi). Jan 27 2020, 4:56 PM

This revision was automatically updated to reflect the committed changes.

The culprit is the following code from LoopVectorize.cpp:

// Finally, if tail is folded by masking, introduce selects between the phi
// and the live-out instruction of each reduction, at the end of the latch.
if (CM.foldTailByMasking()) {
  Builder.setInsertPoint(VPBB);
  auto *Cond = RecipeBuilder.createBlockInMask(OrigLoop->getHeader(), Plan);
  for (auto &Reduction : *Legal->getReductionVars()) {
    VPValue *Phi = Plan->getVPValue(Reduction.first);
    VPValue *Red = Plan->getVPValue(Reduction.second.getLoopExitInstr());
    Builder.createNaryOp(Instruction::Select, {Cond, Red, Phi});
  }
}

Hence the small estimated trip count, derived from the branch weights in the above test, is what triggers foldTailByMasking. A more direct alternative is to specify -prefer-predicate-over-epilog.
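
To see why the container's iteration order leaks into the output, here is a toy model of the loop quoted above. `ToyReduction` and `emitSelects` are hypothetical names, not LLVM APIs. The sketch only mimics the shape of that loop by emitting one textual select per reduction in whatever order the container is iterated.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a reduction: the name of its phi and the name of
// its loop-exit value. Not an LLVM type.
struct ToyReduction {
  std::string Phi;
  std::string LoopExit;
};

// Emits one select per reduction, in the iteration order of Reductions.
// Mirrors the operand order of the quoted loop: select(Cond, Red, Phi).
std::vector<std::string>
emitSelects(const std::vector<ToyReduction> &Reductions,
            const std::string &Cond) {
  std::vector<std::string> Out;
  for (const ToyReduction &R : Reductions)
    Out.push_back("select " + Cond + ", " + R.LoopExit + ", " + R.Phi);
  return Out;
}
```

Feeding the same reductions in a different order yields the same selects in a different order, which is the kind of run-to-run difference the CHECK lines in reduction-order.ll are meant to catch.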

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h	2 lines
llvm/test/Transforms/LoopVectorize/reduction-order.ll	48 lines

Diff 240608

llvm/include/llvm/Transforms/Vectorize/LoopVectorizationLegality.h

[202 earlier lines not shown; hunk context: LoopVectorizationLegality(]
       LoopInfo *LI, OptimizationRemarkEmitter *ORE,
       LoopVectorizationRequirements *R, LoopVectorizeHints *H, DemandedBits *DB,
       AssumptionCache *AC)
       : TheLoop(L), LI(LI), PSE(PSE), TTI(TTI), TLI(TLI), DT(DT),
         GetLAA(GetLAA), ORE(ORE), Requirements(R), Hints(H), DB(DB), AC(AC) {}

   /// ReductionList contains the reduction descriptors for all
   /// of the reductions that were found in the loop.
-  using ReductionList = DenseMap<PHINode *, RecurrenceDescriptor>;
+  using ReductionList = MapVector<PHINode *, RecurrenceDescriptor>;

   /// InductionList saves induction variables and maps them to the
   /// induction descriptor.
   using InductionList = MapVector<PHINode *, InductionDescriptor>;

   /// RecurrenceSet contains the phi nodes that are recurrences other than
   /// inductions and reductions.
   using RecurrenceSet = SmallPtrSet<const PHINode *, 8>;
[266 later lines not shown]

llvm/test/Transforms/LoopVectorize/reduction-order.ll

This file was added.

				; RUN: opt -loop-vectorize -S < %s 2>&1 | FileCheck %s
				; RUN: opt -passes='loop-vectorize' -S < %s 2>&1 | FileCheck %s

				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-grtev4-linux-gnu"

				; Make sure the selects generated from reduction are always emitted
				; in deterministic order.
				; CHECK-LABEL: @foo(
				; CHECK: vector.body:
				; CHECK: %[[VAR1:.*]] = add <4 x i32>
				; CHECK-NEXT: = add <4 x i32>
				; CHECK-NEXT: = add <4 x i32>
				; CHECK-NEXT: = add <4 x i32>
				; CHECK-NEXT: %[[VAR2:.*]] = add <4 x i32>
				; CHECK: select <4 x i1> {{.*}}, <4 x i32> %[[VAR2]], <4 x i32>
				; CHECK-NEXT: select <4 x i1> {{.*}}, <4 x i32> %[[VAR1]], <4 x i32>
				; CHECK: br i1 {{.*}}, label %middle.block, label %vector.body
				;
				define internal i64 @foo(i32* %t0) !prof !1 {
				t16:
				br label %t20

				t17: ; preds = %t20
				%t18 = phi i32 [ %t24, %t20 ]
				%t19 = phi i32 [ %t28, %t20 ]
				br label %t31

				t20: ; preds = %t20, %t16
				%t21 = phi i64 [ 0, %t16 ], [ %t29, %t20 ]
				%t22 = phi i32 [ undef, %t16 ], [ %t28, %t20 ]
				%t23 = phi i32 [ 0, %t16 ], [ %t24, %t20 ]
				%t24 = add i32 undef, %t23
				%t25 = add i32 %t22, undef
				%t26 = add i32 %t25, undef
				%t27 = add i32 %t26, undef
				%t28 = add i32 %t27, undef
				%t29 = add nuw nsw i64 %t21, 1
				%t30 = icmp eq i64 %t29, undef
				br i1 %t30, label %t17, label %t20, !prof !2

				t31:
				ret i64 undef
				}

				!0 = !{!"clang version google3-trunk (fe5f233a938f5bc31c458c39cca54d7dcc2667ef)"}
				!1 = !{!"function_entry_count", i64 801}
				!2 = !{!"branch_weights", i32 746, i32 1}