This is an archive of the discontinued LLVM Phabricator instance.

Differential D35227

[LV] Don't allow outside uses of IVs if the SCEV is predicated on loop conditions
Closed · Public

Authored by mkuper on Jul 10 2017, 4:26 PM.

Details

Reviewers

mssimpso
gilr

Commits

rGfdb46b2fb4d2: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop…
rL307837: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop…

Summary

This fixes PR33706.

I'm still not 100% sure the PSCEV actually makes sense here. Sanjoy, if you could take a look (even post-commit) at the original PR, that would be great.

Event Timeline

mkuper created this revision. · Jul 10 2017, 4:26 PM

Herald added a subscriber: mzolotukhin. · Jul 10 2017, 4:26 PM

Hi Michael,

I'm probably missing something obvious here. For external IV uses, we compute the end IV value assuming we're coming from the vector loop. But if we executed the vector loop, shouldn't the SCEV predicate have been true, which would mean that the PSCEV was a safe assumption?

In D35227#805164, @mssimpso wrote:

Hi Michael,

I'm probably missing something obvious here. For external IV uses, we compute the end IV value assuming we're coming from the vector loop. But if we executed the vector loop, shouldn't the SCEV predicate have been true, which would mean that the PSCEV was a safe assumption?

You're not necessarily missing something obvious, I may be misunderstanding what PSCEV does here. If I understand correctly, PSCEV adds the assumption the IV doesn't overflow inside the loop. In this case, what I see is that it overflows on loop exit. So, either:
a) I misunderstand what PSCEV is doing here. :-)
b) PSCEV is wrong inside the loop.
c) The assumption PSCEV makes is correct, except on exit. If this is the case, then we can't use the SCEV we get to compute the value after the last iteration.

I see. Yes, that makes sense to me, then.

This revision is now accepted and ready to land. · Jul 11 2017, 2:15 PM

I can't be sure without understanding what exactly LV is trying to do here, but I suspect it is (c). This is true not only of PSCEV but also of SCEV -- "{A,+,B} is nsw" means "the increment *operation* will not overflow if the backedge is taken" or "the increment *operation* may overflow on the last iteration".

Concretely, this is the predicate PSCEV uses to make {A,+,B} nsw:

//   Step < 0, Start - |Step| * Backedge <= Start
//   Step >= 0, Start + |Step| * Backedge > Start

So, e.g. if Backedge is 1, Start is INT_SMAX - 1 and Step is also 1, then the predicate will be true but on the last iteration the IV will be INT_SMAX and the increment *operation* will overflow (even though the overflowed value will not flow back into the IV's PHI node).

I have some more detail here: https://www.playingwithpointers.com/scev-integer-overflow.html
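
To make the Backedge = 1 example above concrete, here is a minimal C++ sketch that evaluates the quoted Step >= 0 check in 64-bit arithmetic and then asks whether the increment executed on the final iteration still fits in a signed 32-bit value. The struct and function names are made up for illustration and are not LLVM APIs, and the model only covers non-negative steps on an i32 IV.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper types for illustration; not part of LLVM.
struct WrapReport {
  bool predicateHolds;     // quoted check: Start + |Step| * Backedge > Start
  bool lastIncrementWraps; // would IV + Step on the final iteration leave i32?
};

// Models an i32 IV {Start,+,Step} with Step >= 0. All arithmetic is done in
// int64_t, which is wide enough that nothing here overflows.
WrapReport checkNonNegativeStep(int32_t Start, int32_t Step,
                                uint32_t Backedge) {
  const int64_t MaxI32 = std::numeric_limits<int32_t>::max();
  // Value the PHI holds on the final iteration.
  const int64_t LastIV = int64_t(Start) + int64_t(Step) * int64_t(Backedge);
  // The predicate, read as a non-wrapping signed comparison.
  const bool Predicate = LastIV <= MaxI32 && LastIV > int64_t(Start);
  // The increment executed on the final iteration. Its result never flows
  // back into the PHI, but it is the "post-increment" value.
  const int64_t AfterLast = LastIV + int64_t(Step);
  return {Predicate, AfterLast > MaxI32};
}
```

Calling checkNonNegativeStep(INT32_MAX - 1, 1, 1) reports both fields as true: the predicate holds, yet the last increment steps outside the i32 range, which is case (c).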

Closed by commit rL307837: [LV] Don't allow outside uses of IVs if the SCEV is predicated on loop… (authored by mkuper). · Jul 12 2017, 12:54 PM

This revision was automatically updated to reflect the committed changes.

gilr added inline comments. · Jul 16 2017, 4:18 AM

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
5321 ↗	(On Diff #106289)	Can we narrow the check as documented above to only check if the phi's SCEV relies on predicates?
5322 ↗	(On Diff #106289)	The phi node is presumably guaranteed not to overflow also on the last iteration, so should be correct as a live out, right?
llvm/trunk/test/Transforms/LoopVectorize/pr33706.ll
8 ↗	(On Diff #106289)	It would be good to document the specific problem in vectorizing this loop, which is the live-out value %tmp20. When %arg2 == 1, the trip count is 65536 and %tmp20 coming out of the loop should be 0 (65536 & 65535), but currently LV pre-computes the live-out in the pre-header as 65536. Right? Can the test be minimized?
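
The mismatch described in the last comment can be reproduced with a small scalar model of the loop in pr33706.ll. This is a sketch only: the function and struct names are hypothetical, Start stands in for the value loaded from @global.1, and the "closed form" is an assumed reading of what pre-computing the live-out from an add-recurrence amounts to, not LV's actual code.

```cpp
#include <cstdint>

// Hypothetical names for illustration; not LLVM code.
struct LiveOut {
  uint32_t Actual;     // %tmp20 as the scalar loop computes it
  uint32_t ClosedForm; // masked start + step * trip count, without re-masking
};

// Assumes 1 <= Step <= 65536, matching the guard and the udiv in the test.
LiveOut simulatePR33706(uint32_t Start, uint32_t Step) {
  const uint32_t TripCount = 65536u / Step; // %tmp10
  uint32_t IV = Start;                      // %tmp15
  uint32_t Masked = Start & 65535u;         // %tmp12, initially %tmp5
  for (uint32_t N = TripCount; N != 0; --N) {
    IV += Step;           // %tmp19
    Masked = IV & 65535u; // %tmp20
  }
  // Treating %tmp20 as the add-recurrence {%tmp5,+,%arg2} and evaluating it
  // after TripCount steps skips the final "and", so it can exceed 65535.
  const uint32_t ClosedForm = (Start & 65535u) + Step * TripCount;
  return {Masked, ClosedForm};
}
```

With Start = 0 and Step = 1 this returns Actual == 0 and ClosedForm == 65536, the two values quoted in the comment. For a step such as 3, where the final value does not cross the 16-bit boundary, the two agree.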

Ayal added a subscriber: Ayal. · Jul 18 2017, 12:30 AM

Thanks, Gil!

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
5321 ↗	(On Diff #106289)	I don't think so - IIUC, once we've added the predicate to PSCEV, all further SCEV expressions we get from it can rely on the same predicate, and I'm not sure there's a way to query whether a specific one did.
5322 ↗	(On Diff #106289)	Oh, yes, I think you're right - and this meshes well with what sanjoy told me! Feel free to refine it.
llvm/trunk/test/Transforms/LoopVectorize/pr33706.ll
8 ↗	(On Diff #106289)	Right. I thought PR33706 has sufficient details, and pointing to it was fine. I can add the explanation here if you want. I actually spent a while trying to minimize the test, but it's pretty annoying - bugpoint minimizes to the wrong thing, because we can also vectorize similar things using first-order recurrences.

Instead of refraining from vectorizing a loop which has an externally used phi (or rather the bump thereof) and any predicate, can a predicate be added (or an existing one be extended) to also cover the last iteration? It would be a pity to bail out on such corner cases.

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
5321 ↗	(On Diff #106289)	Yeah; we could try to note if addInductionPhi() is being called from the first attempt to recognize isInductionPHI() which adds no new predicates, or otherwise from the second attempt which does. But in both cases isInductionPHI() calls PSE.getSCEV(Phi) which may use existing predicates. Probably needs restructuring to first check the result of SE.getSCEV(Phi). Best clarify the documentation above to match the code.

Revision Contents

Path	Size
lib/Transforms/Vectorize/LoopVectorize.cpp	9 lines
test/Transforms/LoopVectorize/pr33706.ll	61 lines

Diff 105935

lib/Transforms/Vectorize/LoopVectorize.cpp

[5,309 earlier lines not shown; enclosing context: if (ID.getKind() == InductionDescriptor::IK_IntInduction &&]

       // than it is expedient). We've checked that it begins at zero and
       // steps by one, so this is a canonical induction variable.
       if (!PrimaryInduction || PhiTy == WidestIndTy)
         PrimaryInduction = Phi;
     }
 
     // Both the PHI node itself, and the "post-increment" value feeding
     // back into the PHI node may have external users.
+    // We can allow those uses, except if the SCEVs we have for them rely
+    // on predicates that only hold within the loop, since allowing the exit
+    // currently means re-using this SCEV outside the loop.
+    if (PSE.getUnionPredicate().isAlwaysTrue()) {
-    AllowedExit.insert(Phi);
-    AllowedExit.insert(Phi->getIncomingValueForBlock(TheLoop->getLoopLatch()));
+      AllowedExit.insert(Phi);
+      AllowedExit.insert(Phi->getIncomingValueForBlock(TheLoop->getLoopLatch()));
+    }
 
     DEBUG(dbgs() << "LV: Found an induction variable.\n");
     return;
   }
 
   bool LoopVectorizationLegality::canVectorizeInstrs() {
     BasicBlock *Header = TheLoop->getHeader();

[2,796 later lines not shown]

test/Transforms/LoopVectorize/pr33706.ll

				; RUN: opt -S -loop-vectorize -force-vector-interleave=1 -force-vector-width=2 < %s | FileCheck %s

				@global = local_unnamed_addr global i32 0, align 4
				@global.1 = local_unnamed_addr global i32 0, align 4
				@global.2 = local_unnamed_addr global float 0x3EF0000000000000, align 4

				; CHECK-LABEL: @PR33706
				; CHECK-NOT: <2 x i32>
				define void @PR33706(float* nocapture readonly %arg, float* nocapture %arg1, i32 %arg2) local_unnamed_addr {
				bb:
				%tmp = load i32, i32* @global.1, align 4
				%tmp3 = getelementptr inbounds float, float* %arg, i64 190
				%tmp4 = getelementptr inbounds float, float* %arg1, i64 512
				%tmp5 = and i32 %tmp, 65535
				%tmp6 = icmp ugt i32 %arg2, 65536
				br i1 %tmp6, label %bb7, label %bb9

				bb7: ; preds = %bb
				%tmp8 = load i32, i32* @global, align 4
				br label %bb27

				bb9: ; preds = %bb
				%tmp10 = udiv i32 65536, %arg2
				br label %bb11

				bb11: ; preds = %bb11, %bb9
				%tmp12 = phi i32 [ %tmp20, %bb11 ], [ %tmp5, %bb9 ]
				%tmp13 = phi float* [ %tmp18, %bb11 ], [ %tmp4, %bb9 ]
				%tmp14 = phi i32 [ %tmp16, %bb11 ], [ %tmp10, %bb9 ]
				%tmp15 = phi i32 [ %tmp19, %bb11 ], [ %tmp, %bb9 ]
				%tmp16 = add nsw i32 %tmp14, -1
				%tmp17 = sitofp i32 %tmp12 to float
				store float %tmp17, float* %tmp13, align 4
				%tmp18 = getelementptr inbounds float, float* %tmp13, i64 1
				%tmp19 = add i32 %tmp15, %arg2
				%tmp20 = and i32 %tmp19, 65535
				%tmp21 = icmp eq i32 %tmp16, 0
				br i1 %tmp21, label %bb22, label %bb11

				bb22: ; preds = %bb11
				%tmp23 = phi float* [ %tmp18, %bb11 ]
				%tmp24 = phi i32 [ %tmp19, %bb11 ]
				%tmp25 = phi i32 [ %tmp20, %bb11 ]
				%tmp26 = ashr i32 %tmp24, 16
				store i32 %tmp26, i32* @global, align 4
				br label %bb27

				bb27: ; preds = %bb22, %bb7
				%tmp28 = phi i32 [ %tmp26, %bb22 ], [ %tmp8, %bb7 ]
				%tmp29 = phi float* [ %tmp23, %bb22 ], [ %tmp4, %bb7 ]
				%tmp30 = phi i32 [ %tmp25, %bb22 ], [ %tmp5, %bb7 ]
				%tmp31 = sext i32 %tmp28 to i64
				%tmp32 = getelementptr inbounds float, float* %tmp3, i64 %tmp31
				%tmp33 = load float, float* %tmp32, align 4
				%tmp34 = sitofp i32 %tmp30 to float
				%tmp35 = load float, float* @global.2, align 4
				%tmp36 = fmul float %tmp35, %tmp34
				%tmp37 = fadd float %tmp33, %tmp36
				store float %tmp37, float* %tmp29, align 4
				ret void
				}