This is an archive of the discontinued LLVM Phabricator instance.

[LV] Handle external uses of floating-point induction variables
ClosedPublic

Authored by mssimpso on Apr 24 2017, 10:56 AM.

Details

Reviewers

mkuper
davide
delena

Commits

rG9eed0bee3dcd: [LV] Handle external uses of floating-point induction variables
rL301428: [LV] Handle external uses of floating-point induction variables

Summary

Reference: https://bugs.llvm.org/show_bug.cgi?id=32758

Diff Detail

Build Status

Buildable 5826
Build 5826: arc lint + arc unit

Event Timeline

mssimpso created this revision. Apr 24 2017, 10:56 AM

Herald added a subscriber: mzolotukhin. Apr 24 2017, 10:56 AM

The original PR looks fishy to me, but I agree this is a real issue regardless.

I'm not sure this patch is correct, though. Just to understand what's going on here: we have an FP IV for which we can compute the (integer, obviously) trip count. We then cast that integer trip count into an FP value (possibly losing precision), and then compute start + (step * trip count), in FP, to get the value from the penultimate iteration?
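To make the "possibly losing precision" concern concrete, here is a minimal C++ sketch. It is not LLVM code, and `convertsExactly` is a hypothetical helper introduced only for illustration. It checks whether an integer count survives a round trip through a floating-point type, which is what a `sitofp` of the trip count would have to preserve.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): returns true when converting
// `count` to the floating-point type T and back yields the same integer,
// i.e. the int-to-FP conversion was exact.
// Intended for counts well inside the int64_t range, so that the cast back
// to an integer stays well defined.
template <typename T>
bool convertsExactly(std::int64_t count) {
  T asFP = static_cast<T>(count);
  return static_cast<std::int64_t>(asFP) == count;
}
```

Under IEEE-754, `float` has a 24-bit significand and `double` a 53-bit one, so `convertsExactly<float>(16777217)` (2^24 + 1) is false while the same count converts exactly to `double`; counts above 2^53 can be inexact even in `double`.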

Hi Michael,

After transforming the IV, we end up with something like fp_start + float(n_vec - 1) * fp_iv_step. So we are casting the number of vector loop iterations and multiplying by the floating-point step.
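The expression above can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the vectorizer's implementation: `escapeValue` and `escapeValueByLoop` are hypothetical names, and `nVec` stands for the `n_vec` mentioned in the comment (the number of scalar iterations covered by the vector loop).

```cpp
#include <cstdint>

// Hypothetical helper, not LLVM code: closed-form value of a floating-point
// induction variable in the last iteration covered by the vector loop,
// i.e. fp_start + float(n_vec - 1) * fp_iv_step.
double escapeValue(double start, double step, std::int64_t nVec) {
  return start + static_cast<double>(nVec - 1) * step;
}

// Reference computation: step through nVec scalar iterations and report the
// value the induction variable held in the final one (nVec - 1 additions).
double escapeValueByLoop(double start, double step, std::int64_t nVec) {
  double j = start;
  for (std::int64_t i = 1; i < nVec; ++i)
    j += step;
  return j;
}
```

For values that are exactly representable, such as a start of 2.0, a step of 1.0 and `nVec` of 8, both functions return 9.0. Whether they agree in general is the question raised in the next comment.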

Yes, that should have been "trip count - 1", sorry.

My question is whether doing this in FP actually produces the desired result. When I wrote this code for int IVs, the idea was that "start + (step * (count - 1))" is equivalent to "start + step + ... + step" with count-1 additions.
Is this true for the FP case?
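A small C++ sketch of the question, with hypothetical helper names `repeatedAdd` and `closedForm`. For wrapping (unsigned) integer arithmetic the two forms always agree; for IEEE-754 floating point without fast-math they can differ, because every addition rounds.

```cpp
#include <cstdint>

// Adds `step` to `start` k times, one addition per loop iteration.
template <typename T>
T repeatedAdd(T start, T step, std::uint64_t k) {
  T value = start;
  for (std::uint64_t i = 0; i < k; ++i)
    value += step;
  return value;
}

// Computes the same quantity with a single multiply and a single add.
template <typename T>
T closedForm(T start, T step, std::uint64_t k) {
  return start + step * static_cast<T>(k);
}
```

With `std::uint32_t`, the two results match even when the arithmetic wraps around. With `double`, a start of 0.0, a step of 0.1 and k = 10, the closed form rounds to exactly 1.0 while the ten separate additions land just below it, so the results are not bit-identical. Unsigned types are used for the integer case because signed overflow is undefined behaviour in C++.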

I think it should be fine if fast-math is enabled? I was under the impression that we would only recognize floating-point inductions with fast-math enabled, but having just tested it, this doesn't seem to be the case after all. Shouldn't it be? What do you think?

RKSimon added a subscriber: RKSimon. Apr 24 2017, 1:31 PM

I'm not sure. Do we currently (I mean, without this patch) do anything with FP IVs that violates spec?

If we do, then, yes, we should not be recognizing them - and this transformation is also safe.
If we don't, then it would be a good idea to ask somebody who understands FP better whether this makes sense or not.

OK, we require fast-math to vectorize floating-point inductions unless vectorization is forced. In my last update, I mentioned that I was seeing the test case be vectorized even without fast-math, but the test was using -force-vector-width. So I've moved the test to the X86 directory and added an additional no fast-math variant. We vectorize the fast-math version and compute the value of the external IV use the same way we do for integer IVs. We don't vectorize the no fast-math version.

Elena, what do you think about this? The alternative I see would be to just disallow external uses of floating-point inductions.

We vectorize loops with FP inductions and FP reductions under "fast-math". The main point here is that the FP induction is "secondary": the trip count does not depend on it. I think it's OK to use an FP induction outside the loop, why not? An FP reduction, which we allow today, actually means using the result value outside.
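A loop of the shape described here, written as a C++ sketch. It approximates the `@external_use` test added in this revision and is not taken from the patch; the function name `externalUse` is an assumption made for illustration.

```cpp
#include <cstdint>

// Approximate C++ rendering of a loop with an integer primary induction `i`
// and a secondary floating-point induction `j`. The returned value is the
// one `j` held in the last iteration: the "external use" under discussion.
double externalUse(double m, std::int64_t n) {
  std::int64_t i = 0;
  double j = m;
  double last;
  do {
    last = j;  // value of the FP induction in this iteration
    j += 1.0;  // FP step; the review assumes fast-math for this add
    ++i;       // only `i` decides when the loop exits
  } while (i < n);
  return last;
}
```

Because the exit condition involves only `i`, the trip count is an ordinary integer expression, and the value of `j` after the loop can be derived from it. For example, `externalUse(2.0, 8)` returns 9.0.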

Ok, for fast-math, I think this makes sense.
LGTM.

This revision is now accepted and ready to land. Apr 25 2017, 2:01 PM

Closed by commit rL301428: [LV] Handle external uses of floating-point induction variables (authored by mssimpso). Apr 26 2017, 9:35 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                              Size
lib/Transforms/Vectorize/LoopVectorize.cpp        8 lines
test/Transforms/LoopVectorize/float-induction.ll  41 lines

Diff 96428

lib/Transforms/Vectorize/LoopVectorize.cpp

   for (User *U : OrigPhi->users()) {
     if (!OrigLoop->contains(UI)) {
       const DataLayout &DL =
           OrigLoop->getHeader()->getModule()->getDataLayout();
       assert(isa<PHINode>(UI) && "Expected LCSSA form");

       IRBuilder<> B(MiddleBlock->getTerminator());
       Value *CountMinusOne = B.CreateSub(
           CountRoundDown, ConstantInt::get(CountRoundDown->getType(), 1));
-      Value *CMO = B.CreateSExtOrTrunc(CountMinusOne, II.getStep()->getType(),
-                                       "cast.cmo");
+      Value *CMO =
+          !II.getStep()->getType()->isIntegerTy()
+              ? B.CreateCast(Instruction::SIToFP, CountMinusOne,
+                             II.getStep()->getType())
+              : B.CreateSExtOrTrunc(CountMinusOne, II.getStep()->getType());
+      CMO->setName("cast.cmo");
       Value *Escape = II.transform(B, CMO, PSE.getSE(), DL);
       Escape->setName("ind.escape");
       MissingVals[UI] = Escape;
     }
   }

   for (auto &I : MissingVals) {
     PHINode *PHI = cast<PHINode>(I.first);

test/Transforms/LoopVectorize/float-induction.ll

 for.inc:
   %i.next = add nuw nsw i64 %i, 1
   %j.next = fadd fast float %j, 1.0
   %cond = icmp slt i64 %i.next, %N
   br i1 %cond, label %for.body, label %for.end

 for.end:
   ret void
 }
+
+; VEC4_INTERL1-LABEL: @external_use(
+; VEC4_INTERL1-NEXT: entry:
+; VEC4_INTERL1-NEXT: [[TMP0:%.*]] = icmp sgt i64 %n, 1
+; VEC4_INTERL1-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i64 %n, i64 1
+; VEC4_INTERL1: br i1 {{.*}}, label %scalar.ph, label %min.iters.checked
+; VEC4_INTERL1: min.iters.checked:
+; VEC4_INTERL1-NEXT: [[N_VEC:%.*]] = and i64 [[SMAX]], 9223372036854775804
+; VEC4_INTERL1: br i1 {{.*}}, label %scalar.ph, label %vector.ph
+; VEC4_INTERL1: vector.ph:
+; VEC4_INTERL1-NEXT: br label %vector.body
+; VEC4_INTERL1: vector.body:
+; VEC4_INTERL1-NEXT: [[INDEX:%.*]] = phi i64 [ 0, %vector.ph ], [ [[INDEX_NEXT:%.*]], %vector.body ]
+; VEC4_INTERL1-NEXT: [[INDEX_NEXT]] = add i64 [[INDEX]], 4
+; VEC4_INTERL1-NEXT: [[TMP1:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; VEC4_INTERL1-NEXT: br i1 [[TMP1]], label %middle.block, label %vector.body
+; VEC4_INTERL1: middle.block:
+; VEC4_INTERL1: [[TMP2:%.*]] = add nsw i64 [[N_VEC]], -1
+; VEC4_INTERL1-NEXT: [[CAST_CMO:%.*]] = sitofp i64 [[TMP2]] to double
+; VEC4_INTERL1-NEXT: [[IND_ESCAPE:%.*]] = fadd fast double [[CAST_CMO]], %m
+; VEC4_INTERL1-NEXT: br i1 {{.*}}, label %for.end, label %scalar.ph
+; VEC4_INTERL1: for.end:
+; VEC4_INTERL1-NEXT: [[TMP0:%.*]] = phi double [ %j, %for.body ], [ [[IND_ESCAPE]], %middle.block ]
+; VEC4_INTERL1-NEXT: ret double [[TMP0]]
+
+define double @external_use(double %m, i64 %n) {
+entry:
+  br label %for.body
+
+for.body:
+  %i = phi i64 [ 0, %entry ], [%i.next, %for.body]
+  %j = phi double [ %m, %entry ], [ %j.next, %for.body ]
+  %i.next = add i64 %i, 1
+  %j.next = fadd fast double %j, 1.0
+  %cond = icmp slt i64 %i.next, %n
+  br i1 %cond, label %for.body, label %for.end
+
+for.end:
+  %tmp0 = phi double [ %j, %for.body ]
+  ret double %tmp0
+}