This is an archive of the discontinued LLVM Phabricator instance.

[LV] Handle external uses of floating-point induction variables
Closed, Public

Authored by mssimpso on Apr 24 2017, 10:56 AM.

Details

Reviewers

mkuper
davide
delena

Commits

rG9eed0bee3dcd: [LV] Handle external uses of floating-point induction variables
rL301428: [LV] Handle external uses of floating-point induction variables

Summary

Reference: https://bugs.llvm.org/show_bug.cgi?id=32758

Diff Detail

Repository: rL LLVM

Event Timeline

mssimpso created this revision. Apr 24 2017, 10:56 AM

Herald added a subscriber: mzolotukhin. Apr 24 2017, 10:56 AM

The original PR looks fishy to me, but I agree this is a real issue regardless.

I'm not sure this patch is correct, though. Just to understand what's going on here - we have an FP IV, for which we can compute the (integer, obviously) trip count. We then cast that integer trip count into an FP value (possibly losing precision), and then compute start + (step * trip count), in FP, to get the value from the penultimate iteration?
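
To make the "possibly losing precision" concern concrete, here is a minimal C++ sketch of a signed integer-to-double conversion, which is what a sitofp on an i64 count amounts to. The function names are hypothetical and exist only for this illustration; they are not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Converts an integer iteration count to double, as a signed
// integer-to-floating-point cast (sitofp) would.
double countToDouble(std::int64_t count) {
  return static_cast<double>(count);
}

// Returns true when the count survives the round trip through double.
bool isExactlyRepresentable(std::int64_t count) {
  // Restrict to magnitudes where converting back cannot overflow int64_t.
  assert(count > -(INT64_C(1) << 62) && count < (INT64_C(1) << 62));
  return static_cast<std::int64_t>(countToDouble(count)) == count;
}
```

A double has a 53-bit significand, so every count up to 2^53 converts exactly, while 2^53 + 1 does not. The loss therefore only matters for trip counts far larger than any loop is likely to execute in practice.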

Hi Michael,

After transforming the IV, we end up with something like fp_start + float(n_vec - 1) * fp_iv_step. So we are casting the number of vector loop iterations and multiplying by the floating-point step.

Yes, that should have been "trip count - 1", sorry.

My question is whether doing this in FP actually produces the desired result. When I wrote this code for int IVs, the idea was that "start + (step * (count - 1))" is equivalent to "start + step + ... + step" with count-1 additions.
Is this true for the FP case?
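
The question can be checked outside the compiler. The following C++ sketch compares the two ways of computing the escaping value under strict IEEE-754 semantics (no fast-math). The function names are hypothetical, and the 0.1 step mentioned afterwards is an assumption chosen because it is not exactly representable in binary floating point.

```cpp
#include <cassert>
#include <cstdint>

// Value of the IV after (count - 1) increments, computed by repeated
// addition, the way the scalar loop produces it.
double escapeByRepeatedAdd(double start, double step, std::int64_t count) {
  assert(count >= 1 && "the loop body runs at least once");
  double iv = start;
  for (std::int64_t i = 0; i < count - 1; ++i)
    iv += step;
  return iv;
}

// The same value computed in closed form: start + (count - 1) * step.
double escapeByClosedForm(double start, double step, std::int64_t count) {
  assert(count >= 1 && "the loop body runs at least once");
  return start + static_cast<double>(count - 1) * step;
}
```

With a step of 3.0, as in the test added by this patch, every intermediate value is exactly representable and both functions agree. With a step of 0.1 and a count of 11, each addition rounds separately while the closed form rounds once, so on IEEE-754 doubles the two results differ in the last bit. That difference is the kind of reassociation that fast-math permits and strict FP semantics do not.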

I think it should be fine if fast-math is enabled? I was under the impression that we would only recognize floating-point inductions with fast-math enabled, but having just tested it, this doesn't seem to be the case after all. Shouldn't it be? What do you think?

RKSimon added a subscriber: RKSimon. Apr 24 2017, 1:31 PM

I'm not sure. Do we currently (I mean, without this patch) do anything with FP IVs that violates spec?

If we do, then, yes, we should not be recognizing them - and this transformation is also safe.
If we don't, then it would be a good idea to ask somebody who understands FP better whether this makes sense or not.

OK, we require fast-math to vectorize floating-point inductions unless vectorization is forced. In my last update, I mentioned that I was seeing the test case be vectorized even without fast-math, but the test was using -force-vector-width. So I've moved the test to the X86 directory and added an additional no fast-math variant. We vectorize the fast-math version and compute the value of the external IV use the same way we do for integer IVs. We don't vectorize the no fast-math version.

Elena, what do you think about this? The alternative I see would be to just disallow external uses of floating-point inductions.

We vectorize loops with FP inductions and FP reductions under "fast-math". The main point here is that the FP induction is "secondary": the trip count does not depend on it. I think it's OK to use an FP induction outside the loop, why not? An FP reduction, which we allow today, actually means using the result value outside.

Ok, for fast-math, I think this makes sense.
LGTM.

This revision is now accepted and ready to land. Apr 25 2017, 2:01 PM

Closed by commit rL301428: [LV] Handle external uses of floating-point induction variables (authored by mssimpso). Apr 26 2017, 9:35 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                                   Size
llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp                  8 lines
llvm/trunk/test/Transforms/LoopVectorize/X86/float-induction-x86.ll    63 lines

Diff 96768

llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp

@@ ... @@ for (User *U : OrigPhi->users()) {
       if (!OrigLoop->contains(UI)) {
         const DataLayout &DL =
             OrigLoop->getHeader()->getModule()->getDataLayout();
         assert(isa<PHINode>(UI) && "Expected LCSSA form");

         IRBuilder<> B(MiddleBlock->getTerminator());
         Value *CountMinusOne = B.CreateSub(
             CountRoundDown, ConstantInt::get(CountRoundDown->getType(), 1));
-        Value *CMO = B.CreateSExtOrTrunc(CountMinusOne, II.getStep()->getType(),
-                                         "cast.cmo");
+        Value *CMO =
+            !II.getStep()->getType()->isIntegerTy()
+                ? B.CreateCast(Instruction::SIToFP, CountMinusOne,
+                               II.getStep()->getType())
+                : B.CreateSExtOrTrunc(CountMinusOne, II.getStep()->getType());
+        CMO->setName("cast.cmo");
         Value *Escape = II.transform(B, CMO, PSE.getSE(), DL);
         Escape->setName("ind.escape");
         MissingVals[UI] = Escape;
       }
     }

     for (auto &I : MissingVals) {
       PHINode *PHI = cast<PHINode>(I.first);

llvm/trunk/test/Transforms/LoopVectorize/X86/float-induction-x86.ll

 for.end.loopexit: ; preds = %for.body
   br label %for.end

 for.end: ; preds = %for.end.loopexit, %entry
   ret void
 }

+; AUTO_VEC-LABEL: @external_use_with_fast_math(
+; AUTO_VEC-NEXT: entry:
+; AUTO_VEC-NEXT: [[TMP0:%.*]] = icmp sgt i64 %n, 1
+; AUTO_VEC-NEXT: [[SMAX:%.*]] = select i1 [[TMP0]], i64 %n, i64 1
+; AUTO_VEC: br i1 {{.*}}, label %for.body, label %min.iters.checked
+; AUTO_VEC: min.iters.checked:
+; AUTO_VEC-NEXT: [[N_VEC:%.*]] = and i64 [[SMAX]], 9223372036854775792
+; AUTO_VEC: br i1 {{.*}}, label %for.body, label %vector.body
+; AUTO_VEC: middle.block:
+; AUTO_VEC: [[TMP11:%.*]] = add nsw i64 [[N_VEC]], -1
+; AUTO_VEC-NEXT: [[CAST_CMO:%.*]] = sitofp i64 [[TMP11]] to double
+; AUTO_VEC-NEXT: [[TMP12:%.*]] = fmul fast double [[CAST_CMO]], 3.000000e+00
+; AUTO_VEC-NEXT: br i1 {{.*}}, label %for.end, label %for.body
+; AUTO_VEC: for.end:
+; AUTO_VEC-NEXT: [[J_LCSSA:%.*]] = phi double [ [[TMP12]], %middle.block ], [ %j, %for.body ]
+; AUTO_VEC-NEXT: ret double [[J_LCSSA]]
+;
+define double @external_use_with_fast_math(double* %a, i64 %n) {
+entry:
+  br label %for.body
+
+for.body:
+  %i = phi i64 [ 0, %entry ], [%i.next, %for.body]
+  %j = phi double [ 0.0, %entry ], [ %j.next, %for.body ]
+  %tmp0 = getelementptr double, double* %a, i64 %i
+  store double %j, double* %tmp0
+  %i.next = add i64 %i, 1
+  %j.next = fadd fast double %j, 3.0
+  %cond = icmp slt i64 %i.next, %n
+  br i1 %cond, label %for.body, label %for.end
+
+for.end:
+  %tmp1 = phi double [ %j, %for.body ]
+  ret double %tmp1
+}
+
+; AUTO_VEC-LABEL: @external_use_without_fast_math(
+; AUTO_VEC: for.body:
+; AUTO_VEC: [[J:%.*]] = phi double [ 0.000000e+00, %entry ], [ [[J_NEXT:%.*]], %for.body ]
+; AUTO_VEC: [[J_NEXT]] = fadd double [[J]], 3.000000e+00
+; AUTO_VEC: br i1 {{.*}}, label %for.body, label %for.end
+; AUTO_VEC: for.end:
+; AUTO_VEC-NEXT: ret double [[J]]
+;
+define double @external_use_without_fast_math(double* %a, i64 %n) {
+entry:
+  br label %for.body
+
+for.body:
+  %i = phi i64 [ 0, %entry ], [%i.next, %for.body]
+  %j = phi double [ 0.0, %entry ], [ %j.next, %for.body ]
+  %tmp0 = getelementptr double, double* %a, i64 %i
+  store double %j, double* %tmp0
+  %i.next = add i64 %i, 1
+  %j.next = fadd double %j, 3.0
+  %cond = icmp slt i64 %i.next, %n
+  br i1 %cond, label %for.body, label %for.end
+
+for.end:
+  %tmp1 = phi double [ %j, %for.body ]
+  ret double %tmp1
+}

 attributes #0 = { "no-nans-fp-math"="true" }
 attributes #1 = { "no-nans-fp-math"="false" }
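
For readers who find the IR hard to follow, the sketch below models the fast-math test case in plain C++. It is an illustration under stated assumptions, not code from the patch: both function names are hypothetical, and the mask is copied from the N_VEC check line, where it rounds the trip count down to a multiple of 16.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar model of @external_use_with_fast_math: stores j into a[i] and
// returns the value j held in the final iteration (the LCSSA phi of %j).
// The caller must size `a` to hold at least max(n, 1) elements.
double externalUseScalar(std::vector<double> &a, std::int64_t n) {
  std::int64_t i = 0;
  double j = 0.0;
  double lastJ = 0.0;
  do { // the IR loop always executes its body at least once
    lastJ = j;
    a[static_cast<std::size_t>(i)] = j;
    ++i;
    j += 3.0;
  } while (i < n);
  return lastJ;
}

// Model of the middle.block computation the test checks:
// sitofp(N_VEC - 1) * 3.0, with a start value of 0.0.
double escapeFromVectorLoop(std::int64_t n) {
  const std::int64_t tripCount = std::max<std::int64_t>(n, 1);        // SMAX
  const std::int64_t nVec = tripCount & INT64_C(9223372036854775792); // N_VEC
  assert(nVec > 0 && "the vector loop must execute");
  return static_cast<double>(nVec - 1) * 3.0;
}
```

When n is a multiple of 16, for example 32, no scalar remainder runs and both functions return 93.0, which is the case where for.end takes its value from middle.block. For other values of n the remainder loop continues from the vector loop's end state and for.end takes %j from for.body instead.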