This is an archive of the discontinued LLVM Phabricator instance.

Loop vectorization with FP induction variables
ClosedPublic

Authored by delena on Jun 14 2016, 11:30 AM.

Details

Summary

Handling FP IVs without SCEV. Covered simple cases with constant and loop invariant step.

float *A;
float x = init;
for (int i=0; i < N; ++i) {

  A[i] = x;
  x -= fp_inc;
}

Diff Detail

Repository
rL LLVM

Event Timeline

delena updated this revision to Diff 60711.Jun 14 2016, 11:30 AM
delena retitled this revision from to Loop vectorization with FP induction variables.
delena updated this object.
delena set the repository for this revision to rL LLVM.
delena added a subscriber: llvm-commits.
sanjoy edited edge metadata.Jun 20 2016, 10:02 AM

Some comments inline.

I cannot review the LoopVectorize.cpp changes.

lib/Analysis/ScalarEvolutionExpander.cpp
1610 ↗(On Diff #60711)

Why do we need this?

lib/Transforms/Utils/LoopUtils.cpp
758 ↗(On Diff #60711)

Looking at the users of isInductionPHI, looks like it should be easy to pass in the Loop * directly? If so, can we just have this check be: if (L->getHeader() != Phi->getParent()) return false; ? That way we won't have to add the getLoopFor interface to SCEV (which does not look like it belongs there).

delena added inline comments.Jun 20 2016, 10:12 AM
lib/Analysis/ScalarEvolutionExpander.cpp
1610 ↗(On Diff #60711)

I defined Step as unknown SCEV, but underlying value is of floating point type.
In this case expand() just returns the underlying value, but it fails on one of these two lines.

lib/Transforms/Utils/LoopUtils.cpp
758 ↗(On Diff #60711)

Yes, of course. I'll change.

mkuper added inline comments.Jun 20 2016, 12:42 PM
lib/Transforms/Utils/LoopUtils.cpp
675 ↗(On Diff #60711)

Maybe ((IK == IK_FpInduction) || Step->getType()->isIntegerTy()) would be clearer.

678 ↗(On Diff #60711)

Maybe put this on line 668, next to the two asserts for the other types?

733 ↗(On Diff #60711)

As long as you're touching this - maybe hoist this assert for all 3 cases above the switch?

739 ↗(On Diff #60711)

Why do we need this?

767 ↗(On Diff #60711)

Some variable naming nitpicking:
Bb -> either BB or bb?
BEValueV -> BEValue
StartValueV -> StartValue

772 ↗(On Diff #60711)

Do we really need the if here? We expect L->contains(BB) to be true for one of the incoming values, and false for the other, right? So if we're in the "else", we're already in a bad shape, regardless of whether BEValueV == V or not.
Or did I misunderstand?

776 ↗(On Diff #60711)

Same as above.

780 ↗(On Diff #60711)

Can we replace this with an early exit?

801 ↗(On Diff #60711)

Are you sure this is always a safe place to insert? E.g. what if Addend is a PHI in an outer loop?

827 ↗(On Diff #60711)

Why not isFloatingPointTy() here?

lib/Transforms/Vectorize/LoopVectorize.cpp
2154 ↗(On Diff #60711)

ITy here stands for "IntegerTy", I guess. Perhaps rename, now that ITy can be float?
(Unless ITy stands for IndexTy, which also make sense, and in which case we should also probably rename. :-) )

2178 ↗(On Diff #60711)

This looks a bit odd.
If I understand correctly, you're relying on the step being FNeg to distinguish whether the original direction of the loop was positive or negative (by transforming a phi that feeds an fsub by a phi that feeds an fadd with an fneg).
What happens if the scalar loop has an fadd, where the original step is a loop-invariant fneg?

mkuper added inline comments.Jun 20 2016, 12:42 PM
lib/Analysis/ScalarEvolutionExpander.cpp
1610 ↗(On Diff #60711)

That sounds a bit weird to me.
It looks like the contract for expandCodeFor is that if you pass a type, you get the expansion cast to this type. What happens if you pass a type, but get a result that's from a different type than you passed (because it's not scevable)?

delena marked 2 inline comments as done.Jun 20 2016, 1:51 PM

Michael, thanks a lot for your comments. I'll upload a new patch.

lib/Analysis/ScalarEvolutionExpander.cpp
1610 ↗(On Diff #60711)

I don't know why do we need to pass the type.
May be for casting between i64 and PointerType?
But an Unknown SCEV may have any type, right?

lib/Transforms/Utils/LoopUtils.cpp
678 ↗(On Diff #60711)

All these checks belong to the Step type. See comment #669.

733 ↗(On Diff #60711)

Ok

739 ↗(On Diff #60711)

I should cover fsub operation somehow.
; for (int i=0; i < N; ++i) {
; A[i] = x;
; x -= fp_inc;
; }
I keep Step as Fneg(fp_inc) in this case.

772 ↗(On Diff #60711)

I just copied this code from integer Phi together with weird variable names.
Can we have multiple bbs inside loop?

801 ↗(On Diff #60711)

This FNeg disappears after transformation. It is just a way to distinguish between FAdd and FSub.

827 ↗(On Diff #60711)

isFloatingPointTy() includes more than half, float and double. I'm not sure I want to cover other types.

lib/Transforms/Vectorize/LoopVectorize.cpp
2178 ↗(On Diff #60711)

I don't see any problem. x + (-a) is equal to x - a.

mkuper edited edge metadata.Jun 20 2016, 2:14 PM

Thanks, Elena.

I guess there's one thing I don't understand about the fnegs - why are you using an IR instruction as the marker of whether the original induction had an fadd or an fsub, instead of a property of the IV? It would make sense to me if you actually used the fneg to feed the vector induction, thus simplifying the code (not having to special-case the sub), but instead you special-case it anyway by looking through the fneg.

Another thing I forgot earlier - this should only fire when we're in unsafe/fast math mode, right? Is there a check for that?

lib/Transforms/Utils/LoopUtils.cpp
739 ↗(On Diff #60711)

I understand, what I mean is - let's say the step is an FNeg.
Why can't you feed the FNeg directly into the CreateFMul? Do we get worse code?

772 ↗(On Diff #60711)

Right now, I think not. (and that needs to be fixed.) :-\
Anyway, if this is a verbatim copy from the int case, leave it be for now. Should probably be fixed separately.
(Of course, that raises the question - can the int and the fp case share the code? Or is it not worth it?)

801 ↗(On Diff #60711)

When does it disappear? If it disappears in later clean-up, we still don't want to insert it at an illegal location (like between two phis).
If it's guaranteed to be deleted during the run of the vectorizer, I guess this will technically work, but I'd still prefer to avoid it.

lib/Transforms/Vectorize/LoopVectorize.cpp
2178 ↗(On Diff #60711)

What I'm trying to say is that it's weird that the behavior would be different based on whether the step is an fneg. It could be an fneg because you added an fneg, it could be an fneg because there was already an fneg. Why does this code look through the fneg? It doesn't seem like this should be the vectorizer's job.
Although if the fneg won't get cleaned up later, this is probably the right thing to do.

I guess there's one thing I don't understand about the fnegs - why are you using an IR instruction as the marker of whether the original induction had an fadd or an fsub, instead of a property of the IV? It would make sense to me if you actually used the fneg to feed the vector induction, thus simplifying the code (not having to special-case the sub), but instead you special-case it anyway by looking through the fneg.

What do you mean by "property of IV" ? Do you suggest to add a special field to InductionDescriptor?
Fneg should be a part of Step. But Step is a SCEV and I was not allowed to implement FP SCEV.

Another thing I forgot earlier - this should only fire when we're in unsafe/fast math mode, right? Is there a check for that?

FP reduction is allowed for +/- operators. Only max/min checks for unsafe.

What do you mean by "property of IV" ? Do you suggest to add a special field to InductionDescriptor?
Fneg should be a part of Step. But Step is a SCEV and I was not allowed to implement FP SCEV.

You know, I probably just don't understand the SCEV situation well enough.
Sanjoy, any chance you could take a look, even though part of it is in LoopVectorize.cpp? :-)

delena updated this revision to Diff 61935.Jun 27 2016, 12:26 AM
delena edited edge metadata.
  1. Changed handling of FSUB operation. I keep the original binary operation inside Induction Descriptor.
  2. Allow FP induction in fast-math mode only.

Hi Elena,

Sorry for the delay, I was out on vacation. I'm catching up on email now, and I will try to get to review this this week.

Thanks!

sanjoy requested changes to this revision.Jul 15 2016, 12:43 AM
sanjoy edited edge metadata.

Some comments inline. Please let me know if you want me to look at something specific, but I'm not familiar enough with the code this patch touches to lgtm it.

../lib/Analysis/ScalarEvolutionExpander.cpp
1614 ↗(On Diff #61935)

This is odd -- is it just to help keep the Step as a SCEV *? If so, I'd suggest solving that within InductionDescriptor itself (i.e. maybe support having the step as either a SCEV * or a Value *, depending on the type of the InductionDescriptor?).

../lib/Transforms/Utils/LoopUtils.cpp
762 ↗(On Diff #61935)

(Not for fixing in this change) looks like a better interface would be to return an Optional< InductionDescriptor>?

787 ↗(On Diff #61935)

Maybe use a dyn_cast here?

811 ↗(On Diff #61935)

The condition looks inverted?

This revision now requires changes to proceed.Jul 15 2016, 12:43 AM
delena marked an inline comment as done.Jul 17 2016, 5:35 AM
delena added inline comments.
../lib/Analysis/ScalarEvolutionExpander.cpp
1614 ↗(On Diff #61935)

I't dropping this change, I don't need it anymore.

../lib/Transforms/Utils/LoopUtils.cpp
787 ↗(On Diff #61935)

Ok.

811 ↗(On Diff #61935)

The hasUnsafeAlgebra() means that instruction itself has "fast" attribute. In this case we don't need additional check. But if the BOp does not have the "fast" attribute, the legality of FP transformation should be allowed on function level. I'll add a comment.

delena updated this revision to Diff 64249.Jul 17 2016, 5:43 AM
delena added a reviewer: sbaranga.
delena marked an inline comment as done.
delena added a subscriber: sbaranga.

Some changes according to Sanjoy's comment. Thanks Sanjoy and Michael for review.
Still looking for somebody who can accept this patch. Added @sbaranga, who made similar changes in induction variables.

I want to go over the code again, after the changes, but the reason I don't feel like I can accept the patch is because I wasn't part of the original FP SCEV discussion, and I'm not sure I understand the design considerations.

If someone - e.g. sanjoy - OK's the design, I can LGTM the LV code change.

mkuper added inline comments.Jul 18 2016, 11:20 AM
../include/llvm/Transforms/Utils/LoopUtils.h
274 ↗(On Diff #64249)

I think Instruction::BinaryOpsEnd may be better for an explicitly invalid BinaryOp. Not sure that's a good choice, but pretty sure 0 isn't.

../lib/Transforms/Utils/LoopUtils.cpp
787 ↗(On Diff #64249)

I think what sanjoy meant was:

BinaryOperator *BOp = dyn_cast<BinaryOperator>(BEValue);
if (!BOp)
  return false;
../lib/Transforms/Vectorize/LoopVectorize.cpp
4100 ↗(On Diff #64249)

Main -> Primary (I think we use that consistently)

6402 ↗(On Diff #64249)

Can you add a test for this? All of the tests you added force UF == 1.

6405 ↗(On Diff #64249)

Are you sure about this? I mean, it's true for vectorizing, but is it true here as well?
(I'm not saying it isn't, just making sure this is intentional)

I want to go over the code again, after the changes, but the reason I don't feel like I can accept the patch is because I wasn't part of the original FP SCEV discussion, and I'm not sure I understand the design considerations.

The bottom line of the FP SCEV discussion was the point that FP SCEV is overkill for "secondary" IV (like in the example above). We'll need FP SCEV for primary FP IV like
for (float f =0.0; f < g; f+=0.5) {}.
But such loops are rare and most of them can be re-mapped to integers.
The suggestion was to include FP IV to the current InductionDescriptor.

I want to go over the code again, after the changes, but the reason I don't feel like I can accept the patch is because I wasn't part of the original FP SCEV discussion, and I'm not sure I understand the design considerations.

The bottom line of the FP SCEV discussion was the point that FP SCEV is overkill for "secondary" IV (like in the example above). We'll need FP SCEV for primary FP IV like
for (float f =0.0; f < g; f+=0.5) {}.
But such loops are rare and most of them can be re-mapped to integers.
The suggestion was to include FP IV to the current InductionDescriptor.

+1

delena marked 3 inline comments as done.Jul 19 2016, 5:04 AM
delena added inline comments.
../include/llvm/Transforms/Utils/LoopUtils.h
274 ↗(On Diff #64249)

Thanks, I'll fix.

../lib/Transforms/Vectorize/LoopVectorize.cpp
6405 ↗(On Diff #64249)

Even if you have only unrolling, and VF is 1, the value of FP induction is calculated as:
sitofp(PrimaryIV) * Increment.

for (int i=0; i<N; i++) {
  fp_ind += fp_inc;
}

is transferred to something like this:

init = fp_inc;
for (int i=0; i<N; i++) {
  fp_ind = init + i*fp_inc; 
}

In this case we need unsafe math.
I added tests for unrolling.

delena updated this revision to Diff 64479.Jul 19 2016, 6:23 AM
delena edited edge metadata.
delena marked an inline comment as done.

Updated the code according to Michael's comments.

mkuper accepted this revision.Jul 19 2016, 11:08 AM
mkuper edited edge metadata.

LGTM

anemet edited edge metadata.Jul 19 2016, 11:10 AM

LGTM

I've just started looking at this too. Please give me a few mins. So far I only encountered minor things.

../include/llvm/Transforms/Utils/LoopUtils.h
308 ↗(On Diff #64479)

Fp->FP

328–329 ↗(On Diff #64479)

binary op -> induction op is better everywhere. Also I am assuming this is the op that advances the induction variable. You may want to spell this out somewhere.

../lib/Transforms/Utils/LoopUtils.cpp
770 ↗(On Diff #64479)

this function is returning a bool

Sorry, didn't realize anyone else is still interested in looking at this, given how long this patch has been up.
Ignore my LGTM. :-)

Sorry, didn't realize anyone else is still interested in looking at this, given how long this patch has been up.
Ignore my LGTM. :-)

No, *I am* sorry for chiming in this late. I felt obliged because you mentioned that no one looked at this from the original llvm-dev thread and I felt bad ;). And thanks for reviewing it, this is looking pretty good.

So with these it should LGTM too. I haven't checked everything (most notably the unsafe math part).

I just wanted to see whether this was in line with the direction set in the original llvm-dev thread and it is! Thanks to all of you and sorry about the delay again.

../lib/Transforms/Utils/LoopUtils.cpp
816–818 ↗(On Diff #64479)

I think that a better interface would be to take BOp (step instruction) optionally and then derive DI::hasUnsafeAlgebra and the opcode from that. This is OK as a follow-up if you prefer.

../lib/Transforms/Vectorize/LoopVectorize.cpp
4104–4105 ↗(On Diff #64479)

by adding StepVal, you mean

delena updated this revision to Diff 64645.Jul 20 2016, 1:36 AM
delena edited edge metadata.
delena marked 4 inline comments as done.

Minor changes according to Adam's comments.

delena added inline comments.Jul 20 2016, 1:38 AM
../include/llvm/Transforms/Utils/LoopUtils.h
328–329 ↗(On Diff #64479)

Ok. Changing it to getInductionBinOp()

../lib/Transforms/Utils/LoopUtils.cpp
770 ↗(On Diff #64479)

Fixed. Thanks.

816–818 ↗(On Diff #64479)

I did it at the beginning and then prefer to follow Reduction implementation, just to be consistent with existing interface.

../lib/Transforms/Vectorize/LoopVectorize.cpp
4104–4105 ↗(On Diff #64479)

I fixed the comment. thanks.

delena updated this revision to Diff 64670.Jul 20 2016, 5:34 AM

Added one more test where init value and step are constants.

You should probably also have a test for the case where there is not "fast" attribute on the step instruction but we can still vectorize with the hints.

../include/llvm/Transforms/Utils/LoopUtils.h
321–323 ↗(On Diff #64670)

This comment seems incorrect/misleading. I think this flag "allows" a relaxed FP model not "requires" it. Do you agree?

328–329 ↗(On Diff #64670)

I think you missed the part about "everywhere", i.e. making the member variable consistent with this name too: BinaryOp -> InductionOp

343–344 ↗(On Diff #64670)

This comment is also ambiguous. This is again the induction/step instruction *iff* the instruction allows for a relaxed FP model.

../lib/Transforms/Utils/LoopUtils.cpp
819–821 ↗(On Diff #64670)

Fair but it does not make sense to pass a different instructions for UAI and BO and the current interface allows for that. At least then change the ctor to take a single instruction and derive UAI and BO from that.

../test/Transforms/LoopVectorize/float-induction.ll
12–13 ↗(On Diff #64670)

Where is the result of these guys used? Would be good to check too.

16–17 ↗(On Diff #64670)

Why not also match the result of the fsub and check it in the store?

delena updated this revision to Diff 64853.Jul 21 2016, 3:39 AM

Updated the patch according to Adam's comments.

anemet added inline comments.Jul 21 2016, 10:18 AM
../include/llvm/Transforms/Utils/LoopUtils.h
355 ↗(On Diff #64853)

Comments are full sentences, please end with a period.

../test/Transforms/LoopVectorize/float-induction.ll
4 ↗(On Diff #64853)

I *think* you can only have this under LoopVectorize/X86 (what if the X86 backend is not enabled in a build?). But more importantly, I don't understand why you need to formulate the non-fast-math case as an x86 test.

delena added inline comments.Jul 22 2016, 5:47 AM
../include/llvm/Transforms/Utils/LoopUtils.h
355 ↗(On Diff #64853)

ok.

../test/Transforms/LoopVectorize/float-induction.ll
4 ↗(On Diff #64853)

The "unsafe" function attribute works only for auto-vec. If you specify -force-vector-width=4 the loop will be vectorized anyway. So I added "X86" tests just to check combination of function attribute and "safe" FP induction in the auto-vectorization mode.

anemet accepted this revision.Jul 22 2016, 9:29 AM
anemet edited edge metadata.

LGTM with the x86 test split into its own test under Transform/LoopVectorize/X86. Thanks!

../test/Transforms/LoopVectorize/float-induction.ll
4 ↗(On Diff #64853)

Hmm, that is somewhat unexpected. I thought that the -force* stuff only overruled the cost model... @hfinkel, is it intentional/desired that -force-vector-width=>1 overrules (some of) the legality checks? We have different ways to bypass the legality checks so perhaps keeping -force-vector-width as a way to overrule the profitability checks is more desired?

I guess one concerns is that this internal flag was advertised on Nadav's Auto-vectorizer blog post so perhaps changing the behavior is not trivial at this point.

Anyhow, coming back to this patch, you need then to split this part of the test into a Transform/LoopVectorize/X86 test.

This revision was automatically updated to reflect the committed changes.