This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Detect loops in the innermost loop before creating InnerLoopVectorizer
ClosedPublic

Authored by timshen on Jul 28 2016, 6:44 PM.

Diff Detail

Event Timeline

timshen updated this revision to Diff 66071.Jul 28 2016, 6:44 PM
timshen retitled this revision from to [LoopVectorize] Detect loops in the innermost loop before creating InnerLoopVectorizer.
timshen updated this object.
timshen added reviewers: nadav, chandlerc, bkramer, dblaikie.
timshen added a subscriber: llvm-commits.
nadav added inline comments.Jul 29 2016, 12:12 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
6813

Is this the correct way to add this check? This looks like a check that needs to be a part of the legality check. Also, we should probably emit a debug message or something.

timshen updated this revision to Diff 66379.Aug 1 2016, 2:12 PM

Move the check to canVectorize().

timshen marked an inline comment as done.Aug 1 2016, 2:13 PM
nadav accepted this revision.Aug 1 2016, 4:13 PM
nadav edited edge metadata.
nadav added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
4628

I think that the new message is less informative. Please make this two separate checks. One for cycles and the other message for non-innermost.

I don't have any other comments.

This revision is now accepted and ready to land.Aug 1 2016, 4:13 PM
mkuper added a subscriber: mkuper.Aug 1 2016, 4:45 PM
mkuper added inline comments.
test/Transforms/LoopVectorize/pr28541.ll
4

Could you also please check that the loop doesn't get vectorized?

anemet added a subscriber: anemet.Aug 1 2016, 5:27 PM

Where this check should be is related to PR28374. I think it should be checked when we initially gather the inner most loops in addInnerLoop.

test/Transforms/LoopVectorize/pr28541.ll
2

Use -loop-vectorize rather than -O2.

4

+1 and if you pass -pass-remarks-missed=loop-vectorize you can also check that the diag message is emitted correctly

mkuper added a comment.Aug 1 2016, 9:54 PM

Where this check should be is related to PR28374. I think it should be checked when we initially gather the inner most loops in addInnerLoop.

I'm not sure. I mean, both ways make sense, but the way I see it is the inner-most "loop", in the sense that there are no loops (as opposed to cycles) living inside it. So adding it, and then performing a legality check makes sense to me.
What's the motivation for doing it in addInnerLoop? I don't see why that would improve reporting, and I think it'd be nicer to leave addInnerLoop straight-forward, and keep all of the negative conditions in the legality check.

Where this check should be is related to PR28374. I think it should be checked when we initially gather the inner most loops in addInnerLoop.

I'm not sure. I mean, both ways make sense, but the way I see it is the inner-most "loop", in the sense that there are no loops (as opposed to cycles) living inside it. So adding it, and then performing a legality check makes sense to me.
What's the motivation for doing it in addInnerLoop? I don't see why that would improve reporting, and I think it'd be nicer to leave addInnerLoop straight-forward, and keep all of the negative conditions in the legality check.

I guess it all depends on whether you think these checks are similar or not. I don't really see the difference between loops and cycles. For example, when I look at the testcase in C, I essentially see two nested loops even though the inner one is a bit weird (I guess it's multi-entry). I am not sure that it make sense to differentiate the two to the user either.

Thus I'd like to have a single place where we check for the inner-most loop property (with LoopInfo and beyond) and where we properly report back to the user. This is currently addInnerLoop. (The other place is dead before the patch, AFAICT.)

chandlerc edited edge metadata.Aug 1 2016, 11:08 PM

I guess it all depends on whether you think these checks are similar or not. I don't really see the difference between loops and cycles. For example, when I look at the testcase in C, I essentially see two nested loops even though the inner one is a bit weird (I guess it's multi-entry). I am not sure that it make sense to differentiate the two to the user either.

Thus I'd like to have a single place where we check for the inner-most loop property (with LoopInfo and beyond) and where we properly report back to the user. This is currently addInnerLoop. (The other place is dead before the patch, AFAICT.)

At a somewhat fundamental level, LLVM uses the term "loop" to mean "natural loop", and not any arbitrary cycle. Similarly, throughout LoopInfo and the loop pass manager we model "inner loop" as the innermost natural loop, regardless of whether there are cycles nested within it.

So while I don't have a really strong opinion about how the vectorizer is factored (I don't really work on it) I would expect an LLVM API spelled "addInnerLoop" to mean "add inner natural loop" and not "add inner cycle which happens to be a natural loop" or "add inner loop that doesn't contain a cycle".

It is possible we could work to change LLVM's terminology, but I think that'd be a pretty big shift. So, if you want to filter these things in "addInnerLoop" I'd suggest trying to find a more precise name.

At a somewhat fundamental level, LLVM uses the term "loop" to mean "natural loop", and not any arbitrary cycle. Similarly, throughout LoopInfo and the loop pass manager we model "inner loop" as the innermost natural loop, regardless of whether there are cycles nested within it.

So while I don't have a really strong opinion about how the vectorizer is factored (I don't really work on it) I would expect an LLVM API spelled "addInnerLoop" to mean "add inner natural loop" and not "add inner cycle which happens to be a natural loop" or "add inner loop that doesn't contain a cycle".

It is possible we could work to change LLVM's terminology, but I think that'd be a pretty big shift. So, if you want to filter these things in "addInnerLoop" I'd suggest trying to find a more precise name.

Sure, I would be certainly OK changing the name if that is misleading.

I look at this more from the POV of which one is conservatively more correct?

I think that in addInnerLoop, adding a loop that actually has another non-natural loop nested is more error-prone than not. I actually believe that we have other (inner-most) loop optimizations that have the same bug as the vectorizer that use a similar code to enumerate inner-most loops. Thus it would probably be better to have a central utility that gives you natural loops that are guaranteed to have an acyclic body (whatever the name is), i.e. inner-most in the most conservative sense.

timshen updated this revision to Diff 66867.Aug 4 2016, 3:54 PM
timshen edited edge metadata.

Updated tests, and pull out hasCycleInLoopBody.

timshen updated this revision to Diff 66870.Aug 4 2016, 4:00 PM
timshen marked 4 inline comments as done.

Add FileCheck invoke to the testcase.

I think that in addInnerLoop, adding a loop that actually has another non-natural loop nested is more error-prone than not. I actually believe that we have other (inner-most) loop optimizations that have the same bug as the vectorizer that use a similar code to enumerate inner-most loops. Thus it would probably be better to have a central utility that gives you natural loops that are guaranteed to have an acyclic body (whatever the name is), i.e. inner-most in the most conservative sense.

Maybe we can do this in another patch, and have an exhaustive discussion there? I moved hasCycleInLoopBody out, so that the intended change to addInnerLoop is one-line.

mkuper added a comment.Aug 4 2016, 4:12 PM

The LV change LGTM, although if anemet strongly prefers to move this out of the legality check, and into addInnerLoop, I can live with that.

I really don't know enough about graph traits to pass judgement on that. One thing I can say is that I have the nagging suspicion LoopBodyTraits should live somewhere closer to LoopInfo, and not in LoopVectorizer.cpp.

test/Transforms/LoopVectorize/pr28541.ll
26

This doesn't look like it ought to work if you're only passing -loop-vectorize to opt.
Presumably, to get any kind of output except IR, you'd need either -debug (in which case you'd have to require asserts), or, as Adam suggested, -pass-remarks-missed. I agree with Adam - what you want is pass-remarks, and to check that you get "the loop body has a cycle".

mkuper added inline comments.Aug 4 2016, 4:13 PM
test/Transforms/LoopVectorize/pr28541.ll
26

Sorry, ignore this, commented before the last change. :-)

The LV change LGTM, although if anemet strongly prefers to move this out of the legality check, and into addInnerLoop, I can live with that.

Thanks, Michael. Yes, that would be my preference. We need to change the name though to avoid the ambiguity with loop <-> natural loop. AddLoopWithAcyclicBody?

I really don't know enough about graph traits to pass judgement on that. One thing I can say is that I have the nagging suspicion LoopBodyTraits should live somewhere closer to LoopInfo, and not in LoopVectorizer.cpp.

This is new to me too but looks pretty interesting so will stare at it more tomorrow.

timshen updated this revision to Diff 67001.Aug 5 2016, 1:09 PM

addInnerLoop -> addAcyclicInnerLoop

anemet added inline comments.Aug 6 2016, 5:35 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
224–226

I think that we want to say it here a bit more explicitly that the reason you skip the loop header so that the backedges are excluded from the graph.

230

Add a comment that this is wrapped in order to 'package' a constant Loop inside the iterator.

246

I don't think we use 'this' in such cases.

249–254

Can this be a predicate function instead?

262–265

Actually a more basic question. Why can't we filter succ_iterator directly i.e. just by passing the loop to LoopBodyFilter here?

1921–1922

This would be the place to emit the optimization remark (emitAnalysis)

4632–4635

You no longer need this.

test/Transforms/LoopVectorize/pr28541.ll
2

Why did you change from -pass-remaks-missed to -pass-remarks? I prefer positive checks over negative ones.

I think that we also want to check the particular reason for failing to vectorize, so please also pass -pass-remarks-analysis=loop-vectorize and check for that string as well.

timshen updated this revision to Diff 67250.Aug 8 2016, 5:01 PM
timshen marked 3 inline comments as done.

Updated comments for LoopBodyTraits.

lib/Transforms/Vectorize/LoopVectorize.cpp
249–254

If the predicate is a function, then to use it we have to pass it around as a function pointer, which is more expensive because of the dynamic dispatch. If we have a huge loop body it may end up with effecting the performance.

262–265

Because we want to do the filtering lazily, don't we? If we filter succ_iterator here by LoopBodyFilter, we have to create a container to hold the filtered results (so that later operator++() and operator*() can inspect those results), which is less efficient. I went for the efficient way without sacrifice the readability (?).

1921–1922

There are two problems revealed by this addInnerLoop -> addAcyclicInnerLoop change.

First, since addAcyclicInnerLoop gets called in LoopVectorizePass::runImpl, where no report/analysis infrastructure is set up, it's pretty hard to call emitAnalysis here.

Secondly, a loop with a cycle in its body never gets into LoopVectorizePass::processLoop, so nothing is logged through -pass-remarks-missed. That's why I have to check for -pass-remarks in the test case.

4632–4635

Could LoopVectorizationLegality be used in anywhere else, where addAcyclicInnerLoop doesn't guard?

test/Transforms/LoopVectorize/pr28541.ll
2

Why did you change from -pass-remaks-missed to -pass-remarks? I prefer positive checks over negative ones.

See the comments for addAcyclicInnerLoop.

hans added a subscriber: hans.Aug 11 2016, 4:03 PM

Who is this blocked on? anemet?

In D22952#513286, @hans wrote:

Who is this blocked on? anemet?

Blocked by D22951, so dblaikie. :)

Blocked by D22951, so dblaikie. :)

Sorry, dblaikie and anemet. I post some comments in the last round.

I'll look at this today. @hans, is this for 3.9 or what's the rush?

hans added a comment.Aug 11 2016, 4:09 PM

I'll look at this today. @hans, is this for 3.9 or what's the rush?

Yes, the PR is a 3.9 blocker. There's no immediate rush, I'm just trying to make sure things keep moving along :-)

anemet added inline comments.Aug 11 2016, 6:06 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
262–265

I am not sure we're talking about the same thing. I meant:

return make_filter_range(make_range<succ_iterator>(
                                          succ_begin(Node.second),
                                          succ_end(Node.second)),
                                        LoopBodyFilter(Node.first))

and then:

struct LoopBodyFilter {
  LoopBodyFitler(Loop *L) : L(L) {}
  bool operator()(BasicBlock *BB) const {
    return BB != L->getHeader() && L->contains(BB);
  }
  Loop *L;
};

?

In other words, it does not seem any different to me whether you attach the loop to the succ_iterator or the filter object.

anemet requested changes to this revision.Aug 12 2016, 10:57 AM
anemet added a reviewer: anemet.
anemet added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
1921–1922

It seem to me that nothing prevents us from letting all loops go all the way until Legality.canVectorize(). Then we can check this property as the very first thing in there.

If that does not work we can check this earlier in processLoop.

The part we're missing to report diagnostics in addAcyclicInnerLoop is the Hints which provides the pass name. Alternatively, we can also construct Hints in addAcyclicInnerLoop. This is probably the least preferred alternative though because then you need to propagate this back to processLoop.

This revision now requires changes to proceed.Aug 12 2016, 10:57 AM
timshen marked an inline comment as done.Aug 12 2016, 1:49 PM
timshen added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
1921–1922

It seem to me that nothing prevents us from letting all loops go all the way until Legality.canVectorize(). Then we can check this property as the very first thing in there.

I think the point of addInnerLoop, rather than addAllLoops, is to make the Worklist smaller. So if we want to let all loops go all the way until Legality::canVectorize(), there is a potential compilation performance change and also an analysis change.

I'm fine with changing addInnerLoop to addInnerAcyclicLoop, because it more or less is related to this patch, and it doesn't have a performance impact; but changing the "take innermost loop only" assumption of LoopVectorizePass::processLoop is too much.

I think your concern should be addressed in a separate patch.

anemet added inline comments.Aug 12 2016, 2:49 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
1921–1922

OK. There is no regression in the optimization diagnostic quality with this patch so I am fine with doing it later.

4632–4635

No.

test/Transforms/LoopVectorize/pr28541.ll
3

Then please add a FIXME about this.

timshen updated this revision to Diff 67923.Aug 12 2016, 3:32 PM
timshen edited edge metadata.

Added FIXME into the test.

timshen marked 3 inline comments as done.Aug 12 2016, 3:33 PM
timshen added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
4632–4635

Removed.

anemet accepted this revision.Aug 12 2016, 3:38 PM
anemet edited edge metadata.

LGTM with the change below.

lib/Transforms/Vectorize/LoopVectorize.cpp
4626–4630

Please add a FIXME, that this code is dead too as we discovered (unless you will remove it in a quick follow-up).

This revision is now accepted and ready to land.Aug 12 2016, 3:38 PM
timshen updated this revision to Diff 67927.Aug 12 2016, 3:44 PM
timshen marked an inline comment as done.
timshen edited edge metadata.

Added a FIXME into canVectorize() on innertmost loop check.

timshen marked an inline comment as done.Aug 12 2016, 3:46 PM
timshen added inline comments.
lib/Transforms/Vectorize/LoopVectorize.cpp
4626–4630

I added a FIXME, since it's unclear to me if it's a good idea to remove things from canVectorize() - this function seems to do all kinds of centralized checking, regardless of the caller side situation.

I'll leave the potential actual removal in the future, possibly after a discussion.

timshen marked an inline comment as done.Aug 12 2016, 3:47 PM
anemet added inline comments.Aug 12 2016, 3:49 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
4626–4630

OK.

For the record, there is already the PR I quoted above about not reporting anything for non-inner-most loops.

This revision was automatically updated to reflect the committed changes.

Thank you all for reviewing!