This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Guard movement of insertion point for loop-invariants
ClosedPublic

Authored by wristow on Nov 19 2018, 10:37 AM.

Details

Summary

r320789 suppressed moving the insertion point of SCEV expressions with dev/rem operations to the loop header in non-loop-invariant situations. The hoisitng is also unsafe in the loop-invariant case, since there may be a guard against a zero denominator. This is an adjustment to the fix of r320789 to suppress the movement even in the loop-invariant case.

This fixes PR30806.

Diff Detail

Event Timeline

wristow created this revision.Nov 19 2018, 10:37 AM
skatkov added inline comments.Nov 22 2018, 10:11 PM
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

Should this part be fixed as well?

wristow marked an inline comment as done.Nov 26 2018, 11:40 AM
wristow added inline comments.
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

It does seem like it should. But I couldn't come up with a C/C++ test-case that exercised that area. That, along with the comment about "even though it's an invalid position" made me hesitant to touch this area.

Assuming this part should also be guarded, then since the SafeToHoist lambda is depends only on the SCEV S itself (rather than it being control-flow-sensitive, for example), we could move the guard up above the Loop *L loop that begins on line 1735. Do you agree? (Although if we ever intended for SafeToHoist to be more sophisticated and analyze specific test-conditions to potentially hoist over, then leaving the checks inside the loop leaves us in a better place for that potential improvement. Personally, I lean toward moving the SafeToHoist guard outside the Loop *L loop.)

skatkov added inline comments.Nov 26 2018, 6:14 PM
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

I guess you cannot find a reproducer that case just because usually there is always a loop preheader.
In other words to reproduce this case you need a pass which uses this utility function when there is no a loop pre-header for the loop. It might be a dead code right now.

You can add a logging print to catch a case when this "else" is executed and run bunch of tests. If you find a case when it is triggered you can write a case after that.

wristow added inline comments.Nov 26 2018, 6:23 PM
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

I think you're right about there always being a loop pre-header when starting from C/C++. I'll do the experimenting you suggest with a logging-print to see whether that code is exercised in any tests I have. I haven't tried, but I imagine I could start with a .ll test-case with a loop pre-header, and manually tweak it to not have one (and so then it would exercise that code). I'll play with that a bit, too.

skatkov added inline comments.Nov 26 2018, 6:29 PM
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

It is a good idea to start with an ll file with pre-header and then eliminate it. Be careful with loop passes. They should build LCSSA form first and pre-header will be installed automatically. However loop vectorizer is a functional pass so you should be ok to play with it.

BTW you can even try the test from this CL. Just eliminate pre-header (do two predecessors for the current loop header) and check what happens.

wristow marked an inline comment as done.Nov 27 2018, 11:07 PM
wristow added inline comments.
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

Assuming this part should also be guarded, then since the SafeToHoist lambda is depends only on the SCEV S itself (rather than it being control-flow-sensitive, for example), we could move the guard up above the Loop *L loop that begins on line 1735. Do you agree?

Actually, that isn't true. Moving the SafeToHoist check above the Loop *L loop as a whole is slightly different semantically. So I'm abandoning that thought.

skatkov added inline comments.Nov 27 2018, 11:10 PM
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

What exactly do you mean by "slightly different semantically"?

wristow marked an inline comment as done.Nov 27 2018, 11:15 PM
wristow added inline comments.
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

I've experimented with this, and I'm unable to create a test-case that exercises that code when vectoriztion is enabled.

When I manually create a .ll file with a loop that doesn't have a preheader, and I enable vectorization (via opt -loop-vectorize), a preheader is always created in the resulting IR. Looking into it, I see that the Loop Canonicalization Pass (in "LoopSimplify.cpp") creates the preheader (and other related) blocks. (As an aside, if I tweak "LoopSimplify.cpp" to disable that pass to experiment with this, then assertions fail due to unexpected block structure (e.g., Entry blocks not dominating Exit blocks).)

As a couple of side points: If I disable the setting of InsertPt in that "loop Preheader is null" case, all the LLVM tests pass. But on the other hand, that code is not 100% dead, in that I see it gets exercised for three LLVM tests:

LLVM :: CodeGen/Hexagon/loop-prefetch.ll
LLVM :: CodeGen/PowerPC/loop-data-prefetch-inner.ll
LLVM :: CodeGen/PowerPC/ppc64-get-cache-line-size.ll

None of those tests have vectorizaiton enabled (they all run llc, and look for some cache/prefetch-related instruction).

Looking at the history of when that "loop Preheader is null" code was added, it was done back in revision 147439 (in early 2012). That commit included adding a new test:

llvm/test/Transforms/LoopStrengthReduce/2012-01-02-nopreheader.ll

but that test-case no longer exercises that block of code, and so of course it passes with it disabled. (There are a couple other similar tests that were added later, and they also don't exercise that block of code.)

The bottom line is that other than the three prefetch-related tests (which don't enable vectorization), I don't have any test-cases that exercise that code at all, much less demonstrate how/whether it is needed (again, those three prefetch-related tests that do exercise that "loop Preheader is null" code all still pass, even if that code is disabled). And in particular, it appears to be impossible to create a test with vectorization enabled that exercises that code. So I'm hesitant to make a change to that code (by guarding it with SafeToHoist). Someone with more experience in this area may be able to come up with a test-case (or justify also guarding it, without a test-case -- or explain why it should not be guarded). But I think that additional step can be done separately from this patch.

wristow marked an inline comment as done.Nov 27 2018, 11:19 PM
wristow added inline comments.
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

The while loop in the non-loop-invariant portion currently gets executed irrespective of whether SafeToHoist is true. If we move the SafeToHoist guard above the entire Loop *L loop, that while loop will not be executed when SafeToHoist is false.

skatkov added inline comments.Nov 27 2018, 11:38 PM
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

Well, may be I'm missing something....

SafeToHoist does not depend on L. We can hoist check for SafeToHoist from the for loop.
Let's compare the semantic if it returns false:
In the new case we will not do anything.
In the old case:
You will not be able to modify InsertPt if SafeToHoist==false and case when there is no pre-header should be fixed as well.
As soon as InsertPt is not changed it is an equal to &*Builder.GetInsertPoint();
So the first condition in a while loop you mentioned will be false and while loop will not be executed.

So I do not see any difference. Do I miss anything?

mkazantsev added inline comments.
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

I agree with Serguei that the case of a loop without preheader should also be fixed. I don't see any way how presence of preheader affects the bug being fixed. The simple example that comes to my mind is following:

void foo(int x, int y) {
    for (int i = 0; i < N; i++) {
      call foo(y); // Executes system.exit if y == 0.
      // Expansion point
    }
}

We request SCEV Expander to expand x / y at the expansion point. Provided that both x and y are loop-invariant, this logic should pick loop header's first instruction as the insertion point. This is incorrect, because if y == 0, the execution should not reach the expansion point and we should never divide by zero. If we insert in loop header, this will happen.

I guess it is hard (or impossible) to write an opt test on a loop which will have no preheader, because in practice all loop passes forcefully create it. You can write a C++ unit test on that. Please find examples at unittests/Analysis/ScalarEvolutionTest.cpp.

wristow marked an inline comment as done.Nov 28 2018, 9:55 AM
wristow added inline comments.
llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1734–1735

First, thanks for all that feedback!

About the semantics of moving the guard to outside the loop:

Well, may be I'm missing something....
...
So the first condition in a while loop you mentioned will be false and while loop will not be executed.

So I do not see any difference. Do I miss anything?

You're absolutely right, Serguei. I missed the point that the first condition of the while loop is guaranteed to be false if InsertPt hasn't previously been modified. With that, then assuming everyone is happy with also guarding the case without the loop preheader, then we can move the SafeToHoist guard out of the loop completely.

Regarding Max's comment:

I agree with Serguei that the case of a loop without preheader should also be fixed. I don't see any way how presence of preheader affects the bug being fixed. The simple example that comes to my mind is following:
...

I definitely agree in principle that the lack of a preheader wouldn't make a dangerous hoist safe. My concern is that I was unable to create a test-case that didn't have a preheader, and so I couldn't demonstrate what that code was doing. And more directly, the comment in that block of code:

// LSR sets the insertion point for AddRec start/step values to the
// block start to simplify value reuse, even though it's an invalid
// position. SCEVExpander must correct for this in all cases.

seems to be saying "InsertPt is being set incorrectly here on purpuse, and SCEVExpander is going to fix it". That made me uncomfortable that there is some delicate balance happening here. And since I couldn't come up with a test-case that demonstrates that delicate balance (and since I'm not experienced in this code), I was hesitant to disrupt what might be a delicate balance. But since you two are experienced in this code, and are not concerned there is a delicate balance being disrupted by also guarding that code, I'm happy to take your advice on that.

To clarify my point of my difficulty of coming up with a test-case, if I take Max's test-case and compile it to a .ll file (at -O1), there will be a loop preheader. It's straightforward to edit it to remove the preheader, but then if I run opt -loop-vectorize on it, the Loop Canonicalization Pass in "LoopSimplify.cpp" re-creates the preheader. So frustratingly for me, that "no loop preheader" block of code still doesn't get exercised.

In short: I'm definitely fine re-doing this patch to also guard that non-loop-preheader portion, given that you two aren't concerned it's breaking some delicate balance. And in re-doing it, I'll move the SafeToHoist guard outside the loop entirely.

wristow updated this revision to Diff 175737.Nov 28 2018, 11:46 AM

Have the SafeToHoist check guard the entire hoisting loop, as discussed in the review.

wristow updated this revision to Diff 175740.Nov 28 2018, 11:55 AM

Tabs were accidentally added in the previous updated patch. This update just removes those tabs.

skatkov accepted this revision.Nov 28 2018, 6:59 PM

LGTM. If Max or Sanjoy does not object in a, say, one day, feel free to land.

llvm/lib/Analysis/ScalarEvolutionExpander.cpp
1765

you can remove {}

1776

you can remove {}. I know it is not your code but probably makes sense.

This revision is now accepted and ready to land.Nov 28 2018, 6:59 PM

Thanks for the review. I'll hold off for a day or so before committing, to give Max and Sanjoy an opportunity to object. (And I'll remove the {} that you suggested, before committing.)

To clarify my comment about test: I've given an example of how it conceptually should look. Unit tests provide you opportunity to construct exactly the CFG you want, not relying on what opt, clang or some canonicalization passes are doing. If you take a look at unittests/Analysis/ScalarEvolutionTest.cpp, there are examples of that:

// Check that SCEV does not save the SCEV -> V
// mapping of SCEV differ from V in NSW flag.
TEST_F(ScalarEvolutionsTest, SCEVCacheNSW) {
  /*
   * Create the following code:
   * func(i64 %a)
   * entry:
   *  %s1 = add i64 %a, -1
   *  %s2 = add nsw i64 %a, -1
   *  br label %exit
   * exit:
   *  ret %s
   */

  // Create a module.
  Module M("SCEVCacheNUW", Context);

  Type *T_int64 = Type::getInt64Ty(Context);

  FunctionType *FTy =
      FunctionType::get(Type::getVoidTy(Context), { T_int64 }, false);
  Function *F = cast<Function>(M.getOrInsertFunction("func", FTy));
  Argument *Arg = &*F->arg_begin();

...

You can construct the loop without preheader in this way. You can then explicitly ask expander to expand something to the point you want (there are examples of that in this file).

I am OK if you merge this patch as is under condition that you will later construct the unit test for non-preheader case, and if it will fail, fix it as well.

skatkov added inline comments.Nov 28 2018, 9:39 PM
llvm/test/Transforms/LoopVectorize/pr30806.ll
8

As a side comment, you can consider updating your pipeline with adding a loop unswitch pass somewhere before vectorization. This invariant check should be trivially hoisted from the outer loop.
You'll get better code and will not meet this bug probably :)

@mkazantsev: Thanks for clarifying about the Unit test approach. I'll create a Unit test to check on the non-preheader case as a followup commit.

@skatkov: About the loop unswitching suggestion, that sounds like a fine idea to me. But definitely outside the scope of this fix. (As an aside, other than a compiler-crash during vectorization (due to a deref of a null pointer), this is my first experience in the llvm vectorizer. So I'm not familiar with how to go about adding the improvement you suggested. It's experience I'd like to gain, so I'll consider it as possible future work. But to make sure I'm not misleading you, I don't have time to work on this in the immediate future.)

@skatkov: About the loop unswitching suggestion, that sounds like a fine idea to me. But definitely outside the scope of this fix.

Absolutely agreed. It was just a sid comment.

(As an aside, other than a compiler-crash during vectorization (due to a deref of a null pointer), this is my first experience in the llvm vectorizer. So I'm not familiar with how to go about adding the improvement you suggested.

I'm not sure whether you use a custom pipeline or not. But my point was that code came to vectorization pass seems strange because trivial loop unswitch optimization (provided by LoopUnswitch pass) could hoist the invariant check. By some reason it did not happen.

It's experience I'd like to gain, so I'll consider it as possible future work. But to make sure I'm not misleading you, I don't have time to work on this in the immediate future.)

Not a problem at all.

I'm not sure whether you use a custom pipeline or not. But my point was that code came to vectorization pass seems strange because trivial loop unswitch optimization (provided by LoopUnswitch pass) could hoist the invariant check. By some reason it did not happen.

To be clear, I'm not using a custom pipeline. I just happen to have a customer who reported a runtime crash because of an integer-div-by-0 (compiled simply as clang -O2), and that led me to PR30806. Which led me to looking into the vectorizer,.

All that said, your comment about it should have been caught by the LoopUnswitch pass, now makes me more optimistic about tackling that. Still not committing that I'll get to it soon, but it seems less daunting now. :)

This revision was automatically updated to reflect the committed changes.

This fix (rL347934) was reverted at rL348426 because it was causing SEGVs in instcombine. As noted in D55232, a similar fix as this one was originally committed about a year ago (rL289412), but it was reverted shortly thereafter at rL289453, because of OOB PHI operand access in instcombine. The cause of the SEGVs here may be the same underlying issue.