This is an archive of the discontinued LLVM Phabricator instance.

Fix a stack overflow in ScalarEvolution.
ClosedPublic

Authored by jreiffers on Jul 14 2022, 2:06 AM.

Details

Summary

Unfortunately, this overflow is extremely hard to reproduce reliably (in fact, I was unable to do so). The issue is that:

  • getOperandsToCreate sometimes skips creating an SCEV for the LHS
  • then, createSCEV is called for the BinaryOp
  • ... which calls getNoWrapFlagsFromUB
  • ... which under certain circumstances calls isSCEVExprNeverPoison
  • ... which under certain circumstances requires the SCEVs of all operands

For certain deep dependency trees, this causes a stack overflow.

Diff Detail

Event Timeline

jreiffers created this revision.Jul 14 2022, 2:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2022, 2:06 AM
jreiffers requested review of this revision.Jul 14 2022, 2:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2022, 2:06 AM
nikic added a subscriber: nikic.

Before doing something like this, we should evaluate compile-time impact of dropping this optimization entirely.

jreiffers added a comment.EditedJul 14 2022, 2:17 AM

Before doing something like this, we should evaluate compile-time impact of dropping this optimization entirely.

Happy to do that. programUndefinedIfPoison looks fairly expensive on its own. I'm fairly new to this project, do you have a pointer what to run? Also, unfortunately some tests appear to depend on this optimization.

It would also be great if I could drop the optimization in a follow-up, since this bug is currently causing production issues for us (and I'm not sure how long the tests will take to fix).

fhahn added a comment.Jul 14 2022, 8:35 AM

Before doing something like this, we should evaluate compile-time impact of dropping this optimization entirely.

Happy to do that. programUndefinedIfPoison looks fairly expensive on its own. I'm fairly new to this project, do you have a pointer what to run? Also, unfortunately some tests appear to depend on this optimization.

It would also be great if I could drop the optimization in a follow-up, since this bug is currently causing production issues for us (and I'm not sure how long the tests will take to fix).

Would it be possible to share a reproducer? Is this a new stack overflow?

jreiffers added a comment.EditedJul 14 2022, 9:13 AM

Would it be possible to share a reproducer? Is this a new stack overflow?

Here's what I've been using:

std::string replace_all(const std::string &s, const std::string &needle,
                        const std::string &replacement) {
  std::size_t prev = 0, pos;
  std::string out = s;
  while ((pos = out.find(needle, prev)) != std::string::npos) {
    prev = pos;
    out.replace(pos, needle.length(), replacement);
  }
  return out;
}

TEST_F(ScalarEvolutionsTest, StackOverflowForDeepExpression) {
#define MAX 70
#define MAX_STR "70"

  std::string kLoopBodyTemplate = R"(
    %a${n} = getelementptr inbounds i8, ptr %0, i64 14794768
    %b${n} = getelementptr inbounds [1 x [8192 x float]], ptr %a${n}, i64 0, i64 0, i64 %i${n-1}
    %c${n} = load float, ptr %b${n}, align 4
    %d${n} = getelementptr inbounds [1 x [8192 x [2 x float]]], ptr %1, i64 0, i64 0, i64 %i${n-1}, i64 0
    store float %c${n}, ptr %d${n}, align 8
    %e${n} = getelementptr inbounds i8, ptr %1, i64 14762000
    %f${n} = getelementptr inbounds [1 x [8192 x float]], ptr %e${n}, i64 0, i64 0, i64 %i${n-1}
    %g${n} = load float, ptr %f${n}, align 4
    %h${n} = getelementptr inbounds [1 x [8192 x [2 x float]]], ptr %1, i64 0, i64 0, i64 %i${n-1}, i64 1
    store float %g${n}, ptr %h${n}, align 4
    %i${n} = add nuw nsw i64 %i${n-1}, 1
    %j${n} = icmp eq i64 %i${n}, 8192
  )";

  std::string ModuleCode = R"(
    define i32 @foo() { 
    while.171.exit:
      br label %fusion.286.loop_header.dim.2.preheader
    fusion.286.loop_header.dim.2.preheader:           ; preds = %fusion.286.loop_header.dim.2.preheader, %while.171.exit
      %i0 = phi i64 [ 0, %while.171.exit ], [ %i)" MAX_STR
                              R"(, %fusion.286.loop_header.dim.2.preheader ]
      %0 = inttoptr i32 0 to ptr
      %1 = inttoptr i32 0 to ptr
  )";

  for (int i = 1; i <= MAX; ++i) {
    ModuleCode += replace_all(replace_all(kLoopBodyTemplate, "${n}", itostr(i)),
                              "${n-1}", itostr(i - 1));
  }

  ModuleCode +=
      "br i1 %j" MAX_STR
      R"(, label %fusion.286.loop_exit.dim.0, label %fusion.286.loop_header.dim.2.preheader
  fusion.286.loop_exit.dim.0:
    ret i32 0
  }
  )";

  LLVMContext C;
  SMDiagnostic Err;
  std::unique_ptr<Module> M = parseAssemblyString(ModuleCode, Err, C);

  ASSERT_EQ(Err.getMessage(), "");

  struct rlimit rl;
  ASSERT_EQ(getrlimit(RLIMIT_STACK, &rl), 0);

  rl.rlim_cur = 1;
  ASSERT_EQ(setrlimit(RLIMIT_STACK, &rl), 0);

  ASSERT_TRUE(M && "Could not parse module?");
  ASSERT_TRUE(!verifyModule(*M, &llvm::errs()) &&
              "Must have been well formed!");

  runWithSE(*M, "foo", [&](Function &F, LoopInfo &LI, ScalarEvolution &SE) {
    auto *ScevIV = SE.getSCEV(getInstructionByName(F, "i63"));
    ASSERT_NE(ScevIV, nullptr);
  });
}

But it's not very reliable, you may have to run it a few times and/or fiddle with MAX. For me it fails about 30% of the time.

fhahn added a comment.Jul 14 2022, 9:16 AM

Thanks, I'll give that a try!

But it's not very reliable, you may have to run it a few times and/or fiddle with MAX. For me it fails about 30% of the time.

This seems very surprising. I'd expect the stack overflow to reproduce consistently as each run should produce the same call stack with the same size (at least if the depth is deep enough). There's probably some non-determinism somewhere that should also be fixed.

nikic added a comment.Jul 14 2022, 9:19 AM

Before doing something like this, we should evaluate compile-time impact of dropping this optimization entirely.

Happy to do that. programUndefinedIfPoison looks fairly expensive on its own. I'm fairly new to this project, do you have a pointer what to run? Also, unfortunately some tests appear to depend on this optimization.

Ah sorry, my wording here was really ambiguous: By "optimization" I was referring to the compile-time optimization of combining operands from multiple add/muls, rather than just creating pairwise expressions. The preservation of nowrap flags via UB reasoning is important, and not something that can be dropped.

Ah sorry, my wording here was really ambiguous: By "optimization" I was referring to the compile-time optimization of combining operands from multiple add/muls, rather than just creating pairwise expressions. The preservation of nowrap flags via UB reasoning is important, and not something that can be dropped.

I also meant just trying to drop the special-casing of add and mul in getOperandsToCreate. I didn't run the tests myself, but bkramer mentioned that doing this (i.e. always enqueueing LHS and RHS here) broke some tests.

This seems very surprising. I'd expect the stack overflow to reproduce consistently as each run should produce the same call stack with the same size (at least if the depth is deep enough). There's probably some non-determinism somewhere that should also be fixed.

I agree, it's somewhat weird. However, I was running this on our internal test infrastructure, so the tests may have been running on different architectures/CPUs. Maybe that plays a role.

Friendly ping on this.

jreiffers updated this revision to Diff 447216.Jul 25 2022, 1:23 AM

Rebase changes.

bkramer accepted this revision.Aug 1 2022, 11:05 AM

Can we land this? The deep recursion is actively causing trouble for LLVM users by crashing the compiler. Even if it's not a perfect fix it's still better than overflowing the stack.

This revision is now accepted and ready to land.Aug 1 2022, 11:05 AM
fhahn added a comment.Aug 1 2022, 2:06 PM

Yeah, let me try to reproduce this locally tomorrow and land it over wise. @jreiffers could you share the interesting bit of the stack-trace causing the overflow?

@fhahn The recursion loop is createSCEV -> isSCEVExprNeverPoison -> createSCEVIter -> createSCEV (see also the summary of this change).

fhahn accepted this revision.Aug 2 2022, 2:29 PM

LGTM thanks! I managed to reproduce this locally with much larger value for MAX & co. It looks like avoiding the optimization is the only option here, as createSCEV itself will create SCEVs for all intermediate binary expressions in those cases......

It might be good to adjust the commit message to be a bit more precise about the actual fix.

Thanks for the reviews and discussion!

This revision was landed with ongoing or failed builds.Aug 3 2022, 2:08 AM
This revision was automatically updated to reflect the committed changes.
frasercrmck added inline comments.
llvm/lib/Analysis/ScalarEvolution.cpp
7318

Correct me if I'm wrong but we're not yet able to use C++17 features. This is causing a build bot failure: https://lab.llvm.org/buildbot#builders/77/builds/20381

fhahn added inline comments.Aug 5 2022, 3:40 AM
llvm/lib/Analysis/ScalarEvolution.cpp
7318

@jreiffers what's the status of this? Does the commit need reverting?

fhahn added a comment.Aug 5 2022, 3:40 AM

It might be good to adjust the commit message to be a bit more precise about the actual fix.

It would have been good to improve the commit title....

It might be good to adjust the commit message to be a bit more precise about the actual fix.

It would have been good to improve the commit title....

Sorry, missed that. Also, I don't know what improvement you had in mind. I find it pretty clear.

llvm/lib/Analysis/ScalarEvolution.cpp
7318
fhahn added a comment.Aug 5 2022, 4:10 AM

It might be good to adjust the commit message to be a bit more precise about the actual fix.

It would have been good to improve the commit title....

Sorry, missed that. Also, I don't know what improvement you had in mind. I find it pretty clear.

It's too generic, it should ideally be more clear about which stack overflow was fixed for example.

It's too generic, it should ideally be more clear about which stack overflow was fixed for example.

There's a description of what causes the stack overflow in the summary. I don't really have anything to add to that.