This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Fix getAddExpr for adding loop invariants into start of some AddRec
AbandonedPublic

Authored by skatkov on Jul 27 2021, 12:29 AM.

Details

Summary

getAddExpr utility uses computed flags from AddRec plus loop invariants
in AddRec.Start + loop invariants basing on the an assumption that
0th iteration of the loop exists. However the loop might be dead
(runtime or compile time), in this case the propagation of the flag becomes
invalid.

To fix the bug we need to ensure that 0th iteration happens or use less
strict set of flags.
The patch uses the check for dominating all latches from used loops to ensure
that 0th iteration exists.

Diff Detail

Event Timeline

skatkov created this revision.Jul 27 2021, 12:29 AM
skatkov requested review of this revision.Jul 27 2021, 12:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 27 2021, 12:29 AM

When you're checking for is0thIterationGuaranteed, do you also need to check for abnormal exits from the loop?

What happens if you just assume is0thIterationGuaranteed is always false? I'm not sure all this work to try to prove the flags is worth doing.

When you're checking for is0thIterationGuaranteed, do you also need to check for abnormal exits from the loop?

This is a good question. If I understand your question correctly you are asking about abnormal exit in outer loop,
before entering the current loop. Probably it is a problem.

What happens if you just assume is0thIterationGuaranteed is always false? I'm not sure all this work to try to prove the flags is worth doing.

At least no-one LLVM test is failing. Do you propose to just always say that we cannot guarantee 0th iteration?
I'll update the patch. If it will cause any performance issue we can return to discussion how to detect that 0th iteration is possible.

skatkov added a comment.EditedJul 27 2021, 9:14 PM

What happens if you just assume is0thIterationGuaranteed is always false? I'm not sure all this work to try to prove the flags is worth doing.

At least no-one LLVM test is failing. Do you propose to just always say that we cannot guarantee 0th iteration?
I'll update the patch. If it will cause any performance issue we can return to discussion how to detect that 0th iteration is possible.

ok I was too optimistic, I've got 15 failures which will look into

Failed Tests (15):
  LLVM :: Analysis/LoopAccessAnalysis/number-of-memchecks.ll
  LLVM :: Analysis/LoopAccessAnalysis/reverse-memcheck-bounds.ll
  LLVM :: Analysis/ScalarEvolution/flags-from-poison.ll
  LLVM :: Analysis/ScalarEvolution/incorrect-exit-count.ll
  LLVM :: Analysis/ScalarEvolution/max-backedge-taken-count-guard-info.ll
  LLVM :: Analysis/ScalarEvolution/no-wrap-symbolic-becount.ll
  LLVM :: Analysis/ScalarEvolution/nsw-offset-assume.ll
  LLVM :: Analysis/ScalarEvolution/nsw-offset.ll
  LLVM :: Analysis/ScalarEvolution/nsw.ll
  LLVM :: Analysis/ScalarEvolution/ptrtoint.ll
  LLVM :: Analysis/ScalarEvolution/range_nw_flag.ll
  LLVM :: CodeGen/ARM/ParallelDSP/pr42729.ll
  LLVM :: Transforms/LoopIdiom/basic.ll
  LLVM :: Transforms/LoopStrengthReduce/X86/expander-crashes.ll
  LLVM :: Transforms/LoopVectorize/runtime-check-pointer-element-type.ll

Ok, it happened due to I disabled also the case when Start + Invariants has no AddRecs inside.

skatkov updated this revision to Diff 362304.Jul 28 2021, 1:40 AM

please take a look.

This comment was removed by mkazantsev.
efriedma added inline comments.Jul 30 2021, 12:28 PM
llvm/lib/Analysis/ScalarEvolution.cpp
2791

I'm not sure I see the connection between UsedLoops.size() and the treatment of nowrap flags.

I think that the first version of this patch was better. The only problem that was confusing is naming. In fact, is0thIterationGuaranteed should be is0thIterationDominatedByThis. It is totally fine if the 0th iteration of outer loop does not happen (because of abnormal exit or whatsoever). The important bit that it cannot happen withou execution of the given Add. And therefore the no-wrap flags on the addrec may be inferred from the fact that this add has executed.

I suggest returning the initial solution with this renaming.

@efriedma WDYT?

A SCEV add being marked nsw means, essentially, that at any point in the IR where all the operands are defined, the add doesn't wrap.

The problem here is that we're splitting the add into pieces. a+b+1 being nsw doesn't necessarily imply that a+1 is nsw in all contexts; the nsw only applies in contexts where b is defined. This is a little subtle, but it's the only interpretation that's consistent with some of the ways we try to prove nsw flags.

(The definition point for a SCEV is either the point of definition for a SCEVUnknown, or the beginning of the loop header of an AddRec.)

But we can extend the logic a little. If a+1 is nsw in contexts where b is defined, it's also nsw in any context that must eventually flow to the definition of b. If the definition of a is such a context, a+1 is always nsw.

So, for example, if "a" is an addrec for a loop, and "b" is an addrec for a loop nested inside the first loop, and the control flow is simple, nsw on "a+b+1" implies nsw for "a+1". Abnormal exits break this sort of proof, though: we assumed control must eventually flow to b's loop header.

Also, this is possibly an argument for changing the way nsw flags work. The amount of work we do to try to translate simple IR markings in SCEV is getting a bit crazy.

The problem here is that we're splitting the add into pieces. a+b+1 being nsw doesn't necessarily imply that a+1 is nsw in all contexts; the nsw only applies in contexts where b is defined. This is a little subtle, but it's the only interpretation that's consistent with some of the ways we try to prove nsw flags.

I think this is a self-contradictory interpretation. What if we instead computed a+b+1-b <nsw> and then decided to simplify b away, getting a+1<nsw> and no hint that it's only for b's context?

The problem here is that we're splitting the add into pieces. a+b+1 being nsw doesn't necessarily imply that a+1 is nsw in all contexts; the nsw only applies in contexts where b is defined. This is a little subtle, but it's the only interpretation that's consistent with some of the ways we try to prove nsw flags.

I think this is a self-contradictory interpretation. What if we instead computed a+b+1-b <nsw> and then decided to simplify b away, getting a+1<nsw>

a+1 wouldn't be nsw? And in fact, that's what getAddExpr currently does. I don't see how that's a contradiction.

If we say that a+b+1-b <nsw> implies a+1<nsw>, and similarly say ({a,+,1} + b)<nsw> implies (a+b)<nsw>, that leads to a consistent system, I think. But that would imply the bug here isn't in getAddExpr at all; instead, getNoWrapFlagsFromUB() is fundamentally broken.

mkazantsev added a comment.EditedAug 4 2021, 9:13 PM

a+1 wouldn't be nsw? And in fact, that's what getAddExpr currently does. I don't see how that's a contradiction.

Imagine that the point where b is defined is also guarded by other conditions. In particular, this very place can be guarded by a != SINT_MAX. We could put the <nsw> only because we took all these guards into consideration. In this case, a+1+b-b<nsw> implies a+1<nsw> *only at the point where b is defined* and nowhere else.

So whenever b exists, it is legal to say a+1+b-b is nsw because of facts unrelated to value of b, but relevant to the specific point in code.

But there is no way to specify this, right?

But there is no way to specify this, right?

Exactly; we don't have any place to store nsw markings that only apply to a specific region of the function. This is also the reason that D106331 is so awkward.

Spent some time trying to get my head around this review, and how we generally handle flags on SCEV objects used in multiple contexts. https://github.com/preames/public-notes/blob/master/llvm-loop-opt-ideas.rst#scev-wrap-flags

My conclusion so far is that this patch is not a complete fix. Or at least, I found an analogous bit of code which includes several preconditions this code does not.

I'll also note that I'm not yet at a point of really having a feeling for what path forward we should take. The way flags are handled currently appears to be largely the inverse of how flags in IR CSE are handled. I'm still wrapping my head around the implications of a localized fix (how much opt quality do we loose?) and deeper changes to the representation of flags (how scary is it).

Just a side note (maybe not directly related to this one): the way how flags in SCEV are designed now (effectively set *after* SCEV construction is finished and later mutated) has been a source of subtleties and tricky bugs for a good while. I know it's hard, but maybe at some point we should just stop trying to do what we are doing now, and make SCEVs truly immutable. This will, in particular, prevent us from updating of flags of outer loop and fix this bug along with many other bugs of this variety.

This complexity just doesn't seem worth sustaining.

reames added a comment.Sep 7 2021, 1:48 PM

Spent some more time thinking about this, and updated my running summary (https://github.com/preames/public-notes/blob/master/llvm-loop-opt-ideas.rst#scev-wrap-flags) with my thinking on how to fix the root problem.

However, I do want to explicitly note that I'm open to incrementalism here. This *particular* patch is addressing a *particular* instance of our flags problem. I am 100% open to fixing this issue in isolation. (See inline comments)

llvm/lib/Analysis/ScalarEvolution.cpp
2781–2802

This comment is subtle, and almost correct, but not quite. We'd have to prove both that the loop loop is always taken, and that the values are defined in the function scope. (a.g. a GEP off a global pointer would require a non-function scope)

2791

As with Eli, I don't see why the loops used is relevant. Take a look at the interesting example from my writeup, and I think you'll see this check is irrelevant.

I'm guessing that this was an attempt to prevent the need for widespread test changes. Can you confirm that and give a feel for how bad they are? (Autoupdate is your friend...)

The place I expect this to matter the most is in trip count computation. Given that, I'd be really curious to see if rebasing this over Eli's D106331 helps reduce that test diff. That change should get some of the context sensitivity lost here back, and might very well cut down on the impact.

I wrote up what I think the current semantics are supposed to be in https://reviews.llvm.org/D109553. Assuming we agree, I'd like to land that, and then fix this piece of code accordingly.

I still think we probably want different semantics, but getting to my desired semantics from our current ones looks quite involved, and I think we need to start with getting to *any* consistent model.

The semantics proposed in D109553 have been approved and landed. I have a review (D109845) which addresses the same issue as this one, but is based on reasoning in line with the new semantics.

skatkov abandoned this revision.Sep 15 2021, 8:13 PM

Abandon in favor of D109845