This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Establish control over disposition caches
AbandonedPublic

Authored by mkazantsev on Sep 24 2021, 1:35 AM.

Details

Summary

SCEV's loop and block disposition caches are a mess. Loop disposition has
a public method to be called to external users when it may break, but for
some reason is not called from forgetLoop. Block disposition does not even
have API for this. So it may be (and actually is) broken.

It should not be a responsibility of transform passes to know what caches SCEV
has or how they work. It's enough that have API to forgetLoop and forgetValue
that allow to signal that something might have changed.

In this patch, we regain full control of SCEV over its disposition caches. Accessor
methods are moved to private API and are now called from forgetLoop and
forgetValue. It might be tad more conservative than needed, but it's a lesser evil
compared to the broken state we now have.

It may have a negative compile time impact. Though, it shouldn't really be a problem:
typically passes tend to make all queries first and only then do the IR changes, so we
should at least enjoy full power of caching while querying.

Diff Detail

Event Timeline

mkazantsev created this revision.Sep 24 2021, 1:35 AM
mkazantsev requested review of this revision.Sep 24 2021, 1:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 24 2021, 1:35 AM
nikic requested changes to this revision.Sep 27 2021, 12:19 PM

I'd like to see some analysis of what is actually wrong in the affected test cases. We do already invalidate dispositions in forgetValue() and forgetLoop(), and I'd like to have an understanding of why the current invalidation is insufficient. Sure, invalidating *everything* will certainly fix the problem, but I don't think that's how we want to do things if it can be avoided.

llvm/lib/Analysis/ScalarEvolution.cpp
7548

This looks like a really big hammer that goes against how SCEV invalidation normally works. Note that the loop above is supposed to walk all SCEVs based on the value and calls forgetMemoizedResults() for them -- this includes clearing the loop and block dispositions.

This revision now requires changes to proceed.Sep 27 2021, 12:19 PM

I'd like to see some analysis of what is actually wrong in the affected test cases. We do already invalidate dispositions in forgetValue() and forgetLoop(), and I'd like to have an understanding of why the current invalidation is insufficient. Sure, invalidating *everything* will certainly fix the problem, but I don't think that's how we want to do things if it can be avoided.

Neither forgetValue nor forgetLoop drops BlockDisposition cache. We literally don't have a method to drop it surgically (like the one we have for Loop Disposition). And we have some cache which never gets invalidated, things go wrong eventually.

In D110385, I simply added validation of this cache and it started stepping over dangling pointers in LCSSA and LoopFusion pass. They look to not be the only sources in the problem, because we never ever tried to keep this cache in legal state.

I think when thigs are *that* broken, "what is actually wrong" doesn't matter, because cache that is never verified and never indalidated is just broken, unless opposite is proven.

mkazantsev added inline comments.Sep 28 2021, 7:56 AM
llvm/lib/Analysis/ScalarEvolution.cpp
7548

You are correct, I'll remove it.

nikic added a comment.Sep 28 2021, 7:59 AM

Neither forgetValue nor forgetLoop drops BlockDisposition cache. We literally don't have a method to drop it surgically (like the one we have for Loop Disposition). And we have some cache which never gets invalidated, things go wrong eventually.

The invalidation happens here:

https://github.com/llvm/llvm-project/blob/f3932ae1a078075e9e35f51ead3aaca05a9a23c7/llvm/lib/Analysis/ScalarEvolution.cpp#L12723-L12724
https://github.com/llvm/llvm-project/blob/f3932ae1a078075e9e35f51ead3aaca05a9a23c7/llvm/lib/Analysis/ScalarEvolution.cpp#L7528
https://github.com/llvm/llvm-project/blob/f3932ae1a078075e9e35f51ead3aaca05a9a23c7/llvm/lib/Analysis/ScalarEvolution.cpp#L7489

Am I misunderstanding something? We invalidate these in exactly the same way we invalidate all the other per-SCEV caches.

mkazantsev abandoned this revision.Sep 29 2021, 12:56 AM

Yes, right. What I meant is that there is no way to invalidate it from outside, like in forgetLoopDispositions. But you are right, this patch is an overkill. I'll come up with something more lightweight.

mkazantsev reclaimed this revision.EditedSep 29 2021, 5:53 AM

Reclaiming the revision just to explain what happens. I don't have any other (nearly cheap) idea how to fix this. Maybe you guys do.

The problem is with hypothetical SCEVs, and existing mechanism of forgetting cached results bound to instructions doesn't work. Let me describe what happens when we run test test/Transforms/LCSSA/pr44058.ll.

Right before we run formLCSSAForInstructions, some block dispositions are already made. The state of cached BlockDispositions is following:

SCEV = ((-1 * %tmp) + ((%tmp + %tmp4) smax %tmp)) ---> bb13 : 2
SCEV = ((%tmp + %tmp4) smax %tmp) ---> bb13 : 2
SCEV = (-1 * %tmp) ---> bb13 : 2
SCEV = (%tmp + %tmp4) ---> bb13 : 2
SCEV = -1 ---> bb13 : 2
SCEV = %tmp ---> bb13 : 2
SCEV = %tmp4 ---> bb13 : 2
SCEV = 1 ---> bb13 : 2
SCEV = 0 ---> bb13 : 2

Here 2 stands for "properly dominates block". And here is analyze -scalar-evolution for the same method:

Classifying expressions for: @foo
  %tmp = load i32, i32* %arg, align 4
  -->  %tmp U: full-set S: full-set             Exits: <<Unknown>>              LoopDispositions: { %bb3: Variant, %bb13: Invariant }
  %tmp4 = load i32, i32* %arg1, align 4
  -->  %tmp4 U: full-set S: full-set            Exits: <<Unknown>>              LoopDispositions: { %bb3: Variant, %bb13: Invariant }
  %tmp5 = add i32 %tmp4, %tmp
  -->  (%tmp + %tmp4) U: full-set S: full-set           Exits: <<Unknown>>              LoopDispositions: { %bb3: Variant, %bb13: Invariant }
  %tmp9 = add nsw i32 %tmp, 1
  -->  (1 + %tmp) U: full-set S: full-set
  %tmp12 = phi i32 [ 0, %bb3 ], [ %tmp4, %bb10 ]
  -->  ((-1 * %tmp) + ((%tmp + %tmp4) smax %tmp)) U: full-set S: full-set               Exits: <<Unknown>>              LoopDispositions: { %bb3: Variant, %bb13: Invariant }
  %tmp14 = phi i32 [ %tmp15, %bb13 ], [ 0, %bb11 ]
  -->  {0,+,1}<nuw><nsw><%bb13> U: [0,2147483647) S: [0,2147483647)             Exits: (-1 + (1 smax ((-1 * %tmp) + ((%tmp + %tmp4) smax %tmp))))<nsw>          LoopDispositions: { %bb13: Computable, %bb3: Variant }
  %tmp15 = add nuw nsw i32 %tmp14, 1
  -->  {1,+,1}<nuw><nsw><%bb13> U: [1,-2147483648) S: [1,-2147483648)           Exits: (1 smax ((-1 * %tmp) + ((%tmp + %tmp4) smax %tmp)))              LoopDispositions: { %bb13: Computable, %bb3: Variant }

Note that we have this small and amazing ((%tmp + %tmp4) smax %tmp) that does not correspond to any instruction. However, it still gets into cache.

At some point, formLCSSAForInstructions calls forgetValue. The following values and SCEVs get wiped out of the map:

forgetValue called on   %tmp = load i32, i32* %arg, align 4
        Erasing from map:   %tmp = load i32, i32* %arg, align 4 (SCEV = %tmp)
        Erasing from map:   %tmp5 = add i32 %tmp4, %tmp (SCEV = (%tmp + %tmp4))

Note that (%tmp + %tmp4) is now dead. However, it is still an argument of our amazing unkillable hypothetical ((%tmp + %tmp4) smax %tmp). There is no way that it will ever be removed from BlocksDisposition because it is only done for SCEVs that are bound for instructions. The only way to delete it is to wipe out the entire map.

It also seems that the same bug affects *all* other maps we have, and the only legal option for forgetValue is to completely invalidate all SCEV.

I see only one potential solution: when we forget memoized results, we should also forget them for all users of a given SCEV. AFAIK there is no API for this currently. And this is potentially, well... Costly.

Any other good ideas? @nikic @reames

This revision now requires changes to proceed.Sep 29 2021, 5:53 AM
mkazantsev requested review of this revision.Sep 29 2021, 5:54 AM

The only theory I have so far is that this hypothetical SCEV isn't bound to instruction, but still is available somewhere (like, it's some loop's backedge count). I'll check this. However, the situation is worrysome even if so. We don't control which SCEVs are hypothetical and which are not, and create tons of them.

Max,

Just to check my understanding, the problem this is trying to fix is specific to a non-canonical stale SCEV right? That is, one which is still allocated (e.g. in the bump pointer allocator), but no longer used for new expressions (e.g. has been removed from the UniqueSCEV map)?

Your example creates a stale SCEV via a RAUW operation on an instruction which contributes one of the (SCEV) operands. I think it's important to highlight that *all* of the SCEVs for the original expression remain in the stale state. We remove them from the bi-directional maps, but we do *not* deallocate them.

As an aside, we also create stale SCEVs quite explicitly when working with RAUW operations on SCEVUnknowns.

Though, hm, maybe we actually have two levels of staleness here. We have "not in bi-directional maps", and we have "not in unique scev map". Your case is really only the former, SCEV unknown is the later. Not sure if that matters yet, or not. (It does, see later.)

If we're on the same page with that, read further. If not, stop here.

I think I'd lean towards validating most stale SCEVs in the disposition cache. This necessitates considering a stale entry valid, but as you point out, we'd otherwise have to walk SCEV users which we don't have a mechanism to do. (We seem to keep stumbling into that from different angles, maybe we should just change that?)

With that starting point, the question then becomes why the stale SCEV doesn't produce the same result? I think the answer probably comes down to an interaction between how an interaction between SCEVUnknown::deleted() and the block disposition handling for an unknown. If my skim of the code is correct, your verification probably crashes on "dyn_cast<Instruction>(cast<SCEVUnknown>(S)->getValue())" in computeBlockDisposition right?

I think I'm leaning towards something along the lines of "verify a stale SCEVs disposition unless it contains a SCEVUnknown which has been deleted".

Another approach would be to only verify one step of the block disposition. E.g. "assuming operands are correctly cached, what would our block disposition be? Is that the cached result?" This would side step the problem by bailing out if the operand wasn't cached. The existing forgetValue would eliminate the SCEV unknown node, and thus verification would side step the same class of stale expressions.

Neither of these are pretty, but I think they'd work and be less invasive/expensive.

mkazantsev planned changes to this revision.Oct 6 2021, 4:14 AM

I'm going to introduce tracking of SCEV users. It seems to be the only correct way to approach this.