This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Infer ranges for SCC consisting of cycled Phis
ClosedPublic

Authored by mkazantsev on Sep 28 2021, 4:52 AM.

Details

Summary

Our current strategy of computing ranges of SCEVUnknown Phis was to simply
compute the union of ranges of all its inputs. In order to avoid infinite recursion,
we mark Phis as pending and conservatively return full set for them. As result,
even simplest patterns of cycled phis always have a range of full set.

This patch makes this logic a bit smarter. We basically do the same, but instead
of taking inputs of single Phi we find its strongly connected component (SCC)
and compute the union of all inputs that come into this SCC from outside.

Processing entire SCC together has one more advantage: we can set range for all
of them at once, because the only thing that happens to them is the same value is
being passed between those Phis. So, despite we spend more time analyzing a
single Phi, overall we may save time by not processing other SCC members, so
amortized compile time spent should be approximately the same.

Diff Detail

Event Timeline

mkazantsev created this revision.Sep 28 2021, 4:52 AM
mkazantsev requested review of this revision.Sep 28 2021, 4:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 28 2021, 4:52 AM
mkazantsev retitled this revision from [WIP][SCEV] Infer ranges for SCC consisting of cycled Phis to [SCEV] Infer ranges for SCC consisting of cycled Phis.

Theoretically it should not harm CT, but would be nice to ensure that. Nikita, could you please help me with that?

llvm/lib/Analysis/ScalarEvolution.cpp
6597

Why is it okay to ignore operands in PendingPhiRanges here?

mkazantsev added inline comments.Sep 28 2021, 10:27 PM
llvm/lib/Analysis/ScalarEvolution.cpp
6597

They are not ignored, they are some SS Phi's input so they will be in Roots. We will take the range from them for our union.

mkazantsev added inline comments.Sep 28 2021, 10:42 PM
llvm/lib/Analysis/ScalarEvolution.cpp
6597

SCC Phi*

Seems overall reasonable, though, I'm curious, where do you see this trigger in practice? Given this formulation doesn't involve anything but phis, it disallows IVs. Given that, I'm curious what patterns create this.

llvm/lib/Analysis/ScalarEvolution.cpp
6576–6577

This block of code has gotten quite complicated, probably time for a helper function.

6597

I'm not sure about this statement. IF this is the *first* recursion (e.g. the top level value is a phi) in this SCC then sure, but we'd analyze something which happened to contain a non-SCC phi, then recursed into it's operands (which contained an SCC phi), I'm unsure. At a minimum, I think the code would be easier to follow if you seperated the SCC detection from the PendingPHIRange mechanism.

6626

Does this ever get hit? I'd expect this to be unreachable as we'd always traverse the same SCC by definition from any entry.

Seems overall reasonable, though, I'm curious, where do you see this trigger in practice? Given this formulation doesn't involve anything but phis, it disallows IVs. Given that, I'm curious what patterns create this.

It's the first step. I'm planning to generalize this to deal with chunked IV.

Imagine we had a single block loop which was incrementing. Everything was fine, and SCEV knew how to deal with its range.

Then we chunked the iteration space is split in chunks, and we have two loops, outer phi being start for inner phi, and taking it as backedge input.

SCEV in this case becomes just helpless to do anything. It cannot even prove they are non-negative if they start from 0 and only go up. The thing I am aiming is SCC's consisting of SCEVUnknowns and SCEVAddReds<nuw>/<nsw>. Idea is following: if a SCC only takes non-negative values from outside, and all its addrecs have nuw/nsw, then the entire thing is non-negative.

Examples of this I've merged as test/Transforms/IndVarSimplify/outer_phi.ll.

But this is a bit far-fetched. As a first step I wanted to fix case with cycled Phis, whether they are chunked IV or they are just siblings or whatever.

mkazantsev added inline comments.Oct 5 2021, 3:00 AM
llvm/lib/Analysis/ScalarEvolution.cpp
6597

Basically, the algorithm is correct even if we ignore random nodes. The phis that get into SCC all:

  • Are reachable from first Phi;
  • Use first Phi (directly or indirectly).

It means that in any case SCC is some strongly connected component, even if not maximal by inclusion. It's OK. Our statement that SCC Phis can only take values that come to them from outside is still true.

In practice, if we ever meet a pending input, it will have range of full set and entire thing will also be full set.

6626

I think you are right. Need to check.

mkazantsev planned changes to this revision.Oct 5 2021, 4:44 AM

Planning rework.

mkazantsev added inline comments.Oct 5 2021, 5:02 AM
llvm/lib/Analysis/ScalarEvolution.cpp
6626

Actually no, this happens quite often. Here is what happens when I add assert:

Failed Tests (22):
  LLVM :: Analysis/CostModel/X86/interleaved-load-f64-stride-4.ll
  LLVM :: Analysis/CostModel/X86/interleaved-load-i16-stride-6.ll
  LLVM :: Analysis/CostModel/X86/interleaved-store-f32-stride-2.ll
  LLVM :: Analysis/CostModel/X86/interleaved-store-i32-stride-4.ll
  LLVM :: Analysis/CostModel/X86/interleaved-store-i32-stride-6.ll
  LLVM :: Analysis/CostModel/X86/interleaved-store-i64-stride-3.ll
  LLVM :: Analysis/CostModel/X86/interleaved-store-i8-stride-6.ll
  LLVM :: CodeGen/AMDGPU/amdpal-callable.ll
  LLVM :: CodeGen/AMDGPU/wave32.ll
  LLVM :: CodeGen/ARM/loop-indexing.ll
  LLVM :: CodeGen/Hexagon/swp-epilog-reuse4.ll
  LLVM :: CodeGen/X86/stack-protector.ll
  LLVM :: CodeGen/X86/vector-fshl-256.ll
  LLVM :: Transforms/Attributor/willreturn.ll
  LLVM :: Transforms/IndVarSimplify/no-iv-rewrite.ll
  LLVM :: Transforms/LoopStrengthReduce/lsr-comp-time.ll
  LLVM :: Transforms/LoopUnroll/runtime-loop-branchweight.ll
  LLVM :: Transforms/LoopUnroll/runtime-loop2.ll
  LLVM :: Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll
  LLVM :: Transforms/LoopVectorize/PowerPC/reg-usage.ll
  LLVM :: Transforms/LoopVectorize/X86/gather_scatter.ll
  LLVM :: Transforms/LoopVectorize/tail-folding-counting-down.ll
mkazantsev added inline comments.Oct 5 2021, 5:37 AM
llvm/lib/Analysis/ScalarEvolution.cpp
6576–6577

Factored out.

mkazantsev updated this revision to Diff 377165.Oct 5 2021, 5:42 AM

Factored out SCC logic.

Ok, sorry for taking so long to reply to this. Something had been bugging me, and I hadn't quite taken the time to figure out what. I finally did.

I think this solution is majorly overkill, and adds an amount of complexity which is not justified. Let me walk through an alternate approach; I'm curious what you think.

The basic problem we have is a loop nest where the inner loop has an unknown trip count, and an addrec IV. This inner loop is wrapped in an outer loop which uses the increment of the inner IV as the backedge value of the outer IV. It's worth highlighting that if the inner loop had a computable trip count, the outer loops IV would likely also become an addrec. We can totally form the expression for the step of the outer loop, but since it's not loop invariant in the outer loop, we don't have an addrec.

As a brief aside, I've long wondered whether we should allow addrec's with loop varying step. If we did, we'd have a clean model for the outer IV. I'm not proposing we do that now, just mentioning it.

However, I think we can and should pattern match this "almost an addrec" pattern in the range check logic. Looking at a SCEVUnknown phi, checking to see if it matches this pattern (loop header, backedge is addrec with start value function of self, check!) is pretty straight forward. Much much simpler than the SCC matching at least.

Unless you have a motivating case which is not a nested loop, I think we should do this approach instead.

Inline comments on the SCC implementation. As stated in previous comment, I think we should take an alternate approach, but leaving the comments since I took the time to build the mental state on the code.

llvm/lib/Analysis/ScalarEvolution.cpp
6717

This case results in the new code being strictly inferior to the old code on large phi nodes.

6725

I'm not sure simply ignoring an input is correct here. We may have to abort instead.

Hm, I think you're intent is that these get treated like external roots to the SCC. This a) probably explains why you get inconsistent SCCs over time, and b) deserves a comment at minimum,

6735

Instead of reusing the worklist variable, please scope the code such that you can use two worklists. Makes tracking the flow much easier.

6793

This bothers me for two reasons: 1) It implies we sometimes don't recognize a phi node as being part of an SCC, and sometimes do, and 2) this is spooky action at a distance where later queries change cached results.

I strongly suspect this is masking a bug above.

reames requested changes to this revision.Nov 30 2021, 9:43 AM
This revision now requires changes to proceed.Nov 30 2021, 9:43 AM

Thanks for the suggesting, I need to wright a prototype to see how it works and if it covers all I need.

Actually the definition of "almost addrec" doesn't even cover the motivation for this patch, where we have 2 phis from same block that just swap values. I don't think we can have any good framing here.

mkazantsev added inline comments.Dec 2 2021, 11:14 PM
llvm/lib/Analysis/ScalarEvolution.cpp
6717

Two points here:

  1. Old code, seeing cycled phis, will always return full set, which in union will also give fullset for all phis. So no, in this case it's not worse, because "worse" is impossible.
  1. Old code, not seeing cycled phis, will do exactly same thing as this one (e.g. process phi itself as the sole root).

So no, if I'm reading this correctly, it is strictly not worse than the old one.

6725

Off course it is correct to cut in any arbitrary place. We can take any subset of Phis, even if it's not a SCC, collect all inputs from outside of this subset, and use their union as estimate of the range of all these Phis. But it's only profitable when this subset is a SCC.

mkazantsev updated this revision to Diff 391622.EditedDec 3 2021, 6:08 AM

Because incomplete SCCs draw a lot of questions and suspicions, despite I think everything should still work correctly for them, these cases are unimportant for any practical purpose.

I've reworked it so that it will either find full SCC (and therefore do better than before), or conservatively bail to processing of a single Phi.

Reduction of SCC to "AddRec with variant step" is too weak and doesn't handle swap example, so I think think it's worthwhile.

reames accepted this revision.Feb 14 2022, 2:02 PM

LGTM.

I'm still not entirely convinced this is the best approach, but the simplified approach appears correct, and this looks at least reasonable.

This revision is now accepted and ready to land.Feb 14 2022, 2:02 PM
This revision was landed with ongoing or failed builds.Feb 17 2022, 3:03 AM
This revision was automatically updated to reflect the committed changes.

I'm seeing a miscompile caused by assuming the entire SCC has the same range. One phi in the SCC has a corresponding llvm.assume limiting its range, but that assume doesn't apply to another phi in the SCC.

I apologize for the fairly large repro (at least it's only one function!).
$ opt -passes=indvars test.ll -S
You'll see that the br condition right before the invoke i32 @wobble.7 becomes false. That's because %tmp951 has a corresponding assume that it's non-negative. However, %tmp298, which is in the same phi SCC, is not under that assumption, but this patch still updates its range to be non-negative.

llvm/lib/Analysis/ScalarEvolution.cpp
6693–6698

Here

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 7:54 PM
xbolva00 added inline comments.
llvm/test/Analysis/ScalarEvolution/cycled_phis.ll
33

Remove fixme

I'm seeing a miscompile caused by assuming the entire SCC has the same range. One phi in the SCC has a corresponding llvm.assume limiting its range, but that assume doesn't apply to another phi in the SCC.

I apologize for the fairly large repro (at least it's only one function!).
$ opt -passes=indvars test.ll -S
You'll see that the br condition right before the invoke i32 @wobble.7 becomes false. That's because %tmp951 has a corresponding assume that it's non-negative. However, %tmp298, which is in the same phi SCC, is not under that assumption, but this patch still updates its range to be non-negative.

Taking a look...

This comment was removed by mkazantsev.

Oh I think I see what happens... We could side-exit before assume was executed for example. This is a fundamental bug then. Need to find another approach to this.

I'll try a completely different solution then.