This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Sequential/in-order `UMin` expression
ClosedPublic

Authored by lebedev.ri on Jan 6 2022, 1:47 PM.

Details

Summary

This is a very rough proof of concept.

As discussed in https://github.com/llvm/llvm-project/issues/53020 / https://reviews.llvm.org/D116692,
SCEV is forbidden from reasoning about 'backedge taken count'
if the branch condition is a poison-safe logical operation,
which is conservatively correct, but is severely limiting.

Instead, we should have a way to express those
poison blocking properties in SCEV expressions.

The proposed semantics is:

Sequential/in-order min/max SCEV expressions are non-commutative variants
of commutative min/max SCEV expressions. If none of their operands
are poison, then they are functionally equivalent, otherwise,
if the operand that represents the saturation point* of given expression,
comes before the first poison operand, then the whole expression is not poison,
but is said saturation point.
  • saturation point - the maximal/minimal possible integer value for the given type

The lowering is straight-forward:

compare each operand to the saturation point,
perform sequential in-order logical-or (poison-safe!) ordered reduction over those checks,
and if reduction returned true then return saturation point
else return the naive min/max reduction over the operands

https://alive2.llvm.org/ce/z/Q7jxvH (2 ops)
https://alive2.llvm.org/ce/z/QCRrhk (3 ops)
Note that we don't need to check the last operand: https://alive2.llvm.org/ce/z/abvHQS
Note that this is not commutative: https://alive2.llvm.org/ce/z/FK9e97

That allows us to handle the patterns in question.

Diff Detail

Event Timeline

lebedev.ri created this revision.Jan 6 2022, 1:47 PM
lebedev.ri requested review of this revision.Jan 6 2022, 1:47 PM
lebedev.ri updated this revision to Diff 397995.Jan 6 2022, 2:33 PM

Precommit the tests.
Drop dead code (safe UMax).

lebedev.ri updated this revision to Diff 397998.Jan 6 2022, 2:44 PM

Fix 3+-operand expansion - unsurprisingly, we must use logical or, i.e. it's poison-blocking form.

reames added a comment.Jan 6 2022, 4:04 PM

Before you go any further, can you explain what you mean by "poison safe"? What IR are you hoping to generate in the end, and why is that more correct than what we have previously?

p.s. I understand the current code is broken, and why. I just haven't seen a viable proposal for a fix as of yet.

nikic added a comment.Jan 6 2022, 11:53 PM

Thanks, this is about what I had in mind. The need for this is annoying, but I don't really see a way around it.

Before you go any further, can you explain what you mean by "poison safe"? What IR are you hoping to generate in the end, and why is that more correct than what we have previously?

p.s. I understand the current code is broken, and why. I just haven't seen a viable proposal for a fix as of yet.

The lowering is X == 0 ? 0 : umin(X, Y), which has the same result as umin(X, Y), except in the case where X == 0 and Y == poison, in which case poison is not propagated. So if you have an always-taken exit with TC=0 that prevents branching on a later poison exit, then this is now modeled correctly.

An alternative lowering would be umin(X, freeze Y).

nikic added inline comments.Jan 6 2022, 11:57 PM
llvm/lib/Analysis/ScalarEvolution.cpp
3973

getMinMaxExpr() currently assumes that the operands are commutative, e.g. in GroupByComplexity. Some of the folds would have to be skipped or done differently for "safe umin".

nikic added a comment.Jan 7 2022, 12:02 AM

Worth noting that we should be using "safe umin" not just for the logical or case, but also when combining exit counts from multiple exits, which is the more common case. It's probably best to do that separately, but that would give a clearer picture of how bad this is in terms of practical impact, because we have a lot more tests for that.

aqjune added a comment.Jan 7 2022, 1:13 AM

Since x == 0 ? x : umin(x, y) cannot be represented using the current SCEV operations (at least using the ops in SCEVTypes). I believe the new ops in this patch are necessary.
Another approach to support such expressions would be adding a ternary operator and comparisons to SCEV, but it would require bigger changes, I guess?

On the other hand, I think operations in SCEV must be clear about how it deals with poison values.
Can we assume that inputs/outputs of operations in SCEV can be poison in general, or it is allowed only for certain operations like SafeUMinExpr?

lebedev.ri edited the summary of this revision. (Show Details)Jan 7 2022, 8:53 AM
lebedev.ri updated this revision to Diff 398166.Jan 7 2022, 8:59 AM
lebedev.ri retitled this revision from [SCEV] Poison-safe `UMin` expression to [SCEV] Saturating `UMin` expression.

Thanks for taking a look!

Before you go any further, can you explain what you mean by "poison safe"? What IR are you hoping to generate in the end, and why is that more correct than what we have previously?

I believe others have already explained it well before i could.
I've adjusted the differential's description with a bit more blurb and alive2 reasoning.
Let me know if that is still not sufficient.

In D116766, @nikic wrote:

getMinMaxExpr() currently assumes that the operands are commutative, e.g. in GroupByComplexity. Some of the folds would have to be skipped or done differently for "safe umin".

Right. I've now made the expression non-commutative.

Worth noting that we should be using "safe umin" not just for the logical or case, but also when combining exit counts from multiple exits, which is the more common case. It's probably best to do that separately, but that would give a clearer picture of how bad this is in terms of practical impact, because we have a lot more tests for that.

Yep, i really don't want to deal with everything at once :)

Using "safe" really bothered me, i've gone ahead and used "saturating" instead,
since that is what happens in reality: https://alive2.llvm.org/ce/z/abvHQS (+ non-commutativity)

lebedev.ri edited the summary of this revision. (Show Details)Jan 7 2022, 9:10 AM
lebedev.ri added a reviewer: efriedma.
lebedev.ri updated this revision to Diff 398322.Jan 8 2022, 1:54 AM
lebedev.ri retitled this revision from [SCEV] Saturating `UMin` expression to [SCEV] Sequential/in-order `UMin` expression.

Naming things is perhaps the hardest part. After thinking about it more,
i feel like what should be emphasized is the sequential-ness of these reductions.
I'm not going to change this myself again, but i'm open to thoughts whether there is a better naming alternative.

fhahn added a subscriber: fhahn.Jan 8 2022, 2:42 AM
nikic accepted this revision.Jan 10 2022, 7:48 AM

I'm on board with the "sequential" name, LGTM from my side. But please wait for @reames to approve as well.

There are some pretty obvious folds we can do (in particular, we should be trying to convert umin_seq to umin with known non-zero/non-poison values), but those are best left for later.

llvm/include/llvm/Analysis/ScalarEvolution.h
632

Still using the old name here.

736

Rename to Sequential here as well for consistency?

llvm/include/llvm/Analysis/ScalarEvolutionExpressions.h
56

Precommit reformat of this file?

552

Is this needed? Doesn't look relevant for minmax.

This revision is now accepted and ready to land.Jan 10 2022, 7:48 AM
lebedev.ri marked 5 inline comments as done.

Thanks for taking a look!
Squash last few post-rename relics.

Waiting on @reames / @efriedma / @mkazantsev.

llvm/include/llvm/Analysis/ScalarEvolutionExpressions.h
560–561

This is the only user of setNoWrapFlags(), i simply mimic what SCEVMinMaxExpr does.

I skimmed the code for a sanity check, but am mostly just commenting on the high level bits.

I'm not convinced this is the right approach, but I also don't want to block iterative progress. My primary concern is that I don't see how this solves the multiple exit case and that I think we're going to end up needing a new representation there, which indirectly solves this problem. However, I think we can work step-wise here, and come back to refactor/clean this up if needed.

Personally, I think I'd have preferred to introduce a freeze node as that seems potentially more reusable, but I have no strong argument there, just a gut feel.

On naming, I'd suggest ordered reduction as that's the term I'm familiar with from the vectorizaion literature for floating point reductions, and poison in integer domain seems analogous, but this is a fairly minor point.

So overall, weakly hesitant, but don't let that stop you.

llvm/include/llvm/IR/IRBuilder.h
1574

I'd expand this comment a bit.

From the vectorization literature, the term is often ordered reduction. Might be good to use the same name.

lebedev.ri marked an inline comment as done.

@reames thanks for taking a look!
I'm hesitant as to what name is most obvious, that will depend
on the literature/area, and i'm just not seeing anything less contrived.

I do agree that it's possible that some more generic solution
will subsume this fix, but as you have said, there's likely no need
to not be step-by-step process.
I'm somewhat cautious of freeze here, it's a heavy hammer.

With that, does this look good?

nikic added a comment.Jan 10 2022, 9:12 AM

I skimmed the code for a sanity check, but am mostly just commenting on the high level bits.

I'm not convinced this is the right approach, but I also don't want to block iterative progress. My primary concern is that I don't see how this solves the multiple exit case and that I think we're going to end up needing a new representation there, which indirectly solves this problem. However, I think we can work step-wise here, and come back to refactor/clean this up if needed.

Why do you think this would't solve the multiple exit case? The new node is currently not used for that case, but I do believe it would address that case as well (if used).

I skimmed the code for a sanity check, but am mostly just commenting on the high level bits.

I'm not convinced this is the right approach, but I also don't want to block iterative progress. My primary concern is that I don't see how this solves the multiple exit case and that I think we're going to end up needing a new representation there, which indirectly solves this problem. However, I think we can work step-wise here, and come back to refactor/clean this up if needed.

Why do you think this would't solve the multiple exit case? The new node is currently not used for that case, but I do believe it would address that case as well (if used).

The case I was thinking of was when exit1 was taken on iteration N, but exit2's condition becomes poison on that same iteration. However, thinking about it harder, I think the existing code is required to simply return N+1 for exit2, and that's not directly a problem. However, by that logic we don't have a problem around multiple exits at all, so I'm clearly missing/forgetting something.

So, maybe it does? I haven't fully thought through that problem yet.

With that, does this look good?

As said before, my remaining comments are non-blocking. You've got a LGTM, go for it.

lebedev.ri edited the summary of this revision. (Show Details)Jan 10 2022, 9:24 AM

With that, does this look good?

As said before, my remaining comments are non-blocking. You've got a LGTM, go for it.

Thank you.

This revision was automatically updated to reflect the committed changes.

Hi, this change caused clang to crash in chromium arm builds; could you take a look?

reduced repro

$ cat t.c
int a, b;
int c() {
  int d;
  while (a) {
    int e, f;
    for (; e && d; ++e) {
      g();
      ++d;
    }
    for (; f < e; ++f)
      if (b)
        return 0;
  }
}
$ clang -cc1 -triple thumbv7-unknown-linux-android23 -S -Oz t.c

Thanks!

nathanchance added a subscriber: nathanchance.EditedJan 10 2022, 5:48 PM

Just in case it is something different than the Chromium report, I see an assertion failure while building the Linux kernel. Reduced reproducer:

$ cat route.i
struct snd_pcm_plugin_channel {
  int enabled : 1;
} snd_pcm_area_copy(), *zero_areas_dvp;
int route_transfer_plugin_1_0, route_transfer_plugin_0_0;
void route_transfer() {
  int nsrcs, ndsts, dst;
  nsrcs = route_transfer_plugin_0_0;
  ndsts = route_transfer_plugin_1_0;
  dst = 0;
  for (; dst < ndsts && dst < nsrcs; ++dst)
    snd_pcm_area_copy();
  ndsts = dst;
  dst = 0;
  for (; dst < ndsts; ++dst)
    zero_areas_dvp->enabled = 0;
}

$ clang --target=aarch64-linux-gnu -O1 -c -o /dev/null route.i

$ clang --target=aarch64-linux-gnu -O2 -c -o /dev/null route.i
Unknown SCEV type!
UNREACHABLE executed at /home/nathan/cbl/github/tc-build/llvm-project/llvm/lib/Analysis/ScalarEvolution.cpp:9299!
...

I've disabled this in f62f47f5. If you see failures after that, let me know and I'll do a full revert.

Thanks! That was the same failure.
Reduced and relanded/fixed in rG76a0abbc13cdfd3ae71f8db8a9376f65a9f6f725.