This is an archive of the discontinued LLVM Phabricator instance.

[SCEV] Introduce `SCEVSelectExpr`
Needs ReviewPublic

Authored by lebedev.ri on Jan 21 2023, 4:53 PM.

Details

Summary

I believe, the time has come.

First, let's look at an example:

#include <algorithm>

void foo(int* data, int width, short*LUT, bool cond) {
    for(int i = 0; i != width; ++i) {
        int& val = data[i];
        val = LUT[std::min(std::max(val, cond ? 0 : 128), 255)];
    }
}

It's obvious to vectorize, and indeed that does happen: https://godbolt.org/z/erqdaed6b

But what if we change the element type of LUT?

#include <algorithm>

void foo(int* data, int width, int*LUT, bool cond) {
    for(int i = 0; i != width; ++i) {
        int& val = data[i];
        val = LUT[std::min(std::max(val, cond ? 0 : 128), 255)];
    }
}

https://godbolt.org/z/8xrEP88fc <- oops.

The original example vectorized because of TBAA.
Indeed, we don't do a great job of modelling the select's,
so no wonder we failed.

LoopVectorizer has this interesting logic:

// Walk back through the IR for a pointer, looking for a select like the
// following:
//
//  %offset = select i1 %cmp, i64 %a, i64 %b
//  %addr = getelementptr double, double* %base, i64 %offset
//  %ld = load double, double* %addr, align 8
//
// We won't be able to form a single SCEVAddRecExpr from this since the
// address for each loop iteration depends on %cmp. We could potentially
// produce multiple valid SCEVAddRecExprs, though, and check all of them for
// memory safety/aliasing if needed.
//
// If we encounter some IR we don't yet handle, or something obviously fine
// like a constant, then we just add the SCEV for that term to the list passed
// in by the caller. If we have a node that may potentially yield a valid
// SCEVAddRecExpr then we decompose it into parts and build the SCEV terms
// ourselves before adding to the list.
static void findForkedSCEVs(
    ScalarEvolution *SE, const Loop *L, Value *Ptr,
    SmallVectorImpl<PointerIntPair<const SCEV *, 1, bool>> &ScevList,
    unsigned Depth)

... that tries to deal with exactly this case, but it does so
by effectively inventing it's own representation,
which forces it to re-implement handling for everything that SCEV already handles,
and it is incomplete.
(We have something similar in SCEV already, getRangeViaFactoring())

I would like to teach SCEV about select's, and then replace that code
with a straight-forward, complete, SCEV-based implementation.

The (only?) regression i see in the diffs is the fact that i didn't add recursive predicate reasoning for selects.

This has a reasonably small compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=0cdf030cf8dcfb8bfe551c16b14a29e124497069&to=832e0fd5f8a1ecedc64e414627ad8641d70d2682&stat=instructions:u
... while having some size-total changes.

Diff Detail

Event Timeline

lebedev.ri created this revision.Jan 21 2023, 4:53 PM
lebedev.ri requested review of this revision.Jan 21 2023, 4:53 PM
nikic added a comment.Jan 22 2023, 2:51 AM

I think modelling selects in SCEV makes sense, and from a brief look the patch is pretty straightforward.

What I didn't really understand are your examples in the patch description. The referenced godbolt examples don't contain select instructions, and the input program also doesn't look like it should involve selects, so I'm a bit unclear on the relation these examples have to the patch.

Looking at the test diffs, I'm wondering whether the canonical form for logical and should be %a umin_seq %b or select %a, %b, false. I can see arguments in favor of both (the former has symmetry with bitwise and %a umin %b, the latter is more in line with IR and extends to logical or, where we don't have umax_seq). Probably doesn't matter much either way (I don't think we particularly care about modelling of i1 values in SCEV).

nikic added a comment.Jan 22 2023, 2:54 AM

This patch is missing an update of SCEVPoisonCollector, because poison is not propagated from all select operands. (Adding a test for this would be good.)

lebedev.ri added a comment.EditedJan 22 2023, 4:57 AM

What I didn't really understand are your examples in the patch description. The referenced godbolt examples don't contain select instructions, and the input program also doesn't look like it should involve selects, so I'm a bit unclear on the relation these examples have to the patch.

I think non-refactoring patches that say what they do, but don't say why they do what they do, should be automatically reverted.
That TLDR is my general motivation. Sure, they no longer have select's because they got canonicalized into min/max, but that isn't true for everything.

Looking at the test diffs, I'm wondering whether the canonical form for logical and should be %a umin_seq %b or select %a, %b, false. I can see arguments in favor of both (the former has symmetry with bitwise and %a umin %b, the latter is more in line with IR and extends to logical or, where we don't have umax_seq). Probably doesn't matter much either way (I don't think we particularly care about modelling of i1 values in SCEV).

For everything boolean, select should be the canonical form.

This patch is missing an update of SCEVPoisonCollector, because poison is not propagated from all select operands. (Adding a test for this would be good.)

Sigh.

lebedev.ri edited the summary of this revision. (Show Details)

Peeled off all refactorings, diff is much simpler now.

lebedev.ri added inline comments.Jan 22 2023, 8:57 AM
llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info-rewrite-expressions.ll
281

This regresses

289

SE.getSCEVAtScope(%iv, L->getParentLoop()) becomes {(select %c, 4, 2),+,2}<%loop>,
that is not loop-invariant, so we say Exits: <<Unknown>>.

llvm/test/Transforms/IRCE/decrementing-loop.ll
117–121

This regresses

159–163

This regresses

llvm/test/Transforms/IndVarSimplify/lftr-pr20680.ll
12

I'm not sure if this is a regression or not.

llvm/test/Transforms/IndVarSimplify/promote-iv-to-eliminate-casts.ll
314

I'm guessing this is a regression

lebedev.ri added a subscriber: Meinersbur.

@Meinersbur please help with polly stuff. I don't expect that i guessed right,
and i certainly didn't update the test correctly. They have way too much undef in them...

And more refactorings.

lebedev.ri marked 4 inline comments as not done.Jan 22 2023, 12:44 PM
lebedev.ri added inline comments.
llvm/test/Analysis/ScalarEvolution/max-backedge-taken-count-guard-info-rewrite-expressions.ll
281

And gone.

My primary concern here is compile-time; in general, the more complex expressions we model, the more we have to worry about SCEV going out of control in complex cases.

polly/lib/Support/SCEVValidator.cpp
352

Something like the following should fix the regression on polly/test/ScopInfo/Alias-0.ll etc.:

ValidatorResult Cond = visit(Expr->getCondition());
ValidatorResult TrueExpr = visit(Expr->getTrueValue());
ValidatorResult FalseExpr = visit(Expr->getFalseValue());
if (Cond.isConstant() && TrueExpr.isConstant() && FalseExpr.isConstant())
  return ValidatorResult(SCEVType::PARAM, Expr);
LLVM_DEBUG(dbgs() << "INVALID: select of non-constant expressions");
return ValidatorResult(SCEVType::INVALID);

(The tests involving undef selects probably need to be modified to avoid using undef.)

My primary concern here is compile-time; in general, the more complex expressions we model, the more we have to worry about SCEV going out of control in complex cases.

True, that seems to be The concern for SCEV.

lebedev.ri edited the summary of this revision. (Show Details)Jan 22 2023, 6:29 PM

My primary concern here is compile-time; in general, the more complex expressions we model, the more we have to worry about SCEV going out of control in complex cases.

True, that seems to be The concern for SCEV.

This has an unreasonably small compile-time impact:
https://llvm-compile-time-tracker.com/compare.php?from=0cdf030cf8dcfb8bfe551c16b14a29e124497069&to=832e0fd5f8a1ecedc64e414627ad8641d70d2682&stat=instructions:u
... while having some size-total changes.

nikic added a comment.Jan 23 2023, 3:54 AM

What I didn't really understand are your examples in the patch description. The referenced godbolt examples don't contain select instructions, and the input program also doesn't look like it should involve selects, so I'm a bit unclear on the relation these examples have to the patch.

I think non-refactoring patches that say what they do, but don't say why they do what they do, should be automatically reverted.
That TLDR is my general motivation. Sure, they no longer have select's because they got canonicalized into min/max, but that isn't true for everything.

Can you update your examples to use a case that involves selects? The current ones are more confusing than helpful as far as motivation is concerned.

llvm/include/llvm/Analysis/ScalarEvolutionExpressions.h
925

FWIW I find this kind of std::array + zip + std::get + references code hard to understand. Maybe an STLExtras helper for this recurring pattern (initialize an std::array from a range) would be useful. Though personally I'd just write this using SmallVector, just like all the other methods above. The additional complication of std::array isn't worthwhile for code that is not performance sensitive.

llvm/lib/Analysis/ScalarEvolution.cpp
4347

This check can currently be omitted, because we already check for existing scev below. This early check is only needed if there are expensive recursive folds which we try to bypass.

llvm/test/Analysis/ScalarEvolution/exit-count-select-safe.ll
577

Regression, here and below. I think the regression itself isn't even particularly important, more that this loses most of our test coverage for implied poison reasoning in SCEV.

llvm/test/Transforms/SimpleLoopUnswitch/update-scev.ll
15

Why is this matching the predicated count?

lebedev.ri edited the summary of this revision. (Show Details)Jan 23 2023, 11:31 AM
lebedev.ri marked 3 inline comments as done.

Thank you for taking a look!
Addressing nits.

@efriedma @Meinersbur that didn't help polly tests. I'm completely not familiar with polly,
so i don't feel qualified to address the polly part, or adjusting the tests.

llvm/lib/Analysis/ScalarEvolution.cpp
4347

Here and elsewhere: i have no clue what this is supposed to mean,
and i don't feel like guessing. Is this is a statement, a passing-by remark,
or a change request? If you are requesting changes, please do so explicitly.

nikic added inline comments.Jan 23 2023, 1:59 PM
llvm/test/Analysis/ScalarEvolution/exit-count-select-safe.ll
577

This is my only blocking concern, rest are suggestions. This can be addressed by either a) canonicalizing to umin_seq, b) duplicating the implied poison reasoning for selects or c) duplicating the implied poison test coverage to work with umin_seq between exit counts (which will definitely use a proper umin_seq, not a select). As said, I'm more concerned about the test coverage here than the fold for booleans specifically.

Maybe a naive question, can we model select cond, x, y as zext(cond) * x + zext(1 - cond) * y, at least when we can prove that x and y are not poison? Will that help the vectorizer?

I see an interesting prospect of allowing SCEVCouldNotCompute for either TrueValue or FalseValue to model some situations in loop exit counts. Is it worth allowing it?

llvm/include/llvm/Analysis/ScalarEvolution.h
649

We don't have such generalized version for div/rem because they have a fixed number of operands, and I also don't think we should have it here, when the number of operands is fixed.

llvm/include/llvm/Analysis/ScalarEvolutionExpressions.h
561

Same as above, don't we want it to be urem-like, with separate fields for semantically separate operands? Any advantage of using an array?

566

Maybe some more asserts, same as you have in getSelectExpr to make sure these facts hold after all simplifications?

581

Is it possible that either TrueVal or FalseVal is SCEVCouldNotCompute? Is it asserted somewhere?

I'm asking because it's not unreasonable to allow it.

Imagine loop:

while (isInfinite) { do smth }

The exit cound of this loop can be modelled as select isInfinite, CouldNotCompute, 0. So allowing this value might be useful. However, in this case, this method's implementation is wrong.

Do we want this? Either yes or no, I'd prefer a clear comment about it.