Page MenuHomePhabricator

[SCEV] Extend ability to infer flags to more complicates scopes

Authored by reames on Oct 7 2021, 1:38 PM.



At it's heart, this change simply extends the reasoning for proving that B must execute if A does to allow a single-successor or loop preheader chain of blocks. The majority of the change is in making that reasonable efficient.

To make this efficient, we need to cache the per-block queries for the intermediate nodes in the found path. (I'd love to do the edge cases too, but the invalidation is trickier.) This patch does so by taking an existing loop level cache, and essentially splitting it into a per-block cache and then summarizing back to loop level. In particular, we exactly parallel the construction and invalidation of the new-block cache so that no new invalidation events should be needed. The new cache should be "as correct" as the original code.

The invalidation actions we need to worry about are adding and removing instructions from a block. For removal, we might end up in an imprecise state. For addition, we might end up in a incorrect state. The existing LoopProperties cache has exactly the same issues, and depends on forgetLoop calls for correctness when we insert new instructions (with interesting properties) into a loop.

Long term, I'm actually hoping to sink the notion of block properties into BasicBlock itself, but starting here with a standalone patch makes a lot of sense.

Diff Detail

Event Timeline

reames created this revision.Oct 7 2021, 1:38 PM
reames requested review of this revision.Oct 7 2021, 1:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 7 2021, 1:38 PM
mkazantsev requested changes to this revision.EditedOct 7 2021, 10:01 PM

Why am I getting more and more scared every time a new cache is added in SCEV? :) Please add verifiation info ::verify().

I think you might be also missing preheader invalidation in forgetLoop.


What about preheader?

This revision now requires changes to proceed.Oct 7 2021, 10:01 PM
mkazantsev added inline comments.Oct 7 2021, 10:11 PM

Does it handle invoke terminator correctly?

mkazantsev added inline comments.Oct 7 2021, 10:19 PM

Quick check: A dominates B?


I guess this will hand if you call it on 2 unreachable blocks going to one another.




nit: capitalize lambda name.

reames added inline comments.Oct 11 2021, 10:02 AM

Two answers:
As actually used, that's trivially true.

For the interface, we don't need this property. Consider B in the header of some loop L, with A in the (unconditional) latch block. A must reach B even though A does not dominate B.

This could be an optimization since the case implemented doesn't catch the case just mentioned. Will add with that as a comment.


Good catch, we should filter by reachability since unreachable CFGs are weird.


They're methods, not variables. (Or at least, I believe that's our style convention.)


Er, it's calling the same function as previously? Not sure what you mean?


Good catch, had not considered the implication of this combined with the attempted caching of the last element in the path.

reames updated this revision to Diff 378739.Oct 11 2021, 11:18 AM

Rebase and address review comments.

The added verify routine reveals a bug in LoopFusion, which appears to be broader than this patch. I'm going to look into that, but need to finish something else first, so it might be a day or two.

reames updated this revision to Diff 378756.Oct 11 2021, 12:15 PM

I was wrong about the loop-fusion failures being a bug in loop fusion, it was a bug in the verification I'd added.

However, the complexity of the "fixed" version of the verification makes me really doubt that verification here is worthwhile at all. Our infrastructure of value handles of const types is *awful*. I'm posting this so that anyone interested can see the cleanest version I could come up with. (The addition of the new constructor to SCEVCallbackVH hides the needs for a *lot* of const_casts otherwise required for e.g. WeakVH which would be a more natural fit.)

Given I don't plan to do the core ValueHandle work needed to make the verification readable (at this time), I'm going to drop the verification entirely in the next revision.

reames updated this revision to Diff 378760.Oct 11 2021, 12:22 PM

Drop verification complexity.

I'm generally OK now, but please give your opinion on forgetLoop question since there might be a bug lurking here.


This is a really strange and counter-intuitive limitation. I'm OK with it for now, but I think we might want to remove it in the future. Should we add a TODO here or in method description?


Fair, I didn't think of this case. Unconditional latches are rare, though, so I think this is fine.


Makes sense to check reachibility of A as well, just to avoid useless work and save some CT.


As an idea for follow-up: if PrevBB is the only exit of a loop (w/o abnormal exits or locks etc), we can also return it, because the loop should be finite. Maybe makes sense to add a TODO.


I was certain they are treated as variables, and there is a lot of examples of this in this very file (grep by "= [", most are capitalized).

./ScalarEvolution.cpp:639:    const auto IsGVNameSemantic = [&](const GlobalValue *GV) {
./ScalarEvolution.cpp:854:  auto IsLessComplex = [&](const SCEV *LHS, const SCEV *RHS) {
./ScalarEvolution.cpp:2356:  auto IsKnownNonNegative = [&](const SCEV *S) {
./ScalarEvolution.cpp:2370:    auto Opcode = [&] {
./ScalarEvolution.cpp:2469:  auto ComputeFlags = [this, OrigFlags](const ArrayRef<const SCEV *> Ops) {
./ScalarEvolution.cpp:2513:  auto FindTruncSrcType = [&]() -> Type * {
./ScalarEvolution.cpp:3058:  auto ComputeFlags = [this, OrigFlags](const ArrayRef<const SCEV *> Ops) {
./ScalarEvolution.cpp:3638:  const bool AssumeInBoundsFlags = [&]() {
./ScalarEvolution.cpp:3752:    auto FoldOp = [&](const APInt &LHS, const APInt &RHS) {
./ScalarEvolution.cpp:4196:    auto MatchMinMaxNegation = [&](const SCEVMinMaxExpr *MME) {
./ScalarEvolution.cpp:5193:  auto getExtendedExpr = [&](const SCEV *Expr,
./ScalarEvolution.cpp:5208:  auto PredIsKnownFalse = [&](const SCEV *Expr,
./ScalarEvolution.cpp:5228:  auto AppendPredicate = [&](const SCEV *Expr,
./ScalarEvolution.cpp:5300:  auto areExprsEqual = [&](const SCEV *Expr1, const SCEV *Expr2) -> bool {
./ScalarEvolution.cpp:5738:      auto CoerceOperand = [&](const SCEV *Op) -> const SCEV * {
./ScalarEvolution.cpp:6626:  auto pushOp = [&](const SCEV *S) {
./ScalarEvolution.cpp:6771:    auto HasSideEffects = [](Instruction *I) {
./ScalarEvolution.cpp:7692:  auto PredicateNotAlwaysTrue = [](const ExitNotTakenInfo &ENT) {
./ScalarEvolution.cpp:7715:  auto PredicateNotAlwaysTrue = [](const ExitNotTakenInfo &ENT) {
./ScalarEvolution.cpp:9404:  auto SolveForBoundary = [&](APInt Bound) -> std::pair<Optional<APInt>,bool> {
./ScalarEvolution.cpp:9422:    auto LeavesRange = [&] (const APInt &X) {
./ScalarEvolution.cpp:9688:  auto ComputesEqualValues = [](const Instruction *A, const Instruction *B) {
./ScalarEvolution.cpp:9714:  auto TrivialCase = [&](bool TriviallyTrue) {
./ScalarEvolution.cpp:10230:  auto CheckRanges = [&](const ConstantRange &RangeLHS,
./ScalarEvolution.cpp:10261:  auto MatchBinaryAddToConst = [this](const SCEV *X, const SCEV *Y,
./ScalarEvolution.cpp:10529:    auto ProofFn = [&](ICmpInst::Predicate P) {
./ScalarEvolution.cpp:10537:  auto ProveViaGuard = [&](const BasicBlock *Block) {
./ScalarEvolution.cpp:10541:      auto ProofFn = [&](ICmpInst::Predicate P) {
./ScalarEvolution.cpp:10551:  auto ProveViaCond = [&](const Value *Condition, bool Inverse) {
./ScalarEvolution.cpp:10556:      auto ProofFn = [&](ICmpInst::Predicate P) {
./ScalarEvolution.cpp:10790:  auto IsSignFlippedPredicate = [](CmpInst::Predicate P1,
./ScalarEvolution.cpp:11143:  auto ProvedEasily = [&](const SCEV *S1, const SCEV *S2) {
./ScalarEvolution.cpp:11337:  auto GetOpFromSExt = [&](const SCEV *S) {
./ScalarEvolution.cpp:11352:  auto IsSGTViaContext = [&](const SCEV *S1, const SCEV *S2) {
./ScalarEvolution.cpp:11376:    auto IsSumGreaterThanRHS = [&](const SCEV *S1, const SCEV *S2) {
./ScalarEvolution.cpp:11696:  auto canAssumeNoSelfWrap = [&](const SCEVAddRecExpr *AR) {
./ScalarEvolution.cpp:11842:      auto wouldZeroStrideBeUB = [&]() {
./ScalarEvolution.cpp:11859:    auto isUBOnWrap = [&]() {
./ScalarEvolution.cpp:11954:    auto canProveRHSGreaterThanEqualStart = [&]() {
./ScalarEvolution.cpp:12007:    bool MayAddOverflow = [&] {
./ScalarEvolution.cpp:13581:  const auto MatchURemWithDivisor = [&](const SCEV *B) {
./ScalarEvolution.cpp:13653:  auto CollectCondition = [&](ICmpInst::Predicate Predicate, const SCEV *LHS,

just an idea: merge the two loops together?


Can we use lookup?


I thought there might have been a bug before your patch, but it seems that isGuaranteedToTransferExecutionToSuccessor handles it correctly.


What still worries me (though I can't say if there is a bug here) is that your path might contain blocks from multiple (nested) loops, and here we might only forget inner loop. Will that still be correct?

reames planned changes to this revision.Oct 25 2021, 10:40 AM

Placing this on hold. I was expecting to see perf regressions reported against the flag redefinition changes, and this was expected to be needed to help claw back some of that performance. As there's been nothing reported, I'd prefer not to add further complexity here without cause. Placing this on hold for the moment, if we go a couple more weeks without reported regressions I'll abandon this change.


We know A dominates B. From B being reachable, A must also be.


We have this same basic pattern repeating in a bunch of places. If we ever implement this idea, we'd need to visit all of them. I don't think having a comment in one place really helps.

reames abandoned this revision.Wed, Nov 17, 9:00 AM

Still no reported issues with the original patch. Abandoning for now, will return if warranted.