Page MenuHomePhabricator

[SCEV]When safe, compute MinStart as unsigned_min(Start - Stride) + Stride in computeMaxBECountForLT
Needs ReviewPublic

Authored by zzheng on Jun 6 2019, 6:16 PM.



$ cat loop-small-runtime-upperbound.ll

target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"

@global = dso_local local_unnamed_addr global i32 0, align 4
@global.1 = dso_local local_unnamed_addr global i8* null, align 4

define dso_local void @hoge(i8 %arg) {
  %x = load i32, i32* @global, align 4
  %0 = icmp ult i32 %x, 17
  br i1 %0, label %loop, label %exit

  %iv = phi i32 [ %x, %entry ], [, %loop ] = add nuw i32 %iv, 8
  %1 = load i8*, i8** @global.1, align 4
  %2 = getelementptr inbounds i8, i8* %1, i32 1
  store i8* %2, i8** @global.1, align 4
  store i8 %arg, i8* %1, align 1
  %3 = icmp ult i32, 17
  br i1 %3, label %loop, label %exit

exit:                                             ; preds = %bb12, %bb
  ret void

$ opt loop-small-runtime-upperbound.ll -analyze -scalar-evolution

The loop runs a max of 3 iters, but SCEV computes max BE-taken count as 3.
The same issue is also found in test/Analysis/ScalarEvolution/2008-11-18-Stride2.ll, where max BE-taken count is 333 instead of 334.

In computeMaxBECountForLT(), when Start is a (C + %x), where C is a constant and %x is an unknown, getUnsignedRangeMin(Start) returns full-set because of %x.
But loop entry is guarded by:

%0 = icmp ult i32 %x, 17

so x is known in [0, 17), thus MinStart shall be C rather than 0.

Diff Detail


Event Timeline

zzheng created this revision.Jun 6 2019, 6:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 6 2019, 6:16 PM
Herald added a subscriber: javed.absar. · View Herald Transcript
zzheng edited the summary of this revision. (Show Details)Jun 6 2019, 6:17 PM
zzheng updated this revision to Diff 203490.Jun 6 2019, 9:18 PM

re-upload with full context

efriedma added inline comments.Jun 7 2019, 12:38 PM

If I'm following correctly, this is sort of similar to what we do in ScalarEvolution::howFarToZero: If Start - Stride doesn't overflow, instead of querying unsigned_min(Start) directly, we can use unsigned_min(Start - Stride) + Stride instead.

It looks like this is actually computing unsigned_min(Start) + Stride, though, which I don't think is correct.

It's not obvious to me that the case where Stride is not a constant (so it's actually unsigned_min(Start - Stride) + unsigned_min(Stride)) works the same way as the case where Stride is a constant, although it seems plausible.

zzheng updated this revision to Diff 204185.Jun 11 2019, 3:59 PM
zzheng retitled this revision from [SCEV][WIP] Try fix PR42175 to [SCEV]When safe, use Stride as MinStart in computeMaxBECountForLT.
zzheng marked an inline comment as done.Jun 11 2019, 4:02 PM

Is the isLoopEntryGuardedByCond actually proving what you need it to prove? Even if Start-Stride is in the range [0, End), that doesn't necessarily imply Start-Stride doesn't overflow. For example, suppose Start is 0, End is -1, and Stride is 2.

I guess if we prove both Stride and End are nonnegative, it's okay.

zzheng updated this revision to Diff 205871.Jun 20 2019, 12:12 PM
zzheng retitled this revision from [SCEV]When safe, use Stride as MinStart in computeMaxBECountForLT to [SCEV]When safe, compute MinStart as unsigned_min(Start - Stride) + Stride in computeMaxBECountForLT.

I think this is sound now.

Could you add some test coverage for the cases we don't and/or can't transform? Maybe also a test where "end" is known-positive, but not constant.


Probably easier to understand the C form of the loop, at first glance: for (unsigned i = start; i < 17; i += 8) { [...] }




Fix this comment?

I may be missing the obvious, but the code as written appears to be assuming the post-increment form of the induction variable is passed in, not the pre-increment form. What ensures that?

If I'm reading this right, you're basically trying to get a more specific range for Start, using the fact proved by the loop entry guard right? I've seen such cases come up several times recently in things I've worked on, maybe it's time to add a getUnsignedRangeAtScope(SCEV, Loop) variant? (Just as we have a getSCEVAtScope)