https://bugs.llvm.org/show_bug.cgi?id=42175
$ cat loop-small-runtime-upperbound.ll
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
@global = dso_local local_unnamed_addr global i32 0, align 4
@global.1 = dso_local local_unnamed_addr global i8* null, align 4
define dso_local void @hoge(i8 %arg) {
entry:
  %x = load i32, i32* @global, align 4
  %0 = icmp ult i32 %x, 17
  br i1 %0, label %loop, label %exit
loop:
  %iv = phi i32 [ %x, %entry ], [ %iv.next, %loop ]
  %iv.next = add nuw i32 %iv, 8
  %1 = load i8*, i8** @global.1, align 4
  %2 = getelementptr inbounds i8, i8* %1, i32 1
  store i8* %2, i8** @global.1, align 4
  store i8 %arg, i8* %1, align 1
  %3 = icmp ult i32 %iv.next, 17
  br i1 %3, label %loop, label %exit
exit:                                             ; preds = %loop, %entry
  ret void
}
$ opt loop-small-runtime-upperbound.ll -analyze -scalar-evolution
...
The loop runs a max of 3 iterations, so the max backedge-taken count should be 2, but SCEV computes it as 3.
The same issue shows up in test/Analysis/ScalarEvolution/2008-11-18-Stride2.ll, where the computed max BE-taken count is 334 instead of the exact 333.
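To double-check the expected count, here is a quick brute-force simulation of @hoge's loop over every %x admitted by the guard (a standalone sketch, not SCEV's computation):

```python
def backedges(x):
    # Simulate @hoge's loop body for a given %x (guard guarantees %x ult 17).
    iv, taken = x, 0
    while True:
        iv += 8            # %iv.next = add nuw i32 %iv, 8
        if iv < 17:        # %3 = icmp ult i32 %iv.next, 17
            taken += 1     # backedge taken
        else:
            return taken

worst = max(backedges(x) for x in range(17))  # guard: %x ult 17
print(worst)  # → 2
```

The worst case is %x = 0 (iv takes 0, 8, 16): three iterations, two backedges. So the exact max BE-taken count is 2, one less than what SCEV reports.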
In computeMaxBECountForLT(), when Start is (C + %x), where C is a constant and %x is an unknown, getUnsignedRange(Start) is the full set because of %x, so getUnsignedRangeMin(Start) returns 0.
But loop entry is guarded by:
%0 = icmp ult i32 %x, 17
so %x is known to be in [0, 17), and MinStart should be C rather than 0.
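A small sketch of the range arithmetic (C = 8 is just an illustrative constant, not taken from the test): without the guard, C + %x can wrap in i32, so its unsigned range is the full set and its minimum is 0; with the guard, the minimum is C.

```python
M = 2**32   # i32 domain
C = 8       # illustrative constant added to %x

# Without the guard, %x ranges over all of i32; C + %x wraps, so its
# unsigned minimum collapses to 0 (witnessed by %x = M - C).
no_guard_min = min((C + x) % M for x in (0, M - C, M - 1))

# With the dominating guard %x ult 17, %x is confined to [0, 17):
guarded_min = min((C + x) % M for x in range(17))

print(no_guard_min, guarded_min)  # → 0 8
```

This is exactly the precision being lost: the guard pins unsigned_min(Start) to C, but the full-set range reports 0.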
If I'm following correctly, this is sort of similar to what we do in ScalarEvolution::howFarToZero: If Start - Stride doesn't overflow, instead of querying unsigned_min(Start) directly, we can use unsigned_min(Start - Stride) + Stride instead.
It looks like this is actually computing unsigned_min(Start) + Stride, though, which I don't think is correct.
It's not obvious to me that the non-constant-Stride case (where it's actually unsigned_min(Start - Stride) + unsigned_min(Stride)) works the same way as the constant-Stride case, although it seems plausible.
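For what it's worth, a brute-force check over a toy domain (the ranges below are made up for illustration) suggests the non-constant-Stride bound is at least sound: as long as Start - Stride doesn't wrap, unsigned_min(Start - Stride) + unsigned_min(Stride) never exceeds any actual Start, since Start = (Start - Stride) + Stride and each summand is bounded below by its range minimum.

```python
# Hypothetical ranges for Start and a non-constant Stride; no wrap occurs
# in Start - Stride for these values.
starts  = range(40, 60)
strides = range(3, 9)

diff_min   = min(a - s for a in starts for s in strides)  # unsigned_min(Start - Stride)
stride_min = min(strides)                                 # unsigned_min(Stride)

# The combined bound is a valid (if conservative) lower bound on Start.
assert all(diff_min + stride_min <= a for a in starts)
print(diff_min + stride_min)  # → 35
```

This only shows soundness of the lower bound on Start, not that the resulting max BE-taken count matches the constant-Stride reasoning, so it doesn't settle the question above.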