This is an archive of the discontinued LLVM Phabricator instance.

[ScalarEvolution] Infer loop max trip count from memory accesses
Needs ReviewPublic

Authored by Peakulorain on Jul 12 2023, 1:15 AM.

Details

Summary

Memory accesses in a loop must not touch elements beyond the statically
allocated size of the underlying object; doing so is undefined behavior.
We can therefore infer a loop max trip count from that size.

Diff Detail

Event Timeline

Peakulorain created this revision.Jul 12 2023, 1:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 12 2023, 1:15 AM
Peakulorain requested review of this revision.Jul 12 2023, 1:15 AM

Hi, All,
This patch is a fixed version of an earlier commit.
Related to:
https://reviews.llvm.org/D109821
https://reviews.llvm.org/D113554

In order to infer this trip count, I divide the process into the following steps:

  1. Collect the load/store instructions that execute on every iteration of the loop;
  2. Filter out reads/writes that may overlap;
  3. Calculate the possible wrap value for each pointer index and record the smaller one;
  4. Infer the maximum number of executions from the total memory size divided by the step value;
  5. Compare the calculated value with the smaller wrap value of the index to ensure that the step is strictly increasing.

Happy to hear your feedback!

I am mainly interested in cases where you have a small array; with the newly analyzed size, you can then unroll the loop.

nikic requested changes to this revision.Jul 12 2023, 1:56 AM

Before we start looking at the implementation, please:

  • Add an option that uses the new logic to refine the standard BE count results.
  • Move tests to llvm/test/Analysis/ScalarEvolution using that option.
This revision now requires changes to proceed.Jul 12 2023, 1:56 AM

Thanks for the guide, updated as requested.

nikic added a comment.Jul 18 2023, 2:49 AM

Could you please explain at a high level why we need all this custom wrapping logic? Why are the no-wrap flags on the AddRec insufficient?

The purpose of the logic I implemented is to calculate after how many iterations the GEP index will wrap. With this value, we can know whether the loop will become infinite.

for.body:
  %iv = phi i8 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i8 %iv to i64
  %arrayidx = getelementptr inbounds [500 x i32], [500 x i32]* %a, i64 0, i64 %idxprom
  store i32 0, i32* %arrayidx, align 4
  %inc = add i8 %iv, 1
  %inc_zext = zext i8 %inc to i32
  %cmp = icmp ult i32 %inc_zext, %len
  br i1 %cmp, label %for.body, label %loopexit

If the value of %len (which comes from an argument) is greater than the maximum value an i8 can represent, the loop becomes infinite, but the store keeps cycling through a fixed region without any UB. The custom logic is there to ensure the inference is still correct in this case.

nikic added a comment.Jul 18 2023, 7:57 AM

In such a case the pointer will be something like ((4 * (zext i8 {0,+,1}<%for.body> to i64))<nuw><nsw> + %a)<nuw> rather than the {%a,+,4}<nuw><%for.body> it would be in the non-wrapping case. Why is the restriction to addrec pointers not sufficient for this case?

Thanks for your help; the above case is indeed filtered out by the constraints. But please see:

define void @test(i32 signext %len) {...
for.body:
  %iv = phi i8 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %idxprom = zext i8 %iv to i64
  %arrayidx = getelementptr inbounds [500 x i32], [500 x i32]* %a, i64 0, i64 %idxprom
  store i32 0, i32* %arrayidx, align 4
  %inc = add nuw nsw i8 %iv, 1
  %inc_zext = zext i8 %inc to i32
  %cmp = icmp slt i32 %inc_zext, %len
  br i1 %cmp, label %for.body, label %loopexit
  ...
}

This case would get {%a,+,4}<nuw><%for.body>. In such a situation, I think it is necessary to calculate after how many iterations the index wraps. :)

Doesn't the add nuw exclude wrapping in this case though? This is why SCEV concludes it's okay to look through the zext.

I know that the nuw flag excludes wrapping. Even so, the value we get from (MemSize / StepSize + 1) is 501. I was concerned that this inferred value might not be usable, so I compared it with the i8 wrap value: if the inferred value is within the loop counter's wrap value, we consider it usable.

nikic added a comment.Jul 20 2023, 1:15 AM

In this example, isn't it okay if the max trip count is reported as 501? We actually know that the max trip count must be 256 due to the nuw on the add, even if SCEV fails to realize it. As such, reporting 501 as a conservative over-estimate should be fine. Am I missing something?

Okay, looks like I was worrying too much. The new implementation is divided into the following steps:

  1. Collect the load/store instructions that execute on every iteration of the loop;
  2. Filter out reads/writes that may overlap;
  3. Infer the maximum number of executions from the total memory size divided by the step value.

Please see if there are any other issues with this patch. :)

Peakulorain added a comment.EditedJul 25 2023, 11:55 PM

Hi nikic, sorry to bother you. Do you have any other concerns about this patch?

Back to prior diff. @nikic

Why? The new diff looked better, though it probably should have NoSelfWrap -> NoUnsignedWrap.

Though even then, this patch seems to have more code/checks whose purpose I don't understand. For example, why do we care whether the memory accesses may overlap?

I do not follow parts of this. The overlapping logic is odd. We should also filter accesses earlier. And finally, lots of the checks seem weird.
What I was expecting is roughly:

Loop count C is unknown.
Size of the object is T.
Access size is A.
Access function is F; we can restrict it to {B,+,S}<loop> with S > 0.

What we know is that
(T - B) >= 0
or the loop trip count is 0.
And
T >= C * (S * A) + B
which is
(T - B) >= C * S * A
which gives us, under some checks
(T - B) / (S * A) >= C
The left hand side should be constant and thereby bound C.

What am I missing?

llvm/lib/Analysis/ScalarEvolution.cpp
255

Do we have compile time numbers with this set to true?

8184–8195
8252–8255

I also don't understand why we need to check overlapping.

8263–8266

I would have expected this code much earlier. The expectation is that most pointer bases are not bounded, so why do we bother collecting the instructions, checking the access expression, etc.? We should look at the base first before we ever consider using the access for reasoning.

8272–8281

I doubt this is necessary given the other restrictions. You already verified the access happens every iteration, and it will happen whenever we enter the loop header. This is only necessary if we relax the conditions and allow early exits. Thus, add 1 only if the exiting block is not unique or not the latch.

8291

I don't understand this.

8321

If we got a proper value, no need to use the secondary reasoning, right?

fmayer added a subscriber: fmayer.Sep 27 2023, 12:21 PM

How is this going to affect sanitizers? We still want them to be able to detect overflows.

More context: sanitizers use SCEV to decide not to instrument accesses where SCEV tells us they are in range. With features like this that exploit UB in SCEV, we can no longer rely on that, because the whole point of the sanitizer is to catch UB.

SCEV (and other helpers) already refine based on UB, don't they? That said, you can disable this feature right now via a command-line flag. Still, we probably want a catch-all "do not exploit UB" flag.

Thanks! Yes, such a catch-all would be great. Just confirming that I understand the CL correctly (I didn't actually read all of the code). If I take the loop from the Discourse thread:

int square(int num) {
    int A[3];
    for (int i = 0; i < num; ++i)
      A[i] = i * num;
    return A[1] + A[2];
}

SCEV would tell me that 0 <= i < 3 is true?

And also num < 3?

We should check function attr.

llvm/lib/Analysis/ScalarEvolution.cpp
8138
&& !(F.hasFnAttribute(Attribute::SanitizeAddress) ||
         F.hasFnAttribute(Attribute::SanitizeThread) ||
         F.hasFnAttribute(Attribute::SanitizeMemory) ||
         F.hasFnAttribute(Attribute::SanitizeHWAddress) ||
         F.hasFnAttribute(Attribute::SanitizeMemTag))
8138

or maybe near the flag check

Yes, or at least that the loop count is bounded by 3, which should be enough for the unroller to get going.

We can check for attributes, is there a helper that encapsulates all the specific attribute names?

I'm asking because I think we don't care about the loop count for sanitizer purposes. But if SCEV told us that num < 3 is always true, we might fail to insert some checks.

@Peakulorain Are you still interested in pushing this forward? I would otherwise take over.

Hi @Peakulorain, if it is okay with you, @jdoerfert and I will push it forward.