Data references in a loop should not access elements over the
statically allocated size. So we can infer a loop max trip count
from this undefined behavior.
Details
- Reviewers
reames nikic mkazantsev xbolva00 jdoerfert
Diff Detail
Event Timeline
Hi, All,
This patch is a fixed commit.
Related to:
https://reviews.llvm.org/D109821
https://reviews.llvm.org/D113554
In order to infer this trip count, I divide the process into the following steps:
- Collect the load/store instructions that are executed on each iteration of the loop;
- Filter out Reads/Write that may overlap;
- Calculate the possible wrap value for each Ptr index and record the smaller one;
- Infer the maximum number of executables from the total memory size/step value;
- Compare the calculated value with the smaller wrap value in the index to ensure that the step is strictly increasing.
Happy to see it!
I am mainly interested in cases when you have a small array and with this new analyzed size, you can do unrolling.
Before we start looking at the implementation, please:
- Add an option that uses the new logic to refine the standard BE count results.
- Move tests to llvm/test/Analysis/ScalarEvolution using that option.
Could you please explain at a high level why we need all this custom wrapping logic? Why are the no-wrap flags on the AddRec insufficient?
The purpose of using the logic I implemented myself is to focus on calculating how many iterations the index of GEP will wrap. With this value, we can know if the loop will fall into an infinite loop.
for.body: %iv = phi i8 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i8 %iv to i64 %arrayidx = getelementptr inbounds [500 x i32], [500 x i32]* %a, i64 0, i64 %idxprom store i32 0, i32* %arrayidx, align 4 %inc = add i8 %iv, 1 %inc_zext = zext i8 %inc to i32 %cmp = icmp ult i32 %inc_zext, %len br i1 %cmp, label %for.body, label %loopexit
If the value of %len (which comes from argument) is greater than the maximum value that i8 can represent, loop falls into infinite, but the store access is wandering in a fixed area without UB. This is done to ensure that in this case the inference is correct.
In such a case the pointer will be something like ((4 * (zext i8 {0,+,1}<%for.body> to i64))<nuw><nsw> + %a)<nuw> rather than the {%a,+,4}<nuw><%for.body> it would be in the non-wrapping case. Why is the restriction to addrec pointers not sufficient for this case?
Thanks for your help, the above case is indeed filtered out by constraints. But please see :
define void @test(i32 signext %len) {... for.body: %iv = phi i8 [ %inc, %for.body ], [ 0, %for.body.preheader ] %idxprom = zext i8 %iv to i64 %arrayidx = getelementptr inbounds [500 x i32], [500 x i32]* %a, i64 0, i64 %idxprom store i32 0, i32* %arrayidx, align 4 %inc = add nuw nsw i8 %iv, 1 %inc_zext = zext i8 %inc to i32 %cmp = icmp slt i32 %inc_zext, %len br i1 %cmp, label %for.body, label %loopexit ... }
this case would get {%a,+,4}<nuw><%for.body>. In such a situation, I think it is necessary to calculate how many iterations to wrap. :)
Doesn't the add nuw exclude wrapping in this case though? This is why SCEV concludes it's okay to look through the zext.
I know that nuw flag has excluded wrapping. Even so, on this basis, the BE we get by (MemSize / StepSize + 1) is 501, I'm concerned that this inferred value is not available, so I did a comparison with i8 wrap value. If the inferred value is within the loop iterator wrap value, then we consider it available.
In this example, isn't it okay if the max trip count is reported as 501? We actually know that the max trip count must be 256 due to the nuw on the add, even if SCEV fails to realize it. As such, reporting 501 as a conservative over-estimate should be fine. Am I missing something?
Okia, looks like I'm worrying too much. Then the new implementation is divided into the following steps:
- Collect the load/store instructions that are executed on each iteration of the loop;
- Filter out Reads/Write that may overlap;
- Infer the maximum number of executables from the total memory size/step value;
Please see if there are any other issues for this patch. :)
Hi, nikic. Sorry for bothering. I would like to ask if you have any other doubts about this patch?
Why? The new diff looked better, though it probably should have NoSelfWrap -> NoUnsignedWrap.
Though even then it seems like this patch has more code/checks for which I don't understand the purpose. For example, why do we care whether the memory accesses may overlap or not?
I do not follow parts of this. The overlapping stuff is odd. We should also filter access earlier. And finally, lot's of the checks seem weird.
What I was expecting is roughly:
Loop count C is unknown.
Size of the object is T.
Access size is A.
Access function is F, we can restrict it to <B, +, S>loop with S > 0.
What we know is that
(T - B) >= 0
or the loop trip count is 0.
And
T >= C * (S * A) + B
which is
(T - B) >= C * S * A
which gives us, under some checks
(T - B) / (S * A) >= C
The left hand side should be constant and thereby bound C.
What am I missing?
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
255 | Do we have compile time numbers with this set to true? | |
8184–8195 | We have a helper for this llvm::getObjectSize https://llvm.org/doxygen/namespacellvm.html#a62ba0e5ee2d86f663c6de4efda6082a7 | |
8252–8255 | I also don't understand why we need to check overlapping. | |
8263–8266 | I would have expected this code much earlier. The expectation is that most pointer bases are not bounded, so why do we bother collecting the instructions, checking the access expression, etc. We should look at the base first before we ever consider using the access for reasoning. | |
8268–8278 | If we got a proper value, no need to use the secondary reasoning, right? | |
8272–8281 | I doubt this is necessary given the other restrictions. You already verified the access happens every iteration and it will happen whenever we enter the loop header. This is only necessary if we relax the conditions and allow early exists. Thus, add 1 only if the existing block is not unique or not the latch. | |
8291 | I don't understand this. |
How this is going to affect sanitizers? We still want them being able to detect overflows.
more context: sanitizers use SCEV to decide to not instrument accesses where SCEV tells us that they are in range. With features like this that exploit UB in SCEV, we can no longer rely on this, because the whole point of the sanitizer is to catch UB.
SCEV (and other helpers) already refine UB, don't they? That said, you can disable the feature, right now via a command line flag. That said, we probably want a "catch all" try not to exploit UB flag.
Thanks! Yes such a catch all would be great. Just confirming I understand the CL correctly (I didn't actually read all of the code). If I take the loop from the discourse
int square(int num) { int A[3]; for (int i = 0; i < num; ++i) A[i] = i * num; return A[1] + A[2]; }
SCEV would tell me that 0 <= i < 3 is true?
SCEV (and other helpers) already refine UB, don't they? That said, you can disable the feature, right now via a command line flag. That said, we probably want a "catch all" try not to exploit UB flag.
We should check function attr.
llvm/lib/Analysis/ScalarEvolution.cpp | ||
---|---|---|
8138 | && !(F.hasFnAttribute(Attribute::SanitizeAddress) || F.hasFnAttribute(Attribute::SanitizeThread) || F.hasFnAttribute(Attribute::SanitizeMemory) || F.hasFnAttribute(Attribute::SanitizeHWAddress) || F.hasFnAttribute(Attribute::SanitizeMemTag)) | |
8138 | or maybe near flag check |
Yes, or at least the loop count is bound by 3. Which should be enough for the unroller to get going.
We can check for attributes, is there a helper that encapsulates all the specific attribute names?
I'm asking because I think we don't care about the loop count for sanitizer purposes. But if SCEV told us that num < 3 is always true, we might fail to insert some checks
@Peakulorain Are you still interested in pushing this forward? I would otherwise take over.
I have proposed a PR https://github.com/llvm/llvm-project/pull/70361 based on this patch.
Do we have compile time numbers with this set to true?