[BasicAA] Support arbitrary pointer sizes (and fix an overflow bug)
Needs ReviewPublic

Authored by hfinkel on Oct 6 2017, 7:59 PM.

Details

Summary

Motivated by the discussion in D38499, this patch updates BasicAA to support arbitrary pointer sizes by switching most remaining non-APInt calculations to use APInt. The size of these APInts is set to the maximum pointer size (maximum over all address spaces described by the data layout string).

Most of this translation is straightforward (although needs to be checked carefully), but this patch contains a fix for a bug that revealed itself during this translation process. In order for test/Analysis/BasicAA/gep-and-alias.ll to pass, which is run with 32-bit pointers, the intermediate calculations must be performed using 64-bit integers. This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value. Thus, when initially applying the APInt conversion, because the maximum pointer size in this test is 32 bits, it started failing. Suspicious, I created a 64-bit version of this test (included here), and that failed (miscompiled) on trunk for a similar reason (the multiplication can overflow).

After fixing this overflow bug, the first test case (at least) in Analysis/BasicAA/q.bad.ll started failing. This is also a 32-bit test, and was relying on having 64-bit intermediate values to have BasicAA return an accurate result. In order to fix this problem, and because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code (especially portable code using int64_t), it seems reasonable to always use at least 64-bit integers. In this way, we won't regress our analysis capabilities (and there's a command-line option added, so experimenting with this should be easy).

This should also fix the problem motivating D38499. Michael, can you please test this, improve the test case from D38499 so it can included, and generate additional test cases if possible for extra-large pointers.

Please review.

Diff Detail

hfinkel created this revision.Oct 6 2017, 7:59 PM

because I believe that it is not uncommon to use i64 indexing expressions in 32-bit code

Whether or not that's true at the source-code level, it's should be uncommon at the IR level; instcombine will canonicalize GEP indexing to use pointer-width values.

lib/Analysis/BasicAliasAnalysis.cpp
519

I'm not sure I understand this comment.

It's true that C2*Scale can overflow, but I'm not sure it makes sense to try to address that here. The multiply by the scale can overflow no matter where the scale comes from (assuming the GEP isn't marked inbounds).

hfinkel added inline comments.Oct 10 2017, 3:39 PM
lib/Analysis/BasicAliasAnalysis.cpp
519

Can you please elaborate?

If this overflows, then the offset here isn't an offset, and the reasoning done later won't be correct.

In the test case, the GEPs are marked as inbounds. I don't think that's relevant here.

efriedma added inline comments.Oct 10 2017, 4:14 PM
lib/Analysis/BasicAliasAnalysis.cpp
519

Consider the following testcase:

target datalayout = "e-m:e-p:32:32-f64:32:64-f80:32-n8:16:32-S128"
define i32* @b1(i32 *%a) {
%r = getelementptr i32, i32* %a, i32 -2147483648
ret i32* %r
}
define i32* @b2(i32 *%a) {
%o = add i32 -2147483648, 0
%r = getelementptr i32, i32* %a, i32 %o
ret i32* %r
}

In both cases, %a and %r should be mustalias. If you run "-aa-eval", we somehow conclude that the pointers are mustalias in b1, and noalias in b2. If you're trying to make overflow well-behaved, it doesn't make sense to treat the overflow introduced by GetLinearExpression differently from the overflow inherent in scaling.

hfinkel added inline comments.Oct 10 2017, 10:14 PM
lib/Analysis/BasicAliasAnalysis.cpp
519

I agree, however, as I attempted to explain in the comment:

  1. The problem is not in GetLinearExpression itself, it is with how the results of GetLinearExpression are being used here.
  2. What's happening here, where we're multiplying by the scale (which by itself is always fine), and applying the distributive property, can introduce an overflow in cases where an overflow might not have existed in the original program.

I believe that the test cases demonstrate this: there are certainly values for the function parameters for which they'll be no overflow in the address calculations.

efriedma added inline comments.Oct 11 2017, 12:15 PM
lib/Analysis/BasicAliasAnalysis.cpp
519

I think aliasGEP needs to be changed to accept that decompsed GEP arithmetic will overflow, or somehow adopt much more restrictive overflow checking (in many more places than this patch does).


Suppose we want to adopt more restrictive overflow checking. First, so we have an expression (C1*V+C2)*Scale. If we treat (C1*V+C2) as opaque, there's one place this can overflow: the multiply by Scale. Currently, we completely ignore this possibility.

So now, instead of treating the inner expression as opaque, we want to decompose it to something like (C1*Scale)*V+C2*Scale. There are four arithmetic operations here. All four of them can potentially overflow. And two of them can't be overflow-checked here because V isn't known at compile-time.

Given that, I don't see how overflow-checking C2 * Scale accomplishes anything except hiding the problem for your exact testcase.

mppf added a comment.Oct 11 2017, 1:16 PM

@hfinkel - I'm obviously not your main reviewer, but I did spend a few minutes looking at this.

First, it appears that this changeset addresses the problem I was having. I'm running in to a new problem with 128-bit pointers when using trunk (vs LLVM 4 or 5) which I'll suggest a fix for / bug report once I've got something minimized. I'll be working on that as well as generating a better test case for this BasicAA issue.

Thanks!

lib/Analysis/BasicAliasAnalysis.cpp
379

This is meant to just take the bottom PointerSize bits. Shouldn't you use APInt.trunc to express it more simply?

387

I find it surprising that you needed to do this, but I don't have a great understanding of what's going on. In the change description, you pointed out

This is because, as noted in the patch, when GetLinearExpression decomposes an expression into C1*V+C2, and we then multiply this by Scale, and distribute, to get (C1*Scale)*V + C2*Scale, it can be the case that, even through C1*V+C2 does not overflow for relevant values of V, (C2*Scale) can overflow. If this happens, later logic will draw invalid conclusions from the (base) offset value.

Can you explain why it's not sufficient to do these computations in the number ring for the GEP? E.g. if all GEPs were with 32-bit pointers, mathematically the distribution should work if everything is done with 32-bit numbers, including with overflow. And the LLVM spec says that is what these GEPs mean...

Is it just that the alias analysis pass then conservatively throws up its hands if overflow occurred? Or is it the case that the computations done here might cross different pointer sizes?

492

I'm probably missing something, but isn't .sextOrSelf(x).sextOrTrunc(x) going to do the same thing as .sextOrTrunc(x) ?

553

I've never seen !! used like this before. Is it intentional? Is it the same as if (Scale != 0) ? If so, wouldn't that be clearer?

1098

This addition seems surprising to me. Why wouldn't we just use APInt for the computation below? Is this a workaround for something?

hfinkel added inline comments.Oct 11 2017, 11:52 PM
lib/Analysis/BasicAliasAnalysis.cpp
379

Yes, where the result is sign extended. We can't just trunc the APInt, however, or its size will be wrong (I believe that this function ends up dealing with cases where we're dealing with a pointer size that'smaller than the maximum/initial one).

You're right, however, that writing Offset.trunc(PointerSize).sextOrSelf(PrevBitWidth) would be clearer.

387

I don't have a great answer to this question, but I can say that some of the logic here doesn't work if we assume that we're working mod 2^n. For one thing, I think that makes it very hard to do any kind of relational comparisons. a < b does not have a useful meaning if a and b are both really congruence classes (i.e., if all we know is that a is really a + k*2^n, for some k, and b is really b + j*2^n, for some j, then it's impossible to conclude which number is greater or smaller than some other).

As I recall, we were specifically running into a issue with this check:

// If we know all the variables are positive, then GEP1 >= GEP1BasePtr.
// If GEP1BasePtr > V2 (GEP1BaseOffset > 0) then we know the pointers
// don't alias if V2Size can fit in the gap between V2 and GEP1BasePtr.
if (AllPositive && GEP1BaseOffset.sgt(0) && GEP1BaseOffset.uge(V2Size))
  return NoAlias;

The GEP1BaseOffset > 0 isn't something you can meaningfully do if all we know is the value is some element of its congruence class (because, in that case, all numbers are positive and negative). The other comparison has this problem too.

492

I think you've overlooked one parenthesis.

I'm trying to match the original code here, so I'm sign extending CIdx before the multiplication. It seemed like the result of the multiplication may have returned a 64-bit APInt (even if it was originally 32-bits), and so in that case, we need to truncate again (and there is no truncOrSelf). Looking through the APInt source code, I can't explain that behavior, so I'll look at this again.

519

I think aliasGEP needs to be changed to accept that decompsed GEP arithmetic will overflow, or somehow adopt much more restrictive overflow checking (in many more places than this patch does).

I think that it does need to be changed, but if we can fix it like this, that probably preserves our optimization abilities. Changing it in other ways, AFAIKT, will weaken it (perhaps unnecessarily). The only other place I see that might have this issue (introducing new compile-time overflows) is in BasicAAResult::constantOffsetHeuristic. There may, however, be a more-general problem...

If we treat (C1*V+C2) as opaque, there's one place this can overflow: the multiply by Scale. Currently, we completely ignore this possibility.

If it is opaque, then it is just a value (regardless of whether or not its computation depended on some overflowing result). Is there something we'd need to do?

So now, instead of treating the inner expression as opaque, we want to decompose it to something like (C1*Scale)*V+C2*Scale. There are four arithmetic operations here. All four of them can potentially overflow. And two of them can't be overflow-checked here because V isn't known at compile-time.

When I looked at this previously, my thought had been that only the C2*Scale overflow could happen in well-defined code. I may have been wrong about that. The issue is that if C1*Scale overflows then C1*V is likely to overflow in the original program. That, however, is clearly not necessarily true. So, yes, I agree, we should check that too.

Then there's the other two operations. They, indeed, might overflow. I don't see anything to prevent them from overflowing even if the original expression did not (essentially in some cases where |V| < Scale and |C1|,|C2| are large in the right ways). In the face of such overflows, the decomposition may not be sound, unfortunately.

Given that, I don't see how overflow-checking C2 * Scale accomplishes anything except hiding the problem for your exact testcase.

This may be a fair criticism. I think that the general problem of dealing with overflows is going to need a careful audit and a lot of changes. There's a lot of logic here that likely isn't completely sound. However, I don't want to make things worse, certainly, and this was a pre-existing regression test. I should check for C1*Scale overflow too (to avoid making anything worse).

More-general pre-existing problems, however, I'd recommend treating as orthogonal to this change. I think that more-consistently using APInts makes addressing these problems easier, not harder, in follow-up work.

553

Yea, it's intentional. IIRC, APInt does not have a Boolean conversion. This is used other places in LLVM's codebase.

1098

Ah, sort of. Updating StructLayout::getElementOffset to take an APInt does not seem worthwhile. You're right, however: this is the wrong test. It should say:

if (C1->getActiveBits() > 64 || C2->getActiveBits() > 64)

mppf added a comment.Oct 12 2017, 7:15 AM

First, it appears that this changeset addresses the problem I was having. I'm running in to a new problem with 128-bit pointers when using trunk (vs LLVM 4 or 5) which I'll suggest a fix for / bug report once I've got something minimized. I'll be working on that as well as generating a better test case for this BasicAA issue.

@hfinkel fyi, the new problem I mentioned (since LLVM 5) appears to be due to D37460. It actually doesn't have anything to do with 128-bit pointers. I'll comment on that (closed) review to bring up the issue, but let me know if a different action is more appropriate. Thanks!

efriedma added inline comments.Oct 12 2017, 11:57 AM
lib/Analysis/BasicAliasAnalysis.cpp
519

I think that more-consistently using APInts makes addressing these problems easier, not harder, in follow-up work.

Okay, this makes sense.

mppf added a comment.Nov 1 2017, 8:53 AM

This change is still important to me. What needs to happen next for it to make progress? Thanks!

I read through the comments and it seems like everything has been addressed. Does anything more need to happen to have this land?

efriedma added inline comments.Wed, Jul 11, 3:20 PM
lib/Analysis/BasicAliasAnalysis.cpp
519

This still needs to be updated (at least, the comment needs to change).

1098

This still needs to be updated.