When SimplifyCFG, and probably other passes, create a call to
llvm.assume to retain some knowledge, they need to update the state of
GlobalsAA. Otherwise, later passes might conclude that the function is side-effect free or read-only, which it is not, due to the llvm.assume call.
Deducing the wrong memory effects can lead to problems, e.g., hoisting
of a call and thereby introducing UB.
I don't think this is a GlobalsAA issue, this is a llvm.assume modeling issue
Using the test case in the bug:

```
define internal void @foo(ptr %p) {
  %c = icmp eq ptr %p, null
  br i1 %c, label %err, label %ok
err:
  unreachable
ok:
  ret void
}

define void @bar() {
  call void @foo(ptr null)
  ret void
}
```
Running opt -aa-pipeline='' -passes='function-attrs,function(simplifycfg)' on the IR in the issue still shows memory(none) on foo, regardless of AA, even though there is an assume.
We either need to not add assumes in function passes, or model assumes better
That is a good point. However, I fear just updating the function attribute when SimplifyCFG adds a call is also insufficient. Don't we need both?
Let's look at @aeubanks' case first. We have a transform like this:

```
define internal void @src(ptr %p) memory(none) {
  %c = icmp eq ptr %p, null
  br i1 %c, label %err, label %ok
err:
  unreachable
ok:
  ret void
}

define internal void @tgt(ptr %p) memory(none) {
  %c = icmp eq ptr %p, null
  %0 = xor i1 %c, true
  call void @llvm.assume(i1 %0) memory(inaccessiblemem: readwrite)
  ret void
}
```
I believe this transform is correct. The memory attribute (and other similar attributes) always operate from the perspective of effects observable by the caller. In this case, the caller can legally treat the @tgt function as memory(none). The fact that the whole function is memory(none) does not imply that there are no memory accesses inside the function -- this is most obviously true for accesses to allocas, but I believe this is essentially the same situation. You can think of the "inaccessible memory" being modified as being function-local inaccessible memory, effects on which cannot be observed beyond the function boundary.
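To illustrate the alloca analogy, here is a hypothetical sketch (not taken from the patch): a function that clearly stores to and loads from memory internally, yet is legally memory(none) from the caller's perspective, because no caller can observe the accesses to the non-escaping alloca.

```
; Hypothetical example: @sum writes and reads a local alloca, but those
; accesses are not observable by any caller, so memory(none) is legal.
define internal i32 @sum(i32 %a, i32 %b) memory(none) {
  %slot = alloca i32
  store i32 %a, ptr %slot   ; write to function-local memory
  %v = load i32, ptr %slot  ; read it back
  %r = add i32 %v, %b
  ret i32 %r
}
```

The function-local "inaccessible memory" touched by the assume in @tgt plays the same role as %slot here: internal effects that never cross the function boundary.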
Now for @jdoerfert's case, things are a bit more interesting. Again, I believe it is fine for FunctionAttrs to infer memory(none) on the function, but the GlobalsAA interaction could also cause us to ignore the assume effects inside the function.
For assumes in particular, I believe the memory effects exist only to avoid optimizing the assume away. Assumes don't really have effects. The property ensuring their correctness is the lack of speculatable, which prevents them from being hoisted past control flow. So I believe from a correctness perspective, treating assumes as memory(none) is fine as well. However, it may result in assumes being dropped unprofitably (though in practice this probably doesn't happen, because we don't use AA to drive DCE).
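A hypothetical sketch (not from the patch) of why the lack of speculatable is the load-bearing property: hoisting the assume out of the guarded block would assert the condition on paths where it was never established.

```
; Hypothetical example: moving the assume into %entry would be incorrect,
; since %p may be null on the %exit path. The missing speculatable
; attribute is what forbids that hoist, independent of memory effects.
define void @use(ptr %p, i1 %cond) {
entry:
  br i1 %cond, label %checked, label %exit
checked:
  %nonnull = icmp ne ptr %p, null
  call void @llvm.assume(i1 %nonnull)  ; must stay control-dependent
  ret void
exit:
  ret void
}
```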
However, is this problem actually specific to assumes? For example, what if the vectorizer replaces plain load/store with some kind of masked load/store intrinsic -- will that get treated as memory(none) by GlobalsAA as well? If yes, that would certainly be wrong and could lead to real miscompilations.
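To make the concern concrete, here is a hypothetical sketch of vectorizer-style output (the global @g and function names are invented for illustration). Unlike an assume, the masked store has a real, caller-observable effect:

```
@g = internal global [4 x i32] zeroinitializer

; Hypothetical example: the masked store genuinely writes @g. If an AA
; kept treating the enclosing function as memory(none) because the late
; intrinsic call never entered the call graph, passes could reorder or
; delete accesses to @g around the call -- a real miscompile.
define void @store_vec(<4 x i32> %v, <4 x i1> %mask) {
  call void @llvm.masked.store.v4i32.p0(<4 x i32> %v, ptr @g, i32 4, <4 x i1> %mask)
  ret void
}
```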
And if yes, then I think it's clear that the approach you're pursuing here cannot work. It would mean that we have to update the call graph every time an intrinsic call is introduced, while we currently widely assume that this is not necessary. In D141190 you implemented a change to start including non-nocallback intrinsics in the call graph and made GlobalsAA rely on that -- I think we just found out what the reason for the original design was and probably need to revert that change. As long as arbitrary transforms can insert intrinsics, we cannot require them to be part of the call graph.
I think I agree with you.
> And if yes, then I think it's clear that the approach you're pursuing here cannot work. This means that we have to update the call graph every time an intrinsic call is introduced, while we currently widely assume that this is not necessary. In D141190 you implemented a change to start including non-nocallback intrinsics in the call graph and made GlobalsAA rely on that -- I think we just found out what the reason for the original design was and probably need to revert that change. As long as arbitrary transforms can insert intrinsics, we cannot require them to be part of the call graph.
I'm not sold on this, though it does not matter, see below. (W/o D141190, no intrinsics are ever represented. So, how does not representing them at all help us against the problems that occur if we do not represent them when they are added later?)
It looks like we do not need this after all. We might drop the assumptions (or the calls containing them), but that is fine. We cannot hoist them, and consequently the UB we assume not to happen will not be accidentally triggered. Other calls (intrinsic or not) that are inserted late, e.g., by the vectorizer, should similarly have fewer effects than the original code they replace, which should again allow @nikic's reasoning to hold.
I'll close the bug as invalid.