This fixes a miscompile in NVPTX that's been exposed by
https://github.com/llvm/llvm-project/commit/fa5d31f825699b0fe858d4f432bd3fbbbec523c8
and which resulted in shared memory stores/loads being incorrectly eliminated.
https://lists.llvm.org/pipermail/llvm-dev/2021-November/154060.html
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
60 ms | x64 debian > LLVM.Bindings/Go::go.test |
Event Timeline
I don't think this should be part of the main AA implementation, but rather be sunk into the AA provider that is producing incorrect results (here GlobalModRef, unless BasicAA also produces an incorrect result?)
Test file name
llvm/test/Analysis/GlobalsModRef/intrinsic_convergent.ll | ||
---|---|---|
1 ↗ | (On Diff #392857) | Test file name no longer matches |
BasicAA does not seem to have any impact on the miscompile.
I've moved the code to GlobalModRef and changed it to check for the absence of nosync.
The good news is that it fixes my case. The bad -- it breaks a lot of other GlobalModRef tests. IIUIC, it's due to the functions in the tests not having nosync.
What's supposed to be the source of nosync attribute? Is it expected to be set explicitly by the user, or does LLVM relies on automatically inferring it?
Updated tests to use nosync attributes where the tests assumed it.
Added nosync to a couple of objc intrinsics.
What's supposed to be the source of nosync attribute? Is it expected to be set explicitly by the user, or does LLVM relies on automatically inferring it?
Generally, we expect to infer it in attributor/function-attrs. Adding nosync to the tests is fine.
llvm/include/llvm/IR/Intrinsics.td | ||
---|---|---|
419 ↗ | (On Diff #392935) | I don't think this is right. Among other issues, refcounting on objc objects can run arbitrary code. |
llvm/lib/Analysis/GlobalsModRef.cpp | ||
916 | This isn't the right way to query an attribute on a call. Should be something like Call->hasFnAttr(Attribute::NoSync). |
EDIT:
Second thought, this does not fix my bug: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154185.html
I think it may. At least it does seem to preserve the load and produces identical code for both functions in my build. Unpatched LLVM does not.
Args: bin/opt -debug -aa-pipeline=basic-aa,globals-aa -passes=require<globals-aa>,function(gvn<pre;load-pre;split-backedge-load-pre;memdep>) -S GVN iteration: 0 GVN: load i32 %r is clobbered by call void @llvm.sync() GVN iteration: 0 GVN: load i32 %r is clobbered by call void @sync()
But then I'm confused. The code that is patched here is generic for calls, no? I think this is a particular intrinsic issue.
I don't know why @llvm.sync and @sync are treated differently in your example. I doubt this patch will change that.
On the other hand, both @llvm.sync and @sync are just functions w/o nosync (and no other relevant attributes) and that is sufficient to treat them conservatively with this patch in place.
It's quite possible that in your case this patch only masks the issue.
It seems the function w/o nosycn is already treated as potential clobber of the global while the intrinsic is not. This patch now "unifies" it by looking for nosync at the call site, but maybe we should not pretend intrinsics are special.
Here is your original example with a function instead of an intrinsic, all seems to work just fine: https://godbolt.org/z/7cn8jKz17
In general, GlobalsAA follows the call graph, and therefore conservatively assumes a call to an external function can call back into the current module.
The callgraph code special-cases intrinsics, though. Except for a few specific intrinsics, we assume intrinsics can't call back into the current module. See Intrinsic::isLeaf.
That is a very good point. Let me see if I can figure out what makes intrinsics (even non-existing ones) special.
- Then the intrinsics should be marked with the attribute that says they won't call back into the module: nocallback (which has no lang ref entry... https://reviews.llvm.org/D90275#2454912, anyway).
- Not calling back into the module is not sufficient. Nosync is necessary, as noted here. I'm still somewhat unsure why it is checked for a call but OK, that might be the interface. I fear we forget to check it in other places too.
Sounds like isLeaf is not not completely correct, either. Earlier comment mentioned that "refcounting on objc objects can run arbitrary code".
While poking at the Intrinsics.td I've noticed that NoSync is supposed to be set on intrinsics by default. The fact that int_objc_autoreleasePoolPush was affected by this patch suggests that we somehow failed to propagate it.
- Then the intrinsics should be marked with the attribute that says they won't call back into the module: nocallback (which has no lang ref entry... https://reviews.llvm.org/D90275#2454912, anyway).
Nor does it have a matching intrinsic property.
- Not calling back into the module is not sufficient. Nosync is necessary, as noted here. I'm still somewhat unsure why it is checked for a call but OK, that might be the interface. I fear we forget to check it in other places too.
I am way out of my depth here, so it's quite possible that I'm doing it wrong. If there's a better way/place to do it, I'm open to suggestions.
I think we should locate the place intrinsics are handled special right now. And there we require the function to be nocallback and nosync to argue the function will not have an effect on the module.
That seems to be the right solution to all these weird effects. Obviously we need to add nocallback to the default intrinsic attributes and to other intrinsics that don't use the default intrinsic definition.
@efriedma wdyt? Can you point out where the intrinsic handling is located?
Please let's not bring nocallback into this. It's an as-yet completely unused attribute, and we haven't even converted most target intrinsics to use default attributes yet. Changing Intrinsic::isLeaf() to use an attribute instead of a hard-coded list seems completely orthogonal to this issue.
It is not orthogonal. The problem doesn't happen for functions because we think intrinsics have a magic property, or in fact multiple, that cannot cause an effect in certain situations. This is plain wrong. From there things will always go sideways.
The properties we associate (somewhere) with intrinsics are captured by nosync and nocallback. Neither is inherent to intrinsics but we pretend they are and that causes the bug. Only by making these explicit we can fix the bug without just hiding it, and treat functions that have the same properties the same way.
All that said, we obviously need a lang ref patch for nocallback.
Finally, not having converted target intrinsics is even more reason to do this. Pretending intrinsics are special in any way you just need to when you write an analysis/optimization is always going to cause bugs. If we do not specify properties explicitly we are fighting a loosing battle. If we make them explicit however, people might take a look at their target directives and port them, as we ported the nvptx ones recently ;).
This may be the place which causes different behavior of intrinsics vs functions.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/CallGraph.cpp#L96
That said, we seem to have multiple moving parts here, that may be worth separating:
- lack of nosync should force GlobalModRef to treat functions conservatively.
Problem: not all intrinsics have been converted to use default attributes, so it will be a performance regression for them.
- Some intrinsics *may* call back and that's currently covered only by an incomplete isLeaf.
Problem: nocallback is not here yet and we have the same issue with the default attributes as above.
- Both issues above will not have a quick fix, but we do need to deal with the miscompiles we have now.
Q: What can we do short-term to mitigate the issue?
Perhaps as a stop-gap workaround, we can explicitly enumerate NVPTX intrinsics that must be treated as a synchronization barrier.
AFAICT, the only other platform that may potentially be affected by this is AMDGPU, so the special case list should be manageable.
The bug being fixed here isn't exclusive to intrinsics. The following shows the same issue using a spinloop:
_Atomic int a = 0; static int z = 0; __attribute((noinline)) void g(int x) { while (a != x); } int f(int x) { z = x; g(x); return z; }
Given that, I don't think this whole tangent about the control flow graph and hardcoding which intrinsics are nocallback is relevant to this bug.
Requiring nosync when you want to derive "no side-effect" seems agreed upon. Let's start there.
Problem: not all intrinsics have been converted to use default attributes, so it will be a performance regression for them.
This could be a problem, yes. Not sure how much of one it is in practice.
Perhaps as a stop-gap workaround, we can explicitly enumerate NVPTX intrinsics that must be treated as a synchronization barrier.
I wouldn't really be happy with this, but I wouldn't block that patch.
My suggestion was about listing the intrinsics that act as a barrier. I.e. teach GlobalModRef that @llvm.nvvm.barrier0 is a ModRef for all values. While it's not a proper fix, it does force LLVM to be more conservative about memory accesses on both sides of it and prevents miscompilation. While not pretty, the work-around is narrow in scope and does not block general work on having these issues fixed properly.
Intrinsics being treated differently from functions because of assumed nocallback is an independent issue.
We have DefaultIntrinsic for a really long while now. Targets won't change if they don't have too. Let's prioritize correctness and require nosync, tell people on the list to update their def files already.
It might be easier to send the patch and let interested parties review the changes. I'll do it shortly.
Undo ObjC intrinsic properties change. Fixed the tests instead.
Moved nosync check into getModRefInfo, where it makes more
sense as an additional check for potential side effects of the call.
Added @asbirlea as a reviewer for the GlobalsModRef tests.
On a second thought, I just don't know enough to convert all targets to use DefaultIntrinsic correctly.
Let's prioritize correctness and require nosync, tell people on the list to update their def files already.
SGTM.
llvm/lib/Analysis/GlobalsModRef.cpp | ||
---|---|---|
957–958 | I'm not quite sure whether this is completely correct. What if it's an intrinsic w/o IntrNoSync but with IntrNoMem property, which technically also indicates no side effects. |
llvm/lib/Analysis/GlobalsModRef.cpp | ||
---|---|---|
957–958 | readnone and readonly both imply nosync; we treat synchronization as a read and write to memory. In terms of intrinsics, we currently don't consistently add nosync attributes. If we mark an intrinsic IntrNoMem, that should imply nosync, even without IntrNoSync. We shouldn't be trying to infer nosync from readnone/readonly in GlobalsAA. We should probably hack IntrinsicEmitter::EmitAttributes to make that work. |
llvm/lib/Analysis/GlobalsModRef.cpp | ||
---|---|---|
957–958 |
So, IIUIC you're proposing to set nosync on intrinsics if either IntrNoSync or IntrNoMem properties are set. This part makes sense. I'll update the patch.
I assume you're talking about the Call->onlyReadsMemory() ? Ref : ModRef bit above. The idea is not to infer nosync, but rather to establish the lower bound on the valid result when nosync is absent based on what we do know about the function. That said, it appears that there's a difference between intrinsic's IntrReadMem which is documented as "It does not write to memory and has no other side effects." and function's readonly attribute which does *not* imply "no other side effects". So, for a regular function w/o nosync we may need to return ModRef, even if the function has readonly attribute. |
llvm/lib/Analysis/GlobalsModRef.cpp | ||
---|---|---|
957–958 |
This is not true. This is specifically why nosync is a separate attribute for non-memory communication |
llvm/lib/Analysis/GlobalsModRef.cpp | ||
---|---|---|
957–958 |
The status quo is the following:
In sum, LLVM behaves as if readonly/readnone implies nosync, and has behaved this way since before the nosync attribute was introduced in D62766. We could potentially mess with this, I guess, but I don't see any pressing reason we'd want to. |
This came up on our GPU working group call today.
We have now run into this problem 3 distinct times:
https://lists.llvm.org/pipermail/llvm-dev/2021-November/154060.html
https://discourse.llvm.org/t/bug-gvn-memdep-bug-in-the-presence-of-intrinsics/59402
https://github.com/intel/llvm/issues/1258
We should really fix this conceptually. I suggest to patch doesNotAccessMemory and friends in Function.h and InstrTypes.h to include a new argument
bool doesNotAccessMemory(bool IgnoreSynchronization = false) { return (IgnoreSynchronization || hasNoSync()) && hasFnAttribute(Attribute::ReadNone); }
This will make sure most argumentation based on readnone/only/writeonly are synchronization aware.
llvm/lib/Analysis/GlobalsModRef.cpp | ||
---|---|---|
957–958 |
This is not entirely true. Target independent intrinsics, AArch64, and NVVM all use DefaultIntrinsics which include nosync.
No, please don't. Nosync is by design not implied by readnone:
Here is the nvptx and amdgcn intrinsic for shuffle: https://godbolt.org/z/TaWMT89G4
That ship has sailed a long time ago, basically as we introduced nosync we explicitly changed this. There are leftover assumptions like 3 around but that is a different story. |
llvm/lib/Analysis/GlobalsModRef.cpp | ||
---|---|---|
957–958 |
The introduction of nosync retroactively changed the meaning of readnone? That doesn't seem right; the nosync patch didn't change LangRef, nor did it change any code involving readonly/readnone. And we haven't subsequently adjusted the code querying readonly/readnone, or any alias analysis code, or clang. A couple mismarked GPU intrinsics doesn't mean the rest of the world is broken... Anyway, if you do want to change/clarify this, please propose a LangRef patch and post a message on Discourse. |
It appears that there's no consensus on how nosync and readonly/readnone are supposed to interact.
So for the time being I propose that we treat them as independent for GlobalsModRef.
It may be overly conservative, but it fixes a real correctness issues caused by GlobalsModRef ignoring lack of nosync.
It does not solve all the problems mentioned above, but it's a step forward, IMO.
Would that be acceptable?
One question is -- for calls w/o nosync should we always return ModRefInfo::ModRef (i.e. we have no idea what other threads may do)?
Or should we allow returning ModRefInfo::Mod if Call->onlyReadsMemory() is true?
llvm/lib/Analysis/GlobalsModRef.cpp | ||
---|---|---|
957–958 |
In this case, NVPTX intrinsics *are* marked correctly (i.e. they do have 'nosync' applied correctly where it's applicable). It''s other backends' intrinsics that are yet to start using DefaultIntrinsic. |
I think https://reviews.llvm.org/D123531 might have fixed (parts of) this. I included the test case llvm/test/Analysis/GlobalsModRef/functions_without_nosync.ll there and it sems to work fine.
This isn't the right way to query an attribute on a call. Should be something like Call->hasFnAttr(Attribute::NoSync).