This is an archive of the discontinued LLVM Phabricator instance.

GlobalsModRef should treat functions w/o nosync conservatively.
AbandonedPublic

Authored by tra on Dec 7 2021, 4:03 PM.

Details

Summary

This fixes a miscompile in NVPTX that's been exposed by
https://github.com/llvm/llvm-project/commit/fa5d31f825699b0fe858d4f432bd3fbbbec523c8
and which resulted in shared memory stores/loads being incorrectly eliminated.
https://lists.llvm.org/pipermail/llvm-dev/2021-November/154060.html

Diff Detail

Unit TestsFailed

Event Timeline

tra created this revision.Dec 7 2021, 4:03 PM
tra requested review of this revision.Dec 7 2021, 4:03 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2021, 4:03 PM
arsenm added a subscriber: arsenm.Dec 7 2021, 5:00 PM

Isn't this what nosync is for? The convergence property isn't really relevant here

nikic added a comment.Dec 7 2021, 11:58 PM

I don't think this should be part of the main AA implementation, but rather be sunk into the AA provider that is producing incorrect results (here GlobalModRef, unless BasicAA also produces an incorrect result?)

tra updated this revision to Diff 392857.Dec 8 2021, 11:57 AM
tra edited the summary of this revision. (Show Details)

Moved the check to GlobalModRef and switched to checking for nosync

Test file name

llvm/test/Analysis/GlobalsModRef/intrinsic_convergent.ll
1 ↗(On Diff #392857)

Test file name no longer matches

tra added a comment.Dec 8 2021, 12:04 PM

I don't think this should be part of the main AA implementation, but rather be sunk into the AA provider that is producing incorrect results (here GlobalModRef, unless BasicAA also produces an incorrect result?)

BasicAA does not seem to have any impact on the miscompile.

I've moved the code to GlobalModRef and changed it to check for the absence of nosync.

The good news is that it fixes my case. The bad -- it breaks a lot of other GlobalModRef tests. IIUIC, it's due to the functions in the tests not having nosync.

What's supposed to be the source of nosync attribute? Is it expected to be set explicitly by the user, or does LLVM relies on automatically inferring it?

tra retitled this revision from [AA] Teach AA about convergent instrinsics that affect loads/stores. to GlobalsModRef should treat functions w/o nosync conservatively..Dec 8 2021, 12:06 PM
tra edited the summary of this revision. (Show Details)
tra updated this revision to Diff 392867.Dec 8 2021, 12:15 PM

Renamed the test.

tra updated this revision to Diff 392935.Dec 8 2021, 2:16 PM

Updated tests to use nosync attributes where the tests assumed it.
Added nosync to a couple of objc intrinsics.

tra added a subscriber: pete.Dec 8 2021, 2:22 PM
tra added inline comments.
llvm/include/llvm/IR/Intrinsics.td
418–419 ↗(On Diff #392935)

@pete -- is IntrNoSync valid for these intrinsics?

If not, then the test/Analysis/GlobalsModRef/intrinsic_addressnottaken*.ll need to be changed as they would no longer work if these intrinsics do not have IntrNoSync.

pete added a subscriber: ahatanak.Dec 8 2021, 2:24 PM
pete added inline comments.
llvm/include/llvm/IR/Intrinsics.td
418–419 ↗(On Diff #392935)

I think its valid yeah. @ahatanak might know for sure, as I think he's looked at this more recently than I have.

What's supposed to be the source of nosync attribute? Is it expected to be set explicitly by the user, or does LLVM relies on automatically inferring it?

Generally, we expect to infer it in attributor/function-attrs. Adding nosync to the tests is fine.

llvm/include/llvm/IR/Intrinsics.td
419 ↗(On Diff #392935)

I don't think this is right. Among other issues, refcounting on objc objects can run arbitrary code.

llvm/lib/Analysis/GlobalsModRef.cpp
916

This isn't the right way to query an attribute on a call. Should be something like Call->hasFnAttr(Attribute::NoSync).

jdoerfert added a comment.EditedDec 8 2021, 2:40 PM

EDIT:
Second thought, this does not fix my bug: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154185.html

tra updated this revision to Diff 392942.Dec 8 2021, 2:58 PM

Use CallBase::hasFnAttr

tra added a comment.EditedDec 8 2021, 3:07 PM

EDIT:
Second thought, this does not fix my bug: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154185.html

I think it may. At least it does seem to preserve the load and produces identical code for both functions in my build. Unpatched LLVM does not.

Args: bin/opt -debug -aa-pipeline=basic-aa,globals-aa -passes=require<globals-aa>,function(gvn<pre;load-pre;split-backedge-load-pre;memdep>) -S
GVN iteration: 0
GVN: load i32 %r is clobbered by   call void @llvm.sync()
GVN iteration: 0
GVN: load i32 %r is clobbered by   call void @sync()

EDIT:
Second thought, this does not fix my bug: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154185.html

I think it may. At least it does seem to preserve the load and produces identical code for both functions in my build. Unpatched LLVM does not.

Args: bin/opt -debug -aa-pipeline=basic-aa,globals-aa -passes=require<globals-aa>,function(gvn<pre;load-pre;split-backedge-load-pre;memdep>) -S
GVN iteration: 0
GVN: load i32 %r is clobbered by   call void @llvm.sync()
GVN iteration: 0
GVN: load i32 %r is clobbered by   call void @sync()

But then I'm confused. The code that is patched here is generic for calls, no? I think this is a particular intrinsic issue.

tra added a comment.Dec 8 2021, 3:45 PM

But then I'm confused. The code that is patched here is generic for calls, no? I think this is a particular intrinsic issue.

I don't know why @llvm.sync and @sync are treated differently in your example. I doubt this patch will change that.

On the other hand, both @llvm.sync and @sync are just functions w/o nosync (and no other relevant attributes) and that is sufficient to treat them conservatively with this patch in place.

It's quite possible that in your case this patch only masks the issue.

But then I'm confused. The code that is patched here is generic for calls, no? I think this is a particular intrinsic issue.

I don't know why @llvm.sync and @sync are treated differently in your example. I doubt this patch will change that.

On the other hand, both @llvm.sync and @sync are just functions w/o nosync (and no other relevant attributes) and that is sufficient to treat them conservatively with this patch in place.

It's quite possible that in your case this patch only masks the issue.

It seems the function w/o nosycn is already treated as potential clobber of the global while the intrinsic is not. This patch now "unifies" it by looking for nosync at the call site, but maybe we should not pretend intrinsics are special.
Here is your original example with a function instead of an intrinsic, all seems to work just fine: https://godbolt.org/z/7cn8jKz17

EDIT:
Second thought, this does not fix my bug: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154185.html

I think it may. At least it does seem to preserve the load and produces identical code for both functions in my build. Unpatched LLVM does not.

Args: bin/opt -debug -aa-pipeline=basic-aa,globals-aa -passes=require<globals-aa>,function(gvn<pre;load-pre;split-backedge-load-pre;memdep>) -S
GVN iteration: 0
GVN: load i32 %r is clobbered by   call void @llvm.sync()
GVN iteration: 0
GVN: load i32 %r is clobbered by   call void @sync()

But then I'm confused. The code that is patched here is generic for calls, no? I think this is a particular intrinsic issue.

In general, GlobalsAA follows the call graph, and therefore conservatively assumes a call to an external function can call back into the current module.

The callgraph code special-cases intrinsics, though. Except for a few specific intrinsics, we assume intrinsics can't call back into the current module. See Intrinsic::isLeaf.

tra added a comment.Dec 8 2021, 4:30 PM

It seems the function w/o nosycn is already treated as potential clobber of the global while the intrinsic is not. This patch now "unifies" it by looking for nosync at the call site, but maybe we should not pretend intrinsics are special.
Here is your original example with a function instead of an intrinsic, all seems to work just fine: https://godbolt.org/z/7cn8jKz17

That is a very good point. Let me see if I can figure out what makes intrinsics (even non-existing ones) special.

EDIT:
Second thought, this does not fix my bug: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154185.html

I think it may. At least it does seem to preserve the load and produces identical code for both functions in my build. Unpatched LLVM does not.

Args: bin/opt -debug -aa-pipeline=basic-aa,globals-aa -passes=require<globals-aa>,function(gvn<pre;load-pre;split-backedge-load-pre;memdep>) -S
GVN iteration: 0
GVN: load i32 %r is clobbered by   call void @llvm.sync()
GVN iteration: 0
GVN: load i32 %r is clobbered by   call void @sync()

But then I'm confused. The code that is patched here is generic for calls, no? I think this is a particular intrinsic issue.

In general, GlobalsAA follows the call graph, and therefore conservatively assumes a call to an external function can call back into the current module.

The callgraph code special-cases intrinsics, though. Except for a few specific intrinsics, we assume intrinsics can't call back into the current module. See Intrinsic::isLeaf.

  1. Then the intrinsics should be marked with the attribute that says they won't call back into the module: nocallback (which has no lang ref entry... https://reviews.llvm.org/D90275#2454912, anyway).
  2. Not calling back into the module is not sufficient. Nosync is necessary, as noted here. I'm still somewhat unsure why it is checked for a call but OK, that might be the interface. I fear we forget to check it in other places too.
tra added a comment.Dec 9 2021, 10:25 AM

In general, GlobalsAA follows the call graph, and therefore conservatively assumes a call to an external function can call back into the current module.

The callgraph code special-cases intrinsics, though. Except for a few specific intrinsics, we assume intrinsics can't call back into the current module. See Intrinsic::isLeaf.

Sounds like isLeaf is not not completely correct, either. Earlier comment mentioned that "refcounting on objc objects can run arbitrary code".

While poking at the Intrinsics.td I've noticed that NoSync is supposed to be set on intrinsics by default. The fact that int_objc_autoreleasePoolPush was affected by this patch suggests that we somehow failed to propagate it.

  1. Then the intrinsics should be marked with the attribute that says they won't call back into the module: nocallback (which has no lang ref entry... https://reviews.llvm.org/D90275#2454912, anyway).

Nor does it have a matching intrinsic property.

  1. Not calling back into the module is not sufficient. Nosync is necessary, as noted here. I'm still somewhat unsure why it is checked for a call but OK, that might be the interface. I fear we forget to check it in other places too.

I am way out of my depth here, so it's quite possible that I'm doing it wrong. If there's a better way/place to do it, I'm open to suggestions.

I think we should locate the place intrinsics are handled special right now. And there we require the function to be nocallback and nosync to argue the function will not have an effect on the module.
That seems to be the right solution to all these weird effects. Obviously we need to add nocallback to the default intrinsic attributes and to other intrinsics that don't use the default intrinsic definition.
@efriedma wdyt? Can you point out where the intrinsic handling is located?

nikic added a comment.Dec 9 2021, 12:51 PM

Please let's not bring nocallback into this. It's an as-yet completely unused attribute, and we haven't even converted most target intrinsics to use default attributes yet. Changing Intrinsic::isLeaf() to use an attribute instead of a hard-coded list seems completely orthogonal to this issue.

Please let's not bring nocallback into this. It's an as-yet completely unused attribute, and we haven't even converted most target intrinsics to use default attributes yet. Changing Intrinsic::isLeaf() to use an attribute instead of a hard-coded list seems completely orthogonal to this issue.

It is not orthogonal. The problem doesn't happen for functions because we think intrinsics have a magic property, or in fact multiple, that cannot cause an effect in certain situations. This is plain wrong. From there things will always go sideways.

The properties we associate (somewhere) with intrinsics are captured by nosync and nocallback. Neither is inherent to intrinsics but we pretend they are and that causes the bug. Only by making these explicit we can fix the bug without just hiding it, and treat functions that have the same properties the same way.

All that said, we obviously need a lang ref patch for nocallback.

Finally, not having converted target intrinsics is even more reason to do this. Pretending intrinsics are special in any way you just need to when you write an analysis/optimization is always going to cause bugs. If we do not specify properties explicitly we are fighting a loosing battle. If we make them explicit however, people might take a look at their target directives and port them, as we ported the nvptx ones recently ;).

tra added a comment.EditedDec 9 2021, 2:08 PM

The properties we associate (somewhere) with intrinsics are captured by nosync and nocallback. Neither is inherent to intrinsics but we pretend they are and that causes the bug. Only by making these explicit we can fix the bug without just hiding it, and treat functions that have the same properties the same way.

This may be the place which causes different behavior of intrinsics vs functions.
https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/CallGraph.cpp#L96

That said, we seem to have multiple moving parts here, that may be worth separating:

  • lack of nosync should force GlobalModRef to treat functions conservatively.

    Problem: not all intrinsics have been converted to use default attributes, so it will be a performance regression for them.
  • Some intrinsics *may* call back and that's currently covered only by an incomplete isLeaf.

    Problem: nocallback is not here yet and we have the same issue with the default attributes as above.
  • Both issues above will not have a quick fix, but we do need to deal with the miscompiles we have now.

    Q: What can we do short-term to mitigate the issue?

Perhaps as a stop-gap workaround, we can explicitly enumerate NVPTX intrinsics that must be treated as a synchronization barrier.
AFAICT, the only other platform that may potentially be affected by this is AMDGPU, so the special case list should be manageable.

The bug being fixed here isn't exclusive to intrinsics. The following shows the same issue using a spinloop:

_Atomic int a = 0;
static int z = 0;
__attribute((noinline))
void g(int x) {
   while (a != x);
}
int f(int x) {
   z = x;
   g(x);
   return z;
}

Given that, I don't think this whole tangent about the control flow graph and hardcoding which intrinsics are nocallback is relevant to this bug.

Requiring nosync when you want to derive "no side-effect" seems agreed upon. Let's start there.

Problem: not all intrinsics have been converted to use default attributes, so it will be a performance regression for them.

This could be a problem, yes. Not sure how much of one it is in practice.

Perhaps as a stop-gap workaround, we can explicitly enumerate NVPTX intrinsics that must be treated as a synchronization barrier.

I wouldn't really be happy with this, but I wouldn't block that patch.

tra added a comment.EditedDec 9 2021, 2:43 PM

The bug being fixed here isn't exclusive to intrinsics. The following shows the same issue using a spinloop:
Given that, I don't think this whole tangent about the control flow graph and hardcoding which intrinsics are nocallback is relevant to this bug.

My suggestion was about listing the intrinsics that act as a barrier. I.e. teach GlobalModRef that @llvm.nvvm.barrier0 is a ModRef for all values. While it's not a proper fix, it does force LLVM to be more conservative about memory accesses on both sides of it and prevents miscompilation. While not pretty, the work-around is narrow in scope and does not block general work on having these issues fixed properly.

Intrinsics being treated differently from functions because of assumed nocallback is an independent issue.

We have DefaultIntrinsic for a really long while now. Targets won't change if they don't have too. Let's prioritize correctness and require nosync, tell people on the list to update their def files already.

tra added a comment.EditedDec 9 2021, 2:52 PM

Problem: not all intrinsics have been converted to use default attributes, so it will be a performance regression for them.

This could be a problem, yes. Not sure how much of one it is in practice.

We have DefaultIntrinsic for a really long while now. Targets won't change if they don't have too. Let's prioritize correctness and require nosync, tell people on the list to update their def files already.

It might be easier to send the patch and let interested parties review the changes. I'll do it shortly.

tra updated this revision to Diff 394068.Dec 13 2021, 3:05 PM

Undo ObjC intrinsic properties change. Fixed the tests instead.
Moved nosync check into getModRefInfo, where it makes more
sense as an additional check for potential side effects of the call.

tra updated this revision to Diff 394069.Dec 13 2021, 3:09 PM

Reverted the changes in test/Transforms/ObjCARC/basic.ll that are no longer needed.

tra added a subscriber: asbirlea.

Added @asbirlea as a reviewer for the GlobalsModRef tests.

It might be easier to send the patch and let interested parties review the changes. I'll do it shortly.

On a second thought, I just don't know enough to convert all targets to use DefaultIntrinsic correctly.

Let's prioritize correctness and require nosync, tell people on the list to update their def files already.

SGTM.

llvm/lib/Analysis/GlobalsModRef.cpp
957–958

I'm not quite sure whether this is completely correct.

What if it's an intrinsic w/o IntrNoSync but with IntrNoMem property, which technically also indicates no side effects.

efriedma added inline comments.Dec 14 2021, 12:17 PM
llvm/lib/Analysis/GlobalsModRef.cpp
957–958

readnone and readonly both imply nosync; we treat synchronization as a read and write to memory.

In terms of intrinsics, we currently don't consistently add nosync attributes. If we mark an intrinsic IntrNoMem, that should imply nosync, even without IntrNoSync. We shouldn't be trying to infer nosync from readnone/readonly in GlobalsAA. We should probably hack IntrinsicEmitter::EmitAttributes to make that work.

tra added inline comments.Dec 14 2021, 1:01 PM
llvm/lib/Analysis/GlobalsModRef.cpp
957–958

readnone and readonly both imply nosync; we treat synchronization as a read and write to memory.

In terms of intrinsics, we currently don't consistently add nosync attributes. If we mark an intrinsic IntrNoMem, that should imply nosync, even without IntrNoSync.
We shouldn't be trying to infer nosync from readnone/readonly in GlobalsAA. We should probably hack IntrinsicEmitter::EmitAttributes to make that work.

So, IIUIC you're proposing to set nosync on intrinsics if either IntrNoSync or IntrNoMem properties are set. This part makes sense. I'll update the patch.
I think we should also infer nosync from IntrReadMem, IntrWriteMem, and IntrArgMemOnly as they all are documented as "and has no other side effects".

We shouldn't be trying to infer nosync from readnone/readonly in GlobalsAA.

I assume you're talking about the Call->onlyReadsMemory() ? Ref : ModRef bit above.

The idea is not to infer nosync, but rather to establish the lower bound on the valid result when nosync is absent based on what we do know about the function.
It's equivalent of ConservativeResult in getModRefInfoForArgument() above.

That said, it appears that there's a difference between intrinsic's IntrReadMem which is documented as "It does not write to memory and has no other side effects." and function's readonly attribute which does *not* imply "no other side effects".

So, for a regular function w/o nosync we may need to return ModRef, even if the function has readonly attribute.

arsenm added inline comments.Dec 14 2021, 1:05 PM
llvm/lib/Analysis/GlobalsModRef.cpp
957–958

readnone and readonly both imply nosync; we treat synchronization as a read and write to memory.

This is not true. This is specifically why nosync is a separate attribute for non-memory communication

efriedma added inline comments.Dec 14 2021, 1:48 PM
llvm/lib/Analysis/GlobalsModRef.cpp
957–958

readnone and readonly both imply nosync; we treat synchronization as a read and write to memory.

This is not true. This is specifically why nosync is a separate attribute for non-memory communication

The status quo is the following:

  1. AliasAnalysis and similar queries currently don't check for nosync on readonly/readnone calls; they assume such a call doesn't appear to modify memory.
  2. We don't infer readonly/readnone for functions which perform synchronization.
  3. We don't mark intrinsics which perform synchronization readonly/readnone.

In sum, LLVM behaves as if readonly/readnone implies nosync, and has behaved this way since before the nosync attribute was introduced in D62766.

We could potentially mess with this, I guess, but I don't see any pressing reason we'd want to.

This came up on our GPU working group call today.

We have now run into this problem 3 distinct times:
https://lists.llvm.org/pipermail/llvm-dev/2021-November/154060.html
https://discourse.llvm.org/t/bug-gvn-memdep-bug-in-the-presence-of-intrinsics/59402
https://github.com/intel/llvm/issues/1258

We should really fix this conceptually. I suggest to patch doesNotAccessMemory and friends in Function.h and InstrTypes.h to include a new argument

bool doesNotAccessMemory(bool IgnoreSynchronization = false) { return (IgnoreSynchronization || hasNoSync()) && hasFnAttribute(Attribute::ReadNone); }

This will make sure most argumentation based on readnone/only/writeonly are synchronization aware.

llvm/lib/Analysis/GlobalsModRef.cpp
957–958

In terms of intrinsics, we currently don't consistently add nosync attributes

This is not entirely true. Target independent intrinsics, AArch64, and NVVM all use DefaultIntrinsics which include nosync.
Other targets will not update their td files as it becomes necessary.

So, IIUIC you're proposing to set nosync on intrinsics if either IntrNoSync or IntrNoMem properties are set. This part makes sense. I'll update the patch.

No, please don't. Nosync is by design not implied by readnone:
LangRef: "Note that through convergent function calls non-memory communication, e.g., cross-lane operations, are possible and are also considered synchronization."
The reasoning for the contrary come from a time before synchronization was a dedicated effect. A time where we needed memory for everything, incl. sync and termination.

The status quo is the following:
...

  1. Is certainly not true. From that, 2. is neither. Which makes 1. a wrong assumption.

Here is the nvptx and amdgcn intrinsic for shuffle: https://godbolt.org/z/TaWMT89G4
One time it's readnone, one time inaccessible memory only, both will expose the GVN problem.

We could potentially mess with this, I guess, but I don't see any pressing reason we'd want to.

That ship has sailed a long time ago, basically as we introduced nosync we explicitly changed this. There are leftover assumptions like 3 around but that is a different story.

efriedma added inline comments.Feb 18 2022, 1:21 PM
llvm/lib/Analysis/GlobalsModRef.cpp
957–958

We could potentially mess with this, I guess, but I don't see any pressing reason we'd want to.

That ship has sailed a long time ago, basically as we introduced nosync we explicitly changed this. There are leftover assumptions like 3 around but that is a different story.

The introduction of nosync retroactively changed the meaning of readnone? That doesn't seem right; the nosync patch didn't change LangRef, nor did it change any code involving readonly/readnone. And we haven't subsequently adjusted the code querying readonly/readnone, or any alias analysis code, or clang. A couple mismarked GPU intrinsics doesn't mean the rest of the world is broken...

Anyway, if you do want to change/clarify this, please propose a LangRef patch and post a message on Discourse.

tra added a comment.Mar 2 2022, 3:08 PM

It appears that there's no consensus on how nosync and readonly/readnone are supposed to interact.

So for the time being I propose that we treat them as independent for GlobalsModRef.
It may be overly conservative, but it fixes a real correctness issues caused by GlobalsModRef ignoring lack of nosync.
It does not solve all the problems mentioned above, but it's a step forward, IMO.

Would that be acceptable?

One question is -- for calls w/o nosync should we always return ModRefInfo::ModRef (i.e. we have no idea what other threads may do)?
Or should we allow returning ModRefInfo::Mod if Call->onlyReadsMemory() is true?

llvm/lib/Analysis/GlobalsModRef.cpp
957–958

A couple mismarked GPU intrinsics doesn't mean the rest of the world is broken...

In this case, NVPTX intrinsics *are* marked correctly (i.e. they do have 'nosync' applied correctly where it's applicable). It''s other backends' intrinsics that are yet to start using DefaultIntrinsic.

Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2022, 3:08 PM

I think https://reviews.llvm.org/D123531 might have fixed (parts of) this. I included the test case llvm/test/Analysis/GlobalsModRef/functions_without_nosync.ll there and it sems to work fine.

tra abandoned this revision.Apr 12 2022, 10:38 AM

Abandoning in favor of D123531