This is an archive of the discontinued LLVM Phabricator instance.

[GlobalsModRef][FIX] Ensure we honor synchronizing effects of intrinsics
ClosedPublic

Authored by jdoerfert on Apr 11 2022, 11:38 AM.

Details

Summary

This is a long standing problem that resurfaces once in a while [0,1].

In 2008 we thought intrinsics do not read/write globals passed to them:
https://github.com/llvm/llvm-project/commit/d4133ac31535ce5176f97e9fc81825af8a808760
This is not correct given that intrinsics can synchronize threads and
cause effects to effectively become visible.

Fixes: https://github.com/llvm/llvm-project/issues/54851

[0] https://discourse.llvm.org/t/bug-gvn-memdep-bug-in-the-presence-of-intrinsics/59402
[1] https://reviews.llvm.org/D115302

Diff Detail

Event Timeline

jdoerfert created this revision.Apr 11 2022, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2022, 11:38 AM
jdoerfert requested review of this revision.Apr 11 2022, 11:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 11 2022, 11:38 AM
Herald added a subscriber: wdng. · View Herald Transcript

Clang format

Actually clang format

My understanding is that the assumption the code is trying to make is that intrinsics are special because they can't call other code in the current module.

In practice, this is wrong for two reasons:

  1. Some intrinsics imply synchronization.
  2. Some intrinsics can call arbitrary code.

So I don't think hasNoSync() is sufficient...

I'd appreciate a better comment about the invariants involved here.

(Looking more closely, I don't think this really overlaps with D115302.)

My understanding is that the assumption the code is trying to make is that intrinsics are special because they can't call other code in the current module.

In practice, this is wrong for two reasons:

  1. Some intrinsics imply synchronization.
  2. Some intrinsics can call arbitrary code.

Right. I only checked 1. I also need to check nocallback for 2. Will update.

Add nocallback check and improve comment

Did you mean to drop the !F->isIntrinsic() check? I mean, I guess you could replace it with !F->isDeclaration(), but I think you need some protection against optnone function definitions, since we don't analyze their bodies.

Did you mean to drop the !F->isIntrinsic() check? I mean, I guess you could replace it with !F->isDeclaration(), but I think you need some protection against optnone function definitions, since we don't analyze their bodies.

I dropped it on purpose but we might need more checks.
I don't believe intrinsics are special, or at least should be.

To me it looked like optnone is treated like a declaration already. Is that wrong? If it is not, I don't think we need the declaration check but at the end of the day we can be conservative in the presence of optnone.

Handle optnone conservatively

The check looks right now.

Remaining concerns:

  1. There's another use of isIntrinsic() a few lines earlier; do we need to fix that?
  2. Is this going to have the same issue as D115302 where there are a bunch of intrinsics that currently don't have the expected markings?
  3. Missing testcases.

Testcase?

NOTE: I did not yet modify any tests but only tried out the reproducer of https://github.com/llvm/llvm-project/issues/54851.

The check looks right now.

Great. I'll get on the remaining issues.

  1. There's another use of isIntrinsic() a few lines earlier; do we need to fix that?

We do. I will make a helper and use it both times.

  1. Is this going to have the same issue as D115302 where there are a bunch of intrinsics that currently don't have the expected markings?

People will need to annotate their intrinsics or use DefaultIntrinsics at some point. The latter have nosync and nocallback.

  1. Missing testcases.

working on it.

jdoerfert updated this revision to Diff 422048.Apr 11 2022, 3:14 PM
jdoerfert added a subscriber: asbirlea.

Update tests and add new one.

@asbirlea, please take a look at the Intrinsics.td change. We can also apply it
prior to this patch separately.

jdoerfert updated this revision to Diff 422075.Apr 11 2022, 4:45 PM

Fix Transforms/ObjCARC/basic.ll

jdoerfert edited the summary of this revision. (Show Details)Apr 12 2022, 8:08 AM
tra added a comment.Apr 12 2022, 10:34 AM

I'm not 100% sure if the issue underlying https://reviews.llvm.org/D115302 would be solved by this or not.

You've included the functions_without_nosync.ll test from D115302 which verifies that this patch fixes that issue, too. AFAICT, both patches are trying to address the same issue, but due to my lack of expertise, I was (overly?) cautious and trying to change as little as possible, and probably was doing it in a wrong place, too.

We need to document how nosync (or lack of it) interacts with other attributes describing side effects. E.g. interaction between IntrReadMem, IntrArgMemOnly and (lack of) nosync.

One concern raised on D115302 was the impact on back-ends that have not been upgraded to use DefaultIntrinsic. For them this patch would prevent some optimizations. This could potentially be an issue for users who may use such intrinsic in the hot path and they will have no easy way to work around that, as intrinsic properties are not under user control. Should we consider having a temporary hidden escape hatch option to revert to the old behavior?

I'm not 100% sure if the issue underlying https://reviews.llvm.org/D115302 would be solved by this or not.

You've included the functions_without_nosync.ll test from D115302 which verifies that this patch fixes that issue, too. AFAICT, both patches are trying to address the same issue, but due to my lack of expertise, I was (overly?) cautious and trying to change as little as possible, and probably was doing it in a wrong place, too.

We need to document how nosync (or lack of it) interacts with other attributes describing side effects. E.g. interaction between IntrReadMem, IntrArgMemOnly and (lack of) nosync.

Yes. We need to properly make sync a first-class side-effect, get away from memory writes, etc.

One concern raised on D115302 was the impact on back-ends that have not been upgraded to use DefaultIntrinsic. For them this patch would prevent some optimizations. This could potentially be an issue for users who may use such intrinsic in the hot path and they will have no easy way to work around that, as intrinsic properties are not under user control. Should we consider having a temporary hidden escape hatch option to revert to the old behavior?

DefaultIntrinsic is available for a long time by now, people will not switch if they don't have to, nor will they revisit their intrinsics. It's still plenty of time till a release. Let's not add a hatch too early, give people a chance to realize and adjust.

jdoerfert edited the summary of this revision. (Show Details)Apr 12 2022, 10:46 AM

DefaultIntrinsic is available for a long time by now, people will not switch if they don't have to, nor will they revisit their intrinsics. It's still plenty of time till a release. Let's not add a hatch too early, give people a chance to realize and adjust.

I guess this is fine. But please post a note on Discourse.

llvm/include/llvm/IR/Intrinsics.td
428 ↗(On Diff #422238)

In general, popping an autorelease pool deallocates objects, and deallocating objects can run arbitrary user code. So the new markings on int_objc_autoreleasePoolPop are wrong, I think. Marking up int_objc_autoreleasePoolPush should be fine, though.

@rjmccall @ahatanak can you confirm?

DefaultIntrinsic is available for a long time by now, people will not switch if they don't have to, nor will they revisit their intrinsics. It's still plenty of time till a release. Let's not add a hatch too early, give people a chance to realize and adjust.

I guess this is fine. But please post a note on Discourse.

Will do as this is ready.

llvm/include/llvm/IR/Intrinsics.td
428 ↗(On Diff #422238)

I am waiting for ObjC people to look at this. I added the annotation based on two tests we otherwise fail, not based on the intrinsics (which I don't know).
Though, I might have misunderstood: llvm/test/Analysis/GlobalsModRef/intrinsic_addressnottaken1.ll

tra added inline comments.Apr 12 2022, 12:25 PM
llvm/include/llvm/IR/Intrinsics.td
428 ↗(On Diff #422238)

I think those tests do not care specifically about int_objc_autoreleasePoolPush/Pop and should be changed to use something less special, e.g. to @llvm.stacksave/@llvm.stackrestore as I've done in D115302.

jdoerfert updated this revision to Diff 422329.Apr 12 2022, 2:02 PM

Update tests as D115302 did, avoid objc changes

No objc changes. Took test changes from D115302.

This revision is now accepted and ready to land.Apr 12 2022, 2:35 PM
This revision was landed with ongoing or failed builds.Apr 12 2022, 2:44 PM
This revision was automatically updated to reflect the committed changes.
tra added a comment.Apr 12 2022, 4:48 PM

@jdoerfert: A friendly nag to update intrinsic attribute docs. Looks like they didn't make it to the landed change.

andrew.w.kaylor added inline comments.
llvm/lib/Analysis/GlobalsModRef.cpp
540

Should this also check onlyAccessesInaccessibleMemory() and onlyAccessesInaccessibleMemOrArgMem()? Same question for line 546 below.

We need to document how nosync (or lack of it) interacts with other attributes describing side effects. E.g. interaction between IntrReadMem, IntrArgMemOnly and (lack of) nosync.

@jdoerfert: A friendly nag to update intrinsic attribute docs. Looks like they didn't make it to the landed change.

You mean the above? That ain't something I can actually "just do". My opinion and the opinion of others are known to be different. What we need (in my opinion) is a major overhaul of our idea of side effects. We did a first step with termination but now we need to make "sync" and "write" different things. Even writing that up will take some time.

llvm/lib/Analysis/GlobalsModRef.cpp
540

Probably but unrelated. Feel free to propose a follow up.

tra added a comment.Apr 13 2022, 11:59 AM

We need to document how nosync (or lack of it) interacts with other attributes describing side effects. E.g. interaction between IntrReadMem, IntrArgMemOnly and (lack of) nosync.

@jdoerfert: A friendly nag to update intrinsic attribute docs. Looks like they didn't make it to the landed change.

You mean the above?

Yes.

That ain't something I can actually "just do". My opinion and the opinion of others are known to be different.

You do have the power to change the way LLVM works. I believe you do have the power to document how things work now and save other folks who just want to create an intrinsic the hassle of finding out about the quirks the hard way, when something breaks.

What we need (in my opinion) is a major overhaul of our idea of side effects. We did a first step with termination but now we need to make "sync" and "write" different things. Even writing that up will take some time.

IMO, that's independent of documenting the current state of affairs. For what it's worth, I do agree with you on the nature of the problem we have describing intrinsics' side effects.

We need to document how nosync (or lack of it) interacts with other attributes describing side effects. E.g. interaction between IntrReadMem, IntrArgMemOnly and (lack of) nosync.

@jdoerfert: A friendly nag to update intrinsic attribute docs. Looks like they didn't make it to the landed change.

You mean the above?

Yes.

That ain't something I can actually "just do". My opinion and the opinion of others are known to be different.

You do have the power to change the way LLVM works. I believe you do have the power to document how things work now and save other folks who just want to create an intrinsic the hassle of finding out about the quirks the hard way, when something breaks.

OK. But I'm still not sure where you want me to document what exactly. What I "changed" here is:

  • Intrinsics can execute arbitrary code if they are not annotated with nocallback, and
  • Intrinsics can make effects of other "threads of execution" visible if they are not annotated with nosync.

For me that was already a given because we don't have the opposite written anywhere (as far as I know).
I could put the two points explicitly here: https://llvm.org/docs/LangRef.html#intrinsic-functions, would that be a good place?

tra added a comment.Apr 15 2022, 11:08 AM

OK. But I'm still not sure where you want me to document what exactly. What I "changed" here is:

  • Intrinsics can execute arbitrary code if they are not annotated with nocallback, and

Does it imply that we can not trust Intr*Mem attributes? If that's the case, perhaps we should implement some sort of sanity check for the attributes to report nonsensical combinations.

If Intr*Mem take precedence, then it should be documented. E.g. if we have IntrArgMemOnly, then we're allowed to assume that the intrinsic will only touch specific arg only.

Also, currently Intrinsics.td says this:

//... 'has no other side effects'
// language of the IntrNoMem, IntrReadMem, IntrWriteMem, and IntrArgMemOnly
// intrinsic properties.`

This appears to contradict your assertion that lack of nocallback implies execution of arbitrary code which in turn implies arbitrary side effects.

I can't tell from reading Intrinsics.td what side effects we expect from an intrinsic with only IntrNoMem. Do we now need to specify IntrNoMem,nosync, nocallback in order to have a truly no-side-effects intrinsic?

  • Intrinsics can make effects of other "threads of execution" visible if they are not annotated with nosync.

Similarly here. nosync gives us a guarantee that no synchronization takes place, but without it, can we believe IntrReadMem that we do not write anything? Or do we need nosync for IntrReadMem to have an effect?

For me that was already a given because we don't have the opposite written anywhere (as far as I know).

The problem with no* attributes is that a person who looks at the intrinsics that do not have them would not know that lack of an attribute may affect the meaning of the explicitly specified ones.
It's hard to ask questions about something one may not even be aware of. Hence my harping on documenting the interaction between these attributes where developers will see it.

I could put the two points explicitly here: https://llvm.org/docs/LangRef.html#intrinsic-functions, would that be a good place?

LangRef does not seem to say anything specifically about intrinsics' attributes (though they do map to function attributes).
It also does not mention nocallback at all.

I was thinking of augmenting Intrinsics.td as that's where developers who implement intrinsics would look for the reference info.

readnone etc. surely encompass the entire behavior of the call, which of course includes any callbacks it might make. And synchronization needs to be understood as effectively a store, since synchronizing with another thread which has done a store guarantees the visibility of that store on this thread; and thus readonly etc. must imply nosync. So whatever nocallback means, it must be finer-grained, suitable for situations where the stronger attributes cannot be used.

I think it would be a huge shame if we had to go update a million testcases just to compensate for nocallback being added when there's already a stronger attribute on the intrinsic. That does not seem like a good use of anyone's time.

Sorry, that was a bit of a jumble of different ideas that I pressed into the form of a single message. Let me try to do better.

Effects summarization attributes like readnone and so on have to be understood as encompassing the entire behavior of the call. That of course includes any callbacks into visible code the call might make. So the absence of nocallback doesn't weaken other attributes in any way.

I think the right way of thinking about synchronization, effects-wise, is that it is effectively a potential store to any memory that might be accessible by another thread. Synchronization is therefore necessarily excluded by readonly, argmemonly, and so on. We can usually ignore the possibility of stores happening concurrently on other threads, but synchronization forces us to honor them, and that is indistinguishable, in terms of effects, from the store happening directly as part of the call.

I haven't read much about nocallback, so I don't entirely understand what the optimization goal is. If it's not implied by the existing effects attributes, then I guess we need to add it everywhere, and if that implies test churn, so be it. But I want to make sure that we're not modeling something useless in the optimizer just because we can imagine code that it accurately describes; ultimately, the IR is a tool that exists to be of use to us as compiler developers, and perfect accuracy by itself is not a goal.

readnone etc. surely encompass the entire behavior of the call, which of course includes any callbacks it might make.

Agreed, that is what it means now. What the code here assumed is that an intrinsic could *not* callback into the module, but it can.

And synchronization needs to be understood as effectively a store, since synchronizing with another thread which has done a store guarantees the visibility of that store on this thread; and thus readonly etc. must imply nosync.

Yes, at least that is the traditional interpretation which is leftover from the time we did not have nosync.
So, when we want to document the status quo, then we'd say nosync and read/write/only/none do *not* compose on the call level but only on the overall effect level.
Thus, read/write/only/none are "stronger" than "may-sync", if present. FWIW, we do not have a verifier check for that, e.g., that read-only/none has to go with nosync.
Instead of creating one we could also change that assumption but that is probably another story.
The current interpretation effectively means our annotations for certain intrinsics (e.g., gpu shuffle) are wrong as they are "must-sync" and readnone.

So whatever nocallback means, it must be finer-grained, suitable for situations where the stronger attributes cannot be used.

I don't think that is the right way to look at it. In contrast to nosync, nocallback is even now orthogonal/composable with memory attributes.
nocallback means it cannot invoke module code. It's not otherwise impacting the memory attributes, nor are they impacting nocallback beyond their particular specification.
Think of a readonly function/intrinsic which is not nocallback. It can totally call into the module and use a function there to read a global internal to the module that does not escape.
The function in the module doesn't even need to be readonly. It just needs to be readonly for the invocations by the readonly function/intrinsic.
Even readnone composes as it allows the function/intrinsic that is not nocallback to call into the module and get the address of a internal global, e.g., in order to return it.

I think it would be a huge shame if we had to go update a million testcases just to compensate for nocallback being added when there's already a stronger attribute on the intrinsic. That does not seem like a good use of anyone's time.

As I said above, nocallback and memory attributes compose already. No need to change anything as far as I can tell. The issue fixed in this patch did also not arise because of their combination. Memory attributes on functions w/ or w/o nocallback work the same way.
nosync is a different story. We should de-tangle it wrt. memory intrinsics and that would require test changes (among other things). However, if we could simply augment exiting read/write/only/none with nosync. That's mechanical. We are going to update literally ever test case for opaque pointers, before we did it for explicit load/store types, etc.
De-tangling the effects would hover open up the door for proper reasoning about non-escaped local variables in multi-threaded environments, something we basically cannot do right now. In the current situation "may-sync" prevents read-only/none. This is bad now because shuffle intrinics, for example, have (or would need to have) arbitrary side-effects even though they don't need that. If we untangle the attributes we get composing attributes. It would also ensure memory attributes are "local", as the rest of our attributes are. sync/nosync models effects from other threads of execution, there is no need to muddle that with the local read/write behavior of the function at hand.

LangRef does not seem to say anything specifically about intrinsics' attributes (though they do map to function attributes). It also does not mention nocallback at all.

I know :(, I lost track of it: https://reviews.llvm.org/D90275#2453426
I can try to write something, if anyone else wants, please feel free.

jdoerfert added a comment.EditedApr 15 2022, 12:49 PM

Sorry, that was a bit of a jumble of different ideas that I pressed into the form of a single message. Let me try to do better.

I replied to the first post. Let me also reply here ;)

Effects summarization attributes like readnone and so on have to be understood as encompassing the entire behavior of the call. That of course includes any callbacks into visible code the call might make. So the absence of nocallback doesn't weaken other attributes in any way.

Yes, the entire behavior of the call. To be precise, right now it is the entire "becomes visible as part of the call" behavior and not the "local behavior" of the call.

I think the right way of thinking about synchronization, effects-wise, is that it is effectively a potential store to any memory that might be accessible by another thread.

That is what we basically do now, unfortunately. It puts all possible effects by any possible other thread together, which is basically always "worthless".

Synchronization is therefore necessarily excluded by readonly, argmemonly, and so on.

No. We cannot derive either for "may-sync" functions but the user can for sure combine them in a sensible way. If all my shared state is the one argument I pass, argmemonly and "may-sync" work together fine. If I know there is no write effect by other threads that can become visible at this point, e.g., I only wake them up but they haven't done anything on shared state yet, I can use readonly with "may-sync", etc.

We can usually ignore the possibility of stores happening concurrently on other threads, but synchronization forces us to honor them, and that is indistinguishable, in terms of effects, from the store happening directly as part of the call.

Yes. Though with our modeling we can not express that certain effects are known to be not present. We can also not annotate functions based on their local effects and then use the context in which they are called to determine if there are other threads of execution that it might interact with.

I haven't read much about nocallback, so I don't entirely understand what the optimization goal is.

We use it in-tree for inter-procedural reachability analysis. So far, we mostly ask if a load/store may reach another load/store inter-procedurally. If I have nocallback I can get proper results because otherwise I have to assume that things like intrinsics and also known runtime functions are potentially calling any extern user function in my module and thereby reach the load/store in question.

If it's not implied by the existing effects attributes, then I guess we need to add it everywhere, and if that implies test churn, so be it.

Attributes should, in general, not imply other attributes except if we have good reason for it. They compose and that is as it should be.
As an example, I can annotate my malloc and vprintf calls in the OpenMP GPU runtime as nocallback (via __attribute__((leaf))) and then perform proper reachability analysis even though they are not read/write/only/none.

But I want to make sure that we're not modeling something useless in the optimizer just because we can imagine code that it accurately describes; ultimately, the IR is a tool that exists to be of use to us as compiler developers, and perfect accuracy by itself is not a goal.

I agree. But we need to avoid justifying things with the fact that we didn't used in the past. For the longest time we had no parallelism-aware optimizations at all, which allows us to lump synchronization and memory effects together w/o any drawback. We also did not dinstinguish other "non-memory effects", e.g., termination. The latter causes all sorts of hassle for people until we finally decoupled termination from "may-write". I mean, we literally introduced the llvm.sideffect intrinsic to pretend there was a write in a loop that may not actually have one. That kind of solution is not helpful either.