Page MenuHomePhabricator

Don't treat readnone call in presplit coroutine as not access memory
AcceptedPublic

Authored by ChuanqiXu on Jun 9 2022, 1:26 AM.

Details

Summary

To solve the readnone problems in coroutines. See https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015 for details.

According to the discussion, we decide to fix the problem by inserting isPresplitCoroutine() checks in different passes instead of wrapping/unwrapping readnone attributes in CoroEarly/CoroCleanup passes. In this direction, we might not be able to cover every case at first. Let's take a "find and fix" strategy.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ChuanqiXu added inline comments.Jun 9 2022, 2:38 AM
llvm/include/llvm/IR/InstrTypes.h
1857

Got it. I would convert "coroutine.presplit" into an enum attribute before landing this one.

Thank you for doing this. It looks alright to me.

We probably want to put a note in LangRef noting that "readnone" doesn't encompass the thread id in coroutines.

llvm/include/llvm/IR/InstrTypes.h
1856

Probably we should describe this in more detail in the coroutine documentation, and put a pointer to that documentation in the code.

Address comments:

  • Add description in LangRef.rst and Coroutines.rst
ChuanqiXu marked an inline comment as done.Jun 28 2022, 10:34 PM
ChuanqiXu added inline comments.
llvm/include/llvm/IR/InstrTypes.h
1856

I've added a description in Coroutines.rst but it is not more detailed than this. Do you mean to give more thoughts we've made in https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015? I feel it is too wordy for readers.

Given this is not dependent on D125291, do you think it is better to land this patch first? @nikic @nhaehnle @efriedma

nikic added a comment.Jul 7 2022, 1:23 AM

Compile-time impact is low / acceptable: http://llvm-compile-time-tracker.com/compare.php?from=40a4078e14c2c6c5e2d0a1776285aa7491e791b3&to=d753f388d52778fff19b5a6d82a5b9c4869a4273&stat=instructions

I'm fine with this change, but I didn't follow the original discussion.

llvm/docs/LangRef.rst
1953

Drop "the"

llvm/lib/Analysis/BasicAliasAnalysis.cpp
755–757

You mean the getModRefBehavior call on the function below? I think it may be better to guard that call instead.

Compile-time impact is low / acceptable: http://llvm-compile-time-tracker.com/compare.php?from=40a4078e14c2c6c5e2d0a1776285aa7491e791b3&to=d753f388d52778fff19b5a6d82a5b9c4869a4273&stat=instructions

I'm fine with this change, but I didn't follow the original discussion.

Thanks! The original discussion is really long. But I am sure the direction is agreed by at least @jyknight @nhaehnle and @efriedma

llvm/lib/Analysis/BasicAliasAnalysis.cpp
755–757

Yeah, I mean the call at line 781. I feel it looks better/cleaner/clearer to put the check here. The logic is consistent with the above check (We can't do better.)

nikic added inline comments.Jul 7 2022, 1:37 AM
llvm/lib/Analysis/GlobalsModRef.cpp
270 ↗(On Diff #440857)

I've taken the liberty of deleting this whole function in https://github.com/llvm/llvm-project/commit/4a579abd9f95bf9fda920759aced3874d04c5b9e. This was unnecessary code duplication with BasicAA.

nikic added inline comments.Jul 7 2022, 1:39 AM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
755–757

With your current implementation, won't you still run into a problem with a writeonly function, which will report as writeonly from function FMRB?

ChuanqiXu added inline comments.Jul 7 2022, 2:45 AM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
755–757

Oh, I missed it. Thanks for pointing it out. I'm working on it.

llvm/lib/Analysis/GlobalsModRef.cpp
270 ↗(On Diff #440857)

This might not be right. After the patch, when I call GlobalsAAResult::getModRefBehavior(CallBase*), it would call https://github.com/llvm/llvm-project/blob/519d7876cbee5a5d3cd40d41525cd45e44fb07a8/llvm/include/llvm/Analysis/AliasAnalysis.h#L1236-L1238 all the time, which is not correct.

ChuanqiXu added inline comments.Jul 7 2022, 2:51 AM
llvm/lib/Analysis/GlobalsModRef.cpp
270 ↗(On Diff #440857)

I think we could improve the original implementation. I thought it would call BasicAAResult::getModRefBehavior(CallBase*) too.

nikic added inline comments.Jul 7 2022, 3:01 AM
llvm/lib/Analysis/GlobalsModRef.cpp
270 ↗(On Diff #440857)

This might not be right. After the patch, when I call GlobalsAAResult::getModRefBehavior(CallBase*), it would call https://github.com/llvm/llvm-project/blob/519d7876cbee5a5d3cd40d41525cd45e44fb07a8/llvm/include/llvm/Analysis/AliasAnalysis.h#L1236-L1238 all the time, which is not correct.

This is fine. GlobalsAAResult methods are not intended to be called directly -- the public alias analysis API goes through an AAResults aggregate over multiple AA providers, which always includes BasicAA in a real pipeline. AA providers should not reimplement functionality provided by other AA providers, they are designed to compose via AAResults instead.

ChuanqiXu updated this revision to Diff 442839.Jul 7 2022, 3:17 AM

Address comments:

  • Handle write only cases.
ChuanqiXu marked 4 inline comments as done.Jul 7 2022, 3:18 AM
ChuanqiXu marked an inline comment as done.
ChuanqiXu added inline comments.
llvm/lib/Analysis/GlobalsModRef.cpp
270 ↗(On Diff #440857)

Thanks for clarifying!

Given this is not dependent on D125291, do you think it is better to land this patch first? @nikic @nhaehnle @efriedma

Yes, this change can go in separately first.

Thanks! The original discussion is really long. But I am sure the direction is agreed by at least @jyknight @nhaehnle and @efriedma

Yes, this is the direction agreed upon: just checking "am I in a coroutine" in appropriate places, and pretending we didn't see the readnone/etc data -- with a potential long-term goal to incrementally teach passes to be smarter and understand the concept of "reads only thread-id", so they don't need to just give up entirely.

llvm/docs/LangRef.rst
1953

They can do so always, not just in a presplit coroutine. And "thread-id" isn't a defined term anywhere.

So I think this should probably say "Accessing the current thread's identity, e.g. getting the address of a thread-local variable is not considered a memory read."

llvm/lib/Analysis/BasicAliasAnalysis.cpp
755–757

I don't understand this change. ReadNone/WriteOnly being present on the call versus being present the function doesn't make a difference to the semantics vs presplit coroutines, unlike what's the case for operand bundles.

Hm...oh, ok...I see why you're doing this. Function::doesNotAccessMemory doesn't (can't) be modified to return false if we're in a coroutine, so we don't get the special-cased behavior there. That divergence is sufficiently confusing that I think we should probably not implement it that way.

Just looking at the calls to "doesNotAccessMemory", I wonder if we may be better off moving the query to the callers, anyways, instead of modifying Function/Call doesNotAccessMemory (and friends) themselves. AFAICT, it's not actually correct for all of the callers to get the new behavior.

ChuanqiXu updated this revision to Diff 443115.Jul 7 2022, 7:33 PM
ChuanqiXu marked an inline comment as done.

Address comments.

ChuanqiXu marked an inline comment as done.Jul 7 2022, 7:45 PM
ChuanqiXu added inline comments.
llvm/lib/Analysis/BasicAliasAnalysis.cpp
755–757

I think it is better to modify CallBase::doesNotAccessMemory instead of modify the call sites.

For the perspective of semantics, we've changed the semantics for readnone attribute. It should be naturally correct to modify Call:: doesNotAccessMemory to show the change.

For the perspective of engineering, I think the current method is better too. What I imaged is a new developer who wants to develop a new optimization pass. The implementation calls CallBase:: doesNotAccessMemory at several places. But the implementation may not be right since he forgets to add the in presplit coroutine checks. What I want to say is that it requires the developers more if we choose to add the check at call sites. But we could avoid it.

nikic added a comment.Jul 14 2022, 1:15 AM

Do we also need to upgrade argmemonly to inaccessibleorargmemonly? Assuming that the thread ID counts as inaccessible memory.

llvm/include/llvm/IR/InstrTypes.h
1854

couldn't -> can't, wouldn't -> won't

1871

Due to -> Because

llvm/test/Transforms/Coroutines/coro-readnone-01.ll
6

It would be better to consistently use ptr in the test and drop the -opaque-pointers flag. Currently it mixed ptr and i8*...

Do we also need to upgrade argmemonly to inaccessibleorargmemonly? Assuming that the thread ID counts as inaccessible memory.

Nit: I wouldn't say thread ID counts as inaccessible memory, rather it's something that is apart entirely (given the change for readnone).

I think it would be consistent to say that an argmemonly function is allowed to read the thread ID. That does suggest that analogous changes for onlyAccessesArgMemory are needed, but I haven't thought about it very carefully.

nikic added a comment.Jul 14 2022, 7:02 AM

Do we also need to upgrade argmemonly to inaccessibleorargmemonly? Assuming that the thread ID counts as inaccessible memory.

Nit: I wouldn't say thread ID counts as inaccessible memory, rather it's something that is apart entirely (given the change for readnone).

I think it would be consistent to say that an argmemonly function is allowed to read the thread ID. That does suggest that analogous changes for onlyAccessesArgMemory are needed, but I haven't thought about it very carefully.

Yeah, you're right, it's not inaccessible memory under this model. I guess onlyAccessesInaccessibleMemory and onlyAccessesInaccessibleMemOrArgMem would have to be changed as well. Kinda unfortunate in that things like llvm.assume go from modelling a control dependence to doing an arbitrary read.

Yeah, you're right, it's not inaccessible memory under this model. I guess onlyAccessesInaccessibleMemory and onlyAccessesInaccessibleMemOrArgMem would have to be changed as well. Kinda unfortunate in that things like llvm.assume go from modelling a control dependence to doing an arbitrary read.

True, but at least it's only pre-split. In the Discourse thread we did talk about adding a noreadthreadid attribute, though I think the preliminary conclusion was to wait and see if it'd be actually worth doing the change.

Right. In order to get back to having this optimization power even within unsplit coroutines, we have to start modeling the effect of being dependent on the identity of current thread. That effect includes reading the current thread ID, taking the address of thread-local memory, etc. And the right way to do that is to add a nothreadid attribute (or however we decide to spell it) that calls can affirmatively say they have. Once we have that in the IR, we can also talk about having a language feature that lowers to it (e.g. an attribute stronger than __attribute__((const))).

ChuanqiXu marked 2 inline comments as done.

Address comments: Handle onlyAccessesInaccessibleMemory and onlyAccessesInaccessibleMemOrArgMem.

Yeah, it looks better to add things like noreadthreadid in the future.

nikic added inline comments.Fri, Jul 15, 8:22 AM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
779

Do we lose anything substantial with just the Call->getFunction()->isPresplitCoroutine() condition?

Alternatively, I would implement this as a fixup afterwards that looks something like this:

if (Call->getFunction()->isPreSplitCoroutine())
  Min = FunctionModRefBehavior(Min | FMRB_OnlyReadsMemory);
ChuanqiXu added inline comments.Sun, Jul 17, 8:26 PM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
779

For,

define void @f() presplitcoroutine  {
      entry:
        %ArgMemOnlyCall = call i32 @argmemonly_func()
        ret void
      }

      declare i32 @argmemonly_func() argmemonly

The current implementation would get FMRB_OnlyAccessesArgumentPointees for ArgMemOnlyCall. But the suggested change would get FMRB_UnknownModRefBehavior. I am OK with the suggested change if you feel like the benefit is not worth for the cost. I know compilation time is an important feature of Clang/LLVM.

nikic added inline comments.Mon, Jul 18, 1:16 AM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
779

But isn't FMRB_OnlyAccessesArgumentPointees incorrect, because the thread ID is not an argument, so it accesses non-argument memory?

ChuanqiXu updated this revision to Diff 445421.Mon, Jul 18, 1:44 AM

Handle argmemonly.

ChuanqiXu added inline comments.Mon, Jul 18, 1:46 AM
llvm/lib/Analysis/BasicAliasAnalysis.cpp
779

Oh, I missed that case mentioned by @nhaehnle. Then the current style would return FMRB_UnknownModRefBehavior for ReadOnlyCall but the previous style would return FMRB_OnlyReadsMemory.

define void @f() presplitcoroutine  {
      entry:
        %ReadOnlyCall = call i32 @readonly_func()
        ret void
}

declare i32 @ readonly_func() readonly

But I feel like the benefit is not worthy. So I follow your suggestion in this revision.

nikic accepted this revision.Mon, Jul 18, 7:14 AM

LGTM, but please wait a day in case there are more comments.

llvm/lib/Analysis/BasicAliasAnalysis.cpp
771

writelonly -> writeonly

This revision is now accepted and ready to land.Mon, Jul 18, 7:14 AM

LGTM, but please wait a day in case there are more comments.

Thanks for reviewing!

This revision was landed with ongoing or failed builds.Tue, Jul 19, 7:39 PM
This revision was automatically updated to reflect the committed changes.
ChuanqiXu reopened this revision.Wed, Jul 20, 2:04 AM

I met 2 crashes after landing the patch. From my point of view, the problems of the 2 crashes lives somewhere else. I sent 2 draft fixes D130153 and D130155 to fix these 2 crashes. But I am not sure if these 2 fixes are good and I reverted this one to avoid disturbing any one else.

This revision is now accepted and ready to land.Wed, Jul 20, 2:04 AM
fhahn added a subscriber: fhahn.Mon, Jul 25, 9:41 AM

The follow-up changes required by this patch (D130155, D130153) seem to highlight some gaps with the new modeling . As a side-effect we now consider intrinsics we know *won't* read the thread id as may-read, even if they are marked readonly, which breaks assumptions in a couple of places. There probably will be more. It seems like we should at least have a way to mark functions we know don't access thread ids, rather than updating code to account for the fact that read none intrinsics now are considered may-read in some circumstances.

I have not been through all the previous discussion, but from the langref changes the new behavior of doesNotAccessMemory & co is not very clear to me. The langref change specifically calls out reading the id of the current thread being not considered reading memory. But doesNotAccessMemory seems to consider it accessing memory in Coro-split functions? Shouldn't the LangRef's definition at least clearly call out the interaction with the presplitcoroutine attribute?

In general, reasoning about the interaction between various memory attributes is already quite tricky and making their behavior dependent on a different attribute seems to make it even more difficult.

It seems like we should at least have a way to mark functions we know don't access thread ids, rather than updating code to account for the fact that read none intrinsics now are considered may-read in some circumstances.

The overall plan involves adding a "nothreadid" attribute (name subject to bikeshedding) at some point. But it was left out of the initial patch, since it's not critical for correctness.

I have not been through all the previous discussion, but from the langref changes the new behavior of doesNotAccessMemory & co is not very clear to me. The langref change specifically calls out reading the id of the current thread being not considered reading memory. But doesNotAccessMemory seems to consider it accessing memory in Coro-split functions? Shouldn't the LangRef's definition at least clearly call out the interaction with the presplitcoroutine attribute?

https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/48 is a summary of the overall design here. I'd suggest reading the rest of the discussion if you want more context... it's hard to summarize the discussion, but we concluded this was the least disruptive solution.

It might be worth explicitly calling out the interaction with coroutines more explicitly in LangRef, sure.

The follow-up changes required by this patch (D130155, D130153) seem to highlight some gaps with the new modeling . As a side-effect we now consider intrinsics we know *won't* read the thread id as may-read, even if they are marked readonly, which breaks assumptions in a couple of places. There probably will be more.

(Besides what Eli has said about the modeling)

It surprised me too that the patch would cause crash. I thought the only effect of the patch is to block some optimizations in presplit coroutines and this is the known cost. And for the assumptions, we would meet crash once we break it. So it looks like the range of the assumptions could be tested. And I've tested our internal workloads and folly (The largest open sourced user of coroutines I know) and it looks fine. So I think the places would not be too much.

fhahn added a comment.Thu, Aug 4, 6:49 AM

https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/48 is a summary of the overall design here. I'd suggest reading the rest of the discussion if you want more context... it's hard to summarize the discussion, but we concluded this was the least disruptive solution.

It might be worth explicitly calling out the interaction with coroutines more explicitly in LangRef, sure.

Thanks Eli, that's very helpful message to read! IIUC the plan is to an extra attribute to allow the distinction between read none & not reading thread id, which should be used for all/most current intrinsics. This should avoid the need for workarounds like D130155 & D130153 by better modeling semantics. IMO this would be practical reasons to adding the attribute up front.

https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015/48 is a summary of the overall design here. I'd suggest reading the rest of the discussion if you want more context... it's hard to summarize the discussion, but we concluded this was the least disruptive solution.

It might be worth explicitly calling out the interaction with coroutines more explicitly in LangRef, sure.

Thanks Eli, that's very helpful message to read! IIUC the plan is to an extra attribute to allow the distinction between read none & not reading thread id, which should be used for all/most current intrinsics. This should avoid the need for workarounds like D130155 & D130153 by better modeling semantics. IMO this would be practical reasons to adding the attribute up front.

Do you suggest to add the noread_threadid attribute first? So that we could emit noread_threadid for intrinsics like memset or dbg.declare so that we could skip some workarounds like D130155 & D130153. Do I understand right? IIUC, I'm OK since 15.x is already branched and we have enough time.