This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem
AbandonedPublic

Authored by tpr on Nov 14 2018, 12:49 AM.

Details

Summary

The result of this intrinsic depends on what control flow it is inside,
so I am removing the IntrNoMem mark to stop it being CSEd.

Change-Id: Iaf95db68f30c1b07e695d050a03c81f12ca7629d

Diff Detail

Event Timeline

tpr created this revision.Nov 14 2018, 12:49 AM

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

arsenm added a subscriber: arsenm.Nov 14 2018, 9:00 AM

Probably not the right fix, but also needs a testcase

tpr added a comment.Nov 16 2018, 3:29 AM

It's EarlyCSE, which seems to completely ignore Convergent. That's bad, right?

I agree this needs a test case.

Removing IntrNoMem is not a good solution because it pessimizes other things. Ideally we'd say convergent prevents movement across control flow in either direction because other functions are affected by this as well. @sameerds how is that effort going?

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

I'm also not convinced that this patch actually fixes the problem. If you have code like this:

if (cc) {
  WWM code
  other stuff v1
} else {
  identical WWM code
  other stuff v2
}

then it is perfectly correct to transform this to

WWM code
if (cc) {
  other stuff v1
} else
  other stuff v2
}

based on LLVM IR semantics. That's why we have the ugly "pass-through inline assembly" hack in the Mesa frontends right now for ballot etc.

We really need the convergent that prevents movement across control flow in either direction.

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

No, I meant readnone, since AFAIK that's what IntrNoMem here maps to. The bit about "caller-visible state" was taken directly from the langref entry for readnone. My point was that in something like:

WWM code
if (cc) {
  other stuff v1
} else {
  identical WWM code
  other stuff v2
}

it wouldn't be allowed for LLVM to rewrite the use of the second WWM to point to the first, even now, since the semantics of readnone aren't strong enough to guarantee that the two calls return the same thing, at least as far as I understand (I think someone else, maybe Tom, explained this to me over IRC a while ago). Your issue is different, and indeed removing IntrNoMem isn't going to help with that at all. I agree that we need a better solution for that. But for the current patch, I can't think of a situation where removing NoMem/readonly would disallow a transform that shouldn't be allowed.

tpr added a comment.Nov 16 2018, 6:17 AM

EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up.

Nicolai, thanks for reminding me of the Mesa inline asm hack, which you have told me about before. I think my best short term solution is to use that same hack in my frontend when generating llvm.amdgcn.set.inactive. Could you point me at that hack again please?

nhaehnle added a comment.EditedNov 16 2018, 6:25 AM

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

No, I meant readnone, since AFAIK that's what IntrNoMem here maps to. The bit about "caller-visible state" was taken directly from the langref entry for readnone. My point was that in something like:

WWM code
if (cc) {
  other stuff v1
} else {
  identical WWM code
  other stuff v2
}

it wouldn't be allowed for LLVM to rewrite the use of the second WWM to point to the first, even now, since the semantics of readnone aren't strong enough to guarantee that the two calls return the same thing, at least as far as I understand (I think someone else, maybe Tom, explained this to me over IRC a while ago).

Both EarlyCSE and GVN seem to disagree with that, at least for individual function calls marked readnone.

@tpr The relevant code in Mesa is here: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/common/ac_llvm_build.c#L367

tpr added a comment.Nov 16 2018, 6:25 AM

Oh, I've found it already in use in a different case in LLPC:

; Prevent optimization of backend compiler on the control flow
%2 = call i32 asm sideeffect "; %1", "=v,0"(i32 %1)
In D54516#1301072, @tpr wrote:

EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up.

By "disable", do you mean modifying EarlyCSE to not touch convergent calls? What if you do that for GVN as well? GVN::ValueTable::lookupOrAddCall() seems to be the right place for that. Such spot fixes as a useful step in the right direction, to be preferred over repeating the asm hack. One already exists in GVNHoist. This may sound a bit whackamoley, but the "effort" that @nhaehnle was asking about is essentially a more organized way to audit all the places where convergent calls should be handled specially. That project has not gained sufficient motivation yet to commit to.

In D54516#1301072, @tpr wrote:

EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up.

By "disable", do you mean modifying EarlyCSE to not touch convergent calls? What if you do that for GVN as well? GVN::ValueTable::lookupOrAddCall() seems to be the right place for that. Such spot fixes as a useful step in the right direction, to be preferred over repeating the asm hack. One already exists in GVNHoist. This may sound a bit whackamoley, but the "effort" that @nhaehnle was asking about is essentially a more organized way to audit all the places where convergent calls should be handled specially. That project has not gained sufficient motivation yet to commit to.

What I mean isn't just about an audit of convergent. It's about redefining what convergent means. Currently, convergent only means "adding control-dependencies is forbidden". This means that what EarlyCSE and GVN are doing is correct.

Instead, we really want convergent to mean "both adding and removing control-dependencies is forbidden". Step one is to change the definition in the LangRef. Step two is the audit of making sure the code actually complies with the new definition.

tpr abandoned this revision.Nov 16 2018, 10:05 AM

For a quick fix for my specific problem, I have gone for the frontend hack, so I am abandoning this review.