The result of this intrinsic depends on what control flow it is inside,
so I am removing the IntrNoMem mark to stop it being CSEd.
Change-Id: Iaf95db68f30c1b07e695d050a03c81f12ca7629d
Differential D54516
[AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem tpr on Nov 14 2018, 12:49 AM. Authored by
Details
Diff Detail
Event TimelineComment Actions I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass? Comment Actions I agree this needs a test case. Removing IntrNoMem is not a good solution because it pessimizes other things. Ideally we'd say convergent prevents movement across control flow in either direction because other functions are affected by this as well. @sameerds how is that effort going? Comment Actions readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly? I'm also not convinced that this patch actually fixes the problem. If you have code like this: if (cc) { WWM code other stuff v1 } else { identical WWM code other stuff v2 } then it is perfectly correct to transform this to WWM code if (cc) { other stuff v1 } else other stuff v2 } based on LLVM IR semantics. That's why we have the ugly "pass-through inline assembly" hack in the Mesa frontends right now for ballot etc. We really need the convergent that prevents movement across control flow in either direction. Comment Actions No, I meant readnone, since AFAIK that's what IntrNoMem here maps to. The bit about "caller-visible state" was taken directly from the langref entry for readnone. My point was that in something like: WWM code if (cc) { other stuff v1 } else { identical WWM code other stuff v2 } it wouldn't be allowed for LLVM to rewrite the use of the second WWM to point to the first, even now, since the semantics of readnone aren't strong enough to guarantee that the two calls return the same thing, at least as far as I understand (I think someone else, maybe Tom, explained this to me over IRC a while ago). Your issue is different, and indeed removing IntrNoMem isn't going to help with that at all. I agree that we need a better solution for that. But for the current patch, I can't think of a situation where removing NoMem/readonly would disallow a transform that shouldn't be allowed. Comment Actions EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up. Nicolai, thanks for reminding me of the Mesa inline asm hack, which you have told me about before. I think my best short term solution is to use that same hack in my frontend when generating llvm.amdgcn.set.inactive. Could you point me at that hack again please? Comment Actions Both EarlyCSE and GVN seem to disagree with that, at least for individual function calls marked readnone. @tpr The relevant code in Mesa is here: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/common/ac_llvm_build.c#L367 Comment Actions Oh, I've found it already in use in a different case in LLPC: ; Prevent optimization of backend compiler on the control flow %2 = call i32 asm sideeffect "; %1", "=v,0"(i32 %1) Comment Actions By "disable", do you mean modifying EarlyCSE to not touch convergent calls? What if you do that for GVN as well? GVN::ValueTable::lookupOrAddCall() seems to be the right place for that. Such spot fixes as a useful step in the right direction, to be preferred over repeating the asm hack. One already exists in GVNHoist. This may sound a bit whackamoley, but the "effort" that @nhaehnle was asking about is essentially a more organized way to audit all the places where convergent calls should be handled specially. That project has not gained sufficient motivation yet to commit to. Comment Actions What I mean isn't just about an audit of convergent. It's about redefining what convergent means. Currently, convergent only means "adding control-dependencies is forbidden". This means that what EarlyCSE and GVN are doing is correct. Instead, we really want convergent to mean "both adding and removing control-dependencies is forbidden". Step one is to change the definition in the LangRef. Step two is the audit of making sure the code actually complies with the new definition. Comment Actions For a quick fix for my specific problem, I have gone for the frontend hack, so I am abandoning this review. |