This is an archive of the discontinued LLVM Phabricator instance.

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

Probably not the right fix, but also needs a testcase

It's EarlyCSE, which seems to completely ignore Convergent. That's bad, right?

tpr added a reviewer: sameerds.Nov 16 2018, 3:59 AM

I agree this needs a test case.

Removing IntrNoMem is not a good solution because it pessimizes other things. Ideally we'd say convergent prevents movement across control flow in either direction because other functions are affected by this as well. @sameerds how is that effort going?

In D54516#1298178, @cwabbott wrote:

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

I'm also not convinced that this patch actually fixes the problem. If you have code like this:

if (cc) {
  WWM code
  other stuff v1
} else {
  identical WWM code
  other stuff v2
}

then it is perfectly correct to transform this to

WWM code
if (cc) {
  other stuff v1
} else
  other stuff v2
}

based on LLVM IR semantics. That's why we have the ugly "pass-through inline assembly" hack in the Mesa frontends right now for ballot etc.

We really need the convergent that prevents movement across control flow in either direction.

In D54516#1300997, @nhaehnle wrote:

In D54516#1298178, @cwabbott wrote:

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

No, I meant readnone, since AFAIK that's what IntrNoMem here maps to. The bit about "caller-visible state" was taken directly from the langref entry for readnone. My point was that in something like:

WWM code
if (cc) {
  other stuff v1
} else {
  identical WWM code
  other stuff v2
}

it wouldn't be allowed for LLVM to rewrite the use of the second WWM to point to the first, even now, since the semantics of readnone aren't strong enough to guarantee that the two calls return the same thing, at least as far as I understand (I think someone else, maybe Tom, explained this to me over IRC a while ago). Your issue is different, and indeed removing IntrNoMem isn't going to help with that at all. I agree that we need a better solution for that. But for the current patch, I can't think of a situation where removing NoMem/readonly would disallow a transform that shouldn't be allowed.

EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up.

Nicolai, thanks for reminding me of the Mesa inline asm hack, which you have told me about before. I think my best short term solution is to use that same hack in my frontend when generating llvm.amdgcn.set.inactive. Could you point me at that hack again please?

In D54516#1301013, @cwabbott wrote:
In D54516#1300997, @nhaehnle wrote:

In D54516#1298178, @cwabbott wrote:

I believe the combination of Convergent + not Speculatable should mean that the compiler shouldn't hoist it to a non-control-equivalent block and shouldn't CSE it. In particular, IIRC it's not guaranteed that that a readnone function always return the same value when it's called with the same arguments, so it's not safe to CSE -- it just means that LLVM can move other things across it, since it doesn't modify *caller-visible* state. What pass is causing a problem? Maybe it's a bug in the pass?

readnone is literally readnone. Maybe you're mixing this up with inaccessiblememonly?

No, I meant readnone, since AFAIK that's what IntrNoMem here maps to. The bit about "caller-visible state" was taken directly from the langref entry for readnone. My point was that in something like:
WWM code
if (cc) {
  other stuff v1
} else {
  identical WWM code
  other stuff v2
}
it wouldn't be allowed for LLVM to rewrite the use of the second WWM to point to the first, even now, since the semantics of readnone aren't strong enough to guarantee that the two calls return the same thing, at least as far as I understand (I think someone else, maybe Tom, explained this to me over IRC a while ago).

Both EarlyCSE and GVN seem to disagree with that, at least for individual function calls marked readnone.

@tpr The relevant code in Mesa is here: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/common/ac_llvm_build.c#L367

Oh, I've found it already in use in a different case in LLPC:

; Prevent optimization of backend compiler on the control flow
%2 = call i32 asm sideeffect "; %1", "=v,0"(i32 %1)

In D54516#1301072, @tpr wrote:

EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up.

By "disable", do you mean modifying EarlyCSE to not touch convergent calls? What if you do that for GVN as well? GVN::ValueTable::lookupOrAddCall() seems to be the right place for that. Such spot fixes as a useful step in the right direction, to be preferred over repeating the asm hack. One already exists in GVNHoist. This may sound a bit whackamoley, but the "effort" that @nhaehnle was asking about is essentially a more organized way to audit all the places where convergent calls should be handled specially. That project has not gained sufficient motivation yet to commit to.

In D54516#1301197, @sameerds wrote:

In D54516#1301072, @tpr wrote:

EarlyCSE does seem to common up in this situation. And, if I disable that, I get GVN commoning it up.

By "disable", do you mean modifying EarlyCSE to not touch convergent calls? What if you do that for GVN as well? GVN::ValueTable::lookupOrAddCall() seems to be the right place for that. Such spot fixes as a useful step in the right direction, to be preferred over repeating the asm hack. One already exists in GVNHoist. This may sound a bit whackamoley, but the "effort" that @nhaehnle was asking about is essentially a more organized way to audit all the places where convergent calls should be handled specially. That project has not gained sufficient motivation yet to commit to.

What I mean isn't just about an audit of convergent. It's about redefining what convergent means. Currently, convergent only means "adding control-dependencies is forbidden". This means that what EarlyCSE and GVN are doing is correct.

Instead, we really want convergent to mean "both adding and removing control-dependencies is forbidden". Step one is to change the definition in the LangRef. Step two is the audit of making sure the code actually complies with the new definition.

For a quick fix for my specific problem, I have gone for the frontend hack, so I am abandoning this review.

Revision Contents

Path

Size

include/

llvm/

IR/

IntrinsicsAMDGPU.td

2 lines

Diff 173994

include/llvm/IR/IntrinsicsAMDGPU.td

	Show First 20 Lines • Show All 1,356 Lines • ▼ Show 20 Lines
	// Given a value, copies it while setting all the inactive lanes to a given			// Given a value, copies it while setting all the inactive lanes to a given
	// value. Note that OpenGL helper lanes are considered active, so if the			// value. Note that OpenGL helper lanes are considered active, so if the
	// program ever uses WQM, then the instruction and the first source will be			// program ever uses WQM, then the instruction and the first source will be
	// computed in WQM.			// computed in WQM.
	def int_amdgcn_set_inactive :			def int_amdgcn_set_inactive :
	Intrinsic<[llvm_anyint_ty],			Intrinsic<[llvm_anyint_ty],
	[LLVMMatchType<0>, // value to be copied			[LLVMMatchType<0>, // value to be copied
	LLVMMatchType<0>], // value for the inactive lanes to take			LLVMMatchType<0>], // value for the inactive lanes to take
	[IntrNoMem, IntrConvergent]>;			[IntrConvergent]>;

	// Waterfall intrinics used to tag a region as requiring waterfall loops to			// Waterfall intrinics used to tag a region as requiring waterfall loops to
	// transform VGPR values into SGPR values in a loop			// transform VGPR values into SGPR values in a loop
	// Best case (uniform index in VGPR) the loop will execute once, worst case (all			// Best case (uniform index in VGPR) the loop will execute once, worst case (all
	// values in the index are different) the loop will execute wave-size times.			// values in the index are different) the loop will execute wave-size times.
	// There is a requirement that all three intrinsics are present.			// There is a requirement that all three intrinsics are present.
	// There is also a requirement that all three intrinsics are contained in a			// There is also a requirement that all three intrinsics are contained in a
	// single block.			// single block.
	▲ Show 20 Lines • Show All 245 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMemAbandonedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 173994

include/llvm/IR/IntrinsicsAMDGPU.td

[AMDGPU] Do not mark llvm.amdgcn.set.inactive as IntrNoMem
AbandonedPublic