This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] do not sink intrinsics even with non-constant operands
Needs RevisionPublic

Authored by hakzsam on Mar 4 2017, 7:45 AM.

Download Raw Diff

Details

Reviewers

nhaehnle
jmolloy
mehdi_amini
mareko
arsenm

Summary

[SimplifyCFG] do not sink intrinsics even with non-constant operands

The test #12 in sink-common-code.ll fails not because
@llvm.ctlz/cttz are intrinsics but because the last operand is
constant, and thus canReplaceOperandWithVariable() will return
false and the instructions won't be sunk. Presumably, we want to
avoid sinking all intrinsics, not by luck like that.

However, since "[SimplifyCFG] Change the algorithm in
SinkThenElseCodeToEnd", the pass is more aggressive and it sunks
intrinsics where it should not. This is because the one PHI node
limitation has been removed, but usually intrinsics have many
operands. The bug being described was just hidden.

Currently the pass doesn't allow to sink intrinsics (like other
instructions) with constant operands and that makes sense, but
it's not enough. Instead we should not sink any intrinsics even
if one of the operand is non-constant. Some of them are specials,
such as @llvm.frameaddress or @llvm.SI.image.sample.v2i32 (AMDGPU),
and it's complicated to know if they can be sunk or not. This patch
just adds an early exit in this situation.

This fixes minor rendering issues with "Crusader Kings II" and
recently released "Dirt Rally" (using the AMDGPU backend). Some
intrinsics were wrongly removed.

As a side effect, this has the advantage to reduce GPR pressure
a little bit with AMDGPU.

This only regresses no-md-sink.ll, but that makes sense if we
avoid to sink all intrinsics.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=32001
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

It would nice if this patch (or a similar one) could be in the final LLVM 4.0 release because I think it's important for us in order to fix these two games, at least.

If this change is not the right one, maybe we can just re-introduce the one PHI node limitation and fix it up later?

For AMD folks, I see some good changes with my shader-dv private repo, here's the full report https://hastebin.com/xeloteleja.pas.

Thanks for your feedback.

Diff Detail

Event Timeline

hakzsam created this revision.Mar 4 2017, 7:45 AM

Herald added subscribers: tpr, wdng. · View Herald TranscriptMar 4 2017, 7:45 AM

Hi,

I'm not sure this is the best approach. What you're essentially saying here is that we should avoid sinking all intrinsics because they may have unmodelled sideeffects. What I'd prefer is for us to understand which intrinsics have such sideeffects and special case them.

This is the reason canReplaceOperandWithVariable() exists - to allow sinking intrinsics (such as ctlz) without breaking other intrinsics.

It sounds like your @llvm.SI.image.sample.v2i32 intrinsic misbehaves if sunk. Why actually is this? We already don't sink intrinsics where we're going to replace a constant with a variable, so the sunk code should look almost identical at an instruction stream level as the non-sunk code.

Can you explain a bit more why the sinking causes a codegen fault? Perhaps then I might understand better what the best fix would be.

Cheers,

James

This revision now requires changes to proceed.Mar 5 2017, 7:31 AM

I think you really want the convergent attribute for parameters patch instead

Hi Matt,

For my own reference (and because I'm having trouble parsing the langref) - should tail sinking of convergent calls be disabled? noduplicate calls should be fine, because noduplicate says nothing about *commoning*, just duplicating.

James

Hi James,

Yes, as I said in the initial commit description, I'm not sure myself if it's the right fix (looks like too aggressive even if it works like this). Instead, I think we might need to add a new contrainst on that intrinsic like noduplicate or convergent.

Basically, the program I'm currently trying to fix with that patch has similar instructions inside a if/else block and SinkThenElseCodeToEnd ends up by removing the else block entirely. This is because it sinks all instructions but llvm.image_sample which is basically a read texture instruction should not be moved. The one phi node limit prevented this intrinsic to be moved before, the bug was hidden.

After looking at the SimplifyCFG pass again, looks like we don't allow to fold blocks that contain noduplicate or convergent calls. Setting the attribute on llvm.image_sample (from Mesa) fixes the issue.

Maybe the issue has to be fixed in Mesa.

Okay, I think the correct fix is to avoid sinking intrinsics if the convergent attribute is set.

How about this?

For me, that sounds OK. I'd like @arsenm to confirm my understanding about the convergent attribute though.

Or more generally, avoid sinking all functions declared with the convergent attribute.

From Intrinsics.td, int_ctpop does not access memory or have any other side effects. So it should be safe to sink. Do you have the complete test case which fails. Also, please include more context with the patch e.g., (diff -U1000 HEAD~)

In D30609#692817, @jmolloy wrote:

Hi Matt,

For my own reference (and because I'm having trouble parsing the langref) - should tail sinking of convergent calls be disabled? noduplicate calls should be fine, because noduplicate says nothing about *commoning*, just duplicating.

James

For the uses cases for convergent, such a program is broken from the beginning. If in this example ctpop was a barrier, this would hang if i1 was a divergent branch. Commoning them here would have the effect of "fixing" the program, so I suppose this should be legal.

For clarification @llvm.SI.image.sample.v2i32 is not and should not be convergent. The problematic aspects apply to the uniform sampler arguments, which D26348 is intended to fix

We could actually infer the uniformity of the branch condition in this example because the program is undefined otherwise, so I think this is fine

https://reviews.llvm.org/D26348 is probably better.

I'm not sure I understand how this patch makes sense as is.

As of today, you can't express this constraint in LLVM in a fine grain way. It is the responsibility of the Front-end to emit/define intrinsic with the right set of conservative flags in the absence of fine-grain ones.

In D30609#693535, @mehdi_amini wrote:

I'm not sure I understand how this patch makes sense as is.

As of today, you can't express this constraint in LLVM in a fine grain way. It is the responsibility of the Front-end to emit/define intrinsic with the right set of conservative flags in the absence of fine-grain ones.

Well, it's just a workaround which doesn't fix the issue as it should be. You can just ignore the patch.

arsenm resigned from this revision.Feb 21 2019, 5:54 PM

Herald added a subscriber: jdoerfert. · View Herald TranscriptFeb 21 2019, 5:54 PM

Revision Contents

Path

Size

lib/

Transforms/

Utils/

SimplifyCFG.cpp

4 lines

test/

Transforms/

SimplifyCFG/

sink-common-code.ll

26 lines

Diff 90580

lib/Transforms/Utils/SimplifyCFG.cpp

Context not available.
	// have no uses.	// have no uses.
	if (!isa<StoreInst>(I) && !I->hasOneUse())	if (!isa<StoreInst>(I) && !I->hasOneUse())
	return false;	return false;
		// An intrinsic should not be moved even if one of its operand can be
		// replaced by a variable.
		if (isa<IntrinsicInst>(I))
		return false;
	}	}

	const Instruction *I0 = Insts.front();	const Instruction *I0 = Insts.front();
Context not available.

test/Transforms/SimplifyCFG/sink-common-code.ll

Context not available.
	; CHECK: call i32 @llvm.ctlz	; CHECK: call i32 @llvm.ctlz
	; CHECK: call i32 @llvm.cttz	; CHECK: call i32 @llvm.cttz

		; We can't common an intrinsic even with non-constant operands!
		define i32 @test_intrinsic(i1 zeroext %flag, i32 %w, i32 %x, i32 %y) {
		entry:
		br i1 %flag, label %if.then, label %if.else

		if.then:
		%dummy = add i32 %w, 5
		%sv1 = call i32 @llvm.ctpop.i32(i32 %y)
		br label %if.end

		if.else:
		%dummy1 = add i32 %w, 6
		%sv2 = call i32 @llvm.ctpop.i32(i32 %x)
		br label %if.end

		if.end:
		%p = phi i32 [ %sv1, %if.then ], [ %sv2, %if.else ]
		ret i32 1
		}

		declare i32 @llvm.ctpop.i32(i32 %x)

		; CHECK-LABEL: test_intrinsic
		; CHECK: call i32 @llvm.ctpop
		; CHECK: call i32 @llvm.ctpop

	; The TBAA metadata should be properly combined.	; The TBAA metadata should be properly combined.
	define i32 @test13(i1 zeroext %flag, i32 %x, i32* %y) {	define i32 @test13(i1 zeroext %flag, i32 %x, i32* %y) {
	entry:	entry:
Context not available.