This is an archive of the discontinued LLVM Phabricator instance.

[SimplifyCFG] do not sink intrinsics even with non-constant operands
Needs RevisionPublic

Authored by hakzsam on Mar 4 2017, 7:45 AM.

Details

Summary
[SimplifyCFG] do not sink intrinsics even with non-constant operands

The test #12 in sink-common-code.ll fails not because
@llvm.ctlz/cttz are intrinsics but because the last operand is
constant, and thus canReplaceOperandWithVariable() will return
false and the instructions won't be sunk. Presumably, we want to
avoid sinking all intrinsics, not by luck like that.

However, since "[SimplifyCFG] Change the algorithm in
SinkThenElseCodeToEnd", the pass is more aggressive and it sunks
intrinsics where it should not. This is because the one PHI node
limitation has been removed, but usually intrinsics have many
operands. The bug being described was just hidden.

Currently the pass doesn't allow to sink intrinsics (like other
instructions) with constant operands and that makes sense, but
it's not enough. Instead we should not sink any intrinsics even
if one of the operand is non-constant. Some of them are specials,
such as @llvm.frameaddress or @llvm.SI.image.sample.v2i32 (AMDGPU),
and it's complicated to know if they can be sunk or not. This patch
just adds an early exit in this situation.

This fixes minor rendering issues with "Crusader Kings II" and
recently released "Dirt Rally" (using the AMDGPU backend). Some
intrinsics were wrongly removed.

As a side effect, this has the advantage to reduce GPR pressure
a little bit with AMDGPU.

This only regresses no-md-sink.ll, but that makes sense if we
avoid to sink all intrinsics.

Bugzilla: https://bugs.llvm.org/show_bug.cgi?id=32001
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

It would nice if this patch (or a similar one) could be in the final LLVM 4.0 release because I think it's important for us in order to fix these two games, at least.

If this change is not the right one, maybe we can just re-introduce the one PHI node limitation and fix it up later?

For AMD folks, I see some good changes with my shader-dv private repo, here's the full report https://hastebin.com/xeloteleja.pas.

Thanks for your feedback.

Diff Detail

Event Timeline

hakzsam created this revision.Mar 4 2017, 7:45 AM
jmolloy requested changes to this revision.Mar 5 2017, 7:31 AM

Hi,

I'm not sure this is the best approach. What you're essentially saying here is that we should avoid sinking all intrinsics because they may have unmodelled sideeffects. What I'd prefer is for us to understand which intrinsics have such sideeffects and special case them.

This is the reason canReplaceOperandWithVariable() exists - to allow sinking intrinsics (such as ctlz) without breaking other intrinsics.

It sounds like your @llvm.SI.image.sample.v2i32 intrinsic misbehaves if sunk. Why actually is this? We already don't sink intrinsics where we're going to replace a constant with a variable, so the sunk code should look almost identical at an instruction stream level as the non-sunk code.

Can you explain a bit more why the sinking causes a codegen fault? Perhaps then I might understand better what the best fix would be.

Cheers,

James

This revision now requires changes to proceed.Mar 5 2017, 7:31 AM
arsenm edited edge metadata.Mar 5 2017, 5:17 PM

I think you really want the convergent attribute for parameters patch instead

Hi Matt,

For my own reference (and because I'm having trouble parsing the langref) - should tail sinking of convergent calls be disabled? noduplicate calls should be fine, because noduplicate says nothing about *commoning*, just duplicating.

James

Hi James,

Yes, as I said in the initial commit description, I'm not sure myself if it's the right fix (looks like too aggressive even if it works like this). Instead, I think we might need to add a new contrainst on that intrinsic like noduplicate or convergent.

Basically, the program I'm currently trying to fix with that patch has similar instructions inside a if/else block and SinkThenElseCodeToEnd ends up by removing the else block entirely. This is because it sinks all instructions but llvm.image_sample which is basically a read texture instruction should not be moved. The one phi node limit prevented this intrinsic to be moved before, the bug was hidden.

After looking at the SimplifyCFG pass again, looks like we don't allow to fold blocks that contain noduplicate or convergent calls. Setting the attribute on llvm.image_sample (from Mesa) fixes the issue.

Maybe the issue has to be fixed in Mesa.

Okay, I think the correct fix is to avoid sinking intrinsics if the convergent attribute is set.

How about this?

For me, that sounds OK. I'd like @arsenm to confirm my understanding about the convergent attribute though.

Or more generally, avoid sinking all functions declared with the convergent attribute.

From Intrinsics.td, int_ctpop does not access memory or have any other side effects. So it should be safe to sink. Do you have the complete test case which fails. Also, please include more context with the patch e.g., (diff -U1000 HEAD~)

Hi Matt,

For my own reference (and because I'm having trouble parsing the langref) - should tail sinking of convergent calls be disabled? noduplicate calls should be fine, because noduplicate says nothing about *commoning*, just duplicating.

James

For the uses cases for convergent, such a program is broken from the beginning. If in this example ctpop was a barrier, this would hang if i1 was a divergent branch. Commoning them here would have the effect of "fixing" the program, so I suppose this should be legal.

For clarification @llvm.SI.image.sample.v2i32 is not and should not be convergent. The problematic aspects apply to the uniform sampler arguments, which D26348 is intended to fix

We could actually infer the uniformity of the branch condition in this example because the program is undefined otherwise, so I think this is fine

mehdi_amini requested changes to this revision.Mar 6 2017, 12:18 PM
mehdi_amini added a subscriber: mehdi_amini.

I'm not sure I understand how this patch makes sense as is.

As of today, you can't express this constraint in LLVM in a fine grain way. It is the responsibility of the Front-end to emit/define intrinsic with the right set of conservative flags in the absence of fine-grain ones.

I'm not sure I understand how this patch makes sense as is.

As of today, you can't express this constraint in LLVM in a fine grain way. It is the responsibility of the Front-end to emit/define intrinsic with the right set of conservative flags in the absence of fine-grain ones.

Well, it's just a workaround which doesn't fix the issue as it should be. You can just ignore the patch.

arsenm resigned from this revision.Feb 21 2019, 5:54 PM