This is an archive of the discontinued LLVM Phabricator instance.

Target: Add convergent flag to INLINEASM instruction
ClosedPublic

Authored by wdng on Jun 9 2016, 5:17 PM.

Diff Detail

Repository
rL LLVM

Event Timeline

wdng updated this revision to Diff 60285.Jun 9 2016, 5:17 PM
wdng retitled this revision from to inline asm - lightning not respecting convergent flag.
wdng updated this object.
wdng added reviewers: tstellarAMD, arsenm, scchan.
wdng set the repository for this revision to rL LLVM.
wdng added a project: Restricted Project.
arsenm added inline comments.Jun 9 2016, 5:18 PM
test/CodeGen/AMDGPU/convergent_flag.ll
1 ↗(On Diff #60285)

This is missing a run line.

You should run instnamer on the test. A better name would also be convergent-inlineasm.ll

3 ↗(On Diff #60285)

Space after ;

4 ↗(On Diff #60285)

Unused arguments cal also be removed. The align 2 should be removed.

6–13 ↗(On Diff #60285)

All of this can be replaced with something simpler, like a single argument use

wdng added a comment.Jun 9 2016, 5:21 PM

I am able to compile the convergent_flag.ll text using llc without any flags. Say if I insert the following line onto the top of the file, it doesn't work. So what's the correct "-march"? Any ideas? (I will add reference assembly later)

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s

The patch summary should reflect that this is a target independent change, so something like:

Target: Add convergent flag to INLINEASM instruction

test/CodeGen/AMDGPU/convergent_flag.ll
4–28 ↗(On Diff #60285)

It would be great if you could make this test case smaller. I think you could remove most of the instructions from the entry block.

arsenm edited edge metadata.Jun 9 2016, 5:24 PM
In D21214#454261, @wdng wrote:

I am able to compile the convergent_flag.ll text using llc without any flags. Say if I insert the following line onto the top of the file, it doesn't work. So what's the correct "-march"? Any ideas? (I will add reference assembly later)

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s

Even if it compiles without flags, it's not running llc without the RUN line.

We usually prefer to not have the triple in the test itself and specify that in the llc run line so multiple run lines can use different ones. You need to add -mtriple=amdgcn--amdhsa for the intrinsic to work. You won't need that if you simplify the testcase to not use it.

The check prefix should also be GCN in case VI differences show up.
opt -strip -instnamer will clean up the names of the values so it will be easier to make future changes to the test.

wdng updated this object.Jun 9 2016, 10:41 PM
wdng edited edge metadata.
tstellarAMD retitled this revision from inline asm - lightning not respecting convergent flag to Target: Add convergent flag to INLINEASM instruction.Jun 10 2016, 3:16 AM
tstellarAMD updated this object.
tstellarAMD added reviewers: echristo, jlebar.
jlebar added inline comments.Jun 10 2016, 9:43 AM
include/llvm/Target/Target.td
792 ↗(On Diff #60285)

I am pretty sure this is not what you want.

AIUI this makes *all* inline asm, on all platforms, as convergent. This will prevent certain optimizations on blocks that contain inline asm, which will be a regression on basically every platform other than GPUs.

What we do when compiling CUDA device code is, in clang, mark all *calls* to inline asm as convergent. Call instructions can already individually be marked as convergent or not. This is conservative: The frontend or backend could in theory analyze the inline asm and, if it doesn't contain convergent instructions, not add (or remove) the attribute.

Adding the IR attribute to the call should cover all IR optimizations. But then we still need to mark the inline asm machine instruction as convergent. We're not doing this yet in NVPTX, and it's a bug.

Currently, a machine instruction either is or isn't convergent. The way we handle this for NVPTX call instructions is we have two instructions, a convergent one and a non-convergent one, and when we lower a call, we choose the one or the other.

That may be tricky to do for inline asm, in which case conservatively saying that the relevant machine instruction is always convergent probably wouldn't be a big deal, at least for us.

jlebar requested changes to this revision.Jun 10 2016, 9:43 AM
jlebar edited edge metadata.
This revision now requires changes to proceed.Jun 10 2016, 9:43 AM
wdng updated this revision to Diff 60368.Jun 10 2016, 9:53 AM
wdng edited edge metadata.
wdng marked 5 inline comments as done.

Modified LIT test for adding convergent flag to INLINEASM instruction based on Matt's and Tom's comments.

jlebar added inline comments.Jun 10 2016, 9:59 AM
test/CodeGen/AMDGPU/convergent-inlineasm.ll
1

There is no need for a check-prefix. You can get rid of that and s/GCN/CHECK/ everywhere in this test.

I would also suggest changing this test so that the CHECKs immediately precede the relevant lines of IR.

But more to the point, does this test fail without your change and pass with your change? I don't see how this is checking for convergence at all. I would recommend trying to write the minimal test case that checks for the thing you're changing -- this does not seem to be it, to me. (In fact I'm surprised this works at all, since the IR doesn't define the attributes #1 and #3...)

include/llvm/Target/Target.td
792 ↗(On Diff #60368)

We are also marking calls to inline assembly as convergent, but the problem with inline assembly is that once the IR gets converted to a MachineInstr, the information about convergence is lost.

Because INLINEASM is a target independent instruction, we don't really have the same flexibility when lowering a regular call, because there is not a target equivalent for INLINEASM.

I think INLINEASM also had a similar issue with the hasSideEffects flag, and the solution was to encode that information as one of the operands to INLINEASM, maybe we should encode convergence information in the same way. Does that sound like a good solution?

I think INLINEASM also had a similar issue with the hasSideEffects flag, and the solution was to encode that information as one of the operands to INLINEASM, maybe we should encode convergence information in the same way. Does that sound like a good solution?

Something like the mayLoad property on MachineInstr? I think that would work.

wdng added inline comments.Jun 10 2016, 11:36 AM
test/CodeGen/AMDGPU/convergent-inlineasm.ll
1

If there is a convergence flag, we need to check that the inline assembly is still in the entry block:

; BB#0:

v_mov_b32_e32 v1, 1
;;#ASMSTART
v_cmp_ne_i32_e64 s[2:3], 0, v1
;;#ASMEND
v_cmp_eq_i32_e32 vcc, 8, v0
s_and_saveexec_b64 s[0:1], vcc 
s_xor_b64 s[0:1], exec, s[0:1]

; BB#1:

. . .

; BB0_2:

. . .

That's why GCN CHECK is used in the reference assembly.

jlebar added inline comments.Jun 10 2016, 11:42 AM
test/CodeGen/AMDGPU/convergent-inlineasm.ll
1

OK, thanks for the explanation. As described, this is an end-to-end test, of the sort that we usually do not write in llvm. (Or at least, we do not rely on them as the exclusive means for checking a patch's correctness.)

The problem is, what you've described is not strictly about testing the convergence of an instruction, but rather about checking that convergence prevents a certain optimization that would otherwise run.

But if we change llvm so that this optimization no longer runs (seems reasonable, particularly since we're not running llc -O2 or anything), then your test will always pass, even without the fix you're making in this patch.

At the very least, there needs to be a comment explaining what the test is checking. But again, it's fragile as written, and that imposes a cost on all maintainers. So if there is a simpler way to check that your change does the right thing, I would very much prefer that.

I guess the good news is, to do what Tom suggested in his last comment, you're probably going to want a different set of tests anyway. :)

wdng updated this revision to Diff 61256.Jun 20 2016, 9:19 AM
wdng edited edge metadata.

Tried a different implementation based on Tom's comments.

wdng updated this revision to Diff 61257.Jun 20 2016, 9:22 AM

Modified convergent flag LIT test + code changes based on Tom' suggestion.

What method are you using to upload this patch? Phabricator is only showing me the changes since the previous revision and not the changes when compared to trunk. This doesn't seem to happen for other patches.

include/llvm/IR/InlineAsm.h
226

The indentation looks wrong here.

lib/CodeGen/MachineInstr.cpp
1757–1758

Indentation again.

lib/CodeGen/MachineVerifier.cpp
818

This line is more than 80 characters.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6738

Indentation.

test/CodeGen/AMDGPU/convergent-inlineasm.ll
15

There should be an explicit name on this block. I would recommend running this whole test through opt -metarenamer.

25

This can be dropped.

test/CodeGen/AMDGPU/convergent-inlineasm.ll
25

I mean you can drop the #0 attributes. I think the rest are OK, but I can't see the whole patch.

wdng updated this revision to Diff 61270.Jun 20 2016, 10:58 AM
wdng marked 7 inline comments as done.

Changes based on Tom's comments.

arsenm added inline comments.Jun 20 2016, 11:01 AM
test/CodeGen/AMDGPU/convergent-inlineasm.ll
5

function name should be changed to something about what it is testing

I can see the whole diff now, thanks.

test/CodeGen/AMDGPU/convergent-inlineasm.ll
1

This test is missing CHECK lines.

7

It's a little strange to have the call instruction have a different attribute set than the declaration. I would change this to use the same attributes as the declaration.

arsenm added inline comments.Jun 20 2016, 11:04 AM
test/CodeGen/AMDGPU/convergent-inlineasm.ll
7

The call site attributes can be removed here

Don't we also need to change MachineInstr::isConvergent() to make it check the ExtraInfo (like MachineInstr::mayLoad/mayStore)? If possible please add a test that fails without that change and passes with that change.

wdng updated this revision to Diff 61309.Jun 20 2016, 2:29 PM
wdng marked 4 inline comments as done.

Changes based on Matt and Tom's comments

wdng added a comment.Jun 20 2016, 3:53 PM

Don't we also need to change MachineInstr::isConvergent() to make it check the ExtraInfo (like MachineInstr::mayLoad/mayStore)? If possible please add a test that fails without that change and passes with that change.

"that changes" --> Could you please let me know what changes you are referring to?

In D21214#462755, @wdng wrote:

Don't we also need to change MachineInstr::isConvergent() to make it check the ExtraInfo (like MachineInstr::mayLoad/mayStore)? If possible please add a test that fails without that change and passes with that change.

"that changes" --> Could you please let me know what changes you are referring to?

"That change" referred to the change you've now made to MachineInstr::isConvergent().

wdng updated this revision to Diff 61378.Jun 21 2016, 8:24 AM

Add fail test based on Justin's comments.

In D21214#463301, @wdng wrote:

Add fail test based on Justin's comments.

I think you misunderstood what Justin was asking for. The original test convergent-inlineasm.ll is a test case that reproduces the bug we are trying to fix. That test case will fail without your patch, but it should pass with your patch. So you don't need to add any additional test cases.

The reason Justin made those comments was because your original patch didn't actually fix the bug, so it appeared as though the test case you had added would pass even without your fix.

So, you don't need to add any more tests. Please drop the nonconvergent-inlineasm.ll because it's not really testing anything.

wdng updated this revision to Diff 61399.Jun 21 2016, 10:09 AM

Drop the nonconvergent-inlineasm.ll

In D21214#463301, @wdng wrote:

Add fail test based on Justin's comments.

I think you misunderstood what Justin was asking for. The original test convergent-inlineasm.ll is a test case that reproduces the bug we are trying to fix. That test case will fail without your patch, but it should pass with your patch. So you don't need to add any additional test cases.

The reason Justin made those comments was because your original patch didn't actually fix the bug, so it appeared as though the test case you had added would pass even without your fix.

So, you don't need to add any more tests. Please drop the nonconvergent-inlineasm.ll because it's not really testing anything.

A test with nonconvergent asm in the entry block that still sinks out of the entry block would show the original patch which always is isconvergent on inlineasm would break with it. It doesn't need to be in its own file though

wdng updated this revision to Diff 61573.Jun 22 2016, 10:53 AM

Added a LIT test based on Matt's comment.

wdng updated this revision to Diff 61576.Jun 22 2016, 10:58 AM

Added a LIT test based on Matt's comments.
Dropped an unrelated test for the previous commit.

arsenm accepted this revision.Jun 22 2016, 11:10 AM
arsenm edited edge metadata.

LGTM with trailing whitespace fixed

test/CodeGen/AMDGPU/convergent-inlineasm.ll
29–36

I think the test has some trailing whitespace

This revision was automatically updated to reflect the committed changes.