Details

Reviewers

• tstellarAMD
echristo
jlebar
scchan
arsenm

Commits

rG0526e7f8d907: AMDGPU: Add convergent flag to INLINEASM instruction.
rL273455: AMDGPU: Add convergent flag to INLINEASM instruction.

Diff Detail

Repository: rL LLVM

Event Timeline

wdng updated this revision to Diff 60285. Jun 9 2016, 5:17 PM

wdng retitled this revision from to inline asm - lightning not respecting convergent flag.

wdng updated this object.

wdng added reviewers: • tstellarAMD, arsenm, scchan.

wdng set the repository for this revision to rL LLVM.

wdng added a project: Restricted Project.

arsenm added inline comments. Jun 9 2016, 5:18 PM

test/CodeGen/AMDGPU/convergent_flag.ll
1 ↗	(On Diff #60285)	This is missing a run line. You should run instnamer on the test. A better name would also be convergent-inlineasm.ll
3 ↗	(On Diff #60285)	Space after ;
4 ↗	(On Diff #60285)	Unused arguments can also be removed. The align 2 should be removed.
6–13 ↗	(On Diff #60285)	All of this can be replaced with something simpler, like a single argument use

I am able to compile the convergent_flag.ll test using llc without any flags. But if I insert the following line at the top of the file, it doesn't work. So what's the correct "-march"? Any ideas? (I will add reference assembly later)

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s

The patch summary should reflect that this is a target independent change, so something like:

Target: Add convergent flag to INLINEASM instruction

test/CodeGen/AMDGPU/convergent_flag.ll
4–28 ↗	(On Diff #60285)	It would be great if you could make this test case smaller. I think you could remove most of the instructions from the entry block.

In D21214#454261, @wdng wrote:

I am able to compile the convergent_flag.ll test using llc without any flags. But if I insert the following line at the top of the file, it doesn't work. So what's the correct "-march"? Any ideas? (I will add reference assembly later)

; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=SI %s

Even if it compiles without flags, it's not running llc without the RUN line.

We usually prefer to not have the triple in the test itself and specify that in the llc run line so multiple run lines can use different ones. You need to add -mtriple=amdgcn--amdhsa for the intrinsic to work. You won't need that if you simplify the testcase to not use it.

The check prefix should also be GCN in case VI differences show up.
opt -strip -instnamer will clean up the names of the values so it will be easier to make future changes to the test.

wdng updated this object. Jun 9 2016, 10:41 PM

wdng edited edge metadata.

• tstellarAMD retitled this revision from inline asm - lightning not respecting convergent flag to Target: Add convergent flag to INLINEASM instruction. Jun 10 2016, 3:16 AM

• tstellarAMD updated this object.

• tstellarAMD added reviewers: echristo, jlebar.

jlebar added inline comments. Jun 10 2016, 9:43 AM

include/llvm/Target/Target.td
792 ↗	(On Diff #60285)	I am pretty sure this is not what you want. AIUI this marks all inline asm, on all platforms, as convergent. This will prevent certain optimizations on blocks that contain inline asm, which will be a regression on basically every platform other than GPUs.

What we do when compiling CUDA device code is, in clang, mark all calls to inline asm as convergent. Call instructions can already individually be marked as convergent or not. This is conservative: the frontend or backend could in theory analyze the inline asm and, if it doesn't contain convergent instructions, not add (or remove) the attribute.

Adding the IR attribute to the call should cover all IR optimizations. But then we still need to mark the inline asm machine instruction as convergent. We're not doing this yet in NVPTX, and it's a bug.

Currently, a machine instruction either is or isn't convergent. The way we handle this for NVPTX call instructions is we have two instructions, a convergent one and a non-convergent one, and when we lower a call, we choose one or the other. That may be tricky to do for inline asm, in which case conservatively saying that the relevant machine instruction is always convergent probably wouldn't be a big deal, at least for us.

jlebar requested changes to this revision. Jun 10 2016, 9:43 AM

jlebar edited edge metadata.

This revision now requires changes to proceed. Jun 10 2016, 9:43 AM

Modified LIT test for adding convergent flag to INLINEASM instruction based on Matt's and Tom's comments.

Herald added a subscriber: mehdi_amini. Jun 10 2016, 9:53 AM

jlebar added inline comments. Jun 10 2016, 9:59 AM

test/CodeGen/AMDGPU/convergent-inlineasm.ll
1	There is no need for a check-prefix. You can get rid of that and s/GCN/CHECK/ everywhere in this test. I would also suggest changing this test so that the CHECKs immediately precede the relevant lines of IR. But more to the point, does this test fail without your change and pass with your change? I don't see how this is checking for convergence at all. I would recommend trying to write the minimal test case that checks for the thing you're changing -- this does not seem to be it, to me. (In fact I'm surprised this works at all, since the IR doesn't define the attributes #1 and #3...)

• tstellarAMD added inline comments. Jun 10 2016, 10:05 AM

include/llvm/Target/Target.td
792 ↗	(On Diff #60368)	We are also marking calls to inline assembly as convergent, but the problem with inline assembly is that once the IR gets converted to a MachineInstr, the information about convergence is lost. Because INLINEASM is a target independent instruction, we don't really have the same flexibility when lowering a regular call, because there is not a target equivalent for INLINEASM. I think INLINEASM also had a similar issue with the hasSideEffects flag, and the solution was to encode that information as one of the operands to INLINEASM, maybe we should encode convergence information in the same way. Does that sound like a good solution?

I think INLINEASM also had a similar issue with the hasSideEffects flag, and the solution was to encode that information as one of the operands to INLINEASM, maybe we should encode convergence information in the same way. Does that sound like a good solution?

Something like the mayLoad property on MachineInstr? I think that would work.
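To make the proposal above concrete, here is a minimal standalone sketch of packing per-call-site properties into a single flags immediate. The enumerator values are the ones shown for InlineAsm.h in this revision's diff; the `AsmCallSite` struct and the `encodeExtraInfo` / `isConvergentAsm` helpers are hypothetical names invented for illustration and are not LLVM APIs.

```cpp
#include <cstdint>

namespace extra_info_sketch {

// Bit values as shown for InlineAsm.h in this revision's diff.
enum : uint32_t {
  Extra_HasSideEffects = 1,
  Extra_IsAlignStack = 2,
  Extra_AsmDialect = 4,
  Extra_MayLoad = 8,
  Extra_MayStore = 16,
  Extra_IsConvergent = 32,
};

// Hypothetical stand-in for the properties of one inline asm call site.
struct AsmCallSite {
  bool HasSideEffects = false;
  bool IsAlignStack = false;
  bool IsConvergent = false;
  unsigned Dialect = 0; // 0 = AT&T, 1 = Intel
};

// Pack the call-site properties into one immediate, so the information
// survives on the instruction itself after lowering.
inline uint32_t encodeExtraInfo(const AsmCallSite &CS) {
  uint32_t ExtraInfo = 0;
  if (CS.HasSideEffects)
    ExtraInfo |= Extra_HasSideEffects;
  if (CS.IsAlignStack)
    ExtraInfo |= Extra_IsAlignStack;
  if (CS.IsConvergent)
    ExtraInfo |= Extra_IsConvergent;
  // The dialect occupies the single bit whose weight is Extra_AsmDialect.
  ExtraInfo |= CS.Dialect * Extra_AsmDialect;
  return ExtraInfo;
}

// Later consumers only need to test one bit.
inline bool isConvergentAsm(uint32_t ExtraInfo) {
  return (ExtraInfo & Extra_IsConvergent) != 0;
}

} // namespace extra_info_sketch
```

The point of this shape is that convergence becomes a property of each individual inline asm instruction rather than of the generic INLINEASM opcode, so targets that never mark their asm calls convergent are unaffected.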

wdng added inline comments. Jun 10 2016, 11:36 AM

test/CodeGen/AMDGPU/convergent-inlineasm.ll
1	If there is a convergence flag, we need to check that the inline assembly is still in the entry block:

    ; BB#0:
    v_mov_b32_e32 v1, 1
    ;;#ASMSTART
    v_cmp_ne_i32_e64 s[2:3], 0, v1
    ;;#ASMEND
    v_cmp_eq_i32_e32 vcc, 8, v0
    s_and_saveexec_b64 s[0:1], vcc
    s_xor_b64 s[0:1], exec, s[0:1]
    ; BB#1:
    . . .
    ; BB0_2:
    . . .

That's why GCN CHECK is used in the reference assembly.

jlebar added inline comments. Jun 10 2016, 11:42 AM

test/CodeGen/AMDGPU/convergent-inlineasm.ll
1	OK, thanks for the explanation. As described, this is an end-to-end test, of the sort that we usually do not write in llvm. (Or at least, we do not rely on them as the exclusive means for checking a patch's correctness.) The problem is, what you've described is not strictly about testing the convergence of an instruction, but rather about checking that convergence prevents a certain optimization that would otherwise run. But if we change llvm so that this optimization no longer runs (seems reasonable, particularly since we're not running llc -O2 or anything), then your test will always pass, even without the fix you're making in this patch. At the very least, there needs to be a comment explaining what the test is checking. But again, it's fragile as written, and that imposes a cost on all maintainers. So if there is a simpler way to check that your change does the right thing, I would very much prefer that. I guess the good news is, to do what Tom suggested in his last comment, you're probably going to want a different set of tests anyway. :)

Tried a different implementation based on Tom's comments.

Modified convergent flag LIT test + code changes based on Tom's suggestion.

What method are you using to upload this patch? Phabricator is only showing me the changes since the previous revision and not the changes when compared to trunk. This doesn't seem to happen for other patches.

include/llvm/IR/InlineAsm.h
226	The indentation looks wrong here.
lib/CodeGen/MachineInstr.cpp
1757–1758	Indentation again.
lib/CodeGen/MachineVerifier.cpp
818–820	This line is more than 80 characters.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6738	Indentation.
test/CodeGen/AMDGPU/convergent-inlineasm.ll
15	There should be an explicit name on this block. I would recommend running this whole test through opt -metarenamer.
25	This can be dropped.

• tstellarAMD added inline comments. Jun 20 2016, 9:33 AM

test/CodeGen/AMDGPU/convergent-inlineasm.ll
25	I mean you can drop the #0 attributes. I think the rest are OK, but I can't see the whole patch.

Changes based on Tom's comments.

arsenm added inline comments. Jun 20 2016, 11:01 AM

test/CodeGen/AMDGPU/convergent-inlineasm.ll
5	function name should be changed to something about what it is testing

I can see the whole diff now, thanks.

test/CodeGen/AMDGPU/convergent-inlineasm.ll
1	This test is missing CHECK lines.
7	It's a little strange to have the call instruction have a different attribute set than the declaration. I would change this to use the same attributes as the declaration.

arsenm added inline comments. Jun 20 2016, 11:04 AM

test/CodeGen/AMDGPU/convergent-inlineasm.ll
7	The call site attributes can be removed here

Don't we also need to change MachineInstr::isConvergent() to make it check the ExtraInfo (like MachineInstr::mayLoad/mayStore)? If possible please add a test that fails without that change and passes with that change.
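A rough sketch of the query-side change being asked for, written against a toy instruction type rather than the real MachineInstr. `ToyInstr` and its fields are hypothetical; only the `Extra_IsConvergent = 32` value comes from this revision's diff, and this should not be read as the code that was eventually committed.

```cpp
#include <cstdint>

namespace query_sketch {

constexpr uint32_t Extra_IsConvergent = 32; // value from this revision's diff

// Hypothetical, heavily simplified machine instruction.
struct ToyInstr {
  bool IsInlineAsm = false;        // true for the generic inline asm opcode
  bool OpcodeIsConvergent = false; // static, per-opcode property
  uint32_t ExtraInfo = 0;          // flags immediate; only used for inline asm
};

// For inline asm, consult the per-instruction flag first; for everything
// else (or if the flag is clear), fall back to the per-opcode property.
inline bool isConvergent(const ToyInstr &MI) {
  if (MI.IsInlineAsm && (MI.ExtraInfo & Extra_IsConvergent) != 0)
    return true;
  return MI.OpcodeIsConvergent;
}

} // namespace query_sketch
```

Without a check of this kind, setting the bit at lowering time would have no effect on passes that only ask the instruction whether it is convergent, which is why a test that fails without the query-side change is useful.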

Changes based on Matt and Tom's comments

In D21214#462483, @jlebar wrote:

Don't we also need to change MachineInstr::isConvergent() to make it check the ExtraInfo (like MachineInstr::mayLoad/mayStore)? If possible please add a test that fails without that change and passes with that change.

"that changes" --> Could you please let me know what changes you are referring to?

In D21214#462755, @wdng wrote:

In D21214#462483, @jlebar wrote:

Don't we also need to change MachineInstr::isConvergent() to make it check the ExtraInfo (like MachineInstr::mayLoad/mayStore)? If possible please add a test that fails without that change and passes with that change.

"that changes" --> Could you please let me know what changes you are referring to?

"That change" referred to the change you've now made to MachineInstr::isConvergent().

Add fail test based on Justin's comments.

In D21214#463301, @wdng wrote:

Add fail test based on Justin's comments.

I think you misunderstood what Justin was asking for. The original test convergent-inlineasm.ll is a test case that reproduces the bug we are trying to fix. That test case will fail without your patch, but it should pass with your patch. So you don't need to add any additional test cases.

The reason Justin made those comments was because your original patch didn't actually fix the bug, so it appeared as though the test case you had added would pass even without your fix.

So, you don't need to add any more tests. Please drop the nonconvergent-inlineasm.ll because it's not really testing anything.

Drop the nonconvergent-inlineasm.ll

In D21214#463362, @tstellarAMD wrote:

In D21214#463301, @wdng wrote:

Add fail test based on Justin's comments.

I think you misunderstood what Justin was asking for. The original test convergent-inlineasm.ll is a test case that reproduces the bug we are trying to fix. That test case will fail without your patch, but it should pass with your patch. So you don't need to add any additional test cases.

The reason Justin made those comments was because your original patch didn't actually fix the bug, so it appeared as though the test case you had added would pass even without your fix.

So, you don't need to add any more tests. Please drop the nonconvergent-inlineasm.ll because it's not really testing anything.

A test with non-convergent asm in the entry block that still sinks out of the entry block would show that the original patch, which always marked INLINEASM as convergent, would break that case. It doesn't need to be in its own file, though.
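The pair of tests discussed here (convergent asm must stay in the entry block, non-convergent asm may still sink) can be pictured with a toy legality check. This is an illustrative model only: `ToyInstr` and `mayMoveIntoSuccessor` are hypothetical names, and a real sinking pass considers many more conditions than these two.

```cpp
namespace sink_sketch {

// Hypothetical instruction summary for a toy sinking pass.
struct ToyInstr {
  bool IsConvergent = false;
  bool HasUnmodeledSideEffects = false;
};

// Moving an instruction into a conditionally executed successor makes it
// control-dependent on the branch condition. That is exactly what a
// convergent operation must not become, so the toy pass refuses.
inline bool mayMoveIntoSuccessor(const ToyInstr &MI) {
  if (MI.IsConvergent)
    return false;
  if (MI.HasUnmodeledSideEffects)
    return false;
  return true;
}

} // namespace sink_sketch
```

Under this model, an approach that treats every inline asm as convergent would make the second case return false as well, which is the regression a non-convergent test is meant to catch.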

Added a LIT test based on Matt's comment.

Added a LIT test based on Matt's comments.
Dropped an unrelated test for the previous commit.

LGTM with trailing whitespace fixed

test/CodeGen/AMDGPU/convergent-inlineasm.ll
30–37	I think the test has some trailing whitespace

Closed by commit rL273455: AMDGPU: Add convergent flag to INLINEASM instruction. (authored by wdng). Jun 22 2016, 11:58 AM

This revision was automatically updated to reflect the committed changes.

Diff 61270

include/llvm/IR/InlineAsm.h

Context not available.
  Extra_AsmDialect = 4,
  Extra_MayLoad = 8,
  Extra_MayStore = 16,
+ Extra_IsConvergent = 32,

  // Inline asm operands map to multiple SDNode / MachineInstr operands.
  // The first operand is an immediate describing the asm operand, the low
Context not available.

lib/CodeGen/MachineInstr.cpp

Context not available.
    OS << " [mayload]";
  if (ExtraInfo & InlineAsm::Extra_MayStore)
    OS << " [maystore]";
+ if (ExtraInfo & InlineAsm::Extra_IsConvergent)
+   OS << " [isconvergent]";
  if (ExtraInfo & InlineAsm::Extra_IsAlignStack)
    OS << " [alignstack]";
  if (getInlineAsmDialect() == InlineAsm::AD_ATT)
Context not available.

lib/CodeGen/MachineVerifier.cpp

Context not available.
  if (!MI->getOperand(1).isImm())
    report("Asm flags must be an immediate", MI);
  // Allowed flags are Extra_HasSideEffects = 1, Extra_IsAlignStack = 2,
- // Extra_AsmDialect = 4, Extra_MayLoad = 8, and Extra_MayStore = 16.
- if (!isUInt<5>(MI->getOperand(1).getImm()))
+ // Extra_AsmDialect = 4, Extra_MayLoad = 8, and Extra_MayStore = 16,
+ // and Extra_IsConvergent = 32.
+ if (!isUInt<6>(MI->getOperand(1).getImm()))
    report("Unknown asm flags", &MI->getOperand(1), 1);

  static_assert(InlineAsm::MIOp_FirstOperand == 2, "Asm format changed");
Context not available.
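The verifier change from `isUInt<5>` to `isUInt<6>` follows from the arithmetic of the flag bits: the five original flags fit in the values 0 to 31, and the new flag with value 32 needs a sixth bit. The sketch below uses a hypothetical helper, `fitsInUnsignedBits`, as a simplified stand-in for an N-bit unsigned range check; it is not LLVM's implementation.

```cpp
#include <cstdint>

namespace verifier_sketch {

// Simplified N-bit unsigned range check: true if X is in [0, 2^N).
template <unsigned N> constexpr bool fitsInUnsignedBits(uint64_t X) {
  static_assert(N > 0 && N < 64, "sketch only handles 1..63 bits");
  return X < (uint64_t(1) << N);
}

// All five pre-existing flag bits set: 1 | 2 | 4 | 8 | 16 == 31.
constexpr uint64_t AllOldFlags = 1 | 2 | 4 | 8 | 16;
// The flag added in this revision.
constexpr uint64_t Extra_IsConvergent = 32;

// With a 5-bit check, the new flag alone would be rejected as unknown.
static_assert(fitsInUnsignedBits<5>(AllOldFlags), "old flags fit in 5 bits");
static_assert(!fitsInUnsignedBits<5>(Extra_IsConvergent), "32 needs 6 bits");
// With a 6-bit check, every combination of old and new flags is accepted.
static_assert(fitsInUnsignedBits<6>(AllOldFlags | Extra_IsConvergent),
              "all six flags fit in 6 bits");

} // namespace verifier_sketch
```

If the check had been left at five bits, any inline asm instruction carrying the convergent flag would trip the "Unknown asm flags" report when the machine verifier runs.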

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

Context not available.
    ExtraInfo |= InlineAsm::Extra_HasSideEffects;
  if (IA->isAlignStack())
    ExtraInfo |= InlineAsm::Extra_IsAlignStack;
+ if (CS.isConvergent())
+   ExtraInfo |= InlineAsm::Extra_IsConvergent;
  // Set the asm dialect.
  ExtraInfo |= IA->getDialect() * InlineAsm::Extra_AsmDialect;

Context not available.

test/CodeGen/AMDGPU/convergent-inlineasm.ll

This file was added.

; RUN: llc -mtriple=amdgcn--amdhsa -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

declare i32 @llvm.amdgcn.workitem.id.x() #0

define void @barney(i64 addrspace(1)* nocapture %arg) {
bb:
  %tmp = call i32 @llvm.amdgcn.workitem.id.x() #1
  %tmp1 = tail call i64 asm "v_cmp_ne_i32_e64 $0, 0, $1", "=s,v"(i32 1) #2
  %tmp2 = icmp eq i32 %tmp, 8
  br i1 %tmp2, label %bb3, label %bb5

bb3: ; preds = %bb
  %tmp4 = getelementptr i64, i64 addrspace(1)* %arg, i32 %tmp
  store i64 %tmp1, i64 addrspace(1)* %arg, align 8
  br label %bb5

bb5: ; preds = %bb3, %bb
  ret void
}

attributes #0 = { nounwind readnone }
attributes #1 = { readnone }
attributes #2 = { convergent nounwind readnone }

This is an archive of the discontinued LLVM Phabricator instance.

Target: Add convergent flag to INLINEASM instruction
ClosedPublic
