This is an archive of the discontinued LLVM Phabricator instance.

Differential D22025

AMDGPU/SI: Do not insert EndCf in an unreachable block
ClosedPublic

Authored by cfang on Jul 5 2016, 4:45 PM.

Details

Reviewers

tstellarAMD
arsenm

Commits

rG6b49fa4ca714: AMDGPU/SI: Do not insert EndCf in an unreachable block
rL297243: AMDGPU/SI: Do not insert EndCf in an unreachable block

Summary

There may be a better solution, but it makes no sense to insert EndCf into an unreachable block when the unreachable instruction is the first instruction in that block.

Event Timeline

cfang updated this revision to Diff 62803. Jul 5 2016, 4:45 PM

cfang retitled this revision to AMDGPU/SI: Do not insert EndCf in an unreachable block.

cfang updated this object.

cfang added reviewers: arsenm, tstellarAMD.

cfang added subscribers: arsenm, llvm-commits.

Herald added a subscriber: kzhuravl. Jul 5 2016, 4:45 PM

I don't agree that it doesn't make sense to put the end.cf into an unreachable block. If it ordinarily would go into a block in that place, it makes sense. This needs IR check lines for the intrinsic insertion points.

In D22025#474703, @arsenm wrote:

I don't agree that it doesn't make sense to put the end.cf into an unreachable block. If it ordinarily would go into a block in that place, it makes sense. This needs IR checklines for the intrinsic insertion points

This needs some tests where there are instructions before the unreachable. If another instruction, like an abort, were there, the end.cf would still need to be inserted.

I think the key is that the FirstInsertionPt in the BB is an Unreachable. This should essentially guarantee that there are no other instructions before this Unreachable (other than PHIs or landing pads?).
I think we are safe not to insert the end cf here, and we may need some informative comments.

Update the test.

This patch fixes the "Instruction does not dominate all uses!" assertion that fires when we try to insert EndCf into an unreachable block whose first instruction is the unreachable. In that case we now simply don't insert EndCf.

However, if there are instructions before the unreachable in the block, we still have to insert the EndCf intrinsic, and we will still hit
the "Instruction does not dominate all uses!" issue.

arsenm added inline comments. Jul 22 2016, 10:22 AM

test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll
Lines 31–35: There should still be another copy of this function with blocks that contain code (like a volatile store) before the unreachable.

cfang added inline comments. Jul 22 2016, 10:37 AM

test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll
Lines 31–35: The issue is that for an unreachable block, if we insert EndCf, we hit the "Instruction does not dominate all uses!" assertion. The "fix" here is really just a workaround: if there is no instruction before the unreachable, we don't insert EndCf. If, as you suggest, there are instructions before the unreachable, we still have to insert EndCf, and the problem remains. This is a special case in the control flow where both the "then" and the "else" paths are unreachable. Maybe we need to insert EndCf on both paths.

arsenm added inline comments. Jul 22 2016, 11:16 AM

test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll
Lines 31–35: You check whether the instruction at the insertion point is unreachable, not whether the block ends in unreachable. That's what I want the other test for: to make sure it doesn't crash in that case. I think what you might really want to be doing is checking for no successors.

cfang added inline comments. Jul 28 2016, 4:53 PM

test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll
Lines 31–35: Can you use a simple example to show me what you want to test? I am afraid that the test you request would still fail with this patch.

arsenm added inline comments. Jul 28 2016, 4:56 PM

test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll
Lines 31–35:

	bb4:
	  store volatile i32 0, i32 addrspace(1)* undef
	  unreachable

Shouldn't this patch just be an optimization? It is not correct to do something for correctness that assumes there is only one instruction in a block ending in unreachable.

LGTM. I need this for other related fixes. This doesn't really solve the underlying issue here, though.

This revision is now accepted and ready to land. Mar 3 2017, 8:20 PM

In D22025#692198, @arsenm wrote:

LGTM, I need this for other related fixes. This doesn't really solve the underlying issue here though

I still have a new LIT failure on ret_jump.ll to fix!

Closed by commit rL297243: AMDGPU/SI: Do not insert EndCf in an unreachable block (authored by chfang). Mar 7 2017, 3:41 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                 Size
lib/Target/AMDGPU/SIAnnotateControlFlow.cpp          6 lines
test/CodeGen/AMDGPU/ret_jump.ll                      1 line
test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll    34 lines

Diff 62803

lib/Target/AMDGPU/SIAnnotateControlFlow.cpp

	[12 unchanged lines not shown]
	     std::vector<BasicBlock*> Preds;
	     for (pred_iterator PI = pred_begin(BB), PE = pred_end(BB); PI != PE; ++PI) {
	       if (std::find(Latches.begin(), Latches.end(), *PI) == Latches.end())
	         Preds.push_back(*PI);
	     }
	     BB = llvm::SplitBlockPredecessors(BB, Preds, "endcf.split", DT, LI, false);
	   }
	 
	   Value *Exec = popSaved();
	-  if (!isa<UndefValue>(Exec))
	-    CallInst::Create(EndCf, Exec, "", &*BB->getFirstInsertionPt());
	+  Instruction *FirstInsertionPt = &*BB->getFirstInsertionPt();
	+  if (!isa<UndefValue>(Exec) && !isa<UnreachableInst>(FirstInsertionPt))
	+    CallInst::Create(EndCf, Exec, "", FirstInsertionPt);
	 }
	 
	 /// \brief Annotate the control flow with intrinsics so the backend can
	 /// recognize if/then/else and loops.
	 bool SIAnnotateControlFlow::runOnFunction(Function &F) {
	 
	   DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
	   LI = &getAnalysis<LoopInfoWrapperPass>().getLoopInfo();
	[12 unchanged lines not shown]

test/CodeGen/AMDGPU/ret_jump.ll

	[9 unchanged lines not shown]
	 ; GCN: s_and_saveexec_b64 [[SAVE_EXEC:s\[[0-9]+:[0-9]+\]]], vcc
	 ; GCN-NEXT: s_xor_b64 [[XOR_EXEC:s\[[0-9]+:[0-9]+\]]], exec, [[SAVE_EXEC]]
	 ; GCN-NEXT: ; mask branch [[UNREACHABLE_BB:BB[0-9]+_[0-9]+]]
	 
	 ; GCN: [[RET_BB]]:
	 ; GCN-NEXT: ; return
	 
	 ; GCN-NEXT: [[UNREACHABLE_BB]]:
	-; GCN-NEXT: s_or_b64 exec, exec, [[XOR_EXEC]]
	 ; GCN-NEXT: .Lfunc_end0
	 define amdgpu_ps <{ i32, i32, i32, i32, i32, i32, i32, i32, i32, float, float, float, float, float, float, float, float, float, float, float, float, float, float }> @main([9 x <16 x i8>] addrspace(2)* byval, [17 x <16 x i8>] addrspace(2)* byval, [17 x <8 x i32>] addrspace(2)* byval, i32 addrspace(2)* byval, float inreg, i32 inreg, <2 x i32>, <2 x i32>, <2 x i32>, <3 x i32>, <2 x i32>, <2 x i32>, <2 x i32>, float, float, float, float, float, i32, i32, float, i32) #0 {
	 main_body:
	   %p83 = call float @llvm.SI.fs.interp(i32 1, i32 0, i32 %5, <2 x i32> %7)
	   %p87 = fmul float undef, %p83
	   %p88 = fadd float %p87, undef
	   %p93 = fadd float %p88, undef
	   %p97 = fmul float %p93, undef
	[12 unchanged lines not shown]

test/CodeGen/AMDGPU/si-annotate-cf-unreachable.ll

This file was added.

				; RUN: llc < %s -march=amdgcn -mcpu=fiji -verify-machineinstrs | FileCheck %s

				; FIXME: should emit s_endpgm
				; CHECK-LABEL: {{^}}no_endcf_in_unreachable_block:
				; CHECK: s_and_saveexec_b64
				; CHECK-NOT: s_endpgm
				; CHECK: .Lfunc_end0
				define void @no_endcf_in_unreachable_block(<4 x float> addrspace(1)* noalias nocapture readonly %arg) #0 {
				bb:
				%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
				br label %bb1

				bb1: ; preds = %bb
				%tmp2 = sext i32 %tmp to i64
				%tmp3 = getelementptr inbounds <4 x float>, <4 x float> addrspace(1)* %arg, i64 %tmp2
				%tmp4 = load <4 x float>, <4 x float> addrspace(1)* %tmp3, align 16
				br i1 undef, label %bb3, label %bb5 ; label order reversed

				bb3: ; preds = %bb1
				%tmp6 = extractelement <4 x float> %tmp4, i32 2
				%tmp7 = fcmp olt float %tmp6, 0.000000e+00
				br i1 %tmp7, label %bb4, label %bb5

				bb4: ; preds = %bb3
				unreachable

				bb5: ; preds = %bb3, %bb1
				unreachable
				}

				declare i32 @llvm.amdgcn.workitem.id.x() #1

				attributes #0 = { nounwind }
				attributes #1 = { nounwind readnone }