Diff 486325

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp

[100 lines not shown; inside: void AMDGPUUnifyDivergentExitNodes::getAnalysisUsage(AnalysisUsage &AU) const{]

// No divergent values are changed, only blocks and branch edges.		// No divergent values are changed, only blocks and branch edges.
AU.addPreserved<LegacyDivergenceAnalysis>();		AU.addPreserved<LegacyDivergenceAnalysis>();

// We preserve the non-critical-edgeness property		// We preserve the non-critical-edgeness property
AU.addPreservedID(BreakCriticalEdgesID);		AU.addPreservedID(BreakCriticalEdgesID);

// This is a cluster of orthogonal Transforms		// This is a cluster of orthogonal Transforms
AU.addPreservedID(LowerSwitchID);		AU.addPreservedID(LowerSwitchID);
		arsenm: We should have a required LowerSwitchID too.
		gandhi21299 (author): I will have a separate patch for that; it seems to cause difficulties when the pass manager schedules UnifyDivergentExitNodes.
		ruiling: For function pass dependencies and pass ordering, I still prefer that they be managed by the compiler developer. If I remember correctly, the new pass manager does not support dependencies between function passes?
		arsenm: The important part is verification. We shouldn't have arbitrary pass contracts that are not enforced by a verifier.
		ruiling: I agree that pass contracts and assumptions should be enforced by verification. For this specific issue, can we verify within this pass that no terminator is a SwitchInst?
		arsenm: That's what I was asking for with switch handling (we should also worry about indirectbr, callbr, and invoke).
FunctionPass::getAnalysisUsage(AU);		FunctionPass::getAnalysisUsage(AU);

AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
}		}
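
As a rough illustration of the verification idea raised in the inline thread on LowerSwitchID, the check could look like the sketch below. It is a self-contained toy model, not LLVM code: `TermKind`, `ToyBlock`, `findUnsupportedTerminator`, and `verifyTerminators` are hypothetical names, and the set of accepted terminators (return, unreachable, branch) is an assumption made for the example rather than something this diff states.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Toy stand-ins for terminator kinds; these are NOT LLVM types.
enum class TermKind { Ret, Unreachable, Br, Switch, IndirectBr, CallBr, Invoke };

// Hypothetical minimal block: a name plus the kind of its terminator.
struct ToyBlock {
  std::string Name;
  TermKind Term;
};

// Assumed contract for the example: only return, unreachable, and branch
// terminators may reach the pass.
inline bool isSupportedTerminator(TermKind K) {
  return K == TermKind::Ret || K == TermKind::Unreachable ||
         K == TermKind::Br;
}

// Returns the name of the first block whose terminator breaks the contract,
// or std::nullopt when every block is acceptable.
inline std::optional<std::string>
findUnsupportedTerminator(const std::vector<ToyBlock> &Blocks) {
  for (const ToyBlock &BB : Blocks)
    if (!isSupportedTerminator(BB.Term))
      return BB.Name;
  return std::nullopt;
}

// Assertion-style check, in the spirit of verifying the contract inside the
// pass itself instead of relying on pass ordering alone.
inline void verifyTerminators(const std::vector<ToyBlock> &Blocks) {
  assert(!findUnsupportedTerminator(Blocks) &&
         "unsupported terminator reached the pass");
}
```

In the real pass an equivalent check would presumably test each block's terminator with `isa<SwitchInst>` and the like; where that check should live is what the thread leaves open.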

/// \returns true if \p BB is reachable through only uniform branches.		/// \returns true if \p BB is reachable through only uniform branches.
/// XXX - Is there a more efficient way to find this?		/// XXX - Is there a more efficient way to find this?
static bool isUniformlyReached(const LegacyDivergenceAnalysis &DA,		static bool isUniformlyReached(const LegacyDivergenceAnalysis &DA,
[64 lines not shown]
}		}

bool AMDGPUUnifyDivergentExitNodes::runOnFunction(Function &F) {		bool AMDGPUUnifyDivergentExitNodes::runOnFunction(Function &F) {
DominatorTree *DT = nullptr;		DominatorTree *DT = nullptr;
if (RequireAndPreserveDomTree)		if (RequireAndPreserveDomTree)
DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();		DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();

auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();		auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
		if (PDT.root_size() == 0 ||
// If there's only one exit, we don't need to do anything.		(PDT.root_size() == 1 &&
if (PDT.root_size() <= 1)		!isa<BranchInst>(PDT.getRoot()->getTerminator())))
		arsenm: What about switches?
return false;		return false;

LegacyDivergenceAnalysis &DA = getAnalysis<LegacyDivergenceAnalysis>();		LegacyDivergenceAnalysis &DA = getAnalysis<LegacyDivergenceAnalysis>();
TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);		TTI = &getAnalysis<TargetTransformInfoWrapperPass>().getTTI(F);

// Loop over all of the blocks in a function, tracking all of the blocks that		// Loop over all of the blocks in a function, tracking all of the blocks that
// return.		// return.
SmallVector<BasicBlock *, 4> ReturningBlocks;		SmallVector<BasicBlock *, 4> ReturningBlocks;
[129 lines not shown]
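
The changed early exit in runOnFunction can be modeled in isolation. The sketch below is a toy model, not LLVM code: `RootTerminator`, `skipsBefore`, and `skipsAfter` are hypothetical names, and the model mirrors only the condition visible in this hunk. Before the change the pass returned early whenever the post-dominator tree had at most one root; after it, a single root whose terminator is a branch no longer triggers the early return.

```cpp
#include <cstddef>

// Toy stand-in for the kind of terminator found in the single root block.
enum class RootTerminator { Return, Unreachable, Branch };

// Old condition: nothing to do when there is at most one root.
inline bool skipsBefore(std::size_t NumRoots) { return NumRoots <= 1; }

// New condition: still skip with no roots, but with exactly one root skip
// only when that root does not end in a branch.
inline bool skipsAfter(std::size_t NumRoots, RootTerminator T) {
  return NumRoots == 0 || (NumRoots == 1 && T != RootTerminator::Branch);
}
```

Under this model the two checks disagree on exactly one input, a single root ending in a branch. That seems consistent with the %DummyReturnBlock that now appears for the infinite-loop test in branch-relaxation.ll, though the diff itself does not spell out the motivation.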

llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll

[561 lines not shown]
	; GFX908-NEXT: v_readfirstlane_b32 s5, v16			; GFX908-NEXT: v_readfirstlane_b32 s5, v16
	; GFX908-NEXT: s_and_b32 s5, 0xffff, s5			; GFX908-NEXT: s_and_b32 s5, 0xffff, s5
	; GFX908-NEXT: s_mul_i32 s1, s1, s5			; GFX908-NEXT: s_mul_i32 s1, s1, s5
	; GFX908-NEXT: s_mul_hi_u32 s9, s0, s5			; GFX908-NEXT: s_mul_hi_u32 s9, s0, s5
	; GFX908-NEXT: s_mul_i32 s0, s0, s5			; GFX908-NEXT: s_mul_i32 s0, s0, s5
	; GFX908-NEXT: s_add_i32 s1, s9, s1			; GFX908-NEXT: s_add_i32 s1, s9, s1
	; GFX908-NEXT: s_lshl_b64 s[0:1], s[0:1], 5			; GFX908-NEXT: s_lshl_b64 s[0:1], s[0:1], 5
	; GFX908-NEXT: s_branch .LBB3_2			; GFX908-NEXT: s_branch .LBB3_2
	; GFX908-NEXT: .LBB3_1: ; %bb12			; GFX908-NEXT: .LBB3_1: ; %Flow20
	; GFX908-NEXT: ; in Loop: Header=BB3_2 Depth=1			; GFX908-NEXT: ; in Loop: Header=BB3_2 Depth=1
	; GFX908-NEXT: s_add_u32 s6, s6, s4			; GFX908-NEXT: s_andn2_b64 vcc, exec, s[14:15]
	; GFX908-NEXT: s_addc_u32 s7, s7, 0			; GFX908-NEXT: s_cbranch_vccz .LBB3_12
	; GFX908-NEXT: s_add_u32 s10, s10, s12
	; GFX908-NEXT: s_addc_u32 s11, s11, s13
	; GFX908-NEXT: .LBB3_2: ; %bb9			; GFX908-NEXT: .LBB3_2: ; %bb9
	; GFX908-NEXT: ; =>This Loop Header: Depth=1			; GFX908-NEXT: ; =>This Loop Header: Depth=1
	; GFX908-NEXT: ; Child Loop BB3_5 Depth 2			; GFX908-NEXT: ; Child Loop BB3_5 Depth 2
	; GFX908-NEXT: s_cbranch_scc0 .LBB3_1			; GFX908-NEXT: s_mov_b64 s[16:17], -1
				; GFX908-NEXT: s_cbranch_scc0 .LBB3_10
	; GFX908-NEXT: ; %bb.3: ; %bb14			; GFX908-NEXT: ; %bb.3: ; %bb14
	; GFX908-NEXT: ; in Loop: Header=BB3_2 Depth=1			; GFX908-NEXT: ; in Loop: Header=BB3_2 Depth=1
	; GFX908-NEXT: global_load_dwordx2 v[2:3], v[0:1], off			; GFX908-NEXT: global_load_dwordx2 v[2:3], v[0:1], off
	; GFX908-NEXT: s_mov_b32 s9, s8			; GFX908-NEXT: s_mov_b32 s9, s8
	; GFX908-NEXT: v_mov_b32_e32 v4, s8			; GFX908-NEXT: v_mov_b32_e32 v4, s8
	; GFX908-NEXT: v_mov_b32_e32 v6, s8
	; GFX908-NEXT: v_mov_b32_e32 v8, s8			; GFX908-NEXT: v_mov_b32_e32 v8, s8
				; GFX908-NEXT: v_mov_b32_e32 v6, s8
	; GFX908-NEXT: v_mov_b32_e32 v5, s9			; GFX908-NEXT: v_mov_b32_e32 v5, s9
	; GFX908-NEXT: v_mov_b32_e32 v7, s9
	; GFX908-NEXT: v_mov_b32_e32 v9, s9			; GFX908-NEXT: v_mov_b32_e32 v9, s9
				; GFX908-NEXT: v_mov_b32_e32 v7, s9
	; GFX908-NEXT: v_cmp_lt_i64_e64 s[14:15], s[6:7], 0			; GFX908-NEXT: v_cmp_lt_i64_e64 s[14:15], s[6:7], 0
				; GFX908-NEXT: v_cmp_gt_i64_e64 s[16:17], s[6:7], -1
	; GFX908-NEXT: v_mov_b32_e32 v11, v5			; GFX908-NEXT: v_mov_b32_e32 v11, v5
	; GFX908-NEXT: s_mov_b64 s[16:17], s[10:11]			; GFX908-NEXT: s_mov_b64 s[20:21], s[10:11]
	; GFX908-NEXT: v_mov_b32_e32 v10, v4			; GFX908-NEXT: v_mov_b32_e32 v10, v4
	; GFX908-NEXT: s_waitcnt vmcnt(0)			; GFX908-NEXT: s_waitcnt vmcnt(0)
	; GFX908-NEXT: v_readfirstlane_b32 s5, v2			; GFX908-NEXT: v_readfirstlane_b32 s5, v2
	; GFX908-NEXT: v_readfirstlane_b32 s9, v3			; GFX908-NEXT: v_readfirstlane_b32 s9, v3
	; GFX908-NEXT: s_add_u32 s5, s5, 1			; GFX908-NEXT: s_add_u32 s5, s5, 1
	; GFX908-NEXT: s_addc_u32 s9, s9, 0			; GFX908-NEXT: s_addc_u32 s9, s9, 0
	; GFX908-NEXT: s_mul_hi_u32 s19, s2, s5			; GFX908-NEXT: s_mul_hi_u32 s19, s2, s5
	; GFX908-NEXT: s_mul_i32 s20, s3, s5			; GFX908-NEXT: s_mul_i32 s22, s3, s5
	; GFX908-NEXT: s_mul_i32 s18, s2, s5			; GFX908-NEXT: s_mul_i32 s18, s2, s5
	; GFX908-NEXT: s_mul_i32 s5, s2, s9			; GFX908-NEXT: s_mul_i32 s5, s2, s9
	; GFX908-NEXT: s_add_i32 s5, s19, s5			; GFX908-NEXT: s_add_i32 s5, s19, s5
	; GFX908-NEXT: s_add_i32 s5, s5, s20			; GFX908-NEXT: s_add_i32 s5, s5, s22
	; GFX908-NEXT: s_branch .LBB3_5			; GFX908-NEXT: s_branch .LBB3_5
	; GFX908-NEXT: .LBB3_4: ; %bb58			; GFX908-NEXT: .LBB3_4: ; %bb58
	; GFX908-NEXT: ; in Loop: Header=BB3_5 Depth=2			; GFX908-NEXT: ; in Loop: Header=BB3_5 Depth=2
	; GFX908-NEXT: v_add_co_u32_sdwa v2, vcc, v2, v16 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0			; GFX908-NEXT: v_add_co_u32_sdwa v2, vcc, v2, v16 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
	; GFX908-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc			; GFX908-NEXT: v_addc_co_u32_e32 v3, vcc, 0, v3, vcc
	; GFX908-NEXT: v_cmp_gt_i64_e32 vcc, 0, v[2:3]			; GFX908-NEXT: s_add_u32 s20, s20, s0
	; GFX908-NEXT: s_add_u32 s16, s16, s0			; GFX908-NEXT: v_cmp_lt_i64_e64 s[24:25], -1, v[2:3]
	; GFX908-NEXT: s_addc_u32 s17, s17, s1			; GFX908-NEXT: s_addc_u32 s21, s21, s1
	; GFX908-NEXT: s_cbranch_vccz .LBB3_1			; GFX908-NEXT: s_mov_b64 s[22:23], 0
				; GFX908-NEXT: s_andn2_b64 vcc, exec, s[24:25]
				; GFX908-NEXT: s_cbranch_vccz .LBB3_9
	; GFX908-NEXT: .LBB3_5: ; %bb16			; GFX908-NEXT: .LBB3_5: ; %bb16
	; GFX908-NEXT: ; Parent Loop BB3_2 Depth=1			; GFX908-NEXT: ; Parent Loop BB3_2 Depth=1
	; GFX908-NEXT: ; => This Inner Loop Header: Depth=2			; GFX908-NEXT: ; => This Inner Loop Header: Depth=2
	; GFX908-NEXT: s_add_u32 s20, s16, s18			; GFX908-NEXT: s_add_u32 s22, s20, s18
	; GFX908-NEXT: s_addc_u32 s21, s17, s5			; GFX908-NEXT: s_addc_u32 s23, s21, s5
	; GFX908-NEXT: global_load_dword v21, v19, s[20:21] offset:-12 glc			; GFX908-NEXT: global_load_dword v21, v19, s[22:23] offset:-12 glc
	; GFX908-NEXT: s_waitcnt vmcnt(0)			; GFX908-NEXT: s_waitcnt vmcnt(0)
	; GFX908-NEXT: global_load_dword v20, v19, s[20:21] offset:-8 glc			; GFX908-NEXT: global_load_dword v20, v19, s[22:23] offset:-8 glc
	; GFX908-NEXT: s_waitcnt vmcnt(0)			; GFX908-NEXT: s_waitcnt vmcnt(0)
	; GFX908-NEXT: global_load_dword v12, v19, s[20:21] offset:-4 glc			; GFX908-NEXT: global_load_dword v12, v19, s[22:23] offset:-4 glc
	; GFX908-NEXT: s_waitcnt vmcnt(0)			; GFX908-NEXT: s_waitcnt vmcnt(0)
	; GFX908-NEXT: global_load_dword v12, v19, s[20:21] glc			; GFX908-NEXT: global_load_dword v12, v19, s[22:23] glc
	; GFX908-NEXT: s_waitcnt vmcnt(0)			; GFX908-NEXT: s_waitcnt vmcnt(0)
	; GFX908-NEXT: ds_read_b64 v[12:13], v19			; GFX908-NEXT: ds_read_b64 v[12:13], v19
	; GFX908-NEXT: ds_read_b64 v[14:15], v0			; GFX908-NEXT: ds_read_b64 v[14:15], v0
	; GFX908-NEXT: s_and_b64 vcc, exec, s[14:15]			; GFX908-NEXT: s_andn2_b64 vcc, exec, s[16:17]
	; GFX908-NEXT: s_waitcnt lgkmcnt(0)			; GFX908-NEXT: s_waitcnt lgkmcnt(0)
	; GFX908-NEXT: s_cbranch_vccnz .LBB3_4			; GFX908-NEXT: s_cbranch_vccnz .LBB3_7
	; GFX908-NEXT: ; %bb.6: ; %bb51			; GFX908-NEXT: ; %bb.6: ; %bb51
	; GFX908-NEXT: ; in Loop: Header=BB3_5 Depth=2			; GFX908-NEXT: ; in Loop: Header=BB3_5 Depth=2
	; GFX908-NEXT: v_cvt_f32_f16_sdwa v22, v21 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1			; GFX908-NEXT: v_cvt_f32_f16_sdwa v22, v21 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
	; GFX908-NEXT: v_cvt_f32_f16_e32 v21, v21			; GFX908-NEXT: v_cvt_f32_f16_e32 v21, v21
	; GFX908-NEXT: v_cvt_f32_f16_sdwa v23, v20 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1			; GFX908-NEXT: v_cvt_f32_f16_sdwa v23, v20 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
	; GFX908-NEXT: v_cvt_f32_f16_e32 v20, v20			; GFX908-NEXT: v_cvt_f32_f16_e32 v20, v20
	; GFX908-NEXT: v_add_f32_e32 v24, v17, v12			; GFX908-NEXT: v_add_f32_e32 v24, v17, v12
	; GFX908-NEXT: v_add_f32_e32 v25, v18, v13			; GFX908-NEXT: v_add_f32_e32 v25, v18, v13
	; GFX908-NEXT: v_add_f32_e32 v26, 0, v12			; GFX908-NEXT: v_add_f32_e32 v26, 0, v12
	; GFX908-NEXT: v_add_f32_e32 v27, 0, v13			; GFX908-NEXT: v_add_f32_e32 v27, 0, v13
	; GFX908-NEXT: v_add_f32_e32 v15, v22, v15			; GFX908-NEXT: v_add_f32_e32 v15, v22, v15
	; GFX908-NEXT: v_add_f32_e32 v14, v21, v14			; GFX908-NEXT: v_add_f32_e32 v14, v21, v14
	; GFX908-NEXT: v_add_f32_e32 v13, v23, v13			; GFX908-NEXT: v_add_f32_e32 v13, v23, v13
	; GFX908-NEXT: v_add_f32_e32 v12, v20, v12			; GFX908-NEXT: v_add_f32_e32 v12, v20, v12
	; GFX908-NEXT: v_add_f32_e32 v5, v5, v25			; GFX908-NEXT: v_add_f32_e32 v5, v5, v25
	; GFX908-NEXT: v_add_f32_e32 v4, v4, v24			; GFX908-NEXT: v_add_f32_e32 v4, v4, v24
	; GFX908-NEXT: v_add_f32_e32 v7, v7, v27			; GFX908-NEXT: v_add_f32_e32 v9, v9, v27
	; GFX908-NEXT: v_add_f32_e32 v6, v6, v26			; GFX908-NEXT: v_add_f32_e32 v8, v8, v26
	; GFX908-NEXT: v_add_f32_e32 v8, v8, v14			; GFX908-NEXT: v_add_f32_e32 v6, v6, v14
	; GFX908-NEXT: v_add_f32_e32 v9, v9, v15			; GFX908-NEXT: v_add_f32_e32 v7, v7, v15
	; GFX908-NEXT: v_add_f32_e32 v10, v10, v12			; GFX908-NEXT: v_add_f32_e32 v10, v10, v12
	; GFX908-NEXT: v_add_f32_e32 v11, v11, v13			; GFX908-NEXT: v_add_f32_e32 v11, v11, v13
				; GFX908-NEXT: s_mov_b64 s[22:23], -1
	; GFX908-NEXT: s_branch .LBB3_4			; GFX908-NEXT: s_branch .LBB3_4
				ruiling: Why is there no DummyReturnBlock for GFX908?
				ruiling: Did you try to find the answer to this question? It seems strange that we get different behavior for gfx908 and gfx90a here.
				gandhi21299 (author): On gfx908, the block is eliminated much later in the pipeline, which does not happen on gfx90a.
				ruiling: Thanks for taking a second look!
	;			;
	; GFX90A-LABEL: introduced_copy_to_sgpr:			; GFX90A-LABEL: introduced_copy_to_sgpr:
	; GFX90A: ; %bb.0: ; %bb			; GFX90A: ; %bb.0: ; %bb
	; GFX90A-NEXT: global_load_ushort v18, v[0:1], off glc			; GFX90A-NEXT: global_load_ushort v18, v[0:1], off glc
	; GFX90A-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0			; GFX90A-NEXT: s_load_dwordx4 s[0:3], s[4:5], 0x0
	; GFX90A-NEXT: s_load_dwordx2 s[6:7], s[4:5], 0x10			; GFX90A-NEXT: s_load_dwordx2 s[6:7], s[4:5], 0x10
	; GFX90A-NEXT: s_load_dword s9, s[4:5], 0x18			; GFX90A-NEXT: s_load_dword s9, s[4:5], 0x18
	; GFX90A-NEXT: s_mov_b32 s8, 0			; GFX90A-NEXT: s_mov_b32 s8, 0
[32 lines not shown]
	; GFX90A-NEXT: v_readfirstlane_b32 s5, v18			; GFX90A-NEXT: v_readfirstlane_b32 s5, v18
	; GFX90A-NEXT: s_and_b32 s5, 0xffff, s5			; GFX90A-NEXT: s_and_b32 s5, 0xffff, s5
	; GFX90A-NEXT: s_mul_i32 s1, s1, s5			; GFX90A-NEXT: s_mul_i32 s1, s1, s5
	; GFX90A-NEXT: s_mul_hi_u32 s9, s0, s5			; GFX90A-NEXT: s_mul_hi_u32 s9, s0, s5
	; GFX90A-NEXT: s_mul_i32 s0, s0, s5			; GFX90A-NEXT: s_mul_i32 s0, s0, s5
	; GFX90A-NEXT: s_add_i32 s1, s9, s1			; GFX90A-NEXT: s_add_i32 s1, s9, s1
	; GFX90A-NEXT: s_lshl_b64 s[0:1], s[0:1], 5			; GFX90A-NEXT: s_lshl_b64 s[0:1], s[0:1], 5
	; GFX90A-NEXT: s_branch .LBB3_2			; GFX90A-NEXT: s_branch .LBB3_2
	; GFX90A-NEXT: .LBB3_1: ; %bb12			; GFX90A-NEXT: .LBB3_1: ; %Flow20
	; GFX90A-NEXT: ; in Loop: Header=BB3_2 Depth=1			; GFX90A-NEXT: ; in Loop: Header=BB3_2 Depth=1
	; GFX90A-NEXT: s_add_u32 s6, s6, s4			; GFX90A-NEXT: s_andn2_b64 vcc, exec, s[14:15]
	; GFX90A-NEXT: s_addc_u32 s7, s7, 0			; GFX90A-NEXT: s_cbranch_vccz .LBB3_12
	; GFX90A-NEXT: s_add_u32 s10, s10, s12
	; GFX90A-NEXT: s_addc_u32 s11, s11, s13
	; GFX90A-NEXT: .LBB3_2: ; %bb9			; GFX90A-NEXT: .LBB3_2: ; %bb9
	; GFX90A-NEXT: ; =>This Loop Header: Depth=1			; GFX90A-NEXT: ; =>This Loop Header: Depth=1
	; GFX90A-NEXT: ; Child Loop BB3_5 Depth 2			; GFX90A-NEXT: ; Child Loop BB3_5 Depth 2
	; GFX90A-NEXT: s_cbranch_scc0 .LBB3_1			; GFX90A-NEXT: s_mov_b64 s[16:17], -1
				; GFX90A-NEXT: s_cbranch_scc0 .LBB3_10
	; GFX90A-NEXT: ; %bb.3: ; %bb14			; GFX90A-NEXT: ; %bb.3: ; %bb14
	; GFX90A-NEXT: ; in Loop: Header=BB3_2 Depth=1			; GFX90A-NEXT: ; in Loop: Header=BB3_2 Depth=1
	; GFX90A-NEXT: global_load_dwordx2 v[4:5], v[2:3], off			; GFX90A-NEXT: global_load_dwordx2 v[4:5], v[2:3], off
	; GFX90A-NEXT: s_mov_b32 s9, s8			; GFX90A-NEXT: s_mov_b32 s9, s8
	; GFX90A-NEXT: v_pk_mov_b32 v[6:7], s[8:9], s[8:9] op_sel:[0,1]			; GFX90A-NEXT: v_pk_mov_b32 v[6:7], s[8:9], s[8:9] op_sel:[0,1]
	; GFX90A-NEXT: v_pk_mov_b32 v[8:9], s[8:9], s[8:9] op_sel:[0,1]
	; GFX90A-NEXT: v_pk_mov_b32 v[10:11], s[8:9], s[8:9] op_sel:[0,1]			; GFX90A-NEXT: v_pk_mov_b32 v[10:11], s[8:9], s[8:9] op_sel:[0,1]
				; GFX90A-NEXT: v_pk_mov_b32 v[8:9], s[8:9], s[8:9] op_sel:[0,1]
	; GFX90A-NEXT: v_cmp_lt_i64_e64 s[14:15], s[6:7], 0			; GFX90A-NEXT: v_cmp_lt_i64_e64 s[14:15], s[6:7], 0
	; GFX90A-NEXT: s_mov_b64 s[16:17], s[10:11]			; GFX90A-NEXT: v_cmp_gt_i64_e64 s[16:17], s[6:7], -1
				; GFX90A-NEXT: s_mov_b64 s[20:21], s[10:11]
	; GFX90A-NEXT: v_pk_mov_b32 v[12:13], v[6:7], v[6:7] op_sel:[0,1]			; GFX90A-NEXT: v_pk_mov_b32 v[12:13], v[6:7], v[6:7] op_sel:[0,1]
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: v_readfirstlane_b32 s5, v4			; GFX90A-NEXT: v_readfirstlane_b32 s5, v4
	; GFX90A-NEXT: v_readfirstlane_b32 s9, v5			; GFX90A-NEXT: v_readfirstlane_b32 s9, v5
	; GFX90A-NEXT: s_add_u32 s5, s5, 1			; GFX90A-NEXT: s_add_u32 s5, s5, 1
	; GFX90A-NEXT: s_addc_u32 s9, s9, 0			; GFX90A-NEXT: s_addc_u32 s9, s9, 0
	; GFX90A-NEXT: s_mul_hi_u32 s19, s2, s5			; GFX90A-NEXT: s_mul_hi_u32 s19, s2, s5
	; GFX90A-NEXT: s_mul_i32 s20, s3, s5			; GFX90A-NEXT: s_mul_i32 s22, s3, s5
	; GFX90A-NEXT: s_mul_i32 s18, s2, s5			; GFX90A-NEXT: s_mul_i32 s18, s2, s5
	; GFX90A-NEXT: s_mul_i32 s5, s2, s9			; GFX90A-NEXT: s_mul_i32 s5, s2, s9
	; GFX90A-NEXT: s_add_i32 s5, s19, s5			; GFX90A-NEXT: s_add_i32 s5, s19, s5
	; GFX90A-NEXT: s_add_i32 s5, s5, s20			; GFX90A-NEXT: s_add_i32 s5, s5, s22
	; GFX90A-NEXT: s_branch .LBB3_5			; GFX90A-NEXT: s_branch .LBB3_5
	; GFX90A-NEXT: .LBB3_4: ; %bb58			; GFX90A-NEXT: .LBB3_4: ; %bb58
	; GFX90A-NEXT: ; in Loop: Header=BB3_5 Depth=2			; GFX90A-NEXT: ; in Loop: Header=BB3_5 Depth=2
	; GFX90A-NEXT: v_add_co_u32_sdwa v4, vcc, v4, v18 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0			; GFX90A-NEXT: v_add_co_u32_sdwa v4, vcc, v4, v18 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:DWORD src1_sel:WORD_0
	; GFX90A-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc			; GFX90A-NEXT: v_addc_co_u32_e32 v5, vcc, 0, v5, vcc
	; GFX90A-NEXT: s_add_u32 s16, s16, s0			; GFX90A-NEXT: s_add_u32 s20, s20, s0
	; GFX90A-NEXT: v_cmp_gt_i64_e32 vcc, 0, v[4:5]			; GFX90A-NEXT: s_addc_u32 s21, s21, s1
	; GFX90A-NEXT: s_addc_u32 s17, s17, s1			; GFX90A-NEXT: v_cmp_lt_i64_e64 s[24:25], -1, v[4:5]
	; GFX90A-NEXT: s_cbranch_vccz .LBB3_1			; GFX90A-NEXT: s_mov_b64 s[22:23], 0
				; GFX90A-NEXT: s_andn2_b64 vcc, exec, s[24:25]
				; GFX90A-NEXT: s_cbranch_vccz .LBB3_9
	; GFX90A-NEXT: .LBB3_5: ; %bb16			; GFX90A-NEXT: .LBB3_5: ; %bb16
	; GFX90A-NEXT: ; Parent Loop BB3_2 Depth=1			; GFX90A-NEXT: ; Parent Loop BB3_2 Depth=1
	; GFX90A-NEXT: ; => This Inner Loop Header: Depth=2			; GFX90A-NEXT: ; => This Inner Loop Header: Depth=2
	; GFX90A-NEXT: s_add_u32 s20, s16, s18			; GFX90A-NEXT: s_add_u32 s22, s20, s18
	; GFX90A-NEXT: s_addc_u32 s21, s17, s5			; GFX90A-NEXT: s_addc_u32 s23, s21, s5
	; GFX90A-NEXT: global_load_dword v21, v19, s[20:21] offset:-12 glc			; GFX90A-NEXT: global_load_dword v21, v19, s[22:23] offset:-12 glc
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: global_load_dword v20, v19, s[20:21] offset:-8 glc			; GFX90A-NEXT: global_load_dword v20, v19, s[22:23] offset:-8 glc
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: global_load_dword v14, v19, s[20:21] offset:-4 glc			; GFX90A-NEXT: global_load_dword v14, v19, s[22:23] offset:-4 glc
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: global_load_dword v14, v19, s[20:21] glc			; GFX90A-NEXT: global_load_dword v14, v19, s[22:23] glc
	; GFX90A-NEXT: s_waitcnt vmcnt(0)			; GFX90A-NEXT: s_waitcnt vmcnt(0)
	; GFX90A-NEXT: ds_read_b64 v[14:15], v19			; GFX90A-NEXT: ds_read_b64 v[14:15], v19
	; GFX90A-NEXT: ds_read_b64 v[16:17], v0			; GFX90A-NEXT: ds_read_b64 v[16:17], v0
	; GFX90A-NEXT: s_and_b64 vcc, exec, s[14:15]			; GFX90A-NEXT: s_andn2_b64 vcc, exec, s[16:17]
	; GFX90A-NEXT: ; kill: killed $sgpr20 killed $sgpr21			; GFX90A-NEXT: ; kill: killed $sgpr22 killed $sgpr23
	; GFX90A-NEXT: s_waitcnt lgkmcnt(0)			; GFX90A-NEXT: s_waitcnt lgkmcnt(0)
	; GFX90A-NEXT: s_cbranch_vccnz .LBB3_4			; GFX90A-NEXT: s_cbranch_vccnz .LBB3_7
	; GFX90A-NEXT: ; %bb.6: ; %bb51			; GFX90A-NEXT: ; %bb.6: ; %bb51
	; GFX90A-NEXT: ; in Loop: Header=BB3_5 Depth=2			; GFX90A-NEXT: ; in Loop: Header=BB3_5 Depth=2
	; GFX90A-NEXT: v_cvt_f32_f16_sdwa v23, v21 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1			; GFX90A-NEXT: v_cvt_f32_f16_sdwa v23, v21 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
	; GFX90A-NEXT: v_cvt_f32_f16_e32 v22, v21			; GFX90A-NEXT: v_cvt_f32_f16_e32 v22, v21
	; GFX90A-NEXT: v_cvt_f32_f16_sdwa v21, v20 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1			; GFX90A-NEXT: v_cvt_f32_f16_sdwa v21, v20 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_1
	; GFX90A-NEXT: v_cvt_f32_f16_e32 v20, v20			; GFX90A-NEXT: v_cvt_f32_f16_e32 v20, v20
	; GFX90A-NEXT: v_pk_add_f32 v[24:25], v[0:1], v[14:15]			; GFX90A-NEXT: v_pk_add_f32 v[24:25], v[0:1], v[14:15]
	; GFX90A-NEXT: v_pk_add_f32 v[26:27], v[14:15], 0 op_sel_hi:[1,0]			; GFX90A-NEXT: v_pk_add_f32 v[26:27], v[14:15], 0 op_sel_hi:[1,0]
	; GFX90A-NEXT: v_pk_add_f32 v[16:17], v[22:23], v[16:17]			; GFX90A-NEXT: v_pk_add_f32 v[16:17], v[22:23], v[16:17]
	; GFX90A-NEXT: v_pk_add_f32 v[14:15], v[20:21], v[14:15]			; GFX90A-NEXT: v_pk_add_f32 v[14:15], v[20:21], v[14:15]
	; GFX90A-NEXT: v_pk_add_f32 v[6:7], v[6:7], v[24:25]			; GFX90A-NEXT: v_pk_add_f32 v[6:7], v[6:7], v[24:25]
	; GFX90A-NEXT: v_pk_add_f32 v[8:9], v[8:9], v[26:27]			; GFX90A-NEXT: v_pk_add_f32 v[10:11], v[10:11], v[26:27]
	; GFX90A-NEXT: v_pk_add_f32 v[10:11], v[10:11], v[16:17]			; GFX90A-NEXT: v_pk_add_f32 v[8:9], v[8:9], v[16:17]
	; GFX90A-NEXT: v_pk_add_f32 v[12:13], v[12:13], v[14:15]			; GFX90A-NEXT: v_pk_add_f32 v[12:13], v[12:13], v[14:15]
				; GFX90A-NEXT: s_mov_b64 s[22:23], -1
	; GFX90A-NEXT: s_branch .LBB3_4			; GFX90A-NEXT: s_branch .LBB3_4
				; GFX90A-NEXT: .LBB3_7: ; in Loop: Header=BB3_5 Depth=2
				; GFX90A-NEXT: s_mov_b64 s[22:23], s[14:15]
				; GFX90A-NEXT: s_andn2_b64 vcc, exec, s[22:23]
				; GFX90A-NEXT: s_cbranch_vccz .LBB3_4
				; GFX90A-NEXT: ; %bb.8: ; in Loop: Header=BB3_2 Depth=1
				; GFX90A-NEXT: ; implicit-def: $vgpr12_vgpr13
				; GFX90A-NEXT: ; implicit-def: $vgpr8_vgpr9
				; GFX90A-NEXT: ; implicit-def: $vgpr10_vgpr11
				; GFX90A-NEXT: ; implicit-def: $vgpr6_vgpr7
				; GFX90A-NEXT: ; implicit-def: $vgpr4_vgpr5
				; GFX90A-NEXT: ; implicit-def: $sgpr20_sgpr21
				; GFX90A-NEXT: .LBB3_9: ; %loop.exit.guard
				; GFX90A-NEXT: ; in Loop: Header=BB3_2 Depth=1
				; GFX90A-NEXT: s_xor_b64 s[16:17], s[22:23], -1
				; GFX90A-NEXT: .LBB3_10: ; %Flow19
				; GFX90A-NEXT: ; in Loop: Header=BB3_2 Depth=1
				; GFX90A-NEXT: s_mov_b64 s[14:15], -1
				; GFX90A-NEXT: s_and_b64 vcc, exec, s[16:17]
				; GFX90A-NEXT: s_cbranch_vccz .LBB3_1
				; GFX90A-NEXT: ; %bb.11: ; %bb12
				; GFX90A-NEXT: ; in Loop: Header=BB3_2 Depth=1
				; GFX90A-NEXT: s_add_u32 s6, s6, s4
				; GFX90A-NEXT: s_addc_u32 s7, s7, 0
				; GFX90A-NEXT: s_add_u32 s10, s10, s12
				; GFX90A-NEXT: s_addc_u32 s11, s11, s13
				; GFX90A-NEXT: s_mov_b64 s[14:15], 0
				; GFX90A-NEXT: s_branch .LBB3_1
				; GFX90A-NEXT: .LBB3_12: ; %DummyReturnBlock
				; GFX90A-NEXT: s_endpgm
	bb:			bb:
	%i = load volatile i16, ptr addrspace(4) undef, align 2			%i = load volatile i16, ptr addrspace(4) undef, align 2
	%i6 = zext i16 %i to i64			%i6 = zext i16 %i to i64
	%i7 = udiv i32 %arg1, %arg2			%i7 = udiv i32 %arg1, %arg2
	%i8 = zext i32 %i7 to i64			%i8 = zext i32 %i7 to i64
	br label %bb9			br label %bb9

	bb9: ; preds = %bb12, %bb			bb9: ; preds = %bb12, %bb
[323 lines not shown]

llvm/test/CodeGen/AMDGPU/branch-relaxation.ll

		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs -amdgpu-s-branch-bits=4 -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s		; RUN: llc -march=amdgcn -mcpu=tahiti -verify-machineinstrs -amdgpu-s-branch-bits=4 -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -enable-var-scope -check-prefix=GCN %s


; FIXME: We should use llvm-mc for this, but we can't even parse our own output.		; FIXME: We should use llvm-mc for this, but we can't even parse our own output.
; See PR33579.		; See PR33579.
; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-s-branch-bits=4 -o %t.o -filetype=obj -simplifycfg-require-and-preserve-domtree=1 %s		; RUN: llc -march=amdgcn -verify-machineinstrs -amdgpu-s-branch-bits=4 -o %t.o -filetype=obj -simplifycfg-require-and-preserve-domtree=1 %s
; RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=OBJ %s		; RUN: llvm-readobj -r %t.o | FileCheck --check-prefix=OBJ %s

[30 lines not shown]
define amdgpu_kernel void @uniform_conditional_max_short_forward_branch(ptr addrspace(1) %arg, i32 %cnd) #0 {		define amdgpu_kernel void @uniform_conditional_max_short_forward_branch(ptr addrspace(1) %arg, i32 %cnd) #0 {
bb:		bb:
%cmp = icmp eq i32 %cnd, 0		%cmp = icmp eq i32 %cnd, 0
br i1 %cmp, label %bb3, label %bb2 ; +8 dword branch		br i1 %cmp, label %bb3, label %bb2 ; +8 dword branch

bb2:		bb2:
; 24 bytes		; 24 bytes
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
call void @llvm.amdgcn.s.sleep(i32 0)		call void @llvm.amdgcn.s.sleep(i32 0)
br label %bb3		br label %bb3

bb3:		bb3:
store volatile i32 %cnd, ptr addrspace(1) %arg		store volatile i32 %cnd, ptr addrspace(1) %arg
ret void		ret void
}		}

[24 lines not shown]
define amdgpu_kernel void @uniform_conditional_min_long_forward_branch(ptr addrspace(1) %arg, i32 %cnd) #0 {		define amdgpu_kernel void @uniform_conditional_min_long_forward_branch(ptr addrspace(1) %arg, i32 %cnd) #0 {
bb0:		bb0:
%cmp = icmp eq i32 %cnd, 0		%cmp = icmp eq i32 %cnd, 0
br i1 %cmp, label %bb3, label %bb2 ; +9 dword branch		br i1 %cmp, label %bb3, label %bb2 ; +9 dword branch

bb2:		bb2:
; 32 bytes		; 32 bytes
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
br label %bb3		br label %bb3

bb3:		bb3:
store volatile i32 %cnd, ptr addrspace(1) %arg		store volatile i32 %cnd, ptr addrspace(1) %arg
ret void		ret void
}		}

; GCN-LABEL: {{^}}uniform_conditional_min_long_forward_vcnd_branch:		; GCN-LABEL: {{^}}uniform_conditional_min_long_forward_vcnd_branch:
[22 lines not shown]
; GCN: s_endpgm		; GCN: s_endpgm
define amdgpu_kernel void @uniform_conditional_min_long_forward_vcnd_branch(ptr addrspace(1) %arg, float %cnd) #0 {		define amdgpu_kernel void @uniform_conditional_min_long_forward_vcnd_branch(ptr addrspace(1) %arg, float %cnd) #0 {
bb0:		bb0:
%cmp = fcmp oeq float %cnd, 0.0		%cmp = fcmp oeq float %cnd, 0.0
br i1 %cmp, label %bb3, label %bb2 ; + 8 dword branch		br i1 %cmp, label %bb3, label %bb2 ; + 8 dword branch

bb2:		bb2:
call void asm sideeffect " ; 32 bytes		call void asm sideeffect " ; 32 bytes
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
br label %bb3		br label %bb3

bb3:		bb3:
store volatile float %cnd, ptr addrspace(1) %arg		store volatile float %cnd, ptr addrspace(1) %arg
ret void		ret void
}		}

; GCN-LABEL: {{^}}min_long_forward_vbranch:		; GCN-LABEL: {{^}}min_long_forward_vbranch:
[16 lines not shown; inside: bb:]
%tid.ext = zext i32 %tid to i64		%tid.ext = zext i32 %tid to i64
%gep = getelementptr inbounds i32, ptr addrspace(1) %arg, i64 %tid.ext		%gep = getelementptr inbounds i32, ptr addrspace(1) %arg, i64 %tid.ext
%load = load volatile i32, ptr addrspace(1) %gep		%load = load volatile i32, ptr addrspace(1) %gep
%cmp = icmp eq i32 %load, 0		%cmp = icmp eq i32 %load, 0
br i1 %cmp, label %bb3, label %bb2 ; + 8 dword branch		br i1 %cmp, label %bb3, label %bb2 ; + 8 dword branch

bb2:		bb2:
call void asm sideeffect " ; 32 bytes		call void asm sideeffect " ; 32 bytes
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
br label %bb3		br label %bb3

bb3:		bb3:
store volatile i32 %load, ptr addrspace(1) %gep		store volatile i32 %load, ptr addrspace(1) %gep
ret void		ret void
}		}

; GCN-LABEL: {{^}}long_backward_sbranch:		; GCN-LABEL: {{^}}long_backward_sbranch:
[24 lines not shown]
; GCN-NEXT: [[ENDBB]]:		; GCN-NEXT: [[ENDBB]]:
; GCN-NEXT: s_endpgm		; GCN-NEXT: s_endpgm
define amdgpu_kernel void @long_backward_sbranch(ptr addrspace(1) %arg) #0 {		define amdgpu_kernel void @long_backward_sbranch(ptr addrspace(1) %arg) #0 {
bb:		bb:
br label %bb2		br label %bb2

bb2:		bb2:
%loop.idx = phi i32 [ 0, %bb ], [ %inc, %bb2 ]		%loop.idx = phi i32 [ 0, %bb ], [ %inc, %bb2 ]
; 24 bytes		; 24 bytes
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
%inc = add nsw i32 %loop.idx, 1 ; add cost 4		%inc = add nsw i32 %loop.idx, 1 ; add cost 4
%cmp = icmp slt i32 %inc, 10 ; condition cost = 8		%cmp = icmp slt i32 %inc, 10 ; condition cost = 8
br i1 %cmp, label %bb2, label %bb3 ; -		br i1 %cmp, label %bb2, label %bb3 ; -

bb3:		bb3:
ret void		ret void
}		}

[34 lines not shown]

bb2:		bb2:
store volatile i32 17, ptr addrspace(1) undef		store volatile i32 17, ptr addrspace(1) undef
br label %bb4		br label %bb4

bb3:		bb3:
; 32 byte asm		; 32 byte asm
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
br label %bb4		br label %bb4

bb4:		bb4:
store volatile i32 63, ptr addrspace(1) %arg		store volatile i32 63, ptr addrspace(1) %arg
ret void		ret void
}		}

; GCN-LABEL: {{^}}uniform_unconditional_min_long_backward_branch:		; GCN-LABEL: {{^}}uniform_unconditional_min_long_backward_branch:
; GCN-NEXT: ; %bb.0: ; %entry		; GCN-NEXT: ; %bb.0: ; %entry
		; GCN-NEXT: s_and_b64 vcc, exec, -1
; GCN-NEXT: .L[[LOOP:BB[0-9]_[0-9]+]]: ; %loop		; GCN-NEXT: .L[[LOOP:BB[0-9]_[0-9]+]]: ; %loop
; GCN-NEXT: ; =>This Inner Loop Header: Depth=1		; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
; GCN-NEXT: ;;#ASMSTART		; GCN-NEXT: ;;#ASMSTART
; GCN-NEXT: v_nop_e64		; GCN-NEXT: v_nop_e64
; GCN-NEXT: v_nop_e64		; GCN-NEXT: v_nop_e64
; GCN-NEXT: v_nop_e64		; GCN-NEXT: v_nop_e64
; GCN-NEXT: v_nop_e64		; GCN-NEXT: v_nop_e64
; GCN-NEXT: ;;#ASMEND		; GCN-NEXT: ;;#ASMEND
		; GCN-NEXT: s_mov_b64 vcc, vcc
		; GCN-NEXT: s_cbranch_vccz .LBB6_2
; GCN-NEXT: {{.LBB[0-9]+_[0-9]+}}: ; %loop		; GCN-NEXT: {{.LBB[0-9]+_[0-9]+}}: ; %loop
; GCN-NEXT: ; in Loop: Header=[[LOOP]] Depth=1		; GCN-NEXT: ; in Loop: Header=[[LOOP]] Depth=1

; GCN-NEXT: s_getpc_b64 s[[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]]		; GCN-NEXT: s_getpc_b64 s[[[PC_LO:[0-9]+]]:[[PC_HI:[0-9]+]]]
; GCN-NEXT: [[POST_GETPC:.Lpost_getpc[0-9]+]]:{{$}}		; GCN-NEXT: [[POST_GETPC:.Lpost_getpc[0-9]+]]:{{$}}
; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], (.L[[LOOP]]-[[POST_GETPC]])&4294967295		; GCN-NEXT: s_add_u32 s[[PC_LO]], s[[PC_LO]], (.L[[LOOP]]-[[POST_GETPC]])&4294967295
; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], (.L[[LOOP]]-[[POST_GETPC]])>>32		; GCN-NEXT: s_addc_u32 s[[PC_HI]], s[[PC_HI]], (.L[[LOOP]]-[[POST_GETPC]])>>32
; GCN-NEXT: s_setpc_b64 s[[[PC_LO]]:[[PC_HI]]]		; GCN-NEXT: s_setpc_b64 s[[[PC_LO]]:[[PC_HI]]]
		; GCN-NEXT: .LBB6_2: ; %DummyReturnBlock
		; GCN-NEXT: s_endpgm
; GCN-NEXT: .Lfunc_end{{[0-9]+}}:		; GCN-NEXT: .Lfunc_end{{[0-9]+}}:
define amdgpu_kernel void @uniform_unconditional_min_long_backward_branch(ptr addrspace(1) %arg, i32 %arg1) {		define amdgpu_kernel void @uniform_unconditional_min_long_backward_branch(ptr addrspace(1) %arg, i32 %arg1) {
entry:		entry:
br label %loop		br label %loop

loop:		loop:
; 32 byte asm		; 32 byte asm
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
br label %loop		br label %loop
}		}

; Expansion of branch from %bb1 to %bb3 introduces need to expand		; Expansion of branch from %bb1 to %bb3 introduces need to expand
; branch from %bb0 to %bb2		; branch from %bb0 to %bb2

; GCN-LABEL: {{^}}expand_requires_expand:		; GCN-LABEL: {{^}}expand_requires_expand:
; GCN-NEXT: ; %bb.0: ; %bb0		; GCN-NEXT: ; %bb.0: ; %bb0

bb1:		bb1:
%val = load volatile i32, ptr addrspace(4) undef		%val = load volatile i32, ptr addrspace(4) undef
%cmp1 = icmp eq i32 %val, 3		%cmp1 = icmp eq i32 %val, 3
br i1 %cmp1, label %bb3, label %bb2		br i1 %cmp1, label %bb3, label %bb2

bb2:		bb2:
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
br label %bb3		br label %bb3

bb3:		bb3:
; These NOPs prevent tail-duplication-based outlining		; These NOPs prevent tail-duplication-based outlining
; from firing, which defeats the need to expand the branches and this test.		; from firing, which defeats the need to expand the branches and this test.
call void asm sideeffect		call void asm sideeffect
"v_nop_e64", ""() #0		"v_nop_e64", ""() #0
call void asm sideeffect		call void asm sideeffect
"v_nop_e64", ""() #0		"v_nop_e64", ""() #0
ret void		ret void
}		}

; Requires expanding of required skip branch.		; Requires expanding of required skip branch.

; GCN-LABEL: {{^}}uniform_inside_divergent:		; GCN-LABEL: {{^}}uniform_inside_divergent:
; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}		; GCN: v_cmp_gt_u32_e32 vcc, 16, v{{[0-9]+}}
; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc		; GCN-NEXT: s_and_saveexec_b64 [[MASK:s\[[0-9]+:[0-9]+\]]], vcc
entry:		entry:
%reg = call float asm sideeffect "v_mov_b32_e64 $0, 0", "=v"()		%reg = call float asm sideeffect "v_mov_b32_e64 $0, 0", "=v"()
%cmp0 = fcmp ogt float %reg, 0.000000e+00		%cmp0 = fcmp ogt float %reg, 0.000000e+00
br i1 %cmp0, label %loop, label %ret		br i1 %cmp0, label %loop, label %ret

loop:		loop:
%phi = phi float [ 0.000000e+00, %loop_body ], [ 1.000000e+00, %entry ]		%phi = phi float [ 0.000000e+00, %loop_body ], [ 1.000000e+00, %entry ]
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
%cmp1 = fcmp olt float %phi, 8.0		%cmp1 = fcmp olt float %phi, 8.0
br i1 %cmp1, label %loop_body, label %ret		br i1 %cmp1, label %loop_body, label %ret

loop_body:		loop_body:
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
br label %loop		br label %loop

ret:		ret:
store volatile i32 7, ptr addrspace(1) undef		store volatile i32 7, ptr addrspace(1) undef
ret void		ret void
}		}

; GCN-LABEL: {{^}}long_branch_hang:		; GCN-LABEL: {{^}}long_branch_hang:
Show All 29 Lines	bb9: ; preds = %bb
%tmp10 = and i1 %tmp7, %tmp		%tmp10 = and i1 %tmp7, %tmp
%tmp11 = icmp slt i32 %arg3, %arg4		%tmp11 = icmp slt i32 %arg3, %arg4
%tmp12 = or i1 %tmp11, %tmp7		%tmp12 = or i1 %tmp11, %tmp7
br i1 %tmp12, label %bb19, label %bb14		br i1 %tmp12, label %bb19, label %bb14

bb13: ; preds = %bb		bb13: ; preds = %bb
call void asm sideeffect		call void asm sideeffect
"v_nop_e64		"v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64		v_nop_e64
v_nop_e64", ""() #0		v_nop_e64", ""() #0
br i1 %tmp6, label %bb19, label %bb14		br i1 %tmp6, label %bb19, label %bb14

bb14: ; preds = %bb13, %bb9		bb14: ; preds = %bb13, %bb9
%tmp15 = icmp slt i32 %arg3, %arg4		%tmp15 = icmp slt i32 %arg3, %arg4
%tmp16 = or i1 %tmp15, %tmp		%tmp16 = or i1 %tmp15, %tmp
%tmp17 = and i1 %tmp6, %tmp16		%tmp17 = and i1 %tmp6, %tmp16
%tmp18 = zext i1 %tmp17 to i32		%tmp18 = zext i1 %tmp17 to i32
br label %bb19		br label %bb19

llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll

	}			}

	define amdgpu_kernel void @loop_const_true(ptr addrspace(3) %ptr, i32 %n) nounwind {			define amdgpu_kernel void @loop_const_true(ptr addrspace(3) %ptr, i32 %n) nounwind {
	; GCN-LABEL: loop_const_true:			; GCN-LABEL: loop_const_true:
	; GCN: ; %bb.0: ; %entry			; GCN: ; %bb.0: ; %entry
	; GCN-NEXT: s_load_dword s0, s[0:1], 0x9			; GCN-NEXT: s_load_dword s0, s[0:1], 0x9
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: s_addk_i32 s0, 0x80			; GCN-NEXT: s_addk_i32 s0, 0x80
				; GCN-NEXT: s_and_b64 vcc, exec, -1
	; GCN-NEXT: s_mov_b32 m0, -1			; GCN-NEXT: s_mov_b32 m0, -1
	; GCN-NEXT: .LBB1_1: ; %for.body			; GCN-NEXT: .LBB1_1: ; %for.body
	; GCN-NEXT: ; =>This Inner Loop Header: Depth=1			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: v_mov_b32_e32 v0, s0			; GCN-NEXT: v_mov_b32_e32 v0, s0
	; GCN-NEXT: ds_read_b32 v1, v0			; GCN-NEXT: ds_read_b32 v1, v0
	; GCN-NEXT: s_waitcnt lgkmcnt(0)			; GCN-NEXT: s_waitcnt lgkmcnt(0)
	; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1			; GCN-NEXT: v_add_f32_e32 v1, 1.0, v1
	; GCN-NEXT: ds_write_b32 v0, v1			; GCN-NEXT: ds_write_b32 v0, v1
	; GCN-NEXT: s_add_i32 s0, s0, 4			; GCN-NEXT: s_add_i32 s0, s0, 4
	; GCN-NEXT: s_branch .LBB1_1			; GCN-NEXT: s_mov_b64 vcc, vcc
				; GCN-NEXT: s_cbranch_vccnz .LBB1_1
				; GCN-NEXT: ; %bb.2: ; %DummyReturnBlock
				; GCN-NEXT: s_endpgm
	;			;
	; GCN_DBG-LABEL: loop_const_true:			; GCN_DBG-LABEL: loop_const_true:
	; GCN_DBG: ; %bb.0: ; %entry			; GCN_DBG: ; %bb.0: ; %entry
	; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9			; GCN_DBG-NEXT: s_load_dword s0, s[0:1], 0x9
	; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)			; GCN_DBG-NEXT: s_waitcnt lgkmcnt(0)
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 0
	; GCN_DBG-NEXT: s_mov_b32 s0, 0			; GCN_DBG-NEXT: s_mov_b32 s0, 0
	; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1			; GCN_DBG-NEXT: v_writelane_b32 v0, s0, 1

llvm/test/CodeGen/AMDGPU/control-flow-optnone.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; optnone disables AMDGPUAnnotateUniformValues, so no branch is known			; optnone disables AMDGPUAnnotateUniformValues, so no branch is known
	; to be uniform during instruction selection. The custom selection for			; to be uniform during instruction selection. The custom selection for
	; brcond was not checking if the branch was uniform, relying on the			; brcond was not checking if the branch was uniform, relying on the
	; selection pattern to check that. That would fail, so then the branch			; selection pattern to check that. That would fail, so then the branch
	; would fail to select.			; would fail to select.

	; GCN-LABEL: {{^}}copytoreg_divergent_brcond:			; GCN-LABEL: {{^}}copytoreg_divergent_brcond:
	; GCN: s_branch			; GCN: s_branch

	; GCN-DAG: v_cmp_lt_i32			; GCN-DAG: v_cmp_lt_i32
	; GCN-DAG: s_cmp_gt_i32			; GCN-DAG: s_cmp_gt_i32
	; GCN: s_and_b64			; GCN: s_and_b64
	; GCN: s_mov_b64 exec			; GCN: s_mov_b64 exec

	; GCN: s_or_b64 exec, exec			; GCN: s_or_b64 exec, exec
	; GCN: {{[s|v]}}_cmp_eq_u32			; GCN: {{[s|v]}}_cmp_eq_u32
	; GCN: s_cbranch			; GCN: s_cbranch_execz
	; GCN-NEXT: s_branch			; GCN-NEXT: s_branch
	define amdgpu_kernel void @copytoreg_divergent_brcond(i32 %arg, i32 %arg1, i32 %arg2) #0 {			define amdgpu_kernel void @copytoreg_divergent_brcond(i32 %arg, i32 %arg1, i32 %arg2) #0 {
	bb:			bb:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
	%tmp3 = zext i32 %tmp to i64			%tmp3 = zext i32 %tmp to i64
	%tmp5 = add i64 %tmp3, undef			%tmp5 = add i64 %tmp3, undef
	%tmp6 = trunc i64 %tmp5 to i32			%tmp6 = trunc i64 %tmp5 to i32
	%tmp7 = mul nsw i32 %tmp6, %arg2			%tmp7 = mul nsw i32 %tmp6, %arg2

llvm/test/CodeGen/AMDGPU/infinite-loop.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -march=amdgcn -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI %s			; RUN: llc -march=amdgcn -verify-machineinstrs -simplifycfg-require-and-preserve-domtree=1 < %s | FileCheck -check-prefix=SI %s
	; RUN: opt -mtriple=amdgcn-- -S -amdgpu-unify-divergent-exit-nodes -verify -simplifycfg-require-and-preserve-domtree=1 %s | FileCheck -check-prefix=IR %s			; RUN: opt -mtriple=amdgcn-- -S -amdgpu-unify-divergent-exit-nodes -verify -simplifycfg-require-and-preserve-domtree=1 %s | FileCheck -check-prefix=IR %s

	define amdgpu_kernel void @infinite_loop(ptr addrspace(1) %out) {			define amdgpu_kernel void @infinite_loop(ptr addrspace(1) %out) {
	; SI-LABEL: infinite_loop:			; SI-LABEL: infinite_loop:
	; SI: ; %bb.0: ; %entry			; SI: ; %bb.0: ; %entry
	; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9			; SI-NEXT: s_load_dwordx2 s[0:1], s[0:1], 0x9
	; SI-NEXT: s_mov_b32 s3, 0xf000			; SI-NEXT: s_mov_b32 s3, 0xf000
	; SI-NEXT: s_mov_b32 s2, -1			; SI-NEXT: s_mov_b32 s2, -1
	; SI-NEXT: v_mov_b32_e32 v0, 0x3e7			; SI-NEXT: v_mov_b32_e32 v0, 0x3e7
				; SI-NEXT: s_and_b64 vcc, exec, -1
	; SI-NEXT: .LBB0_1: ; %loop			; SI-NEXT: .LBB0_1: ; %loop
	; SI-NEXT: ; =>This Inner Loop Header: Depth=1			; SI-NEXT: ; =>This Inner Loop Header: Depth=1
	; SI-NEXT: s_waitcnt lgkmcnt(0)			; SI-NEXT: s_waitcnt lgkmcnt(0)
	; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0			; SI-NEXT: buffer_store_dword v0, off, s[0:3], 0
	; SI-NEXT: s_waitcnt vmcnt(0)			; SI-NEXT: s_waitcnt vmcnt(0)
	; SI-NEXT: s_branch .LBB0_1			; SI-NEXT: s_mov_b64 vcc, vcc
				; SI-NEXT: s_cbranch_vccnz .LBB0_1
				; SI-NEXT: ; %bb.2: ; %DummyReturnBlock
				; SI-NEXT: s_endpgm
	; IR-LABEL: @infinite_loop(			; IR-LABEL: @infinite_loop(
	; IR-NEXT: entry:			; IR-NEXT: entry:
	; IR-NEXT: br label [[LOOP:%.*]]			; IR-NEXT: br label [[LOOP:%.*]]
	; IR: loop:			; IR: loop:
	; IR-NEXT: store volatile i32 999, ptr addrspace(1) [[OUT:%.*]], align 4			; IR-NEXT: store volatile i32 999, ptr addrspace(1) [[OUT:%.*]], align 4
	; IR-NEXT: br label [[LOOP]]			; IR-NEXT: br i1 true, label [[LOOP]], label [[DUMMYRETURNBLOCK:%.*]]
				; IR: DummyReturnBlock:
				; IR-NEXT: ret void
				;
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	store volatile i32 999, ptr addrspace(1) %out, align 4			store volatile i32 999, ptr addrspace(1) %out, align 4
	br label %loop			br label %loop
	}			}

	; IR-NEXT: [[TMP:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()			; IR-NEXT: [[TMP:%.*]] = tail call i32 @llvm.amdgcn.workitem.id.x()
	; IR-NEXT: [[COND:%.*]] = icmp eq i32 [[TMP]], 1			; IR-NEXT: [[COND:%.*]] = icmp eq i32 [[TMP]], 1
	; IR-NEXT: br i1 [[COND]], label [[LOOP:%.*]], label [[UNIFIEDRETURNBLOCK:%.*]]			; IR-NEXT: br i1 [[COND]], label [[LOOP:%.*]], label [[UNIFIEDRETURNBLOCK:%.*]]
	; IR: loop:			; IR: loop:
	; IR-NEXT: store volatile i32 999, ptr addrspace(1) [[OUT:%.*]], align 4			; IR-NEXT: store volatile i32 999, ptr addrspace(1) [[OUT:%.*]], align 4
	; IR-NEXT: br i1 true, label [[LOOP]], label [[UNIFIEDRETURNBLOCK]]			; IR-NEXT: br i1 true, label [[LOOP]], label [[UNIFIEDRETURNBLOCK]]
	; IR: UnifiedReturnBlock:			; IR: UnifiedReturnBlock:
	; IR-NEXT: ret void			; IR-NEXT: ret void
				;
	entry:			entry:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
	%cond = icmp eq i32 %tmp, 1			%cond = icmp eq i32 %tmp, 1
	br i1 %cond, label %loop, label %return			br i1 %cond, label %loop, label %return

	loop:			loop:
	store volatile i32 999, ptr addrspace(1) %out, align 4			store volatile i32 999, ptr addrspace(1) %out, align 4
	br label %loop			br label %loop
	; IR: loop1:			; IR: loop1:
	; IR-NEXT: store volatile i32 999, ptr addrspace(1) [[OUT:%.*]], align 4			; IR-NEXT: store volatile i32 999, ptr addrspace(1) [[OUT:%.*]], align 4
	; IR-NEXT: br i1 true, label [[LOOP1]], label [[DUMMYRETURNBLOCK:%.*]]			; IR-NEXT: br i1 true, label [[LOOP1]], label [[DUMMYRETURNBLOCK:%.*]]
	; IR: loop2:			; IR: loop2:
	; IR-NEXT: store volatile i32 888, ptr addrspace(1) [[OUT]], align 4			; IR-NEXT: store volatile i32 888, ptr addrspace(1) [[OUT]], align 4
	; IR-NEXT: br i1 true, label [[LOOP2]], label [[DUMMYRETURNBLOCK]]			; IR-NEXT: br i1 true, label [[LOOP2]], label [[DUMMYRETURNBLOCK]]
	; IR: DummyReturnBlock:			; IR: DummyReturnBlock:
	; IR-NEXT: ret void			; IR-NEXT: ret void
				;
	entry:			entry:
	br i1 undef, label %loop1, label %loop2			br i1 undef, label %loop1, label %loop2

	loop1:			loop1:
	store volatile i32 999, ptr addrspace(1) %out, align 4			store volatile i32 999, ptr addrspace(1) %out, align 4
	br label %loop1			br label %loop1

	loop2:			loop2:
	; IR: inner_loop:			; IR: inner_loop:
	; IR-NEXT: store volatile i32 999, ptr addrspace(1) [[OUT:%.*]], align 4			; IR-NEXT: store volatile i32 999, ptr addrspace(1) [[OUT:%.*]], align 4
	; IR-NEXT: [[COND3:%.*]] = icmp eq i32 [[TMP]], 3			; IR-NEXT: [[COND3:%.*]] = icmp eq i32 [[TMP]], 3
	; IR-NEXT: br i1 true, label [[TRANSITIONBLOCK:%.*]], label [[UNIFIEDRETURNBLOCK]]			; IR-NEXT: br i1 true, label [[TRANSITIONBLOCK:%.*]], label [[UNIFIEDRETURNBLOCK]]
	; IR: TransitionBlock:			; IR: TransitionBlock:
	; IR-NEXT: br i1 [[COND3]], label [[INNER_LOOP]], label [[OUTER_LOOP]]			; IR-NEXT: br i1 [[COND3]], label [[INNER_LOOP]], label [[OUTER_LOOP]]
	; IR: UnifiedReturnBlock:			; IR: UnifiedReturnBlock:
	; IR-NEXT: ret void			; IR-NEXT: ret void
				;
	entry:			entry:
	%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()			%tmp = tail call i32 @llvm.amdgcn.workitem.id.x()
	%cond1 = icmp eq i32 %tmp, 1			%cond1 = icmp eq i32 %tmp, 1
	br i1 %cond1, label %outer_loop, label %return			br i1 %cond1, label %outer_loop, label %return

	outer_loop:			outer_loop:
	; %cond2 = icmp eq i32 %tmp, 2			; %cond2 = icmp eq i32 %tmp, 2
	; br i1 %cond2, label %outer_loop, label %inner_loop			; br i1 %cond2, label %outer_loop, label %inner_loop

llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll

	; test the case where there's only a kill in an infinite loop			; test the case where there's only a kill in an infinite loop
	define amdgpu_ps void @only_kill() #0 {			define amdgpu_ps void @only_kill() #0 {
	; CHECK-LABEL: only_kill:			; CHECK-LABEL: only_kill:
	; CHECK: ; %bb.0: ; %main_body			; CHECK: ; %bb.0: ; %main_body
	; CHECK-NEXT: s_mov_b64 s[0:1], exec			; CHECK-NEXT: s_mov_b64 s[0:1], exec
	; CHECK-NEXT: .LBB2_1: ; %loop			; CHECK-NEXT: .LBB2_1: ; %loop
	; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1			; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], exec			; CHECK-NEXT: s_andn2_b64 s[0:1], s[0:1], exec
	; CHECK-NEXT: s_cbranch_scc0 .LBB2_3			; CHECK-NEXT: s_cbranch_scc0 .LBB2_4
	; CHECK-NEXT: ; %bb.2: ; %loop			; CHECK-NEXT: ; %bb.2: ; %loop
	; CHECK-NEXT: ; in Loop: Header=BB2_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB2_1 Depth=1
	; CHECK-NEXT: s_mov_b64 exec, 0			; CHECK-NEXT: s_mov_b64 exec, 0
	; CHECK-NEXT: s_branch .LBB2_1			; CHECK-NEXT: s_mov_b64 vcc, exec
	; CHECK-NEXT: .LBB2_3:			; CHECK-NEXT: s_cbranch_execnz .LBB2_1
				; CHECK-NEXT: ; %bb.3: ; %DummyReturnBlock
				; CHECK-NEXT: s_endpgm
				; CHECK-NEXT: .LBB2_4:
	; CHECK-NEXT: s_mov_b64 exec, 0			; CHECK-NEXT: s_mov_b64 exec, 0
	; CHECK-NEXT: exp null off, off, off, off done vm			; CHECK-NEXT: exp null off, off, off, off done vm
	; CHECK-NEXT: s_endpgm			; CHECK-NEXT: s_endpgm
	main_body:			main_body:
	br label %loop			br label %loop

	loop:			loop:
	call void @llvm.amdgcn.kill(i1 false) #3			call void @llvm.amdgcn.kill(i1 false) #3
	br label %loop			br label %loop
	}			}


llvm/test/CodeGen/AMDGPU/loop-live-out-copy-undef-subrange.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck %s			; RUN: llc -mtriple=amdgcn-amd-amdhsa -mcpu=gfx906 -verify-machineinstrs < %s | FileCheck %s

	; This example used to produce a verifier error resulting from the			; This example used to produce a verifier error resulting from the
	; register coalescer leaving behind a false live interval when a live			; register coalescer leaving behind a false live interval when a live
	; out copy introduced new liveness for a subregister.			; out copy introduced new liveness for a subregister.

	define <3 x float> @liveout_undef_subrange(<3 x float> %arg) {			define <3 x float> @liveout_undef_subrange(<3 x float> %arg) {
	; CHECK-LABEL: liveout_undef_subrange:			; CHECK-LABEL: liveout_undef_subrange:
	; CHECK: ; %bb.0: ; %bb			; CHECK: ; %bb.0: ; %bb
	; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; CHECK-NEXT: v_add_f32_e32 v3, v2, v2			; CHECK-NEXT: v_add_f32_e32 v3, v2, v2
	; CHECK-NEXT: ; kill: killed $vgpr1
	; CHECK-NEXT: v_add_f32_e32 v0, v0, v0			; CHECK-NEXT: v_add_f32_e32 v0, v0, v0
	; CHECK-NEXT: .LBB0_1: ; %bb1
	; CHECK-NEXT: ; =>This Loop Header: Depth=1
	; CHECK-NEXT: ; Child Loop BB0_2 Depth 2
	; CHECK-NEXT: s_mov_b64 s[4:5], 0			; CHECK-NEXT: s_mov_b64 s[4:5], 0
	; CHECK-NEXT: .LBB0_2: ; %bb1			; CHECK-NEXT: ; kill: killed $vgpr1
	; CHECK-NEXT: ; Parent Loop BB0_1 Depth=1			; CHECK-NEXT: .LBB0_1: ; %bb1
	; CHECK-NEXT: ; => This Inner Loop Header: Depth=2			; CHECK-NEXT: ; =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: v_cmp_neq_f32_e32 vcc, 0, v2			; CHECK-NEXT: v_cmp_neq_f32_e32 vcc, 0, v2
	; CHECK-NEXT: s_or_b64 s[4:5], vcc, s[4:5]			; CHECK-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
	; CHECK-NEXT: s_andn2_b64 exec, exec, s[4:5]			; CHECK-NEXT: s_andn2_b64 exec, exec, s[4:5]
	; CHECK-NEXT: s_cbranch_execnz .LBB0_2			; CHECK-NEXT: s_cbranch_execnz .LBB0_1
	; CHECK-NEXT: ; %bb.3: ; %bb2			; CHECK-NEXT: ; %bb.2: ; %bb2
	; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1			; CHECK-NEXT: ; in Loop: Header=BB0_1 Depth=1
	; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]			; CHECK-NEXT: s_or_b64 exec, exec, s[4:5]
	; CHECK-NEXT: v_mul_f32_e32 v2, v3, v2			; CHECK-NEXT: v_mul_f32_e32 v2, v3, v2
	; CHECK-NEXT: s_branch .LBB0_1			; CHECK-NEXT: s_mov_b64 s[4:5], 0
				; CHECK-NEXT: s_cbranch_execnz .LBB0_1
				; CHECK-NEXT: ; %bb.3: ; %DummyReturnBlock
				; CHECK-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	br label %bb1			br label %bb1

	bb1: ; preds = %bb3, %bb			bb1: ; preds = %bb3, %bb
	%i = phi <3 x float> [ %arg, %bb ], [ %i11, %bb3 ]			%i = phi <3 x float> [ %arg, %bb ], [ %i11, %bb3 ]
	%i2 = extractelement <3 x float> %i, i64 2			%i2 = extractelement <3 x float> %i, i64 2
	%i3 = fmul float %i2, 1.000000e+00			%i3 = fmul float %i2, 1.000000e+00
	%i4 = fmul nsz <3 x float> %arg, <float 2.000000e+00, float 2.000000e+00, float 2.000000e+00>			%i4 = fmul nsz <3 x float> %arg, <float 2.000000e+00, float 2.000000e+00, float 2.000000e+00>

llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s			; RUN: llc -march=amdgcn -verify-machineinstrs < %s | FileCheck -check-prefix=GCN %s

	; GCN-LABEL: {{^}}negated_cond:			; GCN-LABEL: {{^}}negated_cond:
	; GCN: .LBB0_1:			; GCN: .LBB0_1:
	; GCN: v_cmp_eq_u32_e64 [[CC:[^,]+]],			; GCN: v_cmp_eq_u32_e64 [[CC:[^,]+]],
	; GCN: .LBB0_3:			; GCN: .LBB0_3:
	; GCN-NOT: v_cndmask_b32			; GCN-NOT: v_cndmask_b32
	; GCN-NOT: v_cmp			; GCN-NOT: v_cmp
	; GCN: s_andn2_b64 vcc, exec, [[CC]]			; GCN: s_andn2_b64 vcc, exec, [[CC]]
	; GCN: s_cbranch_vccnz .LBB0_2			; GCN: s_lshl_b32 s12, s12, 5
				; GCN: s_cbranch_vccz .LBB0_6
	define amdgpu_kernel void @negated_cond(ptr addrspace(1) %arg1) {			define amdgpu_kernel void @negated_cond(ptr addrspace(1) %arg1) {
	bb:			bb:
	br label %bb1			br label %bb1

	bb1:			bb1:
	%tmp1 = load i32, ptr addrspace(1) %arg1			%tmp1 = load i32, ptr addrspace(1) %arg1
	%tmp2 = icmp eq i32 %tmp1, 0			%tmp2 = icmp eq i32 %tmp1, 0
	br label %bb2			br label %bb2

llvm/test/CodeGen/AMDGPU/si-annotate-nested-control-flows.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: llc -mtriple=amdgcn-amd-amdhsa %s -o - | FileCheck %s

				define void @nested_inf_loop(i1 %0, i1 %1) {
				; CHECK-LABEL: nested_inf_loop:
				; CHECK-NEXT: %bb.0: ; %BB
				; CHECK-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
				; CHECK-NEXT: v_and_b32_e32 v1, 1, v1
				; CHECK-NEXT: v_and_b32_e32 v0, 1, v0
				; CHECK-NEXT: v_cmp_eq_u32_e64 s[4:5], 1, v1
				; CHECK-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
				; CHECK-NEXT: s_xor_b64 s[6:7], vcc, -1
				; CHECK-NEXT: s_mov_b64 s[8:9], 0
				; CHECK-NEXT: .LBB0_1: ; %BB1
				; CHECK: s_and_b64 s[10:11], exec, s[6:7]
				; CHECK-NEXT: s_or_b64 s[8:9], s[10:11], s[8:9]
				; CHECK-NEXT: s_andn2_b64 exec, exec, s[8:9]
				; CHECK-NEXT: s_cbranch_execnz .LBB0_1
				; CHECK-NEXT: %bb.2: ; %BB2
				; CHECK: s_or_b64 exec, exec, s[8:9]
				; CHECK-NEXT: s_mov_b64 s[8:9], 0
				; CHECK-NEXT: .LBB0_3: ; %BB4
				; CHECK: s_and_b64 s[10:11], exec, s[4:5]
				; CHECK-NEXT: s_or_b64 s[8:9], s[10:11], s[8:9]
				; CHECK-NEXT: s_andn2_b64 exec, exec, s[8:9]
				; CHECK-NEXT: s_cbranch_execnz .LBB0_3
				; CHECK-NEXT: %bb.4: ; %loop.exit.guard
				; CHECK: s_or_b64 exec, exec, s[8:9]
				; CHECK-NEXT: s_mov_b64 vcc, 0
				; CHECK-NEXT: s_mov_b64 s[8:9], 0
				; CHECK-NEXT: s_branch .LBB0_1
				; CHECK-NEXT: %bb.5: ; %DummyReturnBlock
				; CHECK-NEXT: s_setpc_b64 s[30:31]
				BB:
				br label %BB1

				BB1:
				br i1 %0, label %BB3, label %BB2

				BB2:
				br label %BB4

				BB4:
				br i1 %1, label %BB3, label %BB4

				BB3:
				br label %BB1
				}

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

	; GCN-NEXT: v_writelane_b32 v40, s42, 10			; GCN-NEXT: v_writelane_b32 v40, s42, 10
	; GCN-NEXT: v_writelane_b32 v40, s43, 11			; GCN-NEXT: v_writelane_b32 v40, s43, 11
	; GCN-NEXT: v_writelane_b32 v40, s44, 12			; GCN-NEXT: v_writelane_b32 v40, s44, 12
	; GCN-NEXT: v_writelane_b32 v40, s45, 13			; GCN-NEXT: v_writelane_b32 v40, s45, 13
	; GCN-NEXT: v_writelane_b32 v40, s46, 14			; GCN-NEXT: v_writelane_b32 v40, s46, 14
	; GCN-NEXT: v_writelane_b32 v40, s47, 15			; GCN-NEXT: v_writelane_b32 v40, s47, 15
	; GCN-NEXT: v_writelane_b32 v40, s48, 16			; GCN-NEXT: v_writelane_b32 v40, s48, 16
	; GCN-NEXT: v_writelane_b32 v40, s49, 17			; GCN-NEXT: v_writelane_b32 v40, s49, 17
				; GCN-NEXT: v_writelane_b32 v40, s50, 18
				; GCN-NEXT: v_writelane_b32 v40, s51, 19
				; GCN-NEXT: v_writelane_b32 v40, s52, 20
				; GCN-NEXT: v_writelane_b32 v40, s53, 21
				; GCN-NEXT: v_writelane_b32 v40, s54, 22
				; GCN-NEXT: v_writelane_b32 v40, s55, 23
				; GCN-NEXT: v_writelane_b32 v40, s56, 24
				; GCN-NEXT: v_writelane_b32 v40, s57, 25
	; GCN-NEXT: v_mov_b32_e32 v41, v31			; GCN-NEXT: v_mov_b32_e32 v41, v31
	; GCN-NEXT: s_mov_b32 s44, s15			; GCN-NEXT: s_mov_b32 s46, s15
	; GCN-NEXT: s_mov_b32 s45, s14			; GCN-NEXT: s_mov_b32 s47, s14
	; GCN-NEXT: s_mov_b32 s46, s13			; GCN-NEXT: s_mov_b32 s48, s13
	; GCN-NEXT: s_mov_b32 s47, s12			; GCN-NEXT: s_mov_b32 s49, s12
	; GCN-NEXT: s_mov_b64 s[34:35], s[10:11]			; GCN-NEXT: s_mov_b64 s[34:35], s[10:11]
	; GCN-NEXT: s_mov_b64 s[36:37], s[8:9]			; GCN-NEXT: s_mov_b64 s[36:37], s[8:9]
	; GCN-NEXT: s_mov_b64 s[38:39], s[6:7]			; GCN-NEXT: s_mov_b64 s[38:39], s[6:7]
	; GCN-NEXT: s_mov_b64 s[40:41], s[4:5]			; GCN-NEXT: s_mov_b64 s[40:41], s[4:5]
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_mov_b64 s[50:51], 0
	; GCN-NEXT: v_mov_b32_e32 v0, 0			; GCN-NEXT: v_mov_b32_e32 v0, 0
	; GCN-NEXT: v_mov_b32_e32 v1, 0			; GCN-NEXT: v_mov_b32_e32 v1, 0
	; GCN-NEXT: v_and_b32_e32 v2, 0x3ff, v41
	; GCN-NEXT: v_mov_b32_e32 v43, 0
	; GCN-NEXT: flat_load_dword v44, v[0:1]			; GCN-NEXT: flat_load_dword v44, v[0:1]
	; GCN-NEXT: v_mov_b32_e32 v45, 0x7fc00000			; GCN-NEXT: v_and_b32_e32 v0, 0x3ff, v41
	; GCN-NEXT: s_getpc_b64 s[48:49]			; GCN-NEXT: v_mov_b32_e32 v43, 0
	; GCN-NEXT: s_add_u32 s48, s48, spam@rel32@lo+4			; GCN-NEXT: v_lshlrev_b32_e32 v42, 2, v0
	; GCN-NEXT: s_addc_u32 s49, s49, spam@rel32@hi+12
	; GCN-NEXT: v_lshlrev_b32_e32 v42, 2, v2
	; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)			; GCN-NEXT: s_waitcnt vmcnt(0) lgkmcnt(0)
	; GCN-NEXT: v_cmp_eq_f32_e64 s[42:43], 0, v44			; GCN-NEXT: v_cmp_eq_f32_e64 s[52:53], 0, v44
	; GCN-NEXT: s_branch .LBB1_3			; GCN-NEXT: v_cmp_neq_f32_e64 s[42:43], 0, v44
	; GCN-NEXT: .LBB1_1: ; %bb10			; GCN-NEXT: v_mov_b32_e32 v45, 0x7fc00000
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: s_branch .LBB1_2
	; GCN-NEXT: s_or_b64 exec, exec, s[6:7]			; GCN-NEXT: .LBB1_1: ; %Flow7
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0			; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
	; GCN-NEXT: .LBB1_2: ; %bb18			; GCN-NEXT: s_or_b64 exec, exec, s[8:9]
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: s_and_b64 s[4:5], exec, s[4:5]
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0			; GCN-NEXT: s_or_b64 s[50:51], s[4:5], s[50:51]
	; GCN-NEXT: s_mov_b64 s[4:5], 0			; GCN-NEXT: s_andn2_b64 exec, exec, s[50:51]
	; GCN-NEXT: .LBB1_3: ; %bb2			; GCN-NEXT: s_cbranch_execz .LBB1_18
	; GCN-NEXT: ; =>This Loop Header: Depth=1			; GCN-NEXT: .LBB1_2: ; %bb2
	; GCN-NEXT: ; Child Loop BB1_4 Depth 2			; GCN-NEXT: ; =>This Inner Loop Header: Depth=1
	; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: .LBB1_4: ; %bb2
	; GCN-NEXT: ; Parent Loop BB1_3 Depth=1
	; GCN-NEXT: ; => This Inner Loop Header: Depth=2
	; GCN-NEXT: flat_load_dword v0, v[42:43]			; GCN-NEXT: flat_load_dword v0, v[42:43]
	; GCN-NEXT: buffer_store_dword v43, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v43, off, s[0:3], 0
				; GCN-NEXT: s_mov_b64 s[4:5], -1
	; GCN-NEXT: s_waitcnt vmcnt(1)			; GCN-NEXT: s_waitcnt vmcnt(1)
	; GCN-NEXT: v_cmp_gt_i32_e32 vcc, 3, v0			; GCN-NEXT: v_cmp_lt_i32_e32 vcc, 2, v0
	; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc
	; GCN-NEXT: s_cbranch_execz .LBB1_6
	; GCN-NEXT: ; %bb.5: ; %bb8
	; GCN-NEXT: ; in Loop: Header=BB1_4 Depth=2
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
	; GCN-NEXT: s_or_b64 s[6:7], vcc, s[6:7]
	; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: s_andn2_b64 exec, exec, s[6:7]
	; GCN-NEXT: s_cbranch_execnz .LBB1_4
	; GCN-NEXT: s_branch .LBB1_1
	; GCN-NEXT: .LBB1_6: ; %bb6
	; GCN-NEXT: ; in Loop: Header=BB1_4 Depth=2
	; GCN-NEXT: s_or_b64 exec, exec, s[8:9]
	; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 3, v0
	; GCN-NEXT: s_or_b64 s[4:5], vcc, s[4:5]
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: s_andn2_b64 exec, exec, s[4:5]			; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc
	; GCN-NEXT: s_cbranch_execnz .LBB1_4			; GCN-NEXT: s_xor_b64 s[54:55], exec, s[8:9]
	; GCN-NEXT: ; %bb.7: ; %bb11			; GCN-NEXT: s_cbranch_execz .LBB1_12
	; GCN-NEXT: ; in Loop: Header=BB1_4 Depth=2			; GCN-NEXT: ; %bb.3: ; %bb6
	; GCN-NEXT: s_or_b64 exec, exec, s[4:5]			; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: v_cmp_eq_u32_e64 s[44:45], 3, v0
				; GCN-NEXT: s_and_saveexec_b64 s[56:57], s[44:45]
				; GCN-NEXT: s_cbranch_execz .LBB1_11
				; GCN-NEXT: ; %bb.4: ; %bb11
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: s_getpc_b64 s[16:17]
				; GCN-NEXT: s_add_u32 s16, s16, spam@rel32@lo+4
				; GCN-NEXT: s_addc_u32 s17, s17, spam@rel32@hi+12
	; GCN-NEXT: s_mov_b64 s[4:5], s[40:41]			; GCN-NEXT: s_mov_b64 s[4:5], s[40:41]
	; GCN-NEXT: s_mov_b64 s[6:7], s[38:39]			; GCN-NEXT: s_mov_b64 s[6:7], s[38:39]
	; GCN-NEXT: s_mov_b64 s[8:9], s[36:37]			; GCN-NEXT: s_mov_b64 s[8:9], s[36:37]
	; GCN-NEXT: s_mov_b64 s[10:11], s[34:35]			; GCN-NEXT: s_mov_b64 s[10:11], s[34:35]
	; GCN-NEXT: s_mov_b32 s12, s47			; GCN-NEXT: s_mov_b32 s12, s49
	; GCN-NEXT: s_mov_b32 s13, s46			; GCN-NEXT: s_mov_b32 s13, s48
	; GCN-NEXT: s_mov_b32 s14, s45			; GCN-NEXT: s_mov_b32 s14, s47
	; GCN-NEXT: s_mov_b32 s15, s44			; GCN-NEXT: s_mov_b32 s15, s46
	; GCN-NEXT: v_mov_b32_e32 v31, v41			; GCN-NEXT: v_mov_b32_e32 v31, v41
	; GCN-NEXT: s_swappc_b64 s[30:31], s[48:49]			; GCN-NEXT: s_swappc_b64 s[30:31], s[16:17]
	; GCN-NEXT: v_cmp_eq_f32_e32 vcc, 0, v0			; GCN-NEXT: v_cmp_neq_f32_e32 vcc, 0, v0
	; GCN-NEXT: s_mov_b64 s[4:5], 0
	; GCN-NEXT: s_mov_b64 s[6:7], 0			; GCN-NEXT: s_mov_b64 s[6:7], 0
	; GCN-NEXT: s_and_saveexec_b64 s[8:9], vcc			; GCN-NEXT: s_and_saveexec_b64 s[4:5], vcc
	; GCN-NEXT: s_cbranch_execnz .LBB1_4			; GCN-NEXT: s_cbranch_execz .LBB1_10
	; GCN-NEXT: ; %bb.8: ; %bb14			; GCN-NEXT: ; %bb.5: ; %bb14
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: s_mov_b64 s[8:9], s[52:53]
				; GCN-NEXT: s_and_saveexec_b64 s[6:7], s[42:43]
				; GCN-NEXT: s_cbranch_execz .LBB1_7
				; GCN-NEXT: ; %bb.6: ; %bb16
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0
				; GCN-NEXT: s_or_b64 s[8:9], s[52:53], exec
				; GCN-NEXT: .LBB1_7: ; %Flow3
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: s_or_b64 exec, exec, s[6:7]
				; GCN-NEXT: s_mov_b64 s[6:7], 0
				; GCN-NEXT: s_and_saveexec_b64 s[10:11], s[8:9]
				; GCN-NEXT: s_xor_b64 s[8:9], exec, s[10:11]
				; GCN-NEXT: s_cbranch_execz .LBB1_9
				; GCN-NEXT: ; %bb.8: ; %bb17
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: s_mov_b64 s[6:7], exec
				; GCN-NEXT: buffer_store_dword v44, off, s[0:3], 0
				; GCN-NEXT: .LBB1_9: ; %Flow4
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
	; GCN-NEXT: s_or_b64 exec, exec, s[8:9]			; GCN-NEXT: s_or_b64 exec, exec, s[8:9]
	; GCN-NEXT: s_and_saveexec_b64 s[4:5], s[42:43]			; GCN-NEXT: s_and_b64 s[6:7], s[6:7], exec
	; GCN-NEXT: s_cbranch_execnz .LBB1_10			; GCN-NEXT: .LBB1_10: ; %Flow2
	; GCN-NEXT: ; %bb.9: ; %bb16			; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1
	; GCN-NEXT: s_or_b64 exec, exec, s[4:5]			; GCN-NEXT: s_or_b64 exec, exec, s[4:5]
				; GCN-NEXT: s_andn2_b64 s[4:5], s[44:45], exec
				; GCN-NEXT: s_and_b64 s[8:9], vcc, exec
				; GCN-NEXT: s_or_b64 s[44:45], s[4:5], s[8:9]
				; GCN-NEXT: s_and_b64 s[6:7], s[6:7], exec
				; GCN-NEXT: .LBB1_11: ; %Flow1
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: s_or_b64 exec, exec, s[56:57]
				; GCN-NEXT: s_orn2_b64 s[4:5], s[44:45], exec
				; GCN-NEXT: s_and_b64 s[6:7], s[6:7], exec
				; GCN-NEXT: ; implicit-def: $vgpr0
				; GCN-NEXT: .LBB1_12: ; %Flow
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: s_andn2_saveexec_b64 s[8:9], s[54:55]
				; GCN-NEXT: s_cbranch_execz .LBB1_16
				; GCN-NEXT: ; %bb.13: ; %bb8
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: v_cmp_eq_u32_e32 vcc, 1, v0
				; GCN-NEXT: s_mov_b64 s[10:11], s[6:7]
				; GCN-NEXT: s_and_saveexec_b64 s[12:13], vcc
				; GCN-NEXT: s_cbranch_execz .LBB1_15
				; GCN-NEXT: ; %bb.14: ; %bb10
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
	; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0			; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0
	; GCN-NEXT: .LBB1_10: ; %bb17			; GCN-NEXT: s_or_b64 s[10:11], s[6:7], exec
	; GCN-NEXT: ; in Loop: Header=BB1_3 Depth=1			; GCN-NEXT: .LBB1_15: ; %Flow6
	; GCN-NEXT: buffer_store_dword v44, off, s[0:3], 0			; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
	; GCN-NEXT: s_branch .LBB1_2			; GCN-NEXT: s_or_b64 exec, exec, s[12:13]
				; GCN-NEXT: s_andn2_b64 s[4:5], s[4:5], exec
				; GCN-NEXT: s_and_b64 s[12:13], vcc, exec
				; GCN-NEXT: s_andn2_b64 s[6:7], s[6:7], exec
				; GCN-NEXT: s_and_b64 s[10:11], s[10:11], exec
				; GCN-NEXT: s_or_b64 s[4:5], s[4:5], s[12:13]
				; GCN-NEXT: s_or_b64 s[6:7], s[6:7], s[10:11]
				; GCN-NEXT: .LBB1_16: ; %Flow5
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: s_or_b64 exec, exec, s[8:9]
				; GCN-NEXT: s_and_saveexec_b64 s[8:9], s[6:7]
				; GCN-NEXT: s_cbranch_execz .LBB1_1
				; GCN-NEXT: ; %bb.17: ; %bb18
				; GCN-NEXT: ; in Loop: Header=BB1_2 Depth=1
				; GCN-NEXT: buffer_store_dword v45, off, s[0:3], 0
				; GCN-NEXT: s_andn2_b64 s[4:5], s[4:5], exec
				; GCN-NEXT: s_branch .LBB1_1
				; GCN-NEXT: .LBB1_18: ; %DummyReturnBlock
				; GCN-NEXT: s_or_b64 exec, exec, s[50:51]
				; GCN-NEXT: v_readlane_b32 s57, v40, 25
				; GCN-NEXT: v_readlane_b32 s56, v40, 24
				; GCN-NEXT: v_readlane_b32 s55, v40, 23
				; GCN-NEXT: v_readlane_b32 s54, v40, 22
				; GCN-NEXT: v_readlane_b32 s53, v40, 21
				; GCN-NEXT: v_readlane_b32 s52, v40, 20
				; GCN-NEXT: v_readlane_b32 s51, v40, 19
				; GCN-NEXT: v_readlane_b32 s50, v40, 18
				; GCN-NEXT: v_readlane_b32 s49, v40, 17
				; GCN-NEXT: v_readlane_b32 s48, v40, 16
				; GCN-NEXT: v_readlane_b32 s47, v40, 15
				; GCN-NEXT: v_readlane_b32 s46, v40, 14
				; GCN-NEXT: v_readlane_b32 s45, v40, 13
				; GCN-NEXT: v_readlane_b32 s44, v40, 12
				; GCN-NEXT: v_readlane_b32 s43, v40, 11
				; GCN-NEXT: v_readlane_b32 s42, v40, 10
				; GCN-NEXT: v_readlane_b32 s41, v40, 9
				; GCN-NEXT: v_readlane_b32 s40, v40, 8
				; GCN-NEXT: v_readlane_b32 s39, v40, 7
				; GCN-NEXT: v_readlane_b32 s38, v40, 6
				; GCN-NEXT: v_readlane_b32 s37, v40, 5
				; GCN-NEXT: v_readlane_b32 s36, v40, 4
				; GCN-NEXT: v_readlane_b32 s35, v40, 3
				; GCN-NEXT: v_readlane_b32 s34, v40, 2
				; GCN-NEXT: v_readlane_b32 s31, v40, 1
				; GCN-NEXT: v_readlane_b32 s30, v40, 0
				; GCN-NEXT: buffer_load_dword v45, off, s[0:3], s33 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v44, off, s[0:3], s33 offset:4 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v43, off, s[0:3], s33 offset:8 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v42, off, s[0:3], s33 offset:12 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v41, off, s[0:3], s33 offset:16 ; 4-byte Folded Reload
				; GCN-NEXT: v_readlane_b32 s4, v46, 0
				; GCN-NEXT: s_or_saveexec_b64 s[6:7], -1
				; GCN-NEXT: buffer_load_dword v40, off, s[0:3], s33 offset:20 ; 4-byte Folded Reload
				; GCN-NEXT: buffer_load_dword v46, off, s[0:3], s33 offset:24 ; 4-byte Folded Reload
				; GCN-NEXT: s_mov_b64 exec, s[6:7]
				; GCN-NEXT: s_addk_i32 s32, 0xf800
				; GCN-NEXT: s_mov_b32 s33, s4
				; GCN-NEXT: s_waitcnt vmcnt(0)
				; GCN-NEXT: s_setpc_b64 s[30:31]
	bb:			bb:
	%tmp = load float, ptr null, align 16			%tmp = load float, ptr null, align 16
	br label %bb2			br label %bb2

	bb1: ; preds = %bb8, %bb6			bb1: ; preds = %bb8, %bb6
	br label %bb2			br label %bb2

	bb2:			bb2:
	[47 lines not shown]

llvm/test/CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll

	[28 lines not shown]
	; GCN-NEXT: s_and_saveexec_b32 s4, s4			; GCN-NEXT: s_and_saveexec_b32 s4, s4
	; GCN-NEXT: buffer_store_dword v0, v0, s[8:11], 0 offen			; GCN-NEXT: buffer_store_dword v0, v0, s[8:11], 0 offen
	; GCN-NEXT: ; implicit-def: $vgpr2_vgpr3_vgpr4_vgpr5			; GCN-NEXT: ; implicit-def: $vgpr2_vgpr3_vgpr4_vgpr5
	; GCN-NEXT: s_waitcnt_depctr 0xffe3			; GCN-NEXT: s_waitcnt_depctr 0xffe3
	; GCN-NEXT: s_xor_b32 exec_lo, exec_lo, s4			; GCN-NEXT: s_xor_b32 exec_lo, exec_lo, s4
	; GCN-NEXT: s_cbranch_execnz .LBB0_2			; GCN-NEXT: s_cbranch_execnz .LBB0_2
	; GCN-NEXT: ; %bb.3: ; in Loop: Header=BB0_1 Depth=1			; GCN-NEXT: ; %bb.3: ; in Loop: Header=BB0_1 Depth=1
	; GCN-NEXT: s_mov_b32 exec_lo, s5			; GCN-NEXT: s_mov_b32 exec_lo, s5
	; GCN-NEXT: s_branch .LBB0_1			; GCN-NEXT: s_mov_b32 vcc_lo, exec_lo
				; GCN-NEXT: s_cbranch_vccnz .LBB0_1
	;			;
	; GFX11-LABEL: vgpr_descriptor_waterfall_loop_idom_update:			; GFX11-LABEL: vgpr_descriptor_waterfall_loop_idom_update:
	; GFX11: ; %bb.0: ; %entry			; GFX11: ; %bb.0: ; %entry
	; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)			; GFX11-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	; GFX11-NEXT: s_waitcnt_vscnt null, 0x0			; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
	; GFX11-NEXT: .p2align 6			; GFX11-NEXT: .p2align 6
	; GFX11-NEXT: .LBB0_1: ; %bb0			; GFX11-NEXT: .LBB0_1: ; %bb0
	; GFX11-NEXT: ; =>This Loop Header: Depth=1			; GFX11-NEXT: ; =>This Loop Header: Depth=1
	[14 lines not shown]
	; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0			; GFX11-NEXT: s_and_b32 s0, vcc_lo, s0
	; GFX11-NEXT: s_and_saveexec_b32 s0, s0			; GFX11-NEXT: s_and_saveexec_b32 s0, s0
	; GFX11-NEXT: buffer_store_b32 v0, v0, s[4:7], 0 offen			; GFX11-NEXT: buffer_store_b32 v0, v0, s[4:7], 0 offen
	; GFX11-NEXT: ; implicit-def: $vgpr2_vgpr3_vgpr4_vgpr5			; GFX11-NEXT: ; implicit-def: $vgpr2_vgpr3_vgpr4_vgpr5
	; GFX11-NEXT: s_xor_b32 exec_lo, exec_lo, s0			; GFX11-NEXT: s_xor_b32 exec_lo, exec_lo, s0
	; GFX11-NEXT: s_cbranch_execnz .LBB0_2			; GFX11-NEXT: s_cbranch_execnz .LBB0_2
	; GFX11-NEXT: ; %bb.3: ; in Loop: Header=BB0_1 Depth=1			; GFX11-NEXT: ; %bb.3: ; in Loop: Header=BB0_1 Depth=1
	; GFX11-NEXT: s_mov_b32 exec_lo, s1			; GFX11-NEXT: s_mov_b32 exec_lo, s1
	; GFX11-NEXT: s_branch .LBB0_1			; GFX11-NEXT: s_delay_alu instid0(SALU_CYCLE_1)
				; GFX11-NEXT: s_mov_b32 vcc_lo, exec_lo
				; GFX11-NEXT: s_cbranch_vccnz .LBB0_1
				; GFX11-NEXT: ; %bb.4: ; %DummyReturnBlock
				; GFX11-NEXT: s_waitcnt_vscnt null, 0x0
				; GFX11-NEXT: s_setpc_b64 s[30:31]
	entry:			entry:
	br label %bb0			br label %bb0

	bb0:			bb0:
	%desc = load <4 x i32>, ptr %arg, align 8			%desc = load <4 x i32>, ptr %arg, align 8
	tail call void @llvm.amdgcn.raw.buffer.store.f32(float undef, <4 x i32> %desc, i32 undef, i32 0, i32 0)			tail call void @llvm.amdgcn.raw.buffer.store.f32(float undef, <4 x i32> %desc, i32 undef, i32 0, i32 0)
	br label %bb0			br label %bb0
	}			}

	declare void @llvm.amdgcn.raw.buffer.store.f32(float, <4 x i32>, i32, i32, i32 immarg) #0			declare void @llvm.amdgcn.raw.buffer.store.f32(float, <4 x i32>, i32, i32, i32 immarg) #0

	attributes #0 = { nounwind writeonly }			attributes #0 = { nounwind writeonly }

llvm/test/Transforms/LoopStrengthReduce/AMDGPU/different-addrspace-crash.ll

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: llc < %s | FileCheck %s			; RUN: llc < %s | FileCheck %s

	target triple = "amdgcn--"			target triple = "amdgcn--"

	; We need to compile this for a target where we have different address spaces,			; We need to compile this for a target where we have different address spaces,
	; and where pointers in those address spaces have different size.			; and where pointers in those address spaces have different size.
	; E.g. for amdgcn-- pointers in address space 0 are 32 bits and pointers in			; E.g. for amdgcn-- pointers in address space 0 are 32 bits and pointers in
	; address space 1 are 64 bits.			; address space 1 are 64 bits.

	; We shouldn't crash. Check that we get a loop with the two stores.			; We shouldn't crash. Check that we get a loop with the two stores.
	;CHECK-LABEL: foo:			;CHECK-LABEL: foo:
	;CHECK: [[LOOP_LABEL:.LBB[0-9]+_[0-9]+]]:			;CHECK: [[LOOP_LABEL:.LBB[0-9]+_[0-9]+]]:
	;CHECK: buffer_store_dword			;CHECK: buffer_store_dword
	;CHECK: buffer_store_dword			;CHECK: buffer_store_dword
	;CHECK: s_branch [[LOOP_LABEL]]			;CHECK: s_cbranch_vccnz [[LOOP_LABEL]]

	define amdgpu_kernel void @foo() {			define amdgpu_kernel void @foo() {
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%idx0 = phi i32 [ %next_idx0, %loop ], [ 0, %entry ]			%idx0 = phi i32 [ %next_idx0, %loop ], [ 0, %entry ]
	%0 = getelementptr inbounds i32, i32 addrspace(5)* null, i32 %idx0			%0 = getelementptr inbounds i32, i32 addrspace(5)* null, i32 %idx0
	%1 = getelementptr inbounds i32, i32 addrspace(1)* null, i32 %idx0			%1 = getelementptr inbounds i32, i32 addrspace(1)* null, i32 %idx0
	store i32 1, i32 addrspace(5)* %0			store i32 1, i32 addrspace(5)* %0
	store i32 7, i32 addrspace(1)* %1			store i32 7, i32 addrspace(1)* %1
	%next_idx0 = add nuw nsw i32 %idx0, 1			%next_idx0 = add nuw nsw i32 %idx0, 1
	br label %loop			br label %loop
	}			}

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Unify divergent nodes if the PostDom tree has one root
Closed, Public

Revision Contents

Diff 486325

llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp

llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll

llvm/test/CodeGen/AMDGPU/branch-relaxation.ll

llvm/test/CodeGen/AMDGPU/cf-loop-on-constant.ll

llvm/test/CodeGen/AMDGPU/control-flow-optnone.ll

llvm/test/CodeGen/AMDGPU/infinite-loop.ll

llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll

llvm/test/CodeGen/AMDGPU/loop-live-out-copy-undef-subrange.ll

llvm/test/CodeGen/AMDGPU/optimize-negated-cond.ll

llvm/test/CodeGen/AMDGPU/si-annotate-nested-control-flows.ll

llvm/test/CodeGen/AMDGPU/unstructured-cfg-def-use-issue.ll

llvm/test/CodeGen/AMDGPU/vgpr-descriptor-waterfall-loop-idom-update.ll

llvm/test/Transforms/LoopStrengthReduce/AMDGPU/different-addrspace-crash.ll
