This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation.
Closed · Public

Authored by abinavpp on Nov 2 2021, 4:30 AM.

Details

Reviewers

arsenm
foad
rampitec
vangthao

Commits

rGfbe61fb0aa23: [AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation.

Summary

The function that generates S_MOV_B64_IMM_PSEUDO was recently modified to
optimize AGPR-to-AGPR copies, but that change dropped the check for SGPR
clobbering when generating S_MOV_B64_IMM_PSEUDO.
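
For context, the immediate that S_MOV_B64_IMM_PSEUDO carries is assembled from the two 32-bit S_MOV_B32 immediates. The following is a minimal standalone sketch of that arithmetic, mirroring the `Init |= ...` lines in the diff; `combineHalves` is a hypothetical helper introduced here for illustration and is not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): models how the 64-bit immediate is
// built from the immediates written to sub0 (Lo) and sub1 (Hi).
inline uint64_t combineHalves(int64_t Lo, int64_t Hi) {
  uint64_t Init = 0;
  Init |= static_cast<uint64_t>(Lo) & 0xffffffff; // low 32 bits from sub0
  Init |= static_cast<uint64_t>(Hi) << 32;        // high 32 bits from sub1
  return Init;
}
```

With the values used in combine-sreg64-inits.mir (sub0 = 1, sub1 = 2), `combineHalves(1, 2)` gives 8589934593, which is the immediate the test expects. If another instruction also defines one of the halves, folding the two moves into one 64-bit move would clobber that value, which is why the pass has to bail out in that case.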

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

abinavpp created this revision. · Nov 2 2021, 4:30 AM

Herald added subscribers: kerbowa, hiraditya, t-tye and 6 others. · Nov 2 2021, 4:30 AM

abinavpp requested review of this revision. · Nov 2 2021, 4:30 AM

Herald added a project: Restricted Project. · Nov 2 2021, 4:30 AM

Herald added subscribers: llvm-commits, wdng.

This fixes the codegen of test_rocrand_kernel_xorwow.cpp of rocRAND (tracked by SWDEV-306338).

Harbormaster completed remote builds in B131939: Diff 384042. · Nov 2 2021, 5:20 AM

arsenm added inline comments. · Nov 2 2021, 7:07 AM

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
96–97	Can you just change the range loop to preincrement the iterator?
102–116	Checking for specific opcodes with specific subreg indices is definitely not the right way to check for register interference.
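
The pre-increment suggestion above refers to the usual early-increment idiom: advance the iterator before running the loop body, so the body can change or remove the current element without invalidating the traversal. Below is a minimal standalone sketch, assuming that invalidation is the concern; `std::list` stands in for the def list and `eraseOddsAndSum` is a made-up example function. In LLVM code, `llvm::make_early_inc_range` wraps a range to get the same effect.

```cpp
#include <cassert>
#include <list>

// Standalone illustration of early-increment iteration: the iterator is
// advanced before the body runs, so the body may erase the current element.
inline int eraseOddsAndSum(std::list<int> &Values) {
  int Sum = 0;
  for (auto It = Values.begin(), End = Values.end(); It != End;) {
    auto Cur = It++;     // step past the current element first
    if (*Cur % 2 != 0) {
      Values.erase(Cur); // safe: It no longer refers to Cur
      continue;
    }
    Sum += *Cur;
  }
  return Sum;
}
```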

vangthao added inline comments. · Nov 2 2021, 10:20 AM

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
102	Why do we check for COPY here? If Reg is not AGPR then we will not touch any COPY instructions.
119	Is there a reason why we need a new loop to check this? This should do the same as the removed lines of code below: if (Def0) return false; if (Def1) return false; That is, if there is more than one def of a subreg, then bail.
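
The equivalence claimed in the comment above can be sketched on a toy model that looks only at which subregister each def writes and ignores opcodes. All names below are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for the subregister index of a def (not LLVM's enum).
enum SubReg { Sub0, Sub1, Other };

// Variant 1: bail as soon as a second def of the same half is seen, in the
// shape of the removed `if (Def0) return false;` checks.
inline bool okEarlyBail(const std::vector<SubReg> &Defs) {
  bool Seen0 = false, Seen1 = false;
  for (SubReg S : Defs) {
    switch (S) {
    case Sub0:
      if (Seen0)
        return false;
      Seen0 = true;
      break;
    case Sub1:
      if (Seen1)
        return false;
      Seen1 = true;
      break;
    default:
      return false;
    }
  }
  return true;
}

// Variant 2: count the defs of each half first, then compare the counts, in
// the shape of the separate NumDef0/NumDef1 loop.
inline bool okCounting(const std::vector<SubReg> &Defs) {
  unsigned NumDef0 = 0, NumDef1 = 0;
  for (SubReg S : Defs) {
    switch (S) {
    case Sub0:
      ++NumDef0;
      break;
    case Sub1:
      ++NumDef1;
      break;
    default:
      return false;
    }
  }
  return NumDef0 <= 1 && NumDef1 <= 1;
}
```

In this model both variants accept and reject the same def lists. In the diff, the counting loop also counts COPY defs, whereas the `Def0`/`Def1` checks sit only in the S_MOV_B32 case; the toy model does not capture that difference.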

rampitec added inline comments. · Nov 2 2021, 10:37 AM

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp
178	If you do a separate loop I do not understand why this handling is still in the second loop.

We don't need the new loop; sorry for the noise. We just need to make sure that
we don't generate S_MOV_B64_IMM_PSEUDO when there is an SGPR COPY, as D104874
did.
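
A toy model of the approach described above: when the register is an SGPR, any COPY among its defs makes the combine bail out. This is only a sketch under stated assumptions and not the committed code; `Opc`, `Sub`, `Def`, and `tryCombine` are hypothetical names, and the model covers SGPR destinations only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical toy types; none of these are LLVM classes.
enum class Opc { SMovB32, Copy };
enum class Sub { Sub0, Sub1 };

struct Def {
  Opc Opcode;
  Sub SubReg;
  int64_t Imm; // only meaningful for SMovB32
};

// Returns true and sets Out when the defs of a 64-bit SGPR can be merged
// into one 64-bit immediate; returns false (bail out) otherwise.
inline bool tryCombine(const std::vector<Def> &Defs, uint64_t &Out) {
  bool Have0 = false, Have1 = false;
  uint64_t Init = 0;
  for (const Def &D : Defs) {
    if (D.Opcode == Opc::Copy)
      return false; // an SGPR COPY also defines the register: do not combine
    if (D.SubReg == Sub::Sub0) {
      if (Have0)
        return false; // second def of sub0
      Have0 = true;
      Init |= static_cast<uint64_t>(D.Imm) & 0xffffffff;
    } else {
      if (Have1)
        return false; // second def of sub1
      Have1 = true;
      Init |= static_cast<uint64_t>(D.Imm) << 32;
    }
  }
  if (!Have0 || !Have1)
    return false; // both halves must be initialized
  Out = Init;
  return true;
}
```

In this model, def lists shaped like the sreg64_subreg_copy_0, _1, and _2 tests in combine-sreg64-inits.mir are all rejected because each contains a COPY, while the plain pair of S_MOV_B32 defs (1 into sub0, 2 into sub1) combines to 8589934593.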

abinavpp retitled this revision from "[AMDGPU] Fix subreg checks in S_MOV_B64_IMM_PSEUDO generation." to "[AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation." · Nov 2 2021, 6:34 PM

abinavpp edited the summary of this revision.

Harbormaster completed remote builds in B132118: Diff 384297. · Nov 2 2021, 7:08 PM

LGTM.

This revision is now accepted and ready to land. · Nov 2 2021, 7:47 PM

Closed by commit rGfbe61fb0aa23: [AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation. (authored by abinavpp). · Nov 2 2021, 9:17 PM

This revision was automatically updated to reflect the committed changes.

abinavpp added a commit: rGfbe61fb0aa23: [AMDGPU] Fix SGPR checks in S_MOV_B64_IMM_PSEUDO generation.

vangthao mentioned this in D108830: [AMDGPU] Propagate defining src reg for AGPR to AGPR Copys. · Nov 3 2021, 1:57 AM

Revision Contents

Path	Size
llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp	35 lines
llvm/test/CodeGen/AMDGPU/agpr-to-agpr-copy.mir	91 lines
llvm/test/CodeGen/AMDGPU/combine-sreg64-inits.mir	103 lines

Diff 384042

llvm/lib/Target/AMDGPU/GCNPreRAOptimizations.cpp

FunctionPass *llvm::createGCNPreRAOptimizationsPass() {		FunctionPass *llvm::createGCNPreRAOptimizationsPass() {
return new GCNPreRAOptimizations();		return new GCNPreRAOptimizations();
}		}

bool GCNPreRAOptimizations::processReg(Register Reg) {		bool GCNPreRAOptimizations::processReg(Register Reg) {
MachineInstr *Def0 = nullptr;		MachineInstr *Def0 = nullptr;
MachineInstr *Def1 = nullptr;		MachineInstr *Def1 = nullptr;
uint64_t Init = 0;		uint64_t Init = 0, NumDef0 = 0, NumDef1 = 0;
bool Changed = false;		bool Changed = false;
SmallSet<Register, 32> ModifiedRegs;		SmallSet<Register, 32> ModifiedRegs;
bool IsAGPRDst = TRI->isAGPRClass(MRI->getRegClass(Reg));		bool IsAGPRDst = TRI->isAGPRClass(MRI->getRegClass(Reg));

		// Bail out if emitting AMDGPU::S_MOV_B64_IMM_PSEUDO will clobber the SGPR
		// (sub)reg.
		//
		// Note: It's tricky to do this in the next def_instructions() loop since the
		// AMDGPU::COPY handling there can potentially modify the IR. TODO: Can we
		// move the AGPR copy optimization somewhere else?
		[Inline comment, arsenm: Can you just change the range loop to preincrement the iterator?]
		if (!IsAGPRDst) {
		for (MachineInstr &I : MRI->def_instructions(Reg)) {
		unsigned Opc = I.getOpcode();
		if (Opc == AMDGPU::COPY || Opc == AMDGPU::S_MOV_B32) {
		[Inline comment, vangthao: Why do we check for COPY here? If Reg is not AGPR then we will not touch any COPY instructions.]
		switch (I.getOperand(0).getSubReg()) {
		default:
		return false;

		case AMDGPU::sub0:
		++NumDef0;
		break;

		case AMDGPU::sub1:
		++NumDef1;
		break;
		}
		}
		}
		[Inline comment, arsenm: Checking for specific opcodes with specific subreg indices is definitely not the right way to check for register interference.]

		if (NumDef0 > 1 || NumDef1 > 1)
		return false;
		[Inline comment, vangthao: Is there a reason why we need a new loop to check this? This should do the same as the removed lines of code below: if (Def0) return false; if (Def1) return false; As in if there are more than one def of a subreg then bail.]
		}

for (MachineInstr &I : MRI->def_instructions(Reg)) {		for (MachineInstr &I : MRI->def_instructions(Reg)) {
switch (I.getOpcode()) {		switch (I.getOpcode()) {
default:		default:
return false;		return false;
case AMDGPU::V_ACCVGPR_WRITE_B32_e64:		case AMDGPU::V_ACCVGPR_WRITE_B32_e64:
break;		break;
case AMDGPU::COPY: {		case AMDGPU::COPY: {
// Some subtargets cannot do an AGPR to AGPR copy directly, and need an		// Some subtargets cannot do an AGPR to AGPR copy directly, and need an
Show All 40 Lines	case AMDGPU::COPY: {
}		}

// Found the defining accvgpr_write, stop looking any further.		// Found the defining accvgpr_write, stop looking any further.
break;		break;
}		}
}		}
break;		break;
}		}
case AMDGPU::S_MOV_B32:		case AMDGPU::S_MOV_B32:
		[Inline comment, rampitec: If you do a separate loop I do not understand why this handling is still in the second loop.]
if (I.getOperand(0).getReg() != Reg || !I.getOperand(1).isImm() ||		if (I.getOperand(0).getReg() != Reg || !I.getOperand(1).isImm() ||
I.getNumOperands() != 2)		I.getNumOperands() != 2)
return false;		return false;

switch (I.getOperand(0).getSubReg()) {		switch (I.getOperand(0).getSubReg()) {
default:		default:
return false;		return false;
case AMDGPU::sub0:		case AMDGPU::sub0:
if (Def0)
return false;
Def0 = &I;		Def0 = &I;
Init |= I.getOperand(1).getImm() & 0xffffffff;		Init |= I.getOperand(1).getImm() & 0xffffffff;
break;		break;
case AMDGPU::sub1:		case AMDGPU::sub1:
if (Def1)
return false;
Def1 = &I;		Def1 = &I;
Init |= static_cast<uint64_t>(I.getOperand(1).getImm()) << 32;		Init |= static_cast<uint64_t>(I.getOperand(1).getImm()) << 32;
break;		break;
}		}
break;		break;
}		}
}		}

llvm/test/CodeGen/AMDGPU/agpr-to-agpr-copy.mir

	# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py			# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -mcpu=gfx908 -run-pass=liveintervals,amdgpu-pre-ra-optimizations -verify-machineinstrs %s -o - | FileCheck -check-prefix=GFX908 %s			# RUN: llc -march=amdgcn -mcpu=gfx908 -run-pass=liveintervals,amdgpu-pre-ra-optimizations -verify-machineinstrs %s -o - | FileCheck -check-prefix=GFX908 %s

	---			---
	name: test_mfma_f32_4x4x1f32_propagate_vgpr			name: test_mfma_f32_4x4x1f32_propagate_vgpr
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr0_sgpr1			liveins: $sgpr0_sgpr1
	; GFX908-LABEL: name: test_mfma_f32_4x4x1f32_propagate_vgpr			; GFX908-LABEL: name: test_mfma_f32_4x4x1f32_propagate_vgpr
	; GFX908: liveins: $sgpr0_sgpr1			; GFX908: liveins: $sgpr0_sgpr1
	; GFX908: [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr0_sgpr1			; GFX908-NEXT: {{ $}}
	; GFX908: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]](p4), 36, 0 :: (dereferenceable invariant load (s64), align 4, addrspace 4)			; GFX908-NEXT: [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr0_sgpr1
	; GFX908: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX908-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]](p4), 36, 0 :: (dereferenceable invariant load (s64), align 4, addrspace 4)
	; GFX908: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1123418112, implicit $exec			; GFX908-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX908: undef %4.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 [[V_MOV_B32_e32_1]], implicit $exec			; GFX908-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1123418112, implicit $exec
	; GFX908: %4.sub1:areg_128 = COPY [[V_MOV_B32_e32_1]]			; GFX908-NEXT: undef %4.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 [[V_MOV_B32_e32_1]], implicit $exec
	; GFX908: %4.sub2:areg_128 = COPY [[V_MOV_B32_e32_1]]			; GFX908-NEXT: %4.sub1:areg_128 = COPY [[V_MOV_B32_e32_1]]
	; GFX908: %4.sub3:areg_128 = COPY [[V_MOV_B32_e32_1]]			; GFX908-NEXT: %4.sub2:areg_128 = COPY [[V_MOV_B32_e32_1]]
	; GFX908: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec			; GFX908-NEXT: %4.sub3:areg_128 = COPY [[V_MOV_B32_e32_1]]
	; GFX908: [[V_MOV_B32_e32_3:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec			; GFX908-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec
	; GFX908: [[V_MFMA_F32_4X4X1F32_e64_:%[0-9]+]]:areg_128 = V_MFMA_F32_4X4X1F32_e64 [[V_MOV_B32_e32_3]], [[V_MOV_B32_e32_2]], %4, 0, 0, 0, implicit $mode, implicit $exec			; GFX908-NEXT: [[V_MOV_B32_e32_3:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec
	; GFX908: [[COPY1:%[0-9]+]]:vreg_128 = COPY [[V_MFMA_F32_4X4X1F32_e64_]]			; GFX908-NEXT: [[V_MFMA_F32_4X4X1F32_e64_:%[0-9]+]]:areg_128 = V_MFMA_F32_4X4X1F32_e64 [[V_MOV_B32_e32_3]], [[V_MOV_B32_e32_2]], %4, 0, 0, 0, implicit $mode, implicit $exec
	; GFX908: GLOBAL_STORE_DWORDX4_SADDR [[V_MOV_B32_e32_]], [[COPY1]], [[S_LOAD_DWORDX2_IMM]], 0, 0, implicit $exec :: (store (s128), addrspace 1)			; GFX908-NEXT: [[COPY1:%[0-9]+]]:vreg_128 = COPY [[V_MFMA_F32_4X4X1F32_e64_]]
	; GFX908: S_ENDPGM 0			; GFX908-NEXT: GLOBAL_STORE_DWORDX4_SADDR [[V_MOV_B32_e32_]], [[COPY1]], [[S_LOAD_DWORDX2_IMM]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
				; GFX908-NEXT: S_ENDPGM 0
	%1:sgpr_64(p4) = COPY $sgpr0_sgpr1			%1:sgpr_64(p4) = COPY $sgpr0_sgpr1
	%4:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 36, 0 :: (dereferenceable invariant load (s64), align 4, addrspace 4)			%4:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 36, 0 :: (dereferenceable invariant load (s64), align 4, addrspace 4)
	%5:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			%5:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	%13:vgpr_32 = V_MOV_B32_e32 1123418112, implicit $exec			%13:vgpr_32 = V_MOV_B32_e32 1123418112, implicit $exec
	undef %11.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 %13:vgpr_32, implicit $exec			undef %11.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 %13:vgpr_32, implicit $exec
	%11.sub1:areg_128 = COPY %11.sub0:areg_128			%11.sub1:areg_128 = COPY %11.sub0:areg_128
	%11.sub2:areg_128 = COPY %11.sub0:areg_128			%11.sub2:areg_128 = COPY %11.sub0:areg_128
	%11.sub3:areg_128 = COPY %11.sub0:areg_128			%11.sub3:areg_128 = COPY %11.sub0:areg_128
	%8:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec			%8:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec
	%9:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec			%9:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec
	%10:areg_128 = V_MFMA_F32_4X4X1F32_e64 %9:vgpr_32, %8:vgpr_32, %11:areg_128, 0, 0, 0, implicit $mode, implicit $exec			%10:areg_128 = V_MFMA_F32_4X4X1F32_e64 %9:vgpr_32, %8:vgpr_32, %11:areg_128, 0, 0, 0, implicit $mode, implicit $exec
	%12:vreg_128 = COPY %10:areg_128			%12:vreg_128 = COPY %10:areg_128
	GLOBAL_STORE_DWORDX4_SADDR %5:vgpr_32, %12:vreg_128, %4:sreg_64_xexec, 0, 0, implicit $exec :: (store (s128), addrspace 1)			GLOBAL_STORE_DWORDX4_SADDR %5:vgpr_32, %12:vreg_128, %4:sreg_64_xexec, 0, 0, implicit $exec :: (store (s128), addrspace 1)
	S_ENDPGM 0			S_ENDPGM 0
	...			...
	---			---
	name: test_mfma_f32_4x4x1f32_no_propagate_imm			name: test_mfma_f32_4x4x1f32_no_propagate_imm
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $sgpr0_sgpr1			liveins: $sgpr0_sgpr1
	; GFX908-LABEL: name: test_mfma_f32_4x4x1f32_no_propagate_imm			; GFX908-LABEL: name: test_mfma_f32_4x4x1f32_no_propagate_imm
	; GFX908: liveins: $sgpr0_sgpr1			; GFX908: liveins: $sgpr0_sgpr1
	; GFX908: [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr0_sgpr1			; GFX908-NEXT: {{ $}}
	; GFX908: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]](p4), 36, 0 :: (dereferenceable invariant load (s64), align 4, addrspace 4)			; GFX908-NEXT: [[COPY:%[0-9]+]]:sgpr_64(p4) = COPY $sgpr0_sgpr1
	; GFX908: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			; GFX908-NEXT: [[S_LOAD_DWORDX2_IMM:%[0-9]+]]:sreg_64_xexec = S_LOAD_DWORDX2_IMM [[COPY]](p4), 36, 0 :: (dereferenceable invariant load (s64), align 4, addrspace 4)
	; GFX908: undef %3.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 1073741824, implicit $exec			; GFX908-NEXT: [[V_MOV_B32_e32_:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	; GFX908: %3.sub1:areg_128 = COPY %3.sub0			; GFX908-NEXT: undef %3.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 1073741824, implicit $exec
	; GFX908: %3.sub2:areg_128 = COPY %3.sub0			; GFX908-NEXT: %3.sub1:areg_128 = COPY %3.sub0
	; GFX908: %3.sub3:areg_128 = COPY %3.sub0			; GFX908-NEXT: %3.sub2:areg_128 = COPY %3.sub0
	; GFX908: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec			; GFX908-NEXT: %3.sub3:areg_128 = COPY %3.sub0
	; GFX908: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec			; GFX908-NEXT: [[V_MOV_B32_e32_1:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec
	; GFX908: [[V_MFMA_F32_4X4X1F32_e64_:%[0-9]+]]:areg_128 = V_MFMA_F32_4X4X1F32_e64 [[V_MOV_B32_e32_2]], [[V_MOV_B32_e32_1]], %3, 0, 0, 0, implicit $mode, implicit $exec			; GFX908-NEXT: [[V_MOV_B32_e32_2:%[0-9]+]]:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec
	; GFX908: [[COPY1:%[0-9]+]]:vreg_128 = COPY [[V_MFMA_F32_4X4X1F32_e64_]]			; GFX908-NEXT: [[V_MFMA_F32_4X4X1F32_e64_:%[0-9]+]]:areg_128 = V_MFMA_F32_4X4X1F32_e64 [[V_MOV_B32_e32_2]], [[V_MOV_B32_e32_1]], %3, 0, 0, 0, implicit $mode, implicit $exec
	; GFX908: GLOBAL_STORE_DWORDX4_SADDR [[V_MOV_B32_e32_]], [[COPY1]], [[S_LOAD_DWORDX2_IMM]], 0, 0, implicit $exec :: (store (s128), addrspace 1)			; GFX908-NEXT: [[COPY1:%[0-9]+]]:vreg_128 = COPY [[V_MFMA_F32_4X4X1F32_e64_]]
	; GFX908: S_ENDPGM 0			; GFX908-NEXT: GLOBAL_STORE_DWORDX4_SADDR [[V_MOV_B32_e32_]], [[COPY1]], [[S_LOAD_DWORDX2_IMM]], 0, 0, implicit $exec :: (store (s128), addrspace 1)
				; GFX908-NEXT: S_ENDPGM 0
	%1:sgpr_64(p4) = COPY $sgpr0_sgpr1			%1:sgpr_64(p4) = COPY $sgpr0_sgpr1
	%4:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 36, 0 :: (dereferenceable invariant load (s64), align 4, addrspace 4)			%4:sreg_64_xexec = S_LOAD_DWORDX2_IMM %1:sgpr_64(p4), 36, 0 :: (dereferenceable invariant load (s64), align 4, addrspace 4)
	%5:vgpr_32 = V_MOV_B32_e32 0, implicit $exec			%5:vgpr_32 = V_MOV_B32_e32 0, implicit $exec
	undef %11.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 1073741824, implicit $exec			undef %11.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 1073741824, implicit $exec
	%11.sub1:areg_128 = COPY %11.sub0:areg_128			%11.sub1:areg_128 = COPY %11.sub0:areg_128
	%11.sub2:areg_128 = COPY %11.sub0:areg_128			%11.sub2:areg_128 = COPY %11.sub0:areg_128
	%11.sub3:areg_128 = COPY %11.sub0:areg_128			%11.sub3:areg_128 = COPY %11.sub0:areg_128
	%8:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec			%8:vgpr_32 = V_MOV_B32_e32 1073741824, implicit $exec
	%9:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec			%9:vgpr_32 = V_MOV_B32_e32 1065353216, implicit $exec
	%10:areg_128 = V_MFMA_F32_4X4X1F32_e64 %9:vgpr_32, %8:vgpr_32, %11:areg_128, 0, 0, 0, implicit $mode, implicit $exec			%10:areg_128 = V_MFMA_F32_4X4X1F32_e64 %9:vgpr_32, %8:vgpr_32, %11:areg_128, 0, 0, 0, implicit $mode, implicit $exec
	%12:vreg_128 = COPY %10:areg_128			%12:vreg_128 = COPY %10:areg_128
	GLOBAL_STORE_DWORDX4_SADDR %5:vgpr_32, %12:vreg_128, %4:sreg_64_xexec, 0, 0, implicit $exec :: (store (s128), addrspace 1)			GLOBAL_STORE_DWORDX4_SADDR %5:vgpr_32, %12:vreg_128, %4:sreg_64_xexec, 0, 0, implicit $exec :: (store (s128), addrspace 1)
	S_ENDPGM 0			S_ENDPGM 0
	...			...
	---			---
	name: test_vgpr_subreg_propagate			name: test_vgpr_subreg_propagate
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1_vgpr2_vgpr3			liveins: $vgpr0_vgpr1_vgpr2_vgpr3
	; GFX908-LABEL: name: test_vgpr_subreg_propagate			; GFX908-LABEL: name: test_vgpr_subreg_propagate
	; GFX908: liveins: $vgpr0_vgpr1_vgpr2_vgpr3			; GFX908: liveins: $vgpr0_vgpr1_vgpr2_vgpr3
	; GFX908: [[COPY:%[0-9]+]]:vreg_128 = COPY $vgpr0_vgpr1_vgpr2_vgpr3, implicit $exec			; GFX908-NEXT: {{ $}}
	; GFX908: undef %1.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub0, implicit $exec			; GFX908-NEXT: [[COPY:%[0-9]+]]:vreg_128 = COPY $vgpr0_vgpr1_vgpr2_vgpr3, implicit $exec
	; GFX908: %1.sub1:areg_128 = COPY [[COPY]].sub0			; GFX908-NEXT: undef %1.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub0, implicit $exec
	; GFX908: %1.sub2:areg_128 = COPY [[COPY]].sub0			; GFX908-NEXT: %1.sub1:areg_128 = COPY [[COPY]].sub0
	; GFX908: %1.sub3:areg_128 = COPY [[COPY]].sub0			; GFX908-NEXT: %1.sub2:areg_128 = COPY [[COPY]].sub0
	; GFX908: S_ENDPGM 0, implicit [[COPY]], implicit %1			; GFX908-NEXT: %1.sub3:areg_128 = COPY [[COPY]].sub0
				; GFX908-NEXT: S_ENDPGM 0, implicit [[COPY]], implicit %1
	%0:vreg_128 = COPY $vgpr0_vgpr1_vgpr2_vgpr3, implicit $exec			%0:vreg_128 = COPY $vgpr0_vgpr1_vgpr2_vgpr3, implicit $exec
	undef %1.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 %0.sub0, implicit $exec			undef %1.sub0:areg_128 = V_ACCVGPR_WRITE_B32_e64 %0.sub0, implicit $exec
	%1.sub1:areg_128 = COPY %1.sub0:areg_128			%1.sub1:areg_128 = COPY %1.sub0:areg_128
	%1.sub2:areg_128 = COPY %1.sub0:areg_128			%1.sub2:areg_128 = COPY %1.sub0:areg_128
	%1.sub3:areg_128 = COPY %1.sub0:areg_128			%1.sub3:areg_128 = COPY %1.sub0:areg_128
	S_ENDPGM 0, implicit %0, implicit %1			S_ENDPGM 0, implicit %0, implicit %1
	...			...
	---			---
	name: test_nonmatching_agpr_subreg_no_propagate			name: test_nonmatching_agpr_subreg_no_propagate
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1			liveins: $vgpr0_vgpr1
	; GFX908-LABEL: name: test_nonmatching_agpr_subreg_no_propagate			; GFX908-LABEL: name: test_nonmatching_agpr_subreg_no_propagate
	; GFX908: liveins: $vgpr0_vgpr1			; GFX908: liveins: $vgpr0_vgpr1
	; GFX908: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1, implicit $exec			; GFX908-NEXT: {{ $}}
	; GFX908: undef %1.sub0:areg_64 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub0, implicit $exec			; GFX908-NEXT: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1, implicit $exec
	; GFX908: %1.sub1:areg_64 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub1, implicit $exec			; GFX908-NEXT: undef %1.sub0:areg_64 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub0, implicit $exec
	; GFX908: [[COPY1:%[0-9]+]]:areg_64 = COPY %1			; GFX908-NEXT: %1.sub1:areg_64 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub1, implicit $exec
	; GFX908: S_ENDPGM 0, implicit [[COPY]], implicit %1, implicit [[COPY1]]			; GFX908-NEXT: [[COPY1:%[0-9]+]]:areg_64 = COPY %1
				; GFX908-NEXT: S_ENDPGM 0, implicit [[COPY]], implicit %1, implicit [[COPY1]]
	%0:vreg_64 = COPY $vgpr0_vgpr1, implicit $exec			%0:vreg_64 = COPY $vgpr0_vgpr1, implicit $exec
	undef %1.sub0:areg_64 = V_ACCVGPR_WRITE_B32_e64 %0.sub0, implicit $exec			undef %1.sub0:areg_64 = V_ACCVGPR_WRITE_B32_e64 %0.sub0, implicit $exec
	%1.sub1:areg_64 = V_ACCVGPR_WRITE_B32_e64 %0.sub1, implicit $exec			%1.sub1:areg_64 = V_ACCVGPR_WRITE_B32_e64 %0.sub1, implicit $exec
	%2:areg_64 = COPY %1:areg_64			%2:areg_64 = COPY %1:areg_64
	S_ENDPGM 0, implicit %0, implicit %1, implicit %2			S_ENDPGM 0, implicit %0, implicit %1, implicit %2
	...			...
	---			---
	name: test_subreg_to_single_agpr_reg_propagate			name: test_subreg_to_single_agpr_reg_propagate
	tracksRegLiveness: true			tracksRegLiveness: true

	body: |			body: |
	bb.0:			bb.0:
	liveins: $vgpr0_vgpr1			liveins: $vgpr0_vgpr1
	; GFX908-LABEL: name: test_subreg_to_single_agpr_reg_propagate			; GFX908-LABEL: name: test_subreg_to_single_agpr_reg_propagate
	; GFX908: liveins: $vgpr0_vgpr1			; GFX908: liveins: $vgpr0_vgpr1
	; GFX908: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1, implicit $exec			; GFX908-NEXT: {{ $}}
	; GFX908: undef %1.sub0:areg_64 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub0, implicit $exec			; GFX908-NEXT: [[COPY:%[0-9]+]]:vreg_64 = COPY $vgpr0_vgpr1, implicit $exec
	; GFX908: %1.sub1:areg_64 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub1, implicit $exec			; GFX908-NEXT: undef %1.sub0:areg_64 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub0, implicit $exec
	; GFX908: [[COPY1:%[0-9]+]]:agpr_32 = COPY [[COPY]].sub1			; GFX908-NEXT: %1.sub1:areg_64 = V_ACCVGPR_WRITE_B32_e64 [[COPY]].sub1, implicit $exec
	; GFX908: S_ENDPGM 0, implicit [[COPY]], implicit %1, implicit [[COPY1]]			; GFX908-NEXT: [[COPY1:%[0-9]+]]:agpr_32 = COPY [[COPY]].sub1
				; GFX908-NEXT: S_ENDPGM 0, implicit [[COPY]], implicit %1, implicit [[COPY1]]
	%0:vreg_64 = COPY $vgpr0_vgpr1, implicit $exec			%0:vreg_64 = COPY $vgpr0_vgpr1, implicit $exec
	undef %1.sub0:areg_64 = V_ACCVGPR_WRITE_B32_e64 %0.sub0, implicit $exec			undef %1.sub0:areg_64 = V_ACCVGPR_WRITE_B32_e64 %0.sub0, implicit $exec
	%1.sub1:areg_64 = V_ACCVGPR_WRITE_B32_e64 %0.sub1, implicit $exec			%1.sub1:areg_64 = V_ACCVGPR_WRITE_B32_e64 %0.sub1, implicit $exec
	%2:agpr_32 = COPY %1.sub1:areg_64			%2:agpr_32 = COPY %1.sub1:areg_64
	S_ENDPGM 0, implicit %0, implicit %1, implicit %2			S_ENDPGM 0, implicit %0, implicit %1, implicit %2
	...			...

llvm/test/CodeGen/AMDGPU/combine-sreg64-inits.mir

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
	# RUN: llc -march=amdgcn -verify-machineinstrs -run-pass=liveintervals,amdgpu-pre-ra-optimizations %s -o - | FileCheck -check-prefix=GCN %s			# RUN: llc -march=amdgcn -verify-machineinstrs -run-pass=liveintervals,amdgpu-pre-ra-optimizations %s -o - | FileCheck -check-prefix=GCN %s

	---			---
	# GCN-LABEL: name: combine_sreg64_inits
	# GCN: %0:sgpr_64 = S_MOV_B64_IMM_PSEUDO 8589934593
	# GCN: S_NOP 0
	name: combine_sreg64_inits			name: combine_sreg64_inits
	tracksRegLiveness: true			tracksRegLiveness: true
	body: |			body: |
	bb.0:			bb.0:
				; GCN-LABEL: name: combine_sreg64_inits
				; GCN: dead %0:sgpr_64 = S_MOV_B64_IMM_PSEUDO 8589934593
				; GCN-NEXT: S_NOP 0
	undef %0.sub0:sgpr_64 = S_MOV_B32 1			undef %0.sub0:sgpr_64 = S_MOV_B32 1
	S_NOP 0			S_NOP 0
	%0.sub1:sgpr_64 = S_MOV_B32 2			%0.sub1:sgpr_64 = S_MOV_B32 2
	...			...
	---			---
	# GCN-LABEL: name: combine_sreg64_inits_swap
	# GCN: %0:sgpr_64 = S_MOV_B64_IMM_PSEUDO 8589934593
	# GCN: S_NOP 0
	name: combine_sreg64_inits_swap			name: combine_sreg64_inits_swap
	tracksRegLiveness: true			tracksRegLiveness: true
	body: |			body: |
	bb.0:			bb.0:
				; GCN-LABEL: name: combine_sreg64_inits_swap
				; GCN: dead %0:sgpr_64 = S_MOV_B64_IMM_PSEUDO 8589934593
				; GCN-NEXT: S_NOP 0
	undef %0.sub1:sgpr_64 = S_MOV_B32 2			undef %0.sub1:sgpr_64 = S_MOV_B32 2
	S_NOP 0			S_NOP 0
	%0.sub0:sgpr_64 = S_MOV_B32 1			%0.sub0:sgpr_64 = S_MOV_B32 1
	...			...
	---			---
	# GCN-LABEL: name: sreg64_inits_different_blocks			name: sreg64_subreg_copy_0
	# GCN: undef %0.sub0:sgpr_64 = S_MOV_B32 1			tracksRegLiveness: true
	# GCN: %0.sub1:sgpr_64 = S_MOV_B32 2			body: |
				bb.0:
				; GCN-LABEL: name: sreg64_subreg_copy_0
				; GCN: [[DEF:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
				; GCN-NEXT: undef %1.sub0:sgpr_64 = COPY [[DEF]]
				; GCN-NEXT: %1.sub0:sgpr_64 = S_MOV_B32 1
				; GCN-NEXT: dead %1.sub1:sgpr_64 = S_MOV_B32 2
				%0:sgpr_32 = IMPLICIT_DEF
				undef %1.sub0:sgpr_64 = COPY %0:sgpr_32
				%1.sub0:sgpr_64 = S_MOV_B32 1
				%1.sub1:sgpr_64 = S_MOV_B32 2
				...
				---
				name: sreg64_subreg_copy_1
				tracksRegLiveness: true
				body: |
				bb.0:
				; GCN-LABEL: name: sreg64_subreg_copy_1
				; GCN: [[DEF:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
				; GCN-NEXT: undef %1.sub0:sgpr_64 = S_MOV_B32 1
				; GCN-NEXT: %1.sub1:sgpr_64 = COPY [[DEF]]
				; GCN-NEXT: dead %1.sub1:sgpr_64 = S_MOV_B32 2
				%0:sgpr_32 = IMPLICIT_DEF
				undef %1.sub0:sgpr_64 = S_MOV_B32 1
				%1.sub1:sgpr_64 = COPY %0:sgpr_32
				%1.sub1:sgpr_64 = S_MOV_B32 2
				...
				---
				name: sreg64_subreg_copy_2
				tracksRegLiveness: true
				body: |
				bb.0:
				; GCN-LABEL: name: sreg64_subreg_copy_2
				; GCN: [[DEF:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
				; GCN-NEXT: undef %1.sub0:sgpr_64 = S_MOV_B32 1
				; GCN-NEXT: %1.sub1:sgpr_64 = S_MOV_B32 2
				; GCN-NEXT: dead %1.sub0:sgpr_64 = COPY [[DEF]]
				%0:sgpr_32 = IMPLICIT_DEF
				undef %1.sub0:sgpr_64 = S_MOV_B32 1
				%1.sub1:sgpr_64 = S_MOV_B32 2
				%1.sub0:sgpr_64 = COPY %0:sgpr_32
				...
				---
	name: sreg64_inits_different_blocks			name: sreg64_inits_different_blocks
	tracksRegLiveness: true			tracksRegLiveness: true
	body: |			body: |
				; GCN-LABEL: name: sreg64_inits_different_blocks
				; GCN: bb.0:
				; GCN-NEXT: successors: %bb.1(0x80000000)
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: undef %0.sub0:sgpr_64 = S_MOV_B32 1
				; GCN-NEXT: {{ $}}
				; GCN-NEXT: bb.1:
				; GCN-NEXT: dead %0.sub1:sgpr_64 = S_MOV_B32 2
	bb.0:			bb.0:
	undef %0.sub0:sgpr_64 = S_MOV_B32 1			undef %0.sub0:sgpr_64 = S_MOV_B32 1

	bb.1:			bb.1:
	%0.sub1:sgpr_64 = S_MOV_B32 2			%0.sub1:sgpr_64 = S_MOV_B32 2
	...			...
	---			---
	# GCN-LABEL: name: sreg64_inits_two_defs_sub1
	# GCN: undef %0.sub0:sgpr_64 = S_MOV_B32 1
	# GCN: %0.sub1:sgpr_64 = S_MOV_B32 2
	# GCN: %0.sub1:sgpr_64 = S_MOV_B32 3
	name: sreg64_inits_two_defs_sub1			name: sreg64_inits_two_defs_sub1
	tracksRegLiveness: true			tracksRegLiveness: true
	body: |			body: |
	bb.0:			bb.0:
				; GCN-LABEL: name: sreg64_inits_two_defs_sub1
				; GCN: undef %0.sub0:sgpr_64 = S_MOV_B32 1
				; GCN-NEXT: %0.sub1:sgpr_64 = S_MOV_B32 2
				; GCN-NEXT: dead %0.sub1:sgpr_64 = S_MOV_B32 3
	undef %0.sub0:sgpr_64 = S_MOV_B32 1			undef %0.sub0:sgpr_64 = S_MOV_B32 1
	%0.sub1:sgpr_64 = S_MOV_B32 2			%0.sub1:sgpr_64 = S_MOV_B32 2
	%0.sub1:sgpr_64 = S_MOV_B32 3			%0.sub1:sgpr_64 = S_MOV_B32 3
	...			...
	---			---
	# GCN-LABEL: name: sreg64_inits_two_defs_sub0
	# GCN: undef %0.sub0:sgpr_64 = S_MOV_B32 1
	# GCN: %0.sub1:sgpr_64 = S_MOV_B32 2
	# GCN: %0.sub0:sgpr_64 = S_MOV_B32 3
	name: sreg64_inits_two_defs_sub0			name: sreg64_inits_two_defs_sub0
	tracksRegLiveness: true			tracksRegLiveness: true
	body: |			body: |
	bb.0:			bb.0:
				; GCN-LABEL: name: sreg64_inits_two_defs_sub0
				; GCN: undef %0.sub0:sgpr_64 = S_MOV_B32 1
				; GCN-NEXT: %0.sub1:sgpr_64 = S_MOV_B32 2
				; GCN-NEXT: dead %0.sub0:sgpr_64 = S_MOV_B32 3
	undef %0.sub0:sgpr_64 = S_MOV_B32 1			undef %0.sub0:sgpr_64 = S_MOV_B32 1
	%0.sub1:sgpr_64 = S_MOV_B32 2			%0.sub1:sgpr_64 = S_MOV_B32 2
	%0.sub0:sgpr_64 = S_MOV_B32 3			%0.sub0:sgpr_64 = S_MOV_B32 3
	...			...
	---			---
	# GCN-LABEL: name: sreg64_inits_full_def
	# GCN: undef %1.sub0:sgpr_64 = S_MOV_B32 1
	# GCN: %0:sgpr_64 = S_MOV_B64 3
	name: sreg64_inits_full_def			name: sreg64_inits_full_def
	tracksRegLiveness: true			tracksRegLiveness: true
	body: |			body: |
	bb.0:			bb.0:
				; GCN-LABEL: name: sreg64_inits_full_def
				; GCN: dead undef %1.sub0:sgpr_64 = S_MOV_B32 1
				; GCN-NEXT: dead %0:sgpr_64 = S_MOV_B64 3
	undef %0.sub0:sgpr_64 = S_MOV_B32 1			undef %0.sub0:sgpr_64 = S_MOV_B32 1
	%0:sgpr_64 = S_MOV_B64 3			%0:sgpr_64 = S_MOV_B64 3
	...			...
	---			---
	# GCN-LABEL: name: sreg64_inits_imp_use
	# GCN: %0.sub0:sgpr_64 = S_MOV_B32 1, implicit $m0
	# GCN: %0.sub1:sgpr_64 = S_MOV_B32 2
	name: sreg64_inits_imp_use			name: sreg64_inits_imp_use
	tracksRegLiveness: true			tracksRegLiveness: true
	body: |			body: |
	bb.0:			bb.0:
				; GCN-LABEL: name: sreg64_inits_imp_use
				; GCN: undef %0.sub0:sgpr_64 = S_MOV_B32 1, implicit $m0
				; GCN-NEXT: dead %0.sub1:sgpr_64 = S_MOV_B32 2
	undef %0.sub0:sgpr_64 = S_MOV_B32 1, implicit $m0			undef %0.sub0:sgpr_64 = S_MOV_B32 1, implicit $m0
	%0.sub1:sgpr_64 = S_MOV_B32 2			%0.sub1:sgpr_64 = S_MOV_B32 2
	...			...
	---			---
	# GCN-LABEL: name: sreg64_inits_imp_def
	# GCN: %0.sub0:sgpr_64 = S_MOV_B32 1, implicit-def $scc
	# GCN: %0.sub1:sgpr_64 = S_MOV_B32 2
	name: sreg64_inits_imp_def			name: sreg64_inits_imp_def
	tracksRegLiveness: true			tracksRegLiveness: true
	body: |			body: |
	bb.0:			bb.0:
				; GCN-LABEL: name: sreg64_inits_imp_def
				; GCN: undef %0.sub0:sgpr_64 = S_MOV_B32 1, implicit-def $scc
				; GCN-NEXT: dead %0.sub1:sgpr_64 = S_MOV_B32 2
	undef %0.sub0:sgpr_64 = S_MOV_B32 1, implicit-def $scc			undef %0.sub0:sgpr_64 = S_MOV_B32 1, implicit-def $scc
	%0.sub1:sgpr_64 = S_MOV_B32 2			%0.sub1:sgpr_64 = S_MOV_B32 2
	...			...