
Details

Reviewers

arsenm
tpr

Commits

rG96e51ed005a9: [AMDGPU] Implement copyPhysReg for 16 bit subregs

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rampitec created this revision. Feb 20 2020, 4:51 PM

Herald added a project: Restricted Project. Feb 20 2020, 4:51 PM

Herald added subscribers: kerbowa, hiraditya, t-tye and 7 others.

rampitec added a parent revision: D74873: [AMDGPU] Define 16 bit VGPR subregs. Feb 20 2020, 4:51 PM

I don't think copies of these should ever be produced (at least for the high half), since the high half is not really addressable and only appears that way to some instructions. Where are copies coming from?

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line 701: V_PACK_B32_F16 has some FP flushing properties and is not suitable for a copy. I think you have to do essentially what D74740 does.

In D74937#1885659, @arsenm wrote:

I don't think copies of these should ever be produced (at least for the high half), since the high half is not really addressable and only appears that way to some instructions. Where are copies coming from?

First, hi16 registers are used by load_hi instructions, that is their destination. And then RA can happily copy anything to anything. For sanity we need to know how to copy any register.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line 701: I cannot do it here: I would need to scavenge a physreg for a mask, whether I use v_perm_b32 (if available) or v_bfi_b32. In fact, I do not see a good instruction for this if v_pack_b32 does not work.
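For illustration, a behavioural C++ sketch of the v_bfi_b32 route, assuming the ISA definition D = (S0 & S1) | (~S0 & S2). The helper copyLoToLoWithBfi is hypothetical, not LLVM code; its 0x0000ffff constant is the mask that, per the comment, would have to be held in a scavenged register.

```cpp
#include <cassert>
#include <cstdint>

// Model of V_BFI_B32 (bitfield insert): D = (S0 & S1) | (~S0 & S2).
// Bits set in the mask S0 come from S1; the remaining bits come from S2.
constexpr uint32_t v_bfi_b32(uint32_t s0Mask, uint32_t s1, uint32_t s2) {
  return (s0Mask & s1) | (~s0Mask & s2);
}

// Hypothetical helper: copy src's low half into dst's low half and keep
// dst's high half. LoMask is the value that would need a scavenged register.
constexpr uint32_t copyLoToLoWithBfi(uint32_t dst, uint32_t src) {
  constexpr uint32_t LoMask = 0x0000ffffu;
  return v_bfi_b32(LoMask, src, dst);
}
```

With dst = 0xAAAA1111 and src = 0x2222BBBB the helper yields 0xAAAABBBB: one instruction, but only because the mask is already in a register.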

arsenm added inline comments. Feb 20 2020, 5:16 PM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line 701: Yes, there are definitely missing instructions to handle this well. I think you can use V_ALIGNBIT_B32 without an extra register in a subset of cases.

rampitec marked an inline comment as done. Feb 20 2020, 5:22 PM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line 701: It does not work for the case that is needed most: copying low to low. In fact, it does not help at all.
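A behavioural sketch of v_alignbit_b32, assuming D = ({S0, S1} >> S2[4:0])[31:0], shows the low-to-low problem. The wanted result is {d.hi, s.lo}, but with a shift of 16 a single instruction yields either {s.lo, d.hi} (right halves, swapped) or {d.lo, s.hi} (wrong halves). The function name only mirrors the mnemonic; this is a model, not backend code.

```cpp
#include <cassert>
#include <cstdint>

// Model of V_ALIGNBIT_B32: D = ({S0, S1} >> S2[4:0])[31:0].
// S0 forms the high 32 bits of a 64-bit funnel and S1 the low 32 bits.
constexpr uint32_t v_alignbit_b32(uint32_t s0, uint32_t s1, uint32_t s2) {
  return uint32_t(((uint64_t(s0) << 32) | s1) >> (s2 & 31));
}

// For a lo -> lo copy the wanted value is {d.hi, s.lo} (written {hi, lo}).
// With a 16-bit shift:
//   v_alignbit_b32(s, d, 16) == {s.lo, d.hi}   (right halves, swapped)
//   v_alignbit_b32(d, s, 16) == {d.lo, s.hi}   (wrong halves)
```

The swapped result is why a second, rotating instruction is needed; the committed expansion does exactly that with two v_alignbyte_b32 (a byte shift of 2 equals a bit shift of 16).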

In D74937#1885675, @rampitec wrote:

First, hi16 registers are used by load_hi instructions, that is their destination. And then RA can happily copy anything to anything. For sanity we need to know how to copy any register.

The high result isn't what's encoded, though, so they really are writing the 32-bit register. They only read the low 16 bits. I think the correct way to model this is a 32-bit write but only a 16-bit read.

In D74937#1885689, @arsenm wrote:

The high result isn't what's encoded, though, so they really are writing the 32-bit register. They only read the low 16 bits. I think the correct way to model this is a 32-bit write but only a 16-bit read.

The low 16 bits are preserved, and if we say we write 32 bits then we cannot model that.

In D74937#1885691, @rampitec wrote:

The low 16 bits are preserved, and if we say we write 32 bits then we cannot model that.

I think declaring the high 16 bits as the output register is still wrong, and not how it's encoded. Having only the 16-bit read is still an improvement.

In D74937#1885695, @arsenm wrote:

I think declaring the high 16 bits as the output register is still wrong, and not how it's encoded. Having only the 16-bit read is still an improvement.

The test case I showed in a parent revision only works if I define both, and for a reason: that is the only way we can model independent subreg access.

rampitec marked an inline comment as done. Feb 20 2020, 5:47 PM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line 701: Another thing that doesn't work is mov with SDWA. It needs dst_preserve, which in turn needs a tied operand. If pack doesn't work, the only general solution I can think of is extremely ugly: clear the destination bits with two shifts, then use v_or_b16 with op_sel.

rampitec marked an inline comment as done. Feb 20 2020, 5:55 PM

rampitec added inline comments.

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line 701: Oh, wait... We do not have v_or_b16. Then v_add_u16 with op_sel; AFAIR it will preserve the other half of the destination.
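Both ideas in this exchange (SDWA with dst_preserve, and v_add_u16 with op_sel) rely on writing one 16-bit half while keeping the other. The C++ below is a behavioural sketch under that assumption, not a statement of hardware semantics: half, writeHalf and addU16Copy are hypothetical names, and the preserving behaviour of op_sel is taken from the comment's own "AFAIR". It also shows why preservation needs a tied operand: the old destination value becomes an input.

```cpp
#include <cassert>
#include <cstdint>

enum class Half { Lo, Hi };

// Extract one 16-bit half of a 32-bit VGPR value.
constexpr uint16_t half(uint32_t reg, Half h) {
  return h == Half::Lo ? uint16_t(reg) : uint16_t(reg >> 16);
}

// Write a 16-bit result into one half of a 32-bit destination.
// preserve = true : the other half keeps oldDst's bits, so oldDst is a real
//                   input (what a tied operand expresses in MIR).
// preserve = false: the other half is zero-filled.
constexpr uint32_t writeHalf(uint32_t oldDst, uint16_t value, Half dstHalf,
                             bool preserve) {
  const uint32_t keep = preserve ? oldDst : 0u;
  return dstHalf == Half::Lo
             ? (keep & 0xffff0000u) | value
             : (keep & 0x0000ffffu) | (uint32_t(value) << 16);
}

// Hypothetical "v_add_u16 with op_sel" copy: add 0 to the selected source
// half and store the sum in the selected destination half, assuming the
// other destination half is preserved.
constexpr uint32_t addU16Copy(uint32_t dst, uint32_t src, Half srcHalf,
                              Half dstHalf) {
  const uint16_t sum = uint16_t(half(src, srcHalf) + 0u);
  return writeHalf(dst, sum, dstHalf, /*preserve=*/true);
}
```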

Changed expansion to pairs of binary instructions.
Verified expansion with RT test.

arsenm added inline comments. Feb 28 2020, 7:35 AM

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
Line 717: Braces
Line 723: Braces
llvm/test/CodeGen/AMDGPU/lo16-hi16-physreg-copy.mir
Line 106: Needs some tests with both halves in the same 32-bit register. Also needs some with kill and undef flag handling.

Added braces.

llvm/test/CodeGen/AMDGPU/lo16-hi16-physreg-copy.mir
Line 106: Good catch about the same VGPR. I will need some special handling for it.

rampitec planned changes to this revision. Feb 28 2020, 12:23 PM

Added handling of DestReg == SrcReg.
Added test for killed/undef/full reg partial copy.
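A sketch of the DestReg == SrcReg case, modelling v_pk_add_u16 with src1 = 0 so that each result half is just the src0 half selected by op_sel (low result) or op_sel_hi (high result). The function names are hypothetical; the op_sel settings are the ones listed in the code comment for this case. Copying lo to lo or hi to hi within the same register needs no instruction at all.

```cpp
#include <cassert>
#include <cstdint>

// Model of V_PK_ADD_U16 with src1 = 0 and no clamp/neg modifiers.
// Each 16-bit result half is (selected src0 half) + 0, i.e. a plain copy.
//   opSel0   (op_sel[0])    : src0 half feeding the low result half.
//   opSelHi0 (op_sel_hi[0]) : src0 half feeding the high result half.
constexpr uint32_t pkAddU16WithZero(uint32_t src0, bool opSel0, bool opSelHi0) {
  const uint16_t lo = uint16_t(opSel0 ? src0 >> 16 : src0);
  const uint16_t hi = uint16_t(opSelHi0 ? src0 >> 16 : src0);
  return (uint32_t(hi) << 16) | lo;
}

// Same-register copies as listed in the code comment:
//   l -> h : v_pk_add_u16 v1, v1, 0 op_sel_hi:[0,0]
//   h -> l : v_pk_add_u16 v1, v1, 0 op_sel:[1,0] op_sel_hi:[1,0]
constexpr uint32_t copyLoToHiSameReg(uint32_t v) {
  return pkAddU16WithZero(v, /*opSel0=*/false, /*opSelHi0=*/false);
}
constexpr uint32_t copyHiToLoSameReg(uint32_t v) {
  return pkAddU16WithZero(v, /*opSel0=*/true, /*opSelHi0=*/true);
}
```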

Uploaded full context diff.

Ping.

rampitec added a reviewer: tpr. Apr 7 2020, 1:14 PM

Rebased.

arsenm accepted this revision. Apr 7 2020, 2:05 PM

This revision is now accepted and ready to land. Apr 7 2020, 2:05 PM

Closed by commit rG96e51ed005a9: [AMDGPU] Implement copyPhysReg for 16 bit subregs (authored by rampitec). Apr 7 2020, 2:44 PM

This revision was automatically updated to reflect the committed changes.

Diff 255820

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp

(673 unchanged lines not shown; enclosing scope: if (!AMDGPU::VGPR_32RegClass.contains(SrcReg)) {)
       return;
     }
 
     BuildMI(MBB, MI, DL, get(AMDGPU::V_ACCVGPR_WRITE_B32), DestReg)
       .addReg(SrcReg, getKillRegState(KillSrc));
     return;
   }
 
+  if (RC == &AMDGPU::VGPR_LO16RegClass || RC == &AMDGPU::VGPR_HI16RegClass) {
+    assert(AMDGPU::VGPR_LO16RegClass.contains(SrcReg) ||
+           AMDGPU::VGPR_HI16RegClass.contains(SrcReg));
+
+    //             d          s
+    // l -> l : hhhhxxxx : xxxxllll -> v_alignbyte_b32 d, s, d, 2
+    //          llllhhhh : xxxxllll -> v_alignbyte_b32 d, d, d, 2
+    // l -> h : xxxxllll : xxxxhhhh -> v_lshlrev_b32   d, 16, d
+    //          llll0000 : xxxxhhhh -> v_alignbyte_b32 d, s, d, 2
+    // h -> l : hhhhxxxx : llllxxxx -> v_lshrrev_b32   d, 16, d
+    //          0000hhhh : llllxxxx -> v_alignbyte_b32 d, d, s, 2
+    // h -> h : xxxxllll : hhhhxxxx -> v_alignbyte_b32 d, d, s, 2
+    //          llllhhhh : hhhhxxxx -> v_alignbyte_b32 d, d, d, 2
+
+    bool DstLow = RC == &AMDGPU::VGPR_LO16RegClass;
+    bool SrcLow = AMDGPU::VGPR_LO16RegClass.contains(SrcReg);
+    DestReg = RI.getMatchingSuperReg(DestReg,
+                                     DstLow ? AMDGPU::lo16 : AMDGPU::hi16,
+                                     &AMDGPU::VGPR_32RegClass);
+    SrcReg = RI.getMatchingSuperReg(SrcReg,
+                                    SrcLow ? AMDGPU::lo16 : AMDGPU::hi16,
+                                    &AMDGPU::VGPR_32RegClass);
+
+    if (DestReg == SrcReg) {
+      // l -> h : v_pk_add_u16 v1, v1, 0 op_sel_hi:[0,0]
+      // h -> l : v_pk_add_u16 v1, v1, 0 op_sel:[1,0] op_sel_hi:[1,0]
+      if (DstLow == SrcLow)
+        return;
+      BuildMI(MBB, MI, DL, get(AMDGPU::V_PK_ADD_U16), DestReg)
+        .addImm(DstLow ? SISrcMods::OP_SEL_0 | SISrcMods::OP_SEL_1 : 0)
+        .addReg(DestReg, RegState::Undef)
+        .addImm(0) // src1_mod
+        .addImm(0) // src1
+        .addImm(0)
+        .addImm(0)
+        .addImm(0)
+        .addImm(0)
+        .addImm(0);
+
+      return;
+    }
+
+    // Last instruction first:
+    auto Last = BuildMI(MBB, MI, DL, get(AMDGPU::V_ALIGNBYTE_B32), DestReg)
+      .addReg((SrcLow && !DstLow) ? SrcReg : DestReg,
+              (SrcLow && !DstLow) ? getKillRegState(KillSrc) : 0)
+      .addReg((!SrcLow && DstLow) ? SrcReg : DestReg,
+              (!SrcLow && DstLow) ? getKillRegState(KillSrc) : 0)
+      .addImm(2);
+
+    unsigned OpcFirst = (DstLow == SrcLow) ? AMDGPU::V_ALIGNBYTE_B32
+                        : SrcLow ? AMDGPU::V_LSHRREV_B32_e32
+                                 : AMDGPU::V_LSHLREV_B32_e32;
+    auto First = BuildMI(MBB, &*Last, DL, get(OpcFirst), DestReg);
+    if (DstLow == SrcLow) { // alignbyte
+      First.addReg(SrcLow ? SrcReg : DestReg,
+                   SrcLow ? getKillRegState(KillSrc) : RegState::Undef)
+           .addReg(SrcLow ? DestReg : SrcReg,
+                   SrcLow ? RegState::Undef : getKillRegState(KillSrc))
+           .addImm(2);
+    } else {
+      First.addImm(16)
+           .addReg(DestReg, RegState::Undef);
+    }
+
+    return;
+  }
+
   unsigned EltSize = 4;
   unsigned Opcode = AMDGPU::V_MOV_B32_e32;
   if (RI.isSGPRClass(RC)) {
     // TODO: Copy vec3/vec5 with s_mov_b64s then final s_mov_b32.
     if (!(RI.getRegSizeInBits(*RC) % 64)) {
       Opcode = AMDGPU::S_MOV_B64;
       EltSize = 8;
     } else {
(6,055 unchanged lines not shown)
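The comment table in the new copyPhysReg block can be checked with a small C++ model of register values. It assumes the ISA definitions of the three instructions (v_alignbyte_b32: D = ({S0, S1} >> (8 * S2[1:0]))[31:0]; the *rev shifts take the shift amount as their first source). The function names are hypothetical, and nothing here builds LLVM MachineInstrs.

```cpp
#include <cassert>
#include <cstdint>

// Behavioural models of the three VALU instructions used by the expansion.
constexpr uint32_t v_alignbyte_b32(uint32_t s0, uint32_t s1, uint32_t bytes) {
  // {S0, S1} is a 64-bit funnel with S0 in the high half.
  return uint32_t(((uint64_t(s0) << 32) | s1) >> (8 * (bytes & 3)));
}
constexpr uint32_t v_lshlrev_b32(uint32_t amt, uint32_t v) { return v << (amt & 31); }
constexpr uint32_t v_lshrrev_b32(uint32_t amt, uint32_t v) { return v >> (amt & 31); }

// The four sequences from the comment table; d and s are different VGPRs.
// Intermediate values are written {hi, lo} in terms of the original halves.
constexpr uint32_t loToLo(uint32_t d, uint32_t s) {
  d = v_alignbyte_b32(s, d, 2);     // {s.lo, d.hi}
  return v_alignbyte_b32(d, d, 2);  // {d.hi, s.lo}
}
constexpr uint32_t loToHi(uint32_t d, uint32_t s) {
  d = v_lshlrev_b32(16, d);         // {d.lo, 0}
  return v_alignbyte_b32(s, d, 2);  // {s.lo, d.lo}
}
constexpr uint32_t hiToLo(uint32_t d, uint32_t s) {
  d = v_lshrrev_b32(16, d);         // {0, d.hi}
  return v_alignbyte_b32(d, s, 2);  // {d.hi, s.hi}
}
constexpr uint32_t hiToHi(uint32_t d, uint32_t s) {
  d = v_alignbyte_b32(d, s, 2);     // {d.lo, s.hi}
  return v_alignbyte_b32(d, d, 2);  // {s.hi, d.lo}
}
```

With d = 0xAAAA1111 and s = 0x2222BBBB the four functions return 0xAAAABBBB, 0xBBBB1111, 0xAAAA2222 and 0x22221111: one half of d is replaced and the other survives. Note that OpcFirst in the diff picks V_LSHRREV_B32_e32 when SrcLow is set (and the lo_to_hi test checks for v_lshrrev_b32), whereas the table pairs l -> h with v_lshlrev_b32. Under this model the lshrrev variant returns 0xBBBB0000 for the same inputs, losing d's low half, so the shift direction in that selection appears worth re-checking against the ISA manual.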

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp

(1,273 unchanged lines not shown; enclosing scope: StringRef SIRegisterInfo::getRegAsmName(unsigned Reg) const {)
   return AMDGPUInstPrinter::getRegisterName(Reg);
 }
 
 // FIXME: This is very slow. It might be worth creating a map from physreg to
 // register class.
 const TargetRegisterClass *
 SIRegisterInfo::getPhysRegClass(MCRegister Reg) const {
   static const TargetRegisterClass *const BaseClasses[] = {
+    &AMDGPU::VGPR_LO16RegClass,
+    &AMDGPU::VGPR_HI16RegClass,
     &AMDGPU::VGPR_32RegClass,
     &AMDGPU::SReg_32RegClass,
     &AMDGPU::AGPR_32RegClass,
     &AMDGPU::VReg_64RegClass,
     &AMDGPU::SReg_64RegClass,
     &AMDGPU::AReg_64RegClass,
     &AMDGPU::VReg_96RegClass,
     &AMDGPU::SReg_96RegClass,
(23 unchanged lines not shown; enclosing scope: SIRegisterInfo::getPhysRegClass(MCRegister Reg) const {)
   return nullptr;
 }
 
 // TODO: It might be helpful to have some target specific flags in
 // TargetRegisterClass to mark which classes are VGPRs to make this trivial.
 bool SIRegisterInfo::hasVGPRs(const TargetRegisterClass *RC) const {
   unsigned Size = getRegSizeInBits(*RC);
   switch (Size) {
+  case 16:
+    return getCommonSubClass(&AMDGPU::VGPR_LO16RegClass, RC) != nullptr ||
+           getCommonSubClass(&AMDGPU::VGPR_HI16RegClass, RC) != nullptr;
   case 32:
     return getCommonSubClass(&AMDGPU::VGPR_32RegClass, RC) != nullptr;
   case 64:
     return getCommonSubClass(&AMDGPU::VReg_64RegClass, RC) != nullptr;
   case 96:
     return getCommonSubClass(&AMDGPU::VReg_96RegClass, RC) != nullptr;
   case 128:
     return getCommonSubClass(&AMDGPU::VReg_128RegClass, RC) != nullptr;
(629 unchanged lines not shown)

llvm/test/CodeGen/AMDGPU/lo16-hi16-physreg-copy.mir

This file was added.

# RUN: llc -march=amdgcn -mcpu=gfx900 -start-before postrapseudos -asm-verbose=0 -verify-machineinstrs %s -o - | FileCheck -check-prefix=GCN %s

# GCN-LABEL: {{^}}lo_to_lo:
# GCN: v_alignbyte_b32 v1, v0, v1, 2
# GCN-NEXT: v_alignbyte_b32 v1, v1, v1, 2
name: lo_to_lo
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr1_lo16 = COPY $vgpr0_lo16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}lo_to_hi:
# GCN: v_lshrrev_b32_e32 v1, 16, v1
# GCN-NEXT: v_alignbyte_b32 v1, v0, v1, 2
name: lo_to_hi
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr1_hi16 = COPY killed $vgpr0_lo16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}hi_to_lo:
# GCN: v_lshlrev_b32_e32 v1, 16, v1
# GCN-NEXT: v_alignbyte_b32 v1, v1, v0, 2
name: hi_to_lo
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr1_lo16 = COPY $vgpr0_hi16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}hi_to_hi:
# GCN: v_alignbyte_b32 v1, v1, v0, 2
# GCN-NEXT: v_alignbyte_b32 v1, v1, v1, 2
name: hi_to_hi
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr1_hi16 = COPY $vgpr0_hi16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}lo_to_lo_samereg:
# GCN: s_waitcnt
# GCN-NEXT: s_endpgm
name: lo_to_lo_samereg
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr0_lo16 = COPY $vgpr0_lo16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}lo_to_hi_samereg:
# GCN: v_pk_add_u16 v0, v0, 0 op_sel_hi:[0,0]
name: lo_to_hi_samereg
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr0_hi16 = COPY $vgpr0_lo16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}hi_to_lo_samereg:
# GCN: v_pk_add_u16 v0, v0, 0 op_sel:[1,0] op_sel_hi:[1,0]
name: hi_to_lo_samereg
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr0_lo16 = COPY killed $vgpr0_hi16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}hi_to_hi_samereg:
# GCN: s_waitcnt
# GCN-NEXT: s_endpgm
name: hi_to_hi_samereg
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr0_hi16 = COPY killed $vgpr0_hi16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}lo_to_lo_def_livein:
# GCN: v_alignbyte_b32 v1, v0, v1, 2
# GCN-NEXT: v_alignbyte_b32 v1, v1, v1, 2
name: lo_to_lo_def_livein
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0

    $vgpr1 = IMPLICIT_DEF
    $vgpr1_lo16 = COPY $vgpr0_lo16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}lo_to_hi_def_livein:
# GCN: v_lshrrev_b32_e32 v1, 16, v1
# GCN-NEXT: v_alignbyte_b32 v1, v0, v1, 2
name: lo_to_hi_def_livein
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0

    $vgpr1 = IMPLICIT_DEF
    $vgpr1_hi16 = COPY $vgpr0_lo16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}hi_to_lo_def_livein:
# GCN: v_lshlrev_b32_e32 v1, 16, v1
# GCN-NEXT: v_alignbyte_b32 v1, v1, v0, 2
name: hi_to_lo_def_livein
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0

    $vgpr1 = IMPLICIT_DEF
    $vgpr1_lo16 = COPY killed $vgpr0_hi16
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}hi_to_hi_def_livein:
# GCN: v_alignbyte_b32 v1, v1, v0, 2
# GCN-NEXT: v_alignbyte_b32 v1, v1, v1, 2
name: hi_to_hi_def_livein
tracksRegLiveness: true
body: |
  bb.0:
    liveins: $vgpr0

    $vgpr1 = IMPLICIT_DEF
    $vgpr1_hi16 = COPY $vgpr0_hi16
    S_ENDPGM 0
...

# TODO: This can be coalesced into a VGPR_32 copy
# GCN-LABEL: {{^}}lo_to_lo_hi_to_hi:
# GCN: v_alignbyte_b32 v1, v0, v1, 2
# GCN-NEXT: v_alignbyte_b32 v1, v1, v1, 2
# GCN-NEXT: v_alignbyte_b32 v1, v1, v0, 2
# GCN-NEXT: v_alignbyte_b32 v1, v1, v1, 2
# GCN-NEXT: v_mov_b32_e32 v2, v1
# GCN-NEXT: s_endpgm
name: lo_to_lo_hi_to_hi
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr1_lo16 = COPY $vgpr0_lo16
    $vgpr1_hi16 = COPY $vgpr0_hi16
    $vgpr2 = COPY killed $vgpr1
    S_ENDPGM 0
...

# GCN-LABEL: {{^}}lo_to_hi_hi_to_lo:
# GCN: v_lshlrev_b32_e32 v1, 16, v1
# GCN-NEXT: v_alignbyte_b32 v1, v1, v0, 2
# GCN-NEXT: v_lshrrev_b32_e32 v1, 16, v1
# GCN-NEXT: v_alignbyte_b32 v1, v0, v1, 2
# GCN-NEXT: v_mov_b32_e32 v2, v1
# GCN-NEXT: s_endpgm
name: lo_to_hi_hi_to_lo
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr0 = IMPLICIT_DEF
    $vgpr1_lo16 = COPY $vgpr0_hi16
    $vgpr1_hi16 = COPY $vgpr0_lo16
    $vgpr2 = COPY killed $vgpr1
    S_ENDPGM 0
...

# NB: copy of undef just killed instead of expansion
# GCN-LABEL: {{^}}lo_to_lo_undef:
# GCN: s_waitcnt
# GCN-NEXT: v_mov_b32_e32 v2, v1
# GCN-NEXT: s_endpgm
name: lo_to_lo_undef
tracksRegLiveness: true
body: |
  bb.0:
    $vgpr1_lo16 = COPY undef $vgpr0_lo16
    $vgpr2 = COPY killed $vgpr1
    S_ENDPGM 0
...
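To cross-check the CHECK lines against the C++ above, here is a hypothetical re-statement of the opcode and operand choices made by the VGPR_LO16/VGPR_HI16 path of copyPhysReg, emitting assembly text instead of MachineInstrs. expandHalfCopy is an illustrative name, not an LLVM function; d and s are the VGPR_32 numbers after getMatchingSuperReg, and the sketch deliberately follows the diff's OpcFirst selection as written.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical text-level mirror of the 16-bit subreg copy expansion.
std::vector<std::string> expandHalfCopy(unsigned d, unsigned s, bool srcLow,
                                        bool dstLow) {
  const std::string D = "v" + std::to_string(d);
  const std::string S = "v" + std::to_string(s);
  std::vector<std::string> out;

  if (d == s) {
    if (srcLow == dstLow)
      return out; // same half of the same register: nothing to emit
    out.push_back(dstLow
                      ? "v_pk_add_u16 " + D + ", " + D + ", 0 op_sel:[1,0] op_sel_hi:[1,0]"
                      : "v_pk_add_u16 " + D + ", " + D + ", 0 op_sel_hi:[0,0]");
    return out;
  }

  // "First" instruction (inserted before "Last" in the C++ code).
  if (srcLow == dstLow)
    out.push_back("v_alignbyte_b32 " + D + ", " + (srcLow ? S : D) + ", " +
                  (srcLow ? D : S) + ", 2");
  else
    out.push_back(std::string(srcLow ? "v_lshrrev_b32_e32 " : "v_lshlrev_b32_e32 ") +
                  D + ", 16, " + D);

  // "Last" instruction: src0/src1 chosen exactly as in the diff.
  const std::string src0 = (srcLow && !dstLow) ? S : D;
  const std::string src1 = (!srcLow && dstLow) ? S : D;
  out.push_back("v_alignbyte_b32 " + D + ", " + src0 + ", " + src1 + ", 2");
  return out;
}
```

Called as expandHalfCopy(1, 0, true, false) it returns the two lines checked by lo_to_hi. The sketch ignores kill/undef flags and does not model copies from undef sources, which (per the NB comment in the test) are dropped rather than expanded.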

This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Implement copyPhysReg for 16 bit subregs
Closed, Public
