Sometimes we have a constant value loaded into a VGPR. If we later
need to rematerialize it in a physical scalar register, we can avoid the VGPR-to-SGPR
copy by replacing it with an S_MOV_B32.
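As a hand-written MIR sketch of the transform (illustrative only, not taken from the patch; register numbers are made up):

```
; Before: the immediate is materialized in a VGPR, then copied to a
; physical SGPR, which requires an (illegal) VGPR-to-SGPR copy.
%0:vgpr_32 = V_MOV_B32_e32 42, implicit $exec
$sgpr15 = COPY %0:vgpr_32

; After: rematerialize the immediate directly with a scalar move.
$sgpr15 = S_MOV_B32 42
```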
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Test case from D139852 (with the code sequence slightly rearranged) passes with this, e.g.:
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -O0 -mcpu=gfx1030 < %s | FileCheck %s
target triple = "amdgcn-amd-amdhsa"

; Unknown functions are conservatively passed all implicit parameters
declare void @unknown_call()

; Use the same constant as a sgpr parameter (for the kernel id) and for a vector operation
define protected amdgpu_kernel void @kern(ptr %addr) !llvm.amdgcn.lds.kernel.id !0 {
; CHECK-LABEL: kern:
; CHECK:       ; %bb.0:
; CHECK-NEXT:    s_mov_b32 s32, 0
; CHECK-NEXT:    s_add_u32 s12, s12, s17
; CHECK-NEXT:    s_addc_u32 s13, s13, 0
; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s12
; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s13
; CHECK-NEXT:    s_add_u32 s0, s0, s17
; CHECK-NEXT:    s_addc_u32 s1, s1, 0
; CHECK-NEXT:    v_writelane_b32 v40, s16, 0
; CHECK-NEXT:    s_mov_b32 s13, s15
; CHECK-NEXT:    s_mov_b32 s12, s14
; CHECK-NEXT:    v_readlane_b32 s14, v40, 0
; CHECK-NEXT:    s_mov_b64 s[16:17], s[8:9]
; CHECK-NEXT:    s_load_dwordx2 s[8:9], s[16:17], 0x0
; CHECK-NEXT:    v_mov_b32_e32 v5, 42
; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
; CHECK-NEXT:    v_mov_b32_e32 v3, s8
; CHECK-NEXT:    v_mov_b32_e32 v4, s9
; CHECK-NEXT:    flat_store_dword v[3:4], v5
; CHECK-NEXT:    s_mov_b64 s[18:19], 8
; CHECK-NEXT:    s_mov_b32 s8, s16
; CHECK-NEXT:    s_mov_b32 s9, s17
; CHECK-NEXT:    s_mov_b32 s16, s18
; CHECK-NEXT:    s_mov_b32 s15, s19
; CHECK-NEXT:    s_add_u32 s8, s8, s16
; CHECK-NEXT:    s_addc_u32 s15, s9, s15
; CHECK-NEXT:    ; kill: def $sgpr8 killed $sgpr8 def $sgpr8_sgpr9
; CHECK-NEXT:    s_mov_b32 s9, s15
; CHECK-NEXT:    s_getpc_b64 s[16:17]
; CHECK-NEXT:    s_add_u32 s16, s16, unknown_call@gotpcrel32@lo+4
; CHECK-NEXT:    s_addc_u32 s17, s17, unknown_call@gotpcrel32@hi+12
; CHECK-NEXT:    s_load_dwordx2 s[16:17], s[16:17], 0x0
; CHECK-NEXT:    s_mov_b64 s[22:23], s[2:3]
; CHECK-NEXT:    s_mov_b64 s[20:21], s[0:1]
; CHECK-NEXT:    s_mov_b32 s15, 20
; CHECK-NEXT:    v_lshlrev_b32_e64 v2, s15, v2
; CHECK-NEXT:    s_mov_b32 s15, 10
; CHECK-NEXT:    v_lshlrev_b32_e64 v1, s15, v1
; CHECK-NEXT:    v_or3_b32 v31, v0, v1, v2
; CHECK-NEXT:    s_mov_b32 s15, 42
; CHECK-NEXT:    s_mov_b64 s[0:1], s[20:21]
; CHECK-NEXT:    s_mov_b64 s[2:3], s[22:23]
; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
; CHECK-NEXT:    s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT:    s_endpgm
  store i32 42, ptr %addr
  call fastcc void @unknown_call()
  ret void
}

!0 = !{i32 42}
If you add that to the commit I'll be pleased to accept. isMoveImmediate() is a helpful function to have available.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, lines 853–855:
Could also have a 64-bit VGPR immediate. We have a pseudo for it.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, line 856:
Not sure the signed comparison works here: INT64_MIN is smaller than UINT32_MAX. I checked getImm() and it does return a signed value, so I guess we either convert to unsigned and then compare, or truncate to int32_t and check that it is equal to the original.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, line 861:
First of all, you should just be doing addOperand and forwarding whatever was there originally. No need to muck about with conversions or immediates. The 64-bit case is not solved by simply truncating the immediate; it's materializing a 64-bit value. I'd think you'd be hard pressed to write a testcase without using MIR (which probably should be done).
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, line 861:
If we're copying to a single sgpr and the value doesn't fit, codegen has already gone wrong. We may as well fall through to the current behaviour (which is to miscompile). If this code path might be copying to pairs of sgprs (I can't tell from the immediate surroundings) then this needs to be significantly more complicated, and probably pick up testing and so forth, but equally that's also broken at present. I'd rather emit the mov at ISel instead of here.
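For the SGPR-pair case discussed above, truncating the immediate is not enough; the 64-bit value has to be materialized as a whole. A hand-written MIR sketch of what that would look like (illustrative only, not from the patch; register numbers are made up):

```
; Before: a 64-bit constant built in a VGPR pair, then copied to SGPRs.
%0:vreg_64 = V_MOV_B64_PSEUDO 1311768467463790320, implicit $exec
$sgpr0_sgpr1 = COPY %0:vreg_64

; After: a single scalar 64-bit move materializes the full value.
$sgpr0_sgpr1 = S_MOV_B64 1311768467463790320
```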
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, line 861:
Since we cannot rematerialize a 64-bit value in a 32-bit SGPR, the MIR is already incorrect if the destination register is 32-bit.
Take the register class and move size from the virtual source register.
The operand type check is loosened to !isReg().
Don't use MachineOperands by value.