This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Lower VGPR to physical SGPR COPY to S_MOV_B32 if VGPR contains the compile time constant
ClosedPublic

Authored by alex-t on Dec 12 2022, 12:21 PM.

Details

Summary

Sometimes we have a constant value loaded to VGPR. In case we further
need to rematrerialize it in the physical scalar register we may avoid VGPR to
SGPR copy replacing it with S_MOV_B32.

Diff Detail

Event Timeline

alex-t created this revision.Dec 12 2022, 12:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2022, 12:21 PM
alex-t requested review of this revision.Dec 12 2022, 12:21 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 12 2022, 12:21 PM
Herald added a subscriber: wdng. · View Herald Transcript

Test case from D139852 (with code sequence slightly rearranged) passes with this, e.g.

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -O0 -mcpu=gfx1030 < %s | FileCheck %s

target triple = "amdgcn-amd-amdhsa"

; Unknown functions are conservatively passed all implicit parameters
declare void @unknown_call()
; Use the same constant as a sgpr parameter (for the kernel id) and for a vector operation
define protected amdgpu_kernel void @kern(ptr %addr) !llvm.amdgcn.lds.kernel.id !0 {
; CHECK-LABEL: kern:
; CHECK:       ; %bb.0:
; CHECK-NEXT:    s_mov_b32 s32, 0
; CHECK-NEXT:    s_add_u32 s12, s12, s17
; CHECK-NEXT:    s_addc_u32 s13, s13, 0
; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s12
; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s13
; CHECK-NEXT:    s_add_u32 s0, s0, s17
; CHECK-NEXT:    s_addc_u32 s1, s1, 0
; CHECK-NEXT:    v_writelane_b32 v40, s16, 0
; CHECK-NEXT:    s_mov_b32 s13, s15
; CHECK-NEXT:    s_mov_b32 s12, s14
; CHECK-NEXT:    v_readlane_b32 s14, v40, 0
; CHECK-NEXT:    s_mov_b64 s[16:17], s[8:9]
; CHECK-NEXT:    s_load_dwordx2 s[8:9], s[16:17], 0x0
; CHECK-NEXT:    v_mov_b32_e32 v5, 42
; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
; CHECK-NEXT:    v_mov_b32_e32 v3, s8
; CHECK-NEXT:    v_mov_b32_e32 v4, s9
; CHECK-NEXT:    flat_store_dword v[3:4], v5
; CHECK-NEXT:    s_mov_b64 s[18:19], 8
; CHECK-NEXT:    s_mov_b32 s8, s16
; CHECK-NEXT:    s_mov_b32 s9, s17
; CHECK-NEXT:    s_mov_b32 s16, s18
; CHECK-NEXT:    s_mov_b32 s15, s19
; CHECK-NEXT:    s_add_u32 s8, s8, s16
; CHECK-NEXT:    s_addc_u32 s15, s9, s15
; CHECK-NEXT:    ; kill: def $sgpr8 killed $sgpr8 def $sgpr8_sgpr9
; CHECK-NEXT:    s_mov_b32 s9, s15
; CHECK-NEXT:    s_getpc_b64 s[16:17]
; CHECK-NEXT:    s_add_u32 s16, s16, unknown_call@gotpcrel32@lo+4
; CHECK-NEXT:    s_addc_u32 s17, s17, unknown_call@gotpcrel32@hi+12
; CHECK-NEXT:    s_load_dwordx2 s[16:17], s[16:17], 0x0
; CHECK-NEXT:    s_mov_b64 s[22:23], s[2:3]
; CHECK-NEXT:    s_mov_b64 s[20:21], s[0:1]
; CHECK-NEXT:    s_mov_b32 s15, 20
; CHECK-NEXT:    v_lshlrev_b32_e64 v2, s15, v2
; CHECK-NEXT:    s_mov_b32 s15, 10
; CHECK-NEXT:    v_lshlrev_b32_e64 v1, s15, v1
; CHECK-NEXT:    v_or3_b32 v31, v0, v1, v2
; CHECK-NEXT:    s_mov_b32 s15, 42
; CHECK-NEXT:    s_mov_b64 s[0:1], s[20:21]
; CHECK-NEXT:    s_mov_b64 s[2:3], s[22:23]
; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
; CHECK-NEXT:    s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT:    s_endpgm
  store i32 42, ptr %addr
  call fastcc void @unknown_call()
  ret void
}

!0 = !{i32 42}

if you add that to the commit I'll be pleased to accept. isMoveImmediate() is a helpful function to have available.

alex-t updated this revision to Diff 482249.Dec 12 2022, 1:29 PM

test added

alex-t updated this revision to Diff 482251.Dec 12 2022, 1:32 PM

test renamed

JonChesterfield accepted this revision.Dec 12 2022, 1:46 PM

Brilliant, thank you!

This revision is now accepted and ready to land.Dec 12 2022, 1:46 PM
arsenm requested changes to this revision.Dec 12 2022, 1:48 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
852

Don't use value MachineOperands

853

This isn't guaranteed by isMoveImmediate

This revision now requires changes to proceed.Dec 12 2022, 1:48 PM
arsenm added inline comments.Dec 12 2022, 1:50 PM
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
854–856

Could also have 64-bit VGPR immediate. We have a pseudo for it

JonChesterfield added a comment.EditedDec 12 2022, 1:53 PM

Sad times. @arsenm I'm guessing you're reluctant to ship D139852 in the meantime? Could check in this test with it and revert that block when this lands later.

alex-t updated this revision to Diff 482268.Dec 12 2022, 2:26 PM

changed as requested

alex-t marked 3 inline comments as done.Dec 12 2022, 2:27 PM
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
857

Not sure the signed comparison works here. INT64_MIN is smaller than UINT32_MAX. Checked getImm() and it does return a signed value, so I guess we either convert to unsigned then compare, or truncate to int32_t and check that is equal to the original

alex-t updated this revision to Diff 482277.Dec 12 2022, 2:58 PM

assign getImm() result to unsigned for implicit conversion

alex-t marked an inline comment as done.Dec 12 2022, 2:59 PM
arsenm added inline comments.Dec 12 2022, 3:11 PM
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
862

First of all, you should be just doing addOperand and forwarding whatever was there originally. No need to muck about with conversions or immediates

The 64-bit case is not solved by simply truncating the immediate. It's materializing a 64-bit value. I'd think you'd be hard pressed to write a a testcase without using MIR (which probably should be done)

llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
862

If we're copying to a single sgpr and the value doesn't fit, codegen has already gone wrong. We may as well fall through to the current behaviour (which is to miscompile).

If this code path might be copying to pairs of sgprs (I can't tell from the immediate surroundings) then this needs to be significantly more complicated, and probably pick up testing and so forth, but equally that's also broken at present.

I'd rather emit the mov at ISel instead of here.

alex-t updated this revision to Diff 482520.Dec 13 2022, 9:20 AM

add operand instead of adding immediate

alex-t updated this revision to Diff 482585.Dec 13 2022, 12:08 PM

reviewers comments addressed

alex-t marked 2 inline comments as done.Dec 13 2022, 12:08 PM
alex-t added inline comments.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
862

Since we cannot rematerialize the 64bit value in 32bit SGPR we already have incorrect MIR if we have 32bit dest register

alex-t added inline comments.Dec 13 2022, 12:13 PM
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
862

Since we cannot rematerialize the 64bit value in 32bit SGPR we already have incorrect MIR if we have 32bit dest register

Please ignore this comment. It was added accidentally. I don't know how to remove it :)

arsenm requested changes to this revision.Dec 13 2022, 2:46 PM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp
854

Not isImm, but !isReg

855–856

Should prefer to get the class and size from the vreg. physregs are expensive

This revision now requires changes to proceed.Dec 13 2022, 2:46 PM
alex-t updated this revision to Diff 483147.Dec 15 2022, 5:47 AM

register class and move size from the virtual source register
operand type check is loosen to !isReg()

alex-t marked 2 inline comments as done.Dec 15 2022, 5:48 AM
arsenm accepted this revision.Dec 15 2022, 7:50 AM
This revision is now accepted and ready to land.Dec 15 2022, 7:50 AM