Sometimes we have a constant value loaded into a VGPR. If we later
need to rematerialize it in a physical scalar register, we can avoid the VGPR-to-SGPR
copy by replacing it with an S_MOV_B32.
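As a hand-written MIR sketch of the transform (illustrative only, not taken from the patch; register numbers are made up):

```
; Before: the immediate is materialized in a VGPR, then copied to a
; physical SGPR, which requires an (illegal) VGPR-to-SGPR copy.
%0:vgpr_32 = V_MOV_B32_e32 42, implicit $exec
$sgpr15 = COPY %0:vgpr_32

; After: rematerialize the immediate directly with a scalar move.
$sgpr15 = S_MOV_B32 42
```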
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Test case from D139852 (with the code sequence slightly rearranged) passes with this, e.g.:
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc -O0 -mcpu=gfx1030 < %s | FileCheck %s
target triple = "amdgcn-amd-amdhsa"

; Unknown functions are conservatively passed all implicit parameters
declare void @unknown_call()

; Use the same constant as a sgpr parameter (for the kernel id) and for a vector operation
define protected amdgpu_kernel void @kern(ptr %addr) !llvm.amdgcn.lds.kernel.id !0 {
; CHECK-LABEL: kern:
; CHECK:       ; %bb.0:
; CHECK-NEXT:    s_mov_b32 s32, 0
; CHECK-NEXT:    s_add_u32 s12, s12, s17
; CHECK-NEXT:    s_addc_u32 s13, s13, 0
; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_LO), s12
; CHECK-NEXT:    s_setreg_b32 hwreg(HW_REG_FLAT_SCR_HI), s13
; CHECK-NEXT:    s_add_u32 s0, s0, s17
; CHECK-NEXT:    s_addc_u32 s1, s1, 0
; CHECK-NEXT:    v_writelane_b32 v40, s16, 0
; CHECK-NEXT:    s_mov_b32 s13, s15
; CHECK-NEXT:    s_mov_b32 s12, s14
; CHECK-NEXT:    v_readlane_b32 s14, v40, 0
; CHECK-NEXT:    s_mov_b64 s[16:17], s[8:9]
; CHECK-NEXT:    s_load_dwordx2 s[8:9], s[16:17], 0x0
; CHECK-NEXT:    v_mov_b32_e32 v5, 42
; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
; CHECK-NEXT:    v_mov_b32_e32 v3, s8
; CHECK-NEXT:    v_mov_b32_e32 v4, s9
; CHECK-NEXT:    flat_store_dword v[3:4], v5
; CHECK-NEXT:    s_mov_b64 s[18:19], 8
; CHECK-NEXT:    s_mov_b32 s8, s16
; CHECK-NEXT:    s_mov_b32 s9, s17
; CHECK-NEXT:    s_mov_b32 s16, s18
; CHECK-NEXT:    s_mov_b32 s15, s19
; CHECK-NEXT:    s_add_u32 s8, s8, s16
; CHECK-NEXT:    s_addc_u32 s15, s9, s15
; CHECK-NEXT:    ; kill: def $sgpr8 killed $sgpr8 def $sgpr8_sgpr9
; CHECK-NEXT:    s_mov_b32 s9, s15
; CHECK-NEXT:    s_getpc_b64 s[16:17]
; CHECK-NEXT:    s_add_u32 s16, s16, unknown_call@gotpcrel32@lo+4
; CHECK-NEXT:    s_addc_u32 s17, s17, unknown_call@gotpcrel32@hi+12
; CHECK-NEXT:    s_load_dwordx2 s[16:17], s[16:17], 0x0
; CHECK-NEXT:    s_mov_b64 s[22:23], s[2:3]
; CHECK-NEXT:    s_mov_b64 s[20:21], s[0:1]
; CHECK-NEXT:    s_mov_b32 s15, 20
; CHECK-NEXT:    v_lshlrev_b32_e64 v2, s15, v2
; CHECK-NEXT:    s_mov_b32 s15, 10
; CHECK-NEXT:    v_lshlrev_b32_e64 v1, s15, v1
; CHECK-NEXT:    v_or3_b32 v31, v0, v1, v2
; CHECK-NEXT:    s_mov_b32 s15, 42
; CHECK-NEXT:    s_mov_b64 s[0:1], s[20:21]
; CHECK-NEXT:    s_mov_b64 s[2:3], s[22:23]
; CHECK-NEXT:    s_waitcnt lgkmcnt(0)
; CHECK-NEXT:    s_swappc_b64 s[30:31], s[16:17]
; CHECK-NEXT:    s_endpgm
  store i32 42, ptr %addr
  call fastcc void @unknown_call()
  ret void
}

!0 = !{i32 42}
If you add that to the commit I'll be pleased to accept. isMoveImmediate() is a helpful function to have available.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, lines 853–855:
Could also have a 64-bit VGPR immediate. We have a pseudo for it.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, line 856:
Not sure the signed comparison works here: INT64_MIN is smaller than UINT32_MAX. I checked getImm() and it does return a signed value, so I guess we either convert to unsigned and then compare, or truncate to int32_t and check that it is equal to the original.
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, line 861:
First of all, you should just be doing addOperand and forwarding whatever was there originally. No need to muck about with conversions or immediates. The 64-bit case is not solved by simply truncating the immediate; it's materializing a 64-bit value. I'd think you'd be hard pressed to write a testcase without using MIR (which probably should be done).
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, line 861:
If we're copying to a single sgpr and the value doesn't fit, codegen has already gone wrong. We may as well fall through to the current behaviour (which is to miscompile). If this code path might be copying to pairs of sgprs (I can't tell from the immediate surroundings) then this needs to be significantly more complicated, and probably pick up testing and so forth, but equally that's also broken at present. I'd rather emit the mov at ISel instead of here.
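For the SGPR-pair case discussed above, truncating the immediate is not enough; the 64-bit value has to be materialized as a whole. A hand-written MIR sketch of what that would look like (illustrative only, not from the patch; register numbers are made up):

```
; Before: a 64-bit constant built in a VGPR pair, then copied to SGPRs.
%0:vreg_64 = V_MOV_B64_PSEUDO 1311768467463790320, implicit $exec
$sgpr0_sgpr1 = COPY %0:vreg_64

; After: a single scalar 64-bit move materializes the full value.
$sgpr0_sgpr1 = S_MOV_B64 1311768467463790320
```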
llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp, line 861:
Since we cannot rematerialize a 64-bit value in a 32-bit SGPR, the MIR is already incorrect if the destination register is 32-bit.
Take the register class and move size from the virtual source register.
The operand type check is loosened to !isReg().
Don't use MachineOperands by value.