This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Do not reserve 16-bit registers
ClosedPublic

Authored by foad on Mar 29 2023, 7:22 AM.

Details

Reviewers
rampitec
Joe_Nash
arsenm
Group Reviewers
Restricted Project
Commits
rG8bad806f298c: [AMDGPU] Do not reserve 16-bit registers
Summary

There should be no need to reserve all SGPR hi16/lo16 halves, or all
AGPR hi16 halves. This should be done by marking the corresponding
register classes as not allocatable instead.

Diff Detail

Event Timeline

foad created this revision.Mar 29 2023, 7:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2023, 7:22 AM
foad requested review of this revision.Mar 29 2023, 7:22 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 29 2023, 7:22 AM
foad added inline comments.Mar 29 2023, 7:27 AM
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
613

I assume this comment refers to a vcc liveness error in test/CodeGen/AMDGPU/frame-index-elimination.ll. I have fixed that in a different way with D147157.

I am not so sure about SGPRs. I believe it is legal to use SGPR halves.

SGPR_HI16 is already marked as not allocatable. So L611 was partially redundant with that. But this patch does appear to remove a barrier to allocating into the hi or lo half of TMA_LO or some other non-GPR SReg.
I guess there is other code somewhere preventing allocating to those non-GPR SReg? If so, this patch can be marked NFC and the commit reworded to say it skips redundant reservations.

I am not so sure about SGPRs. I believe it is legal to use SGPR halves.

It appears that this patch does not change anything about the legality of allocating registers in class SGPR_LO16.

I am not so sure about SGPRs. I believe it is legal to use SGPR halves.

It appears that this patch does not change anything about the legality of allocating registers in class SGPR_LO16.

But if we will allow SGPRs this code will be needed on older targets again.

I am not so sure about SGPRs. I believe it is legal to use SGPR halves.

It appears that this patch does not change anything about the legality of allocating registers in class SGPR_LO16.

But if we will allow SGPRs this code will be needed on older targets again.

I think you are right about that. Perhaps that is a a good reason not to merge this patch unless there is some immediate benefit.

foad updated this revision to Diff 509586.Mar 30 2023, 2:27 AM

Make SReg_LO16 unallocatable

foad added a comment.Mar 30 2023, 2:34 AM

I am not so sure about SGPRs. I believe it is legal to use SGPR halves.

I am not sure about this. It is "legal" in that the SGPR_LO16 is allocatable, but I don't think anything uses it - except for one handwritten MIR test in test/CodeGen/AMDGPU/fold_16bit_imm.mir.

It appears that this patch does not change anything about the legality of allocating registers in class SGPR_LO16.

But if we will allow SGPRs this code will be needed on older targets again.

I think you are right about that. Perhaps that is a a good reason not to merge this patch unless there is some immediate benefit.

I don't like guessing what will be required in future. Even with the GFX11 True16 work in progress it is unclear to me whether 16-bit SGPRs will be used.

My motivation is to clean up and simplify SIRegisterInfo::getReservedRegs based on what is required today. It seems like a lot of stuff has crept in and I am trying to understand what it is all for.

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
615

I've tried to preserve the intent of this code by making SReg_LO16 unallocatable but leaving SGPR_LO16 allocatable.

Here is the thing:

v_mov_b16 v0.h, vcc_lo.h
    0xd59c4800
    0x0000006a
v_mov_b16 v0.l, vcc_hi.l
    0x7e00386b

So this all seems legal.

rampitec accepted this revision.Mar 30 2023, 12:25 PM

OK, let's merge this. I believe t16 work will need to redo it anyway at the end.

This revision is now accepted and ready to land.Mar 30 2023, 12:25 PM
foad added a comment.Mar 31 2023, 6:15 AM

Here is the thing:

v_mov_b16 v0.h, vcc_lo.h
    0xd59c4800
    0x0000006a
v_mov_b16 v0.l, vcc_hi.l
    0x7e00386b

So this all seems legal.

Right, VALU instructions can access 16-bit SGPRs, and the assembler/disassembler has to accept that. The tricky question is whether we want to expose that to codegen. and I think it will be hard to do that because there is no S_MOV_B16 instruction and in general no way to write to a 16-bit SGPR; you can only read from them.

Thanks for the review.

arsenm accepted this revision.Mar 31 2023, 6:19 AM
This revision was landed with ongoing or failed builds.Mar 31 2023, 6:56 AM
This revision was automatically updated to reflect the committed changes.

Here is the thing:

v_mov_b16 v0.h, vcc_lo.h
    0xd59c4800
    0x0000006a
v_mov_b16 v0.l, vcc_hi.l
    0x7e00386b

So this all seems legal.

Right, VALU instructions can access 16-bit SGPRs, and the assembler/disassembler has to accept that. The tricky question is whether we want to expose that to codegen. and I think it will be hard to do that because there is no S_MOV_B16 instruction and in general no way to write to a 16-bit SGPR; you can only read from them.

Thanks for the review.

Good point about the scalar move. It explains the problem. Now I think I agree about the codegen support.