Makes the SRC_(SHARED|PRIVATE)_(BASE|LIMIT) registers 64 bit instead of 32.
They're still usable as 32 bit operands by using the _LO suffix.
Preparation for D137542
Paths
| Differential D137767
[AMDGPU] Make aperture registers 64 bit ClosedPublic Authored by Pierre-vh on Nov 10 2022, 12:17 AM.
Details
Summary Makes the SRC_(SHARED|PRIVATE)_(BASE|LIMIT) registers 64 bit instead of 32. Preparation for D137542
Diff Detail
Event TimelinePierre-vh added a child revision: D137542: [AMDGPU] Use aperture registers instead of S_GETREG.Nov 10 2022, 12:18 AM Comment Actions
Can't this register still be used as a 32 bit operand ? (even if it's bugged) Then we need it for proper assembly/disassembly I believe, no? Comment Actions
OK, seems reasonable. I see we already have some asm/dis tests for 64-bit uses of these sources, e.g. llvm/test/MC/AMDGPU/literals.s tests s_and_b64 s[0:1], s[0:1], src_shared_base. How did that work? Should it work differently now? Comment Actions I do remember struggling to define a register without sub registers, and it eventually blowing up the number of defined register classes Comment Actions I'd rather not define a separate 32-bit version, other than _LO and _HI are probably necessary. Do we already have assembler tests for this? Comment Actions Comments I wasn't able to do this without creating the 16 bit subregisters. Comment Actions
I think the end result is the same but internally it's different. Before, it looks like SRC_SHARED_BASE was used for both 64 bit and 32 bit operand. Maybe as a hack? decodeSpecialReg64 was returning those aperture registers even though they're 32 bits. Pierre-vh retitled this revision from [AMDGPU] Add aperture register 64 bit variants to [AMDGPU] Make aperture registers 64 bit.Nov 15 2022, 2:14 AM Comment Actions Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class. Comment Actions
That either means we're missing a class restriction or need a reserve Comment Actions
Ah the class restriction was it! I fixed it. I got confused and was using SREG_64 (includes aperture register) for the COPY dst register, but if I use SGPR (doesn't include it) then copy coalescing won't mess it up Comment Actions
But naturally it should be SReg_64. It should be valid to copy to VCC, and used directly as a SSrc_b64/VSrc_b64 operand. You may want an SReg_64 variant that excludes these
Comment Actions
In D137542 you said I shouldn't set a register class. If I add a register class, how will it help? Should I add the new class & keep using the constrain? Comment Actions
The complaint was mostly about where you set the class on the virtual register, during legalize and not during the select of the copy. Here I mean you may want a different register class to represent the use restriction where you want everything else, but to exclude the aperture regs This revision is now accepted and ready to land.Nov 17 2022, 8:14 AM This revision was landed with ongoing or failed builds.Nov 22 2022, 1:18 AM Closed by commit rG220147d536f3: [AMDGPU] Make aperture registers 64 bit (authored by Pierre-vh). · Explain Why This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 477098 llvm/lib/Target/AMDGPU/AMDGPUResourceUsageAnalysis.cpp
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
llvm/lib/Target/AMDGPU/SIRegisterInfo.td
|
The whole point of reserveRegisterTuples is you don't need to handle each individual subregister