This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Make aperture registers 64 bit
ClosedPublic

Authored by Pierre-vh on Nov 10 2022, 12:17 AM.

Details

Summary

Makes the SRC_(SHARED|PRIVATE)_(BASE|LIMIT) registers 64 bit instead of 32.
They're still usable as 32 bit operands by using the _LO suffix.

Preparation for D137542

Diff Detail

Event Timeline

Pierre-vh created this revision.Nov 10 2022, 12:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2022, 12:17 AM
Pierre-vh requested review of this revision.Nov 10 2022, 12:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 10 2022, 12:17 AM
foad added a comment.Nov 10 2022, 1:24 AM

Why do we need the 32-bit variants?

Why do we need the 32-bit variants?

Can't this register still be used as a 32 bit operand ? (even if it's bugged) Then we need it for proper assembly/disassembly I believe, no?

foad added a comment.Nov 10 2022, 2:33 AM

Why do we need the 32-bit variants?

Can't this register still be used as a 32 bit operand ? (even if it's bugged) Then we need it for proper assembly/disassembly I believe, no?

OK, seems reasonable. I see we already have some asm/dis tests for 64-bit uses of these sources, e.g. llvm/test/MC/AMDGPU/literals.s tests s_and_b64 s[0:1], s[0:1], src_shared_base. How did that work? Should it work differently now?

I do remember struggling to define a register without sub registers, and it eventually blowing up the number of defined register classes

I'd rather not define a separate 32-bit version, other than _LO and _HI are probably necessary. Do we already have assembler tests for this?

Pierre-vh updated this revision to Diff 475392.Nov 15 2022, 2:12 AM

Comments

I wasn't able to do this without creating the 16 bit subregisters.
I added a FIXME for it. If I don't add them and try to just define the
32 bit register directly using SIReg, it creates a lot of issues with register pressure sets and classes.
We end up with a bunch of autogenerated ones that are useless and just cause crashes everywhere.

Why do we need the 32-bit variants?

Can't this register still be used as a 32 bit operand ? (even if it's bugged) Then we need it for proper assembly/disassembly I believe, no?

OK, seems reasonable. I see we already have some asm/dis tests for 64-bit uses of these sources, e.g. llvm/test/MC/AMDGPU/literals.s tests s_and_b64 s[0:1], s[0:1], src_shared_base. How did that work? Should it work differently now?

I think the end result is the same but internally it's different. Before, it looks like SRC_SHARED_BASE was used for both 64 bit and 32 bit operand. Maybe as a hack? decodeSpecialReg64 was returning those aperture registers even though they're 32 bits.
I'm not sure how to test this properly, I think existing tests are enough but the fact that they didn't need changes makes me doubt they're covering this properly

Pierre-vh updated this revision to Diff 475393.Nov 15 2022, 2:14 AM

clang-format

Pierre-vh retitled this revision from [AMDGPU] Add aperture register 64 bit variants to [AMDGPU] Make aperture registers 64 bit.Nov 15 2022, 2:14 AM
Pierre-vh edited the summary of this revision. (Show Details)
Pierre-vh planned changes to this revision.Nov 15 2022, 2:44 AM
Pierre-vh requested review of this revision.Nov 15 2022, 3:00 AM

Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class.

arsenm added a comment.EditedNov 15 2022, 8:00 AM

Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class.

That either means we're missing a class restriction or need a reserve

Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class.

That either means we're missing a class restriction or need a reserve

Ah the class restriction was it! I fixed it. I got confused and was using SREG_64 (includes aperture register) for the COPY dst register, but if I use SGPR (doesn't include it) then copy coalescing won't mess it up

Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class.

That either means we're missing a class restriction or need a reserve

Ah the class restriction was it! I fixed it. I got confused and was using SREG_64 (includes aperture register) for the COPY dst register, but if I use SGPR (doesn't include it) then copy coalescing won't mess it up

But naturally it should be SReg_64. It should be valid to copy to VCC, and used directly as a SSrc_b64/VSrc_b64 operand. You may want an SReg_64 variant that excludes these

llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
567

The whole point of reserveRegisterTuples is you don't need to handle each individual subregister

Note: I can't use COPY in D137542 because otherwise it'll "simplify" it and we end up with instructions that use the _HI register variants - which shouldn't be there and are just here because TableGen needs them/I get hundreds of crashes without it. It seems to assume there'll be sub1 for every reg in the class.

That either means we're missing a class restriction or need a reserve

Ah the class restriction was it! I fixed it. I got confused and was using SREG_64 (includes aperture register) for the COPY dst register, but if I use SGPR (doesn't include it) then copy coalescing won't mess it up

But naturally it should be SReg_64. It should be valid to copy to VCC, and used directly as a SSrc_b64/VSrc_b64 operand. You may want an SReg_64 variant that excludes these

In D137542 you said I shouldn't set a register class. If I add a register class, how will it help? Should I add the new class & keep using the constrain?

Pierre-vh marked an inline comment as done.

Fix reserveRegisterTuples call

In D137542 you said I shouldn't set a register class. If I add a register class, how will it help? Should I add the new class & keep using the constrain?

The complaint was mostly about where you set the class on the virtual register, during legalize and not during the select of the copy. Here I mean you may want a different register class to represent the use restriction where you want everything else, but to exclude the aperture regs

arsenm accepted this revision.Nov 17 2022, 8:14 AM
This revision is now accepted and ready to land.Nov 17 2022, 8:14 AM
This revision was landed with ongoing or failed builds.Nov 22 2022, 1:18 AM
This revision was automatically updated to reflect the committed changes.