Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Note: I'm unsure if codegen is actually correct for this. I don't understand this bit particularly well but instead of leaving this one the side I thought I would give it a shot and discuss whether it's correct during review.
I'm assuming you haven't tried running this. I have another attempt for this I did about 2 years ago I can send you. My conclusion then was this is a broken feature, and it's not really a 32-bit register. Instead, this seems to be a 64-bit register and any 32-bit operand ends up reading the low bits (which are always 0). We should still try to use them, but you have to extract the high bits with a shift
I can't find the piece that was anymore than what you have. I know I had a patch emitting a 64-bit shift somewhere
I indeed haven't tried running it yet, I planned too but some issues prevented me from doing it.
Do you mean that we need to add a 32 bits right-shift to src_shared_based before using it?
Should we just store it in a SGPR pair and use the high register maybe? (e.g. s_mov_b64 s[0:1], src_shared_base then use s0 (or is it s1? I always forget)
So I indeed checked ocltst and the previous version crashed.
Now it looks fine, but the verifier crashes in a lot of tests, especially GISel ones.
The issue is that we need to use this register as a 64 bit operand but it's a 32 bit register, so the verifier complains on S_MOV_B64.
How can I solve this? Is there another instruction I could use or do we have to change the RC of the src_shared/private_base register to 64 bit? (ideally it should be available for both, no?)
Note: tests haven't all been updated yet because of the verifier crashes.
do we have to change the RC of the src_shared/private_base register to 64 bit?
Yes that sounds right.
(ideally it should be available for both, no?)
That would be useful bu apparently it's not how the hardware works. I guess it gives you the 32 low bits which is not very useful because it will always be 0 for src_*_base and -1 for src_*_limit.
I don't know. I'm not too familiar with that stuff. Can you copy what we do for SGPR_NULL64?
Need to be rebased on a future patch that will add the 64 bit variant of the aperture registers (almost done, just need to get tests to pass)
How have you tested this? OpenCL conformance flat tests with -O0 and -O2 should be good enough
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
1830 | You can just build a copy at this point. We should use copies and mark it as a constant register |
I used https://github.com/RadeonOpenCompute/ROCm-OpenCL-Runtime/tree/develop/tests/ocltst, do you mean I should use https://github.com/KhronosGroup/OpenCL-CTS?
Which test should I run? All of them? I don't see a "flat" category
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
1830 | If I use a COPY it'll think it's okay to eliminate it and use the _HI register directly (which is not supposed to exist but I had to declare it to fix tablegen issues) |
llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp | ||
---|---|---|
1830 | It's too early to set a register class here. In principle it should work, but it would be better to let the selection constrain the register class normally |
Note: the machine where my full dev setup is is down at the moment so I can't run OCLTst right now, will do it as soon as possible (and before landing definitely)
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
119 ↗ | (On Diff #477113) | Lowercase is |
149–150 ↗ | (On Diff #477113) | You shouldn't need to special case this (also note there's no equivalent for the DAG). Either it needs to belong to a different class from SReg_64, or you need to reserve the high bits |
llvm/lib/Target/AMDGPU/SIISelLowering.cpp | ||
5521–5524 | This will get combined later to bitcast and extract later, but I guess that's more annoying to emit | |
llvm/test/CodeGen/AMDGPU/GlobalISel/inst-select-copy.mir | ||
485–488 ↗ | (On Diff #477113) | A basic selection test shouldn't have pre-set register classes |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
149–150 ↗ | (On Diff #477113) |
Aren't they reserved with reserveRegisterTuples? If I create a different class, do you mean I should exclude those aperture registers from SReg and create a superclass with both SReg and the apertures? I'm not sure I understand how the class will help if I shouldn't special-case the uses of those aperture registers |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
149–150 ↗ | (On Diff #477113) | If the reserve didn't do it, then yes. Using different classes is the correct way to represent use restrictions. Really what we want is a class without the sub1 subregister, but has a hack we're trying to avoid that, so a class that excludes the aperture regs is the next best thing. Not sure if you really need the reserve after that. |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
149–150 ↗ | (On Diff #477113) | Do I need to do something similar to SRegOrLds, then? Also:
And I suppose all those changes should be in a separate patch, right? |
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp | ||
---|---|---|
149–150 ↗ | (On Diff #477113) |
Well the 32-bit register cases are purely artificial. I'd assume these are included only by the broadest set of SGPRs (ideally wouldn't be in any allocatable set)
This is a question of naming. I'd probably lean towards leaving the SReg name/operands as-is, and introducing an allocatable subclass that excludes them (e.g. like SReg_64_XEXEC) |
You can just build a copy at this point. We should use copies and mark it as a constant register