Image sample instructions that need more than 5 VGPRs for VAddr can use
partial NSA for NSA encoding format. VGPRs that can not fit into the
encoding are sequential after the last one.
This patch adds assembly and disassembly parts.
Details
Diff Detail
Event Timeline
Same could be done for GFX10, however differences between GFX10.1 and GFX10.3 cause inconveniences.
MaxNSA size for 10.1 is 5 and for 10.3 is 13 so _V6, _V7,... opcodes for GFX10 already exist where every vaddr is a VGPR_32.
We would need new versions exclusive for non-10.3.
Not sure if that is the best approach. We would end up with opcodes like:
IMAGE_SAMPLE_D_V1_V6_nsa_gfx10
IMAGE_SAMPLE_D_V1_V6_partial_nsa_gfx10
or
IMAGE_SAMPLE_D_V1_V6_nsa_gfx1030
IMAGE_SAMPLE_D_V1_V6_nsa_gfx1010
or
IMAGE_SAMPLE_D_V1_V6_nsa_navi2_gfx10
IMAGE_SAMPLE_D_V1_V6_nsa_navi1_gfx10
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp | ||
---|---|---|
3695 | Should try to avoid raw generation checks and have some feature for this |
llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp | ||
---|---|---|
10 | I think the idea is that the assembler should not include any of the "codegen" parts of the backend. If getNSAMaxSize needs to be shared by codegen and the assembler, it should be moved into AMDGPUBaseInfo - that is why AMDGPUBaseInfo exists. | |
llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp | ||
910–915 | This should check !hasPartialNSAEncoding? | |
llvm/lib/Target/AMDGPU/GCNSubtarget.h | ||
940 | No else after return. |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
1950 | I'm not sure if it's allowed to call into GCNSubtarget from this code - does anyone else know? |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
1950 | No, GCNSubtarget is from codegen. |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
1950 | I guess it happens to work if you something defined in GCNSubtarget.h. But really I don't think we should be including GCNSubtarget.h here in the first place. |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
1950 | Should I duplicate the code from getNSAMaxSize() here or have GCNSubtarget.h call AMDGPUBaseInfo version? |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
1950 | Have GCNSubtarget.h call AMDGPUBaseInfo version. You will have to use IsaVersion for the generation checks. There are lots of examples in AMDGPUBaseInfo.cpp. |
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp | ||
---|---|---|
1949–1954 | or if (isGFX10(STI)) return hasGFX10_3Insts(STI) ? 13 : 5; if (isGFX11(STI)) return 5; return 0; |
I think the idea is that the assembler should not include any of the "codegen" parts of the backend. If getNSAMaxSize needs to be shared by codegen and the assembler, it should be moved into AMDGPUBaseInfo - that is why AMDGPUBaseInfo exists.