GFX10 image instructions use one or more address operands starting at
vaddr0, instead of a single vaddr operand, to allow for NSA forms.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
This isn't ready to push yet but I'd like to get some feedback on it -- see comments inline.
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp | ||
---|---|---|
116 | Is this really true? I see opcodes like IMAGE_SAMPLE_D_CL_O_V4_V16 mentioned in some generated tables, but I'm not sure what the V16 there really means or whether these will ever occur in practice. | |
llvm/test/CodeGen/AMDGPU/merge-image-load.mir | ||
2 ↗ | (On Diff #270187) | These tests pass with -mcpu=gfx1010 but I don't think they're really testing anything useful. Do I need to regenerate them using gfx10-specific image sample opcodes or something? |
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp | ||
---|---|---|
116 | I think that's because we do not have all register classes, so it uses VReg_512 for 10, 11, or 12 dword address for example. I believe a longest one has 12 vaddrs, like this: image_sample_c_d_cl_o v[16:19], [v8, v9, v10, v11, v12, v13, v14, v15, v16, v17, v18, v19], s[20:27], s[100:103] dmask:0xf dim:SQ_RSRC_IMG_3D ; encoding: [0x16,0x0f,0xec,0xf0,0x08,0x10,0x25,0x03,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0x10,0x11,0x12,0x13,0x00] 0x16,0x0f,0xec,0xf0,0x08,0x10,0x25,0x03,0x09,0x0a,0x0b,0x0c,0x0d,0x0e,0x0f,0x10,0x11,0x12,0x13,0x14 You can check it in MIMGInstructions.td where MIMG_Sampler_AddrSizes is defined. |
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp | ||
---|---|---|
116 | I think 12 is what I remember was the maximum. I was confused by the missing register classes. We probably should add the missing ones now that globalisel at least would handle them (I had to go back and break the code to round to the instruction constraints) |
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp | ||
---|---|---|
116 | It will probably require not only register classes, but value types as well. |
Mostly LGTM, with some nitpicks.
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp | ||
---|---|---|
108–113 | false instead of 0 for bools. But how about keeping RegisterEnum and making this two unsigned chars, one for the RegisterEnum flags and one for the NumVAddrs? | |
116 | I remember the value types being the part where I got stuck when I tried to do this a very, very long time ago. And yes, 12 is the maximum number of vaddrs. | |
430–431 | Remove the struct |
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp | ||
---|---|---|
108–113 | I think using separate fields is better than bit twiddling. The storage size of AddressRegs doesn't really matter because it isn't stored anywhere, only used to return info from getRegs. In retrospect I shouldn't have bothered squashing NumVAddrs into a single byte. |
llvm/test/CodeGen/AMDGPU/merge-image-load.mir | ||
---|---|---|
2 ↗ | (On Diff #270187) | Yes, it would be good if you could add tests for image instructions with NSA. |
llvm/test/CodeGen/AMDGPU/merge-image-load-gfx10.mir | ||
---|---|---|
13 | s/ =C/ = C/ |
false instead of 0 for bools.
But how about keeping RegisterEnum and making this two unsigned chars, one for the RegisterEnum flags and one for the NumVAddrs?