This patch also adds a new address space to represent loads from the constant
address space that use a buffer resource.
Note that this new address space does not support GEP instructions due to
limitations in the instruction selector.
Paths
| Differential D27586
AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic Needs ReviewPublic Authored by • tstellarAMD on Dec 8 2016, 11:49 AM.
Details
Diff Detail
Event Timeline• tstellarAMD updated this object. Herald added subscribers: tony-tye, yaxunl, nhaehnle and 2 others. · View Herald TranscriptDec 8 2016, 11:49 AM
Comment Actions How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)? Comment Actions
I don't think it's legal to select amdgcn.buffer.load to SMRD unless you can prove that it is uniform. llvm.amdgcn.s.buffer.load is known to always be uniform. Comment Actions
Comment Actions
Well, I can't prove that it is uniform, but neither does Mesa for non-constant offsets. The idea of my patch is that SMRD is selected first and moveToVALU will lower it if necessary. Comment Actions I haven't looked in too much detail yet. I assume getelementptr doesn't work with these pointers, so it would be good to have a negative test which ensures that GEP use fails. Comment Actions
That's correct. In order to support GEP, we would need to make i128 legal in the backend, which would be a pretty significant change. GlobalISel will make this much much easier, so I'm not sure it's even worth trying to support with SelectionDAG. Comment Actions Would you please describe the purpose of this patch? It's not obvious why it's useful. What is the behavior of the constant address space in LLVM? Note that vertex buffers and 'restrict' read-only buffers use immutable memory too, so those are also 'constant' and selecting SMEM for those is desirable if it's possible. OpenGL Constant buffers aren't limited to SMEM. They can be read with a VGPR offset too. LLVM doesn't have the capability to recognize a non-constant SGPR offset at ISel. If you wanna select SMEM with a non-constant offset, you also need to update moveToVALU. Comment Actions
The main reason it is useful is because it tells the compiler that this is a load from a constant value without neededing any more analysis. It's also useful because s_buffer_load_* instructions have a much more simplified resource descriptor, so if then do end up getting selected to MUBUF you don't have to worry about swizzled addressing. It is true however, that you could just use a single llvm.amdgcn.buffer.load.i32 intrinsic for everything, but you may end up with worse code if you are unable to do the analysis required to select it to SMRD instructions.
Constant address space just means that the memory is unchanged for the life of the program.
Comment Actions
For OpenGL, we'll also need to support non-constant SGPR offsets. And for those, moveToVALU needs to have the corresponding lowering. I don't think having an intrinsic that only accepts constant nodes for the offset is useful. Also for OpenGL, llvm.amdgcn.buffer.load for SMRD might not be the best choice, because Mesa also needs to determine whether SMRD can be selected with regard to SQC L1 coherency. Therefore, I'd define llvm.amdgcn.s.buffer.load as: "do llvm.amdgcn.buffer.load, but allow SMRD selection for constant and non-constant offsets, and also the referenced memory is immutable (which is why we can use SMRD)". I don't think any other definition is useful for Mesa. Do we agree?
Yes, but what effect on the compiler does it have? Will it allow code sinking and arbitrary scheduling? Comment Actions I've got a scenario that will benefit from this change. It seems to me that this might be a more workable solution than D28993 (although a more generic approach like that is attractive). This change doesn't necessarily preclude something like D28993 at a later stage does it? Is it possible to make the non-const offset change that mareko talks about as well? Comment Actions
Yes, we'd like to support non-const SGPR offsets as well. Constant offsets are not that interesting to us. Constant offsets can be supported first just to get things going. Non-const offsets can be added afterwards, and the SALU->VALU lowering for s_load_dword should also do inst_offset folding at least. The idea is that the lowering shouldn't generate worse code than amdgcn.buffer.load.dword used directly.
Revision Contents
Diff 80800 include/llvm/IR/IntrinsicsAMDGPU.td
lib/Target/AMDGPU/AMDGPU.h
lib/Target/AMDGPU/AMDGPUISelLowering.h
lib/Target/AMDGPU/AMDGPUISelLowering.cpp
lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.h
lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
lib/Target/AMDGPU/BUFInstructions.td
lib/Target/AMDGPU/SIISelLowering.h
lib/Target/AMDGPU/SIISelLowering.cpp
lib/Target/AMDGPU/SIInstrInfo.td
lib/Target/AMDGPU/SMInstructions.td
test/CodeGen/AMDGPU/mubuf.ll
test/CodeGen/AMDGPU/smrd.ll
test/Transforms/EarlyCSE/AMDGPU/intrinsics.ll
|
This should probably be a more general type so that you can also mangle with FP types