This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic
Needs ReviewPublic

Authored by tstellarAMD on Dec 8 2016, 11:49 AM.

Details

Summary

This patch also adds a new address space to represent loads from the constant
address space that use a buffer resource.

Note that this new address space does not support GEP instructions due to
limitations in the instruction selector.

Event Timeline

tstellarAMD retitled this revision from to AMDGPU/SI: Add llvm.amdgcn.s.buffer.load intrinsic.
tstellarAMD updated this object.
tstellarAMD added a reviewer: arsenm.
tstellarAMD added a subscriber: llvm-commits.
arsenm added inline comments.Dec 14 2016, 9:47 AM
include/llvm/IR/IntrinsicsAMDGPU.td
421

This should probably be a more general type so that you can also mangle with FP types

mareko added inline comments.Dec 14 2016, 11:20 AM
include/llvm/IR/IntrinsicsAMDGPU.td
425

Do these flags prevent CSE? Mesa re-loads TGSI constants on each use. There is no reuse. It relies on CSE to do its the job.

More test cases and rebase on top of latest master.

include/llvm/IR/IntrinsicsAMDGPU.td
421

The only other option would be llvm_any_ty, I think tablegen needs some more work to make this happen. But we can always change this in the future without breaking backwards compatibility, because the intrinsic is already overloaded.

mareko edited edge metadata.Jan 31 2017, 4:01 PM

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

I don't think it's legal to select amdgcn.buffer.load to SMRD unless you can prove that it is uniform. llvm.amdgcn.s.buffer.load is known to always be uniform.

arsenm added inline comments.Jan 31 2017, 4:49 PM
lib/Target/AMDGPU/AMDGPU.h
172

Why 42?

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
167

Does this need to specify non-integral? Also there are a handful of places that assume 64-bit max we should take care of

lib/Target/AMDGPU/AMDGPUTargetTransformInfo.cpp
87

This should be last

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

I don't think it's legal to select amdgcn.buffer.load to SMRD unless you can prove that it is uniform. llvm.amdgcn.s.buffer.load is known to always be uniform.

lib/Target/AMDGPU/AMDGPU.h
172

4 dword resource size for address space 2

lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
167

We want to be able to use inttoptr, so I don't think we can say non-integral. Do you know where any of these places are?

How is this different from using amdgcn.buffer.load if D28993 lands (which is not certain)?

I don't think it's legal to select amdgcn.buffer.load to SMRD unless you can prove that it is uniform. llvm.amdgcn.s.buffer.load is known to always be uniform.

Well, I can't prove that it is uniform, but neither does Mesa for non-constant offsets. The idea of my patch is that SMRD is selected first and moveToVALU will lower it if necessary.

nhaehnle edited edge metadata.Feb 1 2017, 3:51 AM

I haven't looked in too much detail yet. I assume getelementptr doesn't work with these pointers, so it would be good to have a negative test which ensures that GEP use fails.

I haven't looked in too much detail yet. I assume getelementptr doesn't work with these pointers, so it would be good to have a negative test which ensures that GEP use fails.

That's correct. In order to support GEP, we would need to make i128 legal in the backend, which would be a pretty significant change. GlobalISel will make this much much easier, so I'm not sure it's even worth trying to support with SelectionDAG.

mareko added a comment.Feb 1 2017, 5:53 AM

Would you please describe the purpose of this patch? It's not obvious why it's useful.

What is the behavior of the constant address space in LLVM? Note that vertex buffers and 'restrict' read-only buffers use immutable memory too, so those are also 'constant' and selecting SMEM for those is desirable if it's possible.

OpenGL Constant buffers aren't limited to SMEM. They can be read with a VGPR offset too. LLVM doesn't have the capability to recognize a non-constant SGPR offset at ISel. If you wanna select SMEM with a non-constant offset, you also need to update moveToVALU.

Would you please describe the purpose of this patch? It's not obvious why it's useful.

The main reason it is useful is because it tells the compiler that this is a load from a constant value without neededing any more analysis. It's also useful because s_buffer_load_* instructions have a much more simplified resource descriptor, so if then do end up getting selected to MUBUF you don't have to worry about swizzled addressing. It is true however, that you could just use a single llvm.amdgcn.buffer.load.i32 intrinsic for everything, but you may end up with worse code if you are unable to do the analysis required to select it to SMRD instructions.

What is the behavior of the constant address space in LLVM? Note that vertex buffers and 'restrict' read-only buffers use immutable memory too, so those are also 'constant' and selecting SMEM for those is desirable if it's possible.

Constant address space just means that the memory is unchanged for the life of the program.

OpenGL Constant buffers aren't limited to SMEM. They can be read with a VGPR offset too. LLVM doesn't have the capability to recognize a non-constant SGPR offset at ISel. If you wanna select SMEM with a non-constant offset, you also need to update moveToVALU.

Would you please describe the purpose of this patch? It's not obvious why it's useful.

The main reason it is useful is because it tells the compiler that this is a load from a constant value without neededing any more analysis. It's also useful because s_buffer_load_* instructions have a much more simplified resource descriptor, so if then do end up getting selected to MUBUF you don't have to worry about swizzled addressing. It is true however, that you could just use a single llvm.amdgcn.buffer.load.i32 intrinsic for everything, but you may end up with worse code if you are unable to do the analysis required to select it to SMRD instructions.

For OpenGL, we'll also need to support non-constant SGPR offsets. And for those, moveToVALU needs to have the corresponding lowering. I don't think having an intrinsic that only accepts constant nodes for the offset is useful.

Also for OpenGL, llvm.amdgcn.buffer.load for SMRD might not be the best choice, because Mesa also needs to determine whether SMRD can be selected with regard to SQC L1 coherency. Therefore, I'd define llvm.amdgcn.s.buffer.load as: "do llvm.amdgcn.buffer.load, but allow SMRD selection for constant and non-constant offsets, and also the referenced memory is immutable (which is why we can use SMRD)". I don't think any other definition is useful for Mesa. Do we agree?

What is the behavior of the constant address space in LLVM? Note that vertex buffers and 'restrict' read-only buffers use immutable memory too, so those are also 'constant' and selecting SMEM for those is desirable if it's possible.

Constant address space just means that the memory is unchanged for the life of the program.

Yes, but what effect on the compiler does it have? Will it allow code sinking and arbitrary scheduling?

t-tye added a subscriber: t-tye.Mar 22 2017, 6:38 PM
tony-tye removed a subscriber: tony-tye.Mar 22 2017, 6:50 PM

I've got a scenario that will benefit from this change. It seems to me that this might be a more workable solution than D28993 (although a more generic approach like that is attractive). This change doesn't necessarily preclude something like D28993 at a later stage does it?

Is it possible to make the non-const offset change that mareko talks about as well?

I've got a scenario that will benefit from this change. It seems to me that this might be a more workable solution than D28993 (although a more generic approach like that is attractive). This change doesn't necessarily preclude something like D28993 at a later stage does it?

Is it possible to make the non-const offset change that mareko talks about as well?

Yes, we'd like to support non-const SGPR offsets as well. Constant offsets are not that interesting to us. Constant offsets can be supported first just to get things going. Non-const offsets can be added afterwards, and the SALU->VALU lowering for s_load_dword should also do inst_offset folding at least. The idea is that the lowering shouldn't generate worse code than amdgcn.buffer.load.dword used directly.

arsenm resigned from this revision.Feb 21 2019, 6:52 PM