This patch implements amdgcn image intrinsics. The flags are passed as a mask parameter.
Details
Diff Detail
Event Timeline
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
203 | The full instruction name is image_get_resinfo, so the intrinsic should be int_amdgcn_image_getresinfo | |
211 | This requires a descriptive comment (including the values for which bits) | |
276–307 | These are also missing the image part of the name as well | |
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp | ||
156–162 | Param names should be capitalized, the function name itself should be lowercase | |
1103–1116 | You don't need these intermediate variables, you can just check the bit as the argument to getTargetConstant directly | |
1134–1135 | Commented out code | |
lib/Target/AMDGPU/SIDefines.h | ||
111–114 | Not all of these bits should be exposed. GLC, SLC, and TFE definitely should not be. TFE changes the register class of the output. This should strictly be the ones for controlling the sampling | |
lib/Target/AMDGPU/SIInstructions.td | ||
2738 | Not sure what this means | |
2740–2749 | I think you can reduce the number of repeated lines by passing in the suffix, and then concat + cast to the intrinsic (Similar to how MUBUF_LoadIntrinsicPat does it) | |
test/CodeGen/AMDGPU/llvm.amdgcn.gather4.ll | ||
4–5 ↗ | (On Diff #65635) | Space between ; and start of comment. Also should probably switch to GCN prefix because it seems likely we'll need to distinguish subtargets here |
9–13 ↗ | (On Diff #65635) | It should work to just store the output directly into an output pointer rather than having to scalarize and export since this is a compute shader |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
198–201 | Changing this intrinsic will break Mesa, we will need to update Mesa before we can commit this. |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
193–198 | Can we just not do this kind of change, please? I don't see how it improves anything, it's inconsistent with the ISA description which has the flags separate, and it'll require an annoying flag day synchronization with Mesa. |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
193–198 | I agree. Separate flags also play nicely with assembler. |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
193–198 | We plan to add and expose d16 bit, so the Intrinsics for Mesa has to be updated anyway. For the future chips, we may add more flag (bit) and all applications (including Mesa) have to be updated every time a new flag is added. One advantage of using mask parameter is that we don't have to update the application if the new bit is not exposed. | |
198–201 | We will have to add d16 bit! So Mesa will have to be update anyway. |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
193–198 | The assembler has nothing to do with the intrinsic definition. This isn't changing the MachineInstr's operand structure |
update the patch based on Matt's comments.
The following comments are not updated:
- >Not all of these bits should be exposed. GLC, SLC, and TFE definitely should not be. TFE changes the register class of the output. This should strictly be the ones for controlling the sampling
- I asked this question before I started, and the agreement is to expose all at this moment. We have to study all these flags in detail and get a short list to expose later.
- >I think you can reduce the number of repeated lines by passing in the suffix, and then concat + cast to the intrinsic (Similar to how MUBUF_LoadIntrinsicPat does it)
- what's the problem of repeated lines? I think it is more readable than simplified one. Or we can simplify the def in the TableGen cleanup work.
NOTE that Mesa has to be update to accommodate the image_load/image_store intrinsic parameter change.
BTW, SLC and GLC bits definitely SHOULD be exposed. LLVM isn't high-level enough to be able to make any assumptions about cache strategies.
GLC and SLC should still be separate parameters from the format mask (or metadata should control these like it will for non temporal regular loads/stores)
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
193–198 |
I really prefer using a mask over having to create an entire new set of intrinsics each time we have to add a new bit. I think we have two solutions here:
|
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
221–228 | Moving the sample intrinsics to this file is unrelated to the AMDGPUImageLoad/AMDGPUImageStore changes, so this should be done in a separate patch. |
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
221–228 | The patch is to implement amdgcn image inttrinsics, which has three categories: AMDGPUImageLoad, AMDGPUImageStore and AMDGPUImageSample. While AMDGPUImageSample is newly defined, and the other two are modified, they do use the same mechanism, i.e. mask parameter! I think it should be better for them to be together in one patch. |
Do not use mask parameter!
Keep image load/store intrinsics
expose all flags for sampler intrinsics
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
206 | This should go in a separate patch. This patch should be only the sampler changes. | |
218 | This is an unrelated change. | |
224 | I'm thinking vdata should be llvm_anyfloat_ty, so we can have it return <4 x half> for the d16 operations. Though it's going to be weird that some <4 x half> values take 4 registers and some only take two. Another thing I'm not sure of is if image samplers always return floating-point values and never integers. | |
226 | This should be changed to llvm_anyint_ty, so that we can infer the r128 bit. | |
232 | This r128 bit should be dropped. | |
lib/Target/AMDGPU/SIInstructions.td | ||
2722–2729 | These image load/store changes should go in a separate patch. |
Thanks for doing this. One comment below.
include/llvm/IR/IntrinsicsAMDGPU.td | ||
---|---|---|
233 | tfe should be dropped. AFAIU it changes the return type (5 return values instead of 4). |
Implement sampler intrinsics:
- define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty, and rsrc type to be llvm_anyint_ty as a result, we expect the intrinsics name to have three suffixes to overload each of these three types;
- D128 as well as two other flags are implied in the three types, for example, if you use v8i32 as rsrc type, then r128 is true!
- don't expose TFE flag, and other flags are exposed in the instruction order: unrm, glc, slc, lwe and da.
- LIT tests are not included now!
Can we just not do this kind of change, please? I don't see how it improves anything, it's inconsistent with the ISA description which has the flags separate, and it'll require an annoying flag day synchronization with Mesa.