Page MenuHomePhabricator

AMDGPU/SI: Implement amdgcn image intrinsics
ClosedPublic

Authored by tstellarAMD on Jul 26 2016, 5:12 PM.

Details

Reviewers
cfang
arsenm
Summary

This patch implements amdgcn image intrinsics. The flags are passed as a mask parameter.

Event Timeline

cfang updated this revision to Diff 65635.Jul 26 2016, 5:12 PM
cfang retitled this revision from to AMDGPU/SI: Implement amdgcn image intrinsics.
cfang updated this object.
cfang added reviewers: arsenm, tstellarAMD.
cfang added a subscriber: llvm-commits.
arsenm added inline comments.Jul 26 2016, 6:05 PM
include/llvm/IR/IntrinsicsAMDGPU.td
203

The full instruction name is image_get_resinfo, so the intrinsic should be int_amdgcn_image_getresinfo

211

This requires a descriptive comment (including the values for which bits)

276–307

These are also missing the image part of the name as well

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
156–162 ↗(On Diff #65635)

Param names should be capitalized, the function name itself should be lowercase

1103–1116 ↗(On Diff #65635)

You don't need these intermediate variables, you can just check the bit as the argument to getTargetConstant directly

1134–1135 ↗(On Diff #65635)

Commented out code

lib/Target/AMDGPU/SIDefines.h
111–114 ↗(On Diff #65635)

Not all of these bits should be exposed. GLC, SLC, and TFE definitely should not be. TFE changes the register class of the output. This should strictly be the ones for controlling the sampling

lib/Target/AMDGPU/SIInstructions.td
2738 ↗(On Diff #65635)

Not sure what this means

2740–2749 ↗(On Diff #65635)

I think you can reduce the number of repeated lines by passing in the suffix, and then concat + cast to the intrinsic (Similar to how MUBUF_LoadIntrinsicPat does it)

test/CodeGen/AMDGPU/llvm.amdgcn.gather4.ll
4–5 ↗(On Diff #65635)

Space between ; and start of comment. Also should probably switch to GCN prefix because it seems likely we'll need to distinguish subtargets here

9–13 ↗(On Diff #65635)

It should work to just store the output directly into an output pointer rather than having to scalarize and export since this is a compute shader

include/llvm/IR/IntrinsicsAMDGPU.td
198–201

Changing this intrinsic will break Mesa, we will need to update Mesa before we can commit this.

nhaehnle added inline comments.
include/llvm/IR/IntrinsicsAMDGPU.td
193–198

Can we just not do this kind of change, please? I don't see how it improves anything, it's inconsistent with the ISA description which has the flags separate, and it'll require an annoying flag day synchronization with Mesa.

nhaustov added inline comments.
include/llvm/IR/IntrinsicsAMDGPU.td
193–198

I agree. Separate flags also play nicely with assembler.

cfang added inline comments.Jul 28 2016, 1:26 PM
include/llvm/IR/IntrinsicsAMDGPU.td
193–198

We plan to add and expose d16 bit, so the Intrinsics for Mesa has to be updated anyway. For the future chips, we may add more flag (bit) and all applications (including Mesa) have to be updated every time a new flag is added. One advantage of using mask parameter is that we don't have to update the application if the new bit is not exposed.

198–201

We will have to add d16 bit! So Mesa will have to be update anyway.

arsenm added inline comments.Jul 28 2016, 1:36 PM
include/llvm/IR/IntrinsicsAMDGPU.td
193–198

The assembler has nothing to do with the intrinsic definition. This isn't changing the MachineInstr's operand structure

cfang updated this revision to Diff 66039.Jul 28 2016, 3:38 PM

update the patch based on Matt's comments.

The following comments are not updated:

  1. >Not all of these bits should be exposed. GLC, SLC, and TFE definitely should not be. TFE changes the register class of the output. This should strictly be the ones for controlling the sampling
    • I asked this question before I started, and the agreement is to expose all at this moment. We have to study all these flags in detail and get a short list to expose later.
  1. >I think you can reduce the number of repeated lines by passing in the suffix, and then concat + cast to the intrinsic (Similar to how MUBUF_LoadIntrinsicPat does it)
    • what's the problem of repeated lines? I think it is more readable than simplified one. Or we can simplify the def in the TableGen cleanup work.

NOTE that Mesa has to be update to accommodate the image_load/image_store intrinsic parameter change.

cfang updated this revision to Diff 66193.Jul 29 2016, 4:08 PM

Remove the RSVD bit based on Tom's suggestion.

mareko added a subscriber: mareko.Aug 1 2016, 3:02 AM

BTW, SLC and GLC bits definitely SHOULD be exposed. LLVM isn't high-level enough to be able to make any assumptions about cache strategies.

arsenm edited edge metadata.Aug 1 2016, 3:04 PM

BTW, SLC and GLC bits definitely SHOULD be exposed. LLVM isn't high-level enough to be able to make any assumptions about cache strategies.

GLC and SLC should still be separate parameters from the format mask (or metadata should control these like it will for non temporal regular loads/stores)

tstellarAMD edited edge metadata.EditedAug 2 2016, 9:50 AM
This comment has been deleted.
include/llvm/IR/IntrinsicsAMDGPU.td
193–198

Can we just not do this kind of change, please? I don't see how it improves anything, it's inconsistent with the ISA description which has the flags separate, and it'll require an annoying flag day synchronization with Mesa.

I really prefer using a mask over having to create an entire new set of intrinsics each time we have to add a new bit. I think we have two solutions here:

  1. As Matt has suggested, keep mesa the same, and have the auto-upgrader in LLVM change from the old intrinsics to the new mask style intrinsics. Although, with this solution, I think we'd still eventually want/need Mesa to start using the mask version.
  1. Define the intrinsics as var_arg. This would allow us to add to i1 operands without breaking the existing operands. I'm just not sure how well var_arg intrinsics are supported.
tstellarAMD added inline comments.Aug 2 2016, 1:30 PM
include/llvm/IR/IntrinsicsAMDGPU.td
221–228

Moving the sample intrinsics to this file is unrelated to the AMDGPUImageLoad/AMDGPUImageStore changes, so this should be done in a separate patch.

cfang added inline comments.Aug 2 2016, 2:01 PM
include/llvm/IR/IntrinsicsAMDGPU.td
221–228

The patch is to implement amdgcn image inttrinsics, which has three categories: AMDGPUImageLoad, AMDGPUImageStore and AMDGPUImageSample. While AMDGPUImageSample is newly defined, and the other two are modified, they do use the same mechanism, i.e. mask parameter! I think it should be better for them to be together in one patch.

cfang updated this revision to Diff 66855.Aug 4 2016, 2:03 PM
cfang edited edge metadata.

Do not use mask parameter!

Keep image load/store intrinsics

expose all flags for sampler intrinsics

tstellarAMD added inline comments.Aug 4 2016, 2:45 PM
include/llvm/IR/IntrinsicsAMDGPU.td
206

This should go in a separate patch. This patch should be only the sampler changes.

218

This is an unrelated change.

224

I'm thinking vdata should be llvm_anyfloat_ty, so we can have it return <4 x half> for the d16 operations. Though it's going to be weird that some <4 x half> values take 4 registers and some only take two.

Another thing I'm not sure of is if image samplers always return floating-point values and never integers.

226

This should be changed to llvm_anyint_ty, so that we can infer the r128 bit.

232

This r128 bit should be dropped.

lib/Target/AMDGPU/SIInstructions.td
2722–2729 ↗(On Diff #66855)

These image load/store changes should go in a separate patch.

Thanks for doing this. One comment below.

include/llvm/IR/IntrinsicsAMDGPU.td
233

tfe should be dropped. AFAIU it changes the return type (5 return values instead of 4).

cfang updated this revision to Diff 67245.Aug 8 2016, 4:47 PM

Implement sampler intrinsics:

  1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty, and rsrc type to be llvm_anyint_ty as a result, we expect the intrinsics name to have three suffixes to overload each of these three types;
  1. D128 as well as two other flags are implied in the three types, for example, if you use v8i32 as rsrc type, then r128 is true!
  1. don't expose TFE flag, and other flags are exposed in the instruction order: unrm, glc, slc, lwe and da.
  1. LIT tests are not included now!

The intrinsic definitions look good now, thanks.

cfang updated this revision to Diff 67384.Aug 9 2016, 11:53 AM

Add LIT tests for sampler intrinsics!

tstellarAMD commandeered this revision.Oct 11 2016, 2:18 PM
tstellarAMD edited reviewers, added: cfang; removed: tstellarAMD.
tstellarAMD updated this object.

Re-base on top of trunk.

arsenm accepted this revision.Oct 11 2016, 10:08 PM
arsenm edited edge metadata.

LGTM

This revision is now accepted and ready to land.Oct 11 2016, 10:08 PM