This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Propose to redefine image load/store intrinsics
AbandonedPublic

Authored by cfang on Aug 8 2016, 4:56 PM.

Details

Summary

This is a proposal to sync the definition of image load/store intrinsics with these pf samplers: https://reviews.llvm.org/D22838.

  1. define vdata type to be llvm_anyfloat_ty, address type to be llvm_anyfloat_ty, and rsrc type to be llvm_anyint_ty as a result, we expect the intrinsics name to have three suffixes to overload each of these three types;
  1. D128 as well as two other flags are implied in the three types, for example, if you use v8i32 as rsrc type, then r128 is true!
  1. don't expose TFE flag and unorm flag (set to 1), and other flags are exposed in the instruction order: unrm, glc, slc, lwe and da.
  1. LIT tests are not fully updated now!

Diff Detail

Event Timeline

cfang updated this revision to Diff 67249.Aug 8 2016, 4:56 PM
cfang retitled this revision from to AMDGPU/SI: Propose to redefine image load/store intrinsics.
cfang updated this object.
cfang added reviewers: arsenm, tstellarAMD.
cfang added subscribers: arsenm, llvm-commits.
arsenm edited edge metadata.Aug 8 2016, 8:15 PM

Why wouldn't unorm be exposed?

include/llvm/IR/IntrinsicsAMDGPU.td
255

da should not be exposed since it is the mangling of the coordinate type

tstellarAMD added inline comments.Aug 8 2016, 8:20 PM
include/llvm/IR/IntrinsicsAMDGPU.td
255

As far as I know da doesn't impact the coordinate type. Mesa doesn't seem to change anything about the coordinates when setting them.

cfang updated this revision to Diff 67415.Aug 9 2016, 2:11 PM
cfang edited edge metadata.

Update LIT tests based on new intrinsics definition for image load and image:

  1. image intrinsics name has three suffixes for types of vdata, address and resource, respectively. Something like: declare void @llvm.amdgcn.image.store.v4f32.v2i32.v8i32(<4 x float>, <2 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #0 declare <4 x float> @llvm.amdgcn.image.load.v4f32.v4i32.v8i32(<4 x i32>, <8 x i32>, i32, i1, i1, i1, i1) #1
  1. The four flags exposed are in the following order: glc, slc, lwe and da
  1. unrm is not exposed and default to 1

I don't know about making vdata (input & output) overloaded. If you do, please at least add a machine instruction verifier that checks the type against dmask.

include/llvm/IR/IntrinsicsAMDGPU.td
255

da does affect how coordinates are interpreted by the TA block, but not in a way that LLVM could extract from the coordinate type, since LLVM cannot know everything it needs to know to do so (it doesn't know the image dimensions).

mareko added a subscriber: mareko.Aug 10 2016, 8:00 AM

"unorm" and "da" must be exposed as parameters. They don't change the type, but they change the behavior of the TA hardware block. In all cases, the type is always floating-point.

"r128" doesn't have to be exposed and it's kinda useless. We don't have any use case for it and I think the next-gen hardware (after Polaris) doesn't have it either.

if you use v8i32 as rsrc type, then r128 is true!

Also, the quoted sentence is non-sense. In 100% of our cases, r128 must be 0.

if you use v8i32 as rsrc type, then r128 is true!

Also, the quoted sentence is non-sense. In 100% of our cases, r128 must be 0.

Yes, the quoted comment is wrong. Do you think I can drop r128 bit support?
i.e. set resource type to be v8i32 and r128 bit to be 0 all the time?

I heard GL somehow uses R128 bit.

"unorm" and "da" must be exposed as parameters. They don't change the type, but they change the behavior of the TA hardware block. In all cases, the type is always floating-point.

"r128" doesn't have to be exposed and it's kinda useless. We don't have any use case for it and I think the next-gen hardware (after Polaris) doesn't have it either.

This patch also consider image load and image store. For image store, unorm bit must be 1. I haven't seen any restriction regarding image load.
Are you sure the coordinate type is always float-point? I know for image_sample, it is the case, and not sure image load and image store.

if you use v8i32 as rsrc type, then r128 is true!

Also, the quoted sentence is non-sense. In 100% of our cases, r128 must be 0.

Yes, the quoted comment is wrong. Do you think I can drop r128 bit support?
i.e. set resource type to be v8i32 and r128 bit to be 0 all the time?

I heard GL somehow uses R128 bit.

Mesa GL doesn't use R128. If the closed GL does, I don't know about it.

"unorm" and "da" must be exposed as parameters. They don't change the type, but they change the behavior of the TA hardware block. In all cases, the type is always floating-point.

"r128" doesn't have to be exposed and it's kinda useless. We don't have any use case for it and I think the next-gen hardware (after Polaris) doesn't have it either.

This patch also consider image load and image store. For image store, unorm bit must be 1. I haven't seen any restriction regarding image load.
Are you sure the coordinate type is always float-point? I know for image_sample, it is the case, and not sure image load and image store.

Image loads/stores are a different category. There are several categories but you can split all image opcodes into two:

  • takes a floating-point address (all sample and gather opcodes) and also a sampler descriptor, the "unorm" and "da" bits apply here
  • takes an integer address (all load, store and atomic opcodes), the "unorm" bit is irrelevant (or must be 1 in some cases) and only the "da" bit applies
cfang added a comment.Aug 10 2016, 1:37 PM

> Mesa GL doesn't use R128. If the closed GL does, I don't know about it.

Just discussed with other compiler engineers. Compiler should expose existing hardware feature no matter it is currently used or not.

cfang added a comment.Aug 10 2016, 1:42 PM

Image loads/stores are a different category. There are several categories but you can split all image opcodes into two:

  • takes a floating-point address (all sample and gather opcodes) and also a sampler descriptor, the "unorm" and "da" bits apply here
  • takes an integer address (all load, store and atomic opcodes), the "unorm" bit is irrelevant (or must be 1 in some cases) and only the "da" bit applies

Thanks, This is what we are doing currently.
This patch handle image load/store; unorm ==1 and expose da

Another patch (https://reviews.llvm.org/D22838) handle sampler related image intrinsics which expose unorm and da

cfang added a comment.Aug 11 2016, 3:24 PM

I don't know about making vdata (input & output) overloaded. If you do, please at least add a machine instruction verifier that checks the type against dmask.

The data could be types of float or half, so we need to overload the vdata type. Also we need to set D16 bit for vector of half type.

Could you outline what should I do in order to "add a machine instruction verifier that checks the type against dmask". Thanks.

cfang added a comment.Aug 11 2016, 3:28 PM

Hello, All:

Do you agree with the proposed image load/store intrinsics definition? If yes, what should you suggest foe the existing applications that use "old" (existing) image load/store intrinsics? Thanks so much, and we need this change urgently!

The changes look good to me. Existing users (Mesa) will need to be fixed manually, but we can take care of that.

arsenm added a comment.Dec 5 2016, 2:24 PM

Was this already committed?

cfang abandoned this revision.Dec 5 2016, 3:00 PM

withdraw this revision because this same patch has been committed to trunk.