Page MenuHomePhabricator

[AMDGPU] Use d16 flag for image.sample instructions
ClosedPublic

Authored by mariusz-sikora-at-amd on Apr 22 2022, 12:15 AM.

Details

Summary

Image.sample instruction can be forced to return half type instead of
float when d16 flag is enabled.

This patch adds new pattern in InstCombine to detect if output of
image.sample is used later only by fptrunc which converts the type
from float to half. If pattern is detected then fptrunc and image.sample
are combined to single image.sample which is returning half type.
Later in Lowering part d16 flag is added to image sample intrinsic.

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2022, 12:15 AM
mariusz-sikora-at-amd requested review of this revision.Apr 22 2022, 12:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 22 2022, 12:15 AM
foad added a comment.Apr 22 2022, 2:50 AM

Could do with some tests for other instructions like image_gather4, image_load, image_msaa_load.

llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
116

Please update the comment to describe the new arguments.

235

As a further improvement you could handle multiple uses if all of them are fptruncs - but perhaps that will never happen in practice because of CSE.

236

LLVM uses CamelCase for variable names, like User.

foad added inline comments.Apr 22 2022, 7:39 AM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
225

I think this should be hasD16Images() but I'm not 100% sure. You could check by running your test case through llc with -mcpu=gfx810 to see if it manages to generate valid ISA.

mariusz-sikora-at-amd marked 3 inline comments as done.Apr 22 2022, 10:10 AM
mariusz-sikora-at-amd added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
235

If you are referring to:
%s = call half @llvm.amdgcn.image.xxx
%s1 = fptrunc float %s to half
%s2 = fptrunc float %s to half
...
then this will be optimize before entering to simplifyAMDGCNImageIntrinsic

Changes after review

foad accepted this revision.Apr 25 2022, 4:21 AM

LGTM, thanks!

This revision is now accepted and ready to land.Apr 25 2022, 4:21 AM

I don't have commit access, can anyone merge this patch for me, please?

Mariusz Sikora
mariusz.sikora@amd.com

This revision was automatically updated to reflect the committed changes.

This change was merged to main and later reverted because of memory-sanitizer failing tests. I'm re-opening this review to publish updated version of the change.

This revision is now accepted and ready to land.Apr 26 2022, 9:37 PM

In last version (which was merged and reverted) I made a mistake at the end
of modifyIntrinsicCall function where I was doing operations on removed
instruction (InstToReplace).

This update is fixing this issue.
I used buildbot-scripts locally to reproduce failing tests and
later to verify if my new fix is fixing it.
(https://github.com/google/sanitizers/wiki/SanitizerBotReproduceBuild)

piotr accepted this revision.Apr 27 2022, 12:27 AM

LGTM

foad added inline comments.Apr 27 2022, 2:54 AM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
142

Can you compare pointers instead of using isIdenticalTo? That would be simpler, if it works.

mariusz-sikora-at-amd marked an inline comment as done.Apr 27 2022, 3:42 AM
mariusz-sikora-at-amd added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
142

Yes it will work. Thanks.

mariusz-sikora-at-amd marked an inline comment as done.

Compare pointers of OldIntr and InstToRemove instead of using isIdenticalTo.

sebastian-ne requested changes to this revision.Apr 27 2022, 8:42 AM
sebastian-ne added a subscriber: sebastian-ne.

I wanted to add this combine before, but I don’t think there is a way to add d16 to an instruction without potentially breaking the code.
The reason is, when an image_sample has the d16 flag enabled, it will use f32→f16 truncation or i32→i16 truncation, depending on the texture format in the descriptor.

Combining image_sample+fptrunc to image_sample d16 works fine for float textures, but I assume we don’t know at compile time if a texture is an integer or float texture.
The application may interpret stored values as float and does an fptrunc, but the texture is actually defined as an integer texture, so the hardware uses an integer trunc instead, giving different results.

This revision now requires changes to proceed.Apr 27 2022, 8:42 AM

I wanted to add this combine before, but I don’t think there is a way to add d16 to an instruction without potentially breaking the code.
The reason is, when an image_sample has the d16 flag enabled, it will use f32→f16 truncation or i32→i16 truncation, depending on the texture format in the descriptor.

Combining image_sample+fptrunc to image_sample d16 works fine for float textures, but I assume we don’t know at compile time if a texture is an integer or float texture.
The application may interpret stored values as float and does an fptrunc, but the texture is actually defined as an integer texture, so the hardware uses an integer trunc instead, giving different results.

If the semantics change between integer and FP we should probably have separate opcodes

piotr added a comment.Apr 28 2022, 1:08 AM

I wanted to add this combine before, but I don’t think there is a way to add d16 to an instruction without potentially breaking the code.
The reason is, when an image_sample has the d16 flag enabled, it will use f32→f16 truncation or i32→i16 truncation, depending on the texture format in the descriptor.

Combining image_sample+fptrunc to image_sample d16 works fine for float textures, but I assume we don’t know at compile time if a texture is an integer or float texture.
The application may interpret stored values as float and does an fptrunc, but the texture is actually defined as an integer texture, so the hardware uses an integer trunc instead, giving different results.

Are you aware of any examples that would fail with this patch? At least in Vulkan, the image format must match the sampled type.

sebastian-ne accepted this revision.Apr 28 2022, 2:37 AM

I wanted to add this combine before, but I don’t think there is a way to add d16 to an instruction without potentially breaking the code.
The reason is, when an image_sample has the d16 flag enabled, it will use f32→f16 truncation or i32→i16 truncation, depending on the texture format in the descriptor.

Combining image_sample+fptrunc to image_sample d16 works fine for float textures, but I assume we don’t know at compile time if a texture is an integer or float texture.
The application may interpret stored values as float and does an fptrunc, but the texture is actually defined as an integer texture, so the hardware uses an integer trunc instead, giving different results.

Are you aware of any examples that would fail with this patch? At least in Vulkan, the image format must match the sampled type.

I see, I didn’t know the type on the intrinsic always matches the type of the texture. Sorry for the noise.
It would be nice to have the same combine with integer textures.

This revision is now accepted and ready to land.Apr 28 2022, 2:37 AM

I wanted to add this combine before, but I don’t think there is a way to add d16 to an instruction without potentially breaking the code.
The reason is, when an image_sample has the d16 flag enabled, it will use f32→f16 truncation or i32→i16 truncation, depending on the texture format in the descriptor.

Combining image_sample+fptrunc to image_sample d16 works fine for float textures, but I assume we don’t know at compile time if a texture is an integer or float texture.
The application may interpret stored values as float and does an fptrunc, but the texture is actually defined as an integer texture, so the hardware uses an integer trunc instead, giving different results.

Are you aware of any examples that would fail with this patch? At least in Vulkan, the image format must match the sampled type.

I see, I didn’t know the type on the intrinsic always matches the type of the texture. Sorry for the noise.
It would be nice to have the same combine with integer textures.

I will add the combine with integer in the next patch. Thanks for pointing it.

This revision was landed with ongoing or failed builds.May 4 2022, 9:30 PM
This revision was automatically updated to reflect the committed changes.