This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][InstCombine] Use D16 if only f16 precision is needed
Abandoned · Public

Authored by sebastian-ne on Jan 21 2022, 4:34 AM.

Details

Reviewers
arsenm
foad
Summary

If an image load feeds only into truncations to 16 bits, use a d16 load.
In the same way, if a stored value has only 16-bit precision, enable
d16.
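
A minimal sketch of the load-side check this describes, in LLVM's C++ style; the helper name onlyNeedsF16 and the exact user walk are illustrative assumptions, not the patch's actual code:

    // Illustrative only (needs llvm/IR/IntrinsicInst.h): returns true if
    // every user of the image load truncates the result to half, so a
    // d16 load would produce the same observable values.
    static bool onlyNeedsF16(const llvm::IntrinsicInst &ImageLoad) {
      for (const llvm::User *U : ImageLoad.users()) {
        const auto *FPT = llvm::dyn_cast<llvm::FPTruncInst>(U);
        if (!FPT || !FPT->getDestTy()->getScalarType()->isHalfTy())
          return false;
      }
      return true;
    }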


Event Timeline

sebastian-ne created this revision. · Jan 21 2022, 4:34 AM
sebastian-ne requested review of this revision. · Jan 21 2022, 4:34 AM
Herald added a project: Restricted Project. · Jan 21 2022, 4:34 AM
foad added inline comments. · Jan 21 2022, 5:11 AM
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
64–67

I don't understand why we need this code.

90–91

I know you haven't changed this, but it seems dangerous: for a d16 input treated as unsigned you should not match sext here, and for a d16 input treated as signed you should not match zext. Or do all d16 integer inputs ignore the high bits?
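
A standalone illustration of the hazard (not code from this patch): the same 16 low bits re-extend to different 32-bit values depending on signedness, so folding the wrong extension kind into a d16 operation would change results.

    #include <cstdint>
    #include <cstdio>

    int main() {
      int16_t Half = -1;                            // bit pattern 0xffff
      uint32_t ZExt = static_cast<uint16_t>(Half);  // 0x0000ffff
      int32_t SExt = Half;                          // 0xffffffff
      std::printf("zext: %#x, sext: %#x\n", ZExt,
                  static_cast<uint32_t>(SExt));
      return 0;
    }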

107

NewTy = VTy->getWithNewType(NewScalarTy)?
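
For reference, a hedged sketch of the suggested call: getWithNewType preserves the element count and only swaps the element type, e.g. <4 x float> to <4 x half>.

    // Hypothetical use, assuming VTy is an llvm::VectorType* and
    // NewScalarTy an llvm::Type* as in the surrounding code
    // (static helper declared in llvm/IR/DerivedTypes.h):
    llvm::VectorType *NewTy =
        llvm::VectorType::getWithNewType(VTy, NewScalarTy);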

llvm/lib/Target/AMDGPU/AMDGPUInstrInfo.h
41

If this was already unused, you can remove it with a separate NFC patch (consider it pre-approved if you want).

sebastian-ne abandoned this revision. · Jan 24 2022, 5:45 AM
sebastian-ne added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstCombineIntrinsic.cpp
90–91

You’re right, this seems to be incorrect. Good catch.
As far as I understand, the format field in the image descriptor decides whether float or int conversion is used, so the compiler can’t combine d16 at all.
For A16, it’s a uint for instructions without a sampler and a float for instructions with a sampler. I’ll fix that.
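
A minimal sketch of that A16 rule; the helper name getA16CoordType is hypothetical, not the patch's actual code:

    // Illustrative only: A16 coordinates are 16-bit integers for
    // unsampled image instructions and 16-bit floats for sampled ones.
    static llvm::Type *getA16CoordType(llvm::LLVMContext &Ctx,
                                       bool HasSampler) {
      return HasSampler ? llvm::Type::getHalfTy(Ctx)
                        : llvm::Type::getInt16Ty(Ctx);
    }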