This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Explicitly track d16 for image legalization
ClosedPublic

Authored by arsenm on Jan 10 2022, 7:24 AM.

Details

Summary

We were trying to guess at the original IR type for image intrinsics
after legalization to figure out if they were d16, but this didn't
work. Explicitly track if this is a d16 operation or not in the
opcode, as is done for the buffer intrinsics.

The OpenCL library is using f32 image writes with a dmask of 15 for
some reason, and this was incorrectly switching them to use d16. Fixes
image failures in the OpenCL conformance test. The equivalent dmask
for loads doesn't even select in either selector.

Diff Detail

Event Timeline

arsenm created this revision.Jan 10 2022, 7:24 AM
arsenm requested review of this revision.Jan 10 2022, 7:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2022, 7:24 AM
Herald added a subscriber: wdng. · View Herald Transcript
sebastian-ne accepted this revision.Jan 10 2022, 8:56 AM

Looks good to me, the code looks a lot better than before.

Having a float image store with dmask=15 sounds bad. What happens if the load/store is assigned the last allocated register? I.e.

; Shader allocates v[0:63]
image_store v63, v[1:2], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm

Will the hardware try to read v64–v66 and hang because they are not allocated?
(I remember there was an issue with d16 and the hardware incorrectly computing the register requirement, but not sure if that leads to hangs or values getting lost.)

This revision is now accepted and ready to land.Jan 10 2022, 8:56 AM

Looks good to me, the code looks a lot better than before.

Having a float image store with dmask=15 sounds bad. What happens if the load/store is assigned the last allocated register? I.e.

; Shader allocates v[0:63]
image_store v63, v[1:2], s[0:7] dmask:0xf dim:SQ_RSRC_IMG_2D unorm

Will the hardware try to read v64–v66 and hang because they are not allocated?
(I remember there was an issue with d16 and the hardware incorrectly computing the register requirement, but not sure if that leads to hangs or values getting lost.)

I don't know, but my guess would be the hardware ignores the high bits of the mask, and we should probably teach codegen to ignore them too and select (at least in the load case). I believe the out of bounds register access behavior has always been discard writes, return v0