This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Fix creating invalid copy when adjusting dmask
ClosedPublic

Authored by arsenm on Oct 17 2017, 10:44 PM.

Details

Reviewers
airlied
mareko
Summary

Move the entire optimization to one place. Before it was possible
to adjust dmask without changing the register class of the output
instruction, since they were done in separate places. Fix all

Diff Detail

Event Timeline

arsenm created this revision.Oct 17 2017, 10:44 PM

There are some test crashes with this and I think I made the wrong guess for dmask behavior

mareko edited edge metadata.Oct 18 2017, 9:57 AM

Each bit of dmask determines whether that component is enabled. Image opcodes return 4 components if dmask == 0xf. If dmask == 0x2, image opcodes only return the 2nd component in <1 x float>. If dmask = 0x5, image opcodes return the 1st and 3rd component in <2 x float>. If dmask = 0xa, image opcodes return the 2nd and 4th component in <2 x float>.
Gather4 opcodes are an exception and always return 4 components.

arsenm updated this revision to Diff 119871.Oct 23 2017, 9:21 AM

Fix other test failures

mareko accepted this revision.Oct 24 2017, 2:14 AM

LGTM.

This revision is now accepted and ready to land.Oct 24 2017, 2:14 AM

With this, when dmask = 0x2, we get "image_get_lod v[0:1], ...". Based on what Marek said, wouldn't it only be returning a single value to v0? What would v1 get set to?

In comparison, for image_sample with dmask = 0x2 I see only a single destination register specified. That's also what I see on the proprietary driver for image_get_lod.

With this, when dmask = 0x2, we get "image_get_lod v[0:1], ...". Based on what Marek said, wouldn't it only be returning a single value to v0? What would v1 get set to?

In comparison, for image_sample with dmask = 0x2 I see only a single destination register specified. That's also what I see on the proprietary driver for image_get_lod.

You are right. dmask = 0x2 makes image_get_lod return the 2nd channel as <1 x float>, which will be in v0 in your example.

In case it's still confusing: the number of components returned by image opcodes is popcount(dmask). The code could just do popcount(dmask) instead of computing BitsSet.

arsenm updated this revision to Diff 125016.Nov 30 2017, 2:14 PM
arsenm edited the summary of this revision. (Show Details)
mareko added a comment.Dec 4 2017, 1:17 PM

There are no piglit regressions.

arsenm closed this revision.Dec 4 2017, 2:20 PM

r319705