Move the entire optimization to one place. Before it was possible
to adjust dmask without changing the register class of the output
instruction, since they were done in separate places. Fix all
Details
Diff Detail
Event Timeline
There are some test crashes with this and I think I made the wrong guess for dmask behavior
Each bit of dmask determines whether that component is enabled. Image opcodes return 4 components if dmask == 0xf. If dmask == 0x2, image opcodes only return the 2nd component in <1 x float>. If dmask = 0x5, image opcodes return the 1st and 3rd component in <2 x float>. If dmask = 0xa, image opcodes return the 2nd and 4th component in <2 x float>.
Gather4 opcodes are an exception and always return 4 components.
With this, when dmask = 0x2, we get "image_get_lod v[0:1], ...". Based on what Marek said, wouldn't it only be returning a single value to v0? What would v1 get set to?
In comparison, for image_sample with dmask = 0x2 I see only a single destination register specified. That's also what I see on the proprietary driver for image_get_lod.
You are right. dmask = 0x2 makes image_get_lod return the 2nd channel as <1 x float>, which will be in v0 in your example.
In case it's still confusing: the number of components returned by image opcodes is popcount(dmask). The code could just do popcount(dmask) instead of computing BitsSet.