This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Turn D16 for MIMG instructions into a regular operand
ClosedPublic

Authored by nhaehnle on May 27 2018, 1:55 PM.

Details

Summary

This allows us to reduce the number of different machine instruction
opcodes, which reduces the table sizes and helps flatten the TableGen
multiclass hierarchies.

We can do this because for each hardware MIMG opcode, we have a full set
of IMAGE_xxx_Vn_Vm machine instructions for all required sizes of vdata
and vaddr registers. Instead of having separate D16 machine instructions,
a packed D16 instructions loading e.g. 4 components can simply use the
same V2 opcode variant that non-D16 instructions use.

We still require a TSFlag for D16 buffer instructions, because the
D16-ness of buffer instructions is part of the opcode. Renaming the flag
should help avoid future confusion.

The one non-obvious code change is that for gather4 instructions, the
disassembler can no longer automatically decide whether to use a V2 or
a V4 variant. The existing logic which choose the correct variant for
other MIMG instruction is extended to cover gather4 as well.

As a bonus, some of the assembler error messages are now more helpful
(e.g., complaining about a wrong data size instead of a non-existing
instruction).

While we're at it, delete a whole bunch of dead legacy TableGen code.

Change-Id: I89b02c2841c06f95e662541433e597f5d4553978

Diff Detail

Repository
rL LLVM

Event Timeline

nhaehnle created this revision.May 27 2018, 1:55 PM

I was thinking of actually moving the opposite direction for these. For modeling partial register updates, I think having operands for all of these is unmanageable. The problem is worse for ALU instructions with the zero high bit control bit. I think the same issues apply here. What is the behavior for the high bits if only 1 or 3 components are enabled?

I was thinking of actually moving the opposite direction for these. For modeling partial register updates, I think having operands for all of these is unmanageable. The problem is worse for ALU instructions with the zero high bit control bit. I think the same issues apply here. What is the behavior for the high bits if only 1 or 3 components are enabled?

IIRC the high bits are preserved except when ECC is enabled.

Okay, so modeling partial register updates. That would require us to add a $vdst_orig tied operand, right? Is there anything else, or any reason why we couldn't just do that unconditionally (but as an undef use), whether D16 or not?

I was thinking of actually moving the opposite direction for these. For modeling partial register updates, I think having operands for all of these is unmanageable. The problem is worse for ALU instructions with the zero high bit control bit. I think the same issues apply here. What is the behavior for the high bits if only 1 or 3 components are enabled?

IIRC the high bits are preserved except when ECC is enabled.

Okay, so modeling partial register updates. That would require us to add a $vdst_orig tied operand, right? Is there anything else, or any reason why we couldn't just do that unconditionally (but as an undef use), whether D16 or not?

Actually, can we have a VGPR32_LO/HI as the vdata register class for modeling the fact that the upper bits are preserved?

artem.tamazov resigned from this revision.May 30 2018, 3:52 AM

Adding Dmitry as he is more fluent in this domain.

I was thinking of actually moving the opposite direction for these. For modeling partial register updates, I think having operands for all of these is unmanageable. The problem is worse for ALU instructions with the zero high bit control bit. I think the same issues apply here. What is the behavior for the high bits if only 1 or 3 components are enabled?

IIRC the high bits are preserved except when ECC is enabled.

Okay, so modeling partial register updates. That would require us to add a $vdst_orig tied operand, right? Is there anything else, or any reason why we couldn't just do that unconditionally (but as an undef use), whether D16 or not?

Actually, can we have a VGPR32_LO/HI as the vdata register class for modeling the fact that the upper bits are preserved?

I'm not sure. I've only looked at this a little bit before but it certainly needs experimentation. My guess is the tied operand will be necessary

I was thinking of actually moving the opposite direction for these. For modeling partial register updates, I think having operands for all of these is unmanageable. The problem is worse for ALU instructions with the zero high bit control bit. I think the same issues apply here. What is the behavior for the high bits if only 1 or 3 components are enabled?

IIRC the high bits are preserved except when ECC is enabled.

Okay, so modeling partial register updates. That would require us to add a $vdst_orig tied operand, right? Is there anything else, or any reason why we couldn't just do that unconditionally (but as an undef use), whether D16 or not?

Actually, can we have a VGPR32_LO/HI as the vdata register class for modeling the fact that the upper bits are preserved?

I'm not sure. I've only looked at this a little bit before but it certainly needs experimentation. My guess is the tied operand will be necessary

Okay. Let's assume we don't get half-sized register classes and we do need the tied operand. The question at hand is whether D16 instructions need to be separate from non-D16 instructions. I would prefer them to be not separate, hence this patch. I think it should be possible to always have tied operands, but with undef uses in the non-D16 case. What do you think?

dp accepted this revision.Jun 1 2018, 5:30 AM

Looks good.

Regarding partial register updates: would there be any performance benefit from supporting this feature for MIMG? I.e. cannot we just ignore this feature and handle upper bits as undefined?

This revision is now accepted and ready to land.Jun 1 2018, 5:30 AM
In D47434#1118807, @dp wrote:

Regarding partial register updates: would there be any performance benefit from supporting this feature for MIMG? I.e. cannot we just ignore this feature and handle upper bits as undefined?

I believe that's effectively what we're doing today.

I'm assuming that I can submit this as-is soon. I'm holding off for now so that I can submit all MIMG-related changes in this stack at once.

This revision was automatically updated to reflect the committed changes.