This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][MC][GFX11] Correct v_fmac_.*_e64_dpp
ClosedPublic

Authored by dp on Sep 30 2022, 7:04 AM.

Diff Detail

Event Timeline

dp created this revision.Sep 30 2022, 7:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2022, 7:04 AM
dp requested review of this revision.Sep 30 2022, 7:04 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2022, 7:04 AM

Unfortunately there is a codegen issue with this patch. Since the Instruction would no longer have an old operand, we couldn't create the DPP version (See GCNDPPCombine.cpp:244). I have just pushed a test for that 50dfd3e9e4930834c3c77a289000fd57bbc16727

dp added a comment.Sep 30 2022, 10:35 AM

Unfortunately there is a codegen issue with this patch. Since the Instruction would no longer have an old operand, we couldn't create the DPP version (See GCNDPPCombine.cpp:244). I have just pushed a test for that 50dfd3e9e4930834c3c77a289000fd57bbc16727

So for _e64_dpp we need both old and src2 and both these operands must be tied to dst, correct?

In D134961#3827685, @dp wrote:

Unfortunately there is a codegen issue with this patch. Since the Instruction would no longer have an old operand, we couldn't create the DPP version (See GCNDPPCombine.cpp:244). I have just pushed a test for that 50dfd3e9e4930834c3c77a289000fd57bbc16727

So for _e64_dpp we need both old and src2 and both these operands must be tied to dst, correct?

Yes, it seems that would be correct to me.

The FIXME hints the presence of src2 at all is a hack, but I'm not sure how practical it is to remove src2 completely.

dp added a comment.Sep 30 2022, 1:08 PM

Unfortunately there is a codegen issue with this patch.

It looks like we have never supported dpp variants of MAC and FMAC opcodes in codegen. e32 dpp definitions also lack $old argument. See https://github.com/llvm/llvm-project/blob/2812a1413f19b9ebd954b57ecbd021656508eade/llvm/lib/Target/AMDGPU/VOP2Instructions.td#L417

dp added a comment.Oct 3 2022, 4:19 AM

So for _e64_dpp we need both old and src2 and both these operands must be tied to dst, correct?

Yes, it seems that would be correct to me.

After a bit of digging I found that we cannot tie more than one operand to another. See https://github.com/llvm/llvm-project/blob/2812a1413f19b9ebd954b57ecbd021656508eade/llvm/utils/TableGen/CodeGenInstruction.cpp#L372
I believe it is the reason why codegen does not support dpp variants of MAC and FMAC opcodes.

If I understand the current state: e32 dpp is not creatable but is assemblable, e64 dpp is creatable but not assemblable. Perhaps a way to fix _e64 is in the asmParser in cvtVOP3DPP ~L8900? I think it's treating the operand as src2 when it is actually the old operand.

dp added a comment.Oct 3 2022, 7:22 AM

If I understand the current state: e32 dpp is not creatable but is assemblable, e64 dpp is creatable but not assemblable.

Correct. I'm not sure e64 dpp is creatable but at least it has $old operand.

Perhaps a way to fix _e64 is in the asmParser in cvtVOP3DPP ~L8900? I think it's treating the operand as src2 when it is actually the old operand.

cvtVOP3DPP incorrectly handles $old because it is not tied to $dst.

It seems possible to change cvtVOP3DPP to handle v_fmac opcodes correctly. But if we go that way, $src2 will be tied to $dst, but $old will be present but not tied. Is this acceptable?

In D134961#3830721, @dp wrote:

If I understand the current state: e32 dpp is not creatable but is assemblable, e64 dpp is creatable but not assemblable.

Correct. I'm not sure e64 dpp is creatable but at least it has $old operand.

Perhaps a way to fix _e64 is in the asmParser in cvtVOP3DPP ~L8900? I think it's treating the operand as src2 when it is actually the old operand.

cvtVOP3DPP incorrectly handles $old because it is not tied to $dst.

It seems possible to change cvtVOP3DPP to handle v_fmac opcodes correctly. But if we go that way, $src2 will be tied to $dst, but $old will be present but not tied. Is this acceptable?

I believe it is actually that way currently in main. And it seems to work in GCNDPPCombine at least. Though I have not strongly tested it, maybe it's wasting a register? @rampitec

Tablegen dump yields :
def V_FMAC_F32_e64_dpp

...
dag InOperandList = (ins anonymous_9483:$old ...
...
string Constraints = "$vdst = $src2";
string DisableEncoding = "$src2";
In D134961#3830721, @dp wrote:

If I understand the current state: e32 dpp is not creatable but is assemblable, e64 dpp is creatable but not assemblable.

Correct. I'm not sure e64 dpp is creatable but at least it has $old operand.

Perhaps a way to fix _e64 is in the asmParser in cvtVOP3DPP ~L8900? I think it's treating the operand as src2 when it is actually the old operand.

cvtVOP3DPP incorrectly handles $old because it is not tied to $dst.

It seems possible to change cvtVOP3DPP to handle v_fmac opcodes correctly. But if we go that way, $src2 will be tied to $dst, but $old will be present but not tied. Is this acceptable?

I believe it is actually that way currently in main. And it seems to work in GCNDPPCombine at least. Though I have not strongly tested it, maybe it's wasting a register? @rampitec

Tablegen dump yields :
def V_FMAC_F32_e64_dpp

...
dag InOperandList = (ins anonymous_9483:$old ...
...
string Constraints = "$vdst = $src2";
string DisableEncoding = "$src2";

Do we really need both? src2 and old are basically the same thing here.

In D134961#3830721, @dp wrote:

If I understand the current state: e32 dpp is not creatable but is assemblable, e64 dpp is creatable but not assemblable.

Correct. I'm not sure e64 dpp is creatable but at least it has $old operand.

Perhaps a way to fix _e64 is in the asmParser in cvtVOP3DPP ~L8900? I think it's treating the operand as src2 when it is actually the old operand.

cvtVOP3DPP incorrectly handles $old because it is not tied to $dst.

It seems possible to change cvtVOP3DPP to handle v_fmac opcodes correctly. But if we go that way, $src2 will be tied to $dst, but $old will be present but not tied. Is this acceptable?

I believe it is actually that way currently in main. And it seems to work in GCNDPPCombine at least. Though I have not strongly tested it, maybe it's wasting a register? @rampitec

Tablegen dump yields :
def V_FMAC_F32_e64_dpp

...
dag InOperandList = (ins anonymous_9483:$old ...
...
string Constraints = "$vdst = $src2";
string DisableEncoding = "$src2";

Do we really need both? src2 and old are basically the same thing here.

I think the most correct thing to do is remove src2 operand from FMAC. But the FIXME at L410 implies we had good reason for adding src2. If we don't want to do that, we should either fix GCNDPPCombine to work without an old operand for these instruction, or fix the Asm/Dasm to work for them.

dp updated this revision to Diff 465733.Oct 6 2022, 8:44 AM

Correct assembler and disassembler to handle special fmac operands (old, src2, src2_modifiers).

rampitec accepted this revision.Oct 6 2022, 2:30 PM

LGTM

This revision is now accepted and ready to land.Oct 6 2022, 2:30 PM
This revision was landed with ongoing or failed builds.Oct 7 2022, 6:22 AM
This revision was automatically updated to reflect the committed changes.