This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Use v_mad_u64_u32 for IMAD32
ClosedPublic

Authored by rampitec on Jun 7 2022, 2:43 PM.

Details

Summary

Nic Curtis done the experiments to prove it is faster than a
separate mul and add.

Fixes: SWDEV-332806

Diff Detail

Event Timeline

rampitec created this revision.Jun 7 2022, 2:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 2:43 PM
rampitec requested review of this revision.Jun 7 2022, 2:43 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 7 2022, 2:43 PM
Herald added a subscriber: wdng. · View Herald Transcript
arsenm added inline comments.Jun 7 2022, 2:45 PM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
7

Should also test globalisel

rampitec added inline comments.Jun 7 2022, 2:54 PM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
7

GlobalISel does not work:

VOP3Instructions.td:582:1: warning: Skipped pattern: Could not infer class for EXTRACT_SUBREG operand #0
def : IMAD32_Pats<V_MAD_U64_U32_e64>;

This instruction has 2 results, gisel does not handle it.

rampitec added inline comments.Jun 7 2022, 3:08 PM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
7

Moreover, if I restate it this way:

(EXTRACT_SUBREG (i32 (COPY_TO_REGCLASS (inst $src0, $src1,
                                        (REG_SEQUENCE SReg_64, // Use scalar and let it be legalized
                                                      $src2, sub0,
                                                      (i32 (IMPLICIT_DEF)), sub1),
                                        0 /* clamp */), VReg_64)),
                sub0)

The message it clearer:

VOP3Instructions.td:582:1: warning: Skipped pattern: Dst pattern child has multiple results
def : IMAD32_Pats<V_MAD_U64_U32_e64>;

foad added inline comments.Jun 8 2022, 4:16 AM
llvm/test/CodeGen/AMDGPU/mad_64_32.ll
302–304

It's a shame about the extra movs around the mad but that is not your fault.

llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
47

As a follow up it would be nice if we could fold the "42" into the mad as an inline src2.

rampitec added inline comments.Jun 8 2022, 9:56 AM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
47

Sure. I will take a look why didn't it fold in a followup.

rampitec added inline comments.Jun 8 2022, 12:07 PM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
47

This actually did not fold because of the IMPLICIT_DEF. This is technically not a constant, but a 64 bit partial undef:

%3:sreg_32 = S_MOV_B32 42
%5:sreg_32 = IMPLICIT_DEF
%4:sreg_64 = REG_SEQUENCE killed %3:sreg_32, %subreg.sub0, killed %5:sreg_32, %subreg.sub1
%7:vreg_64, %8:sreg_64 = V_MAD_U64_U32_e64 %0:vgpr_32, %1:vgpr_32, killed %4:sreg_64, 0, implicit $exec
rampitec updated this revision to Diff 435303.Jun 8 2022, 12:55 PM
rampitec marked an inline comment as done.

Handle immediate src2.

rampitec updated this revision to Diff 435349.Jun 8 2022, 3:00 PM

Newly added pattern with imm:$src2 causes assert in global isel exporter with debug tablegen:

assert(WaitingForNamedOperands == 0 &&
       "previous predicate didn't find all operands or "
       "nested predicate that uses operands");

Disable global isel for this rule, it does not work anyway because of the 2 result instruction unsupported by gisel.

rampitec added inline comments.Jun 8 2022, 3:31 PM
llvm/test/CodeGen/AMDGPU/atomic_optimizations_buffer.ll
274

As a separate patch, we can replace any unused sdst with null on gfx10+. Probably in the SIFoldOperands.

foad accepted this revision.Jun 9 2022, 1:53 AM

LGTM, thanks. But we should really do gisel too, even if it means writing C++ code.

llvm/lib/Target/AMDGPU/VOP3Instructions.td
413

Just curious: how does this work? This class has no GISelPredicateCode, but isn't that the same as having a GISelPredicateCode that always returns true?

llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
47

Thanks! Can you add a test where the constant is too large to go inline, if there isn't one already? I guess this can be a literal on GFX10 but not on GFX9.

This revision is now accepted and ready to land.Jun 9 2022, 1:53 AM
rampitec added inline comments.Jun 9 2022, 2:02 AM
llvm/lib/Target/AMDGPU/VOP3Instructions.td
413

To the best of my understanding if you do not have this, you do not have gisel support as well (I.e. it is like if it returns false, not true). Then practically this pattern doesn't work at all because gisel cannot handle multiple results, so this only mutes debug build assert. On a practical side.

llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
47

I believe 42 cannot be inline? That's why I was using it in the first place.

foad added inline comments.Jun 9 2022, 2:11 AM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
47

Integers from -16 to 64 are inline.

rampitec added a comment.EditedJun 9 2022, 2:14 AM

LGTM, thanks. But we should really do gisel too, even if it means writing C++ code.

I do not really think it needs C++ code. What it needs is a gisel exporter working in a way supporting what sdag can do. Maybe we need to extend tg syntax to extract a specific result from a node. This is my religious belief.

Looking at even sdag code for handling of mul_lohi I believe this an unnecessary overkill, serving just one purpose: underdeveloped gisel backend.

rampitec added inline comments.Jun 9 2022, 2:20 AM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
47

My bad, thanks! I will add a test with a larger constant.

rampitec updated this revision to Diff 435619.Jun 9 2022, 11:25 AM
rampitec marked 2 inline comments as done.

Added test with large immediate as src2.

This revision was landed with ongoing or failed builds.Jun 9 2022, 11:40 AM
This revision was automatically updated to reflect the committed changes.
foad added inline comments.Jun 10 2022, 3:33 AM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
69

This mov is not required (since the high part of the 64-bit constant is undef) but I guess it is not worth the effort to try to eliminate it.

rampitec added inline comments.Jun 10 2022, 9:07 AM
llvm/test/CodeGen/AMDGPU/mad_u64_u32.ll
69

Yes, this would not be trivial.

foad added inline comments.Jun 24 2022, 4:07 AM
llvm/lib/Target/AMDGPU/VOP3Instructions.td
591

This causes assertion failures if you try to form a mad with a negative immediate addend - see the test case added in D128435.