This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] gfx11 WMMA instruction support
ClosedPublic

Authored by Joe_Nash on Jun 28 2022, 1:15 PM.

Details

Summary

gfx11 introduces new WMMA (Wave Matrix Multiply-accumulate)
instructions.

Diff Detail

Event Timeline

Joe_Nash created this revision.Jun 28 2022, 1:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 1:15 PM
Joe_Nash requested review of this revision.Jun 28 2022, 1:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 28 2022, 1:15 PM
Joe_Nash added reviewers: foad, arsenm, rampitec, piotr, rdomingu, Restricted Project.Jun 28 2022, 1:16 PM
arsenm added inline comments.Jun 28 2022, 1:19 PM
llvm/include/llvm/IR/IntrinsicsAMDGPU.td
1990

Missing the clang changes and tests for this

Joe_Nash updated this revision to Diff 441022.Jun 29 2022, 8:11 AM

Removed clang builtin from intrinsics. The builtins can be added in a later patch.

arsenm added inline comments.Jun 29 2022, 10:36 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3739–3740

Probably should have a verifier check this, or just rely on 0/non-0

llvm/lib/Target/AMDGPU/GCNHazardRecognizer.cpp
1708

Can use Register() in place of NoRegister

llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
3268

Why skip implicit operands?

Joe_Nash updated this revision to Diff 441127.Jun 29 2022, 12:28 PM
Joe_Nash marked 2 inline comments as done.

use Register() in place of NoRegister. Add implicit operands in convertToThreeAddress

foad added inline comments.Jun 30 2022, 1:12 AM
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp
3270

I think you could just raise the upper bound of the loop above to E = MI.getNumOperands() instead of adding this extra call?

Joe_Nash updated this revision to Diff 441401.Jun 30 2022, 7:55 AM

better way to copy implicit operands

llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3739–3740

I don't know what to do here. The intrinsic has an i1 field. If you put a non-i1 value that will be reported right? Given that, we are asserting that other parts of ISel haven't transformed this value incorrectly. Also, we do the same thing in selectDotIUVOP3PMods, line 3726. Please let me know what could be done.

Joe_Nash marked an inline comment as done.Jun 30 2022, 7:56 AM
arsenm added inline comments.Jun 30 2022, 7:58 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3739–3740

I'd just invert the check below to != 0. The machine verifier is certainly not enforcing this be 0/-1 for booleans. Practically speaking, this would only come up for hand written MIR

piotr added a comment.Jun 30 2022, 8:00 AM
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3739–3740

I don't know what to do here. The intrinsic has an i1 field. If you put a non-i1 value that will be reported right? Given that, we are asserting that other parts of ISel haven't transformed this value incorrectly. Also, we do the same thing in selectDotIUVOP3PMods, line 3726. Please let me know what could be done.

Indeed, this was strongly inspired by the existing code in selectDotIUVOP3PMods, which handles the intrinsic in a similar way.

Joe_Nash updated this revision to Diff 441404.Jun 30 2022, 8:08 AM

invert op_sel_0 value check

arsenm accepted this revision.Jun 30 2022, 8:09 AM
arsenm added inline comments.
llvm/lib/Target/AMDGPU/AMDGPUInstructionSelector.cpp
3739

No real point to the assert anymore

This revision is now accepted and ready to land.Jun 30 2022, 8:09 AM
This revision was landed with ongoing or failed builds.Jun 30 2022, 8:43 AM
This revision was automatically updated to reflect the committed changes.