This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Added intrinsics/instructions for MMA ops on (sub-)integers
Closed · Public

Authored by tra on Mar 29 2019, 2:55 PM.

Details

Summary

PTX 6.3 (CUDA-10.0) extends the wmma instruction to support s8/u8/s4/u4/b1 -> s32.

All of the new instructions are still handled mostly by tablegen. I've slightly
refactored the code to drive intrinsic/instruction generation from a master
list of supported variants, so that all irregularities are implemented in one place only.
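To illustrate the master-list idea, here is a minimal Python sketch; it is not the actual TableGen or wmma.py code. The list contents, the names `MASTER_LIST`, `variants` and `ptx_mma`, and the exact PTX spellings are assumptions based on my reading of the PTX ISA `wmma.mma` section: sub-byte types (s4/u4/b1) only allow `.row.col` layouts, and b1 uses `.xor.popc` and has no `.satfinite` form. Both irregularities live in `variants`/`ptx_mma` instead of being scattered across generators.

```python
from itertools import product

# Hypothetical master list of (geometry, A/B element type); C and D are s32 throughout.
MASTER_LIST = [
    *product(["m16n16k16", "m32n8k16", "m8n32k16"], ["s8", "u8"]),
    ("m8n8k32", "s4"),
    ("m8n8k32", "u4"),
    ("m8n8k128", "b1"),
]

SUB_BYTE = {"s4", "u4", "b1"}


def variants(elt):
    """Yield (alayout, blayout, satf) combinations; every irregularity lives here."""
    layouts = ["row", "col"]
    for alayout, blayout in product(layouts, layouts):
        # Assumed restriction: sub-byte types need row-major A and column-major B.
        if elt in SUB_BYTE and (alayout, blayout) != ("row", "col"):
            continue
        for satf in (False, True):
            # Assumed restriction: b1 has no .satfinite variant.
            if elt == "b1" and satf:
                continue
            yield alayout, blayout, satf


def ptx_mma(geom, elt, alayout, blayout, satf):
    """Spell one PTX mma instruction (spelling assumed from the PTX ISA docs)."""
    op = ".xor.popc" if elt == "b1" else ""
    satfinite = ".satfinite" if satf else ""
    return (f"wmma.mma{op}.sync.aligned.{alayout}.{blayout}.{geom}"
            f".s32.{elt}.{elt}.s32{satfinite}")


def all_mma_instructions():
    """Expand the master list into every instruction variant."""
    return [ptx_mma(g, t, *v) for g, t in MASTER_LIST for v in variants(t)]
```

Adding a new variant then means appending one tuple to `MASTER_LIST`, and a new restriction means one more `continue` in `variants`.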

The test generation script wmma.py has been refactored in a similar way.
I've added checks to verify that the set of tests the script generates
for a particular PTX and SM combination is sane.
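A sanity check of this kind could look roughly like the following Python sketch. The minimum PTX/SM table (f16: PTX 6.0/sm_70, s8/u8: PTX 6.3/sm_72, s4/u4/b1: PTX 6.3/sm_75) is my reading of the PTX ISA docs, and the names `REQUIREMENTS`, `expected_types` and `check_test_set`, as well as encoding PTX 6.3 as the integer 63, are hypothetical. The check fails both when tests exist for a type the target can't handle and when a supported type has no tests at all.

```python
# Assumed minimum (PTX ISA version * 10, SM version) for each A/B element type.
REQUIREMENTS = {
    "f16": (60, 70),
    "s8": (63, 72),
    "u8": (63, 72),
    "s4": (63, 75),
    "u4": (63, 75),
    "b1": (63, 75),
}


def expected_types(ptx, sm):
    """Element types whose MMA ops should be tested for this PTX/SM pair."""
    return {t for t, (min_ptx, min_sm) in REQUIREMENTS.items()
            if ptx >= min_ptx and sm >= min_sm}


def check_test_set(generated_types, ptx, sm):
    """Raise ValueError unless the generated tests cover exactly the expected types."""
    expected = expected_types(ptx, sm)
    generated = set(generated_types)
    if generated - expected:
        raise ValueError(f"tests for unsupported types: {sorted(generated - expected)}")
    if expected - generated:
        raise ValueError(f"no tests for supported types: {sorted(expected - generated)}")
```

For example, a PTX 6.3 / sm_72 run should produce f16, s8 and u8 tests and nothing for the sub-byte types.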

Event Timeline

tra created this revision. Mar 29 2019, 2:55 PM
Herald added a project: Restricted Project. Mar 29 2019, 2:55 PM
timshen accepted this revision. Apr 1 2019, 3:24 PM

Discussed with Art offline. The tablegen code is still not readable, but it's considerably better than before, and inventing new tools (e.g. a Cartesian product) may be hard.

llvm/include/llvm/IR/IntrinsicsNVVM.td, line 155

Can you add a few examples of the generated regs?

line 155

Can you document what Type{A,B,C,D} mean?

This revision is now accepted and ready to land. Apr 1 2019, 3:24 PM
tra marked an inline comment as done. Apr 1 2019, 4:37 PM
tra added inline comments.
llvm/include/llvm/IR/IntrinsicsNVVM.td, line 155

A typical WMMA_REGS record looks like this:

def anonymous_58 {      // WMMA_REGS
  string geom = "m16n16k16";
  string frag = "a";
  string ptx_elt_type = "f16";
  string gft = "m16n16k16:a:f16";
  string ft = "a:f16";
  list<LLVMType> regs = [llvm_v2f16_ty, llvm_v2f16_ty, llvm_v2f16_ty, llvm_v2f16_ty, llvm_v2f16_ty, llvm_v2f16_ty, llvm_v2f16_ty, llvm_v2f16_ty];
}

It carries the information needed to generate the relevant bits of the intrinsics and instructions, e.g. how many registers the fragment uses, what we call them, and what the corresponding LLVM type is.
The details on supported fragment formats can be found in the latest PTX ISA docs:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-fragment
I'll add the link to the comment.
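The fields of such a record can be derived mechanically. The Python sketch below is hypothetical and is not how TableGen computes them: it assumes the 32 threads of a warp each hold an equal share of the fragment, packed into 32-bit registers. That gives 2 x i32 for an m16n16k16 s8 `a` fragment and 8 x float for an m16n16k16 f32 accumulator. The f16 `a`/`b` fragments don't follow that rule (eight v2f16 registers for every f16 geometry, as in the record above and, per my reading, the PTX fragment docs), so they are special-cased.

```python
import re

# Bits per element for each PTX element type.
ELT_BITS = {"f16": 16, "f32": 32, "s32": 32, "s8": 8, "u8": 8, "s4": 4, "u4": 4, "b1": 1}

# LLVM register type per PTX element type; integer and sub-integer
# types are packed into llvm_i32_ty registers.
REG_TYPE = {"f16": "llvm_v2f16_ty", "f32": "llvm_float_ty"}

WARP_SIZE = 32  # threads sharing one fragment
REG_BITS = 32   # width of each fragment register


def frag_dims(geom, frag):
    """Return (rows, cols) of the matrix a fragment covers, e.g. m16n16k16/a -> (16, 16)."""
    m, n, k = map(int, re.fullmatch(r"m(\d+)n(\d+)k(\d+)", geom).groups())
    return {"a": (m, k), "b": (k, n), "c": (m, n), "d": (m, n)}[frag]


def wmma_regs(geom, frag, ptx_elt_type):
    """Build a dict with the same fields as a WMMA_REGS record."""
    if frag in ("a", "b") and ptx_elt_type == "f16":
        # Irregular case: eight v2f16 registers for every f16 geometry.
        count = 8
    else:
        rows, cols = frag_dims(geom, frag)
        count = rows * cols * ELT_BITS[ptx_elt_type] // (WARP_SIZE * REG_BITS)
    return {
        "geom": geom,
        "frag": frag,
        "ptx_elt_type": ptx_elt_type,
        "gft": f"{geom}:{frag}:{ptx_elt_type}",
        "ft": f"{frag}:{ptx_elt_type}",
        "regs": [REG_TYPE.get(ptx_elt_type, "llvm_i32_ty")] * count,
    }
```

For example, `wmma_regs("m8n8k32", "c", "s32")["regs"]` evaluates to `["llvm_i32_ty", "llvm_i32_ty"]`.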

The string lists in Type{A,B,C,D} hold the PTX types supported by MMA ops with the geometries listed in Geom.
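In other words, the op list is a Cartesian product of Geom and the type lists. The Python sketch below shows that expansion; it is an illustration rather than the TableGen code, and the convention that an empty B list reuses the A type (and an empty D list reuses the C type) is an assumption made here so that mixed A/B types are not generated. The name `mma_ops` is hypothetical.

```python
from itertools import product


def mma_ops(geoms, types_a, types_b, types_c, types_d):
    """Expand type lists into (geom, a, b, c, d) tuples via a Cartesian product.

    Assumed convention: an empty types_b reuses the A type, and an empty
    types_d reuses the C type, instead of being combined independently.
    """
    ops = []
    for geom, a, c in product(geoms, types_a, types_c):
        for b in types_b or [a]:
            for d in types_d or [c]:
                ops.append((geom, a, b, c, d))
    return ops


F16_GEOMS = ["m16n16k16", "m32n8k16", "m8n32k16"]
f16_ops = mma_ops(F16_GEOMS, ["f16"], [], ["f16", "f32"], ["f16", "f32"])
int_ops = mma_ops(F16_GEOMS, ["s8", "u8"], [], ["s32"], [])
```

With three geometries, one A type and two C and D types each, the f16 expansion yields 3 * 2 * 2 = 12 ops; the s8/u8 expansion yields 3 * 2 = 6.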

tra updated this revision to Diff 193755. Apr 4 2019, 11:34 AM
tra edited the summary of this revision.
  • Enabled .satf for s4/u4.
This revision was automatically updated to reflect the committed changes.