
[CUDA] Implemented _[bi]mma* builtins.

Authored by tra on Apr 4 2019, 11:27 AM.



These builtins provide access to the new integer and
sub-integer variants of MMA (matrix multiply-accumulate) instructions
available with CUDA 10.x on sm_75 (AKA Turing) GPUs.
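For reference, the sketch below models on the CPU the arithmetic that an 8-bit integer m16n16k16 MMA performs on one tile. It is hypothetical and is not one of the builtins. The real __imma* builtins operate on per-thread register fragments, while this sketch takes whole row-major tiles. The function name, the tile types, the clamping used for the saturating mode, and the wrap-around when saturation is off are all assumptions for illustration.

```cpp
#include <array>
#include <cstdint>
#include <limits>

// Hypothetical CPU model of one m16n16k16 integer MMA tile: D = A * B + C,
// with signed 8-bit inputs and 32-bit accumulators. Names and layout are
// illustrative only; the real builtins work on per-thread fragments.
constexpr int M = 16, N = 16, K = 16;

using TileA = std::array<std::array<int8_t, K>, M>;   // M x K, row-major
using TileB = std::array<std::array<int8_t, N>, K>;   // K x N, row-major
using TileC = std::array<std::array<int32_t, N>, M>;  // M x N accumulator

TileC imma_m16n16k16_s8_ref(const TileA& a, const TileB& b, const TileC& c,
                            bool satf) {
  TileC d{};
  for (int i = 0; i < M; ++i) {
    for (int j = 0; j < N; ++j) {
      // Accumulate in 64 bits so that int32 overflow can be detected.
      int64_t acc = c[i][j];
      for (int k = 0; k < K; ++k)
        acc += static_cast<int64_t>(a[i][k]) * b[k][j];
      if (satf) {
        // Assumed saturation semantics: clamp to the int32 range.
        if (acc > std::numeric_limits<int32_t>::max())
          acc = std::numeric_limits<int32_t>::max();
        if (acc < std::numeric_limits<int32_t>::min())
          acc = std::numeric_limits<int32_t>::min();
      }
      // Without satf, keep the low 32 bits (assumed wrap-around).
      d[i][j] = static_cast<int32_t>(static_cast<uint32_t>(acc));
    }
  }
  return d;
}
```

With A filled with 1, B with 2 and C with 3, every output element is 16 * 2 + 3 = 35.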

Also added a target feature for PTX 6.4. While Clang/LLVM does not generate
any PTX instructions that need it, we still need to pass it through to
ptxas in order to compile code that uses the new 'mma'
instruction as inline assembly (e.g., as NVIDIA's CUTLASS library does).

Diff Detail

Repository: rC Clang

Event Timeline

tra created this revision. Apr 4 2019, 11:27 AM
tra updated this revision to Diff 193774. Apr 4 2019, 1:49 PM
tra edited the summary of this revision.

Cleaned up mma test generation.

tra updated this revision to Diff 193796. Apr 4 2019, 3:49 PM
  • Fixed minor issues with parameters of the new builtins:
    • __imma*_st_c_i32 builtins have 'const int * src'
    • __bmma_m8n8k128_mma_xor_popc_b1 does not have 'satf' argument.
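The __bmma_m8n8k128_mma_xor_popc_b1 builtin mentioned above does not multiply. For each output element it adds to the accumulator the population count of (row of A XOR column of B), taken over K = 128 bits, and there is no saturation step. The CPU sketch below is hypothetical. The function name, the packing of each 128-bit row or column into four 32-bit words, and the tile types are assumptions for illustration.

```cpp
#include <array>
#include <bitset>
#include <cstdint>

// Hypothetical CPU model of the 1-bit m8n8k128 xor.popc MMA:
// D[i][j] = C[i][j] + popcount(A_row_i XOR B_col_j) over K = 128 bits.
// Each 128-bit row of A / column of B is packed into four 32-bit words.
// There is no saturation flag in this operation.
constexpr int kM = 8, kN = 8, kWords = 128 / 32;

using Bits128 = std::array<uint32_t, kWords>;
using TileS32 = std::array<std::array<int32_t, kN>, kM>;

TileS32 bmma_m8n8k128_xor_popc_ref(const std::array<Bits128, kM>& a_rows,
                                   const std::array<Bits128, kN>& b_cols,
                                   const TileS32& c) {
  TileS32 d{};
  for (int i = 0; i < kM; ++i) {
    for (int j = 0; j < kN; ++j) {
      int32_t popc = 0;  // at most 128 set bits per (row, column) pair
      for (int w = 0; w < kWords; ++w)
        popc += static_cast<int32_t>(
            std::bitset<32>(a_rows[i][w] ^ b_cols[j][w]).count());
      d[i][j] = c[i][j] + popc;
    }
  }
  return d;
}
```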
tra updated this revision to Diff 193809. Apr 4 2019, 4:41 PM
  • Added PTX64 to the list of builtins' constraints.
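Adding PTX64 to a builtin's constraints means the builtin is usable only when the target enables both the GPU-architecture feature and the PTX-version feature. As far as we understand Clang's TARGET_BUILTIN convention, such a constraint is a string in which ',' joins features that are all required and '|' lists alternatives. The standalone sketch below evaluates a string of that form against a set of enabled features. The function name and the feature spellings ("sm_75", "ptx63", "ptx64") are illustrative and are not copied from this diff.

```cpp
#include <set>
#include <sstream>
#include <string>

// Returns true if `Required` is satisfied by `Enabled`.
// ',' separates groups that must ALL hold; within a group, '|' separates
// alternatives of which ANY one is enough. An empty string requires nothing.
bool hasRequiredFeatures(const std::string& Required,
                         const std::set<std::string>& Enabled) {
  std::stringstream AllOf(Required);
  std::string Group;
  while (std::getline(AllOf, Group, ',')) {
    std::stringstream AnyOf(Group);
    std::string Alt;
    bool Satisfied = false;
    while (std::getline(AnyOf, Alt, '|')) {
      if (Enabled.count(Alt) != 0) {
        Satisfied = true;
        break;
      }
    }
    if (!Satisfied)
      return false;  // a required group has no enabled alternative
  }
  return true;
}
```

For example, "sm_75,ptx64" rejects a target that has sm_75 but only ptx63. Writing the PTX requirement as "ptx63|ptx64" instead lets a newer PTX version satisfy an older requirement.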
timshen added inline comments. Apr 5 2019, 12:10 PM
12884 (On Diff #193809)

How about having a simple struct and a function?

struct NvptxMmaLdstInfo {
  unsigned NumResults;
  unsigned IID_col;
  unsigned IID_row;
};

NvptxMmaLdstInfo getNvptxMmaLdstInfo(unsigned BuiltinID) { ... }

I don't see the need for classes here.
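A fleshed-out version of the struct + function approach might look like the sketch below. The builtin and intrinsic enumerators and the NumResults values are hypothetical placeholders for the real NVPTX::BI__* and Intrinsic::nvvm_* IDs. Returning an all-zero entry for an unknown builtin is also an assumption; production code would more likely treat that case as unreachable.

```cpp
// Hypothetical builtin IDs standing in for the real NVPTX::BI__* values.
enum HypotheticalBuiltin : unsigned {
  BI_imma_ld_a_s8 = 1,
  BI_imma_st_c_i32,
};

// Hypothetical intrinsic IDs; 0 means "layout not supported".
enum HypotheticalIntrinsic : unsigned {
  IID_none = 0,
  IID_imma_ld_a_s8_col = 100,
  IID_imma_ld_a_s8_row,
  IID_imma_st_c_i32_col,
  IID_imma_st_c_i32_row,
};

struct NvptxMmaLdstInfo {
  unsigned NumResults;  // values moved per thread (illustrative numbers)
  unsigned IID_col;     // intrinsic for column-major layout
  unsigned IID_row;     // intrinsic for row-major layout
};

// A plain lookup function instead of a class: one switch, one aggregate
// returned per builtin.
NvptxMmaLdstInfo getNvptxMmaLdstInfo(unsigned BuiltinID) {
  switch (BuiltinID) {
  case BI_imma_ld_a_s8:
    return {2, IID_imma_ld_a_s8_col, IID_imma_ld_a_s8_row};
  case BI_imma_st_c_i32:
    return {8, IID_imma_st_c_i32_col, IID_imma_st_c_i32_row};
  default:
    return {0, IID_none, IID_none};  // unknown builtin
  }
}

// Example caller: pick the intrinsic for the requested layout.
unsigned selectLdstIntrinsic(unsigned BuiltinID, bool IsColMajor) {
  NvptxMmaLdstInfo Info = getNvptxMmaLdstInfo(BuiltinID);
  return IsColMajor ? Info.IID_col : Info.IID_row;
}
```

Because the per-builtin data is a fixed tuple, a plain struct plus a switch keeps it table-like, with no constructors or virtual dispatch involved.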

13020 (On Diff #193809)

ditto (struct + function)?

tra updated this revision to Diff 194226. Apr 8 2019, 5:08 PM
  • Converted class to struct+function as Tim suggested.
tra marked 2 inline comments as done. Apr 8 2019, 5:09 PM
timshen accepted this revision. Apr 8 2019, 5:17 PM
This revision is now accepted and ready to land. Apr 8 2019, 5:17 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Apr 25 2019, 3:27 PM
Herald added a subscriber: kristina.