This is an archive of the discontinued LLVM Phabricator instance.

[CUDA] Added __hmma_m16n16k16_* builtins to support mma instructions in sm_70
Closed, Public

Authored by tra on Oct 10 2017, 9:53 AM.

Event Timeline

tra created this revision. Oct 10 2017, 9:53 AM
jlebar accepted this revision. Oct 11 2017, 9:40 AM
jlebar added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
9726

weird indentation?

9733

Urg, this isn't a bool? Do we want it to be?

9761

Accidentally left over?

9762

s/8/NumElements/?
s/16/f16/?

Maybe it would be better to write it as "Return value has type [[f16 x 2] x NumResults]."?

9784

Nit, at this point it's probably better to assign NumResults in each branch, since there are only two. clang should make sure that we don't accidentally use it uninitialized.
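
A minimal sketch of the pattern suggested here, not the code from the patch: NumResults is declared without an initializer and assigned in each of the two branches, so a branch that forgets the assignment can be reported by the compiler (for example by clang's -Wsometimes-uninitialized). The enum, the function name, and the counts 4 and 8 are assumptions made for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for the two builtin variants; not the names used in clang.
enum class AccumKind { F16, F32 };

// Assumed counts for illustration only: 4 results for f16, 8 for f32.
unsigned numResultsFor(AccumKind Kind) {
  unsigned NumResults; // intentionally no initializer
  if (Kind == AccumKind::F16)
    NumResults = 4;
  else
    NumResults = 8;
  // Every path above assigns NumResults before this read.
  assert(NumResults == 4 || NumResults == 8);
  return NumResults;
}
```

Giving the variable a default value at its declaration would silence the warning and hide a missed branch, which is why the sketch leaves it uninitialized.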

9786

s/are using/use/

9800

spacing. (Probably just worth clang-formatting this and the other patch.)

9802

Nit, we know that there won't ever be more than 8 elements...

This revision is now accepted and ready to land. Oct 11 2017, 9:40 AM
tra updated this revision to Diff 118636. Oct 11 2017, 10:12 AM
tra marked 6 inline comments as done.

Addressed Justin's comments.

clang/lib/CodeGen/CGBuiltin.cpp
9726

My emacs and clang-format keep fighting case indentation... Fixed.

9733

There are no explicit declarations for these builtins in the CUDA headers. Callers of these builtins pass 0/1, and the corresponding intrinsic described in the NVVM-IR spec shows the argument type as i32, so I've made the type an integer in clang.

9762

That was part of the leftover block. The particular types are irrelevant here. All we care about is storing whatever the intrinsic call returned (4 or 8 elements of v2f16 or float) in the destination array (which is int[]).
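
As a host-side analogy only (the patch itself emits LLVM IR in CodeGen, which this does not reproduce), the idea of storing results without caring about their element type can be sketched as a bit-for-bit copy into an int array. The helper name storeResults and its signature are hypothetical, and the sketch assumes each result is exactly as wide as an int.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: copies NumResults values into Dst bit for bit,
// without interpreting them as float, packed halves, or anything else.
template <typename T>
void storeResults(int *Dst, const T *Results, unsigned NumResults) {
  static_assert(sizeof(T) == sizeof(int), "each result must be as wide as int");
  assert(Dst && Results);
  for (unsigned I = 0; I < NumResults; ++I)
    std::memcpy(&Dst[I], &Results[I], sizeof(int));
}
```

A caller could pass an array of float or an array of 32-bit words holding two packed halves; in both cases the destination receives the same bits, which is all the store needs.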

9802

We have two extra arguments -- destination buffer and stride.

jlebar added inline comments. Oct 11 2017, 10:47 AM
clang/lib/CodeGen/CGBuiltin.cpp
9733

sgtm

This revision was automatically updated to reflect the committed changes.