This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX, CUDA] Add .and.popc variant of the b1 MMA instruction.
ClosedPublic

Authored by tra on Jul 2 2021, 5:07 PM.

Details

Summary

Extends the changes in D104847 and adds another MMA instruction variant and corresponding intrinsics & builtins.

That should allow clang to compile mma.h from CUDA-11.3.

Didn't test it much yet. There may still be some sharp corners.

Diff Detail

Event Timeline

tra created this revision.Jul 2 2021, 5:07 PM
tra requested review of this revision.Jul 2 2021, 5:07 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 2 2021, 5:07 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript
steffenlarsen requested changes to this revision.Jul 12 2021, 5:25 AM

Good stuff! Thanks for adding this and adjusting the test generator. I have requested some minor changes, though nothing critical. Are the test failures related to these changes?

clang/test/CodeGen/builtins-nvptx-mma.py
35–39
84

The following changes would remove the need for the m16n16k8 cases in is_ldst_variant_supported.

261
This revision now requires changes to proceed.Jul 12 2021, 5:25 AM
tra updated this revision to Diff 358041.Jul 12 2021, 12:53 PM
tra marked an inline comment as done.
tra edited the summary of this revision. (Show Details)

Addressed review comments.

tra added inline comments.Jul 12 2021, 12:56 PM
clang/test/CodeGen/builtins-nvptx-mma.py
35–39

Default initializers that use objects in python are one of the common gotchas.
https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments

It probably does not make much of a difference in this case as we do not modify it, but I'd prefer to avoid it nevertheless.

84

This was done deliberately. I did have that in an earlier variant of the patch, but eventually settled on the current version.

My thinking is that make_ldst_ops(m16n16k8) would create whatever is necessary for m16n16k8, and we'd keep the decision of what to produce in one place in is_ldst_variant_supported. Splitting one make_ldst_ops into two here makes this decision implicit which would need additional explanations. Code that needs less explanations wins. :-)

steffenlarsen accepted this revision.Jul 13 2021, 2:22 AM

Thank you for addressing my concerns. I am happy with the changes. Great work!

clang/test/CodeGen/builtins-nvptx-mma.py
35–39

Interesting and a little horrifying. I agree it wouldn't have been a problem in this particular case, but I understand your concern and I am fine with keeping it as-is. 👍

84

My concern is that it is more confusing this way because they are the only load/store ops generated where the type is then later filtered. It makes sense for MMA because it needs to filter out select combinations, but here the comment above says one thing and make_ldst_ops does something else.

I don't think it makes a huge difference either way, so if you prefer the current version I am okay with keeping it. That said, it might be good to have a comment in the m16n16k8 case of is_ldst_variant_supported making the connection to this.

This revision is now accepted and ready to land.Jul 13 2021, 2:22 AM
tra updated this revision to Diff 358365.Jul 13 2021, 11:21 AM

Updated LD/ST generation.

tra marked an inline comment as done.Jul 14 2021, 2:56 PM
tra added inline comments.
clang/test/CodeGen/builtins-nvptx-mma.py
84

Fair enough. I've adopted your suggestion. Builtins don't have as much magic as intrinsics, so your approach is indeed simpler.

steffenlarsen accepted this revision.Jul 15 2021, 1:24 AM

LGTM! Are the test failures related to this, you reckon?

tra updated this revision to Diff 359047.Jul 15 2021, 10:37 AM
tra marked an inline comment as done.

rebase

tra added a comment.Jul 15 2021, 10:47 AM

LGTM! Are the test failures related to this, you reckon?

AFAICT, no, the test failures don't seem to be related. It appears that the test runs rarely succeed in general. They are mostly red for nearly all build attempts (https://reviews.llvm.org/harbormaster/build/?plan=PHID-HMCP-p2oc4ocen3l2yzymvg2l), even though LLVM buildbots are green (https://lab.llvm.org/buildbot/#/).

This revision was landed with ongoing or failed builds.Jul 15 2021, 12:02 PM
This revision was automatically updated to reflect the committed changes.