Differential D131265
Fixed sm version for .and bmma operator.
Closed, Public. Authored by JackAKirk on Aug 5 2022, 8:30 AM.
Summary
As stated here (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-instructions-wmma-mma): ".and operation in single-bit wmma requires sm_80 or higher."
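To make the constraint concrete, here is a minimal Python sketch of the architecture rule quoted above, together with a reference model of what a single-bit AND+POPC matrix multiply-accumulate computes for one output element. All names (MIN_SM_FOR_B1_OP, b1_op_supported, bmma_and_popc_element) are hypothetical and are not taken from clang or its test generator. The sm_75 entry for the XOR variant is an assumption based on the same PTX documentation rather than on this revision, and the model ignores 32-bit accumulator wrap-around.

```python
# Illustrative sketch only; none of these names exist in clang or its tests.

# Minimum SM version per single-bit wmma bit operation.
MIN_SM_FOR_B1_OP = {
    "xor_popc": 75,  # assumption: single-bit wmma itself requires sm_75
    "and_popc": 80,  # ".and operation in single-bit wmma requires sm_80 or higher"
}


def b1_op_supported(op: str, sm: int) -> bool:
    """Return True if the single-bit wmma bit operation `op` is available on sm_<sm>."""
    return sm >= MIN_SM_FOR_B1_OP[op]


def bmma_and_popc_element(a_row: int, b_col: int, c: int) -> int:
    """Reference model of one output element: c + popcount(a_row & b_col).

    The 128-bit row of A and column of B are packed into Python ints.
    """
    return c + bin(a_row & b_col).count("1")
```

Under this sketch, b1_op_supported("and_popc", 75) is False while b1_op_supported("and_popc", 80) is True, which is the distinction the corrected builtin constraint is meant to enforce.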
Event Timeline
This revision is now accepted and ready to land. (Aug 5 2022, 9:50 AM)

Thanks. If you could land it for me that would be much appreciated. I don't have the rights.

This revision was landed with ongoing or failed builds. (Aug 5 2022, 12:14 PM)
Closed by commit rG3e0e5568a6a8: [CUDA] Fixed sm version constrain for __bmma_m8n8k128_mma_and_popc_b1. (authored by JackAKirk, committed by tra).
This revision was automatically updated to reflect the committed changes.

Looks like the tests needed to be updated (and I've found one bug which explains how we've missed this).

Ah yes I see it. Thanks for updating the tests.

Revision Contents
Diff 450351
clang/include/clang/Basic/BuiltinsNVPTX.def
clang/test/CodeGen/builtins-nvptx-mma.cu
clang/test/CodeGen/builtins-nvptx-mma.py