The conversion to the nvgpu mma.sync and ldmatrix paths was missing
support for the i4 data type. While fixing this, a second bug was
discovered: the number of ldmatrix tiles calculated for certain operand
types and configurations was incorrect. This change fixes both issues
and adds new tests.
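The following minimal C++ sketch shows why the ldmatrix tile count depends on element width. It assumes that one ldmatrix tile is 8 rows of 128 bits each, as described for ldmatrix in the PTX ISA. The names numLdMatrixTiles, kRowsPerTile, and kBitsPerTileRow are hypothetical and do not come from the nvgpu conversion code. This is an illustration of the arithmetic, not the patch itself.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Assumption (PTX ISA): one ldmatrix tile is 8 rows, and each row holds
// 128 bits, i.e. eight 16-bit values, sixteen i8 values, or thirty-two i4
// values.
constexpr int64_t kRowsPerTile = 8;
constexpr int64_t kBitsPerTileRow = 128;

// Hypothetical helper, not the actual MLIR/nvgpu API: returns how many
// 8 x 128-bit tiles cover a warp-level fragment of `rows` x
// `contiguousElems` elements, where `contiguousElems` runs along the
// dimension that is contiguous in memory and each element is `elementBits`
// wide. Returns std::nullopt if the fragment cannot be tiled exactly.
std::optional<int64_t> numLdMatrixTiles(int64_t rows, int64_t contiguousElems,
                                        int64_t elementBits) {
  assert(elementBits > 0 && "element bit width must be positive");
  // Measure the contiguous extent in bits rather than in elements, so the
  // count stays correct for element types narrower than 16 bits.
  const int64_t rowBits = contiguousElems * elementBits;
  if (rows <= 0 || rows % kRowsPerTile != 0 || rowBits <= 0 ||
      rowBits % kBitsPerTileRow != 0)
    return std::nullopt;
  return (rows / kRowsPerTile) * (rowBits / kBitsPerTileRow);
}
```

For example, an i4 A fragment of an m16n8k64 mma.sync is 16 x 64 elements. Each row therefore spans 256 bits, and the sketch yields 4 tiles (an x4 ldmatrix). The corresponding 8 x 64 i4 B fragment yields 2 tiles. An f16 A fragment of m16n8k16 also yields 4 tiles, because its rows are 16 x 16 = 256 bits as well.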
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo