Details
- Reviewers
herhut qcolombet nicolasvasilache
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| mlir/test/Integration/GPU/CUDA/sm90a/tma_wgmma_128_128_64_f32_f16_f16.mlir | ||
|---|---|---|
| 108 | Thanks for adding this visual. Could you also please mention that the first 4 bits are ignored? The first +2 is obtained from (CTA_Tile_K x 2 bytes) / 2^4 = (16 x 2) / 2^4 = 2. | |
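The arithmetic in the comment above can be checked with a small sketch. The function name `descriptor_increment` and its parameters are hypothetical and do not come from the patch; the only assumptions are the ones stated in the comment, namely that CTA_Tile_K is 16, each f16 element is 2 bytes, and the lowest 4 bits of the address are ignored.

```python
def descriptor_increment(tile_k: int, bytes_per_element: int, ignored_bits: int = 4) -> int:
    """Return the step added to the descriptor when advancing by one K tile.

    Hypothetical helper for illustration only: it mirrors the reviewer's
    formula (CTA_Tile_K x bytes) / 2^4, where the low `ignored_bits` bits
    of the byte offset are dropped.
    """
    step_bytes = tile_k * bytes_per_element
    # The byte offset must be a multiple of 2^ignored_bits, otherwise
    # dropping the low bits would lose information.
    if step_bytes % (1 << ignored_bits) != 0:
        raise ValueError("byte offset is not aligned to the ignored low bits")
    return step_bytes >> ignored_bits


# Values taken from the review comment: CTA_Tile_K = 16, f16 = 2 bytes.
increment = descriptor_increment(tile_k=16, bytes_per_element=2)
print(increment)
```

With these values the byte offset is 32, and shifting out the 4 ignored bits gives the +2 shown in the visual.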