Details
- Reviewers
herhut qcolombet nicolasvasilache
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/test/Integration/GPU/CUDA/sm90a/tma_wgmma_128_128_64_f32_f16_f16.mlir
Line 108: Thanks for adding this visual. Can you also please mention that the lowest 4 bits of the address are ignored? The first +2 is obtained as (CTA_Tile_K x 2 bytes) / 2^4 = (16 x 2) / 2^4 = 2.
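The +2 arithmetic above can be checked with a small sketch. It assumes that the descriptor's start-address field stores the shared-memory byte address with its lowest 4 bits dropped (16-byte granularity), following PTX's `matrix-descriptor-encode(x) = (x & 0x3FFFF) >> 4`. It also assumes f16 elements of 2 bytes each. The helper names `descriptor_encode` and `descriptor_k_step` are hypothetical and do not come from the test file.

```python
def descriptor_encode(byte_addr: int) -> int:
    """Assumed encoding: keep the low 18 address bits, then drop the lowest 4
    (the descriptor addresses shared memory in 16-byte units)."""
    return (byte_addr & 0x3FFFF) >> 4


def descriptor_k_step(cta_tile_k: int, elem_bytes: int = 2) -> int:
    """Hypothetical helper: how much the encoded descriptor advances when the
    operand pointer moves by one CTA_Tile_K slice along K."""
    step_bytes = cta_tile_k * elem_bytes   # e.g. 16 x 2 bytes = 32 bytes
    assert step_bytes % 16 == 0, "step must be a multiple of 16 bytes"
    return step_bytes >> 4                 # divide by 2^4 because the low 4 bits are ignored


# CTA_Tile_K = 16 with f16 (2-byte) elements gives the first +2 in the visual.
print(descriptor_k_step(16))
print(descriptor_encode(32))
```

Because the encoded value is the byte offset divided by 16, adding 2 to the descriptor is the same as advancing 32 bytes in shared memory.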