This is an archive of the discontinued LLVM Phabricator instance.

[MLIR] Add H100 matmul 128x128x64 (f32+=f16*f16)
Needs Review · Public

Authored by guraypp on Sep 4 2023, 5:27 AM.

Details

Summary

MLIR recently gained support for using mbarrier, TMA, and wgmma instructions together. This integration test exercises all three in a single program.

Depends on D158434, D159342

Event Timeline

guraypp created this revision. · Sep 4 2023, 5:27 AM
Herald added a project: Restricted Project. · Sep 4 2023, 5:27 AM
guraypp requested review of this revision. · Sep 4 2023, 5:27 AM
manishucsd added inline comments.
mlir/test/Integration/GPU/CUDA/sm90a/tma_wgmma_128_128_64_f32_f16_f16.mlir
108

Thanks for adding this visual. Can you also please mention that the first 4 bits are ignored? The first +2 is obtained from (CTA_Tile_K x 2 bytes) / 2^4 = (16 x 2) / 2^4 = 2.
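
For readers following along, the arithmetic in the comment above can be sketched as a small calculation. This is only an illustration, not code from the patch: the function name `descriptor_increment` and its parameters are hypothetical, and it assumes, as the comment states, that the low 4 bits of the byte address are dropped, so the encoded value advances in 16-byte units.

```python
def descriptor_increment(k_elements: int, bytes_per_element: int,
                         ignored_low_bits: int = 4) -> int:
    """Return how much the encoded address field advances for one K step.

    Hypothetical helper: assumes the low `ignored_low_bits` bits of the
    byte address are ignored, as described in the review comment.
    """
    byte_offset = k_elements * bytes_per_element
    unit = 1 << ignored_low_bits  # 2^4 = 16 bytes per encoded step
    if byte_offset % unit != 0:
        raise ValueError("byte offset is not a multiple of the encoded unit")
    return byte_offset >> ignored_low_bits


# Values taken from the comment: 16 f16 elements along K, 2 bytes each.
print(descriptor_increment(16, 2))  # (16 x 2) / 2^4 = 2
```

With the values from the comment (16 elements of 2 bytes each), the byte offset is 32, and dividing by 2^4 gives the +2 step.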