This is an archive of the discontinued LLVM Phabricator instance.

[mlir][nvgpu] Fix affine maps computing indices for LdMatrixOp srcMemref
ClosedPublic

Authored by manishucsd on Nov 29 2022, 9:04 PM.

Details

Summary

This patch fixes and simplifies the ldmatrix affine map arithmetic by abstracting the affine expressions in terms of pitch-linear layout (strided and contiguous dimensions). Then it applies the maps for strided and contiguous dimensions in row-major and col-major. 

LdMatrixOp collaboratively (32 threads in a warp) load tiles (8 row x 128b col) of data. It can load either x1, x2, x4 tiles. Additionally, it can transpose at 16-bit granularity when moving data from the Shared Memory to registers. 

This patch fixes affine map (laneid -> coordinate index a thread points in a tile).

  • Loading x4 tiles needs all 32 lanes T0-31 point to a contiguous chunk of 128b. The issue was exposed when running this case.
  • Loading x2 tiles and x1 needs T0-15 threads and T0-7 threads points to contiguous chunk of 128b. The patch is NFC for these cases.

Diff Detail

Event Timeline

manishucsd created this revision.Nov 29 2022, 9:04 PM
Herald added a project: Restricted Project. · View Herald Transcript
manishucsd requested review of this revision.Nov 29 2022, 9:04 PM
ThomasRaoux accepted this revision.Nov 30 2022, 3:31 PM

Looks good to me but I don't know this code in details. It would be good to have @christopherbate input.

This revision is now accepted and ready to land.Nov 30 2022, 3:31 PM
christopherbate accepted this revision.Nov 30 2022, 7:40 PM

Can you add some more detail to the commit message? Per our offline discussion, the issue became apparent when the .x4 variant is used with transpose = true. The changes in the tests are mostly NFC with the exception of m16n16k16_mmasync16816_fp16_f16_row_row_row, where the rowB/colB affine maps were incorrect. For the other .x2 test cases, the new affine maps introduced here and the old affine maps in the CHECK statements are equivalent because only the first 16 thread ids in the warp matter.

LGTM

manishucsd edited the summary of this revision. (Show Details)Dec 1 2022, 7:47 AM
manishucsd updated this revision to Diff 479297.Dec 1 2022, 7:52 AM

Making lit test consistent.

manishucsd updated this revision to Diff 479485.Dec 1 2022, 5:24 PM
manishucsd updated this revision to Diff 479487.Dec 1 2022, 5:52 PM
This revision was landed with ongoing or failed builds.Dec 1 2022, 6:31 PM
This revision was automatically updated to reflect the committed changes.