This is an archive of the discontinued LLVM Phabricator instance.

[mlir][nvgpu] shared memory access optimization pass
ClosedPublic

Authored by christopherbate on Jun 9 2022, 4:44 PM.

Details

Summary

This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by
newColIdx = col % vecSize + perm[row](col/vecSize,row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized accesses
common to loading matrix multiplication operands to/from shared memory.

Note: many changes here are code movement created after adding an IR subfolder for the nvgpu dialect, I can separate those to a another diff if required.

Diff Detail

Event Timeline

christopherbate created this revision.Jun 9 2022, 4:44 PM
Herald added a project: Restricted Project. · View Herald Transcript
christopherbate requested review of this revision.Jun 9 2022, 4:44 PM
christopherbate edited the summary of this revision. (Show Details)Jun 9 2022, 4:45 PM

Add check for sub-views + test.

Fix bazel build.

NFC: fix usage of 'NvGpu' to NVGPU in CMake to be consistent.
Prune dependencies/includes. Fix typos.k

This looks great! Few minor comments.

mlir/include/mlir/Dialect/NVGPU/Passes.td
16

can you add a one line summary?

mlir/include/mlir/Dialect/NVGPU/Transforms/Transforms.h
2–3

nit: formating

19

This and the header above don't seem needed?

mlir/lib/Dialect/NVGPU/CMakeLists.txt
2

nit: missing new line

mlir/lib/Dialect/NVGPU/Transforms/OptimizeSharedMemory.cpp
2

description looks out of date?

15

from a layering point of view it's not great that this depends on NVVM dialect. Is this only because of the address space? If this is the only dependency I wonder if we should either use the GPU dialect helper or have something in NVGPU. What do you think?

121

you should be able to use indicesMutable() instead of using hardcoded operand index. (same below)

270

missing new line

christopherbate marked 6 inline comments as done.

Address comments.

mlir/lib/Dialect/NVGPU/Transforms/OptimizeSharedMemory.cpp
15

OK, I changed to the gpu dialect function for now. We can replace it later if the gpu dialect dependency needs to be eliminated.

121

Thanks, I was wondering how to clean this up!

This revision is now accepted and ready to land.Jun 10 2022, 3:46 PM

Fix rebase issues with bazel build

The build is just giving MLIR :: Examples/standalone/test.toy error with "no space left on device" error, so I'll go ahead and land this.

This revision was landed with ongoing or failed builds.Jun 17 2022, 8:35 AM
This revision was automatically updated to reflect the committed changes.

This has caused a break in the Windows MLIR bot:

614.601 [639/64/2749] Building CXX object tools\mlir\lib\Dialect\NVGPU\Transforms\CMakeFiles\obj.MLIRNVGPUTransforms.dir\OptimizeSharedMemory.cpp.obj
FAILED: tools/mlir/lib/Dialect/NVGPU/Transforms/CMakeFiles/obj.MLIRNVGPUTransforms.dir/OptimizeSharedMemory.cpp.obj ... 
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Dialect\NVGPU\Transforms\OptimizeSharedMemory.cpp(194): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Dialect\NVGPU\Transforms\OptimizeSharedMemory.cpp(194): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)

This has caused a break in the Windows MLIR bot:

614.601 [639/64/2749] Building CXX object tools\mlir\lib\Dialect\NVGPU\Transforms\CMakeFiles\obj.MLIRNVGPUTransforms.dir\OptimizeSharedMemory.cpp.obj
FAILED: tools/mlir/lib/Dialect/NVGPU/Transforms/CMakeFiles/obj.MLIRNVGPUTransforms.dir/OptimizeSharedMemory.cpp.obj ... 
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Dialect\NVGPU\Transforms\OptimizeSharedMemory.cpp(194): error C2220: the following warning is treated as an error
C:\buildbot\mlir-x64-windows-ninja\llvm-project\mlir\lib\Dialect\NVGPU\Transforms\OptimizeSharedMemory.cpp(194): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?)

Thanks, I can push a 1-line change to fix.

Thanks, I can push a 1-line change to fix.

Thank you! I often feel bad notifying about simple breaks instead of fixing them, but since I don't have commit I have to loop someone else in either way.