This is an archive of the discontinued LLVM Phabricator instance.

[mlir] GPU: introduce utilities for promotion to workgroup memory
ClosedPublic

Authored by ftynse on Dec 26 2019, 7:38 AM.

Details

Summary

Introduce a set of function that promote a memref argument of a
gpu.func to workgroup memory using memory attribution. The promotion
boils down to additional loops performing the copy from the original
argument to the attributed memory in the beginning of the function, and
back at the end of the function using all available threads. The loop
bounds are specified so as to adapt to any size of the workgroup. These
utilities are intended to compose with other existing utilities (loop
coalescing and tiling) in cases where the distribution of work across
threads is uneven, e.g. copying a 2D memref with only the threads along
the "x" dimension. Similarly, specialization of the kernel to specific
launch sizes should be implemented as a separate pass combining constant
propagation and canonicalization.

Introduce a simple attribute-driven pass to test the promotion
transformation since we don't have a heuristic at the moment.

Diff Detail

Event Timeline

ftynse created this revision.Dec 26 2019, 7:38 AM

Unit tests: pass. 61126 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

rriddle added inline comments.Dec 26 2019, 9:29 AM
mlir/include/mlir/Dialect/GPU/GPUOps.td
122

BTW: Value is now value-typed.

mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp
184

has_single_element

mehdi_amini added inline comments.Dec 26 2019, 9:45 AM
mlir/lib/Dialect/GPU/CMakeLists.txt
25

I see locally this error:

CMake Error at .../llvm-project/mlir/lib/Dialect/GPU/CMakeLists.txt:18 (target_link_libraries):
  Target "MLIRGPUOpsIncGen" of type UTILITY may not be linked into another
  target.  One may link only to INTERFACE, OBJECT, STATIC or SHARED
  libraries, or to executables with the ENABLE_EXPORTS property set.
ftynse updated this revision to Diff 235836.Jan 2 2020, 1:49 AM

Updated to use value-typed Value.

ftynse edited the summary of this revision. (Show Details)Jan 2 2020, 1:52 AM
ftynse added reviewers: nicolasvasilache, rriddle, csigg.

Unit tests: pass. 61127 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

I think this can be significantly simplified by reusing existing infra better.

In the longer term, composing the following steps should allow to write all this in a few lines of code:

  1. (optionally, not sure if really needed) use a proper linalg.reshape to insert your dummy dimensions (once available)
  2. use the linalg.pointwise builder (https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/Dialect/Linalg/EDSC/Builders.h#L113) with the proper indexings and a region that just yields the input (see e.g. https://github.com/llvm/llvm-project/blob/master/mlir/test/EDSC/builder-api-test.cpp#L837)
  3. use linalgOpToLoops (https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/Dialect/Linalg/Transforms/LinalgTransforms.h#L75) (bonus points extend the infra to capture the loops so you can do 4. directly)
  4. apply mapLoopToProcessorIds (https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/Transforms/LoopUtils.h#L221)
  5. (alternative to 3. + 4.) use the linalg -> SPMD loops mapping transformation (future work)

Please add a TODO so we can revisit later and simplify.

mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp
79

No need to reimplement EDSCs for the special case of constants, you can just use them: constant_index
See https://github.com/llvm/llvm-project/blob/master/mlir/test/EDSC/builder-api-test.cpp#L473 and code around for usage.

83

Since you're already using EDSCs, note that you already have support for this and the code above here:
https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/EDSC/Helpers.h#L67

109

Why roll your own special cases?
You could just emit the loops as you do (note the loops themselves are captured since you use the EDSCs) and then apply:

void mapLoopToProcessorIds(loop::ForOp forOp, ArrayRef<Value> processorId,
                           ArrayRef<Value> numProcessors);

https://github.com/llvm/llvm-project/blob/master/mlir/include/mlir/Transforms/LoopUtils.h#L221

mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp
74

To tie the previous suggestions together:

  1. use EDSC MemRefView (feel free to rename), to capture all lbs/ubs/steps
  2. add your dummy dims
  3. emit loops, they are captured automatically
  4. apply mapToProcessorIds on the k-innermost loops
herhut added a subscriber: herhut.Jan 7 2020, 9:03 AM
ftynse updated this revision to Diff 236632.Jan 7 2020, 11:05 AM
ftynse marked an inline comment as done.
ftynse added a reviewer: herhut.
ftynse marked 7 inline comments as done.
ftynse added inline comments.
mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp
74

Done.
The generated code is much uglier, but let's see if it triggers any issues wrt canonicalization between every step.

rriddle added inline comments.Jan 7 2020, 11:18 AM
mlir/include/mlir/Dialect/GPU/GPUOps.td
120

What is "op" here?

mlir/include/mlir/Dialect/GPU/MemoryPromotion.h
4

This should be "MLIR Project"

mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp
144

has_single_element

mlir/lib/IR/Block.cpp
192

Why can you not use it directly?

mlir/test/lib/Transforms/TestGpuMemoryPromotion.cpp
4

Same here.

Unit tests: pass. 61301 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

ftynse marked 2 inline comments as done.Jan 7 2020, 11:26 AM
ftynse added inline comments.
mlir/lib/IR/Block.cpp
192

Use what? args_iterator and arguments.begin() iterate through different things.

rriddle marked an inline comment as done.Jan 7 2020, 2:17 PM
rriddle added inline comments.
mlir/lib/IR/Block.cpp
192

(Sorry, I meant 'it') Ahh yeah, I forgot about that.

ftynse updated this revision to Diff 236782.Jan 8 2020, 2:05 AM
ftynse marked 3 inline comments as done.
ftynse edited the summary of this revision. (Show Details)

Addressed rriddle's comments.

Unit tests: pass. 61306 tests passed, 0 failed and 736 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

rriddle accepted this revision.Jan 8 2020, 10:34 AM
rriddle added inline comments.
mlir/lib/Dialect/GPU/Transforms/MemoryPromotion.cpp
143

Can you add a message to this assert?

This revision is now accepted and ready to land.Jan 8 2020, 10:34 AM
nicolasvasilache accepted this revision.Jan 8 2020, 11:20 PM
ftynse marked an inline comment as done.Jan 9 2020, 1:01 AM
This revision was automatically updated to reflect the committed changes.