This is an archive of the discontinued LLVM Phabricator instance.

[mlir][gpu] Add DecomposeMemrefsPass
ClosedPublic

Authored by Hardcode84 on Jul 13 2023, 4:10 PM.

Details

Summary

Some GPU backends (SPIR-V) lower memrefs to bare pointers, so lowering dynamically sized/strided memrefs fails.
This pass extracts sizes and strides via memref.extract_strided_metadata outside the gpu.launch body, performs the index/offset calculations explicitly, and then reconstructs the memrefs via memref.reinterpret_cast.

memref.reinterpret_cast is then lowered via https://reviews.llvm.org/D155011.
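Roughly, the rewrite works as sketched below (a hand-written illustration with made-up value names and shapes, not IR from the patch; the offset arithmetic is written with arith ops here for readability, while the pass builds it via affine machinery as discussed in the comments): the strided metadata is extracted before the launch, the linear offset is computed explicitly from the thread index, and a 0-d memref is rebuilt inside the body so only the bare base buffer crosses the launch boundary.

```mlir
// Before (sketch): a dynamically strided memref is captured by the launch body.
gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
           threads(%tx, %ty, %tz) in (%sx = %c32, %sy = %c1, %sz = %c1) {
  %v = memref.load %mem[%tx] : memref<?xf32, strided<[?], offset: ?>>
  gpu.terminator
}

// After (sketch): metadata is extracted outside the body, the offset is
// computed explicitly, and the memref is reconstructed inside the body.
%base, %offset, %size, %stride = memref.extract_strided_metadata %mem
    : memref<?xf32, strided<[?], offset: ?>> -> memref<f32>, index, index, index
gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
           threads(%tx, %ty, %tz) in (%sx = %c32, %sy = %c1, %sz = %c1) {
  %scaled = arith.muli %tx, %stride : index
  %off = arith.addi %offset, %scaled : index
  %view = memref.reinterpret_cast %base to offset: [%off], sizes: [], strides: []
      : memref<f32> to memref<f32, strided<[], offset: ?>>
  %v = memref.load %view[] : memref<f32, strided<[], offset: ?>>
  gpu.terminator
}
```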

Diff Detail

Event Timeline

Hardcode84 created this revision.Jul 13 2023, 4:10 PM
Hardcode84 requested review of this revision.Jul 13 2023, 4:10 PM

Nice!

Can we refactor the implementations and put them in MemRefUtils, to separate out the orthogonal GPU concerns here?
Then we can all reuse these nice features that people have been rewriting over and over.

mlir/lib/Dialect/GPU/Transforms/DecomposeMemrefs.cpp
32

An anonymous namespace plus static is redundant.
The preferred mechanism in LLVM is static; could you drop the namespace?

46

Offhand, this looks like an n-th implementation/extension of features available in IndexingUtils and StaticValueUtils.

I made a similar comment in https://reviews.llvm.org/D155017.

I made some refactorings in the past in MemRef/Transforms/FoldMemRefAliasOps.cpp, but more can probably be done.

Can we try to all converge on one thing and compound interest?

move some code into IndexingUtils

tschuett added inline comments.
mlir/lib/Dialect/GPU/Transforms/DecomposeMemrefs.cpp
165
nicolasvasilache accepted this revision.Aug 9 2023, 4:10 PM

sorry for the delay, thanks!

mlir/lib/Dialect/GPU/Transforms/DecomposeMemrefs.cpp
52

Hmm, I find this somewhat hard to parse: this is not a lambda but the result of applying a lambda.

How about:

memref::ExtractStridedMetadataOp newExtractStridedMetadata;
{
  OpBuilder::InsertionGuard g(rewriter);
  setInsertionPointToStart(rewriter, source);
  newExtractStridedMetadata =
      rewriter.create<memref::ExtractStridedMetadataOp>(loc, source);
}
This revision is now accepted and ready to land.Aug 9 2023, 4:10 PM
Hardcode84 updated this revision to Diff 548822.Aug 9 2023, 4:58 PM

rebase, review comments

Hardcode84 marked an inline comment as done.Aug 9 2023, 4:58 PM
This revision was landed with ongoing or failed builds.Aug 9 2023, 5:28 PM
This revision was automatically updated to reflect the committed changes.

Reverted as it broke some bots.

So, there is a problem: computeLinearIndex introduces a dependency on the affine dialect, but I cannot just add MLIRAffineDialect as a dependency to IndexingUtils, as that creates circular dependencies.

Any ideas on how to handle this?

Hardcode84 reopened this revision.Aug 9 2023, 7:32 PM
This revision is now accepted and ready to land.Aug 9 2023, 7:32 PM
Hardcode84 updated this revision to Diff 548851.Aug 9 2023, 7:34 PM

moved to MemRefUtils

Hardcode84 added a comment.EditedAug 9 2023, 7:37 PM

Moved the code to MemRefUtils as it already has an affine dependency. The computeLinearIndex and getLinearizeMemRefAndOffset functions should be unified, but getLinearizeMemRefAndOffset does too many things at once, and that refactoring would be too big for an unrelated patch.

nicolasvasilache added a comment.EditedAug 10 2023, 3:18 AM

Moved the code to MemRefUtils as it already has an affine dependency. The computeLinearIndex and getLinearizeMemRefAndOffset functions should be unified, but getLinearizeMemRefAndOffset does too many things at once, and that refactoring would be too big for an unrelated patch.

Hmm, not a fan of the landing place... I suggest revisiting how you construct your AffineExpr using/extending IndexingUtils.
For instance, I am unclear why you need to interleave symbols/values for strides and indices.
Seems pretty easy to just use the following in IndexingUtils:

SmallVector<AffineExpr> computeElementwiseMul(ArrayRef<AffineExpr> v1, ArrayRef<AffineExpr> v2);
AffineExpr computeSum(MLIRContext *ctx, ArrayRef<AffineExpr> basis);

And build something resembling

OpFoldResult mlir::computeLinearIndex(OpBuilder &builder, Location loc,
                                      OpFoldResult sourceOffset,
                                      ArrayRef<OpFoldResult> strides,
                                      ArrayRef<OpFoldResult> indices) {
  assert(strides.size() == indices.size());
  auto sourceRank = static_cast<unsigned>(strides.size());

  // Hold the affine symbols and values for the computation of the offset.
  SmallVector<OpFoldResult> values{indices.begin(), indices.end()};
  llvm::append_range(values, strides);

  AffineExpr expr = computeTheAffineExprYouWantWithIndexingUtilsAndExtendThemAsNecessary();

  return affine::makeComposedFoldedAffineApply(builder, loc, expr, values);
}
nicolasvasilache added a comment.EditedAug 10 2023, 3:27 AM
OpFoldResult mlir::computeLinearIndex(OpBuilder &builder, Location loc,
                                      OpFoldResult sourceOffset,
                                      ArrayRef<OpFoldResult> strides,
                                      ArrayRef<OpFoldResult> indices) {

For the specific landing place of this function, how about Dialect/Affine/ViewLikeInterfaceUtils.h which contains related functions?

I wouldn't be opposed to moving this file to a better place as long as we keep all related functionality together as much as possible.
Maybe there is a need for AffineIndexingUtils that would just make the IndexingUtils APIs available with OpFoldResult + AffineApply.

Moved the code to MemRefUtils as it already has an affine dependency. The computeLinearIndex and getLinearizeMemRefAndOffset functions should be unified, but getLinearizeMemRefAndOffset does too many things at once, and that refactoring would be too big for an unrelated patch.

Hmm, not a fan of the landing place... I suggest revisiting how you construct your AffineExpr using/extending IndexingUtils.
For instance, I am unclear why you need to interleave symbols/values for strides and indices.
Seems pretty easy to just use the following in IndexingUtils:

SmallVector<AffineExpr> computeElementwiseMul(ArrayRef<AffineExpr> v1, ArrayRef<AffineExpr> v2);
AffineExpr computeSum(MLIRContext *ctx, ArrayRef<AffineExpr> basis);

And build something resembling

OpFoldResult mlir::computeLinearIndex(OpBuilder &builder, Location loc,
                                      OpFoldResult sourceOffset,
                                      ArrayRef<OpFoldResult> strides,
                                      ArrayRef<OpFoldResult> indices) {
  assert(strides.size() == indices.size());
  auto sourceRank = static_cast<unsigned>(strides.size());

  // Hold the affine symbols and values for the computation of the offset.
  SmallVector<OpFoldResult> values{indices.begin(), indices.end()};
  llvm::append_range(values, strides);

  AffineExpr expr = computeTheAffineExprYouWantWithIndexingUtilsAndExtendThemAsNecessary();

  return affine::makeComposedFoldedAffineApply(builder, loc, expr, values);
}

Not sure I got your comment; the main issue is affine::makeComposedFoldedAffineApply, which requires a dependency on the affine dialect, and adding an affine dialect dep to IndexingUtils causes circular target dependencies in CMake.

nicolasvasilache added a comment.EditedAug 10 2023, 5:21 AM

Not sure I got your comment; the main issue is affine::makeComposedFoldedAffineApply, which requires a dependency on the affine dialect, and adding an affine dialect dep to IndexingUtils causes circular target dependencies in CMake.

Yes, sorry, there are too many parts to my comment; let me untangle:

  1. You should be able to make the impl of computeLinearIndex use pure AffineExpr helpers, which can live in IndexingUtils (AffineExpr is a builtin type; it does not require a dependence on the affine dialect). I wrote one suggestion about using a pointwise mul and an add reduction on AffineExpr, if you can live without the interleaving of indices and strides. If you really want the interleaving, I'd suggest adding more AffineExpr-only helpers.
  2. Then your computeLinearIndex is ~3 lines and can live in Dialect/Affine/ViewLikeInterfaceUtils.h, which depends on the affine dialect and already contains helpers that are related in spirit.
  3. Bonus points for a better reorganization of Dialect/Affine/ViewLikeInterfaceUtils.h into smaller components, such as a Dialect/Affine/IndexingUtils.h that would roughly mirror the existing IndexingUtils.h API but additionally use affine_apply. This is not for this PR.

Does this make sense?

I can change computeLinearIndex to return an AffineExpr plus a list of values, so it will be the user's responsibility to pass them to makeComposedFoldedAffineApply. We won't need anything in Dialect/Affine/ViewLikeInterfaceUtils.h, and the interleaving and value order won't matter, as long as the user just forwards the returned expr and values to makeComposedFoldedAffineApply.

nicolasvasilache accepted this revision.Aug 10 2023, 5:40 AM

I can change computeLinearIndex to return an AffineExpr plus a list of values, so it will be the user's responsibility to pass them to makeComposedFoldedAffineApply. We won't need anything in Dialect/Affine/ViewLikeInterfaceUtils.h, and the interleaving and value order won't matter, as long as the user just forwards the returned expr and values to makeComposedFoldedAffineApply.

SGTM, this follows how we do things in FoldMemRefAliasOps.cpp and is probably the shortest path indeed.

move to IndexingUtils, do not use affine dialect

This revision was automatically updated to reflect the committed changes.