This is an archive of the discontinued LLVM Phabricator instance.

[mlir][MemRef] Canonicalize extract_strided_metadata(subview)
ClosedPublic

Authored by qcolombet on Sep 1 2022, 4:07 PM.

Details

Summary

Add a canonicalization step for extract_strided_metadata(subview).
The goal is to get rid of the subview while expressing its effects directly on the offset and strides of the base object.

In other words, this canonicalization replaces:

baseBuffer, offset, sizes, strides =
    extract_strided_metadata(
        subview(memref, subOffset, subSizes, subStrides))

With

baseBuffer, baseOffset, baseSizes, baseStrides =
    extract_strided_metadata(memref)
strides#i = baseStrides#i * subStrides#i
offset = baseOffset + sum(subOffset#i * baseStrides#i)
sizes = subSizes
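
For illustration (numbers not taken from the patch), consider a 2-D base memref with baseOffset = 0 and baseStrides = [16, 1], and a subview with subOffset = [2, 4], subSizes = [2, 2], and subStrides = [3, 2]. The rewrite then yields:

strides = [16 * 3, 1 * 2] = [48, 2]
offset  = 0 + 2 * 16 + 4 * 1 = 36
sizes   = [2, 2]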

Diff Detail

Event Timeline

qcolombet created this revision. Sep 1 2022, 4:07 PM
Herald added a project: Restricted Project. Sep 1 2022, 4:07 PM
qcolombet requested review of this revision. Sep 1 2022, 4:07 PM

Thanks for making progress on this @qcolombet !

Here is a first round of comments to reduce the impl size and make it more idiomatic.

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1222 ↗(On Diff #457430)

We should have something in StaticValueUtils.h|cpp to hide away the if/else complexity and return an OpFoldResult.
Could you reuse or extend it?

1228 ↗(On Diff #457430)

Can we use a single AffineApplyOp here and below?
At the higher levels of abstraction this is a better way of representing such indexing computations (i.e. 1 op instead of N ops).

See e.g. https://sourcegraph.com/github.com/llvm/llvm-project/-/blob/mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp?L154
You can also improve the variadic process of defining symbols bound to a position by using something like: https://sourcegraph.com/github.com/llvm/llvm-project/-/blob/mlir/lib/Dialect/Linalg/Transforms/Tiling.cpp?L281

1255 ↗(On Diff #457430)

Same as above; a lot of this could be compressed down to almost a one-liner.

1271 ↗(On Diff #457430)

Can this be factored out into a helper function that would live next to e.g. SubViewOp::inferResultType ?
It could return a vector of OpFoldResult to avoid passing a builder and then StaticValueUtils can be used or extended to materialize the constants here.

All this should then become much more idiomatic (and shorter).

mlir/test/Dialect/MemRef/canonicalize.mlir
868 ↗(On Diff #457430)

With the suggestion above, you'd only need to check a single affine.apply op here, with the properly captured AffineMap.

qcolombet added inline comments. Sep 2 2022, 12:08 PM
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1222 ↗(On Diff #457430)

Thanks for the pointer!
I dug a little bit more, and I was struggling to find a generic way to describe the ValueRange we want to iterate on.

The high-level interface is OffsetSizeAndStrideOpInterface. I thought of using that, and then I stumbled on OffsetSizeAndStrideOpInterface::getMixedStrides(), which covers this and directly returns OpFoldResults.

So it looks like I may be able to avoid StaticValueUtils altogether.
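
For reference, a minimal sketch of what that looks like, assuming subview is the memref::SubViewOp feeding the extract_strided_metadata:

// The interface mixes the static attributes and the dynamic SSA operands of
// the subview into OpFoldResults, so no manual static/dynamic if/else is needed.
auto subOffsets = subview.getMixedOffsets();
auto subSizes = subview.getMixedSizes();
auto subStrides = subview.getMixedStrides();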

1228 ↗(On Diff #457430)

Thanks for that pointer as well.
Makes a lot of sense.

However, regarding 1 op vs. N ops: that's going to be true for the offset, but not for the strides.
Indeed, we still need to produce one value per stride, since these appear in the final result.

I need to take a closer look at how AffineApplyOps are built to avoid creating useless ones (when one of the strides is 1 or both strides are constant).

1271 ↗(On Diff #457430)

Thanks for the suggestion, looking!

qcolombet added inline comments. Sep 2 2022, 4:39 PM
mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1228 ↗(On Diff #457430)

Alright, reporting back on this.

If we do this, the MemRef dialect will depend on the Affine dialect, which I am not sure is what the community wants in general.

On top of that, the Affine dialect already depends on the MemRef dialect, so we would create a circular dependency. That may not be a deal breaker, but I don't have enough MLIR experience at this point to know how it could be broken.

qcolombet updated this revision to Diff 457748. Sep 2 2022, 6:56 PM
  • Use getMixedXXX and getValueOrCreateConstant instead of manually handling the dynamic/static strides and sizes
  • Make the code more compact:
    • Compute the offset and strides in the same loop
    • Filter the sizes and strides dimensions in the same loop

On the "make the code more compact", we could go even further and populate the resulting strides and sizes directly at the same time that we compute the strides and offset.

I decided against it because I like the separation of concerns here, but I could be convinced otherwise. Similarly, I merged the offset and strides computations because it seems natural (and saved one loop), but if people prefer them separated, like in the original patch, that works for me too.

Hi @nicolasvasilache ,

Let me know what you want to do with respect to the Affine dialect. (Whether we should use it or not here.)

I feel that if we want to go in that direction, we would need to do this canonicalization somewhere else (i.e. out of the core MemRef dialect).
I am happy to go down that road if you feel that's the way to go, just tell me where this would fit.

For now, I've stuck with the arith expansions, with all the problems that come with them. (See the inline comments.)

Cheers,
-Quentin

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1222 ↗(On Diff #457430)

I ended up giving up on OpFoldResult here because, unlike the AffineOps, the ArithmeticOps don't play nicely with that type.

Well, more precisely, you need to convert them to actual Values to be able to construct the arith operations, unless I missed something.
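
For the record, the conversion in question looks roughly like this (a sketch, assuming subStrides/origStrides hold OpFoldResults and a rewriter/loc are in scope); the helper is what ends up materializing the arith.constant ops:

// Each OpFoldResult has to become a Value before arith ops can consume it,
// which is where the extra constants come from.
Value subStrideVal =
    getValueOrCreateConstantIndexOp(rewriter, loc, subStrides[i]);
Value origStrideVal =
    getValueOrCreateConstantIndexOp(rewriter, loc, origStrides[i]);
Value newStride =
    rewriter.create<arith::MulIOp>(loc, subStrideVal, origStrideVal);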

1271 ↗(On Diff #457430)

I ended up populating the resulting strides and sizes in the same loop and I didn't see a good refactoring at this point.

I decided to go this way because we already have to loop through the dimensions to populate the strides.

mlir/lib/Dialect/MemRef/IR/MemRefOps.cpp
1222 ↗(On Diff #457430)

Makes sense, thanks for trying!

Yes, the difference here is that AffineApply + AffineMap can represent constants as an attribute / part of the attribute, but this is not the case for individual arith ops.
Whatever you do, you have to materialize the constant, even when it folds perfectly.
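
A small contrast of the two, as a sketch (s0 is assumed bound via bindSymbols, subStride is assumed to be an index Value, and the constant 4 is arbitrary):

// Affine: the constant lives inside the expression, and the whole apply may
// even fold away to an attribute; nothing is materialized.
OpFoldResult folded =
    makeComposedFoldedAffineApply(rewriter, loc, s0 * 4, {subStride});

// Arith: the constant must exist as an SSA value before the multiply.
Value c4 = rewriter.create<arith::ConstantIndexOp>(loc, 4);
Value prod = rewriter.create<arith::MulIOp>(loc, subStride, c4);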

Side note: a lower-level indexing dialect with an affine_apply op in it would allow breaking unnecessary dependences, but the discussion churn trying to make progress on that has been too high; this is as good as we can do right now.

1228 ↗(On Diff #457430)

Fair enough.
Given that strides indeed require multiple ops anyway, this is fine.

For reference, one way to break cycles (but not applicable here) is to realize that canonicalization patterns don't necessarily need to be rooted on the consumer operation.
Here is one example: https://discourse.llvm.org/t/cross-dialect-folding-and-canonicalization/2740

The key insight (that does not apply here) is: "In this case, no dialect dependency (in the mlir registration/loading sense) is needed because your pattern does not produce ops from a different dialect."
Here we would produce an affine dialect op.

1271 ↗(On Diff #457430)

SG!

Not using AffineApplyOp triggers a different set of tradeoffs and this looks fine to me.

mlir/test/Dialect/MemRef/canonicalize.mlir
910 ↗(On Diff #457748)

In this example and below I believe some foldings and tests are missing.
The strides of %ARG are known, they are [64, 4, 1], so the return should be %[[C4]], %[[C1]].

It would be nice to also have some mixed tests where the function argument type has e.g. strides<?x2> etc.

You could have one fully dynamic test that spells out all the IR (adds, muls, etc.) and a few other tests that are just one-line checks verifying that the returns have the right constant in the proper place.

Actually, I forgot about some of the context in my last review, I am sorry... the patterns we are talking about here should probably not be canonicalization patterns, but rather patterns that we can apply with a specific new pass to fold away the subview ops in the presence of memref.extract_strided_metadata.
See https://reviews.llvm.org/D128986 for a PR that does these foldings with load/store operations.

I am unclear yet whether we want to add those patterns to the same pass or create a new pass but it seems possible to reuse and refactor pieces.
Helpers like getLinearAffineExpr can likely be reused and refactored to make more idiomatic use of OpFoldResult; emitting AffineApplyOp will be easier and not subject to cyclic dependences.

the patterns we are talking about here should probably not be canonicalization patterns but rather patterns that we can apply with a specific new pass to fold away the subview ops in the presence of memref.extract_strided_metadata.
See https://reviews.llvm.org/D128986 for a PR that does these foldings with load/store operations.

Thanks for the pointer!

I'll move the logic into that pass while using the affine ops, and then we can decide whether we want a new pass or not.

mlir/test/Dialect/MemRef/canonicalize.mlir
910 ↗(On Diff #457748)

Good point!
I was hoping the getStrides would do the right thing, but it doesn't.
Looks like we need to introduce another getter to get the attribute of the input memref, and not the produced values, in that case. (More precisely, I think we'll want a getter that returns an OpFoldResult.)

Looking.
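
The type-level query already exists; a minimal sketch of how it could be used here (subview being the producing memref::SubViewOp):

// Static strides/offset straight from the source type; dynamic entries come
// back as the ShapedType "dynamic" sentinel, and only those need to fall back
// to the values produced by extract_strided_metadata.
SmallVector<int64_t> sourceStrides;
int64_t sourceOffset;
if (failed(getStridesAndOffset(subview.getSourceType(), sourceStrides,
                               sourceOffset)))
  return failure();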

qcolombet updated this revision to Diff 458342. Sep 6 2022, 6:43 PM
  • Move the canonicalization into its own pass (and call it simplification!)
  • Use affine apply to expand the offset and strides computations
  • Add a test with everything being dynamic to demonstrate all the computations
qcolombet added inline comments. Sep 6 2022, 7:03 PM
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
89

Is it possible for the getStridesAndOffset method to fail on the source of a subview?

Put differently, how can I write a test for that part?

108

FWIW, I'm not super happy with this helper function, because we end up materializing (and then deleting later) constants.

I've tried to use the getStridesAndOffset function directly with the AffineExpr constructs, but I was not satisfied with the code.

Essentially, I would end up in the main loop below with something like:

AffineConstantExpr cst;

if (known && (cst = strideExpr.dyn_cast<AffineConstantExpr>())) {
  makeComposedFolded(s0 * cst, {subStride});
} else {
  makeComposedFolded(s0 * s1, {subStride, origStride});
}

The main problem here is that the number of operands is different (subStride vs. subStride, origStride), so getting rid of the if is not easy (at least I didn't see a nice way to do it.)

What I tried for that was something like:

expr = s0
operands = {subStride}
if (known && cst)
  expr = expr * cst
else {
  expr = expr * s1
  operands.push_back(origStride)
}
makeComposedFolded(expr, operands)

And I didn't find it particularly readable.

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
72

@nicolasvasilache I've kept some of the mixed dynamic/static tests.
Compared to the full dynamic test (extract_strided_metadata_of_subview_all_dynamic at the end of this file), these tests don't add much, but I feel they are more approachable than dissecting the huge affine map of the last test.

Let me know if you want to remove them nonetheless.

qcolombet added inline comments. Sep 6 2022, 7:06 PM
mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
220

Nit question: Is there a way to produce an affine map with a nice ordering of the arguments?
E.g., origOffset, stride0, subStride0, subOffset0, ...

Although the current map is correct, the arguments are all over the place :).

mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
70
SmallVector<Type> sizeStrideTypes(sourceRank, indexType);
89

If sourceType comes from a valid subview op, then you shouldn't be able to fail here.
You may want to assert success instead and simplify the code below.

108

I'm having trouble seeing it in this form, but there must be significantly simpler ways of writing this.
Let's apply the first round of cleanups and revisit?

121

Iteratively composing makes it hard to control your operands and indices here.

You could do something like:

SmallVector<OpFoldResult> newStrides;
for (unsigned i = 0; i < sourceRank; ++i) {
  newStrides.push_back(makeComposedFoldedAffineApply(
      rewriter, origLoc, s0 * s1, {getOrigStrideAtIdx(i), subStrides[i]}));
}

// One symbol for the original offset, plus one (origStride, subOffset) pair
// per dimension.
SmallVector<OpFoldResult> values;
values.reserve(2 * sourceRank + 1);
values.push_back(origOffset);
SmallVector<AffineExpr> symbols(2 * sourceRank + 1);

// Note: you may need to impl. bindSymbols yourself as the existing version
// works for variadic templates only.
bindSymbols(symbols, ctx);
AffineExpr expr = symbols.front();
for (unsigned i = 0; i < sourceRank; ++i) {
  expr = expr + symbols[1 + 2 * i] * symbols[1 + 2 * i + 1];
  values.push_back(getOrigStrideAtIdx(i));
  values.push_back(subOffsets[i]);
}

OpFoldResult finalOffset =
    makeComposedFoldedAffineApply(rewriter, origLoc, expr, values);
mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
223

Can you please manually reflow and align?

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
220

See my comment above; I believe it's because you compose AffineMaps iteratively.
You could create a flat list of AffineExpr with the proper expressions, match it to the proper values, and thus better control the alignment of values.
This would also resist canonicalizations (e.g., if two values are the same, the corresponding symbol gets dropped and the other ones are shifted, preserving the original alignment of the other values).

qcolombet updated this revision to Diff 458524. Sep 7 2022, 12:03 PM
  • Introduce a bindSymbolsList function (it had to get a different name because the compiler would otherwise be confused about which overload to pick); a possible shape is sketched after this list
  • Build the affine expr upfront and call makeComposedFoldedAffineApply just once
  • Reformat the tests manually to help readability
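
A possible shape for that list-based helper (sketch only; the exact signature in the patch may differ, since the in-tree variadic bindSymbols only works with a compile-time number of symbols):

// Bind symbol #i to position i for a dynamically sized list of AffineExprs.
static void bindSymbolsList(MLIRContext *ctx,
                            SmallVectorImpl<AffineExpr> &exprs) {
  for (unsigned i = 0, e = exprs.size(); i < e; ++i)
    exprs[i] = getAffineSymbolExpr(i, ctx);
}
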
qcolombet marked 2 inline comments as done. Sep 7 2022, 12:12 PM
qcolombet added inline comments.
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
89

I was suspecting something like that.

Thanks for the confirmation.

108

With the suggested cleanups it doesn't look bad.
We still create the constants and delete them on the fly though.
If you have ideas on how to fix that, that would be great!

121

Great!
The order makes much more sense now and, more importantly, is predictable.

Thanks!

BTW, I have one loop for the new strides and one loop for the offset. Let me know if you prefer that we merge both loops.

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
220

Works great!

qcolombet updated this revision to Diff 458528. Sep 7 2022, 12:13 PM
qcolombet marked an inline comment as done.
  • Nit: SmallVector initialize for indexType
qcolombet marked an inline comment as done. Sep 7 2022, 12:14 PM
nicolasvasilache accepted this revision. Sep 7 2022, 3:43 PM

Looks good, thanks!

mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
93

You can just use IntegerAttr Builder::getIndexAttr(int64_t value) to avoid materializing the constant.
This should work out of the box with OpFoldResult.

113

This feels like too many levels of indirection for no good reason.
How about something resembling:

for (unsigned i = 0; i < sourceRank; ++i) {
  OpFoldResult origStride =
      ShapedType::isDynamicStrideOrOffset(sourceStrides[i])
          ? OpFoldResult(origStrides[i])
          : OpFoldResult(rewriter.getIndexAttr(sourceStrides[i]));
  strides.push_back(makeComposedFoldedAffineApply(
      rewriter, origLoc, s0 * s1, {subStrides[i], origStride}));
}
121

Same here, just spell it out with a ternary cond; the level of indirection does not pay for itself.

145

nit: arrray

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
104

typo

152

typo

This revision is now accepted and ready to land. Sep 7 2022, 3:43 PM
qcolombet added inline comments. Sep 7 2022, 4:15 PM
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
93

Ah great!
Looking.

113

Sounds like I should merge the loop for the strides and the loop for the offset then.
Otherwise I will repeat this in both loops:

OpFoldResult origStride = ShapedType::isDynamicStrideOrOffset(sourceStrides[i])
    ? OpFoldResult(origStrides[i])
    : OpFoldResult(rewriter.getIndexAttr(sourceStrides[i]));
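
For reference, a minimal sketch of the merged loop under discussion (names as in the earlier snippets; origOffset is the offset produced on the source, and offsetExpr is the single affine expression built beforehand as symbols.front() plus the sum of per-dimension products):

SmallVector<OpFoldResult> strides;
SmallVector<OpFoldResult> values = {origOffset};
for (unsigned i = 0; i < sourceRank; ++i) {
  // Resolve the original stride once per dimension...
  OpFoldResult origStride =
      ShapedType::isDynamicStrideOrOffset(sourceStrides[i])
          ? OpFoldResult(origStrides[i])
          : OpFoldResult(rewriter.getIndexAttr(sourceStrides[i]));
  // ...use it for the new stride (subStride * origStride)...
  strides.push_back(makeComposedFoldedAffineApply(
      rewriter, origLoc, s0 * s1, {subStrides[i], origStride}));
  // ...and as an operand of the single offset expression.
  values.push_back(subOffsets[i]);
  values.push_back(origStride);
}
OpFoldResult finalOffset =
    makeComposedFoldedAffineApply(rewriter, origLoc, offsetExpr, values);
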
qcolombet updated this revision to Diff 458600. Sep 7 2022, 4:33 PM
  • Remove lambdas and use ternary op instead
  • Use getIndexAttr instead of getValueOrCreateConstantIndexOp
  • Merge the offset computation loop and the strides computation loop
  • Fix typos
qcolombet marked 8 inline comments as done. Sep 7 2022, 4:38 PM
qcolombet added inline comments.
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
113

@nicolasvasilache I'm going to leave the PR open until tomorrow to give you a chance to see the merged loops.
(In other words, what I did to avoid the duplication of the ternary op for the original stride after getting rid of the lambda.)

mlir/test/Dialect/MemRef/simplify-extract-strided-metadata.mlir
104

Nice catch!

qcolombet updated this revision to Diff 458604. Sep 7 2022, 4:44 PM
qcolombet marked an inline comment as done.
  • Cleanup unused includes
nicolasvasilache accepted this revision. Sep 8 2022, 9:56 AM
nicolasvasilache added inline comments.
mlir/lib/Dialect/MemRef/Transforms/SimplifyExtractStridedMetadata.cpp
113

WFM, thanks!