This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Vector] Initial masking support in Linalg vectorizer
ClosedPublic

Authored by dcaballe on Nov 8 2022, 11:09 PM.

Details

Summary

This patch introduces the initial bits to support vector masking
using the vector.mask operation. Vectorization changes should be
NFC for non-masked cases. We can't test masked cases directly until
we extend the Transform dialect to support masking.

Diff Detail

Event Timeline

dcaballe created this revision.Nov 8 2022, 11:09 PM
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
dcaballe requested review of this revision.Nov 8 2022, 11:09 PM
dcaballe added inline comments.Nov 8 2022, 11:12 PM
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1704

Hey @nicolasvasilache, do you think you could help extending the Transform dialect so that we can provide the vector sizes for masked dims?

dcaballe planned changes to this revision.Nov 8 2022, 11:26 PM
dcaballe updated this revision to Diff 474887.Nov 11 2022, 4:22 PM

Fixing a couple of issues when no vector sizes for masked dimensions are not provided

nicolasvasilache requested changes to this revision.Nov 13 2022, 6:39 PM
nicolasvasilache added inline comments.
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
1704

Happy to!

Could you temporarily hardcode some size in there and add some test IR so I can see what I should expect?

This will likely require a new transform op that is not a blanket "vectorize the world" so that we can pass the information you want at a finer granularity. This will likely need some iteration to get to a reasonably scalable usage.

Left some other review comments in the meantime.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
80

super-nit: can we make the line spacing uniform between methods(here) and members(below)?

115

plz avoid unsigned everywhere, we know by now this is not meant for expressing >=0 but should really be used for bit-twiddling or when we really really nerd the extra bit.

115

Should this be a (better named) helper on the LinalgOp interface?
This seems to reimplement common functionality (but maybe not exactly) and rely on deep internal Linalg assumptions (e.g. all possible ways of defining the extent of an iterator have to match and you can therefore take the first one).

The name makes it hard to understand what it does and we should be doing any such manipulation in a very localized place in LinalgOp.

143

unsigned purge here and everywhere plz

160

Can we call these staticUpperBounds everywhere?
And the other ones dynamicUpperBounds ?

This seems easier to me to relate to what we're looking to do instead of vecSizesForMaskedDims and extractDynamicVectorDimValues.

164

How about early exit here ?

if (!linalgOp.hasDynamicShape()) {
  canonicalVecShape = linalgOp.getStaticLoopRanges();
  return success();
}

I don't think you need the checks and debugs after that in the static case?

172

Seems fishy, what happens in this case ?

I'd expect this to be not fail gracefully ..
Make it an assert and lift logic to the precondition to avoid this?

Edit: ah scratch that, I see that this is just after the precondition, can we make it part of the precondition?

177

Can we sprinkle a few precompute prefixes in some of these APIs to make it clear what happens at init time?

184
LLVM_DEBUG(llvm::interleaveComma(canonicalVecShape, llvm::dbgs() << ...));
254

pass RewriterBase here and everywhere possible post https://reviews.llvm.org/D137922 plz

279

Plz use updateRootInPlace once RewriterBase is piped through.

744

nit: can we spell this as:

// 3.a. Convert the indexing map for this input/output to a transfer read...
...
/// 3.a.i  For input reads we use the canonical vector shape.
if (linalgOp.isDpsInput(opOperand))
  ...
} else {
  /// 3.a.ii For output reads (iteration-carried dependence, e.g., reductions)
  ...

// 3.b. If masked, set in-bounds to true.
...
// 3.c. Not all ops support 0-d vectors,
929

Can we make this init the state as part of the precondition?

mlir/lib/IR/AffineMap.cpp
340

Fails when called on a non-projected-permutation.

is misleading here.

It expects a projected permutation otherwise it crashes.
Failing would have returned llvm::None without crashing unless I am missing something?

343

Better name and doc please, this is much too confusing.

In fact I think you can just do something like
llvm::find(map.getResults(), AffineDimExpr::get(input)) at the client and avoid adding more APIs to AffineMap

This revision now requires changes to proceed.Nov 13 2022, 6:39 PM
dcaballe updated this revision to Diff 477667.Nov 23 2022, 6:21 PM
dcaballe marked 6 inline comments as done.
  • Addressed feedback
  • New changes around vectorization initState and canonical vector shape computation
dcaballe added inline comments.Nov 23 2022, 6:21 PM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
115

plz avoid unsigned everywhere, we know by now this is not meant for expressing >=0 but should really be used for bit-twiddling or when we really really nerd the extra bit.

Sorry,dim is a misnomer as it refers to the dimension position, not the size. It has to be unsigned as that's what it's expected by the AffineMap::getXXXYYYPosition().
Added to Pos to a few names. Hopefully that's better.

Moved it to LinalgOp interface. I can't think of a better name for the utility... It's just mapping an iteration space dimension to a dimension of an operand... Any other suggestion? I'm happy to replace with with any other existing utility but I couldn't find any.

143

As explained before, renamed to indicate that it's the position, not the size.

160

I had already renamed this locally to inputVectorSizes and change a bit the meaning. The input sizes are now taken into account to compute the canonical vector shape and if they are also provided for static shapes they should match the size of the static shapes. We are passing them all now to simplify the client API, including the transform dialect, as it's easier to provide all the vector sizes than having to filter out the static ones. Let me know if that works.

164

This is gone now. This code has changed a bit in the last version.

172

Let me know if it makes more sense in the new version, where the inputVectorSizes, if provided, should match the linalgop.getNumLoops(). Otherwise, this would be a bug.

177

Much better!

184

Ah! I didn't know this utility! It's been such a pain to always print SmallVector's...
Thanks!

254

I think we are going in the opposite direction based on the review comments?

929

Probably better to separate the concerns. I think even part of the precondition checks are reused outside of the vectorizer. A public interface was introduced recently.

mlir/lib/IR/AffineMap.cpp
343

I had renamed this like 10 times. It's a difficult name. Hopefully it's better now :).

dcaballe planned changes to this revision.Nov 24 2022, 12:12 AM
mlir/lib/IR/AffineMap.cpp
343

Wait .. what do I see just above .. literally the same functionality modulo an assert ..

Can we just have a single

Optional<int64_t> AffineMap::getResultPosition(AffineExpr e) const {
  for (int64_t i = 0, numResults = getNumResults(); i < numResults; i++)
    if (getResult(i) == e)
      return i;
  return llvm::None;
}

and let clients do the assertions they want ?

It seems very counterproductive to have all these special case functions with slightly varying assertions and hard to grok names ..

dcaballe updated this revision to Diff 479129.Nov 30 2022, 5:50 PM
dcaballe marked an inline comment as done.

Addressed feedback + minor fixes.
Please ignore the AffineMap utility. It will be removed after rebasing on top of D138946.

nicolasvasilache requested changes to this revision.Dec 1 2022, 7:00 AM

I do not understand the implication of

// TODO: We mask the transfer.transfer_write here because this op is
// special-cased. A linalg.yield may produced multiple vector.transfer_write
// ops and can't be mapped using BlockAndValueMapping.
AffineMap opOperandMap = linalgOp.getMatchingIndexingMap(opOperand);
write = state.maskOperation(b, write, linalgOp, opOperandMap);

.. also I am not seeing any test changes, so it seems you are adding a lot of code that is not tested and not activated ?

mlir/include/mlir/IR/AffineMap.h
177

I think this goes away with the rebase, just flagging for removal so we don't forget.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
75

Why not make this a ctor?

254

We cannot have code like this:

Operation *maskOpTerminator = &maskOp.getMaskRegion().front().back();
for (auto &en : llvm::enumerate(opToMask->getResults()))
  en.value().replaceAllUsesExcept(maskOp.getResult(en.index()),
                                  maskOpTerminator);

it must use a RewriterBase with updateRootInPlace

290

Hmm .. what's the contract between this createRegionMask lambda and the insertion points during
builder.create<vector::MaskOp> ?

I've seen too much ugly stuff re. insertion points leaking across function call boundaries.

Let's add an OpBuilder::InsertionGuard at the top of this function.

306–307

this must use a RewriterBase with updateRootInPlace

472

nit: produce

This revision now requires changes to proceed.Dec 1 2022, 7:00 AM
dcaballe updated this revision to Diff 479525.Dec 1 2022, 10:39 PM
dcaballe marked 4 inline comments as done.

Addressed feedback.

.. also I am not seeing any test changes, so it seems you are adding a lot of code that is not tested and not activated ?

Changes in the overall vectorization algorithm are tested with existing vectorization tests. This patch is NFC for those. Masking is not enabled if inputVectorSizes are not provided. If they are provided, only elementwise ops without reductions and fully dynamic shapes are vectorized.
I can't add unit tests until the new operation for masked vectorization is added to the transform dialect, as we no longer have the vectorizer testing pass. However, this PR has been extensively tested in IREE, both with and without masking for even more cases than the currently supported right now.
Waiting on the new transform dialect op to land to add more tests.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
75

Because it can fail, at least for now. This will change once we support more cases with masking. Then, we could assert and turn it into a constructor.

254

I can use updateRootInPlace when we have a rewriter here but I can't do any replacement method because opToMask is moved inside the mask region, not replaced.

290

Added guard. This is a simple lambda to create the op region. We follow the same approach for scf.if and other region ops. The only contract is that the region needs to have a vector::YieldOp, which is described in the vector.mask doc.

306–307

Added TODO until we have a rewriter.

472

I do not understand the implication of

TODO: We mask the transfer.transfer_write here because this op is
special-cased. A linalg.yield may produced multiple vector.transfer_write
// ops and can't be mapped using BlockAndValueMapping.

Good point. This is a comment for an old problem. I removed it and moved this code to buildVectorWrite.

mlir/lib/IR/AffineMap.cpp
343
dcaballe updated this revision to Diff 480308.Dec 5 2022, 6:22 PM

Added testing support to Transform dialect + tests

dcaballe updated this revision to Diff 480353.Dec 6 2022, 12:10 AM

Rebase + remove dead code (wrong rebase)

Waiting on the new transform dialect op to land to add more tests.

Thanks for integrating it and adding tests, the testing part LGTM, till need to make another pass on the last version of the code.

nicolasvasilache accepted this revision.Dec 6 2022, 10:07 AM

Thanks @dcaballe !

mlir/include/mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.td
1120 ↗(On Diff #480353)

"definite failure"

1138 ↗(On Diff #480353)

can you add a TODO: applyToOne plz ?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
229

Use the find API and the iterator you get from it to avoid multi-lookups

270

nit: uppermension :) ?

436

can you add a TODO to tighten op semantics so that we don't mix inbounds and mask since this is well defined?

760

I'll need to revisit all this in light of the broadcast separation.
From a cursory glance this looks reasonable, let's land and iterate.

891

ok as a first appox.

This revision is now accepted and ready to land.Dec 6 2022, 10:07 AM
dcaballe marked 7 inline comments as done.Dec 12 2022, 5:31 PM

Thanks! I addressed comments. Landing now...

nicolasvasilache added a comment.EditedAug 10 2023, 5:15 AM

@dcaballe, I was looking at this PR as I was doing some spelunking and I realize we are not testing the case of the SSA value as well as the error case when the SSA value is not a constant.

Can we please add the missing tests in a a followup ?