This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg] Unify generic vectorization interface.
ClosedPublic

Authored by hanchung on May 12 2023, 4:12 PM.

Details

Summary

This revision breaks the logic of maskedVectorize (on tensor.pad ops) into
precondition checks and a vectorization implementation, and unifies the
interface.

The revision also renames vectorizeLinalgOpPrecondition to
vectorizeOpPrecondition because we can vectorize ops other
than LinalgOps.
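
A rough sketch of how a caller might exercise the unified interface after this change (the names follow the summary above, but the exact signatures are assumptions rather than copies from the diff):

// Hypothetical caller sketch, assuming a unified linalg::vectorize entry
// point that accepts any supported op (a LinalgOp or a tensor.pad) plus
// optional vector sizes for masked vectorization.
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

static LogicalResult tryVectorize(RewriterBase &rewriter, Operation *op,
                                  ArrayRef<int64_t> vectorSizes) {
  // The precondition check now covers ops other than LinalgOps as well
  // (assumed name per the rename described above).
  if (failed(linalg::vectorizeOpPrecondition(op)))
    return failure();
  return linalg::vectorize(rewriter, op, vectorSizes);
}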

Diff Detail

Event Timeline

hanchung created this revision. May 12 2023, 4:12 PM
Herald added a project: Restricted Project.
hanchung requested review of this revision. May 12 2023, 4:12 PM

Thank you so much for looking into this! I really appreciate it. LG, leaving some comments for now

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1286

I think we need the notifyMatchFailure for the transform dialect. We should be able to use FailureOr<vector::TransferWriteOp> and return a TransferWriteOp from the vectorize side, I think. (Assuming that was the reason to remove FailureOr<vector::TransferWriteOp>.)

1451

return failure()?

1503

Perhaps it's worth creating a vectorizeOpPreconditions utility or similar

hanchung added inline comments. May 12 2023, 4:46 PM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1286

I can add notifyMatchFailure back. RE TransferWriteOp, the transform dialect does not use the TransferWriteOp operation, so I removed it. @nicolasvasilache do we need to return the op for the transform dialect use case?

hanchung updated this revision to Diff 521843. May 12 2023, 5:05 PM

address comments -- creating a vectorizeOpPreconditions utility

hanchung edited the summary of this revision. May 12 2023, 5:06 PM

Thanks a ton for this clean-up! Just a few comments inline :)

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
579
mlir/lib/Dialect/Linalg/TransformOps/LinalgTransformOps.cpp
2932

Could this diagnostic be a bit more descriptive? For example, "Unsupported Op, cannot vectorize".

2938

[nit] I wouldn't list any specific Ops here. There's already linalg.generic, linalg.conv, tensor.pad, tensor.extract
[nit2] How about making this more descriptive: "Attempted to vectorize, but failed" (I think that it helps if the vectorizer communicates _roughly_ where it failed)

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1466

Would adding tensorExtractVectorizationPrecondition to this list make sense? This would allow us to get rid of customPreconditions, but then ExtractOp is never considered in isolation, is it?

1480
1524–1525

General question - would it make sense to completely separate convolutions from GenericOp? Basically, there seem to be 3 major cases in this file:

  • linalg.generic
  • linalg.conv
  • tensor.pad

It would be nice if the implementation reflected that - I know, easier said than done :) Also, perhaps my interpretation is wrong?

nicolasvasilache added a subscriber: springerm. Edited May 16 2023, 1:04 AM

I am looking at this PR and the following comment in https://reviews.llvm.org/D148261:

Most of the code in maskedVectorize is duplicated/ad-hoc. I think a first step would be to unify maskedVectorize and vectorize into a single public API that can handle an Operation *, even if this public interface just dispatches to the existing maskedVectorize and vectorize initially. Then we can think about how to move tensor.pad support into the main vectorization path that uses the vectorization state, etc.

and I am wondering what the intended end state is.

It seems there is desire to fuse the various implementations and "sink the switch" deeper into the "main vectorization path that uses the vectorization state, etc".

IMO such a deep switch into the implementation seems to go against the philosophy of external models that we have been using successfully e.g. in bufferization (@springerm in case he wants to chime in).

Instead, my hope was that we'd keep the switch at the top level and evolve towards more pluggable vectorization and external model + ops that define their own (masked) vectorization.

Could you comment on the approaches and tradeoffs @dcaballe ?

Hey, Nicolas! Thanks for the feedback!

One of the challenges we are facing right now with a split API is that we end up generating code like the following on the client side:

for (auto op : candidates) {
  SmallVector<int64_t> vectorSizes;
  if (auto linalgOp = dyn_cast<linalg::LinalgOp>(op)) {
    if (enableVectorMasking) {
      vectorSizes.append(getVectorSizes(linalgOp, canonicalVectorShape));
    }
    (void)linalg::vectorize(rewriter, linalgOp, vectorSizes,
                            vectorizeGatherAccesses);
  } else if (auto padOp = dyn_cast<tensor::PadOp>(op)) {
    if (!enableVectorMasking) continue;
    auto ty = padOp.getResultType();
    // TODO(hanchung): Infer the vector sizes for pad op after
    // maskedVectorize method allows dynamic result shapes.
    if (!ty.hasStaticShape()) continue;
    SmallVector<int64_t> vectorSizes(ty.getShape().begin(),
                                     ty.getShape().end());
    FailureOr<vector::TransferWriteOp> maybeWriteOp =
        linalg::maskedVectorize(rewriter, padOp, vectorSizes);
    if (failed(maybeWriteOp)) {
      continue;
    }
  }
};

This has several drawbacks: a) both APIs are essentially identical but we have to call them individually, b) the client needs to know about all the APIs and dispatch between them itself, and c) we have to add redundant support for flags, which can also lead to different "versions" of the APIs if they are not all updated at the same time.
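
With a unified entry point, the loop above could collapse to roughly the following (an illustration only; getVectorSizes, canonicalVectorShape, enableVectorMasking and vectorizeGatherAccesses are the client-side helpers and flags from the snippet above, and the vectorize signature is an assumption):

// Sketch: one call path for both LinalgOps and tensor.pad ops; the client no
// longer dispatches on the op type itself.
for (Operation *op : candidates) {
  SmallVector<int64_t> vectorSizes;
  if (enableVectorMasking)
    vectorSizes = getVectorSizes(op, canonicalVectorShape);
  (void)linalg::vectorize(rewriter, op, vectorSizes, vectorizeGatherAccesses);
}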

The second big problem is masking. It is complicated to do it right. It requires a vectorization state and logic to determine whether an operation actually needs masks or not, so having everything go through the main vectorization path should help with all of this without replicating that logic. To give you a more specific example, the masks introduced in vectorizeAsTensorPadOp are causing some "silent" performance issues because we are masking all the pad ops, including those that don't require masks. That leads to some canonicalization patterns not triggering.

I think having a unified API and vectorization path should help with all of this, as long as we allow the flexibility that we need. Regarding extensibility, I thought we already had that with the "hook" mechanism. Perhaps it's just a matter of extending it to support the external model approach? In that sense, bufferization seems like a much more general transformation that spans multiple dialects. For vectorization we have multiple vectorizers (e.g., Linalg and Affine), so within the Linalg vectorizer things would be kind of contained, modulo some experimental operations that we may have off-tree.

Another option would be to expose the vectorization state and all the masking logic in the public API but that doesn't sound good to me either. What are your thoughts about this?

I think having a unified API and vectorization path should help with all of this, as long as we allow the flexibility that we need. Regarding extensibility, I thought we already had that with the "hook" mechanism. Perhaps it's just a matter of extending it to support the external model approach?

Precisely, I am talking about how to evolve this towards a better unified path, hopefully avoiding the pitfalls I see us heading into while at the same time paying off some older technical debt.

Reading the quoted sentence above, I think I understand where the confusion comes from:

  1. the hook mechanism is meant for extending vectorization to support new ops that appear *within* a linalg.generic region. It is one possible extension mechanism; it may or may not need to move to an external model (mostly depending on how much extensibility we need in the future).
  2. the top-level switch is about supporting new ops that are unrelated to linalg.generic (e.g. tensor.pad, memref.copy). Similarly here, this may or may not need to move to an external model.

A unified API is good, but I think I am also seeing early signs of trying to shoehorn 2. into 1., and I would caution against that: not everything should be represented in terms of / thought of as a single linalg.generic.

You can already see that e.g. tensor.pad has a different tiling external model and a different bufferization external model than a linalg.generic.
Some of the difference is due to the fact that tensor.pad is not DPS and we should likely evolve it to become DPS, but even with DPS this is still far from a linalg.generic.
Side historic note: early modelings tried to represent tensor.pad and tensor.concat as linalg.generic, but that was a bridge too far: interfaces (e.g. for tiling, for bufferization, ...) were the way to go here.

At the top-level today, we have:

  1. linalg.generic ops that implement specific flavors of convolution we know how to handle
  2. linalg.generic with projected permutations for which we have a general algorithm
  3. memref.copy (which is unnecessary duplication at this level but exists for other reasons)
  4. tensor.pad
  5. other ops we'd want to support

Separately, we have the hook for the body of a linalg.generic with projected permutations that handles tensor.extract etc.

Strawman proposal

We have likely reached a point where we need to revisit some of this and attack the deeper, first-order design issue: why do we have 2 linalg.generic vectorizations?

My TL;DR is that complexity increased because of trying to do too many things at the same time that are not materialized in the IR.
We have reached the point where we could split this better between:

  1. pluggable top-level op vectorization (with and without masking); this would capture tensor.pad, any linalg.generic and other ops at that level.
  2. better separate masked transfers from masked/unmasked compute payload by introducing a vector.generic that we have been missing for a while: vector.generic is a programmable vector abstraction that carries good arithmetic intensity (i.e. it is a "better vector.contract" that also supports in-register reuse with sliding windows etc)
  3. progressive lowerings and canonicalization between masked/unmasked vector.generic and other vector ops. The current hooks we have will then become progressive lowerings within the vector dialect and should work quite seamlessly with masking (i.e. pull one op out of a masked/unmasked vector.generic, rinse, repeat)

I believe this will set us off on a more composable longer-term path and nicely intercepts some of the ongoing work on higher-dimension vector programming for GPU.

Thoughts?

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1524–1525

For this particular point, I would want to see a real cost/benefit discussion before an effort is attempted to "completely separate convolutions from GenericOp", which has potentially deep implications.
The fact that the current implementation of vectorization of Linalg ops with a*i + b*j is separate is to me more a symptom of missing abstractions in the vector dialect and the complexities that this creates.
I elaborate more below.

@dcaballe @hanchung just to be clear in case this wasn't: this is not a request to hold off on this particular PR.
You have legitimate usability needs that seem addressed here.

The points I raised are about looking around the next corner, reevaluating the current state more holistically, and evolving towards a better future: I believe this first implementation of vectorization, which we have kept adding to and which predates a lot of the MLIR infrastructure, has reached its limit.

hanchung updated this revision to Diff 523100. May 17 2023, 10:19 AM
hanchung marked 5 inline comments as done.

Thanks for the review!

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1466

The old public method, vectorizeLinalgOpPrecondition, has this input argument, and I'd like to keep it the same for users. It exposes the check to users; I assume they would use the method to do some analysis before they configure pipelines. Users should consider this case when they are using it, right?

1524–1525

We can separate it into linalg::ConvInterface, linalg::LinalgOp, and tensor::PadOp, but there are some concerns. What if an operation implements ConvInterface but can't be vectorized by the vectorizeConvolution method? Should we fall back to trying vectorizeAsLinalgGeneric? I think we need a wider discussion about what the final vectorization model looks like before we completely separate them.
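
For context, the fallback pattern under discussion is roughly what the LinalgOp path already does today; a sketch, abbreviated rather than copied verbatim from Vectorization.cpp:

// Try the specialized convolution vectorizer first; if the op is not a
// supported convolution flavor, fall back to the generic algorithm.
FailureOr<Operation *> convOr = vectorizeConvolution(rewriter, linalgOp);
if (succeeded(convOr)) {
  llvm::append_range(newResults, (*convOr)->getResults());
  return success();
}
// Not a recognized convolution: use the projected-permutation generic path.
return vectorizeAsLinalgGeneric(rewriter, state, linalgOp, newResults);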

Thanks @nicolasvasilache for the details! Yes, I agree that we should evaluate the future state of vectorization and make progress toward it, though I don't know what it would look like. My intention is to have a unified API. I believe that this would be a good start toward a better future. (At least, we're starting the discussion here.)

I think we are pretty aligned but there are some misunderstandings. The goal here was not to vectorize tensor.pad as a linalg.generic, but just to bring tensor.pad vectorization into the main vectorization path so that we can reuse the vectorization state for masking, etc. (that is not done by this patch), while preserving its separate vectorization logic. That's all :).

I don't expect that everything can be vectorized using linalg.generic, but the fact that we have to extend the "Linalg vectorizer" to support tensor ops kind of indicates that we have a representation gap in Linalg. Maybe we need a new flavor of linalg.generic where inputs can have different bounds, or maybe we can extend linalg.generic to support "modifiers" of the iteration space (i.e., affine.if ops). This limitation also impacts convolution vectorization, which is kind of special-cased because: 1) we are kind of decomposing conv ops in place as we vectorize, and 2) conv indexing maps are not pure affine but semi-affine. We can certainly move decomposition outside of the vectorizer and improve vectorization support for semi-affine maps, but that won't fix the abstraction gap in Linalg. We are mostly relying on LinalgOpInterface, but that doesn't seem to be enough. We could extend the Linalg representation to fill the gaps, or think about a new "VectorizableOpInterface" that can be implemented by ops outside Linalg and provide all the information that the vectorizer needs. WDYT?
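
To make the last idea a bit more concrete, here is a purely hypothetical sketch of what such an interface could expose; nothing like this exists in this revision, and a real MLIR interface would be declared in TableGen rather than as a plain C++ struct:

// Hypothetical "VectorizableOpInterface" sketch, only to illustrate the idea
// above; not part of this patch.
struct VectorizableOpInterfaceSketch {
  // Report whether the op can be vectorized with the given (possibly empty)
  // vector sizes, without modifying the IR.
  virtual LogicalResult
  vectorizePrecondition(Operation *op, ArrayRef<int64_t> vectorSizes) const = 0;
  // Rewrite the op into vector form, reusing the shared vectorization state
  // so that masking decisions stay consistent across ops.
  virtual LogicalResult vectorize(RewriterBase &rewriter, Operation *op,
                                  ArrayRef<int64_t> vectorSizes) const = 0;
  virtual ~VectorizableOpInterfaceSketch() = default;
};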

vector.generic sounds great as an old TODO, indeed, but I think that's something to use after vectorization. It won't solve the problem that we have before vectorization.

dcaballe accepted this revision. May 18 2023, 10:54 AM

Ok, I think we can continue the design discussion in a different venue. Nicolas is fine with proceeding, so if all the feedback has been addressed, it should be ok to move forward with this. Thanks for working on this!

This revision is now accepted and ready to land. May 18 2023, 10:54 AM
This revision was automatically updated to reflect the committed changes.