This is an archive of the discontinued LLVM Phabricator instance.

[mlir][linalg][conv] Flatten the channel dimension
Abandoned · Public

Authored by awarzynski on Apr 25 2023, 7:08 AM.

Details

Summary

This patch adds an option to flatten the channel dimension when
vectorising 1D convolutions. This is very beneficial when vectorising
convolutions of tensors with low channel count (e.g. 1 or 2). Tensors
like this are very common in Computer Vision workloads.

This doesn't change the vectorisation in any fundamental way. However,
when enabled, it does require collapsing and then re-expanding shapes, as
well as some other fine-tuning. At the Linalg vectoriser level, this is
controlled through the flattenChannelDim flag (a new parameter for
depthwiseConv), which defaults to false (so that the default
behaviour does not change).
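For illustration, here is a small NumPy sketch (not part of the patch) of the flattening idea: a depthwise 1-D convolution over an NWC tensor with a low channel count is computed once in the usual layout and once with the (W, C) pair collapsed into a single contiguous dimension, where a shift of kw in W becomes a shift of kw * C in the flattened axis. The shapes and variable names below are illustrative assumptions, not taken from the patch.

```python
import numpy as np

N, W, C, KW = 2, 8, 2, 3        # low channel count, e.g. C = 2
Wo = W - KW + 1                 # output width (no padding, stride 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((N, W, C))   # input, NWC layout
f = rng.standard_normal((KW, C))     # depthwise filter

# Reference depthwise 1-D convolution:
#   ref[n, w, c] = sum_kw x[n, w + kw, c] * f[kw, c]
ref = np.zeros((N, Wo, C))
for kw in range(KW):
    ref += x[:, kw:kw + Wo, :] * f[kw]

# Flattened variant: collapse (W, C) into one dimension of size W * C.
# A shift of kw in W becomes a shift of kw * C in the flattened axis,
# and the filter repeats every C elements along that axis.
x_flat = x.reshape(N, W * C)
out_flat = np.zeros((N, Wo * C))
for kw in range(KW):
    out_flat += x_flat[:, kw * C:kw * C + Wo * C] * np.tile(f[kw], Wo)

# Both layouts produce the same result.
assert np.allclose(out_flat.reshape(N, Wo, C), ref)
```

The point of the flattened form is that the vectorised loop now operates on one contiguous dimension of length W * C rather than a trailing dimension of length 1 or 2, so vector lanes are actually filled.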

Co-authored by: Bradley Smith <bradley.smith@arm.com>

Diff Detail

Event Timeline

awarzynski created this revision. Apr 25 2023, 7:08 AM
Herald added a project: Restricted Project.
awarzynski requested review of this revision. Apr 25 2023, 7:08 AM

Hi everyone,

This specialization of the vectorizer is incredibly beneficial for our workloads. I do realize that there's a bit of code duplication here, so I am happy to refactor this if there's a better way. I have yet to figure out how to connect flattenChannelDim (in Vectorization.cpp) to the vectorization patterns, but was hoping to get some feedback on the changes in depthwiseConv first. I hope that's OK.

Thanks for taking a look!

awarzynski retitled this revision from [linalg][conv] Flatten the channel dimension to [mlir][linalg][conv] Flatten the channel dimension. Apr 26 2023, 12:51 AM

Thanks for the contribution! I think the general consensus here is that we have to refactor all the "pre-vectorization" convolution decomposition/transformation work somewhere outside the vectorizer but we haven't had a chance to do so. I would be reluctant to add more complexity to Vectorization.cpp in this regard without at least trying to do so (to the extent that I have a patch that introduces masking support for convolutions but I'm stuck with it for the same reason). Since your flatten transformation seems something relatively independent of the existing decomposition, would you be willing to try moving it to a pre-vectorization pass? Thanks!

+1, this seems like you'd want to reach for a pre-vectorization rewrite at the tensor level.

Thank you for taking a look and for your feedback! I'm worried that a pass won't be sufficient. Basically, I am trying to rewrite "depthwise NHWC" convolutions as "depthwise NH(WxC)" (or "depthwise NHW"). Now, the vectorizer splits convolutions into:

  1. "channeled with batch number" (see here) (e.g. depthwise NHWC),
  2. "non-channeled without batch number" (see here) (e.g. "plain" HW).

However, what I am proposing would require a dedicated category:

  3. "non-channeled with batch number" (e.g. "depthwise NHW").

So, on top of a pass I will most likely need yet another hook/case in the vectorizer to support these "new" convolutions. Perhaps I'm missing something obvious? In any case, I wanted to bring this up before attempting a different approach (i.e. with a pass). If need be, would it be OK to introduce another case for convolutions in the vectorizer?

Yes, I think if you can move most of the transformation to a pre-vectorizer pass, then just adding a new case to drive the specific vectorization for that case would be acceptable.

awarzynski abandoned this revision. May 15 2023, 5:13 AM

Thank you for confirming!

I agree that that would be much cleaner and more future-proof, though it will require more work. Let me abandon this for now. My bandwidth is limited ATM, but we should be able to upload something in the next few months (indicating timescales just in case others are interested in this as well). Thank you for the feedback!