This is an archive of the discontinued LLVM Phabricator instance.

Differential D105293

Refactor GenericPadTensorOpVectorizationPattern
ClosedPublic

Authored by cathyzhyi on Jul 1 2021, 9:35 AM.

Details

Reviewers

silvas
nicolasvasilache
springerm
aartbik

Commits

rG35df2f6fbd1a: Refactor GenericPadTensorOpVectorizationPattern

Summary

Refactor the original code to rewrite a PadTensorOp into a
sequence of InitTensorOp, FillOp and InsertSliceOp without
vectorization by default. GenericPadTensorOpVectorizationPattern
provides a customized OptimizeCopyFn to vectorize the
copying step.
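
The three-step lowering described in the summary can be illustrated outside MLIR. The following is a minimal sketch on plain C++ vectors, not code from this revision: `Matrix` and `padConstant` are hypothetical names, and only the rank-2, constant-padding case is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a rank-2 tensor; this is not an MLIR type.
using Matrix = std::vector<std::vector<float>>;

// Pads `src` with `padValue`: `lowRows`/`lowCols` elements go before the data
// and `highRows`/`highCols` after it, mirroring init + fill + insert_slice.
Matrix padConstant(const Matrix &src, std::size_t lowRows, std::size_t highRows,
                   std::size_t lowCols, std::size_t highCols, float padValue) {
  const std::size_t srcRows = src.size();
  const std::size_t srcCols = srcRows == 0 ? 0 : src[0].size();

  // Steps 1 and 2 (InitTensorOp + FillOp): create the result and fill it
  // entirely with the padding value.
  Matrix result(lowRows + srcRows + highRows,
                std::vector<float>(lowCols + srcCols + highCols, padValue));

  // Step 3 (InsertSliceOp): copy the source into the result at offset
  // (lowRows, lowCols) with unit strides.
  for (std::size_t r = 0; r < srcRows; ++r) {
    assert(src[r].size() == srcCols && "expected a rectangular source");
    for (std::size_t c = 0; c < srcCols; ++c)
      result[lowRows + r][lowCols + c] = src[r][c];
  }
  return result;
}
```

The high padding never appears as an offset: it only enlarges the filled result, which is why the insert offsets in the actual pattern are the low padding values.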

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

cathyzhyi created this revision. Jul 1 2021, 9:35 AM

Herald added subscribers: dcaballe, cota, mravishankar and 17 others. Jul 1 2021, 9:35 AM

cathyzhyi requested review of this revision. Jul 1 2021, 9:35 AM

Herald added a reviewer: nicolasvasilache. Jul 1 2021, 9:35 AM

Herald added a project: Restricted Project.

Herald added subscribers: limo1996, stephenneuendorffer, nicolasvasilache.

delete redundant comments

remove redundant type check

Can you add an integration test analogous to mlir/test/Integration/Dialect/Linalg/CPU/test-subtensor-insert.mlir ?

mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp
221	nit: Prefer PadTensorOp::Adaptor
235	nit: use llvm::hasSingleElement
254	nit: Use `.` at the end of the comment. Same with the other comments. https://llvm.org/docs/CodingStandards.html#commenting
mlir/test/Dialect/Linalg/bufferize.mlir
270 ↗	(On Diff #355925)	can you give meaningful names to the CHECK values?

This revision now requires changes to proceed. Jul 1 2021, 10:14 AM

Harbormaster completed remote builds in B112019: Diff 355925. Jul 1 2021, 10:43 AM

Note that I asked for a similar rewrite in https://reviews.llvm.org/D102804 but it was eventually not done.
I am wondering whether the pattern you add here should more generally be the rewrite on tensors I mentioned, and then just let the existing bufferization kick in?

In D105293#2854047, @nicolasvasilache wrote:

Note that I asked for a similar rewrite in https://reviews.llvm.org/D102804 but it was eventually not done.
I am wondering whether the pattern you add here should more generally be the rewrite on tensors I mentioned, and then just let the existing bufferization kick in?

That makes sense to me!

address comments

@cathyzhyi I copyhacked your code into a tensor rewrite here: https://reviews.llvm.org/D105317

It seems fine modulo canonicalizations missing on memref::DimOp + tensor (which also interferes with tensor::DimOp that @springerm is looking at).

Feel free to take whatever makes sense to you and refactor / land.
Depending on where you go with this I'll adapt.

Will pick it up again tomorrow.

Thanks for pushing this!

In D105293#2854084, @silvas wrote:

In D105293#2854047, @nicolasvasilache wrote:

Note that I asked for a similar rewrite in https://reviews.llvm.org/D102804 but it was eventually not done.
I am wondering whether the pattern you add here should more generally be the rewrite on tensors I mentioned, and then just let the existing bufferization kick in?

That makes sense to me!

+1 thanks for the suggestion!

Harbormaster completed remote builds in B112081: Diff 356010. Jul 1 2021, 2:34 PM

use rewrite pattern instead

@nicolasvasilache It seems that with the rewrite pattern version the integration test passes, but there are some failures in the unit test. I will need to take a closer look tomorrow. Here is the unit test.

func @pad_tensor(%arg0: tensor<4x?x2x?xf32>, %arg1: index) -> tensor<4x?x?x?xf32> {
  %cst = constant 0.0 : f32
  %out = linalg.pad_tensor %arg0 low[0, 0, %arg1, 0] high[0, 0, 0, %arg1]  {
  ^bb0(%gen_arg1: index, %gen_arg2: index, %gen_arg3: index, %gen_arg4: index):  // no predecessors
    linalg.yield %cst : f32
  } : tensor<4x?x2x?xf32> to tensor<4x?x?x?xf32>
  return %out : tensor<4x?x?x?xf32>
}

The functionality of this pattern is already provided by GenericPadTensorOpVectorizationPattern, which also translates a PadTensorOp into InitTensorOp + FillOp + InsertSliceOp. However, GenericPadTensorOpVectorizationPattern does a bit more: it tries to generate vectorized alternatives to FillOp and InsertSliceOp, and it falls back to the plain FillOp and InsertSliceOp only if there is not enough static type information.

See https://reviews.llvm.org/D103679 for details. (The pattern was extended in subsequent commits, so check on Github for the most recent version.)

Can you run the vectorization pass or is vectorization not desired in your use case? If you want just InitTensorOp + FillOp + InsertSliceOp and no vectorization, I think there should be a way to avoid duplicating that functionality.

This revision now requires changes to proceed. Jul 1 2021, 7:30 PM

Harbormaster completed remote builds in B112132: Diff 356084. Jul 1 2021, 7:48 PM

In D105293#2854620, @cathyzhyi wrote:
@nicolasvasilache It seems that with the rewrite pattern version the integration test passes, but there are some failures in the unit test. I will need to take a closer look tomorrow. Here is the unit test.
func @pad_tensor(%arg0: tensor<4x?x2x?xf32>, %arg1: index) -> tensor<4x?x?x?xf32> {
  %cst = constant 0.0 : f32
  %out = linalg.pad_tensor %arg0 low[0, 0, %arg1, 0] high[0, 0, 0, %arg1]  {
  ^bb0(%gen_arg1: index, %gen_arg2: index, %gen_arg3: index, %gen_arg4: index):  // no predecessors
    linalg.yield %cst : f32
  } : tensor<4x?x2x?xf32> to tensor<4x?x?x?xf32>
  return %out : tensor<4x?x?x?xf32>
}

It seems the version @springerm has for vectorization already fits the bill and would only need to be split out from the vectorization part.
@springerm, indeed, if we get to bufferization and still have this form, we should bufferize without relying on vectorization.

After looking at this in more detail, the easiest way would be to add a boolean (template) parameter to GenericPadTensorOpVectorizationPattern, which enables or disables vectorization (and maybe rename the pattern). Then register the pattern in applyEnablingTransformations. However, then we would have a dependency of ComprehensiveBufferize.cpp on Vectorization.cpp. Not sure if that is a good idea. I don't see a better way of factoring out common functionality.

Alternatively, just duplicate the pattern (without vectorization) as this revision does at the moment. With the above code suggestion, we would be duplicating around 50 lines of code. Maybe not too bad.

@nicolasvasilache What do you think?

mlir/lib/Dialect/Linalg/Transforms/ComprehensiveBufferize.cpp
2490–2507 ↗	(On Diff #356084)	Can be replaced with:
	Value paddingValue = op.getConstantPaddingValue();
	if (!paddingValue)
	  return failure();

In D105293#2854818, @springerm wrote:

After looking at this in more detail, the easiest way would be to add a boolean (template) parameter to GenericPadTensorOpVectorizationPattern, which enables or disables vectorization (and maybe rename the pattern). Then register the pattern in applyEnablingTransformations. However, then we would have a dependency of ComprehensiveBufferize.cpp on Vectorization.cpp. Not sure if that is a good idea. I don't see a better way of factoring out common functionality.

Alternatively, just duplicate the pattern (without vectorization) as this revision does at the moment. With the above code suggestion, we would be duplicating around 50 lines of code. Maybe not too bad.

@nicolasvasilache What do you think?

Indeed, so what I have in my local branch is quite close:

the FillOp step is not a "tryVectorize" but just a "createFillOrGenerateOp"; it is always created, unconditionally
I just deleted the copy vectorization.

It seems this could be refactored and moved to Linalg/Transforms/Transforms.h and take a lambda for the copy.
By default it would take nothing (to just lower to other tensor ops).
The vectorization pattern would just reconfigure the pattern to add its own copy vectorization mechanism.
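
The design suggested here (a base rewrite that takes an optional copy callback, with the vectorization pattern supplying its own) can be sketched without any MLIR dependency. This is a simplified model under stated assumptions, not the code that landed: `Trace`, `CopyHook`, `GeneralizePadSketch` and `makeVectorizingSketch` are hypothetical names, a `bool` stands in for `LogicalResult`, and a list of op names stands in for the IR being built.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: records the names of the ops a rewrite would create.
using Trace = std::vector<std::string>;
// Returns true if the hook handled the copy step (models LogicalResult).
using CopyHook = std::function<bool(Trace &)>;

class GeneralizePadSketch {
public:
  // With no hook, the rewrite lowers to plain tensor-level ops only.
  explicit GeneralizePadSketch(CopyHook hook = nullptr)
      : hook(std::move(hook)) {}

  Trace rewrite() const {
    // The init and fill steps are always emitted.
    Trace ops = {"init_tensor", "fill"};
    // Give the optional hook the first chance at the copy step.
    if (hook && hook(ops))
      return ops;
    // Default copy: a plain insert_slice.
    ops.push_back("insert_slice");
    return ops;
  }

private:
  CopyHook hook;
};

// A vectorizing variant only reconfigures the base rewrite with its own hook.
// `shapesAreStatic` is a simplification of the real applicability check.
GeneralizePadSketch makeVectorizingSketch(bool shapesAreStatic) {
  return GeneralizePadSketch([shapesAreStatic](Trace &ops) {
    if (!shapesAreStatic)
      return false; // Decline, so the base rewrite emits insert_slice.
    ops.push_back("transfer_read");
    ops.push_back("transfer_write");
    return true;
  });
}
```

Compared with a boolean template parameter, the callback keeps the generic rewrite free of any dependency on the vectorization code, which was the concern raised earlier in this thread.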

refactor and reuse existing code as suggested.

Herald added a reviewer: aartbik. Jul 2 2021, 11:18 AM

update comments that are no longer relevant.

@silvas @nicolasvasilache @springerm Any suggestions on which pass to put this pattern in? There is a LinalgGeneralizationPass, which currently only applies patterns that lower to linalg.generic.

Harbormaster completed remote builds in B112236: Diff 356232. Jul 2 2021, 11:57 AM

In D105293#2855869, @cathyzhyi wrote:

@silvas @nicolasvasilache @springerm Any suggestions on which pass to put this pattern in? There is a LinalgGeneralizationPass, which currently only applies patterns that lower to linalg.generic.

The pattern will be used by whichever passes need it, such as ComprehensiveBufferize or Bufferize. (You should be able to just add this pattern into Bufferize.cpp in your original patch and let dialect conversion just work.)

add the pattern into linalg bufferization pass

Oh, sorry, I thought you meant which passes would use it. It would be nice if it were in a generic helper file. You can probably put it next to PadTensorOpTransformationPattern for consistency.

Harbormaster completed remote builds in B112246: Diff 356246. Jul 2 2021, 12:55 PM

move the new pattern's source code next to PadTensorOpTransformationPattern for
consistency

LGTM. Wait for @springerm / @nicolasvasilache approval too.

nicolasvasilache accepted this revision. Jul 2 2021, 2:00 PM

Harbormaster completed remote builds in B112262: Diff 356269. Jul 2 2021, 2:17 PM

Nice refactoring!

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
684–688	can be deleted

This revision is now accepted and ready to land. Jul 2 2021, 6:23 PM

delete redundant comments as suggested

cathyzhyi marked an inline comment as done. Jul 2 2021, 7:42 PM

Harbormaster completed remote builds in B112302: Diff 356314. Jul 2 2021, 8:08 PM

This has reached consensus. I suspect @cathyzhyi does not yet have write access to GitHub, so I am committing on her behalf.

Closed by commit rG35df2f6fbd1a: Refactor GenericPadTensorOpVectorizationPattern (authored by cathyzhyi, committed by nicolasvasilache). Jul 7 2021, 4:45 AM

This revision was automatically updated to reflect the committed changes.

nicolasvasilache added a commit: rG35df2f6fbd1a: Refactor GenericPadTensorOpVectorizationPattern.

cathyzhyi mentioned this in D105642: Mark TensorDialect legal and PadTensor op illegal. Jul 8 2021, 12:14 PM

silvas mentioned this in rG7c35aae35b2c: Mark TensorDialect legal and PadTensor op illegal. Jul 8 2021, 3:02 PM

Revision Contents

Path                                                            Size
mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h        22 lines
mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp                1 line
mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp               89 lines
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp            101 lines
mlir/test/Dialect/Linalg/generalize-pad-tensor.mlir             46 lines
mlir/test/Integration/Dialect/Linalg/CPU/test-padtensor.mlir    33 lines
mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp           12 lines

Diff 356919

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h

[877 lines hidden]
 /// `linalg.generic`.
 struct PadTensorOpTransformationPattern : public OpRewritePattern<PadTensorOp> {
   using OpRewritePattern<PadTensorOp>::OpRewritePattern;

   LogicalResult matchAndRewrite(PadTensorOp padOp,
                                 PatternRewriter &rewriter) const override;
 };

+using OptimizeCopyFn =
+    std::function<LogicalResult(PatternRewriter &, PadTensorOp, Value)>;
+
+/// Rewrite a PadTensorOp into a sequence of InitTensorOp, FillOp and
+/// InsertSliceOp. For now, only constant padding values are supported.
+/// `OptimizeCopyFn` can be used to customize copying step optimization.
+struct GeneralizePadTensorOpPattern : public OpRewritePattern<PadTensorOp> {
+  GeneralizePadTensorOpPattern(MLIRContext *context,
+                               OptimizeCopyFn optimizeCopyFn = nullptr,
+                               PatternBenefit benefit = 1)
+      : OpRewritePattern<PadTensorOp>(context, benefit),
+        optimizeCopyFn(optimizeCopyFn) {}
+  LogicalResult matchAndRewrite(PadTensorOp padOp,
+                                PatternRewriter &rewriter) const override;
+
+protected:
+  OptimizeCopyFn optimizeCopyFn;
+  Value createFillOrGenerateOp(PatternRewriter &rewriter, PadTensorOp padOp,
+                               Value dest,
+                               const SmallVector<Value> &dynSizes) const;
+};
+
 /// Populates `patterns` with patterns that vectorize linalg.pad_tensor.
 /// These patterns are meant to apply in a complementary fashion. Benefits
 /// are used to encode a certain ordering of pattern application. To avoid
 /// scattering magic constants throughout the code base, the patterns must be
 /// added with this function. `baseBenefit` can be used to offset the benefit
 /// of all PadTensorOp vectorization patterns by a certain value.
 void populatePadTensorOpVectorizationPatterns(
     RewritePatternSet &patterns, PatternBenefit baseBenefit = 1);
[201 lines hidden]

mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp

[212 lines hidden; context: matchAndRewrite(LinalgOp op, ArrayRef<Value> operands,]
     Location loc = op.getLoc();
     SmallVector<Value, 2> newOutputBuffers;

     if (failed(allocateBuffersForResults(loc, op, adaptor.outputs(),
                                          newOutputBuffers, rewriter))) {
       return op.emitOpError()
              << "Failed to allocate buffers for tensor results.";
     }

     // Delegate to the linalg generic pattern.
     if (auto genericOp = dyn_cast<linalg::GenericOp>(*op)) {
       finalizeBufferAllocationForGenericOp(rewriter, genericOp,
                                            adaptor.inputs(), newOutputBuffers);
       return success();
     }

     finalizeBufferAllocation(rewriter, op, adaptor.inputs(), newOutputBuffers);
     return success();
   }
 };

 /// Convert `extract_slice %t [offsets][sizes][strides] -> %st` to an
 /// alloc + copy pattern.
 /// ```
 ///   %a = alloc(sizes)
 ///   %sv = subview %source [offsets][sizes][strides]
 ///   linalg_copy(%sv, %a)
 /// ```
 ///
 /// This pattern is arguable a std pattern once linalg::CopyOp becomes
 /// std::CopyOp.
 class ExtractSliceOpConverter
     : public OpConversionPattern<tensor::ExtractSliceOp> {
 public:
   using OpConversionPattern<tensor::ExtractSliceOp>::OpConversionPattern;

   LogicalResult
   matchAndRewrite(tensor::ExtractSliceOp op, ArrayRef<Value> operands,
                   ConversionPatternRewriter &rewriter) const final {
     tensor::ExtractSliceOpAdaptor adaptor(operands, op->getAttrDictionary());
     Value sourceMemref = adaptor.source();
     assert(sourceMemref.getType().isa<MemRefType>());

     MemRefType subviewMemRefType =
         getTypeConverter()->convertType(op.getType()).cast<MemRefType>();
     // op.sizes() capture exactly the dynamic alloc operands matching the
     // subviewMemRefType thanks to subview/slice canonicalization and
     // verification.
     Value alloc = rewriter.create<memref::AllocOp>(
         op.getLoc(), subviewMemRefType, op.sizes());
[66 lines hidden; context: void runOnOperation() override {]
     // Mark all Linalg operations illegal as long as they work on tensors.
     auto isLegalOperation = [&](Operation *op) {
       return typeConverter.isLegal(op);
     };
     target.addDynamicallyLegalDialect<linalg::LinalgDialect>(isLegalOperation);
     target.addDynamicallyLegalOp<ConstantOp>(isLegalOperation);

     RewritePatternSet patterns(&context);
+    patterns.add<GeneralizePadTensorOpPattern>(patterns.getContext());
     populateLinalgBufferizePatterns(typeConverter, patterns);
     if (failed(applyPartialConversion(getOperation(), target,
                                       std::move(patterns))))
       signalPassFailure();
   }
 };
 } // end anonymous namespace

[19 lines hidden]

mlir/lib/Dialect/Linalg/Transforms/Transforms.cpp

[693 lines hidden; context: rewriter.replaceOpWithNewOp<linalg::GenericOp>(]
       getNParallelLoopsAttrs(resultShapedType.getRank()),
       [&](OpBuilder &nestedBuilder, Location nestedLoc, ValueRange args) {
         nestedBuilder.create<linalg::YieldOp>(nestedLoc, args[0]);
       });

   return success();
 }

+/// Filling `dest` using FillOp constant padding value if possible.
+/// Otherwise, generate a tensor::GenerateOp.
+Value GeneralizePadTensorOpPattern::createFillOrGenerateOp(
+    PatternRewriter &rewriter, PadTensorOp padOp, Value dest,
+    const SmallVector<Value> &dynSizes) const {
+  auto padValue = padOp.getConstantPaddingValue();
+  if (padValue)
+    return rewriter.create<FillOp>(padOp.getLoc(), padValue, dest).result();
+
+  // Fill could not be optimized: Lower to tensor::GenerateOp with region.
+  auto generateOp = rewriter.create<tensor::GenerateOp>(
+      padOp.getLoc(), padOp.getResultType(), dynSizes);
+  // Copy region to new op.
+  BlockAndValueMapping bvm;
+  padOp.region().cloneInto(&generateOp.getRegion(), bvm);
+  // Rewrite linalg::YieldOp to tensor::YieldOp.
+  OpBuilder::InsertionGuard guard(rewriter);
+  auto yieldOp =
+      dyn_cast<linalg::YieldOp>(generateOp.getRegion().front().getTerminator());
+  assert(yieldOp && "malformed PadTensorOp: expected YieldOp terminator");
+  assert(yieldOp.values().size() == 1);
+  rewriter.setInsertionPoint(yieldOp);
+  rewriter.replaceOpWithNewOp<tensor::YieldOp>(yieldOp, yieldOp.values()[0]);
+  return generateOp;
+}
+
+LogicalResult
+GeneralizePadTensorOpPattern::matchAndRewrite(PadTensorOp padOp,
+                                              PatternRewriter &rewriter) const {
+  // Given an OpFoldResult, return an index-typed value.
+  auto getIdxValue = [&](OpFoldResult ofr) {
+    if (auto val = ofr.dyn_cast<Value>())
+      return val;
+    return rewriter
+        .create<ConstantIndexOp>(
+            padOp.getLoc(), ofr.get<Attribute>().cast<IntegerAttr>().getInt())
+        .getResult();
+  };
+
+  auto resultType = padOp.getResultType();
+  // Compute size of InitTensorOp. Any combination of static/dynamic is
+  // supported.
+  SmallVector<Value> dynSizes;
+  SmallVector<int64_t> staticSizes;
+  for (unsigned dim = 0; dim < resultType.getRank(); ++dim) {
+    if (resultType.isDynamicDim(dim)) {
+      auto srcSize = rewriter.createOrFold<tensor::DimOp>(padOp.getLoc(),
+                                                          padOp.source(), dim);
+      // Add low and high padding value.
+      auto plusLow = rewriter.createOrFold<AddIOp>(
+          padOp.getLoc(), srcSize, getIdxValue(padOp.getMixedLowPad()[dim]));
+      auto plusHigh = rewriter.createOrFold<AddIOp>(
+          padOp.getLoc(), plusLow, getIdxValue(padOp.getMixedHighPad()[dim]));
+      dynSizes.push_back(plusHigh);
+    }
+    staticSizes.push_back(resultType.getDimSize(dim));
+  }
+
+  // Init tensor and fill it with padding.
+  Value init = rewriter.create<InitTensorOp>(
+      padOp.getLoc(), dynSizes, staticSizes, resultType.getElementType());
+  Value fill = createFillOrGenerateOp(rewriter, padOp, init, dynSizes);
+
+  // Try optimize the copy of source.
+  if (optimizeCopyFn && optimizeCopyFn(rewriter, padOp, fill).succeeded())
+    return success();
+
+  // PadTensorOps cannot be optimized. Generate a InsertSliceOp instead
+  // for copying the PadOp source.
+  auto sourceType = padOp.getSourceType();
+  // Compute size of source of PadTensorOp.
+  SmallVector<OpFoldResult> srcSizes;
+  for (unsigned dim = 0; dim < sourceType.getRank(); ++dim) {
+    if (sourceType.isDynamicDim(dim)) {
+      srcSizes.push_back(rewriter.createOrFold<tensor::DimOp>(
+          padOp.getLoc(), padOp.source(), dim));
+    } else {
+      srcSizes.push_back(rewriter.getIndexAttr(sourceType.getDimSize(dim)));
+    }
+  }
+  // Strides of InsertSliceOp are all 1.
+  SmallVector<OpFoldResult> strides(sourceType.getRank(),
+                                    rewriter.getIndexAttr(1));
+  rewriter.replaceOpWithNewOp<tensor::InsertSliceOp>(
+      padOp, padOp.source(), fill, padOp.getMixedLowPad(), srcSizes, strides);
+
+  return success();
+}
+
 /// Given an OpFoldResult, return a Value. If the OpFoldResult is an Attribute,
 /// it must be of type Integer.
 static Value asValue(OpBuilder &builder, Location loc, OpFoldResult ofr) {
   if (auto val = ofr.dyn_cast<Value>())
     return val;
   auto intVal = getConstantIntValue(ofr);
   assert(intVal && "expected Value or IntegerAttr");
   return builder.create<ConstantIndexOp>(loc, *intVal);
[228 lines hidden]

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

[675 lines hidden; context: static SmallVector<Value> ofrToIndexValues(OpBuilder &builder, Location loc,]
   });
   return result;
 }

 /// Rewrite a PadTensorOp into a sequence of InitTensorOp, FillOp and
 /// InsertSliceOp. For now, only constant padding values are supported.
 /// If there is enough static type information, TransferReadOps and
 /// TransferWriteOps may be generated instead of InsertSliceOps.
 struct GenericPadTensorOpVectorizationPattern
-    : public OpRewritePattern<PadTensorOp> {
-  using OpRewritePattern<PadTensorOp>::OpRewritePattern;
-
-  LogicalResult matchAndRewrite(PadTensorOp padOp,
-                                PatternRewriter &rewriter) const final {
-    // Given an OpFoldResult, return an index-typed value.
-    auto getIdxValue = [&](OpFoldResult ofr) {
-      if (auto val = ofr.dyn_cast<Value>())
-        return val;
-      return rewriter.create<ConstantIndexOp>(
-          padOp.getLoc(), getIntFromAttr(ofr.get<Attribute>())).getResult();
-    };
-
-    auto resultType = padOp.getResultType();
-    // Compute size of InitTensorOp. Any combination of static/dynamic is
-    // supported.
-    SmallVector<Value> dynSizes;
-    SmallVector<int64_t> staticSizes;
-    for (unsigned dim = 0; dim < resultType.getRank(); ++dim) {
-      if (resultType.isDynamicDim(dim)) {
-        auto srcSize = rewriter.createOrFold<tensor::DimOp>(
-            padOp.getLoc(), padOp.source(), dim);
-        // Add low and high padding value.
-        auto plusLow = rewriter.createOrFold<AddIOp>(
-            padOp.getLoc(), srcSize, getIdxValue(padOp.getMixedLowPad()[dim]));
-        auto plusHigh = rewriter.createOrFold<AddIOp>(
-            padOp.getLoc(), plusLow, getIdxValue(padOp.getMixedHighPad()[dim]));
-        dynSizes.push_back(plusHigh);
-      }
-      staticSizes.push_back(resultType.getDimSize(dim));
-    }
-
-    // Init tensor and fill it with padding.
-    Value init = rewriter.create<InitTensorOp>(
-        padOp.getLoc(), dynSizes, staticSizes, resultType.getElementType());
-    Value fill = tryVectorizeFill(rewriter, padOp, init, dynSizes);
-
-    // Try vectorizing the copy of source.
-    if (tryVectorizeCopy(rewriter, padOp, fill).succeeded())
-      return success();
-
-    // Neither source type nor PadTensorOp result type have static shape. Such
-    // PadTensorOps cannot be vectorized. Generate a InsertSliceOp instead
-    // for copying the PadOp source.
-
-    auto sourceType = padOp.getSourceType();
-    // Compute size of source of PadTensorOp.
-    SmallVector<OpFoldResult> srcSizes;
-    for (unsigned dim = 0; dim < sourceType.getRank(); ++dim) {
-      if (sourceType.isDynamicDim(dim)) {
-        srcSizes.push_back(rewriter.createOrFold<tensor::DimOp>(
-            padOp.getLoc(), padOp.source(), dim));
-      } else {
-        srcSizes.push_back(rewriter.getIndexAttr(sourceType.getDimSize(dim)));
-      }
-    }
-    // Strides of InsertSliceOp are all 1.
-    SmallVector<OpFoldResult> strides(sourceType.getRank(),
-                                      rewriter.getIndexAttr(1));
-    rewriter.replaceOpWithNewOp<tensor::InsertSliceOp>(
-        padOp, padOp.source(), fill, padOp.getMixedLowPad(), srcSizes, strides);
-
-    return success();
-  }
-
-  /// Vectorize the filling of `dest`. This is possible if the padOp is padding
-  /// with a constant value. Otherwise, generate a tensor::GenerateOp.
-  Value tryVectorizeFill(PatternRewriter &rewriter, PadTensorOp padOp,
-                         Value dest, const SmallVector<Value> &dynSizes) const {
-    // Fill can be vectorized if padValue is a constant. (If there is enough
-    // static type information, the FillOp will be vectorized by another
-    // pattern.)
-    auto padValue = padOp.getConstantPaddingValue();
-    if (padValue)
-      return rewriter.create<FillOp>(padOp.getLoc(), padValue, dest).result();
-
-    // Fill could not be vectorized: Lower to tensor::GenerateOp with region.
-    auto generateOp = rewriter.create<tensor::GenerateOp>(
-        padOp.getLoc(), padOp.getResultType(), dynSizes);
-    // Copy region to new op.
-    BlockAndValueMapping bvm;
-    padOp.region().cloneInto(&generateOp.getRegion(), bvm);
-    // Rewrite linalg::YieldOp to tensor::YieldOp.
-    OpBuilder::InsertionGuard guard(rewriter);
-    auto yieldOp = dyn_cast<linalg::YieldOp>(
-        generateOp.getRegion().front().getTerminator());
-    assert(yieldOp && "malformed PadTensorOp: expected YieldOp terminator");
-    assert(yieldOp.values().size() == 1);
-    rewriter.setInsertionPoint(yieldOp);
-    rewriter.replaceOpWithNewOp<tensor::YieldOp>(yieldOp, yieldOp.values()[0]);
-    return generateOp;
-  }
+    : public GeneralizePadTensorOpPattern {
+  GenericPadTensorOpVectorizationPattern(MLIRContext *context,
+                                         PatternBenefit benefit = 1)
+      : GeneralizePadTensorOpPattern(context, tryVectorizeCopy, benefit) {}

   /// Vectorize the copying of a PadTensorOp's source. This is possible if each
   /// dimension size is statically know in the source type or the result type
   /// (or both).
-  LogicalResult tryVectorizeCopy(PatternRewriter &rewriter, PadTensorOp padOp,
-                                 Value dest) const {
+  static LogicalResult tryVectorizeCopy(PatternRewriter &rewriter,
+                                        PadTensorOp padOp, Value dest) {
     auto sourceType = padOp.getSourceType();
     auto resultType = padOp.getResultType();

     // Copy cannot be vectorized if pad value is non-constant and source shape
     // is dynamic. In case of a dynamic source shape, padding must be appended
     // by TransferReadOp, but TransferReadOp supports only constant padding.
     auto padValue = padOp.getConstantPaddingValue();
     if (!padValue) {
[673 lines hidden]

mlir/test/Dialect/Linalg/generalize-pad-tensor.mlir

This file was added.

				// RUN: mlir-opt -split-input-file --test-linalg-transform-patterns="test-generalize-pad-tensor" %s \| FileCheck --check-prefix=CHECK %s

				// CHECK-LABEL: func @generalize_pad_tensor_static_shape(
				// CHECK-SAME: %[[IN:.*]]: tensor<1x28x28x1xf32>) -> tensor<1x32x32x1xf32> {
				// CHECK: %[[C0:.*]] = constant 0.000000e+00 : f32
				// CHECK: %[[INIT:.*]] = linalg.init_tensor [1, 32, 32, 1] : tensor<1x32x32x1xf32>
				// CHECK: %[[FILL:.*]] = linalg.fill(%[[C0]], %[[INIT]]) : f32, tensor<1x32x32x1xf32> -> tensor<1x32x32x1xf32>
				// CHECK: %[[PADDED:.*]] = tensor.insert_slice %[[IN]] into %[[FILL]][0, 2, 2, 0] [1, 28, 28, 1] [1, 1, 1, 1] : tensor<1x28x28x1xf32> into tensor<1x32x32x1xf32>
				// CHECK: return %[[PADDED]] : tensor<1x32x32x1xf32>
				func @generalize_pad_tensor_static_shape(%arg0: tensor<1x28x28x1xf32>) -> tensor<1x32x32x1xf32> {
				%cst = constant 0.000000e+00 : f32
				%0 = linalg.pad_tensor %arg0 low[0, 2, 2, 0] high[0, 2, 2, 0] {
				^bb0(%arg1: index, %arg2: index, %arg3: index, %arg4: index): // no predecessors
				linalg.yield %cst : f32
				} : tensor<1x28x28x1xf32> to tensor<1x32x32x1xf32>
				return %0 : tensor<1x32x32x1xf32>
				}

				// CHECK-LABEL: func @generalize_pad_tensor_dynamic_shape(
				// CHECK-SAME: %[[IN:.*]]: tensor<4x?x2x?xf32>,
				// CHECK-SAME: %[[OFFSET:.*]]: index) -> tensor<4x?x?x?xf32> {
				// CHECK: %[[C0:.*]] = constant 0 : index
				// CHECK: %[[CST:.*]] = constant 0.000000e+00 : f32
				// CHECK: %[[C2:.*]] = constant 2 : index
				// CHECK: %[[C1:.*]] = constant 1 : index
				// CHECK: %[[C3:.*]] = constant 3 : index
				// CHECK: %[[DIM1:.*]] = tensor.dim %[[IN]], %[[C1]] : tensor<4x?x2x?xf32>
				// CHECK: %[[OUT_DIM2:.*]] = addi %[[OFFSET]], %[[C2]] : index
				// CHECK: %[[DIM3:.*]] = tensor.dim %[[IN]], %[[C3]] : tensor<4x?x2x?xf32>
				// CHECK: %[[OUT_DIM3:.*]] = addi %[[DIM3]], %[[OFFSET]] : index
				// CHECK: %[[INIT:.*]] = linalg.init_tensor [4, %[[DIM1]], %[[OUT_DIM2]], %[[OUT_DIM3]]] : tensor<4x?x?x?xf32>
				// CHECK: %[[FILL:.*]] = linalg.fill(%[[CST]], %[[INIT]]) : f32, tensor<4x?x?x?xf32> -> tensor<4x?x?x?xf32>
				// CHECK: %[[DIM1_1:.*]] = tensor.dim %[[IN]], %[[C1]] : tensor<4x?x2x?xf32>
				// CHECK: %[[DIM3_1:.*]] = tensor.dim %[[IN]], %[[C3]] : tensor<4x?x2x?xf32>
				// CHECK: %[[PADDED:.*]] = tensor.insert_slice %[[IN]] into %[[FILL]]{{\[}}%[[C0]], %[[C0]], %[[OFFSET]], %[[C0]]] [4, %[[DIM1_1]], 2, %[[DIM3_1]]] [1, 1, 1, 1] : tensor<4x?x2x?xf32> into tensor<4x?x?x?xf32>
				// CHECK: return %[[PADDED]] : tensor<4x?x?x?xf32>
				// CHECK: }
				func @generalize_pad_tensor_dynamic_shape(%arg0: tensor<4x?x2x?xf32>, %arg1: index) -> tensor<4x?x?x?xf32> {
				%c0 = constant 0 : index
				%cst = constant 0.0 : f32
				%out = linalg.pad_tensor %arg0 low[%c0, %c0, %arg1, %c0] high[%c0, %c0, %c0, %arg1] {
				^bb0(%gen_arg1: index, %gen_arg2: index, %gen_arg3: index, %gen_arg4: index): // no predecessors
				linalg.yield %cst : f32
				} : tensor<4x?x2x?xf32> to tensor<4x?x?x?xf32>
				return %out : tensor<4x?x?x?xf32>
				}
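
The CHECK lines in both functions above follow one shape rule: each result dimension is the low padding plus the source size plus the high padding (28 + 2 + 2 = 32 in the static case), and the low padding is reused as the `tensor.insert_slice` offset. A minimal C++ sketch of that arithmetic, assuming static sizes only; `paddedShape` is a hypothetical helper written for illustration and is not part of MLIR or of this patch:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not an MLIR API): result shape of a pad with
// static source sizes and static low/high padding amounts.
std::vector<int64_t> paddedShape(const std::vector<int64_t> &source,
                                 const std::vector<int64_t> &low,
                                 const std::vector<int64_t> &high) {
  // All three lists must have one entry per tensor dimension.
  assert(source.size() == low.size() && source.size() == high.size());
  std::vector<int64_t> result(source.size());
  for (std::size_t i = 0; i < source.size(); ++i)
    result[i] = low[i] + source[i] + high[i];
  return result;
}
```

Calling `paddedShape({1, 28, 28, 1}, {0, 2, 2, 0}, {0, 2, 2, 0})` gives the `1x32x32x1` shape expected by the static test. The dynamic test computes the same sums at runtime with `tensor.dim` and `addi`.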

mlir/test/Integration/Dialect/Linalg/CPU/test-padtensor.mlir

This file was added.

				// RUN: mlir-opt %s -linalg-bufferize -std-bufferize \
				// RUN: -tensor-constant-bufferize -tensor-bufferize -func-bufferize \
				// RUN: -finalizing-bufferize \
				// RUN: -convert-linalg-to-loops -convert-scf-to-std -convert-linalg-to-llvm -convert-std-to-llvm | \
				// RUN: mlir-cpu-runner -e main -entry-point-result=void \
				// RUN: -shared-libs=%mlir_integration_test_dir/libmlir_runner_utils%shlibext \
				// RUN: | FileCheck %s


				func @main() {
				%const = constant dense<[[[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]]> : tensor<1x2x3xf32>
				%dynamic = tensor.cast %const: tensor<1x2x3xf32> to tensor<1x?x3xf32>
				%offset = constant 2 : index
				%cst = constant 2.3 : f32
				%c0 = constant 0 : index
				%out = linalg.pad_tensor %dynamic low[%c0, %offset, %c0] high[%c0, %c0, %offset] {
				^bb0(%gen_arg1: index, %gen_arg2: index, %gen_arg3: index): // no predecessors
				linalg.yield %cst : f32
				} : tensor<1x?x3xf32> to tensor<1x?x?xf32>
				%unranked = tensor.cast %out: tensor<1x?x?xf32> to tensor<*xf32>
				call @print_memref_f32(%unranked) : (tensor<*xf32>) -> ()

				// CHECK: Unranked Memref base@ = {{0x[-9a-f]*}}
				// CHECK-SAME: rank = 3 offset = 0 sizes = [1, 4, 5] strides = [20, 5, 1] data =
				// CHECK-NEXT{LITERAL}: [[[2.3, 2.3, 2.3, 2.3, 2.3],
				// CHECK-NEXT: [2.3, 2.3, 2.3, 2.3, 2.3],
				// CHECK-NEXT: [1, 2, 3, 2.3, 2.3],
				// CHECK-NEXT: [2, 3, 4, 2.3, 2.3]]]

				return
				}

				func private @print_memref_f32(%ptr : tensor<*xf32>)
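
The expected output of this integration test can be reproduced by hand with the same fill-then-insert strategy that the summary describes. The following is a rough C++ sketch under stated assumptions: `padRank3` is a hypothetical helper that works on a row-major rank-3 buffer, and it only imitates the InitTensorOp, FillOp and InsertSliceOp sequence rather than showing the actual lowering.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the MLIR lowering): pads a row-major rank-3
// buffer by filling the result with padValue and then copying the source
// into it at the offsets given by `low`.
std::vector<float> padRank3(const std::vector<float> &src,
                            const std::array<std::size_t, 3> &srcShape,
                            const std::array<std::size_t, 3> &low,
                            const std::array<std::size_t, 3> &high,
                            float padValue) {
  assert(src.size() == srcShape[0] * srcShape[1] * srcShape[2]);

  // Result size per dimension: low + source + high.
  std::array<std::size_t, 3> outShape;
  for (std::size_t d = 0; d < 3; ++d)
    outShape[d] = low[d] + srcShape[d] + high[d];

  // Counterpart of init_tensor + fill: every element starts as padValue.
  std::vector<float> out(outShape[0] * outShape[1] * outShape[2], padValue);

  // Counterpart of insert_slice: copy the source at offset `low`, stride 1.
  for (std::size_t i = 0; i < srcShape[0]; ++i)
    for (std::size_t j = 0; j < srcShape[1]; ++j)
      for (std::size_t k = 0; k < srcShape[2]; ++k) {
        std::size_t dst = ((i + low[0]) * outShape[1] + (j + low[1])) *
                              outShape[2] + (k + low[2]);
        out[dst] = src[(i * srcShape[1] + j) * srcShape[2] + k];
      }
  return out;
}
```

With the test's input (`1x2x3` source, low `[0, 2, 0]`, high `[0, 0, 2]`, pad value `2.3`), the sketch yields a `1x4x5` buffer whose first two rows are all padding and whose last two rows hold the source values followed by two padding elements, matching the CHECK lines.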

mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp

Option<bool> testTileAndPadPattern{
llvm::cl::desc("Test tile and pad pattern"), llvm::cl::init(false)};
Option<int> testHoistPadding{*this, "test-hoist-padding",
llvm::cl::desc("Test hoist padding"),
llvm::cl::init(0)};
Option<bool> testTransformPadTensor{
*this, "test-transform-pad-tensor",
llvm::cl::desc("Test transform pad tensor by copying with generic ops"),
llvm::cl::init(false)};
		Option<bool> testGeneralizePadTensor{
		*this, "test-generalize-pad-tensor",
		llvm::cl::desc("Test transform pad tensor by copying with generic ops"),
		llvm::cl::init(false)};
Option<bool> testSwapSubTensorPadTensor{
*this, "test-swap-subtensor-padtensor",
llvm::cl::desc("Test rewrite of subtensor(pad_tensor) into "
"pad_tensor(subtensor)"),
llvm::cl::init(false)};
ListOption<int64_t> tileSizesForPadding{
*this, "tile-sizes-for-padding",
llvm::cl::desc("Linalg tile sizes when tile+pad"), llvm::cl::ZeroOrMore,
}

static void applyPadTensorToGenericPatterns(FuncOp funcOp) {
RewritePatternSet patterns(funcOp.getContext());
patterns.add<PadTensorOpTransformationPattern>(funcOp.getContext());
(void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
}

		static void applyGeneralizePadTensorPatterns(FuncOp funcOp) {
		RewritePatternSet patterns(funcOp.getContext());
		patterns.add<GeneralizePadTensorOpPattern>(funcOp.getContext());
		(void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
		}

static void applyExtractSliceOfPadTensorSwapPattern(FuncOp funcOp) {
RewritePatternSet patterns(funcOp.getContext());
patterns.add<ExtractSliceOfPadTensorSwapPattern>(funcOp.getContext());
(void)applyPatternsAndFoldGreedily(funcOp, std::move(patterns));
}

static void applyAffineMinSCFCanonicalizationPatterns(FuncOp funcOp) {
RewritePatternSet foldPattern(funcOp.getContext());
return applyMatmulToVectorPatterns(getFunction(),
testMatmulToVectorPatterns1dTiling,
testMatmulToVectorPatterns2dTiling);
if (testVectorTransferForwardingPatterns)
return applyVectorTransferForwardingPatterns(getFunction());
if (testGenericToVectorPattern)
return applyLinalgToVectorPatterns(getFunction());
if (testTransformPadTensor)
return applyPadTensorToGenericPatterns(getFunction());
		if (testGeneralizePadTensor)
		return applyGeneralizePadTensorPatterns(getFunction());
if (testSwapSubTensorPadTensor)
return applyExtractSliceOfPadTensorSwapPattern(getFunction());
if (testAffineMinSCFCanonicalizationPatterns)
return applyAffineMinSCFCanonicalizationPatterns(getFunction());
if (testTileAndPadPattern)
return applyTileAndPadPattern(getFunction(), tileSizesForPadding);
if (testHoistPadding) {
getFunction().walk([&](linalg::PadTensorOp padTensorOp) {
