This is an archive of the discontinued LLVM Phabricator instance.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
350	If it's `isElementwise && isMinorIdentity`, we still need to broadcast. Can we push this down to merge the two paths by inserting transposes if `!isMinorIdentity`?

ThomasRaoux added inline comments. Mar 12 2021, 11:10 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
350	> If it's `isElementwise && isMinorIdentity`, we still need to broadcast. Can we push this down to merge the two paths by inserting transposes if `!isMinorIdentity`?

Correct, I was considering merging those two cases. That would slightly simplify the vectorization; however, it could generate less efficient code: right now the broadcasting is done lazily, so if some operations don't need to be expanded on all the output dimensions, it may pick a vector of smaller rank. I'm not sure how likely this case is, so if people think it is not worth the extra complexity I can merge those two cases. Note that this could also be optimized away by canonicalizations later on. I don't have a strong opinion, so I can change it based on what you, Nicolas, and Aart think.

asaadaldien accepted this revision. Mar 12 2021, 12:52 PM

asaadaldien added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
276	Nit: Add a comment about broadcasting into the innermost dims.
350	If one of the operands can be vectorized at a lower rank, that means all its consumers are at most at the same lower rank. In other words, we can split the linalg.generic into two independent ops operating on the lower-rank and higher-rank domains. IMO we can keep it that way; this is more general and the added complexity isn't much =) Let's wait for @nicolasvasilache's and @aartbik's opinions.

This revision is now accepted and ready to land. Mar 12 2021, 12:52 PM

The vector.transfer_read / write operations support expressing the permutation and broadcast semantics as part of the permutation_map.
The transformation you are doing here looks like the kind of more general stuff needed to rewrite that permutation_map into vector.transpose + broadcast.

The tradeoff is that vector.transfer + permutation can lower better as loop + memory indexing reordering.
Introducing transpose + broadcast at the time of Linalg -> Vector is probably lowering too quickly.
OTOH we could also think of implementing some of this as canonicalizations of vector.transfer + transpose + broadcast.

This also has to be thought of in the context of vector.shape_cast and future scalable vectors.

So bottom line: I am conflicted.
This looks more generally applicable, and writing it in 2 steps, 1) Linalg -> vector.transfer + permutation followed by 2) better support of permutation_map when lowering vector.transfer, better satisfies my intuitions.
OTOH it is also possible that the permutation_map in vector.transfer is more complicated to use, but this has to be weighed against the alternative.

Can you put some extra time to chat on my calendar tomorrow so we can go a bit deeper with higher bandwidth?
I'll put a blocker in the meantime because this sits at the intersection of a bunch of different things.
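
To make the permutation and broadcast semantics of permutation_map concrete, here is a minimal sketch in plain C++ with no MLIR dependency. The names `Buffer`, `kBroadcast`, `linearize`, and `transferRead` are hypothetical and exist only for this illustration. It assumes all base indices are 0, as in the patch, and it ignores masking and out-of-bounds padding.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marker for a permutation_map result that is the constant 0,
// i.e. a vector dimension along which the value is broadcast.
constexpr int kBroadcast = -1;

// Hypothetical stand-in for a memref or vector: a shape plus row-major data.
struct Buffer {
  std::vector<size_t> shape;
  std::vector<float> data;
};

// Row-major offset of `idx` within `shape`.
size_t linearize(const std::vector<size_t> &shape,
                 const std::vector<size_t> &idx) {
  size_t offset = 0;
  for (size_t d = 0; d < shape.size(); ++d)
    offset = offset * shape[d] + idx[d];
  return offset;
}

// Models a read at base indices 0: vector dimension k walks source dimension
// permMap[k], or repeats the same element when permMap[k] == kBroadcast.
Buffer transferRead(const Buffer &src, const std::vector<size_t> &vecShape,
                    const std::vector<int> &permMap) {
  assert(permMap.size() == vecShape.size() && "one map result per vector dim");
  size_t total = 1;
  for (size_t s : vecShape)
    total *= s;
  Buffer result{vecShape, std::vector<float>(total)};
  std::vector<size_t> vecIdx(vecShape.size(), 0);
  for (size_t linear = 0; linear < total; ++linear) {
    // Delinearize the flat position into a multi-dimensional vector index.
    size_t rem = linear;
    for (size_t k = vecShape.size(); k-- > 0;) {
      vecIdx[k] = rem % vecShape[k];
      rem /= vecShape[k];
    }
    // Source dimensions that no map result mentions stay at index 0.
    std::vector<size_t> srcIdx(src.shape.size(), 0);
    for (size_t k = 0; k < permMap.size(); ++k) {
      if (permMap[k] == kBroadcast)
        continue;
      size_t dim = static_cast<size_t>(permMap[k]);
      assert(dim < src.shape.size() && vecShape[k] <= src.shape[dim]);
      srcIdx[dim] = vecIdx[k];
    }
    result.data[linear] = src.data[linearize(src.shape, srcIdx)];
  }
  return result;
}
```

With a 2x3 source and the map `{1, 0, kBroadcast}`, which corresponds to `(d0, d1) -> (d1, d0, 0)`, reading into a 3x2x2 vector transposes the two source dimensions and repeats each element along the last vector dimension.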

mlir/test/Dialect/Linalg/vectorization.mlir
346	`args_in/out` attributes are not needed anymore

This revision now requires changes to proceed. Mar 15 2021, 3:04 AM

Use transfer_read maps instead of emitting directly broadcast/transpose

Herald added a reviewer: rriddle. Mar 16 2021, 2:15 PM

As discussed, I switched to using an affine map in the vector transfer instead. Once we agree on this review, I'll prepare another patch to lower those transfer ops with affine maps.
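
For readers following along, the map computation that `getTransferReadMap` performs in the diff can be sketched without MLIR. This is only an illustration under stated assumptions: `ProjectedPermutation`, `kBroadcast`, `transferReadMap`, and `toString` are hypothetical names, and the input is assumed to be a projected permutation, meaning every result is a distinct loop dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a linalg indexing map whose results are all plain
// dimensions: results[i] is the loop dimension used by result i.
struct ProjectedPermutation {
  unsigned numDims;              // number of loop dimensions (map inputs)
  std::vector<unsigned> results; // (d0, d1, d2, d3) -> (d1, d0) is {1, 0}
};

// Hypothetical marker for a result that is the constant 0 (broadcast).
constexpr int kBroadcast = -1;

// Follows the loop in getTransferReadMap: start with every result set to the
// constant 0, then place input dimension i at the loop position it indexes.
std::vector<int> transferReadMap(const ProjectedPermutation &linalgMap) {
  std::vector<int> exprs(linalgMap.numDims, kBroadcast);
  for (size_t i = 0; i < linalgMap.results.size(); ++i) {
    assert(linalgMap.results[i] < linalgMap.numDims && "dim out of range");
    exprs[linalgMap.results[i]] = static_cast<int>(i);
  }
  return exprs;
}

// Prints the results in affine_map-like notation, e.g. "(d1, d0, 0, 0)".
std::string toString(const std::vector<int> &exprs) {
  std::string out = "(";
  for (size_t i = 0; i < exprs.size(); ++i) {
    if (i != 0)
      out += ", ";
    if (exprs[i] == kBroadcast)
      out += "0";
    else
      out += "d" + std::to_string(exprs[i]);
  }
  return out + ")";
}
```

Applied to the three input maps of the `vectorization_transpose` test, `{1, 0}`, `{3, 1}`, and `{3, 1, 0, 2}` over four loop dimensions, this yields `(d1, d0, 0, 0)`, `(0, d1, 0, d0)`, and `(d2, d1, d3, d0)`, which agree with the `MAP0`, `MAP1`, and `MAP2` CHECK lines.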

Harbormaster completed remote builds in B94127: Diff 331107. Mar 16 2021, 3:19 PM

nicolasvasilache accepted this revision. Mar 18 2021, 10:12 AM

nicolasvasilache added inline comments.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
258	`//` -> `///`
275	`//` -> `///`

This revision is now accepted and ready to land. Mar 18 2021, 10:12 AM

This revision was landed with ongoing or failed builds. Mar 18 2021, 12:33 PM

Closed by commit rG16947650d5ca: [mlir][linalg] Extend linalg vectorization to support non-identity input maps (authored by ThomasRaoux).

This revision was automatically updated to reflect the committed changes.

ThomasRaoux added a commit: rG16947650d5ca: [mlir][linalg] Extend linalg vectorization to support non-identity input maps.

Revision Contents

Path	Size
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp	104 lines
mlir/lib/Dialect/Vector/VectorOps.cpp	3 lines
mlir/lib/IR/AffineMap.cpp	5 lines
mlir/test/Dialect/Linalg/vectorization.mlir	36 lines

Diff 331657

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

[...]
static VectorType extractVectorTypeFromShapedValue(Value v) {
[...]
if (st.isa<MemRefType>() && st.getShape().empty())		if (st.isa<MemRefType>() && st.getShape().empty())
return VectorType();		return VectorType();
return VectorType::get(st.getShape(), st.getElementType());		return VectorType::get(st.getShape(), st.getElementType());
}		}

/// Build a vector.transfer_read from `source` at indices set to all `0`.		/// Build a vector.transfer_read from `source` at indices set to all `0`.
/// If source has rank zero, build an memref.load.		/// If source has rank zero, build an memref.load.
/// Return the produced value.		/// Return the produced value.
static Value buildVectorRead(OpBuilder &builder, Value source) {		static Value buildVectorRead(OpBuilder &builder, Value source,
		VectorType vectorType, AffineMap map) {
edsc::ScopedContext scope(builder);		edsc::ScopedContext scope(builder);
auto shapedType = source.getType().cast<ShapedType>();		auto shapedType = source.getType().cast<ShapedType>();
if (VectorType vectorType = extractVectorTypeFromShapedValue(source)) {		if (vectorType) {
SmallVector<Value> indices(shapedType.getRank(), std_constant_index(0));		SmallVector<Value> indices(shapedType.getRank(), std_constant_index(0));
		if (map)
		return vector_transfer_read(vectorType, source, indices, map);
return vector_transfer_read(vectorType, source, indices);		return vector_transfer_read(vectorType, source, indices);
}		}
return memref_load(source);		return memref_load(source);
}		}

/// Build a vector.transfer_write of `value` into `dest` at indices set to all		/// Build a vector.transfer_write of `value` into `dest` at indices set to all
/// `0`. If `dest` has null rank, build an memref.store.		/// `0`. If `dest` has null rank, build an memref.store.
/// Return the produced value or null if no value is produced.		/// Return the produced value or null if no value is produced.
[...]
vectorizeOneOp(OpBuilder &builder, Operation *op,
[...]
OperationState state(op->getLoc(), op->getName());		OperationState state(op->getLoc(), op->getName());
state.addAttributes(op->getAttrs());		state.addAttributes(op->getAttrs());
state.addOperands(llvm::to_vector<4>(vectorizedOperands));		state.addOperands(llvm::to_vector<4>(vectorizedOperands));
state.addTypes(llvm::to_vector<4>(returnTypes));		state.addTypes(llvm::to_vector<4>(returnTypes));
return VectorizationResult{VectorizationStatus::NewOp,		return VectorizationResult{VectorizationStatus::NewOp,
builder.createOperation(state)};		builder.createOperation(state)};
}		}

		/// Detect whether `r` has only ConstantOp, ElementwiseMappable and YieldOp.
		static bool hasOnlyScalarElementwiseOp(Region &r) {
		if (!llvm::hasSingleElement(r))
		return false;
		for (Operation &op : r.front()) {
		if (!(isa<ConstantOp, linalg::YieldOp>(op) ||
		OpTrait::hasElementwiseMappableTraits(&op)) ||
		llvm::any_of(op.getResultTypes(),
		[](Type type) { return !type.isIntOrIndexOrFloat(); }))
		return false;
		}
		return true;
		}

		// Return true if the op is an element-wise linalg op.
		static bool isElementwise(Operation *op) {
		auto linalgOp = dyn_cast<linalg::LinalgOp>(op);
		if (!linalgOp)
		return false;
		if (linalgOp.getNumLoops() != linalgOp.getNumParallelLoops())
		return false;
		// TODO: relax the restrictions on indexing map.
		for (unsigned i = 0, e = linalgOp.getNumOutputs(); i < e; i++) {
		if (!linalgOp.getOutputIndexingMap(i).isIdentity())
		return false;
		}
		if (linalgOp->getNumRegions() != 1)
		return false;
		return hasOnlyScalarElementwiseOp(linalgOp->getRegion(0));
		}

		// Calculate the map to apply to transfer_read to convert the input shape into
		// the output shape.
		static AffineMap getTransferReadMap(LinalgOp linalgOp, unsigned argIndex) {
		AffineMap linalgMap = linalgOp.getIndexingMap(argIndex);
		MLIRContext *context = linalgMap.getContext();
		AffineExpr zero = mlir::getAffineConstantExpr(0, context);
		SmallVector<AffineExpr, 4> exprs(linalgMap.getNumInputs(), zero);
		for (unsigned i : llvm::seq(unsigned(0), linalgMap.getNumResults())) {
		exprs[linalgMap.getDimPosition(i)] = getAffineDimExpr(i, context);
		}
		return AffineMap::get(linalgMap.getNumResults(), /*symbolCount=*/0, exprs,
		context);
		}

/// Generic vectorization function that rewrites the body of a `linalgOp` into		/// Generic vectorization function that rewrites the body of a `linalgOp` into
/// vector form. Generic vectorization proceeds as follows:		/// vector form. Generic vectorization proceeds as follows:
/// 1. The region for the linalg op is created if necessary.		/// 1. The region for the linalg op is created if necessary.
/// 2. Values defined above the region are mapped to themselves and will be		/// 2. Values defined above the region are mapped to themselves and will be
/// broadcasted on a per-need basis by their consumers.		/// broadcasted on a per-need basis by their consumers.
/// 3. Each region argument is vectorized into a vector.transfer_read (or 0-d		/// 3. Each region argument is vectorized into a vector.transfer_read (or 0-d
/// load).		/// load).
/// TODO: Reuse opportunities for RAR dependencies.		/// TODO: Reuse opportunities for RAR dependencies.
[...]
LogicalResult vectorizeAsLinalgGeneric(
[...]
llvm::SetVector<Value> valuesSet;		llvm::SetVector<Value> valuesSet;
mlir::getUsedValuesDefinedAbove(*region, valuesSet);		mlir::getUsedValuesDefinedAbove(*region, valuesSet);
bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());		bvm.map(valuesSet.getArrayRef(), valuesSet.getArrayRef());

// 3. Turn all BBArgs into vector.transfer_read / load.		// 3. Turn all BBArgs into vector.transfer_read / load.
SmallVector<AffineMap> indexings;		SmallVector<AffineMap> indexings;
for (auto bbarg : block->getArguments()) {		for (auto bbarg : block->getArguments()) {
Value vectorArg = linalgOp.getShapedOperand(bbarg.getArgNumber());		Value vectorArg = linalgOp.getShapedOperand(bbarg.getArgNumber());
Value vectorRead = buildVectorRead(builder, vectorArg);		AffineMap map;
		VectorType vectorType = extractVectorTypeFromShapedValue(vectorArg);
		if (isElementwise(linalgOp) &&
		!linalgOp.getIndexingMap(bbarg.getArgNumber()).isMinorIdentity()) {
		// Currently assume we don't support output permutations.
		assert(linalgOp.getNumOutputs() > 0 &&
		linalgOp.getOutputIndexingMap(0).isIdentity());
		ArrayRef<int64_t> outputShape =
		linalgOp.getOutputShapedType(0).getShape();
		vectorType = VectorType::get(outputShape, vectorType.getElementType());
		map = getTransferReadMap(linalgOp, bbarg.getArgNumber());
		}
		Value vectorRead = buildVectorRead(builder, vectorArg, vectorType, map);
LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: new vectorized bbarg("		LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: new vectorized bbarg("
<< bbarg.getArgNumber() << "): " << vectorRead);		<< bbarg.getArgNumber() << "): " << vectorRead);
bvm.map(bbarg, vectorRead);		bvm.map(bbarg, vectorRead);
bvm.map(vectorArg, vectorRead);		bvm.map(vectorArg, vectorRead);
}		}

// 4. Register CustomVectorizationHook for yieldOp.		// 4. Register CustomVectorizationHook for yieldOp.
CustomVectorizationHook vectorizeYield =		CustomVectorizationHook vectorizeYield =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeLinalgYield(builder, op, bvm, linalgOp, newResults);		return vectorizeLinalgYield(builder, op, bvm, linalgOp, newResults);
};		};
// Append the vectorizeYield hook.		// Append the vectorizeYield hook.
[...]
if (result.status == VectorizationStatus::NewOp) {
[...]
<< *result.newOp;);		<< *result.newOp;);
bvm.map(op.getResults(), result.newOp->getResults());		bvm.map(op.getResults(), result.newOp->getResults());
}		}
}		}

return success();		return success();
}		}

/// Detect whether `r` has only ConstantOp, ElementwiseMappable and YieldOp.
static bool hasOnlyScalarElementwiseOp(Region &r) {
if (!llvm::hasSingleElement(r))
return false;
for (Operation &op : r.front()) {
if (!(isa<ConstantOp, linalg::YieldOp>(op) ||
OpTrait::hasElementwiseMappableTraits(&op)) ||
llvm::any_of(op.getResultTypes(),
[](Type type) { return !type.isIntOrIndexOrFloat(); }))
return false;
}
return true;
}

// Return true if the op is an element-wise linalg op.
static bool isElementwise(Operation *op) {
auto linalgOp = dyn_cast<linalg::LinalgOp>(op);
if (!linalgOp)
return false;
if (linalgOp.getNumLoops() != linalgOp.getNumParallelLoops())
return false;
// TODO: relax the restrictions on indexing map.
for (unsigned i = 0, e = linalgOp.getNumOutputs(); i < e; i++) {
if (!linalgOp.getOutputIndexingMap(i).isIdentity())
return false;
}
// Currently bound the input indexing map to minor identity as other
// permutations might require adding transpose ops to convert the vector read
// to the right shape.
for (unsigned i = 0, e = linalgOp.getNumInputs(); i < e; i++) {
if (!linalgOp.getInputIndexingMap(i).isMinorIdentity())
return false;
}
if (linalgOp->getNumRegions() != 1)
return false;
return hasOnlyScalarElementwiseOp(linalgOp->getRegion(0));
}

static LogicalResult vectorizeContraction(OpBuilder &builder, LinalgOp linalgOp,		static LogicalResult vectorizeContraction(OpBuilder &builder, LinalgOp linalgOp,
SmallVectorImpl<Value> &newResults) {		SmallVectorImpl<Value> &newResults) {
assert(isaContractionOpInterface(linalgOp) &&		assert(isaContractionOpInterface(linalgOp) &&
"expected vectorizeContraction preconditions to be met");		"expected vectorizeContraction preconditions to be met");
Location loc = linalgOp.getLoc();		Location loc = linalgOp.getLoc();
// Vectorize other ops as vector contraction.		// Vectorize other ops as vector contraction.
// TODO: interface.		// TODO: interface.
LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: "		LLVM_DEBUG(dbgs() << "\n[" DEBUG_TYPE "]: "
[...]

mlir/lib/Dialect/Vector/VectorOps.cpp

[...]
void TransferReadOp::build(OpBuilder &builder, OperationState &result,
[...]
ValueRange indices, ArrayRef<bool> maybeMasked) {		ValueRange indices, ArrayRef<bool> maybeMasked) {
auto permMap = getTransferMinorIdentityMap(		auto permMap = getTransferMinorIdentityMap(
source.getType().cast<ShapedType>(), vectorType);		source.getType().cast<ShapedType>(), vectorType);
build(builder, result, vectorType, source, indices, permMap, maybeMasked);		build(builder, result, vectorType, source, indices, permMap, maybeMasked);
}		}

static void printTransferAttrs(OpAsmPrinter &p, VectorTransferOpInterface op) {		static void printTransferAttrs(OpAsmPrinter &p, VectorTransferOpInterface op) {
SmallVector<StringRef, 2> elidedAttrs;		SmallVector<StringRef, 2> elidedAttrs;
if (op.permutation_map() ==		if (op.permutation_map().isMinorIdentity())
getTransferMinorIdentityMap(op.getShapedType(), op.getVectorType()))
elidedAttrs.push_back(op.getPermutationMapAttrName());		elidedAttrs.push_back(op.getPermutationMapAttrName());
bool elideMasked = true;		bool elideMasked = true;
if (auto maybeMasked = op.masked()) {		if (auto maybeMasked = op.masked()) {
for (auto attr : *maybeMasked) {		for (auto attr : *maybeMasked) {
if (!attr.template cast<BoolAttr>().getValue()) {		if (!attr.template cast<BoolAttr>().getValue()) {
elideMasked = false;		elideMasked = false;
break;		break;
}		}
[...]

mlir/lib/IR/AffineMap.cpp

[...]
	AffineMap AffineMap::getMinorIdentityMap(unsigned dims, unsigned results,			AffineMap AffineMap::getMinorIdentityMap(unsigned dims, unsigned results,
	MLIRContext *context) {			MLIRContext *context) {
	assert(dims >= results && "Dimension mismatch");			assert(dims >= results && "Dimension mismatch");
	auto id = AffineMap::getMultiDimIdentityMap(dims, context);			auto id = AffineMap::getMultiDimIdentityMap(dims, context);
	return AffineMap::get(dims, 0, id.getResults().take_back(results), context);			return AffineMap::get(dims, 0, id.getResults().take_back(results), context);
	}			}

	bool AffineMap::isMinorIdentity() const {			bool AffineMap::isMinorIdentity() const {
	return *this ==			return getNumDims() >= getNumResults() &&
				*this ==
	getMinorIdentityMap(getNumDims(), getNumResults(), getContext());			getMinorIdentityMap(getNumDims(), getNumResults(), getContext());
	}			}

	/// Returns true if this affine map is a minor identity up to broadcasted			/// Returns true if this affine map is a minor identity up to broadcasted
	/// dimensions which are indicated by value 0 in the result.			/// dimensions which are indicated by value 0 in the result.
	bool AffineMap::isMinorIdentityWithBroadcasting(			bool AffineMap::isMinorIdentityWithBroadcasting(
	SmallVectorImpl<unsigned> *broadcastedDims) const {			SmallVectorImpl<unsigned> *broadcastedDims) const {
	if (broadcastedDims)			if (broadcastedDims)
	broadcastedDims->clear();			broadcastedDims->clear();
[...]
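
The `isMinorIdentity` change above adds a rank guard before the comparison. A plausible reading, stated here as an assumption, is that `getMinorIdentityMap` asserts `dims >= results`, so a map with more results than dimensions (such as the broadcasting transfer maps this patch produces) must be rejected before that call. The sketch below models the check in plain C++; `DimMap` and this free-function `isMinorIdentity` are hypothetical stand-ins, not the MLIR API.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an affine map whose results are either a
// dimension position (>= 0) or the constant 0, encoded here as -1.
struct DimMap {
  unsigned numDims;
  std::vector<int> results;
};

// A minor identity maps to the trailing dimensions in order, for example
// (d0, d1, d2, d3) -> (d2, d3). The rank guard comes first so that maps with
// more results than dimensions are rejected instead of being compared.
bool isMinorIdentity(const DimMap &map) {
  const size_t numResults = map.results.size();
  if (map.numDims < numResults)
    return false;
  const size_t firstDim = map.numDims - numResults;
  for (size_t i = 0; i < numResults; ++i) {
    if (map.results[i] != static_cast<int>(firstDim + i))
      return false;
  }
  return true;
}
```

Under this model, `(d0, d1) -> (d1, d0, 0, 0)` is rejected by the guard alone, while `(d0, d1, d2, d3) -> (d2, d3)` passes both the guard and the comparison.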

mlir/test/Dialect/Linalg/vectorization.mlir

[...]
func @generic_vectorize_tensor(%arg0: tensor<4x256xf32>,
[...]
return %r#0, %r#1, %r#2, %r#3, %r#4, %r#5, %r#6, %r#7, %r#8, %r#9:		return %r#0, %r#1, %r#2, %r#3, %r#4, %r#5, %r#6, %r#7, %r#8, %r#9:
tensor<4x256xf32>, tensor<4x256xf32>, tensor<4x256xf32>,		tensor<4x256xf32>, tensor<4x256xf32>, tensor<4x256xf32>,
tensor<4x256xf32>, tensor<4x256xf32>, tensor<4x256xf32>, tensor<4x256xf32>,		tensor<4x256xf32>, tensor<4x256xf32>, tensor<4x256xf32>, tensor<4x256xf32>,
tensor<4x256xf32>, tensor<4x256xf32>, tensor<4x256xf32>		tensor<4x256xf32>, tensor<4x256xf32>, tensor<4x256xf32>
}		}

// -----		// -----

		// Test different input maps.
		#matmul_trait = {
		indexing_maps = [
		affine_map<(d0, d1, d2, d3) -> (d1, d0)>,
		affine_map<(d0, d1, d2, d3) -> (d3, d1)>,
		affine_map<(d0, d1, d2, d3) -> (d3, d1, d0, d2)>,
		affine_map<(d0, d1, d2, d3) -> (d0, d1, d2, d3)>
		],
		iterator_types = ["parallel", "parallel", "parallel", "parallel"]
		}

		// CHECK-DAG: #[[MAP0:.*]] = affine_map<(d0, d1) -> (d1, d0, 0, 0)>
		// CHECK-DAG: #[[MAP1:.*]] = affine_map<(d0, d1) -> (0, d1, 0, d0)>
		// CHECK-DAG: #[[MAP2:.*]] = affine_map<(d0, d1, d2, d3) -> (d2, d1, d3, d0)>
		// CHECK: func @vectorization_transpose
		// CHECK: vector.transfer_read {{.*}}{permutation_map = #[[MAP0]]} : memref<14x7xf32>, vector<7x14x8x16xf32>
		// CHECK: vector.transfer_read {{.*}}{permutation_map = #[[MAP1]]} : memref<16x14xf32>, vector<7x14x8x16xf32>
		// CHECK: vector.transfer_read {{.*}}{permutation_map = #[[MAP2]]} : memref<16x14x7x8xf32>, vector<7x14x8x16xf32>
		// CHECK: addf {{.*}} : vector<7x14x8x16xf32>
		// CHECK: addf {{.*}} : vector<7x14x8x16xf32>
		// CHECK: vector.transfer_write {{.*}} : vector<7x14x8x16xf32>, memref<7x14x8x16xf32>
		func @vectorization_transpose(%A: memref<14x7xf32>, %B: memref<16x14xf32>,
		%C: memref<16x14x7x8xf32>, %D: memref<7x14x8x16xf32>) {
		linalg.generic #matmul_trait
		ins(%A, %B, %C : memref<14x7xf32>, memref<16x14xf32>, memref<16x14x7x8xf32>)
		outs(%D : memref<7x14x8x16xf32>) {
		^bb(%a: f32, %b: f32, %c: f32, %d: f32) :
		%e = addf %a, %b: f32
		%f = addf %e, %c: f32
		linalg.yield %f : f32
		}
		return
		}

		// -----

// CHECK-LABEL: func @matmul_tensors		// CHECK-LABEL: func @matmul_tensors
// CHECK-SAME: (%[[ARG0:.*]]: tensor<8x4xf32>, %[[ARG1:.*]]: tensor<4x12xf32>,		// CHECK-SAME: (%[[ARG0:.*]]: tensor<8x4xf32>, %[[ARG1:.*]]: tensor<4x12xf32>,
// CHECK-SAME: %[[ARG2:.*]]: tensor<8x12xf32>) -> tensor<8x12xf32>		// CHECK-SAME: %[[ARG2:.*]]: tensor<8x12xf32>) -> tensor<8x12xf32>
func @matmul_tensors(		func @matmul_tensors(
%arg0: tensor<8x4xf32>, %arg1: tensor<4x12xf32>, %arg2: tensor<8x12xf32>)		%arg0: tensor<8x4xf32>, %arg1: tensor<4x12xf32>, %arg2: tensor<8x12xf32>)
-> tensor<8x12xf32> {		-> tensor<8x12xf32> {
// CHECK-DAG: %[[C0:.*]] = constant 0 : index		// CHECK-DAG: %[[C0:.*]] = constant 0 : index
// CHECK-DAG: %[[VEC_C0:.*]] = constant dense<0.000000e+00> : vector<8x12xf32>		// CHECK-DAG: %[[VEC_C0:.*]] = constant dense<0.000000e+00> : vector<8x12xf32>
[...]

[mlir][linalg] Extend linalg vectorization to support non-identity input maps (Closed, Public)