This is an archive of the discontinued LLVM Phabricator instance.

Differential D111825

[mlir][NFC] Refactor linalg vectorization for reduction ops
ClosedPublic

Authored by ThomasRaoux on Oct 14 2021, 10:47 AM.

Details

Reviewers

nicolasvasilache
dcaballe
aartbik
mravishankar
pifon2a

Commits

rGafad0cdf31e8: [mlir][vector] Refactor linalg vectorization for reductions

Summary

Emit the reduction during op vectorization instead of when creating the transfer write. This allows us to avoid broadcasting the output arguments that serve as the reduction's initial value.
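To make the summary concrete, here is a minimal standalone sketch in plain C++ with no MLIR dependencies. The names `reductionMask` and `reduceThenCombine` are hypothetical and are not part of the patch; the sketch only models the idea of reducing first and then combining with an initial value that already has the reduced shape, so no broadcast of the output argument is needed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for linalg iterator types ("parallel" / "reduction").
// Marks each reduction dimension with `true`, in the spirit of a reduction mask.
std::vector<bool> reductionMask(const std::vector<std::string> &iteratorTypes) {
  std::vector<bool> mask(iteratorTypes.size(), false);
  for (std::size_t i = 0; i < iteratorTypes.size(); ++i)
    if (iteratorTypes[i] == "reduction")
      mask[i] = true;
  return mask;
}

// Sum a rows x cols value along its last dimension, then add the initial
// value. `init` keeps its own (already reduced) shape: it is never broadcast
// to the rows x cols shape.
std::vector<float>
reduceThenCombine(const std::vector<std::vector<float>> &value,
                  const std::vector<float> &init) {
  assert(value.size() == init.size() && "init must have the reduced shape");
  std::vector<float> result(init);
  for (std::size_t i = 0; i < value.size(); ++i) {
    float sum = 0.0f;
    for (float v : value[i])
      sum += v;
    result[i] += sum;
  }
  return result;
}
```

In this model the initial value is read at its own shape and combined once after the reduction, which corresponds to the ordering the summary describes.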

Diff Detail

Event Timeline

ThomasRaoux created this revision. Oct 14 2021, 10:47 AM

Herald added a reviewer: aartbik. Oct 14 2021, 10:47 AM

Herald added a reviewer: mravishankar.

Herald added subscribers: wenzhicui, wrengr, Chia-hungDuan and 19 others.

ThomasRaoux requested review of this revision. Oct 14 2021, 10:47 AM

Herald added a project: Restricted Project. Oct 14 2021, 10:47 AM

Herald added subscribers: limo1996, stephenneuendorffer.

Harbormaster completed remote builds in B128902: Diff 379766. Oct 14 2021, 10:48 AM

pifon2a accepted this revision. Oct 14 2021, 11:20 AM

This revision is now accepted and ready to land. Oct 14 2021, 11:20 AM

nicolasvasilache added inline comments. Oct 14 2021, 11:23 AM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
389	This "fused" loop structure feels quite unnatural. Such cases may not be possible in practice, but it looks like there is a possible path where one operand passes all the checks and then we hit return; that would definitely sound fishy. I'd rather structure the code like: `SmallVector<Value> reducedValues, operands; for (Value operand : ...) { ... if (reducedValue) { reducedValues.push_back(reducedValue); operands.push_back(operand); } } assert(reducedValues.size() <= 1);` ... etc.
403	"Reduce only if needed" sounds exactly like `reduceIfNeeded`. I understand the impl changes as we do it eagerly instead of late. Can we keep the function name and move the modified impl in there?
405	Is this check only to avoid interfering with the specific path of contraction vectorization? If so, it starts to feel like we want to push on the missing canonicalizations / foldings and drop such "interference avoidance checks".
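The restructuring suggested in the comment on line 389 can be sketched in standalone C++ as follows. This is only an illustration under stated assumptions: `Operand` and `findSingleReduction` are hypothetical names, and the `reduced` field stands in for a non-null result from a reduction matcher. It is not the code that landed.

```cpp
#include <cassert>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical operand model: `reduced` is set when the operand matched a
// reduction (a stand-in for a non-null matcher result).
struct Operand {
  int id;
  std::optional<int> reduced;
};

// Phase 1: collect every (reducedValue, operandId) pair, with no early return.
// Phase 2: check the "at most one reduction" invariant once, after the loop.
std::optional<std::pair<int, int>>
findSingleReduction(const std::vector<Operand> &operands) {
  std::vector<std::pair<int, int>> matches;
  for (const Operand &operand : operands) {
    if (!operand.reduced)
      continue;
    matches.emplace_back(*operand.reduced, operand.id);
  }
  assert(matches.size() <= 1 && "expected at most one reduction operand");
  if (matches.empty())
    return std::nullopt;
  return matches.front();
}
```

Separating collection from the decision keeps the invariant in one place, so no operand can pass the checks and trigger a return before the remaining operands have been inspected.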

Thanks for fixing!
Let's just clean the code a bit and land this.

Also, this is really not NFC, we are generating better IR here.

Address review comments.

Harbormaster completed remote builds in B128934: Diff 379813. Oct 14 2021, 1:08 PM

ThomasRaoux added inline comments. Oct 14 2021, 1:10 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
389	Makes sense, I broke the loop up as suggested.
403	Moved the logic into the `reduceIfNeeded` function.
405	Yes, I can take a look at that sometime soon.

This revision was landed with ongoing or failed builds. Oct 14 2021, 1:38 PM

Closed by commit rGafad0cdf31e8: [mlir][vector] Refactor linalg vectorization for reductions (authored by ThomasRaoux).

This revision was automatically updated to reflect the committed changes.

ThomasRaoux added a commit: rGafad0cdf31e8: [mlir][vector] Refactor linalg vectorization for reductions.

dcaballe added inline comments. Oct 14 2021, 6:08 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
397	`reductionOps` may contain multiple ops. Actually, I think we are not even using `reductionOps` anywhere here? I think this is bringing back the problem I was trying to solve with the `matchReduction` utility: detecting reductions with multiple ops.

ThomasRaoux added inline comments. Oct 14 2021, 7:00 PM

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
397	Right, we currently check that there is only a single op in `matchReduction` otherwise it fails. This wouldn't work otherwise indeed, maybe we need an assert. Here op will be the same as reductionOps[0], probably worth an assert too.
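The two asserts mentioned in this reply might look roughly like the following standalone sketch. The `Op` type and both function names are hypothetical stand-ins that model only operation identity; nothing here is taken from the committed code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR operation; only identity matters here.
struct Op {
  int id;
};

// True when the matched reduction chain consists of exactly one op and that
// op is the one currently being vectorized.
bool isSingleOpReduction(const std::vector<const Op *> &reductionOps,
                         const Op *op) {
  return reductionOps.size() == 1 && reductionOps[0] == op;
}

// Precondition check mirroring the two suggested asserts.
void checkReductionPreconditions(const std::vector<const Op *> &reductionOps,
                                 const Op *op) {
  assert(reductionOps.size() == 1 && "multi-op reductions are not supported");
  assert(reductionOps[0] == op && "expected the matched op to be `op`");
  (void)reductionOps; // Silence unused-parameter warnings under NDEBUG.
  (void)op;
}
```

A predicate such as `isSingleOpReduction` could also be used to bail out gracefully instead of asserting, should multi-op reductions ever reach this path.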

Revision Contents

Path	Size
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp	157 lines
mlir/test/Dialect/Linalg/vectorization.mlir	4 lines

Diff 379813

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

[186 lines not shown]
return b.create<vector::TransferReadOp>(loc, vectorType, source, indices,
map);		map);
return vector::TransferReadOp::createScalarOp(b, loc, source, indices);		return vector::TransferReadOp::createScalarOp(b, loc, source, indices);
}		}

/// Create MultiDimReductionOp to compute the reduction for `reductionOp`. This		/// Create MultiDimReductionOp to compute the reduction for `reductionOp`. This
/// assumes that `reductionOp` has two operands and one of them is the reduction		/// assumes that `reductionOp` has two operands and one of them is the reduction
/// initial value.		/// initial value.
static Value buildMultiDimReduce(OpBuilder &b, Operation *reduceOp,		static Value buildMultiDimReduce(OpBuilder &b, Operation *reduceOp,
Value outputArg,		Value valueToReduce,
const SmallVector<bool> &reductionMask,		const SmallVector<bool> &reductionMask) {
const BlockAndValueMapping &bvm) {
auto maybeKind = getKindForOp(reduceOp);		auto maybeKind = getKindForOp(reduceOp);
assert(maybeKind && "Failed precondition: could not get reduction kind");		assert(maybeKind && "Failed precondition: could not get reduction kind");
Value operandToReduce = reduceOp->getOperand(0) == outputArg		return b.create<vector::MultiDimReductionOp>(
? reduceOp->getOperand(1)		reduceOp->getLoc(), valueToReduce, reductionMask, *maybeKind);
: reduceOp->getOperand(0);
Value vec = bvm.lookup(operandToReduce);
return b.create<vector::MultiDimReductionOp>(reduceOp->getLoc(), vec,
reductionMask, *maybeKind);
}		}

/// Read the initial value associated to the given `outputOperand`.		static SmallVector<bool> getReductionMask(LinalgOp linalgOp) {
static Value readInitialValue(OpBuilder &b, LinalgOp linalgOp,
OpOperand *outputOperand) {
AffineMap map = inversePermutation(
reindexIndexingMap(linalgOp.getTiedIndexingMap(outputOperand)));
Type readType;
if (linalgOp.getShape(outputOperand).empty()) {
readType = getElementTypeOrSelf(outputOperand->get());
} else {
readType = VectorType::get(map.compose(linalgOp.getShape(outputOperand)),
getElementTypeOrSelf(outputOperand->get()));
}
Value vectorRead = buildVectorRead(b, outputOperand->get(), readType, map);
return vectorRead;
}

/// Assuming `outputOperand` is an output operand of a LinalgOp, determine
/// whether a reduction is needed to produce a `targetType` and create that
/// reduction if it is the case.
static Value reduceIfNeeded(OpBuilder &b, Type targetType, Value value,
OpOperand *outputOperand,
const BlockAndValueMapping &bvm) {
LDBG("Reduce " << value << " to type " << targetType);
LDBG("In LinalgOp operand #" << outputOperand->getOperandNumber() << "\n"
<< *(outputOperand->getOwner()));
auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());
auto vecType = value.getType().dyn_cast<VectorType>();
VectorType targetVectorType = targetType.dyn_cast<VectorType>();
if (!vecType)
return value;
if (targetVectorType && vecType.getShape() == targetVectorType.getShape())
return value;

// At this point, we know we need to reduce. Detect the reduction operator.
unsigned pos = 0;
MLIRContext *ctx = b.getContext();
SmallVector<AffineExpr> exprs;
for (auto s : linalgOp.iterator_types())
if (isParallelIterator(s))
exprs.push_back(getAffineDimExpr(pos++, ctx));

Operation *reduceOp = matchLinalgReduction(outputOperand);
assert(reduceOp && "Failed precondition: could not match a reduction");
unsigned idx = 0;		unsigned idx = 0;
SmallVector<bool> reductionMask(linalgOp.iterator_types().size(), false);		SmallVector<bool> reductionMask(linalgOp.iterator_types().size(), false);
for (auto attr : linalgOp.iterator_types()) {		for (auto attr : linalgOp.iterator_types()) {
if (isReductionIterator(attr))		if (isReductionIterator(attr))
reductionMask[idx] = true;		reductionMask[idx] = true;
++idx;		++idx;
}		}
assert(reduceOp->getNumOperands() == 2 &&		return reductionMask;
"Only support binary reduce op right now");
unsigned outputPos =
outputOperand->getOperandNumber() - linalgOp.getNumInputs();
Value outputArg = linalgOp.getRegionOutputArgs()[outputPos];
// Reduce across the iteration space.
Value reduce =
buildMultiDimReduce(b, reduceOp, outputArg, reductionMask, bvm);

// Read the original output value.
Value initialValue = readInitialValue(b, linalgOp, outputOperand);

// Combine the output argument with the reduced value.
OperationState state(reduceOp->getLoc(), reduceOp->getName());
state.addAttributes(reduceOp->getAttrs());
state.addOperands({reduce, initialValue});
state.addTypes(initialValue.getType());
return b.createOperation(state)->getResult(0);
}		}

/// Build a vector.transfer_write of `value` into `outputOperand` at indices set		/// Build a vector.transfer_write of `value` into `outputOperand` at indices set
/// to all `0`; where `outputOperand` is an output operand of the LinalgOp		/// to all `0`; where `outputOperand` is an output operand of the LinalgOp
/// currently being vectorized. If `dest` has null rank, build a memref.store.		/// currently being vectorized. If `dest` has null rank, build a memref.store.
/// Return the produced value or null if no value is produced.		/// Return the produced value or null if no value is produced.
static Value buildVectorWrite(OpBuilder &b, Value value,		static Value buildVectorWrite(OpBuilder &b, Value value,
OpOperand *outputOperand,		OpOperand *outputOperand) {
const BlockAndValueMapping &bvm) {
Operation *write;		Operation *write;
Location loc = value.getLoc();		Location loc = value.getLoc();
auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());		auto linalgOp = cast<LinalgOp>(outputOperand->getOwner());
if (VectorType vectorType =		if (VectorType vectorType =
extractVectorTypeFromShapedValue(outputOperand->get())) {		extractVectorTypeFromShapedValue(outputOperand->get())) {
AffineMap map =		AffineMap map =
reindexIndexingMap(linalgOp.getTiedIndexingMap(outputOperand));		reindexIndexingMap(linalgOp.getTiedIndexingMap(outputOperand));
SmallVector<int64_t> transposeShape =		SmallVector<int64_t> transposeShape =
applyPermutationMap(inversePermutation(map), vectorType.getShape());		applyPermutationMap(inversePermutation(map), vectorType.getShape());
assert(!transposeShape.empty() && "unexpected empty transpose shape");		assert(!transposeShape.empty() && "unexpected empty transpose shape");
vectorType = VectorType::get(transposeShape, vectorType.getElementType());		vectorType = VectorType::get(transposeShape, vectorType.getElementType());
SmallVector<Value> indices(linalgOp.getRank(outputOperand),		SmallVector<Value> indices(linalgOp.getRank(outputOperand),
b.create<arith::ConstantIndexOp>(loc, 0));		b.create<arith::ConstantIndexOp>(loc, 0));
value = broadcastIfNeeded(b, value, vectorType.getShape());		value = broadcastIfNeeded(b, value, vectorType.getShape());
value = reduceIfNeeded(b, vectorType, value, outputOperand, bvm);
write = b.create<vector::TransferWriteOp>(loc, value, outputOperand->get(),		write = b.create<vector::TransferWriteOp>(loc, value, outputOperand->get(),
indices, map);		indices, map);
} else {		} else {
value = reduceIfNeeded(b, getElementTypeOrSelf(value), value, outputOperand,
bvm);
write = vector::TransferWriteOp::createScalarOp(		write = vector::TransferWriteOp::createScalarOp(
b, loc, value, outputOperand->get(), ValueRange{});		b, loc, value, outputOperand->get(), ValueRange{});
}		}
LDBG("vectorized op: " << *write);		LDBG("vectorized op: " << *write);
if (!write->getResults().empty())		if (!write->getResults().empty())
return write->getResult(0);		return write->getResult(0);
return Value();		return Value();
}		}
[18 lines not shown]
vectorizeLinalgYield(OpBuilder &b, Operation *op,
auto yieldOp = dyn_cast<linalg::YieldOp>(op);		auto yieldOp = dyn_cast<linalg::YieldOp>(op);
if (!yieldOp)		if (!yieldOp)
return VectorizationResult{VectorizationStatus::Failure, nullptr};		return VectorizationResult{VectorizationStatus::Failure, nullptr};
for (auto outputs : llvm::enumerate(yieldOp.values())) {		for (auto outputs : llvm::enumerate(yieldOp.values())) {
// TODO: Scan for an opportunity for reuse.		// TODO: Scan for an opportunity for reuse.
// TODO: use a map.		// TODO: use a map.
Value vectorValue = bvm.lookup(outputs.value());		Value vectorValue = bvm.lookup(outputs.value());
Value newResult = buildVectorWrite(		Value newResult = buildVectorWrite(
b, vectorValue, linalgOp.getOutputOperand(outputs.index()), bvm);		b, vectorValue, linalgOp.getOutputOperand(outputs.index()));
if (newResult)		if (newResult)
newResults.push_back(newResult);		newResults.push_back(newResult);
}		}
return VectorizationResult{VectorizationStatus::NoReplace, nullptr};		return VectorizationResult{VectorizationStatus::NoReplace, nullptr};
}		}

/// Helper function to vectorize the index operations of a `linalgOp`. Return		/// Helper function to vectorize the index operations of a `linalgOp`. Return
/// VectorizationStatus::NewOp to signal the vectorization algorithm that it		/// VectorizationStatus::NewOp to signal the vectorization algorithm that it
[26 lines not shown]
static VectorizationResult vectorizeLinalgIndex(OpBuilder &b, Operation *op,
SmallVector<int64_t> transposition =		SmallVector<int64_t> transposition =
llvm::to_vector<16>(llvm::seq<int64_t>(0, linalgOp.getNumLoops()));		llvm::to_vector<16>(llvm::seq<int64_t>(0, linalgOp.getNumLoops()));
std::swap(transposition.back(), transposition[indexOp.dim()]);		std::swap(transposition.back(), transposition[indexOp.dim()]);
auto transposeOp =		auto transposeOp =
b.create<vector::TransposeOp>(loc, broadCastOp, transposition);		b.create<vector::TransposeOp>(loc, broadCastOp, transposition);
return VectorizationResult{VectorizationStatus::NewOp, transposeOp};		return VectorizationResult{VectorizationStatus::NewOp, transposeOp};
}		}

		/// Create a new vectorized version of `op` with the given operands and types.
		static Operation *createVectorizedOp(OpBuilder &b, Operation *op,
		ValueRange newOperands,
		ArrayRef<Type> types) {
		OperationState state(op->getLoc(), op->getName());
		state.addAttributes(op->getAttrs());
		state.addOperands(newOperands);
		state.addTypes(types);
		return b.createOperation(state);
		}

		/// Emit reduction operations if the shape of the value to reduce is different
		/// from the result shape.
		static Operation *reduceIfNeeded(OpBuilder &b, LinalgOp linalgOp, Operation *op,
		Value reduceValue, Value initialValue,
		const BlockAndValueMapping &bvm) {
		Value reduceVec = bvm.lookup(reduceValue);
		Value outputVec = bvm.lookup(initialValue);
		auto reduceType = reduceVec.getType().dyn_cast<VectorType>();
		auto outputType = outputVec.getType().dyn_cast<VectorType>();
		// Reduce only if needed as the value may already have been reduced for
		// contraction vectorization.
		if (!reduceType ||
		(outputType && reduceType.getShape() == outputType.getShape()))
		return nullptr;
		SmallVector<bool> reductionMask = getReductionMask(linalgOp);
		Value reduce = buildMultiDimReduce(b, op, reduceVec, reductionMask);
		return createVectorizedOp(b, op, {reduce, outputVec}, reduce.getType());
		}

/// Generic vectorization for a single operation `op`, given already vectorized		/// Generic vectorization for a single operation `op`, given already vectorized
/// operands carried by `bvm`. Vectorization occurs as follows:		/// operands carried by `bvm`. Vectorization occurs as follows:
/// 1. Try to apply any of the `customVectorizationHooks` and return its		/// 1. Try to apply any of the `customVectorizationHooks` and return its
/// result on success.		/// result on success.
/// 2. Clone any constant in the current scope without vectorization: each		/// 2. Clone any constant in the current scope without vectorization: each
/// consumer of the constant will later determine the shape to which the		/// consumer of the constant will later determine the shape to which the
/// constant needs to be broadcast to.		/// constant needs to be broadcast to.
/// 3. Fail on any remaining non `ElementwiseMappable` op. It is the purpose		/// 3. Fail on any remaining non `ElementwiseMappable` op. It is the purpose
/// of the `customVectorizationHooks` to cover such cases.		/// of the `customVectorizationHooks` to cover such cases.
/// 4. Clone `op` in vector form to a vector of shape prescribed by the first		/// 4. Clone `op` in vector form to a vector of shape prescribed by the first
/// operand of maximal rank. Other operands have smaller rank and are		/// operand of maximal rank. Other operands have smaller rank and are
/// broadcast accordingly. It is assumed this broadcast is always legal,		/// broadcast accordingly. It is assumed this broadcast is always legal,
/// otherwise, it means one of the `customVectorizationHooks` is incorrect.		/// otherwise, it means one of the `customVectorizationHooks` is incorrect.
///		///
/// This function assumes all operands of `op` have been vectorized and are in		/// This function assumes all operands of `op` have been vectorized and are in
/// the `bvm` mapping. As a consequence, this function is meant to be called on		/// the `bvm` mapping. As a consequence, this function is meant to be called on
/// a topologically-sorted list of ops.		/// a topologically-sorted list of ops.
/// This function does not update `bvm` but returns a VectorizationStatus that		/// This function does not update `bvm` but returns a VectorizationStatus that
/// instructs the caller what `bvm` update needs to occur.		/// instructs the caller what `bvm` update needs to occur.
static VectorizationResult		static VectorizationResult
vectorizeOneOp(OpBuilder &b, Operation *op, const BlockAndValueMapping &bvm,		vectorizeOneOp(OpBuilder &b, LinalgOp linalgOp, Operation *op,
		const BlockAndValueMapping &bvm,
ArrayRef<CustomVectorizationHook> customVectorizationHooks) {		ArrayRef<CustomVectorizationHook> customVectorizationHooks) {
LDBG("vectorize op " << *op);		LDBG("vectorize op " << *op);

// 1. Try to apply any CustomVectorizationHook.		// 1. Try to apply any CustomVectorizationHook.
if (!customVectorizationHooks.empty()) {		if (!customVectorizationHooks.empty()) {
for (auto &customFunc : customVectorizationHooks) {		for (auto &customFunc : customVectorizationHooks) {
VectorizationResult result = customFunc(op, bvm);		VectorizationResult result = customFunc(op, bvm);
if (result.status == VectorizationStatus::Failure)		if (result.status == VectorizationStatus::Failure)
continue;		continue;
return result;		return result;
}		}
}		}

// 2. Constant ops don't get vectorized but rather broadcasted at their users.		// 2. Constant ops don't get vectorized but rather broadcasted at their users.
// Clone so that the constant is not confined to the linalgOp block .		// Clone so that the constant is not confined to the linalgOp block .
if (isa<arith::ConstantOp, ConstantOp>(op))		if (isa<arith::ConstantOp, ConstantOp>(op))
return VectorizationResult{VectorizationStatus::NewOp, b.clone(*op)};		return VectorizationResult{VectorizationStatus::NewOp, b.clone(*op)};

// 3. Only ElementwiseMappable are allowed in the generic vectorization.		// 3. Only ElementwiseMappable are allowed in the generic vectorization.
if (!OpTrait::hasElementwiseMappableTraits(op))		if (!OpTrait::hasElementwiseMappableTraits(op))
return VectorizationResult{VectorizationStatus::Failure, nullptr};		return VectorizationResult{VectorizationStatus::Failure, nullptr};

// 4. Generic vectorization path for ElementwiseMappable ops.		// 4. Check if the operation is a reduction.
		SmallVector<std::pair<Value, Value>> reductionOperands;
		for (Value operand : op->getOperands()) {
		auto arg = operand.dyn_cast<BlockArgument>();
		if (!arg || arg.getArgNumber() < linalgOp.getNumInputs())
		continue;
		SmallVector<Operation *> reductionOps;
		Value reduceValue = matchReduction(
		linalgOp.getRegionOutputArgs(),
		arg.getArgNumber() - linalgOp.getNumInputs(), reductionOps);
		if (!reduceValue)
		continue;
		reductionOperands.push_back(std::make_pair(reduceValue, operand));
		}
		if (!reductionOperands.empty()) {
		assert(reductionOperands.size() == 1);
		Operation *reduceOp =
		reduceIfNeeded(b, linalgOp, op, reductionOperands[0].first,
		reductionOperands[0].second, bvm);
		if (reduceOp)
		return VectorizationResult{VectorizationStatus::NewOp, reduceOp};
		}

		// 5. Generic vectorization path for ElementwiseMappable ops.
// a. first get the first max ranked shape.		// a. first get the first max ranked shape.
SmallVector<int64_t, 4> firstMaxRankedShape;		SmallVector<int64_t, 4> firstMaxRankedShape;
for (Value operand : op->getOperands()) {		for (Value operand : op->getOperands()) {
auto vt = bvm.lookup(operand).getType().dyn_cast<VectorType>();		auto vt = bvm.lookup(operand).getType().dyn_cast<VectorType>();
if (vt && firstMaxRankedShape.size() < vt.getShape().size())		if (vt && firstMaxRankedShape.size() < vt.getShape().size())
firstMaxRankedShape.assign(vt.getShape().begin(), vt.getShape().end());		firstMaxRankedShape.assign(vt.getShape().begin(), vt.getShape().end());
}		}
// b. broadcast each op if needed.		// b. broadcast each op if needed.
auto vectorizedOperands = llvm::map_range(op->getOperands(), [&](Value v) {		auto vectorizedOperands = llvm::map_range(op->getOperands(), [&](Value v) {
return firstMaxRankedShape.empty()		return firstMaxRankedShape.empty()
? bvm.lookup(v)		? bvm.lookup(v)
: broadcastIfNeeded(b, bvm.lookup(v), firstMaxRankedShape);		: broadcastIfNeeded(b, bvm.lookup(v), firstMaxRankedShape);
});		});
// c. for elementwise, the result is the vector with the firstMaxRankedShape		// c. for elementwise, the result is the vector with the firstMaxRankedShape
auto returnTypes = llvm::map_range(op->getResultTypes(), [&](Type t) {		auto returnTypes = llvm::map_range(op->getResultTypes(), [&](Type t) {
return firstMaxRankedShape.empty()		return firstMaxRankedShape.empty()
? t		? t
: VectorType::get(firstMaxRankedShape, t);		: VectorType::get(firstMaxRankedShape, t);
});		});

// Build and return the new op.		// Build and return the new op.
OperationState state(op->getLoc(), op->getName());		return VectorizationResult{
state.addAttributes(op->getAttrs());		VectorizationStatus::NewOp,
state.addOperands(llvm::to_vector<4>(vectorizedOperands));		createVectorizedOp(b, op, llvm::to_vector<4>(vectorizedOperands),
state.addTypes(llvm::to_vector<4>(returnTypes));		llvm::to_vector<4>(returnTypes))};
return VectorizationResult{VectorizationStatus::NewOp,
b.createOperation(state)};
}		}

/// Detect whether `r` has only ConstantOp, ElementwiseMappable and YieldOp.		/// Detect whether `r` has only ConstantOp, ElementwiseMappable and YieldOp.
static bool hasOnlyScalarElementwiseOp(Region &r) {		static bool hasOnlyScalarElementwiseOp(Region &r) {
if (!llvm::hasSingleElement(r))		if (!llvm::hasSingleElement(r))
return false;		return false;
for (Operation &op : r.front()) {		for (Operation &op : r.front()) {
if (!(isa<arith::ConstantOp, ConstantOp, linalg::YieldOp, linalg::IndexOp>(		if (!(isa<arith::ConstantOp, ConstantOp, linalg::YieldOp, linalg::IndexOp>(
[78 lines not shown]
if (linalgOp.isScalar(opOperand)) {
continue;		continue;
}		}
// TODO: 0-d vectors.		// TODO: 0-d vectors.
Type readType;		Type readType;
AffineMap map;		AffineMap map;
if (linalgOp.getShape(opOperand).empty()) {		if (linalgOp.getShape(opOperand).empty()) {
readType = bbarg.getType();		readType = bbarg.getType();
} else {		} else {
if (broadcastToMaximalCommonShape) {		if (broadcastToMaximalCommonShape &&
		opOperand->getOperandNumber() < linalgOp.getNumInputs()) {
map = inverseAndBroadcastProjectedPermuation(		map = inverseAndBroadcastProjectedPermuation(
linalgOp.getTiedIndexingMap(opOperand));		linalgOp.getTiedIndexingMap(opOperand));
readType = VectorType::get(commonVectorShape,		readType = VectorType::get(commonVectorShape,
getElementTypeOrSelf(opOperand->get()));		getElementTypeOrSelf(opOperand->get()));
} else {		} else {
map = inversePermutation(		map = inversePermutation(
reindexIndexingMap(linalgOp.getTiedIndexingMap(opOperand)));		reindexIndexingMap(linalgOp.getTiedIndexingMap(opOperand)));
readType = VectorType::get(map.compose(linalgOp.getShape(opOperand)),		readType = VectorType::get(map.compose(linalgOp.getShape(opOperand)),
[20 lines not shown]
CustomVectorizationHook vectorizeIndex =
[&](Operation *op,		[&](Operation *op,
const BlockAndValueMapping &bvm) -> VectorizationResult {		const BlockAndValueMapping &bvm) -> VectorizationResult {
return vectorizeLinalgIndex(b, op, linalgOp);		return vectorizeLinalgIndex(b, op, linalgOp);
};		};
hooks.push_back(vectorizeIndex);		hooks.push_back(vectorizeIndex);

// 5. Iteratively call `vectorizeOneOp` to each op in the slice.		// 5. Iteratively call `vectorizeOneOp` to each op in the slice.
for (Operation &op : block.getOperations()) {		for (Operation &op : block.getOperations()) {
VectorizationResult result = vectorizeOneOp(b, &op, bvm, hooks);		VectorizationResult result = vectorizeOneOp(b, linalgOp, &op, bvm, hooks);
if (result.status == VectorizationStatus::Failure) {		if (result.status == VectorizationStatus::Failure) {
LDBG("failed to vectorize: " << op);		LDBG("failed to vectorize: " << op);
return failure();		return failure();
}		}
if (result.status == VectorizationStatus::NewOp) {		if (result.status == VectorizationStatus::NewOp) {
LDBG("new vector op: " << *result.newOp;);		LDBG("new vector op: " << *result.newOp;);
bvm.map(op.getResults(), result.newOp->getResults());		bvm.map(op.getResults(), result.newOp->getResults());
}		}
[851 lines not shown]

mlir/test/Dialect/Linalg/vectorization.mlir

	[743 lines not shown]

	// -----			// -----

	// CHECK-LABEL: func @sum_exp			// CHECK-LABEL: func @sum_exp
	func @sum_exp(%input: tensor<4x16x8xf32>, %output: tensor<4x16xf32>)			func @sum_exp(%input: tensor<4x16x8xf32>, %output: tensor<4x16xf32>)
	-> tensor<4x16xf32>			-> tensor<4x16xf32>
	{			{
	// CHECK: vector.transfer_read {{.*}} : tensor<4x16x8xf32>, vector<4x16x8xf32>			// CHECK: vector.transfer_read {{.*}} : tensor<4x16x8xf32>, vector<4x16x8xf32>
				// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true]} : tensor<4x16xf32>, vector<4x16xf32>
	// CHECK: math.exp {{.*}} : vector<4x16x8xf32>			// CHECK: math.exp {{.*}} : vector<4x16x8xf32>
	// CHECK: vector.multi_reduction #vector.kind<add>, %{{.*}} [2] : vector<4x16x8xf32> to vector<4x16xf32>			// CHECK: vector.multi_reduction #vector.kind<add>, %{{.*}} [2] : vector<4x16x8xf32> to vector<4x16xf32>
	// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true]} : tensor<4x16xf32>, vector<4x16xf32>
	// CHECK: addf {{.*}} : vector<4x16xf32>			// CHECK: addf {{.*}} : vector<4x16xf32>
	// CHECK: vector.transfer_write {{.*}} : vector<4x16xf32>, tensor<4x16xf32>			// CHECK: vector.transfer_write {{.*}} : vector<4x16xf32>, tensor<4x16xf32>
	// CHECK: return {{.*}} : tensor<4x16xf32>			// CHECK: return {{.*}} : tensor<4x16xf32>
	%0 = linalg.generic {			%0 = linalg.generic {
	indexing_maps = [			indexing_maps = [
	affine_map<(d0, d1, d2) -> (d0, d1, d2)>,			affine_map<(d0, d1, d2) -> (d0, d1, d2)>,
	affine_map<(d0, d1, d2) -> (d0, d1)>			affine_map<(d0, d1, d2) -> (d0, d1)>
	],			],
	[14 lines not shown]
	// CHECK-DAG: #[[$M3:.*]] = affine_map<(d0, d1) -> (d1, d0)>			// CHECK-DAG: #[[$M3:.*]] = affine_map<(d0, d1) -> (d1, d0)>

	// CHECK-LABEL: func @sum_exp_2			// CHECK-LABEL: func @sum_exp_2
	func @sum_exp_2(%input: tensor<3x2xf32>, %input_2: tensor<5x4xf32>, %output: tensor<5x2xf32>)			func @sum_exp_2(%input: tensor<3x2xf32>, %input_2: tensor<5x4xf32>, %output: tensor<5x2xf32>)
	-> tensor<5x2xf32>			-> tensor<5x2xf32>
	{			{
	// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true, true, true], permutation_map = #[[$M1]]} : tensor<3x2xf32>, vector<2x3x4x5xf32>			// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true, true, true], permutation_map = #[[$M1]]} : tensor<3x2xf32>, vector<2x3x4x5xf32>
	// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true, true, true], permutation_map = #[[$M2]]} : tensor<5x4xf32>, vector<2x3x4x5xf32>			// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true, true, true], permutation_map = #[[$M2]]} : tensor<5x4xf32>, vector<2x3x4x5xf32>
				// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M3]]} : tensor<5x2xf32>, vector<2x5xf32>
	// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>			// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>
	// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>			// CHECK: math.exp {{.*}} : vector<2x3x4x5xf32>
	// CHECK: addf {{.*}} : vector<2x3x4x5xf32>			// CHECK: addf {{.*}} : vector<2x3x4x5xf32>
	// CHECK: vector.multi_reduction #vector.kind<add>, {{.*}} [1, 2] : vector<2x3x4x5xf32> to vector<2x5xf32>			// CHECK: vector.multi_reduction #vector.kind<add>, {{.*}} [1, 2] : vector<2x3x4x5xf32> to vector<2x5xf32>
	// CHECK: vector.transfer_read {{.*}} {in_bounds = [true, true], permutation_map = #[[$M3]]} : tensor<5x2xf32>, vector<2x5xf32>
	// CHECK: addf {{.*}} : vector<2x5xf32>			// CHECK: addf {{.*}} : vector<2x5xf32>
	// CHECK: vector.transfer_write {{.*}} {in_bounds = [true, true], permutation_map = #[[$M3]]} : vector<2x5xf32>, tensor<5x2xf32>			// CHECK: vector.transfer_write {{.*}} {in_bounds = [true, true], permutation_map = #[[$M3]]} : vector<2x5xf32>, tensor<5x2xf32>
	// CHECK: return {{.*}} : tensor<5x2xf32>			// CHECK: return {{.*}} : tensor<5x2xf32>
	%0 = linalg.generic {			%0 = linalg.generic {
	indexing_maps = [			indexing_maps = [
	affine_map<(d0, d1, d2, d3) -> (d1, d0)>,			affine_map<(d0, d1, d2, d3) -> (d1, d0)>,
	affine_map<(d0, d1, d2, d3) -> (d3, d2)>,			affine_map<(d0, d1, d2, d3) -> (d3, d2)>,
	affine_map<(d0, d1, d2, d3) -> (d3, d0)>			affine_map<(d0, d1, d2, d3) -> (d3, d0)>
	[249 lines not shown]