This is an archive of the discontinued LLVM Phabricator instance.

Differential D149678

[mlir][linalg] Add scalar broadcast load case to the vectoriser
ClosedPublic

Authored by awarzynski on May 2 2023, 12:14 PM.

Details

Reviewers

aartbik
nicolasvasilache
dcaballe

Commits

rG678360fd9d85: [mlir][linalg] Add scalar broadcast load case to the vectoriser

Summary

This patch extends the Linalg vectoriser so that scalar loads are
correctly identified as scalar rather than gather loads. Below is an
example of a scalar load:

func.func @example(%arg0: tensor<80x16xf32>, %arg2: tensor<1x4xf32>) -> tensor<1x4xf32> {
  %c8 = arith.constant 8 : index
  %c16 = arith.constant 16 : index
  %1 = linalg.generic {
      indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>],
      iterator_types = ["parallel", "parallel"]
    } outs(%arg2 : tensor<1x4xf32>) {
    ^bb0(%out: f32):
      %2 = linalg.index 0 : index
      %extracted = tensor.extract %arg0[%2, %c16] : tensor<80x16xf32>
      linalg.yield %extracted : f32
    } -> tensor<1x4xf32>
  return %1 : tensor<1x4xf32>
}

This patch also makes sure that these scalar loads are indeed lowered to
scalar loads (implemented as vector.gather) followed by broadcast:

%8 = vector.gather %arg0[%c0, %c0] [%7], %cst_1, %cst_2 : tensor<80x16xf32>, vector<1xindex>, vector<1xi1>, vector<1xf32> into vector<1xf32>
%9 = vector.broadcast %8 : vector<1xf32> to vector<1x4xf32>

We still need to check what backends do in these cases and whether the
vectoriser could generate genuinely scalar code instead.
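
As a rough illustration of the lowering described above, the following standalone C++ sketch models a one-element masked gather followed by a broadcast. It does not use MLIR: `gather1D`, `broadcast` and `scalarBroadcastLoad` are hypothetical names, and the tensor is assumed to be stored flattened in row-major order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of a masked 1-D gather over a flattened tensor:
// result[i] = mask[i] ? base[offsets[i]] : passThru[i].
std::vector<float> gather1D(const std::vector<float> &base,
                            const std::vector<std::size_t> &offsets,
                            const std::vector<bool> &mask,
                            const std::vector<float> &passThru) {
  std::vector<float> result(offsets.size());
  for (std::size_t i = 0; i < offsets.size(); ++i)
    result[i] = mask[i] ? base[offsets[i]] : passThru[i];
  return result;
}

// Broadcast a single-element vector to `n` lanes.
std::vector<float> broadcast(const std::vector<float> &one, std::size_t n) {
  assert(one.size() == 1 && "expected a 1-element vector");
  return std::vector<float>(n, one.front());
}

// Hypothetical helper: load element (row, col) of a row-major tensor with
// `cols` columns once, then broadcast it to `n` lanes.
std::vector<float> scalarBroadcastLoad(const std::vector<float> &tensor,
                                       std::size_t cols, std::size_t row,
                                       std::size_t col, std::size_t n) {
  // Only one unique offset exists, so a 1-element gather is enough.
  std::vector<std::size_t> offset = {row * cols + col};
  std::vector<float> loaded = gather1D(tensor, offset, {true}, {0.0f});
  return broadcast(loaded, n);
}
```

Calling `scalarBroadcastLoad(t, 16, 0, 15, 4)` on an 80x16 tensor reads a single element and replicates it into four lanes. The row and column values here are illustrative and are not taken from the patch.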

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

awarzynski created this revision. May 2 2023, 12:14 PM

Herald added a reviewer: aartbik. May 2 2023, 12:14 PM

Herald added a project: Restricted Project.

Herald added subscribers: bviyer, hanchung, Moerafaat and 24 others.

awarzynski requested review of this revision. May 2 2023, 12:14 PM

Herald added a reviewer: nicolasvasilache. May 2 2023, 12:14 PM

Herald added a reviewer: dcaballe.

Herald added a project: Restricted Project.

Herald added subscribers: pcwang-thead, limo1996, stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B229506: Diff 518825. May 2 2023, 2:07 PM

Matt added a subscriber: Matt. May 19 2023, 2:21 PM

Rebase

@dcaballe, I was hoping that I could replace vector.gather with either vector.extractelement or vector.extract, but:

vector.extractelement only works for 1-D or 0-D vectors,
vector.extract requires positions as "64-bit integer array attribute".

Neither of these requirements is satisfied here, though. Any thoughts? Perhaps I'm missing something 🤔.

awarzynski edited the summary of this revision. Jun 2 2023, 2:12 AM

Harbormaster completed remote builds in B236122: Diff 527782. Jun 2 2023, 2:21 AM

You can represent a scalar load + broadcast with a vector.transfer_read if you provide the right affine map with results set to zero. It would be something like for a scalar load + 2D broadcast:

%0 = vector.transfer_read %M[%i, %j], %cst {permutation_map = affine_map<(d0, d1)->(0, 0)>} :  tensor<?x?xf32>, vector<4x8xf32>
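
To make the effect of the zero results concrete, here is a simplified C++ model of such a read. It is only a sketch under stated assumptions: `MapResult` and `transferRead2D` are hypothetical names, the source is a row-major 2-D buffer, all accesses are assumed to be in bounds, and padding and masking are ignored. A result marked as a broadcast contributes nothing to the source position, so with both results set to zero every vector element reads the same scalar.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// One result of a (simplified) permutation map: either a source dimension
// (0 or 1) or the constant 0, which marks a broadcast vector dimension.
struct MapResult {
  bool isBroadcast;
  std::size_t srcDim; // ignored when isBroadcast is true
};

// Simplified, in-bounds-only model of a 2-D read starting at src[i][j]
// from a row-major source with `cols` columns into a
// vecRows x vecCols vector (returned flattened, row-major).
std::vector<float> transferRead2D(const std::vector<float> &src,
                                  std::size_t cols, std::size_t i,
                                  std::size_t j,
                                  std::array<MapResult, 2> map,
                                  std::size_t vecRows, std::size_t vecCols) {
  std::vector<float> result(vecRows * vecCols);
  for (std::size_t a = 0; a < vecRows; ++a) {
    for (std::size_t b = 0; b < vecCols; ++b) {
      std::array<std::size_t, 2> pos = {i, j};
      const std::array<std::size_t, 2> vecIdx = {a, b};
      // Non-broadcast results advance the source position along their
      // source dimension; broadcast results leave it unchanged.
      for (std::size_t k = 0; k < 2; ++k)
        if (!map[k].isBroadcast)
          pos[map[k].srcDim] += vecIdx[k];
      result[a * vecCols + b] = src[pos[0] * cols + pos[1]];
    }
  }
  return result;
}
```

With `{{{true, 0}, {true, 0}}}` as the map, a 4x8 read at `(i, j)` yields 32 copies of `src[i][j]`, which is the scalar load + 2-D broadcast case from the comment above.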

Use vector.transfer_read instead of vector.gather

Funnily enough, some canonicalisation turns this back into tensor.extract. The output looks OK, though I've not been able to trace that back (the debug dump suggests TransferOpReduceRank).

Harbormaster completed remote builds in B237566: Diff 529693. Jun 8 2023, 12:30 PM

Awesome, thank you so much! Some minor comments but this is good to go!

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
877	is `ValueRange` needed here?
895	is this message accurate now given that we can return Gather?
899	This is really clear now, thanks!
1061	cool!

This revision is now accepted and ready to land. Jun 9 2023, 8:37 AM

Thanks for the review, Diego! I will incorporate your suggestions in the final version that I will be merging shortly.

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
877	Nope :) I will remove it before merging this change.
895	Given this on L886: `if (!leadingIdxsLoopInvariant) return VectorMemoryAccessKind::Gather;`, I can safely simplify this block so that it can only return `VectorMemoryAccessKind::ScalarBroadcast`. With that change the message would be valid. Will update before merging.
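
The decision order discussed in these comments can be sketched in standalone C++ as follows. This is a simplified model rather than the code from the patch: `AccessKind`, `IndexProperties` and `classify` are hypothetical names, the three booleans stand in for the results of the index analyses, and the earlier check on a trailing tensor dimension of 1 is left out.

```cpp
#include <cassert>

enum class AccessKind { ScalarBroadcast, Contiguous, Gather };

// Hypothetical summary of what the analyses found for one tensor.extract.
struct IndexProperties {
  bool leadingIdxsLoopInvariant; // all leading indices are loop invariant
  bool trailingIdxLoopInvariant; // trailing index is loop invariant
  bool trailingIdxContiguous;    // trailing index follows the trailing loop index
};

AccessKind classify(const IndexProperties &p) {
  // Early exit: once this check has passed, the scalar broadcast branch
  // below no longer needs to consider returning Gather.
  if (!p.leadingIdxsLoopInvariant)
    return AccessKind::Gather;

  // Every index is loop invariant, so every iteration reads the same element.
  if (p.trailingIdxLoopInvariant)
    return AccessKind::ScalarBroadcast;

  // The trailing index increments with the trailing loop index.
  if (p.trailingIdxContiguous)
    return AccessKind::Contiguous;

  return AccessKind::Gather;
}
```

Because of the early return, a debug message emitted in the scalar broadcast branch is always accurate, which is the point raised about line 895.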

This revision was landed with ongoing or failed builds. Jun 12 2023, 7:19 AM

Closed by commit rG678360fd9d85: [mlir][linalg] Add scalar broadcast load case to the vectoriser (authored by awarzynski).

This revision was automatically updated to reflect the committed changes.

awarzynski added a commit: rG678360fd9d85: [mlir][linalg] Add scalar broadcast load case to the vectoriser.

Revision Contents

Path	Size
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp	91 lines
mlir/test/Dialect/Linalg/vectorization.mlir	85 lines

Diff 518825

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp

[... 723 lines collapsed ...]	auto extractOpIndex =
indexVecType.getShape());		indexVecType.getShape());

offset = rewriter.create<arith::AddIOp>(loc, extractOpIndex, offset);		offset = rewriter.create<arith::AddIOp>(loc, extractOpIndex, offset);
}		}

return offset;		return offset;
}		}

enum VectorMemoryAccessKind {		enum VectorMemoryAccessKind { ScalarBroadcast, Contiguous, Gather };
// TODO: ScalarBroadcast,
Contiguous,
Gather
};

/// Checks whether /p val can be used for calculating a loop invariant index.		/// Checks whether /p val can be used for calculating a loop invariant index.
static bool isLoopInvariantIdx(LinalgOp &linalgOp, Value &val) {		static bool isLoopInvariantIdx(LinalgOp &linalgOp, Value &val) {

auto targetShape = linalgOp.getStaticLoopRanges();		auto targetShape = linalgOp.getStaticLoopRanges();
assert(((llvm::count_if(targetShape,		assert(((llvm::count_if(targetShape,
[](int64_t dimSize) { return dimSize > 1; }) == 1)) &&		[](int64_t dimSize) { return dimSize > 1; }) == 1)) &&
"n-D vectors are not yet supported");		"n-D vectors are not yet supported");
[... 122 lines collapsed ...]	getTensorExtractMemoryAccessPattern(tensor::ExtractOp extractOp,
auto inputShape = extractOp.getTensor().getType().cast<ShapedType>();		auto inputShape = extractOp.getTensor().getType().cast<ShapedType>();

// 2. Assume that it's a gather load when reading _from_ a tensor for which		// 2. Assume that it's a gather load when reading _from_ a tensor for which
// the trailing dimension is 1, e.g. `tensor<1x4x1xi32>`.		// the trailing dimension is 1, e.g. `tensor<1x4x1xi32>`.
// TODO: Relax this condition.		// TODO: Relax this condition.
if (inputShape.getShape().back() == 1)		if (inputShape.getShape().back() == 1)
return VectorMemoryAccessKind::Gather;		return VectorMemoryAccessKind::Gather;

bool isContiguous = true;		bool leadingIdxsLoopInvariant = true;

// 3a. Analyze the leading indices of `extractOp`.		// 3. Analyze the leading indices of `extractOp`.
// Look at the way each index is calculated and decide whether it is suitable		// Look at the way each index is calculated and decide whether it is suitable
// for a contiguous load, i.e. whether it's loop invariant.		// for a contiguous load, i.e. whether it's loop invariant.
auto indices = extractOp.getIndices();		auto indices = extractOp.getIndices();
auto leadIndices = ValueRange(indices.drop_back(1));		auto leadIndices = ValueRange(indices.drop_back(1));
		dcaballe: is `ValueRange` needed here?
		awarzynski: Nope :) I will remove it before merging this change.

for (auto [i, indexVal] : llvm::enumerate(leadIndices)) {		for (auto [i, indexVal] : llvm::enumerate(leadIndices)) {
if (inputShape.getShape()[i] == 1)		if (inputShape.getShape()[i] == 1)
continue;		continue;

isContiguous &= isLoopInvariantIdx(linalgOp, indexVal);		leadingIdxsLoopInvariant &= isLoopInvariantIdx(linalgOp, indexVal);
}		}

// 3b. Analyze the trailing index for `extractOp`.		if (!leadingIdxsLoopInvariant)
		return VectorMemoryAccessKind::Gather;

		// 4. Analyze the trailing index for `extractOp`.
auto extractOpTrailingIdx = indices.back();		auto extractOpTrailingIdx = indices.back();
// For contiguous loads, the trailing `extractOp` index should increment with
// every loop iteration. This effectively means that it must be based on the		// 4a. Scalar broadcast load
// trailing loop index. This is what the following bool captures.		// If the trailing index is loop invariant then this is a scalar load.
		if (isLoopInvariantIdx(linalgOp, extractOpTrailingIdx)) {
		LDBG("Found scalar broadcast load: " << extractOp);
		dcaballe: is this message accurate now given that we can return Gather?
		awarzynski: Given this on L886: `if (!leadingIdxsLoopInvariant) return VectorMemoryAccessKind::Gather;`, I can safely simplify this block so that it can only return `VectorMemoryAccessKind::ScalarBroadcast`. With that change the message would be valid. Will update before merging.

		return leadingIdxsLoopInvariant ? VectorMemoryAccessKind::ScalarBroadcast
		: VectorMemoryAccessKind::Gather;
		}
		dcaballe: This is really clear now, thanks!

		// 4b. Contiguous loads
		// The trailing `extractOp` index should increment with every loop iteration.
		// This effectively means that it must be based on the trailing loop index.
		// This is what the following bool captures.
bool foundIndexOp = false;		bool foundIndexOp = false;
isContiguous &=		bool isContiguousLoad =
isContiguousLoadIdx(linalgOp, extractOpTrailingIdx, foundIndexOp);		isContiguousLoadIdx(linalgOp, extractOpTrailingIdx, foundIndexOp);
isContiguous &= foundIndexOp;		isContiguousLoad &= foundIndexOp;

if (isContiguous) {		if (isContiguousLoad) {
LDBG("Found contigous load: " << extractOp);		LDBG("Found contigous load: " << extractOp);
return VectorMemoryAccessKind::Contiguous;		return VectorMemoryAccessKind::Contiguous;
}		}

return VectorMemoryAccessKind::Gather;		return VectorMemoryAccessKind::Gather;
}		}

/// Helper function to vectorize the tensor.extract operations. Returns		/// Helper function to vectorize the tensor.extract operations. Returns
[... 34 lines collapsed ...]	if (memAccessKind == VectorMemoryAccessKind::Gather) {
Value offset = calculateGatherOffset(rewriter, extractOp, bvm, targetShape);		Value offset = calculateGatherOffset(rewriter, extractOp, bvm, targetShape);

// Generate the gather load		// Generate the gather load
Operation *gatherOp = rewriter.create<vector::GatherOp>(		Operation *gatherOp = rewriter.create<vector::GatherOp>(
loc, resultType, extractOp.getTensor(), baseIndices, offset,		loc, resultType, extractOp.getTensor(), baseIndices, offset,
maskConstantOp, passThruConstantOp);		maskConstantOp, passThruConstantOp);
gatherOp = state.maskOperation(rewriter, gatherOp, linalgOp);		gatherOp = state.maskOperation(rewriter, gatherOp, linalgOp);

LDBG("Vectorised as gather load: " << extractOp);		LDBG("Vectorised as gather load: " << extractOp << "\n");
return VectorizationResult{VectorizationStatus::NewOp, gatherOp};		return VectorizationResult{VectorizationStatus::NewOp, gatherOp};
}		}

		// 2. Handle scalar broadcast access. Similarly to the "gather load" case,
		// generate a vector.gather. However, load only one element and then broadcast
		// it.
		if (memAccessKind == VectorMemoryAccessKind::ScalarBroadcast) {
		SmallVector<Value> baseIndices(
		extractOp.getIndices().size(),
		rewriter.create<arith::ConstantIndexOp>(loc, 0));

		Value offset = calculateGatherOffset(rewriter, extractOp, bvm, targetShape);

		// <1x1xNxi32> --> <Nxi32>
		auto resTrailingDim = resultType.getShape().back();
		auto offsetAs1dVector = rewriter.create<vector::ShapeCastOp>(
		loc, VectorType::get({resTrailingDim}, rewriter.getIndexType()),
		offset);

		// There's only 1 unique offset value in the `offset` vector. Extract it:
		// <Nxi32> --> i32
		auto zero = rewriter.create<arith::ConstantOp>(
		loc, rewriter.getI32Type(),
		rewriter.getZeroAttr(rewriter.getI32Type()));
		auto offsetUniqueVal =
		rewriter.create<vector::ExtractElementOp>(loc, offsetAs1dVector, zero);

		// Cast the scalar index to a 1-element vector:
		// i32 --> <1xi32>
		auto resultTypeAs1dVec =
		VectorType::get({1}, extractOp.getResult().getType());
		auto offsetFor1Val = broadcastIfNeeded(
		rewriter, offsetUniqueVal.getResult(), resultTypeAs1dVec.getShape());

		auto maskConstantOp = rewriter.create<arith::ConstantOp>(
		loc,
		DenseIntElementsAttr::get(VectorType::get({1}, rewriter.getI1Type()),
		/*value=*/true));

		auto passThruConstantOp = rewriter.create<arith::ConstantOp>(
		loc, rewriter.getZeroAttr(resultTypeAs1dVec));
		Operation *gatherOp = rewriter.create<vector::GatherOp>(
		loc, resultTypeAs1dVec, extractOp.getTensor(), baseIndices,
		offsetFor1Val, maskConstantOp, passThruConstantOp);

		auto readValue = rewriter.create<vector::BroadcastOp>(
		loc, resultType, gatherOp->getResult(0));

		LDBG("Vectorised as scalar broadcast load: " << extractOp << "\n");
		return VectorizationResult{VectorizationStatus::NewOp, readValue};
		}

// 2. Handle contiguous access.		// 2. Handle contiguous access.
LDBG("Vectorised as contiguous load: " << extractOp);		LDBG("Vectorised as contiguous load: " << extractOp);
SmallVector<Value> transferReadIdxs;		SmallVector<Value> transferReadIdxs;
auto resTrailingDim = resultType.getShape().back();		auto resTrailingDim = resultType.getShape().back();
auto zero = rewriter.create<arith::ConstantOp>(		auto zero = rewriter.create<arith::ConstantOp>(
loc, rewriter.getI32Type(), rewriter.getZeroAttr(rewriter.getI32Type()));		loc, rewriter.getI32Type(), rewriter.getZeroAttr(rewriter.getI32Type()));

// Collect indices for `vector.transfer_read`. At this point, the indices will		// Collect indices for `vector.transfer_read`. At this point, the indices will
[... 31 lines collapsed ...]	vectorizeTensorExtract(RewriterBase &rewriter, VectorizationState &state,
auto srcRank = extractOp.getTensor().getType().getRank();		auto srcRank = extractOp.getTensor().getType().getRank();
auto permutationMap = AffineMap::getMinorIdentityMap(		auto permutationMap = AffineMap::getMinorIdentityMap(
srcRank, std::min(dstRank, srcRank), rewriter.getContext());		srcRank, std::min(dstRank, srcRank), rewriter.getContext());

int32_t rankDiff = dstRank - srcRank;		int32_t rankDiff = dstRank - srcRank;
// When dstRank > srcRank, broadcast the source tensor to the unitary leading		// When dstRank > srcRank, broadcast the source tensor to the unitary leading
// dims so that the ranks match. This is done by extending the map with 0s.		// dims so that the ranks match. This is done by extending the map with 0s.
// For example, for dstRank = 3, srcRank = 2, the following map created		// For example, for dstRank = 3, srcRank = 2, the following map created
// above:		// above:
		dcaballe: cool!
// (d0, d1) --> (d0, d1)		// (d0, d1) --> (d0, d1)
// is extended as:		// is extended as:
// (d0, d1) --> (0, d0, d1)		// (d0, d1) --> (0, d0, d1)
while (rankDiff > 0) {		while (rankDiff > 0) {
permutationMap = permutationMap.insertResult(		permutationMap = permutationMap.insertResult(
mlir::getAffineConstantExpr(0, rewriter.getContext()), 0);		mlir::getAffineConstantExpr(0, rewriter.getContext()), 0);
rankDiff--;		rankDiff--;
}		}
[... 1,948 lines collapsed ...]

mlir/test/Dialect/Linalg/vectorization.mlir

[... 1,584 lines collapsed ...]	func.func @vectorize_nd_tensor_extract_constant_idx(%arg0: tensor<3x3xf32>, %arg2: tensor<1x1x3xf32>) -> tensor<1x1x3xf32> {
} outs(%arg2 : tensor<1x1x3xf32>) {		} outs(%arg2 : tensor<1x1x3xf32>) {
^bb0(%arg4: f32):		^bb0(%arg4: f32):
%7 = tensor.extract %arg0[%c0, %c1] : tensor<3x3xf32>		%7 = tensor.extract %arg0[%c0, %c1] : tensor<3x3xf32>
linalg.yield %7 : f32		linalg.yield %7 : f32
} -> tensor<1x1x3xf32>		} -> tensor<1x1x3xf32>
return %2 : tensor<1x1x3xf32>		return %2 : tensor<1x1x3xf32>
}		}

// CHECK-LABEL: func.func @vectorize_nd_tensor_extract_constant_idx		// CHECK-LABEL: func.func @vectorize_nd_tensor_extract_constant_idx(
// CHECK-SAME: %[[ARG0:.*]]: tensor<3x3xf32>		// CHECK-SAME: %[[VAL_0:.*]]: tensor<3x3xf32>,
// CHECK-SAME: %[[ARG1:.*]]: tensor<1x1x3xf32>		// CHECK-SAME: %[[VAL_1:.*]]: tensor<1x1x3xf32>) -> tensor<1x1x3xf32> {
// CHECK: %[[MASK:.*]] = arith.constant dense<true> : vector<1x1x3xi1>		// CHECK: %[[VAL_2:.*]] = arith.constant 0 : index
// CHECK: %[[PASSTHRU:.*]] = arith.constant dense<0.000000e+00> : vector<1x1x3xf32>
// CHECK: %[[C0:.*]] = arith.constant 0 : index
// Magic "5" below comes from (1 * 3 + 2) (1: index into dim 1, 2: index into dim 2)		// Magic "5" below comes from (1 * 3 + 2) (1: index into dim 1, 2: index into dim 2)
// CHECK: %[[IDX:.*]] = arith.constant dense<5> : vector<1x1x3xindex>		// CHECK: %[[VAL_3:.*]] = arith.constant dense<5> : vector<1x1x3xindex>
// CHECK: %[[GATHER:.*]] = vector.gather %[[ARG0]][%[[C0]], %[[C0]]] [%[[IDX]]], %[[MASK]], %[[PASSTHRU]] : tensor<3x3xf32>, vector<1x1x3xindex>, vector<1x1x3xi1>, vector<1x1x3xf32> into vector<1x1x3xf32>		// CHECK: %[[VAL_4:.*]] = arith.constant 0 : i32
// CHECK: vector.transfer_write %[[GATHER]]		// CHECK: %[[VAL_5:.*]] = arith.constant dense<true> : vector<1xi1>
		// CHECK: %[[VAL_6:.*]] = arith.constant dense<0.000000e+00> : vector<1xf32>
		// CHECK: %[[VAL_7:.*]] = vector.shape_cast %[[VAL_3]] : vector<1x1x3xindex> to vector<3xindex>
		// CHECK: %[[VAL_8:.*]] = vector.extractelement %[[VAL_7]]{{\[}}%[[VAL_4]] : i32] : vector<3xindex>
		// CHECK: %[[VAL_9:.*]] = vector.broadcast %[[VAL_8]] : index to vector<1xindex>
		// CHECK: %[[VAL_10:.*]] = vector.gather %[[VAL_0]]{{\[}}%[[VAL_2]], %[[VAL_2]]] {{\[}}%[[VAL_9]]], %[[VAL_5]], %[[VAL_6]] : tensor<3x3xf32>, vector<1xindex>, vector<1xi1>, vector<1xf32> into vector<1xf32>
		// CHECK: %[[VAL_11:.*]] = vector.broadcast %[[VAL_10]] : vector<1xf32> to vector<1x1x3xf32>
		// CHECK: %[[VAL_12:.*]] = vector.transfer_write %[[VAL_11]], %[[VAL_1]]{{\[}}%[[VAL_2]], %[[VAL_2]], %[[VAL_2]]] {in_bounds = [true, true, true]} : vector<1x1x3xf32>, tensor<1x1x3xf32>
		// CHECK: return %[[VAL_12]] : tensor<1x1x3xf32>
// CHECK: }		// CHECK: }


transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !pdl.operation):		^bb1(%arg1: !pdl.operation):
%0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!pdl.operation) -> !pdl.operation		%0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!pdl.operation) -> !pdl.operation
%1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation		%1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation
%2 = transform.structured.vectorize %1 { vectorize_nd_extract }		%2 = transform.structured.vectorize %1 { vectorize_nd_extract }
}		}

// -----		// -----
[... 243 lines collapsed ...]
}		}

// First `tensor.extract` is a loop invariant scalar load. This way, the		// First `tensor.extract` is a loop invariant scalar load. This way, the
// following `tensor.extract` Op becomes a contiguous load (all other Ops used		// following `tensor.extract` Op becomes a contiguous load (all other Ops used
// for address calculation also satisfy the required conditions).		// for address calculation also satisfy the required conditions).
// TODO: Don't use vector.gather for the first tensor.extract.		// TODO: Don't use vector.gather for the first tensor.extract.

// CHECK-LABEL: func.func @vectorize_nd_tensor_extract_with_tensor_extract(		// CHECK-LABEL: func.func @vectorize_nd_tensor_extract_with_tensor_extract(
// CHECK-SAME: %[[VAL_0:.*]]: tensor<1x20xi32>,		// CHECK-SAME: %[[VAL_0:.*]]: tensor<1x20xi32>,
// CHECK-SAME: %[[VAL_1:.*]]: tensor<257x24xf32>,		// CHECK-SAME: %[[VAL_1:.*]]: tensor<257x24xf32>,
// CHECK-SAME: -> tensor<1x1x4xf32> {		// CHECK-SAME: %[[VAL_2:.*]]: index, %[[VAL_3:.*]]: index, %[[VAL_4:.*]]: index, %[[VAL_5:.*]]: index) -> tensor<1x1x4xf32> {
// CHECK-DAG: %[[VAL_6:.*]] = arith.constant dense<0> : vector<1x1x4xindex>		// CHECK: %[[VAL_6:.*]] = arith.constant dense<0> : vector<1x1x4xindex>
// CHECK-DAG: %[[VAL_7:.*]] = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>		// CHECK: %[[VAL_7:.*]] = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>
// CHECK-DAG: %[[VAL_8:.*]] = arith.constant dense<true> : vector<1x1x4xi1>		// CHECK: %[[VAL_8:.*]] = arith.constant 0 : index
// CHECK-DAG: %[[VAL_9:.*]] = arith.constant dense<0> : vector<1x1x4xi32>		// CHECK: %[[VAL_9:.*]] = arith.constant 0 : i32
// CHECK-DAG: %[[VAL_10:.*]] = arith.constant 0 : index		// CHECK: %[[VAL_10:.*]] = arith.constant dense<true> : vector<1xi1>
// CHECK-DAG: %[[VAL_11:.*]] = arith.constant dense<256> : vector<1x1x4xindex>		// CHECK: %[[VAL_11:.*]] = arith.constant dense<0> : vector<1xi32>
// CHECK-DAG: %[[VAL_12:.*]] = arith.constant 0 : i32		// CHECK: %[[VAL_12:.*]] = arith.constant dense<256> : vector<1x1x4xindex>
// CHECK-DAG: %[[VAL_13:.*]] = arith.constant 0.000000e+00 : f32		// CHECK: %[[VAL_13:.*]] = arith.constant 0.000000e+00 : f32
// CHECK: %[[VAL_14:.*]] = tensor.empty() : tensor<1x1x4xf32>		// CHECK: %[[VAL_14:.*]] = tensor.empty() : tensor<1x1x4xf32>
// CHECK: %[[VAL_15:.*]] = vector.broadcast %{{.*}} : index to vector<1x1x4xindex>		// CHECK: %[[VAL_15:.*]] = vector.broadcast %[[VAL_2]] : index to vector<1x1x4xindex>
// CHECK: %[[VAL_16:.*]] = vector.broadcast %{{.*}} : index to vector<1x1x4xindex>		// CHECK: %[[VAL_16:.*]] = vector.broadcast %[[VAL_4]] : index to vector<1x1x4xindex>
// CHECK: %[[VAL_17:.*]] = arith.addi %[[VAL_15]], %[[VAL_16]] : vector<1x1x4xindex>		// CHECK: %[[VAL_17:.*]] = arith.addi %[[VAL_15]], %[[VAL_16]] : vector<1x1x4xindex>
// CHECK: %[[VAL_18:.*]] = vector.broadcast %{{.*}} : index to vector<1x1x4xindex>		// CHECK: %[[VAL_18:.*]] = vector.broadcast %[[VAL_3]] : index to vector<1x1x4xindex>
// CHECK: %[[VAL_19:.*]] = vector.broadcast %[[VAL_7]] : vector<4xindex> to vector<1x1x4xindex>		// CHECK: %[[VAL_19:.*]] = vector.broadcast %[[VAL_7]] : vector<4xindex> to vector<1x1x4xindex>
// CHECK: %[[VAL_20:.*]] = arith.addi %[[VAL_18]], %[[VAL_19]] : vector<1x1x4xindex>		// CHECK: %[[VAL_20:.*]] = arith.addi %[[VAL_18]], %[[VAL_19]] : vector<1x1x4xindex>
// CHECK: %[[VAL_21:.*]] = vector.broadcast %{{.*}} : index to vector<1x1x4xindex>		// CHECK: %[[VAL_21:.*]] = vector.broadcast %[[VAL_5]] : index to vector<1x1x4xindex>
// CHECK: %[[VAL_22:.*]] = arith.addi %[[VAL_20]], %[[VAL_21]] : vector<1x1x4xindex>		// CHECK: %[[VAL_22:.*]] = arith.addi %[[VAL_20]], %[[VAL_21]] : vector<1x1x4xindex>
// CHECK: %[[VAL_23:.*]] = vector.gather %[[VAL_0]]{{\[}}%[[VAL_10]], %[[VAL_10]]] {{\[}}%[[VAL_17]]], %[[VAL_8]], %[[VAL_9]] : tensor<1x20xi32>, vector<1x1x4xindex>, vector<1x1x4xi1>, vector<1x1x4xi32> into vector<1x1x4xi32>		// CHECK: %[[VAL_23:.*]] = vector.shape_cast %[[VAL_17]] : vector<1x1x4xindex> to vector<4xindex>
// CHECK: %[[VAL_24:.*]] = arith.index_cast %[[VAL_23]] : vector<1x1x4xi32> to vector<1x1x4xindex>		// CHECK: %[[VAL_24:.*]] = vector.extractelement %[[VAL_23]]{{\[}}%[[VAL_9]] : i32] : vector<4xindex>
// CHECK: %[[VAL_25:.*]] = arith.maxsi %[[VAL_24]], %[[VAL_6]] : vector<1x1x4xindex>		// CHECK: %[[VAL_25:.*]] = vector.broadcast %[[VAL_24]] : index to vector<1xindex>
// CHECK: %[[VAL_26:.*]] = arith.minsi %[[VAL_25]], %[[VAL_11]] : vector<1x1x4xindex>		// CHECK: %[[VAL_26:.*]] = vector.gather %[[VAL_0]]{{\[}}%[[VAL_8]], %[[VAL_8]]] {{\[}}%[[VAL_25]]], %[[VAL_10]], %[[VAL_11]] : tensor<1x20xi32>, vector<1xindex>, vector<1xi1>, vector<1xi32> into vector<1xi32>
// CHECK: %[[VAL_27:.*]] = vector.shape_cast %[[VAL_26]] : vector<1x1x4xindex> to vector<4xindex>		// CHECK: %[[VAL_27:.*]] = arith.index_cast %[[VAL_26]] : vector<1xi32> to vector<1xindex>
// CHECK: %[[VAL_28:.*]] = vector.extractelement %[[VAL_27]]{{\[}}%[[VAL_12]] : i32] : vector<4xindex>		// CHECK: %[[VAL_28:.*]] = vector.broadcast %[[VAL_27]] : vector<1xindex> to vector<1x1x4xindex>
// CHECK: %[[VAL_29:.*]] = vector.shape_cast %[[VAL_22]] : vector<1x1x4xindex> to vector<4xindex>		// CHECK: %[[VAL_29:.*]] = arith.maxsi %[[VAL_28]], %[[VAL_6]] : vector<1x1x4xindex>
// CHECK: %[[VAL_30:.*]] = vector.extractelement %[[VAL_29]]{{\[}}%[[VAL_12]] : i32] : vector<4xindex>		// CHECK: %[[VAL_30:.*]] = arith.minsi %[[VAL_29]], %[[VAL_12]] : vector<1x1x4xindex>
// CHECK: %[[VAL_31:.*]] = vector.transfer_read %[[VAL_1]]{{\[}}%[[VAL_28]], %[[VAL_30]]], %[[VAL_13]] {in_bounds = [true, true]} : tensor<257x24xf32>, vector<1x4xf32>		// CHECK: %[[VAL_31:.*]] = vector.shape_cast %[[VAL_30]] : vector<1x1x4xindex> to vector<4xindex>
// CHECK: %[[VAL_32:.*]] = vector.broadcast %[[VAL_31]] : vector<1x4xf32> to vector<1x1x4xf32>		// CHECK: %[[VAL_32:.*]] = vector.extractelement %[[VAL_31]]{{\[}}%[[VAL_9]] : i32] : vector<4xindex>
// CHECK: %[[VAL_33:.*]] = vector.transfer_write %[[VAL_32]], %[[VAL_14]]{{\[}}%[[VAL_10]], %[[VAL_10]], %[[VAL_10]]] {in_bounds = [true, true, true]} : vector<1x1x4xf32>, tensor<1x1x4xf32>		// CHECK: %[[VAL_33:.*]] = vector.shape_cast %[[VAL_22]] : vector<1x1x4xindex> to vector<4xindex>
// CHECK: return %[[VAL_33]] : tensor<1x1x4xf32>		// CHECK: %[[VAL_34:.*]] = vector.extractelement %[[VAL_33]]{{\[}}%[[VAL_9]] : i32] : vector<4xindex>
		// CHECK: %[[VAL_35:.*]] = vector.transfer_read %[[VAL_1]]{{\[}}%[[VAL_32]], %[[VAL_34]]], %[[VAL_13]] {in_bounds = [true, true]} : tensor<257x24xf32>, vector<1x4xf32>
		// CHECK: %[[VAL_36:.*]] = vector.broadcast %[[VAL_35]] : vector<1x4xf32> to vector<1x1x4xf32>
		// CHECK: %[[VAL_37:.*]] = vector.transfer_write %[[VAL_36]], %[[VAL_14]]{{\[}}%[[VAL_8]], %[[VAL_8]], %[[VAL_8]]] {in_bounds = [true, true, true]} : vector<1x1x4xf32>, tensor<1x1x4xf32>
		// CHECK: return %[[VAL_37]] : tensor<1x1x4xf32>
// CHECK: }		// CHECK: }

transform.sequence failures(propagate) {		transform.sequence failures(propagate) {
^bb1(%arg1: !pdl.operation):		^bb1(%arg1: !pdl.operation):
%0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!pdl.operation) -> !pdl.operation		%0 = transform.structured.match ops{["linalg.generic"]} in %arg1 : (!pdl.operation) -> !pdl.operation
%1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation		%1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation
%2 = transform.structured.vectorize %1 { vectorize_nd_extract }		%2 = transform.structured.vectorize %1 { vectorize_nd_extract }
}		}
[... 1,023 lines collapsed ...]