The default lowering of vector transpose operations generates a large sequence of
scalar extract/insert operations, one pair for each scalar element in the input vector.
In other words, the vector transpose is scalarized. However, there are transpose
patterns where one or more adjacent high-order dimensions are not transposed (for
example, in the transpose pattern [1, 0, 2, 3], dimensions 2 and 3 are not transposed).
This patch improves the lowering of those cases by extracting/inserting full n-D
vectors instead, where 'n' is the number of adjacent high-order dimensions not being
transposed. By doing so, we avoid scalarizing the code and generate a more performant
vector version.
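As a rough illustration of the idea (a hedged Python sketch, not the actual MLIR lowering; the function name and nested-list representation are made up for this example), a [1, 0, 2, 3] transpose only needs to move whole 2-D inner blocks, so the number of extract/insert pairs drops from d0*d1*d2*d3 scalars to d0*d1 sub-vectors:

```python
# Hypothetical sketch: for permutation [1, 0, 2, 3], dimensions 2 and 3
# are untouched, so each move copies a full 2-D sub-vector.
def transpose_1023(v):
    """v is a nested list of shape (d0, d1, d2, d3)."""
    d0, d1 = len(v), len(v[0])
    out = [[None] * d0 for _ in range(d1)]
    for i in range(d0):
        for j in range(d1):
            # One extract/insert of a whole 2-D block per (i, j) pair:
            # d0*d1 moves instead of d0*d1*d2*d3 scalar extract/inserts.
            out[j][i] = v[i][j]
    return out

v = [[[[(i, j, k, l) for l in range(5)] for k in range(4)]
      for j in range(3)] for i in range(2)]
t = transpose_1023(v)
assert all(t[j][i] == v[i][j] for i in range(2) for j in range(3))
```

For a 2x3x4x5 input this performs 6 block moves rather than 120 scalar extract/insert pairs.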
Paradoxically, this patch shouldn't improve the performance of transpose operations when
targeting LLVM: the LLVM pipeline is able to optimize away some of the extract/insert
operations, and the SLP vectorizer converts the scalar operations back to their vector
form. However, scalarizing a vector version of the code in MLIR and relying on the SLP
vectorizer to reconstruct the vector code is highly undesirable for several reasons.
Side note: have you looked into generalizing the shuffle based approach?
I think most of the code below could go away if the shuffle approach works well enough.
Basically the idea is:
vector n-D -> vector 1-D via shape cast -> shuffle -> transposed vector n-D via shape cast.
The shuffle mask is a simple (linear scan -> delinearize -> transpose -> linearize) for each element.
This works very well in 2-D, but I have not tried it in higher dimensions myself as there was no concrete use case.
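The mask computation described above can be sketched as follows (a hedged Python illustration under row-major linearization; the function name is made up, and the permutation convention assumed is that result dimension d corresponds to source dimension perm[d], as in `vector.transpose`):

```python
def transpose_shuffle_mask(src_shape, perm):
    """Build the 1-D shuffle mask realizing an n-D transpose:
    for each result element (linear scan), delinearize its position
    in the result shape, permute the coordinates back to source
    order, and linearize in the source shape."""
    res_shape = [src_shape[p] for p in perm]
    total = 1
    for d in src_shape:
        total *= d
    mask = []
    for lin in range(total):
        # Delinearize lin in the result shape (row-major).
        coords, rem = [], lin
        for d in reversed(res_shape):
            coords.append(rem % d)
            rem //= d
        coords.reverse()
        # Result dim d reads source dim perm[d]: src[perm[d]] = res[d].
        src_coords = [0] * len(perm)
        for d, p in enumerate(perm):
            src_coords[p] = coords[d]
        # Linearize in the source shape.
        s = 0
        for c, d in zip(src_coords, src_shape):
            s = s * d + c
        mask.append(s)
    return mask

# 2x3 -> 3x2 transpose: the mask gathers column-by-column.
assert transpose_shuffle_mask([2, 3], [1, 0]) == [0, 3, 1, 4, 2, 5]
```

Applying this mask to the flattened source (e.g. [0, 1, 2, 3, 4, 5] for a row-major 2x3) yields the flattened transpose, which the final shape cast then reinterprets as the n-D result.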