This is an archive of the discontinued LLVM Phabricator instance.

Differential D94115

[mlir] Add hoisting transformation for transfer ops on tensor
Closed, Public

Authored by ThomasRaoux on Jan 5 2021, 1:12 PM.

Details

Reviewers

nicolasvasilache
aartbik

Commits

rGefd05040e13e: [mlir] Add hoisting transformation for transfer ops on tensor

Summary

Add the same hoisting transformation that already exists for transfer ops on buffers, this time for transfer ops on tensors. The logic is significantly different, so it is implemented as a separate transformation, and users are expected to know which transformation to use based on their flow.
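
To make the intent concrete, here is a minimal, self-contained C++ sketch of what hoisting a read/write pair out of a loop means for tensors with value semantics. It is not MLIR code: `ToyTensor`, `toyRead`, `toyWrite`, `toyUse`, `loopBefore` and `loopAfter` are hypothetical names invented for this illustration, and a `std::vector<float>` stands in for both tensors and vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a tensor value. "Writes" return a new tensor instead of
// mutating in place, mimicking value semantics.
using ToyTensor = std::vector<float>;

// Models a transfer_read of `width` elements starting at `offset`.
std::vector<float> toyRead(const ToyTensor &t, std::size_t offset,
                           std::size_t width) {
  assert(offset + width <= t.size());
  return std::vector<float>(t.begin() + offset, t.begin() + offset + width);
}

// Models a transfer_write on a tensor: takes the tensor by value and returns
// the updated copy.
ToyTensor toyWrite(const std::vector<float> &v, ToyTensor t,
                   std::size_t offset) {
  assert(offset + v.size() <= t.size());
  for (std::size_t i = 0; i < v.size(); ++i)
    t[offset + i] = v[i];
  return t;
}

// Placeholder for the computation in the loop body.
std::vector<float> toyUse(std::vector<float> v) {
  for (float &x : v)
    x = x * 2.0f + 1.0f;
  return v;
}

// Before hoisting: the tensor is loop-carried and is read and written at the
// same loop-invariant offset on every iteration.
ToyTensor loopBefore(ToyTensor t, int trips, std::size_t offset,
                     std::size_t width) {
  for (int i = 0; i < trips; ++i) {
    std::vector<float> r = toyRead(t, offset, width);
    t = toyWrite(toyUse(r), t, offset);
  }
  return t;
}

// After hoisting: read once before the loop, carry the vector through the
// loop, and write once after the loop.
ToyTensor loopAfter(ToyTensor t, int trips, std::size_t offset,
                    std::size_t width) {
  std::vector<float> carried = toyRead(t, offset, width);
  for (int i = 0; i < trips; ++i)
    carried = toyUse(carried);
  return toyWrite(carried, t, offset);
}
```

Both functions should return the same tensor for any trip count, provided nothing else in the loop touches the written slice; that proviso is what the use checks in the patch are meant to establish.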

Diff Detail

Event Timeline

ThomasRaoux created this revision. Jan 5 2021, 1:12 PM

Herald added subscribers: mravishankar, teijeong, rdzhabarov and 14 others. Jan 5 2021, 1:12 PM

ThomasRaoux requested review of this revision. Jan 5 2021, 1:12 PM

Herald added a project: Restricted Project. Jan 5 2021, 1:12 PM

Herald added a subscriber: stephenneuendorffer.

ThomasRaoux updated this revision to Diff 314778. Jan 5 2021, 7:19 PM

nicolasvasilache accepted this revision. Jan 6 2021, 8:21 AM

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/Vector/VectorUtils.h
171	`xxxIndices`
mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp
84	`significantly`
94	`removed by the`
100	`forOp.getTerminator()`
114	please hoist into a static helper function
126	Same, can we please isolate in a helper function and document each subcase?
132	`// Skip the candidate use, only inspect the "other" uses.`
135	`// Consider all transitive uses through a vector.transfer_write.`
139	`// Consider all nested uses through an scf::ForOp.`
148	This case looks like it could fold into the previous `scf::ForOp` by considering both the uses of the corresponding BBArg as well as the uses of the corresponding result?
mlir/lib/Dialect/Vector/VectorUtils.cpp
351	`xxxIndices`

This revision is now accepted and ready to land. Jan 6 2021, 8:21 AM

aartbik added inline comments. Jan 6 2021, 9:38 AM

mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h
38	perhaps add the "on buffers" restriction explicitly in the doc of L36 method
mlir/include/mlir/Dialect/Vector/VectorUtils.h
171	typo: cies

Address review feedback.

ThomasRaoux marked 10 inline comments as done. Jan 6 2021, 11:28 AM

ThomasRaoux added inline comments.

mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h
38	Good point. I updated the comment above.
mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp
100	Looks like ForOp doesn't have such a method but there is a simpler way indeed. Replaced it with `forOp.getBody()->getTerminator()`
114	Makes sense, moved the logic and the one below into two helper functions.
148	The problem is that it wouldn't work for the case where we have several nested ForOp where the tensor is just pass-through. Since I rely on separate canonicalization to remove those I need to handle this case if I want to support hoisting out of nested loops.
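
The use check discussed in the comments on lines 126 to 148 can be pictured as a worklist walk over transitive uses. The following is a toy sketch only, not the code from the patch: `ToyUserKind`, `ToyUser`, `ToyUseGraph` and `toyAllOtherUsesAreSafe` are hypothetical names, tensor values are plain integer ids, and the `scf::ForOp` and `scf.yield` cases from the patch are left out for brevity.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Toy classification of an operation that uses a tensor value.
enum class ToyUserKind { DisjointRead, OverlappingRead, Write, Other };

struct ToyUser {
  ToyUserKind kind;
  int result; // id of the tensor produced by a Write user; unused otherwise
};

// Maps each tensor value id to the operations using it.
using ToyUseGraph = std::map<int, std::vector<ToyUser>>;

// Returns true if every transitive use of `root` is either a read proven
// disjoint from the candidate write, or a write whose result is itself only
// used in that way. Any other use makes the candidate unsafe to hoist.
bool toyAllOtherUsesAreSafe(const ToyUseGraph &graph, int root) {
  std::vector<int> worklist;
  worklist.push_back(root);
  std::set<int> visited; // guards against revisiting a value
  while (!worklist.empty()) {
    int value = worklist.back();
    worklist.pop_back();
    if (!visited.insert(value).second)
      continue;
    ToyUseGraph::const_iterator it = graph.find(value);
    if (it == graph.end())
      continue; // value has no users
    for (const ToyUser &user : it->second) {
      switch (user.kind) {
      case ToyUserKind::Write:
        // Follow the new tensor produced by the write.
        assert(user.result >= 0 && "Write users must name a result value");
        worklist.push_back(user.result);
        break;
      case ToyUserKind::DisjointRead:
        break; // provably does not observe the written slice
      case ToyUserKind::OverlappingRead:
      case ToyUserKind::Other:
        return false; // unknown or conflicting use
      }
    }
  }
  return true;
}
```

In this model a single unknown use anywhere along the chain rejects the candidate, which is the conservative behavior the `unknownUse` flag in the patch appears to implement.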

Closed by commit rGefd05040e13e: [mlir] Add hoisting transformation for transfer ops on tensor (authored by ThomasRaoux). Jan 6 2021, 2:24 PM

This revision was automatically updated to reflect the committed changes.

ThomasRaoux marked an inline comment as done.

ThomasRaoux added a commit: rGefd05040e13e: [mlir] Add hoisting transformation for transfer ops on tensor.

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h	4 lines
mlir/include/mlir/Dialect/Vector/VectorUtils.h	6 lines
mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp	122 lines
mlir/lib/Dialect/Vector/VectorUtils.cpp	13 lines
mlir/test/Dialect/Linalg/hoisting.mlir	166 lines
mlir/test/lib/Transforms/TestLinalgHoisting.cpp	1 line

Diff 314704

mlir/include/mlir/Dialect/Linalg/Transforms/Hoisting.h

	(29 lines hidden)
	/// the read across the loop)
	/// To improve hoisting opportunities, call the `moveLoopInvariantCode` helper
	/// function on the candidate loop above which to hoist. Hoisting the transfers
	/// results in scf::ForOp yielding the value that originally transited through
	/// memory.
	// TODO: generalize on a per-need basis.
	void hoistRedundantVectorTransfers(FuncOp func);

				/// Same behavior as `hoistRedundantVectorTransfers` but works on tensors
				aartbik: perhaps add the "on buffers" restriction explicitly in the doc of L36 method
				ThomasRaoux: Good point. I updated the comment above.
				/// instead of buffers.
				void hoistRedundantVectorTransfersOnTensor(FuncOp func);

	} // namespace linalg
	} // namespace mlir

	#endif // MLIR_DIALECT_LINALG_TRANSFORMS_HOISTING_H_

mlir/include/mlir/Dialect/Vector/VectorUtils.h

	(159 lines hidden)
	AffineMap getTransferMinorIdentityMap(ShapedType shapedType,
	VectorType vectorType);

	/// Return true if we can prove that the transfer operations access disjoint
	/// memory.
	bool isDisjointTransferSet(VectorTransferOpInterface transferA,
	VectorTransferOpInterface transferB);

				/// Same behavior as `isDisjointTransferSet` but doesn't require the operations
				/// to have the same tensor/memref. This allows comparing operations accessing
				/// different tensors.
				bool isDisjointTransferIndicies(VectorTransferOpInterface transferA,
				nicolasvasilache: `xxxIndices`
				aartbik: typo: cies
				VectorTransferOpInterface transferB);

	namespace matcher {

	/// Matches vector.transfer_read, vector.transfer_write and ops that return a
	/// vector type that is a multiple of the sub-vector type. This allows passing
	/// over other smaller vector types in the function and avoids interfering with
	/// operations on those.
	/// This is a first approximation, it can easily be extended in the future.
	/// TODO: this could all be much simpler if we added a bit that a vector type to
	/// mark that a vector is a strict super-vector but it still does not warrant
	/// adding even 1 extra bit in the IR for now.
	bool operatesOnSuperVectorsOf(Operation &op, VectorType subVectorType);

	} // end namespace matcher
	} // end namespace mlir

	#endif // MLIR_DIALECT_VECTOR_VECTORUTILS_H_

mlir/lib/Dialect/Linalg/Transforms/Hoisting.cpp

(75 lines hidden; enclosing context: `func.walk([&changed](Operation *op) {`)
loop.moveOutOfLoop({op});
else // Move DeallocOp outside of the loop.
op->moveAfter(loop);
changed = true;
});
}
}

		// To hoist transfer op on tensor the logic can be significanlty simplified
		nicolasvasilache: `significantly`
		// compared to the case on buffer. The transformation follows this logic:
		// 1. Look for transfer_write with a single use from ForOp yield
		// 2. Check the uses of the matching block argument and look for a transfer_read
		// with the same indices.
		// 3. Check that all the other uses of the tensor argument are either disjoint
		// tensor_read or transfer_write. For transfer_write uses recurse to make sure
		// the new tensor has the same restrictions on its uses.
		// 4. Hoist the tensor_read/tensor_write and update the tensor SSA links.
		// After this transformation the scf.forOp may have unused arguments that can be
		// remove by canonicalization pass.
		nicolasvasilache: `removed by the`
		void mlir::linalg::hoistRedundantVectorTransfersOnTensor(FuncOp func) {
		bool changed = true;
		while (changed) {
		changed = false;
		func.walk([&](scf::ForOp forOp) {
		Operation *yield = forOp.getLoopBody().getBlocks().back().getTerminator();
		nicolasvasilache: `forOp.getTerminator()`
		ThomasRaoux: Looks like ForOp doesn't have such a method but there is a simpler way indeed. Replaced it with `forOp.getBody()->getTerminator()`
		for (auto it : llvm::enumerate(forOp.getRegionIterArgs())) {
		Value ret = yield->getOperand(it.index());
		auto write = ret.getDefiningOp<vector::TransferWriteOp>();
		if (!write || !write->hasOneUse())
		continue;
		LLVM_DEBUG(DBGS() << "Candidate write for hoisting: "
		<< *write.getOperation() << "\n");
		if (llvm::any_of(write.indices(), [&forOp](Value index) {
		return !forOp.isDefinedOutsideOfLoop(index);
		}))
		continue;
		// Find a read with the same type and indices.
		vector::TransferReadOp matchingRead;
		for (Operation *user : it.value().getUsers()) {
		nicolasvasilache: please hoist into a static helper function
		ThomasRaoux: Makes sense, moved the logic and the one below into two helper functions.
		auto read = dyn_cast<vector::TransferReadOp>(user);
		if (read && read.indices() == write.indices() &&
		read.getVectorType() == write.getVectorType()) {
		matchingRead = read;
		break;
		}
		}
		if (!matchingRead)
		continue;
		// Make sure none of the other uses read the part of the tensor modified
		// by the transfer_write.
		llvm::SmallVector<Value::use_range, 1> uses;
		nicolasvasilache: Same, can we please isolate in a helper function and document each subcase?
		uses.push_back(it.value().getUses());
		bool unknownUse = false;
		while (!uses.empty()) {
		for (OpOperand &use : uses.pop_back_val()) {
		Operation *user = use.getOwner();
		if (user == matchingRead.getOperation() ||
		nicolasvasilache: `// Skip the candidate use, only inspect the "other" uses.`
		user == write.getOperation())
		continue;
		if (auto writeUser = dyn_cast<vector::TransferWriteOp>(user)) {
		nicolasvasilache: `// Consider all transitive uses through a vector.transfer_write.`
		uses.push_back(writeUser->getResult(0).getUses());
		continue;
		}
		if (auto forUser = dyn_cast<scf::ForOp>(user)) {
		nicolasvasilache: `// Consider all nested uses through an scf::ForOp.`
		Value arg = forUser.getLoopBody().getArgument(
		use.getOperandNumber() - forOp.getNumControlOperands() +
		/*iv value*/ 1);
		uses.push_back(arg.getUses());
		continue;
		}
		// Follow the use yield as long as it doesn't escape the original
		// region.
		scf::YieldOp yieldUser = dyn_cast<scf::YieldOp>(user);
		nicolasvasilache: This case looks like it could fold into the previous `scf::ForOp` by considering both the uses of the corresponding BBArg as well as the uses of the corresponding result?
		ThomasRaoux: The problem is that it wouldn't work for the case where we have several nested ForOp where the tensor is just pass-through. Since I rely on separate canonicalization to remove those I need to handle this case if I want to support hoisting out of nested loops.
		if (yieldUser &&
		write->getParentOp()->isAncestor(yieldUser->getParentOp())) {
		Value ret =
		yieldUser->getParentOp()->getResult(use.getOperandNumber());
		uses.push_back(ret.getUses());
		continue;
		}
		auto read = dyn_cast<vector::TransferReadOp>(user);
		if (!read ||
		!isDisjointTransferIndicies(
		cast<VectorTransferOpInterface>(read.getOperation()),
		cast<VectorTransferOpInterface>(write.getOperation()))) {
		unknownUse = true;
		break;
		}
		}
		}
		if (unknownUse)
		continue;
		// Hoist read before.
		if (failed(forOp.moveOutOfLoop({matchingRead})))
		llvm_unreachable(
		"Unexpected failure to move transfer read out of loop");
		// Update the source tensor.
		matchingRead->setOperand(0, forOp.initArgs()[it.index()]);

		// Hoist write after.
		write->moveAfter(forOp);
		yield->setOperand(it.index(), write.source());

		// Rewrite `loop` with new yields by cloning and erase the original
		// loop.
		OpBuilder b(matchingRead);
		auto newForOp =
		cloneWithNewYields(b, forOp, matchingRead.vector(), write.vector());

		// Transfer write has been hoisted, need to update the vector and tensor
		// source. Replace the result of the loop to use the new tensor created
		// outside the loop.
		newForOp.getResult(it.index()).replaceAllUsesWith(write.getResult(0));
		write.setOperand(0, newForOp.getResults().back());
		write.setOperand(1, newForOp.getResult(it.index()));

		changed = true;
		forOp.erase();
		// Need to interrupt and restart because erasing the loop messes up the
		// walk.
		return WalkResult::interrupt();
		}
		return WalkResult::advance();
		});
		}
		}
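
The driver structure above (loop until no change, and abandon the current walk as soon as something is erased) is a common fixed-point pattern. Below is a small standalone C++ sketch of that pattern on a plain `std::vector<int>`, not on MLIR IR. The names `toyRewriteOnce` and `toyRunToFixedPoint` and the "merge adjacent equal values" rewrite are assumptions chosen purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One traversal: apply the first applicable rewrite, then stop. Stopping
// immediately plays the role of WalkResult::interrupt(): after erasing an
// element, the positions used by the traversal are no longer trustworthy.
bool toyRewriteOnce(std::vector<int> &ops) {
  for (std::size_t i = 0; i + 1 < ops.size(); ++i) {
    if (ops[i] == ops[i + 1]) {
      ops[i] += ops[i + 1];           // build the replacement
      ops.erase(ops.begin() + i + 1); // erase the now-redundant element
      return true;                    // interrupt; caller restarts the walk
    }
  }
  return false; // full traversal with no change
}

// Restart the traversal until a full pass makes no change. Returns the
// number of rewrites applied.
int toyRunToFixedPoint(std::vector<int> &ops) {
  assert(!ops.empty());
  int rewrites = 0;
  bool changed = true;
  while (changed) {
    changed = toyRewriteOnce(ops);
    if (changed)
      ++rewrites;
  }
  return rewrites;
}
```

Restarting after each rewrite is simple rather than fast, but it also lets one rewrite expose the next, in the same way that hoisting out of an inner loop can make the enclosing loop a new candidate.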

void mlir::linalg::hoistRedundantVectorTransfers(FuncOp func) {
bool changed = true;
while (changed) {
changed = false;

func.walk([&](vector::TransferReadOp transferRead) {
		if (!transferRead.getShapedType().isa<MemRefType>())
		return WalkResult::advance();

LLVM_DEBUG(DBGS() << "Candidate for hoisting: "
<< *transferRead.getOperation() << "\n");
auto loop = dyn_cast<scf::ForOp>(transferRead->getParentOp());
LLVM_DEBUG(DBGS() << "Parent op: " << *transferRead->getParentOp()
<< "\n");
if (!loop)
return WalkResult::advance();

(100 lines hidden)

mlir/lib/Dialect/Vector/VectorUtils.cpp

(306 lines hidden; enclosing context: `bool matcher::operatesOnSuperVectorsOf(Operation &op,`)
// between parallel, reduction and possibly other cases.
if (!ratio.hasValue()) {
return false;
}

return true;
}

bool mlir::isDisjointTransferSet(VectorTransferOpInterface transferA,		bool mlir::isDisjointTransferIndicies(VectorTransferOpInterface transferA,
VectorTransferOpInterface transferB) {		VectorTransferOpInterface transferB) {
if (transferA.source() != transferB.source())
return false;
// For simplicity only look at transfer of same type.		// For simplicity only look at transfer of same type.
if (transferA.getVectorType() != transferB.getVectorType())		if (transferA.getVectorType() != transferB.getVectorType())
return false;		return false;
unsigned rankOffset = transferA.getLeadingShapedRank();		unsigned rankOffset = transferA.getLeadingShapedRank();
for (unsigned i = 0, e = transferA.indices().size(); i < e; i++) {		for (unsigned i = 0, e = transferA.indices().size(); i < e; i++) {
auto indexA = transferA.indices()[i].getDefiningOp<ConstantOp>();		auto indexA = transferA.indices()[i].getDefiningOp<ConstantOp>();
auto indexB = transferB.indices()[i].getDefiningOp<ConstantOp>();		auto indexB = transferB.indices()[i].getDefiningOp<ConstantOp>();
// If any of the indices are dynamic we cannot prove anything.		// If any of the indices are dynamic we cannot prove anything.
(13 lines hidden; enclosing context: `if (i < rankOffset) {`)
std::abs(indexA.getValue().cast<IntegerAttr>().getInt() -		std::abs(indexA.getValue().cast<IntegerAttr>().getInt() -
indexB.getValue().cast<IntegerAttr>().getInt());		indexB.getValue().cast<IntegerAttr>().getInt());
if (distance >= transferA.getVectorType().getDimSize(i - rankOffset))		if (distance >= transferA.getVectorType().getDimSize(i - rankOffset))
return true;		return true;
}		}
}		}
return false;		return false;
}		}

		bool mlir::isDisjointTransferSet(VectorTransferOpInterface transferA,
		VectorTransferOpInterface transferB) {
		if (transferA.source() != transferB.source())
		return false;
		return isDisjointTransferIndicies(transferA, transferB);
		nicolasvasilache: `xxxIndices`
		}
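
As a rough standalone model of the index-disjointness test, the sketch below works on plain integers instead of MLIR values. `ToyIndex`, `ToyTransfer` and `toyIsDisjointTransferIndices` are hypothetical names. Two things are assumptions rather than facts taken from this diff: the leading-dimension rule (differing constant indices prove disjointness) sits in the 13 hidden lines and is guessed here, and the leading rank is taken to be the number of indices minus the vector rank.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// An index is either a known constant or dynamic (unknown at compile time).
struct ToyIndex {
  bool isConstant;
  int64_t value; // meaningful only when isConstant is true
};

struct ToyTransfer {
  std::vector<ToyIndex> indices;    // one index per tensor dimension
  std::vector<int64_t> vectorShape; // shape of the transferred vector
};

// Returns true only when the two accesses can be proven not to overlap.
bool toyIsDisjointTransferIndices(const ToyTransfer &a, const ToyTransfer &b) {
  // For simplicity only compare transfers of the same vector type.
  if (a.vectorShape != b.vectorShape || a.indices.size() != b.indices.size())
    return false;
  assert(a.vectorShape.size() <= a.indices.size());
  // ASSUMPTION: leading rank = number of indices - vector rank.
  const std::size_t rankOffset = a.indices.size() - a.vectorShape.size();
  for (std::size_t i = 0; i < a.indices.size(); ++i) {
    const ToyIndex &ia = a.indices[i];
    const ToyIndex &ib = b.indices[i];
    // If either index is dynamic, nothing can be proven for this dimension.
    if (!ia.isConstant || !ib.isConstant)
      continue;
    if (i < rankOffset) {
      // ASSUMPTION (hidden in the diff): different constant indices on a
      // leading dimension select different slices.
      if (ia.value != ib.value)
        return true;
    } else {
      // On a vector dimension the accessed intervals must not overlap.
      const int64_t distance =
          ia.value > ib.value ? ia.value - ib.value : ib.value - ia.value;
      if (distance >= a.vectorShape[i - rankOffset])
        return true;
    }
  }
  return false;
}
```

Under these assumptions the model agrees with the disjoint-tensor test in hoisting.mlir: `[%c0, %c0]` and `[%c0, %c3]` with `vector<3xf32>` are disjoint, `[%c0, %c0]` and `[%c0, %c1]` with `vector<2xf32>` are not, and `[%c0, %random_index]` and `[%c1, %random_index]` are disjoint through the leading dimension.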

mlir/test/Dialect/Linalg/hoisting.mlir

(224 lines hidden; enclosing context: `scf.for %j = %lb to %ub step %step {`)
vector.transfer_write %u30, %memref3[%c0, %random_index] : vector<4xf32>, memref<?x?xf32>
vector.transfer_write %u31, %memref3[%c1, %random_index] : vector<4xf32>, memref<?x?xf32>
vector.transfer_write %u10, %memref0[%i, %i] : vector<2xf32>, memref<?x?xf32>
vector.transfer_write %u11, %memref0[%random_index, %random_index] : vector<2xf32>, memref<?x?xf32>
}
}
return
}

		// VECTOR_TRANSFERS-LABEL: func @hoist_vector_transfer_pairs_tensor
		func @hoist_vector_transfer_pairs_tensor(
		%tensor0: tensor<?x?xf32>, %tensor1: tensor<?x?xf32>, %tensor2: tensor<?x?xf32>,
		%tensor3: tensor<?x?xf32>, %tensor4: tensor<?x?xf32>, %tensor5: tensor<?x?xf32>,
		%val: index, %lb : index, %ub : index, %step: index) ->
		(tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
		tensor<?x?xf32>, tensor<?x?xf32>) {
		%c0 = constant 0 : index
		%cst = constant 0.0 : f32

		// VECTOR_TRANSFERS: vector.transfer_read %{{.*}} : tensor<?x?xf32>, vector<1xf32>
		// VECTOR_TRANSFERS: scf.for {{.*}} iter_args({{.*}}) ->
		// VECTOR_TRANSFERS-SAME: (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, vector<1xf32>) {
		// VECTOR_TRANSFERS: vector.transfer_read %{{.*}} : tensor<?x?xf32>, vector<2xf32>
		// VECTOR_TRANSFERS: scf.for {{.*}} iter_args({{.*}}) ->
		// VECTOR_TRANSFERS-SAME: (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, vector<2xf32>, vector<1xf32>) {
		// VECTOR_TRANSFERS: vector.transfer_read %{{.*}} : tensor<?x?xf32>, vector<3xf32>
		// VECTOR_TRANSFERS: vector.transfer_read %{{.*}} : tensor<?x?xf32>, vector<4xf32>
		// VECTOR_TRANSFERS: "some_crippling_use"(%{{.*}}) : (tensor<?x?xf32>) -> ()
		// VECTOR_TRANSFERS: vector.transfer_read %{{.*}} : tensor<?x?xf32>, vector<5xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<1xf32>) -> vector<1xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<2xf32>) -> vector<2xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (tensor<?x?xf32>) -> vector<3xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<4xf32>) -> vector<4xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<5xf32>) -> vector<5xf32>
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}} : vector<3xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}} : vector<4xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}} : vector<5xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: "some_crippling_use"(%{{.*}}) : (tensor<?x?xf32>) -> ()
		// VECTOR_TRANSFERS: scf.yield {{.*}} :
		// VECTOR_TRANSFERS-SAME: tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, vector<2xf32>, vector<1xf32>
		// VECTOR_TRANSFERS: }
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}} : vector<2xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: scf.yield {{.*}} :
		// VECTOR_TRANSFERS-SAME: tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, vector<1xf32>
		// VECTOR_TRANSFERS: }
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}} : vector<1xf32>, tensor<?x?xf32>
		%0:6 = scf.for %i = %lb to %ub step %step
		iter_args(%arg0 = %tensor0, %arg1 = %tensor1, %arg2 = %tensor2,
		%arg3 = %tensor3, %arg4 = %tensor4, %arg5 = %tensor5)
		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
		tensor<?x?xf32>, tensor<?x?xf32>) {
		%1:6 = scf.for %j = %lb to %ub step %step
		iter_args(%arg6 = %arg0, %arg7 = %arg1, %arg8 = %arg2,
		%arg9 = %arg3, %arg10 = %arg4, %arg11 = %arg5)
		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
		tensor<?x?xf32>, tensor<?x?xf32>) {
		%r0 = vector.transfer_read %arg7[%c0, %c0], %cst: tensor<?x?xf32>, vector<1xf32>
		%r1 = vector.transfer_read %arg6[%i, %i], %cst: tensor<?x?xf32>, vector<2xf32>
		%r2 = vector.transfer_read %arg8[%c0, %c0], %cst: tensor<?x?xf32>, vector<3xf32>
		%r3 = vector.transfer_read %arg9[%c0, %c0], %cst: tensor<?x?xf32>, vector<4xf32>
		"some_crippling_use"(%arg10) : (tensor<?x?xf32>) -> ()
		%r4 = vector.transfer_read %arg10[%c0, %c0], %cst: tensor<?x?xf32>, vector<5xf32>
		%r5 = vector.transfer_read %arg11[%c0, %c0], %cst: tensor<?x?xf32>, vector<6xf32>
		"some_crippling_use"(%arg11) : (tensor<?x?xf32>) -> ()
		%u0 = "some_use"(%r0) : (vector<1xf32>) -> vector<1xf32>
		%u1 = "some_use"(%r1) : (vector<2xf32>) -> vector<2xf32>
		%u2 = "some_use"(%arg8) : (tensor<?x?xf32>) -> vector<3xf32>
		%u3 = "some_use"(%r3) : (vector<4xf32>) -> vector<4xf32>
		%u4 = "some_use"(%r4) : (vector<5xf32>) -> vector<5xf32>
		%u5 = "some_use"(%r5) : (vector<6xf32>) -> vector<6xf32>
		%w1 = vector.transfer_write %u0, %arg7[%c0, %c0] : vector<1xf32>, tensor<?x?xf32>
		%w0 = vector.transfer_write %u1, %arg6[%i, %i] : vector<2xf32>, tensor<?x?xf32>
		%w2 = vector.transfer_write %u2, %arg8[%c0, %c0] : vector<3xf32>, tensor<?x?xf32>
		%w3 = vector.transfer_write %u3, %arg9[%c0, %c0] : vector<4xf32>, tensor<?x?xf32>
		%w4 = vector.transfer_write %u4, %arg10[%c0, %c0] : vector<5xf32>, tensor<?x?xf32>
		%w5 = vector.transfer_write %u5, %arg11[%c0, %c0] : vector<6xf32>, tensor<?x?xf32>
		"some_crippling_use"(%w3) : (tensor<?x?xf32>) -> ()
		scf.yield %w0, %w1, %w2, %w3, %w4, %w5 :
		tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
		tensor<?x?xf32>, tensor<?x?xf32>
		}
		scf.yield %1#0, %1#1, %1#2, %1#3, %1#4, %1#5 :
		tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
		tensor<?x?xf32>, tensor<?x?xf32>
		}
		return %0#0, %0#1, %0#2, %0#3, %0#4, %0#5 :
		tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>,
		tensor<?x?xf32>, tensor<?x?xf32>
		}

		// VECTOR_TRANSFERS-LABEL: func @hoist_vector_transfer_pairs_disjoint_tensor(
		// VECTOR_TRANSFERS-SAME: %[[TENSOR0:[a-zA-Z0-9]*]]: tensor<?x?xf32>,
		// VECTOR_TRANSFERS-SAME: %[[TENSOR1:[a-zA-Z0-9]*]]: tensor<?x?xf32>,
		// VECTOR_TRANSFERS-SAME: %[[TENSOR2:[a-zA-Z0-9]*]]: tensor<?x?xf32>,
		// VECTOR_TRANSFERS-SAME: %[[TENSOR3:[a-zA-Z0-9]*]]: tensor<?x?xf32>,
		func @hoist_vector_transfer_pairs_disjoint_tensor(
		%tensor0: tensor<?x?xf32>, %tensor1: tensor<?x?xf32>,
		%tensor2: tensor<?x?xf32>, %tensor3: tensor<?x?xf32>,
		%val: index, %lb : index, %ub : index, %step: index,
		%random_index : index) ->
		(tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {
		%c0 = constant 0 : index
		%c1 = constant 1 : index
		%c3 = constant 3 : index
		%cst = constant 0.0 : f32

		// VECTOR_TRANSFERS: vector.transfer_read %[[TENSOR2]]{{.*}} : tensor<?x?xf32>, vector<3xf32>
		// VECTOR_TRANSFERS: vector.transfer_read %[[TENSOR2]]{{.*}} : tensor<?x?xf32>, vector<3xf32>
		// VECTOR_TRANSFERS: vector.transfer_read %[[TENSOR3]]{{.*}} : tensor<?x?xf32>, vector<4xf32>
		// VECTOR_TRANSFERS: vector.transfer_read %[[TENSOR3]]{{.*}} : tensor<?x?xf32>, vector<4xf32>
		// VECTOR_TRANSFERS: %[[R:.*]]:8 = scf.for {{.*}} iter_args({{.*}}) ->
		// VECTOR_TRANSFERS-SAME: (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, vector<3xf32>, vector<3xf32>, vector<4xf32>, vector<4xf32>) {
		// VECTOR_TRANSFERS: scf.for {{.*}} iter_args({{.*}}) ->
		// VECTOR_TRANSFERS-SAME: (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, vector<3xf32>, vector<3xf32>, vector<4xf32>, vector<4xf32>) {
		// VECTOR_TRANSFERS: vector.transfer_read %[[TENSOR1]]{{.*}} : tensor<?x?xf32>, vector<2xf32>
		// VECTOR_TRANSFERS: vector.transfer_read %[[TENSOR1]]{{.*}} : tensor<?x?xf32>, vector<2xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<2xf32>) -> vector<2xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<2xf32>) -> vector<2xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<3xf32>) -> vector<3xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<3xf32>) -> vector<3xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<4xf32>) -> vector<4xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<4xf32>) -> vector<4xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<2xf32>) -> vector<2xf32>
		// VECTOR_TRANSFERS: "some_use"(%{{.*}}) : (vector<2xf32>) -> vector<2xf32>
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}}, %{{.*}}{{.*}} : vector<2xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}}, %{{.*}}{{.*}} : vector<2xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: scf.yield {{.*}} :
		// VECTOR_TRANSFERS-SAME: tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, vector<3xf32>, vector<3xf32>, vector<4xf32>, vector<4xf32>
		// VECTOR_TRANSFERS: }
		// VECTOR_TRANSFERS: scf.yield {{.*}} :
		// VECTOR_TRANSFERS-SAME: tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, vector<3xf32>, vector<3xf32>, vector<4xf32>, vector<4xf32>
		// VECTOR_TRANSFERS: }
		// VECTOR_TRANSFERS: %[[TENSOR4:.*]] = vector.transfer_write %{{.*}}, %[[R]]#3{{.*}} : vector<4xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}}, %[[TENSOR4]]{{.*}} : vector<4xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: %[[TENSOR5:.*]] = vector.transfer_write %{{.*}}, %[[R]]#2{{.*}} : vector<3xf32>, tensor<?x?xf32>
		// VECTOR_TRANSFERS: vector.transfer_write %{{.*}}, %[[TENSOR5]]{{.*}} : vector<3xf32>, tensor<?x?xf32>
		%0:4 = scf.for %i = %lb to %ub step %step
		iter_args(%arg0 = %tensor0, %arg1 = %tensor1, %arg2 = %tensor2,
		%arg3 = %tensor3)
		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {
		%1:4 = scf.for %j = %lb to %ub step %step
		iter_args(%arg4 = %arg0, %arg5 = %arg1, %arg6 = %arg2,
		%arg7 = %arg3)
		-> (tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>) {
		%r00 = vector.transfer_read %arg5[%c0, %c0], %cst: tensor<?x?xf32>, vector<2xf32>
		%r01 = vector.transfer_read %arg5[%c0, %c1], %cst: tensor<?x?xf32>, vector<2xf32>
		%r20 = vector.transfer_read %arg6[%c0, %c0], %cst: tensor<?x?xf32>, vector<3xf32>
		%r21 = vector.transfer_read %arg6[%c0, %c3], %cst: tensor<?x?xf32>, vector<3xf32>
		%r30 = vector.transfer_read %arg7[%c0, %random_index], %cst: tensor<?x?xf32>, vector<4xf32>
		%r31 = vector.transfer_read %arg7[%c1, %random_index], %cst: tensor<?x?xf32>, vector<4xf32>
		%r10 = vector.transfer_read %arg4[%i, %i], %cst: tensor<?x?xf32>, vector<2xf32>
		%r11 = vector.transfer_read %arg4[%random_index, %random_index], %cst: tensor<?x?xf32>, vector<2xf32>
		%u00 = "some_use"(%r00) : (vector<2xf32>) -> vector<2xf32>
		%u01 = "some_use"(%r01) : (vector<2xf32>) -> vector<2xf32>
		%u20 = "some_use"(%r20) : (vector<3xf32>) -> vector<3xf32>
		%u21 = "some_use"(%r21) : (vector<3xf32>) -> vector<3xf32>
		%u30 = "some_use"(%r30) : (vector<4xf32>) -> vector<4xf32>
		%u31 = "some_use"(%r31) : (vector<4xf32>) -> vector<4xf32>
		%u10 = "some_use"(%r10) : (vector<2xf32>) -> vector<2xf32>
		%u11 = "some_use"(%r11) : (vector<2xf32>) -> vector<2xf32>
		%w10 = vector.transfer_write %u00, %arg5[%c0, %c0] : vector<2xf32>, tensor<?x?xf32>
		%w11 = vector.transfer_write %u01, %w10[%c0, %c1] : vector<2xf32>, tensor<?x?xf32>
		%w20 = vector.transfer_write %u20, %arg6[%c0, %c0] : vector<3xf32>, tensor<?x?xf32>
		%w21 = vector.transfer_write %u21, %w20[%c0, %c3] : vector<3xf32>, tensor<?x?xf32>
		%w30 = vector.transfer_write %u30, %arg7[%c0, %random_index] : vector<4xf32>, tensor<?x?xf32>
		%w31 = vector.transfer_write %u31, %w30[%c1, %random_index] : vector<4xf32>, tensor<?x?xf32>
		%w00 = vector.transfer_write %u10, %arg4[%i, %i] : vector<2xf32>, tensor<?x?xf32>
		%w01 = vector.transfer_write %u11, %w00[%random_index, %random_index] : vector<2xf32>, tensor<?x?xf32>
		scf.yield %w01, %w11, %w21, %w31 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>
		}
		scf.yield %1#0, %1#1, %1#2, %1#3 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>
		}
		return %0#0, %0#1, %0#2, %0#3 : tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>, tensor<?x?xf32>
		}

mlir/test/lib/Transforms/TestLinalgHoisting.cpp

	(41 lines hidden)

	void TestLinalgHoisting::runOnFunction() {
	if (testHoistViewAllocs) {
	hoistViewAllocOps(getFunction());
	return;
	}
	if (testHoistRedundantTransfers) {
	hoistRedundantVectorTransfers(getFunction());
				hoistRedundantVectorTransfersOnTensor(getFunction());
	return;
	}
	}

	namespace mlir {
	namespace test {
	void registerTestLinalgHoisting() {
	PassRegistration<TestLinalgHoisting> testTestLinalgHoistingPass(
	"test-linalg-hoisting", "Test Linalg hoisting functions.");
	}
	} // namespace test
	} // namespace mlir
