This is an archive of the discontinued LLVM Phabricator instance.


Differential D124366

[mlir][vector] insert `alloca`s outside of loops
Closed · Public

Authored by ftynse on Apr 25 2022, 1:26 AM.


Details

Reviewers

aartbik
nicolasvasilache
hanchung

Commits

rG4c807f2f579f: [mlir][vector] insert `alloca`s outside of loops

Summary

After https://reviews.llvm.org/D119743 added the AutomaticAllocationScope
trait to loop-like constructs, the vector transfer full/partial splitting pass
started inserting allocations for temporaries within the closest loop rather
than the closest function (or other allocation scope such as async.execute).
While this is correct as long as the lowered code takes care of automatic
deallocation at the end of each iteration of the loop, this interferes with
downstream optimizations that expect allocas to be at the function level.
Step over loops when looking for the closest allocation scope in the vector
transfer full/partial splitting pass, thus restoring the original behavior.
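
The "step over loops" search described above can be sketched outside of MLIR. The following is a minimal, self-contained toy model, not the MLIR API: `ToyOp`, its two flags, and `findAllocaInsertionScope` are hypothetical names introduced only for illustration. `isAllocationScope` stands in for the AutomaticAllocationScope trait and `isLoop` for the known looping constructs.

```cpp
#include <cassert>

// Toy stand-in for an operation in a nesting tree (hypothetical, not
// mlir::Operation). Only the two properties relevant to the search are kept.
struct ToyOp {
  ToyOp *parent = nullptr;
  bool isAllocationScope = false; // models the AutomaticAllocationScope trait
  bool isLoop = false;            // models a known looping construct
};

// Walk up from `op`, remembering the most recent allocation scope seen, and
// stop at the first ancestor that is not a loop. Loops that are allocation
// scopes are therefore overwritten by whatever scope encloses them.
inline ToyOp *findAllocaInsertionScope(ToyOp *op) {
  ToyOp *scope = nullptr;
  for (ToyOp *parent = op->parent; parent != nullptr;
       parent = parent->parent) {
    if (parent->isAllocationScope)
      scope = parent;
    if (!parent->isLoop)
      break;
  }
  assert(scope && "expected op to be inside an allocation scope");
  return scope;
}
```

In this model, an operation nested in two loops inside a function-like scope resolves to the function-like scope, while an operation placed directly in that scope resolves to it as before.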

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ftynse created this revision. Apr 25 2022, 1:26 AM

Herald added a reviewer: aartbik. Apr 25 2022, 1:26 AM

Herald added a project: Restricted Project.

Herald added subscribers: ThomasRaoux, sdasgup3, wenzhicui and 21 others.

ftynse requested review of this revision. Apr 25 2022, 1:26 AM

Herald added a reviewer: nicolasvasilache. Apr 25 2022, 1:26 AM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

ftynse added a reviewer: hanchung. Apr 25 2022, 1:26 AM

Thanks!

This revision is now accepted and ready to land. Apr 25 2022, 1:47 AM

Harbormaster completed remote builds in B161119: Diff 424841. Apr 25 2022, 1:47 AM

This revision was landed with ongoing or failed builds. Apr 25 2022, 1:49 AM

Closed by commit rG4c807f2f579f: [mlir][vector] insert `alloca`s outside of loops (authored by ftynse).

This revision was automatically updated to reflect the committed changes.

ftynse added a commit: rG4c807f2f579f: [mlir][vector] insert `alloca`s outside of loops.

What is the longer term plan here? Do all places that introduce alloca have to perform this local optimization? Should we instead have a pass that hoists alloca out of loops? Or rethink the introduction of allocation scope for loops?

We should add a facility (a pass presumably) that hoists allocas up to some user-specified level. Maybe the allocation scope that is not nested in another allocation scope, but is nested in the closest isolated-from-above operation. I believe @wsmoses has a prototype for this.

In D124366#3471552, @ftynse wrote:

We should add a facility (a pass presumably) that hoists allocas up to some user-specified level. Maybe the allocation scope that is not nested in another allocation scope, but is nested in the closest isolated-from-above operation. I believe @wsmoses has a prototype for this.

This might be useful for FIR as well since there is a plan to implement such a pass in Flang/FIR.
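
The hoisting target proposed in the discussion above (the allocation scope that is not nested in another allocation scope, within the closest isolated-from-above operation) can be modeled with a similar toy tree. This is only a sketch under stated assumptions: `ScopeNode` and `findHoistingTarget` are hypothetical names, no such pass is part of this revision, and treating the isolated-from-above operation itself as a candidate scope is one reading of the proposal.

```cpp
#include <cassert>

// Toy stand-in for an operation (hypothetical, not mlir::Operation).
struct ScopeNode {
  ScopeNode *parent = nullptr;
  bool isAllocationScope = false;   // models the AutomaticAllocationScope trait
  bool isIsolatedFromAbove = false; // models the IsolatedFromAbove trait
};

// Return the outermost allocation scope enclosing `op` without crossing the
// closest isolated-from-above ancestor. Assumption: that ancestor itself is
// a valid target when it is an allocation scope. Returns nullptr when no
// allocation scope is found before the walk ends.
inline ScopeNode *findHoistingTarget(ScopeNode *op) {
  ScopeNode *target = nullptr;
  for (ScopeNode *parent = op->parent; parent != nullptr;
       parent = parent->parent) {
    if (parent->isAllocationScope)
      target = parent;
    if (parent->isIsolatedFromAbove)
      break;
  }
  return target;
}
```

Unlike the loop-specific search in this revision, this walk does not need to know which operations are loops: every intermediate allocation scope is skipped, and the walk stops only at the isolation boundary.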

Revision Contents

Path                                                                        Size
mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp   13 lines
mlir/test/Dialect/Vector/vector-transfer-full-partial-split.mlir            20 lines

Diff 424845

mlir/lib/Dialect/Vector/Transforms/VectorTransferSplitRewritePatterns.cpp

     b.create<scf::IfOp>(loc, notInBounds, [&](OpBuilder &b, Location loc) {
       mapping.map(xferOp.getVector(), load);
       b.clone(*xferOp.getOperation(), mapping);
       b.create<scf::YieldOp>(loc, ValueRange{});
     });
   }

   // TODO: Parallelism and threadlocal considerations with a ParallelScope trait.
   static Operation *getAutomaticAllocationScope(Operation *op) {
-    Operation *scope =
-        op->getParentWithTrait<OpTrait::AutomaticAllocationScope>();
+    // Find the closest surrounding allocation scope that is not a known looping
+    // construct (putting alloca's in loops doesn't always lower to deallocation
+    // until the end of the loop).
+    Operation *scope = nullptr;
+    for (Operation *parent = op->getParentOp(); parent != nullptr;
+         parent = parent->getParentOp()) {
+      if (parent->hasTrait<OpTrait::AutomaticAllocationScope>())
+        scope = parent;
+      if (!isa<scf::ForOp, AffineForOp>(parent))
+        break;
+    }
     assert(scope && "Expected op to be inside automatic allocation scope");
     return scope;
   }

   /// Split a vector.transfer operation into an in-bounds (i.e., no out-of-bounds
   /// masking) fastpath and a slowpath.
   ///
   /// For vector.transfer_read:

mlir/test/Dialect/Vector/vector-transfer-full-partial-split.mlir

   func.func @transfer_read_within_async_execute(%A : memref<?x?xf32>) -> !async.token {
     // CHECK: alloca
     %token = async.execute {
       %0 = vector.transfer_read %A[%c0, %c0], %f0 : memref<?x?xf32>, vector<2x2xf32>
       func.call @fake_side_effecting_fun(%0) : (vector<2x2xf32>) -> ()
       async.yield
     }
     return %token : !async.token
   }
+
+  // -----
+
+  func.func private @fake_side_effecting_fun(%0: vector<2x2xf32>) -> ()
+
+  // Ensure that `alloca`s are inserted outside of loops even though loops are
+  // considered allocation scopes.
+  // CHECK-LABEL: transfer_read_within_scf_for
+  func.func @transfer_read_within_scf_for(%A : memref<?x?xf32>, %lb : index, %ub : index, %step : index) {
+    %c0 = arith.constant 0 : index
+    %f0 = arith.constant 0.0 : f32
+    // CHECK: alloca
+    // CHECK: scf.for
+    // CHECK-NOT: alloca
+    scf.for %i = %lb to %ub step %step {
+      %0 = vector.transfer_read %A[%c0, %c0], %f0 : memref<?x?xf32>, vector<2x2xf32>
+      func.call @fake_side_effecting_fun(%0) : (vector<2x2xf32>) -> ()
+    }
+    return
+  }