This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Affine] Enable fusion of loops with vector loads/stores
Closed, Public

Authored by dcaballe on Jun 1 2020, 6:30 PM.

Details

Reviewers

bondhugula
andydavis1
nicolasvasilache
ftynse

Commits

rG8a418e5f8e89: [mlir][Affine] Enable fusion of loops with vector loads/stores

Summary

This patch enables affine loop fusion for loops with affine vector loads
and stores. For that, we only had to use affine memory op interfaces in
LoopFusionUtils.cpp and Utils.cpp so that vector loads and stores are
also taken into account.
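To illustrate why switching from concrete op classes to interfaces is enough, here is a minimal, self-contained sketch. It is a toy model, not MLIR code: `Op`, `ReadLike`, `WriteLike`, the four op structs, and `collectAccesses` are all hypothetical stand-ins, and `dynamic_cast` merely plays the role that `dyn_cast` on an op interface plays in the patch. The function mirrors the shape of `getLoadAndStoreMemRefAccesses`: it never names a vector op, yet vector loads and stores are picked up because they implement the same interfaces.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins only: none of these types are the real MLIR classes.
struct Op {
  virtual ~Op() = default;
};

// Hypothetical analogue of AffineReadOpInterface.
struct ReadLike : Op {
  explicit ReadLike(std::string name) : memref(std::move(name)) {}
  const std::string &getMemRef() const { return memref; }
  std::string memref;
};

// Hypothetical analogue of AffineWriteOpInterface.
struct WriteLike : Op {
  explicit WriteLike(std::string name) : memref(std::move(name)) {}
  const std::string &getMemRef() const { return memref; }
  std::string memref;
};

// Concrete ops: scalar and vector variants share the same interfaces.
struct ScalarLoad : ReadLike { using ReadLike::ReadLike; };
struct VectorLoad : ReadLike { using ReadLike::ReadLike; };
struct ScalarStore : WriteLike { using WriteLike::WriteLike; };
struct VectorStore : WriteLike { using WriteLike::WriteLike; };

// Returns a map where values[memref] == true iff some op stores to memref.
std::map<std::string, bool>
collectAccesses(const std::vector<std::unique_ptr<Op>> &ops) {
  std::map<std::string, bool> values;
  for (const auto &op : ops) {
    assert(op && "null op in list");
    if (const auto *read = dynamic_cast<const ReadLike *>(op.get())) {
      // Record a read without overwriting an earlier "stored" flag.
      values.emplace(read->getMemRef(), false);
    } else if (const auto *write = dynamic_cast<const WriteLike *>(op.get())) {
      values[write->getMemRef()] = true;
    }
  }
  return values;
}
```

Had the sketch tested for `ScalarLoad` and `ScalarStore` directly, every new op kind would need its own branch; testing for the interface avoids that.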

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

dcaballe created this revision. Jun 1 2020, 6:30 PM

Herald added a project: Restricted Project. Jun 1 2020, 6:30 PM

Herald added subscribers: jurahul, Kayjukh, frgossen and 12 others.

Harbormaster completed remote builds in B58707: Diff 267777. Jun 1 2020, 7:30 PM

This should be a canonical example of why analyses/passes should use interfaces

This revision is now accepted and ready to land. Jun 2 2020, 8:58 AM

andydavis1 accepted this revision. Jun 2 2020, 10:29 AM

In D80971#2068940, @ftynse wrote:

This should be a canonical example of why analyses/passes should use interfaces

Indeed! Nice :)

Closed by commit rG8a418e5f8e89: [mlir][Affine] Enable fusion of loops with vector loads/stores (authored by dcaballe). Jun 2 2020, 3:56 PM

This revision was automatically updated to reflect the committed changes.

This is really great. The same could be done for memref store-to-load forwarding, and it should work out of the box with affine.vector_load/store. But there is an issue (here and in general) if you mix vector loads/stores with regular loads/stores on the same memref. The slices computed by fusion aren't aware of the larger underlying data that the vector ops load or store, so the regions will be inaccurate at the boundaries. You are fine as long as no memref has both vector loads/stores and scalar loads/stores in different loop nests. With this patch, at the moment, if you mix the two, you'll get incorrect output from fusion, for example when the producer nest has scalar stores and the consumer nest has vector loads: the slice won't contain all the data needed.

This can be fixed by making the dependence information and the memref region computation account for vector loads/stores. Both are inaccurate for vector ops after this revision (and were of course inaccurate before it as well, when vector loads/stores were ignored entirely).
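The boundary problem described above can be made concrete with a small sketch. It is a deliberately simplified model, not the actual region or slice computation: it assumes a 1-D memref with unit stride, and the function names (`vectorFootprint`, `scalarModelFootprint`, `missingFromSlice`) are hypothetical. It compares the elements a vector load really touches with the single element an analysis would see if it treated the access like a scalar one.

```cpp
#include <cassert>
#include <set>

// Elements actually touched by a vector access of `width` lanes whose affine
// map evaluates to `start` (1-D memref, unit stride; a simplifying assumption).
std::set<int> vectorFootprint(int start, int width) {
  assert(width > 0);
  std::set<int> elems;
  for (int lane = 0; lane < width; ++lane)
    elems.insert(start + lane);
  return elems;
}

// Footprint seen by an analysis that treats the access as a single element
// at `start`, i.e. one that is unaware of the vector width.
std::set<int> scalarModelFootprint(int start) { return {start}; }

// Elements the consumer needs that the element-wise model leaves out.
std::set<int> missingFromSlice(int start, int width) {
  std::set<int> missing = vectorFootprint(start, width);
  for (int elem : scalarModelFootprint(start))
    missing.erase(elem);
  return missing;
}
```

Under these assumptions, a `vector<4xf32>` load starting at element 4 needs elements 4 through 7, while the element-wise model accounts only for element 4. If the producer nest writes those elements with scalar stores, a slice built from the element-wise model would omit the producer iterations that write elements 5, 6, and 7.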

Herald added a subscriber: msifontes. Jun 7 2020, 11:42 AM

Thanks Uday! Good point. I haven't looked too much into the slice computation but if you can send me some pointers and some examples on how the regions should be computed I could have a look. Thanks!

Revision Contents

Path                                                              Size
mlir/include/mlir/Dialect/Affine/IR/AffineMemoryOpInterfaces.td   26 lines
mlir/include/mlir/Dialect/Affine/IR/AffineOps.td                  7 lines
mlir/lib/Transforms/Utils/LoopFusionUtils.cpp                     16 lines
mlir/lib/Transforms/Utils/Utils.cpp                               6 lines
mlir/test/Transforms/loop-fusion.mlir                             29 lines

Diff 268013

mlir/include/mlir/Dialect/Affine/IR/AffineMemoryOpInterfaces.td

[... 62 lines not shown ...]	InterfaceMethod<
/*methodName=*/"getAffineMap",		/*methodName=*/"getAffineMap",
/*args=*/(ins),		/*args=*/(ins),
/*methodBody=*/[{}],		/*methodBody=*/[{}],
/*defaultImplementation=*/[{		/*defaultImplementation=*/[{
ConcreteOp op = cast<ConcreteOp>(this->getOperation());		ConcreteOp op = cast<ConcreteOp>(this->getOperation());
return op.getAffineMapAttr().getValue();		return op.getAffineMapAttr().getValue();
}]		}]
>,		>,
		InterfaceMethod<
		/*desc=*/"Returns the AffineMapAttr associated with 'memref'.",
		/*retTy=*/"NamedAttribute",
		/*methodName=*/"getAffineMapAttrForMemRef",
		/*args=*/(ins "Value":$memref),
		/*methodBody=*/[{}],
		/*defaultImplementation=*/[{
		ConcreteOp op = cast<ConcreteOp>(this->getOperation());
		assert(memref == getMemRef());
		return {Identifier::get(op.getMapAttrName(), op.getContext()),
		op.getAffineMapAttr()};
		}]
		>,
];		];
}		}

def AffineWriteOpInterface : OpInterface<"AffineWriteOpInterface"> {		def AffineWriteOpInterface : OpInterface<"AffineWriteOpInterface"> {
let description = [{		let description = [{
Interface to query characteristics of write-like ops with affine		Interface to query characteristics of write-like ops with affine
restrictions.		restrictions.
}];		}];
[... 40 lines not shown ...]	InterfaceMethod<
/*methodName=*/"getAffineMap",		/*methodName=*/"getAffineMap",
/*args=*/(ins),		/*args=*/(ins),
/*methodName=*/[{}],		/*methodName=*/[{}],
/*defaultImplementation=*/[{		/*defaultImplementation=*/[{
ConcreteOp op = cast<ConcreteOp>(this->getOperation());		ConcreteOp op = cast<ConcreteOp>(this->getOperation());
return op.getAffineMapAttr().getValue();		return op.getAffineMapAttr().getValue();
}]		}]
>,		>,
		InterfaceMethod<
		/*desc=*/"Returns the AffineMapAttr associated with 'memref'.",
		/*retTy=*/"NamedAttribute",
		/*methodName=*/"getAffineMapAttrForMemRef",
		/*args=*/(ins "Value":$memref),
		/*methodBody=*/[{}],
		/*defaultImplementation=*/[{
		ConcreteOp op = cast<ConcreteOp>(this->getOperation());
		assert(memref == getMemRef());
		return {Identifier::get(op.getMapAttrName(), op.getContext()),
		op.getAffineMapAttr()};
		}]
		>,
];		];
}		}

#endif // AFFINEMEMORYOPINTERFACES		#endif // AFFINEMEMORYOPINTERFACES

mlir/include/mlir/Dialect/Affine/IR/AffineOps.td

[... 383 lines not shown ...]	code extraClassDeclarationBase = [{

void setMemRef(Value value) { setOperand(getMemRefOperandIndex(), value); }		void setMemRef(Value value) { setOperand(getMemRefOperandIndex(), value); }

/// Returns the affine map used to index the memref for this operation.		/// Returns the affine map used to index the memref for this operation.
AffineMapAttr getAffineMapAttr() {		AffineMapAttr getAffineMapAttr() {
return getAttr(getMapAttrName()).cast<AffineMapAttr>();		return getAttr(getMapAttrName()).cast<AffineMapAttr>();
}		}

/// Returns the AffineMapAttr associated with 'memref'.
NamedAttribute getAffineMapAttrForMemRef(Value memref) {
assert(memref == getMemRef());
return {Identifier::get(getMapAttrName(), getContext()),
getAffineMapAttr()};
}

static StringRef getMapAttrName() { return "map"; }		static StringRef getMapAttrName() { return "map"; }
}];		}];
}		}

def AffineLoadOp : AffineLoadOpBase<"load"> {		def AffineLoadOp : AffineLoadOpBase<"load"> {
let summary = "affine load operation";		let summary = "affine load operation";
let description = [{		let description = [{
The "affine.load" op reads an element from a memref, where the index		The "affine.load" op reads an element from a memref, where the index
[... 467 lines not shown ...]

mlir/lib/Transforms/Utils/LoopFusionUtils.cpp

[... 32 lines not shown ...]

using namespace mlir;		using namespace mlir;

// Gathers all load and store memref accesses in 'opA' into 'values', where		// Gathers all load and store memref accesses in 'opA' into 'values', where
// 'values[memref] == true' for each store operation.		// 'values[memref] == true' for each store operation.
static void getLoadAndStoreMemRefAccesses(Operation *opA,		static void getLoadAndStoreMemRefAccesses(Operation *opA,
DenseMap<Value, bool> &values) {		DenseMap<Value, bool> &values) {
opA->walk([&](Operation *op) {		opA->walk([&](Operation *op) {
if (auto loadOp = dyn_cast<AffineLoadOp>(op)) {		if (auto loadOp = dyn_cast<AffineReadOpInterface>(op)) {
if (values.count(loadOp.getMemRef()) == 0)		if (values.count(loadOp.getMemRef()) == 0)
values[loadOp.getMemRef()] = false;		values[loadOp.getMemRef()] = false;
} else if (auto storeOp = dyn_cast<AffineStoreOp>(op)) {		} else if (auto storeOp = dyn_cast<AffineWriteOpInterface>(op)) {
values[storeOp.getMemRef()] = true;		values[storeOp.getMemRef()] = true;
}		}
});		});
}		}

// Returns true if 'op' is a load or store operation which access an memref		// Returns true if 'op' is a load or store operation which access an memref
// accessed 'values' and at least one of the access is a store operation.		// accessed 'values' and at least one of the access is a store operation.
// Returns false otherwise.		// Returns false otherwise.
static bool isDependentLoadOrStoreOp(Operation *op,		static bool isDependentLoadOrStoreOp(Operation *op,
DenseMap<Value, bool> &values) {		DenseMap<Value, bool> &values) {
if (auto loadOp = dyn_cast<AffineLoadOp>(op)) {		if (auto loadOp = dyn_cast<AffineReadOpInterface>(op)) {
return values.count(loadOp.getMemRef()) > 0 &&		return values.count(loadOp.getMemRef()) > 0 &&
values[loadOp.getMemRef()] == true;		values[loadOp.getMemRef()] == true;
} else if (auto storeOp = dyn_cast<AffineStoreOp>(op)) {		} else if (auto storeOp = dyn_cast<AffineWriteOpInterface>(op)) {
return values.count(storeOp.getMemRef()) > 0;		return values.count(storeOp.getMemRef()) > 0;
}		}
return false;		return false;
}		}

// Returns the first operation in range ('opA', 'opB') which has a data		// Returns the first operation in range ('opA', 'opB') which has a data
// dependence on 'opA'. Returns 'nullptr' of no dependence exists.		// dependence on 'opA'. Returns 'nullptr' of no dependence exists.
static Operation *getFirstDependentOpInRange(Operation *opA, Operation *opB) {		static Operation *getFirstDependentOpInRange(Operation *opA, Operation *opB) {
[... 33 lines not shown ...]	static Operation *getLastDependentOpInRange(Operation *opA, Operation *opB) {
// *) 'opX' and 'opB' access the same memref and at least one of the accesses		// *) 'opX' and 'opB' access the same memref and at least one of the accesses
// is a store.		// is a store.
// *) 'opX' produces an SSA Value which is used by 'opB'.		// *) 'opX' produces an SSA Value which is used by 'opB'.
Operation *lastDepOp = nullptr;		Operation *lastDepOp = nullptr;
for (Block::reverse_iterator it = std::next(Block::reverse_iterator(opB));		for (Block::reverse_iterator it = std::next(Block::reverse_iterator(opB));
it != Block::reverse_iterator(opA); ++it) {		it != Block::reverse_iterator(opA); ++it) {
Operation *opX = &(*it);		Operation *opX = &(*it);
opX->walk([&](Operation *op) {		opX->walk([&](Operation *op) {
if (isa<AffineLoadOp>(op) || isa<AffineStoreOp>(op)) {		if (isa<AffineReadOpInterface>(op) || isa<AffineWriteOpInterface>(op)) {
if (isDependentLoadOrStoreOp(op, values)) {		if (isDependentLoadOrStoreOp(op, values)) {
lastDepOp = opX;		lastDepOp = opX;
return WalkResult::interrupt();		return WalkResult::interrupt();
}		}
return WalkResult::advance();		return WalkResult::advance();
}		}
for (auto value : op->getResults()) {		for (auto value : op->getResults()) {
for (auto user : value.getUsers()) {		for (auto user : value.getUsers()) {
[... 57 lines not shown ...]

// Gathers all load and store ops in loop nest rooted at 'forOp' into		// Gathers all load and store ops in loop nest rooted at 'forOp' into
// 'loadAndStoreOps'.		// 'loadAndStoreOps'.
static bool		static bool
gatherLoadsAndStores(AffineForOp forOp,		gatherLoadsAndStores(AffineForOp forOp,
SmallVectorImpl<Operation *> &loadAndStoreOps) {		SmallVectorImpl<Operation *> &loadAndStoreOps) {
bool hasIfOp = false;		bool hasIfOp = false;
forOp.walk([&](Operation *op) {		forOp.walk([&](Operation *op) {
if (isa<AffineLoadOp>(op) || isa<AffineStoreOp>(op))		if (isa<AffineReadOpInterface>(op) || isa<AffineWriteOpInterface>(op))
loadAndStoreOps.push_back(op);		loadAndStoreOps.push_back(op);
else if (isa<AffineIfOp>(op))		else if (isa<AffineIfOp>(op))
hasIfOp = true;		hasIfOp = true;
});		});
return !hasIfOp;		return !hasIfOp;
}		}

// TODO(andydavis) Prevent fusion of loop nests with side-effecting operations.		// TODO(andydavis) Prevent fusion of loop nests with side-effecting operations.
[... 268 lines not shown ...]	bool mlir::getFusionComputeCost(AffineForOp srcForOp, LoopNestStats &srcStats,
// The store and loads to this memref will disappear.		// The store and loads to this memref will disappear.
// TODO(andydavis) Add load coalescing to memref data flow opt pass.		// TODO(andydavis) Add load coalescing to memref data flow opt pass.
if (storeLoadFwdGuaranteed) {		if (storeLoadFwdGuaranteed) {
// Subtract from operation count the loads/store we expect load/store		// Subtract from operation count the loads/store we expect load/store
// forwarding to remove.		// forwarding to remove.
unsigned storeCount = 0;		unsigned storeCount = 0;
llvm::SmallDenseSet<Value, 4> storeMemrefs;		llvm::SmallDenseSet<Value, 4> storeMemrefs;
srcForOp.walk([&](Operation *op) {		srcForOp.walk([&](Operation *op) {
if (auto storeOp = dyn_cast<AffineStoreOp>(op)) {		if (auto storeOp = dyn_cast<AffineWriteOpInterface>(op)) {
storeMemrefs.insert(storeOp.getMemRef());		storeMemrefs.insert(storeOp.getMemRef());
++storeCount;		++storeCount;
}		}
});		});
// Subtract out any store ops in single-iteration src slice loop nest.		// Subtract out any store ops in single-iteration src slice loop nest.
if (storeCount > 0)		if (storeCount > 0)
computeCostMap[insertPointParent] = -storeCount;		computeCostMap[insertPointParent] = -storeCount;
// Subtract out any load users of 'storeMemrefs' nested below		// Subtract out any load users of 'storeMemrefs' nested below
// 'insertPointParent'.		// 'insertPointParent'.
for (auto value : storeMemrefs) {		for (auto value : storeMemrefs) {
for (auto *user : value.getUsers()) {		for (auto *user : value.getUsers()) {
if (auto loadOp = dyn_cast<AffineLoadOp>(user)) {		if (auto loadOp = dyn_cast<AffineReadOpInterface>(user)) {
SmallVector<AffineForOp, 4> loops;		SmallVector<AffineForOp, 4> loops;
// Check if any loop in loop nest surrounding 'user' is		// Check if any loop in loop nest surrounding 'user' is
// 'insertPointParent'.		// 'insertPointParent'.
getLoopIVs(*user, &loops);		getLoopIVs(*user, &loops);
if (llvm::is_contained(loops, cast<AffineForOp>(insertPointParent))) {		if (llvm::is_contained(loops, cast<AffineForOp>(insertPointParent))) {
if (auto forOp =		if (auto forOp =
dyn_cast_or_null<AffineForOp>(user->getParentOp())) {		dyn_cast_or_null<AffineForOp>(user->getParentOp())) {
if (computeCostMap.count(forOp) == 0)		if (computeCostMap.count(forOp) == 0)
[... 21 lines not shown ...]

mlir/lib/Transforms/Utils/Utils.cpp

	[... 24 lines not shown ...]
	#include "llvm/ADT/DenseMap.h"			#include "llvm/ADT/DenseMap.h"
	#include "llvm/ADT/TypeSwitch.h"			#include "llvm/ADT/TypeSwitch.h"
	using namespace mlir;			using namespace mlir;

	/// Return true if this operation dereferences one or more memref's.			/// Return true if this operation dereferences one or more memref's.
	// Temporary utility: will be replaced when this is modeled through			// Temporary utility: will be replaced when this is modeled through
	// side-effects/op traits. TODO(b/117228571)			// side-effects/op traits. TODO(b/117228571)
	static bool isMemRefDereferencingOp(Operation &op) {			static bool isMemRefDereferencingOp(Operation &op) {
	if (isa<AffineLoadOp>(op) || isa<AffineStoreOp>(op) ||			if (isa<AffineReadOpInterface>(op) || isa<AffineWriteOpInterface>(op) ||
	isa<AffineDmaStartOp>(op) || isa<AffineDmaWaitOp>(op))			isa<AffineDmaStartOp>(op) || isa<AffineDmaWaitOp>(op))
	return true;			return true;
	return false;			return false;
	}			}

	/// Return the AffineMapAttr associated with memory 'op' on 'memref'.			/// Return the AffineMapAttr associated with memory 'op' on 'memref'.
	static NamedAttribute getAffineMapAttrForMemRef(Operation *op, Value memref) {			static NamedAttribute getAffineMapAttrForMemRef(Operation *op, Value memref) {
	return TypeSwitch<Operation *, NamedAttribute>(op)			return TypeSwitch<Operation *, NamedAttribute>(op)
	.Case<AffineDmaStartOp, AffineLoadOp, AffinePrefetchOp, AffineStoreOp,			.Case<AffineDmaStartOp, AffineReadOpInterface, AffinePrefetchOp,
	AffineDmaWaitOp>(			AffineWriteOpInterface, AffineDmaWaitOp>(
	[=](auto op) { return op.getAffineMapAttrForMemRef(memref); });			[=](auto op) { return op.getAffineMapAttrForMemRef(memref); });
	}			}

	// Perform the replacement in `op`.			// Perform the replacement in `op`.
	LogicalResult mlir::replaceAllMemRefUsesWith(Value oldMemRef, Value newMemRef,			LogicalResult mlir::replaceAllMemRefUsesWith(Value oldMemRef, Value newMemRef,
	Operation *op,			Operation *op,
	ArrayRef<Value> extraIndices,			ArrayRef<Value> extraIndices,
	AffineMap indexRemap,			AffineMap indexRemap,
	[... 418 lines not shown ...]

mlir/test/Transforms/loop-fusion.mlir

[... 2,458 lines not shown ...]	func @reshape_into_matmul(%lhs : memref<1024x1024xf32>,
return		return
}		}
// MAXIMAL-NEXT: alloc		// MAXIMAL-NEXT: alloc
// MAXIMAL-NEXT: affine.for		// MAXIMAL-NEXT: affine.for
// MAXIMAL-NEXT: affine.for		// MAXIMAL-NEXT: affine.for
// MAXIMAL-NEXT: affine.for		// MAXIMAL-NEXT: affine.for
// MAXIMAL-NOT: affine.for		// MAXIMAL-NOT: affine.for
// MAXIMAL: return		// MAXIMAL: return

		// -----

		// CHECK-LABEL: func @vector_loop
		func @vector_loop(%a : memref<10x20xf32>, %b : memref<10x20xf32>,
		%c : memref<10x20xf32>) {
		affine.for %j = 0 to 10 {
		affine.for %i = 0 to 5 {
		%ld0 = affine.vector_load %a[%j, %i*4] : memref<10x20xf32>, vector<4xf32>
		affine.vector_store %ld0, %b[%j, %i*4] : memref<10x20xf32>, vector<4xf32>
		}
		}

		affine.for %j = 0 to 10 {
		affine.for %i = 0 to 5 {
		%ld0 = affine.vector_load %b[%j, %i*4] : memref<10x20xf32>, vector<4xf32>
		affine.vector_store %ld0, %c[%j, %i*4] : memref<10x20xf32>, vector<4xf32>
		}
		}

		return
		}
		// CHECK: affine.for
		// CHECK-NEXT: affine.for
		// CHECK-NEXT: affine.vector_load
		// CHECK-NEXT: affine.vector_store
		// CHECK-NEXT: affine.vector_load
		// CHECK-NEXT: affine.vector_store
		// CHECK-NOT: affine.for
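
As a sanity check on what the new test expects, the following sketch models the `@vector_loop` test in plain C++. It is only a simulation under stated assumptions: `Matrix`, `copyVector4`, `runUnfused`, and `runFused` are hypothetical names, a `vector<4xf32>` access is modeled as four consecutive floats, and the fused version keeps writing through `b` (the CHECK lines only require a single two-level nest containing load, store, load, store, and say nothing about which buffer the pass uses for the intermediate value).

```cpp
#include <array>
#include <cassert>

// Models memref<10x20xf32>.
using Matrix = std::array<std::array<float, 20>, 10>;

// Models one affine.vector_load / affine.vector_store pair on vector<4xf32>.
void copyVector4(const Matrix &src, Matrix &dst, int row, int col) {
  assert(col % 4 == 0 && col + 4 <= 20);
  for (int lane = 0; lane < 4; ++lane)
    dst[row][col + lane] = src[row][col + lane];
}

// The two separate loop nests from the test input: a -> b, then b -> c.
void runUnfused(const Matrix &a, Matrix &b, Matrix &c) {
  for (int j = 0; j < 10; ++j)
    for (int i = 0; i < 5; ++i)
      copyVector4(a, b, j, i * 4);
  for (int j = 0; j < 10; ++j)
    for (int i = 0; i < 5; ++i)
      copyVector4(b, c, j, i * 4);
}

// A single nest with load, store, load, store, matching the CHECK pattern.
void runFused(const Matrix &a, Matrix &b, Matrix &c) {
  for (int j = 0; j < 10; ++j) {
    for (int i = 0; i < 5; ++i) {
      copyVector4(a, b, j, i * 4);
      copyVector4(b, c, j, i * 4);
    }
  }
}
```

Both versions leave the same contents in `c` here because every vector load in the consumer reads exactly the four elements written by the producer in the same iteration, which is the case where producer and consumer use the same vector width.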