Details
- Reviewers: vinayaka-polymage, bondhugula
- Commits: rG986bef97826f: [mlir] Remove redundant loads

Diff Detail
- Repository: rG LLVM Github Monorepo

Event Timeline
Thanks for extending this pass. Comments below.
mlir/lib/Transforms/MemRefDataFlowOpt.cpp
Line 286: We should have a doc comment here; if needed, this can be brief and refer to the larger comment above.
Lines 299–312: You can actually do both in one walk, i.e., if the store-to-load forwarding doesn't forward this load, immediately run loadCSE on it. (The forwarding can be made to return a LogicalResult so the caller can check whether it happened.)
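A minimal sketch of what this combined walk could look like, assuming forwardStoreToLoad is changed to return mlir::LogicalResult and that a loadCSE member exists; the member names and signatures here follow the discussion, not necessarily the final patch:

```cpp
// Sketch only: run both rewrites in a single walk over the function.
// Assumes forwardStoreToLoad() returns LogicalResult and loadCSE() exists
// (hypothetical signatures for illustration).
void MemRefDataFlowOpt::runOnFunction() {
  FuncOp f = getFunction();
  f.walk([&](AffineLoadOp loadOp) {
    // Attempt load CSE only on loads that no store forwarded a value to.
    if (failed(forwardStoreToLoad(loadOp)))
      loadCSE(loadOp);
  });
}
```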
Line 302: Comment here, please. This isn't covered by the comment above.
Lines 302–303: Please use llvm::is_contained.
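For reference, a minimal illustration of the llvm::is_contained idiom; the function and container names here are hypothetical:

```cpp
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/STLExtras.h"
#include "mlir/IR/Operation.h"

// llvm::is_contained(range, value) replaces the longer
// std::find(range.begin(), range.end(), value) != range.end() pattern.
static bool containsStore(llvm::ArrayRef<mlir::Operation *> stores,
                          mlir::Operation *storeOp) {
  return llvm::is_contained(stores, storeOp);
}
```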
Lines 321–339: This is a large block of duplicated code that should be factored out and shared. You could have a function that, given the storeOp and the loadOp, returns true/false as to whether the former may be reaching the latter; this can then be used to guard the collection of depSrcStores in both places.
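A hedged sketch of such a helper, using MLIR's affine dependence analysis utilities (which MemRefDataFlowOpt.cpp already includes); the name mayBeReachingStore is hypothetical, and the actual patch may phrase the condition differently:

```cpp
// Hypothetical shared helper for forwardStoreToLoad and loadCSE: returns
// true if `storeOp` may be reaching `loadOp`, i.e., if a dependence exists
// from the store to the load at some surrounding loop depth. Callers can
// use this to guard the collection of depSrcStores.
static bool mayBeReachingStore(Operation *storeOp, Operation *loadOp) {
  MemRefAccess srcAccess(storeOp);
  MemRefAccess destAccess(loadOp);
  unsigned nsLoops = getNumCommonSurroundingLoops(*storeOp, *loadOp);
  // Check all candidate depths up to the innermost common surrounding loop.
  for (unsigned depth = 1; depth <= nsLoops + 1; ++depth) {
    FlatAffineConstraints dependenceConstraints;
    DependenceResult result = checkMemrefAccessDependence(
        srcAccess, destAccess, depth, &dependenceConstraints,
        /*dependenceComponents=*/nullptr);
    if (hasDependence(result))
      return true;
  }
  return false;
}
```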
Lines 346–349: Actually, post-dominance isn't needed here; dominance alone is sufficient (though it has to be used differently) and in fact leads to a less conservative condition. But this can be fixed in a follow-up patch, since the check in forwardStoreToLoad can similarly be made stronger and freed from post-dominance info. (I already have a patch and was incidentally planning to submit it, so I can take care of this in a follow-up.)
Address code review comments to use a utility function, combine walks, and add more comments.
mlir/lib/Transforms/MemRefDataFlowOpt.cpp

Lines 346–349: With your new patch, will %v1 be replaced?

```mlir
affine.for %i0 = 0 to 10 {
  affine.for %i1 = 0 to 9 {
    %a1 = affine.apply affine_map<(d0, d1) -> (d1 + 1)> (%i0, %i1)
    affine.store %c7, %m[%i0, %a1] : memref<10x10xf32>
    %v0 = affine.load %m[%i0, %i1] : memref<10x10xf32>
  }
}
affine.for %i0 = 0 to 10 {
  affine.for %i1 = 0 to 9 {
    affine.store %c7, %m[%i0, %i1] : memref<10x10xf32>
    %v1 = affine.load %m[%i0, %i1] : memref<10x10xf32>
  }
}
```
mlir/lib/Transforms/MemRefDataFlowOpt.cpp

Lines 346–349: Yes. More simply put, patterns like this as well:

```mlir
affine.store ...
affine.for ... {
  affine.load ...
  affine.store ...
  ... = affine.load ...
}
```

With the current pass, the store-to-load forwarding inside the nest won't happen. You don't need the last store to postdominate all the other relevant stores. The new condition will similarly make load CSE more powerful.
@bondhugula Thank you for the review. We'll merge it tomorrow if there are no more comments.
mlir/lib/Transforms/MemRefDataFlowOpt.cpp

Lines 346–349: Great!