This is an archive of the discontinued LLVM Phabricator instance.


Differential D104053

[MLIR] Correct memrefdataflow behavior in the presence of cast and other operations
ClosedPublic

Authored by wsmoses on Jun 10 2021, 12:15 PM.


Details

Reviewers

ftynse
nicolasvasilache
mehdi_amini
chelini
kumasento
vinayaka-polymage
ayzhuang
dcaballe

Commits

rG44826ecd929b: [MLIR] Correct memrefdataflow behavior in the presence of cast and other…

Summary

MemRefDataFlow performs mem2reg-style optimizations on affine loads and stores. Unfortunately, it is not currently correct in the presence of external operations such as memref.cast or function calls. This diff extends the pass so that it remains correct in the presence of such ops.
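
To make the failure mode in the summary concrete, here is a minimal sketch in plain C++ rather than MLIR. Everything in it is hypothetical and not part of the pass: `ToyOp`, `naiveForward`, and `conservativeForward` model only a straight-line sequence of operations on one buffer, with `Kind::Opaque` standing in for a call or an access through a cast alias.

```cpp
#include <cstddef>
#include <vector>

// Toy operation kinds; hypothetical names used only for this illustration.
enum class Kind { Store, Load, Opaque };

struct ToyOp {
  Kind kind;
  int index; // Element accessed by Store/Load; ignored for Opaque.
  int value; // Value written by Store; ignored otherwise.
};

// Walks backwards from the load at `loadPos` and forwards the value of the
// nearest earlier store to the same index, ignoring every other operation.
// Returns false if no such store exists.
bool naiveForward(const std::vector<ToyOp> &ops, std::size_t loadPos,
                  int &forwarded) {
  for (std::size_t i = loadPos; i-- > 0;) {
    if (ops[i].kind == Kind::Store && ops[i].index == ops[loadPos].index) {
      forwarded = ops[i].value;
      return true;
    }
  }
  return false;
}

// Same walk, but gives up as soon as an opaque operation is found between
// the candidate store and the load, since it may have written the buffer.
bool conservativeForward(const std::vector<ToyOp> &ops, std::size_t loadPos,
                         int &forwarded) {
  for (std::size_t i = loadPos; i-- > 0;) {
    if (ops[i].kind == Kind::Opaque)
      return false; // Unknown effect on memory: do not forward.
    if (ops[i].kind == Kind::Store && ops[i].index == ops[loadPos].index) {
      forwarded = ops[i].value;
      return true;
    }
  }
  return false;
}
```

For the sequence store, opaque op, load, the naive walk still forwards the stored value, while the conservative walk declines. The real pass reasons about MLIR memory effects and dominance rather than list positions, so this only illustrates the class of bug being fixed.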

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wsmoses created this revision. (Jun 10 2021, 12:15 PM)

Herald added subscribers: dcaballe, cota, teijeong and 15 others. (Jun 10 2021, 12:15 PM)

wsmoses requested review of this revision. (Jun 10 2021, 12:15 PM)

Herald added a project: Restricted Project. (Jun 10 2021, 12:16 PM)

Herald added a subscriber: stephenneuendorffer.

Harbormaster completed remote builds in B108666: Diff 351234. (Jun 10 2021, 12:44 PM)

@wsmoses MemRefDataFlow has just been moved and renamed. Could you rebase right away, so that the rebase stays easy with no further intervening commits?

bondhugula added a reviewer: vinayaka-polymage. (Jun 14 2021, 7:03 AM)

bondhugula added a reviewer: ayzhuang.

Rebase after memrefdataflow opt move

Harbormaster completed remote builds in B109153: Diff 351943. (Jun 14 2021, 12:39 PM)

ftynse added inline comments. (Jun 21 2021, 4:11 AM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
73–93	Could we have some doc comments for these functions?
130–134	Nit: add braces for these
142–144	Nit: expand auto here plz
185
188	Nit: ++iter
195	Expanding auto will remove this linter error.
231	Nit: here and below, please add trailing dots to all sentences.
362	Please fix the linter error.
mlir/test/Dialect/Affine/scalrep.mlir
238–253	Please explain what needs to be done here.
659	Please add a newline.

Fix formatting

Harbormaster completed remote builds in B110258: Diff 353443. (Jun 21 2021, 12:47 PM)

bondhugula requested changes to this revision. (Jun 21 2021, 6:27 PM)

bondhugula added inline comments.

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
73–93	Since these aren't publicly exposed, I think the convention is to have these comments on the definition instead.
115–116	Reflow - use width.
117	`auto` should be fine here.
127–128	Reflow.
393	You don't need to initialize this to `nullptr` - it has default null init.

bondhugula added inline comments. (Jun 21 2021, 6:27 PM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
105	From the comment itself, it's not clear what it means for an operation to have a memory effect on `memOp`! An op has a memory effect or not through a `Value` in this case - it isn't meaningful to say an op has a memory effect on another op. Can you rephrase?
113	Can you move this declare/init further below? Also, `legal` -> `hasEffect`?
183	Terminate with period.
221–225	This is covered neither by the commit title nor commit summary and is a new feature/extension completely separate from the fix for side-effecting/non-affine dereferencing ops being present. Can you please move this to another revision?

This revision now requires changes to proceed. (Jun 21 2021, 6:27 PM)

Address changes and refactor store forwarding

wsmoses marked 12 inline comments as done. (Jun 22 2021, 9:40 AM)

wsmoses added inline comments.

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
117	I don't believe it is. Since check is called recursively, the actual type is needed.

wsmoses added inline comments. (Jun 22 2021, 9:41 AM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
113	Changed name, however, this can't be moved any further down since it is referenced within the check function.

Harbormaster completed remote builds in B110432: Diff 353678. (Jun 22 2021, 10:42 AM)

ftynse added inline comments. (Jun 23 2021, 1:32 AM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
117	Indeed. And you need an std::function, function_ref won't work correctly.
mlir/test/Dialect/Affine/scalrep.mlir
238–253	Can this be fixed by recognizing specifically the ops with AffineRead/WriteOpInterface before MemoryEffectOpInterface, and keeping track of the affine subscripts?
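
The exchange about line 117 above (whether `auto` suffices for the `check` lambda) comes down to a general C++ point that can be sketched outside MLIR. In the example below, `Node` and `anyEffect` are hypothetical stand-ins for an operation with nested regions and for the recursive check. A lambda that calls itself needs a type that can be named before its initializer is complete, which `std::function` provides. The remark about `function_ref` presumably refers to `llvm::function_ref` being a non-owning reference, which would dangle if initialized from a temporary lambda.

```cpp
#include <functional>
#include <vector>

// Hypothetical tree node standing in for an operation with nested regions.
struct Node {
  bool hasEffect;
  std::vector<Node> children;
};

// Returns true if `root` or any node nested inside it has the effect.
bool anyEffect(const Node &root) {
  bool found = false;
  // `visit` refers to itself in its own body, so it is declared with an
  // explicit std::function type; a plain `auto` lambda could not do this.
  std::function<void(const Node &)> visit = [&](const Node &n) {
    if (found)
      return; // Early exit once the effect has been seen.
    if (n.hasEffect) {
      found = true;
      return;
    }
    for (const Node &child : n.children)
      visit(child);
  };
  visit(root);
  return found;
}
```

The `found` flag has to be declared before the lambda because the lambda captures it by reference, which mirrors the author's reply about why the flag in the patch could not be moved further down.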

This patch regresses functionality on store_load_store_nest_forward. Dependence analysis was already being used by this pass and could already detect that a store was able to forward to the load. Rather than disabling that test, the patch should be revised so that it does not undo that inference.

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
76–77	Pass dom infos by reference since they can't be null at this stage.
81	Likewise.
104–236	Ensure that all ... do not have ... Ensure that no operation between ... has the ... It's not clear what "potential memory effect" is. Rephrase. Nit: Enclose any arg names referred to in backticks (eg. memOp, EffectType).
105	It still isn't clear here what "effect on an op" means.
106	is defined to be an operation -> is an operation ?
110	originalMemref -> memref ? It's not clear what "original" here means.
114	Document this please. Should this be hasSideEffect? hasEffect is generic.
117	Please rename this lambda - `check` is too general.
142	may have a side effect? Nit: Please terminate all comments with a period - here and everywhere else.
157	Please rename this lambda.
168	Assertion message please.
183	Name isn't descriptive enough: `todoBlocks`?
252	to -> `endOp` or `untilOp`?
261	You don't need the `.getOperation()` I think.
268–272	Update doc comment to cover the additional `..ToErase` args.
322	"2." is gone and all numbers are off by one. You'll also have to update the top-level class comment on `AffineScalarReplacement`.
329–333	This isn't meaningful as a loop - this is only equivalent to: assert(fwdingCandidates.size() <= 1 && "...");
334–335	Move this up and if (....empty()) return failure(); ... assertion ... lastWriteStoreOp = fwdingCandidates.front();
362	This isn't properly named - `loadCandidates`?
383–385	2 values -> two values
394	Avoid `auto` here.
446	This is a local variable and isn't used any more - no need to clear.
mlir/test/Dialect/Affine/scalrep.mlir
240–241	Dependence analysis is already being used by the pass and could already detect that this store did not reach the load above. Please see the comment above, which says exactly that: "Although there is a dependence from the second store to the load, it is satisfied by the outer surrounding loop, and does not prevent the first store to be forwarded to the load." This patch regresses this functionality and undoes that inference. We shouldn't be disabling this test.

This revision now requires changes to proceed. (Jun 26 2021, 12:48 AM)

Address comments

Additional comments

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
285	This is a sketch of how the additional dependence analysis can be restored, while maintaining correctness with other ops. This snippet here is not sufficient, see my comment in the test.
mlir/test/Dialect/Affine/scalrep.mlir
240–241	I've demonstrated above where such dependence checking code could be inserted. Unfortunately, I do not understand the existing dependence loop structure sufficiently to rewrite it to be correct and apply here, when taking into account that other operators could modify the memory. If you could explain that, or want to take a stab at it, feel free. Alternatively, if you're okay with it, since this PR fixes existing correctness bugs that I've seen in practice, perhaps we could first fix correctness, then restore that optimization?

Harbormaster completed remote builds in B111198: Diff 354764. (Jun 27 2021, 1:01 PM)

bondhugula requested changes to this revision. (Jun 27 2021, 5:15 PM)

bondhugula added inline comments.

mlir/test/Dialect/Affine/scalrep.mlir
240–241	> Unfortunately, I do not understand the existing dependence loop structure sufficiently
Can you explain which part isn't clear? I can help if the code comments or doc aren't clear.
> Alternatively, if you're okay with it, since this PR fixes existing correctness bugs that I've seen in practice, perhaps we could first fix correctness, then restore that
Actually, we shouldn't be creating such a regression - this is really not a corner case or a special case that this is regressing on but a pretty important pattern and the very reason dependence analysis is being used by this pass! It is common for affine passes to be missing handling of such side-effecting operations that need to be fixed at various places (e.g., we always assume that %memref_1 and %memref_2 are different even if they are block arguments). Let's fix this while not regressing.
For `hasNoIntervening` effect, can you document what "between" two ops means? It's obvious when the two ops lie in the same block but not otherwise. Also, the class comment on the top has still not been updated to reflect this check.

This revision now requires changes to proceed. (Jun 27 2021, 5:15 PM)

bondhugula added inline comments. (Jun 27 2021, 5:53 PM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
266	Address clang-tidy here please.
290–300	Reflow - please use the entire width for comments here and everywhere else - it'll lead to fewer lines in general.

bondhugula added inline comments. (Jun 27 2021, 6:21 PM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
77	This is unused.
284–285	an -> and
285	It's not clear to me what "check the entire parent op" means here. Rephrase?
308	Please put `from` and `to` in backticks or else it messes up what's being conveyed.
319	If we only expect to find the first op and are asserting below when there is more than one, we should be doing it right here instead of finding more than two ops. We are also completely missing comments here on what the approach is. To start with, `postDominanceInfo` is now no longer being used, yet it's being computed and passed to this function. All of the related comments are outdated and incorrect and still appear to be present. I don't think I really understand the approach at all. Can you please properly document things and summarize the approach?
322–333	This whole comment is outdated and incorrect now. I think several parts of the class comment are similarly outdated.
334	.size() == 0 -> .empty()
423	This is not even used any more anywhere in the code.

bondhugula added inline comments. (Jun 27 2021, 6:36 PM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
323–333	You may also want to rebase on master - post dominance info isn't used anymore and this check was changed to use dominance info.

I'm going to be mostly away from reviewing the next 1.5 weeks - @ayzhuang, @dcaballe - could you please take over reviewing this? I've pretty much posted all comments I had so far.

This revision now requires review to proceed. (Jun 27 2021, 6:42 PM)

bondhugula added inline comments. (Jun 27 2021, 7:16 PM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
275–295	Instead of dropping the reliance on dominance and the condition (3) that existed to find the "last write" to fwd, what could instead be done to be unified with the existing approach is to simply also consider here all ops that have a memory write side-effect in addition to just affine writes for `storeOps` above. Then ensure that those ops get into `depSrcStores` as well (effectively treating such non-affine ops conservatively). Find the last write op in the same way as before and this naturally/transparently takes care of it. If the unique last write isn't an AffineWriteOpInterface, no forwarding will happen. This will also be as powerful as the previous approach and will not cause any regression. It's also no more expensive than the existing approach and you really don't need to check for any paths separately. (You are just reusing dominance info.)

Restore dependence analysis and address comments

wsmoses added inline comments. (Jun 27 2021, 7:53 PM)

mlir/test/Dialect/Affine/scalrep.mlir
240–241	Understood. I've spent some time recently going through the existing analysis to understand it, and have added what I think is a reasonable solution that preserves both the aforementioned case and general correctness. To make this easier to follow, I've also added more detail to that part of the code (and can add a figure if that would be useful). In essence, it was unclear to me why the store dominating the load was a necessary precondition for the given loop depth range, which is why I was hesitant to add something like that before ensuring it would also apply here. Happily it does (with the aforementioned changes and a comment describing why it should be correct). It may even be slightly more aggressive than the previous dependence analysis, since we can set the min loop depth to that of the proposed replacement op rather than the min across all potential storing operations.

Harbormaster completed remote builds in B111216: Diff 354783. (Jun 27 2021, 8:16 PM)

> I'm going to be mostly away from reviewing the next 1.5 weeks - @ayzhuang, @dcaballe - could you please take over reviewing this? I've pretty much posted all comments I had so far.

Diego is also unavailable AFAIK.

The fixed version without regression looks okay to me. The overall recursive algorithm is to look if there is a non-affine operation that may have side effects preventing the store-to-load forwarding. It starts from the store and eagerly looks into all control-flow successors (operations and blocks) until the load is reached or all control flow paths are visited. This is overly conservative because it is visiting operations that do not lie on any control flow path from the store to the load. Ideally, we would have some reachability analysis and only visit the operations that are (1) reachable from the store and (2) such that the load is reachable from them.
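
The traversal described above can be sketched on a toy control-flow graph. This is an illustration under stated assumptions, not the code in the patch: `ToyBlock`, `Pos`, and `toyHasNoInterveningEffect` are hypothetical, each operation is reduced to a single "has side effect" flag, and nested regions are ignored.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical toy CFG block: one flag per operation plus successor indices.
struct ToyBlock {
  std::vector<bool> opHasSideEffect;
  std::vector<std::size_t> successors; // Indices into the CFG vector.
};

// Position of an operation: block index and index within that block.
struct Pos {
  std::size_t block;
  std::size_t op;
};

// Returns true if no visited operation after `store` and before `load` has a
// side effect. Every successor is followed eagerly, so blocks from which
// `load` is unreachable are visited too, which is the conservatism noted above.
bool toyHasNoInterveningEffect(const std::vector<ToyBlock> &cfg, Pos store,
                               Pos load) {
  // Scans block `b` from operation `begin`. Returns false on a side effect
  // and sets `reachedLoad` if the scan stopped at the load.
  auto scan = [&](std::size_t b, std::size_t begin, bool &reachedLoad) -> bool {
    reachedLoad = false;
    for (std::size_t i = begin; i < cfg[b].opHasSideEffect.size(); ++i) {
      if (b == load.block && i == load.op) {
        reachedLoad = true;
        return true;
      }
      if (cfg[b].opHasSideEffect[i])
        return false;
    }
    return true;
  };

  bool reachedLoad = false;
  if (!scan(store.block, store.op + 1, reachedLoad))
    return false;
  if (reachedLoad)
    return true;

  // Worklist over successor blocks, visiting each block at most once.
  std::vector<std::size_t> todo(cfg[store.block].successors);
  std::set<std::size_t> done;
  while (!todo.empty()) {
    std::size_t b = todo.back();
    todo.pop_back();
    if (!done.insert(b).second)
      continue; // Already visited.
    if (!scan(b, 0, reachedLoad))
      return false;
    if (!reachedLoad)
      todo.insert(todo.end(), cfg[b].successors.begin(),
                  cfg[b].successors.end());
  }
  return true;
}
```

With a store in block 0 that branches to block 1 (containing the load) and to block 2 (a side-effecting op that never reaches the load), this sketch rejects forwarding even though no path from the store to the load passes through block 2. A reachability analysis of the kind suggested above would restrict the scan to blocks that lie on some store-to-load path.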

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
106	Nit: `///`
280–281	Please reflow and use `///` for top-level comments.
340–352	Spurious whitespace changes.
358–372	Nit: trailing dot plz.
371
379–381
mlir/test/Dialect/Affine/scalrep.mlir
577	Nit: there's no "external" operation in this test AFAICS
611–622	Please put the check blocks consistently before or consistently after the function they should match. (Look at what the rest of the file does.) It's also possible to use the `// -----` marker to separate test cases.

This revision is now accepted and ready to land. (Jun 28 2021, 3:13 AM)

Post rebase / fixing nits

Harbormaster completed remote builds in B111297: Diff 354907. (Jun 28 2021, 9:05 AM)

Closed by commit rG44826ecd929b: [MLIR] Correct memrefdataflow behavior in the presence of cast and other… (authored by wsmoses). (Jun 28 2021, 9:23 AM)

This revision was automatically updated to reflect the committed changes.

wsmoses added a commit: rG44826ecd929b: [MLIR] Correct memrefdataflow behavior in the presence of cast and other….

mehdi_amini added inline comments. (Jun 28 2021, 6:13 PM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
141	How does this work in presence of things like subview? Or if two different function parameters are the same memref (maybe that's forbidden by the affine dialect I don't remember)?
290	Seems like you can early return as soon as `hasSideEffect` is true here instead of continuing to traverse the IR?
314	Likely another place where you can early return when hasSideEffect is true.
359	After the call to checkOperation is another place to check for hasSideEffect and early return to limit the traversal.

ayzhuang added inline comments. (Jun 29 2021, 10:30 AM)

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
395	rename depStore to load
413	hasSingleElement check is removed. Have you tested multi-block functions? If there is no problem, could you remove the above comment and add a unit test of multi-block function?
425	Why remove loadCSE from this and other comments?
mlir/test/Dialect/Affine/scalrep.mlir
592	Comment not correct, please modify.

Revision Contents

Path	Size
mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp	382 lines
mlir/test/Dialect/Affine/scalrep.mlir	154 lines

Diff 353443

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp


 // TODO: do general dead store elimination for memref's. This pass
 // currently only eliminates the stores only if no other loads/uses (other
 // than dealloc) remain.
 //
 struct AffineScalarReplacement
     : public AffineScalarReplacementBase<AffineScalarReplacement> {
   void runOnFunction() override;
-  LogicalResult forwardStoreToLoad(AffineReadOpInterface loadOp);
-  void loadCSE(AffineReadOpInterface loadOp);
-  // A list of memref's that are potentially dead / could be eliminated.
-  SmallPtrSet<Value, 4> memrefsToErase;
-  // Load op's whose results were replaced by those forwarded from stores
-  // dominating stores or loads..
-  SmallVector<Operation *, 8> loadOpsToErase;
-  DominanceInfo *domInfo = nullptr;
-  PostDominanceInfo *postDomInfo = nullptr;
+  // Attempt to eliminate loadOp by replacing it with a value stored
+  // into memory which the load is guaranteed to retrieve.
+  LogicalResult forwardStoreToLoad(AffineReadOpInterface loadOp,
+                                   SmallVectorImpl<Operation *> &loadOpsToErase,
+                                   SmallPtrSetImpl<Value> &memrefsToErase,
bondhugula (Done): Pass dom infos by reference since they can't be null at this stage.
bondhugula (Done): This is unused.
+                                   DominanceInfo *domInfo,
+                                   PostDominanceInfo *postDominanceInfo);
+  // Remove a store which cannot impact the program.
bondhugula (Done): Likewise.
+  void removeUnusedStore(AffineWriteOpInterface loadOp,
+                         SmallVectorImpl<Operation *> &loadOpsToErase,
+                         SmallPtrSetImpl<Value> &memrefsToErase,
+                         DominanceInfo *domInfo,
+                         PostDominanceInfo *postDominanceInfo);
+  // Replace a loadOp with the result of another loadOp if the two
+  // loads are guaranteed to retrieve the same value.
+  void loadCSE(AffineReadOpInterface loadOp,
+               SmallVectorImpl<Operation *> &loadOpsToErase,
+               DominanceInfo *domInfo);
 };
ftynse (Done): Could we have some doc comments for these functions?
bondhugula (Done): Since these aren't publicly exposed, I think the convention is to have these comments on the definition instead.
 } // end anonymous namespace
 /// Creates a pass to perform optimizations relying on memref dataflow such as
 /// store to load forwarding, elimination of dead stores, and dead allocs.
 std::unique_ptr<OperationPass<FuncOp>>
 mlir::createAffineScalarReplacementPass() {
   return std::make_unique<AffineScalarReplacement>();
 }

-// Check if the store may be reaching the load.
-static bool storeMayReachLoad(Operation *storeOp, Operation *loadOp,
-                              unsigned minSurroundingLoops) {
-  MemRefAccess srcAccess(storeOp);
-  MemRefAccess destAccess(loadOp);
-  FlatAffineConstraints dependenceConstraints;
-  unsigned nsLoops = getNumCommonSurroundingLoops(*loadOp, *storeOp);
-  unsigned d;
-  // Dependences at loop depth <= minSurroundingLoops do NOT matter.
-  for (d = nsLoops + 1; d > minSurroundingLoops; d--) {
-    DependenceResult result = checkMemrefAccessDependence(
-        srcAccess, destAccess, d, &dependenceConstraints,
-        /*dependenceComponents=*/nullptr);
-    if (hasDependence(result))
+/// Ensure that all operations between start (noninclusive) and memOp
+/// do not have the potential memory effect EffectType on memOp.
bondhugula (Done): From the comment itself, it's not clear what it means for an operation to have a memory effect on memOp! An op has a memory effect or not through a Value in this case - it isn't meaningful to say an op has a memory effect on another op. Can you rephrase?
bondhugula (Done): It still isn't clear here what "effect on an op" means.
+template <typename EffectType, typename T>
bondhugula (Done): is defined to be an operation -> is an operation ?
ftynse (Done): Nit: `///`
+bool hasNoInterveningEffect(Operation *start, T memOp) {
+  Value originalMemref = memOp.getMemRef();
+  bool isOriginalAllocation =
bondhugula (Done): originalMemref -> memref ? It's not clear what "original" here means.
+      originalMemref.getDefiningOp<memref::AllocaOp>() ||
+      originalMemref.getDefiningOp<memref::AllocOp>();
+  bool legal = true;
bondhugula (Done): Can you move this declare/init further below? Also, `legal` -> `hasEffect`?
wsmoses (Author, Done): Changed name, however, this can't be moved any further down since it is referenced within the check function.
bondhugula (Done): Document this please. Should this be hasSideEffect? hasEffect is generic.
+  // Check whether the effect on memOp can be caused by
+  // a given operation op.
bondhugula (Done): Reflow - use width.
+  std::function<void(Operation *)> check = [&](Operation *op) {
bondhugula (Done): `auto` should be fine here.
bondhugula (Done): Please rename this lambda - `check` is too general.
wsmoses (Author, Done): I don't believe it is. Since check is called recursively, the actual type is needed.
ftynse (Done): Indeed. And you need an std::function, function_ref won't work correctly.
+    // If the effect has alreay been found, early exit,
+    if (!legal)
+      return;
+    if (auto memEffect = dyn_cast<MemoryEffectOpInterface>(op)) {
+      SmallVector<MemoryEffects::EffectInstance, 1> effects;
+      memEffect.getEffects(effects);
+      for (auto effect : effects) {
+        // If op causes EffectType on a potentially aliasing
+        // location for memOp, mark as illegal.
bondhugula (Done): Reflow.
+        if (isa<EffectType>(effect.getEffect())) {
+          if (isOriginalAllocation && effect.getValue() &&
+              (effect.getValue().getDefiningOp<memref::AllocaOp>() ||
+               effect.getValue().getDefiningOp<memref::AllocOp>())) {
+            if (effect.getValue() != originalMemref)
+              continue;
ftynse (Done): Nit: add braces for these
+          }
+          legal = false;
+          return;
+        }
+      }
+    } else if (op->hasTrait<OpTrait::HasRecursiveSideEffects>()) {
+      // Recurse into the regions for this op and check whether
mehdi_amini (Not Done): How does this work in presence of things like subview? Or if two different function parameters are the same memref (maybe that's forbidden by the affine dialect I don't remember)?
+      // the internal operations may have the effect
bondhugula (Done): may have a side effect? Nit: Please terminate all comments with a period - here and everywhere else.
+      for (Region &region : op->getRegions())
+        for (Block &block : region)
ftynse (Done): Nit: expand auto here plz
+          for (Operation &op : block)
+            check(&op);
+    } else {
+      // Otherwise, conservatively assume generic operations have
+      // the effect on the operation
+      legal = false;
+      return;
+    }
+  };

+  // Check all paths from ancestor op `parent` to the
+  // operation `to` for the effect. It is known that
+  // `to` must be contained within `parent`.
bondhugula (Done): Please rename this lambda.
+  auto until = [&](Operation *parent, Operation *to) {
+    // TODO check only the paths from `parent` to `to`.
+    // Currently we fallback an check the entire parent op.
+    assert(parent->isAncestor(to));
+    check(parent);
+  };
+  // Check for all paths from operation `from` to operation
+  // `to` for the given memory effect.
+  std::function<void(Operation *, Operation *)> recur = [&](Operation *from,
+                                                            Operation *to) {
bondhugula (Done): Assertion message please.
+    assert(from->getParentRegion()->isAncestor(to->getParentRegion()));
+    // If the operations are in different regions, recursively
+    // consider all path from `from` to the parent of `to` and
+    // all paths from the parent of `to` to `to`.
+    if (from->getParentRegion() != to->getParentRegion()) {
+      recur(from, to->getParentOp());
+      until(to->getParentOp(), to);
+      return;
+    }
+    // Now, assuming that from and to exist in the same region, perform
+    // a CFG traversal to check all the relevant operations.
+    // Additional blocks to consider
bondhugula (Done): Terminate with period.
bondhugula (Done): Name isn't descriptive enough: `todoBlocks`?
+    SmallVector<Block *, 2> todo;
+    {
ftynse (Done), suggested edit:
    SmallVector<Block *, 2> todo;
    {
    - // First consier the parent block of `from` an check all operations
    + // First consider the parent block of `from` an check all operations
    // after `from`.
+      // First consider the parent block of `from` an check all operations
+      // after `from`.
+      for (auto iter = ++from->getIterator(), end = from->getBlock()->end();
ftynse (Done): Nit: ++iter
+           iter != end && &*iter != to; ++iter) {
+        check(&*iter);
+      }
+      // If the parent of `from` doesn't contain `to`, add the successors
+      // to the list of blocks to check.
+      if (to->getBlock() != from->getBlock())
ftynse (Done): Expanding auto will remove this linter error.
+        for (Block *succ : from->getBlock()->getSuccessors())
+          todo.push_back(succ);
+    }
+    SmallPtrSet<Block *, 4> done;
+    // Traverse the CFG until hitting `to`.
+    while (todo.size()) {
+      Block *blk = todo.pop_back_val();
+      if (done.count(blk))
+        continue;
+      done.insert(blk);
+      for (auto &op : *blk) {
+        if (&op == to)
           break;
+        check(&op);
+        if (&op == blk->getTerminator())
+          for (auto succ : blk->getSuccessors())
Lint (pre-merge checks): clang-tidy: warning: 'auto succ' can be declared as 'auto *succ' [llvm-qualified-auto]
+            todo.push_back(succ);
+      }
     }
-  if (d <= minSurroundingLoops)
-    return false;
+  };
+  recur(start, memOp.getOperation());
+  return legal;
+}

+// This attempts to remove stores which have no impact on the final result.
+// A writing op writeA will be eliminated if there exists an op writeB if
+// 1) writeA and writeB have mathematically equivalent affine access functions.
+// 2) writeB postdominates loadA.
+// 3) There is no potential read between writeA and writeB.
bondhugula (Done): This is covered neither by the commit title nor commit summary and is a new feature/extension completely separate from the fix for side-effecting/non-affine dereferencing ops being present. Can you please move this to another revision?
+void AffineScalarReplacement::removeUnusedStore(
+    AffineWriteOpInterface writeA, SmallVectorImpl<Operation *> &opsToErase,
+    SmallPtrSetImpl<Value> &memrefsToErase, DominanceInfo *domInfo,
+    PostDominanceInfo *postDominanceInfo) {
+  for (auto *user : writeA.getMemRef().getUsers()) {
ftynse (Done): Nit: here and below, please add trailing dots to all sentences.
+    // Only consider writing operations.
+    auto writeB = dyn_cast<AffineWriteOpInterface>(user);
+    if (!writeB)
+      continue;
bondhugula (Done): Ensure that all ... do not have ... Ensure that no operation between ... has the ... It's not clear what "potential memory effect" is. Rephrase. Nit: Enclose any arg names referred to in backticks (eg. memOp, EffectType).
+    // The operations must be distinct.
+    if (writeB == writeA)
+      continue;
+    // Both operations must lie in the same region.
+    if (writeB->getParentRegion() != writeA->getParentRegion())
+      continue;
+    // Both operations must write to the same memory.
+    MemRefAccess srcAccess(writeB);
+    MemRefAccess destAccess(writeA);
-  return true;
+    if (srcAccess != destAccess)
+      continue;
+    // writeB must postdominate writeA.
bondhugula (Done): to -> `endOp` or `untilOp`?
+    if (!postDominanceInfo->postDominates(writeB, writeA))
+      continue;
+    // There cannot be an operation which reads from memory between
+    // the two writes.
+    if (!hasNoInterveningEffect<MemoryEffects::Read>(writeA, writeB))
+      continue;
+    opsToErase.push_back(writeA);
bondhugula (Done): You don't need the `.getOperation()` I think.
+    break;
+  }
 }

// This is a straightforward implementation not optimized for speed. Optimize // This is a straightforward implementation not optimized for speed. Optimize

bondhugulaUnsubmitted

Done

Address clang-tidy here please.

bondhugula: Address clang-tidy here please.

// if needed. // if needed.

LogicalResult LogicalResult AffineScalarReplacement::forwardStoreToLoad(

AffineScalarReplacement::forwardStoreToLoad(AffineReadOpInterface loadOp) { AffineReadOpInterface loadOp, SmallVectorImpl<Operation *> &loadOpsToErase,

SmallPtrSetImpl<Value> &memrefsToErase, DominanceInfo *domInfo,

PostDominanceInfo *postDominanceInfo) {

// First pass over the use list to get the minimum number of surrounding // First pass over the use list to get the minimum number of surrounding

bondhugula (Done): Update doc comment to cover the additional `..ToErase` args.

// loops common between the load op and the store op, with min taken across // loops common between the load op and the store op, with min taken across

// all store ops. // all store ops.

SmallVector<Operation *, 8> storeOps; SmallVector<Operation *, 8> storeOps;

unsigned minSurroundingLoops = getNestingDepth(loadOp); unsigned minSurroundingLoops = getNestingDepth(loadOp);

for (auto *user : loadOp.getMemRef().getUsers()) { for (auto *user : loadOp.getMemRef().getUsers()) {

auto storeOp = dyn_cast<AffineWriteOpInterface>(user); auto storeOp = dyn_cast<AffineWriteOpInterface>(user);

if (!storeOp) if (!storeOp)

continue; continue;

unsigned nsLoops = getNumCommonSurroundingLoops(*loadOp, *storeOp); unsigned nsLoops = getNumCommonSurroundingLoops(*loadOp, *storeOp);

ftynse (Done): Please reflow and use `///` for top-level comments.

minSurroundingLoops = std::min(nsLoops, minSurroundingLoops); minSurroundingLoops = std::min(nsLoops, minSurroundingLoops);

storeOps.push_back(storeOp); storeOps.push_back(storeOp);

} }

wsmoses (Author, Done): This is a sketch of how the additional dependence analysis can be restored while maintaining correctness with other ops. This snippet is not sufficient on its own; see my comment in the test.

bondhugula (Done): an -> and

bondhugula (Done): It's not clear to me what "check the entire parent op" means here. Rephrase?

// The list of store op candidates for forwarding that satisfy conditions // The list of store op candidates for forwarding that satisfy conditions

// (1) and (2) above - they will be filtered later when checking (3). // (1) and (2) above - they will be filtered later when checking (3).

SmallVector<Operation *, 8> fwdingCandidates; SmallVector<Operation *, 8> fwdingCandidates;

// Store ops that have a dependence into the load (even if they aren't // Store ops that have a dependence into the load (even if they aren't

mehdi_amini (Not Done): Seems like you can early return as soon as `hasSideEffect` is true here instead of continuing to traverse the IR?
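The early-return suggestion can be sketched on a toy op tree. This is a minimal illustration, assuming a hypothetical `TreeOp` type with nested ops; it is not the traversal used in the patch, and the `visited` counter exists only to make the early exit observable.

```cpp
#include <cassert>
#include <vector>

// Hypothetical op with nested ops (e.g. the body of a loop).
struct TreeOp {
  bool hasWriteEffect;
  std::vector<TreeOp> nested;
};

// Returns true as soon as any op in the subtree has a write effect.
// `visited` counts how many ops were inspected before stopping.
bool anyWriteEffect(const TreeOp &op, int &visited) {
  ++visited;
  if (op.hasWriteEffect)
    return true;
  for (const TreeOp &child : op.nested)
    if (anyWriteEffect(child, visited))
      return true; // Stop here; the remaining siblings are never visited.
  return false;
}
```

With a root whose first child has a write effect, only the root and that child are visited; the other children are skipped.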

// forwarding candidates). Each forwarding candidate will be checked for a // forwarding candidates). Each forwarding candidate will be checked for a

// post-dominance on these. 'fwdingCandidates' are a subset of depSrcStores. // post-dominance on these. 'fwdingCandidates' are a subset of depSrcStores.

SmallVector<Operation *, 8> depSrcStores; SmallVector<Operation *, 8> depSrcStores;

for (auto *storeOp : storeOps) { for (auto *storeOp : storeOps) {

if (!storeMayReachLoad(storeOp, loadOp, minSurroundingLoops)) MemRefAccess srcAccess(storeOp);

bondhugula (Not Done): Instead of dropping the reliance on dominance and the condition (3) that existed to find the "last write" to fwd, what could be done instead, to stay unified with the existing approach, is to simply also consider here all ops that have a memory write side-effect *in addition to* just affine writes for `storeOps` above. Then ensure that those ops get into `depSrcStores` as well (effectively treating such non-affine ops conservatively). Find the last write op in the same way as before and this naturally/transparently takes care of it. If the unique last write isn't an `AffineWriteOpInterface`, no forwarding will happen. This will also be as powerful as the previous approach and will not cause any regression. It's also no more expensive than the existing approach and you really don't need to check for any paths separately. (You are just reusing dominance info.)
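The idea in this comment can be sketched on a toy model: treat every op that may write memory as a potential last writer, and forward only if the last writer turns out to be an affine store to the same location. The sketch assumes a single straight-line block (so the last writer is simply the closest preceding one) and that distinct integer addresses never alias; `SimpleOp`, `WriterKind` and `lastWriterToForward` are hypothetical names, not part of the pass.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum class WriterKind { AffineStore, OpaqueWrite, NonWriter };

// Hypothetical op in a straight-line block.
struct SimpleOp {
  WriterKind kind;
  int address; // only meaningful for AffineStore
};

// Returns the index of the store whose value may be forwarded to a load that
// sits at position `loadPos` and reads `loadAddress`, or nullopt if the last
// writer is unknown or there is none.
std::optional<std::size_t> lastWriterToForward(const std::vector<SimpleOp> &ops,
                                               std::size_t loadPos,
                                               int loadAddress) {
  // Walk backwards from the load: the first op that may write the location
  // is the last writer.
  for (std::size_t i = loadPos; i-- > 0;) {
    const SimpleOp &op = ops[i];
    if (op.kind == WriterKind::OpaqueWrite)
      return std::nullopt; // Unknown write: be conservative, do not forward.
    if (op.kind == WriterKind::AffineStore && op.address == loadAddress)
      return i;
  }
  return std::nullopt;
}
```

An opaque write between the store and the load blocks forwarding, while an unrelated non-writing op does not.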

continue; MemRefAccess destAccess(loadOp);

// Stores that *may* be reaching the load. // Stores that *may* be reaching the load.

depSrcStores.push_back(storeOp); depSrcStores.push_back(storeOp);

bondhugula (Done): Reflow - please use the entire width for comments here and everywhere else - it'll lead to fewer lines in general.

// 1. Check if the store and the load have mathematically equivalent // 1. Check if the store and the load have mathematically equivalent

// affine access functions; this implies that they statically refer to the // affine access functions; this implies that they statically refer to the

// same single memref element. As an example this filters out cases like: // same single memref element. As an example this filters out cases like:

// store %A[%i0 + 1] // store %A[%i0 + 1]

// load %A[%i0] // load %A[%i0]

// store %A[%M] // store %A[%M]

// load %A[%N] // load %A[%N]

// Use the AffineValueMap difference based memref access equality checking. // Use the AffineValueMap difference based memref access equality checking.

bondhugula (Done): Please put `from` and `to` in backticks or else it messes up what's being conveyed.

MemRefAccess srcAccess(storeOp);

MemRefAccess destAccess(loadOp);

if (srcAccess != destAccess) if (srcAccess != destAccess)

continue; continue;

// 2. The store has to dominate the load op to be candidate.

if (!domInfo->dominates(storeOp, loadOp)) if (!domInfo->dominates(storeOp, loadOp))

continue; continue;

mehdi_amini (Not Done): Likely another place where you can early return when `hasSideEffect` is true.

if (!hasNoInterveningEffect<MemoryEffects::Write>(storeOp, loadOp))

continue;

// We now have a candidate for forwarding. // We now have a candidate for forwarding.

fwdingCandidates.push_back(storeOp); fwdingCandidates.push_back(storeOp);

bondhugula (Done): If we only expect to find the first op and are asserting below when there is more than one, we should be doing it right here instead of finding more than two ops.

We are also completely missing comments here on what the approach is. To start with, `postDominanceInfo` is no longer being used, yet it's being computed and passed to this function. All of the related comments are outdated and incorrect, yet they are still present. I don't think I really understand the approach at all. Can you please properly document things and summarize the approach?

} }

// 3. Of all the store op's that meet the above criteria, the store that // 3. Of all the store op's that meet the above criteria, the store that

bondhugula (Done): "2." is gone and all numbers are off by one.

You'll also have to update the top-level class comment on `AffineScalarReplacement`.

// postdominates all 'depSrcStores' (if one exists) is the unique store // postdominates all 'depSrcStores' (if one exists) is the unique store

// providing the value to the load, i.e., provably the last writer to that // providing the value to the load, i.e., provably the last writer to that

// memref loc. // memref loc.

// Note: this can be implemented in a cleaner way with postdominator tree // Note: this can be implemented in a cleaner way with postdominator tree

// traversals. Consider this for the future if needed. // traversals. Consider this for the future if needed.

Operation *lastWriteStoreOp = nullptr; Operation *lastWriteStoreOp = nullptr;

for (auto *storeOp : fwdingCandidates) { for (auto *storeOp : fwdingCandidates) {

if (llvm::all_of(depSrcStores, [&](Operation *depStore) { assert(!lastWriteStoreOp);

return postDomInfo->postDominates(storeOp, depStore);

})) {

lastWriteStoreOp = storeOp; lastWriteStoreOp = storeOp;

break;

}

} }

bondhugula (Done): This isn't meaningful as a loop - this is only equivalent to: `assert(fwdingCandidates.size() <= 1 && "...");`

bondhugula (Not Done): This whole comment is outdated and incorrect now. I think several of the code comments and the class comment are similarly outdated.

bondhugula (Not Done): You may also want to rebase on master - post dominance info isn't used anymore and this check was changed to use dominance info.

if (!lastWriteStoreOp) if (!lastWriteStoreOp)

bondhugula (Not Done): `.size() == 0` -> `.empty()`

return failure(); return failure();

bondhugula (Done): Move this up and

if (....empty()) 
  return failure();
... assertion ...
lastWriteStoreOp = fwdingCandidates.front();
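The restructuring suggested here (bail out on an empty candidate list, assert uniqueness, then take the front element) could look roughly like the following standalone helper. `takeUniqueCandidate` is a hypothetical name used only for illustration, and `std::nullopt` stands in for `return failure();`.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical helper: returns the single forwarding candidate, if any.
template <typename T>
std::optional<T> takeUniqueCandidate(const std::vector<T> &candidates) {
  if (candidates.empty())
    return std::nullopt; // Plays the role of `return failure();`.
  // The filtering before this point is expected to leave at most one entry.
  assert(candidates.size() == 1 && "expected at most one forwarding candidate");
  return candidates.front();
}
```

This avoids the loop entirely: the empty case is handled first, and the assertion documents the invariant in one place.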

// Perform the actual store to load forwarding. // Perform the actual store to load forwarding.

Value storeVal = Value storeVal =

cast<AffineWriteOpInterface>(lastWriteStoreOp).getValueToStore(); cast<AffineWriteOpInterface>(lastWriteStoreOp).getValueToStore();

// Check if 2 values have the same shape. This is needed for affine vector // Check if 2 values have the same shape. This is needed for affine vector

// loads and stores. // loads and stores.

if (storeVal.getType() != loadOp.getValue().getType()) if (storeVal.getType() != loadOp.getValue().getType())

return failure(); return failure();

loadOp.getValue().replaceAllUsesWith(storeVal); loadOp.getValue().replaceAllUsesWith(storeVal);

// Record the memref for a later sweep to optimize away. // Record the memref for a later sweep to optimize away.

memrefsToErase.insert(loadOp.getMemRef()); memrefsToErase.insert(loadOp.getMemRef());

// Record this to erase later. // Record this to erase later.

loadOpsToErase.push_back(loadOp); loadOpsToErase.push_back(loadOp);

return success(); return success();

} }

ftynse (Done): Spurious whitespace changes.

// The load to load forwarding / redundant load elimination is similar to the // The load to load forwarding / redundant load elimination is similar to the

// store to load forwarding. // store to load forwarding.

// loadA will be be replaced with loadB if: // loadA will be be replaced with loadB if:

// 1) loadA and loadB have mathematically equivalent affine access functions. // 1) loadA and loadB have mathematically equivalent affine access functions.

// 2) loadB dominates loadA. // 2) loadB dominates loadA.

// 3) loadB postdominates all the store op's that have a dependence into loadA. // 3) There is no write between loadA and loadB

void AffineScalarReplacement::loadCSE(AffineReadOpInterface loadOp) { void AffineScalarReplacement::loadCSE(

mehdi_amini (Not Done): After the call to `checkOperation` is another place to check for `hasSideEffect` and early return to limit the traversal.

// The list of load op candidates for forwarding that satisfy conditions AffineReadOpInterface loadA, SmallVectorImpl<Operation *> &loadOpsToErase,

// (1) and (2) above - they will be filtered later when checking (3). DominanceInfo *domInfo) {

SmallVector<Operation *, 8> fwdingCandidates; SmallVector<AffineReadOpInterface, 4> loadOptions;

ftynse (Done): Please fix the linter error.

bondhugula (Done): This isn't properly named - `loadCandidates`?

SmallVector<Operation *, 8> storeOps; for (auto *user : loadA.getMemRef().getUsers()) {

unsigned minSurroundingLoops = getNestingDepth(loadOp); auto loadB = dyn_cast<AffineReadOpInterface>(user);

MemRefAccess memRefAccess(loadOp); if (!loadB || loadB == loadA)

// First pass over the use list to get 1) the minimum number of surrounding continue;

// loops common between the load op and an load op candidate, with min taken

// across all load op candidates; 2) load op candidates; 3) store ops. MemRefAccess srcAccess(loadB);

// We take min across all load op candidates instead of all load ops to make MemRefAccess destAccess(loadA);

// sure later dependence check is performed at loop depths that do matter.

for (auto *user : loadOp.getMemRef().getUsers()) { if (srcAccess != destAccess) {

ftynse (Done), suggested edit:

MemRefAccess destAccess(loadA);

- // 1. The accesses have to be to the same location

+ // 1. The accesses have to be to the same location.

if (srcAccess != destAccess) {

if (auto storeOp = dyn_cast<AffineWriteOpInterface>(user)) { continue;

ftynse (Done): Nit: trailing dot plz.

storeOps.push_back(storeOp);

} else if (auto aLoadOp = dyn_cast<AffineReadOpInterface>(user)) {

MemRefAccess otherMemRefAccess(aLoadOp);

// No need to consider Load ops that have been replaced in previous store

// to load forwarding or loadCSE. If loadA or storeA can be forwarded to

// loadB, then loadA or storeA can be forwarded to loadC iff loadB can be

// forwarded to loadC.

// If loadB is visited before loadC and replace with loadA, we do not put

// loadB in candidates list, only loadA. If loadC is visited before loadB,

// loadC may be replaced with loadB, which will be replaced with loadA

// later.

if (aLoadOp != loadOp && !llvm::is_contained(loadOpsToErase, aLoadOp) &&

memRefAccess == otherMemRefAccess &&

domInfo->dominates(aLoadOp, loadOp)) {

fwdingCandidates.push_back(aLoadOp);

unsigned nsLoops = getNumCommonSurroundingLoops(*loadOp, *aLoadOp);

minSurroundingLoops = std::min(nsLoops, minSurroundingLoops);

}

} }

// No forwarding candidate. // 2. The store has to dominate the load op to be candidate.

if (fwdingCandidates.empty()) if (!domInfo->dominates(loadB, loadA))

return; continue;

// Store ops that have a dependence into the load. if (!hasNoInterveningEffect<MemoryEffects::Write>(loadB.getOperation(),

SmallVector<Operation *, 8> depSrcStores; loadA))

continue;

ftynse (Done), suggested edit:

continue;

+ // 3. There is no write between loadA and loadB.

if (!hasNoInterveningEffect<MemoryEffects::Write>(loadB.getOperation(),

loadA))

for (auto *storeOp : storeOps) { // Check if 2 values have the same shape. This is needed for affine vector

if (!storeMayReachLoad(storeOp, loadOp, minSurroundingLoops)) // loads.

if (loadB.getValue().getType() != loadA.getValue().getType())

bondhugula (Done): 2 values -> two values

continue; continue;

// Stores that *may* be reaching the load. loadOptions.push_back(loadB);

depSrcStores.push_back(storeOp);

} }

// 3. Of all the load op's that meet the above criteria, return the first load // Of the legal load candidates, use the one that dominates all others

// found that postdominates all 'depSrcStores' and has the same shape as the // to minimize the subsequent need to loadCSE

// load to be replaced (if one exists). The shape check is needed for affine Value loadB = nullptr;

bondhugula (Done): You don't need to initialize this to `nullptr` - it has default null init.

// vector loads. for (auto option : loadOptions) {

bondhugula (Done): Avoid `auto` here.

Operation *firstLoadOp = nullptr; if (llvm::all_of(loadOptions, [&](AffineReadOpInterface depStore) {

ayzhuang (Not Done): Rename `depStore` to `load`.

Value oldVal = loadOp.getValue(); return depStore == option ||

for (auto *loadOp : fwdingCandidates) { domInfo->dominates(option.getOperation(),

if (llvm::all_of(depSrcStores, depStore.getOperation());

[&](Operation *depStore) { })) {

return postDomInfo->postDominates(loadOp, depStore); loadB = option.getValue();

}) &&

cast<AffineReadOpInterface>(loadOp).getValue().getType() ==

oldVal.getType()) {

firstLoadOp = loadOp;

break; break;

} }

if (!firstLoadOp)

return;

// Perform the actual load to load forwarding. if (loadB) {

Value loadVal = cast<AffineReadOpInterface>(firstLoadOp).getValue(); loadA.getValue().replaceAllUsesWith(loadB);

loadOp.getValue().replaceAllUsesWith(loadVal);

// Record this to erase later. // Record this to erase later.

loadOpsToErase.push_back(loadOp); loadOpsToErase.push_back(loadA);

}

} }
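The new `loadCSE` picks, among the legal load candidates, the one that dominates all others. A minimal sketch of that selection follows, assuming dominance is given by a hypothetical immediate-dominator array over integer node ids rather than MLIR's `DominanceInfo`; `dominates` and `pickDominatingCandidate` are illustrative names only.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

// idom[i] is the immediate dominator of node i; the root has idom[root] == root.
// Returns true if node `a` dominates node `b` (every node dominates itself).
bool dominates(const std::vector<int> &idom, int a, int b) {
  while (true) {
    if (a == b)
      return true;
    if (idom[b] == b)
      return false; // Reached the root without meeting `a`.
    b = idom[b];
  }
}

// Returns the candidate that dominates all other candidates, if one exists.
std::optional<int> pickDominatingCandidate(const std::vector<int> &idom,
                                           const std::vector<int> &candidates) {
  for (int candidate : candidates) {
    bool dominatesAll =
        std::all_of(candidates.begin(), candidates.end(),
                    [&](int other) { return dominates(idom, candidate, other); });
    if (dominatesAll)
      return candidate;
  }
  return std::nullopt;
}
```

Because dominance is only a partial order, two candidates on different branches may leave no single winner; the sketch then returns no candidate, just as `loadB` stays null in the code above.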

void AffineScalarReplacement::runOnFunction() { void AffineScalarReplacement::runOnFunction() {

// Only supports single block functions at the moment. // Only supports single block functions at the moment.

ayzhuang (Not Done): The `hasSingleElement` check is removed. Have you tested multi-block functions? If there is no problem, could you remove the above comment and add a unit test for a multi-block function?

FuncOp f = getFunction(); FuncOp f = getFunction();

if (!llvm::hasSingleElement(f)) {

markAllAnalysesPreserved();

return;

}

domInfo = &getAnalysis<DominanceInfo>(); // Load op's whose results were replaced by those forwarded from stores.

postDomInfo = &getAnalysis<PostDominanceInfo>(); SmallVector<Operation *, 8> opsToErase;

loadOpsToErase.clear(); // A list of memref's that are potentially dead / could be eliminated.

memrefsToErase.clear(); SmallPtrSet<Value, 4> memrefsToErase;

auto *domInfo = &getAnalysis<DominanceInfo>();

auto *postDominanceInfo = &getAnalysis<PostDominanceInfo>();

bondhugula (Done): This is not even used any more anywhere in the code.

// Walk all load's and perform store to load forwarding and loadCSE. // Walk all load's and perform store to load forwarding.

ayzhuang (Not Done): Why remove `loadCSE` from this and other comments?

f.walk([&](AffineReadOpInterface loadOp) { f.walk([&](AffineReadOpInterface loadOp) {

// Do store to load forwarding first, if no success, try loadCSE. if (failed(forwardStoreToLoad(loadOp, opsToErase, memrefsToErase, domInfo,

if (failed(forwardStoreToLoad(loadOp))) postDominanceInfo))) {

loadCSE(loadOp); loadCSE(loadOp, opsToErase, domInfo);

}

});

// Erase all load op's whose results were replaced with store fwd'ed ones.

for (auto *op : opsToErase)

op->erase();

opsToErase.clear();

f.walk([&](AffineWriteOpInterface loadOp) {

removeUnusedStore(loadOp, opsToErase, memrefsToErase, domInfo,

postDominanceInfo);

}); });

// Erase all load op's whose results were replaced with store or load fwd'ed // Erase all store op's which are unnecessary.

// ones. for (auto *op : opsToErase)

for (auto *loadOp : loadOpsToErase) op->erase();

loadOp->erase(); opsToErase.clear();

bondhugula (Done): This is a local variable and isn't used any more - no need to clear.

// Check if the store fwd'ed memrefs are now left with only stores and can // Check if the store fwd'ed memrefs are now left with only stores and can

// thus be completely deleted. Note: the canonicalize pass should be able // thus be completely deleted. Note: the canonicalize pass should be able

// to do this as well, but we'll do it here since we collected these anyway. // to do this as well, but we'll do it here since we collected these anyway.

for (auto memref : memrefsToErase) { for (auto memref : memrefsToErase) {

// If the memref hasn't been alloc'ed in this function, skip. // If the memref hasn't been alloc'ed in this function, skip.

Operation *defOp = memref.getDefiningOp(); Operation *defOp = memref.getDefiningOp();

if (!defOp || !isa<memref::AllocOp>(defOp)) if (!defOp || !isa<memref::AllocOp>(defOp))

[14 unchanged lines hidden]

mlir/test/Dialect/Affine/scalrep.mlir

[229 unchanged lines hidden]
affine.for %i1 = 0 to %N {
%v1 = addf %v0, %v0 : f32		%v1 = addf %v0, %v0 : f32
%idx = affine.apply affine_map<(d0) -> (d0 + 1)> (%i0)		%idx = affine.apply affine_map<(d0) -> (d0 + 1)> (%i0)
affine.store %cf9, %m[%idx] : memref<10xf32>		affine.store %cf9, %m[%idx] : memref<10xf32>
}		}
}		}
// Due to this load, the memref isn't optimized away.		// Due to this load, the memref isn't optimized away.
%v3 = affine.load %m[%c1] : memref<10xf32>		%v3 = affine.load %m[%c1] : memref<10xf32>
return %v3 : f32		return %v3 : f32
// CHECK: %{{.*}} = memref.alloc() : memref<10xf32>		// This test is currently disabled as the affine store to i0+1 is seen as
// CHECK-NEXT: affine.for %{{.*}} = 0 to 10 {		// having a side effect that potentially conflicts with the load of i0.
// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>		// More fine grained analysis of the side effecting behavior (and dependence
// CHECK-NEXT: affine.for %{{.}} = 0 to %{{.}} {		// structure) is necessary for this to succeed.
bondhugula (Done): Dependence analysis is already being used by the pass and was previously able to detect that this store did not reach the load above. Please see the comment above - it's saying exactly that: "Although there is a dependence from the second store to the load, it is satisfied by the outer surrounding loop, and does not prevent the first store to be forwarded to the load." This patch is regressing on this functionality and undoing that inference. We shouldn't be disabling this test.

wsmoses (Author, Done): I've demonstrated above where such dependence checking code could be inserted. Unfortunately, I do not understand the existing dependence loop structure sufficiently to rewrite it to be correct and apply here, when taking into account that other operators could modify the memory. If you could explain that, or take a stab, feel free. Alternatively, if you're okay with it, since this PR fixes existing correctness bugs that I've seen in practice, perhaps we could first fix correctness, then restore that optimization?

bondhugula (Done):

> Unfortunately, I do not understand the existing dependence loop structure sufficiently

Can you explain which part isn't clear? I can help if the code comments or doc aren't clear.

> Alternatively, if you're okay with it, since this PR fixes existing correctness bugs that I've seen in practice, perhaps we could first fix correctness, then restore that

Actually, we shouldn't be creating such a regression - this is really not a corner case or a special case that this is regressing on but a pretty important pattern and the very reason dependence analysis is being used by this pass! It is common for affine passes to be missing handling of such side-effecting operations that need to be fixed at various places (e.g. we always assume that %memref_1 and %memref_2 are different even if they are block arguments). Let's fix this while not regressing.

For `hasNoInterveningEffect`, can you document what "between" two ops means? It's obvious when the two ops lie in the same block but not otherwise. Also, the class comment at the top has still not been updated to reflect this check.

wsmoses (Author, Done): Understood. I've spent some time recently going through and trying to understand the existing analysis, and have added what I think is a reasonable solution that maintains the aforementioned case, and general correctness. For ease, I've also added some more depth to that part of the code (and perhaps can add a figure if thought to be useful). In essence, it was unclear to me why the store dominating the load is a necessary precondition for the given loop depth range, which is why I was hesitant to add something like that before ensuring it would also apply here. Happily it does (with the aforementioned changes and a comment describing why it should be correct). It may even be able to be slightly more aggressive than the previous dependence analysis since we can set the min loop depth to that of the proposed replacement op rather than the min across all potential storing operations.
// CHECK-NEXT: %{{.}} = addf %{{.}}, %{{.*}} : f32		// TODO: %{{.*}} = memref.alloc() : memref<10xf32>
// CHECK-NEXT: %{{.}} = affine.apply [[$MAP4]](%{{.}})		// TODO-NEXT: affine.for %{{.*}} = 0 to 10 {
// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>		// TODO-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
// CHECK-NEXT: }		// TODO-NEXT: affine.for %{{.}} = 0 to %{{.}} {
// CHECK-NEXT: }		// TODO-NEXT: %{{.}} = addf %{{.}}, %{{.*}} : f32
// CHECK-NEXT: %{{.}} = affine.load %{{.}}[%{{.*}}] : memref<10xf32>		// TODO-NEXT: %{{.}} = affine.apply [[$MAP4]](%{{.}})
// CHECK-NEXT: return %{{.*}} : f32		// TODO-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
		// TODO-NEXT: }
		// TODO-NEXT: }
		// TODO-NEXT: %{{.}} = affine.load %{{.}}[%{{.*}}] : memref<10xf32>
		// TODO-NEXT: return %{{.*}} : f32
}		}
ftynse (Done): Please explain what needs to be done here.

ftynse (Done): Can this be fixed by recognizing specifically the ops with `AffineRead/WriteOpInterface` before `MemoryEffectOpInterface`, and keeping track of the affine subscripts?

// CHECK-LABEL: func @should_not_fwd		// CHECK-LABEL: func @should_not_fwd
func @should_not_fwd(%A: memref<100xf32>, %M : index, %N : index) -> f32 {		func @should_not_fwd(%A: memref<100xf32>, %M : index, %N : index) -> f32 {
%cf = constant 0.0 : f32		%cf = constant 0.0 : f32
affine.store %cf, %A[%M] : memref<100xf32>		affine.store %cf, %A[%M] : memref<100xf32>
// CHECK: affine.load %{{.}}[%{{.}}]		// CHECK: affine.load %{{.}}[%{{.}}]
%v = affine.load %A[%N] : memref<100xf32>		%v = affine.load %A[%N] : memref<100xf32>
return %v : f32		return %v : f32
[252 unchanged lines hidden]
affine.for %i = 0 to 16 {
// CHECK: affine.vector_load		// CHECK: affine.vector_load
%ld1 = affine.vector_load %in[32*%i] : memref<512xf32>, vector<32xf32>		%ld1 = affine.vector_load %in[32*%i] : memref<512xf32>, vector<32xf32>
%add = addf %ld0, %ld1 : vector<32xf32>		%add = addf %ld0, %ld1 : vector<32xf32>
affine.vector_store %ld1, %out[32*%i] : memref<512xf32>, vector<32xf32>		affine.vector_store %ld1, %out[32*%i] : memref<512xf32>, vector<32xf32>
}		}
return		return
}		}

// CHECK-LABEL: func @vector_load_affine_apply_store_load		// TODO-LABEL: func @vector_load_affine_apply_store_load
func @vector_load_affine_apply_store_load(%in : memref<512xf32>, %out : memref<512xf32>) {		func @vector_load_affine_apply_store_load(%in : memref<512xf32>, %out : memref<512xf32>) {
%cf1 = constant 1: index		%cf1 = constant 1: index
affine.for %i = 0 to 15 {		affine.for %i = 0 to 15 {
// CHECK: affine.vector_load		// TODO: affine.vector_load
%ld0 = affine.vector_load %in[32*%i] : memref<512xf32>, vector<32xf32>		%ld0 = affine.vector_load %in[32*%i] : memref<512xf32>, vector<32xf32>
%idx = affine.apply affine_map<(d0) -> (d0 + 1)> (%i)		%idx = affine.apply affine_map<(d0) -> (d0 + 1)> (%i)
affine.vector_store %ld0, %in[32*%idx] : memref<512xf32>, vector<32xf32>		affine.vector_store %ld0, %in[32*%idx] : memref<512xf32>, vector<32xf32>
// CHECK-NOT: affine.vector_load		// TODO-NOT: affine.vector_load
%ld1 = affine.vector_load %in[32*%i] : memref<512xf32>, vector<32xf32>		%ld1 = affine.vector_load %in[32*%i] : memref<512xf32>, vector<32xf32>
%add = addf %ld0, %ld1 : vector<32xf32>		%add = addf %ld0, %ld1 : vector<32xf32>
affine.vector_store %ld1, %out[32*%i] : memref<512xf32>, vector<32xf32>		affine.vector_store %ld1, %out[32*%i] : memref<512xf32>, vector<32xf32>
}		}
return		return
}		}

		// CHECK-LABEL: func @external_no_forward_load
		// CHECK: affine.load
		// CHECK: affine.store
		// CHECK: affine.load
		// CHECK: affine.store

		func @external_no_forward_load(%in : memref<512xf32>, %out : memref<512xf32>) {
		affine.for %i = 0 to 16 {
		%ld0 = affine.load %in[32*%i] : memref<512xf32>
		affine.store %ld0, %out[32*%i] : memref<512xf32>
		"memop"(%in, %out) : (memref<512xf32>, memref<512xf32>) -> ()
		%ld1 = affine.load %in[32*%i] : memref<512xf32>
		affine.store %ld1, %out[32*%i] : memref<512xf32>
		}
		return
		}

		// CHECK-LABEL: func @external_no_forward_store
		// CHECK: affine.store
		// CHECK: affine.load
		// CHECK: affine.store

		func @external_no_forward_store(%in : memref<512xf32>, %out : memref<512xf32>) {
		%cf1 = constant 1.0 : f32
		affine.for %i = 0 to 16 {
		affine.store %cf1, %in[32*%i] : memref<512xf32>
		"memop"(%in, %out) : (memref<512xf32>, memref<512xf32>) -> ()
		%ld1 = affine.load %in[32*%i] : memref<512xf32>
		affine.store %ld1, %out[32*%i] : memref<512xf32>
		}
		return
		}

		// CHECK-LABEL: func @external_no_forward_cst
		// CHECK: affine.store
		// CHECK-NEXT: affine.store
		// CHECK-NEXT: affine.load
		// CHECK-NEXT: affine.store

		func @external_no_forward_cst(%in : memref<512xf32>, %out : memref<512xf32>) {
ftynse (Not Done): Nit: there's no "external" operation in this test AFAICS
		%cf1 = constant 1.0 : f32
		%cf2 = constant 2.0 : f32
		%m2 = memref.cast %in : memref<512xf32> to memref<?xf32>
		affine.for %i = 0 to 16 {
		affine.store %cf1, %in[32*%i] : memref<512xf32>
		affine.store %cf2, %m2[32*%i] : memref<?xf32>
		%ld1 = affine.load %in[32*%i] : memref<512xf32>
		affine.store %ld1, %out[32*%i] : memref<512xf32>
		}
		return
		}

		// Although there is a dependence from the second store to the load, it is
		// satisfied by the outer surrounding loop, and does not prevent the first
		// store to be forwarded to the load.
ayzhuang (Not Done): Comment not correct, please modify.
		func @overlap_no_fwd(%N : index) -> f32 {
		%cf7 = constant 7.0 : f32
		%cf9 = constant 9.0 : f32
		%c0 = constant 0 : index
		%c1 = constant 1 : index
		%m = memref.alloc() : memref<10xf32>
		affine.for %i0 = 0 to 5 {
		affine.store %cf7, %m[2 * %i0] : memref<10xf32>
		affine.for %i1 = 0 to %N {
		%v0 = affine.load %m[2 * %i0] : memref<10xf32>
		%v1 = addf %v0, %v0 : f32
		affine.store %cf9, %m[%i0 + 1] : memref<10xf32>
		}
		}
		// Due to this load, the memref isn't optimized away.
		%v3 = affine.load %m[%c1] : memref<10xf32>
		return %v3 : f32

		// CHECK-LABEL: func @overlap_no_fwd
		// CHECK: affine.for %{{.*}} = 0 to 5 {
		// CHECK-NEXT: affine.store %{{.*}}, %{{.*}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: affine.for %{{.*}} = 0 to %{{.*}} {
		// CHECK-NEXT: %{{.*}} = affine.load
		// CHECK-NEXT: %{{.*}} = addf %{{.*}}, %{{.*}} : f32
		// CHECK-NEXT: affine.store %{{.*}}, %{{.*}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: }
		// CHECK-NEXT: }
		// CHECK-NEXT: %{{.*}} = affine.load %{{.*}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: return %{{.*}} : f32
		}
		ftynse: Please put the check blocks consistently below or consistently after the function they should match. (Look what the rest of the file does). It's also possible to use the `// -----` marker to separate test cases.


		// CHECK-LABEL: func @redundant_store_elim
		// CHECK: affine.for
		// CHECK-NEXT: affine.store
		// CHECK-NEXT: }

		func @redundant_store_elim(%out : memref<512xf32>) {
		%cf1 = constant 1.0 : f32
		%cf2 = constant 2.0 : f32
		affine.for %i = 0 to 16 {
		affine.store %cf1, %out[32*%i] : memref<512xf32>
		affine.store %cf2, %out[32*%i] : memref<512xf32>
		}
		return
		}


		// CHECK-LABEL: func @redundant_store_elim_fail
		// CHECK: affine.for
		// CHECK-NEXT: affine.store
		// CHECK-NEXT: "test.use"
		// CHECK-NEXT: affine.store
		// CHECK-NEXT: }

		func @redundant_store_elim_fail(%out : memref<512xf32>) {
		%cf1 = constant 1.0 : f32
		%cf2 = constant 2.0 : f32
		affine.for %i = 0 to 16 {
		affine.store %cf1, %out[32*%i] : memref<512xf32>
		"test.use"(%out) : (memref<512xf32>) -> ()
		affine.store %cf2, %out[32*%i] : memref<512xf32>
		}
		return
		}

		ftynse: Please add a newline.

Revision Contents

Diff 353443

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp

mlir/test/Dialect/Affine/scalrep.mlir