This is an archive of the discontinued LLVM Phabricator instance.

Differential D104053

[MLIR] Correct memrefdataflow behavior in the presence of cast and other operations
ClosedPublic

Authored by wsmoses on Jun 10 2021, 12:15 PM.

Details

Reviewers

ftynse
nicolasvasilache
mehdi_amini
chelini
kumasento
vinayaka-polymage
ayzhuang
dcaballe

Commits

rG44826ecd929b: [MLIR] Correct memrefdataflow behavior in the presence of cast and other…

Summary

MemRefDataFlow performs mem2reg style operations for affine load/stores. Unfortunately, it is not presently correct in the presence of external operations such as memref.cast, or function calls. This diff extends the functionality of the pass to remain correct in the presence of such ops.
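
To make the failure mode in the summary concrete, here is a minimal C++ sketch of store-to-load forwarding over a toy straight-line IR. It is not MLIR code: `Op`, `Kind`, and `forwardedValue` are hypothetical names invented for this illustration, and an `Opaque` op stands in for anything whose memory behavior is unknown (for example a function call). The sketch assumes that distinct buffer names never alias and that element indices are constants, and it gives up whenever an opaque op sits between the store and the load, which is the conservative behavior the summary asks for.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy straight-line IR for illustration only; these are not MLIR types.
enum class Kind { Store, Load, Opaque };

struct Op {
  Kind kind;
  std::string buffer; // Buffer accessed by a Store/Load; unused for Opaque.
  int index;          // Element index accessed by a Store/Load.
  int value;          // Value written by a Store.
};

// Returns the value that may be forwarded to the load at `loadPos`, or
// std::nullopt when forwarding cannot be shown to be safe.
std::optional<int> forwardedValue(const std::vector<Op> &ops,
                                  std::size_t loadPos) {
  const Op &load = ops[loadPos];
  // Walk backwards from the load towards the start of the op list.
  for (std::size_t i = loadPos; i-- > 0;) {
    const Op &op = ops[i];
    // An op with unknown effects may have overwritten the element: give up.
    if (op.kind == Kind::Opaque)
      return std::nullopt;
    // The closest earlier store to the same element provides the value.
    if (op.kind == Kind::Store && op.buffer == load.buffer &&
        op.index == load.index)
      return op.value;
  }
  return std::nullopt;
}
```

A forwarding pass that skipped the `Opaque` check would still return the stored value in the second test case below, which is the kind of miscompile this revision targets.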

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

wsmoses created this revision. Jun 10 2021, 12:15 PM

Herald added subscribers: dcaballe, cota, teijeong and 15 others. Jun 10 2021, 12:15 PM

wsmoses requested review of this revision. Jun 10 2021, 12:15 PM

Herald added a project: Restricted Project. Jun 10 2021, 12:16 PM

Herald added a subscriber: stephenneuendorffer.

Harbormaster completed remote builds in B108666: Diff 351234. Jun 10 2021, 12:44 PM

@wsmoses MemRefDataFlow has just been moved and been renamed. Could you rebase right away? (so that the rebase becomes easy with no further intervening commits)

bondhugula added a reviewer: vinayaka-polymage. Jun 14 2021, 7:03 AM

bondhugula added a reviewer: ayzhuang.

Rebase after memrefdataflow opt move

Harbormaster completed remote builds in B109153: Diff 351943. Jun 14 2021, 12:39 PM

ftynse added inline comments. Jun 21 2021, 4:11 AM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
68–71	Could we have some doc comments for these functions?
108–112	Nit: add braces for these
120–122	Nit: expand auto here plz
163
166	Nit: ++iter
173	Expanding auto will remove this linter error.
209	Nit: here and below, please add trailing dots to all sentences.
334	Please fix the linter error.
mlir/test/Dialect/Affine/scalrep.mlir
248–249	Please explain what needs to be done here.
645	Please add a newline.

Fix formatting

Harbormaster completed remote builds in B110258: Diff 353443. Jun 21 2021, 12:47 PM

bondhugula requested changes to this revision. Jun 21 2021, 6:27 PM

bondhugula added inline comments.

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
68–71	Since these aren't publicly exposed, I think the convention is to have these comments on the definition instead.
93–94	Reflow - use width.
95	`auto` should be fine here.
105–106	Reflow.
367	You don't need to initialize this to `nullptr` - it has default null init.

bondhugula added inline comments. Jun 21 2021, 6:27 PM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
83	From the comment itself, it's not clear what it means for an operation to have a memory effect on `memOp`! An op has a memory effect or not through a `Value` in this case - it isn't meaningful to say an op has a memory effect on another op. Can you rephrase?
91	Can you move this declare/init further below? Also, `legal` -> `hasEffect`?
161	Terminate with period.
199–203	This is covered neither by the commit title nor commit summary and is a new feature/extension completely separate from the fix for side-effecting/non-affine dereferencing ops being present. Can you please move this to another revision?

This revision now requires changes to proceed. Jun 21 2021, 6:27 PM

Address changes and refactor store forwarding

wsmoses marked 12 inline comments as done. Jun 22 2021, 9:40 AM

wsmoses added inline comments.

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
95	I don't believe it is. Since check is called recursively, the actual type is needed.

wsmoses added inline comments. Jun 22 2021, 9:41 AM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
91	Changed name, however, this can't be moved any further down since it is referenced within the check function.

Harbormaster completed remote builds in B110432: Diff 353678. Jun 22 2021, 10:42 AM

ftynse added inline comments. Jun 23 2021, 1:32 AM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
95	Indeed. And you need an std::function, function_ref won't work correctly.
mlir/test/Dialect/Affine/scalrep.mlir
248–249	Can this be fixed by recognizing specifically the ops with AffineRead/WriteOpInterface before MemoryEffectOpInterface, and keeping track of the affine subscripts?
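
On the `auto` versus explicit-type point raised for line 95 above: a lambda that calls itself by name needs a declared type, because a variable declared with `auto` cannot be used inside its own initializer. The sketch below shows the pattern with `std::function` on a toy tree. `Node` and `sumTree` are hypothetical names used only for illustration and are not code from the patch. A non-owning wrapper initialized directly from the lambda expression would refer to a temporary closure object, which is presumably the concern with `function_ref` mentioned above.

```cpp
#include <functional>
#include <vector>

// Toy tree used only for this illustration.
struct Node {
  int value;
  std::vector<Node> children;
};

// Sums all values in the tree with a self-recursive lambda.
int sumTree(const Node &root) {
  int total = 0;
  // The explicit std::function type lets the lambda body refer to `visit`.
  // Writing `auto visit = [&](const Node &n) { ... visit(child); ... };`
  // does not compile, since the type of `visit` is not yet deduced there.
  std::function<void(const Node &)> visit = [&](const Node &n) {
    total += n.value;
    for (const Node &child : n.children)
      visit(child);
  };
  visit(root);
  return total;
}
```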

This patch is regressing functionality on store_load_store_nest_forward. Dependence analysis was already used by this pass and could previously detect that a store could be forwarded to the load. Rather than disabling that test, this patch should be revised so that it does not undo that inference.

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
71–72	Pass dom infos by reference since they can't be null at this stage.
76	Likewise.
82–125	Ensure that all ... do not have ... Ensure that no operation between ... has the ... It's not clear what "potential memory effect" is. Rephrase. Nit: Enclose any arg names referred to in backticks (eg. memOp, EffectType).
83	It still isn't clear here what "effect on an op" means.
84	is defined to be an operation -> is an operation ?
88	originalMemref -> memref ? It's not clear what "original" here means.
92	Document this please. Should this be hasSideEffect? hasEffect is generic.
95	Please rename this lambda - `check` is too general.
120	may have a side effect? Nit: Please terminate all comments with a period - here and everywhere else.
135	Please rename this lambda.
141	to -> `endOp` or `untilOp`?
146	Assertion message please.
150	You don't need the `.getOperation()` I think.
161	Name isn't descriptive enough: `todoBlocks`?
258–259	Update doc comment to cover the additional `..ToErase` args.
306–307	"2." is gone and all numbers are off by one. You'll also have to update the top-level class comment on `AffineScalarReplacement`.
306–307	This isn't meaningful as a loop - this is only equivalent to: assert(fwdingCandidates.size() <= 1 && "...");
307–308	Move this up and if (....empty()) return failure(); ... assertion ... lastWriteStoreOp = fwdingCandidates.front();
334	This isn't properly named - `loadCandidates`?
357–359	2 values -> two values
368	Avoid `auto` here.
409	This is a local variable and isn't used any more - no need to clear.
mlir/test/Dialect/Affine/scalrep.mlir
250–251	Dependence analysis is already being used by the pass and was previously able to detect that this store did not reach the load above. Please see the comment above - it says exactly that: "Although there is a dependence from the second store to the load, it is satisfied by the outer surrounding loop, and does not prevent the first store to be forwarded to the load." This patch is regressing on this functionality and undoing that inference. We shouldn't be disabling this test.

This revision now requires changes to proceed. Jun 26 2021, 12:48 AM

Address comments

Additional comments

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
174	This is a sketch of how the additional dependence analysis can be restored, while maintaining correctness with other ops. This snippet here is not sufficient, see my comment in the test.
mlir/test/Dialect/Affine/scalrep.mlir
250–251	I've demonstrated above where such dependence checking code could be inserted. Unfortunately, I do not understand the existing dependence loop structure sufficiently to rewrite it to be correct and apply here, when taking into account that other operators could modify the memory. If you could explain that, or want to take a stab at it, feel free. Alternatively, if you're okay with it, since this PR fixes existing correctness bugs that I've seen in practice, perhaps we could first fix correctness, then restore that optimization?

Harbormaster completed remote builds in B111198: Diff 354764. Jun 27 2021, 1:01 PM

bondhugula requested changes to this revision. Jun 27 2021, 5:15 PM

bondhugula added inline comments.

mlir/test/Dialect/Affine/scalrep.mlir
250–251	> Unfortunately, I do not understand the existing dependence loop structure sufficiently
Can you explain which part isn't clear? I can help if the code comments or doc aren't clear.
> Alternatively, if you're okay with it, since this PR fixes existing correctness bugs that I've seen in practice, perhaps we could first fix correctness, then restore that
Actually, we shouldn't be creating such a regression. This is not a corner case or a special case that this is regressing on, but a pretty important pattern and the very reason dependence analysis is being used by this pass! It is common for affine passes to be missing handling of such side-effecting operations, and that needs to be fixed at various places (e.g. we always assume that %memref_1 and %memref_2 are different even if they are block arguments). Let's fix this while not regressing.
For `hasNoIntervening` effect, can you document what "between" two ops means? It's obvious when the two ops lie in the same block but not otherwise. Also, the class comment on the top has still not been updated to reflect this check.

This revision now requires changes to proceed. Jun 27 2021, 5:15 PM

bondhugula added inline comments. Jun 27 2021, 5:53 PM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
155	Address clang-tidy here please.
179–189	Reflow - please use the entire width for comments here and everywhere else - it'll lead to fewer lines in general.

bondhugula added inline comments. Jun 27 2021, 6:21 PM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
72	This is unused.
173–174	an -> and
174	It's not clear to me what "check the entire parent op" means here. Rephrase?
197	Please put `from` and `to` in backticks or else it messes up what's being conveyed.
302–304	If we only expect to find the first op and are asserting below when there is more than one, we should be doing it right here instead of finding more than two ops. We are also completely missing comments here on what the approach is. To start with, `postDominanceInfo` is now no longer being used, yet it's being computed and passed to this function. All of the related comments are outdated and incorrect and still appear to be present. I don't think I really understand the approach at all. Can you please properly document things and summarize the approach?
306–307	This whole comment is outdated and incorrect now. I think several of the code comments in the class comment are also similarly outdated.
307	.size() == 0 -> .empty()
397	This is not even used any more anywhere in the code.

bondhugula added inline comments. Jun 27 2021, 6:36 PM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
306–307	You may also want to rebase on master - post dominance info isn't used anymore and this check was changed to use dominance info.

I'm going to be mostly away from reviewing the next 1.5 weeks - @ayzhuang, @dcaballe - could you please take over reviewing this? I've pretty much posted all comments I had so far.

This revision now requires review to proceed. Jun 27 2021, 6:42 PM

bondhugula added inline comments. Jun 27 2021, 7:16 PM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
258–278	Instead of dropping the reliance on dominance and the condition (3) that existed to find the "last write" to fwd, what could instead be done to be unified with the existing approach is to simply also consider here all ops that have a memory write side-effect in addition to just affine writes for `storeOps` above. Then ensure that those ops get into `depSrcStores` as well (effectively treating such non-affine ops conservatively). Find the last write op in the same way as before and this naturally/transparently takes care of it. If the unique last write isn't an AffineWriteOpInterface, no forwarding will happen. This will also be as powerful as the previous approach and will not cause any regression. It's also no more expensive than the existing approach and you really don't need to check for any paths separately. (You are just reusing dominance info.)

Restore dependence analysis and address comments

wsmoses added inline comments. Jun 27 2021, 7:53 PM

mlir/test/Dialect/Affine/scalrep.mlir
250–251	Understood. I've spent some time recently going through and trying to understand the existing analysis, and have added what I think is a reasonable solution that maintains the aforementioned case and general correctness. For ease, I've also added some more depth to that part of the code (and perhaps can add a figure if thought to be useful). In essence, it was unclear to me why the store dominating the load was a necessary precondition for the given loop depth range, which is why I was hesitant to add something like that before ensuring it would also apply here. Happily it does (with the aforementioned changes and a comment describing why it should be correct). It may even be slightly more aggressive than the previous dependence analysis, since we can set the min loop depth to that of the proposed replacement op rather than the min across all potential storing operations.

Harbormaster completed remote builds in B111216: Diff 354783. Jun 27 2021, 8:16 PM

> I'm going to be mostly away from reviewing the next 1.5 weeks - @ayzhuang, @dcaballe - could you please take over reviewing this? I've pretty much posted all comments I had so far.

Diego is also unavailable AFAIK.

The fixed version without regression looks okay to me. The overall recursive algorithm is to look if there is a non-affine operation that may have side effects preventing the store-to-load forwarding. It starts from the store and eagerly looks into all control-flow successors (operations and blocks) until the load is reached or all control flow paths are visited. This is overly conservative because it is visiting operations that do not lie on any control flow path from the store to the load. Ideally, we would have some reachability analysis and only visit the operations that are (1) reachable from the store and (2) such that the load is reachable from them.
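
The reachability refinement suggested above can be sketched on a toy control-flow graph. This is only an illustration under assumptions: the graph is a plain adjacency list of integer node ids rather than MLIR operations and blocks, and `Graph`, `reachableFrom`, and `nodesBetween` are hypothetical names, not functions from the patch. The idea is to keep only nodes that are (1) reachable from the store and (2) able to reach the load, by intersecting a forward search with a backward search.

```cpp
#include <cstddef>
#include <vector>

// Successor lists indexed by node id (toy stand-in for a CFG).
using Graph = std::vector<std::vector<int>>;

// Marks every node reachable from `start` by following `edges`.
std::vector<bool> reachableFrom(const Graph &edges, int start) {
  std::vector<bool> seen(edges.size(), false);
  std::vector<int> worklist{start};
  seen[static_cast<std::size_t>(start)] = true;
  while (!worklist.empty()) {
    int node = worklist.back();
    worklist.pop_back();
    for (int next : edges[static_cast<std::size_t>(node)]) {
      if (!seen[static_cast<std::size_t>(next)]) {
        seen[static_cast<std::size_t>(next)] = true;
        worklist.push_back(next);
      }
    }
  }
  return seen;
}

// Returns the nodes, other than `store` and `load`, that are reachable from
// `store` and from which `load` is reachable.
std::vector<int> nodesBetween(const Graph &succ, int store, int load) {
  // Build predecessor lists so the backward search can reuse reachableFrom.
  Graph pred(succ.size());
  for (std::size_t from = 0; from < succ.size(); ++from)
    for (int to : succ[from])
      pred[static_cast<std::size_t>(to)].push_back(static_cast<int>(from));

  std::vector<bool> afterStore = reachableFrom(succ, store);
  std::vector<bool> beforeLoad = reachableFrom(pred, load);

  std::vector<int> result;
  for (std::size_t n = 0; n < succ.size(); ++n) {
    int id = static_cast<int>(n);
    if (afterStore[n] && beforeLoad[n] && id != store && id != load)
      result.push_back(id);
  }
  return result;
}
```

Only the returned nodes would need to be checked for side effects; nodes on branches that never lead to the load are skipped.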

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
84	Nit: `///`
263–264	Please reflow and use `///` for top-level comments.
313–325	Spurious whitespace changes.
330–345	Nit: trailing dot plz.
343
352–355
mlir/test/Dialect/Affine/scalrep.mlir
597	Nit: there's no "external" operation in this test AFAICS
631–642	Please put the check blocks consistently before or consistently after the function they should match. (Look at what the rest of the file does.) It's also possible to use the `// -----` marker to separate test cases.

This revision is now accepted and ready to land. Jun 28 2021, 3:13 AM

Post rebase / fixing nits

Harbormaster completed remote builds in B111297: Diff 354907. Jun 28 2021, 9:05 AM

Closed by commit rG44826ecd929b: [MLIR] Correct memrefdataflow behavior in the presence of cast and other… (authored by wsmoses). Jun 28 2021, 9:23 AM

This revision was automatically updated to reflect the committed changes.

wsmoses added a commit: rG44826ecd929b: [MLIR] Correct memrefdataflow behavior in the presence of cast and other….

mehdi_amini added inline comments. Jun 28 2021, 6:13 PM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
119	How does this work in presence of things like subview? Or if two different function parameters are the same memref (maybe that's forbidden by the affine dialect I don't remember)?
179	Seems like you can early return as soon as `hasSideEffect` is true here instead of continuing to traverse the IR?
203	Likely another place where you can early return when hasSideEffect is true.
248	After the call to checkOperation is another place to check for hasSideEffect and early return to limit the traversal.
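
The early-return suggestions above amount to short-circuiting the traversal once an effect has been found. A minimal sketch of that idea on a toy nested-op tree follows. `ToyOp` and `anyEffect` are hypothetical names for illustration only, and the `visited` counter exists just to make the saved work observable.

```cpp
#include <vector>

// Toy stand-in for an operation with nested operations.
struct ToyOp {
  bool hasEffect;
  std::vector<ToyOp> nested;
};

// Returns true as soon as an op with the effect is found. `visited` counts
// how many ops were inspected before the answer was known.
bool anyEffect(const ToyOp &op, int &visited) {
  ++visited;
  if (op.hasEffect)
    return true;
  for (const ToyOp &child : op.nested) {
    // Stop at the first hit instead of walking the remaining siblings.
    if (anyEffect(child, visited))
      return true;
  }
  return false;
}
```

Returning a `bool` from the recursive helper, rather than setting a captured flag and continuing, is one way to make the early exit propagate through every level of the recursion.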

ayzhuang added inline comments. Jun 29 2021, 10:30 AM

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp
369	rename depStore to load
387	hasSingleElement check is removed. Have you tested multi-block functions? If there is no problem, could you remove the above comment and add a unit test of multi-block function?
398	Why remove loadCSE from this and other comments?
mlir/test/Dialect/Affine/scalrep.mlir
612	Comment not correct, please modify.

Revision Contents

Path	Size
mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp	452 lines
mlir/test/Dialect/Affine/scalrep.mlir	88 lines

Diff 354929

mlir/lib/Dialect/Affine/Transforms/AffineScalarReplacement.cpp

[27 lines not shown]

  #define DEBUG_TYPE "memref-dataflow-opt"
  using namespace mlir;
  namespace {
  // The store to load forwarding and load CSE rely on three conditions:
  //
- // 1) store/load and load need to have mathematically equivalent affine access
- // functions (checked after full composition of load/store operands); this
- // implies that they access the same single memref element for all iterations of
- // the common surrounding loop,
+ // 1) store/load providing a replacement value and load being replaced need to
+ // have mathematically equivalent affine access functions (checked after full
+ // composition of load/store operands); this implies that they access the same
+ // single memref element for all iterations of the common surrounding loop,
  //
  // 2) the store/load op should dominate the load op,
  //
- // 3) among all op's that satisfy both (1) and (2), for store to load
- // forwarding, the one that does not dominate any store op that has a
- // dependence into the load, is provably the last writer to the particular
- // memref location being loaded at the load op, and its store value can be
- // forwarded to the load; for load CSE, any op that does not dominate any store
- // op that have a dependence into the load can be forwarded and the first one
- // found is chosen. Note that the only dependences that are to be considered are
- // those that are satisfied at the block* of the innermost common surrounding
- // loop of the <store/load, load> being considered.
- // (* A dependence being satisfied at a block: a dependence that is satisfied by
- // virtue of the destination operation appearing textually / lexically after
- // the source operation within the body of a 'affine.for' operation; thus, a
- // dependence is always either satisfied by a loop or by a block).
+ // 3) no operation that may write to memory read by the load being replaced can
+ // occur after executing the instruction (load or store) providing the
+ // replacement value and before the load being replaced (thus potentially
+ // allowing overwriting the memory read by the load).
  //
  // The above conditions are simple to check, sufficient, and powerful for most
  // cases in practice - they are sufficient, but not necessary --- since they
  // don't reason about loops that are guaranteed to execute at least once or
  // multiple sources to forward from.
  //
  // TODO: more forwarding can be done when support for
  // loop/conditional live-out SSA values is available.
  // TODO: do general dead store elimination for memref's. This pass
  // currently only eliminates the stores only if no other loads/uses (other
  // than dealloc) remain.
  //
  struct AffineScalarReplacement
      : public AffineScalarReplacementBase<AffineScalarReplacement> {
    void runOnFunction() override;
-   LogicalResult forwardStoreToLoad(AffineReadOpInterface loadOp);
-   void loadCSE(AffineReadOpInterface loadOp);
-   // A list of memref's that are potentially dead / could be eliminated.
-   SmallPtrSet<Value, 4> memrefsToErase;
-   // Load ops whose results were replaced by those forwarded from stores
-   // dominating stores or loads..
-   SmallVector<Operation *, 8> loadOpsToErase;
-   DominanceInfo *domInfo = nullptr;
+   LogicalResult forwardStoreToLoad(AffineReadOpInterface loadOp,
+                                    SmallVectorImpl<Operation *> &loadOpsToErase,
+                                    SmallPtrSetImpl<Value> &memrefsToErase,
+                                    DominanceInfo &domInfo);
+   void loadCSE(AffineReadOpInterface loadOp,
+                SmallVectorImpl<Operation *> &loadOpsToErase,
+                DominanceInfo &domInfo);
  };

ftynse (Done): Could we have some doc comments for these functions?

bondhugula (Done): Since these aren't publicly exposed, I think the convention is to have these comments on the definition instead.

bondhugula (Done): Pass dom infos by reference since they can't be null at this stage.

bondhugula (Done): This is unused.

  } // end anonymous namespace
  /// Creates a pass to perform optimizations relying on memref dataflow such as
  /// store to load forwarding, elimination of dead stores, and dead allocs.

bondhugula (Done): Likewise.

  std::unique_ptr<OperationPass<FuncOp>>
  mlir::createAffineScalarReplacementPass() {
    return std::make_unique<AffineScalarReplacement>();
  }

- // Check if the store may be reaching the load.
- static bool storeMayReachLoad(Operation *storeOp, Operation *loadOp,
+ /// Ensure that all operations that could be executed after `start`
+ /// (noninclusive) and prior to `memOp` (e.g. on a control flow/op path

bondhugula (Done): From the comment itself, it's not clear what it means for an operation to have a memory effect on `memOp`! An op has a memory effect or not through a `Value` in this case - it isn't meaningful to say an op has a memory effect on another op. Can you rephrase?

bondhugula (Done): It still isn't clear here what "effect on an op" means.

-                               unsigned minSurroundingLoops) {
+ /// between the operations) do not have the potential memory effect

bondhugula (Done): is defined to be an operation -> is an operation ?

ftynse (Done): Nit: `///`

-   MemRefAccess srcAccess(storeOp);
-   MemRefAccess destAccess(loadOp);
+ /// `EffectType` on `memOp`. `memOp` is an operation that reads or writes to
+ /// a memref. For example, if `EffectType` is MemoryEffects::Write, this method
+ /// will check if there is no write to the memory between `start` and `memOp`
+ /// that would change the read within `memOp`.

bondhugula (Done): originalMemref -> memref ? It's not clear what "original" here means.

+ template <typename EffectType, typename T>
+ bool hasNoInterveningEffect(Operation *start, T memOp) {

bondhugula (Done): Can you move this declare/init further below? Also, `legal` -> `hasEffect`?

wsmoses (Author, Done): Changed name, however, this can't be moved any further down since it is referenced within the check function.

+   Value memref = memOp.getMemRef();

bondhugula (Done): Document this please. Should this be hasSideEffect? hasEffect is generic.

+   bool isOriginalAllocation = memref.getDefiningOp<memref::AllocaOp>() ||
+                               memref.getDefiningOp<memref::AllocOp>();

bondhugula (Done): Reflow - use width.

bondhugula (Done): `auto` should be fine here.

wsmoses (Author, Done): I don't believe it is. Since check is called recursively, the actual type is needed.

ftynse (Done): Indeed. And you need an std::function, function_ref won't work correctly.

bondhugula (Done): Please rename this lambda - `check` is too general.

+   // A boolean representing whether an intervening operation could have impacted
+   // memOp.
+   bool hasSideEffect = false;
+   // Check whether the effect on memOp can be caused by a given operation op.
+   std::function<void(Operation *)> checkOperation = [&](Operation *op) {
+     // If the effect has alreay been found, early exit,
+     if (hasSideEffect)
+       return;
+     if (auto memEffect = dyn_cast<MemoryEffectOpInterface>(op)) {

bondhugula (Done): Reflow.

+       SmallVector<MemoryEffects::EffectInstance, 1> effects;
+       memEffect.getEffects(effects);
+       bool opMayHaveEffect = false;
+       for (auto effect : effects) {
+         // If op causes EffectType on a potentially aliasing location for

ftynse (Done): Nit: add braces for these

+         // memOp, mark as having the effect.
+         if (isa<EffectType>(effect.getEffect())) {
+           if (isOriginalAllocation && effect.getValue() &&
+               (effect.getValue().getDefiningOp<memref::AllocaOp>() ||
+                effect.getValue().getDefiningOp<memref::AllocOp>())) {
+             if (effect.getValue() != memref)
+               continue;

mehdi_amini (Not Done): How does this work in presence of things like subview? Or if two different function parameters are the same memref (maybe that's forbidden by the affine dialect I don't remember)?

+           }

bondhugula (Done): may have a side effect? Nit: Please terminate all comments with a period - here and everywhere else.

+           opMayHaveEffect = true;
+           break;

ftynse (Done): Nit: expand auto here plz

+         }

bondhugula (Done): Ensure that all ... do not have ... Ensure that no operation between ... has the ... It's not clear what "potential memory effect" is. Rephrase. Nit: Enclose any arg names referred to in backticks (eg. memOp, EffectType).

+       if (!opMayHaveEffect)
+         return;
+       // If the side effect comes from an affine read or write, try to
+       // prove the side effecting `op` cannot reach `memOp`.
+       if (isa<AffineReadOpInterface, AffineWriteOpInterface>(op)) {
+         MemRefAccess srcAccess(op);
+         MemRefAccess destAccess(memOp);
+         // Dependence analysis is only correct if both ops operate on the same
+         // memref.

bondhugula (Done): Please rename this lambda.

+         if (srcAccess.memref == destAccess.memref) {
            FlatAffineConstraints dependenceConstraints;
-           unsigned nsLoops = getNumCommonSurroundingLoops(*loadOp, *storeOp);
+           // Number of loops containing the start op and the ending operation.
+           unsigned minSurroundingLoops =
+               getNumCommonSurroundingLoops(*start, *memOp);

bondhugula (Done): to -> `endOp` or `untilOp`?

+           // Number of loops containing the operation `op` which has the
+           // potential memory side effect and can occur on a path between
+           // `start` and `memOp`.
+           unsigned nsLoops = getNumCommonSurroundingLoops(*op, *memOp);

bondhugula (Done): Assertion message please.

+           // For ease, let's consider the case that `op` is a store and we're
+           // looking for other potential stores (e.g `op`) that overwrite memory
+           // after `start`, and before being read in `memOp`. In this case, we

bondhugula (Done): You don't need the `.getOperation()` I think.

+           // only need to consider other potential stores with depth >
+           // minSurrounding loops since `start` would overwrite any store with a
+           // smaller number of surrounding loops before.
            unsigned d;
-           // Dependences at loop depth <= minSurroundingLoops do NOT matter.
            for (d = nsLoops + 1; d > minSurroundingLoops; d--) {

bondhugula (Done): Address clang-tidy here please.

              DependenceResult result = checkMemrefAccessDependence(
                  srcAccess, destAccess, d, &dependenceConstraints,
                  /*dependenceComponents=*/nullptr);
-             if (hasDependence(result))
-               break;
+             if (hasDependence(result)) {
+               hasSideEffect = true;
+               return;

bondhugula (Done): Terminate with period.

bondhugula (Done): Name isn't descriptive enough: `todoBlocks`?

+             }
            }

ftynse (Done):
  SmallVector<Block *, 2> todo;
  {
-   // First consier the parent block of `from` an check all operations
+   // First consider the parent block of `from` an check all operations
    // after `from`.

-           if (d <= minSurroundingLoops)
-             return false;
-           return true;
+           // No side effect was seen, simply return.
+           return;

ftynse (Done): Nit: ++iter

+         }
+       hasSideEffect = true;
+       return;
      }
- // This is a straightforward implementation not optimized for speed. Optimize
+     if (op->hasTrait<OpTrait::HasRecursiveSideEffects>()) {

ftynse (Done): Expanding auto will remove this linter error.

- // if needed.
+       // Recurse into the regions for this op and check whether the internal

wsmoses (Author, Done): This is a sketch of how the additional dependence analysis can be restored, while maintaining correctness with other ops. This snippet here is not sufficient, see my comment in the test.

bondhugula (Done): an -> and

bondhugula (Done): It's not clear to me what "check the entire parent op" means here. Rephrase?

- LogicalResult
- AffineScalarReplacement::forwardStoreToLoad(AffineReadOpInterface loadOp) {
-   // First pass over the use list to get the minimum number of surrounding
-   // loops common between the load op and the store op, with min taken across
-   // all store ops.
+       // operations may have the side effect `EffectType` on memOp.
+       for (Region &region : op->getRegions())
+         for (Block &block : region)
+           for (Operation &op : block)
+             checkOperation(&op);

mehdi_amini (Unsubmitted, Not Done):

Seems like you can early return as soon as `hasSideEffect` is true here instead of continuing to traverse the IR?

SmallVector<Operation *, 8> storeOps; return;

unsigned minSurroundingLoops = getNestingDepth(loadOp);

for (auto *user : loadOp.getMemRef().getUsers()) {

auto storeOp = dyn_cast<AffineWriteOpInterface>(user);

if (!storeOp)

continue;

unsigned nsLoops = getNumCommonSurroundingLoops(*loadOp, *storeOp);

minSurroundingLoops = std::min(nsLoops, minSurroundingLoops);

storeOps.push_back(storeOp);

} }

// The list of store op candidates for forwarding that satisfy conditions // Otherwise, conservatively assume generic operations have the effect

// (1) and (2) above - they will be filtered later when checking (3). // on the operation

SmallVector<Operation *, 8> fwdingCandidates; hasSideEffect = true;

return;

};

// Check all paths from ancestor op `parent` to the operation `to` for the

bondhugula (Unsubmitted, Done):

Reflow - please use the entire width for comments here and everywhere else - it'll lead to fewer lines in general.

// effect. It is known that `to` must be contained within `parent`.

auto until = [&](Operation *parent, Operation *to) {

// TODO check only the paths from `parent` to `to`.

// Currently we fallback and check the entire parent op, rather than

// just the paths from the parent path, stopping after reaching `to`.

// This is conservatively correct, but could be made more aggressive.

assert(parent->isAncestor(to));

checkOperation(parent);

bondhugula (Unsubmitted, Done):

Please put `from` and `to` in backticks or else it messes up what's being conveyed.

};

// Check for all paths from operation `from` to operation `untilOp` for the

// given memory effect.

std::function<void(Operation *, Operation *)> recur =

[&](Operation *from, Operation *untilOp) {

bondhugula (Unsubmitted, Done):

This is covered neither by the commit title nor commit summary and is a new feature/extension completely separate from the fix for side-effecting/non-affine dereferencing ops being present. Can you please move this to another revision?

mehdi_amini (Unsubmitted, Not Done):

Likely another place where you can early return when `hasSideEffect` is true.

assert(

from->getParentRegion()->isAncestor(untilOp->getParentRegion()) &&

"Checking for side effect between two operations without a common "

"ancestor");

// If the operations are in different regions, recursively consider all

ftynse (Unsubmitted, Done):

Nit: here and below, please add trailing dots to all sentences.

// path from `from` to the parent of `to` and all paths from the parent

// of `to` to `to`.

if (from->getParentRegion() != untilOp->getParentRegion()) {

recur(from, untilOp->getParentOp());

until(untilOp->getParentOp(), untilOp);

return;

}

// Store ops that have a dependence into the load (even if they aren't // Now, assuming that `from` and `to` exist in the same region, perform

// forwarding candidates). Each forwarding candidate will be checked for a // a CFG traversal to check all the relevant operations.

// dominance on these. 'fwdingCandidates' are a subset of depSrcStores.

SmallVector<Operation *, 8> depSrcStores;

for (auto *storeOp : storeOps) { // Additional blocks to consider.

if (!storeMayReachLoad(storeOp, loadOp, minSurroundingLoops)) SmallVector<Block *, 2> todoBlocks;

{

// First consider the parent block of `from` and check all operations

// after `from`.

for (auto iter = ++from->getIterator(), end = from->getBlock()->end();

iter != end && &*iter != untilOp; ++iter) {

checkOperation(&*iter);

}

// If the parent of `from` doesn't contain `to`, add the successors

// to the list of blocks to check.

if (untilOp->getBlock() != from->getBlock())

for (Block *succ : from->getBlock()->getSuccessors())

todoBlocks.push_back(succ);

}

SmallPtrSet<Block *, 4> done;

// Traverse the CFG until hitting `to`.

while (todoBlocks.size()) {

Block *blk = todoBlocks.pop_back_val();

if (done.count(blk))

continue; continue;

done.insert(blk);

for (auto &op : *blk) {

if (&op == untilOp)

break;

checkOperation(&op);

mehdi_amini (Unsubmitted, Not Done):

After the call to `checkOperation` is another place to check for `hasSideEffect` and early return to limit the traversal.

if (&op == blk->getTerminator())

for (Block *succ : blk->getSuccessors())

todoBlocks.push_back(succ);

}

};

recur(start, memOp);

return !hasSideEffect;

}
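The traversal above, together with the reviewers' request to stop as soon as a side effect is found, can be sketched outside MLIR. The following is a minimal self-contained model, not the patch's code: `ToyOp`, `ToyBlock`, and `noInterveningWrite` are hypothetical stand-ins for `Operation`, `Block`, and `hasNoInterveningEffect<MemoryEffects::Write>`. Each toy op only records whether it may write, and region nesting is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for mlir::Operation: only records whether the op may
// write to the memory of interest.
struct ToyOp {
  bool mayWrite;
};

// Hypothetical stand-in for mlir::Block: a list of ops plus CFG successors.
struct ToyBlock {
  std::vector<ToyOp> ops;
  std::vector<ToyBlock *> successors;
};

// Returns true if no op that may write lies on any CFG path starting right
// after ops[fromIdx] of `fromBlock` and ending right before ops[toIdx] of
// `toBlock`. Returns false at the first write found (early exit).
bool noInterveningWrite(const ToyBlock *fromBlock, std::size_t fromIdx,
                        const ToyBlock *toBlock, std::size_t toIdx) {
  assert(fromIdx < fromBlock->ops.size() && toIdx < toBlock->ops.size());
  const bool sameBlock = fromBlock == toBlock;
  // Simplifying assumption of this sketch: within one block, `from` precedes
  // `to`.
  assert(!sameBlock || fromIdx < toIdx);

  // First check the remainder of the block containing `from`.
  const std::size_t end = sameBlock ? toIdx : fromBlock->ops.size();
  for (std::size_t i = fromIdx + 1; i < end; ++i)
    if (fromBlock->ops[i].mayWrite)
      return false;
  if (sameBlock)
    return true;

  // Then walk the CFG with a worklist until reaching `to`.
  std::vector<const ToyBlock *> todoBlocks(fromBlock->successors.begin(),
                                           fromBlock->successors.end());
  std::unordered_set<const ToyBlock *> done;
  while (!todoBlocks.empty()) {
    const ToyBlock *blk = todoBlocks.back();
    todoBlocks.pop_back();
    if (!done.insert(blk).second)
      continue; // Already visited.
    const std::size_t limit = blk == toBlock ? toIdx : blk->ops.size();
    for (std::size_t i = 0; i < limit; ++i)
      if (blk->ops[i].mayWrite)
        return false;
    // Paths stop at `to`; every other block continues to its successors.
    if (blk != toBlock)
      todoBlocks.insert(todoBlocks.end(), blk->successors.begin(),
                        blk->successors.end());
  }
  return true;
}
```

Returning directly from the scan loops avoids the shared `hasSideEffect` flag, so no further blocks are visited once a write is seen. Whether that restructuring fits the actual lambda-based code is a judgment for the patch author.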

// Stores that *may* be reaching the load. /// Attempt to eliminate loadOp by replacing it with a value stored into memory

bondhugula (Unsubmitted, Done):

Update doc comment to cover the additional `..ToErase` args.

depSrcStores.push_back(storeOp); /// which the load is guaranteed to retrieve. This check involves three

/// components: 1) The store and load must be on the same location 2) The store

/// must dominate (and therefore must always occur prior to) the load 3) No

/// other operations will overwrite the memory loaded between the given load

/// and store. If such a value exists, the replaced `loadOp` will be added to

ftynse (Unsubmitted, Done):

Please reflow and use `///` for top-level comments.

/// `loadOpsToErase` and its memref will be added to `memrefsToErase`.

LogicalResult AffineScalarReplacement::forwardStoreToLoad(

AffineReadOpInterface loadOp, SmallVectorImpl<Operation *> &loadOpsToErase,

SmallPtrSetImpl<Value> &memrefsToErase, DominanceInfo &domInfo) {

// The store op candidate for forwarding that satisfies all conditions

// to replace the load, if any.

Operation *lastWriteStoreOp = nullptr;

for (auto *user : loadOp.getMemRef().getUsers()) {

auto storeOp = dyn_cast<AffineWriteOpInterface>(user);

if (!storeOp)

continue;

MemRefAccess srcAccess(storeOp);

bondhugula (Unsubmitted, Not Done):

Instead of dropping the reliance on dominance and the condition (3) that existed to find the "last write" to fwd, what could instead be done to be unified with the existing approach is to simply also consider here all ops that have a memory write side-effect *in addition to* just affine writes for storeOps above. Then ensure that those ops get into depSrcStores as well (effectively treating such non-affine ops conservatively). Find the last write op in the same way as before and this naturally/transparently takes care of it. If the unique last write isn't an AffineWriteOpInterface, no forwarding will happen. This will also be as powerful as the previous approach and will not cause any regression. It's also no more expensive than the existing approach and you really don't need to check for any paths separately. (You are just reusing dominance info.)

MemRefAccess destAccess(loadOp);

// 1. Check if the store and the load have mathematically equivalent // 1. Check if the store and the load have mathematically equivalent

// affine access functions; this implies that they statically refer to the // affine access functions; this implies that they statically refer to the

// same single memref element. As an example this filters out cases like: // same single memref element. As an example this filters out cases like:

// store %A[%i0 + 1] // store %A[%i0 + 1]

// load %A[%i0] // load %A[%i0]

// store %A[%M] // store %A[%M]

// load %A[%N] // load %A[%N]

// Use the AffineValueMap difference based memref access equality checking. // Use the AffineValueMap difference based memref access equality checking.

MemRefAccess srcAccess(storeOp);

MemRefAccess destAccess(loadOp);

if (srcAccess != destAccess) if (srcAccess != destAccess)

continue; continue;

// 2. The store has to dominate the load op to be candidate. // 2. The store has to dominate the load op to be candidate.

if (!domInfo->dominates(storeOp, loadOp)) if (!domInfo.dominates(storeOp, loadOp))

continue; continue;

// We now have a candidate for forwarding. // 3. Ensure there is no intermediate operation which could replace the

fwdingCandidates.push_back(storeOp); // value in memory.

} if (!hasNoInterveningEffect<MemoryEffects::Write>(storeOp, loadOp))

continue;

// 3. Of all the store ops that meet the above criteria, the store op // We now have a candidate for forwarding.

// that does not dominate any of the ops in 'depSrcStores' (if such exists) assert(lastWriteStoreOp == nullptr &&

// will not have any of those latter ops on its paths to `loadOp`. It would "multiple simultaneous replacement stores");

// thus be the unique store providing the value to the load. This condition is

// however conservative for eg:

// for ... {

// store

// load

// store

// load

// }

Operation *lastWriteStoreOp = nullptr;

for (auto *storeOp : fwdingCandidates) {

if (llvm::all_of(depSrcStores, [&](Operation *depStore) {

return !domInfo->properlyDominates(storeOp, depStore);

})) {

lastWriteStoreOp = storeOp; lastWriteStoreOp = storeOp;

bondhugula (Unsubmitted, Done):

If we only expect to find the first op and are asserting below when there is more than one, we should be doing it right here instead of finding more than two ops.

We are also completely missing comments here on what the approach is. To start with, postDominanceInfo is now no longer being used, yet it's being computed and passed to this function. All of the related comments are outdated and incorrect and still appear to be present. I don't think I really understand the approach at all. Can you please properly document things and summarize the approach?

break;

}

} }

if (!lastWriteStoreOp) if (!lastWriteStoreOp)

bondhugula (Unsubmitted, Done):

"2." is gone and all numbers are off by one.

You'll also have to update the top-level class comment on AffineScalarReplacement.

bondhugula (Unsubmitted, Done):

This isn't meaningful as a loop - this is only equivalent to:

assert(fwdingCandidates.size() <= 1 && "...");

bondhugula (Unsubmitted, Not Done):

.size() == 0 -> .empty()

bondhugula (Unsubmitted, Not Done):

This whole comment is outdated and incorrect now. I think several of the code comments, including the class comment, are similarly outdated.

bondhugula (Unsubmitted, Not Done):

You may also want to rebase on master - post dominance info isn't used anymore and this check was changed to use dominance info.

return failure(); return failure();

bondhugula (Unsubmitted, Done):

Move this up and

if (....empty())
  return failure();
... assertion ...
lastWriteStoreOp = fwdingCandidates.front();

// Perform the actual store to load forwarding. // Perform the actual store to load forwarding.

Value storeVal = Value storeVal =

cast<AffineWriteOpInterface>(lastWriteStoreOp).getValueToStore(); cast<AffineWriteOpInterface>(lastWriteStoreOp).getValueToStore();

// Check if 2 values have the same shape. This is needed for affine vector // Check if 2 values have the same shape. This is needed for affine vector

// loads and stores. // loads and stores.

if (storeVal.getType() != loadOp.getValue().getType()) if (storeVal.getType() != loadOp.getValue().getType())

return failure(); return failure();

loadOp.getValue().replaceAllUsesWith(storeVal); loadOp.getValue().replaceAllUsesWith(storeVal);

// Record the memref for a later sweep to optimize away. // Record the memref for a later sweep to optimize away.

memrefsToErase.insert(loadOp.getMemRef()); memrefsToErase.insert(loadOp.getMemRef());

// Record this to erase later. // Record this to erase later.

loadOpsToErase.push_back(loadOp); loadOpsToErase.push_back(loadOp);

return success(); return success();

} }
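The reviewer's alternative (treat every op with a write effect as a dependent store, then pick the "last write" through dominance) can be illustrated on a toy model. This is a hedged sketch, not code from either version of the pass: `ToyOp`, `Kind`, and `findForwardingStore` are hypothetical names, the program is a single straight-line block so that "A dominates B" reduces to "A comes before B", and addresses are plain integers rather than affine access functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Kind { AffineStore, AffineLoad, OpaqueWrite };

// Hypothetical stand-in for an operation in a single straight-line block.
struct ToyOp {
  Kind kind;
  int address; // Ignored for OpaqueWrite, which may write anywhere.
};

// Returns the index of the store whose value can be forwarded to the load at
// `loadIdx`, or -1 if there is none.
int findForwardingStore(const std::vector<ToyOp> &ops, std::size_t loadIdx) {
  assert(loadIdx < ops.size() && ops[loadIdx].kind == Kind::AffineLoad);
  std::vector<std::size_t> candidates; // Affine stores to the same address.
  std::vector<std::size_t> depWrites;  // Every write that may reach the load.
  for (std::size_t i = 0; i < loadIdx; ++i) {
    const ToyOp &op = ops[i];
    if (op.kind == Kind::OpaqueWrite) {
      // Conservatively assume an unknown write may hit the loaded address.
      depWrites.push_back(i);
    } else if (op.kind == Kind::AffineStore &&
               op.address == ops[loadIdx].address) {
      depWrites.push_back(i);
      candidates.push_back(i);
    }
  }
  // The last write is the candidate that does not precede (dominate) any
  // other write that may reach the load.
  for (std::size_t cand : candidates) {
    bool isLastWrite = true;
    for (std::size_t write : depWrites) {
      if (cand < write) {
        isLastWrite = false;
        break;
      }
    }
    if (isLastWrite)
      return static_cast<int>(cand);
  }
  return -1;
}
```

Because an opaque write is recorded in `depWrites` but never in `candidates`, a load that follows one is not forwarded, which is the conservative behavior the comment describes.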

// The load to load forwarding / redundant load elimination is similar to the // The load to load forwarding / redundant load elimination is similar to the

ftynse (Unsubmitted, Done):

Spurious whitespace changes.

// store to load forwarding. // store to load forwarding.

// loadA will be replaced with loadB if: // loadA will be replaced with loadB if:

// 1) loadA and loadB have mathematically equivalent affine access functions. // 1) loadA and loadB have mathematically equivalent affine access functions.

// 2) loadB dominates loadA. // 2) loadB dominates loadA.

// 3) loadB does not dominate any of the store ops that have a dependence into // 3) There is no write between loadA and loadB.

// loadA. void AffineScalarReplacement::loadCSE(

void AffineScalarReplacement::loadCSE(AffineReadOpInterface loadOp) { AffineReadOpInterface loadA, SmallVectorImpl<Operation *> &loadOpsToErase,

// The list of load op candidates for forwarding that satisfy conditions DominanceInfo &domInfo) {

// (1) and (2) above - they will be filtered later when checking (3). SmallVector<AffineReadOpInterface, 4> loadCandidates;

ftynse (Unsubmitted, Done):

Please fix the linter error.

bondhugula (Unsubmitted, Done):

This isn't properly named - `loadCandidates`?

SmallVector<Operation *, 8> fwdingCandidates; for (auto *user : loadA.getMemRef().getUsers()) {

SmallVector<Operation *, 8> storeOps; auto loadB = dyn_cast<AffineReadOpInterface>(user);

unsigned minSurroundingLoops = getNestingDepth(loadOp); if (!loadB || loadB == loadA)

MemRefAccess memRefAccess(loadOp); continue;

// First pass over the use list to get 1) the minimum number of surrounding

// loops common between the load op and an load op candidate, with min taken MemRefAccess srcAccess(loadB);

// across all load op candidates; 2) load op candidates; 3) store ops. MemRefAccess destAccess(loadA);

// We take min across all load op candidates instead of all load ops to make

// sure later dependence check is performed at loop depths that do matter. // 1. The accesses have to be to the same location.

ftynse (Unsubmitted, Done):

MemRefAccess destAccess(loadA);

- // 1. The accesses have to be to the same location

+ // 1. The accesses have to be to the same location.

if (srcAccess != destAccess) {

for (auto *user : loadOp.getMemRef().getUsers()) { if (srcAccess != destAccess) {

if (auto storeOp = dyn_cast<AffineWriteOpInterface>(user)) { continue;

ftynse (Unsubmitted, Done):

Nit: trailing dot plz.

storeOps.push_back(storeOp);

} else if (auto aLoadOp = dyn_cast<AffineReadOpInterface>(user)) {

MemRefAccess otherMemRefAccess(aLoadOp);

// No need to consider Load ops that have been replaced in previous store

// to load forwarding or loadCSE. If loadA or storeA can be forwarded to

// loadB, then loadA or storeA can be forwarded to loadC iff loadB can be

// forwarded to loadC.

// If loadB is visited before loadC and replace with loadA, we do not put

// loadB in candidates list, only loadA. If loadC is visited before loadB,

// loadC may be replaced with loadB, which will be replaced with loadA

// later.

if (aLoadOp != loadOp && !llvm::is_contained(loadOpsToErase, aLoadOp) &&

memRefAccess == otherMemRefAccess &&

domInfo->dominates(aLoadOp, loadOp)) {

fwdingCandidates.push_back(aLoadOp);

unsigned nsLoops = getNumCommonSurroundingLoops(*loadOp, *aLoadOp);

minSurroundingLoops = std::min(nsLoops, minSurroundingLoops);

}

} }

// No forwarding candidate. // 2. The store has to dominate the load op to be candidate.

if (fwdingCandidates.empty()) if (!domInfo.dominates(loadB, loadA))

return; continue;

// Store ops that have a dependence into the load. // 3. There is no write between loadA and loadB.

SmallVector<Operation *, 8> depSrcStores; if (!hasNoInterveningEffect<MemoryEffects::Write>(loadB.getOperation(),

loadA))

continue;

ftynse (Unsubmitted, Done):

continue;

+ // 3. There is no write between loadA and loadB.

if (!hasNoInterveningEffect<MemoryEffects::Write>(loadB.getOperation(),

loadA))

for (auto *storeOp : storeOps) { // Check if two values have the same shape. This is needed for affine vector

if (!storeMayReachLoad(storeOp, loadOp, minSurroundingLoops)) // loads.

if (loadB.getValue().getType() != loadA.getValue().getType())

bondhugula (Unsubmitted, Done):

2 values -> two values

continue; continue;

// Stores that *may* be reaching the load. loadCandidates.push_back(loadB);

depSrcStores.push_back(storeOp); }

}

// Of the legal load candidates, use the one that dominates all others

// 3. Of all the load op's that meet the above criteria, return the first load // to minimize the subsequent need to loadCSE

// found that does not dominate any op in 'depSrcStores' and has the same Value loadB;

bondhugula (Unsubmitted, Done):

You don't need to initialize this to `nullptr` - it has default null init.

// shape as the load to be replaced (if one exists). The shape check is needed for (AffineReadOpInterface option : loadCandidates) {

bondhugula (Unsubmitted, Done):

Avoid `auto` here.

// for affine vector loads. if (llvm::all_of(loadCandidates, [&](AffineReadOpInterface depStore) {

ayzhuang (Unsubmitted, Not Done):

rename depStore to load

Operation *firstLoadOp = nullptr; return depStore == option ||

Value oldVal = loadOp.getValue(); domInfo.dominates(option.getOperation(),

for (auto *loadOp : fwdingCandidates) { depStore.getOperation());

if (llvm::all_of(depSrcStores, })) {

[&](Operation *depStore) { loadB = option.getValue();

return !domInfo->properlyDominates(loadOp, depStore);

}) &&

cast<AffineReadOpInterface>(loadOp).getValue().getType() ==

oldVal.getType()) {

firstLoadOp = loadOp;

break; break;

} }

if (!firstLoadOp)

return;

// Perform the actual load to load forwarding. if (loadB) {

Value loadVal = cast<AffineReadOpInterface>(firstLoadOp).getValue(); loadA.getValue().replaceAllUsesWith(loadB);

loadOp.getValue().replaceAllUsesWith(loadVal);

// Record this to erase later. // Record this to erase later.

loadOpsToErase.push_back(loadOp); loadOpsToErase.push_back(loadA);

}

} }
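The final selection step in the new `loadCSE` ("use the one that dominates all others") can be sketched independently of MLIR. In this minimal model, which is an illustration rather than the pass's code, `dominates` and `pickDominatingCandidate` are hypothetical helpers, candidates are nodes of a dominator tree given as an immediate-dominator array, and `-1` plays the role of the null `loadB`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// idom[n] is the immediate dominator of node n; the root satisfies
// idom[root] == root. Returns true if `a` dominates `b` (every node
// dominates itself).
bool dominates(const std::vector<int> &idom, int a, int b) {
  assert(a >= 0 && static_cast<std::size_t>(a) < idom.size());
  assert(b >= 0 && static_cast<std::size_t>(b) < idom.size());
  while (true) {
    if (a == b)
      return true;
    if (idom[b] == b)
      return false; // Reached the root without meeting `a`.
    b = idom[b];
  }
}

// Returns the candidate that dominates every other candidate, or -1 if no
// such candidate exists (including when the list is empty).
int pickDominatingCandidate(const std::vector<int> &idom,
                            const std::vector<int> &candidates) {
  for (int option : candidates) {
    const bool dominatesAll =
        std::all_of(candidates.begin(), candidates.end(), [&](int other) {
          return other == option || dominates(idom, option, other);
        });
    if (dominatesAll)
      return option;
  }
  return -1;
}
```

Picking the dominating candidate means later loads can all be rewritten to the same earliest value, which is the stated motivation of minimizing subsequent CSE work.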

void AffineScalarReplacement::runOnFunction() { void AffineScalarReplacement::runOnFunction() {

// Only supports single block functions at the moment. // Only supports single block functions at the moment.

ayzhuang (Unsubmitted, Not Done):

The `hasSingleElement` check is removed. Have you tested multi-block functions? If there is no problem, could you remove the above comment and add a unit test for a multi-block function?

FuncOp f = getFunction(); FuncOp f = getFunction();

if (!llvm::hasSingleElement(f)) {

markAllAnalysesPreserved();

return;

}

domInfo = &getAnalysis<DominanceInfo>(); // Load op's whose results were replaced by those forwarded from stores.

SmallVector<Operation *, 8> opsToErase;

loadOpsToErase.clear(); // A list of memref's that are potentially dead / could be eliminated.

memrefsToErase.clear(); SmallPtrSet<Value, 4> memrefsToErase;

// Walk all load's and perform store to load forwarding and loadCSE. auto &domInfo = getAnalysis<DominanceInfo>();

bondhugula (Unsubmitted, Done):

This is not even used any more anywhere in the code.

// Walk all load's and perform store to load forwarding.

ayzhuang (Unsubmitted, Not Done):

Why remove loadCSE from this and other comments?

f.walk([&](AffineReadOpInterface loadOp) { f.walk([&](AffineReadOpInterface loadOp) {

// Do store to load forwarding first, if no success, try loadCSE. if (failed(

if (failed(forwardStoreToLoad(loadOp))) forwardStoreToLoad(loadOp, opsToErase, memrefsToErase, domInfo))) {

loadCSE(loadOp); loadCSE(loadOp, opsToErase, domInfo);

}

}); });

// Erase all load op's whose results were replaced with store or load fwd'ed // Erase all load op's whose results were replaced with store fwd'ed ones.

// ones. for (auto *op : opsToErase)

for (auto *loadOp : loadOpsToErase) op->erase();

loadOp->erase();

bondhugula (Unsubmitted, Done):

This is a local variable and isn't used any more - no need to clear.

// Check if the store fwd'ed memrefs are now left with only stores and can // Check if the store fwd'ed memrefs are now left with only stores and can

// thus be completely deleted. Note: the canonicalize pass should be able // thus be completely deleted. Note: the canonicalize pass should be able

// to do this as well, but we'll do it here since we collected these anyway. // to do this as well, but we'll do it here since we collected these anyway.

for (auto memref : memrefsToErase) { for (auto memref : memrefsToErase) {

// If the memref hasn't been alloc'ed in this function, skip. // If the memref hasn't been alloc'ed in this function, skip.

Operation *defOp = memref.getDefiningOp(); Operation *defOp = memref.getDefiningOp();

if (!defOp || !isa<memref::AllocOp>(defOp)) if (!defOp || !isa<memref::AllocOp>(defOp))

// TODO: if the memref was returned by a 'call' operation, we // TODO: if the memref was returned by a 'call' operation, we


mlir/test/Dialect/Affine/scalrep.mlir

// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
// CHECK-NEXT: affine.for %{{.}} = 0 to %{{.}} {		// CHECK-NEXT: affine.for %{{.}} = 0 to %{{.}} {
// CHECK-NEXT: %{{.}} = addf %{{.}}, %{{.*}} : f32		// CHECK-NEXT: %{{.}} = addf %{{.}}, %{{.*}} : f32
// CHECK-NEXT: %{{.}} = affine.apply [[$MAP4]](%{{.}})		// CHECK-NEXT: %{{.}} = affine.apply [[$MAP4]](%{{.}})
// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
// CHECK-NEXT: }		// CHECK-NEXT: }
// CHECK-NEXT: }		// CHECK-NEXT: }
// CHECK-NEXT: %{{.}} = affine.load %{{.}}[%{{.*}}] : memref<10xf32>		// CHECK-NEXT: %{{.}} = affine.load %{{.}}[%{{.*}}] : memref<10xf32>
// CHECK-NEXT: return %{{.*}} : f32		// CHECK-NEXT: return %{{.*}} : f32
}		}
		ftynse (Unsubmitted, Done): Please explain what needs to be done here.
		ftynse (Unsubmitted, Done): Can this be fixed by recognizing specifically the ops with AffineRead/WriteOpInterface before MemoryEffectOpInterface, and keeping track of the affine subscripts?

// CHECK-LABEL: func @should_not_fwd		// CHECK-LABEL: func @should_not_fwd
		bondhugula (Unsubmitted, Done): Dependence analysis is already being used by the pass and was able to earlier already detect that this store did not reach the load above. Please see comment above - it's exactly saying that: "Although there is a dependence from the second store to the load, it is satisfied by the outer surrounding loop, and does not prevent the first store to be forwarded to the load." This patch is regressing on this functionality and undoing that inference. We shouldn't be disabling this test.
		wsmoses (Author, Unsubmitted, Done): I've demonstrated above where such dependence checking code could be inserted. Unfortunately, I do not understand the existing dependence loop structure sufficiently to rewrite it to be correct and apply here, when taking into account that other operators could modify the memory. If you could explain that, or take a stab feel free. Alternatively, if you're okay with it, since this PR fixes existing correctness bugs that I've seen in practice, perhaps we could first fix correctness, then restore that optimization?
		bondhugula (Unsubmitted, Done):
		> Unfortunately, I do not understand the existing dependence loop structure sufficiently
		Can you explain which part isn't clear? I can help if the code comments or doc isn't clear.
		> Alternatively, if you're okay with it, since this PR fixes existing correctness bugs that I've seen in practice, perhaps we could first fix correctness, then restore that
		Actually, we shouldn't be creating such a regression - this is really not a corner case or a special case that this is regressing on but a pretty important pattern and the very reason dependence analysis is being used by this pass! It is common for affine passes to be missing handling of such side-effecting operations that need to be fixed at various places - (for eg. we always assume that %memref_1 and %memref_2 are different even if they are block arguments). Let's fix this while not regressing. For `hasNoInterveningEffect`, can you document what "between" two ops means? It's obvious when the two ops lie in the same block but not otherwise. Also, the class comment on the top has still not been updated to reflect this check.
		wsmoses (Author, Unsubmitted, Done): Understood. I've spent some time recently going through and trying to understand the existing analysis, and have added what I think is a reasonable solution that maintains the aforementioned case, and general correctness. For ease, I've also added some more depth to that part of the code (and perhaps can add a figure if thought to be useful). In essence the reliance of the store dominating the load as a necessary precondition for why the given loop depth range was unclear to me and why I was hesitant to adding something like that before ensuring it would also apply here. Happily it does (with the aforementioned changes and comment describing why it should be correct). It may even be able to be slightly more aggressive than the previous dependence analysis since we can set the min loop depth to that of the proposed replacement op rather than the min across all potential storing operations.
func @should_not_fwd(%A: memref<100xf32>, %M : index, %N : index) -> f32 {		func @should_not_fwd(%A: memref<100xf32>, %M : index, %N : index) -> f32 {
%cf = constant 0.0 : f32		%cf = constant 0.0 : f32
affine.store %cf, %A[%M] : memref<100xf32>		affine.store %cf, %A[%M] : memref<100xf32>
// CHECK: affine.load %{{.}}[%{{.}}]		// CHECK: affine.load %{{.}}[%{{.}}]
%v = affine.load %A[%N] : memref<100xf32>		%v = affine.load %A[%N] : memref<100xf32>
return %v : f32		return %v : f32
}		}
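Condition 1 of the pass (mathematically equivalent access functions) is what keeps `@should_not_fwd` from being forwarded: `%M` and `%N` are distinct values, so the two subscripts cannot be proven equal. A rough model of a difference-based equality check follows. `LinearAccess` and `sameElement` are hypothetical names, each subscript is reduced to one linear expression over named operands, and this is not the `MemRefAccess`/`AffineValueMap` implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of a one-dimensional affine subscript:
// sum(coeffs[v] * v) + constant, where each key names an SSA operand.
struct LinearAccess {
  std::map<std::string, long> coeffs;
  long constant = 0;
};

// Two accesses refer to the same element if their difference is identically
// zero, i.e. every operand coefficient and the constant term cancel.
bool sameElement(const LinearAccess &a, const LinearAccess &b) {
  std::map<std::string, long> diff = a.coeffs;
  for (const auto &term : b.coeffs)
    diff[term.first] -= term.second;
  for (const auto &term : diff)
    if (term.second != 0)
      return false;
  return a.constant == b.constant;
}
```

Under this model `%A[%M]` versus `%A[%N]` leaves nonzero coefficients on both `M` and `N`, and `%A[%i0 + 1]` versus `%A[%i0]` leaves a nonzero constant, matching the two examples in the pass's own comment for condition 1.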

▲ Show 20 Lines • Show All 289 Lines • ▼ Show 20 Lines	affine.for %i = 0 to 15 {
affine.vector_store %ld0, %in[32*%idx] : memref<512xf32>, vector<32xf32>		affine.vector_store %ld0, %in[32*%idx] : memref<512xf32>, vector<32xf32>
// CHECK-NOT: affine.vector_load		// CHECK-NOT: affine.vector_load
%ld1 = affine.vector_load %in[32*%i] : memref<512xf32>, vector<32xf32>		%ld1 = affine.vector_load %in[32*%i] : memref<512xf32>, vector<32xf32>
%add = addf %ld0, %ld1 : vector<32xf32>		%add = addf %ld0, %ld1 : vector<32xf32>
affine.vector_store %ld1, %out[32*%i] : memref<512xf32>, vector<32xf32>		affine.vector_store %ld1, %out[32*%i] : memref<512xf32>, vector<32xf32>
}		}
return		return
}		}

		// CHECK-LABEL: func @external_no_forward_load

		func @external_no_forward_load(%in : memref<512xf32>, %out : memref<512xf32>) {
		affine.for %i = 0 to 16 {
		%ld0 = affine.load %in[32*%i] : memref<512xf32>
		affine.store %ld0, %out[32*%i] : memref<512xf32>
		"memop"(%in, %out) : (memref<512xf32>, memref<512xf32>) -> ()
		%ld1 = affine.load %in[32*%i] : memref<512xf32>
		affine.store %ld1, %out[32*%i] : memref<512xf32>
		}
		return
		}
		// CHECK: affine.load
		// CHECK: affine.store
		// CHECK: affine.load
		// CHECK: affine.store

		// CHECK-LABEL: func @external_no_forward_store

		func @external_no_forward_store(%in : memref<512xf32>, %out : memref<512xf32>) {
		%cf1 = constant 1.0 : f32
		affine.for %i = 0 to 16 {
		affine.store %cf1, %in[32*%i] : memref<512xf32>
		"memop"(%in, %out) : (memref<512xf32>, memref<512xf32>) -> ()
		%ld1 = affine.load %in[32*%i] : memref<512xf32>
		affine.store %ld1, %out[32*%i] : memref<512xf32>
		}
		return
		}
		// CHECK: affine.store
		// CHECK: affine.load
		// CHECK: affine.store

		// CHECK-LABEL: func @no_forward_cast

		func @no_forward_cast(%in : memref<512xf32>, %out : memref<512xf32>) {
		%cf1 = constant 1.0 : f32
		%cf2 = constant 2.0 : f32
		%m2 = memref.cast %in : memref<512xf32> to memref<?xf32>
		affine.for %i = 0 to 16 {
		ftynse (Unsubmitted, Not Done): Nit: there's no "external" operation in this test AFAICS
		affine.store %cf1, %in[32*%i] : memref<512xf32>
		affine.store %cf2, %m2[32*%i] : memref<?xf32>
		%ld1 = affine.load %in[32*%i] : memref<512xf32>
		affine.store %ld1, %out[32*%i] : memref<512xf32>
		}
		return
		}
		// CHECK: affine.store
		// CHECK-NEXT: affine.store
		// CHECK-NEXT: affine.load
		// CHECK-NEXT: affine.store

		// Although there is a dependence from the second store to the load, it is
		// satisfied by the outer surrounding loop, and does not prevent the first
		// store to be forwarded to the load.
		ayzhuang (Unsubmitted, Not Done): Comment not correct, please modify.

		// CHECK-LABEL: func @overlap_no_fwd
		func @overlap_no_fwd(%N : index) -> f32 {
		%cf7 = constant 7.0 : f32
		%cf9 = constant 9.0 : f32
		%c0 = constant 0 : index
		%c1 = constant 1 : index
		%m = memref.alloc() : memref<10xf32>
		affine.for %i0 = 0 to 5 {
		affine.store %cf7, %m[2 * %i0] : memref<10xf32>
		affine.for %i1 = 0 to %N {
		%v0 = affine.load %m[2 * %i0] : memref<10xf32>
		%v1 = addf %v0, %v0 : f32
		affine.store %cf9, %m[%i0 + 1] : memref<10xf32>
		}
		}
		// Due to this load, the memref isn't optimized away.
		%v3 = affine.load %m[%c1] : memref<10xf32>
		return %v3 : f32

		// CHECK: affine.for %{{.*}} = 0 to 5 {
		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: affine.for %{{.}} = 0 to %{{.}} {
		// CHECK-NEXT: %{{.*}} = affine.load
		// CHECK-NEXT: %{{.}} = addf %{{.}}, %{{.*}} : f32
		// CHECK-NEXT: affine.store %{{.}}, %{{.}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: }
		// CHECK-NEXT: }
		// CHECK-NEXT: %{{.}} = affine.load %{{.}}[%{{.*}}] : memref<10xf32>
		// CHECK-NEXT: return %{{.*}} : f32
		ftynse (Unsubmitted, Not Done): Please put the check blocks consistently below or consistently after the function they should match. (Look what the rest of the file does). It's also possible to use the `// -----` marker to separate test cases.
		}

		ftynse (Unsubmitted, Done): Please add a newline.

Diff 354929