This is an archive of the discontinued LLVM Phabricator instance.

[MLIR][Affine] Replace AffineScope by its complement trait
Needs RevisionPublic

Authored by bondhugula on Oct 31 2021, 12:05 PM.

Details

Summary

Replace AffineScope trait by its complement trait. Introduce
ExtendsAffineScope trait that is the complement of AffineScope and
remove the latter.

This change does not bring any additional representational power but
changes the "default" trait with respect to ops creating new affine
scopes.

Any region holding op now starts an affine scope. Ops like affine.for,
affine.if, affine.parallel further extend the affine scope either
started or extended by their enclosing op. affine.for,
affine.parallel, and affine.if now have the ExtendsAffineScope
trait.

As a result of this change:

  • no additional trait is needed for a region holding op (like various FuncOp's, scf.for, scf.if, scf.execute_region, etc.) to allow affine load/stores/for/ifs in a larger set of cases; symbols that are defined in their bodies would be valid symbols for the affine ops.
  • ops outside the affine dialect can add an ExtendsAffineScope trait in order to have their block arguments treated as dimensional identifiers.

Clean up documentation and code comments.

This change also has the benign affect of dimensions becoming symbols in
several contexts (since arbitrary region holding ops are able to define
symbols at their top-level): hence the several mechanical updates to
test cases.

Losely speaking, letting all region-holding ops by default
start an affine scope leads to more "symbols" by default, and admits
affine IR in a significantly larger set of use cases by default.

Some additional discussion:
https://llvm.discourse.group/t/should-linalg-indexed-generic-allow-for-affine-operations-on-its-body/2889/14

Diff Detail

Event Timeline

bondhugula created this revision.Oct 31 2021, 12:05 PM
bondhugula requested review of this revision.Oct 31 2021, 12:05 PM
bondhugula edited the summary of this revision. (Show Details)

Nice!

This LGTM, but it is worth having someone like @ftynse and/or @dcaballe to look at this as well.

mlir/docs/Dialects/Affine.md
75

Should we leave an empty line between the title and the paragraph?

83

Can this be any constant-like operation?

Extra comments.

nicolasvasilache added a comment.EditedOct 31 2021, 1:33 PM

Thanks for implementing this inversion, this is more in line with the traditional way of thinking about polyhedral SCoPs and is a step in the right modeling direction in MLIR.

Can you highlight the specific parts that require changes in the Linalg tests ?
I won't start reviewing before later this week but this change smells funny.
I get that any op without the new Trait creates a new affine scope but this is orthogonal to what an affine_apply by itself can use locally as a dim or a symbol.
The only relevant rule there remains that "dims compose" and "symbols concatenate".

Specifically, is there anything that would make it harder to move affine.apply/min/max to arith.affine_apply/min/max ?

bondhugula added a comment.EditedNov 1 2021, 7:38 AM

Thanks for implementing this inversion, this is more in line with the traditional way of thinking about polyhedral SCoPs and is a step in the right modeling direction in MLIR.

Can you highlight the specific parts that require changes in the Linalg tests ?
I get that any op without the new Trait creates a new affine scope but this is orthogonal to what an affine_apply by itself can use locally as a dim or a symbol.

I've covered this in the commit summary. More SSA values become symbols and so they switch to the symbol positions among affine.apply operands. canonicalize (canonicalizeMapAndOperands) will canonicalize to a symbol if it can be a symbol - although it was also a valid dim.

Specifically, is there anything that would make it harder to move affine.apply/min/max to arith.affine_apply/min/max ?

This is unrelated to this patch itself and we should have this discussion on discourse.

bondhugula added inline comments.Nov 1 2021, 7:43 AM
mlir/docs/Dialects/Affine.md
83

I guess any constant-like operation that generates an index type. I'll update the line and the check. Thanks.

bondhugula added inline comments.Nov 1 2021, 7:44 AM
mlir/docs/Dialects/Affine.md
83

Actually, the check is already fine (it's only looking for a constant-like operation); the doc needs an update.

nicolasvasilache requested changes to this revision.EditedNov 2 2021, 3:21 AM

Thanks for implementing this inversion, this is more in line with the traditional way of thinking about polyhedral SCoPs and is a step in the right modeling direction in MLIR.

Can you highlight the specific parts that require changes in the Linalg tests ?
I get that any op without the new Trait creates a new affine scope but this is orthogonal to what an affine_apply by itself can use locally as a dim or a symbol.

I've covered this in the commit summary.

You've covered that it happens but not why it happens, the why is important.

More SSA values become symbols and so they switch to the symbol positions among affine.apply operands. canonicalize (canonicalizeMapAndOperands) will canonicalize to a symbol if it can be a symbol - although it was also a valid dim.

Ok, so the proper answer to my question is that affine.apply canonicalization has assumptions related to AffineScope that behave differently under this patch.
This actually happens in canonicalizePromotedSymbols.
I had to introduce this in 071ca8da918a5aed4758c4b4e27b946663adce58 as a counterpart to promoteComposedSymbolsAsDims.
The rationale at the time was to reduce the mess created by the multi-result + chains of AffineApplyOp design which had its own separate implementation that did not agree with AffineMap::compose.
As time passed, things got better and all this technical debt could be cleaned up at long last.

I think it is time to drop this AffineScope-dependent rewrite from affine.apply canonicalization and make it into a separate opt-in pattern (if you still need it, I'd be fine just dropping it altogether).
In any case, this revision should not change any Linalg test.

Specifically, is there anything that would make it harder to move affine.apply/min/max to arith.affine_apply/min/max ?

This is unrelated to this patch itself and we should have this discussion on discourse.

It is related to this patch because it is a test of separation of concerns: there is a clear issue atm.
Deciding whether / when to make this move is a topic for discourse indeed.

This revision now requires changes to proceed.Nov 2 2021, 3:21 AM
bondhugula added a comment.EditedNov 2 2021, 4:59 AM

Thanks for implementing this inversion, this is more in line with the traditional way of thinking about polyhedral SCoPs and is a step in the right modeling direction in MLIR.

Can you highlight the specific parts that require changes in the Linalg tests ?
I get that any op without the new Trait creates a new affine scope but this is orthogonal to what an affine_apply by itself can use locally as a dim or a symbol.

I've covered this in the commit summary.

You've covered that it happens but not why it happens, the why is important.

More SSA values become symbols and so they switch to the symbol positions among affine.apply operands. canonicalize (canonicalizeMapAndOperands) will canonicalize to a symbol if it can be a symbol - although it was also a valid dim.

Ok, so the proper answer to my question is that affine.apply canonicalization has assumptions related to AffineScope that behave differently under this patch.
This actually happens in canonicalizePromotedSymbols.

Since the affine.apply has dimensional and symbolic operands, it is expected to behave differently when the surrounding affine scope changes. This patch changes the default trait and so the canonicalization will lead to a different result. I thought this was pretty clear from the commit summary and the expln above.

AffineScope-dependent rewrite from affine.apply canonicalization and make it into a separate opt-in pattern (if you still need it, I'd be fine just dropping it altogether).

In case you don't need an affine scope dependent affine.apply, you'll have to consider using a new "affine.apply" op or another reusable mechanism to achieve that. It's not appropriate to expect the author of this patch to do that for you -- feel free to design and do that yourself. It would also be unreasonable to expect this patch to wait until then. This revision is limited to changing the default trait.

In any case, this revision should not change any Linalg test.

Since Linalg uses affine.apply, this would impact its tests just like it impacts other tests. I am going to only update the test cases here and limit the revision to changing the default trait.

mlir/docs/Dialects/Affine.md
78

Is this actually always true?

Imagine calling a func from GPU and passing it threadIdx.x as an argument.
Is it safe to consider this a symbol ?

I can imagine we could create examples where the dependence analysis would be corrupted?

mlir/docs/Traits.md
268

Nice illustration.

mehdi_amini added inline comments.Nov 2 2021, 3:29 PM
mlir/docs/Dialects/Affine.md
78

Why wouldn't it be safe to consider this a symbol? How is it different from regular CPU functions and function parameters in general?

ftynse requested changes to this revision.Nov 4 2021, 6:15 AM

I think this goes in the right direction and only have one concern about two versions of isValidDim being seemingly inconsistent for block arguments, plus a couple of documentation suggestions.

IMO, changes to Linalg tests are indeed mechanical and thus should not be problematic. It is worth verifying that the expected canonicalizations still apply in Linalg pipelines though (@nicolasvasilache may have some end-to-end tests). Linalg only uses affine.apply for expression simplification purposes and I would expect canonicalizations to apply equally well regardless of operands being dimensions or symbols. Skimming through the affine maps in those tests looks like everything is fine. Certainly, there is the compose vs. concatenate difference that @nicolasvasilache points out, but it should be problematic either as long as we are also running the "operand deduplication" canonicalization for affine.apply that can remove unused dimensions.

I tend to agree that the discussion on affine.apply/min/max being factored is at least tangentially relevant to this patch. They both concern canonicalization of affine.apply/min/max and it might be surprising, although not incorrect, that the canonical form of these operations may be different depending on enclosing operations. We can consider this too surprising to be always desirable and thus factor out simplification of affine.apply/min/max based on affine value categorization rules into a separate pass.

mlir/docs/Dialects/Affine.md
78

I think this description can benefit from an example of how polyhedral analyses are supposed to reason about nested affine scopes, maybe link to the example in the trait documentation.

Treating any value defined at the affine scope level as a symbol looks safe to me as is provided that we don't attempt any nested scope interaction.

Regarding the GPU example with threadIdx.x as function argument, it is not a problem as long as we are not including the "virtual loop" corresponding treadIdx.x into the analysis. We can analyze and transform code within a single "iteration" with fixed threadIdx.x. This connects to the above: we don't reason across scopes, at least not implicitly. FWIW, fixing thread ids as symbols is exactly what PPCG does in GPU code generation.

89

Nit: now that dim also takes the position of the dimension as a value, I suppose we should also require that dimension to be a valid symbol?

mlir/lib/Dialect/Affine/IR/AffineOps.cpp
272

Looking at this, I realize that it is worth discussing block arguments in the documentation. So far, it only mentions region arguments, i.e. arguments of the entry block. IMO, block arguments can be treated similarly to op results, that is, they are valid symbols if the region that contains the block is attached to an operation without the ExtendsAffineScope trait. And valid dimensions in any case.

I don't know if we want to allow operations with the ExtendsAffineScope trait to have more than one block, maybe we can stay conservative for now and have a verifier check on the trait.

296–305

I'm not sure how this connects to isValidDim(Value) overload that says BlockArguments are always valid dimensions, looks like a contradiction unless I am missing something.

I tend to agree that the discussion on affine.apply/min/max being factored is at least tangentially relevant to this patch. They both concern canonicalization of affine.apply/min/max and it might be surprising, although not incorrect, that the canonical form of these operations may be different depending on enclosing operations. We can consider this too surprising to be always desirable and thus factor out simplification of affine.apply/min/max based on affine value categorization rules into a separate pass.

Yes, at least giving a good effort at independently factoring out the simplification rule into a separate pass / pattern that is applied with additional control seems like the constructive step forward.
There are still things we, collectively, do not fully understand and my Linalg gatekeeper bell rings when I see such changes: time to take the flashlight and see what that noise is about.

mlir/docs/Dialects/Affine.md
78

@mehdi_amini it is a bit tricky and I am not sure, which is why I ask:
threadId is both a symbol from the point of view of a single thread (i.e. it is exactly one of the value in [0, numThreads))
it is also not a symbol from the point of view of the process (i.e. the union of all threads: it is a symbol that take all values in [0, numThreads) )

I put such duality in the past to good use but it was always clear that we were done with parallelization and dependence analysis.
Here we cannot rely no such assumption.

I imagine we could construct examples where dependence analysis could be messed up by this duality ?

ftynse added inline comments.Nov 4 2021, 8:49 AM
mlir/docs/Dialects/Affine.md
78

Only if your dependence analysis reasons across different scopes, which it should not under this patch IIUC. It sees the nested scope as an essentially opaque op and should not attempt reasoning about its regions, just assume it can access anything. (Later, we should be able to write a smarter analysis that overapproximates the internals.) That's why I asked for a clarification in the doc as to what happens in the case of nested scopes.

If it only reasons inside one scope, you should never have thread id as alternatively symbol or dimension.

mehdi_amini added inline comments.Nov 4 2021, 5:49 PM
mlir/docs/Dialects/Affine.md
78

Ah I see what your were asking about originally now Nicolas, thanks Alex for presenting the "cross scope" problem.

In the proposed model I would see the iteration space offered by the GPU grid of thread as a non-affine scope here: the affine scope is always the view of a single thread when you get to this level. Note that this is also a problem with a simple affine loop nest with a function call: the function boundary acts as a blocker for the affine scope definition.

On the other hand, the gpu.launch with the region form can likely express the entire loop nest I think?

Seeing whether we can revive this in light of recent interest in: https://discourse.llvm.org/t/affineif-inside-a-linalg-operations/64544

More SSA values become symbols and so they switch to the symbol positions among affine.apply operands. canonicalize (canonicalizeMapAndOperands) will canonicalize to a symbol if it can be a symbol - although it was also a valid dim.

We have seen this type of intrusive simplifications in multiple other places through the codebase, each and every time, the solution has been to not make this a blanket canonicalization but opt-in via more control.
It would make sense that an affine-canonicalize pass would make these foldings.

Another potential direction is to only apply the intrusive canonicalize to only the uses that are immediately within an ExtendsAffineScope (I haven't analyzed the full implication of this yet, so maybe less relevant).

mlir/docs/Traits.md
278

It would be nice to discuss the relationship between affine scope and analyses / optimizations.

In particular, what if op11 and op12 contain unknown side-effecting ops that one cannot summarize: what is the granularity at which e.g dependence analysis occurs?
My intuition is that only AS1 is analyzable and transformable.
In particular that if op_with_extends_affine_scope_trait is actually affine.parallel, the IR may be invalid, UB or racy by construction depending on how you want to define this.

Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2022, 1:34 AM
mehdi_amini added inline comments.Aug 22 2022, 2:46 AM
mlir/docs/Traits.md
278

With an affine.parallel, could we just consider that the "unknown side-effects" in op11/op12 are guarantee'd to not conflict or there is UB because of the race condition?

mlir/docs/Traits.md
278

This may be a bridge too far though .. in my mind racy by construction can very well conflict but is not UB.

Lock-free like algorithms where multiple threads compute and commit the same results or RMW-like atomics are both important things to be able to represent.

But atomic updates aren’t races :)
That said it may be tricky for affine analysis consistency then?