This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Affine] Add helper functions to allow reordering affine.apply operands and decompose the ops into smaller components
ClosedPublic

Authored by nicolasvasilache on Mar 9 2023, 4:47 AM.

Details

Summary

Care is taken to order operands from least hoistable to most hoistable and to process subexpressions in the same
order.

This allows exposing more oppportunities for licm, cse and strength reduction.

Such a step should typically be applied while we still have loops in the IR and just before lowering affine ops to arith.
This is because the affine.apply canonicalization currently tries to maximally compose chains of affine.apply operations
and could undo the effects of these decompositions.

Depends on: D145784

Diff Detail

Event Timeline

nicolasvasilache requested review of this revision.Mar 9 2023, 4:47 AM
Herald added a project: Restricted Project. · View Herald Transcript
nicolasvasilache planned changes to this revision.Mar 9 2023, 4:47 AM
nicolasvasilache added a reviewer: qcolombet.

add test and fix ordering assumptions to account for local variables.

Drop debug spew

nicolasvasilache edited the summary of this revision. (Show Details)

Format

nicolasvasilache edited the summary of this revision. (Show Details)

Rebase

Add licm and cse to test

ftynse accepted this revision.Mar 13 2023, 5:38 PM
ftynse added inline comments.
mlir/include/mlir/Dialect/Affine/Transforms/Transforms.h
43–47

Looks unnecessary here.

mlir/lib/Dialect/Affine/Transforms/DecomposeAffineOps.cpp
44

Nit: I think to_vector no longer needs the number of stack elements.

This revision is now accepted and ready to land.Mar 13 2023, 5:38 PM
springerm accepted this revision.Mar 14 2023, 1:00 AM
qcolombet accepted this revision.Mar 14 2023, 1:22 AM
qcolombet added inline comments.
mlir/test/lib/Dialect/Affine/TestDecomposeAffineOps.cpp
30

For the integration in IREE, I had to "promote" this test pass into an actual pass (see https://github.com/iree-org/iree-llvm-fork/blob/4ef84146ad72a6a5878697daf9658844cefb0a22/mlir/lib/Dialect/Affine/Transforms/DecomposeAffineOps.cpp#L183).
Could you front load this refactoring in this PR?

Unless you have a different plan for the IREE integration.

Food for thoughts:
Since the canonicalization is going to undo the decomposition, should we rewrite the expression (e.g., as part of the canonicalization) to the ordering we want?
The downside is we would still rely on the backend to do the hoisting of the loop invariant stuff.

Right now, what happens is:
Let's say we have:

affine.apply affine_map<()[s0, s1, s2] -> (s0 * 1024 + s1 * 32 + s2)>()[%loopVariant, %inv1, %inv2]

This decomposes in:

%inv1x32 = affine.apply affine_map<()[s0] -> (s0 * 32)>()[%inv1]
%inv2_ = affine.apply affine_map<()[s0] -> (s0)>()[%inv2]
%inv1x32_plus_inv2 = affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()[%inv1x32, %inv2_]
%loopVariantx1024 = affine.apply affine_map<()[s0] -> (s0 * 1024)>()[%loopVariant]
%res = affine.apply affine_map<()[s0, s1] -> (s0 + s1)>()[%loopVariant, %inv1x32_plus_inv2]

Then we run licm and lower. Hooray, we did the hoisting and produce the code we wanted.

Now, let's say we don't lower right away and happen to run through canonicalization, the resulting expression will look like:

affine.apply affine_map<()[s0, s1, s2] -> (s0 + s1 * 32 + s2 * 1024)>()[%inv1, %inv2, %loopVariant]

Now if we lower this expression, the backend is still able to do licm and whatnot.

To summarize, I like the decomposition approach as it is more flexible, but it is easy to undo it (via canonicalization) so I wonder if we should just make the expression with the invariant symbols first the canonical representation.

nicolasvasilache marked 3 inline comments as done.Mar 14 2023, 4:02 AM
nicolasvasilache added inline comments.
mlir/test/lib/Dialect/Affine/TestDecomposeAffineOps.cpp
30

Given the interaction with other passes and transforms I was thinking we want this functionality to just be called from a more comprehensive pass.

I wouldn't want to chase phase orderings between this, canonicalize, licm, lower affine and stuff related to ldmatrix.
I was thinking we'd have a new pass that puts the things properly together on the IREE side (or even upstream once we know exactly what we need) ?