This is an archive of the discontinued LLVM Phabricator instance.

[mlir][Linalg] Improve the logic to perform tile and fuse with better dependence tracking.
ClosedPublic

Authored by mravishankar on Nov 1 2020, 9:33 PM.

Details

Summary

This change does two main things

  1. An operation might have multiple dependences to the same producer. Not tracking them correctly can result in incorrect code generation with fusion. To rectify this the dependence tracking needs to also have the operand number in the consumer.
  2. Improve the logic used to find the fused loops making it easier to follow. The only constraint for fusion is that linalg ops (on buffers) have update semantics for the result. Fusion should be such that only one iteration of the fused loop (which is also a tiled loop) must touch only one (disjoint) tile of the output. This could be relaxed by allowing for recomputation that is the default when oeprands are tensors, or can be made legal with promotion of the fused view (in future).

Diff Detail

Event Timeline

mravishankar created this revision.Nov 1 2020, 9:33 PM
mravishankar requested review of this revision.Nov 1 2020, 9:33 PM
hanchung added inline comments.Nov 4 2020, 6:29 AM
mlir/lib/Dialect/Linalg/Analysis/DependenceAnalysis.cpp
167–169

IIRC, "W" would be index + number of inputs, and "R" would be number of inputs. I feel we could miss any of one in a future change, so I think it would be good to have something like getLinalgOpViewW(int idx, LinalgOp op) and getLinalgOpViewR(int idx, LinalgOp op) and both of them will return a LinalgOpView.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
707

Do we want tabs \t for indents?

Can we split out the NFC part that extends support for operand + index in the dependence analysis and land that separately ?

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
540

It seems to me you are computing a simple projection of map.
Is a composition of map with AffineMap::getSubMap appropriate for this?

584

You probably want to document the case for which you have done all this:
A + A^T where A comes from a matmul ?

Generally makes sense, thanks for fixing some of the underlying issues.
I left some things to address so I'll wait till next week for approval.
Note that in tensor world, recompute will become much more easy and side effects will become optimizations.
Would we good to think about potential refactorings of the fusable dependences towards using in cost models (not in this revision of course :) ).

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
688–689

The control flow under this loop is somewhat unclear to me.
Some conditions skip the current operand, some conditions exit the functions and the last one breaks out of the loop.

Can we restructure/fission this loop to first rule out the empty case ?
Processing the data 2 or 3 times seems minor compared to the expected gain in readability.

703

expose + reuse "areOpsInSameBlock" ?

723–724

typo

841–844

this seems to leak internal linalgop operand ordering .
Can we restructure and use input/output idx to hide such logic below the LinalgStructuredOpsInterface ?

842

Please add some named accessors to make this nicer to read.

mravishankar marked 3 inline comments as done.Nov 7 2020, 11:20 PM

Can we split out the NFC part that extends support for operand + index in the dependence analysis and land that separately ?

This seems a bit involved. I will have to first update the old code in Fusion.cpp to use the new dependence tracking and then change the code. Could we land it as one patch?

mlir/lib/Dialect/Linalg/Analysis/DependenceAnalysis.cpp
167–169

Changed it to use the getIndexOfShapedOperand method.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
540

I couldnt work out how to use a compose to do this. Suggestions?

584

Right now this case doesnt really come here. It is treated as an unfusable dependence. When I add the promotion (or test it on tensors) I can add that example. It is actually easier since we just need to take the closest outer parallel loops.

688–689

Split the loop to first find the index of operand to fuse, and then do the fusion of those

707

Maybe, but its a simple check here. Not sure where to put that function.

842

I am not sure I understand what you mean by named accessors

mravishankar marked an inline comment as done.

Rebase and address comments

antiagainst added inline comments.Nov 9 2020, 3:02 PM
mlir/include/mlir/Dialect/Linalg/Analysis/DependenceAnalysis.h
48

Nit: I'd suggest to use operandIndex. Number usually means total count to me.

mlir/include/mlir/Dialect/Linalg/Transforms/Transforms.h
434

SmallSet? I'd assume typically we don't have a large number of indices to fuse.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
538

A fusable loop's iteration space only touches ...

550

The actual fusable loops are ..

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
540

On second look, not with getSubMap (which is already a composition with a permutation) but I think something like this could work for you:

SmallVector<AffineExpr, 4> projection;
for (auto attr : llvm::enumerate(iteratorTypes))
  if (!isParallelIterator(attr.value()))
    projection.push_back(getAffineDimExpr(attr.index(), ctx));
unsigned projectionRank = projection.size();
projection.append(iteratorTypes.size() - projectionRank, getAffineConstant(0, ctx));
return AffineMap::get(projection.size(), 0, projectionRank, ctx).compose(map).getMajorSubMap(projectionRank);

The first n-1 lines could be refactored into a new AffineMap::getProjectionMap(ArrayRef<unsigned> positions, unsigned codomainRank).

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
540

actually 3 more things:

  1. the proper expression is prob map.compose(AffineMap::get(...))
  2. s/!ifParallelIterator/isParallelIterator
  3. you may need to drop the 0 results of the output AffineMap for your use case.
hanchung added inline comments.Nov 10 2020, 8:26 AM
mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOpsInterface.td
578

s/then/the

581

s/retTyp/retTy

586–588

format this? (see above examples)

593

s/then/the

596

s/retTyp/retTy

mlir/lib/Dialect/Linalg/Analysis/DependenceAnalysis.cpp
167–169

nice!

248–253

[optional] Do we want to capture the value for consumer? it's only used in a later statement -- Value consumerView = ...

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
370

I think it's okay to pass without reference, because it is unsigned and we won't update the variable. It's a bit like passing an unsigned value without reference to a function.

nicolasvasilache accepted this revision.Nov 11 2020, 8:25 AM
This revision is now accepted and ready to land.Nov 11 2020, 8:25 AM
mravishankar marked 9 inline comments as done.

Address comments

mravishankar marked 7 inline comments as done.

Addressing missed minor comments.

mlir/lib/Dialect/Linalg/Analysis/DependenceAnalysis.cpp
248–253

True, but the code is less readable IMO.

mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
540

Nice! It actually helped me concretize something I was stuck at downstream of this change. Added the method as you asked for. PTAL

hanchung accepted this revision.Nov 11 2020, 11:46 PM
hanchung added inline comments.
mlir/lib/Dialect/Linalg/Transforms/Fusion.cpp
793–795

nit: remove trivial braces

It's good to fix to me because you touched this condition. But it's okay to leave it this way because it doesn't really relate to the change...