This is an archive of the discontinued LLVM Phabricator instance.

[mlir][bufferization][WIP] Make `TensorCopyInsertionPass` a test pass

Authored by springerm on Nov 29 2022, 5:55 AM.



TensorCopyInsertion should not have been exposed as a pass; this was a flaw in the original design. It is a preparation step for bufferization, and certain transforms (that would otherwise be legal) are illegal between TensorCopyInsertion and the actual rewrite to memref ops. Therefore, even if the two steps remain separate internally, they should be exposed as a single pass.
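As background, the hazard can be sketched with a small hypothetical example (function and value names are illustrative, not from this patch). TensorCopyInsertion materializes the copy decisions of the bufferization analysis as `bufferization.alloc_tensor ... copy(...)` ops; transforms that rewrite tensor SSA use-def chains afterwards can invalidate those decisions:

```mlir
// Before TensorCopyInsertion: %t is read after being written via tensor.insert.
func.func @example(%t: tensor<10xf32>, %f: f32, %i: index) -> (f32, tensor<10xf32>) {
  %w = tensor.insert %f into %t[%i] : tensor<10xf32>
  %r = tensor.extract %t[%i] : tensor<10xf32>
  return %r, %w : f32, tensor<10xf32>
}

// After TensorCopyInsertion: the analysis decided that the insert must write
// into a copy so that the later read of %t still sees the original data.
func.func @example(%t: tensor<10xf32>, %f: f32, %i: index) -> (f32, tensor<10xf32>) {
  %copy = bufferization.alloc_tensor() copy(%t) : tensor<10xf32>
  %w = tensor.insert %f into %copy[%i] : tensor<10xf32>
  %r = tensor.extract %t[%i] : tensor<10xf32>
  return %r, %w : f32, tensor<10xf32>
}
```

A pass that runs between the two steps sees `bufferization.alloc_tensor` as just another tensor op and may hoist, fold, or reorder around it. Such a rewrite would have been legal on the original IR, but it can change aliasing in ways that the already-committed copy decisions do not account for.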

This change affects the sparse compiler, which uses TensorCopyInsertionPass. A new SparseBufferizationPass is added that replaces all passes in the sparse tensor pipeline from TensorCopyInsertionPass up to the actual bufferization (the rewrite to memref/non-tensor ops). It is generally unsafe to run arbitrary passes in between, in particular passes that hoist tensor ops out of loops or change SSA use-def chains of tensor ops.

Diff Detail

Event Timeline

springerm created this revision. Nov 29 2022, 5:55 AM
Herald added a project: Restricted Project. Nov 29 2022, 5:55 AM
springerm requested review of this revision. Nov 29 2022, 5:55 AM
jreiffers accepted this revision. Nov 29 2022, 6:22 AM

Thanks for the fix!

This revision is now accepted and ready to land. Nov 29 2022, 6:22 AM

(Will wait for Aart's comments.)

aartbik added inline comments. Nov 30 2022, 9:23 AM

At very first glance, I am a bit -1 on moving these passes out of the pipeline into a bufferization pass.
I understand why you are doing this, but is there any way you can still do it without breaking this flow here?

1 ↗(On Diff #478547)

The name seems copy-pasted.

Thanks Matthias, I believe that this is the correct thing to do!

51 ↗(On Diff #478547)

Is there any reason that we need to run insertTensorCopy before sparsification in particular? If we can move all dense-related bufferization after sparsification, wouldn't that be clearer? Or am I missing something? It seems that it could be dangerous to do optimizations based on simple SSA def-use chains after InsertTensorCopies; I am a little worried that it might cause other issues in the future (or that there are other remaining issues in the current pipeline).

If we somehow must perform insertTensorCopy here, what types of optimizations are sound afterwards (i.e., what assumptions can we make about the dense tensor SSA values)? Are there any general rules that we should follow?

60 ↗(On Diff #478547)

nit: If we want to wrap these passes into the same class, I'd prefer including createPreSparsificationRewritePass in this pass as well, so that Pre- and Post-rewriting passes are in the same place.

The requirements are:

  • TensorCopyInsertion must run before sparsification, sparse tensor conversion, and dense bufferization (otherwise we may be missing copies).
  • The same pass that runs TensorCopyInsertion should also rewrite all tensor-typed ops to memref-typed ops, i.e., all tensor-typed ops are gone afterwards.
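To illustrate the second requirement with a small hypothetical example (assuming the output tensor bufferizes in place), the contract is that no tensor types survive the combined pass:

```mlir
// Input to the combined pass (tensor land):
func.func @write_read(%t: tensor<8xf32>, %f: f32, %i: index) -> f32 {
  %w = tensor.insert %f into %t[%i] : tensor<8xf32>
  %r = tensor.extract %w[%i] : tensor<8xf32>
  return %r : f32
}

// Output (memref land, sketch): no tensor-typed ops remain, so no pass that
// runs afterwards can accidentally invalidate the bufferization decisions.
func.func @write_read(%m: memref<8xf32>, %f: f32, %i: index) -> f32 {
  memref.store %f, %m[%i] : memref<8xf32>
  %r = memref.load %m[%i] : memref<8xf32>
  return %r : f32
}
```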

I wanted to keep the passes in the original pipeline but couldn’t find a way. Do you see a better layering?

Is there any reason that we need to do insertTensorCopy before Sparsification

Sparsification replaces the linalg.generic with loops, right? Maybe even memref loops? We cannot analyze these anymore. TensorCopyInsertion must run before any memrefs are introduced.
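For illustration, a rough sketch of what sparsification does to a kernel (the trait, encoding, and buffer names below are hypothetical, not the exact pass output). Once the loops and memref-level accesses exist, the tensor-level aliasing analysis that TensorCopyInsertion relies on is no longer possible:

```mlir
// Before: a linalg.generic over a sparse tensor, still plain tensor SSA.
#SV = #sparse_tensor.encoding<{ dimLevelType = ["compressed"] }>
%0 = linalg.generic #trait
       ins(%a : tensor<?xf64, #SV>)
       outs(%out : tensor<?xf64>) {
  ^bb0(%x: f64, %y: f64):
    %s = arith.addf %x, %y : f64
    linalg.yield %s : f64
} -> tensor<?xf64>

// After sparsification (sketch): explicit loops over the sparse storage,
// with memref-level loads and stores.
scf.for %ii = %c0 to %nnz step %c1 {
  %j = memref.load %indices[%ii] : memref<?xindex>
  %x = memref.load %values[%ii] : memref<?xf64>
  %y = memref.load %out_buf[%j] : memref<?xf64>
  %s = arith.addf %x, %y : f64
  memref.store %s, %out_buf[%j] : memref<?xf64>
}
```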

It seems that it could be dangerous to do optimization based on simple SSA def-use chain after InsertTensorCopies

No more analyses happen after TensorCopyInsertion; in particular, no bufferization analysis.

If we somehow must perform insertTensorCopy here, what types of optimization are sound afterwards (i.e., what assumption can we hold for the dense tensor SSA value)? is there any general rules that we should follow?

The short answer is: none. Ideally, do not touch the IR apart from rewriting tensor ops to memref ops. For that reason, the dense bufferization does an Operation::walk over the IR instead of using the greedy pattern rewriter; the greedy pattern rewriter could apply folding patterns, which are unsafe. Basically, only BufferizableOpInterface::bufferize (and its sparse equivalent) is allowed.

Thanks for finding a solution, Matthias. Some feedback below, but assuming no two-pass solution can be made, I would at least like to see this documented, with a better "pass" name for the compound pass.

1 ↗(On Diff #478547)

That was perhaps a bit vague, but I mean that after the renaming that should be


34 ↗(On Diff #478547)

I gave this another good look, and grouping this all together is probably a good way to make sure stuff happens in the right order and with the right information.

I guess I objected a little to creating a "mini"pipeline as part of the larger pipeline (or perhaps a "super"pass on top of our other passes). If you still see a way of putting the two parts you need into two passes while keeping the original pipeline, that would have my preference. But guessing that is not feasible, I think my other objection came from the name SparseBufferizationPass.

I think if (1) we give this a better name, like SparsificationPass (which is heavily overloaded by now, so I am very open to better names), and (2) we document really well that this is grouping stuff together to form a logically consistent unit (one that does all the rewriting and bufferization "atomically" and cannot be split into clean passes), then I am okay with this change.

60 ↗(On Diff #478547)

+1 on keeping the pre/post together

springerm added inline comments. Dec 1 2022, 6:11 AM
60 ↗(On Diff #478547)

PreSparsificationRewriting contains a pattern that should not run after TensorCopyInsertion: FoldInvariantYield. This was the reason for one of the miscompiles that we were seeing.
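For context, FoldInvariantYield folds loops that yield a loop-invariant tensor. A hypothetical sketch (illustrative names, not the pattern's actual test case) of why this is unsafe after copy insertion:

```mlir
// A loop whose iter_arg is never actually updated: it always yields %t.
%r = scf.for %i = %lo to %hi step %c1 iter_args(%arg = %t) -> (tensor<10xf32>) {
  scf.yield %t : tensor<10xf32>
}

// In tensor land, folding %r away and replacing its uses with %t is legal.
// But if TensorCopyInsertion already ran, its copy decisions were made for
// the original use-def chains; collapsing %r into %t changes aliasing, and
// after bufferization a buffer that was assumed private may now be shared
// (and clobbered).
```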

Can we rename PreSparsificationRewriting and PostSparsificationRewriting into something more descriptive (that ideally describes what the pass is actually doing)?

springerm added inline comments. Dec 1 2022, 6:25 AM
34 ↗(On Diff #478547)

In my opinion, SparsificationPass does not describe it very accurately, because the pass is doing more than that. This pass essentially converts a tensor program into a memref program. In particular, it also handles dense tensor IR. But this is just naming, and for you to decide.

Some ideas that came to mind:

  • sparse_tensor::SparseAndDenseBufferizationPass
  • sparse_tensor::BufferizationPass
  • sparse_tensor::ConvertTensorToMemrefPass
  • sparse_tensor::SparsificationPass and we mention in the pass description that it also does other stuff.

If we go with SparsificationPass: there is already a SparsificationPass, so we would have to rename that one. I always thought of it as ConvertSparseLinalgToLoopsPass, because there is also a ConvertLinalgToLoopsPass, which does the same thing, just for the dense case.

92–96 ↗(On Diff #478547)

Do you see a better way than having all these options as fields here? I wanted to use SparseCompilerOptions, but it is defined in Pipelines/Passes.h (not in Transforms). I guess I could create a new SparseBufferizationOptions (or whatever we call the pass).

Peiming added inline comments. Dec 1 2022, 10:02 AM
60 ↗(On Diff #478547)

It should still be okay to put PreSparsificationRewriting into the class (before TensorCopyInsertion)? But I think you are right about renaming; that is probably a better way to make things clearer.

I think FoldInvariantYield is there to enable FuseSparseMultiplyOverAdd; maybe we can just call it FuseSparseOperations?
I do not have a good name for PostSparsificationRewriting, though.

What do you think?

aartbik accepted this revision. Edited Dec 1 2022, 2:11 PM

Okay, this is LGTM provided that
(1) we name the new "mini"pipeline sparse_tensor::SparsificationAndBufferizationPass,
(2) we move PreSparsificationRewriting into that pass, running first, just so they are all logically contained, and
(3) we keep the Pre/PostSparsificationRewriting names; nothing is perfect, but these names reflect the current function reasonably well.

wrengr added inline comments. Dec 1 2022, 6:58 PM

I wonder if this is still relevant here? Before this differential, this conditional bailed out before doing all our various sparse passes, but now it only does so after all of SparseBufferizationPass. So I'm thinking the conditional should either be moved into SparseBufferizationPass or removed entirely.

springerm updated this revision to Diff 479569. Dec 2 2022, 2:46 AM
springerm marked 2 inline comments as done.

address comments

This revision was landed with ongoing or failed builds. Dec 2 2022, 6:39 AM
This revision was automatically updated to reflect the committed changes.