Page MenuHomePhabricator

[Polly] Generalize the pattern matching to the case of tensor contractions.
Needs ReviewPublic

Authored by gareevroman on Nov 21 2021, 6:10 AM.

Details

Reviewers
Meinersbur
bollu
Summary

The pattern matching optimization of Polly detects and optimizes dense general matrix-matrix multiplication. The generated code is close to high performance implementations of matrix-matrix multiplications, which are contained in manually tuned libraries [1]. The described pattern matching optimization is a particular case of tensor contraction optimization, which was introduced in [2].

This patch generalizes the pattern matching to the case of tensor contractions using the algorithm described in [2]. Following the ideas introduced in [3], it logically represents tensor contraction operands as matrix multiplication operands and uses the approach presented in [1].

Optimization of tensor contractions will be added in the next patch. These modifications can be found in https://github.com/gareevroman/llvm-project.

[1] - Low T.M., Igual F.D., Smith T.M., Quintana-Orti E.S. Analytical Modeling Is Enough for High-Performance BLIS // ACM Transactions on Mathemat­ical Software. 2016. Vol. 43, no. 2. P. 12:1—12:18. DOI: 10.1145/2925987.

[2] - Gareev R., Grosser T., Kruse M. High-Performance Generalized Tensor Op­erations: A Compiler-Oriented Approach // ACM Transactions on Architec­ture and Code Optimization (TACO). 2018. Vol. 15, no. 3. P. 34:1–34:27. DOI: 10.1145/3235029.

[3] - Matthews D. High-Performance Tensor Contraction without BLAS // SIAM Journal on Scientific Computing. 2018. Vol. 40, no. 1. P. C 1—C 24. DOI: 110.1137/16m108968x.

Diff Detail

Event Timeline

gareevroman created this revision.Nov 21 2021, 6:10 AM
gareevroman requested review of this revision.Nov 21 2021, 6:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 21 2021, 6:10 AM

Thanks for upstreaming your tensor optimization!

The naming of polly-pattern-matching-based-opts suggests that it includes all pattern-based optimizations, yet this introduces another flag polly-pattern-matching-based-tc-opts. I'd prefer polly-pattern-matching-based-opts controlling both optimizations, and then additional flags for enabling matrix-multiplication and tensor optimizations. Alternatively, rename polly-pattern-matching-based-opts to e.g. polly-matmul-opt. Also, you are adding this functionality into a file called MatmulOptimizer and a function called tryOptimizeMatMulPattern. Since matrix-multiplication is a tensor contraction, is ts-opt supposed to supperseed matrix multiplication? In short, I would like to know what what the relation between those two optimizations should be. I'd prefer to not have to maintain two optimizations if one is strictly more powerful than the other.

I'd enjoy some occasional comments and grouping of statements (empty lines) inside the functions in addition to the doxygen comments. For instance isTCOperandAcc is just a wall of code. For such property-checking functions, ideally each return should mention what property is violated here and why this property is required to be compatible with tensor optimization.

polly/lib/Transform/MatmulOptimizer.cpp
180–181

Please add more details on what the members represent.

187–189

std::set is a high-overhead implementation. Consider using DenseSet or SmallDenseSet. See https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-denseset-h

190–196

Is there an argument to use 30 and small size? If not, consider using just SmallVector<int>.

1091

[style] No reason to make this a wide string literal, especially if just used as an assertion failed message.

Apples to other occurrences as well.

1110
1116

or introduce intFromIslSize.

1149–1150

SmallVectorImpl is not specific to what the vector's small size is.

1160

Consider using polly::rangeIslSize for iterating over dimensions.

1187

Although already in an anon namespace, the other methods add static as well. I found it helps the compiler to warn if a static function is unnused.

1189–1190

Do you know of #include <llvm/ADT/SetOperations.h>? Unfortunately, these modify one set rather than returning a new set.

1219
1229–1230

This computes whether two sets a disjoint, it should not be required to compute the intersection.

1246
1284

getAccessesInOrder requires Stmt to not be a RegionStmt. Please add a test for it.

1294

If any of the returns are executed, what causes the pattern to be rejected (it's not return false)?

1363–1364
1365–1367

The check should not depend on n_basic_set, which is fragile and depends on whether on eg. coalesce was successful. Consider using something like polly::getConstant.

1411–1412

Consider lexmin_pw_multi_aff/lexmax_pw_multi_aff

polly/lib/Transform/ScheduleOptimizer.cpp
459–463

Instead of modifying the idea of whether a node is tilable, consider introducing another constraint-checking function, as we should have done with prevectorization as well.

Thank you very much for the review!

I've tried to address all comments. Additionally, I've updated the optimization for the case of filter nodes (e.g., polly/test/ScheduleOptimizer/pattern-matching-based-opts-after-delicm.ll). I believe that the optimization of tensor contractions is strictly more powerful than the optimization of matrix-multiplications. So, I'd suggest to replace the optimization of matrix-multiplications with it eventually.

gareevroman marked 19 inline comments as done.Dec 12 2021, 1:30 AM
gareevroman added inline comments.
polly/lib/Transform/MatmulOptimizer.cpp
1116

As far I understand, Dimensions.size() returns a value of type size_t instead of a value of the type isl_size. So, in the new version I used the unsigned type to avoid the cast.

1229–1230

That check is redundant. Thanks.

1246

As far as I understand, we cannot do this here because of the assignment to TCI.ADimensions and TCI.BDimensions

1284

I’ve added a check to containsOnlyTCAcc. Could you clarify how the test case should look like? Should it be a region statement that contains a matrix multiplication with right order of memory accesses?

1365–1367

I think that this check was redundant. I’ve removed it.

gareevroman marked 5 inline comments as done.Dec 12 2021, 1:31 AM