Page MenuHomePhabricator

[Polly] Generalize the pattern matching to the case of tensor contractions.
Needs ReviewPublic

Authored by gareevroman on Sun, Nov 21, 6:10 AM.

Details

Reviewers
Meinersbur
bollu
Summary

The pattern matching optimization of Polly detects and optimizes dense general matrix-matrix multiplication. The generated code is close to high performance implementations of matrix-matrix multiplications, which are contained in manually tuned libraries [1]. The described pattern matching optimization is a particular case of tensor contraction optimization, which was introduced in [2].

This patch generalizes the pattern matching to the case of tensor contractions using the algorithm described in [2]. Following the ideas introduced in [3], it logically represents tensor contraction operands as matrix multiplication operands and uses the approach presented in [1].

Optimization of tensor contractions will be added in the next patch. These modifications can be found in https://github.com/gareevroman/llvm-project.

[1] - Low T.M., Igual F.D., Smith T.M., Quintana-Orti E.S. Analytical Modeling Is Enough for High-Performance BLIS // ACM Transactions on Mathemat­ical Software. 2016. Vol. 43, no. 2. P. 12:1—12:18. DOI: 10.1145/2925987.

[2] - Gareev R., Grosser T., Kruse M. High-Performance Generalized Tensor Op­erations: A Compiler-Oriented Approach // ACM Transactions on Architec­ture and Code Optimization (TACO). 2018. Vol. 15, no. 3. P. 34:1–34:27. DOI: 10.1145/3235029.

[3] - Matthews D. High-Performance Tensor Contraction without BLAS // SIAM Journal on Scientific Computing. 2018. Vol. 40, no. 1. P. C 1—C 24. DOI: 110.1137/16m108968x.

Diff Detail

Event Timeline

gareevroman created this revision.Sun, Nov 21, 6:10 AM
gareevroman requested review of this revision.Sun, Nov 21, 6:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptSun, Nov 21, 6:10 AM

Thanks for upstreaming your tensor optimization!

The naming of polly-pattern-matching-based-opts suggests that it includes all pattern-based optimizations, yet this introduces another flag polly-pattern-matching-based-tc-opts. I'd prefer polly-pattern-matching-based-opts controlling both optimizations, and then additional flags for enabling matrix-multiplication and tensor optimizations. Alternatively, rename polly-pattern-matching-based-opts to e.g. polly-matmul-opt. Also, you are adding this functionality into a file called MatmulOptimizer and a function called tryOptimizeMatMulPattern. Since matrix-multiplication is a tensor contraction, is ts-opt supposed to supperseed matrix multiplication? In short, I would like to know what what the relation between those two optimizations should be. I'd prefer to not have to maintain two optimizations if one is strictly more powerful than the other.

I'd enjoy some occasional comments and grouping of statements (empty lines) inside the functions in addition to the doxygen comments. For instance isTCOperandAcc is just a wall of code. For such property-checking functions, ideally each return should mention what property is violated here and why this property is required to be compatible with tensor optimization.

polly/lib/Transform/MatmulOptimizer.cpp
172–173

Please add more details on what the members represent.

179–181

std::set is a high-overhead implementation. Consider using DenseSet or SmallDenseSet. See https://www.llvm.org/docs/ProgrammersManual.html#llvm-adt-denseset-h

182–188

Is there an argument to use 30 and small size? If not, consider using just SmallVector<int>.

1068

[style] No reason to make this a wide string literal, especially if just used as an assertion failed message.

Apples to other occurrences as well.

1087
1093

or introduce intFromIslSize.

1126–1127

SmallVectorImpl is not specific to what the vector's small size is.

1137

Consider using polly::rangeIslSize for iterating over dimensions.

1164

Although already in an anon namespace, the other methods add static as well. I found it helps the compiler to warn if a static function is unnused.

1166–1167

Do you know of #include <llvm/ADT/SetOperations.h>? Unfortunately, these modify one set rather than returning a new set.

1196
1206–1207

This computes whether two sets a disjoint, it should not be required to compute the intersection.

1223
1261

getAccessesInOrder requires Stmt to not be a RegionStmt. Please add a test for it.

1271

If any of the returns are executed, what causes the pattern to be rejected (it's not return false)?

1340–1341
1342–1344

The check should not depend on n_basic_set, which is fragile and depends on whether on eg. coalesce was successful. Consider using something like polly::getConstant.

1388–1389

Consider lexmin_pw_multi_aff/lexmax_pw_multi_aff

polly/lib/Transform/ScheduleOptimizer.cpp
459–463

Instead of modifying the idea of whether a node is tilable, consider introducing another constraint-checking function, as we should have done with prevectorization as well.