The pattern matching optimization of Polly detects and optimizes dense general matrix-matrix multiplication. The generated code is close to high performance implementations of matrix-matrix multiplications, which are contained in manually tuned libraries [1]. The described pattern matching optimization is a particular case of tensor contraction optimization, which was introduced in [2].
This patch generalizes the pattern matching to the case of tensor contractions using the algorithm described in [2]. Following the ideas introduced in [3], it logically represents tensor contraction operands as matrix multiplication operands and uses the approach presented in [1].
Optimization of tensor contractions will be added in the next patch. These modifications can be found in https://github.com/gareevroman/llvm-project.
[1] - Low T.M., Igual F.D., Smith T.M., Quintana-Orti E.S. Analytical Modeling Is Enough for High-Performance BLIS // ACM Transactions on Mathematical Software. 2016. Vol. 43, no. 2. P. 12:1—12:18. DOI: 10.1145/2925987.
[2] - Gareev R., Grosser T., Kruse M. High-Performance Generalized Tensor Operations: A Compiler-Oriented Approach // ACM Transactions on Architecture and Code Optimization (TACO). 2018. Vol. 15, no. 3. P. 34:1–34:27. DOI: 10.1145/3235029.
[3] - Matthews D. High-Performance Tensor Contraction without BLAS // SIAM Journal on Scientific Computing. 2018. Vol. 40, no. 1. P. C 1—C 24. DOI: 110.1137/16m108968x.
nice