This patch uses the TileInfo introduced in D77550 to generate a loop
nest for tiled matrix multiplication, instead of generating the
unrolled code for the whole multiplication. This makes code-generation
more scalable for larger matrices.
Initially loops are only used if both the number of rows and columns are
divisible by the tile size. Other cases will be added in follow-up patches.
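
For illustration only, the shape of such a loop nest can be sketched in plain
C++ rather than as generated IR. This is not the code from this patch:
canUseTiledLoops and tiledMatMul are hypothetical names, the row-major layout
is an assumption, and the sketch additionally assumes that the inner dimension
is divisible by the tile size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check mirroring the initial restriction described above:
// loops are only used if rows and columns are both divisible by the tile size.
bool canUseTiledLoops(std::size_t Rows, std::size_t Cols, std::size_t TileSize) {
  return TileSize != 0 && Rows % TileSize == 0 && Cols % TileSize == 0;
}

// Computes Out (R x C) = A (R x K) * B (K x C) one tile at a time.
// Assumptions of this sketch: row-major storage, and K divisible by T.
std::vector<double> tiledMatMul(const std::vector<double> &A,
                                const std::vector<double> &B, std::size_t R,
                                std::size_t K, std::size_t C, std::size_t T) {
  assert(A.size() == R * K && B.size() == K * C);
  assert(canUseTiledLoops(R, C, T) && K % T == 0);

  std::vector<double> Out(R * C, 0.0);
  // Outer loop nest: iterate over tiles of the result and the inner dimension.
  for (std::size_t ColBase = 0; ColBase < C; ColBase += T)
    for (std::size_t RowBase = 0; RowBase < R; RowBase += T)
      for (std::size_t InnerBase = 0; InnerBase < K; InnerBase += T)
        // Tile body: a small fixed-size multiply-accumulate. A code generator
        // could unroll this part while keeping the outer loops as loops.
        for (std::size_t Row = RowBase; Row < RowBase + T; ++Row)
          for (std::size_t Col = ColBase; Col < ColBase + T; ++Col) {
            double Acc = Out[Row * C + Col];
            for (std::size_t I = InnerBase; I < InnerBase + T; ++I)
              Acc += A[Row * K + I] * B[I * C + Col];
            Out[Row * C + Col] = Acc;
          }
  return Out;
}
```

Only the tile body has a size fixed by the tile size, so the amount of code in
this sketch stays constant as the matrix dimensions grow, whereas fully
unrolled code grows with the whole multiplication.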