The Matmul intrinsic performs matrix multiplication on rank 2 arrays.
The intrinsic is lowered to a runtime call.
This is part of the upstreaming effort from the fir-dev branch in [1].
[1] https://github.com/flang-compiler/f18-llvm-project
Co-authored-by: Jean Perier <jperier@nvidia.com>
Co-authored-by: Valentin Clement <clementval@gmail.com>