IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.

This is the last step in a series of changes to improve code generation
for code that uses matrix subscript operators with the C/C++ matrix
extension in Clang, such as:

  using matrix_t = double __attribute__((matrix_type(15, 15)));

  void foo(unsigned i, matrix_t &A, matrix_t &B) {
    for (unsigned j = 0; j < 4; ++j)
      for (unsigned k = 0; k < i; k++)
        B[k][j] -= A[k][j] * B[i][j];
  }