This patch adds initial fusion for load/multiply/store chains of matrix operations.

The patch contains roughly two parts:

- Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).

First, we ensure that neither load of the multiply operands aliases the store. If one does, we create a new, non-aliasing copy of that operand. Note that this may introduce new basic blocks. Then we process the result in TileSize x TileSize blocks: load tiles from the input operands, multiply them, and store the resulting tile.
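The steps above can be illustrated with a minimal plain C++ sketch of the same idea. It is not the patch's code: it works on flat column-major `double` buffers instead of LLVM IR, and `overlaps` and `multiplyFusedTiled` are hypothetical names. It copies any operand whose storage overlaps the result, then computes the result one TileSize x TileSize tile at a time. In the patch the alias check is presumably emitted as IR control flow, which is where the new basic blocks come from. Here it is just an `if`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Returns true if [P, P+N) and [Q, Q+M) overlap. std::less gives a total
// order on pointers, so this is well-defined even for unrelated buffers.
static bool overlaps(const double *P, std::size_t N, const double *Q,
                     std::size_t M) {
  std::less<const double *> Lt;
  return Lt(P, Q + M) && Lt(Q, P + N);
}

// Hypothetical sketch: C (R x N) = A (R x K) * B (K x N), all column-major,
// so element (r, c) of a matrix with Rows rows lives at index c * Rows + r.
void multiplyFusedTiled(const double *A, const double *B, double *C,
                        std::size_t R, std::size_t K, std::size_t N,
                        std::size_t TileSize) {
  assert(TileSize > 0);
  // Step 1: if an operand may alias the result, read from a private copy.
  std::vector<double> ACopy, BCopy;
  if (overlaps(A, R * K, C, R * N)) {
    ACopy.assign(A, A + R * K);
    A = ACopy.data();
  }
  if (overlaps(B, K * N, C, R * N)) {
    BCopy.assign(B, B + K * N);
    B = BCopy.data();
  }
  // Step 2: compute C one TileSize x TileSize tile at a time.
  for (std::size_t J = 0; J < N; J += TileSize) {
    for (std::size_t I = 0; I < R; I += TileSize) {
      std::size_t TR = std::min(TileSize, R - I);
      std::size_t TC = std::min(TileSize, N - J);
      // Accumulator for the result tile C[I:I+TR, J:J+TC].
      std::vector<double> Acc(TR * TC, 0.0);
      for (std::size_t Kk = 0; Kk < K; Kk += TileSize) {
        std::size_t TK = std::min(TileSize, K - Kk);
        // Multiply the A tile (TR x TK) by the B tile (TK x TC).
        for (std::size_t c = 0; c < TC; ++c)
          for (std::size_t k = 0; k < TK; ++k) {
            double b = B[(J + c) * K + (Kk + k)];
            for (std::size_t r = 0; r < TR; ++r)
              Acc[c * TR + r] += A[(Kk + k) * R + (I + r)] * b;
          }
      }
      // Store the finished tile into C.
      for (std::size_t c = 0; c < TC; ++c)
        for (std::size_t r = 0; r < TR; ++r)
          C[(J + c) * R + (I + r)] = Acc[c * TR + r];
    }
  }
}
```

Passing the same buffer as `A` and `C` shows why the copy matters: without it, tiles stored early would overwrite operand values that later tiles still need to read.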

- Identify fusion candidates & matrix instructions.

As a first step, collect all instructions with shape information and all fusion candidates (currently @llvm.matrix.multiply calls). Next, try to fuse the candidates and collect the instructions eliminated by fusion. Finally, iterate over all matrix instructions, skip the ones eliminated by fusion, and lower the rest as usual.
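The three steps can be modelled on a toy instruction list. This is a hypothetical sketch, not the pass itself: `Kind`, `Inst` and `lowerWithFusion` are made-up names, `Kind::Multiply` stands in for @llvm.matrix.multiply calls, and the fusion rule assumed here (a multiply of two loads whose only user is a store) is a simplification. No fused code is generated; the sketch only tracks which instructions fusion eliminates and which are left for regular lowering.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <set>
#include <vector>

enum class Kind { Load, Multiply, Store, Add };

struct Inst {
  Kind K;
  std::vector<std::size_t> Ops; // indices of operand instructions
  bool HasShape;                // true if shape info is known for this value
};

// Hypothetical sketch: returns the matrix instructions still lowered
// individually (in program order); Fused receives the eliminated ones.
std::vector<std::size_t> lowerWithFusion(const std::vector<Inst> &F,
                                         std::set<std::size_t> &Fused) {
  // Users[i] lists the instructions that use instruction i.
  std::vector<std::vector<std::size_t>> Users(F.size());
  for (std::size_t U = 0; U < F.size(); ++U)
    for (std::size_t Op : F[U].Ops)
      Users[Op].push_back(U);

  // Step 1: collect instructions with shape info and fusion candidates.
  std::vector<std::size_t> MatrixInsts, Candidates;
  for (std::size_t I = 0; I < F.size(); ++I) {
    if (!F[I].HasShape)
      continue;
    MatrixInsts.push_back(I);
    if (F[I].K == Kind::Multiply)
      Candidates.push_back(I);
  }

  // Step 2: fuse load/load/multiply/store chains, recording what they eliminate.
  for (std::size_t M : Candidates) {
    const Inst &Mul = F[M];
    if (Mul.Ops.size() != 2 || Users[M].size() != 1)
      continue;
    std::size_t S = Users[M][0];
    if (F[S].K != Kind::Store)
      continue;
    std::size_t L = Mul.Ops[0], R = Mul.Ops[1];
    if (F[L].K != Kind::Load || F[R].K != Kind::Load)
      continue;
    // The fused code replaces both the multiply and the store.
    Fused.insert(M);
    Fused.insert(S);
    // A load can be dropped only if the multiply is its sole user.
    for (std::size_t Ld : {L, R})
      if (Users[Ld].size() == 1)
        Fused.insert(Ld);
  }

  // Step 3: lower every remaining matrix instruction as usual.
  std::vector<std::size_t> Lowered;
  for (std::size_t I : MatrixInsts)
    if (!Fused.count(I))
      Lowered.push_back(I);
  return Lowered;
}
```

A load that also feeds another instruction stays in the regular lowering list, since fusion can only drop it when the multiply was its sole user.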