[Matrix] Add initial tiling for load/multiply/store chains.

This patch adds initial fusion for load/multiply/store chains of matrix
operations.

The patch contains roughly two parts:

- Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
  First, we ensure that neither load of the multiply operands aliases the store.
  If one does, we create a new non-aliasing copy of that operand. Note that this
  may introduce new basic blocks. Finally, we process TileSize x TileSize blocks:
  load tiles from the input operands, multiply them and store the result.

- Identify fusion candidates & matrix instructions.
  As a first step, collect all instructions with shape info and all fusion
  candidates (currently @llvm.matrix.multiply calls). Next, try to fuse the
  candidates and collect the instructions eliminated by fusion. Finally, iterate
  over all matrix instructions, skip the ones eliminated by fusion and lower the
  rest as usual.
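
For illustration only, the tiling scheme described for the fused chain can be
sketched on plain arrays. This is not the code in the pass, which emits LLVM IR
rather than running loops itself. The names fusedMultiply and mayOverlap are
hypothetical, column-major storage is assumed, and the overlap test here is a
simple pointer-range comparison standing in for whatever alias check the pass
performs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: true if [P, P+PLen) and [Q, Q+QLen) overlap.
// std::less gives a total order even for pointers into different arrays.
static bool mayOverlap(const float *P, std::size_t PLen,
                       const float *Q, std::size_t QLen) {
  std::less<const float *> Lt;
  return Lt(P, Q + QLen) && Lt(Q, P + PLen);
}

// Hypothetical sketch: C = A * B with A (R x M), B (M x N), C (R x N),
// all stored column-major (an assumption of this example).
void fusedMultiply(const float *A, const float *B, float *C,
                   unsigned R, unsigned M, unsigned N, unsigned TileSize) {
  assert(TileSize > 0 && "tile size must be positive");
  const std::size_t ALen = std::size_t(R) * M, BLen = std::size_t(M) * N,
                    CLen = std::size_t(R) * N;

  // Step 1: if an operand may alias the store, work on a private copy.
  std::vector<float> CopyA, CopyB;
  if (mayOverlap(A, ALen, C, CLen)) {
    CopyA.assign(A, A + ALen);
    A = CopyA.data();
  }
  if (mayOverlap(B, BLen, C, CLen)) {
    CopyB.assign(B, B + BLen);
    B = CopyB.data();
  }

  // Step 2: process the result in tiles of at most TileSize x TileSize.
  for (unsigned J = 0; J < N; J += TileSize) {
    for (unsigned I = 0; I < R; I += TileSize) {
      const unsigned TR = std::min(TileSize, R - I); // rows in this tile
      const unsigned TC = std::min(TileSize, N - J); // columns in this tile
      std::vector<float> Acc(std::size_t(TR) * TC, 0.0f);

      // Load tiles of A and B along the shared dimension and accumulate.
      for (unsigned K = 0; K < M; K += TileSize) {
        const unsigned TK = std::min(TileSize, M - K);
        for (unsigned Col = 0; Col < TC; ++Col)
          for (unsigned Kk = 0; Kk < TK; ++Kk)
            for (unsigned Row = 0; Row < TR; ++Row)
              Acc[std::size_t(Col) * TR + Row] +=
                  A[std::size_t(K + Kk) * R + (I + Row)] *
                  B[std::size_t(J + Col) * M + (K + Kk)];
      }

      // Store the finished tile into the result.
      for (unsigned Col = 0; Col < TC; ++Col)
        for (unsigned Row = 0; Row < TR; ++Row)
          C[std::size_t(J + Col) * R + (I + Row)] =
              Acc[std::size_t(Col) * TR + Row];
    }
  }
}
```

Copying an aliasing operand up front is what makes the tile-by-tile store safe:
without it, storing an early result tile could overwrite input elements that a
later tile still needs to load.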

Reviewers: anemet, Gerolf, hfinkel, andrew.w.kaylor, LuoYuanke

Reviewed By: anemet

Differential Revision: https://reviews.llvm.org/D75566