This patch adds initial fusion for load/multiply/store chains of matrix operations.
The patch contains roughly two parts:
1. Code generation for a fused load/multiply/store chain (LowerMatrixMultiplyFused).
First, we ensure that both loads of the multiply operands do not alias the store. If they do, we create new non-aliasing copies of the operands. Note that this may introduce new basic blocks. Then we split the block containing the multiply at the multiply; this simplifies processing, because the remainder of the original block is returned so that analysis can continue there (see 2.). Finally we process TileSize x TileSize blocks. That is: load tiles from the input operands, multiply and store them.
2. Identify fusion candidates & matrix instructions.
As a first step, we collect all instructions with shape info and the fusion candidates (currently @llvm.matrix.multiply calls whose operands are loads and whose result has a single use in a store). Next, we try to fuse the candidates and collect the instructions eliminated by fusion. Fusing first avoids generating unnecessary code for loads that later get fused. Finally, we iterate over all matrix instructions, skip the ones eliminated by fusion and lower the rest as usual.
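To illustrate the tiling described in part 1, here is a minimal stand-alone C++ sketch. It is not the code from the patch, which emits LLVM IR. The names multiplyTiled, multiplyFused and overlaps, the TileSize value of 2, the column-major layout and the use of double are all assumptions made for illustration. The overlap check with a copy models the "create non-aliasing copies" step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model, not the patch's code. Matrices are column-major:
// element (r, c) of an R x C matrix lives at index c * R + r.
constexpr std::size_t TileSize = 2; // assumed value, for illustration only

// True if [P, P + PLen) and [Q, Q + QLen) share any element.
bool overlaps(const double *P, std::size_t PLen, const double *Q,
              std::size_t QLen) {
  std::less<const double *> Before; // total order, even for unrelated pointers
  return Before(P, Q + QLen) && Before(Q, P + PLen);
}

// C (M x N) = A (M x K) * B (K x N), one TileSize x TileSize tile at a time.
// Requires that neither A nor B overlaps C.
void multiplyTiled(const double *A, const double *B, double *C, std::size_t M,
                   std::size_t K, std::size_t N) {
  for (std::size_t J = 0; J < N; J += TileSize) {
    for (std::size_t I = 0; I < M; I += TileSize) {
      const std::size_t Rows = std::min(TileSize, M - I);
      const std::size_t Cols = std::min(TileSize, N - J);
      double Acc[TileSize][TileSize] = {}; // result tile, zero-initialized
      for (std::size_t P = 0; P < K; P += TileSize) {
        const std::size_t Depth = std::min(TileSize, K - P);
        // Load a tile of A and a tile of B, multiply and accumulate.
        for (std::size_t Col = 0; Col < Cols; ++Col)
          for (std::size_t Row = 0; Row < Rows; ++Row)
            for (std::size_t D = 0; D < Depth; ++D)
              Acc[Row][Col] +=
                  A[(P + D) * M + (I + Row)] * B[(J + Col) * K + (P + D)];
      }
      // Store the finished result tile.
      for (std::size_t Col = 0; Col < Cols; ++Col)
        for (std::size_t Row = 0; Row < Rows; ++Row)
          C[(J + Col) * M + (I + Row)] = Acc[Row][Col];
    }
  }
}

// Copies an operand if it overlaps the store destination, then multiplies.
void multiplyFused(const double *A, const double *B, double *C, std::size_t M,
                   std::size_t K, std::size_t N) {
  assert(A && B && C && "operands must be non-null");
  std::vector<double> CopyA, CopyB;
  if (overlaps(A, M * K, C, M * N)) {
    CopyA.assign(A, A + M * K);
    A = CopyA.data();
  }
  if (overlaps(B, K * N, C, M * N)) {
    CopyB.assign(B, B + K * N);
    B = CopyB.data();
  }
  multiplyTiled(A, B, C, M, K, N);
}
```

Without the copies, storing a result tile into C could overwrite operand elements that a later tile still needs to load, which is why the alias check comes first.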
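The three steps of part 2 can be sketched as a toy driver. This is a simplified model under stated assumptions, not the pass itself: Inst, Result and lowerAll are hypothetical names, instructions are identified by their index in a vector, and "fusing" or "lowering" just records the instruction name.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical toy instruction; not an LLVM class.
struct Inst {
  std::string Name;
  bool HasShapeInfo;
  bool IsMultiply; // stands in for a @llvm.matrix.multiply call
  int LoadA;       // index of the load feeding operand A, or -1
  int LoadB;       // index of the load feeding operand B, or -1
  int Store;       // index of the single store user, or -1
};

struct Result {
  std::vector<std::string> Fused;   // multiplies handled by fusion
  std::vector<std::string> Lowered; // instructions lowered as usual
};

Result lowerAll(const std::vector<Inst> &Fn) {
  const int Size = static_cast<int>(Fn.size());

  // Step 1: collect instructions with shape info and fusion candidates.
  std::vector<int> MatrixInsts, Candidates;
  for (int I = 0; I < Size; ++I) {
    if (Fn[I].HasShapeInfo)
      MatrixInsts.push_back(I);
    if (Fn[I].IsMultiply)
      Candidates.push_back(I);
  }

  // Step 2: try to fuse candidates; remember what fusion eliminated.
  Result R;
  std::unordered_set<int> Eliminated;
  for (int C : Candidates) {
    const Inst &Mul = Fn[C];
    if (Mul.LoadA < 0 || Mul.LoadB < 0 || Mul.Store < 0)
      continue; // not a load/multiply/store chain, leave for step 3
    assert(Mul.LoadA < Size && Mul.LoadB < Size && Mul.Store < Size);
    R.Fused.push_back(Mul.Name);
    Eliminated.insert({C, Mul.LoadA, Mul.LoadB, Mul.Store});
  }

  // Step 3: lower every remaining matrix instruction as usual.
  for (int I : MatrixInsts)
    if (!Eliminated.count(I))
      R.Lowered.push_back(Fn[I].Name);
  return R;
}
```

Running fusion before regular lowering means the loads and the store of a fused chain are never lowered on their own, so no code is generated for them twice.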