This is an optimized approach for D94155 and D95136.
The previous code modeled the tile config register as a user of every
AMX instruction. This causes a problem when the tile config register is
spilled: across a function call, an ldtilecfg instruction may be inserted
before each AMX instruction that uses the tile config register, which
clobbers all tile data registers.
To fix this issue, we remove the tile config register from the model.
Instead, we analyze the AMX instructions between one call and the next,
and insert an ldtilecfg after the first of the two calls if we find any
AMX instructions in between.
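The region analysis described above can be sketched as follows. This is a simplified, hypothetical model, not the actual X86 pass: the function is flattened into one straight-line sequence (no basic blocks, no control flow), each instruction is reduced to a hypothetical `Kind` (call, AMX, or other), and `ldtilecfgInsertPoints` is an illustrative name. It assumes every call clobbers the tile configuration, and treats the function entry like a call for the region that precedes the first call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction classification for this sketch only.
enum class Kind { Call, AMX, Other };

// Returns positions where an ldtilecfg would be inserted. A position p means
// "insert before instruction p": p == c + 1 is right after the call at index c,
// and p == 0 is the function entry. At most one ldtilecfg is emitted per
// region, and only if the region contains an AMX instruction.
std::vector<std::size_t> ldtilecfgInsertPoints(const std::vector<Kind>& insts) {
  std::vector<std::size_t> points;
  std::size_t regionStart = 0;  // position just after the last call (or entry)
  bool configured = false;      // ldtilecfg already planned for this region?
  for (std::size_t i = 0; i < insts.size(); ++i) {
    switch (insts[i]) {
    case Kind::Call:
      // Assumed: the call clobbers the tile config, so a new region starts.
      regionStart = i + 1;
      configured = false;
      break;
    case Kind::AMX:
      if (!configured) {
        // Insertion points are strictly increasing across regions.
        assert(points.empty() || points.back() < regionStart);
        points.push_back(regionStart);
        configured = true;
      }
      break;
    case Kind::Other:
      break;
    }
  }
  return points;
}
```

For `{AMX, Other, Call, Other, Call, AMX, AMX, Call}` the sketch returns `{0, 5}`: the region between the first two calls has no AMX instruction and gets no ldtilecfg, while the next region needs only one, right after the call at index 4. The sketch makes a single pass over the instructions and ignores control flow, which a real implementation over basic blocks has to handle.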
Since you iterate over all MIs twice, the total complexity is 2 * N(BB) * M(MI), which is worse than D56136 (N * M + N).
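To make this comparison concrete, here is a worked example with illustrative numbers (N = 100 basic blocks and M = 50 instructions per block, chosen arbitrarily, not measured):

```latex
2NM = 2 \cdot 100 \cdot 50 = 10000, \qquad NM + N = 100 \cdot 50 + 100 = 5100,
\qquad \frac{2NM}{NM + N} = \frac{2M}{M + 1} = \frac{100}{51} \approx 1.96
```

Both costs are O(NM); the difference is a constant factor that approaches 2 as M grows.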