This is an optimized approach for D94155.
The previous code modeled each AMX instruction as a user of the tile
config register. That model causes a problem when the tile config
register is spilled: across a function call, an ldtilecfg instruction
may be inserted at every AMX instruction that uses the tile config
register, which clobbers all tile data registers.
To fix this issue, we remove the tile config register from the model.
Instead, we analyze the AMX instructions between one call and the next,
and insert ldtilecfg after the first call if we find any AMX instructions.
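As a rough illustration of the call-to-call scan described above, here is a
minimal sketch on a toy straight-line instruction list. It is not the code
from the patch: the `Op` enum and `insertConfigReloads` are hypothetical
names, and the real pass works on machine basic blocks rather than a flat
vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy instruction kinds; not LLVM opcodes.
enum class Op { Call, Amx, Other, LdTileCfg };

// For each call, look ahead to the next call (or the end of the list).
// If any AMX instruction appears in that region, reload the tile config
// once, right after the call.
std::vector<Op> insertConfigReloads(const std::vector<Op> &block) {
  std::vector<Op> out;
  for (std::size_t i = 0; i < block.size(); ++i) {
    out.push_back(block[i]);
    if (block[i] != Op::Call)
      continue;
    bool usesAmx = false;
    for (std::size_t j = i + 1; j < block.size() && block[j] != Op::Call; ++j) {
      if (block[j] == Op::Amx) {
        usesAmx = true;
        break;
      }
    }
    if (usesAmx)
      out.push_back(Op::LdTileCfg);
  }
  return out;
}
```

With this sketch, a region that holds several AMX instructions gets a single
reload after the call instead of one reload per AMX instruction, and a call
that is not followed by AMX code gets none.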
We also optimize the insertion of tilerelease by moving it to just after
the last AMX instruction of each branch. If the BB that contains the
last AMX instruction is in a loop, we insert tilerelease in its
successors instead.
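The tilerelease placement rule can be sketched the same way. This is a toy
model under stated assumptions, not the patch's implementation: `Block` and
`placeTileRelease` are hypothetical, the caller is assumed to already know
which block holds the last AMX instruction, and "its successors" is read
here as the successors that lie outside the loop.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy instruction kinds; not LLVM opcodes.
enum class Op { Amx, Other, TileRelease };

// Hypothetical toy basic block with a precomputed "is in a loop" flag.
struct Block {
  std::vector<Op> ops;
  bool inLoop;
  std::vector<std::size_t> succs; // indices into the CFG vector
};

// `idx` is assumed to be the block holding the last AMX instruction.
void placeTileRelease(std::vector<Block> &cfg, std::size_t idx) {
  Block &bb = cfg[idx];
  if (!bb.inLoop) {
    // Not in a loop: release right after the last AMX instruction.
    for (std::size_t i = bb.ops.size(); i > 0; --i) {
      if (bb.ops[i - 1] == Op::Amx) {
        bb.ops.insert(bb.ops.begin() + static_cast<std::ptrdiff_t>(i),
                      Op::TileRelease);
        return;
      }
    }
    return;
  }
  // In a loop: releasing inside the block would run on every iteration,
  // so release at the top of each successor outside the loop (assumption).
  for (std::size_t s : bb.succs) {
    if (!cfg[s].inLoop)
      cfg[s].ops.insert(cfg[s].ops.begin(), Op::TileRelease);
  }
}
```

The loop case is the reason for the successor rule: a tilerelease placed
inside the loop body would discard the tile configuration before the next
iteration's AMX instructions run.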
LGTM for the logic.