When we insert ldtilecfg, we require that AMX instructions and the defs
of their shapes do not interleave. However, this does not always hold,
either because users do not follow the AMX API model or because of
optimizations.
This patch adds a mechanism that tries to hoist the defs of AMX shapes as
well. It only hoists shapes inside a BB; we can extend it to cases across
BBs in the future. Currently, it only hoists shapes whose sources are all
defined above the first AMX instruction. We can improve this for the case
where the only source defined below the AMX instruction is a move of an
immediate value into a register.
ShapeBBs[MBB] may contain both the Row_def and the Col_def, and they may be in different BBs. Their Pos within their respective BBs may be equal, so using lower_bound here does not seem right, because it may insert X_def twice at line 201.
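
To illustrate the concern, here is a minimal standalone sketch, not the patch's code. ShapeDef, insertByPos and countAfterReinsert are hypothetical names, and the sketch assumes the list is kept sorted by Pos only and that only the element returned by lower_bound is checked before inserting. Under those assumptions, two different defs with the same Pos can hide each other and the same def gets inserted twice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a shape def record; not the patch's actual type.
struct ShapeDef {
  std::string Name; // e.g. "Row_def" or "Col_def"
  int BB;           // number of the block that contains the def
  int Pos;          // position of the def inside its own block
};

// Compare on Pos only, which is the assumption under discussion.
static bool lessByPos(const ShapeDef &A, const ShapeDef &B) {
  return A.Pos < B.Pos;
}

// Insert D keeping Defs sorted by Pos. The duplicate check looks only at the
// single element that lower_bound returns.
void insertByPos(std::vector<ShapeDef> &Defs, const ShapeDef &D) {
  auto It = std::lower_bound(Defs.begin(), Defs.end(), D, lessByPos);
  if (It != Defs.end() && It->Name == D.Name && It->BB == D.BB)
    return; // looks like it is already recorded
  Defs.insert(It, D);
}

// Records Row_def and Col_def from different blocks with equal Pos, then
// records Row_def a second time and returns the resulting list size.
std::size_t countAfterReinsert() {
  std::vector<ShapeDef> Defs;
  insertByPos(Defs, {"Row_def", 1, 3});
  insertByPos(Defs, {"Col_def", 2, 3}); // lands in front of Row_def
  insertByPos(Defs, {"Row_def", 1, 3}); // lower_bound now finds Col_def
  assert(std::is_sorted(Defs.begin(), Defs.end(), lessByPos));
  return Defs.size(); // 3: Row_def is present twice
}
```

One possible way to avoid this, offered only as a suggestion, is to compare on a key that identifies the def uniquely (for example the pair of BB and Pos), or to scan the whole range of equal-Pos entries before inserting.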