AMX shapes should be defined before the AMX intrinsics that use them.
However, in the case below, the shape a.row is defined after the tile
load of b. If we transform the load of b into the
@llvm.x86.tileloadd64 intrinsic, the shape dependency is not satisfied.

  void test_tile_dpbsud(__tile1024i a, __tile1024i b, __tile1024i c) {
    __tile_dpbsud(&c, a, b);
  }

This patch stores tile b to the stack and reloads it after the def of
b.row. This causes a redundant store/load, but it is a simple way to
avoid generating invalid IR.
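
The store-and-reload idea can be sketched in plain C as a toy model.
This is not the pass itself and uses no AMX instructions: toy_tile,
toy_tileload and toy_load_b_after_shape are hypothetical names made up
for illustration, and the 16 x 64 byte layout is only an assumption
about how a 1024-byte tile is arranged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { MAX_ROWS = 16, MAX_COLS = 64, TILE_BYTES = MAX_ROWS * MAX_COLS };

/* Hypothetical stand-in for __tile1024i: shape fields plus 1024 data bytes. */
typedef struct {
  uint16_t row;
  uint16_t col;
  uint8_t data[TILE_BYTES];
} toy_tile;

/* Hypothetical model of a shape-dependent tile load: copies `row` rows of
   `col` bytes each from `src` (row pitch `stride`) into `dst` (row pitch
   MAX_COLS). The caller must already know row and col. */
static void toy_tileload(uint8_t *dst, const uint8_t *src, uint16_t row,
                         uint16_t col, size_t stride) {
  assert(row <= MAX_ROWS && col <= MAX_COLS);
  memset(dst, 0, TILE_BYTES);
  for (uint16_t r = 0; r < row; ++r)
    memcpy(dst + (size_t)r * MAX_COLS, src + (size_t)r * stride, col);
}

/* Models the workaround: the early load of b stays shape-independent and
   is spilled to a stack slot; the shape-dependent reload is issued only
   after the shape values have been read. */
static void toy_load_b_after_shape(uint8_t *tile_reg, const toy_tile *b) {
  uint8_t stack_slot[TILE_BYTES];
  memcpy(stack_slot, b->data, TILE_BYTES); /* early load, stored to stack */
  uint16_t row = b->row;                   /* shape is defined only here */
  uint16_t col = b->col;
  toy_tileload(tile_reg, stack_slot, row, col, MAX_COLS); /* reload */
}
```

The extra copy through stack_slot corresponds to the redundant
store/load mentioned above: it costs a round trip through memory, but
the shape-dependent load never runs before its shape is known.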
A better way may be to hoist the def of b.row above the tile load
instruction, but recursively hoisting its operands seems more
complicated.