Since there is no tile copy instruction, we need to store the source
tile register to the stack and load it from the stack into the
destination tile register. We need an extra GR to hold the stride,
and a stack slot to hold the tile data. We run this pass after copy
propagation so that we don't miss any copy optimization, and
before prolog/epilog insertion so that the stack slot can still be
allocated.
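The rewrite described above can be sketched as a toy model. This is not the code in X86LowerTileCopy.cpp: `Inst`, `isTileReg`, `lowerTileCopies`, and the opcode names `MOV_IMM`, `TILE_STORE`, and `TILE_LOAD` are hypothetical stand-ins for the real MachineInstr-based implementation, and the stride of 64 assumes a full 16-row by 64-byte tile.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy instruction: an opcode plus textual operands. This is NOT LLVM's
// MachineInstr; it only models the shape of the rewrite.
struct Inst {
    std::string opcode;
    std::vector<std::string> operands;  // operands[0] is the destination
};

// Hypothetical helper: treat any register named "tmm*" as a tile register.
static bool isTileReg(const std::string &reg) {
    return reg.rfind("tmm", 0) == 0;
}

// Replace each tile-to-tile COPY with a store to a fresh stack slot followed
// by a load from that slot. strideReg names the extra GR that holds the
// stride; numSlots counts the stack slots handed out so far.
std::vector<Inst> lowerTileCopies(const std::vector<Inst> &in,
                                  const std::string &strideReg,
                                  int &numSlots) {
    assert(!strideReg.empty() && "a GR is needed to hold the stride");
    std::vector<Inst> out;
    for (const Inst &mi : in) {
        const bool isTileCopy = mi.opcode == "COPY" &&
                                mi.operands.size() == 2 &&
                                isTileReg(mi.operands[0]) &&
                                isTileReg(mi.operands[1]);
        if (!isTileCopy) {
            out.push_back(mi);  // leave everything else untouched
            continue;
        }
        const std::string slot = "slot" + std::to_string(numSlots++);
        // Assumption: 64-byte rows, so the stride between rows is 64.
        out.push_back({"MOV_IMM", {strideReg, "64"}});
        out.push_back({"TILE_STORE", {slot, strideReg, mi.operands[1]}});
        out.push_back({"TILE_LOAD", {mi.operands[0], slot, strideReg}});
    }
    return out;
}
```

Running this before frame layout is what lets each `slotN` become a real stack object, which matches the ordering constraint stated in the summary.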
Details
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86LowerTileCopy.cpp | ||
---|---|---|
2 | Comment is wrong. | |
10 | instructions | |
llvm/lib/Target/X86/X86RegisterInfo.cpp | ||
878 | Is it possible to define a special COPY for AMX which can implicitly define a register for stride? | |
llvm/lib/Target/X86/X86TargetMachine.cpp | ||
584 | This is much like how X87 register copies are handled in the "X86 FP Stackifier" pass, so I think we can add this pass in addPostRegAlloc, as is done for that one. | |
llvm/test/CodeGen/X86/AMX/amx-lower-tile-copy.ll | ||
38 | As we discussed, tilezero should be rematerialized instead of spilled. For non-tilezero cases, we still need to treat the spill as loop-invariant and hoist it out of the loop. Anyway, these are optimization thoughts that don't affect the functionality here. |
llvm/lib/Target/X86/X86RegisterInfo.cpp | ||
---|---|---|
878 | Also, COPY instructions are generated automatically by some passes. |