The previous solution depends on variable names to record the shape
information. However, this is not reliable, because in release builds the
compiler does not set the variable names. Names can be preserved with the
additional option -fno-discard-value-names, but requiring it is not
acceptable for users.
This patch preconfigures the tile registers with machine
instructions. It follows the same approach as single configure. In the
future we can fall back to multiple configure when single configure
fails due to the shape dependency issue.
The algorithm to configure the tile registers is simple in this patch; we
may improve it in the future. It configures tile registers per basic
block. The compiler spills a tile register if it is live out of the basic
block. After the configuration there should be no spill across a tile
configure during register allocation. Like fast register allocation,
the algorithm walks the instructions in reverse order. When the shape
dependency is not met, it inserts ldtilecfg after the last instruction
that defines the shape.
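The reverse walk described above can be sketched outside of LLVM as a toy model. This is only an illustration under stated assumptions, not the code in X86FastPreTileConfig.cpp: `ToyInst` and `insertConfigs` are hypothetical names, instructions are plain strings, shape definitions are referenced by index within the block, and spilling of live-out tiles is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy instruction, not an LLVM type.
struct ToyInst {
  std::string Name;
  bool IsTile;                // AMX instruction that needs a valid tile config
  std::vector<int> ShapeDefs; // block-local indices of its row/col definitions
};

// Walk the block bottom-up. Tile instructions seen so far are "pending" until
// we reach the lowest instruction that defines one of their shapes; a
// ldtilecfg is then placed right after that definition.
std::vector<std::string> insertConfigs(const std::vector<ToyInst> &Block) {
  const int N = static_cast<int>(Block.size());
  std::vector<bool> ConfigAfter(Block.size(), false);
  bool Pending = false;  // some tile instruction below still lacks a config
  int LastShapeDef = -1; // lowest shape definition the pending tiles depend on

  for (int I = N - 1; I >= 0; --I) {
    if (Pending && I == LastShapeDef) {
      ConfigAfter[I] = true; // shape is known from here on: configure
      Pending = false;
      LastShapeDef = -1;
    }
    if (Block[I].IsTile) {
      Pending = true;
      for (int D : Block[I].ShapeDefs) {
        assert(D >= 0 && D < I && "shape must be defined before its use");
        LastShapeDef = std::max(LastShapeDef, D);
      }
    }
  }

  std::vector<std::string> Out;
  // Shapes defined outside the block: configure at the block entry.
  if (Pending)
    Out.push_back("ldtilecfg");
  for (int I = 0; I < N; ++I) {
    Out.push_back(Block[I].Name);
    if (ConfigAfter[I])
      Out.push_back("ldtilecfg");
  }
  return Out;
}
```

For a block `def row; def col; tilezero; tilestored` the sketch places a single ldtilecfg after `def col`. A block in which a second shape is defined between two tile instructions gets two ldtilecfg instructions, which is why one configure per basic block is not always enough.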
In post configuration the compiler also walks the basic block to collect the
physical tile register numbers and generates instructions to fill the stack
slot with the corresponding shape information.
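As a rough illustration of what "filling the stack slot" means, the sketch below builds the 64-byte memory image that ldtilecfg reads, using the architectural layout (palette in byte 0, 16-bit column bytes per tile starting at byte 16, 8-bit row counts starting at byte 48). `ToyShape` and `buildTileConfig` are hypothetical names; the patch itself emits machine instructions that store these values rather than calling a helper like this.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: the shape assigned to one physical tile register.
struct ToyShape {
  unsigned TileReg;       // physical tile register number, 0..7
  std::uint8_t Rows;      // number of rows, at most 16 in palette 1
  std::uint16_t ColBytes; // bytes per row, at most 64 in palette 1
};

// Build the 64-byte configuration image consumed by ldtilecfg.
std::array<std::uint8_t, 64>
buildTileConfig(const std::vector<ToyShape> &Shapes) {
  std::array<std::uint8_t, 64> Cfg{}; // all reserved bytes stay zero
  Cfg[0] = 1;                         // palette 1
  for (const ToyShape &S : Shapes) {
    assert(S.TileReg < 8 && S.Rows <= 16 && S.ColBytes <= 64);
    // Column bytes are stored little-endian, two bytes per tile.
    Cfg[16 + 2 * S.TileReg] = static_cast<std::uint8_t>(S.ColBytes & 0xff);
    Cfg[16 + 2 * S.TileReg + 1] = static_cast<std::uint8_t>(S.ColBytes >> 8);
    // Row counts are stored one byte per tile.
    Cfg[48 + S.TileReg] = S.Rows;
  }
  return Cfg;
}
```

Tile registers that are not listed keep zero rows and columns, which leaves them unconfigured.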
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86FastPreTileConfig.cpp | ||
---|---|---|
440 | It may be better to add comments ahead of the key functions that show the transformation steps for a simple case. |
llvm/lib/Target/X86/X86FastPreTileConfig.cpp | ||
---|---|---|
393–394 | We need to reload after a function call too. |
- Fix stack slot address of circular phi.
- Handle config across function call.
- Address Phoebe and Xiang's comments.
llvm/lib/Target/X86/X86FastPreTileConfig.cpp | ||
---|---|---|
408 | Good catch. I'll fix it. |
llvm/lib/Target/X86/X86FastPreTileConfig.cpp | ||
---|---|---|
408 | I'm not sure it is a well-formed canonical phi node, but I created such a case. |
llvm/lib/Target/X86/X86FastPreTileConfig.cpp | ||
---|---|---|
356 | Yes, but there should be no side effect from setting it to false. The kill flag should be calculated during register allocation. | |
357–358 | The kill flag should be calculated during register allocation. We need to analyze the live range to know the kill flag. Here I think it is not important to set the kill flag. |
Address Xiang's comments.
- Canonicalize the phi node.
- Trace the live-out registers, as we may miss a spill when a phi is transformed to a tileload and the phi is deleted.
llvm/lib/Target/X86/X86FastPreTileConfig.cpp | ||
---|---|---|
551 | I am afraid that, even without a call, one ldtilecfg is not enough for one MBB. | |
565 | I'm not sure that there can be no "COPY" for a tile here (T1 = COPY T0). | |
575 | Sorry, in my understanding, the comment does not match the code. // tilezero // def row // def col <- LastShapeMI // ldtilecfg <- insert // tilezero(row, col) | |
605 | We should avoid duplicated reloads. For example, do not regenerate the reload for line 2: 1 T0 = TileLoad 2 TileUse T0 |
llvm/lib/Target/X86/X86FastPreTileConfig.cpp | ||
---|---|---|
551 | That would run out of registers. Currently we have volatile tiles in the "lower amx type" pass. We can improve it later and disable volatile tiles in that pass. | |
565 | There are only phi or copy besides the AMX intrinsics. We handle phi separately. "COPY" may be generated after phi elimination. OK, let me add an assert here. | |
575 | Let me update the comments. | |
605 | How about optimizing the reload in another patch? |
llvm/lib/Target/X86/X86FastPreTileConfig.cpp | ||
---|---|---|
551 | And do we need to consider a shape number greater than the "max reg num of ldtilecfg (8)" at the current stage? It is possible in a big BB. |
llvm/lib/Target/X86/X86TargetMachine.cpp | ||
---|---|---|
424 | Remove. |
Mark some TODOs for your planning.
llvm/test/CodeGen/X86/AMX/amx-across-func.ll | ||
---|---|---|
619 | We may need to reconfig if there is a tilerelease. |
llvm/test/CodeGen/X86/AMX/amx-across-func.ll | ||
---|---|---|
619 | Config is done per function. Each function that uses AMX should reconfig. If it doesn't touch AMX, no config is required. |
FYI this has some noticeable compile-time impact on O0 builds: http://llvm-compile-time-tracker.com/compare.php?from=62a9b36fcf728b104ea87e6eb84c0be69b779df7&to=496156ac57da3abd9c8a6dc422852b7bdfaa448f&stat=instructions
Maybe it's possible to skip more of X86FastPreTileConfig early if tile registers are not used (i.e. almost always)?
I can reproduce the regression with "llc -mtriple=aarch64 test/DebugInfo/Generic/two-cus-from-same-file.ll -O0 -o -". This is an ISel bug on "DBG_VALUE"; I'll file a bug for it.
Forgot to remove?