We won't fold bitcast for the tile type, because there is no way to
assign a tmm register from a constant. We manually generate tilestore
and tileload in the "Lower AMX type" pass.
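For context, here is a minimal IR sketch (not taken from the pass itself) of the kind of rewrite "Lower AMX type" performs on such a bitcast; the shapes, strides, and value names are illustrative only:

```
; Before lowering: a plain vector is bitcast to the tile type.
%amx = bitcast <256 x i32> %vec to x86_amx

; After lowering (conceptually): the vector is spilled to a stack slot and
; reloaded as a tile, because a tmm register cannot be written directly
; from a vector or a constant.
%slot = alloca <256 x i32>, align 64
store <256 x i32> %vec, <256 x i32>* %slot, align 64
%ptr  = bitcast <256 x i32>* %slot to i8*
%tile = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col, i8* %ptr, i64 64)
```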
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
clang/test/CodeGen/X86/amx_api.c, line 39:
We usually write like this: __tile1024i c = {row, col};
llvm/lib/Analysis/ConstantFolding.cpp, line 101:
API for this function always returns non-null: can we return ConstantExpr::getBitCast(C, DestTy) instead? Then the change in SCCP is not needed either.
At clang/test/CodeGen/X86/amx_api.c:
Probably we need a .ll test case for constant folding.
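Something along these lines might serve as a starting point; the RUN line, function name, and tile shape below are only a sketch and are not taken from the patch:

```
; RUN: opt -sccp -S < %s | FileCheck %s

define void @no_fold_amx_bitcast(i8* %buf) {
; The bitcast of a constant vector to x86_amx should survive constant folding.
; CHECK: bitcast <256 x i32> zeroinitializer to x86_amx
  %amx = bitcast <256 x i32> zeroinitializer to x86_amx
  call void @llvm.x86.tilestored64.internal(i16 16, i16 64, i8* %buf, i64 64, x86_amx %amx)
  ret void
}

declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx)
```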
llvm/lib/Analysis/ConstantFolding.cpp, line 108:
assign
Constant folding is done in CSE and SCCP, which are both passes that Clang runs at O2.
llvm/lib/Analysis/ConstantFolding.cpp, line 101:
I tried; SCCP will also fold the bitcast into the following instruction.
llvm/lib/Analysis/ConstantFolding.cpp, line 101:
I see, that makes sense. But are we sure that all callers of FoldBitCast are doing a null check: for example, FoldReinterpretLoadFromConstPtr calls FoldBitCast several times, and null is not checked before dereference. Maybe the AMX type cannot happen in this case?
llvm/lib/Analysis/ConstantFolding.cpp, line 101:
Right! Let me check the callers of FoldBitCast.
I strongly suggest you bring up this ongoing creep of if (DestTy->isX86_AMXTy()) return false; checks on llvm-dev.
I strongly suspect you are covering up bugs in your backend/pass with them.
Sorry, I don't quite understand your idea. I happened to find this bug while supporting fast register allocation for AMX.
Constant folding folds the constant bitcast of the tile type into an AMX instruction, which then escapes the back-end pass "Lower AMX type for Load/Store".
Hi @lebedev.ri, do you think the target-specific type (X86_AMXTy) breaks the beauty of target-independent code in the mid-end?
I think that is a traditional backend problem that the pass will just have to be updated to deal with.
Hi @lebedev.ri, it seems there are some misunderstandings; let me first point out the problem:
- All AMX operations should use AMX intrinsics, so we need to handle the bitcast from a constant vector to the AMX type specially (not with a normal load/store). This work is done in the back-end pass "Lower AMX type for Load/Store" by checking the bitcast instruction.
- If the mid-end folds this bitcast into an instruction, the back-end pass "Lower AMX type for Load/Store" will not find it; see the IR sketch below. (Of course, we could check every operand of every instruction to find the AMX bitcast, but that is not a good approach; simply not folding it in the mid-end is better.)
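To make the failure mode concrete, here is an illustrative IR sketch (not taken from the actual test case) of the two forms:

```
; What the back-end pass expects to see: a standalone bitcast instruction
; that it can rewrite into a tile load.
%amx = bitcast <256 x i32> zeroinitializer to x86_amx
call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 64, x86_amx %amx)

; What it sees after the mid-end folds the constant: the bitcast has become a
; ConstantExpr operand of the intrinsic call, so a pass that only walks
; bitcast instructions never finds it.
call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 64,
    x86_amx bitcast (<256 x i32> zeroinitializer to x86_amx))
```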
That is obvious.
Discuss what? Can you point it out clearly?
Is the topic whether to do it in the mid-end or in the back-end?
The ongoing special-casing of X86_AMXTy throughout LLVM, due to the inability of the existing backend passes to handle certain LLVM IR constructs.
We have brought this up on llvm-dev.
BTW, all types should be seen as target-independent, even if a type is supported by fewer targets, or only one.
Currently we accept that "if (Ty.isVectorTy()) {…}" makes sense in the mid-end.
Why can't we accept that "if (Ty.isX86_AMXTy()) {…}" makes sense as well?
Just because more targets support VectorTy and fewer targets (only x86) support AMXTy?
That logic does not make sense.
One thing to note is that there appears to be no documentation of x86_amx in the langref (maybe there is, but searching for x86_amx did not surface anything). Without that, making any decisions based on the type in general LLVM optimizations seems problematic, because the type is effectively not specified.