This is an archive of the discontinued LLVM Phabricator instance.

[AMX] Not fold constant bitcast into amx intrisic
AbandonedPublic

Authored by xiangzhangllvm on Mar 16 2021, 8:12 PM.

Details

Summary

We won't fold bitcast for tile type, becasue there is no way to
assignee a tmm reg from a constant. We manually generate tilestore
and tileload at pass "Lower AMX type".

Diff Detail

Event Timeline

xiangzhangllvm created this revision.Mar 16 2021, 8:12 PM
xiangzhangllvm requested review of this revision.Mar 16 2021, 8:12 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 16 2021, 8:12 PM
xiangzhangllvm added inline comments.Mar 16 2021, 8:35 PM
clang/test/CodeGen/X86/amx_api.c
39

we usually write like this __tile1024i c = {row, col};
rm {1,2,3} will also see as {row, col, {0,...}}

Would you add a test case for it?

Would you add a test case for it?

at clang/test/CodeGen/X86/amx_api.c

clin1 added inline comments.Mar 16 2021, 9:19 PM
llvm/lib/Analysis/ConstantFolding.cpp
101

API for this function always returns non-null: can we return ConstantExpr::getBitCast(C,DestTy) instead? Then the change in SCCP is not needed either.

at clang/test/CodeGen/X86/amx_api.c

Probably we need a .ll test case to for constant folding.

llvm/lib/Analysis/ConstantFolding.cpp
108

assign

Probably we need a .ll test case to for constant folding.

Fold constant is done in CSE and SCCP which are both passes run in Clang (O2)

llvm/lib/Analysis/ConstantFolding.cpp
101

I tried, the SCCP will also fold the bitcast into the following instruction.

clin1 added inline comments.Mar 16 2021, 10:47 PM
llvm/lib/Analysis/ConstantFolding.cpp
101

I see, that makes sense. But are we sure that all callers of FoldBitCast are doing a null check: for example, FoldReinterpretLoadFromConstPtr calls FoldBitCast several times, and null is not checked before dereference. Maybe the AMX type cannot happen in this case?
Alternative: can AMX be checked in SCCP?

llvm/lib/Analysis/ConstantFolding.cpp
101

Right! let me check the callers of FoldBitCast,
Luckly, there is only several callers of FoldBitCast, and almost all in this file ConstantFolding.cpp.

lebedev.ri requested changes to this revision.EditedMar 17 2021, 12:15 AM
lebedev.ri added a subscriber: lebedev.ri.

I strongly suggest you bring up this ongoing creep of if (DestTy->isX86_AMXTy()) return false; on llvm-dev.
I strongly supsect you are covering up bugs in you backend/pass with them.

clang/test/CodeGen/X86/amx_api.c
5
  1. This should be a SCCP test
  2. clang tests should not test llvm optimizations
This revision now requires changes to proceed.Mar 17 2021, 12:15 AM
xiangzhangllvm added a comment.EditedMar 17 2021, 12:34 AM

I strongly suggest you bring up this ongoing creep of if (DestTy->isX86_AMXTy()) return false; on llvm-dev.
I strongly supsect you are covering up bugs in you backend/pass with them.

Sorry, I don't much understand your idea, I happen to find this bug when I supporting fast reg allocation for AMX.
It fold the Constant bitcast of tile type into a amx instruction, which will escape the BackEnd pass "Lower AMX type for Load/Store"

Hi @lebedev.ri Do you think the target-special type (X86_AMXTy) broken the beauty of target-independent code at mid-end ?

I strongly suggest you bring up this ongoing creep of if (DestTy->isX86_AMXTy()) return false; on llvm-dev.
I strongly supsect you are covering up bugs in you backend/pass with them.

Sorry, I don't much understand your idea, I happen to find this bug when I supporting fast reg allocation for AMX.
It fold the Constant bitcast of tile type into a amx instruction, which will escape the BackEnd pass "Lower AMX type for Load/Store"

I think that is a traditional backend problem that the pass will just have to be updated to deal with.

Hi @lebedev.ri Do you think the target-special type (X86_AMXTy) broken the beauty of target-independent code at mid-end ?

I think that is a traditional backend problem that the pass will just have to be updated to deal with.

Hi @lebedev.ri , seems there is some mistakes, let me first point out the problem:

  1. All AMX operation should use AMX intrinsic,

So we need specially handle the bitcast from Constant vector to AMX type. (Not use normal load / store)
This work is done at Back-End pass "Lower AMX type for Load/Store" by checking the bitcast instruction.

  1. If Mid-End fold this bitcast into a instruction, currently, the Back-End pass "Lower AMX type for Load/Store" will no find it.

(of course, we can check every operands of every instruction to find out the amx bitcast, but it not good job, directly let it not folding in mid-end is better)

Once again, i suggest to bring this up on llvm-dev.

Once again, i suggest to bring this up on llvm-dev.

That is obvious,
Discuss what, can you point it out clearly ?
The topic is do it in mid-end or back-end ?

Once again, i suggest to bring this up on llvm-dev.

That is obvious,
Discuss what, can you point it out clearly ?

The ongoing special-casing of X86_AMXTy through the llvm due to the inability of the existing backend passes to handle certain llvm ir constructs.

The topic is do it in mid-end or back-end ?

The ongoing special-casing of X86_AMXTy through the llvm due to the inability of the existing backend passes to handle certain llvm ir constructs.

We have bring up it to llvm-dev.
BTW, All the Type should see as target independent. (Even it support by less targets or 1 target)

Current we see “ if (Ty.isVectorTy()) {…}” is make sense in Mid-End.
Why we can’t see “if (Ty.isX86_AMXTy()){…}” is make sense ?

Just because more targets support the VectorTy, less target (only x86) support the AMXTy ?
The logic is not make sense.

fhahn added a subscriber: fhahn.Mar 18 2021, 2:19 AM

The ongoing special-casing of X86_AMXTy through the llvm due to the inability of the existing backend passes to handle certain llvm ir constructs.

We have bring up it to llvm-dev.
BTW, All the Type should see as target independent. (Even it support by less targets or 1 target)

Current we see “ if (Ty.isVectorTy()) {…}” is make sense in Mid-End.
Why we can’t see “if (Ty.isX86_AMXTy()){…}” is make sense ?

One thing to note is that there appears to be no documentation of x86_amx in the langref (maybe there is, but searching for x86_amx did not surface anything). Without that, making any decisions based on the type in general LLVM optimizations seems problematic, because the type is effectively not specified.

xiangzhangllvm abandoned this revision.Apr 13 2021, 5:15 PM