The AMX combiner currently stores the undef or zero value to the stack and then invokes tileload to load the data into a tile register. To avoid this store/load round trip, we can materialize the undef or zero value directly with tilezero.
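As a reader-oriented illustration (not code from the patch itself), here is a minimal sketch of the rewrite, assuming the tileloadd64 call's source buffer is known to hold only zeros; the helper name is hypothetical, and the shape operands of the load are simply reused for the tilezero call:

```cpp
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsX86.h"

using namespace llvm;

// Hypothetical sketch: replace a tileloadd64 of a zero-filled stack slot
// with a tilezero of the same shape.
//
//   Before:  store <256 x i32> zeroinitializer, <256 x i32>* %slot
//            %t = call x86_amx @llvm.x86.tileloadd64.internal(
//                     i16 %row, i16 %col, i8* %ptr, i64 64)
//   After:   %t = call x86_amx @llvm.x86.tilezero.internal(i16 %row, i16 %col)
static Value *replaceZeroLoadWithTileZero(IntrinsicInst *TileLoad) {
  IRBuilder<> Builder(TileLoad);
  // tileloadd64.internal(row, col, ptr, stride): reuse the shape operands.
  Value *Row = TileLoad->getArgOperand(0);
  Value *Col = TileLoad->getArgOperand(1);
  Function *TileZero = Intrinsic::getDeclaration(
      TileLoad->getModule(), Intrinsic::x86_tilezero_internal);
  Value *Zero = Builder.CreateCall(TileZero, {Row, Col});
  TileLoad->replaceAllUsesWith(Zero);
  TileLoad->eraseFromParent();
  return Zero;
}
```

The now-dead store and stack slot can then be cleaned up by existing dead-store and stack-slot elimination.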
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Target/X86/X86LowerAMXType.cpp | |
|---|---|
| 81 | Can we implement this with "reverse thinking"? |
| 186 | Here we only fetch the first use. |
| llvm/lib/Target/X86/X86LowerAMXType.cpp | |
|---|---|
| 81 | Most of the AMX intrinsics return x86_amx; an exception is tilestored64_internal, which returns void. Maybe we could add an llvm.x86.amx name prefix to each AMX intrinsic, so that we can distinguish AMX intrinsics by name? |
| 186 | We don't traverse all nodes; that would be more complex. Here we just traverse the first user. If we can find the shape, we return it; otherwise we return nullptr and abandon the optimization. |
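A hedged sketch of the first-user shape lookup described in the comment above (the helper name and the single intrinsic case handled here are illustrative, not the patch's actual code):

```cpp
#include <utility>

#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/IntrinsicsX86.h"

using namespace llvm;

// Illustrative helper: look only at the first user of an x86_amx value and
// try to read its (row, col) shape from that user's operands; give up with
// {nullptr, nullptr} otherwise.
static std::pair<Value *, Value *> getShapeFromFirstUser(Value *AMXVal) {
  if (AMXVal->use_empty())
    return {nullptr, nullptr};
  auto *II = dyn_cast<IntrinsicInst>(*AMXVal->user_begin());
  if (!II)
    return {nullptr, nullptr};
  // tilestored64.internal(row, col, ptr, stride, tile): the stored tile's
  // shape is given by the first two operands.
  if (II->getIntrinsicID() == Intrinsic::x86_tilestored64_internal)
    return {II->getArgOperand(0), II->getArgOperand(1)};
  // Other AMX users would need their own operand mapping; bail out here.
  return {nullptr, nullptr};
}
```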
| llvm/lib/Target/X86/X86LowerAMXType.cpp | |
|---|---|
| 81 | Re-adding the prefix is OK, but it is a big job. |
| 186 | I think that is no problem. |
| llvm/lib/Target/X86/X86LowerAMXType.cpp | |
|---|---|
| 81 | Good idea. Updated the patch. |
Can we implement this with "reverse thinking"?
For example: use the x86_amx tile type and exclude the non-AMX-intrinsic instructions (cast, copy, ...).
That way we may not need to touch this code when new AMX intrinsics are added.
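A minimal sketch of that suggestion, assuming AMX values are identified purely by the x86_amx IR type (both helper names are hypothetical):

```cpp
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Type.h"

using namespace llvm;

// Hypothetical predicates for the "reverse thinking" approach: classify by
// the x86_amx type rather than by enumerating every AMX intrinsic.
static bool isAMXValue(const Value *V) {
  return V->getType()->isX86_AMXTy();
}

static bool isNonIntrinsicAMXUse(const Instruction *I) {
  // Casts/copies of AMX values are the only non-intrinsic producers we
  // expect; everything else yielding x86_amx should be an AMX intrinsic.
  return isAMXValue(I) && !isa<IntrinsicInst>(I);
}
```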