This is an archive of the discontinued LLVM Phabricator instance.

[X86][AMX] Materialize undef or zero value to tilezero
ClosedPublic

Authored by LuoYuanke on Mar 30 2022, 3:35 AM.

Details

Summary

The AMX combiner would store undef or zero to stack and invoke tileload
to load the data to tile register. To avoid the store/load, we can
materialize undef or zero value to tilezero.
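
As a rough illustration of the idea in the summary, the following self-contained C++ sketch models the decision with toy types. `ToyValue`, `Lowering`, and `chooseLowering` are hypothetical names made up for this example; they are not LLVM types and not the code in this patch.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Toy stand-ins for illustration only; none of these are LLVM types.
enum class ValueKind { Undef, Zero, Other };
enum class Lowering { TileZero, StoreThenTileLoad };

struct ToyValue {
  ValueKind kind;
  // Tile shape (rows, columns), if one could be recovered from a user.
  std::optional<std::pair<int, int>> shape;
};

// Undef or zero values whose shape is known can be materialized with a
// tilezero; everything else keeps the store-to-stack + tileload path.
Lowering chooseLowering(const ToyValue &v) {
  const bool isUndefOrZero =
      v.kind == ValueKind::Undef || v.kind == ValueKind::Zero;
  if (isUndefOrZero && v.shape.has_value())
    return Lowering::TileZero;
  return Lowering::StoreThenTileLoad;
}
```

In this model, a zero value with a known shape selects `Lowering::TileZero`, while a zero value without a recoverable shape falls back to `Lowering::StoreThenTileLoad`.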

Event Timeline

LuoYuanke created this revision. Mar 30 2022, 3:35 AM
Herald added a project: Restricted Project. Mar 30 2022, 3:35 AM
LuoYuanke requested review of this revision. Mar 30 2022, 3:35 AM
LuoYuanke updated this revision to Diff 419096. Mar 30 2022, 3:51 AM

Check the use_empty() in getShape().

xiangzhangllvm added inline comments. Mar 30 2022, 6:44 PM
llvm/lib/Target/X86/X86LowerAMXType.cpp
81

Can we implement this the other way around ("reverse thinking")?
For example:
treat any instruction that uses an x86_amx tile as an AMX intrinsic, and exclude the non-AMX-intrinsic instructions (cast, copy, ...).
That way we may not need to touch this code when we add new AMX intrinsics.

184

a1: This only fetches the first use.
a2: At line 191, will it fail if "V->use_empty()" is true for its user?
a3: What if the second use can provide a shape but the first cannot?

LuoYuanke added inline comments. Mar 30 2022, 8:34 PM
llvm/lib/Target/X86/X86LowerAMXType.cpp
81

Most AMX intrinsics return x86_amx; an exception is tilestored64_internal, which returns void. Maybe we could add an llvm.x86.amx name prefix to each AMX intrinsic, so that we can distinguish AMX intrinsics by name?

184

We don't traverse all the nodes, since that is more complex. Here we only follow the first user. If we can find a shape, we return it; otherwise we return nullptr and abandon the optimization.
In most cases a value has a user, so a shape can usually be found. If not, we just abandon the optimization. I think we can enhance this when there is a use case that needs to be optimized. How about adding a TODO for it?
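
The first-user strategy described above can be sketched with a small self-contained C++ model. `ToyShape`, `ToyUser`, `ToyTileValue`, and `getShapeFromFirstUser` are hypothetical names for this example only; the real getShape() works on LLVM IR values, and this sketch assumes that only the first user is inspected.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy stand-ins for illustration only; none of these are LLVM types.
struct ToyShape {
  int rows;
  int cols;
};

struct ToyUser {
  // Shape this user implies for the value, if any.
  std::optional<ToyShape> shape;
};

struct ToyTileValue {
  std::vector<ToyUser> users;
};

// Look only at the first user. A value with no users, or whose first user
// carries no shape, yields no shape and the caller abandons the optimization.
std::optional<ToyShape> getShapeFromFirstUser(const ToyTileValue &v) {
  if (v.users.empty())
    return std::nullopt;
  return v.users.front().shape;
}
```

Note that in this model a value whose second user carries a shape but whose first does not still yields no shape, which is the limitation raised in point a3 and the motivation for a TODO.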

llvm/lib/Target/X86/X86LowerAMXType.cpp
81

Re-adding the prefix is OK, but it is a big job.
tilestored64_internal also takes an x86_amx operand.
AMX intrinsics are the main instructions that use or define x86_amx data.
So we can check both the return type and the operand types, then exclude the few non-AMX-intrinsic instructions.
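
A minimal sketch of this type-based check, written as a self-contained C++ toy model. `ToyType`, `ToyOpcode`, `ToyInst`, and `isAmxIntrinsicByType` are hypothetical names for illustration; the actual patch operates on LLVM instructions and may differ in detail.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for illustration only; none of these are LLVM types.
enum class ToyType { X86Amx, Void, Other };
enum class ToyOpcode { IntrinsicCall, Cast, Copy, Other };

struct ToyInst {
  ToyOpcode opcode;
  ToyType result;
  std::vector<ToyType> operands;
};

// True if the instruction defines or uses x86_amx data.
bool touchesAmxTile(const ToyInst &inst) {
  if (inst.result == ToyType::X86Amx)
    return true;
  for (ToyType t : inst.operands)
    if (t == ToyType::X86Amx)
      return true;
  return false;
}

// "Reverse thinking": anything that touches x86_amx counts as an AMX
// intrinsic, except the few known non-intrinsic instructions (cast, copy).
bool isAmxIntrinsicByType(const ToyInst &inst) {
  if (inst.opcode == ToyOpcode::Cast || inst.opcode == ToyOpcode::Copy)
    return false;
  return touchesAmxTile(inst);
}
```

With this model, a store-like intrinsic that returns void but takes an x86_amx operand is still classified as an AMX intrinsic, while a cast that produces x86_amx is excluded.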

184

I think that is fine.

LuoYuanke updated this revision to Diff 419369. Mar 31 2022, 2:04 AM

Address Xiang's comments. Thanks, Xiang.

LuoYuanke added inline comments. Mar 31 2022, 2:05 AM
llvm/lib/Target/X86/X86LowerAMXType.cpp
81

Good idea. Updated the patch.

LuoYuanke updated this revision to Diff 419372. Mar 31 2022, 2:17 AM

Add TODO in getShape().

This revision is now accepted and ready to land. Mar 31 2022, 3:01 AM
This revision was landed with ongoing or failed builds. Mar 31 2022, 4:11 AM
This revision was automatically updated to reflect the committed changes.