Page MenuHomePhabricator

[X86] Add x86_amx type for intel AMX.
Needs ReviewPublic

Authored by LuoYuanke on Sat, Nov 21, 4:51 PM.

Details

Summary

The x86_amx is used for AMX intrinsics. <256 x i32> is bitcasted to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcasted to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. The new type x86_amx can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.

Diff Detail

Unit TestsFailed

TimeTest
380 mslinux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp

Event Timeline

LuoYuanke created this revision.Sat, Nov 21, 4:51 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald Transcript
LuoYuanke requested review of this revision.Sat, Nov 21, 4:51 PM
LuoYuanke retitled this revision from [X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics... to [X86] Add x86_amx type for intel AMX. .Sat, Nov 21, 4:54 PM
LuoYuanke edited the summary of this revision. (Show Details)

I only took a quick pass through this so far. What happens if a bitcast between x86amx and v256i32(or any other 1024-bit vector type) exists in the IR but isn't next to a load/store?

llvm/lib/Target/X86/X86ISelLowering.cpp
5331–5332

Should this just be deleted?

llvm/lib/Target/X86/X86LowerAMXType.cpp
127

Don't use an assert to check the result of a dyn_cast. If it shouldn't fail just use cast<LoadInst> which will assert internally.

135

Unchecked dyn_cast

158

Use cast.

I only took a quick pass through this so far. What happens if a bitcast between x86amx and v256i32(or any other 1024-bit vector type) exists in the IR but isn't next to a load/store?

@craig.topper , thank you for reviewing my patch.
I think if user just use our external API, such IR won't be generated. However if there is such IR, we can transform bitcast to <store, load>, so that the type can be translated through memory. One of <store, load> is AMX intrinsics store/load, so it won't be optimized. Is it reasonable?

I only took a quick pass through this so far. What happens if a bitcast between x86amx and v256i32(or any other 1024-bit vector type) exists in the IR but isn't next to a load/store?

@craig.topper , thank you for reviewing my patch.
I think if user just use our external API, such IR won't be generated. However if there is such IR, we can transform bitcast to <store, load>, so that the type can be translated through memory. One of <store, load> is AMX intrinsics store/load, so it won't be optimized. Is it reasonable?

Its fine if its not optimized, just make sure it doesn't crash.

LuoYuanke updated this revision to Diff 307298.Tue, Nov 24, 3:08 AM
LuoYuanke retitled this revision from [X86] Add x86_amx type for intel AMX. to [X86] Add x86_amx type for intel AMX..

Address Craig's comments.

LuoYuanke updated this revision to Diff 307300.Tue, Nov 24, 3:16 AM

Address Craig's comments.
Change dyn_cast to cast.

LuoYuanke marked 3 inline comments as done.Tue, Nov 24, 3:19 AM
pengfei added inline comments.Tue, Nov 24, 4:30 AM
llvm/lib/IR/DataLayout.cpp
819

Should be 512 bits?

llvm/lib/Target/X86/X86LowerAMXType.cpp
115

Why the alignment not be 64?

pengfei added inline comments.Tue, Nov 24, 4:43 AM
llvm/lib/Target/X86/X86LowerAMXType.cpp
162

%src is not used here.

llvm/utils/TableGen/IntrinsicEmitter.cpp
252

Remove ,

craig.topper added inline comments.Tue, Nov 24, 10:45 AM
llvm/lib/Target/X86/X86LowerAMXType.cpp
108–109

Shouldn't this be in the function's entry block?

125

Just use Value. auto doesn't add any value other than shortening by 1 character.

176

maximun->maximum

180

Use Builder.getInt8PtrTy then you don't need Ctx

LuoYuanke marked an inline comment as done.Tue, Nov 24, 9:39 PM
LuoYuanke added inline comments.
llvm/lib/IR/DataLayout.cpp
819

Yes. It is 512. Thanks.

llvm/lib/Target/X86/X86LowerAMXType.cpp
108–109

Yes. It is in function's entry block. It is done in line 48 of function CreateAllocaInst(). CreateAllocaInst() is actually copied from your code. :)

115

1024 is conservatives, because vector require the alignment to be the vector size. Here generate vector <256 x i32> load/store.

LuoYuanke updated this revision to Diff 307514.Tue, Nov 24, 9:46 PM
LuoYuanke marked an inline comment as done.

Address Craig and Pengfei's comments.