The x86_amx is used for AMX intrinsics. <256 x i32> is bitcasted to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcasted to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. The new type x86_amx can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.
|380 ms||linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp|
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang --driver-mode=g++ -m64 -gline-tables-only -fsanitize=hwaddress -fuse-ld=lld -mcmodel=large -mllvm -hwasan-globals -mllvm -hwasan-use-short-granules -mllvm -hwasan-instrument-landing-pads=0 -mllvm -hwasan-instrument-personality-functions /mnt/disks/ssd0/agent/llvm-project/compiler-rt/test/hwasan/TestCases/sizes.cpp -nostdlib++ -lstdc++ -o /mnt/disks/ssd0/agent/llvm-project/build/projects/compiler-rt/test/hwasan/X86_64/TestCases/Output/sizes.cpp.tmp
I only took a quick pass through this so far. What happens if a bitcast between x86amx and v256i32(or any other 1024-bit vector type) exists in the IR but isn't next to a load/store?
Should this just be deleted?
Don't use an assert to check the result of a dyn_cast. If it shouldn't fail just use cast<LoadInst> which will assert internally.
@craig.topper , thank you for reviewing my patch.
I think if user just use our external API, such IR won't be generated. However if there is such IR, we can transform bitcast to <store, load>, so that the type can be translated through memory. One of <store, load> is AMX intrinsics store/load, so it won't be optimized. Is it reasonable?
Yes. It is 512. Thanks.
Yes. It is in function's entry block. It is done in line 48 of function CreateAllocaInst(). CreateAllocaInst() is actually copied from your code. :)
1024 is conservatives, because vector require the alignment to be the vector size. Here generate vector <256 x i32> load/store.