Page MenuHomePhabricator

[X86] AMX programming model.
AcceptedPublic

Authored by LuoYuanke on Sep 19 2020, 9:19 PM.

Details

Summary

This patch is to implement amx programming model that was discussed in llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thank Hal for the good suggestion in the RA. The fast register allocation is not covered in the patch yet. The patch include:

  1. The c interface to end user.
  2. The AMX intrinsics in LLVM IR.
  3. The Lowering from AMX intrinsics to AMX pseudo instruction.
  4. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx instructions.
  5. The register allocation for tile register.
  6. Morph AMX pseudo instruction to AMX real instruction.

We also need support 2 features. That will be implemented in other patches.

  1. Support fast register allocation to allocate tile register.
  2. Support inline assembly and support allocating the tile register for inline assembly.

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Diff Detail

Unit TestsFailed

TimeTest
850 mslinux > Clang.CodeGen::target-builtin-noerror.c
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/target-builtin-noerror.c -triple=x86_64-linux-gnu -S -o -
2,800 mslinux > Clang.CodeGen/X86::avx512f-builtins-constrained.c
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -fexperimental-new-pass-manager -flax-vector-conversions=none -ffreestanding /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/X86/avx512f-builtins-constrained.c -triple=x86_64-unknown-linux-gnu -target-feature +avx512f -emit-llvm -o - -Wall -Werror | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck --check-prefix=COMMON --check-prefix=COMMONIR --check-prefix=UNCONSTRAINED /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/X86/avx512f-builtins-constrained.c
1,880 mslinux > Clang.CodeGen/X86::sse-builtins-constrained.c
Script: -- : 'RUN: at line 2'; /mnt/disks/ssd0/agent/llvm-project/build/bin/clang -cc1 -internal-isystem /mnt/disks/ssd0/agent/llvm-project/build/lib/clang/12.0.0/include -nostdsysteminc -ffreestanding /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/X86/sse-builtins-constrained.c -triple=x86_64-unknown-linux-gnu -target-feature +sse -emit-llvm -o - -Wall -Werror | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/clang/test/CodeGen/X86/sse-builtins-constrained.c --check-prefix=UNCONSTRAINED --check-prefix=COMMON --check-prefix=COMMONIR
30 mslinux > LLVM.CodeGen/X86::O0-pipeline.ll
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -mtriple=x86_64-- -O0 -debug-pass=Structure < /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/X86/O0-pipeline.ll -o /dev/null 2>&1 | grep -v 'Verify generated machine code' | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/X86/O0-pipeline.ll
60 mslinux > LLVM.CodeGen/X86::opt-pipeline.ll
Script: -- : 'RUN: at line 3'; /mnt/disks/ssd0/agent/llvm-project/build/bin/llc -mtriple=x86_64-- -O1 -debug-pass=Structure < /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/X86/opt-pipeline.ll -o /dev/null 2>&1 | grep -v 'Verify generated machine code' | /mnt/disks/ssd0/agent/llvm-project/build/bin/FileCheck /mnt/disks/ssd0/agent/llvm-project/llvm/test/CodeGen/X86/opt-pipeline.ll
View Full Test Results (10 Failed)

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
LuoYuanke added inline comments.Wed, Oct 28, 1:26 AM
llvm/test/CodeGen/X86/ipra-reg-usage.ll
6 ↗(On Diff #301172)

This looks wired to me too. The patch only add "tmmcfg". I'll look into it later.

LuoYuanke updated this revision to Diff 301205.Wed, Oct 28, 2:20 AM

Address Pengfei's comments.

LuoYuanke marked an inline comment as done.Wed, Oct 28, 2:28 AM
LuoYuanke added inline comments.
clang/lib/Headers/amxintrin.h
259

Maybe size_t is better.

LuoYuanke marked 4 inline comments as done.Wed, Oct 28, 2:31 AM
LuoYuanke updated this revision to Diff 302738.Tue, Nov 3, 9:25 PM

Removed tilezero support in this patch.
Fixed some bugs.

LuoYuanke added inline comments.Wed, Nov 4, 4:46 AM
llvm/test/CodeGen/X86/ipra-reg-usage.ll
6 ↗(On Diff #301172)

I check the test case without my patch the k pair registers are clobbered. But FileCheck only match the strings, so the test passes. I can also remove "$k0_k1 $k2_k3 $k4_k5 $k6_k7" from the checking.

$ llc -enable-ipra -print-regusage test/CodeGen/X86/ipra-reg-usage.ll
foo Clobbered Registers: $cs $df $ds $eflags $eip $eiz $es $fpcw $fpsw $fs $gs $hip $ip $mxcsr $rip $riz $ss $ssp $bnd0 $bnd1 $bnd2 $bnd3 $cr0 $cr1 $cr2 $cr3 $cr4 $cr5 $cr6
$cr7 $cr8 $cr9 $cr10 $cr11 $cr12 $cr13 $cr14 $cr15 $dr0 $dr1 $dr2 $dr3 $dr4 $dr5 $dr6 $dr7 $dr8 $dr9 $dr10 $dr11 $dr12 $dr13 $dr14 $dr15 $fp0 $fp1 $fp2 $fp3 $fp4 $fp5 $fp6 $fp7 $k0 $k1 $k2 $k3 $k4 $k5 $k6 $k7 $mm0 $mm1 $mm2 $mm3 $mm4 $mm5 $mm6 $mm7 $r11 $st0 $st1 $st2 $st3 $st4 $st5 $st6 $st7 $tmm0 $tmm1 $tmm2 $tmm3 $tmm4 $tmm5 $tmm6 $tmm7 $xmm16 $xmm17 $xmm18 $xmm19 $xmm20 $xmm21 $xmm22 $xmm23 $xmm24 $xmm25 $xmm26 $xmm27 $xmm28 $xmm29 $xmm30 $xmm31 $ymm0 $ymm1 $ymm2 $ymm3 $ymm4 $ymm5 $ymm6 $ymm7 $ymm8 $ymm9 $ymm10 $ymm11 $ymm12 $ymm13 $ymm14 $ymm15 $ymm16 $ymm17 $ymm18 $ymm19 $ymm20 $ymm21 $ymm22 $ymm23 $ymm24 $ymm25 $ymm26 $ymm27 $ymm28 $ymm29 $ymm30 $ymm31 $zmm0 $zmm1 $zmm2 $zmm3
$zmm4 $zmm5 $zmm6 $zmm7 $zmm8 $zmm9 $zmm10 $zmm11 $zmm12 $zmm13 $zmm14 $zmm15 $zmm16 $zmm17 $zmm18 $zmm19 $zmm20 $zmm21 $zmm22 $zmm23 $zmm24 $zmm25 $zmm26 $zmm27 $zmm28 $zmm29 $zmm30 $zmm31 $r11b $r11bh $r11d $r11w $r11wh $k0_k1 $k2_k3 $k4_k5 $k6_k7

You may still need to change the format according to the Lint suggestions.

clang/lib/Headers/amxintrin.h
277

The comment is useless.

293

The type is inconsistent with __tile_stored

305

Why not use size_t?

316

The same here.

clang/test/CodeGen/X86/amx_api.c
13 ↗(On Diff #302738)

Shoud it check for 3 and only 3 llvm.x86.tileloadd64.internal?

llvm/include/llvm/CodeGen/Passes.h
494

Comments?

llvm/include/llvm/CodeGen/TileShapeInfo.h
48

You just need to check
RowImm != InvalidImmShape && ColImm != InvalidImmShape

llvm/lib/Target/X86/X86TileConfig.cpp
118

The format looks strange, I wonder why Lint didn't report it.

llvm/test/CodeGen/X86/ipra-reg-usage.ll
6 ↗(On Diff #301172)

You can commit a patch to fix it directly.

llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir
94 ↗(On Diff #302738)

Is the different caused by the order change of MachineDominator Tree Construction?

LuoYuanke updated this revision to Diff 303345.Thu, Nov 5, 11:59 PM

Fix lit test failure.

akashk4 requested changes to this revision.Fri, Nov 6, 8:42 AM
akashk4 added a subscriber: akashk4.

It would be great if TILE_RELEASE is supported in the backend, so the scope of transaction is known and the configurations are at least stored to memory properly.

llvm/lib/Target/X86/X86TileConfig.cpp
102

I do not understand why wasn't PSTTILECFG used to store config to memory. I guess it will be difficult to do that because we do not know the scope of a transaction since the TILE_RELEASE is not supported.

This revision now requires changes to proceed.Fri, Nov 6, 8:42 AM
LuoYuanke added inline comments.Fri, Nov 6, 4:14 PM
llvm/lib/Target/X86/X86TileConfig.cpp
102

PSTTILECFG is use to store the config from tile config register to memory. Here to need to load the config from memory to tile config register, so that each tile data register is configured.

The LDTIELCFG has been inserted in the X86PreTileConfig pass. Since at X86PreTileConfig pass we don't know the config of the tile physical registers, the shape data in stack slot is zero. At the pass which is after RA, we know the shape of tile registers, so we just fill in the shape in the stack slot.

Do you mean to tile release at the end of each function which use AMX? Since we config tile registers at the beginning of each function that use AMX, it doesn't break AMX operation without tile release. But it may reduce the overhead of thread switch with tile release, if AMX operation in only used for a while in a thread.

It would be great if TILE_RELEASE is supported in the backend, so the scope of transaction is known and the configurations are at least stored to memory properly.

TILERELEASE don't store the config to memory. STTILECFG store the config to memory. Why do we need to store the config to memory?

LuoYuanke added inline comments.Mon, Nov 9, 3:55 AM
clang/test/CodeGen/X86/amx_api.c
13 ↗(On Diff #302738)

I may create another simple test case that test tile_loadd, tile_dpbsud, __tile_stored seperately.

llvm/test/CodeGen/X86/ipra-reg-usage.ll
6 ↗(On Diff #301172)
llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir
94 ↗(On Diff #302738)

It seems the test case doesn't ensure the order. I'll take more time to look into the case.

LuoYuanke updated this revision to Diff 303830.Mon, Nov 9, 4:51 AM
  1. Address Pengfei's comments.
  2. Add tilerelease at the epilog.
  3. Fix a bug of emitting instruction that use subregister.
LuoYuanke marked 5 inline comments as done.Mon, Nov 9, 4:58 AM
LuoYuanke added inline comments.
llvm/lib/Target/X86/X86TileConfig.cpp
102

I add tilerelease in function's epilog if the function define AMX registers.

118

I modified the code, but it is reformated by lint again.

LuoYuanke updated this revision to Diff 303833.Mon, Nov 9, 5:07 AM

Reformt the code with lint fix.

LuoYuanke updated this revision to Diff 304136.Tue, Nov 10, 4:30 AM

Config palette.

LuoYuanke added inline comments.Wed, Nov 11, 7:09 PM
llvm/lib/Target/X86/X86TileConfig.cpp
102

@akashk4, ping

Fix clang-tidy.

LuoYuanke added inline comments.Thu, Nov 12, 2:32 AM
llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir
94 ↗(On Diff #302738)

This is spilled stack slot. The order is sorted by register size. Since the register size of r14 and rbx is the same, so the order is undefined. llvm::sort call std::sort and std::sort doesn't preserve the order of the equivalent elements. (https://stackoverflow.com/questions/4107315/stdsort-behavior-with-ints-that-are-equal)

Rename type name tile to tile1024i.

wxiao3 added inline comments.Mon, Nov 16, 6:47 AM
llvm/include/llvm/CodeGen/LiveRegMatrix.h
44

what's the purpose of this member?

llvm/lib/Target/X86/X86RegisterInfo.cpp
874

Don't need to add PhysReg again if PhysReg is already in Hints.

This comment was removed by wxiao3.
LuoYuanke added inline comments.Tue, Nov 17, 9:07 PM
llvm/include/llvm/CodeGen/LiveRegMatrix.h
44

It is useless. Thanks.

llvm/lib/Target/X86/X86RegisterInfo.cpp
874

You are right. I will revise the code.

Address Wei's comments.

akashk4 accepted this revision.Tue, Nov 17, 10:52 PM
This revision is now accepted and ready to land.Tue, Nov 17, 10:52 PM
LuoYuanke edited the summary of this revision. (Show Details)Wed, Nov 18, 5:46 PM
LuoYuanke retitled this revision from [X86] AMX programming model prototype. to [X86] AMX programming model..Wed, Nov 18, 6:36 PM

(Drive by comments)

llvm/include/llvm/CodeGen/TileShapeInfo.h
28

Can you document the class?

llvm/lib/Target/X86/X86PreTileConfig.cpp
30 ↗(On Diff #305978)

Can you document this as well?

llvm/lib/Target/X86/X86TileConfig.cpp
35

Can you document this pass?

224

Is this usual for MachinePasses to do this? In general the pass manager can display the scheduling when debugging

LuoYuanke added inline comments.Wed, Nov 18, 9:11 PM
llvm/lib/Target/X86/X86TileConfig.cpp
224

Make sense. I'll remove the debug log.

Address Mehdi's comments.

You should at least get @craig.topper's feedback, given this is a significant change in the x86 backend.

You should at least get @craig.topper's feedback, given this is a significant change in the x86 backend.

We had internal review with Craig on the design before. Craig

You should at least get @craig.topper's feedback, given this is a significant change in the x86 backend.

We have internal review with Craig and Andy on the design before. Craig may be very busy for now.
@craig.topper
Would you take the time to review my patch?

This triggers an assertion in X86LowerAMXType::visit() when compiled with clang test.c -O3

typedef int vec1024 __attribute__((vector_size(1024)));

vec1024 foo(vec1024 x, vec1024 y) {
  return x + y;
}
While deleting: <256 x i32> %x 
Use still stuck around after Def is destroyed:  %add = add <256 x i32> %y, %x
clang-12: llvm/lib/IR/Value.cpp:100: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.

This triggers an assertion in X86LowerAMXType::visit() when compiled with clang test.c -O3

typedef int vec1024 __attribute__((vector_size(1024)));

vec1024 foo(vec1024 x, vec1024 y) {
  return x + y;
}
While deleting: <256 x i32> %x 
Use still stuck around after Def is destroyed:  %add = add <256 x i32> %y, %x
clang-12: llvm/lib/IR/Value.cpp:100: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.

Thank Craig. This case create an instruction "%add = add <256 x i32> %y, %x" which is not covered by the type lowering. So we need do something similar of type legalization for <256 x i32> to split the type. Do you have any suggestion?

Remind me what happened with the idea of using a dedicated type for tiles in IR like mmx?

LuoYuanke added a comment.EditedThu, Nov 19, 3:27 PM

Remind me what happened with the idea of using a dedicated type for tiles in IR like mmx?

Yes, that's good idea. Do you think we need a separate patch to support the new IR type for AMX before this patch? It may be clearer.

@craig.topper. Thank you for the good idea. I create another patch (https://reviews.llvm.org/D91927) to fix the problem.