This is an archive of the discontinued LLVM Phabricator instance.

[X86] AMX programming model.
ClosedPublic

Authored by LuoYuanke on Sep 19 2020, 9:19 PM.

Details

Summary

This patch is to implement amx programming model that was discussed in llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thank Hal for the good suggestion in the RA. The fast register allocation is not covered in the patch yet. The patch include:

  1. The c interface to end user.
  2. The AMX intrinsics in LLVM IR.
  3. The Lowering from AMX intrinsics to AMX pseudo instruction.
  4. Insert psuedo ldtilecfg and build the def-use between ldtilecfg to amx instructions.
  5. The register allocation for tile register.
  6. Morph AMX pseudo instruction to AMX real instruction.

We also need support 2 features. That will be implemented in other patches.

  1. Support fast register allocation to allocate tile register.
  2. Support inline assembly and support allocating the tile register for inline assembly.

Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0

Diff Detail

Event Timeline

LuoYuanke created this revision.Sep 19 2020, 9:19 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptSep 19 2020, 9:19 PM
LuoYuanke requested review of this revision.Sep 19 2020, 9:19 PM
LuoYuanke edited the summary of this revision. (Show Details)Sep 19 2020, 9:35 PM

Updating D87981: [X86] AMX programming model prototype.
Fixed some clang format issue.

LuoYuanke updated this revision to Diff 293051.Sep 20 2020, 7:35 PM

Updating D87981: [X86] AMX programming model prototype.
Fixed lit test case failure.

Updating D87981: [X86] AMX programming model prototype.
Fix clang format and add test case for RA that across function call.

LuoYuanke updated this revision to Diff 294036.Sep 24 2020, 6:26 AM

Updating D87981: [X86] AMX programming model prototype.
Rebase.

LuoYuanke updated this revision to Diff 301172.Oct 27 2020, 8:56 PM

Rebase and Model tile configure register.
Configure register is the user of all the amx instruction.
Add X86PreTileConfig pass to insert psuedo ldtilecfg and
build the data depenedcy between config regiter and amx
instructions.

LuoYuanke edited the summary of this revision. (Show Details)Oct 27 2020, 10:23 PM
pengfei added inline comments.Oct 27 2020, 11:44 PM
clang/lib/Headers/amxintrin.h
65

Better to use __DEFAULT_FN_ATTRS_TILE, __DEFAULT_FN_ATTRS_INT8, then you don't need to undef it in the middle.

230

How about using the unsigned type?

pengfei added inline comments.Oct 28 2020, 12:02 AM
clang/lib/Headers/amxintrin.h
259

unsigned long long?

llvm/include/llvm/CodeGen/TileShapeInfo.h
10

Do you want to use TileShapeInfo for the class name or forget to change the comment to ShapeT?

llvm/lib/IR/Function.cpp
779 ↗(On Diff #301172)

There's no new change here. This file shoule be reverted.

llvm/test/CodeGen/X86/ipra-reg-usage.ll
6

Why this patch affects the k registers?

llvm/test/CodeGen/X86/opt-pipeline.ll
122

Why the order of "MachineDominator Tree Construction" changed?

llvm/utils/TableGen/IntrinsicEmitter.cpp
192 ↗(On Diff #301172)

There's no new change here. This file shoule be reverted.

LuoYuanke added inline comments.Oct 28 2020, 1:26 AM
llvm/include/llvm/CodeGen/TileShapeInfo.h
10

I forget change the comments. Thanks.

llvm/test/CodeGen/X86/ipra-reg-usage.ll
6

This looks wired to me too. The patch only add "tmmcfg". I'll look into it later.

llvm/test/CodeGen/X86/opt-pipeline.ll
122

Because both "Tile Register Pre-configure" and "Machine Natural Loop Construction" depends on "MachineDominator Tree Construction", but "Dominator Tree" doesn't change after "Tile Register Pre-configure", so "MachineDominator Tree Construction" doesn't need to be analyzed again in "Machine Natural Loop Construction"

LuoYuanke updated this revision to Diff 301205.Oct 28 2020, 2:20 AM

Address Pengfei's comments.

LuoYuanke marked an inline comment as done.Oct 28 2020, 2:28 AM
LuoYuanke added inline comments.
clang/lib/Headers/amxintrin.h
259

Maybe size_t is better.

LuoYuanke marked 4 inline comments as done.Oct 28 2020, 2:31 AM
LuoYuanke updated this revision to Diff 302738.Nov 3 2020, 9:25 PM

Removed tilezero support in this patch.
Fixed some bugs.

LuoYuanke added inline comments.Nov 4 2020, 4:46 AM
llvm/test/CodeGen/X86/ipra-reg-usage.ll
6

I check the test case without my patch the k pair registers are clobbered. But FileCheck only match the strings, so the test passes. I can also remove "$k0_k1 $k2_k3 $k4_k5 $k6_k7" from the checking.

$ llc -enable-ipra -print-regusage test/CodeGen/X86/ipra-reg-usage.ll
foo Clobbered Registers: $cs $df $ds $eflags $eip $eiz $es $fpcw $fpsw $fs $gs $hip $ip $mxcsr $rip $riz $ss $ssp $bnd0 $bnd1 $bnd2 $bnd3 $cr0 $cr1 $cr2 $cr3 $cr4 $cr5 $cr6
$cr7 $cr8 $cr9 $cr10 $cr11 $cr12 $cr13 $cr14 $cr15 $dr0 $dr1 $dr2 $dr3 $dr4 $dr5 $dr6 $dr7 $dr8 $dr9 $dr10 $dr11 $dr12 $dr13 $dr14 $dr15 $fp0 $fp1 $fp2 $fp3 $fp4 $fp5 $fp6 $fp7 $k0 $k1 $k2 $k3 $k4 $k5 $k6 $k7 $mm0 $mm1 $mm2 $mm3 $mm4 $mm5 $mm6 $mm7 $r11 $st0 $st1 $st2 $st3 $st4 $st5 $st6 $st7 $tmm0 $tmm1 $tmm2 $tmm3 $tmm4 $tmm5 $tmm6 $tmm7 $xmm16 $xmm17 $xmm18 $xmm19 $xmm20 $xmm21 $xmm22 $xmm23 $xmm24 $xmm25 $xmm26 $xmm27 $xmm28 $xmm29 $xmm30 $xmm31 $ymm0 $ymm1 $ymm2 $ymm3 $ymm4 $ymm5 $ymm6 $ymm7 $ymm8 $ymm9 $ymm10 $ymm11 $ymm12 $ymm13 $ymm14 $ymm15 $ymm16 $ymm17 $ymm18 $ymm19 $ymm20 $ymm21 $ymm22 $ymm23 $ymm24 $ymm25 $ymm26 $ymm27 $ymm28 $ymm29 $ymm30 $ymm31 $zmm0 $zmm1 $zmm2 $zmm3
$zmm4 $zmm5 $zmm6 $zmm7 $zmm8 $zmm9 $zmm10 $zmm11 $zmm12 $zmm13 $zmm14 $zmm15 $zmm16 $zmm17 $zmm18 $zmm19 $zmm20 $zmm21 $zmm22 $zmm23 $zmm24 $zmm25 $zmm26 $zmm27 $zmm28 $zmm29 $zmm30 $zmm31 $r11b $r11bh $r11d $r11w $r11wh $k0_k1 $k2_k3 $k4_k5 $k6_k7

You may still need to change the format according to the Lint suggestions.

clang/lib/Headers/amxintrin.h
227

The comment is useless.

243

The type is inconsistent with __tile_stored

255

Why not use size_t?

266

The same here.

clang/test/CodeGen/X86/amx_api.c
14

Shoud it check for 3 and only 3 llvm.x86.tileloadd64.internal?

llvm/include/llvm/CodeGen/Passes.h
489

Comments?

llvm/lib/Target/X86/X86TileConfig.cpp
118

The format looks strange, I wonder why Lint didn't report it.

llvm/test/CodeGen/X86/ipra-reg-usage.ll
6

You can commit a patch to fix it directly.

llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir
94

Is the different caused by the order change of MachineDominator Tree Construction?

pengfei added inline comments.Nov 4 2020, 6:31 AM
llvm/include/llvm/CodeGen/TileShapeInfo.h
48

You just need to check
RowImm != InvalidImmShape && ColImm != InvalidImmShape

LuoYuanke updated this revision to Diff 303345.Nov 5 2020, 11:59 PM

Fix lit test failure.

akashk4 requested changes to this revision.Nov 6 2020, 8:42 AM
akashk4 added a subscriber: akashk4.

It would be great if TILE_RELEASE is supported in the backend, so the scope of transaction is known and the configurations are at least stored to memory properly.

llvm/lib/Target/X86/X86TileConfig.cpp
102

I do not understand why wasn't PSTTILECFG used to store config to memory. I guess it will be difficult to do that because we do not know the scope of a transaction since the TILE_RELEASE is not supported.

This revision now requires changes to proceed.Nov 6 2020, 8:42 AM
LuoYuanke added inline comments.Nov 6 2020, 4:14 PM
llvm/lib/Target/X86/X86TileConfig.cpp
102

PSTTILECFG is use to store the config from tile config register to memory. Here to need to load the config from memory to tile config register, so that each tile data register is configured.

The LDTIELCFG has been inserted in the X86PreTileConfig pass. Since at X86PreTileConfig pass we don't know the config of the tile physical registers, the shape data in stack slot is zero. At the pass which is after RA, we know the shape of tile registers, so we just fill in the shape in the stack slot.

Do you mean to tile release at the end of each function which use AMX? Since we config tile registers at the beginning of each function that use AMX, it doesn't break AMX operation without tile release. But it may reduce the overhead of thread switch with tile release, if AMX operation in only used for a while in a thread.

It would be great if TILE_RELEASE is supported in the backend, so the scope of transaction is known and the configurations are at least stored to memory properly.

TILERELEASE don't store the config to memory. STTILECFG store the config to memory. Why do we need to store the config to memory?

LuoYuanke added inline comments.Nov 9 2020, 3:55 AM
clang/test/CodeGen/X86/amx_api.c
14

I may create another simple test case that test tile_loadd, tile_dpbsud, __tile_stored seperately.

llvm/test/CodeGen/X86/ipra-reg-usage.ll
6
llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir
94

It seems the test case doesn't ensure the order. I'll take more time to look into the case.

LuoYuanke updated this revision to Diff 303830.Nov 9 2020, 4:51 AM
  1. Address Pengfei's comments.
  2. Add tilerelease at the epilog.
  3. Fix a bug of emitting instruction that use subregister.
LuoYuanke marked 5 inline comments as done.Nov 9 2020, 4:58 AM
LuoYuanke added inline comments.
llvm/lib/Target/X86/X86TileConfig.cpp
102

I add tilerelease in function's epilog if the function define AMX registers.

118

I modified the code, but it is reformated by lint again.

LuoYuanke updated this revision to Diff 303833.Nov 9 2020, 5:07 AM

Reformt the code with lint fix.

LuoYuanke updated this revision to Diff 304136.Nov 10 2020, 4:30 AM

Config palette.

LuoYuanke added inline comments.Nov 11 2020, 7:09 PM
llvm/lib/Target/X86/X86TileConfig.cpp
102

@akashk4, ping

Fix clang-tidy.

LuoYuanke added inline comments.Nov 12 2020, 2:32 AM
llvm/test/CodeGen/X86/statepoint-fixup-invoke.mir
94

This is spilled stack slot. The order is sorted by register size. Since the register size of r14 and rbx is the same, so the order is undefined. llvm::sort call std::sort and std::sort doesn't preserve the order of the equivalent elements. (https://stackoverflow.com/questions/4107315/stdsort-behavior-with-ints-that-are-equal)

Rename type name tile to tile1024i.

wxiao3 added inline comments.Nov 16 2020, 6:47 AM
llvm/include/llvm/CodeGen/LiveRegMatrix.h
44

what's the purpose of this member?

llvm/lib/Target/X86/X86RegisterInfo.cpp
917

Don't need to add PhysReg again if PhysReg is already in Hints.

This comment was removed by wxiao3.
LuoYuanke added inline comments.Nov 17 2020, 9:07 PM
llvm/include/llvm/CodeGen/LiveRegMatrix.h
44

It is useless. Thanks.

llvm/lib/Target/X86/X86RegisterInfo.cpp
917

You are right. I will revise the code.

Address Wei's comments.

akashk4 accepted this revision.Nov 17 2020, 10:52 PM
This revision is now accepted and ready to land.Nov 17 2020, 10:52 PM
LuoYuanke edited the summary of this revision. (Show Details)Nov 18 2020, 5:46 PM
LuoYuanke retitled this revision from [X86] AMX programming model prototype. to [X86] AMX programming model..Nov 18 2020, 6:36 PM

(Drive by comments)

llvm/include/llvm/CodeGen/TileShapeInfo.h
28

Can you document the class?

llvm/lib/Target/X86/X86PreTileConfig.cpp
31

Can you document this as well?

llvm/lib/Target/X86/X86TileConfig.cpp
35

Can you document this pass?

224

Is this usual for MachinePasses to do this? In general the pass manager can display the scheduling when debugging

LuoYuanke added inline comments.Nov 18 2020, 9:11 PM
llvm/lib/Target/X86/X86TileConfig.cpp
224

Make sense. I'll remove the debug log.

Address Mehdi's comments.

You should at least get @craig.topper's feedback, given this is a significant change in the x86 backend.

You should at least get @craig.topper's feedback, given this is a significant change in the x86 backend.

We had internal review with Craig on the design before. Craig

You should at least get @craig.topper's feedback, given this is a significant change in the x86 backend.

We have internal review with Craig and Andy on the design before. Craig may be very busy for now.
@craig.topper
Would you take the time to review my patch?

This triggers an assertion in X86LowerAMXType::visit() when compiled with clang test.c -O3

typedef int vec1024 __attribute__((vector_size(1024)));

vec1024 foo(vec1024 x, vec1024 y) {
  return x + y;
}
While deleting: <256 x i32> %x 
Use still stuck around after Def is destroyed:  %add = add <256 x i32> %y, %x
clang-12: llvm/lib/IR/Value.cpp:100: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.

This triggers an assertion in X86LowerAMXType::visit() when compiled with clang test.c -O3

typedef int vec1024 __attribute__((vector_size(1024)));

vec1024 foo(vec1024 x, vec1024 y) {
  return x + y;
}
While deleting: <256 x i32> %x 
Use still stuck around after Def is destroyed:  %add = add <256 x i32> %y, %x
clang-12: llvm/lib/IR/Value.cpp:100: llvm::Value::~Value(): Assertion `materialized_use_empty() && "Uses remain when a value is destroyed!"' failed.

Thank Craig. This case create an instruction "%add = add <256 x i32> %y, %x" which is not covered by the type lowering. So we need do something similar of type legalization for <256 x i32> to split the type. Do you have any suggestion?

Remind me what happened with the idea of using a dedicated type for tiles in IR like mmx?

LuoYuanke added a comment.EditedNov 19 2020, 3:27 PM

Remind me what happened with the idea of using a dedicated type for tiles in IR like mmx?

Yes, that's good idea. Do you think we need a separate patch to support the new IR type for AMX before this patch? It may be clearer.

@craig.topper. Thank you for the good idea. I create another patch (https://reviews.llvm.org/D91927) to fix the problem.

LuoYuanke updated this revision to Diff 310160.Dec 8 2020, 6:01 AM

Replace v512i1 with untyped.

Hi,
I'm going to land the patch if there is no objection.

pengfei accepted this revision.Dec 9 2020, 11:10 PM

LGTM. I think we can land this patch as a beginning. Cheers~

This revision was landed with ongoing or failed builds.Dec 10 2020, 1:07 AM
This revision was automatically updated to reflect the committed changes.
yubing added a subscriber: yubing.Mar 18 2021, 12:58 AM
yubing added inline comments.
llvm/lib/Target/X86/X86PreTileConfig.cpp
90

It should be Pre Tile configure instead of Tile Register Configure.

LuoYuanke added inline comments.Mar 18 2021, 1:39 AM
llvm/lib/Target/X86/X86PreTileConfig.cpp
90

You are right. Thanks. Will fix it.

yubing added inline comments.Jun 15 2021, 7:00 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4675

It seems there should be a check here, according to line4575:

if (!Subtarget->hasAMXTILE())
  break;
pengfei added inline comments.Jun 15 2021, 8:33 PM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
4675

Is the check in 4575 is necessary?