Page MenuHomePhabricator

[X86] Add x86_amx type for intel AMX.
ClosedPublic

Authored by LuoYuanke on Nov 21 2020, 4:51 PM.

Details

Summary

The x86_amx is used for AMX intrinsics. <256 x i32> is bitcasted to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcasted to <256 x i32> when it is used by load/store instruction. So amx intrinsics only operate on type x86_amx. The new type x86_amx can help to separate amx intrinsics from llvm IR instructions (+-*/). Thank Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Avoid generatng constant for x86_amx.

pengfei added inline comments.Dec 21 2020, 5:46 AM
llvm/lib/IR/ConstantFold.cpp
540

Operation should at the end of the line.

llvm/lib/Target/X86/X86LowerAMXType.cpp
60

I think we'd better to check exceptions. E.g.

default:
  llvm_unreachable("");
case Intrinsic::x86_tileloadd64_internal:
case Intrinsic::x86_tdpbssd_internal:
case Intrinsic::x86_tilestored64_internal:
  Row = II->getArgOperand(0);
  Col = II->getArgOperand(1);
  break;
122

Why don't check empty like line 157?

204

Is it possible the x86_amx operand isn't from AMX intrinsic, e.g.

%src = bitcast <256 x i32> %xxx to x86_amx
%2 = bitcast x86_amx %src to <256 x i32>
221–224

Better move it to line 310.

236–237

Better to reuse the cast result, e.g.

BitCastInst *BInst = dyn_cast<BitCastInst>(&Inst);
if (!BInst )

You can save several cast<BitCastInst>(&Inst) below.

263

Where's x86_amx* %tile from? Shouldn't been transfered to x86_amx before this bitcast if it exists?

269

Maybe better to keep a duplicated load that calling transformBitcast. The same for line 285.

284

Why we need to consider <256 x i32> has more than one use?

llvm/test/CodeGen/X86/AMX/amx-across-func.ll
89–91

Better to remove these unused attributes. The same to other tests.

llvm/test/CodeGen/X86/AMX/amx-type.ll
67

For this and the next test, we have chances to optimize to memcpy if we can make sure %s is constant 64.

103

We don't need to check this case now, right?

LuoYuanke updated this revision to Diff 313315.Dec 22 2020, 5:53 AM

Address Pengfei's comments.

LuoYuanke updated this revision to Diff 313326.Dec 22 2020, 6:47 AM

Rebase and fix lit test case failure.

pengfei added inline comments.Dec 22 2020, 6:57 AM
llvm/lib/Target/X86/X86LowerAMXType.cpp
153

Why don't put it in DeadBitcasts?

155

Can we leave the canonicalize bitcast cases a single patch. It's a bit complex here and I don't think it's a common case.

163

Maybe better to use BitCastInst?

309

This comment is for above code? Better move it up.

LuoYuanke added inline comments.Dec 22 2020, 3:20 PM
llvm/lib/Target/X86/X86LowerAMXType.cpp
155

Ok, I'll create another patch for it.

163

There may be dead load or store instructions.

204

Good catch. I'll add support for this pattern.

236–237

That's good. Thanks.

llvm/test/CodeGen/X86/AMX/amx-across-func.ll
89–91

I'll create a separate patch to clean the attributes.

llvm/test/CodeGen/X86/AMX/amx-type.ll
67

If the stride is 64 we can transform the code to memcpy. How about do it in another patch?

103

It can check the load and store instruction is not transformed if they are not participate in amx operation. I prefer to keep the case.

LuoYuanke updated this revision to Diff 313458.Dec 22 2020, 5:44 PM

Address Pengfei's comments.

LuoYuanke added inline comments.Dec 22 2020, 10:04 PM
llvm/lib/Target/X86/X86LowerAMXType.cpp
263

In my test case, it is transformed after Combine redundant instructions.

*** IR Dump After Simplify the CFG ***
define internal fastcc void @_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture %dst) unnamed_addr #4 {
entry:
  %row = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 0
  %0 = load i16, i16* %row, align 64, !tbaa !2
  %col = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 1
  %1 = load i16, i16* %col, align 2, !tbaa !7
  %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64) #6
  %3 = bitcast x86_amx %2 to <256 x i32>
  %tile = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 3
  store <256 x i32> %3, <256 x i32>* %tile, align 64, !tbaa !8
  ret void
}

To

*** IR Dump After Combine redundant instructions ***
; Function Attrs: alwaysinline nounwind uwtable mustprogress
define internal fastcc void @_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture %dst) unnamed_addr #4 {
entry:
  %row = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 0
  %0 = load i16, i16* %row, align 64, !tbaa !2
  %col = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 1
  %1 = load i16, i16* %col, align 2, !tbaa !7
  %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024
x i8]* @buf, i64 0, i64 0), i64 64) #6
  %tile = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 3
  %3 = bitcast <256 x i32>* %tile to x86_amx*
  store x86_amx %2, x86_amx* %3, align 64, !tbaa !8
  ret void
}

In my test case, it is transformed after Combine redundant instructions.

Can we disable it for AMX type? The pointer to AMX type is meaningless and may result in bad perfomance.

llvm/lib/Target/X86/X86LowerAMXType.cpp
153

I don't see any chance this happen. But we still need to handle the x86_amx* here if possible, right?
Maybe better to give an assertion for now.

cast<PointerType>(Src->getType())->isX86_AMXTy()

In my test case, it is transformed after Combine redundant instructions.

Can we disable it for AMX type? The pointer to AMX type is meaningless and may result in bad perfomance.

Ok, I'll disable the transform for AMX type.

LuoYuanke updated this revision to Diff 313635.Dec 23 2020, 5:53 PM

Address Pengfei's comments.

LuoYuanke updated this revision to Diff 313639.Dec 23 2020, 6:19 PM

Improve the comments.

In my test case, it is transformed after Combine redundant instructions.

Can we disable it for AMX type? The pointer to AMX type is meaningless and may result in bad perfomance.

Ok, I'll disable the transform for AMX type.

Good job.

llvm/lib/Target/X86/X86LowerAMXType.cpp
50

Currently, we don't have HW type for v256i32. I think 64 bytes(512bits) should be enough here.

124–150

Value

126

How about the Tile comes from tdpbssd?

173

We don't need to align to 1024. 64 should be enough. The same for below comments.

197

How about the Tile comes from tdpbssd?

LuoYuanke added inline comments.Dec 23 2020, 11:19 PM
llvm/lib/Target/X86/X86LowerAMXType.cpp
126

We have a convention, when amx intrinsics define a x86_amx tile the first 2 operands is the shape of the defined tile. For tdpbssd, the intrinsics operands are (m, n, k, ...). (m, n) is the shape of the produced tile.

pengfei added inline comments.Dec 23 2020, 11:35 PM
llvm/lib/Target/X86/X86LowerAMXType.cpp
126

Oh, yes. I missed that. Thanks.

244

vector

316

Why we need to recursively delete them? I think delete the nodes in DeadInsts is enough.

Address Pengfei's comments.

Refine comments.

LuoYuanke marked 2 inline comments as done.Dec 24 2020, 12:30 AM
pengfei accepted this revision.Dec 24 2020, 12:34 AM

LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if others have objections.

This revision is now accepted and ready to land.Dec 24 2020, 12:34 AM

LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if others have objections.

Thank Pengfei for the review. Sure, I'll wait for a few days.

This revision was landed with ongoing or failed builds.Dec 29 2020, 9:52 PM
This revision was automatically updated to reflect the committed changes.
uabelho added inline comments.
llvm/include/llvm/IR/Type.h
68

This addition causes a compilation warning in HexagonTargetObjectFile.cpp:

../lib/Target/Hexagon/HexagonTargetObjectFile.cpp:297:11: error: enumeration value 'X86_AMXTyID' not handled in switch [-Werror,-Wswitch]
  switch (Ty->getTypeID()) {
          ^
1 error generated.

Seen in build bots, e.g. here:
http://lab.llvm.org:8011/#/builders/57/builds/2889/steps/6/logs/stdio

pengfei added inline comments.Dec 30 2020, 6:35 AM
llvm/include/llvm/IR/Type.h
68

Thanks Mikael for pointing it out. I think we just need to put the type in the switch table. I've posted a patch to fix it. rG16c2067cf212.

uabelho added inline comments.Dec 30 2020, 6:48 AM
llvm/include/llvm/IR/Type.h
68

Yep, thanks!

MaskRay added a subscriber: MaskRay.EditedDec 30 2020, 9:51 AM

D93944 fixed an llvm-c-test issue. Note, adding new enum members usually requires check-all (at least check-llvm, but Clang may use these enum as well) because they can be used everywhere.

cuviper added inline comments.
llvm/include/llvm-c/Core.h
163

This is a breaking change to the C ABI -- can we move it to the end of the enum?
https://bugs.llvm.org/show_bug.cgi?id=48905

MaskRay added inline comments.Jan 27 2021, 4:32 PM
llvm/include/llvm-c/Core.h
163
LuoYuanke added inline comments.Jan 27 2021, 5:20 PM
llvm/include/llvm-c/Core.h
163

@MaskRay, thank you!

ychen added a subscriber: ychen.Wed, Mar 17, 9:12 PM