This is an archive of the discontinued LLVM Phabricator instance.

Differential D91927

[X86] Add x86_amx type for intel AMX.
ClosedPublic

Authored by LuoYuanke on Nov 21 2020, 4:51 PM.

Details

Reviewers

deadalnix
craig.topper
hfinkel
akashk4
rengolin
mehdi_amini
pengfei
wxiao3
xiangzhangllvm

Commits

rG981a0bd85811: [X86] Add x86_amx type for intel AMX.

Summary

The x86_amx type is used for AMX intrinsics. A <256 x i32> value is bitcast to x86_amx when it is used by an AMX intrinsic, and an x86_amx value is bitcast to <256 x i32> when it is used by a load/store instruction, so AMX intrinsics operate only on type x86_amx. The new type helps separate AMX intrinsics from ordinary LLVM IR instructions (+, -, *, /). Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.
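
One way to picture the separation described in the summary is the C++ analogy below. It is only a sketch: `ToyVec`, `ToyAmx`, `HasPlus` and `roundTrip` are hypothetical names that do not appear in the patch, and C++ value types merely stand in for LLVM IR types. The ordinary vector type supports arithmetic, while the opaque tile type can only be produced from, or turned back into, the vector through explicit conversions that play the role of the bitcasts.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <utility>

// Toy analogue of <256 x i32>: an ordinary value type with arithmetic.
struct ToyVec {
  std::array<std::int32_t, 256> lanes{};
};

inline ToyVec operator+(const ToyVec &a, const ToyVec &b) {
  ToyVec r;
  for (std::size_t i = 0; i < r.lanes.size(); ++i)
    r.lanes[i] = a.lanes[i] + b.lanes[i];
  return r;
}

// Toy analogue of x86_amx: opaque, with no arithmetic operators.
class ToyAmx {
public:
  // Plays the role of "bitcast <256 x i32> ... to x86_amx".
  explicit ToyAmx(const ToyVec &v) : bits_(v) {}
  // Plays the role of "bitcast x86_amx ... to <256 x i32>".
  ToyVec toVec() const { return bits_; }

private:
  ToyVec bits_;
};

// Detects at compile time whether `T + T` is a valid expression.
template <typename T, typename = void> struct HasPlus : std::false_type {};
template <typename T>
struct HasPlus<T, decltype(void(std::declval<T>() + std::declval<T>()))>
    : std::true_type {};

static_assert(HasPlus<ToyVec>::value, "the vector type supports +");
static_assert(!HasPlus<ToyAmx>::value, "the tile type does not support +");

// Converting to the tile type and back preserves the bits.
inline ToyVec roundTrip(const ToyVec &v) { return ToyAmx(v).toVec(); }
```

In this model, generic arithmetic simply does not compile for the tile type, which is the kind of separation the summary attributes to x86_amx.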

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

LuoYuanke created this revision. Nov 21 2020, 4:51 PM

Herald added a reviewer: deadalnix. Nov 21 2020, 4:51 PM

Herald added projects: Restricted Project, Restricted Project.

Herald added subscribers: llvm-commits, cfe-commits, dexonsmith and 3 others.

LuoYuanke requested review of this revision. Nov 21 2020, 4:51 PM

Herald added a subscriber: jdoerfert. Nov 21 2020, 4:51 PM

Harbormaster completed remote builds in B79710: Diff 306886. Nov 21 2020, 4:52 PM

LuoYuanke retitled this revision from "[X86] Add x86_amx type for intel AMX. The x86_amx is used for AMX intrisics. <256 x i32> is bitcast to x86_amx when it is used by AMX intrinsics, and x86_amx is bitcast to <256 x i32> when it is used by load/store instruction. So amx intrinsics..." to "[X86] Add x86_amx type for intel AMX." Nov 21 2020, 4:54 PM

LuoYuanke edited the summary of this revision.

LuoYuanke added reviewers: craig.topper, hfinkel, akashk4, rengolin, mehdi_amini, pengfei, wxiao3, xiangzhangllvm. Nov 21 2020, 5:01 PM

LuoYuanke mentioned this in D87981: [X86] AMX programming model.

I only took a quick pass through this so far. What happens if a bitcast between x86amx and v256i32(or any other 1024-bit vector type) exists in the IR but isn't next to a load/store?

llvm/lib/Target/X86/X86ISelLowering.cpp
5348–5349	Should this just be deleted?
llvm/lib/Target/X86/X86LowerAMXType.cpp
246	Don't use an assert to check the result of a dyn_cast. If it shouldn't fail just use cast<LoadInst> which will assert internally.
254	Unchecked dyn_cast
277	Use cast.
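
The distinction behind these three comments can be shown without LLVM headers. The sketch below uses hypothetical toy classes (`Inst`, `LoadInst`, `StoreInst`) and hand-written helpers `toy_dyn_cast` and `toy_cast` that only imitate the behaviour the comments rely on: a `dyn_cast` yields null on a type mismatch and must be checked, while a `cast` asserts internally, so no separate assert is needed.

```cpp
#include <cassert>

// Toy stand-ins for instruction classes; all names here are hypothetical.
struct Inst {
  enum Kind { Load, Store } kind;
  explicit Inst(Kind k) : kind(k) {}
};
struct LoadInst : Inst {
  LoadInst() : Inst(Load) {}
  static bool classof(const Inst *i) { return i->kind == Load; }
};
struct StoreInst : Inst {
  StoreInst() : Inst(Store) {}
  static bool classof(const Inst *i) { return i->kind == Store; }
};

// Imitates a dyn_cast: returns nullptr when the kind does not match,
// so every caller has to test the result.
template <typename To> To *toy_dyn_cast(Inst *i) {
  return To::classof(i) ? static_cast<To *>(i) : nullptr;
}

// Imitates a cast: the mismatch check lives inside the helper.
template <typename To> To *toy_cast(Inst *i) {
  assert(To::classof(i) && "toy_cast<To>() argument of incompatible type");
  return static_cast<To *>(i);
}

// Use the null-returning form only when both outcomes are handled.
inline const char *describe(Inst *i) {
  if (toy_dyn_cast<LoadInst>(i))
    return "load";
  if (toy_dyn_cast<StoreInst>(i))
    return "store";
  return "other";
}

// Use the asserting form when a mismatch would be a bug in the caller.
inline LoadInst *mustBeLoad(Inst *i) { return toy_cast<LoadInst>(i); }
```

With this shape, an unchecked `toy_dyn_cast` result is the only way to dereference a null pointer, which is the pattern the review asks to avoid.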

In D91927#2412557, @craig.topper wrote:

I only took a quick pass through this so far. What happens if a bitcast between x86amx and v256i32(or any other 1024-bit vector type) exists in the IR but isn't next to a load/store?

@craig.topper, thank you for reviewing my patch.
I think that if users only use our external API, such IR won't be generated. However, if such IR does exist, we can transform the bitcast into a <store, load> pair, so that the type is converted through memory. One of the <store, load> pair is an AMX intrinsic store/load, so it won't be optimized. Is that reasonable?

In D91927#2412604, @LuoYuanke wrote:

In D91927#2412557, @craig.topper wrote:

I only took a quick pass through this so far. What happens if a bitcast between x86amx and v256i32(or any other 1024-bit vector type) exists in the IR but isn't next to a load/store?

@craig.topper, thank you for reviewing my patch.
I think that if users only use our external API, such IR won't be generated. However, if such IR does exist, we can transform the bitcast into a <store, load> pair, so that the type is converted through memory. One of the <store, load> pair is an AMX intrinsic store/load, so it won't be optimized. Is that reasonable?

It's fine if it's not optimized; just make sure it doesn't crash.
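
The "convert through memory" idea in this exchange can be pictured in plain C++ as a store followed by a load of the other type. This is an analogy only, not the lowering implemented in the patch: `ToyVec`, `ToyTile` and `viaMemory` are hypothetical names, and `std::memcpy` stands in for the store and the tile load.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using ToyVec = std::array<std::int32_t, 256>; // stands in for <256 x i32>
struct ToyTile {                              // stands in for x86_amx
  unsigned char bytes[1024];
};
static_assert(sizeof(ToyVec) == sizeof(ToyTile), "both images are 1024 bytes");

// Converts by writing the vector to a stack slot and reading the slot back
// as a tile, instead of reinterpreting the value directly.
inline ToyTile viaMemory(const ToyVec &v) {
  alignas(64) unsigned char slot[sizeof(ToyVec)]; // plays the role of an alloca
  std::memcpy(slot, &v, sizeof v);                // "store <256 x i32>"
  ToyTile t;
  std::memcpy(t.bytes, slot, sizeof slot);        // "tile load"
  return t;
}
```

The round trip costs an extra copy, which matches the reply above: acceptable when the pattern is rare, as long as it is handled rather than crashing.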

Address Craig's comments.

Harbormaster completed remote builds in B79920: Diff 307298. Nov 24 2020, 3:08 AM

Address Craig's comments.
Change dyn_cast to cast.

Harbormaster completed remote builds in B79921: Diff 307300. Nov 24 2020, 3:17 AM

LuoYuanke marked 3 inline comments as done. Nov 24 2020, 3:19 AM

pengfei added a parent revision: D87981: [X86] AMX programming model. Nov 24 2020, 4:26 AM

pengfei added inline comments. Nov 24 2020, 4:30 AM

llvm/lib/IR/DataLayout.cpp
796	Should be 512 bits?
llvm/lib/Target/X86/X86LowerAMXType.cpp
173	Why isn't the alignment 64?

pengfei added inline comments. Nov 24 2020, 4:43 AM

llvm/lib/Target/X86/X86LowerAMXType.cpp
281	`%src` is not used here.
llvm/utils/TableGen/IntrinsicEmitter.cpp
252	Remove `,`

craig.topper added inline comments. Nov 24 2020, 10:45 AM

llvm/lib/Target/X86/X86LowerAMXType.cpp
166–206	Shouldn't this be in the function's entry block?
183	Just use Value. auto doesn't add any value other than shortening by 1 character.
295	maximun->maximum
299	Use Builder.getInt8PtrTy then you don't need Ctx

LuoYuanke marked an inline comment as done. Nov 24 2020, 9:39 PM

LuoYuanke added inline comments.

llvm/lib/IR/DataLayout.cpp
796	Yes. It is 512. Thanks.
llvm/lib/Target/X86/X86LowerAMXType.cpp
166–206	Yes. It is in function's entry block. It is done in line 48 of function CreateAllocaInst(). CreateAllocaInst() is actually copied from your code. :)
173	1024 is conservative, because a vector requires its alignment to be the vector size. The code here generates a <256 x i32> vector load/store.
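
The numbers in this alignment discussion follow from simple arithmetic, which the sketch below spells out. `TileBuffer`, the constants and `isRowAligned` are hypothetical names for illustration; the 16-row by 64-byte layout is the architectural size of an AMX tile register and is stated here as background rather than taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical in-memory image of a <256 x i32> value, aligned to 64 bytes.
struct alignas(64) TileBuffer {
  std::int32_t elems[256];
};

constexpr std::size_t kTileBytes = sizeof(TileBuffer);   // 256 * 4 = 1024
constexpr std::size_t kTileBits = kTileBytes * 8;        // 8192
constexpr std::size_t kRowBytes = 64;                    // 512 bits per row
constexpr std::size_t kMaxRows = kTileBytes / kRowBytes; // 16

static_assert(kTileBytes == 1024, "256 x i32 occupies 1024 bytes");
static_assert(kTileBits == 8192, "which is 8192 bits");
static_assert(kRowBytes * 8 == 512, "one 64-byte row is 512 bits");
static_assert(kMaxRows == 16, "16 rows of 64 bytes");
static_assert(alignof(TileBuffer) == 64, "64-byte alignment was requested");

// True when the buffer starts on a 64-byte boundary.
inline bool isRowAligned(const TileBuffer &t) {
  return reinterpret_cast<std::uintptr_t>(&t) % kRowBytes == 0;
}
```

Read this way, "512 bits" and "64" in the comments above are the same quantity (one row), while "1024" is the size of the whole vector in bytes.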

Address Craig and Pengfei's comments.

Harbormaster completed remote builds in B80048: Diff 307514. Nov 24 2020, 10:34 PM

Add the handler of "bitcast <256 x i32>* to x86_amx*".
Refactor the code.

Harbormaster completed remote builds in B80407: Diff 308146. Nov 28 2020, 4:53 AM

LuoYuanke added a subscriber: annita.zhang. Nov 30 2020, 3:23 AM

pengfei mentioned this in D92449: [X86] Sink x86_amx load in AMX type lowering. Dec 2 2020, 1:14 AM

LuoYuanke added a child revision: D92449: [X86] Sink x86_amx load in AMX type lowering. Dec 3 2020, 12:52 AM

LuoYuanke updated this revision to Diff 309761. Dec 5 2020, 11:31 PM

Avoid generating constant for x86_amx.

Harbormaster completed remote builds in B81221: Diff 309761. Dec 6 2020, 12:49 AM

LuoYuanke added a child revision: D92837: [X86] Support tilezero intrinsic and c interface for AMX. Dec 8 2020, 4:50 AM

Rebase.

Harbormaster completed remote builds in B81790: Diff 310800. Dec 10 2020, 2:29 AM

LuoYuanke added a child revision: D93594: [X86] Pass to transform amx intrinsics to scalar operation. Dec 20 2020, 5:00 AM

pengfei added inline comments. Dec 21 2020, 5:46 AM

llvm/lib/IR/ConstantFold.cpp
540	The operator should be at the end of the line.
llvm/lib/Target/X86/X86LowerAMXType.cpp
60	I think we'd better check for exceptions, e.g.
	default:
	  llvm_unreachable("");
	case Intrinsic::x86_tileloadd64_internal:
	case Intrinsic::x86_tdpbssd_internal:
	case Intrinsic::x86_tilestored64_internal:
	  Row = II->getArgOperand(0);
	  Col = II->getArgOperand(1);
	  break;
122	Why not check for empty here, as on line 157?
204	Is it possible that the x86_amx operand isn't from an AMX intrinsic, e.g.
	%src = bitcast <256 x i32> %xxx to x86_amx
	%2 = bitcast x86_amx %src to <256 x i32>
221–224	Better move it to line 310.
236–237	Better to reuse the cast result, e.g.
	BitCastInst *BInst = dyn_cast<BitCastInst>(&Inst);
	if (!BInst )
	You can save several `cast<BitCastInst>(&Inst)` below.
263	Where does `x86_amx* %tile` come from? Shouldn't it have been converted to `x86_amx` before this bitcast, if it exists?
269	Maybe better to keep a duplicated `load` that calling `transformBitcast`. The same for line 285.
284	Why do we need to consider the case where <256 x i32> has more than one use?
llvm/test/CodeGen/X86/AMX/amx-across-func.ll
89–91	Better to remove these unused attributes. The same for other tests.
llvm/test/CodeGen/X86/AMX/amx-type.ll
67	For this and the next test, we have a chance to optimize to memcpy if we can be sure %s is the constant 64.
103	We don't need to check this case now, right?

Address Pengfei's comments.

Rebase and fix lit test case failure.

pengfei added inline comments. Dec 22 2020, 6:57 AM

llvm/lib/Target/X86/X86LowerAMXType.cpp
153	Why not put it in DeadBitcasts?
155	Can we leave the bitcast canonicalization cases to a separate patch? It's a bit complex here and I don't think it's a common case.
163	Maybe better to use BitCastInst?
309	Is this comment for the code above? Better to move it up.

Harbormaster completed remote builds in B83255: Diff 313315. Dec 22 2020, 7:02 AM

Harbormaster completed remote builds in B83264: Diff 313326. Dec 22 2020, 7:39 AM

LuoYuanke added inline comments. Dec 22 2020, 3:20 PM

llvm/lib/Target/X86/X86LowerAMXType.cpp
155	Ok, I'll create another patch for it.
163	There may be dead load or store instructions.
204	Good catch. I'll add support for this pattern.
236–237	That's good. Thanks.
llvm/test/CodeGen/X86/AMX/amx-across-func.ll
89–91	I'll create a separate patch to clean the attributes.
llvm/test/CodeGen/X86/AMX/amx-type.ll
67	If the stride is 64 we can transform the code to memcpy. How about doing it in another patch?
103	It checks that the load and store instructions are not transformed if they do not participate in an AMX operation. I prefer to keep the case.
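
The memcpy idea raised for amx-type.ll line 67 can be illustrated with a plain C++ model of a strided tile store. This is a sketch, not the AMX lowering: `storeTile` and its parameters are hypothetical, and the tile image is assumed to be packed with 64-byte rows. Rows are written `stride` bytes apart; when the stride equals the row width the destination rows are contiguous and the loop collapses into a single copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Copies `rows` rows of `colBytes` bytes each from a packed tile image to
// `dst`, placing consecutive rows `stride` bytes apart.
inline void storeTile(unsigned char *dst, const unsigned char *tile,
                      std::size_t rows, std::size_t colBytes,
                      std::size_t stride) {
  assert(stride >= colBytes && "rows must not overlap");
  if (stride == colBytes) {
    // Contiguous destination: one block copy is equivalent to the loop.
    std::memcpy(dst, tile, rows * colBytes);
    return;
  }
  for (std::size_t r = 0; r < rows; ++r)
    std::memcpy(dst + r * stride, tile + r * colBytes, colBytes);
}
```

The fast path is only valid when the stride is known to equal the row width, which is why the comment above conditions the optimization on the stride being the constant 64.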

Address Pengfei's comments.

Harbormaster completed remote builds in B83344: Diff 313458. Dec 22 2020, 6:18 PM

LuoYuanke added a child revision: D93740: [X86] Canonicalize AMX bitcast instruction. Dec 22 2020, 6:31 PM

LuoYuanke added inline comments. Dec 22 2020, 10:04 PM

llvm/lib/Target/X86/X86LowerAMXType.cpp

263

In my test case, it is transformed after Combine redundant instructions.

*** IR Dump After Simplify the CFG ***
define internal fastcc void @_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture %dst) unnamed_addr #4 {
entry:
  %row = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 0
  %0 = load i16, i16* %row, align 64, !tbaa !2
  %col = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 1
  %1 = load i16, i16* %col, align 2, !tbaa !7
  %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64) #6
  %3 = bitcast x86_amx %2 to <256 x i32>
  %tile = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 3
  store <256 x i32> %3, <256 x i32>* %tile, align 64, !tbaa !8
  ret void
}

*** IR Dump After Combine redundant instructions ***
; Function Attrs: alwaysinline nounwind uwtable mustprogress
define internal fastcc void @_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture %dst) unnamed_addr #4 {
entry:
  %row = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 0
  %0 = load i16, i16* %row, align 64, !tbaa !2
  %col = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 1
  %1 = load i16, i16* %col, align 2, !tbaa !7
  %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64) #6
  %tile = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 3
  %3 = bitcast <256 x i32>* %tile to x86_amx*
  store x86_amx %2, x86_amx* %3, align 64, !tbaa !8
  ret void
}

In my test case, it is transformed after Combine redundant instructions.

Can we disable it for AMX type? The pointer to AMX type is meaningless and may result in bad performance.

llvm/lib/Target/X86/X86LowerAMXType.cpp
153	I don't see any chance of this happening. But we still need to handle the x86_amx* here if possible, right? Maybe better to add an assertion for now: cast<PointerType>(Src->getType())->isX86_AMXTy()

In D91927#2469977, @pengfei wrote:

In my test case, it is transformed after Combine redundant instructions.

Can we disable it for AMX type? The pointer to AMX type is meaningless and may result in bad performance.

Ok, I'll disable the transform for AMX type.

Address Pengfei's comments.

LuoYuanke added a child revision: D93788: [X86] Transform amx pointer. Dec 23 2020, 6:04 PM

Improve the comments.

Harbormaster completed remote builds in B83442: Diff 313635. Dec 23 2020, 6:41 PM

Harbormaster completed remote builds in B83445: Diff 313639. Dec 23 2020, 7:06 PM

LuoYuanke added a child revision: D93792: [X86] Refactor AMX test case, remove unnecessary code. Dec 23 2020, 7:40 PM

In D91927#2470818, @LuoYuanke wrote:

In D91927#2469977, @pengfei wrote:

In my test case, it is transformed after Combine redundant instructions.

Can we disable it for AMX type? The pointer to AMX type is meaningless and may result in bad performance.

Ok, I'll disable the transform for AMX type.

Good job.

llvm/lib/Target/X86/X86LowerAMXType.cpp
50	Currently, we don't have HW type for v256i32. I think 64 bytes(512bits) should be enough here.
124–150	Value
126	How about the `Tile` comes from tdpbssd?
173	We don't need to align to 1024. 64 should be enough. The same for below comments.
197	How about the `Tile` comes from tdpbssd?

LuoYuanke added inline comments. Dec 23 2020, 11:19 PM

llvm/lib/Target/X86/X86LowerAMXType.cpp
126	We have a convention: when an AMX intrinsic defines an x86_amx tile, its first two operands are the shape of the defined tile. For tdpbssd, the intrinsic operands are (m, n, k, ...), and (m, n) is the shape of the produced tile.
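
The operand convention described here can be modelled with a few toy types. The sketch below is not the code from X86LowerAMXType.cpp: `ToyIntrinsic`, `ToyCall` and `getShape` are hypothetical, and only the leading integer operands of each call are represented. It also follows the earlier review suggestion of rejecting anything that is not one of the known AMX intrinsics.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical identifiers for the three internal AMX intrinsics.
enum class ToyIntrinsic { TileLoadD64, TdpbssD, TileStoreD64, Other };

struct ToyCall {
  ToyIntrinsic id;
  std::vector<std::uint16_t> operands; // leading integer operands only
};

// Returns (row, col) of the tile, read from the first two operands.
// For TdpbssD the operands are (m, n, k, ...) and the result shape is (m, n).
inline std::pair<std::uint16_t, std::uint16_t> getShape(const ToyCall &call) {
  switch (call.id) {
  case ToyIntrinsic::TileLoadD64:
  case ToyIntrinsic::TdpbssD:
  case ToyIntrinsic::TileStoreD64:
    assert(call.operands.size() >= 2 && "shape operands are missing");
    return {call.operands[0], call.operands[1]};
  case ToyIntrinsic::Other:
    break;
  }
  assert(false && "not an AMX intrinsic");
  return {0, 0};
}
```

Because the shape always sits in the same two positions, a single lookup covers tiles produced by loads and by the dot-product intrinsic alike, which answers the "what if the tile comes from tdpbssd" question above.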

pengfei added inline comments. Dec 23 2020, 11:35 PM

llvm/lib/Target/X86/X86LowerAMXType.cpp
126	Oh, yes. I missed that. Thanks.
244	vector
316	Why do we need to delete them recursively? I think deleting the nodes in DeadInsts is enough.

Address Pengfei's comments.

Refine comments.

LuoYuanke marked 2 inline comments as done. Dec 24 2020, 12:30 AM

LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if others have objections.

This revision is now accepted and ready to land. Dec 24 2020, 12:34 AM

In D91927#2471140, @pengfei wrote:

LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if others have objections.

Thanks, Pengfei, for the review. Sure, I'll wait for a few days.

Harbormaster completed remote builds in B83465: Diff 313663. Dec 24 2020, 1:04 AM

Harbormaster completed remote builds in B83468: Diff 313669. Dec 24 2020, 1:17 AM

LuoYuanke added a child revision: D93898: [X86] Fix tile register spill issue. Dec 29 2020, 4:10 AM

This revision was landed with ongoing or failed builds. Dec 29 2020, 9:52 PM

Closed by commit rG981a0bd85811: [X86] Add x86_amx type for intel AMX. (authored by LuoYuanke).

This revision was automatically updated to reflect the committed changes.

LuoYuanke added a commit: rG981a0bd85811: [X86] Add x86_amx type for intel AMX.

uabelho added a subscriber: uabelho. Dec 30 2020, 6:11 AM

uabelho added inline comments.

llvm/include/llvm/IR/Type.h
68	This addition causes a compilation warning in HexagonTargetObjectFile.cpp:
	../lib/Target/Hexagon/HexagonTargetObjectFile.cpp:297:11: error: enumeration value 'X86_AMXTyID' not handled in switch [-Werror,-Wswitch]
	switch (Ty->getTypeID()) {
	^
	1 error generated.
	Seen in build bots, e.g. here: http://lab.llvm.org:8011/#/builders/57/builds/2889/steps/6/logs/stdio

pengfei added inline comments. Dec 30 2020, 6:35 AM

llvm/include/llvm/IR/Type.h
68	Thanks Mikael for pointing it out. I think we just need to put the type in the switch table. I've posted a patch to fix it. rG16c2067cf212.
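
The failure mode and its fix can be reproduced in miniature. The sketch below uses a hypothetical `ToyTypeID` enum and `sizeInBits` function rather than LLVM's real `Type::TypeID` list; the bit widths mirror the ones visible in the DataLayout.h hunk of this revision. Because the switch has no `default:` label, a compiler running with `-Wswitch` can report any enumerator that is added later without a matching case.

```cpp
#include <cassert>

// Hypothetical miniature of a type-ID enum; not LLVM's real list.
enum class ToyTypeID { Float, Double, X86_MMX, X86_AMX };

// Deliberately written without a `default:` label so that -Wswitch can
// flag an enumerator that has no case.
inline unsigned sizeInBits(ToyTypeID id) {
  switch (id) {
  case ToyTypeID::Float:
    return 32;
  case ToyTypeID::Double:
  case ToyTypeID::X86_MMX:
    return 64;
  case ToyTypeID::X86_AMX: // the case a newly added enumerator needs
    return 8192;
  }
  assert(false && "unhandled ToyTypeID");
  return 0;
}
```

Removing the `X86_AMX` case from this switch reproduces the kind of diagnostic quoted above; adding the case, as the follow-up commit is described as doing, silences it.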

uabelho added inline comments. Dec 30 2020, 6:48 AM

llvm/include/llvm/IR/Type.h
68	Yep, thanks!

D93944 fixed an llvm-c-test issue. Note, adding new enum members usually requires check-all (at least check-llvm, but Clang may use these enums as well) because they can be used everywhere.

Thanks, @pengfei and @MaskRay.

LuoYuanke removed a child revision: D92837: [X86] Support tilezero intrinsic and c interface for AMX. Dec 30 2020, 5:32 PM

LuoYuanke added a child revision: D94372: [X86][AMX] Prohibit pointer cast on load. Jan 9 2021, 10:11 PM

cuviper added a subscriber: cuviper. Jan 27 2021, 1:20 PM

cuviper added inline comments.

llvm/include/llvm-c/Core.h
163	This is a breaking change to the C ABI -- can we move it to the end of the enum? https://bugs.llvm.org/show_bug.cgi?id=48905
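
Why inserting an enumerator in the middle breaks binary compatibility, while appending does not, can be checked with three small enums. The names below (`KindBefore`, `KindInserted`, `KindAppended`, `oldClientSeesToken`) are hypothetical and model only the tail of the enum shown in this revision; the numeric values are those of the toy enums, not the real `LLVMTypeKind` values.

```cpp
#include <cassert>

// Hypothetical tail of a C type-kind enum before the change.
enum KindBefore { B_Metadata, B_X86_MMX, B_Token, B_ScalableVector, B_BFloat };

// New member inserted in the middle: every later member is renumbered.
enum KindInserted {
  I_Metadata, I_X86_MMX, I_X86_AMX, I_Token, I_ScalableVector, I_BFloat
};

// New member appended at the end: existing members keep their values.
enum KindAppended {
  A_Metadata, A_X86_MMX, A_Token, A_ScalableVector, A_BFloat, A_X86_AMX
};

static_assert(int(I_Token) == int(B_Token) + 1,
              "insertion shifts the members that follow");
static_assert(int(A_Token) == int(B_Token) && int(A_BFloat) == int(B_BFloat),
              "appending leaves existing values untouched");

// A client compiled against KindBefore interprets raw integers like this.
inline bool oldClientSeesToken(int rawKind) { return rawKind == B_Token; }
```

An old client handed `I_X86_AMX` by a newer library would treat it as a token type, which is the kind of silent mismatch the request to move the new member to the end of the enum avoids.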

MaskRay added inline comments. Jan 27 2021, 4:32 PM

llvm/include/llvm-c/Core.h
163	Done in 6612c2bb68becda5504099b48082c844503c6d4c

LuoYuanke added inline comments. Jan 27 2021, 5:20 PM

llvm/include/llvm-c/Core.h
163	@MaskRay, thank you!

ychen added a subscriber: ychen. Mar 17 2021, 9:12 PM

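Revision Contents

Path                                                              Size
clang/test/CodeGen/X86/amx_api.c                                  13 lines
llvm/include/llvm-c/Core.h                                        7 lines
llvm/include/llvm/Bitcode/LLVMBitCodes.h                          3 lines
llvm/include/llvm/CodeGen/ValueTypes.td                           1 line
llvm/include/llvm/IR/DataLayout.h                                 2 lines
llvm/include/llvm/IR/Intrinsics.h                                 3 lines
llvm/include/llvm/IR/Intrinsics.td                                2 lines
llvm/include/llvm/IR/IntrinsicsX86.td                             32 lines
llvm/include/llvm/IR/Type.h                                       12 lines
llvm/include/llvm/Support/MachineValueType.h                      4 lines
llvm/lib/Analysis/ConstantFolding.cpp                             15 lines
llvm/lib/AsmParser/LLLexer.cpp                                    1 line
llvm/lib/Bitcode/Reader/BitcodeReader.cpp                         3 lines
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp                         1 line
llvm/lib/CodeGen/ValueTypes.cpp                                   3 lines
llvm/lib/IR/AsmWriter.cpp                                         1 line
llvm/lib/IR/ConstantFold.cpp                                      2 lines
llvm/lib/IR/Core.cpp                                              8 lines
llvm/lib/IR/DataLayout.cpp                                        2 lines
llvm/lib/IR/Function.cpp                                          9 lines
llvm/lib/IR/LLVMContextImpl.h                                     2 lines
llvm/lib/IR/LLVMContextImpl.cpp                                   1 line
llvm/lib/IR/Type.cpp                                              15 lines
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp                           4 lines
llvm/lib/Target/X86/X86ISelLowering.cpp                           7 lines
llvm/lib/Target/X86/X86LowerAMXType.cpp                           455 lines
llvm/lib/Target/X86/X86RegisterInfo.td                            2 lines
llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp    4 lines
llvm/test/CodeGen/X86/AMX/amx-across-func.ll                      16 lines
llvm/test/CodeGen/X86/AMX/amx-config.ll                           28 lines
llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll                  24 lines
llvm/test/CodeGen/X86/AMX/amx-spill.ll                            48 lines
llvm/test/CodeGen/X86/AMX/amx-type.ll                             185 lines
llvm/utils/TableGen/CodeGenTarget.cpp                             1 line
llvm/utils/TableGen/IntrinsicEmitter.cpp                          4 lines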

Diff 314066

clang/test/CodeGen/X86/amx_api.c

 // RUN: %clang_cc1 %s -ffreestanding -triple=x86_64-unknown-unknown -target-feature +avx512f -target-feature +amx-int8 \
 // RUN: -target-feature +amx-bf16 -emit-llvm -o - -Werror -pedantic | FileCheck %s --check-prefixes=CHECK

 #include <immintrin.h>

 char buf[1024];
 #define STRIDE 32

 char buf2[1024];

 // This is an example code and integration test.
 void test_api(int cond, short row, short col) {
   //CHECK-LABEL: @test_api
-  //CHECK: call <256 x i32> @llvm.x86.tileloadd64.internal
-  //CHECK: call <256 x i32> @llvm.x86.tdpbssd.internal
+  //CHECK: call x86_amx @llvm.x86.tileloadd64.internal
+  //CHECK: call x86_amx @llvm.x86.tdpbssd.internal
   //CHECK: call void @llvm.x86.tilestored64.internal
   __tile1024i a = {row, 8};
   __tile1024i b = {8, col};
   __tile1024i c = {row, col};

   if (cond) {
     __tile_loadd(&a, buf, STRIDE);
     __tile_loadd(&b, buf, STRIDE);
     __tile_loadd(&c, buf, STRIDE);
   } else {
     __tile_loadd(&a, buf2, STRIDE);
     __tile_loadd(&b, buf2, STRIDE);
     __tile_loadd(&c, buf2, STRIDE);
   }
   __tile_dpbsud(&c, a, b);
   __tile_stored(buf, STRIDE, c);
 }

 void test_tile_loadd(short row, short col) {
   //CHECK-LABEL: @test_tile_loadd
-  //CHECK: call <256 x i32> @llvm.x86.tileloadd64.internal
+  //CHECK: call x86_amx @llvm.x86.tileloadd64.internal
+  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
   __tile1024i a = {row, col};
   __tile_loadd(&a, buf, STRIDE);
 }

 void test_tile_dpbsud(__tile1024i a, __tile1024i b, __tile1024i c) {
   //CHECK-LABEL: @test_tile_dpbsud
-  //CHECK: call <256 x i32> @llvm.x86.tdpbssd.internal
+  //CHECK: call x86_amx @llvm.x86.tdpbssd.internal
+  //CHECK-NEXT: {{%.*}} = bitcast x86_amx {{%.*}} to <256 x i32>
   __tile_dpbsud(&c, a, b);
 }

 void test_tile_stored(__tile1024i c) {
   //CHECK-LABEL: @test_tile_stored
-  //CHECK: call void @llvm.x86.tilestored64.internal
+  //CHECK: {{%.*}} = bitcast <256 x i32> {{%.*}} to x86_amx
+  //CHECK-NEXT: call void @llvm.x86.tilestored64.internal
   __tile_stored(buf, STRIDE, c);
 }

llvm/include/llvm-c/Core.h

@@ ... @@ typedef enum {
   LLVMIntegerTypeKind, /**< Arbitrary bit width integers */
   LLVMFunctionTypeKind, /**< Functions */
   LLVMStructTypeKind, /**< Structures */
   LLVMArrayTypeKind, /**< Arrays */
   LLVMPointerTypeKind, /**< Pointers */
   LLVMVectorTypeKind, /**< Fixed width SIMD vector type */
   LLVMMetadataTypeKind, /**< Metadata */
   LLVMX86_MMXTypeKind, /**< X86 MMX */
+  LLVMX86_AMXTypeKind, /**< X86 AMX */
   LLVMTokenTypeKind, /**< Tokens */
   LLVMScalableVectorTypeKind, /**< Scalable SIMD vector type */
   LLVMBFloatTypeKind /**< 16 bit brain floating point type */
 } LLVMTypeKind;

 typedef enum {
   LLVMExternalLinkage, /**< Externally visible function */
   LLVMAvailableExternallyLinkage,
@@ ... @@
 LLVMTypeRef LLVMLabelTypeInContext(LLVMContextRef C);

 /**
  * Create a X86 MMX type in a context.
  */
 LLVMTypeRef LLVMX86MMXTypeInContext(LLVMContextRef C);

 /**
+ * Create a X86 AMX type in a context.
+ */
+LLVMTypeRef LLVMX86AMXTypeInContext(LLVMContextRef C);
+
+/**
  * Create a token type in a context.
  */
 LLVMTypeRef LLVMTokenTypeInContext(LLVMContextRef C);

 /**
  * Create a metadata type in a context.
  */
 LLVMTypeRef LLVMMetadataTypeInContext(LLVMContextRef C);

 /**
  * These are similar to the above functions except they operate on the
  * global context.
  */
 LLVMTypeRef LLVMVoidType(void);
 LLVMTypeRef LLVMLabelType(void);
 LLVMTypeRef LLVMX86MMXType(void);
+LLVMTypeRef LLVMX86AMXType(void);

 /**
  * @}
  */

 /**
  * @}
  */

llvm/include/llvm/Bitcode/LLVMBitCodes.h

@@ ... @@ enum TypeCodes {
   TYPE_CODE_STRUCT_ANON = 18, // STRUCT_ANON: [ispacked, eltty x N]
   TYPE_CODE_STRUCT_NAME = 19, // STRUCT_NAME: [strchr x N]
   TYPE_CODE_STRUCT_NAMED = 20, // STRUCT_NAMED: [ispacked, eltty x N]

   TYPE_CODE_FUNCTION = 21, // FUNCTION: [vararg, retty, paramty x N]

   TYPE_CODE_TOKEN = 22, // TOKEN

-  TYPE_CODE_BFLOAT = 23 // BRAIN FLOATING POINT
+  TYPE_CODE_BFLOAT = 23, // BRAIN FLOATING POINT
+  TYPE_CODE_X86_AMX = 24 // X86 AMX
 };

 enum OperandBundleTagCode {
   OPERAND_BUNDLE_TAG = 1, // TAG: [strchr x N]
 };

 enum SyncScopeNameCode {
   SYNC_SCOPE_NAME = 1,

llvm/include/llvm/CodeGen/ValueTypes.td

@@ ... @@
 def x86mmx : ValueType<64 , 157>; // X86 MMX value
 def FlagVT : ValueType<0 , 158>; // Pre-RA sched glue
 def isVoid : ValueType<0 , 159>; // Produces no value
 def untyped: ValueType<8 , 160>; // Produces an untyped value
 def exnref : ValueType<0 , 161>; // WebAssembly's exnref type
 def funcref : ValueType<0 , 162>; // WebAssembly's funcref type
 def externref : ValueType<0 , 163>; // WebAssembly's externref type
+def x86amx : ValueType<8192, 164>; // X86 AMX value


 def token : ValueType<0 , 248>; // TokenTy
 def MetadataVT: ValueType<0, 249>; // Metadata

 // Pseudo valuetype mapped to the current pointer size to any address space.
 // Should only be used in TableGen.
 def iPTRAny : ValueType<0, 250>;

llvm/include/llvm/IR/DataLayout.h

@@ ... @@ inline TypeSize DataLayout::getTypeSizeInBits(Type *Ty) const {
   case Type::FloatTyID:
     return TypeSize::Fixed(32);
   case Type::DoubleTyID:
   case Type::X86_MMXTyID:
     return TypeSize::Fixed(64);
   case Type::PPC_FP128TyID:
   case Type::FP128TyID:
     return TypeSize::Fixed(128);
+  case Type::X86_AMXTyID:
+    return TypeSize::Fixed(8192);
   // In memory objects this is always aligned to a higher boundary, but
   // only 80 bits contain information.
   case Type::X86_FP80TyID:
     return TypeSize::Fixed(80);
   case Type::FixedVectorTyID:
   case Type::ScalableVectorTyID: {
     VectorType *VTy = cast<VectorType>(Ty);
     auto EltCnt = VTy->getElementCount();

llvm/include/llvm/IR/Intrinsics.h

@@ ... @@ enum IITDescriptorKind {
     HalfVecArgument,
     SameVecWidthArgument,
     PtrToArgument,
     PtrToElt,
     VecOfAnyPtrsToElt,
     VecElementArgument,
     Subdivide2Argument,
     Subdivide4Argument,
-    VecOfBitcastsToInt
+    VecOfBitcastsToInt,
+    AMX
   } Kind;

   union {
     unsigned Integer_Width;
     unsigned Float_Width;
     unsigned Pointer_AddressSpace;
     unsigned Struct_NumElements;
     unsigned Argument_Info;

llvm/include/llvm/IR/Intrinsics.td

@@ ... @@
 def llvm_empty_ty : LLVMType<OtherVT>; // { }
 def llvm_descriptor_ty : LLVMPointerType<llvm_empty_ty>; // { }*
 def llvm_metadata_ty : LLVMType<MetadataVT>; // !{...}
 def llvm_token_ty : LLVMType<token>; // token

 def llvm_x86mmx_ty : LLVMType<x86mmx>;
 def llvm_ptrx86mmx_ty : LLVMPointerType<llvm_x86mmx_ty>; // <1 x i64>*

+def llvm_x86amx_ty : LLVMType<x86amx>;
+
 def llvm_v2i1_ty : LLVMType<v2i1>; // 2 x i1
 def llvm_v4i1_ty : LLVMType<v4i1>; // 4 x i1
 def llvm_v8i1_ty : LLVMType<v8i1>; // 8 x i1
 def llvm_v16i1_ty : LLVMType<v16i1>; // 16 x i1
 def llvm_v32i1_ty : LLVMType<v32i1>; // 32 x i1
 def llvm_v64i1_ty : LLVMType<v64i1>; // 64 x i1
 def llvm_v128i1_ty : LLVMType<v128i1>; // 128 x i1
 def llvm_v256i1_ty : LLVMType<v256i1>; // 256 x i1

llvm/include/llvm/IR/IntrinsicsX86.td

@@ ... @@ let TargetPrefix = "x86" in {
   def int_x86_tdpbuud : GCCBuiltin<"__builtin_ia32_tdpbuud">,
       Intrinsic<[], [llvm_i8_ty, llvm_i8_ty, llvm_i8_ty],
                 [ImmArg<ArgIndex<0>>, ImmArg<ArgIndex<1>>,
                  ImmArg<ArgIndex<2>>]>;
   def int_x86_tdpbf16ps : GCCBuiltin<"__builtin_ia32_tdpbf16ps">,
       Intrinsic<[], [llvm_i8_ty, llvm_i8_ty, llvm_i8_ty],
                 [ImmArg<ArgIndex<0>>, ImmArg<ArgIndex<1>>,
                  ImmArg<ArgIndex<2>>]>;
+  // AMX - internal intrinsics
+  def int_x86_tileloadd64_internal :
+      GCCBuiltin<"__builtin_ia32_tileloadd64_internal">,
+      Intrinsic<[llvm_x86amx_ty],
+                [llvm_i16_ty, llvm_i16_ty, llvm_ptr_ty, llvm_i64_ty],
+                []>;
+  def int_x86_tdpbssd_internal :
+      GCCBuiltin<"__builtin_ia32_tdpbssd_internal">,
+      Intrinsic<[llvm_x86amx_ty],
+                [llvm_i16_ty, llvm_i16_ty, llvm_i16_ty,
+                 llvm_x86amx_ty, llvm_x86amx_ty,
+                 llvm_x86amx_ty], []>;
+  def int_x86_tilestored64_internal :
+      GCCBuiltin<"__builtin_ia32_tilestored64_internal">,
+      Intrinsic<[], [llvm_i16_ty, llvm_i16_ty, llvm_ptr_ty,
+                     llvm_i64_ty, llvm_x86amx_ty], []>;
 }

 //===----------------------------------------------------------------------===//
 // UINTR - User Level Interrupt

 let TargetPrefix = "x86" in {
   def int_x86_clui : GCCBuiltin<"__builtin_ia32_clui">,
       Intrinsic<[], [], []>;
   def int_x86_stui : GCCBuiltin<"__builtin_ia32_stui">,
       Intrinsic<[], [], []>;
   def int_x86_testui : GCCBuiltin<"__builtin_ia32_testui">,
       Intrinsic<[llvm_i8_ty], [], []>;
   def int_x86_senduipi : GCCBuiltin<"__builtin_ia32_senduipi">,
       Intrinsic<[], [llvm_i64_ty], []>;
-  // AMX - internal intrinsics
-  def int_x86_tileloadd64_internal :
-      GCCBuiltin<"__builtin_ia32_tileloadd64_internal">,
-      Intrinsic<[llvm_v256i32_ty],
-                [llvm_i16_ty, llvm_i16_ty, llvm_ptr_ty, llvm_i64_ty],
-                []>;
-  def int_x86_tdpbssd_internal :
-      GCCBuiltin<"__builtin_ia32_tdpbssd_internal">,
Intrinsic<[llvm_v256i32_ty],
[llvm_i16_ty, llvm_i16_ty, llvm_i16_ty,
llvm_v256i32_ty, llvm_v256i32_ty,
llvm_v256i32_ty], []>;
def int_x86_tilestored64_internal :
GCCBuiltin<"__builtin_ia32_tilestored64_internal">,
Intrinsic<[], [llvm_i16_ty, llvm_i16_ty, llvm_ptr_ty,
llvm_i64_ty, llvm_v256i32_ty], []>;
}		}

llvm/include/llvm/IR/Type.h

Show First 20 Lines • Show All 59 Lines • ▼ Show 20 Lines	enum TypeID {
DoubleTyID, ///< 64-bit floating point type		DoubleTyID, ///< 64-bit floating point type
X86_FP80TyID, ///< 80-bit floating point type (X87)		X86_FP80TyID, ///< 80-bit floating point type (X87)
FP128TyID, ///< 128-bit floating point type (112-bit significand)		FP128TyID, ///< 128-bit floating point type (112-bit significand)
PPC_FP128TyID, ///< 128-bit floating point type (two 64-bits, PowerPC)		PPC_FP128TyID, ///< 128-bit floating point type (two 64-bits, PowerPC)
VoidTyID, ///< type with no size		VoidTyID, ///< type with no size
LabelTyID, ///< Labels		LabelTyID, ///< Labels
MetadataTyID, ///< Metadata		MetadataTyID, ///< Metadata
X86_MMXTyID, ///< MMX vectors (64 bits, X86 specific)		X86_MMXTyID, ///< MMX vectors (64 bits, X86 specific)
		X86_AMXTyID, ///< AMX vectors (8192 bits, X86 specific)
		uabelho: This addition causes a compilation warning in HexagonTargetObjectFile.cpp:
		../lib/Target/Hexagon/HexagonTargetObjectFile.cpp:297:11: error: enumeration value 'X86_AMXTyID' not handled in switch [-Werror,-Wswitch]
		switch (Ty->getTypeID()) {
		^
		1 error generated.
		Seen in build bots, e.g. here: http://lab.llvm.org:8011/#/builders/57/builds/2889/steps/6/logs/stdio
		pengfei: Thanks Mikael for pointing it out. I think we just need to put the type in the switch table. I've posted a patch to fix it: rG16c2067cf212.
		uabelho: Yep, thanks!
TokenTyID, ///< Tokens		TokenTyID, ///< Tokens

// Derived types... see DerivedTypes.h file.		// Derived types... see DerivedTypes.h file.
IntegerTyID, ///< Arbitrary bit width integers		IntegerTyID, ///< Arbitrary bit width integers
FunctionTyID, ///< Functions		FunctionTyID, ///< Functions
PointerTyID, ///< Pointers		PointerTyID, ///< Pointers
StructTyID, ///< Structures		StructTyID, ///< Structures
ArrayTyID, ///< Arrays		ArrayTyID, ///< Arrays
▲ Show 20 Lines • Show All 101 Lines • ▼ Show 20 Lines	const fltSemantics &getFltSemantics() const {
case PPC_FP128TyID: return APFloat::PPCDoubleDouble();		case PPC_FP128TyID: return APFloat::PPCDoubleDouble();
default: llvm_unreachable("Invalid floating type");		default: llvm_unreachable("Invalid floating type");
}		}
}		}

/// Return true if this is X86 MMX.		/// Return true if this is X86 MMX.
bool isX86_MMXTy() const { return getTypeID() == X86_MMXTyID; }		bool isX86_MMXTy() const { return getTypeID() == X86_MMXTyID; }

		/// Return true if this is X86 AMX.
		bool isX86_AMXTy() const { return getTypeID() == X86_AMXTyID; }

/// Return true if this is a FP type or a vector of FP.		/// Return true if this is a FP type or a vector of FP.
bool isFPOrFPVectorTy() const { return getScalarType()->isFloatingPointTy(); }		bool isFPOrFPVectorTy() const { return getScalarType()->isFloatingPointTy(); }

/// Return true if this is 'label'.		/// Return true if this is 'label'.
bool isLabelTy() const { return getTypeID() == LabelTyID; }		bool isLabelTy() const { return getTypeID() == LabelTyID; }

/// Return true if this is 'metadata'.		/// Return true if this is 'metadata'.
bool isMetadataTy() const { return getTypeID() == MetadataTyID; }		bool isMetadataTy() const { return getTypeID() == MetadataTyID; }
▲ Show 20 Lines • Show All 54 Lines • ▼ Show 20 Lines	public:
bool isFirstClassType() const {		bool isFirstClassType() const {
return getTypeID() != FunctionTyID && getTypeID() != VoidTyID;		return getTypeID() != FunctionTyID && getTypeID() != VoidTyID;
}		}

/// Return true if the type is a valid type for a register in codegen. This		/// Return true if the type is a valid type for a register in codegen. This
/// includes all first-class types except struct and array types.		/// includes all first-class types except struct and array types.
bool isSingleValueType() const {		bool isSingleValueType() const {
return isFloatingPointTy() || isX86_MMXTy() || isIntegerTy() ||		return isFloatingPointTy() || isX86_MMXTy() || isIntegerTy() ||
isPointerTy() || isVectorTy();		isPointerTy() || isVectorTy() || isX86_AMXTy();
}		}

/// Return true if the type is an aggregate type. This means it is valid as		/// Return true if the type is an aggregate type. This means it is valid as
/// the first operand of an insertvalue or extractvalue instruction. This		/// the first operand of an insertvalue or extractvalue instruction. This
/// includes struct and array types, but does not include vector types.		/// includes struct and array types, but does not include vector types.
bool isAggregateType() const {		bool isAggregateType() const {
return getTypeID() == StructTyID || getTypeID() == ArrayTyID;		return getTypeID() == StructTyID || getTypeID() == ArrayTyID;
}		}

/// Return true if it makes sense to take the size of this type. To get the		/// Return true if it makes sense to take the size of this type. To get the
/// actual size for a particular target, it is reasonable to use the		/// actual size for a particular target, it is reasonable to use the
/// DataLayout subsystem to do this.		/// DataLayout subsystem to do this.
bool isSized(SmallPtrSetImpl<Type*> *Visited = nullptr) const {		bool isSized(SmallPtrSetImpl<Type*> *Visited = nullptr) const {
// If it's a primitive, it is always sized.		// If it's a primitive, it is always sized.
if (getTypeID() == IntegerTyID || isFloatingPointTy() ||		if (getTypeID() == IntegerTyID || isFloatingPointTy() ||
getTypeID() == PointerTyID ||		getTypeID() == PointerTyID || getTypeID() == X86_MMXTyID ||
getTypeID() == X86_MMXTyID)		getTypeID() == X86_AMXTyID)
return true;		return true;
// If it is not something that can have a size (e.g. a function or label),		// If it is not something that can have a size (e.g. a function or label),
// it doesn't have a size.		// it doesn't have a size.
if (getTypeID() != StructTyID && getTypeID() != ArrayTyID && !isVectorTy())		if (getTypeID() != StructTyID && getTypeID() != ArrayTyID && !isVectorTy())
return false;		return false;
// Otherwise we have to try harder to decide.		// Otherwise we have to try harder to decide.
return isSizedDerivedType(Visited);		return isSizedDerivedType(Visited);
}		}
▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	public:
static Type *getBFloatTy(LLVMContext &C);		static Type *getBFloatTy(LLVMContext &C);
static Type *getFloatTy(LLVMContext &C);		static Type *getFloatTy(LLVMContext &C);
static Type *getDoubleTy(LLVMContext &C);		static Type *getDoubleTy(LLVMContext &C);
static Type *getMetadataTy(LLVMContext &C);		static Type *getMetadataTy(LLVMContext &C);
static Type *getX86_FP80Ty(LLVMContext &C);		static Type *getX86_FP80Ty(LLVMContext &C);
static Type *getFP128Ty(LLVMContext &C);		static Type *getFP128Ty(LLVMContext &C);
static Type *getPPC_FP128Ty(LLVMContext &C);		static Type *getPPC_FP128Ty(LLVMContext &C);
static Type *getX86_MMXTy(LLVMContext &C);		static Type *getX86_MMXTy(LLVMContext &C);
		static Type *getX86_AMXTy(LLVMContext &C);
static Type *getTokenTy(LLVMContext &C);		static Type *getTokenTy(LLVMContext &C);
static IntegerType *getIntNTy(LLVMContext &C, unsigned N);		static IntegerType *getIntNTy(LLVMContext &C, unsigned N);
static IntegerType *getInt1Ty(LLVMContext &C);		static IntegerType *getInt1Ty(LLVMContext &C);
static IntegerType *getInt8Ty(LLVMContext &C);		static IntegerType *getInt8Ty(LLVMContext &C);
static IntegerType *getInt16Ty(LLVMContext &C);		static IntegerType *getInt16Ty(LLVMContext &C);
static IntegerType *getInt32Ty(LLVMContext &C);		static IntegerType *getInt32Ty(LLVMContext &C);
static IntegerType *getInt64Ty(LLVMContext &C);		static IntegerType *getInt64Ty(LLVMContext &C);
static IntegerType *getInt128Ty(LLVMContext &C);		static IntegerType *getInt128Ty(LLVMContext &C);
Show All 39 Lines	public:
static PointerType *getHalfPtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getHalfPtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getBFloatPtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getBFloatPtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getFloatPtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getFloatPtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getDoublePtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getDoublePtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getX86_FP80PtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getX86_FP80PtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getFP128PtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getFP128PtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getPPC_FP128PtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getPPC_FP128PtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getX86_MMXPtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getX86_MMXPtrTy(LLVMContext &C, unsigned AS = 0);
		static PointerType *getX86_AMXPtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getIntNPtrTy(LLVMContext &C, unsigned N, unsigned AS = 0);		static PointerType *getIntNPtrTy(LLVMContext &C, unsigned N, unsigned AS = 0);
static PointerType *getInt1PtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getInt1PtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getInt8PtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getInt8PtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getInt16PtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getInt16PtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getInt32PtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getInt32PtrTy(LLVMContext &C, unsigned AS = 0);
static PointerType *getInt64PtrTy(LLVMContext &C, unsigned AS = 0);		static PointerType *getInt64PtrTy(LLVMContext &C, unsigned AS = 0);

/// Return a pointer to the current type. This is equivalent to		/// Return a pointer to the current type. This is equivalent to
Show All 39 Lines
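
The Type.h changes above add one enumerator and extend two predicates to accept it. A minimal, self-contained sketch of that shape follows. It is a toy model and not `llvm::Type`: `ToyTypeID`, `ToyType` and `toyTypeName` are hypothetical names, and only a handful of enumerators are modelled. The switch without a `default` shows why a new enumerator can trip `-Wswitch` in unrelated code, as reported in the inline comment on `X86_AMXTyID`.

```cpp
#include <cassert>
#include <string>

// Toy model only: a few enumerators mirroring names in the diff.
// This is NOT llvm::Type; the set of values is illustrative.
enum class ToyTypeID { Void, Function, Integer, Pointer, X86_MMX, X86_AMX, Token };

struct ToyType {
  ToyTypeID ID;

  bool isX86_MMXTy() const { return ID == ToyTypeID::X86_MMX; }
  bool isX86_AMXTy() const { return ID == ToyTypeID::X86_AMX; }

  // With the patch, x86_amx counts as a single-value type.
  bool isSingleValueType() const {
    return ID == ToyTypeID::Integer || ID == ToyTypeID::Pointer ||
           isX86_MMXTy() || isX86_AMXTy();
  }

  // With the patch, x86_amx is a sized primitive, like x86_mmx.
  bool isSized() const {
    return ID == ToyTypeID::Integer || ID == ToyTypeID::Pointer ||
           isX86_MMXTy() || isX86_AMXTy();
  }
};

// A switch with no default has to name every enumerator; omitting
// X86_AMX here would produce a -Wswitch diagnostic.
inline std::string toyTypeName(ToyTypeID ID) {
  switch (ID) {
  case ToyTypeID::Void:     return "void";
  case ToyTypeID::Function: return "function";
  case ToyTypeID::Integer:  return "integer";
  case ToyTypeID::Pointer:  return "pointer";
  case ToyTypeID::X86_MMX:  return "x86_mmx";
  case ToyTypeID::X86_AMX:  return "x86_amx";
  case ToyTypeID::Token:    return "token";
  }
  return "unknown";
}
```

In this sketch `ToyType{ToyTypeID::X86_AMX}.isSized()` is true while `ToyType{ToyTypeID::Token}.isSized()` is false, matching the intent of the predicates in the diff.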

llvm/include/llvm/Support/MachineValueType.h

Show First 20 Lines • Show All 241 Lines • ▼ Show 20 Lines	enum SimpleValueType : uint8_t {

Untyped = 160, // This value takes a register, but has		Untyped = 160, // This value takes a register, but has
// unspecified type. The register class		// unspecified type. The register class
// will be determined by the opcode.		// will be determined by the opcode.

exnref = 161, // WebAssembly's exnref type		exnref = 161, // WebAssembly's exnref type
funcref = 162, // WebAssembly's funcref type		funcref = 162, // WebAssembly's funcref type
externref = 163, // WebAssembly's externref type		externref = 163, // WebAssembly's externref type
		x86amx = 164, // This is an X86 AMX value

FIRST_VALUETYPE = 1, // This is always the beginning of the list.		FIRST_VALUETYPE = 1, // This is always the beginning of the list.
LAST_VALUETYPE = 164, // This always remains at the end of the list.		LAST_VALUETYPE = 165, // This always remains at the end of the list.

// This is the current maximum for LAST_VALUETYPE.		// This is the current maximum for LAST_VALUETYPE.
// MVT::MAX_ALLOWED_VALUETYPE is used for asserts and to size bit vectors		// MVT::MAX_ALLOWED_VALUETYPE is used for asserts and to size bit vectors
// This value must be a multiple of 32.		// This value must be a multiple of 32.
MAX_ALLOWED_VALUETYPE = 192,		MAX_ALLOWED_VALUETYPE = 192,

// A value of type llvm::TokenTy		// A value of type llvm::TokenTy
token = 248,		token = 248,
▲ Show 20 Lines • Show All 700 Lines • ▼ Show 20 Lines	TypeSize getSizeInBits() const {
case nxv32i64: return TypeSize::Scalable(2048);		case nxv32i64: return TypeSize::Scalable(2048);
case v128i32:		case v128i32:
case v64i64:		case v64i64:
case v128f32:		case v128f32:
case v64f64: return TypeSize::Fixed(4096);		case v64f64: return TypeSize::Fixed(4096);
case v256i32:		case v256i32:
case v128i64:		case v128i64:
case v256f32:		case v256f32:
		case x86amx:
case v128f64: return TypeSize::Fixed(8192);		case v128f64: return TypeSize::Fixed(8192);
case v512i32:		case v512i32:
case v256i64:		case v256i64:
case v512f32:		case v512f32:
case v256f64: return TypeSize::Fixed(16384);		case v256f64: return TypeSize::Fixed(16384);
case v1024i32:		case v1024i32:
case v1024f32: return TypeSize::Fixed(32768);		case v1024f32: return TypeSize::Fixed(32768);
case v2048i32:		case v2048i32:
▲ Show 20 Lines • Show All 425 Lines • Show Last 20 Lines
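
In `getSizeInBits()` the new `x86amx` case shares the `TypeSize::Fixed(8192)` result with `v256i32`, which is consistent with the `<256 x i32>` bitcasts described in the summary. The arithmetic can be checked with a small sketch; the constant names are hypothetical and not LLVM identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: these names are illustrative only.
constexpr std::uint64_t kElements = 256;      // element count of <256 x i32>
constexpr std::uint64_t kBitsPerElement = 32; // width of i32
constexpr std::uint64_t kAmxBits = kElements * kBitsPerElement;
constexpr std::uint64_t kAmxBytes = kAmxBits / 8;

// Same value as the TypeSize::Fixed(8192) case that x86amx joins.
static_assert(kAmxBits == 8192, "x86amx is modelled as 8192 bits");
// Same footprint as a maximal AMX tile: 16 rows of 64 bytes.
static_assert(kAmxBytes == 16 * 64, "1 KiB per tile");
```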

llvm/lib/Analysis/ConstantFolding.cpp

Show First 20 Lines • Show All 99 Lines • ▼ Show 20 Lines
/// Constant fold bitcast, symbolically evaluating it with DataLayout.		/// Constant fold bitcast, symbolically evaluating it with DataLayout.
/// This always returns a non-null constant, but it may be a		/// This always returns a non-null constant, but it may be a
/// ConstantExpr if unfoldable.		/// ConstantExpr if unfoldable.
Constant *FoldBitCast(Constant *C, Type *DestTy, const DataLayout &DL) {		Constant *FoldBitCast(Constant *C, Type *DestTy, const DataLayout &DL) {
assert(CastInst::castIsValid(Instruction::BitCast, C, DestTy) &&		assert(CastInst::castIsValid(Instruction::BitCast, C, DestTy) &&
"Invalid constantexpr bitcast!");		"Invalid constantexpr bitcast!");

// Catch the obvious splat cases.		// Catch the obvious splat cases.
if (C->isNullValue() && !DestTy->isX86_MMXTy())		if (C->isNullValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy())
return Constant::getNullValue(DestTy);		return Constant::getNullValue(DestTy);
if (C->isAllOnesValue() && !DestTy->isX86_MMXTy() &&		if (C->isAllOnesValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy() &&
!DestTy->isPtrOrPtrVectorTy()) // Don't get ones for ptr types!		!DestTy->isPtrOrPtrVectorTy()) // Don't get ones for ptr types!
return Constant::getAllOnesValue(DestTy);		return Constant::getAllOnesValue(DestTy);

if (auto *VTy = dyn_cast<VectorType>(C->getType())) {		if (auto *VTy = dyn_cast<VectorType>(C->getType())) {
// Handle a vector->scalar integer/fp cast.		// Handle a vector->scalar integer/fp cast.
if (isa<IntegerType>(DestTy) || DestTy->isFloatingPointTy()) {		if (isa<IntegerType>(DestTy) || DestTy->isFloatingPointTy()) {
unsigned NumSrcElts = cast<FixedVectorType>(VTy)->getNumElements();		unsigned NumSrcElts = cast<FixedVectorType>(VTy)->getNumElements();
Type *SrcEltTy = VTy->getElementType();		Type *SrcEltTy = VTy->getElementType();
▲ Show 20 Lines • Show All 234 Lines • ▼ Show 20 Lines	do {
Type *SrcTy = C->getType();		Type *SrcTy = C->getType();
uint64_t DestSize = DL.getTypeSizeInBits(DestTy);		uint64_t DestSize = DL.getTypeSizeInBits(DestTy);
uint64_t SrcSize = DL.getTypeSizeInBits(SrcTy);		uint64_t SrcSize = DL.getTypeSizeInBits(SrcTy);
if (SrcSize < DestSize)		if (SrcSize < DestSize)
return nullptr;		return nullptr;

// Catch the obvious splat cases (since all-zeros can coerce non-integral		// Catch the obvious splat cases (since all-zeros can coerce non-integral
// pointers legally).		// pointers legally).
if (C->isNullValue() && !DestTy->isX86_MMXTy())		if (C->isNullValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy())
return Constant::getNullValue(DestTy);		return Constant::getNullValue(DestTy);
if (C->isAllOnesValue() &&		if (C->isAllOnesValue() &&
(DestTy->isIntegerTy() || DestTy->isFloatingPointTy() ||		(DestTy->isIntegerTy() || DestTy->isFloatingPointTy() ||
DestTy->isVectorTy()) &&		DestTy->isVectorTy()) &&
!DestTy->isX86_MMXTy() && !DestTy->isPtrOrPtrVectorTy())		!DestTy->isX86_AMXTy() && !DestTy->isX86_MMXTy() &&
		!DestTy->isPtrOrPtrVectorTy())
// Get ones when the input is trivial, but		// Get ones when the input is trivial, but
// only for supported types inside getAllOnesValue.		// only for supported types inside getAllOnesValue.
return Constant::getAllOnesValue(DestTy);		return Constant::getAllOnesValue(DestTy);

// If the type sizes are the same and a cast is legal, just directly		// If the type sizes are the same and a cast is legal, just directly
// cast the constant.		// cast the constant.
// But be careful not to coerce non-integral pointers illegally.		// But be careful not to coerce non-integral pointers illegally.
if (SrcSize == DestSize &&		if (SrcSize == DestSize &&
▲ Show 20 Lines • Show All 195 Lines • ▼ Show 20 Lines	if (!IntType) {
else if (LoadTy->isVectorTy()) {		else if (LoadTy->isVectorTy()) {
MapTy = PointerType::getIntNTy(		MapTy = PointerType::getIntNTy(
C->getContext(), DL.getTypeSizeInBits(LoadTy).getFixedSize());		C->getContext(), DL.getTypeSizeInBits(LoadTy).getFixedSize());
} else		} else
return nullptr;		return nullptr;

C = FoldBitCast(C, MapTy->getPointerTo(AS), DL);		C = FoldBitCast(C, MapTy->getPointerTo(AS), DL);
if (Constant *Res = FoldReinterpretLoadFromConstPtr(C, MapTy, DL)) {		if (Constant *Res = FoldReinterpretLoadFromConstPtr(C, MapTy, DL)) {
if (Res->isNullValue() && !LoadTy->isX86_MMXTy())		if (Res->isNullValue() && !LoadTy->isX86_MMXTy() &&
		!LoadTy->isX86_AMXTy())
// Materializing a zero can be done trivially without a bitcast		// Materializing a zero can be done trivially without a bitcast
return Constant::getNullValue(LoadTy);		return Constant::getNullValue(LoadTy);
Type *CastTy = LoadTy->isPtrOrPtrVectorTy() ? DL.getIntPtrType(LoadTy) : LoadTy;		Type *CastTy = LoadTy->isPtrOrPtrVectorTy() ? DL.getIntPtrType(LoadTy) : LoadTy;
Res = FoldBitCast(Res, CastTy, DL);		Res = FoldBitCast(Res, CastTy, DL);
if (LoadTy->isPtrOrPtrVectorTy()) {		if (LoadTy->isPtrOrPtrVectorTy()) {
// For vector of pointer, we needed to first convert to a vector of integer, then do vector inttoptr		// For vector of pointer, we needed to first convert to a vector of integer, then do vector inttoptr
if (Res->isNullValue() && !LoadTy->isX86_MMXTy())		if (Res->isNullValue() && !LoadTy->isX86_MMXTy() &&
		!LoadTy->isX86_AMXTy())
return Constant::getNullValue(LoadTy);		return Constant::getNullValue(LoadTy);
if (DL.isNonIntegralPointerType(LoadTy->getScalarType()))		if (DL.isNonIntegralPointerType(LoadTy->getScalarType()))
// Be careful not to replace a load of an addrspace value with an inttoptr here		// Be careful not to replace a load of an addrspace value with an inttoptr here
return nullptr;		return nullptr;
Res = ConstantExpr::getCast(Instruction::IntToPtr, Res, LoadTy);		Res = ConstantExpr::getCast(Instruction::IntToPtr, Res, LoadTy);
}		}
return Res;		return Res;
}		}
▲ Show 20 Lines • Show All 2,533 Lines • Show Last 20 Lines
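
Each ConstantFolding.cpp hunk above adds `!...->isX86_AMXTy()` next to the existing `isX86_MMXTy()` check, so the null and all-ones shortcuts are skipped when the destination is `x86_amx`. A minimal sketch of that guard follows, using toy stand-ins rather than LLVM classes; `ToyTy`, `FoldResult` and `foldNullBitcast` are hypothetical names.

```cpp
#include <cassert>

// Toy stand-ins; these are not LLVM types.
enum class ToyTy { Int32, V256I32, X86_MMX, X86_AMX };
enum class FoldResult { NullOfDestTy, ShortcutSkipped };

// Mirrors the shape of the guard
//   C->isNullValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy()
// for a source constant that is already known to be null.
inline FoldResult foldNullBitcast(ToyTy DestTy) {
  if (DestTy != ToyTy::X86_MMX && DestTy != ToyTy::X86_AMX)
    return FoldResult::NullOfDestTy; // shortcut: null value of DestTy
  return FoldResult::ShortcutSkipped; // caller continues with other logic
}
```

In this model a null `<256 x i32>` destination still takes the shortcut, while an `x86_amx` destination does not.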

llvm/lib/AsmParser/LLLexer.cpp

Show First 20 Lines • Show All 834 Lines • ▼ Show 20 Lines	#define TYPEKEYWORD(STR, LLVMTY) \
TYPEKEYWORD("float", Type::getFloatTy(Context));		TYPEKEYWORD("float", Type::getFloatTy(Context));
TYPEKEYWORD("double", Type::getDoubleTy(Context));		TYPEKEYWORD("double", Type::getDoubleTy(Context));
TYPEKEYWORD("x86_fp80", Type::getX86_FP80Ty(Context));		TYPEKEYWORD("x86_fp80", Type::getX86_FP80Ty(Context));
TYPEKEYWORD("fp128", Type::getFP128Ty(Context));		TYPEKEYWORD("fp128", Type::getFP128Ty(Context));
TYPEKEYWORD("ppc_fp128", Type::getPPC_FP128Ty(Context));		TYPEKEYWORD("ppc_fp128", Type::getPPC_FP128Ty(Context));
TYPEKEYWORD("label", Type::getLabelTy(Context));		TYPEKEYWORD("label", Type::getLabelTy(Context));
TYPEKEYWORD("metadata", Type::getMetadataTy(Context));		TYPEKEYWORD("metadata", Type::getMetadataTy(Context));
TYPEKEYWORD("x86_mmx", Type::getX86_MMXTy(Context));		TYPEKEYWORD("x86_mmx", Type::getX86_MMXTy(Context));
		TYPEKEYWORD("x86_amx", Type::getX86_AMXTy(Context));
TYPEKEYWORD("token", Type::getTokenTy(Context));		TYPEKEYWORD("token", Type::getTokenTy(Context));

#undef TYPEKEYWORD		#undef TYPEKEYWORD

// Keywords for instructions.		// Keywords for instructions.
#define INSTKEYWORD(STR, Enum) \		#define INSTKEYWORD(STR, Enum) \
do { \		do { \
if (Keyword == #STR) { \		if (Keyword == #STR) { \
▲ Show 20 Lines • Show All 321 Lines • Show Last 20 Lines

llvm/lib/Bitcode/Reader/BitcodeReader.cpp

Show First 20 Lines • Show All 1,757 Lines • ▼ Show 20 Lines	case bitc::TYPE_CODE_LABEL: // LABEL
ResultTy = Type::getLabelTy(Context);		ResultTy = Type::getLabelTy(Context);
break;		break;
case bitc::TYPE_CODE_METADATA: // METADATA		case bitc::TYPE_CODE_METADATA: // METADATA
ResultTy = Type::getMetadataTy(Context);		ResultTy = Type::getMetadataTy(Context);
break;		break;
case bitc::TYPE_CODE_X86_MMX: // X86_MMX		case bitc::TYPE_CODE_X86_MMX: // X86_MMX
ResultTy = Type::getX86_MMXTy(Context);		ResultTy = Type::getX86_MMXTy(Context);
break;		break;
		case bitc::TYPE_CODE_X86_AMX: // X86_AMX
		ResultTy = Type::getX86_AMXTy(Context);
		break;
case bitc::TYPE_CODE_TOKEN: // TOKEN		case bitc::TYPE_CODE_TOKEN: // TOKEN
ResultTy = Type::getTokenTy(Context);		ResultTy = Type::getTokenTy(Context);
break;		break;
case bitc::TYPE_CODE_INTEGER: { // INTEGER: [width]		case bitc::TYPE_CODE_INTEGER: { // INTEGER: [width]
if (Record.empty())		if (Record.empty())
return error("Invalid record");		return error("Invalid record");

uint64_t NumBits = Record[0];		uint64_t NumBits = Record[0];
▲ Show 20 Lines • Show All 5,204 Lines • Show Last 20 Lines

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp

Show First 20 Lines • Show All 907 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = TypeList.size(); i != e; ++i) {
case Type::FloatTyID: Code = bitc::TYPE_CODE_FLOAT; break;		case Type::FloatTyID: Code = bitc::TYPE_CODE_FLOAT; break;
case Type::DoubleTyID: Code = bitc::TYPE_CODE_DOUBLE; break;		case Type::DoubleTyID: Code = bitc::TYPE_CODE_DOUBLE; break;
case Type::X86_FP80TyID: Code = bitc::TYPE_CODE_X86_FP80; break;		case Type::X86_FP80TyID: Code = bitc::TYPE_CODE_X86_FP80; break;
case Type::FP128TyID: Code = bitc::TYPE_CODE_FP128; break;		case Type::FP128TyID: Code = bitc::TYPE_CODE_FP128; break;
case Type::PPC_FP128TyID: Code = bitc::TYPE_CODE_PPC_FP128; break;		case Type::PPC_FP128TyID: Code = bitc::TYPE_CODE_PPC_FP128; break;
case Type::LabelTyID: Code = bitc::TYPE_CODE_LABEL; break;		case Type::LabelTyID: Code = bitc::TYPE_CODE_LABEL; break;
case Type::MetadataTyID: Code = bitc::TYPE_CODE_METADATA; break;		case Type::MetadataTyID: Code = bitc::TYPE_CODE_METADATA; break;
case Type::X86_MMXTyID: Code = bitc::TYPE_CODE_X86_MMX; break;		case Type::X86_MMXTyID: Code = bitc::TYPE_CODE_X86_MMX; break;
		case Type::X86_AMXTyID: Code = bitc::TYPE_CODE_X86_AMX; break;
case Type::TokenTyID: Code = bitc::TYPE_CODE_TOKEN; break;		case Type::TokenTyID: Code = bitc::TYPE_CODE_TOKEN; break;
case Type::IntegerTyID:		case Type::IntegerTyID:
// INTEGER: [width]		// INTEGER: [width]
Code = bitc::TYPE_CODE_INTEGER;		Code = bitc::TYPE_CODE_INTEGER;
TypeVals.push_back(cast<IntegerType>(T)->getBitWidth());		TypeVals.push_back(cast<IntegerType>(T)->getBitWidth());
break;		break;
case Type::PointerTyID: {		case Type::PointerTyID: {
PointerType *PTy = cast<PointerType>(T);		PointerType *PTy = cast<PointerType>(T);
▲ Show 20 Lines • Show All 4,009 Lines • Show Last 20 Lines
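
The reader and writer hunks are meant to be symmetric: the writer maps `Type::X86_AMXTyID` to `bitc::TYPE_CODE_X86_AMX`, and the reader maps that code back to `Type::getX86_AMXTy(Context)`. A minimal sketch of the round trip follows, with hypothetical enums in place of the real ones; the enumerator values are placeholders and say nothing about the actual bitcode encoding.

```cpp
#include <cassert>

// Hypothetical stand-ins for Type::TypeID and bitc::TypeCodes.
// Numeric values are compiler-assigned placeholders.
enum class ToyTypeID { X86_MMX, X86_AMX, Token };
enum class ToyTypeCode { X86_MMX, X86_AMX, Token };

// Writer direction: type ID to type code.
inline ToyTypeCode writeTypeCode(ToyTypeID ID) {
  switch (ID) {
  case ToyTypeID::X86_MMX: return ToyTypeCode::X86_MMX;
  case ToyTypeID::X86_AMX: return ToyTypeCode::X86_AMX;
  case ToyTypeID::Token:   return ToyTypeCode::Token;
  }
  return ToyTypeCode::Token; // not reached for valid input
}

// Reader direction: type code back to type ID.
inline ToyTypeID readTypeCode(ToyTypeCode Code) {
  switch (Code) {
  case ToyTypeCode::X86_MMX: return ToyTypeID::X86_MMX;
  case ToyTypeCode::X86_AMX: return ToyTypeID::X86_AMX;
  case ToyTypeCode::Token:   return ToyTypeID::Token;
  }
  return ToyTypeID::Token; // not reached for valid input
}
```

If either side lacked the `X86_AMX` case, the round trip `readTypeCode(writeTypeCode(ToyTypeID::X86_AMX))` would no longer return `ToyTypeID::X86_AMX`, which is the property both hunks have to preserve together.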

llvm/lib/CodeGen/ValueTypes.cpp

Show First 20 Lines • Show All 158 Lines • ▼ Show 20 Lines	if (isFloatingPoint())
return "f" + utostr(getSizeInBits());		return "f" + utostr(getSizeInBits());
llvm_unreachable("Invalid EVT!");		llvm_unreachable("Invalid EVT!");
case MVT::bf16: return "bf16";		case MVT::bf16: return "bf16";
case MVT::ppcf128: return "ppcf128";		case MVT::ppcf128: return "ppcf128";
case MVT::isVoid: return "isVoid";		case MVT::isVoid: return "isVoid";
case MVT::Other: return "ch";		case MVT::Other: return "ch";
case MVT::Glue: return "glue";		case MVT::Glue: return "glue";
case MVT::x86mmx: return "x86mmx";		case MVT::x86mmx: return "x86mmx";
		case MVT::x86amx: return "x86amx";
case MVT::Metadata: return "Metadata";		case MVT::Metadata: return "Metadata";
case MVT::Untyped: return "Untyped";		case MVT::Untyped: return "Untyped";
case MVT::exnref: return "exnref";		case MVT::exnref: return "exnref";
case MVT::funcref: return "funcref";		case MVT::funcref: return "funcref";
case MVT::externref: return "externref";		case MVT::externref: return "externref";
}		}
}		}

Show All 15 Lines	Type *EVT::getTypeForEVT(LLVMContext &Context) const {
case MVT::f16: return Type::getHalfTy(Context);		case MVT::f16: return Type::getHalfTy(Context);
case MVT::bf16: return Type::getBFloatTy(Context);		case MVT::bf16: return Type::getBFloatTy(Context);
case MVT::f32: return Type::getFloatTy(Context);		case MVT::f32: return Type::getFloatTy(Context);
case MVT::f64: return Type::getDoubleTy(Context);		case MVT::f64: return Type::getDoubleTy(Context);
case MVT::f80: return Type::getX86_FP80Ty(Context);		case MVT::f80: return Type::getX86_FP80Ty(Context);
case MVT::f128: return Type::getFP128Ty(Context);		case MVT::f128: return Type::getFP128Ty(Context);
case MVT::ppcf128: return Type::getPPC_FP128Ty(Context);		case MVT::ppcf128: return Type::getPPC_FP128Ty(Context);
case MVT::x86mmx: return Type::getX86_MMXTy(Context);		case MVT::x86mmx: return Type::getX86_MMXTy(Context);
		case MVT::x86amx: return Type::getX86_AMXTy(Context);
case MVT::v1i1:		case MVT::v1i1:
return FixedVectorType::get(Type::getInt1Ty(Context), 1);		return FixedVectorType::get(Type::getInt1Ty(Context), 1);
case MVT::v2i1:		case MVT::v2i1:
return FixedVectorType::get(Type::getInt1Ty(Context), 2);		return FixedVectorType::get(Type::getInt1Ty(Context), 2);
case MVT::v4i1:		case MVT::v4i1:
return FixedVectorType::get(Type::getInt1Ty(Context), 4);		return FixedVectorType::get(Type::getInt1Ty(Context), 4);
case MVT::v8i1:		case MVT::v8i1:
return FixedVectorType::get(Type::getInt1Ty(Context), 8);		return FixedVectorType::get(Type::getInt1Ty(Context), 8);
▲ Show 20 Lines • Show All 290 Lines • ▼ Show 20 Lines	MVT MVT::getVT(Type *Ty, bool HandleUnknown){
case Type::IntegerTyID:		case Type::IntegerTyID:
return getIntegerVT(cast<IntegerType>(Ty)->getBitWidth());		return getIntegerVT(cast<IntegerType>(Ty)->getBitWidth());
case Type::HalfTyID: return MVT(MVT::f16);		case Type::HalfTyID: return MVT(MVT::f16);
case Type::BFloatTyID: return MVT(MVT::bf16);		case Type::BFloatTyID: return MVT(MVT::bf16);
case Type::FloatTyID: return MVT(MVT::f32);		case Type::FloatTyID: return MVT(MVT::f32);
case Type::DoubleTyID: return MVT(MVT::f64);		case Type::DoubleTyID: return MVT(MVT::f64);
case Type::X86_FP80TyID: return MVT(MVT::f80);		case Type::X86_FP80TyID: return MVT(MVT::f80);
case Type::X86_MMXTyID: return MVT(MVT::x86mmx);		case Type::X86_MMXTyID: return MVT(MVT::x86mmx);
		case Type::X86_AMXTyID: return MVT(MVT::x86amx);
case Type::FP128TyID: return MVT(MVT::f128);		case Type::FP128TyID: return MVT(MVT::f128);
case Type::PPC_FP128TyID: return MVT(MVT::ppcf128);		case Type::PPC_FP128TyID: return MVT(MVT::ppcf128);
case Type::PointerTyID: return MVT(MVT::iPTR);		case Type::PointerTyID: return MVT(MVT::iPTR);
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
case Type::ScalableVectorTyID: {		case Type::ScalableVectorTyID: {
VectorType *VTy = cast<VectorType>(Ty);		VectorType *VTy = cast<VectorType>(Ty);
return getVectorVT(		return getVectorVT(
getVT(VTy->getElementType(), /*HandleUnknown=*/ false),		getVT(VTy->getElementType(), /*HandleUnknown=*/ false),
Show All 23 Lines

llvm/lib/IR/AsmWriter.cpp

Show First 20 Lines • Show All 603 Lines • ▼ Show 20 Lines	void TypePrinting::print(Type *Ty, raw_ostream &OS) {
case Type::FloatTyID: OS << "float"; return;		case Type::FloatTyID: OS << "float"; return;
case Type::DoubleTyID: OS << "double"; return;		case Type::DoubleTyID: OS << "double"; return;
case Type::X86_FP80TyID: OS << "x86_fp80"; return;		case Type::X86_FP80TyID: OS << "x86_fp80"; return;
case Type::FP128TyID: OS << "fp128"; return;		case Type::FP128TyID: OS << "fp128"; return;
case Type::PPC_FP128TyID: OS << "ppc_fp128"; return;		case Type::PPC_FP128TyID: OS << "ppc_fp128"; return;
case Type::LabelTyID: OS << "label"; return;		case Type::LabelTyID: OS << "label"; return;
case Type::MetadataTyID: OS << "metadata"; return;		case Type::MetadataTyID: OS << "metadata"; return;
case Type::X86_MMXTyID: OS << "x86_mmx"; return;		case Type::X86_MMXTyID: OS << "x86_mmx"; return;
		case Type::X86_AMXTyID: OS << "x86_amx"; return;
case Type::TokenTyID: OS << "token"; return;		case Type::TokenTyID: OS << "token"; return;
case Type::IntegerTyID:		case Type::IntegerTyID:
OS << 'i' << cast<IntegerType>(Ty)->getBitWidth();		OS << 'i' << cast<IntegerType>(Ty)->getBitWidth();
return;		return;

case Type::FunctionTyID: {		case Type::FunctionTyID: {
FunctionType *FTy = cast<FunctionType>(Ty);		FunctionType *FTy = cast<FunctionType>(Ty);
print(FTy->getReturnType(), OS);		print(FTy->getReturnType(), OS);
▲ Show 20 Lines • Show All 4,140 Lines • Show Last 20 Lines

llvm/lib/IR/ConstantFold.cpp

Show First 20 Lines • Show All 529 Lines • ▼ Show 20 Lines	if (isa<UndefValue>(V)) {
// sext(undef) = 0, because the top bits will all be the same.		// sext(undef) = 0, because the top bits will all be the same.
// [us]itofp(undef) = 0, because the result value is bounded.		// [us]itofp(undef) = 0, because the result value is bounded.
if (opc == Instruction::ZExt || opc == Instruction::SExt ||		if (opc == Instruction::ZExt || opc == Instruction::SExt ||
opc == Instruction::UIToFP || opc == Instruction::SIToFP)		opc == Instruction::UIToFP || opc == Instruction::SIToFP)
return Constant::getNullValue(DestTy);		return Constant::getNullValue(DestTy);
return UndefValue::get(DestTy);		return UndefValue::get(DestTy);
}		}

if (V->isNullValue() && !DestTy->isX86_MMXTy() &&		if (V->isNullValue() && !DestTy->isX86_MMXTy() && !DestTy->isX86_AMXTy() &&
opc != Instruction::AddrSpaceCast)		opc != Instruction::AddrSpaceCast)
return Constant::getNullValue(DestTy);		return Constant::getNullValue(DestTy);
		pengfei: The operator should be at the end of the line.

// If the cast operand is a constant expression, there's a few things we can		// If the cast operand is a constant expression, there's a few things we can
// do to try to simplify it.		// do to try to simplify it.
if (ConstantExpr *CE = dyn_cast<ConstantExpr>(V)) {		if (ConstantExpr *CE = dyn_cast<ConstantExpr>(V)) {
if (CE->isCast()) {		if (CE->isCast()) {
// Try hard to fold cast of cast because they are often eliminable.		// Try hard to fold cast of cast because they are often eliminable.
if (unsigned newOpc = foldConstantCastPair(opc, CE, DestTy))		if (unsigned newOpc = foldConstantCastPair(opc, CE, DestTy))
return ConstantExpr::getCast(newOpc, CE->getOperand(0), DestTy);		return ConstantExpr::getCast(newOpc, CE->getOperand(0), DestTy);
▲ Show 20 Lines • Show All 2,100 Lines • Show Last 20 Lines
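
The guard changed in the ConstantFold.cpp hunk above can be read as a small predicate. The following is a standalone sketch of that condition only, not LLVM code: the enums `CastOp` and `DestKind` and the function `foldsNullCastToNull` are hypothetical names introduced for illustration, and the real function operates on `Constant` and `Type` objects.

```cpp
#include <cassert>

// Hypothetical stand-ins for the opcode and destination type (not LLVM API).
enum class CastOp { BitCast, ZExt, AddrSpaceCast };
enum class DestKind { Integer, FixedVector, X86_MMX, X86_AMX };

// Mirrors the condition in the hunk: a cast of a null constant folds to the
// null value of the destination type unless the destination is x86_mmx or
// x86_amx, or the cast is an addrspacecast.
inline bool foldsNullCastToNull(CastOp Op, DestKind Dest) {
  return Dest != DestKind::X86_MMX && Dest != DestKind::X86_AMX &&
         Op != CastOp::AddrSpaceCast;
}

// Usage sketch: plain vectors still fold, the new x86_amx destination does not.
inline void checkNullCastGuard() {
  assert(foldsNullCastToNull(CastOp::BitCast, DestKind::FixedVector));
  assert(!foldsNullCastToNull(CastOp::BitCast, DestKind::X86_AMX));
}
```

With this patch applied, a zero-valued `<256 x i32>` bitcast to `x86_amx` would therefore fall through this early return, in the same way the existing `x86_mmx` case does.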

llvm/lib/IR/Core.cpp

Show First 20 Lines • Show All 506 Lines • ▼ Show 20 Lines	LLVMTypeKind LLVMGetTypeKind(LLVMTypeRef Ty) {
case Type::ArrayTyID:		case Type::ArrayTyID:
return LLVMArrayTypeKind;		return LLVMArrayTypeKind;
case Type::PointerTyID:		case Type::PointerTyID:
return LLVMPointerTypeKind;		return LLVMPointerTypeKind;
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
return LLVMVectorTypeKind;		return LLVMVectorTypeKind;
case Type::X86_MMXTyID:		case Type::X86_MMXTyID:
return LLVMX86_MMXTypeKind;		return LLVMX86_MMXTypeKind;
		case Type::X86_AMXTyID:
		return LLVMX86_AMXTypeKind;
case Type::TokenTyID:		case Type::TokenTyID:
return LLVMTokenTypeKind;		return LLVMTokenTypeKind;
case Type::ScalableVectorTyID:		case Type::ScalableVectorTyID:
return LLVMScalableVectorTypeKind;		return LLVMScalableVectorTypeKind;
}		}
llvm_unreachable("Unhandled TypeID.");		llvm_unreachable("Unhandled TypeID.");
}		}

▲ Show 20 Lines • Show All 95 Lines • ▼ Show 20 Lines	LLVMTypeRef LLVMFP128TypeInContext(LLVMContextRef C) {
return (LLVMTypeRef) Type::getFP128Ty(*unwrap(C));		return (LLVMTypeRef) Type::getFP128Ty(*unwrap(C));
}		}
LLVMTypeRef LLVMPPCFP128TypeInContext(LLVMContextRef C) {		LLVMTypeRef LLVMPPCFP128TypeInContext(LLVMContextRef C) {
return (LLVMTypeRef) Type::getPPC_FP128Ty(*unwrap(C));		return (LLVMTypeRef) Type::getPPC_FP128Ty(*unwrap(C));
}		}
LLVMTypeRef LLVMX86MMXTypeInContext(LLVMContextRef C) {		LLVMTypeRef LLVMX86MMXTypeInContext(LLVMContextRef C) {
return (LLVMTypeRef) Type::getX86_MMXTy(*unwrap(C));		return (LLVMTypeRef) Type::getX86_MMXTy(*unwrap(C));
}		}
		LLVMTypeRef LLVMX86AMXTypeInContext(LLVMContextRef C) {
		return (LLVMTypeRef) Type::getX86_AMXTy(*unwrap(C));
		}

LLVMTypeRef LLVMHalfType(void) {		LLVMTypeRef LLVMHalfType(void) {
return LLVMHalfTypeInContext(LLVMGetGlobalContext());		return LLVMHalfTypeInContext(LLVMGetGlobalContext());
}		}
LLVMTypeRef LLVMBFloatType(void) {		LLVMTypeRef LLVMBFloatType(void) {
return LLVMBFloatTypeInContext(LLVMGetGlobalContext());		return LLVMBFloatTypeInContext(LLVMGetGlobalContext());
}		}
LLVMTypeRef LLVMFloatType(void) {		LLVMTypeRef LLVMFloatType(void) {
Show All 9 Lines	LLVMTypeRef LLVMFP128Type(void) {
return LLVMFP128TypeInContext(LLVMGetGlobalContext());		return LLVMFP128TypeInContext(LLVMGetGlobalContext());
}		}
LLVMTypeRef LLVMPPCFP128Type(void) {		LLVMTypeRef LLVMPPCFP128Type(void) {
return LLVMPPCFP128TypeInContext(LLVMGetGlobalContext());		return LLVMPPCFP128TypeInContext(LLVMGetGlobalContext());
}		}
LLVMTypeRef LLVMX86MMXType(void) {		LLVMTypeRef LLVMX86MMXType(void) {
return LLVMX86MMXTypeInContext(LLVMGetGlobalContext());		return LLVMX86MMXTypeInContext(LLVMGetGlobalContext());
}		}
		LLVMTypeRef LLVMX86AMXType(void) {
		return LLVMX86AMXTypeInContext(LLVMGetGlobalContext());
		}

/*--.. Operations on function types ........................................--*/		/*--.. Operations on function types ........................................--*/

LLVMTypeRef LLVMFunctionType(LLVMTypeRef ReturnType,		LLVMTypeRef LLVMFunctionType(LLVMTypeRef ReturnType,
LLVMTypeRef *ParamTypes, unsigned ParamCount,		LLVMTypeRef *ParamTypes, unsigned ParamCount,
LLVMBool IsVarArg) {		LLVMBool IsVarArg) {
ArrayRef<Type*> Tys(unwrap(ParamTypes), ParamCount);		ArrayRef<Type*> Tys(unwrap(ParamTypes), ParamCount);
return wrap(FunctionType::get(unwrap(ReturnType), Tys, IsVarArg != 0));		return wrap(FunctionType::get(unwrap(ReturnType), Tys, IsVarArg != 0));
▲ Show 20 Lines • Show All 3,504 Lines • Show Last 20 Lines

llvm/lib/IR/DataLayout.cpp

Show First 20 Lines • Show All 787 Lines • ▼ Show 20 Lines	case Type::X86_FP80TyID: {
// less conservative, they should have specified it explicitly in the data		// less conservative, they should have specified it explicitly in the data
// layout.		// layout.
return Align(PowerOf2Ceil(BitWidth / 8));		return Align(PowerOf2Ceil(BitWidth / 8));
}		}
case Type::X86_MMXTyID:		case Type::X86_MMXTyID:
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
case Type::ScalableVectorTyID: {		case Type::ScalableVectorTyID: {
unsigned BitWidth = getTypeSizeInBits(Ty).getKnownMinSize();		unsigned BitWidth = getTypeSizeInBits(Ty).getKnownMinSize();
auto I = findAlignmentLowerBound(VECTOR_ALIGN, BitWidth);		auto I = findAlignmentLowerBound(VECTOR_ALIGN, BitWidth);
		pengfei: Should be 512 bits?
		LuoYuanke: Yes. It is 512. Thanks.
if (I != Alignments.end() && I->AlignType == VECTOR_ALIGN &&		if (I != Alignments.end() && I->AlignType == VECTOR_ALIGN &&
I->TypeBitWidth == BitWidth)		I->TypeBitWidth == BitWidth)
return abi_or_pref ? I->ABIAlign : I->PrefAlign;		return abi_or_pref ? I->ABIAlign : I->PrefAlign;

// By default, use natural alignment for vector types. This is consistent		// By default, use natural alignment for vector types. This is consistent
// with what clang and llvm-gcc do.		// with what clang and llvm-gcc do.
// TODO: This should probably not be using the alloc size.		// TODO: This should probably not be using the alloc size.
unsigned Alignment =		unsigned Alignment =
getTypeAllocSize(cast<VectorType>(Ty)->getElementType());		getTypeAllocSize(cast<VectorType>(Ty)->getElementType());
// We're only calculating a natural alignment, so it doesn't have to be		// We're only calculating a natural alignment, so it doesn't have to be
// based on the full size for scalable vectors. Using the minimum element		// based on the full size for scalable vectors. Using the minimum element
// count should be enough here.		// count should be enough here.
Alignment *= cast<VectorType>(Ty)->getElementCount().getKnownMinValue();		Alignment *= cast<VectorType>(Ty)->getElementCount().getKnownMinValue();
Alignment = PowerOf2Ceil(Alignment);		Alignment = PowerOf2Ceil(Alignment);
return Align(Alignment);		return Align(Alignment);
}		}
		case Type::X86_AMXTyID:
		return Align(64);
default:		default:
llvm_unreachable("Bad type for getAlignment!!!");		llvm_unreachable("Bad type for getAlignment!!!");
}		}
}		}

/// TODO: Remove this function once the transition to Align is over.		/// TODO: Remove this function once the transition to Align is over.
unsigned DataLayout::getABITypeAlignment(Type *Ty) const {		unsigned DataLayout::getABITypeAlignment(Type *Ty) const {
return getABITypeAlign(Ty).value();		return getABITypeAlign(Ty).value();
▲ Show 20 Lines • Show All 117 Lines • Show Last 20 Lines
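
The alignment discussion in the DataLayout.cpp hunk above comes down to simple arithmetic: `Align(64)` is a byte count, so it corresponds to the 512 bits mentioned in the inline comments, whereas the natural-alignment fallback for a `<256 x i32>` vector would give a much larger value. The sketch below models that arithmetic in isolation. The helper names (`powerOf2Ceil`, `naturalVectorAlignBytes`, `kX86AmxAlignBytes`) are assumptions for illustration and are not the LLVM functions themselves.

```cpp
#include <cassert>
#include <cstdint>

// Smallest power of two >= N, for N > 0. A simplified stand-in for
// llvm::PowerOf2Ceil, used here only to model the fallback computation.
inline std::uint64_t powerOf2Ceil(std::uint64_t N) {
  assert(N > 0 && "this sketch only handles positive sizes");
  std::uint64_t P = 1;
  while (P < N)
    P <<= 1;
  return P;
}

// Natural alignment fallback for vectors as in the hunk: element alloc size
// times the (minimum) element count, rounded up to a power of two.
inline std::uint64_t naturalVectorAlignBytes(std::uint64_t EltAllocBytes,
                                             std::uint64_t MinEltCount) {
  return powerOf2Ceil(EltAllocBytes * MinEltCount);
}

// The value returned for Type::X86_AMXTyID in the patch, in bytes.
constexpr std::uint64_t kX86AmxAlignBytes = 64;
```

Under these assumptions, `naturalVectorAlignBytes(4, 256)` evaluates to 1024 bytes for `<256 x i32>`, while `x86_amx` gets a fixed 64 bytes (512 bits).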

llvm/lib/IR/Function.cpp

Show First 20 Lines • Show All 758 Lines • ▼ Show 20 Lines	if (PointerType* PTyp = dyn_cast<PointerType>(Ty)) {
case Type::HalfTyID: Result += "f16"; break;		case Type::HalfTyID: Result += "f16"; break;
case Type::BFloatTyID: Result += "bf16"; break;		case Type::BFloatTyID: Result += "bf16"; break;
case Type::FloatTyID: Result += "f32"; break;		case Type::FloatTyID: Result += "f32"; break;
case Type::DoubleTyID: Result += "f64"; break;		case Type::DoubleTyID: Result += "f64"; break;
case Type::X86_FP80TyID: Result += "f80"; break;		case Type::X86_FP80TyID: Result += "f80"; break;
case Type::FP128TyID: Result += "f128"; break;		case Type::FP128TyID: Result += "f128"; break;
case Type::PPC_FP128TyID: Result += "ppcf128"; break;		case Type::PPC_FP128TyID: Result += "ppcf128"; break;
case Type::X86_MMXTyID: Result += "x86mmx"; break;		case Type::X86_MMXTyID: Result += "x86mmx"; break;
		case Type::X86_AMXTyID: Result += "x86amx"; break;
case Type::IntegerTyID:		case Type::IntegerTyID:
Result += "i" + utostr(cast<IntegerType>(Ty)->getBitWidth());		Result += "i" + utostr(cast<IntegerType>(Ty)->getBitWidth());
break;		break;
}		}
}		}
return Result;		return Result;
}		}
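
The switch above appends a short suffix per type when building a mangled name, and the suffix for the new type is `x86amx`, without the underscore that the printed IR type name `x86_amx` carries. A minimal standalone sketch of that mapping follows. The enum `Kind` and the function `mangledSuffix` are hypothetical names for illustration, and only a few of the cases from the hunk are modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical subset of type kinds (not llvm::Type::TypeID).
enum class Kind { Float, Double, X86_MMX, X86_AMX, Integer };

// Returns the suffix the hunk appends for each modelled kind.
// IntBits is only consulted for Kind::Integer.
inline std::string mangledSuffix(Kind K, unsigned IntBits = 0) {
  switch (K) {
  case Kind::Float:   return "f32";
  case Kind::Double:  return "f64";
  case Kind::X86_MMX: return "x86mmx";
  case Kind::X86_AMX: return "x86amx";
  case Kind::Integer: return "i" + std::to_string(IntBits);
  }
  return "";
}

// Usage sketch: the mangled suffix differs from the printed name "x86_amx".
inline void checkMangledSuffix() {
  assert(mangledSuffix(Kind::X86_AMX) == "x86amx");
  assert(mangledSuffix(Kind::Integer, 32) == "i32");
}
```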

▲ Show 20 Lines • Show All 68 Lines • ▼ Show 20 Lines	enum IIT_Info {
IIT_VEC_ELEMENT = 42,		IIT_VEC_ELEMENT = 42,
IIT_SCALABLE_VEC = 43,		IIT_SCALABLE_VEC = 43,
IIT_SUBDIVIDE2_ARG = 44,		IIT_SUBDIVIDE2_ARG = 44,
IIT_SUBDIVIDE4_ARG = 45,		IIT_SUBDIVIDE4_ARG = 45,
IIT_VEC_OF_BITCASTS_TO_INT = 46,		IIT_VEC_OF_BITCASTS_TO_INT = 46,
IIT_V128 = 47,		IIT_V128 = 47,
IIT_BF16 = 48,		IIT_BF16 = 48,
IIT_STRUCT9 = 49,		IIT_STRUCT9 = 49,
IIT_V256 = 50		IIT_V256 = 50,
		IIT_AMX = 51
};		};

static void DecodeIITType(unsigned &NextElt, ArrayRef<unsigned char> Infos,		static void DecodeIITType(unsigned &NextElt, ArrayRef<unsigned char> Infos,
IIT_Info LastInfo,		IIT_Info LastInfo,
SmallVectorImpl<Intrinsic::IITDescriptor> &OutputTable) {		SmallVectorImpl<Intrinsic::IITDescriptor> &OutputTable) {
using namespace Intrinsic;		using namespace Intrinsic;

bool IsScalableVector = (LastInfo == IIT_SCALABLE_VEC);		bool IsScalableVector = (LastInfo == IIT_SCALABLE_VEC);

IIT_Info Info = IIT_Info(Infos[NextElt++]);		IIT_Info Info = IIT_Info(Infos[NextElt++]);
unsigned StructElts = 2;		unsigned StructElts = 2;

switch (Info) {		switch (Info) {
case IIT_Done:		case IIT_Done:
OutputTable.push_back(IITDescriptor::get(IITDescriptor::Void, 0));		OutputTable.push_back(IITDescriptor::get(IITDescriptor::Void, 0));
return;		return;
case IIT_VARARG:		case IIT_VARARG:
OutputTable.push_back(IITDescriptor::get(IITDescriptor::VarArg, 0));		OutputTable.push_back(IITDescriptor::get(IITDescriptor::VarArg, 0));
return;		return;
case IIT_MMX:		case IIT_MMX:
OutputTable.push_back(IITDescriptor::get(IITDescriptor::MMX, 0));		OutputTable.push_back(IITDescriptor::get(IITDescriptor::MMX, 0));
return;		return;
		case IIT_AMX:
		OutputTable.push_back(IITDescriptor::get(IITDescriptor::AMX, 0));
		return;
case IIT_TOKEN:		case IIT_TOKEN:
OutputTable.push_back(IITDescriptor::get(IITDescriptor::Token, 0));		OutputTable.push_back(IITDescriptor::get(IITDescriptor::Token, 0));
return;		return;
case IIT_METADATA:		case IIT_METADATA:
OutputTable.push_back(IITDescriptor::get(IITDescriptor::Metadata, 0));		OutputTable.push_back(IITDescriptor::get(IITDescriptor::Metadata, 0));
return;		return;
case IIT_F16:		case IIT_F16:
OutputTable.push_back(IITDescriptor::get(IITDescriptor::Half, 0));		OutputTable.push_back(IITDescriptor::get(IITDescriptor::Half, 0));
▲ Show 20 Lines • Show All 221 Lines • ▼ Show 20 Lines	static Type *DecodeFixedType(ArrayRef<Intrinsic::IITDescriptor> &Infos,

IITDescriptor D = Infos.front();		IITDescriptor D = Infos.front();
Infos = Infos.slice(1);		Infos = Infos.slice(1);

switch (D.Kind) {		switch (D.Kind) {
case IITDescriptor::Void: return Type::getVoidTy(Context);		case IITDescriptor::Void: return Type::getVoidTy(Context);
case IITDescriptor::VarArg: return Type::getVoidTy(Context);		case IITDescriptor::VarArg: return Type::getVoidTy(Context);
case IITDescriptor::MMX: return Type::getX86_MMXTy(Context);		case IITDescriptor::MMX: return Type::getX86_MMXTy(Context);
		case IITDescriptor::AMX: return Type::getX86_AMXTy(Context);
case IITDescriptor::Token: return Type::getTokenTy(Context);		case IITDescriptor::Token: return Type::getTokenTy(Context);
case IITDescriptor::Metadata: return Type::getMetadataTy(Context);		case IITDescriptor::Metadata: return Type::getMetadataTy(Context);
case IITDescriptor::Half: return Type::getHalfTy(Context);		case IITDescriptor::Half: return Type::getHalfTy(Context);
case IITDescriptor::BFloat: return Type::getBFloatTy(Context);		case IITDescriptor::BFloat: return Type::getBFloatTy(Context);
case IITDescriptor::Float: return Type::getFloatTy(Context);		case IITDescriptor::Float: return Type::getFloatTy(Context);
case IITDescriptor::Double: return Type::getDoubleTy(Context);		case IITDescriptor::Double: return Type::getDoubleTy(Context);
case IITDescriptor::Quad: return Type::getFP128Ty(Context);		case IITDescriptor::Quad: return Type::getFP128Ty(Context);

▲ Show 20 Lines • Show All 163 Lines • ▼ Show 20 Lines	static bool matchIntrinsicType(

IITDescriptor D = Infos.front();		IITDescriptor D = Infos.front();
Infos = Infos.slice(1);		Infos = Infos.slice(1);

switch (D.Kind) {		switch (D.Kind) {
case IITDescriptor::Void: return !Ty->isVoidTy();		case IITDescriptor::Void: return !Ty->isVoidTy();
case IITDescriptor::VarArg: return true;		case IITDescriptor::VarArg: return true;
case IITDescriptor::MMX: return !Ty->isX86_MMXTy();		case IITDescriptor::MMX: return !Ty->isX86_MMXTy();
		case IITDescriptor::AMX: return !Ty->isX86_AMXTy();
case IITDescriptor::Token: return !Ty->isTokenTy();		case IITDescriptor::Token: return !Ty->isTokenTy();
case IITDescriptor::Metadata: return !Ty->isMetadataTy();		case IITDescriptor::Metadata: return !Ty->isMetadataTy();
case IITDescriptor::Half: return !Ty->isHalfTy();		case IITDescriptor::Half: return !Ty->isHalfTy();
case IITDescriptor::BFloat: return !Ty->isBFloatTy();		case IITDescriptor::BFloat: return !Ty->isBFloatTy();
case IITDescriptor::Float: return !Ty->isFloatTy();		case IITDescriptor::Float: return !Ty->isFloatTy();
case IITDescriptor::Double: return !Ty->isDoubleTy();		case IITDescriptor::Double: return !Ty->isDoubleTy();
case IITDescriptor::Quad: return !Ty->isFP128Ty();		case IITDescriptor::Quad: return !Ty->isFP128Ty();
case IITDescriptor::Integer: return !Ty->isIntegerTy(D.Integer_Width);		case IITDescriptor::Integer: return !Ty->isIntegerTy(D.Integer_Width);
▲ Show 20 Lines • Show All 479 Lines • Show Last 20 Lines
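
One detail in the `matchIntrinsicType` hunk above is easy to misread: each case returns the negation of the type check, so, judging from the code shown, a `true` result signals a mismatch rather than a match. The sketch below models that convention with hypothetical enums (`Descriptor`, `TypeKind`) and a hypothetical function `isMismatch`; it is not the LLVM implementation.

```cpp
#include <cassert>

// Hypothetical stand-ins for IITDescriptor kinds and IR types.
enum class Descriptor { Void, MMX, AMX, Token };
enum class TypeKind { Void, X86_MMX, X86_AMX, Token };

// Returns true when Ty does NOT satisfy the descriptor D, following the
// "return !Ty->isX86_AMXTy();" pattern in the hunk.
inline bool isMismatch(Descriptor D, TypeKind Ty) {
  switch (D) {
  case Descriptor::Void:  return Ty != TypeKind::Void;
  case Descriptor::MMX:   return Ty != TypeKind::X86_MMX;
  case Descriptor::AMX:   return Ty != TypeKind::X86_AMX;
  case Descriptor::Token: return Ty != TypeKind::Token;
  }
  return true; // Unknown descriptor: treat as a mismatch in this sketch.
}

// Usage sketch: an AMX descriptor accepts only the x86_amx type.
inline void checkAmxDescriptor() {
  assert(!isMismatch(Descriptor::AMX, TypeKind::X86_AMX));
  assert(isMismatch(Descriptor::AMX, TypeKind::X86_MMX));
}
```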

llvm/lib/IR/LLVMContextImpl.h

Show First 20 Lines • Show All 1,412 Lines • ▼ Show 20 Lines	#include "llvm/IR/Metadata.def"
ConstantInt *TheTrueVal = nullptr;		ConstantInt *TheTrueVal = nullptr;
ConstantInt *TheFalseVal = nullptr;		ConstantInt *TheFalseVal = nullptr;

std::unique_ptr<ConstantTokenNone> TheNoneToken;		std::unique_ptr<ConstantTokenNone> TheNoneToken;

// Basic type instances.		// Basic type instances.
Type VoidTy, LabelTy, HalfTy, BFloatTy, FloatTy, DoubleTy, MetadataTy,		Type VoidTy, LabelTy, HalfTy, BFloatTy, FloatTy, DoubleTy, MetadataTy,
TokenTy;		TokenTy;
Type X86_FP80Ty, FP128Ty, PPC_FP128Ty, X86_MMXTy;		Type X86_FP80Ty, FP128Ty, PPC_FP128Ty, X86_MMXTy, X86_AMXTy;
IntegerType Int1Ty, Int8Ty, Int16Ty, Int32Ty, Int64Ty, Int128Ty;		IntegerType Int1Ty, Int8Ty, Int16Ty, Int32Ty, Int64Ty, Int128Ty;

BumpPtrAllocator Alloc;		BumpPtrAllocator Alloc;
UniqueStringSaver Saver{Alloc};		UniqueStringSaver Saver{Alloc};

DenseMap<unsigned, IntegerType*> IntegerTypes;		DenseMap<unsigned, IntegerType*> IntegerTypes;

using FunctionTypeSet = DenseSet<FunctionType *, FunctionTypeKeyInfo>;		using FunctionTypeSet = DenseSet<FunctionType *, FunctionTypeKeyInfo>;
▲ Show 20 Lines • Show All 92 Lines • Show Last 20 Lines

llvm/lib/IR/LLVMContextImpl.cpp

Show All 29 Lines	: DiagHandler(std::make_unique<DiagnosticHandler>()),
FloatTy(C, Type::FloatTyID),		FloatTy(C, Type::FloatTyID),
DoubleTy(C, Type::DoubleTyID),		DoubleTy(C, Type::DoubleTyID),
MetadataTy(C, Type::MetadataTyID),		MetadataTy(C, Type::MetadataTyID),
TokenTy(C, Type::TokenTyID),		TokenTy(C, Type::TokenTyID),
X86_FP80Ty(C, Type::X86_FP80TyID),		X86_FP80Ty(C, Type::X86_FP80TyID),
FP128Ty(C, Type::FP128TyID),		FP128Ty(C, Type::FP128TyID),
PPC_FP128Ty(C, Type::PPC_FP128TyID),		PPC_FP128Ty(C, Type::PPC_FP128TyID),
X86_MMXTy(C, Type::X86_MMXTyID),		X86_MMXTy(C, Type::X86_MMXTyID),
		X86_AMXTy(C, Type::X86_AMXTyID),
Int1Ty(C, 1),		Int1Ty(C, 1),
Int8Ty(C, 8),		Int8Ty(C, 8),
Int16Ty(C, 16),		Int16Ty(C, 16),
Int32Ty(C, 32),		Int32Ty(C, 32),
Int64Ty(C, 64),		Int64Ty(C, 64),
Int128Ty(C, 128) {}		Int128Ty(C, 128) {}

LLVMContextImpl::~LLVMContextImpl() {		LLVMContextImpl::~LLVMContextImpl() {
▲ Show 20 Lines • Show All 187 Lines • Show Last 20 Lines

llvm/lib/IR/Type.cpp

Show First 20 Lines • Show All 43 Lines • ▼ Show 20 Lines	Type *Type::getPrimitiveType(LLVMContext &C, TypeID IDNumber) {
case FloatTyID : return getFloatTy(C);		case FloatTyID : return getFloatTy(C);
case DoubleTyID : return getDoubleTy(C);		case DoubleTyID : return getDoubleTy(C);
case X86_FP80TyID : return getX86_FP80Ty(C);		case X86_FP80TyID : return getX86_FP80Ty(C);
case FP128TyID : return getFP128Ty(C);		case FP128TyID : return getFP128Ty(C);
case PPC_FP128TyID : return getPPC_FP128Ty(C);		case PPC_FP128TyID : return getPPC_FP128Ty(C);
case LabelTyID : return getLabelTy(C);		case LabelTyID : return getLabelTy(C);
case MetadataTyID : return getMetadataTy(C);		case MetadataTyID : return getMetadataTy(C);
case X86_MMXTyID : return getX86_MMXTy(C);		case X86_MMXTyID : return getX86_MMXTy(C);
		case X86_AMXTyID : return getX86_AMXTy(C);
case TokenTyID : return getTokenTy(C);		case TokenTyID : return getTokenTy(C);
default:		default:
return nullptr;		return nullptr;
}		}
}		}

bool Type::isIntegerTy(unsigned Bitwidth) const {		bool Type::isIntegerTy(unsigned Bitwidth) const {
return isIntegerTy() && cast<IntegerType>(this)->getBitWidth() == Bitwidth;		return isIntegerTy() && cast<IntegerType>(this)->getBitWidth() == Bitwidth;
Show All 16 Lines	bool Type::canLosslesslyBitCastTo(Type *Ty) const {
// 64-bit fixed width vector types can be losslessly converted to x86mmx.		// 64-bit fixed width vector types can be losslessly converted to x86mmx.
if (((isa<FixedVectorType>(this)) && Ty->isX86_MMXTy()) &&		if (((isa<FixedVectorType>(this)) && Ty->isX86_MMXTy()) &&
getPrimitiveSizeInBits().getFixedSize() == 64)		getPrimitiveSizeInBits().getFixedSize() == 64)
return true;		return true;
if ((isX86_MMXTy() && isa<FixedVectorType>(Ty)) &&		if ((isX86_MMXTy() && isa<FixedVectorType>(Ty)) &&
Ty->getPrimitiveSizeInBits().getFixedSize() == 64)		Ty->getPrimitiveSizeInBits().getFixedSize() == 64)
return true;		return true;

		// 8192-bit fixed width vector types can be losslessly converted to x86amx.
		if (((isa<FixedVectorType>(this)) && Ty->isX86_AMXTy()) &&
		getPrimitiveSizeInBits().getFixedSize() == 8192)
		return true;
		if ((isX86_AMXTy() && isa<FixedVectorType>(Ty)) &&
		Ty->getPrimitiveSizeInBits().getFixedSize() == 8192)
		return true;

// At this point we have only various mismatches of the first class types		// At this point we have only various mismatches of the first class types
// remaining and ptr->ptr. Just select the lossless conversions. Everything		// remaining and ptr->ptr. Just select the lossless conversions. Everything
// else is not lossless. Conservatively assume we can't losslessly convert		// else is not lossless. Conservatively assume we can't losslessly convert
// between pointers with different address spaces.		// between pointers with different address spaces.
if (auto *PTy = dyn_cast<PointerType>(this)) {		if (auto *PTy = dyn_cast<PointerType>(this)) {
if (auto *OtherPTy = dyn_cast<PointerType>(Ty))		if (auto *OtherPTy = dyn_cast<PointerType>(Ty))
return PTy->getAddressSpace() == OtherPTy->getAddressSpace();		return PTy->getAddressSpace() == OtherPTy->getAddressSpace();
return false;		return false;
Show All 23 Lines	TypeSize Type::getPrimitiveSizeInBits() const {
case Type::HalfTyID: return TypeSize::Fixed(16);		case Type::HalfTyID: return TypeSize::Fixed(16);
case Type::BFloatTyID: return TypeSize::Fixed(16);		case Type::BFloatTyID: return TypeSize::Fixed(16);
case Type::FloatTyID: return TypeSize::Fixed(32);		case Type::FloatTyID: return TypeSize::Fixed(32);
case Type::DoubleTyID: return TypeSize::Fixed(64);		case Type::DoubleTyID: return TypeSize::Fixed(64);
case Type::X86_FP80TyID: return TypeSize::Fixed(80);		case Type::X86_FP80TyID: return TypeSize::Fixed(80);
case Type::FP128TyID: return TypeSize::Fixed(128);		case Type::FP128TyID: return TypeSize::Fixed(128);
case Type::PPC_FP128TyID: return TypeSize::Fixed(128);		case Type::PPC_FP128TyID: return TypeSize::Fixed(128);
case Type::X86_MMXTyID: return TypeSize::Fixed(64);		case Type::X86_MMXTyID: return TypeSize::Fixed(64);
		case Type::X86_AMXTyID: return TypeSize::Fixed(8192);
case Type::IntegerTyID:		case Type::IntegerTyID:
return TypeSize::Fixed(cast<IntegerType>(this)->getBitWidth());		return TypeSize::Fixed(cast<IntegerType>(this)->getBitWidth());
case Type::FixedVectorTyID:		case Type::FixedVectorTyID:
case Type::ScalableVectorTyID: {		case Type::ScalableVectorTyID: {
const VectorType *VTy = cast<VectorType>(this);		const VectorType *VTy = cast<VectorType>(this);
ElementCount EC = VTy->getElementCount();		ElementCount EC = VTy->getElementCount();
TypeSize ETS = VTy->getElementType()->getPrimitiveSizeInBits();		TypeSize ETS = VTy->getElementType()->getPrimitiveSizeInBits();
assert(!ETS.isScalable() && "Vector type should have fixed-width elements");		assert(!ETS.isScalable() && "Vector type should have fixed-width elements");
▲ Show 20 Lines • Show All 43 Lines • ▼ Show 20 Lines
Type *Type::getFloatTy(LLVMContext &C) { return &C.pImpl->FloatTy; }		Type *Type::getFloatTy(LLVMContext &C) { return &C.pImpl->FloatTy; }
Type *Type::getDoubleTy(LLVMContext &C) { return &C.pImpl->DoubleTy; }		Type *Type::getDoubleTy(LLVMContext &C) { return &C.pImpl->DoubleTy; }
Type *Type::getMetadataTy(LLVMContext &C) { return &C.pImpl->MetadataTy; }		Type *Type::getMetadataTy(LLVMContext &C) { return &C.pImpl->MetadataTy; }
Type *Type::getTokenTy(LLVMContext &C) { return &C.pImpl->TokenTy; }		Type *Type::getTokenTy(LLVMContext &C) { return &C.pImpl->TokenTy; }
Type *Type::getX86_FP80Ty(LLVMContext &C) { return &C.pImpl->X86_FP80Ty; }		Type *Type::getX86_FP80Ty(LLVMContext &C) { return &C.pImpl->X86_FP80Ty; }
Type *Type::getFP128Ty(LLVMContext &C) { return &C.pImpl->FP128Ty; }		Type *Type::getFP128Ty(LLVMContext &C) { return &C.pImpl->FP128Ty; }
Type *Type::getPPC_FP128Ty(LLVMContext &C) { return &C.pImpl->PPC_FP128Ty; }		Type *Type::getPPC_FP128Ty(LLVMContext &C) { return &C.pImpl->PPC_FP128Ty; }
Type *Type::getX86_MMXTy(LLVMContext &C) { return &C.pImpl->X86_MMXTy; }		Type *Type::getX86_MMXTy(LLVMContext &C) { return &C.pImpl->X86_MMXTy; }
		Type *Type::getX86_AMXTy(LLVMContext &C) { return &C.pImpl->X86_AMXTy; }

IntegerType *Type::getInt1Ty(LLVMContext &C) { return &C.pImpl->Int1Ty; }		IntegerType *Type::getInt1Ty(LLVMContext &C) { return &C.pImpl->Int1Ty; }
IntegerType *Type::getInt8Ty(LLVMContext &C) { return &C.pImpl->Int8Ty; }		IntegerType *Type::getInt8Ty(LLVMContext &C) { return &C.pImpl->Int8Ty; }
IntegerType *Type::getInt16Ty(LLVMContext &C) { return &C.pImpl->Int16Ty; }		IntegerType *Type::getInt16Ty(LLVMContext &C) { return &C.pImpl->Int16Ty; }
IntegerType *Type::getInt32Ty(LLVMContext &C) { return &C.pImpl->Int32Ty; }		IntegerType *Type::getInt32Ty(LLVMContext &C) { return &C.pImpl->Int32Ty; }
IntegerType *Type::getInt64Ty(LLVMContext &C) { return &C.pImpl->Int64Ty; }		IntegerType *Type::getInt64Ty(LLVMContext &C) { return &C.pImpl->Int64Ty; }
IntegerType *Type::getInt128Ty(LLVMContext &C) { return &C.pImpl->Int128Ty; }		IntegerType *Type::getInt128Ty(LLVMContext &C) { return &C.pImpl->Int128Ty; }

Show All 28 Lines
PointerType *Type::getPPC_FP128PtrTy(LLVMContext &C, unsigned AS) {		PointerType *Type::getPPC_FP128PtrTy(LLVMContext &C, unsigned AS) {
return getPPC_FP128Ty(C)->getPointerTo(AS);		return getPPC_FP128Ty(C)->getPointerTo(AS);
}		}

PointerType *Type::getX86_MMXPtrTy(LLVMContext &C, unsigned AS) {		PointerType *Type::getX86_MMXPtrTy(LLVMContext &C, unsigned AS) {
return getX86_MMXTy(C)->getPointerTo(AS);		return getX86_MMXTy(C)->getPointerTo(AS);
}		}

		PointerType *Type::getX86_AMXPtrTy(LLVMContext &C, unsigned AS) {
		return getX86_AMXTy(C)->getPointerTo(AS);
		}

PointerType *Type::getIntNPtrTy(LLVMContext &C, unsigned N, unsigned AS) {		PointerType *Type::getIntNPtrTy(LLVMContext &C, unsigned N, unsigned AS) {
return getIntNTy(C, N)->getPointerTo(AS);		return getIntNTy(C, N)->getPointerTo(AS);
}		}

PointerType *Type::getInt1PtrTy(LLVMContext &C, unsigned AS) {		PointerType *Type::getInt1PtrTy(LLVMContext &C, unsigned AS) {
return getInt1Ty(C)->getPointerTo(AS);		return getInt1Ty(C)->getPointerTo(AS);
}		}

▲ Show 20 Lines • Show All 454 Lines • Show Last 20 Lines
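
The Type.cpp hunks above fix the size of `x86_amx` at 8192 bits and allow a lossless bitcast between it and any fixed-width vector of exactly that size. That matches `<256 x i32>`, since 256 × 32 = 8192 bits (1024 bytes, the size of a full 16-row by 64-byte AMX tile). The following standalone sketch models only the size check; `FixedVec`, `sizeInBits`, and `losslessToX86Amx` are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Size the patch assigns to Type::X86_AMXTyID, in bits.
constexpr std::uint64_t kX86AmxBits = 8192;

// Hypothetical description of a fixed-width vector type.
struct FixedVec {
  std::uint64_t NumElts;
  std::uint64_t EltBits;
};

inline std::uint64_t sizeInBits(FixedVec V) { return V.NumElts * V.EltBits; }

// Models the new rule: a fixed vector converts losslessly to or from
// x86_amx only when its total size is exactly 8192 bits.
inline bool losslessToX86Amx(FixedVec V) {
  return sizeInBits(V) == kX86AmxBits;
}

// Usage sketch: <256 x i32> qualifies, <128 x i32> does not.
inline void checkAmxBitcastRule() {
  assert(losslessToX86Amx(FixedVec{256, 32}));
  assert(!losslessToX86Amx(FixedVec{128, 32}));
}
```

Note that, as written in the hunk, the rule depends only on the total size, so a vector such as `<1024 x i8>` would also pass this particular check.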

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp

Show First 20 Lines • Show All 4,612 Lines • ▼ Show 20 Lines	case Intrinsic::x86_tileloadd64_internal: {
Node->getOperand(3),		Node->getOperand(3),
Base,		Base,
Scale,		Scale,
Index,		Index,
Disp,		Disp,
Segment,		Segment,
CFG,		CFG,
Chain};		Chain};
CNode = CurDAG->getMachineNode(Opc, dl, {MVT::v256i32, MVT::Other}, Ops);		CNode = CurDAG->getMachineNode(Opc, dl, {MVT::x86amx, MVT::Other}, Ops);
ReplaceNode(Node, CNode);		ReplaceNode(Node, CNode);
return;		return;
}		}
case Intrinsic::x86_tdpbssd_internal: {		case Intrinsic::x86_tdpbssd_internal: {
if (!Subtarget->hasAMXTILE())		if (!Subtarget->hasAMXTILE())
break;		break;
SDValue Chain = Node->getOperand(0);		SDValue Chain = Node->getOperand(0);
unsigned Opc = X86::PTDPBSSDV;		unsigned Opc = X86::PTDPBSSDV;
SDValue CFG = CurDAG->getRegister(0, MVT::Untyped);		SDValue CFG = CurDAG->getRegister(0, MVT::Untyped);
SDValue Ops[] = {Node->getOperand(2),		SDValue Ops[] = {Node->getOperand(2),
Node->getOperand(3),		Node->getOperand(3),
Node->getOperand(4),		Node->getOperand(4),
Node->getOperand(5),		Node->getOperand(5),
Node->getOperand(6),		Node->getOperand(6),
Node->getOperand(7),		Node->getOperand(7),
CFG,		CFG,
Chain};		Chain};
MachineSDNode *CNode =		MachineSDNode *CNode =
CurDAG->getMachineNode(Opc, dl, {MVT::v256i32, MVT::Other}, Ops);		CurDAG->getMachineNode(Opc, dl, {MVT::x86amx, MVT::Other}, Ops);
ReplaceNode(Node, CNode);		ReplaceNode(Node, CNode);
return;		return;
}		}
}		}
break;		break;
}		}
case ISD::INTRINSIC_VOID: {		case ISD::INTRINSIC_VOID: {
unsigned IntNo = Node->getConstantOperandVal(1);		unsigned IntNo = Node->getConstantOperandVal(1);
▲ Show 20 Lines • Show All 1,359 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86ISelLowering.cpp


Show First 20 Lines • Show All 1,892 Lines • ▼ Show 20 Lines	if (!Subtarget.useSoftFloat() && Subtarget.hasVLX()) {
}		}

setOperationAction(ISD::TRUNCATE, MVT::v16i32, Custom);		setOperationAction(ISD::TRUNCATE, MVT::v16i32, Custom);
setOperationAction(ISD::TRUNCATE, MVT::v8i64, Custom);		setOperationAction(ISD::TRUNCATE, MVT::v8i64, Custom);
setOperationAction(ISD::TRUNCATE, MVT::v16i64, Custom);		setOperationAction(ISD::TRUNCATE, MVT::v16i64, Custom);
}		}

if (Subtarget.hasAMXTILE()) {		if (Subtarget.hasAMXTILE()) {
addRegisterClass(MVT::v256i32, &X86::TILERegClass);		addRegisterClass(MVT::x86amx, &X86::TILERegClass);
}		}

// We want to custom lower some of our intrinsics.		// We want to custom lower some of our intrinsics.
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::Other, Custom);
setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);		setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);
if (!Subtarget.is64Bit()) {		if (!Subtarget.is64Bit()) {
setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i64, Custom);		setOperationAction(ISD::INTRINSIC_W_CHAIN, MVT::i64, Custom);
▲ Show 20 Lines • Show All 3,430 Lines • ▼ Show 20 Lines	bool X86TargetLowering::canMergeStoresTo(unsigned AddressSpace, EVT MemVT,
if (NoFloat) {		if (NoFloat) {
unsigned MaxIntSize = Subtarget.is64Bit() ? 64 : 32;		unsigned MaxIntSize = Subtarget.is64Bit() ? 64 : 32;
return (MemVT.getSizeInBits() <= MaxIntSize);		return (MemVT.getSizeInBits() <= MaxIntSize);
}		}
// Make sure we don't merge greater than our preferred vector		// Make sure we don't merge greater than our preferred vector
// width.		// width.
if (MemVT.getSizeInBits() > Subtarget.getPreferVectorWidth())		if (MemVT.getSizeInBits() > Subtarget.getPreferVectorWidth())
return false;		return false;

// Don't merge to x86 amx tile, as we only map MVT::v256i32
// to x86 amx tile on amx intrinsics.
if (MemVT == MVT::v256i32)
return false;

return true;		return true;
		craig.topper: Should this just be deleted?
}		}

bool X86TargetLowering::isCtlzFast() const {		bool X86TargetLowering::isCtlzFast() const {
return Subtarget.hasFastLZCNT();		return Subtarget.hasFastLZCNT();
}		}

bool X86TargetLowering::isMaskAndCmp0FoldingBeneficial(		bool X86TargetLowering::isMaskAndCmp0FoldingBeneficial(
const Instruction &AndI) const {		const Instruction &AndI) const {
▲ Show 20 Lines • Show All 45,964 Lines • Show Last 20 Lines

llvm/lib/Target/X86/X86LowerAMXType.cpp

	//===- llvm/CodeGen/TileShapeInfo.h - ---------------------------*- C++ -*-===//			//===- llvm/CodeGen/TileShapeInfo.h - ---------------------------*- C++ -*-===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	/// \file Pass to transform <256 x i32>			/// \file Pass to transform <256 x i32> load/store
	/// <256 x i32> is mapped to AMX tile register on X86, AMX instruction set only			/// <256 x i32> is bitcasted to x86_amx on X86, and AMX instruction set only
	/// provides simple operation on tile register. The basic elementwise operation			/// provides simple operation on x86_amx. The basic elementwise operation
	/// is not supported by AMX. Since we define the AMX tile as vector <256 x i32>			/// is not supported by AMX. Since x86_amx is bitcasted from vector <256 x i32>
	/// and only AMX intrinsics can operate on the type, we need transform			/// and only AMX intrinsics can operate on the type, we need transform
	/// load/store <256 x i32> instruction to AMX load/store. Besides, we split			/// load/store <256 x i32> instruction to AMX load/store. If the bitcast can
	/// <256 x i32> to 2 <128 x i32> if the vector is not used or defined by AMX			/// not be combined with load/store, we transform the bitcast to amx load/store
	/// intrinsics, so that in instruction selection it can be lowered to proper			/// and <256 x i32> store/load.
	/// size which HW can support.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	#include "X86.h"			#include "X86.h"
	#include "llvm/ADT/DenseSet.h"			#include "llvm/ADT/PostOrderIterator.h"
				#include "llvm/ADT/SmallSet.h"
	#include "llvm/Analysis/OptimizationRemarkEmitter.h"			#include "llvm/Analysis/OptimizationRemarkEmitter.h"
	#include "llvm/Analysis/TargetTransformInfo.h"			#include "llvm/Analysis/TargetTransformInfo.h"
	#include "llvm/CodeGen/Passes.h"			#include "llvm/CodeGen/Passes.h"
	#include "llvm/CodeGen/ValueTypes.h"			#include "llvm/CodeGen/ValueTypes.h"
	#include "llvm/IR/DataLayout.h"			#include "llvm/IR/DataLayout.h"
	#include "llvm/IR/Function.h"			#include "llvm/IR/Function.h"
	#include "llvm/IR/IRBuilder.h"			#include "llvm/IR/IRBuilder.h"
	#include "llvm/IR/Instructions.h"			#include "llvm/IR/Instructions.h"
	#include "llvm/IR/IntrinsicInst.h"			#include "llvm/IR/IntrinsicInst.h"
	#include "llvm/IR/IntrinsicsX86.h"			#include "llvm/IR/IntrinsicsX86.h"
				#include "llvm/IR/PatternMatch.h"
	#include "llvm/InitializePasses.h"			#include "llvm/InitializePasses.h"
	#include "llvm/Pass.h"			#include "llvm/Pass.h"

	using namespace llvm;			using namespace llvm;
				using namespace PatternMatch;

	#define DEBUG_TYPE "lower-amx-type"			#define DEBUG_TYPE "lower-amx-type"

	namespace {			static AllocaInst *CreateAllocaInst(IRBuilder<> &Builder, BasicBlock *BB) {
	class X86LowerAMXType {			Function &F = *BB->getParent();
	Function &Func;			Module *M = BB->getModule();
	const DataLayout &DL;			const DataLayout &DL = M->getDataLayout();
	DenseSet<Instruction *> LDSet;
	DenseSet<Instruction *> STSet;
	DenseMap<Value *, std::pair<LoadInst *, LoadInst *>> LoadMap;

	public:
	X86LowerAMXType(Function &F) : Func(F), DL(F.getParent()->getDataLayout()) {}
	bool visit();
	bool visitLD();
	bool visitST();
	void splitST(Instruction *Inst);
	void splitLD(Instruction *Inst);
	};

	// Split v256i32 load/store to 2 v128i32, so that ISel can			Type *V256I32Ty = VectorType::get(Builder.getInt32Ty(), 256, false);
	// lower it to proper vector size.
	void X86LowerAMXType::splitST(Instruction *Inst) {
	StoreInst *ST = dyn_cast<StoreInst>(Inst);
	IRBuilder<> Builder(ST);
	LLVMContext &Ctx = Builder.getContext();			LLVMContext &Ctx = Builder.getContext();
	Type *Ty = ST->getValueOperand()->getType();			auto AllocaAlignment = DL.getPrefTypeAlign(Type::getX86_AMXTy(Ctx));
	EVT VT = EVT::getEVT(Ty);			unsigned AllocaAS = DL.getAllocaAddrSpace();
				pengfei (Not Done): Currently, we don't have HW type for v256i32. I think 64 bytes (512 bits) should be enough here.
	EVT HalfVT = VT.getHalfNumVectorElementsVT(Ctx);			AllocaInst *AllocaRes =
	Type *HalfTy = HalfVT.getTypeForEVT(Ctx);			new AllocaInst(V256I32Ty, AllocaAS, "", &F.getEntryBlock().front());
				AllocaRes->setAlignment(AllocaAlignment);
	LoadInst *Lo, *Hi;			return AllocaRes;
	std::tie(Lo, Hi) = LoadMap[ST->getValueOperand()];			}
	Value *Ptr = ST->getPointerOperand();
	PointerType *HalfPtrTy = HalfTy->getPointerTo(ST->getPointerAddressSpace());			static std::pair<Value *, Value *> getShape(IntrinsicInst *II, unsigned OpNo) {
	Value *HalfPtr = Builder.CreateBitCast(Ptr, HalfPtrTy);			Value *Row = nullptr, *Col = nullptr;
	// The HW require the alignment for AMX tile is 64, but front-end generate
	// code for the vector alignment which is the vector size.
	uint64_t HalfTySize = HalfTy->getPrimitiveSizeInBits().getFixedSize() / 8;
	Align Alignment = std::min(Lo->getAlign(), Align(HalfTySize));
	Builder.CreateAlignedStore(Lo, HalfPtr, Alignment, ST->isVolatile());

	HalfPtr = Builder.CreateGEP(HalfTy, HalfPtr, Builder.getInt32(1));
	Builder.CreateAlignedStore(Hi, HalfPtr, Alignment, ST->isVolatile());
	}

	bool X86LowerAMXType::visitST() {
	if (STSet.empty())
	return false;
	for (auto *Inst : STSet) {
	Value *Row, *Col;
	const IntrinsicInst *II = dyn_cast<IntrinsicInst>(Inst->getOperand(0));
	if (!II)
	Row = Col = nullptr;
	else {
	switch (II->getIntrinsicID()) {			switch (II->getIntrinsicID()) {
	default:			default:
				pengfei (Not Done): I think we'd better to check exceptions. E.g. `default: llvm_unreachable(""); case Intrinsic::x86_tileloadd64_internal: case Intrinsic::x86_tdpbssd_internal: case Intrinsic::x86_tilestored64_internal: Row = II->getArgOperand(0); Col = II->getArgOperand(1); break;`
	Row = Col = nullptr;			llvm_unreachable("Expect amx intrinsics");
	break;
	case Intrinsic::x86_tileloadd64_internal:			case Intrinsic::x86_tileloadd64_internal:
	case Intrinsic::x86_tdpbssd_internal: {			case Intrinsic::x86_tilestored64_internal: {
	Row = II->getArgOperand(0);			Row = II->getArgOperand(0);
	Col = II->getArgOperand(1);			Col = II->getArgOperand(1);
	break;			break;
	}			}
	}			// a * b + c
	}			// The shape depends on which operand.
	if (!Row) {
	splitST(Inst);
	continue;
	}
	IRBuilder<> Builder(Inst);
	LLVMContext &Ctx = Builder.getContext();
	// Use the maximun column as stride. It must be the same with load stride.
	Value *Stride = Builder.getInt64(64);
	Value *I8Ptr =
	Builder.CreateBitCast(Inst->getOperand(1), Type::getInt8PtrTy(Ctx));
	std::array<Value *, 5> Args = {Row, Col, I8Ptr, Stride,
	Inst->getOperand(0)};

	Builder.CreateIntrinsic(Intrinsic::x86_tilestored64_internal, None, Args);
	}
	return true;
	}

	void X86LowerAMXType::splitLD(Instruction *Inst) {
	LoadInst *LD = dyn_cast<LoadInst>(Inst);
	IRBuilder<> Builder(LD);
	LLVMContext &Ctx = Builder.getContext();
	Type *Ty = LD->getType();
	EVT VT = EVT::getEVT(Ty);
	EVT HalfVT = VT.getHalfNumVectorElementsVT(Ctx);
	Type *HalfTy = HalfVT.getTypeForEVT(Ctx);

	Value *Ptr = LD->getPointerOperand();
	PointerType *HalfPtrTy = HalfTy->getPointerTo(LD->getPointerAddressSpace());
	Value *HalfPtr = Builder.CreateBitCast(Ptr, HalfPtrTy);
	// The HW require the alignment for AMX tile is 64, but front-end generate
	// code for the vector alignment which is the vector size.
	uint64_t HalfTySize = HalfTy->getPrimitiveSizeInBits().getFixedSize() / 8;
	Align Alignment = std::min(LD->getAlign(), Align(HalfTySize));
	auto *Lo =
	Builder.CreateAlignedLoad(HalfTy, HalfPtr, Alignment, LD->isVolatile());

	HalfPtr = Builder.CreateGEP(HalfTy, HalfPtr, Builder.getInt32(1));
	auto *Hi =
	Builder.CreateAlignedLoad(HalfTy, HalfPtr, Alignment, LD->isVolatile());

	LoadMap[Inst] = std::make_pair(Lo, Hi);
	}

	bool X86LowerAMXType::visitLD() {
	if (LDSet.empty())
	return false;
	for (auto &Inst : LDSet) {
	int Count = 0;
	Value *NewInst = nullptr;
	// The user should be all AMX intrinsics or all LLVM instruction.
	// Don't support it is used by both AMX intrinsics and LLVM instructions.
	for (auto I = Inst->use_begin(), E = Inst->use_end(); I != E;) {
	Use &U = *I++;
	const IntrinsicInst *II = dyn_cast<IntrinsicInst>(U.getUser());
	if (!II) {
	Count++;
	continue;
	}
	if (NewInst)
	continue;
	Value *Row, *Col;
	switch (II->getIntrinsicID()) {
	default:
	report_fatal_error("Non-AMX intrinsic use tile type.");
	break;
	case Intrinsic::x86_tdpbssd_internal: {			case Intrinsic::x86_tdpbssd_internal: {
	unsigned OpNo = U.getOperandNo();
	switch (OpNo) {			switch (OpNo) {
	case 3:			case 3:
	Row = II->getArgOperand(0);			Row = II->getArgOperand(0);
	Col = II->getArgOperand(1);			Col = II->getArgOperand(1);
	break;			break;
	case 4:			case 4:
	Row = II->getArgOperand(0);			Row = II->getArgOperand(0);
	Col = II->getArgOperand(2);			Col = II->getArgOperand(2);
	break;			break;
	case 5:			case 5:
	Row = II->getArgOperand(2);			Row = II->getArgOperand(2);
	Col = II->getArgOperand(1);			Col = II->getArgOperand(1);
	break;			break;
	}			}
	break;			break;
	}			}
	case Intrinsic::x86_tilestored64_internal: {
	Row = II->getArgOperand(0);
	Col = II->getArgOperand(1);
	break;
	}			}

				return std::make_pair(Row, Col);
	}			}
	assert(Count == 0 && "Can NOT mix amx intrinsic and LLVM instruction");
	// FIXME: The shape def should be ahead of load.			// %src = load <256 x i32>, <256 x i32>* %addr, align 64
	IRBuilder<> Builder(Inst);			// %2 = bitcast <256 x i32> %src to x86_amx
	LLVMContext &Ctx = Builder.getContext();			// -->
				// %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col,
				// i8* %addr, i64 %stride64)
				static void combineLoadBitcast(LoadInst *LD, BitCastInst *Bitcast) {
				Value *Row = nullptr, *Col = nullptr;
				Use &U = *(Bitcast->use_begin());
				unsigned OpNo = U.getOperandNo();
				auto *II = cast<IntrinsicInst>(U.getUser());
				std::tie(Row, Col) = getShape(II, OpNo);
				IRBuilder<> Builder(Bitcast);
	// Use the maximun column as stride.			// Use the maximun column as stride.
	Value *Stride = Builder.getInt64(64);			Value *Stride = Builder.getInt64(64);
	Value *I8Ptr =			Value *I8Ptr =
	Builder.CreateBitCast(Inst->getOperand(0), Type::getInt8PtrTy(Ctx));			Builder.CreateBitCast(LD->getOperand(0), Builder.getInt8PtrTy());
	std::array<Value *, 4> Args = {Row, Col, I8Ptr, Stride};			std::array<Value *, 4> Args = {Row, Col, I8Ptr, Stride};

	NewInst = Builder.CreateIntrinsic(Intrinsic::x86_tileloadd64_internal,			Value *NewInst =
	None, Args);			Builder.CreateIntrinsic(Intrinsic::x86_tileloadd64_internal, None, Args);
				Bitcast->replaceAllUsesWith(NewInst);
				}

				// %src = call x86_amx @llvm.x86.tileloadd64.internal(%row, %col, %addr,
				// %stride);
				// %13 = bitcast x86_amx %src to <256 x i32>
				// store <256 x i32> %13, <256 x i32>* %addr, align 64
				// -->
				// call void @llvm.x86.tilestored64.internal(%row, %col, %addr,
				// %stride64, %13)
				static void combineBitcastStore(BitCastInst *Bitcast, StoreInst *ST) {
				pengfei (Not Done): Why don't check empty like line 157?

				Value *Tile = Bitcast->getOperand(0);
				auto *II = cast<IntrinsicInst>(Tile);
				// Tile is output from AMX intrinsic. The first operand of the
				pengfei (Not Done): How about the `Tile` comes from tdpbssd?
				LuoYuanke (Author, Done): We have a convention, when amx intrinsics define a x86_amx tile the first 2 operands is the shape of the defined tile. For tdpbssd, the intrinsics operands are (m, n, k, ...). (m, n) is the shape of the produced tile.
				pengfei (Not Done): Oh, yes. I missed that. Thanks.
				// intrinsic is row, the second operand of the intrinsic is column.
				Value *Row = II->getOperand(0);
				Value *Col = II->getOperand(1);
				IRBuilder<> Builder(ST);
				// Use the maximum column as stride. It must be the same with load
				// stride.
				Value *Stride = Builder.getInt64(64);
				Value *I8Ptr =
				Builder.CreateBitCast(ST->getOperand(1), Builder.getInt8PtrTy());
				std::array<Value *, 5> Args = {Row, Col, I8Ptr, Stride, Tile};
				Builder.CreateIntrinsic(Intrinsic::x86_tilestored64_internal, None, Args);
				if (Bitcast->hasOneUse())
				return;
				// %13 = bitcast x86_amx %src to <256 x i32>
				// store <256 x i32> %13, <256 x i32>* %addr, align 64
				// %add = <256 x i32> %13, <256 x i32> %src2
				// -->
				// %13 = bitcast x86_amx %src to <256 x i32>
				// call void @llvm.x86.tilestored64.internal(%row, %col, %addr,
				// %stride64, %13)
				// %14 = load <256 x i32>, %addr
				// %add = <256 x i32> %14, <256 x i32> %src2
				Value *Vec = Builder.CreateLoad(Bitcast->getType(), ST->getOperand(1));
				Bitcast->replaceAllUsesWith(Vec);
				pengfei (Not Done): Value
				}

				// transform bitcast to <store, load> instructions.
				pengfei (Not Done): Why don't put it in DeadBitcasts?
				pengfei (Not Done): I don't see any chance this happen. But we still need to handle the x86_amx* here if possible, right? Maybe better to give an assertion for now. `cast<PointerType>(Src->getType())->isX86_AMXTy()`
				static bool transformBitcast(BitCastInst *Bitcast) {
				IRBuilder<> Builder(Bitcast);
				pengfei (Not Done): Can we leave the canonicalize bitcast cases a single patch. It's a bit complex here and I don't think it's a common case.
				LuoYuanke (Author, Done): Ok, I'll create another patch for it.
				AllocaInst *AllocaAddr;
				Value *I8Ptr, *Stride;
				auto *Src = Bitcast->getOperand(0);

				auto Prepare = [&]() {
				AllocaAddr = CreateAllocaInst(Builder, Bitcast->getParent());
				I8Ptr = Builder.CreateBitCast(AllocaAddr, Builder.getInt8PtrTy());
				Stride = Builder.getInt64(64);
				pengfei (Not Done): Maybe better to use BitCastInst?
				LuoYuanke (Author, Done): There may be dead load or store instructions.
				};

	Inst->replaceAllUsesWith(NewInst);			if (Bitcast->getType()->isX86_AMXTy()) {
	}			// %2 = bitcast <256 x i32> %src to x86_amx
	if (!NewInst)			// -->
	splitLD(Inst);			// %addr = alloca <256 x i32>, align 64
				// store <256 x i32> %src, <256 x i32>* %addr, align 64
				// %addr2 = bitcast <256 x i32>* to i8*
				// %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col,
				// i8* %addr2,
				pengfei (Not Done): Why the alignment not be 64?
				LuoYuanke (Author, Done): 1024 is conservative, because a vector requires the alignment to be the vector size. Here we generate vector <256 x i32> load/store.
				pengfei (Not Done): We don't need to align to 1024. 64 should be enough. The same for below comments.
				// i64 64)
				Use &U = *(Bitcast->use_begin());
				unsigned OpNo = U.getOperandNo();
				auto *II = dyn_cast<IntrinsicInst>(U.getUser());
				if (!II)
				return false; // May be bitcast from x86amx to <256 x i32>.
				Prepare();
				Builder.CreateStore(Src, AllocaAddr);
				// TODO we can pick an constant operand for the shape.
				Value *Row = nullptr, *Col = nullptr;
				craig.topper (Not Done): Just use Value. auto doesn't add any value other than shortening by 1 character.
				std::tie(Row, Col) = getShape(II, OpNo);
				std::array<Value *, 4> Args = {Row, Col, I8Ptr, Stride};
				Value *NewInst = Builder.CreateIntrinsic(
				Intrinsic::x86_tileloadd64_internal, None, Args);
				Bitcast->replaceAllUsesWith(NewInst);
				} else {
				// %2 = bitcast x86_amx %src to <256 x i32>
				// -->
				// %addr = alloca <256 x i32>, align 64
				// %addr2 = bitcast <256 x i32>* to i8*
				// call void @llvm.x86.tilestored64.internal(i16 %row, i16 %col,
				// i8* %addr2, i64 %stride)
				// %2 = load <256 x i32>, <256 x i32>* %addr, align 64
				auto *II = dyn_cast<IntrinsicInst>(Src);
				pengfei (Not Done): How about the `Tile` comes from tdpbssd?
				if (!II)
				return false; // May be bitcast from <256 x i32> to x86amx.
				Prepare();
				Value *Row = II->getOperand(0);
				Value *Col = II->getOperand(1);
				std::array<Value *, 5> Args = {Row, Col, I8Ptr, Stride, Src};
				Builder.CreateIntrinsic(Intrinsic::x86_tilestored64_internal, None, Args);
				pengfei (Not Done): Is it possible the x86_amx operand isn't from AMX intrinsic, e.g. `%src = bitcast <256 x i32> %xxx to x86_amx` followed by `%2 = bitcast x86_amx %src to <256 x i32>`
				LuoYuanke (Author, Done): Good catch. I'll add support for this pattern.
				Value *NewInst = Builder.CreateLoad(Bitcast->getType(), AllocaAddr);
				Bitcast->replaceAllUsesWith(NewInst);
				craig.topper (Not Done): Shouldn't this be in the function's entry block?
				LuoYuanke (Author, Done): Yes. It is in function's entry block. It is done in line 48 of function CreateAllocaInst(). CreateAllocaInst() is actually copied from your code. :)
	}			}

	return true;			return true;
	}			}

	bool X86LowerAMXType::visit() {			namespace {
	bool C;			class X86LowerAMXType {
	auto IsAMXType = [](FixedVectorType *VTy) {			Function &Func;
	if (!VTy)
	return false;
	if (!VTy->getScalarType()->isIntegerTy(32))
	return false;
	if (VTy->getNumElements() != 256)
	return false;

	return true;			public:
				X86LowerAMXType(Function &F) : Func(F) {}
				bool visit();
	};			};

	for (BasicBlock &BB : Func) {			bool X86LowerAMXType::visit() {
	for (Instruction &Inst : BB) {			SmallVector<Instruction *, 8> DeadInsts;
	LoadInst *LD = dyn_cast<LoadInst>(&Inst);
	// Check load instruction.			for (BasicBlock *BB : post_order(&Func)) {
				pengfei (Not Done): Better move it to line 310.
	// %3 = load <256 x i32>, <256 x i32>* %1, align 64			for (BasicBlock::reverse_iterator II = BB->rbegin(), IE = BB->rend();
	if (LD) {			II != IE;) {
	FixedVectorType *VTy = dyn_cast<FixedVectorType>(Inst.getType());			Instruction &Inst = *II++;
	if (!IsAMXType(VTy))			auto *Bitcast = dyn_cast<BitCastInst>(&Inst);
				if (!Bitcast)
	continue;			continue;
	LDSet.insert(&Inst);
				Value *Src = Bitcast->getOperand(0);
				if (Bitcast->getType()->isX86_AMXTy()) {
				if (Bitcast->user_empty()) {
				DeadInsts.push_back(Bitcast);
	continue;			continue;
	}			}
				pengfei (Not Done): Better to reuse the cast result, e.g. `BitCastInst *BInst = dyn_cast<BitCastInst>(&Inst); if (!BInst )` You can save several `cast<BitCastInst>(&Inst)` below.
				LuoYuanke (Author, Done): That's good. Thanks.
	// Check store instruction.			LoadInst *LD = dyn_cast<LoadInst>(Src);
	// store <256 x i32> %3, <256 x i32>* %2, align 64			if (!LD) {
	StoreInst *ST = dyn_cast<StoreInst>(&Inst);			if (transformBitcast(Bitcast))
	if (!ST)			DeadInsts.push_back(Bitcast);
	continue;			continue;
	FixedVectorType *VTy =			}
	dyn_cast<FixedVectorType>(ST->getOperand(0)->getType());			// If load has mutli-user, duplicate a vector load.
				pengfei (Done): vector
	if (!IsAMXType(VTy))			// %src = load <256 x i32>, <256 x i32>* %addr, align 64
				// %2 = bitcast <256 x i32> %src to x86_amx
				craig.topper (Done): Don't use an assert to check the result of a dyn_cast. If it shouldn't fail just use cast<LoadInst> which will assert internally.
				// %add = add <256 x i32> %src, <256 x i32> %src2
				// -->
				// %src = load <256 x i32>, <256 x i32>* %addr, align 64
				// %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col,
				// i8* %addr, i64 %stride64)
				// %add = add <256 x i32> %src, <256 x i32> %src2

				// If load has one user, the load will be eliminated in DAG ISel.
				craig.topper (Done): Unchecked dyn_cast
				// %src = load <256 x i32>, <256 x i32>* %addr, align 64
				// %2 = bitcast <256 x i32> %src to x86_amx
				// -->
				// %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col,
				// i8* %addr, i64 %stride64)
				combineLoadBitcast(LD, Bitcast);
				DeadInsts.push_back(Bitcast);
				if (LD->hasOneUse())
				DeadInsts.push_back(LD);
				pengfei (Not Done): Where's `x86_amx* %tile` from? Shouldn't it have been transferred to `x86_amx` before this bitcast if it exists?
				LuoYuanke (Author, Done): In my test case, it is transformed after Combine redundant instructions.
				*** IR Dump After Simplify the CFG ***
				define internal fastcc void @_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture %dst) unnamed_addr #4 {
				entry:
				  %row = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 0
				  %0 = load i16, i16* %row, align 64, !tbaa !2
				  %col = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 1
				  %1 = load i16, i16* %col, align 2, !tbaa !7
				  %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64) #6
				  %3 = bitcast x86_amx %2 to <256 x i32>
				  %tile = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 3
				  store <256 x i32> %3, <256 x i32>* %tile, align 64, !tbaa !8
				  ret void
				}
				To
				*** IR Dump After Combine redundant instructions ***
				; Function Attrs: alwaysinline nounwind uwtable mustprogress
				define internal fastcc void @_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture %dst) unnamed_addr #4 {
				entry:
				  %row = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 0
				  %0 = load i16, i16* %row, align 64, !tbaa !2
				  %col = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 1
				  %1 = load i16, i16* %col, align 2, !tbaa !7
				  %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64) #6
				  %tile = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 3
				  %3 = bitcast <256 x i32>* %tile to x86_amx*
				  store x86_amx %2, x86_amx* %3, align 64, !tbaa !8
				  ret void
				}
				} else if (Src->getType()->isX86_AMXTy()) {
				if (Bitcast->user_empty()) {
				DeadInsts.push_back(Bitcast);
	continue;			continue;
	STSet.insert(&Inst);			}
				StoreInst *ST = nullptr;
				pengfei (Not Done): Maybe better to keep a duplicated `load` that calling `transformBitcast`. The same for line 285.
				for (auto UI = Bitcast->use_begin(), UE = Bitcast->use_end();
				UI != UE;) {
				Value *I = (UI++)->getUser();
				ST = dyn_cast<StoreInst>(I);
				if (ST)
				break;
				}
				if (!ST) {
				craig.topper (Done): Use cast.
				if (transformBitcast(Bitcast))
				DeadInsts.push_back(Bitcast);
				continue;
				}
				pengfei (Not Done): `%src` is not used here.
				// If bitcast (%13) has one use, combine bitcast and store to amx store.
				// %src = call x86_amx @llvm.x86.tileloadd64.internal(%row, %col, %addr,
				// %stride);
				pengfei (Not Done): Why we need to consider <256 x i32> has more than one use?
				// %13 = bitcast x86_amx %src to <256 x i32>
				// store <256 x i32> %13, <256 x i32>* %addr, align 64
				// -->
				// call void @llvm.x86.tilestored64.internal(%row, %col, %addr,
				// %stride64, %13)
				//
				// If bitcast (%13) has multi-use, transform as below.
				// %13 = bitcast x86_amx %src to <256 x i32>
				// store <256 x i32> %13, <256 x i32>* %addr, align 64
				// %add = <256 x i32> %13, <256 x i32> %src2
				// -->
				craig.topper (Not Done): maximun->maximum
				// %13 = bitcast x86_amx %src to <256 x i32>
				// call void @llvm.x86.tilestored64.internal(%row, %col, %addr,
				// %stride64, %13)
				// %14 = load <256 x i32>, %addr
				craig.topper (Not Done): Use Builder.getInt8PtrTy then you don't need Ctx
				// %add = <256 x i32> %14, <256 x i32> %src2
				//
				combineBitcastStore(Bitcast, ST);
				// Delete user first.
				DeadInsts.push_back(ST);
				DeadInsts.push_back(Bitcast);
				}
	}			}
	}			}

				pengfei (Not Done): This comment is for above code? Better move it up.
	C = visitLD() | visitST();			bool C = !DeadInsts.empty();
	for (auto *Inst : STSet)
	Inst->eraseFromParent();			for (auto *Inst : DeadInsts)
	for (auto *Inst : LDSet)
	Inst->eraseFromParent();			Inst->eraseFromParent();

	return C;			return C;
	}			}
				pengfei (Done): Why we need to recursively delete them? I think delete the nodes in DeadInsts is enough.
	} // anonymous namespace			} // anonymous namespace

	namespace {			namespace {

	class X86LowerAMXTypeLegacyPass : public FunctionPass {			class X86LowerAMXTypeLegacyPass : public FunctionPass {
	public:			public:
	static char ID;			static char ID;

	Show All 27 Lines
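
Editorial sketch, not part of the patch: the operand-to-shape mapping that getShape() applies to llvm.x86.tdpbssd.internal can be modelled without LLVM. `TdpbssdShape`, `RowCol`, `tileShapeForOperand` and `producedTileShape` are hypothetical names introduced only for illustration, and the labels c, a, b for operands 3, 4, 5 are an assumption based on the "a * b + c" comment in the diff.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical stand-in for the three i16 shape operands (m, n, k) of
// llvm.x86.tdpbssd.internal(m, n, k, c, a, b). Not an LLVM type.
struct TdpbssdShape {
  uint16_t M;
  uint16_t N;
  uint16_t K;
};

using RowCol = std::pair<uint16_t, uint16_t>;

// Returns {row, col} for the tile passed as call operand OpNo, following
// the same cases as the switch in getShape(): 3 -> c, 4 -> a, 5 -> b.
inline RowCol tileShapeForOperand(const TdpbssdShape &S, unsigned OpNo) {
  switch (OpNo) {
  case 3:
    return {S.M, S.N};
  case 4:
    return {S.M, S.K};
  case 5:
    return {S.K, S.N};
  default:
    assert(false && "operand is not a tile");
    return {0, 0};
  }
}

// By the convention stated in the review, the tile produced by an AMX
// intrinsic takes its shape from the first two operands, here (m, n).
inline RowCol producedTileShape(const TdpbssdShape &S) { return {S.M, S.N}; }
```

Under this model the produced tile and the accumulator operand share the shape (m, n), which is why combineBitcastStore() can read Row and Col from operands 0 and 1 of whichever AMX intrinsic defined the tile.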

llvm/lib/Target/X86/X86RegisterInfo.td

	def VK32WM : RegisterClass<"X86", [v32i1], 32, (add VK16WM)> {let Size = 32;}			def VK32WM : RegisterClass<"X86", [v32i1], 32, (add VK16WM)> {let Size = 32;}
	def VK64WM : RegisterClass<"X86", [v64i1], 64, (add VK32WM)> {let Size = 64;}			def VK64WM : RegisterClass<"X86", [v64i1], 64, (add VK32WM)> {let Size = 64;}

	// Bound registers			// Bound registers
	def BNDR : RegisterClass<"X86", [v2i64], 128, (sequence "BND%u", 0, 3)>;			def BNDR : RegisterClass<"X86", [v2i64], 128, (sequence "BND%u", 0, 3)>;

	// Tiles			// Tiles
	let CopyCost = -1 in // Don't allow copying of tile registers			let CopyCost = -1 in // Don't allow copying of tile registers
	def TILE : RegisterClass<"X86", [v256i32], 8192,			def TILE : RegisterClass<"X86", [x86amx], 8192,
	(sequence "TMM%u", 0, 7)> {let Size = 8192;}			(sequence "TMM%u", 0, 7)> {let Size = 8192;}
	def TILECFG : RegisterClass<"X86", [untyped], 512, (add TMMCFG)> {			def TILECFG : RegisterClass<"X86", [untyped], 512, (add TMMCFG)> {
	let CopyCost = -1; // Don't allow copying of tile config registers.			let CopyCost = -1; // Don't allow copying of tile config registers.
	let isAllocatable = 1;			let isAllocatable = 1;
	let Size = 512;			let Size = 512;
	}			}
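
Editorial sketch, not part of the patch: the sizes that appear in this diff are related by simple arithmetic. The TILE register class is 8192 bits, <256 x i32> occupies the same number of bytes, and the pass uses a fixed stride of 64 bytes when it spills a tile through a <256 x i32> stack slot. All constant and function names below are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTileBits = 8192;            // Size of the TILE register class
constexpr std::size_t kTileBytes = kTileBits / 8;  // 1024 bytes
constexpr std::size_t kVecElems = 256;             // element count of <256 x i32>
constexpr std::size_t kStrideBytes = 64;           // Builder.getInt64(64) in the pass
constexpr std::size_t kMaxRows = kTileBytes / kStrideBytes;

// A <256 x i32> vector covers exactly one full tile.
static_assert(kVecElems * sizeof(std::int32_t) == kTileBytes,
              "<256 x i32> and a tile register have the same byte size");
// With a 64-byte stride, a 1024-byte buffer holds 16 rows.
static_assert(kMaxRows == 16, "16 rows of 64 bytes fill the buffer");

// Byte offset of (Row, ColByte) inside the 1024-byte buffer when rows are
// laid out with the fixed 64-byte stride.
constexpr std::size_t tileByteOffset(std::size_t Row, std::size_t ColByte) {
  return Row * kStrideBytes + ColByte;
}
```

The last addressable byte, row 15 and column byte 63, lands at offset 1023, so a tile of any legal shape fits in the stack slot as long as the load and the store use the same stride.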

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

static bool combineStoreToValueType(InstCombinerImpl &IC, StoreInst &SI) {
if (SI.getPointerOperand()->isSwiftError())		if (SI.getPointerOperand()->isSwiftError())
return false;		return false;

Value *V = SI.getValueOperand();		Value *V = SI.getValueOperand();

// Fold away bit casts of the stored value by storing the original type.		// Fold away bit casts of the stored value by storing the original type.
if (auto *BC = dyn_cast<BitCastInst>(V)) {		if (auto *BC = dyn_cast<BitCastInst>(V)) {
V = BC->getOperand(0);		V = BC->getOperand(0);
		// Don't transform when the type is x86_amx, it make the pass that lower
		// x86_amx type happy.
		if (BC->getType()->isX86_AMXTy() || V->getType()->isX86_AMXTy())
		return false;
if (!SI.isAtomic() || isSupportedAtomicType(V->getType())) {		if (!SI.isAtomic() || isSupportedAtomicType(V->getType())) {
combineStoreToNewValue(IC, SI, V);		combineStoreToNewValue(IC, SI, V);
return true;		return true;
}		}
}		}

if (Value *U = likeBitCastFromVector(IC, V))		if (Value *U = likeBitCastFromVector(IC, V))
if (!SI.isAtomic() || isSupportedAtomicType(U->getType())) {		if (!SI.isAtomic() || isSupportedAtomicType(U->getType())) {
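
Editorial sketch, not part of the patch: the guard added to combineStoreToValueType() can be read as a predicate over the two types of the bitcast that feeds the store. `TyKind` and `mayFoldBitcastIntoStore` are hypothetical names, and the enum is a deliberately simplified stand-in for LLVM's type queries.

```cpp
// Hypothetical, simplified model of the types involved; not LLVM's Type.
enum class TyKind { X86AMX, Vec256xI32, Other };

// Mirrors the guard in combineStoreToValueType(): the bitcast feeding a
// store is folded away only when neither its result type nor its source
// type is x86_amx.
constexpr bool mayFoldBitcastIntoStore(TyKind CastResult, TyKind CastSource) {
  return CastResult != TyKind::X86AMX && CastSource != TyKind::X86AMX;
}
```

Without such a guard, InstCombine can rewrite `store <256 x i32>` of a bitcast tile into a `store x86_amx` through a pointer bitcast, as the IR dumps quoted in the X86LowerAMXType.cpp review thread show, and the lowering pass no longer sees the bitcast-plus-store pattern it matches.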

llvm/test/CodeGen/X86/AMX/amx-across-func.ll

	; CHECK-NEXT: popq %r14			; CHECK-NEXT: popq %r14
	; CHECK-NEXT: .cfi_def_cfa_offset 24			; CHECK-NEXT: .cfi_def_cfa_offset 24
	; CHECK-NEXT: popq %r15			; CHECK-NEXT: popq %r15
	; CHECK-NEXT: .cfi_def_cfa_offset 16			; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: popq %rbp			; CHECK-NEXT: popq %rbp
	; CHECK-NEXT: .cfi_def_cfa_offset 8			; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: tilerelease			; CHECK-NEXT: tilerelease
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%3 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %0, i16 8, i8* getelementptr inbounds ([3072 x i8], [3072 x i8]* @buf, i64 0, i64 0), i64 32) #4			%3 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 8, i8* getelementptr inbounds ([3072 x i8], [3072 x i8]* @buf, i64 0, i64 0), i64 32) #4
	%4 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 8, i16 %1, i8* getelementptr inbounds ([3072 x i8], [3072 x i8]* @buf, i64 0, i64 1024), i64 32) #4			%4 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 8, i16 %1, i8* getelementptr inbounds ([3072 x i8], [3072 x i8]* @buf, i64 0, i64 1024), i64 32) #4
	tail call void (...) @foo() #4			tail call void (...) @foo() #4
	%5 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([3072 x i8], [3072 x i8]* @buf, i64 0, i64 2048), i64 32) #4			%5 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([3072 x i8], [3072 x i8]* @buf, i64 0, i64 2048), i64 32) #4
	%6 = tail call <256 x i32> @llvm.x86.tdpbssd.internal(i16 %0, i16 %1, i16 8, <256 x i32> %5, <256 x i32> %3, <256 x i32> %4) #4			%6 = tail call x86_amx @llvm.x86.tdpbssd.internal(i16 %0, i16 %1, i16 8, x86_amx %5, x86_amx %3, x86_amx %4) #4
	tail call void @llvm.x86.tilestored64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([3072 x i8], [3072 x i8]* @buf, i64 0, i64 2048), i64 32, <256 x i32> %6) #4			tail call void @llvm.x86.tilestored64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([3072 x i8], [3072 x i8]* @buf, i64 0, i64 2048), i64 32, x86_amx %6) #4
	ret void			ret void
	}			}

	declare dso_local void @foo(...) local_unnamed_addr #3			declare dso_local void @foo(...) local_unnamed_addr #3

	declare <256 x i32> @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64) #4			declare x86_amx @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64) #4
	declare <256 x i32> @llvm.x86.tdpbssd.internal(i16, i16, i16, <256 x i32>, <256 x i32>, <256 x i32>) #4			declare x86_amx @llvm.x86.tdpbssd.internal(i16, i16, i16, x86_amx, x86_amx, x86_amx) #4
	declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, <256 x i32>) #4			declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx) #4

	attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #4 = { nounwind }			attributes #4 = { nounwind }
				pengfei: Better to remove these unused attributes. The same applies to the other tests.
				LuoYuanke: I'll create a separate patch to clean up the attributes.

llvm/test/CodeGen/X86/AMX/amx-config.ll

	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%4 = icmp eq i32 %0, 0			%4 = icmp eq i32 %0, 0
	%5 = shl i16 %1, 8			%5 = shl i16 %1, 8
	%6 = ashr exact i16 %5, 8			%6 = ashr exact i16 %5, 8
	br i1 %4, label %11, label %7			br i1 %4, label %11, label %7

	7: ; preds = %3			7: ; preds = %3
	%8 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %6, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%8 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %6, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%9 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%9 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%10 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%10 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	br label %15			br label %15

	11: ; preds = %3			11: ; preds = %3
	%12 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %6, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3			%12 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %6, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3
	%13 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3			%13 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3
	%14 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3			%14 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3
	br label %15			br label %15

	15: ; preds = %11, %7			15: ; preds = %11, %7
	%16 = phi <256 x i32> [ %12, %11 ], [ %8, %7 ]			%16 = phi x86_amx [ %12, %11 ], [ %8, %7 ]
	%17 = phi <256 x i32> [ %13, %11 ], [ %9, %7 ]			%17 = phi x86_amx [ %13, %11 ], [ %9, %7 ]
	%18 = phi <256 x i32> [ %14, %11 ], [ %10, %7 ]			%18 = phi x86_amx [ %14, %11 ], [ %10, %7 ]
	%19 = tail call <256 x i32> @llvm.x86.tdpbssd.internal(i16 %6, i16 %2, i16 %1, <256 x i32> %18, <256 x i32> %16, <256 x i32> %17) #3			%19 = tail call x86_amx @llvm.x86.tdpbssd.internal(i16 %6, i16 %2, i16 %1, x86_amx %18, x86_amx %16, x86_amx %17) #3
	tail call void @llvm.x86.tilestored64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32, <256 x i32> %19) #3			tail call void @llvm.x86.tilestored64.internal(i16 %6, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32, x86_amx %19) #3
	ret void			ret void
	}			}

	declare <256 x i32> @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64) #3			declare x86_amx @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64) #3

	declare <256 x i32> @llvm.x86.tdpbssd.internal(i16, i16, i16, <256 x i32>, <256 x i32>, <256 x i32>) #3			declare x86_amx @llvm.x86.tdpbssd.internal(i16, i16, i16, x86_amx, x86_amx, x86_amx) #3

	declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, <256 x i32>) #3			declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx) #3

	attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+avx,+avx2,+avx512f,+cx8,+f16c,+fma,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+avx,+avx2,+avx512f,+cx8,+f16c,+fma,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #3 = { nounwind }			attributes #3 = { nounwind }

llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll

	; CHECK-NEXT: tdpbssd %tmm4, %tmm0, %tmm3			; CHECK-NEXT: tdpbssd %tmm4, %tmm0, %tmm3
	; CHECK-NEXT: tilestored %tmm3, (%rdx,%r8)			; CHECK-NEXT: tilestored %tmm3, (%rdx,%r8)
	; CHECK-NEXT: tdpbssd %tmm4, %tmm1, %tmm2			; CHECK-NEXT: tdpbssd %tmm4, %tmm1, %tmm2
	; CHECK-NEXT: tilestored %tmm2, (%rdi,%r8)			; CHECK-NEXT: tilestored %tmm2, (%rdi,%r8)
	; CHECK-NEXT: tilerelease			; CHECK-NEXT: tilerelease
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	entry:			entry:
	%a1 = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %A_mem, i64 64)			%a1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %A_mem, i64 64)
	%addr = getelementptr inbounds i8, i8* %A_mem, i64 1024			%addr = getelementptr inbounds i8, i8* %A_mem, i64 1024
	%a2 = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %addr, i64 64)			%a2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %addr, i64 64)
	%c1 = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %C_mem, i64 64)			%c1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %C_mem, i64 64)
	%caddr = getelementptr inbounds i8, i8* %C_mem, i64 1024			%caddr = getelementptr inbounds i8, i8* %C_mem, i64 1024
	%c2 = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %caddr, i64 64)			%c2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %caddr, i64 64)
	br label %dotpd			br label %dotpd

	dotpd:			dotpd:
	%b = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %B_mem, i64 64)			%b = call x86_amx @llvm.x86.tileloadd64.internal(i16 16, i16 64, i8* nonnull %B_mem, i64 64)
	%dp1 = call <256 x i32> @llvm.x86.tdpbssd.internal(i16 16, i16 64, i16 64, <256 x i32> %c1, <256 x i32> %a1, <256 x i32> %b)			%dp1 = call x86_amx @llvm.x86.tdpbssd.internal(i16 16, i16 64, i16 64, x86_amx %c1, x86_amx %a1, x86_amx %b)
	call void @llvm.x86.tilestored64.internal(i16 16, i16 64, i8* nonnull %C_mem, i64 64, <256 x i32> %dp1)			call void @llvm.x86.tilestored64.internal(i16 16, i16 64, i8* nonnull %C_mem, i64 64, x86_amx %dp1)
	%dp2 = call <256 x i32> @llvm.x86.tdpbssd.internal(i16 16, i16 64, i16 64, <256 x i32> %c2, <256 x i32> %a2, <256 x i32> %b)			%dp2 = call x86_amx @llvm.x86.tdpbssd.internal(i16 16, i16 64, i16 64, x86_amx %c2, x86_amx %a2, x86_amx %b)
	call void @llvm.x86.tilestored64.internal(i16 16, i16 64, i8* nonnull %caddr, i64 64, <256 x i32> %dp2)			call void @llvm.x86.tilestored64.internal(i16 16, i16 64, i8* nonnull %caddr, i64 64, x86_amx %dp2)
	ret void			ret void
	}			}

	declare <256 x i32> @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64)			declare x86_amx @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64)
	declare <256 x i32> @llvm.x86.tdpbssd.internal(i16, i16, i16, <256 x i32>, <256 x i32>, <256 x i32>)			declare x86_amx @llvm.x86.tdpbssd.internal(i16, i16, i16, x86_amx, x86_amx, x86_amx)
	declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, <256 x i32>)			declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx)

llvm/test/CodeGen/X86/AMX/amx-spill.ll

	; CHECK-NEXT: movl $buf, %eax			; CHECK-NEXT: movl $buf, %eax
	; CHECK-NEXT: movl $32, %ecx			; CHECK-NEXT: movl $32, %ecx
	; CHECK-NEXT: tilestored %tmm0, (%rax,%rcx)			; CHECK-NEXT: tilestored %tmm0, (%rax,%rcx)
	; CHECK-NEXT: addq $2936, %rsp # imm = 0xB78			; CHECK-NEXT: addq $2936, %rsp # imm = 0xB78
	; CHECK-NEXT: .cfi_def_cfa_offset 8			; CHECK-NEXT: .cfi_def_cfa_offset 8
	; CHECK-NEXT: tilerelease			; CHECK-NEXT: tilerelease
	; CHECK-NEXT: vzeroupper			; CHECK-NEXT: vzeroupper
	; CHECK-NEXT: retq			; CHECK-NEXT: retq
	%4 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%4 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%5 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%5 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%6 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%6 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%7 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%7 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%8 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%8 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%9 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%9 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%10 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%10 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%11 = icmp eq i32 %0, 0			%11 = icmp eq i32 %0, 0
	br i1 %11, label %16, label %12			br i1 %11, label %16, label %12

	12: ; preds = %3			12: ; preds = %3
	%13 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%13 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%14 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%14 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	%15 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3			%15 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32) #3
	br label %20			br label %20

	16: ; preds = %3			16: ; preds = %3
	%17 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3			%17 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3
	%18 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3			%18 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3
	%19 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3			%19 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %1, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf2, i64 0, i64 0), i64 32) #3
	br label %20			br label %20

	20: ; preds = %16, %12			20: ; preds = %16, %12
	%21 = phi <256 x i32> [ %17, %16 ], [ %13, %12 ]			%21 = phi x86_amx [ %17, %16 ], [ %13, %12 ]
	%22 = phi <256 x i32> [ %18, %16 ], [ %14, %12 ]			%22 = phi x86_amx [ %18, %16 ], [ %14, %12 ]
	%23 = phi <256 x i32> [ %19, %16 ], [ %15, %12 ]			%23 = phi x86_amx [ %19, %16 ], [ %15, %12 ]
	%24 = tail call <256 x i32> @llvm.x86.tdpbssd.internal(i16 %1, i16 %2, i16 %1, <256 x i32> %23, <256 x i32> %21, <256 x i32> %22) #3			%24 = tail call x86_amx @llvm.x86.tdpbssd.internal(i16 %1, i16 %2, i16 %1, x86_amx %23, x86_amx %21, x86_amx %22) #3
	%25 = tail call <256 x i32> @llvm.x86.tdpbssd.internal(i16 %1, i16 %2, i16 %2, <256 x i32> %6, <256 x i32> %24, <256 x i32> %5) #3			%25 = tail call x86_amx @llvm.x86.tdpbssd.internal(i16 %1, i16 %2, i16 %2, x86_amx %6, x86_amx %24, x86_amx %5) #3
	%26 = tail call <256 x i32> @llvm.x86.tdpbssd.internal(i16 %1, i16 %2, i16 %2, <256 x i32> %8, <256 x i32> %25, <256 x i32> %7) #3			%26 = tail call x86_amx @llvm.x86.tdpbssd.internal(i16 %1, i16 %2, i16 %2, x86_amx %8, x86_amx %25, x86_amx %7) #3
	%27 = tail call <256 x i32> @llvm.x86.tdpbssd.internal(i16 %2, i16 %2, i16 %2, <256 x i32> %10, <256 x i32> %26, <256 x i32> %9) #3			%27 = tail call x86_amx @llvm.x86.tdpbssd.internal(i16 %2, i16 %2, i16 %2, x86_amx %10, x86_amx %26, x86_amx %9) #3
	tail call void @llvm.x86.tilestored64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32, <256 x i32> %27) #3			tail call void @llvm.x86.tilestored64.internal(i16 %2, i16 %2, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 32, x86_amx %27) #3
	ret void			ret void
	}			}

	declare <256 x i32> @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64) #3			declare x86_amx @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64) #3
	declare <256 x i32> @llvm.x86.tdpbssd.internal(i16, i16, i16, <256 x i32>, <256 x i32>, <256 x i32>) #3			declare x86_amx @llvm.x86.tdpbssd.internal(i16, i16, i16, x86_amx, x86_amx, x86_amx) #3
	declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, <256 x i32>) #3			declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx) #3

	attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #2 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #3 = { nounwind }			attributes #3 = { nounwind }

llvm/test/CodeGen/X86/AMX/amx-type.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -lower-amx-type %s -S | FileCheck %s			; RUN: opt -lower-amx-type %s -S | FileCheck %s
	target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"			target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
	target triple = "x86_64-unknown-linux-gnu"			target triple = "x86_64-unknown-linux-gnu"

	%struct.__tile_str = type { i16, i16, <256 x i32> }			%struct.__tile_str = type { i16, i16, <256 x i32> }

	@buf = dso_local global [1024 x i8] zeroinitializer, align 16			@buf = dso_local global [1024 x i8] zeroinitializer, align 16
	@buf2 = dso_local global [1024 x i8] zeroinitializer, align 16			@buf2 = dso_local global [1024 x i8] zeroinitializer, align 16

				; test bitcast x86_amx to <256 x i32>
				define dso_local void @test_user_empty(i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
				; CHECK-LABEL: @test_user_empty(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3:#.*]]
				; CHECK-NEXT: ret void
				;
				entry:
				%t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %n, i8* %buf, i64 %s) #3
				%t2 = bitcast x86_amx %t1 to <256 x i32>
				ret void
				}

				; test bitcast <256 x i32> to x86_amx
				define dso_local void @test_user_empty2(<256 x i32> %in) #2 {
				; CHECK-LABEL: @test_user_empty2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: ret void
				;
				entry:
				%t = bitcast <256 x i32> %in to x86_amx
				ret void
				}

				define dso_local <256 x i32> @test_amx_load_bitcast(<256 x i32>* %in, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
				; CHECK-LABEL: @test_amx_load_bitcast(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[T1:%.*]] = load <256 x i32>, <256 x i32>* [[IN:%.*]], align 64
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast <256 x i32>* [[IN]] to i8*
				; CHECK-NEXT: [[TMP1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[N:%.*]], i8* [[TMP0]], i64 64)
				; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[N]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP1]]) [[ATTR3]]
				; CHECK-NEXT: ret <256 x i32> [[T1]]
				;
				entry:
				%t1 = load <256 x i32>, <256 x i32>* %in, align 64
				%t2 = bitcast <256 x i32> %t1 to x86_amx
				call void @llvm.x86.tilestored64.internal(i16 %m, i16 %n, i8* %buf, i64 %s, x86_amx %t2) #3
				ret <256 x i32> %t1
				}

				define dso_local <256 x i32> @test_amx_bitcast_store(<256 x i32>* %out, i16 %m, i16 %n, i8 *%buf, i64 %s) #2 {
				; CHECK-LABEL: @test_amx_bitcast_store(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[M:%.*]], i16 [[M]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3]]
				; CHECK-NEXT: [[TMP0:%.*]] = bitcast <256 x i32>* [[OUT:%.*]] to i8*
				; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[M]], i16 [[M]], i8* [[TMP0]], i64 64, x86_amx [[T1]])
				; CHECK-NEXT: [[TMP1:%.*]] = load <256 x i32>, <256 x i32>* [[OUT]], align 1024
				; CHECK-NEXT: ret <256 x i32> [[TMP1]]
				;
				entry:
				%t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %m, i16 %m, i8* %buf, i64 %s) #3
				%t2 = bitcast x86_amx %t1 to <256 x i32>
				store <256 x i32> %t2, <256 x i32>* %out
				ret <256 x i32> %t2
				}

				define dso_local void @test_src_add(<256 x i32> %x, <256 x i32> %y, i16 %r, i16 %c, i8* %buf, i64 %s) #2 {
				pengfei: For this and the next test, we have a chance to optimize to memcpy if we can make sure %s is constant 64.
				LuoYuanke: If the stride is 64 we can transform the code to memcpy. How about doing it in another patch?
				; CHECK-LABEL: @test_src_add(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = alloca <256 x i32>, align 64
				; CHECK-NEXT: [[ADD:%.*]] = add <256 x i32> [[Y:%.*]], [[X:%.*]]
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
				; CHECK-NEXT: store <256 x i32> [[ADD]], <256 x i32>* [[TMP0]], align 1024
				; CHECK-NEXT: [[TMP2:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[TMP1]], i64 64)
				; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[BUF:%.*]], i64 [[S:%.*]], x86_amx [[TMP2]]) [[ATTR3]]
				; CHECK-NEXT: ret void
				;
				entry:
				%add = add <256 x i32> %y, %x
				%t = bitcast <256 x i32> %add to x86_amx
				call void @llvm.x86.tilestored64.internal(i16 %r, i16 %c, i8* %buf, i64 %s, x86_amx %t) #3
				ret void
				}

				define dso_local void @test_src_add2(<256 x i32> %x, i16 %r, i16 %c, i8* %buf, i64 %s) #2 {
				; CHECK-LABEL: @test_src_add2(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[TMP0:%.*]] = alloca <256 x i32>, align 64
				; CHECK-NEXT: [[T1:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[R:%.*]], i16 [[C:%.*]], i8* [[BUF:%.*]], i64 [[S:%.*]]) [[ATTR3]]
				; CHECK-NEXT: [[TMP1:%.*]] = bitcast <256 x i32>* [[TMP0]] to i8*
				; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[R]], i16 [[C]], i8* [[TMP1]], i64 64, x86_amx [[T1]])
				; CHECK-NEXT: [[TMP2:%.*]] = load <256 x i32>, <256 x i32>* [[TMP0]], align 1024
				; CHECK-NEXT: [[ADD:%.*]] = add <256 x i32> [[TMP2]], [[X:%.*]]
				; CHECK-NEXT: ret void
				;
				entry:
				%t1 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %r, i16 %c, i8* %buf, i64 %s) #3
				%t2 = bitcast x86_amx %t1 to <256 x i32>
				%add = add <256 x i32> %t2, %x
				ret void
				}
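
The memcpy idea raised in the review comments on test_src_add can be illustrated with a small model. This is only a sketch under stated assumptions: `tile_store` and `flat_copy` are hypothetical helpers written for this note, not LLVM or hardware code, and the full 16-row by 64-byte tile shape is assumed. The model writes row r of a tile at base + r * stride, and shows that when the stride equals the row width in bytes the rows land back to back, so the result is byte-for-byte a flat copy.

```python
TILE_ROW_BYTES = 64  # assumed row pitch of the in-register tile image


def tile_store(mem: bytearray, base: int, stride: int,
               rows: int, colsb: int, tile: bytes) -> None:
    """Model of a strided tile store: row r is written at base + r * stride."""
    for r in range(rows):
        src = r * TILE_ROW_BYTES
        dst = base + r * stride
        mem[dst:dst + colsb] = tile[src:src + colsb]


def flat_copy(mem: bytearray, base: int, tile: bytes, nbytes: int) -> None:
    """Model of a memcpy of nbytes from the tile image into memory."""
    mem[base:base + nbytes] = tile[:nbytes]


# A recognisable, mostly non-zero byte pattern for a full 16 x 64 tile.
tile = bytes(i % 251 for i in range(16 * TILE_ROW_BYTES))

strided = bytearray(2048)
tile_store(strided, 0, stride=64, rows=16, colsb=64, tile=tile)

copied = bytearray(2048)
flat_copy(copied, 0, tile, 16 * 64)

# Stride == row width: the strided store and the flat copy agree.
same_when_stride_64 = strided == copied

# Stride != row width: rows are spread out, so a flat copy is not equivalent.
wide = bytearray(2048)
tile_store(wide, 0, stride=128, rows=16, colsb=64, tile=tile)
same_when_stride_128 = wide == copied
```

In this model the equivalence holds only for the contiguous case (stride equal to the row width); with a narrower row or a different stride the two layouts diverge, which is consistent with the reviewers limiting the suggestion to a constant stride of 64.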

	define dso_local void @test_load(i8* %in, i8* %out) local_unnamed_addr #2 {			define dso_local void @test_load(i8* %in, i8* %out) local_unnamed_addr #2 {
				pengfei: We don't need to check this case now, right?
				LuoYuanke: It checks that load and store instructions are not transformed if they do not participate in an AMX operation. I prefer to keep the case.
	; CHECK-LABEL: @test_load(			; CHECK-LABEL: @test_load(
	; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[IN:%.*]] to <256 x i32>*			; CHECK-NEXT: [[TMP1:%.*]] = bitcast i8* [[IN:%.*]] to <256 x i32>*
	; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[OUT:%.*]] to <256 x i32>*			; CHECK-NEXT: [[TMP2:%.*]] = bitcast i8* [[OUT:%.*]] to <256 x i32>*
	; CHECK-NEXT: [[TMP3:%.*]] = bitcast <256 x i32>* [[TMP1]] to <128 x i32>*			; CHECK-NEXT: [[TMP3:%.*]] = load <256 x i32>, <256 x i32>* [[TMP1]], align 64, [[TBAA2:!tbaa !.*]]
	; CHECK-NEXT: [[TMP4:%.*]] = load <128 x i32>, <128 x i32>* [[TMP3]], align 64			; CHECK-NEXT: store <256 x i32> [[TMP3]], <256 x i32>* [[TMP2]], align 64, [[TBAA2]]
	; CHECK-NEXT: [[TMP5:%.*]] = getelementptr <128 x i32>, <128 x i32>* [[TMP3]], i32 1
	; CHECK-NEXT: [[TMP6:%.*]] = load <128 x i32>, <128 x i32>* [[TMP5]], align 64
	; CHECK-NEXT: [[TMP7:%.*]] = bitcast <256 x i32>* [[TMP2]] to <128 x i32>*
	; CHECK-NEXT: store <128 x i32> [[TMP4]], <128 x i32>* [[TMP7]], align 64
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr <128 x i32>, <128 x i32>* [[TMP7]], i32 1
	; CHECK-NEXT: store <128 x i32> [[TMP6]], <128 x i32>* [[TMP8]], align 64
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%1 = bitcast i8* %in to <256 x i32>*			%1 = bitcast i8* %in to <256 x i32>*
	%2 = bitcast i8* %out to <256 x i32>*			%2 = bitcast i8* %out to <256 x i32>*
	%3 = load <256 x i32>, <256 x i32>* %1, align 64, !tbaa !8			%3 = load <256 x i32>, <256 x i32>* %1, align 64, !tbaa !8
	store <256 x i32> %3, <256 x i32>* %2, align 64, !tbaa !8			store <256 x i32> %3, <256 x i32>* %2, align 64, !tbaa !8
	ret void			ret void
	}			}

				define dso_local <256 x i32> @foo(<256 x i32>* nocapture readonly byval(<256 x i32>) align 1024 %0, <256 x i32>* nocapture readonly byval(<256 x i32>) align 1024 %1) local_unnamed_addr #0 {
				; CHECK-LABEL: @foo(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[X:%.*]] = load <256 x i32>, <256 x i32>* [[TMP0:%.*]], align 1024, [[TBAA5:!tbaa !.*]]
				; CHECK-NEXT: [[Y:%.*]] = load <256 x i32>, <256 x i32>* [[TMP1:%.*]], align 1024, [[TBAA5]]
				; CHECK-NEXT: [[ADD:%.*]] = add <256 x i32> [[Y]], [[X]]
				; CHECK-NEXT: ret <256 x i32> [[ADD]]
				;
				entry:
				%x = load <256 x i32>, <256 x i32>* %0, align 1024, !tbaa !2
				%y = load <256 x i32>, <256 x i32>* %1, align 1024, !tbaa !2
				%add = add <256 x i32> %y, %x
				ret <256 x i32> %add
				}

	define dso_local void @__tile_loadd(%struct.__tile_str* nocapture %0, i8* %1, i64 %2) local_unnamed_addr #0 {			define dso_local void @__tile_loadd(%struct.__tile_str* nocapture %0, i8* %1, i64 %2) local_unnamed_addr #0 {
	; CHECK-LABEL: @__tile_loadd(			; CHECK-LABEL: @__tile_loadd(
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR:%.*]], %struct.__tile_str* [[TMP0:%.*]], i64 0, i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR:%.*]], %struct.__tile_str* [[TMP0:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP4]], align 64, [[TBAA2:!tbaa !.*]]			; CHECK-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP4]], align 64, [[TBAA5]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP0]], i64 0, i32 1			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP0]], i64 0, i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = load i16, i16* [[TMP6]], align 2, [[TBAA7:!tbaa !.*]]			; CHECK-NEXT: [[TMP7:%.*]] = load i16, i16* [[TMP6]], align 2, [[TBAA8:!tbaa !.*]]
	; CHECK-NEXT: [[TMP8:%.*]] = shl i64 [[TMP2:%.*]], 32			; CHECK-NEXT: [[TMP8:%.*]] = shl i64 [[TMP2:%.*]], 32
	; CHECK-NEXT: [[TMP9:%.*]] = ashr exact i64 [[TMP8]], 32			; CHECK-NEXT: [[TMP9:%.*]] = ashr exact i64 [[TMP8]], 32
	; CHECK-NEXT: [[TMP10:%.*]] = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP1:%.*]], i64 [[TMP9]]) [[ATTR3:#.*]]			; CHECK-NEXT: [[TMP10:%.*]] = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP1:%.*]], i64 [[TMP9]]) [[ATTR3]]
	; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP0]], i64 0, i32 2			; CHECK-NEXT: [[TMP11:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP0]], i64 0, i32 2
	; CHECK-NEXT: [[TMP12:%.*]] = bitcast <256 x i32>* [[TMP11]] to i8*			; CHECK-NEXT: [[TMP12:%.*]] = bitcast <256 x i32>* [[TMP11]] to i8*
	; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP12]], i64 64, <256 x i32> [[TMP10]])			; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP12]], i64 64, x86_amx [[TMP10]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%4 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %0, i64 0, i32 0			%4 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %0, i64 0, i32 0
	%5 = load i16, i16* %4, align 64, !tbaa !2			%5 = load i16, i16* %4, align 64, !tbaa !2
	%6 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %0, i64 0, i32 1			%6 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %0, i64 0, i32 1
	%7 = load i16, i16* %6, align 2, !tbaa !7			%7 = load i16, i16* %6, align 2, !tbaa !7
	%8 = shl i64 %2, 32			%8 = shl i64 %2, 32
	%9 = ashr exact i64 %8, 32			%9 = ashr exact i64 %8, 32
	%10 = tail call <256 x i32> @llvm.x86.tileloadd64.internal(i16 %5, i16 %7, i8* %1, i64 %9) #3			%10 = tail call x86_amx @llvm.x86.tileloadd64.internal(i16 %5, i16 %7, i8* %1, i64 %9) #3
	%11 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %0, i64 0, i32 2			%11 = bitcast x86_amx %10 to <256 x i32>
	store <256 x i32> %10, <256 x i32>* %11, align 64, !tbaa !8			%12 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %0, i64 0, i32 2
				store <256 x i32> %11, <256 x i32>* %12, align 64, !tbaa !8
	ret void			ret void
	}			}

	define dso_local void @__tile_dpbsud(%struct.__tile_str* nocapture %0, %struct.__tile_str* nocapture readonly byval(%struct.__tile_str) align 64 %1, %struct.__tile_str* nocapture readonly byval(%struct.__tile_str) align 64 %2) local_unnamed_addr #0 {			define dso_local void @__tile_dpbsud(%struct.__tile_str* nocapture %0, %struct.__tile_str* nocapture readonly byval(%struct.__tile_str) align 64 %1, %struct.__tile_str* nocapture readonly byval(%struct.__tile_str) align 64 %2) local_unnamed_addr #0 {
	; CHECK-LABEL: @__tile_dpbsud(			; CHECK-LABEL: @__tile_dpbsud(
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR:%.*]], %struct.__tile_str* [[TMP1:%.*]], i64 0, i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR:%.*]], %struct.__tile_str* [[TMP1:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP4]], align 64, [[TBAA2]]			; CHECK-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP4]], align 64, [[TBAA5]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP2:%.*]], i64 0, i32 1			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP2:%.*]], i64 0, i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = load i16, i16* [[TMP6]], align 2, [[TBAA7]]			; CHECK-NEXT: [[TMP7:%.*]] = load i16, i16* [[TMP6]], align 2, [[TBAA8]]
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP1]], i64 0, i32 1			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP1]], i64 0, i32 1
	; CHECK-NEXT: [[TMP9:%.*]] = load i16, i16* [[TMP8]], align 2, [[TBAA7]]			; CHECK-NEXT: [[TMP9:%.*]] = load i16, i16* [[TMP8]], align 2, [[TBAA8]]
	; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP0:%.*]], i64 0, i32 2			; CHECK-NEXT: [[TMP10:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP0:%.*]], i64 0, i32 2
	; CHECK-NEXT: [[TMP11:%.*]] = bitcast <256 x i32>* [[TMP10]] to i8*			; CHECK-NEXT: [[TMP11:%.*]] = bitcast <256 x i32>* [[TMP10]] to i8*
	; CHECK-NEXT: [[TMP12:%.*]] = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP11]], i64 64)			; CHECK-NEXT: [[TMP12:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP11]], i64 64)
	; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP1]], i64 0, i32 2			; CHECK-NEXT: [[TMP13:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP1]], i64 0, i32 2
	; CHECK-NEXT: [[TMP14:%.*]] = bitcast <256 x i32>* [[TMP13]] to i8*			; CHECK-NEXT: [[TMP14:%.*]] = bitcast <256 x i32>* [[TMP13]] to i8*
	; CHECK-NEXT: [[TMP15:%.*]] = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 [[TMP5]], i16 [[TMP9]], i8* [[TMP14]], i64 64)			; CHECK-NEXT: [[TMP15:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[TMP5]], i16 [[TMP9]], i8* [[TMP14]], i64 64)
	; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP2]], i64 0, i32 2			; CHECK-NEXT: [[TMP16:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP2]], i64 0, i32 2
	; CHECK-NEXT: [[TMP17:%.*]] = bitcast <256 x i32>* [[TMP16]] to i8*			; CHECK-NEXT: [[TMP17:%.*]] = bitcast <256 x i32>* [[TMP16]] to i8*
	; CHECK-NEXT: [[TMP18:%.*]] = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 [[TMP9]], i16 [[TMP7]], i8* [[TMP17]], i64 64)			; CHECK-NEXT: [[TMP18:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[TMP9]], i16 [[TMP7]], i8* [[TMP17]], i64 64)
	; CHECK-NEXT: [[TMP19:%.*]] = tail call <256 x i32> @llvm.x86.tdpbssd.internal(i16 [[TMP5]], i16 [[TMP7]], i16 [[TMP9]], <256 x i32> [[TMP12]], <256 x i32> [[TMP15]], <256 x i32> [[TMP18]]) [[ATTR3]]			; CHECK-NEXT: [[TMP19:%.*]] = tail call x86_amx @llvm.x86.tdpbssd.internal(i16 [[TMP5]], i16 [[TMP7]], i16 [[TMP9]], x86_amx [[TMP12]], x86_amx [[TMP15]], x86_amx [[TMP18]]) [[ATTR3]]
	; CHECK-NEXT: [[TMP20:%.*]] = bitcast <256 x i32>* [[TMP10]] to i8*			; CHECK-NEXT: [[TMP20:%.*]] = bitcast <256 x i32>* [[TMP10]] to i8*
	; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP20]], i64 64, <256 x i32> [[TMP19]])			; CHECK-NEXT: call void @llvm.x86.tilestored64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP20]], i64 64, x86_amx [[TMP19]])
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%4 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %1, i64 0, i32 0			%4 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %1, i64 0, i32 0
	%5 = load i16, i16* %4, align 64, !tbaa !2			%5 = load i16, i16* %4, align 64, !tbaa !2
	%6 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 1			%6 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 1
	%7 = load i16, i16* %6, align 2, !tbaa !7			%7 = load i16, i16* %6, align 2, !tbaa !7
	%8 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %1, i64 0, i32 1			%8 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %1, i64 0, i32 1
	%9 = load i16, i16* %8, align 2, !tbaa !7			%9 = load i16, i16* %8, align 2, !tbaa !7
	%10 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %0, i64 0, i32 2			%10 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %0, i64 0, i32 2
	%11 = load <256 x i32>, <256 x i32>* %10, align 64, !tbaa !8			%11 = load <256 x i32>, <256 x i32>* %10, align 64, !tbaa !8
	%12 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %1, i64 0, i32 2			%12 = bitcast <256 x i32> %11 to x86_amx
	%13 = load <256 x i32>, <256 x i32>* %12, align 64, !tbaa !8			%13 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %1, i64 0, i32 2
	%14 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 2			%14 = load <256 x i32>, <256 x i32>* %13, align 64, !tbaa !8
	%15 = load <256 x i32>, <256 x i32>* %14, align 64, !tbaa !8			%15 = bitcast <256 x i32> %14 to x86_amx
	%16 = tail call <256 x i32> @llvm.x86.tdpbssd.internal(i16 %5, i16 %7, i16 %9, <256 x i32> %11, <256 x i32> %13, <256 x i32> %15) #3			%16 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 2
	store <256 x i32> %16, <256 x i32>* %10, align 64, !tbaa !8			%17 = load <256 x i32>, <256 x i32>* %16, align 64, !tbaa !8
				%18 = bitcast <256 x i32> %17 to x86_amx
				%19 = tail call x86_amx @llvm.x86.tdpbssd.internal(i16 %5, i16 %7, i16 %9, x86_amx %12, x86_amx %15, x86_amx %18) #3
				%20 = bitcast x86_amx %19 to <256 x i32>
				store <256 x i32> %20, <256 x i32>* %10, align 64, !tbaa !8
	ret void			ret void
	}			}

	define dso_local void @__tile_stored(i8* %0, i64 %1, %struct.__tile_str* nocapture readonly byval(%struct.__tile_str) align 64 %2) local_unnamed_addr #1 {			define dso_local void @__tile_stored(i8* %0, i64 %1, %struct.__tile_str* nocapture readonly byval(%struct.__tile_str) align 64 %2) local_unnamed_addr #1 {
	; CHECK-LABEL: @__tile_stored(			; CHECK-LABEL: @__tile_stored(
	; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR:%.*]], %struct.__tile_str* [[TMP2:%.*]], i64 0, i32 0			; CHECK-NEXT: [[TMP4:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR:%.*]], %struct.__tile_str* [[TMP2:%.*]], i64 0, i32 0
	; CHECK-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP4]], align 64, [[TBAA2]]			; CHECK-NEXT: [[TMP5:%.*]] = load i16, i16* [[TMP4]], align 64, [[TBAA5]]
	; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP2]], i64 0, i32 1			; CHECK-NEXT: [[TMP6:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP2]], i64 0, i32 1
	; CHECK-NEXT: [[TMP7:%.*]] = load i16, i16* [[TMP6]], align 2, [[TBAA7]]			; CHECK-NEXT: [[TMP7:%.*]] = load i16, i16* [[TMP6]], align 2, [[TBAA8]]
	; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP2]], i64 0, i32 2			; CHECK-NEXT: [[TMP8:%.*]] = getelementptr inbounds [[STRUCT___TILE_STR]], %struct.__tile_str* [[TMP2]], i64 0, i32 2
	; CHECK-NEXT: [[TMP9:%.*]] = bitcast <256 x i32>* [[TMP8]] to i8*			; CHECK-NEXT: [[TMP9:%.*]] = bitcast <256 x i32>* [[TMP8]] to i8*
	; CHECK-NEXT: [[TMP10:%.*]] = call <256 x i32> @llvm.x86.tileloadd64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP9]], i64 64)			; CHECK-NEXT: [[TMP10:%.*]] = call x86_amx @llvm.x86.tileloadd64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP9]], i64 64)
	; CHECK-NEXT: [[TMP11:%.*]] = shl i64 [[TMP1:%.*]], 32			; CHECK-NEXT: [[TMP11:%.*]] = shl i64 [[TMP1:%.*]], 32
	; CHECK-NEXT: [[TMP12:%.*]] = ashr exact i64 [[TMP11]], 32			; CHECK-NEXT: [[TMP12:%.*]] = ashr exact i64 [[TMP11]], 32
	; CHECK-NEXT: tail call void @llvm.x86.tilestored64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP0:%.*]], i64 [[TMP12]], <256 x i32> [[TMP10]]) [[ATTR3]]			; CHECK-NEXT: tail call void @llvm.x86.tilestored64.internal(i16 [[TMP5]], i16 [[TMP7]], i8* [[TMP0:%.*]], i64 [[TMP12]], x86_amx [[TMP10]]) [[ATTR3]]
	; CHECK-NEXT: ret void			; CHECK-NEXT: ret void
	;			;
	%4 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 0			%4 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 0
	%5 = load i16, i16* %4, align 64, !tbaa !2			%5 = load i16, i16* %4, align 64, !tbaa !2
	%6 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 1			%6 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 1
	%7 = load i16, i16* %6, align 2, !tbaa !7			%7 = load i16, i16* %6, align 2, !tbaa !7
	%8 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 2			%8 = getelementptr inbounds %struct.__tile_str, %struct.__tile_str* %2, i64 0, i32 2
	%9 = load <256 x i32>, <256 x i32>* %8, align 64, !tbaa !8			%9 = load <256 x i32>, <256 x i32>* %8, align 64, !tbaa !8
	%10 = shl i64 %1, 32			%10 = bitcast <256 x i32> %9 to x86_amx
	%11 = ashr exact i64 %10, 32			%11 = shl i64 %1, 32
	tail call void @llvm.x86.tilestored64.internal(i16 %5, i16 %7, i8* %0, i64 %11, <256 x i32> %9) #3			%12 = ashr exact i64 %11, 32
				tail call void @llvm.x86.tilestored64.internal(i16 %5, i16 %7, i8* %0, i64 %12, x86_amx %10) #3
	ret void			ret void
	}			}

	declare <256 x i32> @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64) #3			declare x86_amx @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64) #3
	declare <256 x i32> @llvm.x86.tdpbssd.internal(i16, i16, i16, <256 x i32>, <256 x i32>, <256 x i32>) #3			declare x86_amx @llvm.x86.tdpbssd.internal(i16, i16, i16, x86_amx, x86_amx, x86_amx) #3
	declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, <256 x i32>) #3			declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx) #3

	attributes #0 = { alwaysinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #0 = { alwaysinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="8192" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #1 = { alwaysinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #1 = { alwaysinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #2 = { alwaysinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+avx,+avx2,+avx512f,+cx8,+f16c,+fma,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }			attributes #2 = { alwaysinline nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+amx-int8,+amx-tile,+avx,+avx2,+avx512f,+cx8,+f16c,+fma,+fxsr,+mmx,+popcnt,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+x87,+xsave" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
	attributes #3 = { nounwind }			attributes #3 = { nounwind }

	!llvm.module.flags = !{!0}			!llvm.module.flags = !{!0}
	!llvm.ident = !{!1}			!llvm.ident = !{!1}
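The right-hand column of the test above follows one visible pattern: values that feed AMX intrinsics are typed `x86_amx`, values that feed `load`/`store` are typed `<256 x i32>`, and a `bitcast` bridges the two. The following is only a toy model of that rule, not code from this patch; `Ty`, `User`, `requiredType`, `needsBitcast`, `typeName`, and `bitcastFor` are hypothetical names introduced for illustration and do not exist in LLVM.

```cpp
#include <string>

// Hypothetical model of the two IR types seen in the test; not LLVM's Type class.
enum class Ty { V256I32, X86AMX };

// Hypothetical classification of the instruction that consumes the value.
enum class User { LoadStore, AMXIntrinsic };

// AMX intrinsics take x86_amx; load/store keep the plain vector type.
constexpr Ty requiredType(User U) {
  return U == User::AMXIntrinsic ? Ty::X86AMX : Ty::V256I32;
}

// A bitcast is needed only when the value's type differs from what the user wants.
constexpr bool needsBitcast(Ty Have, User U) { return Have != requiredType(U); }

// Spell the type the way it appears in textual IR.
std::string typeName(Ty T) {
  return T == Ty::X86AMX ? "x86_amx" : "<256 x i32>";
}

// Render the bridging bitcast as text, or return an empty string if none is needed.
std::string bitcastFor(const std::string &Val, const std::string &Res, Ty Have,
                       User U) {
  if (!needsBitcast(Have, U))
    return "";
  return Res + " = bitcast " + typeName(Have) + " " + Val + " to " +
         typeName(requiredType(U));
}
```

Under these assumptions, `bitcastFor("%11", "%12", Ty::V256I32, User::AMXIntrinsic)` yields the same text as the `%12` line in `@__tile_dpbsud`, and passing an `x86_amx` result to a `LoadStore` user yields the reverse cast that precedes the final `store`.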

llvm/utils/TableGen/CodeGenTarget.cpp

...	StringRef llvm::getEnumName(MVT::SimpleValueType T) {
case MVT::f16: return "MVT::f16";		case MVT::f16: return "MVT::f16";
case MVT::bf16: return "MVT::bf16";		case MVT::bf16: return "MVT::bf16";
case MVT::f32: return "MVT::f32";		case MVT::f32: return "MVT::f32";
case MVT::f64: return "MVT::f64";		case MVT::f64: return "MVT::f64";
case MVT::f80: return "MVT::f80";		case MVT::f80: return "MVT::f80";
case MVT::f128: return "MVT::f128";		case MVT::f128: return "MVT::f128";
case MVT::ppcf128: return "MVT::ppcf128";		case MVT::ppcf128: return "MVT::ppcf128";
case MVT::x86mmx: return "MVT::x86mmx";		case MVT::x86mmx: return "MVT::x86mmx";
		case MVT::x86amx: return "MVT::x86amx";
case MVT::Glue: return "MVT::Glue";		case MVT::Glue: return "MVT::Glue";
case MVT::isVoid: return "MVT::isVoid";		case MVT::isVoid: return "MVT::isVoid";
case MVT::v1i1: return "MVT::v1i1";		case MVT::v1i1: return "MVT::v1i1";
case MVT::v2i1: return "MVT::v2i1";		case MVT::v2i1: return "MVT::v2i1";
case MVT::v4i1: return "MVT::v4i1";		case MVT::v4i1: return "MVT::v4i1";
case MVT::v8i1: return "MVT::v8i1";		case MVT::v8i1: return "MVT::v8i1";
case MVT::v16i1: return "MVT::v16i1";		case MVT::v16i1: return "MVT::v16i1";
case MVT::v32i1: return "MVT::v32i1";		case MVT::v32i1: return "MVT::v32i1";

llvm/utils/TableGen/IntrinsicEmitter.cpp

...	enum IIT_Info {
IIT_VEC_ELEMENT = 42,		IIT_VEC_ELEMENT = 42,
IIT_SCALABLE_VEC = 43,		IIT_SCALABLE_VEC = 43,
IIT_SUBDIVIDE2_ARG = 44,		IIT_SUBDIVIDE2_ARG = 44,
IIT_SUBDIVIDE4_ARG = 45,		IIT_SUBDIVIDE4_ARG = 45,
IIT_VEC_OF_BITCASTS_TO_INT = 46,		IIT_VEC_OF_BITCASTS_TO_INT = 46,
IIT_V128 = 47,		IIT_V128 = 47,
IIT_BF16 = 48,		IIT_BF16 = 48,
IIT_STRUCT9 = 49,		IIT_STRUCT9 = 49,
IIT_V256 = 50		IIT_V256 = 50,
		IIT_AMX = 51
		pengfei: Remove `,`
};		};

static void EncodeFixedValueType(MVT::SimpleValueType VT,		static void EncodeFixedValueType(MVT::SimpleValueType VT,
std::vector<unsigned char> &Sig) {		std::vector<unsigned char> &Sig) {
if (MVT(VT).isInteger()) {		if (MVT(VT).isInteger()) {
unsigned BitWidth = MVT(VT).getFixedSizeInBits();		unsigned BitWidth = MVT(VT).getFixedSizeInBits();
switch (BitWidth) {		switch (BitWidth) {
default: PrintFatalError("unhandled integer type width in intrinsic!");		default: PrintFatalError("unhandled integer type width in intrinsic!");
...	static void EncodeFixedValueType(MVT::SimpleValueType VT,
case MVT::f16: return Sig.push_back(IIT_F16);		case MVT::f16: return Sig.push_back(IIT_F16);
case MVT::bf16: return Sig.push_back(IIT_BF16);		case MVT::bf16: return Sig.push_back(IIT_BF16);
case MVT::f32: return Sig.push_back(IIT_F32);		case MVT::f32: return Sig.push_back(IIT_F32);
case MVT::f64: return Sig.push_back(IIT_F64);		case MVT::f64: return Sig.push_back(IIT_F64);
case MVT::f128: return Sig.push_back(IIT_F128);		case MVT::f128: return Sig.push_back(IIT_F128);
case MVT::token: return Sig.push_back(IIT_TOKEN);		case MVT::token: return Sig.push_back(IIT_TOKEN);
case MVT::Metadata: return Sig.push_back(IIT_METADATA);		case MVT::Metadata: return Sig.push_back(IIT_METADATA);
case MVT::x86mmx: return Sig.push_back(IIT_MMX);		case MVT::x86mmx: return Sig.push_back(IIT_MMX);
		case MVT::x86amx: return Sig.push_back(IIT_AMX);
// MVT::OtherVT is used to mean the empty struct type here.		// MVT::OtherVT is used to mean the empty struct type here.
case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);		case MVT::Other: return Sig.push_back(IIT_EMPTYSTRUCT);
// MVT::isVoid is used to represent varargs here.		// MVT::isVoid is used to represent varargs here.
case MVT::isVoid: return Sig.push_back(IIT_VARARG);		case MVT::isVoid: return Sig.push_back(IIT_VARARG);
}		}
}		}

#if defined(_MSC_VER) && !defined(__clang__)		#if defined(_MSC_VER) && !defined(__clang__)
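The `IntrinsicEmitter.cpp` hunk above adds one enumerator (`IIT_AMX = 51`) and one `case` that pushes it into the signature byte vector. As a rough, self-contained sketch of that shape, assuming a cut-down value-type enum: `ToyVT`, `ToyIIT`, and `encodeFixedValueType` are hypothetical stand-ins rather than the real LLVM declarations, and only the three numeric codes visible in the diff are reproduced.

```cpp
#include <vector>

// Hypothetical stand-in for MVT::SimpleValueType; not the real LLVM enum.
enum class ToyVT { bf16, x86amx, unknown };

// Subset of the IIT codes, using only the values shown in the diff above.
enum ToyIIT : unsigned char {
  IIT_BF16 = 48,
  IIT_V256 = 50,
  IIT_AMX = 51, // the enumerator this patch appends after IIT_V256
};

// Append the one-byte code for VT to Sig.
// Returns false when VT has no code in this toy subset.
bool encodeFixedValueType(ToyVT VT, std::vector<unsigned char> &Sig) {
  switch (VT) {
  case ToyVT::bf16:
    Sig.push_back(IIT_BF16);
    return true;
  case ToyVT::x86amx:
    Sig.push_back(IIT_AMX);
    return true;
  case ToyVT::unknown:
    return false;
  }
  return false;
}
```

The real function reports unhandled types through `PrintFatalError` rather than a boolean; the return value here is just a convenience for the sketch.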

Revision Contents

Diff 314066

clang/test/CodeGen/X86/amx_api.c

llvm/include/llvm-c/Core.h

llvm/include/llvm/Bitcode/LLVMBitCodes.h

llvm/include/llvm/CodeGen/ValueTypes.td

llvm/include/llvm/IR/DataLayout.h

llvm/include/llvm/IR/Intrinsics.h

llvm/include/llvm/IR/Intrinsics.td

llvm/include/llvm/IR/IntrinsicsX86.td

llvm/include/llvm/IR/Type.h

llvm/include/llvm/Support/MachineValueType.h

llvm/lib/Analysis/ConstantFolding.cpp

llvm/lib/AsmParser/LLLexer.cpp

llvm/lib/Bitcode/Reader/BitcodeReader.cpp

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp

llvm/lib/CodeGen/ValueTypes.cpp

llvm/lib/IR/AsmWriter.cpp

llvm/lib/IR/ConstantFold.cpp

llvm/lib/IR/Core.cpp

llvm/lib/IR/DataLayout.cpp

llvm/lib/IR/Function.cpp

llvm/lib/IR/LLVMContextImpl.h

llvm/lib/IR/LLVMContextImpl.cpp

llvm/lib/IR/Type.cpp

llvm/lib/Target/X86/X86ISelDAGToDAG.cpp

llvm/lib/Target/X86/X86ISelLowering.cpp

llvm/lib/Target/X86/X86LowerAMXType.cpp

llvm/lib/Target/X86/X86RegisterInfo.td

llvm/lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp

llvm/test/CodeGen/X86/AMX/amx-across-func.ll

llvm/test/CodeGen/X86/AMX/amx-config.ll

llvm/test/CodeGen/X86/AMX/amx-intrinsic-chain.ll

llvm/test/CodeGen/X86/AMX/amx-spill.ll

llvm/test/CodeGen/X86/AMX/amx-type.ll

llvm/utils/TableGen/CodeGenTarget.cpp

llvm/utils/TableGen/IntrinsicEmitter.cpp
