The x86_amx type is used for AMX intrinsics. A <256 x i32> value is bitcast to x86_amx when it is passed to an AMX intrinsic, and x86_amx is bitcast back to <256 x i32> when the value is used by load/store instructions, so the AMX intrinsics only ever operate on the x86_amx type. The new type helps separate the AMX intrinsics from ordinary LLVM IR instructions (+-*/). Thanks to Craig for the idea. This patch depends on https://reviews.llvm.org/D87981.
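To make the intended convention concrete, here is a minimal IR sketch (the function name, value names, and shapes are illustrative, not taken from the patch): ordinary IR sees the tile as a <256 x i32> value, the AMX intrinsic only accepts x86_amx, and the bitcast at the boundary is what the lowering pass rewrites.

    define void @store_tile(<256 x i32>* %ptr, i16 %row, i16 %col, i8* %buf, i64 %stride) {
    entry:
      ; plain IR side: the tile is just a 1024-byte vector
      %vec = load <256 x i32>, <256 x i32>* %ptr, align 64
      ; intrinsic side: the same value must be x86_amx
      %amx = bitcast <256 x i32> %vec to x86_amx
      call void @llvm.x86.tilestored64.internal(i16 %row, i16 %col, i8* %buf, i64 %stride, x86_amx %amx)
      ret void
    }

    declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx)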
I only took a quick pass through this so far. What happens if a bitcast between x86_amx and v256i32 (or any other 1024-bit vector type) exists in the IR but isn't next to a load/store?
llvm/lib/Target/X86/X86ISelLowering.cpp
5348–5349: Should this just be deleted?

llvm/lib/Target/X86/X86LowerAMXType.cpp
413: Don't use an assert to check the result of a dyn_cast. If it shouldn't fail, just use cast<LoadInst>, which will assert internally.
421: Unchecked dyn_cast.
444: Use cast.
@craig.topper, thank you for reviewing my patch.
I think such IR won't be generated if users only use our external API. However, if such IR does appear, we can transform the bitcast into a <store, load> pair so that the type is translated through memory. One of the <store, load> pair is an AMX intrinsic store/load, so it won't be optimized away. Is that reasonable?
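A rough sketch of that rewrite (the names and the fixed stride of 64 are assumptions for illustration, not code from the patch): the stray bitcast is replaced by a stack slot plus an AMX load, so the conversion goes through memory and one side of the pair is an AMX intrinsic.

    define void @cast_through_memory(<256 x i32> %vec, i16 %row, i16 %col, i8* %out) {
    entry:
      ; originally: %amx = bitcast <256 x i32> %vec to x86_amx
      %slot = alloca <256 x i32>, align 64
      store <256 x i32> %vec, <256 x i32>* %slot, align 64
      %p = bitcast <256 x i32>* %slot to i8*
      ; the reload is an AMX intrinsic, so the store/load pair is not folded away
      %amx = call x86_amx @llvm.x86.tileloadd64.internal(i16 %row, i16 %col, i8* %p, i64 64)
      call void @llvm.x86.tilestored64.internal(i16 %row, i16 %col, i8* %out, i64 64, x86_amx %amx)
      ret void
    }

    declare x86_amx @llvm.x86.tileloadd64.internal(i16, i16, i8*, i64)
    declare void @llvm.x86.tilestored64.internal(i16, i16, i8*, i64, x86_amx)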
llvm/lib/IR/DataLayout.cpp
796: Yes. It is 512. Thanks.

llvm/lib/Target/X86/X86LowerAMXType.cpp
391–392: Yes, it is in the function's entry block. It is done at line 48 in CreateAllocaInst(), which is actually copied from your code. :)
398: 1024 is conservative, because vectors require the alignment to be the vector size, and here we generate <256 x i32> vector load/store.
llvm/lib/IR/ConstantFold.cpp
540: The operator should be at the end of the line.

llvm/lib/Target/X86/X86LowerAMXType.cpp
176: I think we'd better check for unexpected intrinsics, e.g.

    default:
      llvm_unreachable("");
    case Intrinsic::x86_tileloadd64_internal:
    case Intrinsic::x86_tdpbssd_internal:
    case Intrinsic::x86_tilestored64_internal:
      Row = II->getArgOperand(0);
      Col = II->getArgOperand(1);
      break;

239: Why not check for empty like line 157?
393–394: Better to move it to line 310.
396–403: Better to reuse the cast result, e.g.

    BitCastInst *BInst = dyn_cast<BitCastInst>(&Inst);
    if (!BInst)

That saves several cast<BitCastInst>(&Inst) calls below.
429: Is it possible that the x86_amx operand isn't from an AMX intrinsic? E.g.

    %src = bitcast <256 x i32> %xxx to x86_amx
    %2 = bitcast x86_amx %src to <256 x i32>

430: Where does x86_amx* %tile come from? Shouldn't it have been transformed to x86_amx before this bitcast, if it exists?
436: Maybe better to keep a duplicated load than calling transformBitcast. The same for line 285.
451: Why do we need to consider the case where <256 x i32> has more than one use?
llvm/test/CodeGen/X86/AMX/amx-across-func.ll
89–91: Better to remove these unused attributes. The same for the other tests.

llvm/test/CodeGen/X86/AMX/amx-type.ll
67: For this and the next test, we have a chance to optimize to a memcpy if we can make sure %s is the constant 64.
145: We don't need to check this case now, right?
llvm/lib/Target/X86/X86LowerAMXType.cpp
264: There may be dead load or store instructions.
280: OK, I'll create another patch for it.
396–403: That's good. Thanks.
429: Good catch. I'll add support for this pattern.

llvm/test/CodeGen/X86/AMX/amx-across-func.ll
89–91: I'll create a separate patch to clean up the attributes.

llvm/test/CodeGen/X86/AMX/amx-type.ll
67: If the stride is 64, we can transform the code to a memcpy (see the sketch after this list). How about doing that in another patch?
145: It checks that load and store instructions are not transformed when they do not participate in an AMX operation. I prefer to keep the case.
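For reference, a hypothetical sketch of that follow-up (the function and value names are made up): with a constant stride of 64 the 64-byte rows of a full tile are contiguous, so a <256 x i32> copy could become a plain 1024-byte memcpy instead of a tileloadd/tilestored pair.

    define void @copy_tile(<256 x i32>* %dst, <256 x i32>* %src) {
    entry:
      %d = bitcast <256 x i32>* %dst to i8*
      %s = bitcast <256 x i32>* %src to i8*
      ; 256 x i32 = 1024 bytes; with stride 64 the 64-byte rows are back to back
      call void @llvm.memcpy.p0i8.p0i8.i64(i8* align 64 %d, i8* align 64 %s, i64 1024, i1 false)
      ret void
    }

    declare void @llvm.memcpy.p0i8.p0i8.i64(i8*, i8*, i64, i1)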
llvm/lib/Target/X86/X86LowerAMXType.cpp
430: In my test case, it is transformed after "Combine redundant instructions":

    *** IR Dump After Simplify the CFG ***
    define internal fastcc void @_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture %dst) unnamed_addr #4 {
    entry:
      %row = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 0
      %0 = load i16, i16* %row, align 64, !tbaa !2
      %col = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 1
      %1 = load i16, i16* %col, align 2, !tbaa !7
      %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64) #6
      %3 = bitcast x86_amx %2 to <256 x i32>
      %tile = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 3
      store <256 x i32> %3, <256 x i32>* %tile, align 64, !tbaa !8
      ret void
    }

to

    *** IR Dump After Combine redundant instructions ***
    ; Function Attrs: alwaysinline nounwind uwtable mustprogress
    define internal fastcc void @_ZL12__tile_loaddP15__tile1024i_strPKvm(%struct.__tile1024i_str* nocapture %dst) unnamed_addr #4 {
    entry:
      %row = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 0
      %0 = load i16, i16* %row, align 64, !tbaa !2
      %col = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 1
      %1 = load i16, i16* %col, align 2, !tbaa !7
      %2 = call x86_amx @llvm.x86.tileloadd64.internal(i16 %0, i16 %1, i8* getelementptr inbounds ([1024 x i8], [1024 x i8]* @buf, i64 0, i64 0), i64 64) #6
      %tile = getelementptr inbounds %struct.__tile1024i_str, %struct.__tile1024i_str* %dst, i64 0, i32 3
      %3 = bitcast <256 x i32>* %tile to x86_amx*
      store x86_amx %2, x86_amx* %3, align 64, !tbaa !8
      ret void
    }
> In my test case, it is transformed after Combine redundant instructions.

Can we disable it for the AMX type? A pointer to the AMX type is meaningless and may result in bad performance.
llvm/lib/Target/X86/X86LowerAMXType.cpp
278: I don't see any chance of this happening, but we still need to handle the x86_amx* case here if possible, right? E.g. cast<PointerType>(Src->getType())->isX86_AMXTy().
Good job.
llvm/lib/Target/X86/X86LowerAMXType.cpp
153: Currently we don't have a HW type for v256i32. I think 64 bytes (512 bits) should be enough here.
233–250: Value
235: What if the Tile comes from tdpbssd?
398: We don't need to align to 1024; 64 should be enough. The same for the comments below.
422: What if the Tile comes from tdpbssd?
llvm/lib/Target/X86/X86LowerAMXType.cpp
235: We have a convention: when an AMX intrinsic defines an x86_amx tile, its first two operands are the shape of the defined tile. For tdpbssd, the intrinsic operands are (m, n, k, ...), and (m, n) is the shape of the produced tile. See the fragment below.
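A small IR fragment illustrating that convention (the shape values and tile operands are placeholders, assumed to be defined by earlier AMX intrinsics):

    ; tdpbssd defines an m x n tile, so (m, n) - its first two operands - is the
    ; shape of %d; k is the inner dimension shared with the source tiles
    %d = call x86_amx @llvm.x86.tdpbssd.internal(i16 %m, i16 %n, i16 %k,
                                                 x86_amx %acc, x86_amx %a, x86_amx %b)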
LGTM. Thanks for the refactors. Maybe better to wait for a few days to see if others have objections.
llvm/include/llvm/IR/Type.h
68: This addition causes a compilation warning in HexagonTargetObjectFile.cpp:

    ../lib/Target/Hexagon/HexagonTargetObjectFile.cpp:297:11: error: enumeration value 'X86_AMXTyID' not handled in switch [-Werror,-Wswitch]
      switch (Ty->getTypeID()) {
              ^
    1 error generated.

Seen in build bots, e.g. here:
llvm/include/llvm/IR/Type.h
68: Thanks, Mikael, for pointing it out. I think we just need to add the type to the switch table. I've posted a patch to fix it: rG16c2067cf212.
llvm/include/llvm/IR/Type.h
68: Yep, thanks!
D93944 fixed an llvm-c-test issue. Note that adding new enum members usually requires running check-all (at least check-llvm, but Clang may use these enums as well) because they can be used everywhere.
llvm/include/llvm-c/Core.h
163: This is a breaking change to the C ABI -- can we move it to the end of the enum?
llvm/include/llvm-c/Core.h
163:
> This is a breaking change to the C ABI -- can we move it to the end of the enum?

https://bugs.llvm.org/show_bug.cgi?id=48905