This is an archive of the discontinued LLVM Phabricator instance.

Differential D98470

[mlir][amx] Add Intel AMX dialect (architectural-specific vector dialect)
Closed · Public

Authored by aartbik on Mar 11 2021, 4:35 PM.

Details

Reviewers

ftynse
nicolasvasilache
mehdi_amini
penpornk
rriddle

Commits

rG6ad7b97e20c2: [mlir][amx] Add Intel AMX dialect (architectural-specific vector dialect)

Summary

The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
multiply unit (TMUL), a tile control register (TILECFG), and eight
tile registers TMM0 through TMM7 (TILEDATA). This new MLIR dialect
provides a bridge between MLIR concepts like vectors and memrefs
and the lower level LLVM IR details of AMX.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aartbik created this revision. · Mar 11 2021, 4:35 PM

Herald added a reviewer: ftynse. · Mar 11 2021, 4:35 PM

Herald added subscribers: dcaballe, cota, teijeong and 16 others.

aartbik requested review of this revision. · Mar 11 2021, 4:35 PM

Herald added a reviewer: nicolasvasilache. · Mar 11 2021, 4:35 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

aartbik added reviewers: mehdi_amini, penpornk, rriddle. · Mar 11 2021, 4:36 PM

This is a large drop, so I added some background on discourse:

https://llvm.discourse.group/t/intel-amx-vector-dialect/2984/6

Can you split this patch up? Seems like there are several separable components: amx, llvm_amx, the lowering to LLVM, etc.

silvas added a subscriber: silvas. · Mar 11 2021, 7:55 PM

silvas added inline comments.

mlir/include/mlir/Dialect/AMX/AMX.td
10	drive by comment: can you provide links to the "source of truth" documentation (insofar as it is available) for folks that want to dig deeper?

Harbormaster completed remote builds in B93399: Diff 330103. · Mar 11 2021, 10:47 PM

Putting a blocker for double-checking the stride and maybe the error messages.
The rest is good to go and can be improved incrementally (e.g. dropping the LLVMAMX dialect if we can add a type hook).

Nice!

mlir/include/mlir/Dialect/AMX/AMX.td
78	+1 on pointing to official doc so we can dig deeper. It is unclear to me whether the bytes need to be contiguous in memory or whether there is a way to accept strides, and what alignment constraints are required for correctness or perf. IIRC you mentioned there is a configuration mechanism for the sizes; does it also support something for striding for memory accesses?
133	You also want to add some type checking on `lhs/rhs` via `TypesMatchWith`; here is an example from AVX512 for syntax purposes: def MaskRndScaleOp : AVX512_Op<"mask.rndscale", [NoSideEffect, AllTypesMatch<["src", "a", "dst"]>, TypesMatchWith<"imm has the same number of bits as elements in dst", "dst", "imm", "IntegerType::get($_self.getContext(), " "($_self.cast<VectorType>().getShape()[0]))">]> { ... Edit: ah ok I see it appears in the C++ part. Feel free to ignore this comment and leave it as is in C++ or lift some of that into TypesMatchWith.
mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
16 ↗	(On Diff #330103)	Atm this is required because 2-D vectors are not native in LLVM-IR, correct? Since the AMX vector type is a 2-D native type, is it reasonable to extend the LLVM type definition with target-specific hooks that would allow this dialect to disappear (@ftynse)?
mlir/lib/Conversion/AMXToLLVM/ConvertAMXToLLVM.cpp
31 ↗	(On Diff #330103)	returning a pair or a struct with 2 values would be more idiomatic IMO
35 ↗	(On Diff #330103)	Add an assert here that the IntOrFloatBitWidth is a power of 2 plz. While I don't expect we'll ever be able to see i33 here, the bug would be so nasty to debug that a line of defense makes sense to me.
52 ↗	(On Diff #330103)	The dynamic stride, if needed, should already be available to you in the descriptor. Is `MemRefDescriptorView` what we use these days to get this information @ftynse ? I haven't followed whether the refactorings still guarantee that static constants are visible when we pass function boundaries, I'd double check it by eyeballing the LLVMIR that gets emitted. In any case this is not just the last size, strides have a life of their own. They happen to be related to sizes in a particular case only.
102 ↗	(On Diff #330103)	Ah ok I see now that the stride is opaque to MLIR and part of the AMX intrinsic. Can you just document that in the op definition please ?
102 ↗	(On Diff #330103)	Some extra checks are needed. If the most minor stride obtained by calling the following on the memref type: LogicalResult getStridesAndOffset(MemRefType t, SmallVectorImpl<int64_t> &strides, int64_t &offset); is not 1, then we should fail the conversion.
114 ↗	(On Diff #330103)	Same comments as above.
148 ↗	(On Diff #330103)	If you returned, you could also more naturally assert the m's agree.
mlir/lib/Dialect/AMX/IR/AMXDialect.cpp
47	I'd go for nicer error messages here: the ops expect a certain vector layout so spelling the error a bit more would be useful.
54	if you moved these emitError inside the function performing the verification, we could have nicer error messages.
mlir/test/Integration/Dialect/Vector/CPU/AMX/test-tilezero.mlir
37	Haha I love these types of tests. In polyhedral land, one would skew the tile and cut it at the boundaries: seeing the pattern is correct is a must.
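
Editorial sketch of the innermost-stride check requested above (ConvertAMXToLLVM.cpp, line 102). This is not code from the patch: `hasUnitInnermostStride` is a hypothetical helper, and the `strides` vector stands in for whatever `getStridesAndOffset` would report for the memref type. It assumes plain C++ with no MLIR headers.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the requested check. `strides` plays the role
// of the per-dimension strides of a memref type, outermost first.
// Returns true when a tile load/store conversion may proceed.
bool hasUnitInnermostStride(const std::vector<int64_t> &strides) {
  // A rank-0 memref has no strides, so there is no row to read.
  if (strides.empty())
    return false;
  // The most minor (last) stride must be exactly 1 so that the elements
  // of one tile row are contiguous in memory.
  return strides.back() == 1;
}
```

A conversion pattern would presumably report failure whenever this predicate is false. Any non-unit value, including a sentinel used for a dynamic stride, fails the `== 1` test, which errs on the safe side.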

This revision now requires changes to proceed. · Mar 12 2021, 12:08 AM

ftynse added inline comments. · Mar 12 2021, 2:39 AM

mlir/include/mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h
26–28	It would be nice if we could reconsider this trick. It was introduced to make sure the type system change between built-in vectors and llvm vectors was smooth, but the type system difference is (almost) gone. It feels like we only need some casting/packing between nD and 1D vectors to make vector-to-llvm conversion separate from "ISA dialect"-to-llvm conversion. Not for this commit though.
mlir/include/mlir/Dialect/AMX/AMX.td
2	Nit: this should match the file name
27–28	This should also match the filename, I suppose
136	Nit: putting quotes or backticks, e.g., an "m x k" tile with a "k x n" around variables would make it more readable
151	Nit: something went wrong with whitespace
mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
1 ↗	(On Diff #330103)	I am going in the inverse direction and removing the X/LLVM_X separation between dialects. It is the legacy of there being two completely disjoint type systems. Only ArmSVE is still there just because I haven't had time. So I would appreciate if this didn't introduce another pair of dialects. I think all of these operations can live in the "main" AMX dialect, the patterns can be an in-dialect conversion given a list of "lower-level" ops. AVX512 already follows this model and can serve as an example. https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/AVX512/AVX512.td This will make it easy to "group" high-level ops and low-level ops so that they can share the description, at least in the comments. We can also think about infrastructural support for defining pairs/groups of ops when that becomes necessary (as a matter of fact, I have prototyped it, but it was more code and complexity than just keeping separate op definitions).
16 ↗	(On Diff #330103)	I'm not sure I understand what you suggest here @nicolasvasilache. A dialect is merely a collection of ops. Absolutely nothing prevents these ops from living in the "main" AMX dialect. The conversion configuration will be slightly longer, but that's pretty much it. These ops match LLVM IR intrinsics 1-1 and the conversion from "main" AVX ops to these is non-trivial. I am quite favorable to keeping these ops and the conversion, rather than somehow extending the translation or llvm types. `LLVM_Type` is actually `LLVM_AnyCompatibleType`, we shouldn't be using this anymore, more specific type constraints have been available for several months. If you want operations to accept different types, there's always `AnyTypeOf<[]>`.
28 ↗	(On Diff #330103)	"MLIR LLVM Dialect type system" no longer exists
49 ↗	(On Diff #330103)	`LLVM_Type` accepts any type potentially usable in the LLVM dialect, could we put tighter type constraints? This should be easy if we have these ops next to the higher-level op definition :)
61 ↗	(On Diff #330103)	Could we have at least a comment about the semantics of this op, `tdpbssd` isn't very intuitive. This should be easy if this op lives next to the higher-level op that has a detailed description :)
mlir/lib/Conversion/AMXToLLVM/ConvertAMXToLLVM.cpp
52 ↗	(On Diff #330103)	It should be possible to just call `memRefDescriptor.stride(position)` here. There's no embedded folding though, it always reads the descriptor. I'm fine adding the folding there if desirable. https://github.com/llvm/llvm-project/blob/cfe8f8e0f010077f5942bce88a2fd331b90ccea7/mlir/include/mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h#L304
69 ↗	(On Diff #330103)	Nit: context is available in both `ptr` and `loc`, no need to pass the type converter to access it.
mlir/lib/Dialect/AMX/IR/AMXDialect.cpp
2	AMXDialect.cpp

nicolasvasilache added inline comments. · Mar 12 2021, 3:04 AM

mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
16 ↗	(On Diff #330103)	`A dialect is merely a collection of ops. Absolutely nothing prevents these ops from living in the "main" AMX dialect.` Ah yes, I continually oversubscribe on `dialect goes away => we must use the same type`, which is def. not true. My brain takes some time getting rewired :)
mlir/lib/Conversion/AMXToLLVM/ConvertAMXToLLVM.cpp
52 ↗	(On Diff #330103)	For the folding part I am just asking about the time the descriptor LLVM struct is created and filled (i.e. function boundary and alloc). When I wrote that, the constants were properly propagated at construction time, and I don't think the refactoring changed it (otherwise things would have broken in noticeable ways). As long as this is still true, LLVM should be able to canonicalize / fold away for us. It would still be nice to confirm by looking at the LLVM IR post -O3; if for some reason it does not, we may want to work a little harder on our end.

ftynse added inline comments. · Mar 12 2021, 3:49 AM

mlir/lib/Conversion/AMXToLLVM/ConvertAMXToLLVM.cpp
52 ↗	(On Diff #330103)	It doesn't sound like something could have been broken by refactorings, they were mostly moving code around. As long as there is a dialect conversion somewhere, it will try folding any operation before converting it. If we need more, we can always start adding canonicalizations on the LLVM dialect.

Thanks for working on this, Aart! I think the progressive lowering approach that you are taking here is very on point! I’m not working with AMX but it would be great if you could add me as a reviewer to the related code reviews. It would be very educational for me since this approach is also applicable to similar internal problems we have.

aartbik added inline comments. · Mar 12 2021, 9:56 AM

mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
1 ↗	(On Diff #330103)	I was of course very aware of the direction you were taking and in fact started out this AMX dialect following this approach. But I got stuck on the fact that the 2-d types need to be LLVM IR types. So I am uncertain how you see this working without the LLVM IR dialect. Could you please sketch your vision in a bit more detail here? Also note that the AMX lowering uses quite a few non-trivial lowerings which work really well at the moment (e.g. getStridedElementPtr), which I am unsure would work without the intermediate.

In D98470#2622418, @dcaballe wrote:

Thanks for working on this, Aart! I think the progressive lowering approach that you are taking here is very on point! I’m not working with AMX but it would be great if you could add me as a reviewer to the related code reviews. It would be very educational for me since this approach is also applicable to similar internal problems we have.

Thanks Diego! Absolutely, I will start adding you to vector related stuff.

aartbik added inline comments. · Mar 12 2021, 11:58 AM

mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
1 ↗	(On Diff #330103)	I forgot to quote in my reply above, so tagging you here explicitly @ftynse

nicolasvasilache added inline comments. · Mar 12 2021, 12:02 PM

mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
1 ↗	(On Diff #330103)	AFAIU @ftynse suggests to just move the ops and drop the extra dialect. The same conversions would still exist but would live within the dialect. This is independent from the fact that we can automatically convert without worrying about the type. Am I missing something?

aartbik added inline comments. · Mar 12 2021, 1:19 PM

mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
1 ↗	(On Diff #330103)	"Just move the ops and drop the extra dialect." Sorry for being slow. I don't see how to easily lower
	%4 = amx.tilemulf %1, %2, %3 : vector<2x4xbf16>, vector<2x4xbf16>, vector<2x2xf32>
to
	call x86_amx @llvm.x86.tdpbf16ps.internal(i16 2, i16 8, i16 8, x86_amx %51, x86_amx %46, x86_amx %50)
without going through
	%52 = "llvm_amx.tdpbf16ps"(%50, %49, %51, %47, %34, %44) : (i16, i16, i16, !llvm.array<2 x vector<2xf32>>, !llvm.array<2 x vector<4xbf16>>, !llvm.array<2 x vector<4xbf16>>)
first.
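
For readers puzzling over the shapes in the example above, here is a scalar reference sketch of what the `tdpbf16ps` lowering is expected to compute. It assumes the paired-bf16 layout described in the Intel documentation: the left operand is m x 2p, the right operand is p x 2n with each pair of consecutive k-values stored side by side in one row, and the accumulator is m x n. `Tile` and `tileMulF` are hypothetical names, and `float` stands in for both bf16 and f32.

```cpp
#include <cstddef>
#include <vector>

// Row-major matrix of floats; float stands in for both bf16 and f32.
struct Tile {
  std::size_t rows, cols;
  std::vector<float> data;
  float &at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
  float at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// acc (m x n) += lhs (m x 2p) * rhs (p x 2n), where row k of rhs holds the
// two k-values 2k and 2k+1 interleaved for every output column.
void tileMulF(const Tile &lhs, const Tile &rhs, Tile &acc) {
  const std::size_t m = acc.rows, n = acc.cols, p = rhs.rows;
  for (std::size_t i = 0; i < m; ++i)
    for (std::size_t k = 0; k < p; ++k)
      for (std::size_t j = 0; j < n; ++j) {
        acc.at(i, j) += lhs.at(i, 2 * k) * rhs.at(k, 2 * j);
        acc.at(i, j) += lhs.at(i, 2 * k + 1) * rhs.at(k, 2 * j + 1);
      }
}
```

With the shapes from the comment (2x4, 2x4, 2x2) this gives m = 2, p = 2, n = 2, which is consistent with the `i16 2, i16 8, i16 8` arguments if the last two are read as row widths in bytes (2 f32 columns x 4 bytes and 4 bf16 columns x 2 bytes).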

aartbik added inline comments. · Mar 12 2021, 1:49 PM

mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
1 ↗	(On Diff #330103)	Ah, wait, I guess I see what you are getting at: (1) add the LLVM IR part into the AMX dialect, and (2) make the conversion a legalization where half the AMX ops are valid and half are invalid. I suppose that would work, yes. But unlike the previous ARM case, where ops were literal 1:1 mappings with no type changes, this feels like a very subjective aesthetic. I find the separate dialect more intuitive for this case. But I will do as you requested....
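
The "half valid, half invalid" legalization idea can be pictured with a toy model. This is only a sketch in plain C++, not MLIR's dialect conversion API: the op names in the table are placeholders, and `loweringTable`, `isLegalForExport`, and `legalize` are hypothetical helpers.

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative mapping from user-facing ops to intrinsic-style ops living
// in the same dialect. The names are placeholders, not the dialect's
// definitive spelling.
const std::map<std::string, std::string> &loweringTable() {
  static const std::map<std::string, std::string> table = {
      {"amx.tile_zero", "amx.tilezero"},
      {"amx.tile_mulf", "amx.tdpbf16ps"},
  };
  return table;
}

// An op is illegal for export exactly when it is a user-facing op, that is,
// a key in the table. Everything else is left alone.
bool isLegalForExport(const std::string &op) {
  return loweringTable().count(op) == 0;
}

// Rewrites every illegal op to its intrinsic-style counterpart and keeps
// legal ops unchanged.
std::vector<std::string> legalize(std::vector<std::string> ops) {
  for (std::string &op : ops) {
    auto it = loweringTable().find(op);
    if (it != loweringTable().end())
      op = it->second;
  }
  return ops;
}
```

The point of the design is that both groups of ops share one dialect namespace, and only the legality predicate separates them.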

merged LLVM IR AMX dialect with AMX dialect (other comments still to be addressed....)

Harbormaster completed remote builds in B93637: Diff 330416. · Mar 12 2021, 8:21 PM

bondhugula added a subscriber: bondhugula. · Mar 13 2021, 2:34 AM

bondhugula added inline comments.

mlir/lib/Dialect/AMX/Transforms/LegalizeForLLVMExport.cpp
190	Drop commented out code?
196	Drop commented out code?

nicolasvasilache accepted this revision. · Mar 13 2021, 5:04 AM

nicolasvasilache added inline comments.

mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
1 ↗	(On Diff #330103)	Right, I was essentially in the same mental model, see @ftynse's comment about type being orthogonal to dialects. Re: automatic 1-1 patterns, we also have a bit of precedent now in the arm dialect: https://reviews.llvm.org/D98198 There the 1-1 aspect additionally involves 2-d -> 1d flattening considerations.

This revision is now accepted and ready to land. · Mar 13 2021, 5:04 AM

Accepted conditioned on addressing the rest, thanks Aart!

aartbik marked 30 inline comments as done. · Mar 15 2021, 3:32 PM

aartbik added inline comments.

mlir/include/mlir/Dialect/AMX/AMX.td
10	I added a link (but with the caveat that Intel urls are notorious for changing all the time).
78	The row elements are contiguous, the column starting points are defined by a stride, hardcoded in the instructions. I added a comment to the intrinsics that have that stride.
mlir/include/mlir/Dialect/LLVMIR/LLVMAMX.td
28 ↗	(On Diff #330103)	removed the full dialect, so including this comment ;-)
61 ↗	(On Diff #330103)	Added comment, made type more precise.
mlir/lib/Conversion/AMXToLLVM/ConvertAMXToLLVM.cpp
35 ↗	(On Diff #330103)	Added (note that we also have type restrictions on the op already, but for future extension this never hurts of course).
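
The memory layout described in the reply on AMX.td line 78 above (contiguous row elements, with consecutive row starts separated by a stride) can be sketched as follows. `loadTile` is a hypothetical helper written for illustration; it copies bytes in software and makes no claim about how the hardware instruction is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Copies a tile of `rows` rows, each `rowBytes` wide, out of `base`.
// Bytes within a row are contiguous; row r starts at base + r * strideBytes.
std::vector<std::uint8_t> loadTile(const std::uint8_t *base, std::size_t rows,
                                   std::size_t rowBytes,
                                   std::size_t strideBytes) {
  std::vector<std::uint8_t> tile;
  tile.reserve(rows * rowBytes);
  for (std::size_t r = 0; r < rows; ++r) {
    const std::uint8_t *row = base + r * strideBytes;
    tile.insert(tile.end(), row, row + rowBytes);
  }
  return tile;
}
```

Choosing `strideBytes` larger than `rowBytes` is what lets a tile be cut out of a wider matrix without first packing it.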

better error messages, more doc on ops, new asserts, stride checks
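
One of the "new asserts" mentioned here is the power-of-two guard on the element bit width requested earlier in the review. A minimal sketch of such a guard, assuming plain C++: `isPowerOfTwo` and `bytesPerElement` are hypothetical names, and in-tree code would more likely use an LLVM helper such as `llvm::isPowerOf2_32`.

```cpp
#include <cassert>

// True when `width` is a nonzero power of two (8, 16, 32, ...).
constexpr bool isPowerOfTwo(unsigned width) {
  return width != 0 && (width & (width - 1)) == 0;
}

// Hypothetical helper: converts an element bit width into bytes, guarding
// against odd widths such as i33 that would silently truncate.
unsigned bytesPerElement(unsigned bitWidth) {
  assert(bitWidth >= 8 && isPowerOfTwo(bitWidth) &&
         "element bit width must be a power of 2 and at least one byte");
  return bitWidth / 8;
}
```

For the element types the ops accept (i8, bf16, i32, f32) the guard never fires; it only documents the assumption for future extensions.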

removed commented out code

Harbormaster completed remote builds in B93933: Diff 330829. · Mar 15 2021, 5:41 PM

Harbormaster completed remote builds in B93935: Diff 330832. · Mar 15 2021, 5:55 PM

This revision was landed with ongoing or failed builds. · Mar 15 2021, 5:59 PM

Closed by commit rG6ad7b97e20c2: [mlir][amx] Add Intel AMX dialect (architectural-specific vector dialect) (authored by aartbik).

This revision was automatically updated to reflect the committed changes.

aartbik added a commit: rG6ad7b97e20c2: [mlir][amx] Add Intel AMX dialect (architectural-specific vector dialect).

xiangzhangllvm added a subscriber: xiangzhangllvm. · Mar 16 2021, 2:00 AM

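Revision Contents

Path                                                                  Size
mlir/include/mlir/Conversion/Passes.td                                6 lines
mlir/include/mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h       8 lines
mlir/include/mlir/Dialect/AMX/AMX.td                                  269 lines
mlir/include/mlir/Dialect/AMX/AMXDialect.h                            26 lines
mlir/include/mlir/Dialect/AMX/CMakeLists.txt                          6 lines
mlir/include/mlir/Dialect/AMX/Transforms.h                            29 lines
mlir/include/mlir/Dialect/CMakeLists.txt                              1 line
mlir/include/mlir/InitAllDialects.h                                   2 lines
mlir/include/mlir/Target/LLVMIR/Dialect/AMX/AMXToLLVMIRTranslation.h  31 lines
mlir/include/mlir/Target/LLVMIR/Dialect/All.h                         2 lines
mlir/lib/Conversion/PassDetail.h                                      1 line
mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt                       2 lines
mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp          9 lines
mlir/lib/Dialect/AMX/CMakeLists.txt                                   2 lines
mlir/lib/Dialect/AMX/IR/AMXDialect.cpp                                106 lines
mlir/lib/Dialect/AMX/IR/CMakeLists.txt                                14 lines
mlir/lib/Dialect/AMX/Transforms/CMakeLists.txt                        12 lines
mlir/lib/Dialect/AMX/Transforms/LegalizeForLLVMExport.cpp             202 lines
mlir/lib/Dialect/CMakeLists.txt                                       1 line
mlir/lib/Target/LLVMIR/CMakeLists.txt                                 1 line
mlir/lib/Target/LLVMIR/Dialect/AMX/AMXToLLVMIRTranslation.cpp         55 lines
mlir/lib/Target/LLVMIR/Dialect/AMX/CMakeLists.txt                     16 lines
mlir/lib/Target/LLVMIR/Dialect/CMakeLists.txt                         1 line
mlir/test/CMakeLists.txt                                              1 line
mlir/test/Dialect/AMX/invalid.mlir                                    36 lines
mlir/test/Dialect/AMX/legalize-for-llvm.mlir                          45 lines
mlir/test/Dialect/AMX/roundtrip.mlir                                  41 lines
mlir/test/Integration/Dialect/Vector/CPU/AMX/lit.local.cfg            15 lines
mlir/test/Integration/Dialect/Vector/CPU/AMX/test-mulf.mlir           83 lines
mlir/test/Integration/Dialect/Vector/CPU/AMX/test-muli.mlir           83 lines
mlir/test/Integration/Dialect/Vector/CPU/AMX/test-tilezero.mlir       96 lines
mlir/test/Target/LLVMIR/amx.mlir                                      13 lines
mlir/test/lit.site.cfg.py.in                                          1 line
mlir/test/mlir-opt/commandline.mlir                                   1 line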

Diff 330416

mlir/include/mlir/Conversion/Passes.td

(488 preceding lines not shown)

 def ConvertVectorToLLVM : Pass<"convert-vector-to-llvm", "ModuleOp"> {
   let summary = "Lower the operations from the vector dialect into the LLVM "
                 "dialect";
   let description = [{

     Convert operations from the vector dialect into the LLVM IR dialect
     operations. The lowering pass provides several options to control
     the kinds of optimizations that are allowed. It also provides options
     that enable the use of one or more architectural-specific dialects
-    (AVX512, ArmNeon, ArmSVE, etc.) in combination with the
+    (AMX, AVX512, ArmNeon, ArmSVE, etc.) in combination with the
     architectural-neutral vector dialect lowering.

   }];
   let constructor = "mlir::createConvertVectorToLLVMPass()";
   // Override explicitly in C++ to allow conditional dialect dependence.
   // let dependentDialects;
   let options = [
     Option<"reassociateFPReductions", "reassociate-fp-reductions",
            "bool", /*default=*/"false",
            "Allows llvm to reassociate floating-point reductions for speed">,
     Option<"enableIndexOptimizations", "enable-index-optimizations",
            "bool", /*default=*/"true",
            "Allows compiler to assume indices fit in 32-bit if that yields "
            "faster code">,
+    Option<"enableAMX", "enable-amx",
+           "bool", /*default=*/"false",
+           "Enables the use of AMX dialect while lowering the vector "
+           "dialect.">,
     Option<"enableAVX512", "enable-avx512",
            "bool", /*default=*/"false",
            "Enables the use of AVX512 dialect while lowering the vector "
            "dialect.">,
     Option<"enableArmNeon", "enable-arm-neon",
            "bool", /*default=*/"false",
            "Enables the use of ArmNeon dialect while lowering the vector "
            "dialect.">,

(29 following lines not shown)

mlir/include/mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h

(17 preceding lines not shown)

 /// Options to control Vector to LLVM lowering.
 ///
 /// This should kept in sync with VectorToLLVM options defined for the
 /// ConvertVectorToLLVM pass in include/mlir/Conversion/Passes.td
 struct LowerVectorToLLVMOptions {
   LowerVectorToLLVMOptions()
       : reassociateFPReductions(false), enableIndexOptimizations(true),
-        enableArmNeon(false), enableArmSVE(false), enableAVX512(false) {}
+        enableArmNeon(false), enableArmSVE(false), enableAMX(false),
+        enableAVX512(false) {}

   LowerVectorToLLVMOptions &setReassociateFPReductions(bool b) {
     reassociateFPReductions = b;
     return *this;
   }
   LowerVectorToLLVMOptions &setEnableIndexOptimizations(bool b) {
     enableIndexOptimizations = b;
     return *this;
   }
   LowerVectorToLLVMOptions &setEnableArmNeon(bool b) {
     enableArmNeon = b;
     return *this;
   }
   LowerVectorToLLVMOptions &setEnableArmSVE(bool b) {
     enableArmSVE = b;
     return *this;
   }
+  LowerVectorToLLVMOptions &setEnableAMX(bool b) {
+    enableAMX = b;
+    return *this;
+  }
   LowerVectorToLLVMOptions &setEnableAVX512(bool b) {
     enableAVX512 = b;
     return *this;
   }

   bool reassociateFPReductions;
   bool enableIndexOptimizations;
   bool enableArmNeon;
   bool enableArmSVE;
+  bool enableAMX;
   bool enableAVX512;
 };

 /// Collect a set of patterns to convert from Vector contractions to LLVM Matrix
 /// Intrinsics. To lower to assembly, the LLVM flag -lower-matrix-intrinsics
 /// will be needed when invoking LLVM.
 void populateVectorToLLVMMatrixConversionPatterns(
     LLVMTypeConverter &converter, OwningRewritePatternList &patterns);

(13 following lines not shown)
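
A usage sketch for the chained setters in the options struct above. To keep it compilable without MLIR headers, the struct below is a trimmed stand-in that reproduces only two of the flags; `makeAMXOptions` is a hypothetical helper, not part of the patch.

```cpp
// Trimmed stand-in for the struct in the diff; only two flags are kept.
struct LowerVectorToLLVMOptions {
  LowerVectorToLLVMOptions() : enableAMX(false), enableAVX512(false) {}

  LowerVectorToLLVMOptions &setEnableAMX(bool b) {
    enableAMX = b;
    return *this;
  }
  LowerVectorToLLVMOptions &setEnableAVX512(bool b) {
    enableAVX512 = b;
    return *this;
  }

  bool enableAMX;
  bool enableAVX512;
};

// Each setter returns *this, so options can be built in one expression.
// Flags that are not set keep their defaults from the constructor.
LowerVectorToLLVMOptions makeAMXOptions() {
  return LowerVectorToLLVMOptions().setEnableAMX(true);
}
```

Because every flag defaults to off, adding `enableAMX` leaves existing callers unaffected unless they opt in.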

mlir/include/mlir/Dialect/AMX/AMX.td

This file was added.

				//===-- AMXOps.td - AMX dialect operation definitions - tablegen --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines the basic operations for the AMX dialect.
				//
				// The Intel Advanced Matrix Extensions (AMX) provides a tile matrix
				// multiply unit (TMUL), a tile control register (TILECFG), and eight
				// tile registers TMM0 through TMM7 (TILEDATA).
				//
				// The AMX dialect provides a bridge between MLIR concepts, such as
				// 2-d vector, operations, and memrefs, and the lower level details
				// of Intel AMX, such as configuration setup, tile sizes, instructions,
				// and tile release.
				//
				// Note that since configuration changes (implicit at dialect level) are
				// costly, it is highly recommended to use the AMX dialect on same-shaped
				// vectors, at least within a single method.
				//
				//===----------------------------------------------------------------------===//

				#ifndef AMX_OPS
				#define AMX_OPS

				include "mlir/Dialect/LLVMIR/LLVMOpBase.td"
				include "mlir/Interfaces/SideEffectInterfaces.td"

				//===----------------------------------------------------------------------===//
				// AMX dialect definition.
				//===----------------------------------------------------------------------===//

				def AMX_Dialect : Dialect {
				let name = "amx";
				let cppNamespace = "::mlir::amx";
				}

				//===----------------------------------------------------------------------===//
				// AMX Op and IntrOp definitions.
				//===----------------------------------------------------------------------===//

				class AMX_Op<string mnemonic, list<OpTrait> traits = []> :
				Op<AMX_Dialect, mnemonic, traits> {}

				// The "internal" intrinsics are meant for compiler usage.
				class AMX_IntrOp<string mnemonic, int numResults, list<OpTrait> traits = []> :
				LLVM_IntrOpBase<AMX_Dialect, mnemonic,
				"x86_" # !subst(".", "_", mnemonic) # "_internal",
				[], [], traits, numResults>;

				//===----------------------------------------------------------------------===//
				// AMX Op definitions (user facing).
				//===----------------------------------------------------------------------===//

				//
				// Tile reset.
				//

				def TileZeroOp : AMX_Op<"tile_zero", [NoSideEffect]> {
				let summary = "tile zero operation";
				let description = [{
				Zeroes the destination tile, with the shape defined by the 2-dim
				vector type of the result. This is eventually lowered into the
				'tilezero' instruction with the corresponding tile configuration.

				Example:

				```mlir
				%0 = amx.tilezero : vector<16x16xbf16>
				```
				}];
				let verifier = [{ return ::verify(*this); }];
				let results = (outs
				VectorOfRankAndType<[2], [F32, BF16, I32, I8]>:$res);
				let extraClassDeclaration = [{
				VectorType getVectorType() {
				return res().getType().cast<VectorType>();
				}
				}];
				let assemblyFormat = "attr-dict `:` type($res)";
				}

				//
				// Tile memory operations.
				//

				def TileLoadOp : AMX_Op<"tile_load", [NoSideEffect]> {
				let summary = "tile load operation";
				let description = [{
				Loads a tile from memory defined by a base and indices, with the
				shape defined by the 2-dim vector type of the result. This is
				eventually lowered into the 'tileloadd' instruction with the
				corresponding tile configuration.

				Example:

				```mlir
				%0 = amx.tileload %arg0[%c0, %c0] : memref<?x?xi8> into vector<16x64xi8>
				```
				}];
				let verifier = [{ return ::verify(*this); }];
				let arguments = (ins Arg<AnyMemRef, "load base", [MemRead]>:$base,
				Variadic<Index>:$indices);
				let results = (outs
				VectorOfRankAndType<[2], [F32, BF16, I32, I8]>:$res);
				let extraClassDeclaration = [{
				MemRefType getMemRefType() {
				return base().getType().cast<MemRefType>();
				}
				VectorType getVectorType() {
				return res().getType().cast<VectorType>();
				}
				}];
				let assemblyFormat = "$base `[` $indices `]` attr-dict `:` "
				"type($base) `into` type($res)";
				}

				def TileStoreOp : AMX_Op<"tile_store"> {
				let summary = "tile store operation";
				let description = [{
				Stores a tile to memory defined by a base and indices, with the
				shape defined by the 2-dim vector type of the value. This is
				eventually lowered into the 'tilestored' instruction with the
				corresponding tile configuration.

				Example:

				```mlir
				amx.tilestore %arg1[%c0, %c0], %0 : memref<?x?xi8>, vector<16x64xi8>
				```
				}];
				let verifier = [{ return ::verify(*this); }];
				let arguments = (ins Arg<AnyMemRef, "store base", [MemWrite]>:$base,
				ftynse (inline comment, marked Done): Nit: putting quotes or backticks, e.g., an "m x k" tile with a "k x n" around variables would make it more readable
				Variadic<Index>:$indices,
				VectorOfRankAndType<[2], [F32, BF16, I32, I8]>:$val);
				let extraClassDeclaration = [{
				MemRefType getMemRefType() {
				return base().getType().cast<MemRefType>();
				}
				VectorType getVectorType() {
				return val().getType().cast<VectorType>();
				}
				}];
				let assemblyFormat = "$base `[` $indices `]` `,` $val attr-dict `:` "
				"type($base) `,` type($val)";
				}

				//
				ftynse (inline comment, marked Done): Nit: something went wrong with whitespace
				// Tile arithmetic operations.
				//

				def TileMulFOp : AMX_Op<"tile_mulf", [NoSideEffect, AllTypesMatch<["acc", "res"]>]> {
				let summary = "tile multiplication operation (floating-point)";
				let description = [{
				Multiplies an m x k tile with a k x n tile and accumulates the results
				into an m x n destination tile. Supports f32 <- bf16 x bf16 (with
				pairs of bf16). The operation is eventually lowered into the
				'tdpbf16ps' instruction with the corresponding tile configuration.

				Example:

				```mlir
				%0 = amx.tile_mulf %a, %b, %c
				: vector<16x32xbf16>, vector<16x32xbf16>, vector<16x16xf32>
				```
				}];
				let verifier = [{ return ::verify(*this); }];
				let arguments = (ins VectorOfRankAndType<[2], [F32, BF16]>:$lhs,
				VectorOfRankAndType<[2], [F32, BF16]>:$rhs,
				VectorOfRankAndType<[2], [F32, BF16]>:$acc);
				let results = (outs VectorOfRankAndType<[2], [F32, BF16]>:$res);
				let extraClassDeclaration = [{
				VectorType getLhsVectorType() {
				return lhs().getType().cast<VectorType>();
				}
				VectorType getRhsVectorType() {
				return rhs().getType().cast<VectorType>();
				}
				VectorType getVectorType() {
				return res().getType().cast<VectorType>();
				}
				}];
				let assemblyFormat = "$lhs `,` $rhs `,` $acc attr-dict `:` "
				"type($lhs) `,` type($rhs) `,` type($acc) ";
				}

				def TileMulIOp : AMX_Op<"tile_muli", [NoSideEffect, AllTypesMatch<["acc", "res"]>]> {
				let summary = "tile multiplication operation (integer)";
				let description = [{
				Multiplies an m x k tile with a k x n tile and accumulates the results
				into an m x n destination tile. Supports all si32 <- s/ui8 x s/ui8
				combinations (4 bytes packed into dwords in the columns of both the
				source operand tiles; the zero or sign extension is specified with
				the attributes). The operation is eventually lowered into one of
				the 'tdpbssd', 'tdpbsud', 'tdpbusd', or 'tdpbuud' instructions with
				the corresponding tile configuration.

				Example:

				```mlir
				%0 = amx.tile_muli %a, %b, %c [true, true]
				: vector<16x64xi8>, vector<16x64xi8>, vector<16x16xi32>
				```
				}];
				let verifier = [{ return ::verify(*this); }];
				let arguments = (ins VectorOfRankAndType<[2], [I32, I8]>:$lhs,
				VectorOfRankAndType<[2], [I32, I8]>:$rhs,
				VectorOfRankAndType<[2], [I32, I8]>:$acc,
				BoolArrayAttr:$zext);
				let results = (outs VectorOfRankAndType<[2], [I32, I8]>:$res);
				let extraClassDeclaration = [{
				VectorType getLhsVectorType() {
				return lhs().getType().cast<VectorType>();
				}
				VectorType getRhsVectorType() {
				return rhs().getType().cast<VectorType>();
				}
				VectorType getVectorType() {
				return res().getType().cast<VectorType>();
				}
				}];
				let assemblyFormat = "$lhs `,` $rhs `,` $acc $zext attr-dict `:` "
				"type($lhs) `,` type($rhs) `,` type($acc) ";
				}
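For readers unfamiliar with the packed layout, the accumulation described for the integer tile multiplication can be modeled with scalar loops. The sketch below is an illustration only and not code from this revision: `tileMulIRef`, `checkTileMulIExample`, and the flat row-major buffers are hypothetical, and only the signed x signed case (the one lowered to 'tdpbssd') is modeled, assuming the dword-packed dot-product semantics described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar model of a signed x signed integer tile multiply.
// a: m x (4*k) bytes, b: k x (4*n) bytes, c: m x n dwords, all row-major.
// Each i32 element of c accumulates k dot products of 4 packed bytes.
void tileMulIRef(const std::vector<std::int8_t> &a,
                 const std::vector<std::int8_t> &b,
                 std::vector<std::int32_t> &c, unsigned m, unsigned n,
                 unsigned k) {
  assert(a.size() == std::size_t(m) * 4 * k);
  assert(b.size() == std::size_t(k) * 4 * n);
  assert(c.size() == std::size_t(m) * n);
  for (unsigned i = 0; i < m; ++i)
    for (unsigned j = 0; j < n; ++j)
      for (unsigned l = 0; l < k; ++l)
        for (unsigned p = 0; p < 4; ++p)
          c[i * n + j] += std::int32_t(a[i * 4 * k + 4 * l + p]) *
                          std::int32_t(b[l * 4 * n + 4 * j + p]);
}

// Small worked example: 1 x 4 bytes times 1 x 8 bytes (two packed dwords)
// accumulated into 1 x 2 dwords.
void checkTileMulIExample() {
  std::vector<std::int8_t> a{1, 1, 1, 1};
  std::vector<std::int8_t> b{1, 2, 3, 4, -1, -1, -1, -1};
  std::vector<std::int32_t> c{100, 0};
  tileMulIRef(a, b, c, /*m=*/1, /*n=*/2, /*k=*/1);
  assert(c[0] == 110 && c[1] == -4);
}
```

In these terms, the `vector<16x64xi8>` operands and `vector<16x16xi32>` accumulator of the example correspond to m = 16, n = 16, and k = 16.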

				//===----------------------------------------------------------------------===//
				// AMX IntrOp definitions (LLVM compiler facing).
				//===----------------------------------------------------------------------===//

				//
				// Tile reset.
				//

				def LLVM_x86_amx_tilezero : AMX_IntrOp<"tilezero", 1>,
				Arguments<(ins LLVM_Type, LLVM_Type)>;

				//
				// Tile memory operations.
				//

				def LLVM_x86_amx_tileloadd64 : AMX_IntrOp<"tileloadd64", 1>,
				Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

				def LLVM_x86_amx_tilestored64 : AMX_IntrOp<"tilestored64", 0>,
				Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

				//
				// Tile multiplication operations (series of dot products).
				//

				def LLVM_x86_amx_tdpbf16ps : AMX_IntrOp<"tdpbf16ps", 1>,
				Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

				def LLVM_x86_amx_tdpbssd : AMX_IntrOp<"tdpbssd", 1>,
				Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

				def LLVM_x86_amx_tdpbsud : AMX_IntrOp<"tdpbsud", 1>,
				Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

				def LLVM_x86_amx_tdpbusd : AMX_IntrOp<"tdpbusd", 1>,
				Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

				def LLVM_x86_amx_tdpbuud : AMX_IntrOp<"tdpbuud", 1>,
				Arguments<(ins LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type, LLVM_Type)>;

				#endif // AMX_OPS
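The description of the integer tile multiplication says the two extension attributes decide which of the four dot-product intrinsics is used. A minimal sketch of such a selection follows. It is not taken from this revision: `selectTileMulIntrinsic` is a hypothetical helper, and it assumes the flags are ordered (lhs, rhs), that `true` means zero extension, and that the first s/u letter in the instruction name refers to the first source operand.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: pick the dot-product intrinsic name from the two
// `zext` flags (assumed order: lhs, rhs; true means zero extension).
std::string selectTileMulIntrinsic(bool zextLhs, bool zextRhs) {
  if (zextLhs && zextRhs)
    return "tdpbuud"; // unsigned x unsigned
  if (zextLhs)
    return "tdpbusd"; // unsigned x signed
  if (zextRhs)
    return "tdpbsud"; // signed x unsigned
  return "tdpbssd";   // signed x signed
}

// The four flag combinations map to four distinct intrinsics.
void checkSelectionExamples() {
  assert(selectTileMulIntrinsic(false, false) == "tdpbssd");
  assert(selectTileMulIntrinsic(false, true) == "tdpbsud");
  assert(selectTileMulIntrinsic(true, false) == "tdpbusd");
  assert(selectTileMulIntrinsic(true, true) == "tdpbuud");
}
```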

mlir/include/mlir/Dialect/AMX/AMXDialect.h

This file was added.

				//===- AMXDialect.h - MLIR Dialect for AMX ----------------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file declares the Target dialect for AMX in MLIR.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_DIALECT_AMX_AMXDIALECT_H_
				#define MLIR_DIALECT_AMX_AMXDIALECT_H_

				#include "mlir/IR/BuiltinTypes.h"
				#include "mlir/IR/Dialect.h"
				#include "mlir/IR/OpDefinition.h"
				#include "mlir/Interfaces/SideEffectInterfaces.h"

				#include "mlir/Dialect/AMX/AMXDialect.h.inc"

				#define GET_OP_CLASSES
				#include "mlir/Dialect/AMX/AMX.h.inc"

				#endif // MLIR_DIALECT_AMX_AMXDIALECT_H_

mlir/include/mlir/Dialect/AMX/CMakeLists.txt

This file was added.

				add_mlir_dialect(AMX amx)
				add_mlir_doc(AMX -gen-dialect-doc AMX Dialects/)

				set(LLVM_TARGET_DEFINITIONS AMX.td)
				mlir_tablegen(AMXConversions.inc -gen-llvmir-conversions)
				add_public_tablegen_target(MLIRAMXConversionsIncGen)

mlir/include/mlir/Dialect/AMX/Transforms.h

This file was added.

				//===- Transforms.h - AMX Dialect Transformation Entrypoints ----- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_DIALECT_AMX_TRANSFORMS_H
				#define MLIR_DIALECT_AMX_TRANSFORMS_H

				namespace mlir {

				class LLVMConversionTarget;
				class LLVMTypeConverter;
				class OwningRewritePatternList;

				/// Collect a set of patterns to lower AMX ops to ops that map to LLVM
				/// intrinsics.
				void populateAMXLegalizeForLLVMExportPatterns(
				LLVMTypeConverter &converter, OwningRewritePatternList &patterns);

				/// Configure the target to support lowering AMX ops to ops that map to LLVM
				/// intrinsics.
				void configureAMXLegalizeForExportTarget(LLVMConversionTarget &target);

				} // namespace mlir

				#endif // MLIR_DIALECT_AMX_TRANSFORMS_H

mlir/include/mlir/Dialect/CMakeLists.txt

	add_subdirectory(Affine)			add_subdirectory(Affine)
	add_subdirectory(Async)			add_subdirectory(Async)
	add_subdirectory(ArmNeon)			add_subdirectory(ArmNeon)
	add_subdirectory(ArmSVE)			add_subdirectory(ArmSVE)
				add_subdirectory(AMX)
	add_subdirectory(AVX512)			add_subdirectory(AVX512)
	add_subdirectory(Complex)			add_subdirectory(Complex)
	add_subdirectory(GPU)			add_subdirectory(GPU)
	add_subdirectory(Math)			add_subdirectory(Math)
	add_subdirectory(Linalg)			add_subdirectory(Linalg)
	add_subdirectory(LLVMIR)			add_subdirectory(LLVMIR)
	add_subdirectory(OpenACC)			add_subdirectory(OpenACC)
	add_subdirectory(OpenMP)			add_subdirectory(OpenMP)
	[... 10 lines not shown ...]

mlir/include/mlir/InitAllDialects.h

	//===- InitAllDialects.h - MLIR Dialects Registration ------------ C++ --===//			//===- InitAllDialects.h - MLIR Dialects Registration ------------ C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file defines a helper to trigger the registration of all dialects and			// This file defines a helper to trigger the registration of all dialects and
	// passes to the system.			// passes to the system.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef MLIR_INITALLDIALECTS_H_			#ifndef MLIR_INITALLDIALECTS_H_
	#define MLIR_INITALLDIALECTS_H_			#define MLIR_INITALLDIALECTS_H_

				#include "mlir/Dialect/AMX/AMXDialect.h"
	#include "mlir/Dialect/AVX512/AVX512Dialect.h"			#include "mlir/Dialect/AVX512/AVX512Dialect.h"
	#include "mlir/Dialect/Affine/IR/AffineOps.h"			#include "mlir/Dialect/Affine/IR/AffineOps.h"
	#include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"			#include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"
	#include "mlir/Dialect/ArmSVE/ArmSVEDialect.h"			#include "mlir/Dialect/ArmSVE/ArmSVEDialect.h"
	#include "mlir/Dialect/Async/IR/Async.h"			#include "mlir/Dialect/Async/IR/Async.h"
	#include "mlir/Dialect/Complex/IR/Complex.h"			#include "mlir/Dialect/Complex/IR/Complex.h"
	#include "mlir/Dialect/GPU/GPUDialect.h"			#include "mlir/Dialect/GPU/GPUDialect.h"
	#include "mlir/Dialect/LLVMIR/LLVMArmSVEDialect.h"			#include "mlir/Dialect/LLVMIR/LLVMArmSVEDialect.h"
	[... 19 lines not shown ...]

	namespace mlir {			namespace mlir {

	/// Add all the MLIR dialects to the provided registry.			/// Add all the MLIR dialects to the provided registry.
	inline void registerAllDialects(DialectRegistry &registry) {			inline void registerAllDialects(DialectRegistry &registry) {
	// clang-format off			// clang-format off
	registry.insert<acc::OpenACCDialect,			registry.insert<acc::OpenACCDialect,
	AffineDialect,			AffineDialect,
				amx::AMXDialect,
	arm_neon::ArmNeonDialect,			arm_neon::ArmNeonDialect,
	async::AsyncDialect,			async::AsyncDialect,
	avx512::AVX512Dialect,			avx512::AVX512Dialect,
	complex::ComplexDialect,			complex::ComplexDialect,
	gpu::GPUDialect,			gpu::GPUDialect,
	LLVM::LLVMDialect,			LLVM::LLVMDialect,
	LLVM::LLVMArmSVEDialect,			LLVM::LLVMArmSVEDialect,
	linalg::LinalgDialect,			linalg::LinalgDialect,
	[... 29 lines not shown ...]

mlir/include/mlir/Target/LLVMIR/Dialect/AMX/AMXToLLVMIRTranslation.h

This file was added.

				//===- AMXToLLVMIRTranslation.h - AMX to LLVM IR ----------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This provides registration calls for AMX dialect to LLVM IR translation.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_TARGET_LLVMIR_DIALECT_AMX_AMXTOLLVMIRTRANSLATION_H
				#define MLIR_TARGET_LLVMIR_DIALECT_AMX_AMXTOLLVMIRTRANSLATION_H

				namespace mlir {

				class DialectRegistry;
				class MLIRContext;

				/// Register the AMX dialect and the translation from it to the LLVM IR
				/// in the given registry.
				void registerAMXDialectTranslation(DialectRegistry &registry);

				/// Register the AMX dialect and the translation from it in the registry
				/// associated with the given context.
				void registerAMXDialectTranslation(MLIRContext &context);

				} // namespace mlir

				#endif // MLIR_TARGET_LLVMIR_DIALECT_AMX_AMXTOLLVMIRTRANSLATION_H

mlir/include/mlir/Target/LLVMIR/Dialect/All.h

	//===- All.h - MLIR To LLVM IR Translation Registration ---------- C++ --===//			//===- All.h - MLIR To LLVM IR Translation Registration ---------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file defines a helper to register the translations of all suitable			// This file defines a helper to register the translations of all suitable
	// dialects to LLVM IR.			// dialects to LLVM IR.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef MLIR_TARGET_LLVMIR_DIALECT_ALL_H			#ifndef MLIR_TARGET_LLVMIR_DIALECT_ALL_H
	#define MLIR_TARGET_LLVMIR_DIALECT_ALL_H			#define MLIR_TARGET_LLVMIR_DIALECT_ALL_H

				#include "mlir/Target/LLVMIR/Dialect/AMX/AMXToLLVMIRTranslation.h"
	#include "mlir/Target/LLVMIR/Dialect/AVX512/AVX512ToLLVMIRTranslation.h"			#include "mlir/Target/LLVMIR/Dialect/AVX512/AVX512ToLLVMIRTranslation.h"
	#include "mlir/Target/LLVMIR/Dialect/ArmNeon/ArmNeonToLLVMIRTranslation.h"			#include "mlir/Target/LLVMIR/Dialect/ArmNeon/ArmNeonToLLVMIRTranslation.h"
	#include "mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h"			#include "mlir/Target/LLVMIR/Dialect/LLVMArmSVE/LLVMArmSVEToLLVMIRTranslation.h"
	#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"			#include "mlir/Target/LLVMIR/Dialect/LLVMIR/LLVMToLLVMIRTranslation.h"
	#include "mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h"			#include "mlir/Target/LLVMIR/Dialect/NVVM/NVVMToLLVMIRTranslation.h"
	#include "mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h"			#include "mlir/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.h"
	#include "mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h"			#include "mlir/Target/LLVMIR/Dialect/ROCDL/ROCDLToLLVMIRTranslation.h"

	namespace mlir {			namespace mlir {
	class DialectRegistry;			class DialectRegistry;

	/// Registers all dialects that can be translated to LLVM IR and the			/// Registers all dialects that can be translated to LLVM IR and the
	/// corresponding translation interfaces.			/// corresponding translation interfaces.
	static inline void registerAllToLLVMIRTranslations(DialectRegistry &registry) {			static inline void registerAllToLLVMIRTranslations(DialectRegistry &registry) {
	registerArmNeonDialectTranslation(registry);			registerArmNeonDialectTranslation(registry);
				registerAMXDialectTranslation(registry);
	registerAVX512DialectTranslation(registry);			registerAVX512DialectTranslation(registry);
	registerLLVMArmSVEDialectTranslation(registry);			registerLLVMArmSVEDialectTranslation(registry);
	registerLLVMDialectTranslation(registry);			registerLLVMDialectTranslation(registry);
	registerNVVMDialectTranslation(registry);			registerNVVMDialectTranslation(registry);
	registerOpenMPDialectTranslation(registry);			registerOpenMPDialectTranslation(registry);
	registerROCDLDialectTranslation(registry);			registerROCDLDialectTranslation(registry);
	}			}
	} // namespace mlir			} // namespace mlir

	#endif // MLIR_TARGET_LLVMIR_DIALECT_ALL_H			#endif // MLIR_TARGET_LLVMIR_DIALECT_ALL_H

mlir/lib/Conversion/PassDetail.h

	[... 24 lines not shown ...]

	namespace gpu {			namespace gpu {
	class GPUDialect;			class GPUDialect;
	class GPUModuleOp;			class GPUModuleOp;
	} // end namespace gpu			} // end namespace gpu

	namespace LLVM {			namespace LLVM {
	class LLVMArmSVEDialect;			class LLVMArmSVEDialect;
	class LLVMAVX512Dialect;
	class LLVMDialect;			class LLVMDialect;
	} // end namespace LLVM			} // end namespace LLVM

	namespace NVVM {			namespace NVVM {
	class NVVMDialect;			class NVVMDialect;
	} // end namespace NVVM			} // end namespace NVVM

	namespace omp {			namespace omp {
	[... 37 lines not shown ...]

mlir/lib/Conversion/VectorToLLVM/CMakeLists.txt

	add_mlir_conversion_library(MLIRVectorToLLVM			add_mlir_conversion_library(MLIRVectorToLLVM
	ConvertVectorToLLVM.cpp			ConvertVectorToLLVM.cpp
	ConvertVectorToLLVMPass.cpp			ConvertVectorToLLVMPass.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/VectorToLLVM			${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/VectorToLLVM

	DEPENDS			DEPENDS
	MLIRConversionPassIncGen			MLIRConversionPassIncGen
	intrinsics_gen			intrinsics_gen

	LINK_COMPONENTS			LINK_COMPONENTS
	Core			Core

	LINK_LIBS PUBLIC			LINK_LIBS PUBLIC
	MLIRArmNeon			MLIRArmNeon
				MLIRAMX
				MLIRAMXTransforms
	MLIRAVX512			MLIRAVX512
	MLIRAVX512Transforms			MLIRAVX512Transforms
	MLIRArmSVE			MLIRArmSVE
	MLIRArmSVEToLLVM			MLIRArmSVEToLLVM
	MLIRLLVMArmSVE			MLIRLLVMArmSVE
	MLIRLLVMIR			MLIRLLVMIR
	MLIRStandardToLLVM			MLIRStandardToLLVM
	MLIRTargetLLVMIRExport			MLIRTargetLLVMIRExport
	MLIRTransforms			MLIRTransforms
	MLIRVector			MLIRVector
	)			)

mlir/lib/Conversion/VectorToLLVM/ConvertVectorToLLVMPass.cpp

//===- VectorToLLVM.cpp - Conversion from Vector to the LLVM dialect ------===//		//===- VectorToLLVM.cpp - Conversion from Vector to the LLVM dialect ------===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h"		#include "mlir/Conversion/VectorToLLVM/ConvertVectorToLLVM.h"

#include "../PassDetail.h"		#include "../PassDetail.h"

#include "mlir/Conversion/ArmSVEToLLVM/ArmSVEToLLVM.h"		#include "mlir/Conversion/ArmSVEToLLVM/ArmSVEToLLVM.h"
#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h"		#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h"
#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h"		#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h"
		#include "mlir/Dialect/AMX/AMXDialect.h"
		#include "mlir/Dialect/AMX/Transforms.h"
#include "mlir/Dialect/AVX512/AVX512Dialect.h"		#include "mlir/Dialect/AVX512/AVX512Dialect.h"
#include "mlir/Dialect/AVX512/Transforms.h"		#include "mlir/Dialect/AVX512/Transforms.h"
#include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"		#include "mlir/Dialect/ArmNeon/ArmNeonDialect.h"
#include "mlir/Dialect/ArmSVE/ArmSVEDialect.h"		#include "mlir/Dialect/ArmSVE/ArmSVEDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMArmSVEDialect.h"		#include "mlir/Dialect/LLVMIR/LLVMArmSVEDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"		#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/StandardOps/IR/Ops.h"		#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/Dialect/Vector/VectorOps.h"		#include "mlir/Dialect/Vector/VectorOps.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"		#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;		using namespace mlir;
using namespace mlir::vector;		using namespace mlir::vector;

namespace {		namespace {
struct LowerVectorToLLVMPass		struct LowerVectorToLLVMPass
: public ConvertVectorToLLVMBase<LowerVectorToLLVMPass> {		: public ConvertVectorToLLVMBase<LowerVectorToLLVMPass> {
LowerVectorToLLVMPass(const LowerVectorToLLVMOptions &options) {		LowerVectorToLLVMPass(const LowerVectorToLLVMOptions &options) {
this->reassociateFPReductions = options.reassociateFPReductions;		this->reassociateFPReductions = options.reassociateFPReductions;
this->enableIndexOptimizations = options.enableIndexOptimizations;		this->enableIndexOptimizations = options.enableIndexOptimizations;
this->enableArmNeon = options.enableArmNeon;		this->enableArmNeon = options.enableArmNeon;
this->enableArmSVE = options.enableArmSVE;		this->enableArmSVE = options.enableArmSVE;
		this->enableAMX = options.enableAMX;
this->enableAVX512 = options.enableAVX512;		this->enableAVX512 = options.enableAVX512;
}		}
// Override explicitly to allow conditional dialect dependence.		// Override explicitly to allow conditional dialect dependence.
void getDependentDialects(DialectRegistry &registry) const override {		void getDependentDialects(DialectRegistry &registry) const override {
registry.insert<LLVM::LLVMDialect>();		registry.insert<LLVM::LLVMDialect>();
if (enableArmNeon)		if (enableArmNeon)
registry.insert<arm_neon::ArmNeonDialect>();		registry.insert<arm_neon::ArmNeonDialect>();
if (enableArmSVE)		if (enableArmSVE)
registry.insert<LLVM::LLVMArmSVEDialect>();		registry.insert<LLVM::LLVMArmSVEDialect>();
		if (enableAMX)
		registry.insert<amx::AMXDialect>();
if (enableAVX512)		if (enableAVX512)
registry.insert<avx512::AVX512Dialect>();		registry.insert<avx512::AVX512Dialect>();
}		}
void runOnOperation() override;		void runOnOperation() override;
};		};
} // namespace		} // namespace

void LowerVectorToLLVMPass::runOnOperation() {		void LowerVectorToLLVMPass::runOnOperation() {
[... 43 lines not shown; the elided region ends inside the `if (enableArmSVE) {` block ...]
});		});
target.addDynamicallyLegalOp<CallOp, CallIndirectOp, ReturnOp>(		target.addDynamicallyLegalOp<CallOp, CallIndirectOp, ReturnOp>(
[hasScalableVectorType](Operation *op) {		[hasScalableVectorType](Operation *op) {
return !hasScalableVectorType(op->getOperandTypes()) &&		return !hasScalableVectorType(op->getOperandTypes()) &&
!hasScalableVectorType(op->getResultTypes());		!hasScalableVectorType(op->getResultTypes());
});		});
populateArmSVEToLLVMConversionPatterns(converter, patterns);		populateArmSVEToLLVMConversionPatterns(converter, patterns);
}		}
		if (enableAMX) {
		configureAMXLegalizeForExportTarget(target);
		populateAMXLegalizeForLLVMExportPatterns(converter, patterns);
		}
if (enableAVX512) {		if (enableAVX512) {
configureAVX512LegalizeForExportTarget(target);		configureAVX512LegalizeForExportTarget(target);
populateAVX512LegalizeForLLVMExportPatterns(converter, patterns);		populateAVX512LegalizeForLLVMExportPatterns(converter, patterns);
}		}

if (failed(		if (failed(
applyPartialConversion(getOperation(), target, std::move(patterns))))		applyPartialConversion(getOperation(), target, std::move(patterns))))
signalPassFailure();		signalPassFailure();
}		}

std::unique_ptr<OperationPass<ModuleOp>>		std::unique_ptr<OperationPass<ModuleOp>>
mlir::createConvertVectorToLLVMPass(const LowerVectorToLLVMOptions &options) {		mlir::createConvertVectorToLLVMPass(const LowerVectorToLLVMOptions &options) {
return std::make_unique<LowerVectorToLLVMPass>(options);		return std::make_unique<LowerVectorToLLVMPass>(options);
}		}

mlir/lib/Dialect/AMX/CMakeLists.txt

This file was added.

				add_subdirectory(IR)
				add_subdirectory(Transforms)

mlir/lib/Dialect/AMX/IR/AMXDialect.cpp

This file was added.

				//===- AMXOps.cpp - MLIR AMX ops implementation ---------------------------===//
				//
				ftynse (inline comment, marked Done): AMXDialect.cpp
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements the AMX dialect and its operations.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/AMX/AMXDialect.h"
				#include "mlir/Dialect/LLVMIR/LLVMTypes.h"
				#include "mlir/IR/Builders.h"
				#include "mlir/IR/OpImplementation.h"
				#include "mlir/IR/TypeUtilities.h"

				using namespace mlir;

				void amx::AMXDialect::initialize() {
				addOperations<
				#define GET_OP_LIST
				#include "mlir/Dialect/AMX/AMX.cpp.inc"
				>();
				}

				/// Verify that AMX supports the implied tile shape.
				static LogicalResult verifyTileSize(VectorType tp) {
				const unsigned kMaxRows = 16;
				const unsigned kBitsPerRow = 64 * 8;
				unsigned col = tp.getDimSize(1) * tp.getElementType().getIntOrFloatBitWidth();
				if (tp.getDimSize(0) > kMaxRows || col > kBitsPerRow)
				return failure();
				if (col & 0x1f)
				return failure(); // should be multiple of 4 bytes
				return success();
				}
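The tile-size rule above is easy to check by hand: at most 16 rows, at most 64 bytes per row, and a row width that is a multiple of 4 bytes. The following sketch restates it on plain integers so the shapes used in the op examples can be tried directly. `isSupportedTile` and `checkTileSizeExamples` are hypothetical names that are not part of this revision.

```cpp
#include <cassert>

// Standalone restatement of the tile-size rule on plain integers
// (hypothetical helper). rows/cols are the two vector dimensions and
// elemBits is the element bit width.
bool isSupportedTile(unsigned rows, unsigned cols, unsigned elemBits) {
  const unsigned kMaxRows = 16;
  const unsigned kBitsPerRow = 64 * 8;
  unsigned rowBits = cols * elemBits;
  if (rows > kMaxRows || rowBits > kBitsPerRow)
    return false;
  return (rowBits & 0x1f) == 0; // row must be a multiple of 4 bytes
}

// Worked examples based on the shapes used in the op descriptions.
void checkTileSizeExamples() {
  assert(isSupportedTile(16, 64, 8));  // vector<16x64xi8>: 512 bits per row
  assert(isSupportedTile(16, 32, 16)); // vector<16x32xbf16>
  assert(isSupportedTile(16, 16, 32)); // vector<16x16xf32>
  assert(!isSupportedTile(17, 64, 8)); // too many rows
  assert(!isSupportedTile(16, 65, 8)); // row wider than 64 bytes
  assert(!isSupportedTile(16, 2, 8));  // 16 bits: not a multiple of 4 bytes
}
```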

				/// Verify that AMX supports the multiplication.
				static LogicalResult verifyMultShape(VectorType atp, VectorType btp,
				VectorType ctp, unsigned scale) {
				unsigned am = atp.getDimSize(0), ak = atp.getDimSize(1) >> scale;
				unsigned bk = btp.getDimSize(0), bn = btp.getDimSize(1) >> scale;
				unsigned cm = ctp.getDimSize(0), cn = ctp.getDimSize(1);
				if (cm != am || cn != bn || ak != bk)
				return failure();
				nicolasvasilache (inline comment, marked Not Done): I'd go for nicer error messages here: the ops expect a certain vector layout so spelling the error a bit more would be useful.
				return success();
				}

				static LogicalResult verify(amx::TileZeroOp op) {
				if (failed(verifyTileSize(op.getVectorType())))
				return op.emitOpError("unsupported tile size");
				return success();
				nicolasvasilache (inline comment, marked Not Done): If you moved these emitError inside the function performing the verification, we could have nicer error messages.
				}

				static LogicalResult verify(amx::TileLoadOp op) {
				if (failed(verifyTileSize(op.getVectorType())))
				return op.emitOpError("unsupported tile size");
				return success();
				}

				static LogicalResult verify(amx::TileStoreOp op) {
				if (failed(verifyTileSize(op.getVectorType())))
				return op.emitOpError("unsupported tile size");
				return success();
				}

				static LogicalResult verify(amx::TileMulFOp op) {
				VectorType aType = op.getLhsVectorType();
				VectorType bType = op.getRhsVectorType();
				VectorType cType = op.getVectorType();
				if (failed(verifyMultShape(aType, bType, cType, 1)))
				return op.emitOpError("unexpected shape");
				if (failed(verifyTileSize(aType)) || failed(verifyTileSize(bType)) ||
				failed(verifyTileSize(cType)))
				return op.emitOpError("unsupported tile size");
				Type ta = aType.getElementType();
				Type tb = bType.getElementType();
				Type tc = cType.getElementType();
				if (ta.isBF16() && tb.isBF16() && tc.isF32())
				return success();
				return op.emitOpError("unsupported type combination");
				}

				static LogicalResult verify(amx::TileMulIOp op) {
				if (op.zext().size() != 2)
				return op.emitOpError("unexpected zext length");
				VectorType aType = op.getLhsVectorType();
				VectorType bType = op.getRhsVectorType();
				VectorType cType = op.getVectorType();
				if (failed(verifyMultShape(aType, bType, cType, 2)))
				return op.emitOpError("unexpected shape");
				if (failed(verifyTileSize(aType)) || failed(verifyTileSize(bType)) ||
				failed(verifyTileSize(cType)))
				return op.emitOpError("unsupported tile size");
				Type ta = aType.getElementType();
				Type tb = bType.getElementType();
				Type tc = cType.getElementType();
				if (ta.isInteger(8) && tb.isInteger(8) && tc.isInteger(32))
				return success();
				return op.emitOpError("unsupported type combination");
				}

				#define GET_OP_CLASSES
				#include "mlir/Dialect/AMX/AMX.cpp.inc"
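The inline comments in this file ask for more descriptive diagnostics from the shape verification. One possible direction, sketched here on plain integers rather than MLIR types, is to have the check itself report which dimension disagrees. `checkMultShape` and `checkMultShapeExamples` are hypothetical names and the message wording is only a suggestion; `scale` is 1 for bf16 pairs and 2 for groups of four bytes, as in the verifiers above.

```cpp
#include <cassert>
#include <string>

// Hypothetical variant of the multiplication shape check that explains the
// mismatch. Returns an empty string when the shapes are compatible.
std::string checkMultShape(unsigned aRows, unsigned aCols, unsigned bRows,
                           unsigned bCols, unsigned cRows, unsigned cCols,
                           unsigned scale) {
  unsigned ak = aCols >> scale; // packed reduction length of the lhs
  unsigned bn = bCols >> scale; // packed column count of the rhs
  if (cRows != aRows)
    return "result has " + std::to_string(cRows) + " rows but lhs has " +
           std::to_string(aRows);
  if (cCols != bn)
    return "result has " + std::to_string(cCols) +
           " columns but packed rhs provides " + std::to_string(bn);
  if (ak != bRows)
    return "packed lhs reduction size " + std::to_string(ak) +
           " does not match rhs rows " + std::to_string(bRows);
  return "";
}

// Shapes from the op examples pass; a wrong accumulator shape is reported.
void checkMultShapeExamples() {
  // vector<16x64xi8>, vector<16x64xi8>, vector<16x16xi32>
  assert(checkMultShape(16, 64, 16, 64, 16, 16, 2).empty());
  // vector<16x32xbf16>, vector<16x32xbf16>, vector<16x16xf32>
  assert(checkMultShape(16, 32, 16, 32, 16, 16, 1).empty());
  // Accumulator with too few rows.
  assert(!checkMultShape(16, 64, 16, 64, 8, 16, 2).empty());
}
```

In the dialect itself, such a message would presumably be passed to `emitOpError` instead of being returned as a string.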

mlir/lib/Dialect/AMX/IR/CMakeLists.txt

This file was added.

				add_mlir_dialect_library(MLIRAMX
				AMXDialect.cpp

				ADDITIONAL_HEADER_DIRS
				${MLIR_MAIN_INCLUDE_DIR}/mlir/Dialect/AMX

				DEPENDS
				MLIRAMXIncGen

				LINK_LIBS PUBLIC
				MLIRIR
				MLIRLLVMIR
				MLIRSideEffectInterfaces
				)

mlir/lib/Dialect/AMX/Transforms/CMakeLists.txt

This file was added.

				add_mlir_dialect_library(MLIRAMXTransforms
				LegalizeForLLVMExport.cpp

				DEPENDS
				MLIRAMXConversionsIncGen

				LINK_LIBS PUBLIC
				MLIRAMX
				MLIRIR
				MLIRLLVMIR
				MLIRStandardToLLVM
				)

mlir/lib/Dialect/AMX/Transforms/LegalizeForLLVMExport.cpp

This file was added.

				//===- LegalizeForLLVMExport.cpp - Prepare AMX for LLVM translation ----===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Dialect/AMX/Transforms.h"

				#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVM.h"
				#include "mlir/Dialect/AMX/AMXDialect.h"
				#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
				#include "mlir/Dialect/StandardOps/IR/Ops.h"
				#include "mlir/IR/BuiltinOps.h"
				#include "mlir/IR/PatternMatch.h"

				using namespace mlir;
				using namespace mlir::amx;

				namespace {

				/// Maps the 2-dim vector shape to the two 16-bit tile sizes. The first
				/// dimension directly translates into the number of rows of the tiles.
				/// The second dimension needs to be scaled by the number of bytes.
				void getTileSizes(ConversionPatternRewriter &rewriter,
				LLVMTypeConverter &typeConverter, VectorType vType,
				Location loc, Value &m, Value &n) {
				Type llvmInt16Type = IntegerType::get(&typeConverter.getContext(), 16);
				unsigned bytes = vType.getElementType().getIntOrFloatBitWidth() >> 3;
				auto mattr = rewriter.getI16IntegerAttr(vType.getDimSize(0));
				auto nattr = rewriter.getI16IntegerAttr(vType.getDimSize(1) * bytes);
				m = rewriter.create<LLVM::ConstantOp>(loc, llvmInt16Type, mattr);
				n = rewriter.create<LLVM::ConstantOp>(loc, llvmInt16Type, nattr);
				}
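As a quick arithmetic check of the mapping above: the row count is taken from the first vector dimension unchanged, while the column count is converted to bytes. The sketch below models this on plain integers. `tileSizes` and `checkTileSizesExamples` are hypothetical names and the code is independent of MLIR.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical plain-integer model of the tile-size mapping: returns
// {rows, bytes per row} for a rows x cols vector of elemBits-wide elements.
std::pair<std::uint16_t, std::uint16_t> tileSizes(unsigned rows, unsigned cols,
                                                  unsigned elemBits) {
  unsigned bytes = elemBits >> 3; // element size in bytes
  return std::make_pair(static_cast<std::uint16_t>(rows),
                        static_cast<std::uint16_t>(cols * bytes));
}

// All three example shapes describe a full 16-row, 64-byte-wide tile.
void checkTileSizesExamples() {
  assert(tileSizes(16, 64, 8) == std::make_pair(std::uint16_t(16), std::uint16_t(64)));
  assert(tileSizes(16, 32, 16) == std::make_pair(std::uint16_t(16), std::uint16_t(64)));
  assert(tileSizes(16, 16, 32) == std::make_pair(std::uint16_t(16), std::uint16_t(64)));
}
```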

				/// Maps the 2-dim memref shape to the 64-bit stride. Note that the buffer
				/// shape may "envelop" the actual tile shape, and may be dynamically sized.
				Value getStride(ConversionPatternRewriter &rewriter,
				LLVMTypeConverter &typeConverter, MemRefType mType, Value base,
				Location loc) {
				unsigned last = mType.getShape().size();
				assert(last >= 2);
				Type llvmInt64Type = IntegerType::get(&typeConverter.getContext(), 64);
				unsigned bytes = mType.getElementType().getIntOrFloatBitWidth() >> 3;
				if (mType.isDynamicDim(last - 1)) {
				// Dynamic size needs code to compute the stride at runtime.
				MemRefDescriptor memrefDescriptor(base);
				auto attr = rewriter.getI64IntegerAttr(bytes);
				Value scale = rewriter.create<LLVM::ConstantOp>(loc, llvmInt64Type, attr);
				return rewriter.create<LLVM::MulOp>(
				loc, llvmInt64Type, scale,
				memrefDescriptor.size(rewriter, loc, last - 1));
				}
				// Use direct constant for static size.
				auto attr = rewriter.getI64IntegerAttr(mType.getDimSize(last - 1) * bytes);
				return rewriter.create<LLVM::ConstantOp>(loc, llvmInt64Type, attr);
				}

				/// Cast any pointer to the !llvm.ptr<i8> pointer type.
				Value castPtr(ConversionPatternRewriter &rewriter,
				LLVMTypeConverter &typeConverter, Location loc, Value ptr) {
				auto i8Ptr = LLVM::LLVMPointerType::get(
				IntegerType::get(&typeConverter.getContext(), 8));
				return rewriter.create<LLVM::BitcastOp>(loc, i8Ptr, ptr);
				}

				struct TileZeroConversion : public ConvertOpToLLVMPattern<TileZeroOp> {
				using ConvertOpToLLVMPattern<TileZeroOp>::ConvertOpToLLVMPattern;
				LogicalResult
				matchAndRewrite(TileZeroOp op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				VectorType vType = op.getVectorType();
				// Determine m x n tile sizes.
				Value m, n;
				getTileSizes(rewriter, *getTypeConverter(), vType, op.getLoc(), m, n);
				// Replace operation with intrinsic.
				Type resType = typeConverter->convertType(vType);
				rewriter.replaceOpWithNewOp<amx::x86_amx_tilezero>(op, resType, m, n);
				return success();
				}
				};

				struct TileLoadConversion : public ConvertOpToLLVMPattern<TileLoadOp> {
				using ConvertOpToLLVMPattern<TileLoadOp>::ConvertOpToLLVMPattern;

				LogicalResult
				matchAndRewrite(TileLoadOp op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				TileLoadOp::Adaptor adaptor(operands);
				MemRefType mType = op.getMemRefType();
				VectorType vType = op.getVectorType();
				// Determine m x n tile sizes.
				Value m, n;
				getTileSizes(rewriter, *getTypeConverter(), vType, op.getLoc(), m, n);
				// Replace operation with intrinsic.
				Value stride = getStride(rewriter, *getTypeConverter(), mType,
				adaptor.base(), op.getLoc());
				Value ptr = getStridedElementPtr(op.getLoc(), mType, adaptor.base(),
				adaptor.indices(), rewriter);
				ptr = castPtr(rewriter, *getTypeConverter(), op.getLoc(), ptr);
				Type resType = typeConverter->convertType(vType);
				rewriter.replaceOpWithNewOp<amx::x86_amx_tileloadd64>(op, resType, m, n,
				ptr, stride);
				return success();
				}
				};

				struct TileStoreConversion : public ConvertOpToLLVMPattern<TileStoreOp> {
				using ConvertOpToLLVMPattern<TileStoreOp>::ConvertOpToLLVMPattern;

				LogicalResult
				matchAndRewrite(TileStoreOp op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				TileStoreOp::Adaptor adaptor(operands);
				MemRefType mType = op.getMemRefType();
				VectorType vType = op.getVectorType();
				// Determine m x n tile sizes.
				Value m, n;
				getTileSizes(rewriter, *getTypeConverter(), vType, op.getLoc(), m, n);
				// Replace operation with intrinsic.
				Value stride = getStride(rewriter, *getTypeConverter(), mType,
				adaptor.base(), op.getLoc());
				Value ptr = getStridedElementPtr(op.getLoc(), mType, adaptor.base(),
				adaptor.indices(), rewriter);
				ptr = castPtr(rewriter, *getTypeConverter(), op.getLoc(), ptr);
				rewriter.replaceOpWithNewOp<amx::x86_amx_tilestored64>(
				op, m, n, ptr, stride, adaptor.val());
				return success();
				}
				};

				struct TileMulFConversion : public ConvertOpToLLVMPattern<TileMulFOp> {
				using ConvertOpToLLVMPattern<TileMulFOp>::ConvertOpToLLVMPattern;
				LogicalResult
				matchAndRewrite(TileMulFOp op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				TileMulFOp::Adaptor adaptor(operands);
				VectorType aType = op.getLhsVectorType();
				VectorType cType = op.getVectorType();
				// Determine m x n x k tile sizes.
				Value m, n, k;
				getTileSizes(rewriter, *getTypeConverter(), cType, op.getLoc(), m, n);
				getTileSizes(rewriter, *getTypeConverter(), aType, op.getLoc(), m, k);
				// Replace operation with intrinsic.
				Type resType = typeConverter->convertType(cType);
				rewriter.replaceOpWithNewOp<amx::x86_amx_tdpbf16ps>(
				op, resType, m, n, k, adaptor.acc(), adaptor.lhs(), adaptor.rhs());
				return success();
				}
				};

				struct TileMulIConversion : public ConvertOpToLLVMPattern<TileMulIOp> {
				using ConvertOpToLLVMPattern<TileMulIOp>::ConvertOpToLLVMPattern;
				LogicalResult
				matchAndRewrite(TileMulIOp op, ArrayRef<Value> operands,
				ConversionPatternRewriter &rewriter) const override {
				TileMulIOp::Adaptor adaptor(operands);
				VectorType aType = op.getLhsVectorType();
				VectorType bType = op.getRhsVectorType();
				VectorType cType = op.getVectorType();
				// Determine m x n x k tile sizes.
				Value m, n, k, k2;
				getTileSizes(rewriter, *getTypeConverter(), aType, op.getLoc(), m, k);
				getTileSizes(rewriter, *getTypeConverter(), bType, op.getLoc(), k2, n);
				// Replace operation with intrinsic.
				Type resType = typeConverter->convertType(cType);
				bool zexta = op.zext()[0].cast<BoolAttr>().getValue();
				bool zextb = op.zext()[1].cast<BoolAttr>().getValue();
				if (zexta && zextb)
				rewriter.replaceOpWithNewOp<amx::x86_amx_tdpbuud>(
				op, resType, m, n, k, adaptor.acc(), adaptor.lhs(), adaptor.rhs());
				else if (zexta && !zextb)
				rewriter.replaceOpWithNewOp<amx::x86_amx_tdpbusd>(
				op, resType, m, n, k, adaptor.acc(), adaptor.lhs(), adaptor.rhs());
				else if (!zexta && zextb)
				rewriter.replaceOpWithNewOp<amx::x86_amx_tdpbsud>(
				op, resType, m, n, k, adaptor.acc(), adaptor.lhs(), adaptor.rhs());
				else
				rewriter.replaceOpWithNewOp<amx::x86_amx_tdpbssd>(
				op, resType, m, n, k, adaptor.acc(), adaptor.lhs(), adaptor.rhs());
				return success();
				}
				};

				} // namespace

				void mlir::populateAMXLegalizeForLLVMExportPatterns(
				LLVMTypeConverter &converter, OwningRewritePatternList &patterns) {
				// Registry::registerPatterns(converter, patterns);
				bondhugula: Drop commented out code? (Done)
				patterns.insert<TileZeroConversion, TileLoadConversion, TileStoreConversion,
				TileMulFConversion, TileMulIConversion>(converter);
				}

				void mlir::configureAMXLegalizeForExportTarget(LLVMConversionTarget &target) {
				// Registry::configureTarget(target);
				bondhugula: Drop commented out code? (Done)
				target.addLegalOp<x86_amx_tilezero, x86_amx_tileloadd64, x86_amx_tilestored64,
				x86_amx_tdpbf16ps, x86_amx_tdpbssd, x86_amx_tdpbsud,
				x86_amx_tdpbusd, x86_amx_tdpbuud>();
				target.addIllegalOp<TileZeroOp, TileLoadOp, TileStoreOp, TileMulIOp,
				TileMulFOp>();
				}
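
The size arithmetic in getTileSizes and getStride above can be sketched without any MLIR types. This is a minimal standalone model, not part of the patch: the names TileSizes, tileSizes, and strideBytes are hypothetical, and plain integers stand in for the LLVM constants the patterns create. The dynamic-size branch of getStride computes the same product at runtime from the memref descriptor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two i16 constants produced by getTileSizes.
struct TileSizes {
  std::uint16_t rows;     // first vector dimension, used as-is
  std::uint16_t colBytes; // second vector dimension scaled to bytes
};

// Mirrors getTileSizes: bytes = element bit width >> 3.
TileSizes tileSizes(std::int64_t dim0, std::int64_t dim1, unsigned elemBits) {
  assert(elemBits % 8 == 0 && "element width must be a whole number of bytes");
  unsigned bytes = elemBits >> 3;
  return {static_cast<std::uint16_t>(dim0),
          static_cast<std::uint16_t>(dim1 * bytes)};
}

// Mirrors the static branch of getStride: innermost memref dimension
// multiplied by the element size in bytes.
std::int64_t strideBytes(std::int64_t innermostDim, unsigned elemBits) {
  return innermostDim * static_cast<std::int64_t>(elemBits >> 3);
}
```

Under this model, vector<16x16xi32>, vector<16x64xi8>, and vector<16x32xbf16> all map to 16 rows of 64 bytes, and a buffer with 19 i32 elements per row has a stride of 76 bytes.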

mlir/lib/Dialect/CMakeLists.txt

	add_subdirectory(Affine)			add_subdirectory(Affine)
	add_subdirectory(ArmNeon)			add_subdirectory(ArmNeon)
	add_subdirectory(ArmSVE)			add_subdirectory(ArmSVE)
	add_subdirectory(Async)			add_subdirectory(Async)
				add_subdirectory(AMX)
	add_subdirectory(AVX512)			add_subdirectory(AVX512)
	add_subdirectory(Complex)			add_subdirectory(Complex)
	add_subdirectory(GPU)			add_subdirectory(GPU)
	add_subdirectory(Linalg)			add_subdirectory(Linalg)
	add_subdirectory(LLVMIR)			add_subdirectory(LLVMIR)
	add_subdirectory(Math)			add_subdirectory(Math)
	add_subdirectory(OpenACC)			add_subdirectory(OpenACC)
	add_subdirectory(OpenMP)			add_subdirectory(OpenMP)

mlir/lib/Target/LLVMIR/CMakeLists.txt

Show All 31 Lines	add_mlir_translation_library(MLIRTargetLLVMIRExport
MLIRTranslation		MLIRTranslation
)		)

add_mlir_translation_library(MLIRToLLVMIRTranslationRegistration		add_mlir_translation_library(MLIRToLLVMIRTranslationRegistration
ConvertToLLVMIR.cpp		ConvertToLLVMIR.cpp

LINK_LIBS PUBLIC		LINK_LIBS PUBLIC
MLIRArmNeonToLLVMIRTranslation		MLIRArmNeonToLLVMIRTranslation
		MLIRAMXToLLVMIRTranslation
MLIRAVX512ToLLVMIRTranslation		MLIRAVX512ToLLVMIRTranslation
MLIRLLVMArmSVEToLLVMIRTranslation		MLIRLLVMArmSVEToLLVMIRTranslation
MLIRLLVMToLLVMIRTranslation		MLIRLLVMToLLVMIRTranslation
MLIRNVVMToLLVMIRTranslation		MLIRNVVMToLLVMIRTranslation
MLIROpenMPToLLVMIRTranslation		MLIROpenMPToLLVMIRTranslation
MLIRROCDLToLLVMIRTranslation		MLIRROCDLToLLVMIRTranslation
)		)


mlir/lib/Target/LLVMIR/Dialect/AMX/AMXToLLVMIRTranslation.cpp

This file was added.

				//===- AMXToLLVMIRTranslation.cpp - Translate AMX to LLVM IR --------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file implements a translation between the AMX dialect and LLVM IR.
				//
				//===----------------------------------------------------------------------===//

				#include "mlir/Target/LLVMIR/Dialect/AMX/AMXToLLVMIRTranslation.h"
				#include "mlir/Dialect/AMX/AMXDialect.h"
				#include "mlir/IR/Operation.h"
				#include "mlir/Target/LLVMIR/ModuleTranslation.h"

				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/IntrinsicsX86.h"

				using namespace mlir;
				using namespace mlir::LLVM;

				namespace {
				/// Implementation of the dialect interface that converts operations belonging
				/// to the AMX dialect to LLVM IR.
				class AMXDialectLLVMIRTranslationInterface
				: public LLVMTranslationDialectInterface {
				public:
				using LLVMTranslationDialectInterface::LLVMTranslationDialectInterface;

				/// Translates the given operation to LLVM IR using the provided IR builder
				/// and saving the state in `moduleTranslation`.
				LogicalResult
				convertOperation(Operation *op, llvm::IRBuilderBase &builder,
				LLVM::ModuleTranslation &moduleTranslation) const final {
				Operation &opInst = *op;
				#include "mlir/Dialect/AMX/AMXConversions.inc"

				return failure();
				}
				};
				} // end namespace

				void mlir::registerAMXDialectTranslation(DialectRegistry &registry) {
				registry.insert<amx::AMXDialect>();
				registry.addDialectInterface<amx::AMXDialect,
				AMXDialectLLVMIRTranslationInterface>();
				}

				void mlir::registerAMXDialectTranslation(MLIRContext &context) {
				DialectRegistry registry;
				registerAMXDialectTranslation(registry);
				context.appendDialectRegistry(registry);
				}

mlir/lib/Target/LLVMIR/Dialect/AMX/CMakeLists.txt

This file was added.

				add_mlir_translation_library(MLIRAMXToLLVMIRTranslation
				AMXToLLVMIRTranslation.cpp

				DEPENDS
				MLIRAMXConversionsIncGen

				LINK_COMPONENTS
				Core

				LINK_LIBS PUBLIC
				MLIRIR
				MLIRAMX
				MLIRLLVMIR
				MLIRSupport
				MLIRTargetLLVMIRExport
				)

mlir/lib/Target/LLVMIR/Dialect/CMakeLists.txt

	add_subdirectory(ArmNeon)			add_subdirectory(ArmNeon)
				add_subdirectory(AMX)
	add_subdirectory(AVX512)			add_subdirectory(AVX512)
	add_subdirectory(LLVMArmSVE)			add_subdirectory(LLVMArmSVE)
	add_subdirectory(LLVMIR)			add_subdirectory(LLVMIR)
	add_subdirectory(NVVM)			add_subdirectory(NVVM)
	add_subdirectory(OpenMP)			add_subdirectory(OpenMP)
	add_subdirectory(ROCDL)			add_subdirectory(ROCDL)

mlir/test/CMakeLists.txt

	Show All 23 Lines
	# for the mlir rocm / spirv / vulkan runner tests.			# for the mlir rocm / spirv / vulkan runner tests.
	set(MLIR_ROCM_WRAPPER_LIBRARY_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})			set(MLIR_ROCM_WRAPPER_LIBRARY_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})
	set(MLIR_SPIRV_WRAPPER_LIBRARY_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})			set(MLIR_SPIRV_WRAPPER_LIBRARY_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})
	set(MLIR_VULKAN_WRAPPER_LIBRARY_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})			set(MLIR_VULKAN_WRAPPER_LIBRARY_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})

	if (MLIR_INCLUDE_INTEGRATION_TESTS)			if (MLIR_INCLUDE_INTEGRATION_TESTS)
	set(INTEL_SDE_EXECUTABLE "" CACHE STRING			set(INTEL_SDE_EXECUTABLE "" CACHE STRING
	"If set, arch-specific integration tests are run with Intel SDE.")			"If set, arch-specific integration tests are run with Intel SDE.")
				option(MLIR_RUN_AMX_TESTS "Run AMX tests.")
	option(MLIR_RUN_AVX512_TESTS "Run AVX512 tests.")			option(MLIR_RUN_AVX512_TESTS "Run AVX512 tests.")
	# Passed to lit.site.cfg.py.in to set up the path where to find the libraries.			# Passed to lit.site.cfg.py.in to set up the path where to find the libraries.
	set(MLIR_INTEGRATION_TEST_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})			set(MLIR_INTEGRATION_TEST_DIR ${CMAKE_LIBRARY_OUTPUT_DIRECTORY})

	# Copy test data over.			# Copy test data over.
	file(COPY ${CMAKE_CURRENT_SOURCE_DIR}/Integration/data/test.mtx			file(COPY ${CMAKE_CURRENT_SOURCE_DIR}/Integration/data/test.mtx
	${CMAKE_CURRENT_SOURCE_DIR}/Integration/data/test.tns			${CMAKE_CURRENT_SOURCE_DIR}/Integration/data/test.tns
	DESTINATION ${MLIR_INTEGRATION_TEST_DIR}/data/)			DESTINATION ${MLIR_INTEGRATION_TEST_DIR}/data/)

mlir/test/Dialect/AMX/invalid.mlir

This file was added.

				// RUN: mlir-opt %s -split-input-file -verify-diagnostics

				// -----

				func @rowsize() {
				// expected-error@+1 {{'amx.tile_zero' op unsupported tile size}}
				%0 = amx.tile_zero : vector<17x16xbf16>
				}

				// -----

				func @colsize(%arg0: memref<?x?xi8>) {
				%0 = constant 0 : index
				// expected-error@+1 {{'amx.tile_load' op unsupported tile size}}
				%1 = amx.tile_load %arg0[%0, %0] : memref<?x?xi8> into vector<16x65xi8>
				}

				// -----

				func @multsize() {
				%0 = amx.tile_zero : vector<8x8xbf16>
				%1 = amx.tile_zero : vector<8x8xbf16>
				%2 = amx.tile_zero : vector<4x4xf32>
				// expected-error@+1 {{'amx.tile_mulf' op unexpected shape}}
				%3 = amx.tile_mulf %0, %1, %2 : vector<8x8xbf16>, vector<8x8xbf16>, vector<4x4xf32>
				}

				// -----

				func @zextsize() {
				%0 = amx.tile_zero : vector<8x8xi8>
				%1 = amx.tile_zero : vector<8x8xi8>
				%2 = amx.tile_zero : vector<8x8xi32>
				// expected-error@+1 {{'amx.tile_muli' op unexpected zext length}}
				%3 = amx.tile_muli %0, %1, %2 [true] : vector<8x8xi8>, vector<8x8xi8>, vector<8x8xi32>
				}
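
The first two cases above are consistent with the architectural tile limit of 16 rows by 64 bytes per row. The check below is a rough sketch under that assumption only: the dialect's actual verifier is not shown in this part of the diff and may enforce more than these two bounds. The name isSupportedTile is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// An AMX tile register holds at most 16 rows of 64 bytes each.
constexpr std::int64_t kMaxRows = 16;
constexpr std::int64_t kMaxColBytes = 64;

// Hypothetical helper: true if a rows x cols vector of elemBits-wide
// elements fits within a single tile. Only the upper bounds are modelled.
bool isSupportedTile(std::int64_t rows, std::int64_t cols, unsigned elemBits) {
  assert(elemBits % 8 == 0 && "element width must be a whole number of bytes");
  std::int64_t colBytes = cols * static_cast<std::int64_t>(elemBits >> 3);
  return rows <= kMaxRows && colBytes <= kMaxColBytes;
}
```

With this sketch, vector<17x16xbf16> fails on the row count and vector<16x65xi8> fails on the 65-byte row width, matching the two "unsupported tile size" diagnostics.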

mlir/test/Dialect/AMX/legalize-for-llvm.mlir

This file was added.

				// RUN: mlir-opt %s -convert-vector-to-llvm="enable-amx" | mlir-opt | FileCheck %s

				// CHECK-LABEL: muli(
				// CHECK: amx.tilezero
				// CHECK: amx.tileloadd64
				// CHECK: amx.tileloadd64
				// CHECK: amx.tdpbuud
				// CHECK: amx.tilestored64
				// CHECK: amx.tdpbssd
				// CHECK: amx.tilestored64
				// CHECK: amx.tdpbusd
				// CHECK: amx.tilestored64
				// CHECK: amx.tdpbsud
				// CHECK: amx.tilestored64
				func @muli(%arg0: memref<?x?xi8>, %arg1: memref<?x?xi32>) {
				%0 = constant 0 : index
				%1 = amx.tile_zero : vector<16x64xi8>
				%2 = amx.tile_load %arg0[%0, %0] : memref<?x?xi8> into vector<16x64xi8>
				%3 = amx.tile_load %arg1[%0, %0] : memref<?x?xi32> into vector<16x16xi32>
				%4 = amx.tile_muli %1, %2, %3 [true, true] : vector<16x64xi8>, vector<16x64xi8>, vector<16x16xi32>
				amx.tile_store %arg1[%0, %0], %4 : memref<?x?xi32>, vector<16x16xi32>
				%5 = amx.tile_muli %1, %2, %3 [false, false] : vector<16x64xi8>, vector<16x64xi8>, vector<16x16xi32>
				amx.tile_store %arg1[%0, %0], %5 : memref<?x?xi32>, vector<16x16xi32>
				%6 = amx.tile_muli %1, %2, %3 [true, false] : vector<16x64xi8>, vector<16x64xi8>, vector<16x16xi32>
				amx.tile_store %arg1[%0, %0], %6 : memref<?x?xi32>, vector<16x16xi32>
				%7 = amx.tile_muli %1, %2, %3 [false, true] : vector<16x64xi8>, vector<16x64xi8>, vector<16x16xi32>
				amx.tile_store %arg1[%0, %0], %7 : memref<?x?xi32>, vector<16x16xi32>
				return
				}

				// CHECK-LABEL: mulf(
				// CHECK: amx.tilezero
				// CHECK: amx.tileloadd64
				// CHECK: amx.tileloadd64
				// CHECK: amx.tdpbf16ps
				// CHECK: amx.tilestored64
				func @mulf(%arg0: memref<?x?xbf16>, %arg1: memref<?x?xf32>) {
				%0 = constant 0 : index
				%1 = amx.tile_zero : vector<16x32xbf16>
				%2 = amx.tile_load %arg0[%0, %0] : memref<?x?xbf16> into vector<16x32xbf16>
				%3 = amx.tile_load %arg1[%0, %0] : memref<?x?xf32> into vector<16x16xf32>
				%4 = amx.tile_mulf %1, %2, %3 : vector<16x32xbf16>, vector<16x32xbf16>, vector<16x16xf32>
				amx.tile_store %arg1[%0, %0], %4 : memref<?x?xf32>, vector<16x16xf32>
				return
				}
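
The order of the tdpb* CHECK lines in @muli follows from how TileMulIConversion maps the two zext flags to an intrinsic op. A small standalone sketch of that selection, where the function name selectMulIIntrinsic is hypothetical and the op names are returned as strings purely for illustration:

```cpp
#include <cassert>
#include <string>

// Returns the op name chosen for a [zext lhs, zext rhs] attribute pair,
// following the if/else chain in TileMulIConversion::matchAndRewrite.
std::string selectMulIIntrinsic(bool zextLhs, bool zextRhs) {
  if (zextLhs && zextRhs)
    return "amx.tdpbuud"; // unsigned x unsigned
  if (zextLhs)
    return "amx.tdpbusd"; // unsigned x signed
  if (zextRhs)
    return "amx.tdpbsud"; // signed x unsigned
  return "amx.tdpbssd";   // signed x signed
}
```

The test issues [true, true], [false, false], [true, false], [false, true] in that order, which yields tdpbuud, tdpbssd, tdpbusd, tdpbsud.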

mlir/test/Dialect/AMX/roundtrip.mlir

This file was added.

				// RUN: mlir-opt -verify-diagnostics %s | mlir-opt | FileCheck %s

				// CHECK-LABEL: tzero
				// CHECK: amx.tile_zero : vector<16x16xbf16>
				// CHECK: amx.tile_store %{{.*}}[%{{.*}}, %{{.*}}], %{{.*}} : memref<?x?xbf16>, vector<16x16xbf16>
				func @tzero(%arg0: memref<?x?xbf16>) {
				%0 = constant 0 : index
				%1 = amx.tile_zero : vector<16x16xbf16>
				amx.tile_store %arg0[%0, %0], %1 : memref<?x?xbf16>, vector<16x16xbf16>
				return
				}

				// CHECK-LABEL: tmulf
				// CHECK: %[[x:.*]] = amx.tile_load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xbf16> into vector<16x32xbf16>
				// CHECK: %[[z:.*]] = amx.tile_load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xf32> into vector<16x16xf32>
				// CHECK: %[[m:.*]] = amx.tile_mulf %[[x]], %[[x]], %[[z]] : vector<16x32xbf16>, vector<16x32xbf16>, vector<16x16xf32>
				// CHECK: amx.tile_store %{{.*}}[%{{.*}}, %{{.*}}], %[[m]] : memref<?x?xf32>, vector<16x16xf32>
				func @tmulf(%arg0: memref<?x?xbf16>, %arg1: memref<?x?xf32>) {
				%0 = constant 0 : index
				%1 = amx.tile_load %arg0[%0, %0] : memref<?x?xbf16> into vector<16x32xbf16>
				%2 = amx.tile_load %arg1[%0, %0] : memref<?x?xf32> into vector<16x16xf32>
				%3 = amx.tile_mulf %1, %1, %2 : vector<16x32xbf16>, vector<16x32xbf16>, vector<16x16xf32>
				amx.tile_store %arg1[%0, %0], %3 : memref<?x?xf32>, vector<16x16xf32>
				return
				}

				// CHECK-LABEL: tmuli
				// CHECK: %[[x:.*]] = amx.tile_load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xi8> into vector<16x64xi8>
				// CHECK: %[[y:.*]] = amx.tile_load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xi8> into vector<16x64xi8>
				// CHECK: %[[z:.*]] = amx.tile_load %{{.*}}[%{{.*}}, %{{.*}}] : memref<?x?xi32> into vector<16x16xi32>
				// CHECK: %[[m:.*]] = amx.tile_muli %[[x]], %[[y]], %[[z]] [true, true] : vector<16x64xi8>, vector<16x64xi8>, vector<16x16xi32>
				// CHECK: amx.tile_store %{{.*}}[%{{.*}}, %{{.*}}], %[[m]] : memref<?x?xi32>, vector<16x16xi32>
				func @tmuli(%arg0: memref<?x?xi8>, %arg1: memref<?x?xi8>, %arg2: memref<?x?xi32>) {
				%0 = constant 0 : index
				%1 = amx.tile_load %arg0[%0, %0] : memref<?x?xi8> into vector<16x64xi8>
				%2 = amx.tile_load %arg1[%0, %0] : memref<?x?xi8> into vector<16x64xi8>
				%3 = amx.tile_load %arg2[%0, %0] : memref<?x?xi32> into vector<16x16xi32>
				%4 = amx.tile_muli %1, %2, %3 [true, true] : vector<16x64xi8>, vector<16x64xi8>, vector<16x16xi32>
				amx.tile_store %arg2[%0, %0], %4 : memref<?x?xi32>, vector<16x16xi32>
				return
				}

mlir/test/Integration/Dialect/Vector/CPU/AMX/lit.local.cfg

This file was added.

				import sys

				# AMX tests must be enabled via build flag.
				if config.mlir_run_amx_tests != 'ON':
				config.unsupported = True

				# No JIT on win32.
				if sys.platform == 'win32':
				config.unsupported = True

				if config.intel_sde_executable:
				# Run test in emulator (Intel SDE): AMX needs Sapphire Rapids CPU.
				config.substitutions.append(('%lli', config.intel_sde_executable + ' -spr -- lli'))
				else:
				config.substitutions.append(('%lli', 'lli'))

mlir/test/Integration/Dialect/Vector/CPU/AMX/test-mulf.mlir

This file was added.

				// RUN: mlir-opt %s -convert-vector-to-scf -lower-affine -convert-scf-to-std -convert-vector-to-llvm="enable-amx" -convert-std-to-llvm | \
				// RUN: mlir-translate -mlir-to-llvmir | \
				// RUN: %lli --entry-function=entry --mattr="+amx-tile,+amx-int8,+amx-bf16" --dlopen=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
				// RUN: FileCheck %s

				// Note: To run this test, your CPU must support AMX.

				// Multiply into zeroed destination.
				func @kernel1(%arg0: memref<2x4xbf16>,
				%arg1: memref<2x4xbf16>,
				%arg2: memref<2x2xf32>) {
				%0 = constant 0 : index
				%1 = amx.tile_load %arg0[%0, %0] : memref<2x4xbf16> into vector<2x4xbf16>
				%2 = amx.tile_load %arg1[%0, %0] : memref<2x4xbf16> into vector<2x4xbf16>
				%3 = amx.tile_zero : vector<2x2xf32>
				%4 = amx.tile_mulf %1, %2, %3 : vector<2x4xbf16>, vector<2x4xbf16>, vector<2x2xf32>
				amx.tile_store %arg2[%0, %0], %4 : memref<2x2xf32>, vector<2x2xf32>
				return
				}

				// Multiply and update into destination.
				func @kernel2(%arg0: memref<2x4xbf16>,
				%arg1: memref<2x4xbf16>,
				%arg2: memref<2x2xf32>) {
				%0 = constant 0 : index
				%1 = amx.tile_load %arg0[%0, %0] : memref<2x4xbf16> into vector<2x4xbf16>
				%2 = amx.tile_load %arg1[%0, %0] : memref<2x4xbf16> into vector<2x4xbf16>
				%3 = amx.tile_load %arg2[%0, %0] : memref<2x2xf32> into vector<2x2xf32>
				%4 = amx.tile_mulf %1, %2, %3 : vector<2x4xbf16>, vector<2x4xbf16>, vector<2x2xf32>
				amx.tile_store %arg2[%0, %0], %4 : memref<2x2xf32>, vector<2x2xf32>
				return
				}

				func @entry() {
				%f0 = constant 0.0: f32
				%c0 = constant 0: index
				%c1 = constant 1: index
				%c2 = constant 2: index

				// Set up memory.
				%a = alloc() : memref<2x4xbf16>
				%b = alloc() : memref<2x4xbf16>
				%c = alloc() : memref<2x2xf32>

				%0 = std.constant dense<[[1.0, 2.0, 3.0, 4.0 ],
				[5.0, 6.0, 7.0, 8.0 ]]> : vector<2x4xbf16>
				vector.transfer_write %0, %a[%c0, %c0] : vector<2x4xbf16>, memref<2x4xbf16>
				%1 = std.constant dense<[[ 9.0, 10.0, 11.0, 12.0 ],
				[13.0, 14.0, 15.0, 16.0 ]]> : vector<2x4xbf16>
				vector.transfer_write %1, %b[%c0, %c0] : vector<2x4xbf16>, memref<2x4xbf16>

				// Call kernel.
				call @kernel1(%a, %b, %c) : (memref<2x4xbf16>, memref<2x4xbf16>, memref<2x2xf32>) -> ()

				// Print and verify.
				//
				// CHECK: ( 124, 144 )
				// CHECK-NEXT: ( 308, 360 )
				scf.for %i = %c0 to %c2 step %c1 {
				%av = vector.transfer_read %c[%i, %c0], %f0: memref<2x2xf32>, vector<2xf32>
				vector.print %av : vector<2xf32>
				}

				// Call kernel.
				call @kernel2(%a, %b, %c) : (memref<2x4xbf16>, memref<2x4xbf16>, memref<2x2xf32>) -> ()

				// Print and verify.
				//
				// CHECK-NEXT: ( 248, 288 )
				// CHECK-NEXT: ( 616, 720 )
				//
				scf.for %i = %c0 to %c2 step %c1 {
				%cv = vector.transfer_read %c[%i, %c0], %f0: memref<2x2xf32>, vector<2xf32>
				vector.print %cv : vector<2xf32>
				}

				// Release resources.
				dealloc %a : memref<2x4xbf16>
				dealloc %b : memref<2x4xbf16>
				dealloc %c : memref<2x2xf32>

				return
				}
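
The expected values in the CHECK lines of test-mulf.mlir can be reproduced with a scalar model of the bf16 pair dot product that the tile multiply performs: each f32 accumulator element sums products of adjacent bf16 pairs from a row of the left operand and a row of the right operand. The sketch below is specialised to the 2x4 by 2x4 into 2x2 shapes of this test. The names are hypothetical, and float stands in for bf16, which is exact for the small integers used here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Bf16Tile = std::array<std::array<float, 4>, 2>; // float stands in for bf16
using F32Tile = std::array<std::array<float, 2>, 2>;

// Inputs copied from @entry in test-mulf.mlir.
const Bf16Tile kA = {{{1.0f, 2.0f, 3.0f, 4.0f}, {5.0f, 6.0f, 7.0f, 8.0f}}};
const Bf16Tile kB = {{{9.0f, 10.0f, 11.0f, 12.0f}, {13.0f, 14.0f, 15.0f, 16.0f}}};

// c[m][n] += sum over k and i of a[m][2k+i] * b[k][2n+i], with i in {0, 1}.
void tileMulF(const Bf16Tile &a, const Bf16Tile &b, F32Tile &c) {
  for (std::size_t m = 0; m < 2; ++m)
    for (std::size_t n = 0; n < 2; ++n)
      for (std::size_t k = 0; k < 2; ++k)   // row of b, pair index within a row of a
        for (std::size_t i = 0; i < 2; ++i) // element within the bf16 pair
          c[m][n] += a[m][2 * k + i] * b[k][2 * n + i];
}
```

Applying tileMulF once to a zeroed accumulator corresponds to @kernel1; applying it a second time to the same accumulator corresponds to @kernel2, which doubles every entry.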

mlir/test/Integration/Dialect/Vector/CPU/AMX/test-muli.mlir

This file was added.

				// RUN: mlir-opt %s -convert-vector-to-scf -lower-affine -convert-scf-to-std -convert-vector-to-llvm="enable-amx" -convert-std-to-llvm | \
				// RUN: mlir-translate -mlir-to-llvmir | \
				// RUN: %lli --entry-function=entry --mattr="+amx-tile,+amx-int8,+amx-bf16" --dlopen=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
				// RUN: FileCheck %s

				// Note: To run this test, your CPU must support AMX.

				// Multiply into zeroed destination.
				func @kernel1(%arg0: memref<2x8xi8>,
				%arg1: memref<2x8xi8>,
				%arg2: memref<2x2xi32>) {
				%0 = constant 0 : index
				%1 = amx.tile_load %arg0[%0, %0] : memref<2x8xi8> into vector<2x8xi8>
				%2 = amx.tile_load %arg1[%0, %0] : memref<2x8xi8> into vector<2x8xi8>
				%3 = amx.tile_zero : vector<2x2xi32>
				%4 = amx.tile_muli %1, %2, %3 [true, true] : vector<2x8xi8>, vector<2x8xi8>, vector<2x2xi32>
				amx.tile_store %arg2[%0, %0], %4 : memref<2x2xi32>, vector<2x2xi32>
				return
				}

				// Multiply and update into destination.
				func @kernel2(%arg0: memref<2x8xi8>,
				%arg1: memref<2x8xi8>,
				%arg2: memref<2x2xi32>) {
				%0 = constant 0 : index
				%1 = amx.tile_load %arg0[%0, %0] : memref<2x8xi8> into vector<2x8xi8>
				%2 = amx.tile_load %arg1[%0, %0] : memref<2x8xi8> into vector<2x8xi8>
				%3 = amx.tile_load %arg2[%0, %0] : memref<2x2xi32> into vector<2x2xi32>
				%4 = amx.tile_muli %1, %2, %3 [true, true] : vector<2x8xi8>, vector<2x8xi8>, vector<2x2xi32>
				amx.tile_store %arg2[%0, %0], %4 : memref<2x2xi32>, vector<2x2xi32>
				return
				}

				func @entry() {
				%i0 = constant 0: i32
				%c0 = constant 0: index
				%c1 = constant 1: index
				%c2 = constant 2: index

				// Set up memory.
				%a = alloc() : memref<2x8xi8>
				%b = alloc() : memref<2x8xi8>
				%c = alloc() : memref<2x2xi32>

				%0 = std.constant dense<[[1 , 2, 3 , 4 , 5, 6, 7, 8],
				[9, 10, 11, 12, 13, 14, 15, 16]]> : vector<2x8xi8>
				vector.transfer_write %0, %a[%c0, %c0] : vector<2x8xi8>, memref<2x8xi8>
				%1 = std.constant dense<[[17, 18, 19, 20, 21, 22, 23, 24],
				[25, 26, 27, 28, 29, 30, 31, 32]]> : vector<2x8xi8>
				vector.transfer_write %1, %b[%c0, %c0] : vector<2x8xi8>, memref<2x8xi8>

				// Call kernel.
				call @kernel1(%a, %b, %c) : (memref<2x8xi8>, memref<2x8xi8>, memref<2x2xi32>) -> ()

				// Print and verify.
				//
				// CHECK: ( 884, 1028 )
				// CHECK-NEXT: ( 2324, 2724 )
				scf.for %i = %c0 to %c2 step %c1 {
				%av = vector.transfer_read %c[%i, %c0], %i0: memref<2x2xi32>, vector<2xi32>
				vector.print %av : vector<2xi32>
				}

				// Call kernel.
				call @kernel2(%a, %b, %c) : (memref<2x8xi8>, memref<2x8xi8>, memref<2x2xi32>) -> ()

				// Print and verify.
				//
				// CHECK-NEXT: ( 1768, 2056 )
				// CHECK-NEXT: ( 4648, 5448 )
				//
				scf.for %i = %c0 to %c2 step %c1 {
				%cv = vector.transfer_read %c[%i, %c0], %i0: memref<2x2xi32>, vector<2xi32>
				vector.print %cv : vector<2xi32>
				}

				// Release resources.
				dealloc %a : memref<2x8xi8>
				dealloc %b : memref<2x8xi8>
				dealloc %c : memref<2x2xi32>

				return
				}
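
The integer variant groups four i8 elements per i32 accumulator lane instead of two bf16 elements per f32 lane. A scalar sketch of the [true, true] (both operands zero-extended) case, specialised to the 2x8 by 2x8 into 2x2 shapes of test-muli.mlir; the type and function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using I8Tile = std::array<std::array<std::uint8_t, 8>, 2>;
using I32Tile = std::array<std::array<std::int32_t, 2>, 2>;

// Inputs copied from @entry in test-muli.mlir.
const I8Tile kLhs = {{{1, 2, 3, 4, 5, 6, 7, 8}, {9, 10, 11, 12, 13, 14, 15, 16}}};
const I8Tile kRhs = {{{17, 18, 19, 20, 21, 22, 23, 24}, {25, 26, 27, 28, 29, 30, 31, 32}}};

// c[m][n] += sum over k and i of zext(a[m][4k+i]) * zext(b[k][4n+i]), i in 0..3.
void tileMulI(const I8Tile &a, const I8Tile &b, I32Tile &c) {
  for (std::size_t m = 0; m < 2; ++m)
    for (std::size_t n = 0; n < 2; ++n)
      for (std::size_t k = 0; k < 2; ++k)   // row of b, quad index within a row of a
        for (std::size_t i = 0; i < 4; ++i) // element within the i8 quad
          c[m][n] += static_cast<std::int32_t>(a[m][4 * k + i]) *
                     static_cast<std::int32_t>(b[k][4 * n + i]);
}
```

As with the floating-point test, one application to a zeroed accumulator models @kernel1 and a second application to the same accumulator models @kernel2.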

mlir/test/Integration/Dialect/Vector/CPU/AMX/test-tilezero.mlir

This file was added.

				// RUN: mlir-opt %s -convert-vector-to-scf -lower-affine -convert-scf-to-std -convert-vector-to-llvm="enable-amx" -convert-std-to-llvm | \
				// RUN: mlir-translate -mlir-to-llvmir | \
				// RUN: %lli --entry-function=entry --mattr="+amx-tile,+amx-int8,+amx-bf16" --dlopen=%mlir_integration_test_dir/libmlir_c_runner_utils%shlibext | \
				// RUN: FileCheck %s

				// Note: To run this test, your CPU must support AMX.

				func @tilezero(%arg0: memref<?x?xi32>, %i: index, %j: index) {
				%1 = amx.tile_zero : vector<16x16xi32>
				amx.tile_store %arg0[%i, %j], %1 : memref<?x?xi32>, vector<16x16xi32>
				return
				}

				func @entry() {
				%i0 = constant 0: i32
				%i1 = constant 1: i32
				%c0 = constant 0: index
				%c1 = constant 1: index
				%c3 = constant 3: index
				%c19 = constant 19: index

				// Set up memory.
				%a = alloc(%c19, %c19) : memref<?x?xi32>
				scf.for %i = %c0 to %c19 step %c1 {
				scf.for %j = %c0 to %c19 step %c1 {
				store %i1, %a[%i, %j] : memref<?x?xi32>
				}
				}

				// Call kernel.
				call @tilezero(%a, %c1, %c1) : (memref<?x?xi32>, index, index) -> ()

				// Print and verify that the tilezero is correctly strided within
				// the enveloping 19x19 buffer.
				//
				// CHECK: ( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				nicolasvasilache: Haha I love these types of test. In polyhedral land, one would skew the tile and cut it at the boundaries: seeing the pattern is correct is a must. (Not Done)
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 )
				// CHECK-NEXT: ( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 )
				//
				scf.for %i = %c0 to %c19 step %c1 {
				%av = vector.transfer_read %a[%i, %c0], %i0: memref<?x?xi32>, vector<19xi32>
				vector.print %av : vector<19xi32>
				}

				// Call kernel with different indices.
				call @tilezero(%a, %c0, %c3) : (memref<?x?xi32>, index, index) -> ()

				// Print and verify that the tilezero is again correctly strided
				// within the enveloping 19x19 buffer.
				//
				// CHECK-NEXT: ( 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 )
				// CHECK-NEXT: ( 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 )
				// CHECK-NEXT: ( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 )
				// CHECK-NEXT: ( 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 )
				//
				scf.for %i = %c0 to %c19 step %c1 {
				%av = vector.transfer_read %a[%i, %c0], %i0: memref<?x?xi32>, vector<19xi32>
				vector.print %av : vector<19xi32>
				}

				// Release resources.
				dealloc %a : memref<?x?xi32>

				return
				}
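
The two printed patterns above come from storing a zeroed 16x16 tile at different offsets inside a 19x19 buffer of ones, where consecutive tile rows are one full buffer row (the stride) apart. A standalone sketch of that addressing, with hypothetical names and plain int in place of i32:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major 2-d buffer, standing in for the dynamically sized memref.
struct Buffer {
  std::size_t rows, cols;
  std::vector<int> data;
  Buffer(std::size_t r, std::size_t c, int fill)
      : rows(r), cols(c), data(r * c, fill) {}
  int &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  int at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Writes a zeroed tileRows x tileCols tile with its top-left corner at
// (row, col). Stepping by buf.cols between tile rows models the stride.
void storeZeroTile(Buffer &buf, std::size_t row, std::size_t col,
                   std::size_t tileRows = 16, std::size_t tileCols = 16) {
  assert(row + tileRows <= buf.rows && col + tileCols <= buf.cols);
  for (std::size_t r = 0; r < tileRows; ++r)
    for (std::size_t c = 0; c < tileCols; ++c)
      buf.at(row + r, col + c) = 0;
}

// Counts zero entries in one buffer row, for comparison with a CHECK line.
std::size_t zerosInRow(const Buffer &buf, std::size_t i) {
  std::size_t n = 0;
  for (std::size_t j = 0; j < buf.cols; ++j)
    n += (buf.at(i, j) == 0) ? 1 : 0;
  return n;
}
```

Calling storeZeroTile at (1, 1) and then at (0, 3) on a 19x19 buffer filled with ones reproduces the two blocks of CHECK lines: after the second call, rows 1 through 15 keep a single 1 in column 0, while row 16 still shows the trailing pair of ones left by the first call.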

mlir/test/Target/LLVMIR/amx.mlir

This file was added.

				// RUN: mlir-translate --mlir-to-llvmir %s | FileCheck %s

				// CHECK-LABEL: define void @target(i8* %0)
				// CHECK: %[[c:.*]] = call x86_amx @llvm.x86.tilezero.internal(i16 16, i16 16)
				// CHECK: call void @llvm.x86.tilestored64.internal(i16 16, i16 16, i8* %0, i64 32, x86_amx %[[c]]
				llvm.func @target(%ptr: !llvm.ptr<i8>) {
				%c = llvm.mlir.constant(16 : i16) : i16
				%s = llvm.mlir.constant(32 : i64) : i64
				%0 = "amx.tilezero"(%c, %c) : (i16, i16) -> !llvm.array<16 x vector<16xbf16>>
				"amx.tilestored64"(%c, %c, %ptr, %s, %0) : (i16, i16, !llvm.ptr<i8>, i64, !llvm.array<16 x vector<16xbf16>>) -> ()
				llvm.return
				}

mlir/test/lit.site.cfg.py.in

	config.enable_rocm_runner = @MLIR_ROCM_RUNNER_ENABLED@
	config.spirv_wrapper_library_dir = "@MLIR_SPIRV_WRAPPER_LIBRARY_DIR@"
	config.enable_spirv_cpu_runner = @MLIR_SPIRV_CPU_RUNNER_ENABLED@
	config.vulkan_wrapper_library_dir = "@MLIR_VULKAN_WRAPPER_LIBRARY_DIR@"
	config.enable_vulkan_runner = @MLIR_VULKAN_RUNNER_ENABLED@
	config.enable_bindings_python = @MLIR_BINDINGS_PYTHON_ENABLED@
	config.mlir_integration_test_dir = "@MLIR_INTEGRATION_TEST_DIR@"
	config.intel_sde_executable = "@INTEL_SDE_EXECUTABLE@"
				config.mlir_run_amx_tests = "@MLIR_RUN_AMX_TESTS@"
	config.mlir_run_avx512_tests = "@MLIR_RUN_AVX512_TESTS@"
	config.mlir_include_integration_tests = "@MLIR_INCLUDE_INTEGRATION_TESTS@"

	# Support substitution of the tools_dir with user parameters. This is
	# used when we can't determine the tool dir at configuration time.
	try:
	    config.llvm_tools_dir = config.llvm_tools_dir % lit_config.params
	    config.llvm_lib_dir = config.llvm_lib_dir % lit_config.params
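The new `config.mlir_run_amx_tests` entry only carries the CMake setting into lit; a directory-level config still has to act on it. A minimal sketch of such gating follows, assuming the value is substituted as the string "ON" when enabled and that an empty `intel_sde_executable` means the tests run natively. The function names, the `SimpleNamespace` stand-in for lit's config object, and the example emulator path are all hypothetical and not taken from this revision.

```python
from types import SimpleNamespace


def amx_tests_unsupported(config):
    """Return True when the AMX integration tests should be skipped."""
    # Assumption: CMake substitutes "ON" for an enabled MLIR_RUN_AMX_TESTS.
    return config.mlir_run_amx_tests != "ON"


def lli_command(config):
    """Return the command used to run JIT-compiled tests."""
    # Assumption: when an emulator path is configured, lli is launched under it
    # using the emulator's "<emulator> -- <program>" form; any chip-selection
    # flags the emulator needs are deliberately left out of this sketch.
    if config.intel_sde_executable:
        return config.intel_sde_executable + " -- lli"
    return "lli"


# Stand-ins for the object that lit populates from lit.site.cfg.py.
enabled = SimpleNamespace(mlir_run_amx_tests="ON", intel_sde_executable="/opt/sde/sde64")
disabled = SimpleNamespace(mlir_run_amx_tests="", intel_sde_executable="")

skip_enabled = amx_tests_unsupported(enabled)
skip_disabled = amx_tests_unsupported(disabled)
```

Treating every value other than "ON" as disabled keeps the tests off by default on machines without AMX hardware or an emulator.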

mlir/test/mlir-opt/commandline.mlir

	// RUN: mlir-opt --show-dialects | FileCheck %s
	// CHECK: Available Dialects:
	// CHECK-NEXT: acc
	// CHECK-NEXT: affine
				// CHECK-NEXT: amx
	// CHECK-NEXT: arm_neon
	// CHECK-NEXT: arm_sve
	// CHECK-NEXT: async
	// CHECK-NEXT: avx512
	// CHECK-NEXT: complex
	// CHECK-NEXT: gpu
	// CHECK-NEXT: linalg
	// CHECK-NEXT: llvm
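Because every dialect is matched with CHECK-NEXT, the new `amx` line has to sit exactly where the tool prints it. The visible names appear to be in lexicographic order, which would place `amx` between `affine` and `arm_neon`. The sketch below checks that reading under the assumption that the listing is sorted by name; the list is copied from the visible CHECK-NEXT lines only, so dialects in the collapsed part of the file are not included.

```python
import bisect

# Dialect names visible in the excerpt, before this revision adds "amx".
dialects = [
    "acc", "affine", "arm_neon", "arm_sve", "async",
    "avx512", "complex", "gpu", "linalg", "llvm",
]

# The visible portion of the existing list is already sorted.
was_sorted = dialects == sorted(dialects)

# Insert the new dialect at its sorted position.
bisect.insort(dialects, "amx")
position = dialects.index("amx")
```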


Diff 330416
