Details

Reviewers

ftynse
dcaballe
craig.topper

Commits

rGf39c2a114283: [mlir][llvm] Add vector insert/extract intrinsics

Summary

These intrinsics will be needed to convert between fixed-length vectors
and scalable vectors.

This operation will be needed for VLS (vector-length specific)
vectorization, when interfacing with vector functions or intrinsics that
take scalable vectors as operands in a context where the length of our
vectors is known or assumed at compile time, but we still want to
generate scalable vector instructions.
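
To make the intent concrete, here is a minimal sketch of what a subvector insert/extract does at the value level. It is only a model: a scalable vector is stood in for by a `std::vector` holding `vscale * minElts` elements, the helper names `insertSubvector` and `extractSubvector` are hypothetical, and out-of-range positions are rejected with an exception rather than reproducing the intrinsic's actual behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical model: copy `src` over `dst` starting at element `pos`
// and return the updated destination (the original is left untouched).
std::vector<int32_t> insertSubvector(std::vector<int32_t> dst,
                                     const std::vector<int32_t> &src,
                                     std::size_t pos) {
  if (pos + src.size() > dst.size())
    throw std::out_of_range("subvector does not fit in destination");
  for (std::size_t i = 0; i < src.size(); ++i)
    dst[pos + i] = src[i];
  return dst;
}

// Hypothetical model: return `numElts` consecutive elements of `src`
// starting at element `pos`.
std::vector<int32_t> extractSubvector(const std::vector<int32_t> &src,
                                      std::size_t numElts, std::size_t pos) {
  if (pos + numElts > src.size())
    throw std::out_of_range("subvector exceeds source");
  auto first = src.begin() + static_cast<std::ptrdiff_t>(pos);
  return std::vector<int32_t>(first,
                              first + static_cast<std::ptrdiff_t>(numElts));
}
```

With a destination of 8 elements (for example `vector<[4]xi32>` when `vscale` happens to be 2), inserting a 4-element vector at position 4 fills the upper half, and extracting 4 elements at position 4 reads it back.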

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jsetoain created this revision. Jun 6 2022, 4:05 AM

Herald added a reviewer: ftynse. Jun 6 2022, 4:05 AM

Herald added a project: Restricted Project.

Herald added subscribers: bzcheeseman, awarzynski, sdasgup3 and 21 others.

jsetoain requested review of this revision. Jun 6 2022, 4:05 AM

Herald added a project: Restricted Project. Jun 6 2022, 4:05 AM

Herald added subscribers: alextsao1999, stephenneuendorffer, nicolasvasilache.

jsetoain added a reviewer: dcaballe. Jun 6 2022, 4:06 AM

Harbormaster completed remote builds in B168011: Diff 434423. Jun 6 2022, 4:24 AM

Bad merge

Harbormaster completed remote builds in B168032: Diff 434448. Jun 6 2022, 6:52 AM

LGTM! A couple of comments

mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
440	%0 = llvm.intr.experimental.vector.insert %arg0, %arg2[0] : vector<8xf32> into vector<[4]xf32> I didn't know we could insert/extract larger multiples of the fixed part of the scalable type. That's interesting. I think we should add custom verifiers to make sure the fixed type is the same as the fixed part of the scalable type or a multiple of it. I guess we should also check for the upper bound of these multiples based on the variants defined in LLVM.
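
The check suggested in this comment can be sketched as a single predicate. This is only an illustration of the proposal as worded here, not the constraint that eventually landed (the final patch matches the LLVM intrinsic's own constraints instead); the function name `isMultipleOfScalableBase` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: true when a fixed-length vector with `fixedElts`
// elements has the same element count as the fixed part of
// <vscale x scalableMinElts x T>, or a whole multiple of it.
constexpr bool isMultipleOfScalableBase(uint64_t fixedElts,
                                        uint64_t scalableMinElts) {
  return fixedElts != 0 && scalableMinElts != 0 &&
         fixedElts % scalableMinElts == 0;
}

// The case quoted in the comment: vector<8xf32> into vector<[4]xf32>.
static_assert(isMultipleOfScalableBase(8, 4), "8 is a multiple of 4");
```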

dcaballe added a reviewer: craig.topper. Jun 6 2022, 12:04 PM

Herald added a subscriber: StephenFan. Jun 6 2022, 12:04 PM

jsetoain added inline comments. Jun 7 2022, 2:26 AM

mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
440	Yes! The idea is to be able to interface VLS code with VLA functions. I'm writing another patch for shape_cast that will verify that the cast makes sense. The easiest restriction, for instance, is that the fixed-length vector must be a multiple of the scalable one, but what "a multiple" means gets complicated quickly if we think about multi-rank vectors (I'm working on it). As for checking the upper bound, since that's going to be architecture-dependent and we need this to be a bit higher level, I believe we should defer that to the architecture-dependent stages of the pipeline. The whole operation is already treading on a thin layer of unverifiability, we are assuming the programmer/compiler know what they're doing, so it should be acceptable. I want to elaborate on all these things (and more) in the post I promised, I just need to finish the shape_cast patch so we can discuss on the basis of a use case.

dcaballe added inline comments. Jun 7 2022, 12:05 PM

mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
440	Take your time! In the meantime, let me clarify and ask a few questions:

1. I assume that the `shape_cast` work and the multi-rank vector variants of the insert/extract operations would go to the Vector dialect, not to the LLVM dialect, right? Multi-dimensional vectors should have been unrolled or lowered to multi-dimensional arrays.
2. Re verification (upper bound and fixed-scalable vector properties), the intrinsics in the LLVM dialect should faithfully represent the ones defined in LLVM. If we generated a variant that is not defined in LLVM (let's say `<vscale x 4 x i32> @llvm.experimental.vector.insert.nxv4i32.v40960i32(<vscale x 4 x i32>, <40960 x i32>, i64 immarg)`), we shouldn't accept it in the LLVM dialect because we would generate invalid LLVM IR. I think we should only allow valid variants in the LLVM dialect (i.e., variants that are defined in LLVM). Any transformations aimed at "legalizing" high-level versions of these ops should happen before reaching the LLVM dialect.

Perhaps I'm missing something... :)

jsetoain added inline comments. Jun 8 2022, 2:01 AM

mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
440	Let me elaborate a bit. The idea is to lower "non-trivial" fl->sv/sv->fl shape_casts to a "non-trivial" fl->fl/sv->sv shape_cast plus a "trivial" fl->sv/sv->fl one. fl->fl and sv->sv are lowered as they are now (down to a series of insert/extracts, and then to LLVM), but the "trivial" fl->sv/sv->fl casts are replaced in the vector->llvm lowering by one of the vector.insert/extract intrinsics. There are other alternatives, but I think this one is the best.

By "trivial", I mean things like `vector<4x4xf32>` to `vector<[4]xf32>` or `vector<4xf32>` (which is equivalent to `vector<1x4xf32>`) to `vector<[4]xf32>` (and vice-versa). I call these trivial because a `vector<[4]xf32>` is equivalent to a `vector<n x 4 x f32>` where `n` is determined at runtime, hence things like `vector<n x shape x type>` are trivially "mappable" to a `vector<[shape] x type>` if we assume that the bitsize of your runtime vector is `n x sizeof(type)`.

But we also need to be able to cast something like `vector<2x8xi8>` to `vector<[16]xi8>` by doing a cast from `vector<2x8xi8>` to `vector<16xi8>` and another (trivial) one from `vector<16xi8>` to `vector<[16]xi8>`; or the 256b VLS equivalent: `vector<2x2x8xi8>` to `vector<[16]xi8>` through `vector<2x16xi8>`. This would allow interfacing VLS GEMM code with xMMLA SVE intrinsics, for instance.

Re.: 2, after doing a few tests, it looks like, indeed, `vector.insert/extract` limit the fixed size to 2^17 bits. I don't understand why but, since the limit is there, I agree we need to add verification to avoid producing invalid LLVM Dialect that can't be translated to LLVM IR. It was I who was missing something :-P

Since it's taking so long and this is probably the worst place to have this discussion, I will open a Discourse thread and we can talk about this over there :-)
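
The notion of a "trivial" fixed-length to scalable cast described in this comment can be written down as a shape comparison. The sketch below assumes shapes are plain lists of dimension sizes and that the scalable shape lists only its bracketed base dimensions; the name `isTrivialFixedToScalable` is hypothetical and this is not code from the patch or from the follow-up `shape_cast` work.

```cpp
#include <cstdint>
#include <vector>

using Shape = std::vector<int64_t>;

// Hypothetical check: a fixed shape maps "trivially" onto a scalable base
// shape when it is exactly that shape (n == 1), or that shape with one
// extra leading dimension n, i.e. vector<n x shape x type>.
bool isTrivialFixedToScalable(const Shape &fixed, const Shape &scalableBase) {
  if (fixed.size() == scalableBase.size())
    return fixed == scalableBase;
  if (fixed.size() != scalableBase.size() + 1)
    return false;
  return Shape(fixed.begin() + 1, fixed.end()) == scalableBase;
}
```

Under this model `vector<4x4xf32>` and `vector<4xf32>` both map trivially onto `vector<[4]xf32>`, while `vector<2x8xi8>` does not map onto `vector<[16]xi8>` until it has first been cast to `vector<16xi8>`, which is the two-step lowering the comment describes.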

jsetoain added a child revision: D127758: [mlir][vector] Add cast op between scalable and fixed-length vectors. Jun 14 2022, 8:56 AM

Accept fixed-length insert/extract

Harbormaster completed remote builds in B169980: Diff 437151. Jun 15 2022, 7:39 AM

jsetoain added a child revision: D127875: [mlir][vector] Add vector.scalable.insert/extract ops. Jun 15 2022, 9:48 AM

Hopefully soon the vector.insert/extract intrinsics will be moving out of the experimental namespace (https://reviews.llvm.org/D127976). Just a heads up, depending on which patch lands first.

Brilliant! Thanks for heads up, Bradley!

Adding verification of vector sizes, allowing more modes of operation.

I believe this change addresses the issue with vectors being too long. I also changed the constraints to match those of the LLVM intrinsic.

jsetoain marked an inline comment as done. Jun 16 2022, 11:04 AM

Harbormaster completed remote builds in B170307: Diff 437610. Jun 16 2022, 11:30 AM

Add missing constraints.

Harbormaster completed remote builds in B170351: Diff 437676. Jun 16 2022, 2:03 PM

Matt added a subscriber: Matt. Jun 16 2022, 4:45 PM

LGTM! Thank you so much for addressing the comments and for the discussion!

This revision is now accepted and ready to land. Jun 17 2022, 12:13 PM

Rebase on top of main with LLVM vector.insert/extract outside of experimental

Harbormaster completed remote builds in B172162: Diff 440173. Jun 27 2022, 5:17 AM

Closed by commit rGf39c2a114283: [mlir][llvm] Add vector insert/extract intrinsics (authored by jsetoain). Jun 27 2022, 6:15 AM

This revision was automatically updated to reflect the committed changes.

jsetoain added a commit: rGf39c2a114283: [mlir][llvm] Add vector insert/extract intrinsics.

jsetoain removed a child revision: D127875: [mlir][vector] Add vector.scalable.insert/extract ops. Oct 23 2022, 12:05 PM

Diff 440195

mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td

				/// Create a call to stepvector intrinsic.
				def LLVM_StepVectorOp
				: LLVM_IntrOp<"experimental.stepvector", [0], [], [NoSideEffect], 1> {
				let arguments = (ins);
				let results = (outs LLVM_Type:$res);
				let assemblyFormat = "attr-dict `:` type($res)";
				}

				/// Create a call to vector.insert intrinsic
				def LLVM_vector_insert
				: LLVM_Op<"intr.vector.insert",
				[NoSideEffect, AllTypesMatch<["dstvec", "res"]>,
				PredOpTrait<"vectors are not bigger than 2^17 bits.", And<[
				CPred<"getSrcVectorBitWidth() <= 131072">,
				CPred<"getDstVectorBitWidth() <= 131072">
				]>>,
				PredOpTrait<"it is not inserting scalable into fixed-length vectors.",
				CPred<"!isScalableVectorType($srcvec.getType()) || "
				"isScalableVectorType($dstvec.getType())">>]> {
				let arguments = (ins LLVM_AnyVector:$srcvec, LLVM_AnyVector:$dstvec,
				I64Attr:$pos);
				let results = (outs LLVM_AnyVector:$res);
				let builders = [LLVM_OneResultOpBuilder];
				string llvmBuilder = [{
				$res = builder.CreateInsertVector(
				$_resultType, $dstvec, $srcvec, builder.getInt64($pos));
				}];
				let assemblyFormat = "$srcvec `,` $dstvec `[` $pos `]` attr-dict `:` "
				"type($srcvec) `into` type($res)";
				let extraClassDeclaration = [{
				uint64_t getVectorBitWidth(Type vector) {
				return getVectorNumElements(vector).getKnownMinValue() *
				getVectorElementType(vector).getIntOrFloatBitWidth();
				}
				uint64_t getSrcVectorBitWidth() {
				return getVectorBitWidth(getSrcvec().getType());
				}
				uint64_t getDstVectorBitWidth() {
				return getVectorBitWidth(getDstvec().getType());
				}
				}];
				}

				/// Create a call to vector.extract intrinsic
				def LLVM_vector_extract
				: LLVM_Op<"intr.vector.extract",
				[NoSideEffect,
				PredOpTrait<"vectors are not bigger than 2^17 bits.", And<[
				CPred<"getSrcVectorBitWidth() <= 131072">,
				CPred<"getResVectorBitWidth() <= 131072">
				]>>,
				PredOpTrait<"it is not extracting scalable from fixed-length vectors.",
				CPred<"!isScalableVectorType($res.getType()) || "
				"isScalableVectorType($srcvec.getType())">>]> {
				let arguments = (ins LLVM_AnyVector:$srcvec, I64Attr:$pos);
				let results = (outs LLVM_AnyVector:$res);
				let builders = [LLVM_OneResultOpBuilder];
				string llvmBuilder = [{
				$res = builder.CreateExtractVector(
				$_resultType, $srcvec, builder.getInt64($pos));
				}];
				let assemblyFormat = "$srcvec `[` $pos `]` attr-dict `:` "
				"type($res) `from` type($srcvec)";
				let extraClassDeclaration = [{
				uint64_t getVectorBitWidth(Type vector) {
				return getVectorNumElements(vector).getKnownMinValue() *
				getVectorElementType(vector).getIntOrFloatBitWidth();
				}
				uint64_t getSrcVectorBitWidth() {
				return getVectorBitWidth(getSrcvec().getType());
				}
				uint64_t getResVectorBitWidth() {
				return getVectorBitWidth(getRes().getType());
				}
				}];
				}

				//
				// LLVM Vector Predication operations.
				//

				class LLVM_VPBinaryBase<string mnem, Type element>
				: LLVM_OneResultIntrOp<"vp." # mnem, [0], [], [NoSideEffect]>,
				Arguments<(ins LLVM_VectorOf<element>:$lhs, LLVM_VectorOf<element>:$rhs,
				LLVM_VectorOf<I1>:$mask, I32:$evl)>;
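
The two `PredOpTrait` constraints attached to `LLVM_vector_insert` and `LLVM_vector_extract` above can be restated as plain predicates. The following is a minimal sketch under stated assumptions: `VecInfo` is a hypothetical stand-in for an MLIR vector type (known-minimum element count, element bit width, scalable flag), and `verifyInsert` / `verifyExtract` are illustrative names, not the verifier that TableGen generates.

```cpp
#include <cstdint>

// Hypothetical description of a vector type.
struct VecInfo {
  uint64_t minElts;  // known-minimum number of elements
  uint64_t eltBits;  // bit width of one element
  bool scalable;     // true for vector<[N]xT>
};

constexpr uint64_t kMaxVectorBits = 131072;  // 2^17, as in the CPreds

// Mirrors getVectorBitWidth(): known-minimum elements times element width.
constexpr uint64_t knownMinBitWidth(VecInfo t) { return t.minElts * t.eltBits; }

// Insert: both vectors within the size limit, and a scalable source is
// only allowed when the destination is scalable too.
constexpr bool verifyInsert(VecInfo src, VecInfo dst) {
  return knownMinBitWidth(src) <= kMaxVectorBits &&
         knownMinBitWidth(dst) <= kMaxVectorBits &&
         (!src.scalable || dst.scalable);
}

// Extract: both vectors within the size limit, and a scalable result is
// only allowed when the source is scalable too.
constexpr bool verifyExtract(VecInfo res, VecInfo src) {
  return knownMinBitWidth(src) <= kMaxVectorBits &&
         knownMinBitWidth(res) <= kMaxVectorBits &&
         (!res.scalable || src.scalable);
}
```

Note that the size check uses the known-minimum element count, so for a scalable type it bounds the `vscale = 1` size only. A `vector<16384xi8>` is exactly 2^17 bits and passes; `vector<16385xi8>` is 8 bits over and fails.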

mlir/include/mlir/Dialect/LLVMIR/LLVMOpBase.td

				def LLVM_AnyNonAggregate : Type<And<[LLVM_Type.predicate,
				Neg<LLVM_AnyAggregate.predicate>]>,
				"LLVM-compatible non-aggregate type">;

				// Type constraint accepting any LLVM vector type.
				def LLVM_AnyVector : Type<CPred<"::mlir::LLVM::isCompatibleVectorType($_self)">,
				"LLVM dialect-compatible vector type">;

				// Type constraint accepting any LLVM fixed-length vector type.
				def LLVM_AnyFixedVector : Type<CPred<
				"!::mlir::LLVM::isScalableVectorType($_self)">,
				"LLVM dialect-compatible fixed-length vector type">;

				// Type constraint accepting any LLVM scalable vector type.
				def LLVM_AnyScalableVector : Type<CPred<
				"::mlir::LLVM::isScalableVectorType($_self)">,
				"LLVM dialect-compatible scalable vector type">;

				// Type constraint accepting an LLVM vector type with an additional constraint
				// on the vector element type.
				class LLVM_VectorOf<Type element> : Type<
				And<[LLVM_AnyVector.predicate,
				SubstLeaves<
				"$_self",
				"::mlir::LLVM::getVectorElementType($_self)",
				element.predicate>]>,

mlir/test/Dialect/LLVMIR/invalid.mlir

				}

				// -----

				// expected-error@+1 {{size of 'llvm.struct_attrs' must match the size of the annotated '!llvm.struct'}}
				func.func @invalid_res_struct_attr_size(%arg0 : !llvm.struct<(i32)>) -> (!llvm.struct<(i32)> {llvm.struct_attrs = []}) {
				return %arg0 : !llvm.struct<(i32)>
				}

				// -----

				func.func @insert_vector_invalid_source_vector_size(%arg0 : vector<16385xi8>, %arg1 : vector<[16]xi8>) {
				// expected-error@+1 {{op failed to verify that vectors are not bigger than 2^17 bits.}}
				%0 = llvm.intr.vector.insert %arg0, %arg1[0] : vector<16385xi8> into vector<[16]xi8>
				}

				// -----

				func.func @insert_vector_invalid_dest_vector_size(%arg0 : vector<16xi8>, %arg1 : vector<[16385]xi8>) {
				// expected-error@+1 {{op failed to verify that vectors are not bigger than 2^17 bits.}}
				%0 = llvm.intr.vector.insert %arg0, %arg1[0] : vector<16xi8> into vector<[16385]xi8>
				}

				// -----

				func.func @insert_scalable_into_fixed_length_vector(%arg0 : vector<[8]xf32>, %arg1 : vector<16xf32>) {
				// expected-error@+1 {{op failed to verify that it is not inserting scalable into fixed-length vectors.}}
				%0 = llvm.intr.vector.insert %arg0, %arg1[0] : vector<[8]xf32> into vector<16xf32>
				}

				// -----

				func.func @extract_vector_invalid_source_vector_size(%arg0 : vector<[16385]xi8>) {
				// expected-error@+1 {{op failed to verify that vectors are not bigger than 2^17 bits.}}
				%0 = llvm.intr.vector.extract %arg0[0] : vector<16xi8> from vector<[16385]xi8>
				}

				// -----

				func.func @extract_vector_invalid_result_vector_size(%arg0 : vector<[16]xi8>) {
				// expected-error@+1 {{op failed to verify that vectors are not bigger than 2^17 bits.}}
				%0 = llvm.intr.vector.extract %arg0[0] : vector<16385xi8> from vector<[16]xi8>
				}

				// -----

				func.func @extract_scalable_from_fixed_length_vector(%arg0 : vector<16xf32>) {
				// expected-error@+1 {{op failed to verify that it is not extracting scalable from fixed-length vectors.}}
				%0 = llvm.intr.vector.extract %arg0[0] : vector<[8]xf32> from vector<16xf32>
				}

mlir/test/Dialect/LLVMIR/roundtrip.mlir

		// CHECK: = llvm.insertelement {{.*}} : vector<[4]xf32>
		%1 = llvm.insertelement %arg2, %arg0[%arg1 : i32] : vector<[4]xf32>
		// CHECK: = llvm.shufflevector {{.*}} [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<[4]xf32>, vector<[4]xf32>
		%2 = llvm.shufflevector %arg0, %arg0 [0 : i32, 0 : i32, 0 : i32, 0 : i32] : vector<[4]xf32>, vector<[4]xf32>
		// CHECK: = llvm.mlir.constant(dense<1.000000e+00> : vector<[4]xf32>) : vector<[4]xf32>
		%3 = llvm.mlir.constant(dense<1.0> : vector<[4]xf32>) : vector<[4]xf32>
		return
		}

		// CHECK-LABEL: @mixed_vect
		func.func @mixed_vect(%arg0: vector<8xf32>, %arg1: vector<4xf32>, %arg2: vector<[4]xf32>) {
		// CHECK: = llvm.intr.vector.insert {{.*}} : vector<8xf32> into vector<[4]xf32>
		%0 = llvm.intr.vector.insert %arg0, %arg2[0] : vector<8xf32> into vector<[4]xf32>
		// CHECK: = llvm.intr.vector.insert {{.*}} : vector<4xf32> into vector<[4]xf32>
		%1 = llvm.intr.vector.insert %arg1, %arg2[0] : vector<4xf32> into vector<[4]xf32>
		// CHECK: = llvm.intr.vector.insert {{.*}} : vector<4xf32> into vector<[4]xf32>
		%2 = llvm.intr.vector.insert %arg1, %1[4] : vector<4xf32> into vector<[4]xf32>
		// CHECK: = llvm.intr.vector.insert {{.*}} : vector<4xf32> into vector<8xf32>
		%3 = llvm.intr.vector.insert %arg1, %arg0[4] : vector<4xf32> into vector<8xf32>
		// CHECK: = llvm.intr.vector.extract {{.*}} : vector<8xf32> from vector<[4]xf32>
		%4 = llvm.intr.vector.extract %2[0] : vector<8xf32> from vector<[4]xf32>
		// CHECK: = llvm.intr.vector.extract {{.*}} : vector<2xf32> from vector<8xf32>
		%5 = llvm.intr.vector.extract %arg0[6] : vector<2xf32> from vector<8xf32>
		return
		}

		// CHECK-LABEL: @alloca
		func.func @alloca(%size : i64) {
		// CHECK: llvm.alloca %{{.*}} x i32 : (i64) -> !llvm.ptr<i32>
		llvm.alloca %size x i32 {alignment = 0} : (i64) -> (!llvm.ptr<i32>)
		// CHECK: llvm.alloca %{{.*}} x i32 {alignment = 8 : i64} : (i64) -> !llvm.ptr<i32>
		llvm.alloca %size x i32 {alignment = 8} : (i64) -> (!llvm.ptr<i32>)
		llvm.return
		}

mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir

		llvm.func @vector_predication_intrinsics(%A: vector<8xi32>, %B: vector<8xi32>,
		"llvm.intr.vp.ptrtoint" (%G, %mask, %evl) :
		(!llvm.vec<8 x !llvm.ptr<i32>>, vector<8xi1>, i32) -> vector<8xi64>
		// CHECK: call <8 x ptr> @llvm.vp.inttoptr.v8p0.v8i64
		"llvm.intr.vp.inttoptr" (%E, %mask, %evl) :
		(vector<8xi64>, vector<8xi1>, i32) -> !llvm.vec<8 x !llvm.ptr<i32>>
		llvm.return
		}

		// CHECK-LABEL: @vector_insert_extract
		llvm.func @vector_insert_extract(%f256: vector<8xi32>, %f128: vector<4xi32>,
		%sv: vector<[4]xi32>) {
		// CHECK: call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v8i32
		%0 = llvm.intr.vector.insert %f256, %sv[0] :
		vector<8xi32> into vector<[4]xi32>
		// CHECK: call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32
		%1 = llvm.intr.vector.insert %f128, %sv[0] :
		vector<4xi32> into vector<[4]xi32>
		// CHECK: call <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32
		%2 = llvm.intr.vector.insert %f128, %1[4] :
		vector<4xi32> into vector<[4]xi32>
		// CHECK: call <8 x i32> @llvm.vector.insert.v8i32.v4i32
		%3 = llvm.intr.vector.insert %f128, %f256[4] :
		vector<4xi32> into vector<8xi32>
		// CHECK: call <8 x i32> @llvm.vector.extract.v8i32.nxv4i32
		%4 = llvm.intr.vector.extract %2[0] :
		vector<8xi32> from vector<[4]xi32>
		// CHECK: call <4 x i32> @llvm.vector.extract.v4i32.nxv4i32
		%5 = llvm.intr.vector.extract %2[0] :
		vector<4xi32> from vector<[4]xi32>
		// CHECK: call <2 x i32> @llvm.vector.extract.v2i32.v8i32
		%6 = llvm.intr.vector.extract %f256[6] :
		vector<2xi32> from vector<8xi32>
		llvm.return
		}

		// Check that intrinsics are declared with appropriate types.
		// CHECK-DAG: declare float @llvm.fma.f32(float, float, float)
		// CHECK-DAG: declare <8 x float> @llvm.fma.v8f32(<8 x float>, <8 x float>, <8 x float>) #0
		// CHECK-DAG: declare float @llvm.fmuladd.f32(float, float, float)
		// CHECK-DAG: declare <8 x float> @llvm.fmuladd.v8f32(<8 x float>, <8 x float>, <8 x float>) #0
		// CHECK-DAG: declare void @llvm.prefetch.p0(ptr nocapture readonly, i32 immarg, i32 immarg, i32)
		// CHECK-DAG: declare float @llvm.exp.f32(float)
		// CHECK-DAG: declare <8 x float> @llvm.exp.v8f32(<8 x float>) #0
		// CHECK-DAG: declare <8 x i64> @llvm.vp.zext.v8i64.v8i32(<8 x i32>, <8 x i1>, i32) #2
		// CHECK-DAG: declare <8 x i64> @llvm.vp.sext.v8i64.v8i32(<8 x i32>, <8 x i1>, i32) #2
		// CHECK-DAG: declare <8 x float> @llvm.vp.fptrunc.v8f32.v8f64(<8 x double>, <8 x i1>, i32) #2
		// CHECK-DAG: declare <8 x double> @llvm.vp.fpext.v8f64.v8f32(<8 x float>, <8 x i1>, i32) #2
		// CHECK-DAG: declare <8 x i64> @llvm.vp.fptoui.v8i64.v8f64(<8 x double>, <8 x i1>, i32) #2
		// CHECK-DAG: declare <8 x i64> @llvm.vp.fptosi.v8i64.v8f64(<8 x double>, <8 x i1>, i32) #2
		// CHECK-DAG: declare <8 x i64> @llvm.vp.ptrtoint.v8i64.v8p0(<8 x ptr>, <8 x i1>, i32) #2
		// CHECK-DAG: declare <8 x ptr> @llvm.vp.inttoptr.v8p0.v8i64(<8 x i64>, <8 x i1>, i32) #2
		// CHECK-DAG: declare <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v8i32(<vscale x 4 x i32>, <8 x i32>, i64 immarg) #2
		// CHECK-DAG: declare <vscale x 4 x i32> @llvm.vector.insert.nxv4i32.v4i32(<vscale x 4 x i32>, <4 x i32>, i64 immarg) #2
		// CHECK-DAG: declare <8 x i32> @llvm.vector.insert.v8i32.v4i32(<8 x i32>, <4 x i32>, i64 immarg) #2
		// CHECK-DAG: declare <8 x i32> @llvm.vector.extract.v8i32.nxv4i32(<vscale x 4 x i32>, i64 immarg) #2
		// CHECK-DAG: declare <4 x i32> @llvm.vector.extract.v4i32.nxv4i32(<vscale x 4 x i32>, i64 immarg) #2
		// CHECK-DAG: declare <2 x i32> @llvm.vector.extract.v2i32.v8i32(<8 x i32>, i64 immarg) #2
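
The intrinsic names in the CHECK lines above follow LLVM's overload-suffix pattern for vector types: `v<N><elt>` for fixed-length and `nxv<N><elt>` for scalable vectors, with the result type first. As a rough illustration only (the struct `MangledVec` and the helper names are hypothetical, and only simple element names such as `i32` or `f32` are handled), the names in this test can be reproduced like this:

```cpp
#include <cstdint>
#include <string>

// Hypothetical description of a vector type for name-building purposes.
struct MangledVec {
  uint64_t minElts;  // known-minimum number of elements
  std::string elt;   // element type suffix, e.g. "i32"
  bool scalable;     // true for <vscale x N x T>
};

// Build the per-type suffix: "nxv4i32" for scalable, "v8i32" for fixed.
std::string mangleVec(const MangledVec &t) {
  return (t.scalable ? "nxv" : "v") + std::to_string(t.minElts) + t.elt;
}

// llvm.vector.insert.<result/destination type>.<subvector type>
std::string vectorInsertName(const MangledVec &dst, const MangledVec &src) {
  return "llvm.vector.insert." + mangleVec(dst) + "." + mangleVec(src);
}

// llvm.vector.extract.<result type>.<source type>
std::string vectorExtractName(const MangledVec &res, const MangledVec &src) {
  return "llvm.vector.extract." + mangleVec(res) + "." + mangleVec(src);
}
```

For instance, inserting `vector<8xi32>` into `vector<[4]xi32>` yields the `nxv4i32.v8i32` suffix checked in `@vector_insert_extract`.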

This is an archive of the discontinued LLVM Phabricator instance.

[mlir][llvm] Add vector insert/extract intrinsics
ClosedPublic
