Consider a mixed-precision data type, i.e., F16 input lhs, F16 input rhs, F32 accumulation, and F32 output. This is typically written as F32 <= F16 * F16 + F32.
During vectorization from linalg to vector for this mixed-precision case (F32 <= F16 * F16 + F32), linalg.matmul carries arith.extf ops on the lhs and rhs operands in its body:
"linalg.matmul"(%lhs, %rhs, %acc) ({ ^bb0(%arg1: f16, %arg2: f16, %arg3: f32): %lhs_f32 = "arith.extf"(%arg1) : (f16) -> f32 %rhs_f32 = "arith.extf"(%arg2) : (f16) -> f32 %mul = "arith.mulf"(%lhs_f32, %rhs_f32) : (f32, f32) -> f32 %acc = "arith.addf"(%arg3, %mul) : (f32, f32) -> f32 "linalg.yield"(%acc) : (f32) -> () })
There are backends that natively support this mixed-precision data type and do not need the arith.extf. For example, the NVIDIA A100 GPU has mma.sync.aligned.*.f32.f16.f16.f32, which supports mixed precision directly. However, the presence of arith.extf in the IR introduces unnecessary casts and causes the NVIDIA backend to target F32 Tensor Cores instead of F16 Tensor Cores. This patch adds a folding pattern to fold arith.extf into vector.contract.
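With the folding applied, and assuming the same illustrative shapes as above, the arith.extf ops are absorbed and vector.contract consumes the f16 operands directly while still accumulating in f32, which lets the backend select the F16 Tensor Core path:

  // Mixed-precision contraction: f16 operands, f32 accumulator/result.
  %res = vector.contract {
           indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                            affine_map<(d0, d1, d2) -> (d2, d1)>,
                            affine_map<(d0, d1, d2) -> (d0, d1)>],
           iterator_types = ["parallel", "parallel", "reduction"],
           kind = #vector.kind<add>}
         %lhs, %rhs, %acc
         : vector<4x8xf16>, vector<8x4xf16> into vector<4x4xf32>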