vector.contract enforces that the lhs and rhs values have the same
element type. In cases where the operand element types differ, we should
materialize arith.extf ops that extend the operands to the accumulation
type.
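A minimal sketch of the intended rewrite (the shapes, maps, and the f16/bf16 element types below are illustrative, not taken from the patch):

```mlir
// Hypothetical mixed-type contraction: lhs is f16, rhs is bf16,
// accumulating into f32. Both operands are extended to the
// accumulation type so the contract sees matching element types.
%lhs_ext = arith.extf %lhs : vector<4x4xf16> to vector<4x4xf32>
%rhs_ext = arith.extf %rhs : vector<4x4xbf16> to vector<4x4xf32>
%res = vector.contract {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                     affine_map<(d0, d1, d2) -> (d2, d1)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"],
    kind = #vector.kind<add>
  } %lhs_ext, %rhs_ext, %acc
  : vector<4x4xf32>, vector<4x4xf32> into vector<4x4xf32>
```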
Details
- Reviewers
aartbik nicolasvasilache dcaballe
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Hey Rob! Thanks for sending this patch.
> E.g. i8 and i16 may accumulate into an i32 legally, despite not matching in bitwidth.
I think if that is the case, we should extend i8 to i16 first and then use an i16, i16 -> i32 contract. We have to keep the number of variants under control at this level of abstraction. Otherwise, the transformations and contract lowerings would have to deal with too many cases.
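Roughly something like this (an untested sketch; the 1-D dot-product shape is just for illustration):

```mlir
// Extend the narrower i8 operand to i16 first, then use an
// i16, i16 -> i32 contraction (assuming signed inputs; arith.extui
// would be the unsigned counterpart).
%lhs_ext = arith.extsi %lhs : vector<8xi8> to vector<8xi16>
%res = vector.contract {
    indexing_maps = [affine_map<(d0) -> (d0)>,
                     affine_map<(d0) -> (d0)>,
                     affine_map<(d0) -> ()>],
    iterator_types = ["reduction"],
    kind = #vector.kind<add>
  } %lhs_ext, %rhs, %acc : vector<8xi16>, vector<8xi16> into i32
```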
Would that work for your case?
Thanks,
Diego
So there is actually a slight problem with that. I used the i8 and i16 case as a simple example. The actual case I am encountering is f16 and bf16, which have lossy casting in both directions. Happy to look into an alternate solution, but none comes to mind.
Ha, that's interesting. I guess it's up to the framework to define what happens when we operate on an f16 and a bf16 (i.e., what the direction of the conversion is). It sounds like we want to disambiguate that as early as possible instead of pushing the problem further down the pipeline. That call would have to be made by the specific spec in charge.
Alright. I feel like the best option is to arith.extf the operands to the reduction type when they do not match. I cannot see a better option, and it seems to align with how we treated linalg.matmul internal reductions.
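For reference, a generalized mixed-precision linalg.matmul already materializes the extension inside its reduction body, roughly like this (a sketch; shapes are illustrative):

```mlir
// Generalized form of an f16 x f16 -> f32 matmul: the extension to
// the accumulation type lives inside the reduction body.
%0 = linalg.generic {
    indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>,
                     affine_map<(d0, d1, d2) -> (d2, d1)>,
                     affine_map<(d0, d1, d2) -> (d0, d1)>],
    iterator_types = ["parallel", "parallel", "reduction"]
  } ins(%a, %b : tensor<4x8xf16>, tensor<8x4xf16>)
    outs(%c : tensor<4x4xf32>) {
  ^bb0(%in: f16, %in_1: f16, %out: f32):
    %1 = arith.extf %in : f16 to f32
    %2 = arith.extf %in_1 : f16 to f32
    %3 = arith.mulf %1, %2 : f32
    %4 = arith.addf %out, %3 : f32
    linalg.yield %4 : f32
  } -> tensor<4x4xf32>
```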
Thanks, Rob! I would like to know what @nicolasvasilache thinks, in case I'm missing something, but I think neither linalg.matmul nor vector.contract should allow inputs with different element types at all. That should be checked in the op verification, and the proper conversion should happen higher up in the pipeline, when we generate the linalg.matmul op in the first place. Does that make sense to you?
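In other words, something like this at the point where the matmul is created (a sketch that assumes the frontend decides both sides widen to f32):

```mlir
// The producer resolves the mixed types before forming the op, so
// linalg.matmul only ever sees matching element types. Note this
// materializes full f32 copies of both operands up front.
%a_ext = arith.extf %a : tensor<4x8xf16> to tensor<4x8xf32>
%b_ext = arith.extf %b : tensor<8x4xbf16> to tensor<8x4xf32>
%0 = linalg.matmul ins(%a_ext, %b_ext : tensor<4x8xf32>, tensor<8x4xf32>)
                   outs(%c : tensor<4x4xf32>) -> tensor<4x4xf32>
```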
I am not entirely certain I agree, but let's wait to hear from Nicolas. My assumption is that most matmuls are bounded by I/O, so decreasing tile sizes would take priority. If we handle the extf prior to the linalg.matmul, it could result in materializing a larger tensor, which would increase memory load times. I assume we would expect the arith.extf to fuse into the inner matmul in the end; however, this feels potentially more complex than supporting mixed types.
That could potentially happen with any binary operation; matmuls are just one example. I don't see a strong reason to make an exception for matmuls given the complexity added to lowerings and transformations. Even when targeting some of the RISC-V widening ops that allow wider accumulators, we have been able to deal with the extension instruction without too much hassle.