This adds a pattern that lowers linalg.quantized_matmul to linalg.matmul.
quantized_matmul is useful as a higher-level op because it maps directly from
higher-level dialects such as TOSA.
Most practical codegen paths will want to distribute out the zero-point
subtractions and thereby reduce quantized_matmul to matmul. This commit
adds a pass and pattern doing exactly that, generating a linalg.matmul
named op. While that is not strictly necessary (generic linalg transforms
could be taught to do the same), at this point a few different groups have
found (*) that they currently depend on matmuls being named matmul ops for
various reasons, so this pattern will be useful for the time being.
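For reference, the rewrite rests on a simple algebraic identity: subtracting
the zero points inside the accumulation is equivalent to one plain matmul plus
correction terms built from row/column sums. Below is a minimal numpy sketch of
that identity (an illustration only, not the MLIR implementation; the array
sizes and zero-point values are arbitrary):

```python
import numpy as np

# quantized_matmul computes sum_k (A[i,k] - za) * (B[k,j] - zb).
# Expanding the product gives:
#   (A @ B)[i,j] - zb*rowsum(A)[i] - za*colsum(B)[j] + K*za*zb
# i.e. one ordinary matmul plus zero-point correction terms.

rng = np.random.default_rng(0)
K = 8
A = rng.integers(0, 256, size=(4, K)).astype(np.int64)
B = rng.integers(0, 256, size=(K, 5)).astype(np.int64)
za, zb = 128, 3  # arbitrary example zero points

# Direct definition of the quantized matmul.
quantized = (A - za) @ (B - zb)

# Rewritten form: a single plain matmul plus corrections.
rewritten = (A @ B
             - zb * A.sum(axis=1, keepdims=True)
             - za * B.sum(axis=0, keepdims=True)
             + K * za * zb)

assert np.array_equal(quantized, rewritten)
```

This is why the lowering only needs to materialize the row/column sums and a
constant, leaving the hot loop as an ordinary named matmul.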
(*): Tracking issue:
https://github.com/google/iree/issues/8330
Two different sets of people have run into this:
https://github.com/google/iree/issues/8149
https://github.com/google/iree/pull/8281
Independently, I also need this for other work on matmul-to-mmt4d.