This is an archive of the discontinued LLVM Phabricator instance.

[mlir][arith] Add expansion pattern for ext/trunc of bf16
ClosedPublic

Authored by rsuderman on Mar 28 2023, 2:52 PM.

Details

Reviewers

nicolasvasilache
jpienaar

Commits

rG5bff523793ee: [mlir][arith] Add expansion pattern for ext/trunc of bf16

Summary

bf16 has a trivial truncation/extension behavior with F32 that
can be described in elementary arith operations. Include some
expansions to efficiently convert.
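
The bit-level relationship the summary relies on can be sketched on scalars in plain C++. This is only an illustration, not code from the revision: the helper names (`f32ToBits`, `bitsToF32`, `extendBF16ToF32`, `truncateF32ToBF16`, `checkRoundTrip`) are hypothetical, and a bf16 value is assumed to be carried as its raw 16-bit pattern. The truncation mirrors the shift-only pattern in this diff, which a later comment in this review flags as lacking a rounding step.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar stand-in for `arith.bitcast` from f32 to i32.
inline std::uint32_t f32ToBits(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Scalar stand-in for `arith.bitcast` from i32 to f32.
inline float bitsToF32(std::uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// extf bf16 -> f32: zero-extend the 16 bits, then shift them into the
// upper half of the 32-bit word (extui + shli + bitcast).
inline float extendBF16ToF32(std::uint16_t bf16Bits) {
  return bitsToF32(static_cast<std::uint32_t>(bf16Bits) << 16);
}

// truncf f32 -> bf16 by dropping the low 16 bits (shrui + trunci).
// Assumption: no rounding is applied, matching the pattern in this diff.
inline std::uint16_t truncateF32ToBF16(float value) {
  return static_cast<std::uint16_t>(f32ToBits(value) >> 16);
}

// Hypothetical self-check: 1.0f is 0x3F800000 in f32 and 0x3F80 in bf16.
inline void checkRoundTrip() {
  assert(extendBF16ToF32(0x3F80) == 1.0f);
  assert(truncateF32ToBF16(1.0f) == 0x3F80);
}
```

Extension is exact because every bf16 value is representable in f32; truncation is only exact when the low 16 bits of the f32 pattern are zero.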

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

rsuderman created this revision. Mar 28 2023, 2:52 PM

Herald added a project: Restricted Project. Mar 28 2023, 2:52 PM

Herald added subscribers: Moerafaat, zero9178, bzcheeseman and 21 others.

rsuderman requested review of this revision. Mar 28 2023, 2:52 PM

Herald added a reviewer: nicolasvasilache. Mar 28 2023, 2:52 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

rsuderman added a reviewer: jpienaar. Mar 28 2023, 2:53 PM

Harbormaster completed remote builds in B222357: Diff 509148. Mar 29 2023, 12:34 AM

Nice (sorry forgot to hit submit)

mlir/lib/Dialect/Arith/Transforms/ExpandOps.cpp
207	Could we do this before cloning?
305	rm ?

Updated for jpienaar@ comments

rsuderman marked 2 inline comments as done. Mar 29 2023, 5:08 PM

LG to me, and optionally done via populate, I'm not sure if always good idea vs letting lower level codegen handle it (seems direct). But given not supported on all backends, SGTM.

mlir/include/mlir/Dialect/Arith/Transforms/Passes.h
41	to lower level bitcasts and shifts ? (something useful, is not too descriptive).

This revision is now accepted and ready to land. Mar 29 2023, 5:35 PM

In D147091#4232242, @jpienaar wrote:

LG to me, and optionally done via populate, I'm not sure if always good idea vs letting lower level codegen handle it (seems direct). But given not supported on all backends, SGTM.

I agree. It does not appear that anything depends on the pass specifically and we should be able to integrate via the populate command.

Updated comment.

This revision was landed with ongoing or failed builds. Mar 29 2023, 5:59 PM

Closed by commit rG5bff523793ee: [mlir][arith] Add expansion pattern for ext/trunc of bf16 (authored by Robert Suderman <suderman@google.com>).

This revision was automatically updated to reflect the committed changes.

Robert Suderman <suderman@google.com> added a commit: rG5bff523793ee: [mlir][arith] Add expansion pattern for ext/trunc of bf16.

Harbormaster completed remote builds in B222629: Diff 509517. Mar 29 2023, 6:27 PM

bkramer added a subscriber: bkramer. Apr 4 2023, 6:50 AM

bkramer added inline comments.

mlir/lib/Dialect/Arith/Transforms/ExpandOps.cpp
223	Sorry for being late, but this expansion is simply incorrect. Converting from f32 to bf16 needs a rounding step. Can this pattern be removed, or at least pulled out into an opt-in pass? Having it on by default just gives us incorrect results.
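
To make the rounding concern concrete, here is a minimal scalar sketch in plain C++ that contrasts the shift-only truncation with one common round-to-nearest-even scheme. It is an assumption-laden illustration, not the fix that eventually landed: all helper names are hypothetical, bf16 is represented by its raw 16-bit pattern, and NaN inputs are simply mapped to a quiet NaN.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret an f32 as its 32-bit pattern.
inline std::uint32_t f32ToBits(float value) {
  std::uint32_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return bits;
}

// Reinterpret a 32-bit pattern as an f32.
inline float bitsToF32(std::uint32_t bits) {
  float value;
  std::memcpy(&value, &bits, sizeof(value));
  return value;
}

// Shift-only truncation: the discarded low 16 bits are ignored, so the
// result is always rounded toward zero.
inline std::uint16_t truncateChop(float value) {
  return static_cast<std::uint16_t>(f32ToBits(value) >> 16);
}

// Round to nearest, ties to even (one common scheme, assumed here).
inline std::uint16_t truncateRoundNearestEven(float value) {
  std::uint32_t bits = f32ToBits(value);
  // NaN: skip the rounding add, which could carry into the exponent,
  // and force the quiet bit so the result stays a NaN.
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u)
    return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
  // Add 0x7FFF plus the lowest kept bit: values above the midpoint round
  // up, and exact ties round up only when the kept value is odd.
  std::uint32_t lsb = (bits >> 16) & 1u;
  bits += 0x7FFFu + lsb;
  return static_cast<std::uint16_t>(bits >> 16);
}

// Hypothetical self-check: 0x3F80C000 lies above the midpoint between the
// bf16 neighbours 0x3F80 and 0x3F81, so the two schemes disagree.
inline void checkRoundingDiffers() {
  float aboveMidpoint = bitsToF32(0x3F80C000u);
  assert(truncateChop(aboveMidpoint) == 0x3F80);
  assert(truncateRoundNearestEven(aboveMidpoint) == 0x3F81);
}
```

The shift-only version agrees with the rounded one only when the discarded bits are below the midpoint (or on a tie with an even kept value), which is why enabling it by default changes numerical results.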

bkramer added a reverting change: rG3bde144de32d: Revert "[mlir][arith] Add expansion pattern for ext/trunc of bf16". Apr 4 2023, 6:59 AM

Revision Contents

Path	Size
mlir/include/mlir/Dialect/Arith/Transforms/Passes.h	3 lines
mlir/lib/Dialect/Arith/Transforms/ExpandOps.cpp	104 lines
mlir/test/Dialect/Arith/expand-ops.mlir	64 lines

Diff 509519

mlir/include/mlir/Dialect/Arith/Transforms/Passes.h

	... (32 lines not shown) ...
	/// types into supported ones. This is done by splitting original power-of-two			/// types into supported ones. This is done by splitting original power-of-two
	/// i2N integer types into two iN halves.			/// i2N integer types into two iN halves.
	void populateArithWideIntEmulationPatterns(			void populateArithWideIntEmulationPatterns(
	WideIntEmulationConverter &typeConverter, RewritePatternSet &patterns);			WideIntEmulationConverter &typeConverter, RewritePatternSet &patterns);

	/// Add patterns to expand Arith ceil/floor division ops.			/// Add patterns to expand Arith ceil/floor division ops.
	void populateCeilFloorDivExpandOpsPatterns(RewritePatternSet &patterns);			void populateCeilFloorDivExpandOpsPatterns(RewritePatternSet &patterns);

				/// Add patterns to expand Arith bf16 patterns to lower level bitcasts/shifts.
				[Inline comment, jpienaar (Not Done): to lower level bitcasts and shifts ? (something useful, is not too descriptive).]
				void populateExpandBFloat16Patterns(RewritePatternSet &patterns);

	/// Add patterns to expand Arith ops.			/// Add patterns to expand Arith ops.
	void populateArithExpandOpsPatterns(RewritePatternSet &patterns);			void populateArithExpandOpsPatterns(RewritePatternSet &patterns);

	/// Create a pass to legalize Arith ops.			/// Create a pass to legalize Arith ops.
	std::unique_ptr<Pass> createArithExpandOpsPass();			std::unique_ptr<Pass> createArithExpandOpsPass();

	/// Create a pass to replace signed ops with unsigned ones where they are proven			/// Create a pass to replace signed ops with unsigned ones where they are proven
	/// equivalent.			/// equivalent.
	... (21 lines not shown) ...

mlir/lib/Dialect/Arith/Transforms/ExpandOps.cpp

//===- ExpandOps.cpp - Pass to legalize Arith ops for LLVM lowering --===//		//===- ExpandOps.cpp - Pass to legalize Arith ops for LLVM lowering --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "mlir/Dialect/Arith/Transforms/Passes.h"		#include "mlir/Dialect/Arith/Transforms/Passes.h"

#include "mlir/Dialect/Arith/IR/Arith.h"		#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/Vector/IR/VectorOps.h"		#include "mlir/Dialect/Vector/IR/VectorOps.h"
		#include "mlir/IR/ImplicitLocOpBuilder.h"
#include "mlir/IR/TypeUtilities.h"		#include "mlir/IR/TypeUtilities.h"
#include "mlir/Transforms/DialectConversion.h"		#include "mlir/Transforms/DialectConversion.h"

namespace mlir {		namespace mlir {
namespace arith {		namespace arith {
#define GEN_PASS_DEF_ARITHEXPANDOPS		#define GEN_PASS_DEF_ARITHEXPANDOPS
#include "mlir/Dialect/Arith/Transforms/Passes.h.inc"		#include "mlir/Dialect/Arith/Transforms/Passes.h.inc"
} // namespace arith		} // namespace arith
} // namespace mlir		} // namespace mlir

using namespace mlir;		using namespace mlir;

/// Create an integer or index constant.		/// Create an integer or index constant.
static Value createConst(Location loc, Type type, int value,		static Value createConst(Location loc, Type type, int value,
PatternRewriter &rewriter) {		PatternRewriter &rewriter) {
		auto attr = rewriter.getIntegerAttr(getElementTypeOrSelf(type), value);
auto elTy = getElementTypeOrSelf(type);		if (auto shapedTy = dyn_cast<ShapedType>(type)) {
auto constantAttr = rewriter.getIntegerAttr(elTy, value);

if (auto vecTy = llvm::dyn_cast<ShapedType>(type))
return rewriter.create<arith::ConstantOp>(		return rewriter.create<arith::ConstantOp>(
loc, vecTy, DenseElementsAttr::get(vecTy, constantAttr));		loc, DenseElementsAttr::get(shapedTy, attr));
		}

return rewriter.create<arith::ConstantOp>(loc, constantAttr);		return rewriter.create<arith::ConstantOp>(loc, attr);
}		}

namespace {		namespace {

/// Expands CeilDivUIOp (n, m) into		/// Expands CeilDivUIOp (n, m) into
/// n == 0 ? 0 : ((n-1) / m) + 1		/// n == 0 ? 0 : ((n-1) / m) + 1
struct CeilDivUIOpConverter : public OpRewritePattern<arith::CeilDivUIOp> {		struct CeilDivUIOpConverter : public OpRewritePattern<arith::CeilDivUIOp> {
using OpRewritePattern::OpRewritePattern;		using OpRewritePattern::OpRewritePattern;
... (137 lines not shown) ...	LogicalResult matchAndRewrite(OpTy op,
// Handle the case where rhs is NaN: 'isNaN(rhs) ? rhs : select'.		// Handle the case where rhs is NaN: 'isNaN(rhs) ? rhs : select'.
Value isNaN = rewriter.create<arith::CmpFOp>(loc, arith::CmpFPredicate::UNO,		Value isNaN = rewriter.create<arith::CmpFOp>(loc, arith::CmpFPredicate::UNO,
rhs, rhs);		rhs, rhs);
rewriter.replaceOpWithNewOp<arith::SelectOp>(op, isNaN, rhs, select);		rewriter.replaceOpWithNewOp<arith::SelectOp>(op, isNaN, rhs, select);
return success();		return success();
}		}
};		};

		struct BFloat16ExtFOpConverter : public OpRewritePattern<arith::ExtFOp> {
		using OpRewritePattern::OpRewritePattern;
		LogicalResult matchAndRewrite(arith::ExtFOp op,
		PatternRewriter &rewriter) const final {
		ImplicitLocOpBuilder b(op.getLoc(), rewriter);
		auto operand = op.getOperand();
		Type operandTy = operand.getType();
		Type resultTy = op.getType();
		Type operandETy = getElementTypeOrSelf(operandTy);
		Type resultETy = getElementTypeOrSelf(resultTy);

		if (!operandETy.isBF16() || !resultETy.isF32()) {
		return rewriter.notifyMatchFailure(op, "not a ext of bf16 to f32.");
		}

		Type i16Ty = b.getI16Type();
		Type i32Ty = b.getI32Type();
		if (auto shapedTy = dyn_cast<ShapedType>(operandTy)) {
		i16Ty = shapedTy.clone(i16Ty);
		[Inline comment, jpienaar (Done): Could we do this before cloning?]
		i32Ty = shapedTy.clone(i32Ty);
		}

		Value bitcast = b.create<arith::BitcastOp>(i16Ty, operand);
		Value exti = b.create<arith::ExtUIOp>(i32Ty, bitcast);

		Value c16 = createConst(op.getLoc(), i32Ty, 16, rewriter);
		Value shl = b.create<arith::ShLIOp>(exti, c16);
		Value result = b.create<arith::BitcastOp>(resultTy, shl);

		rewriter.replaceOp(op, result);
		return success();
		}
		};

		struct BFloat16TruncFOpConverter : public OpRewritePattern<arith::TruncFOp> {
		[Inline comment, bkramer (Not Done): Sorry for being late, but this expansion is simply incorrect. converting from f32 to bf16 needs a rounding step. Can this pattern be removed or at least pulled out into an opt-in pass. Having it on by default just gives us incorrect results.]
		using OpRewritePattern::OpRewritePattern;
		LogicalResult matchAndRewrite(arith::TruncFOp op,
		PatternRewriter &rewriter) const final {
		ImplicitLocOpBuilder b(op.getLoc(), rewriter);
		auto operand = op.getOperand();
		Type operandTy = operand.getType();
		Type resultTy = op.getType();
		Type operandETy = getElementTypeOrSelf(operandTy);
		Type resultETy = getElementTypeOrSelf(resultTy);

		if (!operandETy.isF32() || !resultETy.isBF16()) {
		return rewriter.notifyMatchFailure(op, "not a trunc of f32 to bf16.");
		}

		Type i16Ty = b.getI16Type();
		Type i32Ty = b.getI32Type();
		if (auto shapedTy = dyn_cast<ShapedType>(operandTy)) {
		i16Ty = shapedTy.clone(i16Ty);
		i32Ty = shapedTy.clone(i32Ty);
		}

		Value bitcast = b.create<arith::BitcastOp>(i32Ty, operand);
		Value c16 = createConst(op.getLoc(), i32Ty, 16, rewriter);
		Value shl = b.create<arith::ShRUIOp>(bitcast, c16);
		Value trunc = b.create<arith::TruncIOp>(i16Ty, shl);
		Value result = b.create<arith::BitcastOp>(resultTy, trunc);

		rewriter.replaceOp(op, result);
		return success();
		}
		};

struct ArithExpandOpsPass		struct ArithExpandOpsPass
: public arith::impl::ArithExpandOpsBase<ArithExpandOpsPass> {		: public arith::impl::ArithExpandOpsBase<ArithExpandOpsPass> {
void runOnOperation() override {		void runOnOperation() override {
RewritePatternSet patterns(&getContext());		RewritePatternSet patterns(&getContext());
ConversionTarget target(getContext());		ConversionTarget target(getContext());

arith::populateArithExpandOpsPatterns(patterns);		arith::populateArithExpandOpsPatterns(patterns);

target.addLegalDialect<arith::ArithDialect>();		target.addLegalDialect<arith::ArithDialect>();
// clang-format off		// clang-format off
target.addIllegalOp<		target.addIllegalOp<
arith::CeilDivSIOp,		arith::CeilDivSIOp,
arith::CeilDivUIOp,		arith::CeilDivUIOp,
arith::FloorDivSIOp,		arith::FloorDivSIOp,
arith::MaxFOp,		arith::MaxFOp,
arith::MinFOp		arith::MinFOp
>();		>();

		target.addDynamicallyLegalOp<arith::ExtFOp>(
		[](arith::ExtFOp op) {
		Type inETy = getElementTypeOrSelf(op.getOperand().getType());
		Type outETy = getElementTypeOrSelf(op.getType());
		return !(inETy.isBF16() && outETy.isF32());
		});

		target.addDynamicallyLegalOp<arith::TruncFOp>(
		[](arith::TruncFOp op) {
		Type inETy = getElementTypeOrSelf(op.getOperand().getType());
		Type outETy = getElementTypeOrSelf(op.getType());
		return !(inETy.isF32() && outETy.isBF16());
		});

// clang-format on		// clang-format on
if (failed(applyPartialConversion(getOperation(), target,		if (failed(applyPartialConversion(getOperation(), target,
std::move(patterns))))		std::move(patterns))))
signalPassFailure();		signalPassFailure();
}		}
};		};

} // namespace		} // namespace

void mlir::arith::populateCeilFloorDivExpandOpsPatterns(		void mlir::arith::populateCeilFloorDivExpandOpsPatterns(
RewritePatternSet &patterns) {		RewritePatternSet &patterns) {
patterns		patterns
.add<CeilDivSIOpConverter, CeilDivUIOpConverter, FloorDivSIOpConverter>(		.add<CeilDivSIOpConverter, CeilDivUIOpConverter, FloorDivSIOpConverter>(
patterns.getContext());		patterns.getContext());
}		}

		void mlir::arith::populateExpandBFloat16Patterns(RewritePatternSet &patterns) {
		patterns.add<BFloat16ExtFOpConverter, BFloat16TruncFOpConverter>(
		[Inline comment, jpienaar (Done): rm ?]
		patterns.getContext());
		}

void mlir::arith::populateArithExpandOpsPatterns(RewritePatternSet &patterns) {		void mlir::arith::populateArithExpandOpsPatterns(RewritePatternSet &patterns) {
populateCeilFloorDivExpandOpsPatterns(patterns);		populateCeilFloorDivExpandOpsPatterns(patterns);
// clang-format off		// clang-format off
patterns.add<		patterns.add<
MaxMinFOpConverter<MaxFOp, arith::CmpFPredicate::UGT>,		MaxMinFOpConverter<MaxFOp, arith::CmpFPredicate::UGT>,
MaxMinFOpConverter<MinFOp, arith::CmpFPredicate::ULT>		MaxMinFOpConverter<MinFOp, arith::CmpFPredicate::ULT>,
		BFloat16ExtFOpConverter,
		BFloat16TruncFOpConverter
>(patterns.getContext());		>(patterns.getContext());
// clang-format on		// clang-format on
}		}

std::unique_ptr<Pass> mlir::arith::createArithExpandOpsPass() {		std::unique_ptr<Pass> mlir::arith::createArithExpandOpsPass() {
return std::make_unique<ArithExpandOpsPass>();		return std::make_unique<ArithExpandOpsPass>();
}		}

mlir/test/Dialect/Arith/expand-ops.mlir

... (209 lines not shown) ...	func.func @minf(%a: f32, %b: f32) -> f32 {
return %result : f32		return %result : f32
}		}
// CHECK-SAME: %[[LHS:.*]]: f32, %[[RHS:.*]]: f32)		// CHECK-SAME: %[[LHS:.*]]: f32, %[[RHS:.*]]: f32)
// CHECK-NEXT: %[[CMP:.*]] = arith.cmpf ult, %[[LHS]], %[[RHS]] : f32		// CHECK-NEXT: %[[CMP:.*]] = arith.cmpf ult, %[[LHS]], %[[RHS]] : f32
// CHECK-NEXT: %[[SELECT:.*]] = arith.select %[[CMP]], %[[LHS]], %[[RHS]] : f32		// CHECK-NEXT: %[[SELECT:.*]] = arith.select %[[CMP]], %[[LHS]], %[[RHS]] : f32
// CHECK-NEXT: %[[IS_NAN:.*]] = arith.cmpf uno, %[[RHS]], %[[RHS]] : f32		// CHECK-NEXT: %[[IS_NAN:.*]] = arith.cmpf uno, %[[RHS]], %[[RHS]] : f32
// CHECK-NEXT: %[[RESULT:.*]] = arith.select %[[IS_NAN]], %[[RHS]], %[[SELECT]] : f32		// CHECK-NEXT: %[[RESULT:.*]] = arith.select %[[IS_NAN]], %[[RHS]], %[[SELECT]] : f32
// CHECK-NEXT: return %[[RESULT]] : f32		// CHECK-NEXT: return %[[RESULT]] : f32

		// -----

		func.func @extf_bf16(%arg0 : bf16) -> f32 {
		%0 = arith.extf %arg0 : bf16 to f32
		return %0 : f32
		}

		// CHECK-LABEL: @extf_bf16
		// CHECK-SAME: %[[ARG0:.+]]: bf16
		// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %[[ARG0]] : bf16 to i16
		// CHECK-DAG: %[[EXT:.+]] = arith.extui %[[BITCAST]] : i16 to i32
		// CHECK-DAG: %[[C16:.+]] = arith.constant 16
		// CHECK-DAG: %[[SHLI:.+]] = arith.shli %[[EXT]], %[[C16]]
		// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %[[SHLI]] : i32 to f32
		// CHECK: return %[[BITCAST]]

		// -----

		func.func @extf_vector_bf16(%arg0 : vector<4xbf16>) -> vector<4xf32> {
		%0 = arith.extf %arg0 : vector<4xbf16> to vector<4xf32>
		return %0 : vector<4xf32>
		}

		// CHECK-LABEL: @extf_vector_bf16
		// CHECK-SAME: %[[ARG0:.+]]: vector<4xbf16>
		// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %[[ARG0]] : vector<4xbf16> to vector<4xi16>
		// CHECK-DAG: %[[EXT:.+]] = arith.extui %[[BITCAST]] : vector<4xi16> to vector<4xi32>
		// CHECK-DAG: %[[C16:.+]] = arith.constant dense<16>
		// CHECK-DAG: %[[SHLI:.+]] = arith.shli %[[EXT]], %[[C16]]
		// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %[[SHLI]] : vector<4xi32> to vector<4xf32>
		// CHECK: return %[[BITCAST]]

		// -----

		func.func @truncf_f32(%arg0 : f32) -> bf16 {
		%0 = arith.truncf %arg0 : f32 to bf16
		return %0 : bf16
		}

		// CHECK-LABEL: @truncf_f32
		// CHECK-SAME: %[[ARG0:.+]]: f32
		// CHECK-DAG: %[[C16:.+]] = arith.constant 16
		// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %[[ARG0]] : f32 to i32
		// CHECK-DAG: %[[SHR:.+]] = arith.shrui %[[BITCAST]], %[[C16]]
		// CHECK-DAG: %[[TRUNC:.+]] = arith.trunci %[[SHR]] : i32 to i16
		// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %[[TRUNC]] : i16 to bf16
		// CHECK: return %[[BITCAST]] : bf16

		// -----

		func.func @truncf_vector_f32(%arg0 : vector<4xf32>) -> vector<4xbf16> {
		%0 = arith.truncf %arg0 : vector<4xf32> to vector<4xbf16>
		return %0 : vector<4xbf16>
		}

		// CHECK-LABEL: @truncf_vector_f32
		// CHECK-SAME: %[[ARG0:.+]]: vector<4xf32>
		// CHECK-DAG: %[[C16:.+]] = arith.constant dense<16>
		// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %[[ARG0]] : vector<4xf32> to vector<4xi32>
		// CHECK-DAG: %[[SHR:.+]] = arith.shrui %[[BITCAST]], %[[C16]]
		// CHECK-DAG: %[[TRUNC:.+]] = arith.trunci %[[SHR]] : vector<4xi32> to vector<4xi16>
		// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %[[TRUNC]] : vector<4xi16> to vector<4xbf16>
		// CHECK: return %[[BITCAST]] : vector<4xbf16>
