This is an archive of the discontinued LLVM Phabricator instance.

Differential D156362

[mlir][Arith] Change F32 to BF16 truncation to match __truncsfbf2
Needs Revision · Public

Authored by krzysz00 on Jul 26 2023, 1:17 PM.

Details

Reviewers

kuhar
nicolasvasilache
rsuderman

Summary

The current implementation of truncf %* : f32 to bf16 in the expand
ops pass is slow and does not match the __truncsfbf2 / float2bf16
function in the execution utilities. This has caused divergences in
how floats are truncated between code that uses this pass and code
that relies on __truncsfbf2 implementations, creating spurious
failures in conformance testing.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

krzysz00 created this revision. Jul 26 2023, 1:17 PM

Herald added a reviewer: kuhar. Jul 26 2023, 1:17 PM

Herald added a project: Restricted Project.

Herald added subscribers: manas, bviyer, Moerafaat and 22 others.

krzysz00 requested review of this revision. Jul 26 2023, 1:17 PM

Herald added a reviewer: nicolasvasilache. Jul 26 2023, 1:17 PM

Herald added a project: Restricted Project.

Herald added subscribers: stephenneuendorffer, nicolasvasilache.

Harbormaster completed remote builds in B248350: Diff 544489. Jul 26 2023, 6:02 PM

kuhar added a reviewer: rsuderman. Jul 27 2023, 10:46 AM

I agree that this is the right decomposition. The original implementation matched this behavior as well; however, it was reverted because other parties requested that truncation of fp include rounding. Approving so you can land; just be aware this may bring wider discussions.

This revision is now accepted and ready to land. Aug 16 2023, 2:48 PM

> The current implementation of truncf %* : f32 to bf16 in the expand ops pass is slow and does not match the __truncsfbf2 / float2bf16 function in the execution utilities.

Can you elaborate on the actual numerical difference between the current expansion and the proposed one?

In D156362#4593439, @rsuderman wrote:

> I agree that this is the right decomposition. The original implementation matched this behavior as well; however, it was reverted because other parties requested that truncation of fp include rounding.

Can you link to this previous discussion?

@mehdi_amini I couldn't find this discussion. @rsuderman, do you have pointers?

The numerical difference, as far as I can tell, has to do with the exact details of the rounding, as can be seen in the updated tests. I don't know exactly how to characterize it, though.
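One way to characterize the difference is to model both expansions on raw bit patterns. The following is a minimal C++ sketch transcribed from the two columns of the ExpandOps.cpp diff in this revision, not the MLIR implementation itself. The function names are hypothetical, the NaN test is a bit-level stand-in for arith.cmpf uno, and the expected bf16 bit patterns are my own encoding of the values printed in expand-arith-ops.mlir.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of the pre-patch expansion (left-hand column of the diff).
// Like the old pattern, it has no special case for NaN inputs.
inline uint16_t truncOldExpansion(uint32_t bits) {
  const uint32_t expMask = 0xFFu << 23;  // exponent field
  const uint32_t expMax = 0xFEu << 23;   // largest finite exponent
  uint32_t sign = bits >> 31;
  uint32_t man = bits & ((1u << 23) - 1);
  // Rounding constant: 0x8000 for positive inputs, 0x7FFF for negative ones.
  uint32_t manRound = man + ((1u << 15) - sign);
  // Carry out of the mantissa; shift right by one if it is set.
  uint32_t roundBit = manRound >> 23;
  uint32_t manNew = manRound >> roundBit;
  uint32_t exp = bits & expMask;
  uint32_t expCarry = (exp + manRound) & expMask;
  // With a saturated exponent, keep the old exponent (and, on carry, the
  // old mantissa) instead of rounding up.
  bool expSaturated = exp >= expMax;
  uint32_t expOut = expSaturated ? exp : expCarry;
  uint32_t manOut = (expSaturated && roundBit) ? man : manNew;
  uint32_t rounded = (sign << 31) | expOut | manOut;
  return static_cast<uint16_t>(rounded >> 16);
}

// Scalar model of the proposed expansion (right-hand column of the diff).
inline uint16_t truncNewExpansion(uint32_t bits) {
  // Stand-in for arith.cmpf uno: exponent all ones and a nonzero mantissa.
  bool isNaN = (bits & 0x7FFFFFFFu) > 0x7F800000u;
  if (isNaN)
    return static_cast<uint16_t>((bits >> 31) ? 0xFFC0 : 0x7FC0);
  // A bias of 0x7FFF plus the lowest kept bit rounds ties to even.
  uint32_t lsb = (bits >> 16) & 1u;
  uint32_t biased = bits + (0x7FFFu + lsb);
  return static_cast<uint16_t>(biased >> 16);
}

// Inputs taken from expand-arith-ops.mlir in this revision.
inline void checkTestVectors() {
  assert(truncOldExpansion(0x3F808000u) == 0x3F81);  // old CHECK: 1.00781
  assert(truncNewExpansion(0x3F808000u) == 0x3F80);  // new CHECK: 1
  assert(truncOldExpansion(0x7F7FFFFFu) == 0x7F7F);  // old CHECK: 3.38953e+38
  assert(truncNewExpansion(0x7F7FFFFFu) == 0x7F80);  // new CHECK: inf
  assert(truncOldExpansion(0xBF808000u) == 0xBF80);  // both CHECKs: -1
  assert(truncNewExpansion(0xBF808000u) == 0xBF80);
}
```

In this model the two expansions disagree on the test inputs in two places: the positive tie 0x3f808000 (the old pattern rounds it up, the new one rounds to even) and the largest finite f32 (the old pattern keeps it finite, the new one produces infinity).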

@rsuderman do you have some info on the original implementation (it was yours right?) and the change in numerics here?

mlir/test/mlir-cpu-runner/expand-arith-ops.mlir
Line 47: 0x7f7fffff is 3.40282347E+38; so the new expansion is less precise than the previous one?

mehdi_amini added inline comments. Aug 26 2023, 3:12 PM

mlir/test/mlir-cpu-runner/expand-arith-ops.mlir
Line 47: Actually it is 3.40282347E+38 in f32. I don't know what's expected for this conversion; does IEEE say anything? LLVM says "Results assume the round-to-nearest rounding mode" in general.

On Discourse, you linked to the LLVM backend lowering, which is basically just a 16-bit right shift: https://github.com/llvm/llvm-project/blob/1783185790de29b24d3850d33d9a9d586e6bbd39/llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp#L3230
Your expansion mimics __truncsfbf2, but we could also modify __truncsfbf2 instead.
The implementation claims it comes from Eigen, but here is Eigen: https://gitlab.com/libeigen/eigen/-/blob/master/Eigen/src/Core/arch/Default/BFloat16.h#L347-359 ; it matches the LLVM backend.
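For comparison, the plain 16-bit right shift described in the comment above can be sketched in C++ as follows. This is a minimal sketch that assumes the lowering does nothing beyond that shift; it has not been checked against the linked LLVM or Eigen sources, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Plain truncation: keep the upper 16 bits of the f32 encoding and drop the
// rest, with no rounding.
inline uint16_t truncByShift(uint32_t f32Bits) {
  return static_cast<uint16_t>(f32Bits >> 16);
}

// Widen a bf16 bit pattern back to float (the inverse shift), which makes the
// resulting value easy to inspect.
inline float bf16BitsToFloat(uint16_t bf16Bits) {
  uint32_t widened = static_cast<uint32_t>(bf16Bits) << 16;
  float value;
  std::memcpy(&value, &widened, sizeof(value));
  return value;
}

inline void checkShiftExamples() {
  // 0x7f7fffff stays finite: 0x7f7f is the largest finite bf16.
  assert(truncByShift(0x7F7FFFFFu) == 0x7F7F);
  // 0x3f808001 drops to exactly 1.0.
  assert(bf16BitsToFloat(truncByShift(0x3F808001u)) == 1.0f);
  // A NaN whose payload lies only in the low 16 bits becomes infinity.
  assert(truncByShift(0x7F800001u) == 0x7F80);
}
```

Under this scheme 0x7f7fffff maps to 0x7f7f, the 3.38953e+38 that the old CHECK line expected, while 0x3f808001 maps to 1.0 rather than the 1.00781 that the updated test expects.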

In D156362#4619838, @mehdi_amini wrote:

> @rsuderman do you have some info on the original implementation (it was yours right?) and the change in numerics here?

The original implementation that landed was a non-rounding truncation; here is the revert of the pattern:
https://github.com/llvm/llvm-project/commit/3bde144de32dc09a0b227f7afcff94f908ac6739

I do not believe the reverter was necessarily right; I believe it just changed their downstream behavior, so they reverted.

mlir/test/mlir-cpu-runner/expand-arith-ops.mlir
Line 47:
> Actually it is 3.40282347E+38 in f32. I don't know what's expected for this conversion; does IEEE say anything? LLVM says "Results assume the round-to-nearest rounding mode" in general.

I think you are right, Mehdi, and this should not change. The new behavior is not quite clamping to the expected behavior.
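On the "does IEEE say anything?" question: for reference, IEEE 754 roundTiesToEven sends any value whose magnitude is at least 2^emax * (2 - 2^-p) to infinity, where p is the precision of the destination format. The C++ sketch below evaluates that threshold under the assumption that bf16 is treated as a binary format with p = 8 and emax = 127; the function names are hypothetical. It only shows what round-to-nearest-even would imply for the largest f32, not which behavior arith.truncf should have.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Smallest magnitude that roundTiesToEven sends to infinity for a binary
// format with the given precision and maximum exponent:
//   2^emax * (2 - 2^-precisionBits)
inline double rneOverflowThreshold(int precisionBits, int emax) {
  return std::ldexp(2.0 - std::ldexp(1.0, -precisionBits), emax);
}

// Assumption: bf16 has 8 bits of precision (7 stored) and emax = 127.
inline bool rneOverflowsBf16(double x) {
  return std::fabs(x) >= rneOverflowThreshold(8, 127);
}

inline void checkOverflowExamples() {
  // The largest finite f32 (0x7f7fffff) lies above the threshold.
  assert(rneOverflowsBf16(FLT_MAX));
  // The largest finite bf16 (0x7f7f), 2^127 * (2 - 2^-7), lies below it.
  double bf16Max = std::ldexp(2.0 - std::ldexp(1.0, -7), 127);
  assert(!rneOverflowsBf16(bf16Max));
}
```

Under these assumptions 3.40282347E+38 is past the overflow threshold, which is consistent with the "inf" that the updated CHECK line expects from the proposed expansion.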

rsuderman requested changes to this revision. Aug 31 2023, 10:09 AM

This revision now requires changes to proceed. Aug 31 2023, 10:09 AM

Does anyone know where that code in ExecutionEngine came from, then? Perhaps that needs to be corrected as well.

Revision Contents

Path                                               Size
mlir/lib/Dialect/Arith/Transforms/ExpandOps.cpp    68 lines
mlir/test/Dialect/Arith/expand-ops.mlir            38 lines
mlir/test/mlir-cpu-runner/expand-arith-ops.mlir    9 lines

Diff 544489

mlir/lib/Dialect/Arith/Transforms/ExpandOps.cpp

LogicalResult matchAndRewrite(arith::TruncFOp op, [...]
Type f32Ty = b.getF32Type();		Type f32Ty = b.getF32Type();
if (auto shapedTy = dyn_cast<ShapedType>(operandTy)) {		if (auto shapedTy = dyn_cast<ShapedType>(operandTy)) {
i1Ty = shapedTy.clone(i1Ty);		i1Ty = shapedTy.clone(i1Ty);
i16Ty = shapedTy.clone(i16Ty);		i16Ty = shapedTy.clone(i16Ty);
i32Ty = shapedTy.clone(i32Ty);		i32Ty = shapedTy.clone(i32Ty);
f32Ty = shapedTy.clone(f32Ty);		f32Ty = shapedTy.clone(f32Ty);
}		}

Value bitcast = b.create<arith::BitcastOp>(i32Ty, operand);		// See also lib/ExecutionEngine/Float16bits.cpp .
		Value c1 = createConst(op.getLoc(), i32Ty, 1, rewriter);
Value c23 = createConst(op.getLoc(), i32Ty, 23, rewriter);		Value c16 = createConst(op.getLoc(), i32Ty, 16, rewriter);
Value c31 = createConst(op.getLoc(), i32Ty, 31, rewriter);		Value c31 = createConst(op.getLoc(), i32Ty, 31, rewriter);
Value c23Mask = createConst(op.getLoc(), i32Ty, (1 << 23) - 1, rewriter);		Value cBias = createConst(op.getLoc(), i32Ty, 0x7fff, rewriter);
Value expMask =		Value qNaN = createConst(op.getLoc(), i16Ty, 0x7FC0, rewriter);
createConst(op.getLoc(), i32Ty, ((1 << 8) - 1) << 23, rewriter);		Value sNaN = createConst(op.getLoc(), i16Ty, 0xFFC0, rewriter);
Value expMax =
createConst(op.getLoc(), i32Ty, ((1 << 8) - 2) << 23, rewriter);

// Grab the sign bit.
Value sign = b.create<arith::ShRUIOp>(bitcast, c31);

// Our mantissa rounding value depends on the sign bit and the last
// truncated bit.
Value cManRound = createConst(op.getLoc(), i32Ty, (1 << 15), rewriter);
cManRound = b.create<arith::SubIOp>(cManRound, sign);

// Grab out the mantissa and directly apply rounding.
Value man = b.create<arith::AndIOp>(bitcast, c23Mask);
Value manRound = b.create<arith::AddIOp>(man, cManRound);

// Grab the overflow bit and shift right if we overflow.
Value roundBit = b.create<arith::ShRUIOp>(manRound, c23);
Value manNew = b.create<arith::ShRUIOp>(manRound, roundBit);

// Grab the exponent and round using the mantissa's carry bit.
Value exp = b.create<arith::AndIOp>(bitcast, expMask);
Value expCarry = b.create<arith::AddIOp>(exp, manRound);
expCarry = b.create<arith::AndIOp>(expCarry, expMask);

// If the exponent is saturated, we keep the max value.
Value expCmp =
b.create<arith::CmpIOp>(arith::CmpIPredicate::uge, exp, expMax);
exp = b.create<arith::SelectOp>(expCmp, exp, expCarry);

// If the exponent is max and we rolled over, keep the old mantissa.
Value roundBitBool = b.create<arith::TruncIOp>(i1Ty, roundBit);
Value keepOldMan = b.create<arith::AndIOp>(expCmp, roundBitBool);
man = b.create<arith::SelectOp>(keepOldMan, man, manNew);

// Assemble the now rounded f32 value (as an i32).
Value rounded = b.create<arith::ShLIOp>(sign, c31);
rounded = b.create<arith::OrIOp>(rounded, exp);
rounded = b.create<arith::OrIOp>(rounded, man);

Value c16 = createConst(op.getLoc(), i32Ty, 16, rewriter);		Value bitcast = b.create<arith::BitcastOp>(i32Ty, operand);
Value shr = b.create<arith::ShRUIOp>(rounded, c16);		Value isNaN =
Value trunc = b.create<arith::TruncIOp>(i16Ty, shr);		b.create<arith::CmpFOp>(arith::CmpFPredicate::UNO, operand, operand);
		Value sign =
		b.create<arith::TruncIOp>(i1Ty, b.create<arith::ShRUIOp>(bitcast, c31));
		Value nanVal = b.create<arith::SelectOp>(sign, sNaN, qNaN);

		Value lsb =
		b.create<arith::AndIOp>(b.create<arith::ShRUIOp>(bitcast, c16), c1);
		Value roundingBias = b.create<arith::AddIOp>(cBias, lsb);
		Value biased = b.create<arith::AddIOp>(bitcast, roundingBias);

		Value shifted = b.create<arith::ShRUIOp>(biased, c16);
		Value truncTypical = b.create<arith::TruncIOp>(i16Ty, shifted);
		Value trunc = b.create<arith::SelectOp>(isNaN, nanVal, truncTypical);
Value result = b.create<arith::BitcastOp>(resultTy, trunc);		Value result = b.create<arith::BitcastOp>(resultTy, trunc);

rewriter.replaceOp(op, result);		rewriter.replaceOp(op, result);
return success();		return success();
}		}
};		};

struct ArithExpandOpsPass		struct ArithExpandOpsPass

mlir/test/Dialect/Arith/expand-ops.mlir

	// -----			// -----

	func.func @truncf_f32(%arg0 : f32) -> bf16 {			func.func @truncf_f32(%arg0 : f32) -> bf16 {
	%0 = arith.truncf %arg0 : f32 to bf16			%0 = arith.truncf %arg0 : f32 to bf16
	return %0 : bf16			return %0 : bf16
	}			}

	// CHECK-LABEL: @truncf_f32			// CHECK-LABEL: @truncf_f32
				// CHECK-DAG: %[[C1:.+]] = arith.constant 1
	// CHECK-DAG: %[[C16:.+]] = arith.constant 16			// CHECK-DAG: %[[C16:.+]] = arith.constant 16
	// CHECK-DAG: %[[C32768:.+]] = arith.constant 32768
	// CHECK-DAG: %[[C2130706432:.+]] = arith.constant 2130706432
	// CHECK-DAG: %[[C2139095040:.+]] = arith.constant 2139095040
	// CHECK-DAG: %[[C8388607:.+]] = arith.constant 8388607
	// CHECK-DAG: %[[C31:.+]] = arith.constant 31			// CHECK-DAG: %[[C31:.+]] = arith.constant 31
	// CHECK-DAG: %[[C23:.+]] = arith.constant 23			// CHECK-DAG: %[[C32767:.+]] = arith.constant 32767
				// CHECK-DAG: %[[C32704:.+]] = arith.constant 32704
				// CHECK-DAG: %[[C_64:.+]] = arith.constant -64
	// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %arg0			// CHECK-DAG: %[[BITCAST:.+]] = arith.bitcast %arg0
				// CHECK-DAG: %[[ISNAN:.+]] = arith.cmpf uno, %arg0, %arg0
	// CHECK-DAG: %[[SIGN:.+]] = arith.shrui %[[BITCAST:.+]], %[[C31]]			// CHECK-DAG: %[[SIGN:.+]] = arith.shrui %[[BITCAST:.+]], %[[C31]]
	// CHECK-DAG: %[[ROUND:.+]] = arith.subi %[[C32768]], %[[SIGN]]			// CHECK-DAG: %[[SIGNBIT:.+]] = arith.trunci %[[SIGN]]
	// CHECK-DAG: %[[MANTISSA:.+]] = arith.andi %[[BITCAST]], %[[C8388607]]			// CHECK-DAG: %[[NANVAL:.+]] = arith.select %[[SIGNBIT]], %[[C_64]], %[[C32704]]
	// CHECK-DAG: %[[ROUNDED:.+]] = arith.addi %[[MANTISSA]], %[[ROUND]]			// CHECK-DAG: %[[LSB_PART:.+]] = arith.shrui %[[BITCAST]], %[[C16]]
	// CHECK-DAG: %[[ROLL:.+]] = arith.shrui %[[ROUNDED]], %[[C23]]			// CHECK-DAG: %[[LSB:.+]] = arith.andi %[[LSB_PART]], %[[C1]]
	// CHECK-DAG: %[[SHR:.+]] = arith.shrui %[[ROUNDED]], %[[ROLL]]			// CHECK-DAG: %[[BIAS:.+]] = arith.addi %[[C32767]], %[[LSB]]
	// CHECK-DAG: %[[EXP:.+]] = arith.andi %0, %[[C2139095040]]			// CHECK-DAG: %[[BIASED:.+]] = arith.addi %[[BITCAST]], %[[BIAS]]
	// CHECK-DAG: %[[EXPROUND:.+]] = arith.addi %[[EXP]], %[[ROUNDED]]			// CHECK-DAG: %[[SHIFT:.+]] = arith.shrui %[[BIASED]], %[[C16]]
	// CHECK-DAG: %[[EXPROLL:.+]] = arith.andi %[[EXPROUND]], %[[C2139095040]]			// CHECK-DAG: %[[TRUNCTYPICAL:.+]] = arith.trunci %[[SHIFT]]
	// CHECK-DAG: %[[EXPMAX:.+]] = arith.cmpi uge, %[[EXP]], %[[C2130706432]]			// CHECK-DAG: %[[TRUNC:.+]] = arith.select %[[ISNAN]], %[[NANVAL]], %[[TRUNCTYPICAL]]
	// CHECK-DAG: %[[EXPNEW:.+]] = arith.select %[[EXPMAX]], %[[EXP]], %[[EXPROLL]]
	// CHECK-DAG: %[[OVERFLOW_B:.+]] = arith.trunci %[[ROLL]]
	// CHECK-DAG: %[[KEEP_MAN:.+]] = arith.andi %[[EXPMAX]], %[[OVERFLOW_B]]
	// CHECK-DAG: %[[MANNEW:.+]] = arith.select %[[KEEP_MAN]], %[[MANTISSA]], %[[SHR]]
	// CHECK-DAG: %[[NEWSIGN:.+]] = arith.shli %[[SIGN]], %[[C31]]
	// CHECK-DAG: %[[WITHEXP:.+]] = arith.ori %[[NEWSIGN]], %[[EXPNEW]]
	// CHECK-DAG: %[[WITHMAN:.+]] = arith.ori %[[WITHEXP]], %[[MANNEW]]
	// CHECK-DAG: %[[SHIFT:.+]] = arith.shrui %[[WITHMAN]], %[[C16]]
	// CHECK-DAG: %[[TRUNC:.+]] = arith.trunci %[[SHIFT]]
	// CHECK-DAG: %[[RES:.+]] = arith.bitcast %[[TRUNC]]			// CHECK-DAG: %[[RES:.+]] = arith.bitcast %[[TRUNC]]
	// CHECK: return %[[RES]]			// CHECK: return %[[RES]]

	// -----			// -----

	func.func @truncf_vector_f32(%arg0 : vector<4xf32>) -> vector<4xbf16> {			func.func @truncf_vector_f32(%arg0 : vector<4xf32>) -> vector<4xbf16> {
	%0 = arith.truncf %arg0 : vector<4xf32> to vector<4xbf16>			%0 = arith.truncf %arg0 : vector<4xf32> to vector<4xbf16>
	return %0 : vector<4xbf16>			return %0 : vector<4xbf16>
	}			}

	// CHECK-LABEL: @truncf_vector_f32			// CHECK-LABEL: @truncf_vector_f32
	// CHECK-NOT: arith.truncf			// CHECK-NOT: arith.truncf

mlir/test/mlir-cpu-runner/expand-arith-ops.mlir

	// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(arith-expand{include-bf16=true},convert-arith-to-llvm),convert-vector-to-llvm,convert-func-to-llvm,reconcile-unrealized-casts)" \			// RUN: mlir-opt %s -pass-pipeline="builtin.module(func.func(arith-expand{include-bf16=true},convert-arith-to-llvm),convert-vector-to-llvm,convert-func-to-llvm,reconcile-unrealized-casts)" \
	// RUN: | mlir-cpu-runner \			// RUN: | mlir-cpu-runner \
	// RUN: -e main -entry-point-result=void -O0 \			// RUN: -e main -entry-point-result=void -O0 \
	// RUN: -shared-libs=%mlir_c_runner_utils \			// RUN: -shared-libs=%mlir_c_runner_utils \
	// RUN: -shared-libs=%mlir_runner_utils \			// RUN: -shared-libs=%mlir_runner_utils \
	// RUN: | FileCheck %s			// RUN: | FileCheck %s

	func.func @trunc_bf16(%a : f32) {			func.func @trunc_bf16(%a : f32) {
	%b = arith.truncf %a : f32 to bf16			%b = arith.truncf %a : f32 to bf16
	%c = arith.extf %b : bf16 to f32			%c = arith.extf %b : bf16 to f32
	vector.print %c : f32			vector.print %c : f32
	return			return
	}			}

	func.func @main() {			func.func @main() {
	// CHECK: 1.00781			// CHECK: 1.00781
				%noRoundOneI = arith.constant 0x3f808001 : i32
				%noRoundOneF = arith.bitcast %noRoundOneI : i32 to f32
				call @trunc_bf16(%noRoundOneF): (f32) -> ()

				// CHECK: 1
	%roundOneI = arith.constant 0x3f808000 : i32			%roundOneI = arith.constant 0x3f808000 : i32
	%roundOneF = arith.bitcast %roundOneI : i32 to f32			%roundOneF = arith.bitcast %roundOneI : i32 to f32
	call @trunc_bf16(%roundOneF): (f32) -> ()			call @trunc_bf16(%roundOneF): (f32) -> ()

	// CHECK-NEXT: -1			// CHECK-NEXT: -1
	%noRoundNegOneI = arith.constant 0xbf808000 : i32			%noRoundNegOneI = arith.constant 0xbf808000 : i32
	%noRoundNegOneF = arith.bitcast %noRoundNegOneI : i32 to f32			%noRoundNegOneF = arith.bitcast %noRoundNegOneI : i32 to f32
	call @trunc_bf16(%noRoundNegOneF): (f32) -> ()			call @trunc_bf16(%noRoundNegOneF): (f32) -> ()

	// CHECK-NEXT: -1.00781			// CHECK-NEXT: -1.00781
	%roundNegOneI = arith.constant 0xbf808001 : i32			%roundNegOneI = arith.constant 0xbf808001 : i32
	%roundNegOneF = arith.bitcast %roundNegOneI : i32 to f32			%roundNegOneF = arith.bitcast %roundNegOneI : i32 to f32
	call @trunc_bf16(%roundNegOneF): (f32) -> ()			call @trunc_bf16(%roundNegOneF): (f32) -> ()

	// CHECK-NEXT: inf			// CHECK-NEXT: inf
	%infi = arith.constant 0x7f800000 : i32			%infi = arith.constant 0x7f800000 : i32
	%inff = arith.bitcast %infi : i32 to f32			%inff = arith.bitcast %infi : i32 to f32
	call @trunc_bf16(%inff): (f32) -> ()			call @trunc_bf16(%inff): (f32) -> ()

	// CHECK-NEXT: -inf			// CHECK-NEXT: -inf
	%neginfi = arith.constant 0xff800000 : i32			%neginfi = arith.constant 0xff800000 : i32
	%neginff = arith.bitcast %neginfi : i32 to f32			%neginff = arith.bitcast %neginfi : i32 to f32
	call @trunc_bf16(%neginff): (f32) -> ()			call @trunc_bf16(%neginff): (f32) -> ()

	// CHECK-NEXT: 3.38953e+38			// CHECK-NEXT: inf
	%bigi = arith.constant 0x7f7fffff : i32			%bigi = arith.constant 0x7f7fffff : i32
	%bigf = arith.bitcast %bigi : i32 to f32			%bigf = arith.bitcast %bigi : i32 to f32
	call @trunc_bf16(%bigf): (f32) -> ()			call @trunc_bf16(%bigf): (f32) -> ()

	// CHECK-NEXT: -3.38953e+38			// CHECK-NEXT: -inf
	%negbigi = arith.constant 0xff7fffff : i32			%negbigi = arith.constant 0xff7fffff : i32
	%negbigf = arith.bitcast %negbigi : i32 to f32			%negbigf = arith.bitcast %negbigi : i32 to f32
	call @trunc_bf16(%negbigf): (f32) -> ()			call @trunc_bf16(%negbigf): (f32) -> ()

	// CHECK-NEXT: 1.625			// CHECK-NEXT: 1.625
	%exprolli = arith.constant 0x3fcfffff : i32			%exprolli = arith.constant 0x3fcfffff : i32
	%exprollf = arith.bitcast %exprolli : i32 to f32			%exprollf = arith.bitcast %exprolli : i32 to f32
	call @trunc_bf16(%exprollf): (f32) -> ()			call @trunc_bf16(%exprollf): (f32) -> ()

	// CHECK-NEXT: -1.625			// CHECK-NEXT: -1.625
	%exprollnegi = arith.constant 0xbfcfffff : i32			%exprollnegi = arith.constant 0xbfcfffff : i32
	%exprollnegf = arith.bitcast %exprollnegi : i32 to f32			%exprollnegf = arith.bitcast %exprollnegi : i32 to f32
	call @trunc_bf16(%exprollnegf): (f32) -> ()			call @trunc_bf16(%exprollnegf): (f32) -> ()

	return			return
	}			}