This is an archive of the discontinued LLVM Phabricator instance.

Differential D158200

[flang] Fixed simplification for FP maxval.
Closed, Public

Authored by vzakhari on Aug 17 2023, 11:25 AM.

Details

Reviewers

Leporacanthicus
DavidTruby
kiranchandramohan

Commits

rG89b98c13e023: [flang] Fixed simplification for FP maxval.

Summary

On x86, a simplified F128 maxval ends up calling fmaxl, which does not
work properly for F128 arguments. It is probably an LLVM issue, but
we also should not use arith.maxf if NaN or -0.0 operands are possible.
The change is to use cmpf and select instead. Unfortunately, these arith ops
do not currently support FastMathFlags, so I will have to fix this
sooner or later (depending on how this affects performance).
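
To make the NaN and -0.0 concern in the summary concrete, here is a minimal scalar sketch in C++. It is only a model, not code from the patch: the names maxnum_model and select_max are hypothetical, std::fmax stands in for the fmax-style semantics that the patch comment attributes to llvm.intr.maxnum, and the ternary stands in for arith.cmpf ogt followed by arith.select.

```cpp
#include <cmath>
#include <limits>

// Hypothetical model of the old lowering (arith.maxf -> llvm.maxnum).
// std::fmax returns the non-NaN operand when exactly one operand is NaN,
// and the sign of the result for (+0.0, -0.0) is not guaranteed.
inline double maxnum_model(double a, double b) { return std::fmax(a, b); }

// Hypothetical model of the new lowering (arith.cmpf ogt + arith.select).
// An ordered "greater than" is false whenever either operand is NaN, so
// the select falls through to the second operand in that case.
inline double select_max(double a, double b) { return (a > b) ? a : b; }
```

Under this model the result of select_max depends on operand order when a NaN is involved: a NaN in the first position is dropped, while a NaN in the second position is returned. For the pair (+0.0, -0.0) the comparison is false, so the second operand is returned with its sign intact.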

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vzakhari created this revision. Aug 17 2023, 11:25 AM

Herald added a project: Restricted Project. Aug 17 2023, 11:25 AM

Herald added subscribers: mehdi_amini, jdoerfert, pengfei.

vzakhari requested review of this revision. Aug 17 2023, 11:25 AM

Just for information: I could not make clang generate a call to fmaxl. I tried the following test:

#include <stdio.h>

void test(__float128 *x) {
  int i;
  __float128 max = -1000.0;
  for (i = 0; i < 4; ++i) {
    max = (x[i] > max) ? x[i] : max;
  }
  printf("%e\n", (double)max);
}

int main() {
  __float128 c[4] = {1.0, 1.5, 2.0, 0.5};
  test(c);
}

With clang -fno-signed-zeros -fno-honor-nans maxval.c -O2, InstCombine recognizes the select and turns it into call nnan nsz fp128 @llvm.maxnum.f128, but instruction selection then takes a different route and lowers the call back into a select, using __gttf2 for the FP128 comparison:

CALL64pcrel32 target-flags(x86-plt) &__gttf2, <regmask $bh $bl $bp $bph $bpl $bx $ebp $ebx $hbp $hbx $rbp $rbx $r12 $r13 $r14 $r15 $r12b $r13b $r14b $r15b $r12bh $r13bh $r14bh $r15bh $r12d $r13d $r14d $r15d $r12w $r13w $r14w $r15w $r12wh and 3 more...>, implicit $rsp, implicit $ssp, implicit $xmm0, implicit $xmm1, implicit-def $rsp, implicit-def $ssp, implicit-def $eax
%6:gr32 = COPY $eax
TEST32rr %6:gr32, %6:gr32, implicit-def $eflags
%7:vr128 = CMOV_VR128 %5:vr128, %1:vr128, 15, implicit $eflags

I think clang can never generate llvm.maxnum without nnan nsz, but if that does happen, isel will produce an fmaxl call.
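
The fmaxl concern comes down to whether the target's long double uses the same format as F128. A small C++ sketch of that check follows. The helper name long_double_is_binary128 and the constant kBinary128Digits are hypothetical, and comparing the significand width is only a rough proxy for full format compatibility.

```cpp
#include <limits>

// IEEE binary128 has a 113-bit significand, counting the implicit bit.
constexpr int kBinary128Digits = 113;

// Hypothetical helper: true when the host's long double has the binary128
// significand width, i.e. when handing an F128 value to a long double libm
// routine such as fmaxl is at least plausible format-wise.
constexpr bool long_double_is_binary128() {
  return std::numeric_limits<long double>::radix == 2 &&
         std::numeric_limits<long double>::digits == kBinary128Digits;
}
```

On targets where long double is the 80-bit x87 extended format, the significand has 64 bits, so this helper returns false there, which matches the situation described in the summary for x86.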

Harbormaster completed remote builds in B253260: Diff 551199. Aug 17 2023, 12:21 PM

vzakhari added a reviewer: kiranchandramohan. Aug 21 2023, 12:09 PM

Looks OK.

FYI, there are some changes going on in the arith.maxf area.
https://reviews.llvm.org/D137786
https://reviews.llvm.org/D137655

This revision is now accepted and ready to land. Aug 21 2023, 1:27 PM

In D158200#4604640, @kiranchandramohan wrote:

Looks OK.

FYI, there are some changes going on in the arith.maxf area.
https://reviews.llvm.org/D137786
https://reviews.llvm.org/D137655

Thank you for the links, Kiran! I did not see this.
I think it should be okay to use the select in the meantime.

Closed by commit rG89b98c13e023: [flang] Fixed simplification for FP maxval. (authored by vzakhari). Aug 21 2023, 7:34 PM

This revision was automatically updated to reflect the committed changes.

vzakhari added a commit: rG89b98c13e023: [flang] Fixed simplification for FP maxval.

Revision Contents

Path                                                     Size
flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp    14 lines
flang/test/Transforms/simplifyintrinsics.fir             3 lines

Diff 552190

flang/lib/Optimizer/Transforms/SimplifyIntrinsics.cpp

(611 lines not shown; context: auto init = [](fir::FirOpBuilder builder, mlir::Location loc,)
     unsigned bits = elementType.getIntOrFloatBitWidth();
     int64_t minInt = llvm::APInt::getSignedMinValue(bits).getSExtValue();
     return builder.createIntegerConstant(loc, elementType, minInt);
   };

   auto genBodyOp = [](fir::FirOpBuilder builder, mlir::Location loc,
                       mlir::Type elementType, mlir::Value elem1,
                       mlir::Value elem2) -> mlir::Value {
-    if (elementType.isa<mlir::FloatType>())
-      return builder.create<mlir::arith::MaxFOp>(loc, elem1, elem2);
+    if (elementType.isa<mlir::FloatType>()) {
+      // arith.maxf later converted to llvm.intr.maxnum does not work
+      // correctly for NaNs and -0.0 (see maxnum/minnum pattern matching
+      // in LLVM's InstCombine pass). Moreover, llvm.intr.maxnum
+      // for F128 operands is lowered into fmaxl call by LLVM.
+      // This libm function may not work properly for F128 arguments
+      // on targets where long double is not F128. It is an LLVM issue,
+      // but we just use normal select here to resolve all the cases.
+      auto compare = builder.create<mlir::arith::CmpFOp>(
+          loc, mlir::arith::CmpFPredicate::OGT, elem1, elem2);
+      return builder.create<mlir::arith::SelectOp>(loc, compare, elem1, elem2);
+    }
     if (elementType.isa<mlir::IntegerType>())
       return builder.create<mlir::arith::MaxSIOp>(loc, elem1, elem2);

     llvm_unreachable("unsupported type");
     return {};
   };

   mlir::Location loc = mlir::UnknownLoc::get(builder.getContext());
(747 lines not shown)
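
For readers who want to see what the compare-and-select body does over a whole array, here is a scalar C++ sketch of the reduction for f64. It is an illustration under assumptions rather than the generated code: maxval_model is a hypothetical name, the std::vector input stands in for the Fortran array, and the initial value mirrors the NEG_DBL_MAX constant that the f64 test checks for.

```cpp
#include <limits>
#include <vector>

// Hypothetical scalar model of the simplified MAXVAL loop for f64.
double maxval_model(const std::vector<double> &values) {
  // Start from the most negative finite double, like NEG_DBL_MAX in the test.
  double max = std::numeric_limits<double>::lowest();
  for (double item : values) {
    // Models arith.cmpf ogt, item, max: false if either operand is NaN.
    const bool greater = item > max;
    // Models arith.select greater, item, max.
    max = greater ? item : max;
  }
  return max;
}
```

With the array from the C test in this review, {1.0, 1.5, 2.0, 0.5}, the model returns 2.0. Because the element is the first operand of the comparison and the accumulator is the second, a NaN element never replaces the accumulator in this model, and an empty input returns the initial value.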

flang/test/Transforms/simplifyintrinsics.fir

(893 lines not shown)
 // CHECK: %[[NEG_DBL_MAX:.*]] = arith.constant -1.7976931348623157E+308 : f64
 // CHECK: %[[CINDEX_1:.*]] = arith.constant 1 : index
 // CHECK: %[[DIMIDX_0:.*]] = arith.constant 0 : index
 // CHECK: %[[DIMS:.*]]:3 = fir.box_dims %[[ARR_BOX_F64]], %[[DIMIDX_0]] : (!fir.box<!fir.array<?xf64>>, index) -> (index, index, index)
 // CHECK: %[[EXTENT:.*]] = arith.subi %[[DIMS]]#1, %[[CINDEX_1]] : index
 // CHECK: %[[RES:.*]] = fir.do_loop %[[ITER:.*]] = %[[CINDEX_0]] to %[[EXTENT]] step %[[CINDEX_1]] iter_args(%[[MAX]] = %[[NEG_DBL_MAX]]) -> (f64) {
 // CHECK: %[[ITEM:.*]] = fir.coordinate_of %[[ARR_BOX_F64]], %[[ITER]] : (!fir.box<!fir.array<?xf64>>, index) -> !fir.ref<f64>
 // CHECK: %[[ITEM_VAL:.*]] = fir.load %[[ITEM]] : !fir.ref<f64>
-// CHECK: %[[NEW_MAX:.*]] = arith.maxf %[[ITEM_VAL]], %[[MAX]] : f64
+// CHECK: %[[CMP:.*]] = arith.cmpf ogt, %[[ITEM_VAL]], %[[MAX]] : f64
+// CHECK: %[[NEW_MAX:.*]] = arith.select %[[CMP]], %[[ITEM_VAL]], %[[MAX]] : f64
 // CHECK: fir.result %[[NEW_MAX]] : f64
 // CHECK: }
 // CHECK: return %[[RES]] : f64
 // CHECK: }

 // -----

 // SUM reduction of sliced explicit-shape array is replaced with
(1,439 lines not shown)