This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add combine for fcmp sqrt(x),C --> fcmp x,C*C
Abandoned · Public

Authored by bsmith on May 23 2022, 3:37 AM.

Details

Reviewers

paulwalker-arm
peterwaller-arm
sdesmalen
spatel

Summary

Co-Authored-by: Paul Walker <paul.walker@arm.com>

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,120 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,140 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	60,070 ms	x64 debian > MLIR.Examples/standalone::test.toy
	60,030 ms	x64 debian > libomp.worksharing/single::omp_single.c

Event Timeline

bsmith created this revision. May 23 2022, 3:37 AM

Herald added a project: Restricted Project. May 23 2022, 3:37 AM

Herald added subscribers: hiraditya, kristof.beyls.

bsmith requested review of this revision. May 23 2022, 3:37 AM

Herald added a project: Restricted Project. May 23 2022, 3:37 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B165801: Diff 431321. May 23 2022, 4:19 AM

bsmith retitled this revision from [AArch64][InstCombine] Add combine for fcmp sqrt(x),C --> fcmp x,C*C to [InstCombine] Add combine for fcmp sqrt(x),C --> fcmp x,C*C. May 23 2022, 5:13 AM

Be stricter around fast math flags and require 'fast' not just 'nnan'.
Propagate fast math flags through combine.

xbolva00 added a comment.

Is this “safe” even with fast math for all predicates?

nikic added a reviewer: spatel. May 23 2022, 8:32 AM

Harbormaster completed remote builds in B165843: Diff 431375. May 23 2022, 8:34 AM

spatel mentioned this in rGe8c20d995bed: [IR] add and use pattern match specialization for sqrt intrinsic; NFC. May 23 2022, 11:16 AM

spatel added inline comments. May 23 2022, 11:28 AM

llvm/include/llvm/IR/PatternMatch.h
2204 ↗	(On Diff #431375)	This is useful already, so I added it as a preliminary step: e8c20d995bed
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
6781	We're (very slowly) moving away from using FMF with fcmp because fcmp doesn't usually have a logical relationship to the FMF. What really matters for this fold is that we can't require precise results from sqrt, so we should check that the sqrt has "reassoc" or "afn".
llvm/test/Transforms/InstCombine/fcmp.ll
1234	Can we handle this kind of example as a preliminary patch? There's no fast-math required if we are checking if the result of sqrt is or is not negative (but we should test several predicates to verify that we have the correct behavior with NAN): https://alive2.llvm.org/ce/z/EeYGLt https://alive2.llvm.org/ce/z/X7qsg2
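
The sign-only comparisons mentioned in this comment can be illustrated outside LLVM. The following is a minimal C++ sketch, not a reproduction of the linked Alive2 proofs: the helper names are hypothetical, the three candidate rewrites are chosen for illustration only, and the check runs over a handful of edge-case inputs rather than proving anything. It assumes IEEE-754 doubles and default (non-fast-math) compilation.

```cpp
#include <cmath>
#include <limits>

// Ordered comparisons (olt, oge) are false when either operand is NaN.
inline bool sqrtOlt0(double x) { return std::sqrt(x) < 0.0; }
inline bool sqrtOge0(double x) { return std::sqrt(x) >= 0.0; }
inline bool xOge0(double x) { return x >= 0.0; }

// Unordered less-than (ult) is true when either operand is NaN, so it is
// the logical negation of oge.
inline bool sqrtUlt0(double x) { return !(std::sqrt(x) >= 0.0); }
inline bool xUlt0(double x) { return !(x >= 0.0); }

// Checks three illustrative rewrites on a few edge-case inputs.
// Returns true if every rewrite agrees with the original on every sample.
inline bool signFoldsAgreeOnSamples() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double samples[] = {-inf, -1.0, -0.0, 0.0, 1.0, inf, nan};
  for (double x : samples) {
    if (sqrtOlt0(x)) return false;              // olt sqrt(x), 0 --> false
    if (sqrtOge0(x) != xOge0(x)) return false;  // oge sqrt(x), 0 --> oge x, 0
    if (sqrtUlt0(x) != xUlt0(x)) return false;  // ult sqrt(x), 0 --> ult x, 0
  }
  return true;
}
```

The samples include -0.0 and NaN because those are the inputs where ordered and unordered predicates are most likely to diverge; a real patch would still need a proof over all inputs for each predicate.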

spatel added inline comments. May 23 2022, 11:52 AM

llvm/test/Transforms/InstCombine/fcmp.ll
1234	Note that we should already reduce some compares via: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Analysis/InstructionSimplify.cpp#L3995

xbolva00 added a comment.

It may also be worth checking GCC's issues with this transformation:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91734

Rebase.
Move fast math flag requirements from fcmp to sqrt.
Don't require 'fast' flag on sqrt, only 'nnan && (reassoc || afn)'

In D126190#3531596, @xbolva00 wrote:

Is this “safe” even with fast math for all predicates?

With the fast math flags requirement moved to the sqrt itself, do you still see a problem here?

llvm/test/Transforms/InstCombine/fcmp.ll
1234	That optimization is orthogonal to what we are doing here. Additionally, I don't think I have the floating-point expertise to get the NaN cases right in such a change.

Harbormaster completed remote builds in B166036: Diff 431649. May 24 2022, 6:11 AM

spatel added inline comments. May 27 2022, 10:44 AM

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
6785	What happens if C*C overflows or underflows? Unless there's some reason to diverge from the behavior in the gcc bug link (see earlier comment from @xbolva00 ), we should try to have the same rules for when this transform is allowed.
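
To make the overflow and underflow concern concrete, here is a small standalone C++ sketch. It does not use LLVM and does not encode GCC's rules; the function names are hypothetical, and it assumes IEEE-754 doubles with default rounding. It compares the original form `sqrt(x) pred C` with the folded form `x pred C*C` for a large and a tiny constant.

```cpp
#include <cassert>
#include <cmath>

// Original form of the comparison: sqrt(x) > c.
inline bool sqrtGreater(double x, double c) { return std::sqrt(x) > c; }

// Folded form: x > c*c. The square is computed in double precision, so it
// rounds to +infinity when c is large and to zero when c is tiny.
inline bool foldedGreater(double x, double c) {
  assert(!std::signbit(c) && "fold is only attempted for non-negative c");
  return x > c * c;
}

// The same pair for the less-than predicate.
inline bool sqrtLess(double x, double c) { return std::sqrt(x) < c; }

inline bool foldedLess(double x, double c) {
  assert(!std::signbit(c) && "fold is only attempted for non-negative c");
  return x < c * c;
}
```

With `c = 1e160` the square overflows to infinity, so for `x = +inf` the original comparison is true while the folded one is false. With `c = 1e-200` the square underflows to zero, so for `x = 0.0` the original less-than comparison is true while the folded one is false. For an ordinary constant such as `2.0` and inputs well away from the threshold, the two forms agree.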

I think I'd prefer to start with the cases that don't require fast-math flags, then expand the transform as needed to include the cases that do require them.

For example, Alive2 says the following is correct without any fast-math flags:

declare half @llvm.sqrt.f16(half)
define i1 @src(half %x) {
  %sqrt = call half @llvm.sqrt.f16(half %x)
  %cmp = fcmp ogt half %sqrt, 2.0
  ret i1 %cmp
}
define i1 @tgt(half %x) {
  %r = fcmp ogt half %x, 4.00390625
  ret i1 %r
}
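
The constant 4.00390625 in this example is the next half-precision value above 4.0 (4 + 2^-8), not C*C itself. The same effect can be sketched in standard C++ using `float`, since a portable half type is not assumed here. The helper names are hypothetical, and the sketch assumes a correctly rounded IEEE-754 `sqrt` with no excess precision and no fast-math options.

```cpp
#include <cmath>
#include <limits>

// Next representable float above v.
inline float nextAbove(float v) {
  return std::nextafter(v, std::numeric_limits<float>::infinity());
}

// Original comparison in single precision: sqrt(x) > 2.0f.
inline bool srcCompare(float x) { return std::sqrt(x) > 2.0f; }

// Naive fold: compare against C*C == 4.0f.
inline bool naiveFold(float x) { return x > 4.0f; }

// Adjusted fold: compare against the next float above 4.0f, mirroring the
// adjusted constant used in the half-precision example.
inline bool adjustedFold(float x) { return x > nextAbove(4.0f); }
```

For `x = nextAbove(4.0f)` the exact square root lies just below the midpoint between 2.0f and the next float, so it rounds to 2.0f and `srcCompare` is false, while `naiveFold` is true. `adjustedFold` matches `srcCompare` on that input and on its neighbours, which is why a fold without fast-math flags has to adjust the constant rather than simply square it.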

bsmith added a comment.

This has turned out to be far more complex to do safely than originally thought. Given that I don't yet have a compelling example where this provides a significant benefit, I will abandon this for now.

In D126190#3563544, @bsmith wrote:

This has turned out to be far more complex to do safely than originally thought. Given that I don't yet have a compelling example where this provides a significant benefit, I will abandon this for now.

No problem - I filed a bug and linked it to this review, so someone can revisit:
https://github.com/llvm/llvm-project/issues/55918

Revision Contents

Path	Size
llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp	13 lines
llvm/test/Transforms/InstCombine/fcmp.ll	58 lines

Diff 431649

llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp

(6,769 lines not shown)

   if (match(Op0, m_FNeg(m_Value(X)))) {
     // fcmp pred (fneg X), C --> fcmp swap(pred) X, -C
     Constant *C;
     if (match(Op1, m_Constant(C))) {
       Constant *NegC = ConstantExpr::getFNeg(C);
       return new FCmpInst(I.getSwappedPredicate(), X, NegC, "", &I);
     }
   }

+  if (match(Op0, m_Sqrt(m_Value(X)))) {
+    // fcmp sqrt(x),C -> fcmp x,C*C
+    const APFloat *CF;
+    Instruction *Sqrt = cast<Instruction>(Op0);
+    if (match(Op1, m_APFloat(CF)) && !CF->isNegative() && Sqrt->hasNoNaNs() &&
+        (Sqrt->hasAllowReassoc() || Sqrt->hasApproxFunc())) {
+      Constant *C = ConstantFP::get(X->getType(), *CF);
+      Instruction *FCmp = new FCmpInst(Pred, X, ConstantExpr::getFMul(C, C));
+      FCmp->setFastMathFlags(I.getFastMathFlags());
+      return FCmp;
+    }
+  }
+
   if (match(Op0, m_FPExt(m_Value(X)))) {
     // fcmp (fpext X), (fpext Y) -> fcmp X, Y
     if (match(Op1, m_FPExt(m_Value(Y))) && X->getType() == Y->getType())
       return new FCmpInst(Pred, X, Y, "", &I);

     const APFloat *C;
     if (match(Op1, m_APFloat(C))) {
       const fltSemantics &FPSem =

(68 lines not shown)

llvm/test/Transforms/InstCombine/fcmp.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
 ; RUN: opt -S -passes=instcombine < %s | FileCheck %s

 declare half @llvm.fabs.f16(half)
 declare double @llvm.fabs.f64(double)
+declare double @llvm.sqrt.f64(double)
 declare <2 x float> @llvm.fabs.v2f32(<2 x float>)
 declare double @llvm.copysign.f64(double, double)
 declare <2 x double> @llvm.copysign.v2f64(<2 x double>, <2 x double>)

 define i1 @fpext_fpext(float %x, float %y) {
 ; CHECK-LABEL: @fpext_fpext(
 ; CHECK-NEXT:    [[CMP:%.*]] = fcmp nnan ogt float [[X:%.*]], [[Y:%.*]]
 ; CHECK-NEXT:    ret i1 [[CMP]]

(1,191 lines not shown)

 ; CHECK-NEXT:    [[CMP:%.*]] = fcmp ninf une float [[A]], 0.000000e+00
 ; CHECK-NEXT:    ret i1 [[CMP]]
 ;
   %a = fadd float %p, %p ; thwart complexity-based canonicalization
   %fneg = fneg float %a
   %cmp = fcmp ninf une float %a, %fneg
   ret i1 %cmp
 }

+; fcmp sqrt(X),C --> fcmp X,C*C - when using afn flag
+define i1 @fcmp_fsqrt_test1(double %v) {
+; CHECK-LABEL: @fcmp_fsqrt_test1(
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp ogt double [[V:%.*]], 4.000000e+00
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sqrt = call nnan afn double @llvm.sqrt.f64(double %v)
+  %cmp = fcmp ogt double %sqrt, 2.000000e+00
+  ret i1 %cmp
+}
+
+; fcmp sqrt(X),C --> fcmp X,C*C - when using reassoc flag
+define i1 @fcmp_fsqrt_test2(double %v) {
+; CHECK-LABEL: @fcmp_fsqrt_test2(
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp ogt double [[V:%.*]], 4.000000e+00
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sqrt = call nnan reassoc double @llvm.sqrt.f64(double %v)
+  %cmp = fcmp ogt double %sqrt, 2.000000e+00
+  ret i1 %cmp
+}
+
+; fcmp sqrt(X),C --> fcmp X,C*C - fcmp flags are preserved
+define i1 @fcmp_fsqrt_test3(double %v) {
+; CHECK-LABEL: @fcmp_fsqrt_test3(
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp fast ogt double [[V:%.*]], 4.000000e+00
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sqrt = call nnan reassoc double @llvm.sqrt.f64(double %v)
+  %cmp = fcmp fast ogt double %sqrt, 2.000000e+00
+  ret i1 %cmp
+}
+
+; ensure we preserve sqrts when compared against negative numbers.
+define i1 @fcmp_fsqrt_test4(double %v) {
+; CHECK-LABEL: @fcmp_fsqrt_test4(
+; CHECK-NEXT:    [[SQRT:%.*]] = call nnan afn double @llvm.sqrt.f64(double [[V:%.*]])
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp ogt double [[SQRT]], -2.000000e+00
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sqrt = call nnan afn double @llvm.sqrt.f64(double %v)
+  %cmp = fcmp ogt double %sqrt, -2.000000e+00
+  ret i1 %cmp
+}
+
+; ensure we maintain sqrts when preserving NaNs.
+define i1 @fcmp_fsqrt_test5(double %v) {
+; CHECK-LABEL: @fcmp_fsqrt_test5(
+; CHECK-NEXT:    [[SQRT:%.*]] = call reassoc double @llvm.sqrt.f64(double [[V:%.*]])
+; CHECK-NEXT:    [[CMP:%.*]] = fcmp ogt double [[SQRT]], 2.000000e+00
+; CHECK-NEXT:    ret i1 [[CMP]]
+;
+  %sqrt = call reassoc double @llvm.sqrt.f64(double %v)
+  %cmp = fcmp ogt double %sqrt, 2.000000e+00
+  ret i1 %cmp
+}