This is an archive of the discontinued LLVM Phabricator instance.

InstCombine optimization to convert floating-point sign bit XORs to fsubs from -0.0
AbandonedPublic

Authored by aarzee on Apr 7 2016, 3:42 PM.

Details

Reviewers

majnemer
scanon
escha

Summary

Discussion is inconclusive on the mailing list (http://lists.llvm.org/pipermail/llvm-dev/2016-April/098098.html) and IRC on whether this optimization is correct on platforms that do not follow IEEE-754.

Event Timeline

aarzee updated this revision to Diff 52962. Apr 7 2016, 3:42 PM

aarzee retitled this revision to "InstCombine optimization to convert floating-point sign bit XORs to fsubs from -0.0".

aarzee updated this object.

aarzee added reviewers: escha, spatel.

Added tests.

aarzee added a reviewer: majnemer. Apr 8 2016, 8:17 AM

Leaving aside the correctness question (adding Steve just in case), is it always profitable?

Profitable, or "more canonical" (i.e., is this the right place for it, compared to some later lowering pass)?

There are a few niche floating-point formats that use a representation of the form [ exponent | two's complement significand ], so the sign bit ends up sitting in the middle of the number. Does LLVM claim to support arbitrary oddball formats for float/double? Ideally we would specify that LLVM assumes IEEE formats, even if we don't require fully conformant operations in hardware; that would give us some flexibility.

From a profitability standpoint, we'll end up undoing this in the backend on many platforms, but as a canonicalization it seems OK. We should really add an actual fneg at some point, though.
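To make the format question concrete, here is a minimal C++ sketch of why the integer and FP forms agree when the sign bit is the top bit. It assumes `float` is IEEE-754 binary32, which the `static_assert` checks. The helper names are made up for illustration and are not part of the patch. NaN inputs are left out on purpose, since arithmetic on a NaN is not required to flip its sign.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is IEEE-754 binary32, so the sign bit is bit 31.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes IEEE-754 binary32 float");

// Reinterpret a float as its 32-bit pattern (the C++ analogue of an IR bitcast).
std::uint32_t float_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}

// Reinterpret a 32-bit pattern as a float.
float bits_float(std::uint32_t u) {
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Negation as in the integer IR: XOR the top bit of the bit pattern.
float fneg_via_int(float x) { return bits_float(float_bits(x) ^ 0x80000000u); }

// Negation as in the FP IR: subtract from -0.0.
float fneg_via_fp(float x) { return -0.0f - x; }
```

On a format that keeps the sign bit elsewhere, `fneg_via_int` would flip the wrong bit while `fneg_via_fp` would still negate, which is the concern raised above.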

In D18874#395527, @scanon wrote:

> Does LLVM claim to support arbitrary oddball formats for float/double? Ideally we would specify that LLVM assumes IEEE formats, even if we don't require fully conformant operations in hardware; that would give us some flexibility.

AFAICT, the LangRef is silent about this. In practice in the optimizer, I think we've baked in enough IEEE format assumptions that undoing it would take great suffering.

> From a profitability standpoint, we'll end up undoing this in the backend on many platforms, but as a canonicalization it seems OK. We should really add an actual fneg at some point, though.

The motivating case for the earlier fabs transform was the existing x86 codegen produced for the integer-variant IR:
https://llvm.org/bugs/show_bug.cgi?id=22428

Since SSE has bitops on FP, I've been able to minimize the impact on most of those examples. But AArch64 still does worse with the integer IR:

define float @fabs_via_int(float %x) {
  %bc1 = bitcast float %x to i32
  %and = and i32 %bc1, 2147483647
  %bc2 = bitcast i32 %and to float
  ret float %bc2
}

define float @fneg_via_int(float %x) {
  %bc1 = bitcast float %x to i32
  %xor = xor i32 %bc1, 2147483648
  %bc2 = bitcast i32 %xor to float
  ret float %bc2
}

define float @fabs_via_fp(float %x) {
  %fabs = call float @llvm.fabs.f32(float %x)
  ret float %fabs
}

declare float  @llvm.fabs.f32(float)

define float @fneg_via_fp(float %x) {
  %sub = fsub float -0.0, %x
  ret float %sub
}

$ ./llc -o - fabsneg.ll -mtriple=aarch64

fabs_via_int:           
  fmov	w8, s0
  and	w8, w8, #0x7fffffff
  fmov	s0, w8
  ret
fneg_via_int:              
  fmov	w8, s0
  eor	w8, w8, #0x80000000
  fmov	s0, w8
  ret
fabs_via_fp:                   
  fabs	s0, s0
  ret
fneg_via_fp:                     
  fneg	s0, s0
  ret

And for horror, change the triple to 'ppc64'...I still have nightmares. :)

I'm nervous about this change. The input code is denorm preserving on architectures that flush denorms, but the output may not be.

> The input code is denorm preserving on architectures that flush denorms, but the output may not be.

Yet another reason to have a real fneg.
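The denormal concern can be illustrated with a small C++ sketch, again assuming IEEE-754 binary32 `float`. It only shows the integer side: the sign flip is done on the bit pattern, so no FP arithmetic touches the value and a flush-to-zero mode has nothing to act on. The function names are hypothetical. The FP side is not shown because enabling flushing is target-specific.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is IEEE-754 binary32.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes IEEE-754 binary32 float");

// Copy out the raw bit pattern of a float without doing any FP arithmetic.
std::uint32_t to_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}

// Integer-only sign flip, matching the xor in the input IR.
std::uint32_t flip_sign_bit(std::uint32_t u) { return u ^ 0x80000000u; }

// A binary32 pattern is subnormal when the exponent field is zero
// and the fraction field is nonzero.
bool is_subnormal_bits(std::uint32_t u) {
  return (u & 0x7f800000u) == 0 && (u & 0x007fffffu) != 0;
}
```

An `fsub` from -0.0 on a target that flushes denormals could instead return a signed zero for the same input, which is the behavior change being flagged.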

mehdi_amini removed a subscriber: mehdi_amini. Apr 8 2016, 11:24 AM

D19391 would obviate the need for this patch; any target that wants this sort of transform could enable it with the proposed TLI hook.

In D18874#422848, @spatel wrote:

> D19391 would obviate the need for this patch; any target that wants this sort of transform could enable it with the proposed TLI hook.

This was committed:
http://reviews.llvm.org/rL271573

So I think this patch should be abandoned. Any target that wants the fabs/fneg transforms can enable them using the TLI hook.

Note that we left the question of the FP format in LLVM IR unanswered: the sign bit could be anywhere, although the LangRef does mention IEEE NaN/Inf.

If we do specify the format, then we could at least restore the ValueTracking part of this to indicate that the top bit is cleared by fabs().
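As a rough illustration of the known-bits fact mentioned here, the following C++ sketch checks that the top bit of the result of `std::fabs` is clear. It assumes IEEE-754 binary32 `float`. The helper names are invented for this example and do not correspond to any ValueTracking API.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumption: float is IEEE-754 binary32, so the sign bit is bit 31.
static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "sketch assumes IEEE-754 binary32 float");

// Raw bit pattern of a float.
std::uint32_t bits_of(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return u;
}

// True when the sign (top) bit of fabs(x) is zero.
bool fabs_clears_top_bit(float x) {
  return (bits_of(std::fabs(x)) & 0x80000000u) == 0;
}
```

If the IR format were left unspecified, an analysis could not conclude anything about bit 31 from a call to fabs.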

spatel resigned from this revision. Jun 30 2016, 7:37 AM

spatel removed a reviewer: spatel.

aarzee abandoned this revision. Jul 29 2016, 11:25 AM

arsenm mentioned this in D151934: InstCombine: Recognize fneg when performed as bitcasted integer. Jun 1 2023, 2:00 PM

arsenm mentioned this in rG50a9b3d8a522: InstCombine: Recognize fneg when performed as bitcasted integer. Aug 31 2023, 4:09 PM

Revision Contents

Path                                                  Size
lib/Transforms/InstCombine/InstCombineAndOrXor.cpp    15 lines
test/Transforms/InstCombine/xor2.ll                   49 lines

Diff 53027

lib/Transforms/InstCombine/InstCombineAndOrXor.cpp

(1,742 lines not shown) Value *InstCombiner::FoldOrOfICmps(ICmpInst *LHS, ICmpInst *RHS,

   // E.g. (icmp slt x, 0) | (icmp sgt x, n) --> icmp ugt x, n
   if (Value *V = simplifyRangeCheck(LHS, RHS, /*Inverted=*/true))
     return V;

   // E.g. (icmp sgt x, n) | (icmp slt x, 0) --> icmp ugt x, n
   if (Value *V = simplifyRangeCheck(RHS, LHS, /*Inverted=*/true))
     return V;

   // This only handles icmp of constants: (icmp1 A, C1) | (icmp2 B, C2).
   if (!LHSCst || !RHSCst) return nullptr;

   if (LHSCst == RHSCst && LHSCC == RHSCC) {
     // (icmp ne A, 0) | (icmp ne B, 0) --> (icmp ne (A|B), 0)
     if (LHSCC == ICmpInst::ICMP_NE && LHSCst->isZero()) {
       Value *NewOr = Builder->CreateOr(Val, Val2);
       return Builder->CreateICmp(LHSCC, NewOr, LHSCst);
(983 lines not shown) if (CastInst *Op1C = dyn_cast<CastInst>(Op1))
                                I.getType())) {
           Value *NewOp = Builder->CreateXor(Op0C->getOperand(0),
                                             Op1C->getOperand(0), I.getName());
           return CastInst::Create(Op0C->getOpcode(), NewOp, I.getType());
         }
       }
   }

+  // If we are XORing the sign bit of a floating-point value, convert
+  // this to fsub from -0.0, then cast back to integer.
+  ConstantInt *CI;
+  if (CastInst *Op0C = dyn_cast<CastInst>(Op0)) {
+    Type *SrcTy = Op0C->getOperand(0)->getType();
+    if (isa<BitCastInst>(Op0C) && SrcTy->isFloatingPointTy() &&
+        match(Op1, m_ConstantInt(CI)) && CI->isMinValue(true)) {
+      Value *Call = Builder->CreateFSub(
+          ConstantFP::getNegativeZero(SrcTy), Op0C->getOperand(0));
+      return CastInst::CreateBitOrPointerCast(Call, I.getType());
+    }
+  }
+
   return Changed ? &I : nullptr;
 }

test/Transforms/InstCombine/xor2.ll

(17 lines not shown) ; CHECK: %C = icmp slt i32 %A, 0
   %C = icmp slt i32 %B, 0
   ret i1 %C
 }

 ; PR1014
 define i32 @test2(i32 %tmp1) {
 ; CHECK-LABEL: @test2(
 ; CHECK-NEXT: and i32 %tmp1, 32
 ; CHECK-NEXT: or i32 %ovm, 8
 ; CHECK-NEXT: ret i32
   %ovm = and i32 %tmp1, 32
   %ov3 = add i32 %ovm, 145
   %ov110 = xor i32 %ov3, 153
   ret i32 %ov110
 }

 define i32 @test3(i32 %tmp1) {
 ; CHECK-LABEL: @test3(
 ; CHECK-NEXT: and i32 %tmp1, 32
 ; CHECK-NEXT: or i32 %ovm, 8
 ; CHECK-NEXT: ret i32
   %ovm = or i32 %tmp1, 145
   %ov31 = and i32 %ovm, 177
   %ov110 = xor i32 %ov31, 153
   ret i32 %ov110
 }

 define i32 @test4(i32 %A, i32 %B) {
   %1 = xor i32 %A, -1
   %2 = ashr i32 %1, %B
(15 lines not shown) test5:
   %add = add i32 %xor1, %xor
   ret i32 %add
 ; CHECK-LABEL: @test5(
 ; CHECK: lshr i32 %val1, 8
 ; CHECK: ret
 }

 ; defect-1 in rdar://12329730
 ; Simplify (X^Y) -> X or Y in the user's context if we know that
 ; only bits from X or Y are demanded.
 ; e.g. the "x ^ 1234" can be optimized into x in the context of "t >> 16".
 ; Put in other word, t >> 16 -> x >> 16.
 ; unsigned foo(unsigned x) { unsigned t = x ^ 1234; ; return (t >> 16) + t;}
 define i32 @test6(i32 %x) {
   %xor = xor i32 %x, 1234
   %shr = lshr i32 %xor, 16
   %add = add i32 %shr, %xor
(87 lines not shown)
   %xor = xor i32 %neg, %or
   ret i32 %xor
 ; CHECK-LABEL: @test14(
 ; CHECK-NEXT: %[[not:.*]] = xor i32 %a, -1
 ; CHECK-NEXT: %[[and:.*]] = and i32 %[[not]], %b
 ; CHECK-NEXT: %[[xor:.*]] = xor i32 %[[and]], %c
 ; CHECK-NEXT: ret i32 %[[xor]]
 }

+
+define i64 @fsub_double(double %x) {
+; CHECK-LABEL: @fsub_double(
+; CHECK-NEXT: %1 = fsub double -0.000000e+00, %x
+; CHECK-NEXT: %xor = bitcast double %1 to i64
+; CHECK-NEXT: ret i64 %xor
+  %bc = bitcast double %x to i64
+  %xor = xor i64 %bc, -9223372036854775808
+  ret i64 %xor
+}
+
+define i64 @fsub_double_swap(double %x) {
+; CHECK-LABEL: @fsub_double_swap(
+; CHECK-NEXT: %1 = fsub double -0.000000e+00, %x
+; CHECK-NEXT: %xor = bitcast double %1 to i64
+; CHECK-NEXT: ret i64 %xor
+  %bc = bitcast double %x to i64
+  %xor = xor i64 -9223372036854775808, %bc
+  ret i64 %xor
+}
+
+define i32 @fsub_float(float %x) {
+; CHECK-LABEL: @fsub_float(
+; CHECK-NEXT: %1 = fsub float -0.000000e+00, %x
+; CHECK-NEXT: %xor = bitcast float %1 to i32
+; CHECK-NEXT: ret i32 %xor
+  %bc = bitcast float %x to i32
+  %xor = xor i32 %bc, -2147483648
+  ret i32 %xor
+}
+
+; Make sure that only a bitcast is transformed.
+
+define i64 @fsub_double_not_bitcast(double %x) {
+; CHECK-LABEL: @fsub_double_not_bitcast(
+; CHECK-NEXT: %bc = fptoui double %x to i64
+; CHECK-NEXT: %xor = xor i64 %bc, -9223372036854775808
+; CHECK-NEXT: ret i64 %xor
+  %bc = fptoui double %x to i64
+  %xor = xor i64 %bc, -9223372036854775808
+  ret i64 %xor
+}