Page MenuHomePhabricator

InstCombine optimization to convert floating-point sign bit XORs to fsubs from -0.0
AbandonedPublic

Authored by aarzee on Apr 7 2016, 3:42 PM.

Details

Summary

Discussion is inconclusive on the mailing list (http://lists.llvm.org/pipermail/llvm-dev/2016-April/098098.html) and IRC on whether this optimization is correct on platforms that do not follow IEEE-754.

Diff Detail

Event Timeline

aarzee updated this revision to Diff 52962.Apr 7 2016, 3:42 PM
aarzee retitled this revision from to InstCombine optimization to convert floating-point sign bit XORs to fsubs from -0.0.
aarzee updated this object.
aarzee added reviewers: escha, spatel.
aarzee updated this revision to Diff 53027.Apr 8 2016, 8:15 AM

Added tests.

mehdi_amini added a subscriber: mehdi_amini.

Leaving aside the correctness question (adding Steve just in case), is it always profitable?

Profitable, or "more canonical" (i.e. is it the right place, compared to some later lowering pass)

scanon edited edge metadata.Apr 8 2016, 9:53 AM

There are a few niche floating-point formats that use a representation of the form [ exponent | 2s complement significand ], so the signbit ends up sitting in the middle of the number. Does LLVM claim to support arbitrary oddball formats for float/double? Ideally we would specify that LLVM assumes IEEE formats, even if we don't require fully conformant operations in hardware; that would give us some flexibility.

From a profitability standpoint, we'll end up undoing this in the backend on many platforms, but as a canonicalization it seems OK. We should really add an actual fneg at some point, though.

spatel edited edge metadata.Apr 8 2016, 10:34 AM

Does LLVM claim to support arbitrary oddball formats for float/double? Ideally we would specify that LLVM assumes IEEE formats, even if we don't require fully conformant operations in hardware; that would give us some flexibility.

AFAICT, the LangRef is silent about this. In practice in the optimizer, I think we've baked in enough IEEE format assumptions that undoing it would take great suffering.

From a profitability standpoint, we'll end up undoing this in the backend on many platforms, but as a canonicalization it seems OK. We should really add an actual fneg at some point, though.

The motivating case for the earlier fabs transform was the existing x86 codegen produced for the integer-variant IR:
https://llvm.org/bugs/show_bug.cgi?id=22428

Since SSE has bitops on FP, I've been able to minimize the impact on most of those examples. But AArch64 still does worse with the integer IR:

define float @fabs_via_int(float %x) {
  %bc1 = bitcast float %x to i32
  %and = and i32 %bc1, 2147483647
  %bc2 = bitcast i32 %and to float
  ret float %bc2
}

define float @fneg_via_int(float %x) {
  %bc1 = bitcast float %x to i32
  %xor = xor i32 %bc1, 2147483648
  %bc2 = bitcast i32 %xor to float
  ret float %bc2
}

define float @fabs_via_fp(float %x) {
  %fabs = call float @llvm.fabs.f32(float %x)
  ret float %fabs
}

declare float  @llvm.fabs.f32(float)

define float @fneg_via_fp(float %x) {
  %sub = fsub float -0.0, %x
  ret float %sub
}

$ ./llc -o - fabsneg.ll -mtriple=aarch64

fabs_via_int:           
  fmov	w8, s0
  and	w8, w8, #0x7fffffff
  fmov	s0, w8
  ret
fneg_via_int:              
  fmov	w8, s0
  eor	w8, w8, #0x80000000
  fmov	s0, w8
  ret
fabs_via_fp:                   
  fabs	s0, s0
  ret
fneg_via_fp:                     
  fneg	s0, s0
  ret

And for horror, change the triple to 'ppc64'...I still have nightmares. :)

I'm nervous about this change. The input code is denorm preserving on architectures that flush denorms, but the output may not be.

The input code is denorm preserving on architectures that flush denorms, but the output may not be.

Yet another reason to have a real fneg.

spatel added a comment.May 5 2016, 1:56 PM

D19391 would obviate the need for this patch; any target that wants this sort of transform could enable it with the proposed TLI hook.

spatel added a comment.Jun 5 2016, 7:49 AM

D19391 would obviate the need for this patch; any target that wants this sort of transform could enable it with the proposed TLI hook.

This was committed:
http://reviews.llvm.org/rL271573

So I think this patch should be abandoned. Any target that wants the fabs/fneg transforms can enable it using the TLI hook.

Note that we left the question of FP format in LLVM IR unanswered: the sign bit could be anywhere...although there is mention of IEEE NaN/Inf in the LangRef

If we do specify the format, then we could at least restore the ValueTracking part of this to indicate that the top bit is cleared by fabs().

spatel resigned from this revision.Jun 30 2016, 7:37 AM
spatel removed a reviewer: spatel.
aarzee abandoned this revision.Jul 29 2016, 11:25 AM