Discussion on the mailing list (http://lists.llvm.org/pipermail/llvm-dev/2016-April/098098.html) and on IRC is inconclusive as to whether this optimization is correct on platforms that do not follow IEEE-754.
Leaving aside the correctness question (adding Steve just in case), is it always profitable?
Profitable, or "more canonical"? I.e., is this the right place for the transform, compared to some later lowering pass?
There are a few niche floating-point formats that use a representation of the form [ exponent | 2s complement significand ], so the signbit ends up sitting in the middle of the number. Does LLVM claim to support arbitrary oddball formats for float/double? Ideally we would specify that LLVM assumes IEEE formats, even if we don't require fully conformant operations in hardware; that would give us some flexibility.
From a profitability standpoint, we'll end up undoing this in the backend on many platforms, but as a canonicalization it seems OK. We should really add an actual fneg at some point, though.
AFAICT, the LangRef is silent about this. In practice in the optimizer, I think we've baked in enough IEEE format assumptions that undoing it would take great suffering.
> From a profitability standpoint, we'll end up undoing this in the backend on many platforms, but as a canonicalization it seems OK. We should really add an actual fneg at some point, though.
The motivating case for the earlier fabs transform was the existing x86 codegen produced for the integer-variant IR:
https://llvm.org/bugs/show_bug.cgi?id=22428
Since SSE has bitops on FP, I've been able to minimize the impact on most of those examples. But AArch64 still does worse with the integer IR:
define float @fabs_via_int(float %x) {
  %bc1 = bitcast float %x to i32
  %and = and i32 %bc1, 2147483647
  %bc2 = bitcast i32 %and to float
  ret float %bc2
}

define float @fneg_via_int(float %x) {
  %bc1 = bitcast float %x to i32
  %xor = xor i32 %bc1, 2147483648
  %bc2 = bitcast i32 %xor to float
  ret float %bc2
}

define float @fabs_via_fp(float %x) {
  %fabs = call float @llvm.fabs.f32(float %x)
  ret float %fabs
}

declare float @llvm.fabs.f32(float)

define float @fneg_via_fp(float %x) {
  %sub = fsub float -0.0, %x
  ret float %sub
}
$ ./llc -o - fabsneg.ll -mtriple=aarch64
fabs_via_int:
  fmov w8, s0
  and  w8, w8, #0x7fffffff
  fmov s0, w8
  ret

fneg_via_int:
  fmov w8, s0
  eor  w8, w8, #0x80000000
  fmov s0, w8
  ret

fabs_via_fp:
  fabs s0, s0
  ret

fneg_via_fp:
  fneg s0, s0
  ret
And for horror, change the triple to 'ppc64'...I still have nightmares. :)
I'm nervous about this change. The input code is denorm preserving on architectures that flush denorms, but the output may not be.
> The input code is denorm preserving on architectures that flush denorms, but the output may not be.
Yet another reason to have a real fneg.
D19391 would obviate the need for this patch; any target that wants this sort of transform could enable it with the proposed TLI hook.
This was committed:
http://reviews.llvm.org/rL271573
So I think this patch should be abandoned. Any target that wants the fabs/fneg transforms can enable it using the TLI hook.
Note that we left the question of FP format in LLVM IR unanswered: the sign bit could be anywhere... although there is mention of IEEE NaN/Inf in the LangRef.
If we do specify the format, then we could at least restore the ValueTracking part of this to indicate that the top bit is cleared by fabs().