This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result and not require any logic ops.
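To make the equivalence concrete, here is a minimal standalone C++ demonstration (not the patch itself; fneg_via_xor is a hypothetical name) that negating a floating-point value is the same as XORing the sign bit of its bitcasted integer value:

  // Demonstrates the rewrite this patch performs in the DAG:
  // fneg(bitcast(x)) == bitcast(xor(x, sign-bit mask)).
  #include <cstdint>
  #include <cstdio>
  #include <cstring>

  double fneg_via_xor(double d) {
    uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // bitcast f64 -> i64
    bits ^= UINT64_C(1) << 63;            // xor with the sign-bit mask
    std::memcpy(&d, &bits, sizeof d);     // bitcast i64 -> f64
    return d;
  }

  int main() {
    double x = 3.5;
    std::printf("%f %f\n", -x, fneg_via_xor(x));  // prints -3.500000 twice
    return 0;
  }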
So for x86, instead of something like this:
  movd %rdi, %xmm0
  xorps .LCPI2_0(%rip), %xmm0   ; constant pool load of sign mask
We should generate:
  movabsq (put sign bit mask in integer register via immediate)
  xorq (flip sign bits)
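And when the bitcasted integer is a compile-time constant, even the xor disappears: the sign flip is precomputed and only the new constant has to be materialized. A small hypothetical illustration, using the bit patterns for 2.0 and -2.0:

  #include <cstdint>

  constexpr uint64_t kSignMask   = UINT64_C(1) << 63;
  constexpr uint64_t kTwoBits    = UINT64_C(0x4000000000000000);  // 2.0 as i64
  constexpr uint64_t kNegTwoBits = kTwoBits ^ kSignMask;          // -2.0 as i64
  static_assert(kNegTwoBits == UINT64_C(0xC000000000000000),
                "fneg of a constant folds to a new constant");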
For ARM, this patch replaces the test case in test/CodeGen/ARM/2009-10-21-InvalidFNeg.ll with a new test case in test/CodeGen/ARM/fnegs.ll. That test file covers several ARM hardware variants. In each run of the new test case, we should now use just the most basic integer op (eor) rather than VFP/NEON instructions.
For reference, the replaced test case used to generate:
  add     r1, sp, #36
  add     r0, r0, #48
  vld1.32 {d16[0]}, [r1:32]
  add     r1, r1, #4
  vld1.32 {d16[1]}, [r1:32]
  add     r1, sp, #44
  vld1.32 {d17[0]}, [r1:32]
  add     r1, r1, #4
  vld1.32 {d17[1]}, [r1:32]
  vneg.f32 q8, q8
  vst1.64 {d16, d17}, [r0:128]
  bx      lr
And should now generate:
  push    {r4, lr}
  ldr     r1, [sp, #48]
  ldr     r12, [sp, #52]
  ldr     r2, [sp, #56]
  eor     r1, r1, #-2147483648
  ldr     lr, [sp, #44]
  eor     r3, r12, #-2147483648
  eor     r4, r2, #-2147483648
  add     r12, r0, #52
  eor     r2, lr, #-2147483648
  str     r2, [r0, #48]
  stm     r12, {r1, r3, r4}
  pop     {r4, pc}
This is a sibling patch to an fabs optimization that was checked in at r214892:
http://reviews.llvm.org/D4785
Ideally, we can refactor the visitFNEG and visitFABS functions in DAGCombiner since they are very similar, but I'll leave that for another patch.
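For reference, a rough sketch of the common shape such a refactor might share, assuming current SelectionDAG APIs (foldSignBitOp is a hypothetical helper, not code from this patch):

  // fneg(bitcast int) -> bitcast(xor int, SignMask)
  // fabs(bitcast int) -> bitcast(and int, ~SignMask)
  #include "llvm/CodeGen/SelectionDAG.h"
  using namespace llvm;

  SDValue foldSignBitOp(SelectionDAG &DAG, SDNode *N, bool IsFNeg) {
    SDValue N0 = N->getOperand(0);
    if (N0.getOpcode() != ISD::BITCAST)
      return SDValue();
    SDValue Int = N0.getOperand(0);
    EVT IntVT = Int.getValueType();
    if (!IntVT.isInteger())
      return SDValue();
    SDLoc DL(N);
    APInt SignMask = APInt::getSignMask(IntVT.getScalarSizeInBits());
    SDValue Mask = DAG.getConstant(IsFNeg ? SignMask : ~SignMask, DL, IntVT);
    SDValue Logic = DAG.getNode(IsFNeg ? ISD::XOR : ISD::AND, DL, IntVT,
                                Int, Mask);
    return DAG.getNode(ISD::BITCAST, DL, N->getValueType(0), Logic);
  }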
Both patches originated from PR20354:
http://llvm.org/bugs/show_bug.cgi?id=20354