This is an archive of the discontinued LLVM Phabricator instance.

optimize vector fneg of bitcasted integer value
Closed (Public)

Authored by spatel on Aug 11 2014, 1:24 PM.

Details

Summary

This patch allows a vector fneg of a bitcasted integer value to be optimized in the same way that we already optimize a scalar fneg. If the integer variable is a constant, we can precompute the result, so no logic ops are needed at all.
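The sketch below is a rough illustration of the bit-level identity behind this combine, not the DAGCombiner code. It works on plain integers rather than SelectionDAG nodes, and the helper names (`splatSignMask`, `fnegViaXor`) are hypothetical. Negating every float lane of a bitcasted integer is the same as XORing that integer with a mask holding each lane's sign bit. When the integer is a constant, the XOR folds to a new constant:

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): a 64-bit mask with only the sign bit
// of each laneBits-wide lane set. laneBits must be nonzero and divide 64,
// e.g. 32 for <2 x float> or 64 for <1 x double>.
constexpr uint64_t splatSignMask(unsigned laneBits) {
  uint64_t mask = 0;
  for (unsigned lo = 0; lo < 64; lo += laneBits)
    mask |= uint64_t(1) << (lo + laneBits - 1);
  return mask;
}

// fneg (bitcast x) computed as bitcast (xor x, signmask). Flipping the top bit
// of every lane flips the sign of every IEEE-754 lane, including +/-0.0 and NaNs.
constexpr uint64_t fnegViaXor(uint64_t bits, unsigned laneBits) {
  return bits ^ splatSignMask(laneBits);
}

// Constant input: the xor disappears and the negated vector is just another
// constant. The lanes here are 1.0f (0x3F800000) and 2.0f (0x40000000).
constexpr uint64_t kNegated = fnegViaXor(0x3F800000'40000000ULL, 32);
static_assert(kNegated == 0xBF800000'C0000000ULL, "lanes are -1.0f and -2.0f");
```

No floating-point operation is involved anywhere in this rewrite. That is why a constant input can be folded completely.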

So for x86, instead of something like this:

movd       %rdi, %xmm0
xorps      .LCPI2_0(%rip), %xmm0  ; constant pool load of sign mask

We should generate:

movabsq     (put sign bit mask in integer register via immediate)
xorq        (flip sign bits)
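This sketch assumes the value being negated is a <2 x float> carried in %rdi, as the movd above suggests. In that case, the immediate that movabsq has to materialize is the sign bit of each 32-bit lane. The check below uses plain C++ and illustrative names only. It confirms the mask's value and shows how the mask reads as a signed 64-bit number, which is the form AT&T-syntax output typically uses for such an immediate:

```cpp
#include <cstdint>

// Sign bit of each 32-bit lane of a <2 x float> held in one 64-bit register.
constexpr uint64_t kSignMask2xF32 = 0x80000000'80000000ULL;

// The same bit pattern read as a signed 64-bit value: -(2^63) + 2^31.
constexpr int64_t kSignMaskAsImm = INT64_MIN + (int64_t(1) << 31);

static_assert(static_cast<uint64_t>(kSignMaskAsImm) == kSignMask2xF32,
              "same bits either way");
static_assert(kSignMaskAsImm == -9223372034707292160LL, "decimal value");

// Scalar model of the new sequence (illustrative name): movabsq loads the mask
// and xorq flips the sign bit of both lanes at once. There is no constant-pool load.
constexpr uint64_t xorqSignBits(uint64_t rdi) { return rdi ^ kSignMask2xF32; }
```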

For ARM, this patch replaces the test case in test/CodeGen/ARM/2009-10-21-InvalidFNeg.ll with a new test case in test/CodeGen/ARM/fnegs.ll. That test file covers several ARM hardware variants. In each run of the new test case, the generated code should now use only the most basic integer op (eor) rather than VFP/NEON instructions.

For reference, the replaced test case used to generate:

add	r1, sp, #36
add	r0, r0, #48
vld1.32	{d16[0]}, [r1:32]
add	r1, r1, #4
vld1.32	{d16[1]}, [r1:32]
add	r1, sp, #44
vld1.32	{d17[0]}, [r1:32]
add	r1, r1, #4
vld1.32	{d17[1]}, [r1:32]
vneg.f32	q8, q8
vst1.64	{d16, d17}, [r0:128]
bx	lr

And should now generate:

push	{r4, lr}
ldr	r1, [sp, #48]
ldr	r12, [sp, #52]
ldr	r2, [sp, #56]
eor	r1, r1, #-2147483648
ldr	lr, [sp, #44]
eor	r3, r12, #-2147483648
eor	r4, r2, #-2147483648
add	r12, r0, #52
eor	r2, lr, #-2147483648
str	r2, [r0, #48]
stm	r12, {r1, r3, r4}
pop	{r4, pc}
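The sketch below is a hypothetical scalar model in plain C++, not ARM code. It checks that the #-2147483648 immediate in the eor instructions is exactly the 32-bit float sign bit. It then mirrors the four eor instructions that replace the single vneg.f32 on q8:

```cpp
#include <array>
#include <cstdint>

// #-2147483648 read as a 32-bit pattern is 0x80000000: only the f32 sign bit.
constexpr uint32_t kF32SignBit = static_cast<uint32_t>(INT32_MIN);
static_assert(kF32SignBit == 0x80000000u, "eor immediate is the sign bit");

// Hypothetical model of the new code: one eor per 32-bit word of the
// <4 x float>, in place of one vneg.f32 on a q register.
constexpr std::array<uint32_t, 4> negate4xF32Bits(const std::array<uint32_t, 4> &v) {
  return {{v[0] ^ kF32SignBit, v[1] ^ kF32SignBit,
           v[2] ^ kF32SignBit, v[3] ^ kF32SignBit}};
}
```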

This is a sibling patch to an fabs optimization that was checked in at r214892:
http://reviews.llvm.org/D4785

Ideally, we could refactor the visitFNEG and visitFABS functions in DAGCombiner, since they are very similar, but I'll leave that for another patch.
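At the bit level, the two combines differ in a single choice. fneg flips the sign bit (xor with the sign mask), while fabs clears it (and with the inverted mask). The sketch below shows a hypothetical shared helper parameterized on that choice. It is written on plain 32-bit lanes rather than SelectionDAG nodes, and it is not taken from DAGCombiner. `SignOp` and `applySignOp` are illustrative names only:

```cpp
#include <cstdint>

// Which sign-bit operation the (hypothetical) shared combine should emit.
enum class SignOp { Neg, Abs };

// fneg (bitcast x) -> bitcast (xor x, SignMask)
// fabs (bitcast x) -> bitcast (and x, ~SignMask)
constexpr uint32_t applySignOp(uint32_t bits, SignOp op) {
  constexpr uint32_t SignMask = 0x80000000u;
  return op == SignOp::Neg ? (bits ^ SignMask) : (bits & ~SignMask);
}
```

Everything else in the two transforms (matching the bitcast and splatting the mask across lanes) would be shared.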

Both patches originated from PR20354:
http://llvm.org/bugs/show_bug.cgi?id=20354

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 12365 (Aug 11 2014, 1:24 PM).
spatel retitled this revision to "optimize vector fneg of bitcasted integer value".
spatel updated this object.
spatel edited the test plan for this revision.
spatel added a subscriber: Unknown Object (MLST).
spatel updated this revision to Diff 12382 (Aug 11 2014, 9:24 PM).

Fixed FileCheck prefixes for ARM testcase.

rengolin accepted this revision (Aug 13 2014, 6:26 AM).
rengolin edited edge metadata.

Hi Sanjay,

The new code's "estimated" cost is so close to the old one that I won't object. But I fear this might cause some regressions, since the cost of using the vector pipeline on modern ARM cores is next to zero, and using both pipelines at the same time might actually be faster.

However, I tried running them on an out-of-order (OOO) ARM chip and it made no noticeable difference, so go for it! Any relevant side effects should be picked up later and could be fixed in the ARM DAG legalizer, if needed.

cheers,
--renato

This revision is now accepted and ready to land (Aug 13 2014, 6:26 AM).
spatel closed this revision (Aug 14 2014, 8:24 AM).
spatel updated this revision to Diff 12506.

Closed by commit rL215646 (authored by @spatel).

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp