
optimize vector fabs of bitcasted constant integer values
ClosedPublic

Authored by spatel on Aug 4 2014, 12:51 PM.

Details

Summary

This patch allows vector fabs operations on bitcasted constant integer values to be optimized in the same way that we already optimize scalar fabs.

So for code like this:
%bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
%fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
%ret = bitcast <2 x float> %fabs to i64
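
With the fold, the fabs is evaluated at compile time: bit 31 of each 32-bit lane is cleared, so the all-ones lane 0xFFFF_FFFF becomes 0x7FFF_FFFF, the zero lane is unchanged, and %ret is simply the constant 0x7FFF_FFFF_0000_0000.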

Instead of generating something like this:

movabsq (put the bitcasted integer constant in an integer register)
vmovq   (move from integer register to vector/fp register)
vandps  (mask off sign bits using a constant-pool mask)
vmovq   (move vector/fp register back to integer return register)

We should generate:

mov     (put constant value in return register)
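
For the example above, that constant is 0x7FFF_FFFF_0000_0000, so the whole function should reduce to a single immediate move plus the return (on x86-64, something like movabsq $0x7FFFFFFF00000000, %rax; the exact instruction form here is illustrative, not copied from actual output).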

I have also removed a redundant clause in the first 'if' statement:
N0.getOperand(0).getValueType().isInteger()

is the same check as:
IntVT.isInteger()

because IntVT is defined from N0.getOperand(0) in that code.

For more background, please see:
http://reviews.llvm.org/D4770

And:
http://llvm.org/bugs/show_bug.cgi?id=20354

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 12174. Aug 4 2014, 12:51 PM
spatel retitled this revision from to optimize vector fabs of bitcasted constant integer values.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: rengolin, pete, chandlerc.
spatel added a subscriber: Unknown Object (MLST).
spatel added a comment. Aug 5 2014, 9:29 AM

I know Renato is interested in the change to ARM codegen for the FNEG version of this optimization (not included in this patch) because it will change an existing test case. Here's ARM codegen for the 2 new FABS cases in this proposed patch.

I'm not sure what the LLVM policy is for duplicating codegen test cases across architectures, but if that's considered a good thing, I can add these to test/CodeGen/ARM:

Using:
$ ./llc -o - vec_fabs.ll -march=arm -mcpu=cortex-a8 -mattr=+neon

We get:

_fabs_v2f32_1:                          @ @fabs_v2f32_1
@ BB#0:
	mvn	r0, #0
	mov	r1, #0
	vmov	d16, r1, r0
	vabs.f32	d16, d16
	vmov	r0, r1, d16
	bx	lr

_fabs_v2f32_2:                          @ @fabs_v2f32_2
@ BB#0:
	mov	r0, #0
	mvn	r1, #0
	vmov	d16, r1, r0
	vabs.f32	d16, d16
	vmov	r0, r1, d16
	bx	lr
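
In this unoptimized form, the bitcasted constant is first assembled in r0 and r1 with mov/mvn, transferred to d16, the vabs.f32 runs at execution time, and the result is moved back to the integer return registers.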

After the optimization, this would become:

_fabs_v2f32_1:                          @ @fabs_v2f32_1
@ BB#0:
	mov	r0, #0
	mvn	r1, #-2147483648
	mov	pc, lr

_fabs_v2f32_2:                          @ @fabs_v2f32_2
@ BB#0:
	mvn	r0, #-2147483648
	mov	r1, #0
	mov	pc, lr
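
Here the fabs has been folded away at compile time: mvn with #-2147483648 (0x80000000) produces 0x7FFFFFFF, i.e. the all-ones lane with its sign bit cleared, while the zero lane stays zero, so each function now needs only two integer moves and no NEON instructions.
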
rengolin accepted this revision. Aug 5 2014, 9:43 AM
rengolin edited edge metadata.

Hi Sanjay,

That looks good to me, thanks for taking the time.

Duplicating the fabs cases in the ARM tree, adding a CHECK: mvn + CHECK-NOT: vabs, would be great, with a little explanation just like you did for x86 (see the sketch below).

Thanks!
--renato
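
A hypothetical sketch of the kind of ARM test being requested (the RUN line, check prefixes, and function body are assumed here, mirroring the existing x86 test and the llc invocation above):

; RUN: llc < %s -march=arm -mcpu=cortex-a8 -mattr=+neon | FileCheck %s

define i64 @fabs_v2f32_1() {
; CHECK-LABEL: fabs_v2f32_1:
; CHECK: mvn
; CHECK-NOT: vabs
  %bitcast = bitcast i64 18446744069414584320 to <2 x float> ; 0xFFFF_FFFF_0000_0000
  %fabs = call <2 x float> @llvm.fabs.v2f32(<2 x float> %bitcast)
  %ret = bitcast <2 x float> %fabs to i64
  ret i64 %ret
}

declare <2 x float> @llvm.fabs.v2f32(<2 x float>)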

This revision is now accepted and ready to land. Aug 5 2014, 9:43 AM
spatel closed this revision. Aug 5 2014, 10:44 AM
spatel updated this revision to Diff 12202.

Closed by commit rL214892 (authored by @spatel).