This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix some missed optimization opportunities in combineSetCC
Closed, Public

Authored by HLJ2009 on Oct 17 2018, 2:15 AM.

Details

Summary
  1. 0 - x == y -> x + y == 0

The IR for 0 - x == y is:

define i64 @eq(i64, i64) local_unnamed_addr #0 {
  %3 = sub nsw i64 0, %0
  %4 = icmp eq i64 %3, %1
  %5 = zext i1 %4 to i64
  ret i64 %5
}

LLVM currently generates the following code:

0000000000000000 <eq>:
   0:	d0 00 63 7c 	neg     r3,r3
   4:	78 22 63 7c 	xor     r3,r3,r4
   8:	74 00 63 7c 	cntlzd  r3,r3
   c:	e2 d7 63 78 	rldicl  r3,r3,58,63
  10:	20 00 80 4e 	blr

This can be optimized to:

0000000000000000 <eq>:
   0:	14 1a 64 7c 	add     r3,r4,r3
   4:	74 00 63 7c 	cntlzd  r3,r3
   8:	e2 d7 63 78 	rldicl  r3,r3,58,63
   c:	20 00 80 4e 	blr
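
As a rough sanity check of this first case, the sketch below models both forms in wrapping 64-bit arithmetic. The function names are hypothetical, the portable loop only stands in for `cntlzd`, and unsigned wraparound is assumed to be an adequate model of the two's-complement registers; it is not the code from the patch.

```cpp
#include <cstdint>

// Portable count-leading-zeros; stands in for the PowerPC cntlzd instruction.
// Returns 64 when v is zero.
unsigned countLeadingZeros64(uint64_t v) {
  unsigned n = 0;
  for (uint64_t bit = uint64_t{1} << 63; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Reference semantics of the IR: zext(icmp eq (sub 0, x), y).
uint64_t eqReference(uint64_t x, uint64_t y) {
  return (uint64_t{0} - x) == y ? 1 : 0;
}

// Model of the optimized sequence: add, cntlzd, rldicl 58,63.
uint64_t eqOptimized(uint64_t x, uint64_t y) {
  uint64_t sum = x + y;                    // add    r3,r4,r3
  unsigned lz = countLeadingZeros64(sum);  // cntlzd r3,r3
  return (lz >> 6) & 1;                    // rldicl r3,r3,58,63 keeps bit 6,
                                           // which is set only when lz == 64
}
```

The two functions should agree for every input, because 0 - x == y and x + y == 0 are the same statement modulo 2^64; the saving comes from replacing the neg/xor pair with a single add.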

  2. 0 - x != y -> x + y != 0

The IR for 0 - x != y is:

define i64 @neq(i64, i64) local_unnamed_addr #0 {
  %3 = sub nsw i64 0, %0
  %4 = icmp ne i64 %3, %1
  %5 = zext i1 %4 to i64
  ret i64 %5
}

LLVM currently generates the following code:

0000000000000000 <neq>:
   0:	d0 00 63 7c 	neg     r3,r3
   4:	78 22 63 7c 	xor     r3,r3,r4
   8:	ff ff 83 30 	addic   r4,r3,-1
   c:	10 19 64 7c 	subfe   r3,r4,r3
  10:	20 00 80 4e 	blr

This can be optimized to:

0000000000000000 <neq>:
   0:	14 1a 64 7c 	add     r3,r4,r3
   4:	ff ff 83 30 	addic   r4,r3,-1
   8:	10 19 64 7c 	subfe   r3,r4,r3
   c:	20 00 80 4e 	blr
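
The `addic`/`subfe` pair in the not-equal case can be read as "materialize the carry bit". The sketch below models that reading in wrapping 64-bit arithmetic; the function names are hypothetical and the carry is computed by hand, so treat it as an illustration of the idiom under those assumptions rather than the patch itself.

```cpp
#include <cstdint>

// Reference semantics of the IR: zext(icmp ne (sub 0, x), y).
uint64_t neReference(uint64_t x, uint64_t y) {
  return (uint64_t{0} - x) != y ? 1 : 0;
}

// Model of the optimized sequence: add, addic -1, subfe.
uint64_t neOptimized(uint64_t x, uint64_t y) {
  uint64_t sum = x + y;               // add   r3,r4,r3
  uint64_t t = sum + ~uint64_t{0};    // addic r4,r3,-1  (t = sum - 1)
  // Carry out of sum + (2^64 - 1): set exactly when sum is non-zero.
  uint64_t carry = (t < sum) ? 1 : 0;
  // subfe RT,RA,RB computes ~RA + RB + CA; here RA = t and RB = sum.
  return ~t + sum + carry;            // subfe r3,r4,r3
}
```

Since ~t equals -sum modulo 2^64, the final expression collapses to the carry bit alone, which is 1 exactly when x + y is non-zero.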

Event Timeline

HLJ2009 created this revision. Oct 17 2018, 2:15 AM
HLJ2009 updated this revision to Diff 169972. Oct 17 2018, 2:22 AM
HLJ2009 edited the summary of this revision. Oct 17 2018, 2:57 AM
HLJ2009 edited the summary of this revision.
HLJ2009 edited the summary of this revision.
nemanjai accepted this revision. Oct 19 2018, 4:06 AM

Other than the minor nit, LGTM.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp:12482

I think we should probably return an empty SDValue() from that function and change this section to something like:

if (SDValue CSCC = combineSetCC(N, DCI))
  return CSCC;
LLVM_FALLTHROUGH;

llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14287

This is kind of hidden from the top-level view of which combine function is called for which node. I think it is clearer to the reader if we actually call this from PerformDAGCombine() above, as I suggested in my comment.
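
The dispatch shape suggested in these two comments can be sketched without any LLVM headers. Everything below is a hypothetical stand-in: `Value` plays the role of SDValue (empty means "no combine applied"), and the opcode names, helper functions, and identifiers are invented for illustration only.

```cpp
// Hypothetical stand-in for SDValue: Id == 0 means "empty, nothing combined".
struct Value {
  int Id;
  Value() : Id(0) {}
  explicit Value(int I) : Id(I) {}
  explicit operator bool() const { return Id != 0; }
};

enum class Opcode { SetCC, Next, Other };

// Hypothetical node-specific combine: returns an empty Value when the
// pattern does not match, so the caller can tell that nothing changed.
Value combineSetCC(bool patternMatches) {
  return patternMatches ? Value(1) : Value();
}

// Hypothetical handling for the following case label.
Value combineNext() { return Value(2); }

// Top-level dispatcher: every node-specific combine is visible here.
Value performCombine(Opcode op, bool patternMatches) {
  switch (op) {
  case Opcode::SetCC:
    if (Value CSCC = combineSetCC(patternMatches))
      return CSCC;
    // Intentional fall-through when no combine applied
    // (the review comment uses LLVM_FALLTHROUGH for this).
  case Opcode::Next:
    return combineNext();
  default:
    return Value();
  }
}
```

Returning an empty value on failure keeps the decision about what to try next in the dispatcher, which is the readability point the review comment makes.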

This revision is now accepted and ready to land. Oct 19 2018, 4:06 AM
HLJ2009 updated this revision to Diff 170389. Oct 22 2018, 5:02 AM

Other than the minor nit, LGTM.

Thank you for pointing this out, and thank you for reviewing the code.

This revision was automatically updated to reflect the committed changes.