This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Add PowerPC cmpb builtin and emit target indepedent code for XL compatibility
ClosedPublic

Authored by NeHuang on Jun 30 2021, 7:25 AM.

Details

Summary

This patch is in a series of patches to provide builtins for compatibility
with the XL compiler. This patch add the builtin and emit target independent
code for __cmpb.

Backend test case (IR) emitting the single instruction cmpb (llvm/test/CodeGen/PowerPC/cmpb.ll)

; RUN: llc -verify-machineinstrs -mtriple powerpc64-unknown-linux-gnu -mcpu pwr7 < %s | FileCheck %s
; RUN: llc -verify-machineinstrs -mtriple powerpc64-ibm-aix-xcoff -mcpu pwr7 -vec-extabi < %s | FileCheck %s

define i64 @test64(i64 %x, i64 %y) #0 {
; CHECK-LABEL: test64:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    cmpb 3, 3, 4
; CHECK-NEXT:    blr
entry:
  %shr19 = lshr i64 %x, 56
  %conv21 = trunc i64 %shr19 to i32
  %shr43 = lshr i64 %y, 56
  %conv45 = trunc i64 %shr43 to i32
  %0 = xor i64 %y, %x
  %1 = and i64 %0, 255
  %cmp = icmp eq i64 %1, 0
  %2 = and i64 %0, 65280
  %cmp52 = icmp eq i64 %2, 0
  %3 = and i64 %0, 16711680
  %cmp58 = icmp eq i64 %3, 0
  %4 = and i64 %0, 4278190080
  %cmp64 = icmp eq i64 %4, 0
  %5 = and i64 %0, 1095216660480
  %cmp70 = icmp eq i64 %5, 0
  %6 = and i64 %0, 280375465082880
  %cmp76 = icmp eq i64 %6, 0
  %7 = and i64 %0, 71776119061217280
  %cmp82 = icmp eq i64 %7, 0
  %cmp88 = icmp eq i32 %conv21, %conv45
  %conv92 = select i1 %cmp, i64 255, i64 0
  %conv93 = select i1 %cmp52, i64 65280, i64 0
  %or = or i64 %conv92, %conv93
  %conv95 = select i1 %cmp58, i64 16711680, i64 0
  %or97 = or i64 %or, %conv95
  %conv98 = select i1 %cmp64, i64 4278190080, i64 0
  %or100 = or i64 %or97, %conv98
  %conv101 = select i1 %cmp70, i64 1095216660480, i64 0
  %or103 = or i64 %or100, %conv101
  %conv104 = select i1 %cmp76, i64 280375465082880, i64 0
  %or106 = or i64 %or103, %conv104
  %conv107 = select i1 %cmp82, i64 71776119061217280, i64 0
  %or109 = or i64 %or106, %conv107
  %conv110 = select i1 %cmp88, i64 -72057594037927936, i64 0
  %or112 = or i64 %or109, %conv110
  ret i64 %or112
}

Diff Detail

Event Timeline

NeHuang created this revision.Jun 30 2021, 7:25 AM
NeHuang requested review of this revision.Jun 30 2021, 7:25 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2021, 7:25 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

gentle ping

NeHuang updated this revision to Diff 357626.Jul 9 2021, 1:52 PM

Remove entry check in test case.

nemanjai requested changes to this revision.Jul 13 2021, 7:29 AM

Also if we are going to be emitting this complex sequence, in the description of the patch, point out the test case that shows the back end handles this and emits a single instruction.

clang/lib/CodeGen/CGBuiltin.cpp
15081

I find it rather surprising that we are emitting this complex sequence for this builtin. Perhaps there is a good reason for doing so, but at the very least, this requires a thorough explanation in a comment.

One additional concern I have with this is that if some transformation proves that some portion of this is unused (perhaps using DemandedBits analysis), it may optimize out a portion of this, thereby making the sequence emit a whole bunch of xor's, or's, rotates, etc.

For example:

...
unsigned long long A = __builtin_ppc_cmpb(B, C);
return A & 0xFF00FF00FF00FF;

It is entirely possible that the optimizer will get rid of some of the produced instructions and then the back end won't be able to emit a single cmpb but will have to emit a whole bunch of scalar instructions.

This revision now requires changes to proceed.Jul 13 2021, 7:29 AM
NeHuang added inline comments.Jul 14 2021, 7:52 AM
clang/lib/CodeGen/CGBuiltin.cpp
15081
  • The backend test case define i64 @test64(i64 %x, i64 %y) is in llvm/test/CodeGen/PowerPC/cmpb.ll
  • Also Tried the test case and results look fine.
$ cat test_cmpb.c
long long test_cmpb(long long a, long long b) {
  //return __cmpb(a, b);
  unsigned long long A = __builtin_ppc_cmpb(a, b);
  return A & 0xFF00FF00FF00FF;
}
$ clang -cc1 -O3 -triple powerpc-unknown-aix test_cmpb.c -target-cpu pwr9 -S -o test_cmpb_32bit.s
...
.test_cmpb:
# %bb.0:                                # %entry
        cmpb 4, 6, 4
        lis 6, 255
        cmpb 3, 5, 3
        ori 6, 6, 255
        and 4, 4, 6
        and 3, 3, 6
        blr
$ clang -cc1 -O3 -triple powerpc64-unknown-aix test_cmpb.c -target-cpu pwr9 -S -o test_cmpb_64bit.s
.test_cmpb:
# %bb.0:                                # %entry
        cmpb 3, 4, 3
        lis 4, 255
        ori 4, 4, 255
        rldimi 4, 4, 32, 0
        and 3, 3, 4
        blr
NeHuang edited the summary of this revision. (Show Details)Jul 14 2021, 2:51 PM
NeHuang marked an inline comment as done.Jul 14 2021, 5:10 PM
NeHuang added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
15081

The intension of emitting this complex sequence for this builtin is to produce the target independent code matching the backend test case IR (in the description) and produce asm results matching xlc. Verified the target IR will produce expected asm results for 32 bit and 64 bit.

  • For 64 bit
cmpb 3, 3, 4
blr
  • For 32 bit
cmpb 4, 6, 4
cmpb 3, 5, 3
blr
NeHuang updated this revision to Diff 359467.Jul 16 2021, 3:42 PM

Address review comment to rework 32 bit handling.

nemanjai requested changes to this revision.Jul 19 2021, 8:05 AM

Mostly minor comments, but it'll be good to have another look to make sure they're all addressed.

clang/lib/CodeGen/CGBuiltin.cpp
15101

Nit: no else after return.

15104

These comments are superfluous. You have the entire sequence listed above, no need to repeat it piece-wise here.

15105

Please do not name variables with non-descriptive single letters. It is somewhat conventional to use suffixes such as Hi/Lo for high/low order parts of a value.

llvm/include/llvm/IR/IntrinsicsPowerPC.td
1537

Line too long?

This revision now requires changes to proceed.Jul 19 2021, 8:05 AM
NeHuang updated this revision to Diff 359843.Jul 19 2021, 10:52 AM

Address review comments from Nemanja.

NeHuang marked 4 inline comments as done.Jul 19 2021, 10:53 AM

Rebased the patch with ToT.

NeHuang updated this revision to Diff 360109.Jul 20 2021, 6:47 AM

clang-format

nemanjai accepted this revision.Jul 20 2021, 9:40 AM

LGTM. Thank you.

This revision is now accepted and ready to land.Jul 20 2021, 9:40 AM
This revision was landed with ongoing or failed builds.Jul 20 2021, 11:10 AM
This revision was automatically updated to reflect the committed changes.

I moved a couple of new tests from this patch from CodeGen -> CodeGen/PowerPC so they don't fail when the PPC backend isn't built: f6769b663a0d4432b5e00e0c03904a5dfba7b077

Thanks @jroelofs for moving the test cases! Those cases were added in https://reviews.llvm.org/D105946 and I have notified the author.