This is an archive of the discontinued LLVM Phabricator instance.


Differential D54583

PowerPC: Optimize SPE double parameter calling setup
ClosedPublic

Authored by jhibbits on Nov 15 2018, 8:38 AM.

Details

Reviewers

nemanjai
hfinkel
joerg

Commits

rG1d1cf30b738b: PowerPC: Optimize SPE double parameter calling setup
rL363526: PowerPC: Optimize SPE double parameter calling setup

Summary

SPE passes doubles the same way as soft-float: in register pairs, as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, because the
value is stored to the stack and loaded back on either side of the call.

For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:

evstdd      5, X(1)
lwz         3, X(1)
lwz         4, X+4(1)

Likewise, to form a double into r5 from args in r3 and r4:

stw         3, X(1)
stw         4, X+4(1)
evldd       5, X(1)

This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:

mr          4, 5
evmergehi   3, 5, 5

And to form a double into r5 from args in r3 and r4:

evmergelo   5, 3, 4

This is comparable to the way that gcc generates the double splits.

This also fixes expanding of builtins to libcalls, where the LowerCallTo() code path was generating intermediate illegal type nodes.
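
The register-level effect of the new sequences can be modelled on a host machine. The following is a minimal sketch, not code from the patch: `Gpr64`, `WordPair`, `splitDouble` and `buildDouble` are hypothetical names, and the two helper functions only mimic the documented behaviour of `evmergehi` and `evmergelo` on a 64-bit SPE register.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Model of a 64-bit SPE general-purpose register (hypothetical name).
using Gpr64 = std::uint64_t;

// evmergehi rD, rA, rB: high word of rA -> high word of rD,
//                       high word of rB -> low word of rD.
constexpr Gpr64 evmergehi(Gpr64 a, Gpr64 b) {
  return (a & 0xFFFFFFFF00000000ULL) | (b >> 32);
}

// evmergelo rD, rA, rB: low word of rA -> high word of rD,
//                       low word of rB -> low word of rD.
constexpr Gpr64 evmergelo(Gpr64 a, Gpr64 b) {
  return (a << 32) | (b & 0xFFFFFFFFULL);
}

struct WordPair {
  std::uint32_t hi;  // word passed in r3
  std::uint32_t lo;  // word passed in r4
};

// Split a double held in "r5": "mr 4, 5" followed by "evmergehi 3, 5, 5".
inline WordPair splitDouble(double d) {
  Gpr64 r5;
  std::memcpy(&r5, &d, sizeof r5);
  Gpr64 r4 = r5;                 // mr: the callee only looks at the low word
  Gpr64 r3 = evmergehi(r5, r5);  // low word of r3 now holds the high word of r5
  return {static_cast<std::uint32_t>(r3), static_cast<std::uint32_t>(r4)};
}

// Form a double in "r5" from r3/r4: "evmergelo 5, 3, 4".
inline double buildDouble(std::uint32_t r3, std::uint32_t r4) {
  Gpr64 r5 = evmergelo(r3, r4);
  double d;
  std::memcpy(&d, &r5, sizeof d);
  return d;
}
```

Splitting 1.0 this way yields the words 0x3FF00000 and 0x00000000, and merging them gives back 1.0, without any store/load round trip through the stack.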

Diff Detail

Repository: rL LLVM

Event Timeline

jhibbits created this revision. Nov 15 2018, 8:38 AM

Herald added subscribers: llvm-commits, jsji, jfb, kbarton. Nov 15 2018, 8:38 AM

Harbormaster completed remote builds in B25053: Diff 174220. Nov 15 2018, 8:38 AM

glaubitz added a subscriber: glaubitz. Dec 4 2018, 7:31 AM

I have applied this patch to the llvm-toolchain-7 package in Debian and did not see any regressions on x86_64 or 32-bit PowerPC. Additionally, I have included the patches from https://reviews.llvm.org/D49754 and https://reviews.llvm.org/D54409, and again saw no regressions on x86_64 or 32-bit PowerPC.

All three patches will be part of the next upload of the llvm-toolchain-7 package in Debian unstable, which will be version 1:7.0.1~+rc2-9.

vit9696 added a subscriber: vit9696. Dec 21 2018, 1:44 PM

nemanjai added inline comments. Dec 29 2018, 1:33 PM

lib/Target/PowerPC/PPCISelLowering.cpp
393 ↗	(On Diff #174220)	No need for braces when there is a single statement in the if/else.
7833 ↗	(On Diff #174220)	The early exit should be first.
7837 ↗	(On Diff #174220)	The indentation is off. Maybe run `clang-format` on this function. I don't know which editor you use but if you use `Vim`, you can run `:7827,7845 ! clang-format` if you have `clang-format` in your `$PATH`. Also, it seems like you might want to check the input type as well - maybe with an assert if it can't be anything other than `f64`.
7840 ↗	(On Diff #174220)	A more descriptive assert message is probably in order. Also, perhaps it would be clearer if this was rewritten to:
- Assert that the constant operand value is less than 2
- Use a ternary operator to select the opcode (or just have one opcode - see above)
7842 ↗	(On Diff #174220)	Line too long?
lib/Target/PowerPC/PPCISelLowering.h
203 ↗	(On Diff #174220)	Why not just have `EXTRACT_SPE_OP` and have it take a constant operand that determines Hi/Lo? Also, for both build and extract, it would be good to add a comment that these correspond almost exactly to `BUILD_PAIR` and `EXTRACT_VECTOR_ELT` nodes except that the input types are floating point since `i64` isn't a legal type for the target.
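
The suggestion of a single extract node with a constant selector, guarded by an assert and a ternary, can be illustrated outside of LLVM. This is only a sketch under assumptions: `extractSpeWord` and `buildSpe64` are hypothetical helpers, not the patch's nodes, and the index convention (0 = low word, 1 = high word) is assumed to follow `ISD::EXTRACT_ELEMENT`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one extract operation that takes a constant
// selector instead of separate Hi/Lo opcodes.
// Assumed convention: index 0 = low word, index 1 = high word.
inline std::uint32_t extractSpeWord(std::uint64_t reg, unsigned idx) {
  assert(idx < 2 && "SPE extract index must be 0 (low) or 1 (high)");
  return static_cast<std::uint32_t>(idx == 1 ? reg >> 32 : reg);
}

// BUILD_PAIR-like counterpart: combine a low and a high word.
inline std::uint64_t buildSpe64(std::uint32_t lo, std::uint32_t hi) {
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
```

With one operation and a checked constant operand, the hi/lo choice becomes data rather than a second opcode, which is the shape the reviewer is asking for.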

jhibbits marked 6 inline comments as done. Dec 29 2018, 8:24 PM

jhibbits added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
7840 ↗	(On Diff #174220)	This assert was left over from some debugging, and I forgot to remove it. However, it does make sense to have an assert along the lines of your suggestion. I'll make such a change.
7842 ↗	(On Diff #174220)	Yeah, just a hair (one character). Reformatting with clang-format will fix that.
lib/Target/PowerPC/PPCISelLowering.h
203 ↗	(On Diff #174220)	I'm not sure how to pass a constant through to the tablegen'd layer. These two pseudo-ops are just light wrappers to EVMERGEHI and MR. If there is a way to pass a constant and do the switch down in that layer, then that's acceptable as well.

nemanjai added inline comments. Dec 30 2018, 5:12 AM

lib/Target/PowerPC/PPCISelLowering.h
203 ↗	(On Diff #174220)	Sure, there are existing examples. Vector conversion custom nodes are probably quite similar to what you need:
- PPCISD::SINT_VEC_TO_FP
- PPCISD::UINT_VEC_TO_FP
But there will be others.

Fix expanding builtins to libcalls. Remove the need for intermediate illegal types in expanding and pairing the arguments and return values.

Unfortunately arcanist on my machine seems to be broken working with this repository, so I had to upload a diff manually.

Herald added subscribers: dexonsmith, mehdi_amini. Jan 14 2019, 8:37 PM

kthomsen added a subscriber: kthomsen. Jan 16 2019, 1:46 AM

jhibbits mentioned this in D49754: Add -m(no-)spe, and e500 CPU definitions and support to clang. Jan 17 2019, 12:54 PM

Fix argument indices for indexing through OutVals[]. It should be the argument index, not the physical register index.
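
To make the indexing problem concrete: once an f64 argument occupies two register locations, the location index and the argument index drift apart. The sketch below is a toy model, not the LLVM code; `ArgLoc`, `Assignment`, `mapLocsToArgs` and `exampleLocs` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a register location. startsF64Pair marks the first of
// the two locations that together hold one f64 argument.
struct ArgLoc {
  unsigned reg;
  bool startsF64Pair;
};

struct Assignment {
  unsigned reg;
  std::size_t argIdx;  // index into the list of actual arguments
};

// Walk the locations while tracking the argument index separately.
inline std::vector<Assignment> mapLocsToArgs(const std::vector<ArgLoc> &locs) {
  std::vector<Assignment> out;
  std::size_t argIdx = 0;
  for (std::size_t i = 0; i < locs.size(); ++i, ++argIdx) {
    out.push_back({locs[i].reg, argIdx});
    if (locs[i].startsF64Pair) {
      assert(i + 1 < locs.size() &&
             "No second half of double precision argument");
      ++i;  // the second half uses another location but the same argument
      out.push_back({locs[i].reg, argIdx});
    }
  }
  return out;
}

// Example: f(int, double, int) -> r3, r5:r6, r7.
inline std::vector<ArgLoc> exampleLocs() {
  return {{3, false}, {5, true}, {6, false}, {7, false}};
}
```

In the example, r7 is the fourth location but holds argument 2. Indexing the argument list with the location index would read one element too far.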

Hi Justin, I'm watching your work and have used your patches to bring SPE into my Clang for OS-9.
The OutVals[] issue is one I also found yesterday while debugging the Clang part.
There is a second location with the same problem, in PPCTargetLowering::LowerReturn():

// Copy the result values into the output registers.
for (unsigned i = 0, realI = 0; i != RVLocs.size(); ++i, ++realI) {
  CCValAssign &VA = RVLocs[i];
  assert(VA.isRegLoc() && "Can only return in registers!");

  SDValue Arg = OutVals[realI];

Regards, Kei

Hi Kei, thanks for that. I've updated my code, and will post an updated diff tomorrow.

One more argument index fixup.

I have a question:
When compiling

double a;
void func(double x) {
a = x;
}

It is generating

lis 5, a@ha
evstdd 3, a@l(5)

But evstdd and evldd have only an 8-bit (5 bits real) UIMM offset, so this code does not work: the offset a@l is not known to fit in 8 bits.
Has this issue already been addressed in a patch that I simply haven't seen, or is this part still missing?

I would expect

lis 5, a@ha
li 4, a@l
evstddx 3, 4, 5
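
The displacement limit behind this question can be written down as a small predicate. It is a sketch based on the instruction encoding (a 5-bit unsigned immediate scaled by 8), and `fitsSpe64Displacement` is a hypothetical helper name, not an LLVM function.

```cpp
#include <cassert>
#include <cstdint>

// evldd/evstdd encode a 5-bit unsigned immediate that is scaled by 8, so a
// byte displacement is encodable only if it is a multiple of 8 in [0, 248].
constexpr bool fitsSpe64Displacement(std::int64_t offset) {
  return offset >= 0 && offset <= 248 && offset % 8 == 0;
}
```

A relocation such as a@l can resolve to any 16-bit value, so the compiler cannot assume it passes this check. That is why the indexed form evstddx is expected here.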

It seems that this is checked/generated by SelectAddressRegReg() and/or SelectAddressRegImm() in PPCISelLowering.cpp.
I'm currently trying to find out whether I have enough information in the SDValue N to identify this as an SPE load/store.
Did I miss a patch for this?
Thanks, Kei

Hi Kei, yes you need the patch in D54409, which fixes the offset handling. It should fix your case as well, but I didn't test foreign addresses.

Patch D54409 only handles variables on the stack, which are named "framedata" in the code. I'll keep looking into how to manage this for global variables. SelectAddressRegReg() and SelectAddressRegImm() make this decision, but they have no information about the target data. Maybe it needs to be decided somewhere else.

Hmm, I have not yet tried to explore this, but I get a feeling a regression appeared somewhere during the patch iterations. Either this or D54409.
At this point I am consistently getting weird generated instructions for __floatundidf from compiler-rt (compiling with freebsd & -O3), yet I have a correct one in my files.
What makes it strange is that the logs show that the correct and first incorrect examples were generated by the same compiler (binary file) with the same flags, yet I no longer can get the correct one.

Does it reproduce for anyone?

Supposedly correct one:

.set back_chain, -0x20
.set var_10, -0x10
.set var_8, -8
94 21 FF E0                       stwu      r1, back_chain(r1)
3C A0 45 30                       lis       r5, 0x4530
90 61 00 1C                       stw       r3, 0x20+var_8+4(r1)
90 A1 00 18                       stw       r5, 0x20+var_8(r1)
3C A0 43 30                       lis       r5, 0x4330
10 61 1B 01                       evldd     r3, 0x20+var_8(r1)
90 81 00 14                       stw       r4, 0x20+var_10+4(r1)
3C 80 80 02                       lis       r4, -0x7FFE
10 84 DB 01                       evldd     r4, 0xD8(r4) ; note this one
90 A1 00 10                       stw       r5, 0x20+var_10(r1)
10 A1 13 01                       evldd     r5, 0x20+var_10(r1)
10 63 22 E0                       efdadd    r3, r3, r4
10 83 2A E0                       efdadd    r4, r3, r5
10 64 22 2C                       evmergehi r3, r4, r4
38 21 00 20                       addi      r1, r1, 0x20
4E 80 00 20                       blr

What I get now:

94 21 FF E0                       stwu      r1, back_chain(r1)
3C A0 45 30                       lis       r5, 0x4530
90 61 00 1C                       stw       r3, 0x20+var_8+4(r1)
90 A1 00 18                       stw       r5, 0x20+var_8(r1)
3C A0 43 30                       lis       r5, 0x4330
10 61 1B 01                       evldd     r3, 0x20+var_8(r1)
90 81 00 14                       stw       r4, 0x20+var_10+4(r1)
3C 80 80 02                       lis       r4, -0x7FFE
10 8C BB 01                       evldd     r4, 0xB8(r12) ; note this one
90 A1 00 10                       stw       r5, 0x20+var_10(r1)
10 A1 13 01                       evldd     r5, 0x20+var_10(r1)
10 63 22 E0                       efdadd    r3, r3, r4
10 83 2A E0                       efdadd    r4, r3, r5
10 64 22 2C                       evmergehi r3, r4, r4
38 21 00 20                       addi      r1, r1, 0x20
4E 80 00 20                       blr

94 21 FF E0                       stwu      r1, back_chain(r1)
3C A0 45 30                       lis       r5, 0x4530
90 61 00 1C                       stw       r3, 0x20+var_8+4(r1)
90 A1 00 18                       stw       r5, 0x20+var_8(r1)
3C A0 43 30                       lis       r5, 0x4330
10 61 1B 01                       evldd     r3, 0x20+var_8(r1)
90 81 00 14                       stw       r4, 0x20+var_10+4(r1)
3C 80 80 01                       lis       r4, -0x7FFF
10 9F 1B 01                       evldd     r4, 0x18(r31) ; note this one
90 A1 00 10                       stw       r5, 0x20+var_10(r1)
10 A1 13 01                       evldd     r5, 0x20+var_10(r1)
10 63 22 E0                       efdadd    r3, r3, r4
10 83 2A E0                       efdadd    r4, r3, r5
10 64 22 2C                       evmergehi r3, r4, r4
38 21 00 20                       addi      r1, r1, 0x20
4E 80 00 20                       blr

Reference source:

double floatundidf(unsigned long long a)
{
    static const double twop52 = 4503599627370496.0; // 0x1.0p52
    static const double twop84 = 19342813113834066795298816.0; // 0x1.0p84
    static const double twop84_plus_twop52 = 19342813118337666422669312.0; // 0x1.00000001p84

    union { uint64_t x; double d; } high = { .d = twop84 };
    union { uint64_t x; double d; } low = { .d = twop52 };

    high.x |= a >> 32;
    low.x |= a & UINT64_C(0x00000000ffffffff);

    const double result = (high.d - twop84_plus_twop52) + low.d;
    return result;
}

As promised I have modified the SelectAddressRegReg() in PPCISelLowering.cpp to create correct evldd(x) and evstdd(x) instructions when accessing global variables.

bool PPCTargetLowering::SelectAddressRegReg(SDValue N, SDValue &Base,
                                            SDValue &Index,
                                            SelectionDAG &DAG) const {
  int16_t imm = 0;
  if (N.getOpcode() == ISD::ADD) {
    if (hasSPE()) {
      // Is there any SPE load/store (f64) which can't handle 16bit offset?
      for (SDNode::use_iterator UI = N->use_begin(), E = N->use_end();
           UI != E; ++UI) {
        if (UI->getOpcode() == ISD::STORE) {
          // Store has the type Operand[1]
          if ((UI->getNumOperands() >= 2) &&
              (UI->getOperand(1).getSimpleValueType() == MVT::f64)) {
            // This is a f64 store with SPE
            // The instruction evstdd can only handle 8bit offset
            Base = N.getOperand(0);
            Index = N.getOperand(1);
            return true;
          }
        } else if (UI->getOpcode() == ISD::LOAD) {
          // Load has the type in Values[0]
          if ((UI->getNumValues() >= 1) &&
              (UI->getSimpleValueType(0) == MVT::f64)) {
            // This is a f64 load with SPE
            // The instruction evldd can only handle 8bit offset
            Base = N.getOperand(0);
            Index = N.getOperand(1);
            return true;
          }
        }
      }
    }
    if (isIntS16Immediate(N.getOperand(1), imm))
      return false;    // r+i
    if (N.getOperand(1).getOpcode() == PPCISD::Lo)
      return false;    // r+i

....
The modification starts with if (hasSPE()) { and ends with the matching }.

What I have done: SelectAddressRegReg() is the central decider between "offset+r4" and "r4+r5" addressing, but it only knows about 16-bit offsets. evldd and evstdd use 8-bit (5 bits usable) offsets, so they can't be used when accessing global variables. The patch now looks through the use list to check whether the address feeds an SPE load or store of f64 data. If so, it reports that register+register addressing must be used, which then automatically changes the instruction to evlddx / evstddx.
I have tested this with OS-9 running on a P2020 (e500v2).
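
The decision described above can be reduced to a toy model that runs without LLVM. Everything in it is an illustrative assumption: `AddrMode`, `MemUse`, `oneUse` and `selectAddrMode` are hypothetical names, and the fallback branch is a simplification of the remaining checks in the real function.

```cpp
#include <cassert>
#include <vector>

enum class AddrMode { RegImm, RegReg };

// Hypothetical summary of one user of the address computation.
struct MemUse {
  bool isLoadOrStore;
  bool isF64;
};

// Convenience helper for building a one-element use list.
inline std::vector<MemUse> oneUse(bool isF64) {
  return {MemUse{true, isF64}};
}

inline AddrMode selectAddrMode(bool hasSPE, const std::vector<MemUse> &uses,
                               bool offsetFitsS16) {
  if (hasSPE) {
    for (const MemUse &u : uses)
      if (u.isLoadOrStore && u.isF64)
        return AddrMode::RegReg;  // ends up as evlddx / evstddx
  }
  // Simplified fallback (assumption): the real function has more checks.
  return offsetFitsS16 ? AddrMode::RegImm : AddrMode::RegReg;
}
```

Note that, like the patch, this model forces register+register addressing for every f64 access on SPE, even when the offset would have fit the short immediate form.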

Would you like to put my patch into your patch D54583?
Kei

Hi Kei, thanks! I'll gladly add it to one of the patches. It might fit better with D54409, since the purpose of that is to fix the evldd handling in general.

Hi @vit9696, it looks like Kei's patch fixes the issue you're seeing as well.

I placed the info in this thread because D54583 already touches PPCISelLowering.cpp. In reality, though, it fits better in D54409.

In D54583#1368816, @jhibbits wrote:

Hi @vit9696, it looks like Kei's patch fixes the issue you're seeing as well.

Yes, I am currently testing the code with the latest changes, and so far it appears to be resolved. Thanks.

Also, as we discussed on IRC, please provide full context to make it easier to review.
I am actually OK with this revision as long as the comments are addressed, but I'm just requesting another revision to ensure all the comments have been addressed.

lib/Target/PowerPC/PPCISelLowering.cpp
3164 ↗	(On Diff #182715)	Since this isn't local to a very small loop, I think a more descriptive name is in order. Perhaps `RegIdx`?
3165 ↗	(On Diff #182715)	I think it is more obvious to use `sizeof(MCPhysReg)` rather than `sizeof(HiRegList[0])`.
3196 ↗	(On Diff #182715)	I can't say I am overly familiar with this code, but we don't need to call `State.AllocateReg()` for the low register?
3550 ↗	(On Diff #182715)	I don't think it's useful to duplicate a comment down both paths in a condition.
3555 ↗	(On Diff #182715)	Perhaps something like assert(i + 1 < e && "No second half of double precision argument");
5520 ↗	(On Diff #182715)	`i, realI, j` are not adequate given this rather complex iteration space... Please name them for what they are meant to refer to.
6755 ↗	(On Diff #182715)	Can we be consistent about how we specify the index to the extract between here and above? I am OK with either approach as long as we use the same.
lib/Target/PowerPC/PPCInstrInfo.td
237 ↗	(On Diff #182715)	Seems like we don't need to separately state that the types are the same and that they're the same size?

This revision now requires changes to proceed. Feb 19 2019, 4:13 PM

Herald added a subscriber: jdoerfert. Feb 19 2019, 4:13 PM

Address feedback. Provide full context for diffs.

I found an issue with the SPE compare operations. The result of efdcmpeq, efdcmpgt and efdcmplt is always placed in the GT bit of the condition register. This is addressed in one place in PPCISelDAGToDAG.cpp, but not for a second code-generation case.
The diff of PPCISelDAGToDAG.cpp is:

--- PPCISelDAGToDAG.cpp 2019-03-12 15:35:45.000000000 +0100
+++ "\\ellcc\\PPCISelDAGToDAG.cpp"      2019-03-27 10:52:21.088326000 +0100
@@ -5039,6 +5038,32 @@ void PPCDAGToDAGISel::Select(SDNode *N)
       PCC |= getBranchHint(PCC, FuncInfo, N->getOperand(4));

     SDValue CondCode = SelectCC(N->getOperand(2), N->getOperand(3), CC, dl);
+
+    if (PPCSubTarget->hasSPE() && N->getOperand(2).getValueType().isFloatingPoint()) {
+      // For SPE instructions, the result is in GT bit of the CR
+      switch(CC) {
+        case ISD::SETOEQ:
+        case ISD::SETEQ:
+        case ISD::SETOLT:
+        case ISD::SETLT:
+        case ISD::SETOGT:
+        case ISD::SETGT:
+                PCC = PPC::PRED_GT;
+                break;
+        case ISD::SETUNE:
+        case ISD::SETNE:
+        case ISD::SETULE:
+        case ISD::SETLE:
+        case ISD::SETUGE:
+        case ISD::SETGE:
+                PCC = PPC::PRED_LE;
+                break;
+        default:
+                break;
+      }
+    }
+
+
     SDValue Ops[] = { getI32Imm(PCC, dl), CondCode,
                         N->getOperand(4), N->getOperand(0) };
     CurDAG->SelectNodeTo(N, PPC::BCC, MVT::Other, Ops);
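
The mapping in this diff can be checked with a small host-side model. This is a sketch under assumptions, not the backend code: the pairing of each relation with a compare instruction (EQ/NE with efdcmpeq, LT/GE with efdcmplt, GT/LE with efdcmpgt) is assumed here, all type and function names are hypothetical, and NaN operands are ignored.

```cpp
#include <cassert>

enum class Rel { EQ, NE, LT, LE, GT, GE };
enum class Cmp { EFDCMPEQ, EFDCMPLT, EFDCMPGT };
enum class Pred { GT, LE };  // branch if GT bit set / if GT bit clear

struct Lowered {
  Cmp cmp;
  Pred pred;
};

// Each relation becomes one compare plus a test of the GT bit; the negated
// relations reuse the same compare and branch on "GT bit clear".
inline Lowered lower(Rel r) {
  switch (r) {
    case Rel::EQ: return {Cmp::EFDCMPEQ, Pred::GT};
    case Rel::NE: return {Cmp::EFDCMPEQ, Pred::LE};
    case Rel::LT: return {Cmp::EFDCMPLT, Pred::GT};
    case Rel::GE: return {Cmp::EFDCMPLT, Pred::LE};
    case Rel::GT: return {Cmp::EFDCMPGT, Pred::GT};
    case Rel::LE: return {Cmp::EFDCMPGT, Pred::LE};
  }
  return {Cmp::EFDCMPEQ, Pred::GT};
}

// The SPE compares report their result only in the GT bit.
inline bool gtBit(Cmp c, double a, double b) {
  switch (c) {
    case Cmp::EFDCMPEQ: return a == b;
    case Cmp::EFDCMPLT: return a < b;
    case Cmp::EFDCMPGT: return a > b;
  }
  return false;
}

inline bool branchTaken(Rel r, double a, double b) {
  const Lowered l = lower(r);
  const bool gt = gtBit(l.cmp, a, b);
  return l.pred == Pred::GT ? gt : !gt;
}
```

For ordered inputs, `branchTaken(Rel::GE, 5.5, 5.5)` is true and `branchTaken(Rel::LE, 5.5, 5.4)` is false, matching the plain C comparisons.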

The following test program checks all six comparison operators. With the fix, the libc++ tests now compile correctly and run (about 1000 of 5800 failed before).

#include <stdio.h>

int teq(double a, double b)
{
    printf("%lf == %lf\n",a,b);
    if (a == b)
    {
      printf("equal\n");
      return 1;
    }
    printf("!equal\n");
    return 0;
}

int tne(double a, double b)
{
    printf("%lf != %lf\n",a,b);
    if (a != b)
    {
      printf("notequal\n");
      return 1;
    }
    printf("!notequal\n");
    return 0;
}
int tgt(double a, double b)
{
    printf("%lf > %lf\n",a,b);
    if (a > b)
    {
      printf("greater than\n");
      return 1;
    }
    printf("!greater than\n");
    return 0;
}
int tge(double a, double b)
{
    printf("%lf >= %lf\n",a,b);
    if (a >= b)
    {
      printf("greater equal\n");
      return 1;
    }
    printf("!greater equal\n");
    return 0;
}
int tlt(double a, double b)
{
    printf("%lf < %lf\n",a,b);
    if (a < b)
    {
      printf("less than\n");
      return 1;
    }
    printf("!less than\n");
    return 0;
}

int tle(double a, double b)
{
    printf("%lf <= %lf\n",a,b);
    if (a <= b)
    {
      printf("less equal\n");
      return 1;
    }
    printf("!less equal\n");
    return 0;
}

int main()
{
    teq(5.5,5.5);
    teq(5.5,5.6);
        
    tne(5.5,5.6);
    tne(5.5,5.5);
    
    tgt(5.5,5.6);
    tgt(5.5,5.5);
    tgt(5.5,5.4);
    
    tge(5.5,5.6);
    tge(5.5,5.5);
    tge(5.5,5.4);
    
    tlt(5.5,5.6);
    tlt(5.5,5.5);
    tlt(5.5,5.4);
    
    tle(5.5,5.6);
    tle(5.5,5.5);
    tle(5.5,5.4);
        
    return 0;
}

The result is:

$ eq
5.500000 == 5.500000
equal
5.500000 == 5.600000
!equal
5.500000 != 5.600000
notequal
5.500000 != 5.500000
!notequal
5.500000 > 5.600000
!greater than
5.500000 > 5.500000
!greater than
5.500000 > 5.400000
greater than
5.500000 >= 5.600000
!greater equal
5.500000 >= 5.500000
greater equal
5.500000 >= 5.400000
greater equal
5.500000 < 5.600000
less than
5.500000 < 5.500000
!less than
5.500000 < 5.400000
!less than
5.500000 <= 5.600000
less equal
5.500000 <= 5.500000
less equal
5.500000 <= 5.400000
!less equal

I'm not sure whether this fix belongs in D54583 or whether you'd like to create a new revision for it.

Best regards, Kei

@kthomsen can you create a new revision just for that diff?

lib/Target/PowerPC/PPCISelLowering.cpp
3165 ↗	(On Diff #182715)	This is often spelled out in a macro as nitems(), or sizeofArray() (or similar).
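
In C++ the element count can also come from a small template instead of a macro. The following is a sketch only: `arrayLength`, `indexOf` and `kHiRegs` are hypothetical names, and the register numbers are placeholders rather than the `MCPhysReg` values used in the patch.

```cpp
#include <cassert>
#include <cstddef>

// Number of elements in a built-in array, deduced at compile time.
template <typename T, std::size_t N>
constexpr std::size_t arrayLength(const T (&)[N]) {
  return N;
}

// Index of the first element equal to value, or N if it is not present.
template <typename T, std::size_t N>
inline std::size_t indexOf(const T (&list)[N], T value) {
  for (std::size_t i = 0; i < N; ++i)
    if (list[i] == value)
      return i;
  return N;
}

// Placeholder register numbers for the example.
static const unsigned kHiRegs[] = {3, 5, 7, 9};
```

Unlike the `sizeof(a) / sizeof(a[0])` idiom, the template refuses to compile when handed a pointer instead of an array.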

@jhibbits I don't know how to create a new revision here. My idea is to handle this fix via you, as you are already known for the SPE modifications.

In D54583#1445579, @kthomsen wrote:

@jhibbits I don't know how to create a new revision here. My idea is to handle this fix via you, as you are already known for the SPE modifications.

Fair enough, I'll create a new revision tonight for this change.

LGTM. The remaining comments are stylistic nits that can be addressed on the commit and do not require another round of review. Thank you for your patience with this review.

lib/Target/PowerPC/PPCISelLowering.cpp
3215 ↗	(On Diff #188756)	I don't really understand why this is handled differently than the parameters in terms of how the allocation is done. It is perhaps a good indicator that a comment explaining the difference would be useful.
3578 ↗	(On Diff #188756)	I think it would be good to keep using the Hi/Lo naming here as well and make it very obvious that you're adding two live-in registers and adding copies from them into virtual regs. Something along the lines of:

  if (VA.getLocVT() == MVT::f64 && Subtarget.hasSPE()) {
    assert(i + 1 < e && "No second half of double precision argument");
    unsigned RegLo = MF.addLiveIn(VA.getLocReg(), RC);
    unsigned RegHi = MF.addLiveIn(ArgLocs[++i].getLocReg(), RC);
    SDValue ArgValueLo = DAG.getCopyFromReg(Chain, dl, RegLo, MVT::i32);
    SDValue ArgValueHi = DAG.getCopyFromReg(Chain, dl, RegHi, MVT::i32);
    if (!Subtarget.isLittleEndian())
      std::swap (ArgValueLo, ArgValueHi);
    ArgValue = DAG.getNode(PPCISD::BUILD_SPE64, dl, MVT::f64, ArgValueLo,
                           ArgValueHi);
  } else {
    // existing code ...
  }

5545 ↗	(On Diff #188756)	I think it would be nice to add a comment explaining what the induction variables track (correct the below if it's wrong):
- i - Tracks the index into the list of registers allocated for the call
- RealArgIdx - Tracks the index into the list of actual function arguments
- j - Tracks the index into the list of byval arguments
5600 ↗	(On Diff #188756)	Minor nit: this doesn't really match the naming convention. Perhaps `IsLE`?
6756 ↗	(On Diff #188756)	Similar comment for this code as above regarding induction variables and the naming of `isLittleEndian`.
lib/Target/PowerPC/PPCISelLowering.h
1114 ↗	(On Diff #188756)	This does not appear to be used. Perhaps an artifact remaining from a previous revision?
lib/Target/PowerPC/PPCInstrInfo.td
236 ↗	(On Diff #188756)	I think all the types have to be exact - `i32` inputs, `f64` output. Might as well make that explicit: `[SDTCisVT<0, f64>, SDTCisVT<1, i32>, SDTCisVT<2, i32>]` Similarly below.

Forgot to select Accept Revision from the pulldown.

This revision is now accepted and ready to land. Mar 31 2019, 3:09 AM

@jhibbits @kthomsen Sorry for the delay. I checked the change in PPCISelDAGToDAG.cpp, and it indeed fixes the issue. So far I have run into no other bugs and have no objection to this getting merged.

jhibbits marked 7 inline comments as done. Apr 2 2019, 1:12 PM

jhibbits added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
3215 ↗	(On Diff #188756)	Yeah, they're identical, but on second look they shouldn't be. Return values can only be in R3 and R4, so they should be identical in every way except the register list.
lib/Target/PowerPC/PPCISelLowering.h
1114 ↗	(On Diff #188756)	Yes, it's an artifact.

Closed by commit rL363526: PowerPC: Optimize SPE double parameter calling setup (authored by jhibbits). Jun 16 2019, 8:14 PM

This revision was automatically updated to reflect the committed changes.

jhibbits marked 2 inline comments as done.

Herald added a project: Restricted Project. Jun 16 2019, 8:15 PM

kthomsen mentioned this in D54409: PowerPC/SPE: Fix load/store handling for SPE. Jul 1 2019, 1:04 AM

jhibbits mentioned this in D69483: [PowerPC]: Fix predicate handling with SPE. Dec 13 2019, 8:18 AM

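Revision Contents

Path	Size
llvm/trunk/lib/Target/PowerPC/PPCCallingConv.cpp	54 lines
llvm/trunk/lib/Target/PowerPC/PPCCallingConv.td	7 lines
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h	17 lines
llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp	119 lines
llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td	12 lines
llvm/trunk/lib/Target/PowerPC/PPCInstrSPE.td	12 lines
llvm/trunk/test/CodeGen/PowerPC/spe.ll	8 lines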

Diff 204989

llvm/trunk/lib/Target/PowerPC/PPCCallingConv.cpp

@@ ... @@ static bool CC_PPC32_SVR4_Custom_AlignFPArgRegs(unsigned &ValNo, MVT &ValVT,
   // Always return false here, as this function only makes sure that the two f64
   // values a ppc_fp128 value is split into are both passed in registers or both
   // passed on the stack and does not actually allocate a register for the
   // current argument.
   return false;
 }

+// Split F64 arguments into two 32-bit consecutive registers.
+static bool CC_PPC32_SPE_CustomSplitFP64(unsigned &ValNo, MVT &ValVT,
+                                        MVT &LocVT,
+                                        CCValAssign::LocInfo &LocInfo,
+                                        ISD::ArgFlagsTy &ArgFlags,
+                                        CCState &State) {
+  static const MCPhysReg HiRegList[] = { PPC::R3, PPC::R5, PPC::R7, PPC::R9 };
+  static const MCPhysReg LoRegList[] = { PPC::R4, PPC::R6, PPC::R8, PPC::R10 };
+
+  // Try to get the first register.
+  unsigned Reg = State.AllocateReg(HiRegList);
+  if (!Reg)
+    return false;
+
+  unsigned i;
+  for (i = 0; i < sizeof(HiRegList) / sizeof(HiRegList[0]); ++i)
+    if (HiRegList[i] == Reg)
+      break;
+
+  unsigned T = State.AllocateReg(LoRegList[i]);
+  (void)T;
+  assert(T == LoRegList[i] && "Could not allocate register");
+
+  State.addLoc(CCValAssign::getCustomReg(ValNo, ValVT, Reg, LocVT, LocInfo));
+  State.addLoc(CCValAssign::getCustomReg(ValNo, ValVT, LoRegList[i],
+                                         LocVT, LocInfo));
+  return true;
+}
+
+// Same as above, but for return values, so only allocate for R3 and R4
+static bool CC_PPC32_SPE_RetF64(unsigned &ValNo, MVT &ValVT,
+                               MVT &LocVT,
+                               CCValAssign::LocInfo &LocInfo,
+                               ISD::ArgFlagsTy &ArgFlags,
+                               CCState &State) {
+  static const MCPhysReg HiRegList[] = { PPC::R3 };
+  static const MCPhysReg LoRegList[] = { PPC::R4 };
+
+  // Try to get the first register.
+  unsigned Reg = State.AllocateReg(HiRegList, LoRegList);
+  if (!Reg)
+    return false;
+
+  unsigned i;
+  for (i = 0; i < sizeof(HiRegList) / sizeof(HiRegList[0]); ++i)
+    if (HiRegList[i] == Reg)
+      break;
+
+  State.addLoc(CCValAssign::getCustomReg(ValNo, ValVT, Reg, LocVT, LocInfo));
+  State.addLoc(CCValAssign::getCustomReg(ValNo, ValVT, LoRegList[i],
+                                         LocVT, LocInfo));
+  return true;
+}
+
 #include "PPCGenCallingConv.inc"
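
The pairing logic of the custom handler can be exercised with a toy allocator. This is a simplified model and not the LLVM `CCState` API: `ToyCCState`, `RegPair`, `allocateF64Pair` and `nthDoublePair` are hypothetical, and registers are plain numbers.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

struct RegPair {
  unsigned hi;
  unsigned lo;
};

// Minimal stand-in for the allocation state: one bit per GPR.
class ToyCCState {
  std::bitset<32> used_;

public:
  bool isAllocated(unsigned r) const { return used_.test(r); }
  void allocate(unsigned r) { used_.set(r); }
};

// Take the first free "high" register and the "low" register paired with it.
// Returns false when r3, r5, r7 and r9 are all in use.
inline bool allocateF64Pair(ToyCCState &st, RegPair &out) {
  static const unsigned hiRegs[] = {3, 5, 7, 9};
  static const unsigned loRegs[] = {4, 6, 8, 10};
  for (std::size_t i = 0; i < 4; ++i) {
    if (!st.isAllocated(hiRegs[i])) {
      assert(!st.isAllocated(loRegs[i]) && "Could not allocate register");
      st.allocate(hiRegs[i]);
      st.allocate(loRegs[i]);
      out = RegPair{hiRegs[i], loRegs[i]};
      return true;
    }
  }
  return false;
}

// Usage helper: the pair assigned to the (n+1)-th double of a call that
// passes only doubles, or {0, 0} once the registers run out.
inline RegPair nthDoublePair(unsigned n) {
  ToyCCState st;
  RegPair p{0, 0};
  for (unsigned k = 0; k <= n; ++k)
    if (!allocateF64Pair(st, p))
      return RegPair{0, 0};
  return p;
}
```

In this model four doubles fit in r3:r4 through r9:r10. A fifth finds no free pair, which corresponds to the handler returning false.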

llvm/trunk/lib/Target/PowerPC/PPCCallingConv.td

@@ ... @@ def RetCC_PPC : CallingConv<[
   // only the ELFv2 ABI fully utilizes all these registers.
   CCIfNotSubtarget<"hasSPE()",
        CCIfType<[f32], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>>,
   CCIfNotSubtarget<"hasSPE()",
        CCIfType<[f64], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>>,
   CCIfSubtarget<"hasSPE()",
        CCIfType<[f32], CCAssignToReg<[R3, R4, R5, R6, R7, R8, R9, R10]>>>,
   CCIfSubtarget<"hasSPE()",
-       CCIfType<[f64], CCAssignToReg<[S3, S4, S5, S6, S7, S8, S9, S10]>>>,
+       CCIfType<[f64], CCCustom<"CC_PPC32_SPE_RetF64">>>,

   // For P9, f128 are passed in vector registers.
   CCIfType<[f128],
     CCIfSubtarget<"hasP9Vector()",
     CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>,

   // QPX vectors are returned in QF1 and QF2.
   CCIfType<[v4f64, v4f32, v4i1],
@@ ... @@ def CC_PPC32_SVR4_Common : CallingConv<[
   CCIfType<[i32],
     CCIfSplit<CCIfSubtarget<"useSoftFloat()",
       CCIfOrigArgWasNotPPCF128<
         CCCustom<"CC_PPC32_SVR4_Custom_AlignArgRegs">>>>>,

   CCIfType<[i32],
     CCIfSplit<CCIfNotSubtarget<"useSoftFloat()",
       CCCustom<"CC_PPC32_SVR4_Custom_AlignArgRegs">>>>,
+  CCIfType<[f64],
+    CCIfSubtarget<"hasSPE()",
+      CCCustom<"CC_PPC32_SVR4_Custom_AlignArgRegs">>>,
   CCIfSplit<CCIfSubtarget<"useSoftFloat()",
     CCIfOrigArgWasPPCF128<CCCustom<
       "CC_PPC32_SVR4_Custom_SkipLastArgRegsPPCF128">>>>,

   // The 'nest' parameter, if any, is passed in R11.
   CCIfNest<CCAssignToReg<[R11]>>,

   // The first 8 integer arguments are passed in integer registers.
   CCIfType<[i32], CCAssignToReg<[R3, R4, R5, R6, R7, R8, R9, R10]>>,

   // Make sure the i64 words from a long double are either both passed in
   // registers or both passed on the stack.
   CCIfType<[f64], CCIfSplit<CCCustom<"CC_PPC32_SVR4_Custom_AlignFPArgRegs">>>,

   // FP values are passed in F1 - F8.
   CCIfType<[f32, f64],
     CCIfNotSubtarget<"hasSPE()",
       CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>>,
   CCIfType<[f64],
     CCIfSubtarget<"hasSPE()",
-      CCAssignToReg<[S3, S4, S5, S6, S7, S8, S9, S10]>>>,
+      CCCustom<"CC_PPC32_SPE_CustomSplitFP64">>>,
   CCIfType<[f32],
     CCIfSubtarget<"hasSPE()",
       CCAssignToReg<[R3, R4, R5, R6, R7, R8, R9, R10]>>>,

   // Split arguments have an alignment of 8 bytes on the stack.
   CCIfType<[i32], CCIfSplit<CCAssignToStack<4, 8>>>,

   CCIfType<[i32], CCAssignToStack<4, 4>>,

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.h

Show First 20 Lines • Show All 188 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
MTVSRA,		MTVSRA,

/// Direct move from a GPR to a VSX register (zero)		/// Direct move from a GPR to a VSX register (zero)
MTVSRZ,		MTVSRZ,

/// Direct move of 2 consecutive GPR to a VSX register.		/// Direct move of 2 consecutive GPR to a VSX register.
BUILD_FP128,		BUILD_FP128,

		/// BUILD_SPE64 and EXTRACT_SPE are analogous to BUILD_PAIR and
		/// EXTRACT_ELEMENT but take f64 arguments instead of i64, as i64 is
		/// unsupported for this target.
		/// Merge 2 GPRs to a single SPE register.
		BUILD_SPE64,

		/// Extract SPE register component, second argument is high or low.
		EXTRACT_SPE,

/// Extract a subvector from signed integer vector and convert to FP.		/// Extract a subvector from signed integer vector and convert to FP.
/// It is primarily used to convert a (widened) illegal integer vector		/// It is primarily used to convert a (widened) illegal integer vector
/// type to a legal floating point vector type.		/// type to a legal floating point vector type.
/// For example v2i32 -> widened to v4i32 -> v2f64		/// For example v2i32 -> widened to v4i32 -> v2f64
SINT_VEC_TO_FP,		SINT_VEC_TO_FP,

/// Extract a subvector from unsigned integer vector and convert to FP.		/// Extract a subvector from unsigned integer vector and convert to FP.
/// As with SINT_VEC_TO_FP, used for converting illegal types.		/// As with SINT_VEC_TO_FP, used for converting illegal types.
[698 lines hidden]	public:
unsigned getJumpTableEncoding() const override;		unsigned getJumpTableEncoding() const override;
bool isJumpTableRelative() const override;		bool isJumpTableRelative() const override;
SDValue getPICJumpTableRelocBase(SDValue Table,		SDValue getPICJumpTableRelocBase(SDValue Table,
SelectionDAG &DAG) const override;		SelectionDAG &DAG) const override;
const MCExpr *getPICJumpTableRelocBaseExpr(const MachineFunction *MF,		const MCExpr *getPICJumpTableRelocBaseExpr(const MachineFunction *MF,
unsigned JTI,		unsigned JTI,
MCContext &Ctx) const override;		MCContext &Ctx) const override;

unsigned getNumRegistersForCallingConv(LLVMContext &Context,
CallingConv::ID CC,
EVT VT) const override;

MVT getRegisterTypeForCallingConv(LLVMContext &Context,
CallingConv::ID CC,
EVT VT) const override;

private:		private:
struct ReuseLoadInfo {		struct ReuseLoadInfo {
SDValue Ptr;		SDValue Ptr;
SDValue Chain;		SDValue Chain;
SDValue ResChain;		SDValue ResChain;
MachinePointerInfo MPI;		MachinePointerInfo MPI;
bool IsDereferenceable = false;		bool IsDereferenceable = false;
bool IsInvariant = false;		bool IsInvariant = false;
[272 lines hidden]

llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp


[1,263 lines hidden]	unsigned PPCTargetLowering::getByValTypeAlignment(Type *Ty,
// 16byte and wider vectors are passed on 16byte boundary.		// 16byte and wider vectors are passed on 16byte boundary.
// The rest is 8 on PPC64 and 4 on PPC32 boundary.		// The rest is 8 on PPC64 and 4 on PPC32 boundary.
unsigned Align = Subtarget.isPPC64() ? 8 : 4;		unsigned Align = Subtarget.isPPC64() ? 8 : 4;
if (Subtarget.hasAltivec() || Subtarget.hasQPX())		if (Subtarget.hasAltivec() || Subtarget.hasQPX())
getMaxByValAlign(Ty, Align, Subtarget.hasQPX() ? 32 : 16);		getMaxByValAlign(Ty, Align, Subtarget.hasQPX() ? 32 : 16);
return Align;		return Align;
}		}

unsigned PPCTargetLowering::getNumRegistersForCallingConv(LLVMContext &Context,
CallingConv::ID CC,
EVT VT) const {
if (Subtarget.hasSPE() && VT == MVT::f64)
return 2;
return PPCTargetLowering::getNumRegisters(Context, VT);
}

MVT PPCTargetLowering::getRegisterTypeForCallingConv(LLVMContext &Context,
CallingConv::ID CC,
EVT VT) const {
if (Subtarget.hasSPE() && VT == MVT::f64)
return MVT::i32;
return PPCTargetLowering::getRegisterType(Context, VT);
}

bool PPCTargetLowering::useSoftFloat() const {		bool PPCTargetLowering::useSoftFloat() const {
return Subtarget.useSoftFloat();		return Subtarget.useSoftFloat();
}		}

bool PPCTargetLowering::hasSPE() const {		bool PPCTargetLowering::hasSPE() const {
return Subtarget.hasSPE();		return Subtarget.hasSPE();
}		}

[101 lines hidden]	const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::VABSD: return "PPCISD::VABSD";		case PPCISD::VABSD: return "PPCISD::VABSD";
case PPCISD::QVFPERM: return "PPCISD::QVFPERM";		case PPCISD::QVFPERM: return "PPCISD::QVFPERM";
case PPCISD::QVGPCI: return "PPCISD::QVGPCI";		case PPCISD::QVGPCI: return "PPCISD::QVGPCI";
case PPCISD::QVALIGNI: return "PPCISD::QVALIGNI";		case PPCISD::QVALIGNI: return "PPCISD::QVALIGNI";
case PPCISD::QVESPLATI: return "PPCISD::QVESPLATI";		case PPCISD::QVESPLATI: return "PPCISD::QVESPLATI";
case PPCISD::QBFLT: return "PPCISD::QBFLT";		case PPCISD::QBFLT: return "PPCISD::QBFLT";
case PPCISD::QVLFSb: return "PPCISD::QVLFSb";		case PPCISD::QVLFSb: return "PPCISD::QVLFSb";
case PPCISD::BUILD_FP128: return "PPCISD::BUILD_FP128";		case PPCISD::BUILD_FP128: return "PPCISD::BUILD_FP128";
		case PPCISD::BUILD_SPE64: return "PPCISD::BUILD_SPE64";
		case PPCISD::EXTRACT_SPE: return "PPCISD::EXTRACT_SPE";
case PPCISD::EXTSWSLI: return "PPCISD::EXTSWSLI";		case PPCISD::EXTSWSLI: return "PPCISD::EXTSWSLI";
case PPCISD::LD_VSX_LH: return "PPCISD::LD_VSX_LH";		case PPCISD::LD_VSX_LH: return "PPCISD::LD_VSX_LH";
case PPCISD::FP_EXTEND_LH: return "PPCISD::FP_EXTEND_LH";		case PPCISD::FP_EXTEND_LH: return "PPCISD::FP_EXTEND_LH";
}		}
return nullptr;		return nullptr;
}		}

EVT PPCTargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &C,		EVT PPCTargetLowering::getSetCCResultType(const DataLayout &DL, LLVMContext &C,
[2,009 lines hidden]	SDValue PPCTargetLowering::LowerFormalArguments_32SVR4(
// Assign locations to all of the incoming arguments.		// Assign locations to all of the incoming arguments.
SmallVector<CCValAssign, 16> ArgLocs;		SmallVector<CCValAssign, 16> ArgLocs;
PPCCCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), ArgLocs,		PPCCCState CCInfo(CallConv, isVarArg, DAG.getMachineFunction(), ArgLocs,
*DAG.getContext());		*DAG.getContext());

// Reserve space for the linkage area on the stack.		// Reserve space for the linkage area on the stack.
unsigned LinkageSize = Subtarget.getFrameLowering()->getLinkageSize();		unsigned LinkageSize = Subtarget.getFrameLowering()->getLinkageSize();
CCInfo.AllocateStack(LinkageSize, PtrByteSize);		CCInfo.AllocateStack(LinkageSize, PtrByteSize);
if (useSoftFloat() || hasSPE())		if (useSoftFloat())
CCInfo.PreAnalyzeFormalArguments(Ins);		CCInfo.PreAnalyzeFormalArguments(Ins);

CCInfo.AnalyzeFormalArguments(Ins, CC_PPC32_SVR4);		CCInfo.AnalyzeFormalArguments(Ins, CC_PPC32_SVR4);
CCInfo.clearWasPPCF128();		CCInfo.clearWasPPCF128();

for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {		for (unsigned i = 0, e = ArgLocs.size(); i != e; ++i) {
CCValAssign &VA = ArgLocs[i];		CCValAssign &VA = ArgLocs[i];

[16 lines hidden]	if (VA.isRegLoc()) {
RC = &PPC::SPE4RCRegClass;		RC = &PPC::SPE4RCRegClass;
else		else
RC = &PPC::F4RCRegClass;		RC = &PPC::F4RCRegClass;
break;		break;
case MVT::f64:		case MVT::f64:
if (Subtarget.hasVSX())		if (Subtarget.hasVSX())
RC = &PPC::VSFRCRegClass;		RC = &PPC::VSFRCRegClass;
else if (Subtarget.hasSPE())		else if (Subtarget.hasSPE())
RC = &PPC::SPERCRegClass;		// SPE passes doubles in GPR pairs.
		RC = &PPC::GPRCRegClass;
else		else
RC = &PPC::F8RCRegClass;		RC = &PPC::F8RCRegClass;
break;		break;
case MVT::v16i8:		case MVT::v16i8:
case MVT::v8i16:		case MVT::v8i16:
case MVT::v4i32:		case MVT::v4i32:
RC = &PPC::VRRCRegClass;		RC = &PPC::VRRCRegClass;
break;		break;
case MVT::v4f32:		case MVT::v4f32:
RC = Subtarget.hasQPX() ? &PPC::QSRCRegClass : &PPC::VRRCRegClass;		RC = Subtarget.hasQPX() ? &PPC::QSRCRegClass : &PPC::VRRCRegClass;
break;		break;
case MVT::v2f64:		case MVT::v2f64:
case MVT::v2i64:		case MVT::v2i64:
RC = &PPC::VRRCRegClass;		RC = &PPC::VRRCRegClass;
break;		break;
case MVT::v4f64:		case MVT::v4f64:
RC = &PPC::QFRCRegClass;		RC = &PPC::QFRCRegClass;
break;		break;
case MVT::v4i1:		case MVT::v4i1:
RC = &PPC::QBRCRegClass;		RC = &PPC::QBRCRegClass;
break;		break;
}		}

// Transform the arguments stored in physical registers into virtual ones.		SDValue ArgValue;
		// Transform the arguments stored in physical registers into
		// virtual ones.
		if (VA.getLocVT() == MVT::f64 && Subtarget.hasSPE()) {
		assert(i + 1 < e && "No second half of double precision argument");
		unsigned RegLo = MF.addLiveIn(VA.getLocReg(), RC);
		unsigned RegHi = MF.addLiveIn(ArgLocs[++i].getLocReg(), RC);
		SDValue ArgValueLo = DAG.getCopyFromReg(Chain, dl, RegLo, MVT::i32);
		SDValue ArgValueHi = DAG.getCopyFromReg(Chain, dl, RegHi, MVT::i32);
		if (!Subtarget.isLittleEndian())
		std::swap (ArgValueLo, ArgValueHi);
		ArgValue = DAG.getNode(PPCISD::BUILD_SPE64, dl, MVT::f64, ArgValueLo,
		ArgValueHi);
		} else {
unsigned Reg = MF.addLiveIn(VA.getLocReg(), RC);		unsigned Reg = MF.addLiveIn(VA.getLocReg(), RC);
SDValue ArgValue = DAG.getCopyFromReg(Chain, dl, Reg,		ArgValue = DAG.getCopyFromReg(Chain, dl, Reg,
ValVT == MVT::i1 ? MVT::i32 : ValVT);		ValVT == MVT::i1 ? MVT::i32 : ValVT);

if (ValVT == MVT::i1)		if (ValVT == MVT::i1)
ArgValue = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, ArgValue);		ArgValue = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, ArgValue);
		}

InVals.push_back(ArgValue);		InVals.push_back(ArgValue);
} else {		} else {
// Argument stored in memory.		// Argument stored in memory.
assert(VA.isMemLoc());		assert(VA.isMemLoc());

// Get the extended size of the argument type in stack		// Get the extended size of the argument type in stack
unsigned ArgSize = VA.getLocVT().getStoreSize();		unsigned ArgSize = VA.getLocVT().getStoreSize();
[1,628 lines hidden]	CCRetInfo.AnalyzeCallResult(
? RetCC_PPC_Cold		? RetCC_PPC_Cold
: RetCC_PPC);		: RetCC_PPC);

// Copy all of the result registers out of their specified physreg.		// Copy all of the result registers out of their specified physreg.
for (unsigned i = 0, e = RVLocs.size(); i != e; ++i) {		for (unsigned i = 0, e = RVLocs.size(); i != e; ++i) {
CCValAssign &VA = RVLocs[i];		CCValAssign &VA = RVLocs[i];
assert(VA.isRegLoc() && "Can only return in registers!");		assert(VA.isRegLoc() && "Can only return in registers!");

SDValue Val = DAG.getCopyFromReg(Chain, dl,		SDValue Val;

		if (Subtarget.hasSPE() && VA.getLocVT() == MVT::f64) {
		SDValue Lo = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), MVT::i32,
		InFlag);
		Chain = Lo.getValue(1);
		InFlag = Lo.getValue(2);
		VA = RVLocs[++i]; // skip ahead to next loc
		SDValue Hi = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), MVT::i32,
		InFlag);
		Chain = Hi.getValue(1);
		InFlag = Hi.getValue(2);
		if (!Subtarget.isLittleEndian())
		std::swap (Lo, Hi);
		Val = DAG.getNode(PPCISD::BUILD_SPE64, dl, MVT::f64, Lo, Hi);
		} else {
		Val = DAG.getCopyFromReg(Chain, dl,
VA.getLocReg(), VA.getLocVT(), InFlag);		VA.getLocReg(), VA.getLocVT(), InFlag);
Chain = Val.getValue(1);		Chain = Val.getValue(1);
InFlag = Val.getValue(2);		InFlag = Val.getValue(2);
		}

switch (VA.getLocInfo()) {		switch (VA.getLocInfo()) {
default: llvm_unreachable("Unknown loc info!");		default: llvm_unreachable("Unknown loc info!");
case CCValAssign::Full: break;		case CCValAssign::Full: break;
case CCValAssign::AExt:		case CCValAssign::AExt:
Val = DAG.getNode(ISD::TRUNCATE, dl, VA.getValVT(), Val);		Val = DAG.getNode(ISD::TRUNCATE, dl, VA.getValVT(), Val);
break;		break;
case CCValAssign::ZExt:		case CCValAssign::ZExt:
[304 lines hidden]	#endif
SDValue StackPtr = DAG.getRegister(PPC::R1, MVT::i32);		SDValue StackPtr = DAG.getRegister(PPC::R1, MVT::i32);

SmallVector<std::pair<unsigned, SDValue>, 8> RegsToPass;		SmallVector<std::pair<unsigned, SDValue>, 8> RegsToPass;
SmallVector<TailCallArgumentInfo, 8> TailCallArguments;		SmallVector<TailCallArgumentInfo, 8> TailCallArguments;
SmallVector<SDValue, 8> MemOpChains;		SmallVector<SDValue, 8> MemOpChains;

bool seenFloatArg = false;		bool seenFloatArg = false;
// Walk the register/memloc assignments, inserting copies/loads.		// Walk the register/memloc assignments, inserting copies/loads.
for (unsigned i = 0, j = 0, e = ArgLocs.size();		// i - Tracks the index into the list of registers allocated for the call
		// RealArgIdx - Tracks the index into the list of actual function arguments
		// j - Tracks the index into the list of byval arguments
		for (unsigned i = 0, RealArgIdx = 0, j = 0, e = ArgLocs.size();
i != e;		i != e;
++i) {		++i, ++RealArgIdx) {
CCValAssign &VA = ArgLocs[i];		CCValAssign &VA = ArgLocs[i];
SDValue Arg = OutVals[i];		SDValue Arg = OutVals[RealArgIdx];
ISD::ArgFlagsTy Flags = Outs[i].Flags;		ISD::ArgFlagsTy Flags = Outs[RealArgIdx].Flags;

if (Flags.isByVal()) {		if (Flags.isByVal()) {
// Argument is an aggregate which is passed by value, thus we need to		// Argument is an aggregate which is passed by value, thus we need to
// create a copy of it in the local variable space of the current stack		// create a copy of it in the local variable space of the current stack
// frame (which is the stack frame of the caller) and pass the address of		// frame (which is the stack frame of the caller) and pass the address of
// this copy to the callee.		// this copy to the callee.
assert((j < ByValArgLocs.size()) && "Index out of bounds!");		assert((j < ByValArgLocs.size()) && "Index out of bounds!");
CCValAssign &ByValVA = ByValArgLocs[j++];		CCValAssign &ByValVA = ByValArgLocs[j++];
[32 lines hidden]	for (unsigned i = 0, RealArgIdx = 0, j = 0, e = ArgLocs.size();
// Extend i1 and ensure callee will get i32.		// Extend i1 and ensure callee will get i32.
if (Arg.getValueType() == MVT::i1)		if (Arg.getValueType() == MVT::i1)
Arg = DAG.getNode(Flags.isSExt() ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND,		Arg = DAG.getNode(Flags.isSExt() ? ISD::SIGN_EXTEND : ISD::ZERO_EXTEND,
dl, MVT::i32, Arg);		dl, MVT::i32, Arg);

if (VA.isRegLoc()) {		if (VA.isRegLoc()) {
seenFloatArg |= VA.getLocVT().isFloatingPoint();		seenFloatArg |= VA.getLocVT().isFloatingPoint();
// Put argument in a physical register.		// Put argument in a physical register.
		if (Subtarget.hasSPE() && Arg.getValueType() == MVT::f64) {
		bool IsLE = Subtarget.isLittleEndian();
		SDValue SVal = DAG.getNode(PPCISD::EXTRACT_SPE, dl, MVT::i32, Arg,
		DAG.getIntPtrConstant(IsLE ? 0 : 1, dl));
		RegsToPass.push_back(std::make_pair(VA.getLocReg(), SVal.getValue(0)));
		SVal = DAG.getNode(PPCISD::EXTRACT_SPE, dl, MVT::i32, Arg,
		DAG.getIntPtrConstant(IsLE ? 1 : 0, dl));
		RegsToPass.push_back(std::make_pair(ArgLocs[++i].getLocReg(),
		SVal.getValue(0)));
		} else
RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));		RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));
} else {		} else {
// Put argument in the parameter list area of the current stack frame.		// Put argument in the parameter list area of the current stack frame.
assert(VA.isMemLoc());		assert(VA.isMemLoc());
unsigned LocMemOffset = VA.getLocMemOffset();		unsigned LocMemOffset = VA.getLocMemOffset();

if (!isTailCall) {		if (!isTailCall) {
SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset, dl);		SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset, dl);
PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(MF.getDataLayout()),		PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(MF.getDataLayout()),
[1,251 lines hidden]	CCInfo.AnalyzeReturn(Outs,
(Subtarget.isSVR4ABI() && CallConv == CallingConv::Cold)		(Subtarget.isSVR4ABI() && CallConv == CallingConv::Cold)
? RetCC_PPC_Cold		? RetCC_PPC_Cold
: RetCC_PPC);		: RetCC_PPC);

SDValue Flag;		SDValue Flag;
SmallVector<SDValue, 4> RetOps(1, Chain);		SmallVector<SDValue, 4> RetOps(1, Chain);

// Copy the result values into the output registers.		// Copy the result values into the output registers.
for (unsigned i = 0; i != RVLocs.size(); ++i) {		for (unsigned i = 0, RealResIdx = 0; i != RVLocs.size(); ++i, ++RealResIdx) {
CCValAssign &VA = RVLocs[i];		CCValAssign &VA = RVLocs[i];
assert(VA.isRegLoc() && "Can only return in registers!");		assert(VA.isRegLoc() && "Can only return in registers!");

SDValue Arg = OutVals[i];		SDValue Arg = OutVals[RealResIdx];

switch (VA.getLocInfo()) {		switch (VA.getLocInfo()) {
default: llvm_unreachable("Unknown loc info!");		default: llvm_unreachable("Unknown loc info!");
case CCValAssign::Full: break;		case CCValAssign::Full: break;
case CCValAssign::AExt:		case CCValAssign::AExt:
Arg = DAG.getNode(ISD::ANY_EXTEND, dl, VA.getLocVT(), Arg);		Arg = DAG.getNode(ISD::ANY_EXTEND, dl, VA.getLocVT(), Arg);
break;		break;
case CCValAssign::ZExt:		case CCValAssign::ZExt:
Arg = DAG.getNode(ISD::ZERO_EXTEND, dl, VA.getLocVT(), Arg);		Arg = DAG.getNode(ISD::ZERO_EXTEND, dl, VA.getLocVT(), Arg);
break;		break;
case CCValAssign::SExt:		case CCValAssign::SExt:
Arg = DAG.getNode(ISD::SIGN_EXTEND, dl, VA.getLocVT(), Arg);		Arg = DAG.getNode(ISD::SIGN_EXTEND, dl, VA.getLocVT(), Arg);
break;		break;
}		}
		if (Subtarget.hasSPE() && VA.getLocVT() == MVT::f64) {
		bool isLittleEndian = Subtarget.isLittleEndian();
		// Legalize ret f64 -> ret 2 x i32.
		SDValue SVal =
		DAG.getNode(PPCISD::EXTRACT_SPE, dl, MVT::i32, Arg,
		DAG.getIntPtrConstant(isLittleEndian ? 0 : 1, dl));
		Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), SVal, Flag);
		RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
		SVal = DAG.getNode(PPCISD::EXTRACT_SPE, dl, MVT::i32, Arg,
		DAG.getIntPtrConstant(isLittleEndian ? 1 : 0, dl));
		Flag = Chain.getValue(1);
		VA = RVLocs[++i]; // skip ahead to next loc
		Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), SVal, Flag);
		} else
Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), Arg, Flag);		Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), Arg, Flag);
Flag = Chain.getValue(1);		Flag = Chain.getValue(1);
RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));		RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
}		}

const PPCRegisterInfo *TRI = Subtarget.getRegisterInfo();		const PPCRegisterInfo *TRI = Subtarget.getRegisterInfo();
const MCPhysReg *I =		const MCPhysReg *I =
TRI->getCalleeSavedRegsViaCopy(&DAG.getMachineFunction());		TRI->getCalleeSavedRegsViaCopy(&DAG.getMachineFunction());
if (I) {		if (I) {
[8,426 lines hidden]
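
The call-lowering loop above walks two indices at once: `i` over the assigned locations and `RealArgIdx` over the actual arguments, with an extra `++i` whenever an f64 occupies a register pair. The following is a minimal standalone sketch of that bookkeeping, not LLVM code. `ArgKind` and `realArgIndexPerLoc` are hypothetical names, and the model assumes every argument is passed in registers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the argument's value type.
enum class ArgKind { I32, F64 };

// For each assigned location, report which real argument it belongs to.
// Assumption: all arguments live in registers, and an f64 takes two
// consecutive i32 locations (as on SPE).
std::vector<std::size_t> realArgIndexPerLoc(const std::vector<ArgKind> &args) {
  std::size_t numLocs = 0;
  for (ArgKind k : args)
    numLocs += (k == ArgKind::F64) ? 2 : 1;

  std::vector<std::size_t> owner(numLocs);
  // i tracks locations, realArgIdx tracks actual arguments.
  for (std::size_t i = 0, realArgIdx = 0; i != numLocs; ++i, ++realArgIdx) {
    assert(realArgIdx < args.size());
    owner[i] = realArgIdx;
    if (args[realArgIdx] == ArgKind::F64)
      owner[++i] = realArgIdx; // second half of the pair; skip its location
  }
  return owner;
}
```

With arguments (i32, f64, i32) the four locations map to arguments 0, 1, 1, 2, which is why indexing `OutVals` by `i` would read the wrong argument once a double has been split.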

llvm/trunk/lib/Target/PowerPC/PPCInstrInfo.td

	[224 lines hidden]

	// Move 2 i64 values into a VSX register			// Move 2 i64 values into a VSX register
	def PPCbuild_fp128: SDNode<"PPCISD::BUILD_FP128",			def PPCbuild_fp128: SDNode<"PPCISD::BUILD_FP128",
	SDTypeProfile<1, 2,			SDTypeProfile<1, 2,
	[SDTCisFP<0>, SDTCisSameSizeAs<1,2>,			[SDTCisFP<0>, SDTCisSameSizeAs<1,2>,
	SDTCisSameAs<1,2>]>,			SDTCisSameAs<1,2>]>,
	[]>;			[]>;

				def PPCbuild_spe64: SDNode<"PPCISD::BUILD_SPE64",
				SDTypeProfile<1, 2,
				[SDTCisVT<0, f64>, SDTCisVT<1,i32>,
				SDTCisVT<1,i32>]>,
				[]>;

				def PPCextract_spe : SDNode<"PPCISD::EXTRACT_SPE",
				SDTypeProfile<1, 2,
				[SDTCisVT<0, i32>, SDTCisVT<1, f64>,
				SDTCisPtrTy<2>]>,
				[]>;

	// These are target-independent nodes, but have target-specific formats.			// These are target-independent nodes, but have target-specific formats.
	def callseq_start : SDNode<"ISD::CALLSEQ_START", SDT_PPCCallSeqStart,			def callseq_start : SDNode<"ISD::CALLSEQ_START", SDT_PPCCallSeqStart,
	[SDNPHasChain, SDNPOutGlue]>;			[SDNPHasChain, SDNPOutGlue]>;
	def callseq_end : SDNode<"ISD::CALLSEQ_END", SDT_PPCCallSeqEnd,			def callseq_end : SDNode<"ISD::CALLSEQ_END", SDT_PPCCallSeqEnd,
	[SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;			[SDNPHasChain, SDNPOptInGlue, SDNPOutGlue]>;

	def SDT_PPCCall : SDTypeProfile<0, -1, [SDTCisInt<0>]>;			def SDT_PPCCall : SDTypeProfile<0, -1, [SDTCisInt<0>]>;
	def PPCcall : SDNode<"PPCISD::CALL", SDT_PPCCall,			def PPCcall : SDNode<"PPCISD::CALL", SDT_PPCCall,
	[4,768 lines hidden]

llvm/trunk/lib/Target/PowerPC/PPCInstrSPE.td

[505 lines hidden]	def EVLWHSPLATX : EVXForm_1<796, (outs sperc:$RT), (ins memrr:$src),
"evlwhsplatx $RT, $src", IIC_LdStLoad, []>;		"evlwhsplatx $RT, $src", IIC_LdStLoad, []>;
def EVLWWSPLAT : EVXForm_D<793, (outs sperc:$RT), (ins spe4dis:$dst),		def EVLWWSPLAT : EVXForm_D<793, (outs sperc:$RT), (ins spe4dis:$dst),
"evlwwsplat $RT, $dst", IIC_LdStLoad, []>;		"evlwwsplat $RT, $dst", IIC_LdStLoad, []>;
def EVLWWSPLATX : EVXForm_1<792, (outs sperc:$RT), (ins memrr:$src),		def EVLWWSPLATX : EVXForm_1<792, (outs sperc:$RT), (ins memrr:$src),
"evlwwsplatx $RT, $src", IIC_LdStLoad, []>;		"evlwwsplatx $RT, $src", IIC_LdStLoad, []>;

def EVMERGEHI : EVXForm_1<556, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),		def EVMERGEHI : EVXForm_1<556, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),
"evmergehi $RT, $RA, $RB", IIC_VecGeneral, []>;		"evmergehi $RT, $RA, $RB", IIC_VecGeneral, []>;
def EVMERGELO : EVXForm_1<557, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),		def EVMERGELO : EVXForm_1<557, (outs sperc:$RT), (ins gprc:$RA, gprc:$RB),
"evmergelo $RT, $RA, $RB", IIC_VecGeneral, []>;		"evmergelo $RT, $RA, $RB", IIC_VecGeneral, []>;
def EVMERGEHILO : EVXForm_1<558, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),		def EVMERGEHILO : EVXForm_1<558, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),
"evmergehilo $RT, $RA, $RB", IIC_VecGeneral, []>;		"evmergehilo $RT, $RA, $RB", IIC_VecGeneral, []>;
def EVMERGELOHI : EVXForm_1<559, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),		def EVMERGELOHI : EVXForm_1<559, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),
"evmergelohi $RT, $RA, $RB", IIC_VecGeneral, []>;		"evmergelohi $RT, $RA, $RB", IIC_VecGeneral, []>;

def EVMHEGSMFAA : EVXForm_1<1323, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),		def EVMHEGSMFAA : EVXForm_1<1323, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),
"evmhegsmfaa $RT, $RA, $RB", IIC_VecComplex, []>;		"evmhegsmfaa $RT, $RA, $RB", IIC_VecComplex, []>;
[358 lines hidden]
def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETUGE)),		def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETUGE)),
(SELECT_SPE (CRORC $lhs, $rhs), $tval, $fval)>;		(SELECT_SPE (CRORC $lhs, $rhs), $tval, $fval)>;
def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETGT)),		def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETGT)),
(SELECT_SPE (CRANDC $rhs, $lhs), $tval, $fval)>;		(SELECT_SPE (CRANDC $rhs, $lhs), $tval, $fval)>;
def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETUGT)),		def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETUGT)),
(SELECT_SPE (CRANDC $lhs, $rhs), $tval, $fval)>;		(SELECT_SPE (CRANDC $lhs, $rhs), $tval, $fval)>;
def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETNE)),		def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETNE)),
(SELECT_SPE (CRXOR $lhs, $rhs), $tval, $fval)>;		(SELECT_SPE (CRXOR $lhs, $rhs), $tval, $fval)>;


		def : Pat<(f64 (PPCbuild_spe64 i32:$rB, i32:$rA)),
		(f64 (COPY_TO_REGCLASS (EVMERGELO $rA, $rB), SPERC))>;

		def : Pat<(i32 (PPCextract_spe f64:$rA, 1)),
		(i32 (EXTRACT_SUBREG (EVMERGEHI $rA, $rA), sub_32))>;
		def : Pat<(i32 (PPCextract_spe f64:$rA, 0)),
		(i32 (EXTRACT_SUBREG $rA, sub_32))>;

}		}
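
To make the three patterns above easier to check by hand, here is a small standalone C++ model of the word shuffling they rely on. It is a sketch, not target code: `Spe64`, `buildSpe64`, `extractSpe` and `splitAndRebuild` are hypothetical names, and a 64-bit SPE register is modelled as a plain `uint64_t` whose upper half is the high word.

```cpp
#include <cassert>
#include <cstdint>

// Model of a 64-bit SPE register: high word in bits 63..32, low word in 31..0.
using Spe64 = std::uint64_t;

constexpr std::uint32_t lowWord(Spe64 r) { return static_cast<std::uint32_t>(r); }
constexpr std::uint32_t highWord(Spe64 r) { return static_cast<std::uint32_t>(r >> 32); }
constexpr Spe64 pack(std::uint32_t hi, std::uint32_t lo) {
  return (static_cast<Spe64>(hi) << 32) | lo;
}

// evmergelo rD, rA, rB: high(rD) = low(rA), low(rD) = low(rB).
constexpr Spe64 evmergelo(Spe64 a, Spe64 b) { return pack(lowWord(a), lowWord(b)); }
// evmergehi rD, rA, rB: high(rD) = high(rA), low(rD) = high(rB).
constexpr Spe64 evmergehi(Spe64 a, Spe64 b) { return pack(highWord(a), highWord(b)); }

// PPCbuild_spe64(lo, hi) is selected as EVMERGELO hi, lo.
constexpr Spe64 buildSpe64(std::uint32_t lo, std::uint32_t hi) {
  return evmergelo(hi, lo);
}

// PPCextract_spe(v, 1) reads the low half of EVMERGEHI v, v (the high word);
// PPCextract_spe(v, 0) reads the low half of v directly.
constexpr std::uint32_t extractSpe(Spe64 v, unsigned idx) {
  return idx == 1 ? lowWord(evmergehi(v, v)) : lowWord(v);
}

// The sequence from the summary: split r5 into r3/r4, then re-form it.
inline Spe64 splitAndRebuild(Spe64 r5) {
  std::uint32_t r4 = extractSpe(r5, 0); // mr 4, 5
  std::uint32_t r3 = extractSpe(r5, 1); // evmergehi 3, 5, 5
  Spe64 rebuilt = buildSpe64(r4, r3);   // evmergelo 5, 3, 4
  assert(rebuilt == r5);
  return rebuilt;
}
```

The round trip in `splitAndRebuild` mirrors the big-endian case, where the first register of the pair carries the high word.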

llvm/trunk/test/CodeGen/PowerPC/spe.ll

	[466 lines hidden]

	define double @test_dselect(double %a, double %b, i1 %c) {			define double @test_dselect(double %a, double %b, i1 %c) {
	entry:			entry:
	%r = select i1 %c, double %a, double %b			%r = select i1 %c, double %a, double %b
	ret double %r			ret double %r
	; CHECK-LABEL: test_dselect			; CHECK-LABEL: test_dselect
	; CHECK: andi.			; CHECK: andi.
	; CHECK: bc			; CHECK: bc
	; CHECK: evldd			; CHECK: evor
	; CHECK: b			; CHECK: evmergehi
	; CHECK: evldd
	; CHECK: evstdd
	; CHECK: blr			; CHECK: blr
	}			}

	define i32 @test_dtoui(double %a) {			define i32 @test_dtoui(double %a) {
	entry:			entry:
	%v = fptoui double %a to i32			%v = fptoui double %a to i32
	ret i32 %v			ret i32 %v
	; CHECK-LABEL: test_dtoui			; CHECK-LABEL: test_dtoui
	[27 lines hidden]
	define i32 @test_dasmconst(double %x) {			define i32 @test_dasmconst(double %x) {
	entry:			entry:
	%x.addr = alloca double, align 8			%x.addr = alloca double, align 8
	store double %x, double* %x.addr, align 8			store double %x, double* %x.addr, align 8
	%0 = load double, double* %x.addr, align 8			%0 = load double, double* %x.addr, align 8
	%1 = call i32 asm sideeffect "efdctsi $0, $1", "=d,d"(double %0)			%1 = call i32 asm sideeffect "efdctsi $0, $1", "=d,d"(double %0)
	ret i32 %1			ret i32 %1
	; CHECK-LABEL: test_dasmconst			; CHECK-LABEL: test_dasmconst
	; CHECK: evldd			; CHECK: evmergelo
	; CHECK: #APP			; CHECK: #APP
	; CHECK: efdctsi			; CHECK: efdctsi
	; CHECK: #NO_APP			; CHECK: #NO_APP
	}			}

	define double @test_spill(double %a) nounwind {			define double @test_spill(double %a) nounwind {
	entry:			entry:
	%0 = fadd double %a, %a			%0 = fadd double %a, %a
	[12 lines hidden]