This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contents

lib/Target/PowerPC/
    PPCCallingConv.td
    PPCISelLowering.h
    PPCISelLowering.cpp
    PPCInstrInfo.td
    PPCInstrSPE.td
test/CodeGen/PowerPC/
    spe.ll

Differential D54583

PowerPC: Optimize SPE double parameter calling setup
ClosedPublic

Authored by jhibbits on Nov 15 2018, 8:38 AM.

Details

Reviewers

nemanjai
hfinkel
joerg

Commits

rG1d1cf30b738b: PowerPC: Optimize SPE double parameter calling setup
rL363526: PowerPC: Optimize SPE double parameter calling setup

Summary

SPE passes doubles the same way as soft-float, in register pairs as i32
types. This is all handled by the target-independent layer. However,
this is not optimal when splitting or reforming the doubles, because the
value is stored to the stack and reloaded from it on either side.

For instance, to pass a double argument to a function, assuming the
double value is in r5, the sequence currently looks like this:

evstdd      5, X(1)
lwz         3, X(1)
lwz         4, X+4(1)

Likewise, to form a double into r5 from args in r3 and r4:

stw         3, X(1)
stw         4, X+4(1)
evldd       5, X(1)

This optimizes the fence to use SPE instructions. Now, to pass a double
to a function:

mr          4, 5
evmergehi   3, 5, 5

And to form a double into r5 from args in r3 and r4:

evmergelo   5, 3, 4

This is comparable to the way that gcc generates the double splits.

This also fixes expanding of builtins to libcalls, where the LowerCallTo() code path was generating intermediate illegal type nodes.
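
As a rough illustration of the data movement described in the summary, the following C++ sketch models a 64-bit SPE register as a `uint64_t` and mimics the two merge instructions. The function names, the `WordPair` struct and the register model are assumptions made for this example, not LLVM or hardware APIs; only the word selection performed by `evmergehi` and `evmergelo` is intended to match the instructions.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Illustrative model: a 64-bit SPE GPR held in a uint64_t.
// evmergehi rD, rA, rB: high word of rA -> high word of rD,
//                       high word of rB -> low word of rD.
inline uint64_t evmergehi(uint64_t rA, uint64_t rB) {
  return (rA & 0xFFFFFFFF00000000ULL) | (rB >> 32);
}

// evmergelo rD, rA, rB: low word of rA -> high word of rD,
//                       low word of rB -> low word of rD.
inline uint64_t evmergelo(uint64_t rA, uint64_t rB) {
  return (rA << 32) | (rB & 0xFFFFFFFFULL);
}

// Hypothetical holder for the two 32-bit argument registers.
struct WordPair {
  uint32_t r3; // high word of the double
  uint32_t r4; // low word of the double
};

// Models "mr 4, 5; evmergehi 3, 5, 5": split the double held in r5.
inline WordPair splitDouble(double d) {
  uint64_t r5;
  std::memcpy(&r5, &d, sizeof r5);
  WordPair p;
  p.r4 = static_cast<uint32_t>(r5);                // mr keeps the low word
  p.r3 = static_cast<uint32_t>(evmergehi(r5, r5)); // high word lands in the low half
  return p;
}

// Models "evmergelo 5, 3, 4": rebuild the double from r3 (high) and r4 (low).
inline double formDouble(uint32_t r3, uint32_t r4) {
  uint64_t r5 = evmergelo(r3, r4);
  double d;
  std::memcpy(&d, &r5, sizeof d);
  return d;
}
```

Splitting a value and forming it again should give back the original double, which is the property the stack-based and the merge-based sequences both have to preserve.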

Event Timeline

jhibbits created this revision. Nov 15 2018, 8:38 AM

Herald added subscribers: llvm-commits, jsji, jfb, kbarton. Nov 15 2018, 8:38 AM

Harbormaster completed remote builds in B25053: Diff 174220. Nov 15 2018, 8:38 AM

glaubitz added a subscriber: glaubitz. Dec 4 2018, 7:31 AM

I have applied this patch to the llvm-toolchain-7 package in Debian and did not see any regressions on x86_64 or 32-bit PowerPC. Additionally, I have included the patches from https://reviews.llvm.org/D49754 and https://reviews.llvm.org/D54409 and saw no regressions on x86_64 and 32-bit PowerPC.

All three patches will be part of the next upload of the llvm-toolchain-7 package in Debian unstable which will be version 1:7.0.1~+rc2-9.

vit9696 added a subscriber: vit9696. Dec 21 2018, 1:44 PM

nemanjai added inline comments. Dec 29 2018, 1:33 PM

lib/Target/PowerPC/PPCISelLowering.cpp
393	No need for braces when there is a single statement in the if/else.
7833	The early exit should be first.
7837	The indentation is off. Maybe run `clang-format` on this function. I don't know which editor you use but if you use `Vim`, you can run `:7827,7845 ! clang-format` if you have `clang-format` in your `$PATH`. Also, it seems like you might want to check the input type as well - maybe with an assert if it can't be anything other than `f64`.
7840	A more descriptive assert message is probably in order. Also, perhaps it would be clearer if this was rewritten to: (1) assert that the constant operand value is less than 2; (2) use a ternary operator to select the opcode (or just have one opcode - see above).
7842	Line too long?
lib/Target/PowerPC/PPCISelLowering.h
203	Why not just have `EXTRACT_SPE_OP` and have it take a constant operand that determines Hi/Lo? Also, for both build and extract, it would be good to add a comment that these correspond almost exactly to `BUILD_PAIR` and `EXTRACT_VECTOR_ELT` nodes except that the input types are floating point since `i64` isn't a legal type for the target.
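
The two suggestions above (assert that the index is below 2, then pick the opcode with a ternary) could look roughly like the following sketch. The enum, the function name and the convention that index 0 means the high word are all hypothetical and chosen only for illustration; the patch itself may encode the operand differently.

```cpp
#include <cassert>

// Hypothetical opcode names, used only for this illustration.
enum class ExtractOpcode { High, Low };

// Assumption for the sketch: index 0 selects the high word, index 1 the low word.
inline ExtractOpcode selectExtractOpcode(unsigned WordIdx) {
  assert(WordIdx < 2 && "SPE extract index must be 0 (high) or 1 (low)");
  return WordIdx == 0 ? ExtractOpcode::High : ExtractOpcode::Low;
}
```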

jhibbits marked 6 inline comments as done. Dec 29 2018, 8:24 PM

jhibbits added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
7840	This assert was added during some debugging, and I forgot to remove it. However, it does make sense to have an assert along the lines of your suggestion. I'll make such a change.
7842	Yeah, just a hair (one character). Reformatting with clang-format will fix that.
lib/Target/PowerPC/PPCISelLowering.h
203	I'm not sure how to pass a constant through to the tablegen'd layer. These two pseudo-ops are just light wrappers to EVMERGEHI and MR. If there is a way to pass a constant and do the switch down in that layer, then that's acceptable as well.

nemanjai added inline comments. Dec 30 2018, 5:12 AM

lib/Target/PowerPC/PPCISelLowering.h
203	Sure, there are existing examples. Vector conversion custom nodes are probably quite similar to what you need: PPCISD::SINT_VEC_TO_FP and PPCISD::UINT_VEC_TO_FP. But there will be others.

Fix expanding builtins to libcalls. Remove the need for intermediate illegal types in expanding and pairing the arguments and return values.

Unfortunately arcanist on my machine seems to be broken working with this repository, so I had to upload a diff manually.

Herald added subscribers: dexonsmith, mehdi_amini. Jan 14 2019, 8:37 PM

kthomsen added a subscriber: kthomsen. Jan 16 2019, 1:46 AM

jhibbits mentioned this in D49754: Add -m(no-)spe, and e500 CPU definitions and support to clang. Jan 17 2019, 12:54 PM

Fix argument indices for indexing through OutVals[]. It should be the argument index, not the physical register index.

Hi Justin, I'm watching your work and used your patches to bring SPE into my Clang for OS-9.
The OutVals[] issue is what I found yesterday as well while debugging the Clang part.
There is a second location with this issue, in PPCTargetLowering::LowerReturn():

 // Copy the result values into the output registers.
for (unsigned i = 0, realI = 0; i != RVLocs.size(); ++i, ++realI) {
  CCValAssign &VA = RVLocs[i];
  assert(VA.isRegLoc() && "Can only return in registers!");

  SDValue Arg = OutVals[realI];

Regards, Kei
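
To make the distinction between the argument index and the register (location) index concrete, here is a small self-contained sketch. `ArgInfo` and `locToArgIndex` are hypothetical names invented for this example and do not exist in LLVM; the sketch only assumes that an f64 split for SPE occupies two register locations while every other argument occupies one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified argument description: the only property that
// matters here is whether the argument is an f64 split across two registers.
struct ArgInfo {
  bool IsSplitF64;
};

// For each assigned register location, return the index of the argument it
// belongs to. A split f64 contributes two locations with the same index.
inline std::vector<std::size_t> locToArgIndex(const std::vector<ArgInfo> &Args) {
  std::vector<std::size_t> Map;
  for (std::size_t ArgIdx = 0; ArgIdx != Args.size(); ++ArgIdx) {
    Map.push_back(ArgIdx); // first (or only) register of this argument
    if (Args[ArgIdx].IsSplitF64)
      Map.push_back(ArgIdx); // second register of the pair
  }
  return Map;
}
```

With the arguments (i32, f64, i32) there are four locations but only three values, so indexing the value array by the location index would run one element ahead after the f64.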

Hi Kei, thanks for that. I've updated my code, and will post an updated diff tomorrow.

One more argument index fixup.

I have a question:
When compiling

double a;
void func(double x) {
a = x;
}

It is generating

lis 5, a@ha
evstdd 3, a@l(5)

But since evstdd and evldd only have an 8-bit (really 5-bit) UIMM offset, this code does not work, because the offset a@l is not known to fit in 8 bits.
Has this issue already been addressed in a patch that I simply haven't seen, or is this part still missing?

I would expect

lis 5, a@ha
li 4, a@l
evstddx 3, 4, 5

It seems that this is checked/generated by SelectAddressRegReg() and/or SelectAddressRegImm() in PPCISelLowering.cpp.
At the moment I'm trying to find out whether I have enough information in the SDValue N to identify this as an SPE load/store.
Did I miss a patch for this?
Thanks, Kei
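
The offset restriction discussed above can be written down as a small predicate. This is only a sketch: `fitsEvlddOffset` is a hypothetical helper, and it assumes the usual encoding of evldd/evstdd, a 5-bit unsigned immediate scaled by 8, so the displacement must be a multiple of 8 between 0 and 248.

```cpp
#include <cassert>
#include <cstdint>

// Assumed encoding: 5-bit unsigned immediate, scaled by 8 (0, 8, ..., 248).
constexpr bool fitsEvlddOffset(int64_t Offset) {
  return Offset >= 0 && Offset <= 31 * 8 && Offset % 8 == 0;
}
```

A symbol offset such as a@l can be any 16-bit value, so in general it fails this check and the indexed forms (evlddx / evstddx) are needed instead.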

Hi Kei, yes you need the patch in D54409, which fixes the offset handling. It should fix your case as well, but I didn't test foreign addresses.

Patch D54409 only handles variables on the stack, named "framedata" in the code. I'm going on to find out how to manage this for global variables. SelectAddressRegReg() and SelectAddressRegImm() do this, but they have no information about the target data. Maybe it needs to be decided somewhere else.

Hmm, I have not yet tried to explore this, but I get a feeling a regression appeared somewhere during the patch iterations. Either this or D54409.
At this point I am consistently getting weird generated instructions for __floatundidf from compiler-rt (compiling with freebsd & -O3), yet I have a correct one in my files.
What makes it strange is that the logs show that the correct and first incorrect examples were generated by the same compiler (binary file) with the same flags, yet I no longer can get the correct one.

Does it reproduce for anyone?

Supposedly correct one:

.set back_chain, -0x20
.set var_10, -0x10
.set var_8, -8
94 21 FF E0                       stwu      r1, back_chain(r1)
3C A0 45 30                       lis       r5, 0x4530
90 61 00 1C                       stw       r3, 0x20+var_8+4(r1)
90 A1 00 18                       stw       r5, 0x20+var_8(r1)
3C A0 43 30                       lis       r5, 0x4330
10 61 1B 01                       evldd     r3, 0x20+var_8(r1)
90 81 00 14                       stw       r4, 0x20+var_10+4(r1)
3C 80 80 02                       lis       r4, -0x7FFE
10 84 DB 01                       evldd     r4, 0xD8(r4) ; note this one
90 A1 00 10                       stw       r5, 0x20+var_10(r1)
10 A1 13 01                       evldd     r5, 0x20+var_10(r1)
10 63 22 E0                       efdadd    r3, r3, r4
10 83 2A E0                       efdadd    r4, r3, r5
10 64 22 2C                       evmergehi r3, r4, r4
38 21 00 20                       addi      r1, r1, 0x20
4E 80 00 20                       blr

What I get now:

94 21 FF E0                       stwu      r1, back_chain(r1)
3C A0 45 30                       lis       r5, 0x4530
90 61 00 1C                       stw       r3, 0x20+var_8+4(r1)
90 A1 00 18                       stw       r5, 0x20+var_8(r1)
3C A0 43 30                       lis       r5, 0x4330
10 61 1B 01                       evldd     r3, 0x20+var_8(r1)
90 81 00 14                       stw       r4, 0x20+var_10+4(r1)
3C 80 80 02                       lis       r4, -0x7FFE
10 8C BB 01                       evldd     r4, 0xB8(r12) ; note this one
90 A1 00 10                       stw       r5, 0x20+var_10(r1)
10 A1 13 01                       evldd     r5, 0x20+var_10(r1)
10 63 22 E0                       efdadd    r3, r3, r4
10 83 2A E0                       efdadd    r4, r3, r5
10 64 22 2C                       evmergehi r3, r4, r4
38 21 00 20                       addi      r1, r1, 0x20
4E 80 00 20                       blr

94 21 FF E0                       stwu      r1, back_chain(r1)
3C A0 45 30                       lis       r5, 0x4530
90 61 00 1C                       stw       r3, 0x20+var_8+4(r1)
90 A1 00 18                       stw       r5, 0x20+var_8(r1)
3C A0 43 30                       lis       r5, 0x4330
10 61 1B 01                       evldd     r3, 0x20+var_8(r1)
90 81 00 14                       stw       r4, 0x20+var_10+4(r1)
3C 80 80 01                       lis       r4, -0x7FFF
10 9F 1B 01                       evldd     r4, 0x18(r31) ; note this one
90 A1 00 10                       stw       r5, 0x20+var_10(r1)
10 A1 13 01                       evldd     r5, 0x20+var_10(r1)
10 63 22 E0                       efdadd    r3, r3, r4
10 83 2A E0                       efdadd    r4, r3, r5
10 64 22 2C                       evmergehi r3, r4, r4
38 21 00 20                       addi      r1, r1, 0x20
4E 80 00 20                       blr

Reference source:

double floatundidf(unsigned long long a)
{
    static const double twop52 = 4503599627370496.0; // 0x1.0p52
    static const double twop84 = 19342813113834066795298816.0; // 0x1.0p84
    static const double twop84_plus_twop52 = 19342813118337666422669312.0; // 0x1.00000001p84

    union { uint64_t x; double d; } high = { .d = twop84 };
    union { uint64_t x; double d; } low = { .d = twop52 };

    high.x |= a >> 32;
    low.x |= a & UINT64_C(0x00000000ffffffff);

    const double result = (high.d - twop84_plus_twop52) + low.d;
    return result;
}
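
For readers matching the listings to the reference source: the `lis r5, 0x4530` and `lis r5, 0x4330` instructions load the high halves of the bit patterns of twop84 and twop52. The sketch below checks those bit patterns and restates the routine with `memcpy` instead of a union so that it is valid C++. The names `bitsOf`, `doubleOf` and `floatundidfSketch` are made up for this example and are not compiler-rt identifiers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret a double as its IEEE-754 bit pattern and back.
inline uint64_t bitsOf(double d) {
  uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

inline double doubleOf(uint64_t u) {
  double d;
  std::memcpy(&d, &u, sizeof d);
  return d;
}

// Same arithmetic as the reference source, written without union punning.
inline double floatundidfSketch(uint64_t a) {
  const double twop52 = 4503599627370496.0;                     // 0x1.0p52
  const double twop84 = 19342813113834066795298816.0;           // 0x1.0p84
  const double twop84_plus_twop52 = 19342813118337666422669312.0;

  // OR the upper 32 bits of a into the mantissa of 2^84 (units of 2^32)
  // and the lower 32 bits into the mantissa of 2^52 (units of 1).
  const double high = doubleOf(bitsOf(twop84) | (a >> 32));
  const double low = doubleOf(bitsOf(twop52) | (a & 0xFFFFFFFFULL));

  return (high - twop84_plus_twop52) + low;
}
```

The two stack slots written with `stw` in the listings correspond to the `high` and `low` values here, and the third `evldd` (the one marked "note this one") loads the twop84_plus_twop52 constant from memory.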

As promised I have modified the SelectAddressRegReg() in PPCISelLowering.cpp to create correct evldd(x) and evstdd(x) instructions when accessing global variables.

bool PPCTargetLowering::SelectAddressRegReg(SDValue N, SDValue &Base,
                                            SDValue &Index,
                                            SelectionDAG &DAG) const {
  int16_t imm = 0;
  if (N.getOpcode() == ISD::ADD) {
    if (hasSPE()) {
      // Is there any SPE load/store (f64) which can't handle a 16-bit offset?
      for (SDNode::use_iterator UI = N->use_begin(), E = N->use_end();
           UI != E; ++UI) {
        if (UI->getOpcode() == ISD::STORE) {
          // A store has the stored value, and hence its type, in operand 1.
          if ((UI->getNumOperands() >= 2) &&
              (UI->getOperand(1).getSimpleValueType() == MVT::f64)) {
            // This is an f64 store with SPE.
            // The instruction evstdd can only handle an 8-bit offset.
            Base = N.getOperand(0);
            Index = N.getOperand(1);
            return true;
          }
        } else if (UI->getOpcode() == ISD::LOAD) {
          // A load has its type in value 0.
          if ((UI->getNumValues() >= 1) &&
              (UI->getSimpleValueType(0) == MVT::f64)) {
            // This is an f64 load with SPE.
            // The instruction evldd can only handle an 8-bit offset.
            Base = N.getOperand(0);
            Index = N.getOperand(1);
            return true;
          }
        }
      }
    }
    if (isIntS16Immediate(N.getOperand(1), imm))
      return false; // r+i
    if (N.getOperand(1).getOpcode() == PPCISD::Lo)
      return false; // r+i

....
The modification starts with if (hasSPE()) { and ends with the matching }.

What I have done: SelectAddressRegReg() is the central decider between "offset+r4" and "r4+r5" addressing, but it only knows about 16-bit offsets. evldd and evstdd use 8-bit (5 bits usable) offsets, so they can't be used when accessing global variables. The patch here now looks through the use list to see whether there is an SPE load or store of f64 data. If so, it reports that register+register addressing must be used, which then automatically changes the instruction to evlddx / evstddx.
I have tested this with OS-9 running on a P2020 (e500v2).
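
Stripped of the SelectionDAG types, the decision described above can be modelled in a few lines. This is a simplified sketch with hypothetical types (`UseKind`, `Use`) and a hypothetical function name, not the LLVM API; it only mirrors the rule "if any use is an f64 load or store and SPE is enabled, choose register+register addressing".

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the users of the address computation.
enum class UseKind { Load, Store, Other };

struct Use {
  UseKind Kind;
  bool IsF64; // true if the loaded or stored value has type f64
};

// Returns true when register+register addressing must be chosen because
// some user is an f64 load/store that would become evldd/evstdd.
inline bool mustUseRegRegForSPE(bool HasSPE, const std::vector<Use> &Uses) {
  if (!HasSPE)
    return false;
  for (const Use &U : Uses)
    if ((U.Kind == UseKind::Load || U.Kind == UseKind::Store) && U.IsF64)
      return true;
  return false;
}
```

Note that this rule is deliberately conservative: it forces the indexed form even when the offset would happen to fit the short immediate.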

Would you like to put my patch into your patch D54583?
Kei

Hi Kei, thanks! I'll gladly add it to one of the patches. It might fit better with D54409, since the purpose of that is to fix the evldd handling in general.

Hi @vit9696, it looks like Kei's patch fixes the issue you're seeing as well.

I placed the info in this thread because D54583 already touches PPCISelLowering.cpp, but in reality it fits better into D54409.

In D54583#1368816, @jhibbits wrote:

Hi @vit9696, it looks like Kei's patch fixes the issue you're seeing as well.

Yes, I am currently testing the code with the latest changes, and so far it appears to be resolved. Thanks.

Also, as we discussed on IRC, please provide full context to make it easier to review.
I am actually OK with this revision as long as the comments are addressed, but I'm just requesting another revision to ensure all the comments have been addressed.

lib/Target/PowerPC/PPCISelLowering.cpp
3164	Since this isn't local to a very small loop, I think a more descriptive name is in order. Perhaps `RegIdx`?
3165	I think it is more obvious to use `sizeof(MCPhysReg)` rather than `sizeof(HiRegList[0])`.
3196	I can't say I am overly familiar with this code, but we don't need to call `State.AllocateReg()` for the low register?
3550	I don't think it's useful to duplicate a comment down both paths in a condition.
3555	Perhaps something like assert(i + 1 < e && "No second half of double precision argument");
5520	`i, realI, j` are not adequate given this rather complex iteration space... Please name them for what they are meant to refer to.
6755	Can we be consistent about how we specify the index to the extract between here and above? I am OK with either approach as long as we use the same.
lib/Target/PowerPC/PPCInstrInfo.td
237	Seems like we don't need to separately state that the types are the same and that they're the same size?

This revision now requires changes to proceed. Feb 19 2019, 4:13 PM

Herald added a subscriber: jdoerfert. Feb 19 2019, 4:13 PM

Address feedback. Provide full context for diffs.

I found an issue with the SPE compare operations. The result of efdcmpeq, efdcmpgt and efdcmplt is always placed in the GT bit of the Condition Register. This is addressed in one place in PPCISelDAGToDAG.cpp, but not in a second code-generation path.
The diff of PPCISelDAGToDAG.cpp is:

--- PPCISelDAGToDAG.cpp 2019-03-12 15:35:45.000000000 +0100
+++ "\\ellcc\\PPCISelDAGToDAG.cpp"      2019-03-27 10:52:21.088326000 +0100
@@ -5039,6 +5038,32 @@ void PPCDAGToDAGISel::Select(SDNode *N)
       PCC |= getBranchHint(PCC, FuncInfo, N->getOperand(4));

     SDValue CondCode = SelectCC(N->getOperand(2), N->getOperand(3), CC, dl);
+
+    if (PPCSubTarget->hasSPE() && N->getOperand(2).getValueType().isFloatingPoint()) {
+      // For SPE instructions, the result is in GT bit of the CR
+      switch(CC) {
+        case ISD::SETOEQ:
+        case ISD::SETEQ:
+        case ISD::SETOLT:
+        case ISD::SETLT:
+        case ISD::SETOGT:
+        case ISD::SETGT:
+                PCC = PPC::PRED_GT;
+                break;
+        case ISD::SETUNE:
+        case ISD::SETNE:
+        case ISD::SETULE:
+        case ISD::SETLE:
+        case ISD::SETUGE:
+        case ISD::SETGE:
+                PCC = PPC::PRED_LE;
+                break;
+        default:
+                break;
+      }
+    }
+
+
     SDValue Ops[] = { getI32Imm(PCC, dl), CondCode,
                         N->getOperand(4), N->getOperand(0) };
     CurDAG->SelectNodeTo(N, PPC::BCC, MVT::Other, Ops);
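
The effect of the predicate rewrite can be modelled outside the compiler. In the sketch below every SPE compare writes its result to a single "GT" flag, and each source-level condition becomes one compare plus a branch on GT set (PRED_GT in the diff) or GT clear (PRED_LE in the diff). The enums and functions are hypothetical, the ordered/unordered distinction is ignored, and the pairing of conditions with compare instructions is an assumption made for illustration; the diff above only changes the predicate.

```cpp
#include <cassert>

enum class Cond { EQ, NE, LT, LE, GT, GE }; // simplified condition codes
enum class SpeCmp { CmpEQ, CmpLT, CmpGT };  // stand-ins for efdcmpeq/efdcmplt/efdcmpgt

struct Lowered {
  SpeCmp Cmp;         // compare to emit
  bool BranchIfGTSet; // true ~ PRED_GT, false ~ PRED_LE
};

// Assumed pairing: each negated condition reuses the compare of its opposite.
inline Lowered lowerCond(Cond C) {
  switch (C) {
  case Cond::EQ: return {SpeCmp::CmpEQ, true};
  case Cond::NE: return {SpeCmp::CmpEQ, false};
  case Cond::LT: return {SpeCmp::CmpLT, true};
  case Cond::GE: return {SpeCmp::CmpLT, false};
  case Cond::GT: return {SpeCmp::CmpGT, true};
  case Cond::LE: return {SpeCmp::CmpGT, false};
  }
  return {SpeCmp::CmpEQ, true}; // not reached for valid input
}

// Models the compare instruction: the result always lands in the GT bit.
inline bool gtBit(SpeCmp Cmp, double A, double B) {
  switch (Cmp) {
  case SpeCmp::CmpEQ: return A == B;
  case SpeCmp::CmpLT: return A < B;
  case SpeCmp::CmpGT: return A > B;
  }
  return false;
}

// The branch is taken when the GT bit matches the chosen predicate.
inline bool branchTaken(Cond C, double A, double B) {
  const Lowered L = lowerCond(C);
  return gtBit(L.Cmp, A, B) == L.BranchIfGTSet;
}
```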

The following test program checks all comparisons. With the fix, the libc++ tests now compile and run correctly (about 1000 of 5800 failed before).

#include <stdio.h>

int teq(double a, double b)
{
    printf("%lf == %lf\n",a,b);
    if (a == b)
    {
      printf("equal\n");
      return 1;
    }
    printf("!equal\n");
    return 0;
}

int tne(double a, double b)
{
    printf("%lf != %lf\n",a,b);
    if (a != b)
    {
      printf("notequal\n");
      return 1;
    }
    printf("!notequal\n");
    return 0;
}
int tgt(double a, double b)
{
    printf("%lf > %lf\n",a,b);
    if (a > b)
    {
      printf("greater than\n");
      return 1;
    }
    printf("!greater than\n");
    return 0;
}
int tge(double a, double b)
{
    printf("%lf >= %lf\n",a,b);
    if (a >= b)
    {
      printf("greater equal\n");
      return 1;
    }
    printf("!greater equal\n");
    return 0;
}
int tlt(double a, double b)
{
    printf("%lf < %lf\n",a,b);
    if (a < b)
    {
      printf("less than\n");
      return 1;
    }
    printf("!less than\n");
    return 0;
}

int tle(double a, double b)
{
    printf("%lf <= %lf\n",a,b);
    if (a <= b)
    {
      printf("less equal\n");
      return 1;
    }
    printf("!less equal\n");
    return 0;
}

int main()
{
    teq(5.5,5.5);
    teq(5.5,5.6);
        
    tne(5.5,5.6);
    tne(5.5,5.5);
    
    tgt(5.5,5.6);
    tgt(5.5,5.5);
    tgt(5.5,5.4);
    
    tge(5.5,5.6);
    tge(5.5,5.5);
    tge(5.5,5.4);
    
    tlt(5.5,5.6);
    tlt(5.5,5.5);
    tlt(5.5,5.4);
    
    tle(5.5,5.6);
    tle(5.5,5.5);
    tle(5.5,5.4);
        
    return 0;
}

The result is:

$ eq
5.500000 == 5.500000
equal
5.500000 == 5.600000
!equal
5.500000 != 5.600000
notequal
5.500000 != 5.500000
!notequal
5.500000 > 5.600000
!greater than
5.500000 > 5.500000
!greater than
5.500000 > 5.400000
greater than
5.500000 >= 5.600000
!greater equal
5.500000 >= 5.500000
greater equal
5.500000 >= 5.400000
greater equal
5.500000 < 5.600000
less than
5.500000 < 5.500000
!less than
5.500000 < 5.400000
!less than
5.500000 <= 5.600000
less equal
5.500000 <= 5.500000
less equal
5.500000 <= 5.400000
!less equal

I'm not sure whether this fix belongs in D54583 or whether you would like to create a new revision.

Best regards, Kei

@kthomsen can you create a new revision just for that diff?

lib/Target/PowerPC/PPCISelLowering.cpp
3165	This is often spelled out in a macro as nitems(), or sizeofArray() (or similar).
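
One way to avoid spelling out the `sizeof` division at each use is a small template that deduces the array length. The helper name `arrayLength` is hypothetical and only illustrates the idea behind macros such as nitems().

```cpp
#include <cassert>
#include <cstddef>

// Deduce the number of elements of a built-in array at compile time.
// Passing a pointer instead of an array fails to compile, unlike the
// sizeof(a) / sizeof(a[0]) idiom.
template <typename T, std::size_t N>
constexpr std::size_t arrayLength(const T (&)[N]) {
  return N;
}
```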

@jhibbits I don't know how to create a new revision here. My idea is to handle this fix via you, as you are already known for the SPE modifications.

In D54583#1445579, @kthomsen wrote:

@jhibbits I don't know how to create a new revision here. My idea is to handle this fix via you, as you are already known for the SPE modifications.

Fair enough, I'll create a new revision tonight for this change.

LGTM. The remaining comments are stylistic nits that can be addressed on the commit and do not require another round of review. Thank you for your patience with this review.

lib/Target/PowerPC/PPCISelLowering.cpp
3215	I don't really understand why this is handled differently than the parameters in terms of how the allocation is done. It is perhaps a good indicator that a comment explaining the difference would be useful.
3578	I think it would be good to keep using the Hi/Lo naming here as well and make it very obvious that you're adding two live-in registers and adding copies from them into virtual regs. Something along the lines of:

  if (VA.getLocVT() == MVT::f64 && Subtarget.hasSPE()) {
    assert(i + 1 < e && "No second half of double precision argument");
    unsigned RegLo = MF.addLiveIn(VA.getLocReg(), RC);
    unsigned RegHi = MF.addLiveIn(ArgLocs[++i].getLocReg(), RC);
    SDValue ArgValueLo = DAG.getCopyFromReg(Chain, dl, RegLo, MVT::i32);
    SDValue ArgValueHi = DAG.getCopyFromReg(Chain, dl, RegHi, MVT::i32);
    if (!Subtarget.isLittleEndian())
      std::swap(ArgValueLo, ArgValueHi);
    ArgValue = DAG.getNode(PPCISD::BUILD_SPE64, dl, MVT::f64, ArgValueLo,
                           ArgValueHi);
  } else {
    // existing code ...
  }

5545	I think it would be nice to add a comment explaining what the induction variables track (correct the below if it's wrong):

  i - Tracks the index into the list of registers allocated for the call
  RealArgIdx - Tracks the index into the list of actual function arguments
  j - Tracks the index into the list of byval arguments
5600	Minor nit: this doesn't really match the naming convention. Perhaps `IsLE`?
6756	Similar comment for this code as above regarding induction variables and the naming of `isLittleEndian`.
lib/Target/PowerPC/PPCISelLowering.h
1114	This does not appear to be used. Perhaps an artifact remaining from a previous revision?
lib/Target/PowerPC/PPCInstrInfo.td
236	I think all the types have to be exact - `i32` inputs, `f64` output. Might as well make that explicit: `[SDTCisVT<0, f64>, SDTCisVT<1, i32>, SDTCisVT<2, i32>]` Similarly below.

Forgot to select Accept Revision from the pulldown.

This revision is now accepted and ready to land. Mar 31 2019, 3:09 AM

@jhibbits @kthomsen Sorry for the delay, I checked the change in PPCISelDAGToDAG.cpp, and it indeed fixes the issue. So far I ran into no other bugs and have no objection to this getting merged.

jhibbits marked 7 inline comments as done. Apr 2 2019, 1:12 PM

jhibbits added inline comments.

lib/Target/PowerPC/PPCISelLowering.cpp
3215	Yeah, they're identical, but on second look they shouldn't be. Return values can only be in R3 and R4, so they should be identical in every way except the register list.
lib/Target/PowerPC/PPCISelLowering.h
1114	Yes, it's an artifact.
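
The point about parameters and return values differing only in the register list can be illustrated with a simplified allocator. This is a sketch under stated assumptions, not the code from the patch: `RegPair` and `allocatePair` are hypothetical, registers are plain GPR numbers, and unlike the real calling-convention hook the sketch skips a pair when either half is already taken instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct RegPair {
  unsigned Hi; // GPR number holding the high word
  unsigned Lo; // GPR number holding the low word (always Hi + 1 here)
};

// Used[N] is true once GPR N has been handed out. HiRegs lists the
// candidate high registers, for example {3, 5, 7, 9} for parameters or
// just {3} for return values.
inline bool allocatePair(std::vector<bool> &Used,
                         const std::vector<unsigned> &HiRegs, RegPair &Out) {
  for (unsigned Hi : HiRegs) {
    const unsigned Lo = Hi + 1;
    if (!Used[Hi] && !Used[Lo]) {
      Used[Hi] = true;
      Used[Lo] = true;
      Out.Hi = Hi;
      Out.Lo = Lo;
      return true;
    }
  }
  return false; // no aligned pair left; the value would go to the stack
}
```

Calling the same function with a different `HiRegs` list covers both cases, which is the shape the reply above describes.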

Closed by commit rL363526: PowerPC: Optimize SPE double parameter calling setup (authored by jhibbits). Jun 16 2019, 8:14 PM

This revision was automatically updated to reflect the committed changes.

jhibbits marked 2 inline comments as done.

Herald added a project: Restricted Project. Jun 16 2019, 8:15 PM

kthomsen mentioned this in D54409: PowerPC/SPE: Fix load/store handling for SPE. Jul 1 2019, 1:04 AM

jhibbits mentioned this in D69483: [PowerPC]: Fix predicate handling with SPE. Dec 13 2019, 8:18 AM

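Revision Contents

Path                                        Size
lib/Target/PowerPC/PPCCallingConv.td        7 lines
lib/Target/PowerPC/PPCISelLowering.h        29 lines
lib/Target/PowerPC/PPCISelLowering.cpp      171 lines
lib/Target/PowerPC/PPCInstrInfo.td          11 lines
lib/Target/PowerPC/PPCInstrSPE.td           12 lines
test/CodeGen/PowerPC/spe.ll                 8 lines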

Diff 182715

lib/Target/PowerPC/PPCCallingConv.td

Context not available.
   CCIfSubtarget<"hasSPE()",
     CCIfType<[f32], CCAssignToReg<[R3, R4, R5, R6, R7, R8, R9, R10]>>>,
   CCIfSubtarget<"hasSPE()",
-    CCIfType<[f64], CCAssignToReg<[S3, S4, S5, S6, S7, S8, S9, S10]>>>,
+    CCIfType<[f64], CCCustom<"CC_PPC32_SPE_RetF64">>>,

   // For P9, f128 are passed in vector registers.
   CCIfType<[f128],
Context not available.
   CCIfType<[i32],
     CCIfSplit<CCIfNotSubtarget<"useSoftFloat()",
       CCCustom<"CC_PPC32_SVR4_Custom_AlignArgRegs">>>>,
+  CCIfType<[f64],
+    CCIfSubtarget<"hasSPE()",
+      CCCustom<"CC_PPC32_SVR4_Custom_AlignArgRegs">>>,
   CCIfSplit<CCIfSubtarget<"useSoftFloat()",
     CCIfOrigArgWasPPCF128<CCCustom<
       "CC_PPC32_SVR4_Custom_SkipLastArgRegsPPCF128">>>>,
Context not available.
     CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>>,
   CCIfType<[f64],
     CCIfSubtarget<"hasSPE()",
-      CCAssignToReg<[S3, S4, S5, S6, S7, S8, S9, S10]>>>,
+      CCCustom<"CC_PPC32_SPE_CustomSplitFP64">>>,
   CCIfType<[f32],
     CCIfSubtarget<"hasSPE()",
       CCAssignToReg<[R3, R4, R5, R6, R7, R8, R9, R10]>>>,
Context not available.

lib/Target/PowerPC/PPCISelLowering.h

Context not available.
   /// Direct move of 2 consective GPR to a VSX register.
   BUILD_FP128,

+  /// BUILD_SPE64 and EXTRACT_SPE are analogous to BUILD_PAIR and
+  /// EXTRACT_ELEMENT but take f64 arguments instead of i64, as i64 is
+  /// unsupported for this target.
+  /// Merge 2 GPRs to a single SPE register.
+  BUILD_SPE64,
+
+  /// Extract SPE register component, second argument is high or low.
+  EXTRACT_SPE,
+
   /// Extract a subvector from signed integer vector and convert to FP.
   /// It is primarily used to convert a (widened) illegal integer vector
   /// type to a legal floating point vector type.
Context not available.
                                     unsigned JTI,
                                     MCContext &Ctx) const override;

-  unsigned getNumRegistersForCallingConv(LLVMContext &Context,
-                                         CallingConv::ID CC,
-                                         EVT VT) const override;
-
-  MVT getRegisterTypeForCallingConv(LLVMContext &Context,
-                                    CallingConv::ID CC,
-                                    EVT VT) const override;
-
   private:
   struct ReuseLoadInfo {
     SDValue Ptr;
Context not available.
   SDValue lowerEH_SJLJ_SETJMP(SDValue Op, SelectionDAG &DAG) const;
   SDValue lowerEH_SJLJ_LONGJMP(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerBITCAST(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerEXTRACT_ELEMENT(SDValue Op, SelectionDAG &DAG) const;

   SDValue DAGCombineExtBoolTrunc(SDNode *N, DAGCombinerInfo &DCI) const;
   SDValue DAGCombineBuildVector(SDNode *N, DAGCombinerInfo &DCI) const;
Context not available.
                                   ISD::ArgFlagsTy &ArgFlags,
                                   CCState &State);

+  bool CC_PPC32_SPE_CustomSplitFP64(unsigned &ValNo, MVT &ValVT,
+                                    MVT &LocVT,
+                                    CCValAssign::LocInfo &LocInfo,
+                                    ISD::ArgFlagsTy &ArgFlags,
+                                    CCState &State);
+  bool CC_PPC32_SPE_RetF64(unsigned &ValNo, MVT &ValVT,
+                           MVT &LocVT,
+                           CCValAssign::LocInfo &LocInfo,
+                           ISD::ArgFlagsTy &ArgFlags,
+                           CCState &State);
+
   bool CC_PPC32_SVR4_Custom_AlignFPArgRegs(unsigned &ValNo, MVT &ValVT,
                                            MVT &LocVT,
                                            CCValAssign::LocInfo &LocInfo,
Context not available.

lib/Target/PowerPC/PPCISelLowering.cpp

Context not available.
	return Align;	return Align;
	}	}

	unsigned PPCTargetLowering::getNumRegistersForCallingConv(LLVMContext &Context,
	CallingConv:: ID CC,
	EVT VT) const {
	if (Subtarget.hasSPE() && VT == MVT::f64)
	return 2;
	return PPCTargetLowering::getNumRegisters(Context, VT);
	}

	MVT PPCTargetLowering::getRegisterTypeForCallingConv(LLVMContext &Context,
	CallingConv:: ID CC,
	EVT VT) const {
	if (Subtarget.hasSPE() && VT == MVT::f64)
	return MVT::i32;
	return PPCTargetLowering::getRegisterType(Context, VT);
	}

	bool PPCTargetLowering::useSoftFloat() const {	bool PPCTargetLowering::useSoftFloat() const {
	return Subtarget.useSoftFloat();	return Subtarget.useSoftFloat();
	}	}
Context not available.
	case PPCISD::QBFLT: return "PPCISD::QBFLT";	case PPCISD::QBFLT: return "PPCISD::QBFLT";
	case PPCISD::QVLFSb: return "PPCISD::QVLFSb";	case PPCISD::QVLFSb: return "PPCISD::QVLFSb";
	case PPCISD::BUILD_FP128: return "PPCISD::BUILD_FP128";	case PPCISD::BUILD_FP128: return "PPCISD::BUILD_FP128";
		case PPCISD::BUILD_SPE64: return "PPCISD::BUILD_SPE64";
		case PPCISD::EXTRACT_SPE: return "PPCISD::EXTRACT_SPE";
	case PPCISD::EXTSWSLI: return "PPCISD::EXTSWSLI";	case PPCISD::EXTSWSLI: return "PPCISD::EXTSWSLI";
	}	}
	return nullptr;	return nullptr;
Context not available.
	return true;	return true;
	}	}

		bool llvm::CC_PPC32_SPE_CustomSplitFP64(unsigned &ValNo, MVT &ValVT,
		MVT &LocVT,
		CCValAssign::LocInfo &LocInfo,
		ISD::ArgFlagsTy &ArgFlags,
		CCState &State) {
		static const MCPhysReg HiRegList[] = { PPC::R3, PPC::R5, PPC::R7, PPC::R9 };
		static const MCPhysReg LoRegList[] = { PPC::R4, PPC::R6, PPC::R8, PPC::R10 };

		// Try to get the first register.
		unsigned Reg = State.AllocateReg(HiRegList);
		if (!Reg)
		return false;

		unsigned i;
		nemanjaiUnsubmitted Not Done Reply Inline Actions Since this isn't local to a very small loop, I think a more descriptive name is in order. Perhaps `RegIdx`? nemanjai: Since this isn't local to a very small loop, I think a more descriptive name is in order.
		for (i = 0; i < sizeof(HiRegList) / sizeof(HiRegList[0]); ++i)
		nemanjaiUnsubmitted Not Done Reply Inline Actions I think it is more obvious to use `sizeof(MCPhysReg)` rather than `sizeof(HiRegList[0])`. nemanjai: I think it is more obvious to use `sizeof(MCPhysReg)` rather than `sizeof(HiRegList[0])`.
		jhibbitsAuthorUnsubmitted Done Reply Inline Actions This is often spelled out in a macro as nitems(), or sizeofArray() (or similar). jhibbits: This is often spelled out in a macro as nitems(), or sizeofArray() (or similar).
		if (HiRegList[i] == Reg)
		break;

		unsigned T = State.AllocateReg(LoRegList[i]);
		(void)T;
		assert(T == LoRegList[i] && "Could not allocate register");

		State.addLoc(CCValAssign::getCustomReg(ValNo, ValVT, Reg, LocVT, LocInfo));
		State.addLoc(CCValAssign::getCustomReg(ValNo, ValVT, LoRegList[i],
		LocVT, LocInfo));
		return true;
		}
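The index search and the element-count idiom discussed in the comments above can be sketched outside of LLVM. This is only an illustration under stated assumptions: registers are modeled as plain integers rather than `MCPhysReg`, and `arrayLength` and `pairedLoReg` are hypothetical helper names, not part of the patch or of LLVM.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical stand-in for MCPhysReg: a register is just its number here.
using Reg = unsigned;

// Mirrors the two lists in the patch: r3/r4, r5/r6, r7/r8, r9/r10.
static const Reg HiRegList[] = {3, 5, 7, 9};
static const Reg LoRegList[] = {4, 6, 8, 10};

// Element count of a built-in array, deduced by the compiler
// (the role nitems() or sizeofArray() plays in the discussion above).
template <typename T, std::size_t N>
constexpr std::size_t arrayLength(const T (&)[N]) {
  return N;
}

static_assert(arrayLength(HiRegList) == arrayLength(LoRegList),
              "every high register needs a matching low register");

// Return the low-half register paired with Hi, or 0 if Hi is not a
// valid high-half register (0 means "no register" in this model).
Reg pairedLoReg(Reg Hi) {
  const Reg *It = std::find(std::begin(HiRegList), std::end(HiRegList), Hi);
  if (It == std::end(HiRegList))
    return 0;
  const std::size_t RegIdx = static_cast<std::size_t>(It - std::begin(HiRegList));
  return LoRegList[RegIdx];
}
```

With this model, `pairedLoReg(5)` yields 6, and a register that is not in the high list yields 0. A named `RegIdx` plus a deduced array length avoids both the bare `i` and the `sizeof` division.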

		bool llvm::CC_PPC32_SPE_RetF64(unsigned &ValNo, MVT &ValVT,
		MVT &LocVT,
		CCValAssign::LocInfo &LocInfo,
		ISD::ArgFlagsTy &ArgFlags,
		CCState &State) {
		static const MCPhysReg HiRegList[] = { PPC::R3, PPC::R5, PPC::R7, PPC::R9 };
		static const MCPhysReg LoRegList[] = { PPC::R4, PPC::R6, PPC::R8, PPC::R10 };

		// Try to get the first register.
		unsigned Reg = State.AllocateReg(HiRegList);
		if (!Reg)
		return false;

		unsigned i;
		for (i = 0; i < sizeof(HiRegList) / sizeof(HiRegList[0]); ++i)
		if (HiRegList[i] == Reg)
		break;

		nemanjai: I can't say I am overly familiar with this code, but we don't need to call `State.AllocateReg()` for the low register?
		State.addLoc(CCValAssign::getCustomReg(ValNo, ValVT, Reg, LocVT, LocInfo));
		State.addLoc(CCValAssign::getCustomReg(ValNo, ValVT, LoRegList[i],
		LocVT, LocInfo));
		return true;
		}

	bool llvm::CC_PPC32_SVR4_Custom_AlignArgRegs(unsigned &ValNo, MVT &ValVT,	bool llvm::CC_PPC32_SVR4_Custom_AlignArgRegs(unsigned &ValNo, MVT &ValVT,
	MVT &LocVT,	MVT &LocVT,
	CCValAssign::LocInfo &LocInfo,	CCValAssign::LocInfo &LocInfo,
		nemanjai: I don't really understand why this is handled differently than the parameters in terms of how the allocation is done. It is perhaps a good indicator that a comment explaining the difference would be useful.
		jhibbits (author): Yeah, they're identical, but on second look they shouldn't be. Return values can only be in R3 and R4, so they should be identical in every way except the register list.
Context not available.
	// Reserve space for the linkage area on the stack.	// Reserve space for the linkage area on the stack.
	unsigned LinkageSize = Subtarget.getFrameLowering()->getLinkageSize();	unsigned LinkageSize = Subtarget.getFrameLowering()->getLinkageSize();
	CCInfo.AllocateStack(LinkageSize, PtrByteSize);	CCInfo.AllocateStack(LinkageSize, PtrByteSize);
	if (useSoftFloat() || hasSPE())	if (useSoftFloat())
	CCInfo.PreAnalyzeFormalArguments(Ins);	CCInfo.PreAnalyzeFormalArguments(Ins);

	CCInfo.AnalyzeFormalArguments(Ins, CC_PPC32_SVR4);	CCInfo.AnalyzeFormalArguments(Ins, CC_PPC32_SVR4);
Context not available.
	if (Subtarget.hasVSX())	if (Subtarget.hasVSX())
	RC = &PPC::VSFRCRegClass;	RC = &PPC::VSFRCRegClass;
	else if (Subtarget.hasSPE())	else if (Subtarget.hasSPE())
	RC = &PPC::SPERCRegClass;	// SPE passes doubles in GPR pairs.
		RC = &PPC::GPRCRegClass;
	else	else
	RC = &PPC::F8RCRegClass;	RC = &PPC::F8RCRegClass;
	break;	break;
Context not available.
	break;	break;
	}	}

	// Transform the arguments stored in physical registers into virtual ones.	SDValue ArgValue;
	unsigned Reg = MF.addLiveIn(VA.getLocReg(), RC);	if (VA.getLocVT() == MVT::f64 && Subtarget.hasSPE()) {
	SDValue ArgValue = DAG.getCopyFromReg(Chain, dl, Reg,	// Transform the arguments stored in physical registers into
		nemanjai: I don't think it's useful to duplicate a comment down both paths in a condition.
	ValVT == MVT::i1 ? MVT::i32 : ValVT);	// virtual ones.
		unsigned Reg = MF.addLiveIn(VA.getLocReg(), RC);
		ArgValue = DAG.getCopyFromReg(Chain, dl, Reg, MVT::i32);

		SDValue ArgValue2;
		nemanjai: Perhaps something like `assert(i + 1 < e && "No second half of double precision argument");`
		Reg = MF.addLiveIn(ArgLocs[++i].getLocReg(), RC);
		ArgValue2 = DAG.getCopyFromReg(Chain, dl, Reg, MVT::i32);
		if (!Subtarget.isLittleEndian())
		std::swap (ArgValue, ArgValue2);
		ArgValue = DAG.getNode(PPCISD::BUILD_SPE64, dl, MVT::f64, ArgValue,
		ArgValue2);
		} else {

	if (ValVT == MVT::i1)	// Transform the arguments stored in physical registers into
	ArgValue = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, ArgValue);	// virtual ones.
		unsigned Reg = MF.addLiveIn(VA.getLocReg(), RC);
		ArgValue = DAG.getCopyFromReg(Chain, dl, Reg,
		ValVT == MVT::i1 ? MVT::i32 : ValVT);
		if (ValVT == MVT::i1)
		ArgValue = DAG.getNode(ISD::TRUNCATE, dl, MVT::i1, ArgValue);
		}

	InVals.push_back(ArgValue);	InVals.push_back(ArgValue);
	} else {	} else {
		nemanjai: I think it would be good to keep using the Hi/Lo naming here as well and make it very obvious that you're adding two live-in registers and adding copies from them into virtual regs. Something along the lines of:
		    if (VA.getLocVT() == MVT::f64 && Subtarget.hasSPE()) {
		      assert(i + 1 < e && "No second half of double precision argument");
		      unsigned RegLo = MF.addLiveIn(VA.getLocReg(), RC);
		      unsigned RegHi = MF.addLiveIn(ArgLocs[++i].getLocReg(), RC);
		      SDValue ArgValueLo = DAG.getCopyFromReg(Chain, dl, RegLo, MVT::i32);
		      SDValue ArgValueHi = DAG.getCopyFromReg(Chain, dl, RegHi, MVT::i32);
		      if (!Subtarget.isLittleEndian())
		        std::swap (ArgValueLo, ArgValueHi);
		      ArgValue = DAG.getNode(PPCISD::BUILD_SPE64, dl, MVT::f64, ArgValueLo, ArgValueHi);
		    } else {
		      // existing code ...
		    }
Context not available.
	CCValAssign &VA = RVLocs[i];	CCValAssign &VA = RVLocs[i];
	assert(VA.isRegLoc() && "Can only return in registers!");	assert(VA.isRegLoc() && "Can only return in registers!");

	SDValue Val = DAG.getCopyFromReg(Chain, dl,	SDValue Val;
	VA.getLocReg(), VA.getLocVT(), InFlag);
	Chain = Val.getValue(1);	if (Subtarget.hasSPE() && VA.getLocVT() == MVT::f64) {
	InFlag = Val.getValue(2);	SDValue Lo = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), MVT::i32,
		InFlag);
		Chain = Lo.getValue(1);
		InFlag = Lo.getValue(2);
		VA = RVLocs[++i]; // skip ahead to next loc
		SDValue Hi = DAG.getCopyFromReg(Chain, dl, VA.getLocReg(), MVT::i32,
		InFlag);
		Chain = Hi.getValue(1);
		InFlag = Hi.getValue(2);
		if (!Subtarget.isLittleEndian())
		std::swap (Lo, Hi);
		Val = DAG.getNode(PPCISD::BUILD_SPE64, dl, MVT::f64, Lo, Hi);
		} else {
		Val = DAG.getCopyFromReg(Chain, dl,
		VA.getLocReg(), VA.getLocVT(), InFlag);
		Chain = Val.getValue(1);
		InFlag = Val.getValue(2);
		}

	switch (VA.getLocInfo()) {	switch (VA.getLocInfo()) {
	default: llvm_unreachable("Unknown loc info!");	default: llvm_unreachable("Unknown loc info!");
Context not available.

	bool seenFloatArg = false;	bool seenFloatArg = false;
	// Walk the register/memloc assignments, inserting copies/loads.	// Walk the register/memloc assignments, inserting copies/loads.
	for (unsigned i = 0, j = 0, e = ArgLocs.size();	for (unsigned i = 0, realI = 0, j = 0, e = ArgLocs.size();
		nemanjai: `i, realI, j` are not adequate given this rather complex iteration space... Please name them for what they are meant to refer to.
	i != e;	i != e;
	++i) {	++i, ++realI) {
	CCValAssign &VA = ArgLocs[i];	CCValAssign &VA = ArgLocs[i];
	SDValue Arg = OutVals[i];	SDValue Arg = OutVals[realI];
	ISD::ArgFlagsTy Flags = Outs[i].Flags;	ISD::ArgFlagsTy Flags = Outs[realI].Flags;

	if (Flags.isByVal()) {	if (Flags.isByVal()) {
	// Argument is an aggregate which is passed by value, thus we need to	// Argument is an aggregate which is passed by value, thus we need to
		nemanjai: I think it would be nice to add a comment explaining what the induction variables track (correct the below if it's wrong):
		i - Tracks the index into the list of registers allocated for the call
		RealArgIdx - Tracks the index into the list of actual function arguments
		j - Tracks the index into the list of byval arguments
Context not available.
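The two-index walk that the comments above ask to have documented can be modeled in isolation. This is a simplified sketch, not the patch's code: `ArgKind`, `locsFor`, and `firstLocPerArg` are hypothetical names, every location is assumed to be a register, and the byval index `j` is left out entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical argument model: on SPE an f64 occupies two i32 register
// locations, everything else in this sketch occupies one.
enum class ArgKind { I32, F64 };

std::size_t locsFor(ArgKind Kind) { return Kind == ArgKind::F64 ? 2 : 1; }

// Walk the location list and the argument list in lockstep and return,
// for each argument, the index of the first location assigned to it.
//   LocIdx - index into the list of assigned locations
//   ArgIdx - index into the list of actual arguments
std::vector<std::size_t> firstLocPerArg(const std::vector<ArgKind> &Args) {
  std::size_t NumLocs = 0;
  for (ArgKind Kind : Args)
    NumLocs += locsFor(Kind);

  std::vector<std::size_t> FirstLoc;
  for (std::size_t LocIdx = 0, ArgIdx = 0; LocIdx != NumLocs;
       ++LocIdx, ++ArgIdx) {
    FirstLoc.push_back(LocIdx);
    if (Args[ArgIdx] == ArgKind::F64) {
      assert(LocIdx + 1 < NumLocs &&
             "No second half of double precision argument");
      ++LocIdx; // The second half consumes one extra location.
    }
  }
  return FirstLoc;
}
```

For the argument list `{I32, F64, I32}` the first locations are 0, 1 and 3: the location index runs ahead of the argument index by one for every f64 already seen, which is why the patch needs a separate index into `OutVals` and `Outs`.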
	if (VA.isRegLoc()) {	if (VA.isRegLoc()) {
	seenFloatArg |= VA.getLocVT().isFloatingPoint();	seenFloatArg |= VA.getLocVT().isFloatingPoint();
	// Put argument in a physical register.	// Put argument in a physical register.
	RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));	if (Subtarget.hasSPE() && Arg.getValueType() == MVT::f64) {
		unsigned id = Subtarget.isLittleEndian() ? 0 : 1;
		SDValue SVal = DAG.getNode(PPCISD::EXTRACT_SPE, dl, MVT::i32, Arg,
		DAG.getIntPtrConstant(id, dl));
		RegsToPass.push_back(std::make_pair(VA.getLocReg(), SVal.getValue(0)));
		SVal = DAG.getNode(PPCISD::EXTRACT_SPE, dl, MVT::i32, Arg,
		DAG.getIntPtrConstant(1 - id, dl));

		RegsToPass.push_back(std::make_pair(ArgLocs[++i].getLocReg(),
		SVal.getValue(0)));
		} else
		RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));
	} else {	} else {
	// Put argument in the parameter list area of the current stack frame.	// Put argument in the parameter list area of the current stack frame.
	assert(VA.isMemLoc());	assert(VA.isMemLoc());
		nemanjai: Minor nit: this doesn't really match the naming convention. Perhaps `IsLE`?
Context not available.
	SmallVector<SDValue, 4> RetOps(1, Chain);	SmallVector<SDValue, 4> RetOps(1, Chain);

	// Copy the result values into the output registers.	// Copy the result values into the output registers.
	for (unsigned i = 0; i != RVLocs.size(); ++i) {	for (unsigned i = 0, realI = 0; i != RVLocs.size(); ++i, ++realI) {
	CCValAssign &VA = RVLocs[i];	CCValAssign &VA = RVLocs[i];
	assert(VA.isRegLoc() && "Can only return in registers!");	assert(VA.isRegLoc() && "Can only return in registers!");

	SDValue Arg = OutVals[i];	SDValue Arg = OutVals[realI];

	switch (VA.getLocInfo()) {	switch (VA.getLocInfo()) {
	default: llvm_unreachable("Unknown loc info!");	default: llvm_unreachable("Unknown loc info!");
Context not available.
	Arg = DAG.getNode(ISD::SIGN_EXTEND, dl, VA.getLocVT(), Arg);	Arg = DAG.getNode(ISD::SIGN_EXTEND, dl, VA.getLocVT(), Arg);
	break;	break;
	}	}
		if (Subtarget.hasSPE() && VA.getLocVT() == MVT::f64) {
	Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), Arg, Flag);	bool isLittleEndian = Subtarget.isLittleEndian();
		// Legalize ret f64 -> ret 2 x i32.
		SDValue SVal =
		DAG.getNode(PPCISD::EXTRACT_SPE, dl, MVT::i32, Arg,
		DAG.getIntPtrConstant(isLittleEndian ? 0 : 1, dl));
		nemanjai: Can we be consistent about how we specify the index to the extract between here and above? I am OK with either approach as long as we use the same.
		Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), SVal, Flag);
		nemanjai: Similar comment for this code as above regarding induction variables and the naming of `isLittleEndian`.
		RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
		SVal = DAG.getNode(PPCISD::EXTRACT_SPE, dl, MVT::i32, Arg,
		DAG.getIntPtrConstant(isLittleEndian ? 1 : 0, dl));
		Flag = Chain.getValue(1);
		VA = RVLocs[++i]; // skip ahead to next loc
		Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), SVal, Flag);
		} else
		Chain = DAG.getCopyToReg(Chain, dl, VA.getLocReg(), Arg, Flag);
	Flag = Chain.getValue(1);	Flag = Chain.getValue(1);
	RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));	RetOps.push_back(DAG.getRegister(VA.getLocReg(), VA.getLocVT()));
	}	}
Context not available.
		nemanjai: A more descriptive assert message is probably in order. Also, perhaps it would be clearer if this was rewritten to:
		- Assert that the constant operand value is less than 2
		- Use a ternary operator to select the opcode (or just have one opcode - see above)
		jhibbits (author): This assert was during some debugging, and I forgot to remove it. However, it does make sense to have an assert along the lines of your suggestion. I'll make such a change.
		nemanjai: The early exit should be first.
		nemanjai: Line too long?
		jhibbits (author): Yeah, just a hair (one character). Reformatting with clang-format will fix that.
		nemanjai: The indentation is off. Maybe run `clang-format` on this function. I don't know which editor you use but if you use `Vim`, you can run `:7827,7845 ! clang-format` if you have `clang-format` in your `$PATH`. Also, it seems like you might want to check the input type as well - maybe with an assert if it can't be anything other than `f64`.

lib/Target/PowerPC/PPCInstrInfo.td

Context not available.
	SDTCisSameAs<1,2>]>,	SDTCisSameAs<1,2>]>,
	[]>;	[]>;

		def PPCbuild_spe64: SDNode<"PPCISD::BUILD_SPE64",
		SDTypeProfile<1, 2,
		[SDTCisFP<0>, SDTCisSameSizeAs<1,2>,
		nemanjai: I think all the types have to be exact - `i32` inputs, `f64` output. Might as well make that explicit: `[SDTCisVT<0, f64>, SDTCisVT<1, i32>, SDTCisVT<2, i32>]`. Similarly below.
		SDTCisSameAs<1,2>]>,
		nemanjai: Seems like we don't need to separately state that the types are the same and that they're the same size?
		[]>;

		def PPCextract_spe : SDNode<"PPCISD::EXTRACT_SPE",
		SDTypeProfile<1, 2,
		[SDTCisInt<0>, SDTCisFP<1>, SDTCisPtrTy<2>]>,
		[]>;

	// These are target-independent nodes, but have target-specific formats.	// These are target-independent nodes, but have target-specific formats.
	def callseq_start : SDNode<"ISD::CALLSEQ_START", SDT_PPCCallSeqStart,	def callseq_start : SDNode<"ISD::CALLSEQ_START", SDT_PPCCallSeqStart,
	[SDNPHasChain, SDNPOutGlue]>;	[SDNPHasChain, SDNPOutGlue]>;
Context not available.

lib/Target/PowerPC/PPCInstrSPE.td

Context not available.

	def EVMERGEHI : EVXForm_1<556, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),	def EVMERGEHI : EVXForm_1<556, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),
	"evmergehi $RT, $RA, $RB", IIC_VecGeneral, []>;	"evmergehi $RT, $RA, $RB", IIC_VecGeneral, []>;
	def EVMERGELO : EVXForm_1<557, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),	def EVMERGELO : EVXForm_1<557, (outs sperc:$RT), (ins gprc:$RA, gprc:$RB),
	"evmergelo $RT, $RA, $RB", IIC_VecGeneral, []>;	"evmergelo $RT, $RA, $RB", IIC_VecGeneral, []>;
	def EVMERGEHILO : EVXForm_1<558, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),	def EVMERGEHILO : EVXForm_1<558, (outs sperc:$RT), (ins sperc:$RA, sperc:$RB),
	"evmergehilo $RT, $RA, $RB", IIC_VecGeneral, []>;	"evmergehilo $RT, $RA, $RB", IIC_VecGeneral, []>;
Context not available.
	(SELECT_SPE (CRANDC $lhs, $rhs), $tval, $fval)>;	(SELECT_SPE (CRANDC $lhs, $rhs), $tval, $fval)>;
	def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETNE)),	def : Pat<(f64 (selectcc i1:$lhs, i1:$rhs, f64:$tval, f64:$fval, SETNE)),
	(SELECT_SPE (CRXOR $lhs, $rhs), $tval, $fval)>;	(SELECT_SPE (CRXOR $lhs, $rhs), $tval, $fval)>;


		def : Pat<(f64 (PPCbuild_spe64 i32:$rB, i32:$rA)),
		(f64 (COPY_TO_REGCLASS (EVMERGELO $rA, $rB), SPERC))>;

		def : Pat<(i32 (PPCextract_spe f64:$rA, 1)),
		(i32 (EXTRACT_SUBREG (EVMERGEHI $rA, $rA), sub_32))>;
		def : Pat<(i32 (PPCextract_spe f64:$rA, 0)),
		(i32 (EXTRACT_SUBREG $rA, sub_32))>;

	}	}
Context not available.
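The selection patterns above can be checked against a small bit-level model. This sketch assumes the usual SPE semantics of the two merge instructions (`evmergelo` combines the low words of its operands, `evmergehi` combines the high words) and models a 64-bit SPE register as a `uint64_t`. The function names `buildSpe64` and `extractSpe` are hypothetical stand-ins for the `PPCISD::BUILD_SPE64` and `PPCISD::EXTRACT_SPE` nodes, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Model of evmergelo rD, rA, rB: high word of rD = low word of rA,
// low word of rD = low word of rB.
constexpr std::uint64_t evmergelo(std::uint64_t RA, std::uint64_t RB) {
  return (RA << 32) | (RB & 0xFFFFFFFFull);
}

// Model of evmergehi rD, rA, rB: high word of rD = high word of rA,
// low word of rD = high word of rB.
constexpr std::uint64_t evmergehi(std::uint64_t RA, std::uint64_t RB) {
  return (RA & 0xFFFFFFFF00000000ull) | (RB >> 32);
}

// BUILD_SPE64(Lo, Hi) is selected to EVMERGELO Hi, Lo in the pattern above.
constexpr std::uint64_t buildSpe64(std::uint32_t Lo, std::uint32_t Hi) {
  return evmergelo(Hi, Lo);
}

// EXTRACT_SPE(V, 0) reads the low word directly (sub_32);
// EXTRACT_SPE(V, 1) first moves the high word down with EVMERGEHI V, V.
constexpr std::uint32_t extractSpe(std::uint64_t V, unsigned Idx) {
  return Idx == 0 ? static_cast<std::uint32_t>(V)
                  : static_cast<std::uint32_t>(evmergehi(V, V));
}
```

In this model, splitting the bit pattern `0x0123456789ABCDEF` gives `0x01234567` for index 1 and `0x89ABCDEF` for index 0, and feeding those two words back through `buildSpe64` reproduces the original value. That round trip is what lets the patch replace the store/load sequences from the summary with register-only merges.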

test/CodeGen/PowerPC/spe.ll

Context not available.
	; CHECK-LABEL: test_dselect	; CHECK-LABEL: test_dselect
	; CHECK: andi.	; CHECK: andi.
	; CHECK: bc	; CHECK: bc
	; CHECK: evldd	; CHECK: evor
	; CHECK: b	; CHECK: evmergehi
	; CHECK: evldd
	; CHECK: evstdd
	; CHECK: blr	; CHECK: blr
	}	}

Context not available.
	%1 = call i32 asm sideeffect "efdctsi $0, $1", "=d,d"(double %0)	%1 = call i32 asm sideeffect "efdctsi $0, $1", "=d,d"(double %0)
	ret i32 %1	ret i32 %1
	; CHECK-LABEL: test_dasmconst	; CHECK-LABEL: test_dasmconst
	; CHECK: evldd	; CHECK: evmergelo
	; CHECK: #APP	; CHECK: #APP
	; CHECK: efdctsi	; CHECK: efdctsi
	; CHECK: #NO_APP	; CHECK: #NO_APP
Context not available.