This is an archive of the discontinued LLVM Phabricator instance.

PowerPC/SPE: Fix load/store handling for SPE
ClosedPublic

Authored by jhibbits on Nov 11 2018, 2:58 PM.

Details

Summary

Pointed out in a comment for D49754, register spilling will currently
spill SPE registers at almost any offset. However, the instructions
evstdd and evldd require a) 8-byte alignment, and b) a limit of 256
(unsigned) bytes from the base register, as the offset must fit into a
5-bit field, which ranges from 0-31 (indexed in double-words).
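The encoding constraint described above can be sketched as a small standalone check (the helper name is illustrative, not part of the patch): evldd/evstdd store the displacement as a 5-bit double-word count, so the legal byte offsets are exactly 0, 8, 16, ..., 248.

```cpp
#include <cstdint>

// Hypothetical helper, not from the patch: returns true when a byte offset
// is encodable in evldd/evstdd's 5-bit scaled displacement field.
// The field holds 0-31 double-words, i.e. byte offsets 0, 8, ..., 248.
bool fitsEVLDDDisplacement(int64_t Offset) {
  return Offset >= 0 && Offset <= 248 && (Offset % 8) == 0;
}
```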

This enforces the alignment in offsetMinAlign(), and enforces the spill
range check in PPCRegisterInfo::eliminateFrameIndex().

The update to the register spill test is taken partially from the test
case shown in D49754.

Additionally, pointed out by Kei Thomsen, globals will currently use
evldd/evstdd, though the offset isn't known at compile time, so may
exceed the 8-bit (unsigned) offset permitted. This fixes that as well,
by forcing it to always use evlddx/evstddx when accessing globals.

Part of the patch contributed by Kei Thomsen.

Diff Detail

Repository
rL LLVM

Event Timeline

jhibbits created this revision. Nov 11 2018, 2:58 PM

I have applied this patch to the llvm-toolchain-7 package in Debian and did not see any regressions on x86_64 or 32-bit PowerPC. Additionally, I included the patches from https://reviews.llvm.org/D49754 and https://reviews.llvm.org/D54583 and saw no regressions on x86_64 and 32-bit PowerPC.

All three patches will be part of the next upload of the llvm-toolchain-7 package in Debian unstable which will be version 1:7.0.1~+rc2-9.

nemanjai accepted this revision. Dec 29 2018, 1:43 PM

Other than the minor nit about the test case, LGTM.

test/CodeGen/PowerPC/spe.ll
537 ↗(On Diff #173593)

Can you maybe filter this test case through something like opt -mem2reg to get rid of the extraneous allocas to aid readability?

This revision is now accepted and ready to land. Dec 29 2018, 1:43 PM


Sorry for hijacking this, but could you maybe also review this small PowerPC-related change? https://reviews.llvm.org/D55326

jhibbits updated this revision to Diff 185484. Feb 5 2019, 8:31 PM
jhibbits retitled this revision from PowerPC/SPE: Fix register spilling for SPE registers to PowerPC/SPE: Fix load/store handling for SPE.
jhibbits edited the summary of this revision. (Show Details)

Incorporate patch from @kthomsen for handling globals.

Herald added a project: Restricted Project. Feb 5 2019, 8:31 PM
jhibbits requested review of this revision. Feb 19 2019, 12:40 PM

I thought this would've been auto-marked as needs review when I updated, but apparently not.

nemanjai requested changes to this revision. Feb 20 2019, 3:46 AM

I can't really review this without context. Could you please re-upload with context?

This revision now requires changes to proceed. Feb 20 2019, 3:46 AM
jhibbits updated this revision to Diff 188755. Feb 28 2019, 10:31 AM

Address feedback. Provide full context for diffs.

nemanjai requested changes to this revision. Mar 31 2019, 5:14 AM

I am only requesting changes for the refactoring of the function. As I mentioned in the inline comment, I am not necessarily requesting that you rework the addressing mode computation functions as I mentioned, just that you clean up the code in SelectAddressRegReg().

lib/Target/PowerPC/PPCISelLowering.cpp
2213 ↗(On Diff #188755)

I really don't think this belongs here. This should probably be in another function. Perhaps something like SelectAddressEVXRegReg(...). This function would need to decide when r+r mode needs to be used.

Furthermore, it is probably buggy that we use iaddr for EVLDD because it might go into a function like this, the node that computes the address is not an ISD::ADD and it reverts to default handling (which is set up to handle 16-bit displacements). Even if you added the code to the ISD::OR section below, it is possible that we would get it wrong as the address came from a different node that SelectAddressRegImm() ends up knowing about.

Ultimately, this is a different addressing mode and it should use different functions to compute when to use the r+r form and when to use r+i form.

That being said, I understand that this solves a current problem whereas what I am suggesting solves a future problem that may never actually occur. So I won't stand in the way of this patch - you just might want to think about re-working this.

At the very least, this code needs to be pulled out into a separate function and this can look like:

if (hasSPE() && usedForDPMemOp(N)) {
  Base = N.getOperand(0);
  Index = N.getOperand(1);
  return true;
}
2217 ↗(On Diff #188755)

This is not necessary. You should be able to do something like:

if (MemSDNode *Memop = dyn_cast<MemSDNode>(*UI)) {
  if (Memop->getMemoryVT() != MVT::f64)
    // ...
}
lib/Target/PowerPC/PPCRegisterInfo.cpp
969 ↗(On Diff #188755)

A more descriptive name might be something like OffsetFitsMnemonic. Notice the capitalization of the variable name as per coding guidelines.

This revision now requires changes to proceed. Mar 31 2019, 5:14 AM
jhibbits marked 2 inline comments as done. Apr 29 2019, 7:26 AM
jhibbits added inline comments.
lib/Target/PowerPC/PPCRegisterInfo.cpp
969 ↗(On Diff #188755)

Oops, I always forget about the capitalizations. Sorry.

jhibbits updated this revision to Diff 206985. Jun 27 2019, 6:51 PM

Address nemanjai's feedback. Move the EVX check into a separate callable
function. In the future it could possibly be used as a separate addressing
mode, but a naive approach didn't work, and this solves the problem at hand.

I integrated the last patch (yes, it is working) and saw that there is room for another optimization. It now checks whether the Offset fits into the 8-bit offset of EVLDD / EVSTDD, including an alignment of 8. This reduces the overhead when variables are stored on the stack and the variable is close enough to the base register to be addressed directly.

--- PPCISelLowering.orig.h      2019-06-28 09:45:05.061923700 +0200
+++ PPCISelLowering.h   2019-06-28 09:07:19.143760400 +0200
@@ -1200,6 +1200,9 @@ namespace llvm {
   bool isIntS16Immediate(SDNode *N, int16_t &Imm);
   bool isIntS16Immediate(SDValue Op, int16_t &Imm);
 
+  bool isIntS8Immediate(SDNode *N, int8_t &Imm);
+  bool isIntS8Immediate(SDValue Op, int8_t &Imm);
+
 } // end namespace llvm
 
 #endif // LLVM_TARGET_POWERPC_PPC32ISELLOWERING_H
--- PPCISelLowering.orig.cpp    2019-06-28 09:45:47.495524500 +0200
+++ PPCISelLowering.cpp 2019-06-28 10:07:31.750129100 +0200
@@ -2227,6 +2227,23 @@ bool llvm::isIntS16Immediate(SDValue Op,
   return isIntS16Immediate(Op.getNode(), Imm);
 }
 
+/// isIntS8Immediate - This method tests to see if the node is either a 32-bit
+/// or 64-bit immediate, and if the value can be accurately represented as a
+/// sign extension from an 8-bit value.  If so, this returns true and the
+/// immediate.
+bool llvm::isIntS8Immediate(SDNode *N, int8_t &Imm) {
+  if (!isa<ConstantSDNode>(N))
+    return false;
+  Imm = (int8_t)cast<ConstantSDNode>(N)->getZExtValue();
+  if (N->getValueType(0) == MVT::i32)
+    return Imm == (int32_t)cast<ConstantSDNode>(N)->getZExtValue();
+  else
+    return Imm == (int64_t)cast<ConstantSDNode>(N)->getZExtValue();
+}
+bool llvm::isIntS8Immediate(SDValue Op, int8_t &Imm) {
+  return isIntS8Immediate(Op.getNode(), Imm);
+}
+
 /// SelectAddressEVXRegReg - Given the specified address, check to see if it can
 /// be represented as an indexed [r+r] operation.
 bool PPCTargetLowering::SelectAddressEVXRegReg(SDValue N, SDValue &Base,
@@ -2259,8 +2276,15 @@ bool PPCTargetLowering::SelectAddressReg
     if (hasSPE())
       // Is there any SPE load/store (f64), which can't handle 16bit offset?
       // SPE load/store can only handle 8-bit offsets.
-      if (SelectAddressEVXRegReg(N, Base, Index, DAG))
-        return true;
+      if (SelectAddressEVXRegReg(N, Base, Index, DAG)) {
+        int8_t imm8 = 0;
+        // Check if the offset is small enough to fit into the 8-bit
+        // e.g. a value on the stack
+        if (isIntS8Immediate(N.getOperand(1), imm8) && !(imm8 % 8))
+          return false; // offset is okay for the 8-bit offset 
+        else
+          return true; // offset is > 8-bit or unknown yet
+      }
     if (isIntS16Immediate(N.getOperand(1), imm) &&
         (!EncodingAlignment || !(imm % EncodingAlignment)))
       return false; // r+i

What do you think?
Best regards, Kei

The immediate offsets for the evldd/evstdd instructions are UInt8, not SInt8, but otherwise your change looks fine. Do you have additional tests for it? You had mentioned before about problematic relocations, is that still the case with your patch?

I found an even better way to check for the unsigned 8-bit offset with alignment 8 (thank you for the hint). It has now been integrated into the SelectAddressEVXRegReg() function, so the calling function SelectAddressRegReg() doesn't need to deal with any EVx instruction specials; it is all done in the EVXRegReg function.
Explanation of the SelectAddressEVXRegReg function: if it can find an MVT::f64 (there can be at most one MVT::f64), the offset used here is checked to see whether it is already known and lies in 0..255 with an alignment of 8. In that case false is returned and the calling SelectAddressRegReg() continues with isIntS16Immediate(), which will succeed as well (if a value fits into 8 bits unsigned, it fits into 16 bits signed too).
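The decision described above can be sketched in standalone form (the names are illustrative; the real code operates on SelectionDAG nodes): force r+r unless the offset is a known constant that zero-extends from 8 bits and is 8-byte aligned.

```cpp
#include <cstdint>

// Sketch under stated assumptions, not the patch itself: an address may stay
// in r+i form for evldd/evstdd only when its offset is a known compile-time
// constant in 0..255 that is 8-byte aligned; otherwise the r+r
// (evlddx/evstddx) form is required.
bool mustUseRegRegForEVX(bool OffsetKnown, int64_t Offset) {
  if (!OffsetKnown)
    return true;                        // unknown at compile time: use r+r
  bool FitsU8 = Offset >= 0 && Offset <= 255;
  bool Aligned = (Offset % 8) == 0;
  return !(FitsU8 && Aligned);          // encodable offsets can use r+i
}
```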

--- PPCISelLowering.orig.cpp    2019-07-01 09:08:11.444438400 +0200
+++ PPCISelLowering.cpp 2019-07-01 08:45:16.911244700 +0200
@@ -2227,9 +2227,27 @@ bool llvm::isIntS16Immediate(SDValue Op,
   return isIntS16Immediate(Op.getNode(), Imm);
 }
 
+/// isIntU8Immediate - This method tests to see if the node is either a 32-bit
+/// or 64-bit immediate, and if the value can be accurately represented as a
+/// zero (unsigned) extension from an 8-bit value.  If so, this returns true
+/// and the immediate.
+bool llvm::isIntU8Immediate(SDNode *N, uint8_t &Imm) {
+  if (!isa<ConstantSDNode>(N))
+    return false;
+  Imm = (uint8_t)cast<ConstantSDNode>(N)->getZExtValue();
+  if (N->getValueType(0) == MVT::i32)
+    return Imm == (int32_t)cast<ConstantSDNode>(N)->getZExtValue();
+  else
+    return Imm == (int64_t)cast<ConstantSDNode>(N)->getZExtValue();
+}
+bool llvm::isIntU8Immediate(SDValue Op, uint8_t &Imm) {
+  return isIntU8Immediate(Op.getNode(), Imm);
+}
 
-/// SelectAddressEVXRegReg - Given the specified address, check to see if it can
-/// be represented as an indexed [r+r] operation.
+/// SelectAddressEVXRegReg - Given the specified address, check to see if it
+/// must be represented as an indexed [r+r] operation for the EVLDD and EVSTDD
+/// instructions. If the address offset is already known, it will be checked
+/// to see if it fits into the 8-bit offset, with an alignment of 8.
 bool PPCTargetLowering::SelectAddressEVXRegReg(SDValue N, SDValue &Base,
                                                SDValue &Index,
                                                SelectionDAG &DAG) const {
@@ -2237,9 +2255,13 @@ bool PPCTargetLowering::SelectAddressEVX
       UI != E; ++UI) {
     if (MemSDNode *Memop = dyn_cast<MemSDNode>(*UI)) {
       if (Memop->getMemoryVT() == MVT::f64) {
-          Base = N.getOperand(0);
-          Index = N.getOperand(1);
-          return true;
+        uint8_t imm = 0;
+        if (isIntU8Immediate(N.getOperand(1), imm)
+            && !(imm % EVXEncodingAlignment))
+          return false; // offset is okay for the 8-bit offset
+        Base = N.getOperand(0);
+        Index = N.getOperand(1);
+        return true; // offset is unknown or too large. Use reg-reg
       }
     }
   }
--- PPCISelLowering.orig.h      2019-07-01 09:08:11.462574400 +0200
+++ PPCISelLowering.h   2019-07-01 08:39:14.625343900 +0200
@@ -1199,6 +1200,11 @@ namespace llvm {
   bool isIntS16Immediate(SDNode *N, int16_t &Imm);
   bool isIntS16Immediate(SDValue Op, int16_t &Imm);
 
+  bool isIntU8Immediate(SDNode *N, uint8_t &Imm);
+  bool isIntU8Immediate(SDValue Op, uint8_t &Imm);
+
+  // EVLDD and EVSTDD use an 8-bit offset with an alignment of 8
+  const unsigned EVXEncodingAlignment = 8;
 } // end namespace llvm
 
 #endif // LLVM_TARGET_POWERPC_PPC32ISELLOWERING_H

I'm not sure about the

const unsigned EVXEncodingAlignment = 8;

in the header file. I want to use a name for the alignment instead of a hardcoded 8. What/where is the correct place for such constants?

I don't have additional tests for it, as I haven't run the compiler tests and don't know how to specify them. My local tests are to compile all the libraries I have for OS-9, several benchmarks (e.g. Whetstone, Dhrystone), and other tests to check correctness. I know this test coverage is not enough here, but I can't handle it differently at the moment.

There are no open relocation issues seen from my side.

The only thing I haven't seen is the patch for the compares I made in PPCISelDAGToDAG.cpp in D54583 on March 27. Without that patch, the FPU (SPE) compares are not correct. Or has this been addressed in a different way somewhere else (which I might have missed)?

I found one more patch, where I'm not sure if we already had this one and removed it, or if I simply forgot to post it.

--- PPCISelLowering.orig.cpp    2019-07-01 09:08:11.444438400 +0200
+++ PPCISelLowering.cpp 2019-07-01 08:45:16.911244700 +0200
@@ -3011,7 +3114,7 @@ SDValue PPCTargetLowering::LowerVAARG(SD
                                     VAListPtr, MachinePointerInfo(SV), MVT::i8);
   InChain = GprIndex.getValue(1);
 
-  if (VT == MVT::i64) {
+  if ((VT == MVT::i64) || (hasSPE() && (VT == MVT::f64))) {
     // Check if GprIndex is even
     SDValue GprAnd = DAG.getNode(ISD::AND, dl, MVT::i32, GprIndex,
                                  DAG.getConstant(1, dl, MVT::i32));

This has been done to make sure the register pair for an f64 starts at the same (even) register as when using i64.
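The even-register pairing that LowerVAARG's AND-with-1 check achieves in DAG form amounts to this arithmetic (a standalone sketch; the function name is illustrative):

```cpp
#include <cstdint>

// Illustrative only: for i64 (and, with this patch, SPE f64) va_arg values,
// round the GPR index up to the next even register so the value lands in an
// aligned register pair: 0->0, 1->2, 2->2, 3->4, ...
uint32_t alignGprIndexForPair(uint32_t GprIndex) {
  return (GprIndex + 1) & ~1u;
}
```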

nemanjai accepted this revision. Jul 16 2019, 2:05 PM

LGTM other than a few minor nits.

lib/Target/PowerPC/PPCISelLowering.cpp
2239 ↗(On Diff #206985)

Am I reading this correctly? If this node is used for any memory operation that is accessing an f64, we will never use r+i addressing. Is that the desired behaviour?

So if I'm not mistaken, we have the following semantics:

  • If we are accessing an MVT::f64, always use r+r
  • If we are accessing any other type, use r+i if the offset is a 16-bit signed value
    • Otherwise use r+r addressing
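The semantics summarized in the list above can be sketched as a small decision function (hypothetical standalone form with illustrative names, not the DAG-based implementation):

```cpp
#include <cstdint>

enum class AddrMode { RegReg, RegImm };

// Sketch of the reviewer's summary: f64 accesses under SPE always take r+r;
// other types take r+i when the offset is a known signed 16-bit immediate,
// and fall back to r+r otherwise.
AddrMode selectSPEAddrMode(bool IsF64Access, bool OffsetKnown,
                           int64_t Offset) {
  if (IsF64Access)
    return AddrMode::RegReg;
  if (OffsetKnown && Offset >= -32768 && Offset <= 32767)
    return AddrMode::RegImm;
  return AddrMode::RegReg;
}
```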
2263 ↗(On Diff #206985)

We can probably combine this condition with the one above into a single if, but that can be done on the commit - no need for another review.

This revision is now accepted and ready to land. Jul 16 2019, 2:05 PM
jhibbits marked an inline comment as done. Jul 16 2019, 6:45 PM
jhibbits added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
2239 ↗(On Diff #206985)

Yes, that's it for right now. @kthomsen has a fix (commented above), but I wanted to get this in first, and then optimize it after, so @kthomsen's patch will be a separate change.

This revision was automatically updated to reflect the committed changes.