Page MenuHomePhabricator

[PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns.
ClosedPublic

Authored by amyk on Dec 15 2020, 10:37 PM.

Details

Summary

This patch introduces a new infrastructure that is used to select the load and store instructions in the PPC backend.

The primary motivation is that the current implementation of selecting load/stores is dependent on the ordering of patterns in TableGen.
Given this limitation, we are not able to easily and reliably generate the P10 prefixed load and stores instructions (such as when
the immediates that fit within 34-bits). This refactoring is meant to provide us with more control over the patterns/different forms to exploit,
as well as eliminating dependency of pattern declaration in TableGen.

The idea of this refactoring is that it introduces a set of addressing modes that correspond to different instruction formats
of a particular load and store instruction, along with a set of common flags that describes a load/store. Whenever a load/store
instruction is being selected, we analyze the instruction and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.

This computation of flags is done in computeMOFlags(), while selecting the optimal addressing mode is done through
getAddrModeForFlags(); in which this functions searches for a set of address flags stored in a map that relates common flags to
addressing modes. Once the optimal addressing mode is determined, this information is given to SelectOptimalAddrMode(),
where we set the base and displacement of the load/store accordingly based on the addressing mode.

Another thing to note is the SelectForceXForm() function is similar to SelectAddressRegRegOnly(), with an updated naming
and a removed condition to better suit the refactoring that is being done on the loads/stores.

This patch is the first of a series of patches to be committed - it contains the initial implementation of the refactored load/store
selection infrastructure and also updates P8/P9 patterns to adopt this infrastructure. The idea is that incremental patches will
add more implementation and support, and eventually the old implementation will be removed.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
stefanp added inline comments.Dec 17 2020, 8:03 AM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
2529

I find it odd that you are zero extending this (ie unsigned) and then casting it to a signed.
You can get into all kinds of issues with this kind of thing.
For example:

int16_t a = -1;  // This is 0xFFFF
int32_t b = zeroExtend(a); // This is 0x0000FFFF (Not -1 but 65535)

It is important to think about what the possible value types are for N. If the only possible types are MVT::i32 and MVT::i64 then we are fine.

2533

You can probably simplify this a little bit. See what others think too...

Imm = (int32_t)cast<ConstantSDNode>(N)->getZExtValue();
int64_t Imm64 = (int64_t)cast<ConstantSDNode>(N)->getZExtValue();
return isInt<32>(Imm64);
17034

For this case it may be easier to just work with the APInt instead of creating your own functions.
So, you may not need isIntS32Immediate and isIntS34Immediate. Especially since you don't use Imm34.

I should have said this in my first comments. I like this refactor. I just want to make sure I understand this refactoring more clear^_^. Thanks for your detailed explanation. @nemanjai

For legacy infra, I have one pain for the matching efficiency. When we match a worst-case in td files, we will
1: first check dform candidate and in dform candidate handling function SelectAddressRegImm, we check xform candidate. If it fails:
2: check xform candidate and in xform candidate handling function SelectAddressRegReg, we will check dform candidate. If it fails:
3: check xform only candidate and in xform only candidate handling function SelectAddressRegRegOnly, we will get the desired hardware instructions.

I think there must be logic redundant. We check again and again for the load/store address mode.

So for the new infra, I am thinking should we avoid this redundant logic? I still see redundant checks/flags collects for one load/store instruction. We first check SelectDSForm and then check SelectXForm and then SelectForceXForm according to the order in td files. In each SelectXXXForm(), we will collect the flags once. There should also be some redundant logics?

Can we select the address mode not starting from td files? Instead, we start the selection from cpp file for IR level load? For example in function PPCDAGToDAGISel::Select(), change case ISD::LOAD and inside the case, we call SelectOptimalAddrMode and then select the PPC instruction directly in cpp files.

So for the new infra, I am thinking should we avoid this redundant logic? I still see redundant checks/flags collects for one load/store instruction. We first check SelectDSForm and then check SelectXForm and then SelectForceXForm according to the order in td files. In each SelectXXXForm(), we will collect the flags once. There should also be some redundant logics?

Can we select the address mode not starting from td files? Instead, we start the selection from cpp file for IR level load? For example in function PPCDAGToDAGISel::Select(), change case ISD::LOAD and inside the case, we call SelectOptimalAddrMode and then select the PPC instruction directly in cpp files.

I think this may be an interesting follow-up to optimize this slightly. However, I would imagine we would do this in PreprocessISelDAG(). Namely, we can replace ISD::LOAD with something like PPCISD::LOAD_W_ADDRMODE (and similarly for the stores). That way we can still specify more complex patterns in .td files rather than reimplementing everything in C++ code. Namely, we can still add things like:

def : Pat<(f128 (sint_to_fp (i64 (load xaddrX4:$src)))),
          (f128 (XSCVSDQP (LXSDX xaddrX4:$src)))>;

except it would be something like:

def : Pat<(f128 (sint_to_fp (i64 (PPCloadWAddForm DSForm:$src)))),
          (f128 (XSCVSDQP (LXSD DSForm:$src)))>;

However, considering the number of loads/stores in a typical DAG and how quickly these flags, etc. can be computed, this isn't a huge compile time improvement.

amyk updated this revision to Diff 315122.Jan 7 2021, 6:26 AM
amyk retitled this revision from [PowerPC][NFC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns. to [PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns..
  • Updated names of the selection functions in the td patterns
  • Address the comment of using APInt when computing address flags
  • Removed NFC from the title as there is one test case update that we expect a DSForm in (instead of an XForm instruction)
amyk marked an inline comment as done.Jan 7 2021, 6:28 AM

Removed the addition of isIntS32Immediate() - it is no longer needed after addressing the comment of using APInt.

Thanks a lot for working on that Amy ! I have some comments on the patch.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16959

The address flags are are stored -> The address flags are stored

16966–16967

I think you can directly use the Subtarget field of the PPCTargetLowering class instead of using a static_cast here.

16980

This line should be moved below just before the assert.

17027

consstants -> constants

17056–17057

Should we name this flag MOF_NotAddNorCst then ?

17105–17107

I know this is code we already have in SelectAddressRegRegOnly, but isn't that condition weird ? Or at least it doesn't match the comment above. If I understand correctly, this is the case where we get rid of the add, but looking at the condition, we get rid of it only if it is not an add of a value and a 16-bit signed constant or if one of the operands doesn't have a single use.
Shouldn't the condition be (N.getOpcode() == ISD::ADD && !isIntS16Immediate(N.getOperand(1), ForceXFormImm) && N.getOperand(1).hasOneUse() && N.getOperand(0).hasOneUse()) instead ?

llvm/lib/Target/PowerPC/PPCISelLowering.h
695–700

The flag names are not really uniform, I guess DWInt is for DoubleWordInt ? I think we should drop the abbreviation here so it is more clear what the flags represent. Something like:

MOF_SubWordInt = 1 << 15,
MOF_WordInt = 1 << 16,
MOF_DoubleWordInt = 1 << 17,
MOF_ScalarFloat = 1 << 18,
MOF_Vector = 1 << 19,
MOF_Vector256 = 1 << 20,
703

Maybe we should name this flag MOF_SubtargetBeforeP9 or something like that, to make it clear that the flag is not set on P10.

722–723

I think the map and the function declaration should be moved with the others private fields below.

1089

This method should be private.

1094

This method should be private.

amyk updated this revision to Diff 318115.Jan 20 2021, 10:54 PM

Addressed review comments:

  • Rename some of the MemOpFlags
  • Fixed typo in comments
  • Moved variables near their use
  • Made some of the methods private
amyk marked 10 inline comments as done.Jan 21 2021, 3:12 PM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
17105–17107

That's a good point, Baptiste. You're right in that this is the case where we eliminate the add.

I thought about it a bit I believe the this condition might match the comment more (only get rid of the add if we do not have an add of a value and a signed 16-bit immediate, and the two operands don't have a single use)

if (N.getOpcode() == ISD::ADD &&
      (!(isIntS16Immediate(N.getOperand(1), imm) &&
         N.getOperand(1).hasOneUse() && N.getOperand(0).hasOneUse())))

which I believe, should then be equivalent to the condition in the code.

But yes, the condition was previously in SelectAddressRegRegOnly() which is why I kept it.
If there are any more concerns on the condition and/or comment, I can probably adjust it.

bsaleil accepted this revision as: bsaleil.Jan 22 2021, 3:10 PM

Thanks for addressing the comments, LGTM now.

This revision is now accepted and ready to land.Jan 22 2021, 3:10 PM
amyk updated this revision to Diff 320765.Feb 2 2021, 5:59 AM

Update patch to rebase with latest trunk, and rearrange/clean up code slightly.

amyk updated this revision to Diff 322972.Feb 11 2021, 5:58 AM

Update patch to move the check for if we have a value with an offset that fits into a 34-bit immediate into it's own condition instead of inside an else if.

amyk updated this revision to Diff 323854.Feb 15 2021, 6:00 PM

Update patch to fix a small issue when setting the Base and Disp for DForms when we have constant that fits in 32-bits.
Previously I used a uint64_t when it should have been a uint16_t.

amyk updated this revision to Diff 330636.Mar 15 2021, 6:41 AM

Rebase this patch to the latest changes.

nemanjai requested changes to this revision.Mar 20 2021, 3:42 PM

Thank you for handling this much needed refactoring. My comments are mostly related to readability so this is very close to approval, but let's have another look when you address the comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
1432

s/instruction formats/addressing modes

1437

I think you should add some comments regarding sample instructions that correspond to these sets of flags. For example:

PPC::MOF_ZExt | PPC::MOF_RPlusSImm16 | PPC::MOF_WordInt,    // LWZ
PPC::MOF_ZExt | PPC::MOF_RPlusSImm16 | PPC::MOF_SubWordInt, // LBZ, LHZ
PPC::MOF_SExt | PPC::MOF_RPlusSImm16 | PPC::MOF_SubWordInt, // LHA
...
2521

I think we no longer need to put the name of the function in Doxygen comments.

2523–2524

Replace:

This is for when we have an OR of disjoint bitfields, we can codegen it as an add (for better address arithmetic).

with

An OR of two provably disjoint values is equivalent to an ADD. Most PPC load/store instructions compute the effective address as a sum, so doing this conversion is useful.
16952

s/return an X-Form instructions can always be matched/return an X-Form as it is the most general addressing mode

16959–16960

Remove this part of the comment. That is not done here AFAICT.

16970

I think this should be checking for prefixed instructions. It is entirely possible to have
-mcpu=pwr9 -Xclang -target-feature -Xclang -prefix-instrs (or llc -mattr=-prefix-instrs).

16983

This is a large lambda that is only called once. AFAICT, it only captures FlagSet. I think it is probably better off as a static function and FlagSet can be passed by reference.

Removing it will also remove SetAlignFlagsForImm from this function and the whole thing should become significantly more readable.

Also as a minor nit, you are less likely to have these mismatches in naming/capitalization such as between SetAlignFlagsForImm and computeFlagsForAddressComputation. Both are lambdas, one is capitalized as a variable and the other as a function - understandable omission since a lambda is both.

17044

s/Vectors only/Integer vectors.

17055

Seems like we should have another unreachable here in case we end up with an illegal floating point type such as half precision or some weird FP type (such as Intel's 80-bit FP).

17078

Add this to the comment:

// We set the extension mode to zero extension so we don't have
// to add separate entries in AddrModesMap for stores and loads.
17084–17085
// If we don't have prefixed instructions, 34-bit constants should be
// treated as PPC::MOF_NotAddNorCst so they can match D-Forms.
17086–17089
bool Is34BitConstNoP10 =
    (PPC::MOF_RPlusSImm34 | PPC::MOF_AddrIsSImm32 | PPC::MOF_SubtargetP10) &
    FlagSet == PPC::MOF_RPlusSImm34;
if (N.getOpcode() != ISD::ADD && N.getOpcode() != ISD::OR &&
    IsNonP1034BitConst)
  FlagSet |= PPC::MOF_NotAddNorCst;
17156

This condition seems superfluous. Why do we check that the operand is a signed 16-bit constant (and then create another constant with the same value) when we have PPC::MOF_RPlusSImm16 set? Doesn't the flag already assure us that this is so?

Can we not just assert that this is so?

17191

Why can't this be
(CNType == MVT::i32 || isInt<32>(CNImm))

17193

For purposes where the width of the value matters, please refrain from using types with an implicit size and use explicitly sized types (i.e. int32_t vs. int, int16_t vs. short).

llvm/lib/Target/PowerPC/PPCISelLowering.h
678

// Extension mode for integer loads.

684–692
MOF_NotAddNorCst = 1 << 5,      // Not const. or sum of ptr and scalar.
MOF_RPlusSImm16 = 1 << 6,       // Reg plus signed 16-bit constant.
MOF_RPlusLo = 1 << 7,           // Reg plus signed 16-bit relocation
MOF_RPlusSImm16Mult4 = 1 << 8,  // Reg plus 16-bit signed multiple of 4.
MOF_RPlusSImm16Mult16 = 1 << 9, // Reg plus 16-bit signed multiple of 16.
MOF_RPlusSImm34 = 1 << 10,      // Reg plus 34-bit signed constant.
MOF_RPlusR = 1 << 11,           // Sum of two variables.
MOF_PCRel = 1 << 12,            // PC-Relative relocation.
MOF_AddrIsSImm32 = 1 << 13,     // A simple 32-bit constant.
698–699
MOF_ScalarFloat = 1 << 18, // Scalar single or double precision.
MOF_Vector = 1 << 19,      // Vector types and quad precision scalars.
1383

s/are are/are

This revision now requires changes to proceed.Mar 20 2021, 3:42 PM

I understand this refactoring was done to simplify adding instructions both now and in the future. Would it be possible to add comments in the code to instruct the next developer that attempts to add instructions on the how to do it.

amyk marked 24 inline comments as done.Mar 23 2021, 7:51 AM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16970

I actually added a flag for prefixed instructions in https://reviews.llvm.org/D96075. Is this approach still acceptable, or would you prefer it to be done here instead in the approach you mentioned?

amyk updated this revision to Diff 333291.Mar 25 2021, 6:57 AM
  • Create a static function to compute flags for address computation
  • Create static function to set alignment flags for FrameIndex
  • Update comments
amyk updated this revision to Diff 338417.Apr 18 2021, 7:48 PM
amyk marked 2 inline comments as done.
  • Rebase patch
  • Add documentation regarding refactored load and store implementation
  • Add the PPC::MOF_SubtargetP10 flag if the subtarget has prefixed instructions
nemanjai accepted this revision.Apr 23 2021, 2:29 PM

The rest of the comments I have are minor comment and code restructuring nits. Those can be addressed on the commit. Otherwise, LGTM.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16963

Please reduce nesting by flipping this and early-exiting:

FrameIndexSDNode *FI =
  dyn_cast<FrameIndexSDNode>(IsAdd ? N.getOperand(0) : N);
if (!FI)
  return;
// The rest of the code is not nested in this if
16966

It might be clear to call this something like FrameIndexAlign since it seems to refer to alignment rather than a value.

17194
// This is a register plus a 16-bit immediate. The base will be the
// register and the displacement will be the immediate unless it
// isn't sufficiently aligned.
17201–17206
Disp = DAG.getTargetConstant(Imm, DL, N.getValueType());
Base = Op0;
if (FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(Op0)) {
  Base = DAG.getTargetFrameIndex(FI->getIndex(), N.getValueType());
  fixupFuncForFI(DAG, FI->getIndex(), N.getValueType());
}
17210

This is not necessarily a load I think.

// This is a register plus the @lo relocation. The base is the register
// and the displacement is the global address.
17220
// This is a constant address at most 32 bits. The base will be
// zero or load-immediate-shifted and the displacement will be
// the low 16 bits of the address.
This revision is now accepted and ready to land.Apr 23 2021, 2:29 PM
This revision was landed with ongoing or failed builds.Apr 30 2021, 7:53 AM
This revision was automatically updated to reflect the committed changes.