Page MenuHomePhabricator

[PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns.
AcceptedPublic

Authored by amyk on Dec 15 2020, 10:37 PM.

Details

Reviewers
power-llvm-team
nemanjai
bsaleil
Group Reviewers
Restricted Project
Summary

This patch introduces a new infrastructure that is used to select the load and store instructions in the PPC backend.

The primary motivation is that the current implementation of selecting load/stores is dependent on the ordering of patterns in TableGen.
Given this limitation, we are not able to easily and reliably generate the P10 prefixed load and stores instructions (such as when
the immediates that fit within 34-bits). This refactoring is meant to provide us with more control over the patterns/different forms to exploit,
as well as eliminating dependency of pattern declaration in TableGen.

The idea of this refactoring is that it introduces a set of addressing modes that correspond to different instruction formats
of a particular load and store instruction, along with a set of common flags that describes a load/store. Whenever a load/store
instruction is being selected, we analyze the instruction and compute a set of flags for it. The computed flags are then used to
select the most optimal load/store addressing mode.

This computation of flags is done in computeMOFlags(), while selecting the optimal addressing mode is done through
getAddrModeForFlags(); in which this functions searches for a set of address flags stored in a map that relates common flags to
addressing modes. Once the optimal addressing mode is determined, this information is given to SelectOptimalAddrMode(),
where we set the base and displacement of the load/store accordingly based on the addressing mode.

Another thing to note is the SelectForceXForm() function is similar to SelectAddressRegRegOnly(), with an updated naming
and a removed condition to better suit the refactoring that is being done on the loads/stores.

This patch is the first of a series of patches to be committed - it contains the initial implementation of the refactored load/store
selection infrastructure and also updates P8/P9 patterns to adopt this infrastructure. The idea is that incremental patches will
add more implementation and support, and eventually the old implementation will be removed.

Diff Detail

Event Timeline

amyk created this revision.Dec 15 2020, 10:37 PM
amyk requested review of this revision.Dec 15 2020, 10:37 PM

So, is this patch still incomplete as I didn't see the test change ?

llvm/lib/Target/PowerPC/PPCInstr64Bit.td
1068

Not sure if this is handled correctly as you are removing the align restrict for LWA. Technical speaking, we need to remove all such kind of alignment restrict in the load/store as far as we did some analysis in the source code. But I notice that we are removing the align restrict here but still keep it for LD.

lkail added a subscriber: lkail.Dec 16 2020, 4:55 AM
lkail added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
2496

Should it be a static function?

amyk added a comment.Dec 16 2020, 10:32 AM

@steven.zhang The idea is that this patch should be NFC. All existing load/store test cases should pass with this refactoring. I do think there should be more tests added, perhaps in a follow up patch. What do you think?

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
2496

Good point. I will fix this.

llvm/lib/Target/PowerPC/PPCISelLowering.h
712

Add comments to describe the functions.

llvm/lib/Target/PowerPC/PPCInstr64Bit.td
1068

You're right; I meant to remove the align/unalign in the other LD,LWA patterns, as well, as the new load/store infrastructure is meant to use load/sextload/zextload and compute alignment based on the flags.

I think it is better to explicitly give some reasons why we need this big refactoring, in other words, what's the disadvantage/limitation of legacy implementation? Thank you for the big effort.

llvm/lib/Target/PowerPC/PPCISelLowering.h
693

The following features should not be associated to one specific memory operation? Should we add them into the flag set of each memory operation in computeMOFlags? Is it possible to get the sub-target info when we do am selection after we get all flags?

amyk updated this revision to Diff 312325.Dec 16 2020, 5:01 PM

Add more comments to functions, make provablyDisjointOr() static, fix pattern in PPCInstr64Bit.td.

All existing tests in the backend run successfully with this patch.

amyk added a comment.Dec 16 2020, 5:14 PM

I think it is better to explicitly give some reasons why we need this big refactoring, in other words, what's the disadvantage/limitation of legacy implementation? Thank you for the big effort.

That is a good point. Thank you for bringing this up. The primary reason behind this refactoring is that it would allow us to easily exploit prefixed load and stores when the offset is large (such as for immediates that fit within 34-bits).

The current implementation of selecting load/stores is very dependent on the ordering of patterns within TableGen, and we aren't able to easily and reliably generated the prefixed load/stores (also since those patterns would also be written in a different file).
Essentially, the refactoring of how the loads/stores are handled is meant to provide us with more control over the patterns/different forms to exploit. Additionally, we want to eliminate the dependency of pattern declaration ordering as much as possible so regardless of the order of the patterns that the compiler tries to match, we can generate the best possible code.

amyk edited the summary of this revision. (Show Details)Dec 16 2020, 5:17 PM
amyk added inline comments.Dec 16 2020, 5:26 PM
llvm/lib/Target/PowerPC/PPCISelLowering.h
693

I apologize, I think I do not quite follow. If you could clarify, that would be great.

Are you suggesting that these should not be flags, but more of Subtarget checks after the flags are computed?

I think it can be useful to store the subtarget information in the set of flags is that we can easily know which instructions we can produce on P9 and P10.

I was thinking for instance, if we have flags set for SubtargetP9, ScalFlt and RPlusSImm16Mult4 (register + signed 16 bit immediate, multiple of 4), then we know we can generate the DS-Form (corresponding to DFLOAD pseudoinstruction. Or, when we have PPC::MOF_SubtargetP10 and PPC::MOF_RPlusSImm34 (signed 34-bit immediate), we can know we can generate prefixed load/stores.

The primary motivation is that the current implementation of selecting load/stores is dependent on the ordering of patterns in TableGen.

I think this also can be resolved by AddedComplexity in td files? Is it convenient to solve the issue in D91279 when the worst case happens in instruction selection for load/store. With legacy infra, we have to select a xform with one zero register operand for worst case. But seems on P10, we prefer a dform with 0 offset, it is good for some linker opt. That issue does not have a simple fix with legacy infra because the worst case is handled in select xform only function, we must select a xform, so we must use zero register. Could it be solved easier under the new infra? Thanks.

llvm/lib/Target/PowerPC/PPCISelLowering.h
693

Are you suggesting that these should not be flags, but more of Subtarget checks after the flags are computed?

Yes, that was my first thought. For normal load/store instructions before ISEL, we can get type, address info(zext/sext/imm) from the instruction itself. But sub-target is not loads/stores characteristic. we can check the sub-target info when we do the address mode selection when we have other flags. But I am ok if making the sub-target as a flag for load/store for implementation convenience.

@steven.zhang The idea is that this patch should be NFC. All existing load/store test cases should pass with this refactoring. I do think there should be more tests added, perhaps in a follow up patch. What do you think?

Yeah, you'd better tag the revision as NFC if it is.

Just my general $0.02 regarding this refactoring effort...

The existing infrastructure was fine for quite a while. We had D-Form, DS-Form and X-Form which made things reasonably simple. However, we started adding new addressing forms - DQ-Form, MLS:D-Form, 8LS:D-Form, etc. which really started adding complexity to the existing infrastructure that was difficult to understand. Ultimately, the patterns were asking the wrong questions that required an increasing amount of context to answer. Questions like "is this a reg+reg operation with an immediate displacement that is a multiple of 4/16" (SelectAddrIdxX4/SelectAddrIdxX16). A question like that doesn't even fundamentally make sense. If the address is represented as an addition of two registers, the question of whether one of them is a multiple of anything is meaningless. Of course, we needed that in order to avoid selecting an X-Form when a D[SQ]-Form is available, but that is still a weird way to structure a query.
Ultimately, selection of memory access instructions wants to know one thing - "What is the optimal addressing form for this access?" So when selecting an instruction, the question is "is my addressing form the optimal one?"
That is what this refactoring aims to accomplish. So in the future, if we get new D<whatever>-Form instructions that have different requirements for alignment/displacement size/displacement alignment/etc. we should be able to easily extend this.

Furthermore, this reduces/avoids the need for hacks such as AddedComplexity and CodeSize.

amyk retitled this revision from [PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns. to [PowerPC][NFC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns..Dec 17 2020, 7:04 AM

I just had some minor comments. I think it makes sense overall.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
2482

I find it odd that you are zero extending this (ie unsigned) and then casting it to a signed.
You can get into all kinds of issues with this kind of thing.
For example:

int16_t a = -1;  // This is 0xFFFF
int32_t b = zeroExtend(a); // This is 0x0000FFFF (Not -1 but 65535)

It is important to think about what the possible value types are for N. If the only possible types are MVT::i32 and MVT::i64 then we are fine.

2486

You can probably simplify this a little bit. See what others think too...

Imm = (int32_t)cast<ConstantSDNode>(N)->getZExtValue();
int64_t Imm64 = (int64_t)cast<ConstantSDNode>(N)->getZExtValue();
return isInt<32>(Imm64);
16576

For this case it may be easier to just work with the APInt instead of creating your own functions.
So, you may not need isIntS32Immediate and isIntS34Immediate. Especially since you don't use Imm34.

I should have said this in my first comments. I like this refactor. I just want to make sure I understand this refactoring more clear^_^. Thanks for your detailed explanation. @nemanjai

For legacy infra, I have one pain for the matching efficiency. When we match a worst-case in td files, we will
1: first check dform candidate and in dform candidate handling function SelectAddressRegImm, we check xform candidate. If it fails:
2: check xform candidate and in xform candidate handling function SelectAddressRegReg, we will check dform candidate. If it fails:
3: check xform only candidate and in xform only candidate handling function SelectAddressRegRegOnly, we will get the desired hardware instructions.

I think there must be logic redundant. We check again and again for the load/store address mode.

So for the new infra, I am thinking should we avoid this redundant logic? I still see redundant checks/flags collects for one load/store instruction. We first check SelectDSForm and then check SelectXForm and then SelectForceXForm according to the order in td files. In each SelectXXXForm(), we will collect the flags once. There should also be some redundant logics?

Can we select the address mode not starting from td files? Instead, we start the selection from cpp file for IR level load? For example in function PPCDAGToDAGISel::Select(), change case ISD::LOAD and inside the case, we call SelectOptimalAddrMode and then select the PPC instruction directly in cpp files.

So for the new infra, I am thinking should we avoid this redundant logic? I still see redundant checks/flags collects for one load/store instruction. We first check SelectDSForm and then check SelectXForm and then SelectForceXForm according to the order in td files. In each SelectXXXForm(), we will collect the flags once. There should also be some redundant logics?

Can we select the address mode not starting from td files? Instead, we start the selection from cpp file for IR level load? For example in function PPCDAGToDAGISel::Select(), change case ISD::LOAD and inside the case, we call SelectOptimalAddrMode and then select the PPC instruction directly in cpp files.

I think this may be an interesting follow-up to optimize this slightly. However, I would imagine we would do this in PreprocessISelDAG(). Namely, we can replace ISD::LOAD with something like PPCISD::LOAD_W_ADDRMODE (and similarly for the stores). That way we can still specify more complex patterns in .td files rather than reimplementing everything in C++ code. Namely, we can still add things like:

def : Pat<(f128 (sint_to_fp (i64 (load xaddrX4:$src)))),
          (f128 (XSCVSDQP (LXSDX xaddrX4:$src)))>;

except it would be something like:

def : Pat<(f128 (sint_to_fp (i64 (PPCloadWAddForm DSForm:$src)))),
          (f128 (XSCVSDQP (LXSD DSForm:$src)))>;

However, considering the number of loads/stores in a typical DAG and how quickly these flags, etc. can be computed, this isn't a huge compile time improvement.

amyk updated this revision to Diff 315122.Jan 7 2021, 6:26 AM
amyk retitled this revision from [PowerPC][NFC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns. to [PowerPC] Add new infrastructure to select load/store instructions, update P8/P9 load/store patterns..
  • Updated names of the selection functions in the td patterns
  • Address the comment of using APInt when computing address flags
  • Removed NFC from the title as there is one test case update that we expect a DSForm in (instead of an XForm instruction)
amyk marked an inline comment as done.Jan 7 2021, 6:28 AM

Removed the addition of isIntS32Immediate() - it is no longer needed after addressing the comment of using APInt.

Thanks a lot for working on that Amy ! I have some comments on the patch.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16501

The address flags are are stored -> The address flags are stored

16508–16509

I think you can directly use the Subtarget field of the PPCTargetLowering class instead of using a static_cast here.

16522

This line should be moved below just before the assert.

16569

consstants -> constants

16598–16599

Should we name this flag MOF_NotAddNorCst then ?

16647–16649

I know this is code we already have in SelectAddressRegRegOnly, but isn't that condition weird ? Or at least it doesn't match the comment above. If I understand correctly, this is the case where we get rid of the add, but looking at the condition, we get rid of it only if it is not an add of a value and a 16-bit signed constant or if one of the operands doesn't have a single use.
Shouldn't the condition be (N.getOpcode() == ISD::ADD && !isIntS16Immediate(N.getOperand(1), ForceXFormImm) && N.getOperand(1).hasOneUse() && N.getOperand(0).hasOneUse()) instead ?

llvm/lib/Target/PowerPC/PPCISelLowering.h
686–691

The flag names are not really uniform, I guess DWInt is for DoubleWordInt ? I think we should drop the abbreviation here so it is more clear what the flags represent. Something like:

MOF_SubWordInt = 1 << 15,
MOF_WordInt = 1 << 16,
MOF_DoubleWordInt = 1 << 17,
MOF_ScalarFloat = 1 << 18,
MOF_Vector = 1 << 19,
MOF_Vector256 = 1 << 20,
694

Maybe we should name this flag MOF_SubtargetBeforeP9 or something like that, to make it clear that the flag is not set on P10.

713–714

I think the map and the function declaration should be moved with the others private fields below.

1082

This method should be private.

1087

This method should be private.

amyk updated this revision to Diff 318115.Jan 20 2021, 10:54 PM

Addressed review comments:

  • Rename some of the MemOpFlags
  • Fixed typo in comments
  • Moved variables near their use
  • Made some of the methods private
amyk marked 10 inline comments as done.Jan 21 2021, 3:12 PM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
16647–16649

That's a good point, Baptiste. You're right in that this is the case where we eliminate the add.

I thought about it a bit I believe the this condition might match the comment more (only get rid of the add if we do not have an add of a value and a signed 16-bit immediate, and the two operands don't have a single use)

if (N.getOpcode() == ISD::ADD &&
      (!(isIntS16Immediate(N.getOperand(1), imm) &&
         N.getOperand(1).hasOneUse() && N.getOperand(0).hasOneUse())))

which I believe, should then be equivalent to the condition in the code.

But yes, the condition was previously in SelectAddressRegRegOnly() which is why I kept it.
If there are any more concerns on the condition and/or comment, I can probably adjust it.

bsaleil accepted this revision as: bsaleil.Jan 22 2021, 3:10 PM

Thanks for addressing the comments, LGTM now.

This revision is now accepted and ready to land.Jan 22 2021, 3:10 PM
amyk updated this revision to Diff 320765.Feb 2 2021, 5:59 AM

Update patch to rebase with latest trunk, and rearrange/clean up code slightly.

amyk updated this revision to Diff 322972.Thu, Feb 11, 5:58 AM

Update patch to move the check for if we have a value with an offset that fits into a 34-bit immediate into it's own condition instead of inside an else if.

amyk updated this revision to Diff 323854.Mon, Feb 15, 6:00 PM

Update patch to fix a small issue when setting the Base and Disp for DForms when we have constant that fits in 32-bits.
Previously I used a uint64_t when it should have been a uint16_t.