Page MenuHomePhabricator

[PPC] Implement the heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st.

Authored by jtony on Oct 2 2017, 6:32 PM.



The VSX versions have the advantage of a full 64-register target whereas the FP ones have the advantage of lower latency and higher throughput. So what we’re after is using the faster instructions in low register pressure situations and using the larger register file in high register pressure situations.

The heuristic chooses between the following 7 pairs of instructions.


Diff Detail


Event Timeline

jtony created this revision.Oct 2 2017, 6:32 PM
jtony added inline comments.
3089 ↗(On Diff #117459)

Do we really need the NoP9Vector here ? If not, I can remove it.

hfinkel accepted this revision.Oct 3 2017, 9:05 PM
hfinkel added inline comments.
3089 ↗(On Diff #117459)

If we don't have it, won't this pattern compete with the DblToIntLoadP9 pattern on the P9?Maybe the complexity metric will make the latter win, but it probably makes sense to leave the NoP9Vector here so that it's explicit.

Otherwise, this LGTM.

This revision is now accepted and ready to land.Oct 3 2017, 9:05 PM
nemanjai added inline comments.Oct 6 2017, 1:24 AM
3511 ↗(On Diff #117459)

Is this just matching because we're emitting lfsx? I really don't see anything in this patch that can convert an X-Form load to a D-Form load. If it is indeed a D-Form we're emitting here, please respond to this comment as to how this is happening.
Furthermore, if we're converting this to a D-Form here, explain why we're not doing the same with the lxsdx -> lfdx change above.

nemanjai added inline comments.Oct 6 2017, 1:32 AM
2080 ↗(On Diff #117459)

Minor nit: everything you've added below here is just a duplication of how we handle the D-Forms above. Why not just unify the two - after all, the approach is identical:

  • Identify the upper and lower opcodes
  • Change the opcode of the instruction accordingly
3089 ↗(On Diff #117459)

I agree with Hal here. Please leave this in, the fact that there is no observable change in behaviour may be just luck for some of the patterns (i.e. if Complexity and CodeSize are equal).
Plus this makes it very nice and clear when looking at the code that we mean to use these on P7 and P8 targets.

jtony updated this revision to Diff 118834.EditedOct 12 2017, 1:38 PM
jtony marked 5 inline comments as done.

Address comments from Nemanja.
(1) Abstract the common part code for handling D-Form and X-Form LD/ST heuristic into a function.
(2) Fix missing predicate guards (HasP8Vector, AddedComplexity=400) problem.

jtony added inline comments.Oct 12 2017, 1:42 PM
3511 ↗(On Diff #117459)

This has been fixed by the new patch. It is caused by not guarding the new Pseudo instructions with proper predicates: HasP8Vector and AddedComplexity=400.

jtony marked an inline comment as done.Oct 12 2017, 1:42 PM
nemanjai requested changes to this revision.Oct 17 2017, 5:55 AM

Please add test cases for the following:

  • An actual use of the VSX instructions (i.e. where the register pressure is high enough that an upper VSX register will be allocated - you should be able to accomplish this with inline asm clobbers lists in a small test case)
  • A Power9 test case where we will use an X-Form version (i.e. the offset is large enough not to fit in a displacement field)
1992 ↗(On Diff #118834)

Redundant line.

1993 ↗(On Diff #118834)

The name is a bit too vague. Perhaps something like expandVSXMemInstr(), to indicate that it does the actual expansion (i.e. changes the opcode of the MI) as well as that it handles VSX memory operation instructions.

2864 ↗(On Diff #118834)

I really don't think it is a particularly good idea to separate this out here. I'd keep it next to where the definitions of the actual instructions are. This will ensure that you define these in the right scope with all the properties set and will avoid having to figure out what the correct set of let expressions is. For example, as you've defined them here, the mayLoad/mayStore flags won't be set correctly here. These flags are used in various post-ISEL passes to check for side-effects, so it is important to set them correctly.

This revision now requires changes to proceed.Oct 17 2017, 5:55 AM
jtony updated this revision to Diff 120443.Oct 26 2017, 10:06 AM
jtony edited edge metadata.
jtony marked 3 inline comments as done.

Address the comments from Nemanja.

nemanjai accepted this revision.Nov 1 2017, 9:20 AM

The remaining comments I have are minor nits. Please feel free to fix them on the commit. Otherwise, LGTM.

134 ↗(On Diff #120443)

Style nits (of course, this applies not only here but everywhere below as well):

  1. No braces when the let expression applies to a single def
  2. Just enclose the new def into a block if the same let expression applies to both (i.e. let CodeSize = 3 applies to both LXSDX and XFLOADf64 here. Just enclose the two into a block like you enclosed them into the mayLoad/mayStore block.
  3. I find it surprising that we need let isPseudo = 1 on definitions that use Pseudo<>. Are you sure that's needed? Isn't it part of the class definition for Pseudo<>?
12 ↗(On Diff #120443)

So we don't get this on Power9?

This revision is now accepted and ready to land.Nov 1 2017, 9:20 AM
nemanjai added inline comments.Nov 1 2017, 12:58 PM
12 ↗(On Diff #120443)

Looks like part of my comment disappeared. What I meant to add here is to ask you to please add checks for the code before and after the inline asm for both Power8 and Power9.

This revision was automatically updated to reflect the committed changes.
jtony marked 3 inline comments as done.