This is an archive of the discontinued LLVM Phabricator instance.

Power9 - Part-word VSX integer scalar loads/stores and sign extend instructions
ClosedPublic

Authored by nemanjai on Aug 4 2016, 1:07 AM.

Details

Summary

This patch uses a portion of https://reviews.llvm.org/D20310 that removes the VSHRC register class. The reason for this is that these instructions have a similar need for the changes to the register classes and the other revision is on hold for now. When this revision is committed, the other one will simply be an addition of the patterns for the instructions.

The new instructions are useful for a few purposes:

  • Int to Fp conversions of 1 or 2-byte values loaded from memory
  • Building vectors of 1 or 2-byte integers with values loaded from memory
  • Storing individual 1 or 2-byte elements from integer vectors

This patch implements all of those uses.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 66770.Aug 4 2016, 1:07 AM
nemanjai retitled this revision from to Power9 - Part-word VSX integer scalar loads/stores and sign extend instructions.
nemanjai updated this object.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added subscribers: echristo, llvm-commits.

Add Tom.

Hi Nemanja,
Thank you for working on it :D
CY

kbarton added inline comments.Aug 31 2016, 12:57 PM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
476

What does this comment mean?
Does this need to be done? If not, can it be removed?

lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
354

This routine is defined as static in at least two files.
Can we add it to a more general location and just define it once?

lib/Target/PowerPC/PPCAsmPrinter.cpp
165

Same comment as above. Can these go into a more general location?

177

Extraneous comment?

lib/Target/PowerPC/PPCISelLowering.cpp
10559

Was this new section formatted with clang-format?
The indentation looks strange to me, but this might be what it does when you have these types of long lines.

cycheng added inline comments.Sep 2 2016, 4:06 AM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
476

Because ISA3.0 add new VSX instruction formats, so couple of new VSX instructions (xsabsqp, xsnabsqp, ...) uses V registers.

When we want to print operand for VSX instructions, we have to know which register class it is.

if ((MII.get(MI->getOpcode()).TSFlags & PPCII::UseVSXReg)) {

if PPCII::UseVSXReg is true, then it means the VSX instruction use VSX registers (without any V registers.), so the following code will treat these register as VSX register.

if (isVRRegister(Reg))
  Reg = PPC::VSX32 + (Reg - PPC::V0);

But because we use v0-v31 to represent vs32-vs63, so you will see the Reg is PPC::V0 - PPC::V31, so we get the offset value, and add that offset back to base PPC::VSX32.

The purpose here is, we want to translate v0-v31 back to vs32-vs63.

lib/Target/PowerPC/PPCAsmPrinter.cpp
177

Same logic as PPCInstPrinter::printOperand, maybe we can replace this comment with short one.

nemanjai added inline comments.Sep 28 2016, 8:20 AM
lib/Target/PowerPC/PPCISelLowering.cpp
10559

Yeah. Well I use Vim and the provided .vimrc which indents things according to the policy. It will line up the conditions when they're parenthesized (i.e. in an if statement). But for an assignment/initialization, it indents the continued lines 1 tab stop (2 spaces) from the indentation of the declarator.

nemanjai updated this revision to Diff 72834.Sep 28 2016, 8:27 AM
nemanjai edited edge metadata.

Moved the definition of the static functions that were defined in 3 places to the class definition.
Clarified the comment regarding the handling of VMX (Altivec) registers.
Rebased the patch on a more recent revision (r282478) since a lot of files and behaviour changed.

Please note: there are a couple of FIXME's in the test cases. The reason for this is that there are a number of instructions we exploited recently that can match BUILD_VECTOR nodes. Some of the previously committed patches make the code in this patch unable to match a BUILD_VECTOR node under very specific conditions. I have a subsequent patch coming up which will take all the recent changes into account and produce optimal code for BUILD_VECTOR nodes for both Power8 and Power9.

@hfinkel Hi Hal, this is the patch I mentioned that I'd like you to have a look at since it involves that infrastructural change that CY started working on - removing the VSHRC register class.

kbarton edited edge metadata.Oct 3 2016, 8:30 AM

Aside from minor comments this LGTM.
I think it would be good for @hfinkel take a quick look too.

lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
474

This should change to C++-style comments (i.e., //) as per the comment-formatting guidelines: http://llvm.org/docs/CodingStandards.html#comment-formatting

lib/Target/PowerPC/PPCAsmPrinter.cpp
174

Reformat comments

lib/Target/PowerPC/PPCCallingConv.td
71

This is very minor, but this used to be sorted by integer types (in decending order of vector elements), followed by floats sorted in the same way. To maintain this, v2f64 should be moved to the end after v4f32 unless there is a specific reason it has been put where it is.

122

Same comment about placement of v2f64.

192

Same comment about v2f64

lib/Target/PowerPC/PPCISelLowering.cpp
7079

Could this condition be combined with the one below?

10577

Is FP really needed? Can you just return on the lines where FP is being assigned?

10828

ByteWidth might not be the best name here, since it shouldn't vary ;)

10829

I don't see ByteWidth used anywhere else, so you can probably do away with it by just copying the logic to assign ByteWidth into this constructor call.

lib/Target/PowerPC/PPCInstrInfo.cpp
1063

This is duplicated in both storeRegToStackSlot and loadRegFromStackSlot.
This should be refactored into a separate method to get the TargetRegisterClass and called from both these methods instead, to ensure consistency.

lib/Target/PowerPC/PPCMIPeephole.cpp
198

Is it possible for MyOpcode to be VSPLTBs also? I don't see this case handled anywhere and it's not clear that this situation could not be symmetric (i.e., MyOpcode == PPC::VSPLTBs and DefOpcode == PPC::VSPLTB).

lib/Target/PowerPC/PPCRegisterInfo.td
286

Same comment about ordering.

kbarton accepted this revision.Oct 3 2016, 8:31 AM
kbarton edited edge metadata.
This revision is now accepted and ready to land.Oct 3 2016, 8:31 AM
hfinkel added inline comments.Oct 3 2016, 9:56 AM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
479

We don't need to mention how it used to work, that won't help anyone looking at the current code, and might be confusing.

lib/Target/PowerPC/PPCAsmPrinter.cpp
179

Same here; we don't need to mention how it used to work.

lib/Target/PowerPC/PPCInstrInfo.cpp
1058

We should make this comment more specific, and explain that the difference is that the lanes are stored in a different order.

1061

It would be better if we could explain here why this happens. It is not obvious because the live range that was split via the spill/restore would have had some particular register class.

lib/Target/PowerPC/PPCInstrVSX.td
96

I don't really understand what this is saying and why using the same register would matter.

1125

Why don't you set UseVSXReg = 1 in an outer block (essentially over all definitions), and then override by setting UseVSXReg = 0 for those instructions that shouldn't have it. I'm under the impression that this is a smaller set.

nemanjai added inline comments.Oct 3 2016, 11:33 AM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
474

Will do.

479

OK, I'll remove the last sentence.

lib/Target/PowerPC/PPCAsmPrinter.cpp
174

Will do.

179

OK, will remove the last sentence.

lib/Target/PowerPC/PPCCallingConv.td
71

I don't think the order implies anything in this list. I'll move v2f64 to the end (after v4f32).

122

Yup, will move it.

192

Yup.

lib/Target/PowerPC/PPCISelLowering.cpp
7079

Yeah, the one below was an afterthought and I guess I just missed adding it to this one. I'll combine them.

10577

Yup, I'll rewrite it that way.

10829

Yeah, I'll combine it.

lib/Target/PowerPC/PPCInstrInfo.cpp
1058

I'll rewrite it as:
We need to avoid a situation in which the value from a VRRC register is spilled using an Altivec instruction and reloaded into
a VSRC register using a VSX instruction. The issue with this is that the VSX load/store instructions swap the doublewords
in the vector and the Altivec ones don't. The register classes on the spill/reload may be different if the register is defined using
an Altivec instruction and is then used by a VSX instruction.

lib/Target/PowerPC/PPCInstrVSX.td
96

If you look at the instruction where I use this for an example:

def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
           (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>;

If I were to replace that with

def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
           (v2i64 (XXPERMDI (LXSIHZX xoaddr:$src), (LXSIHZX xoaddr:$src), 0))>;

The output DAG actually produces code that will perform both loads into different registers (i.e. two separate LXSIHZX instructions) and will do the XXPERMDI on the result registers. I just found using a version of the instruction that has a single input register operand to be an easy way to avoid this.

1125

I think that we'd have to interrupt this block for definitions that do not inherit from Instruction such as MovesToVSR, VectorExtractions, etc.
In any case, I think that over time, our use of these blocks has introduced some subtle issues that may or may not ever manifest themselves.
For example, we may have a

let Predicates = [ListA] in {
  let Predicates = [ListB] in {
  }
}

where the content of the inner block should really have

let Predicates = [ListA, ListB] in {
}

but of course, it does not since the inner assignment overrides the Predicates rather than appending tothem.
I plan to go through the file and clean this up a bit but I'd prefer to do this as a separate patch.

lib/Target/PowerPC/PPCMIPeephole.cpp
198

We don't currently have any patterns that will produce such a combination. Namely, the versions suffixed with "s" are instructions that splat a scalar (actually loaded as a scalar) into the vector register. Without this scalar load, we should not be using these scalar splats. So a VSPLTBs can only be fed by an LXSIBZX (or I suppose also an MTVSR[DW]) but never a VSPLTB.

lib/Target/PowerPC/PPCRegisterInfo.td
286

OK, I'll reorder this as well.

@hfinkel @kbarton Do you guys want me to post an updated patch or just address the comments in the commit? I'm sorry to be spamming you, but I'd like to commit this patch so that I can get the other one in (as well as having clear direction on the one I'm working on now).

hfinkel edited edge metadata.Oct 3 2016, 11:56 AM

@hfinkel @kbarton Do you guys want me to post an updated patch or just address the comments in the commit? I'm sorry to be spamming you, but I'd like to commit this patch so that I can get the other one in (as well as having clear direction on the one I'm working on now).

Based on your responses, I'm find with you addressing the comments and committing.

nemanjai closed this revision.Oct 4 2016, 12:08 AM

Committed revision 283190.

shawnl added a subscriber: shawnl.Jul 2 2019, 7:52 AM

I think the essential functionality of this patch should be in LLVM and not Clang, so that all front-ends can benefit. To many generally useful things are in Clang when they should be in LLVM (e.g. C ABI for ARM and x86; ranged switch statements). I opened an upstream bug to discuss this. https://github.com/clang-randstruct/llvm-project/issues/52

Herald added a project: Restricted Project. · View Herald TranscriptJul 2 2019, 7:52 AM
Herald added a subscriber: jsji. · View Herald Transcript
shawnl added a comment.Jul 2 2019, 7:53 AM

Oh sorry, wrong bug.