Page MenuHomePhabricator

Power9 - Add exploitation of vector load and store that do not require swaps
ClosedPublic

Authored by nemanjai on May 2 2016, 1:02 PM.

Details

Summary

The new lxvx/stxvx instructions do not require the swaps to line the elements up correctly. In order to select them over the lxvd2x/lxvw4x instructions which require swaps, the corresponding patterns are given a higher AddedComplexity.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 55870.May 2 2016, 1:02 PM
nemanjai retitled this revision from to Power9 - Add exploitation of vector load and store that do not require swaps.
nemanjai updated this object.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added a subscriber: llvm-commits.
nemanjai added inline comments.May 2 2016, 1:08 PM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
36

If there are objections to adding this, I'm happy to remove it. The reason I added it is that I find it frustrating that I can't specify FileCheck patterns that ensure that a VSX instruction's result is the correct operand of a VMX instruction and vice-versa.

I could also wrap this in an

#ifndef _NDEBUG

block, but then we would have to ensure that test cases that use it somehow require LLVM Asserts.

wschmidt edited edge metadata.May 2 2016, 1:18 PM

This looks fine to me. A couple of inline comments. I'll let someone else give final approval.

lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
36

I'm agnostic about that; however, you should remove the extra "l" from "respectively." :)

lib/Target/PowerPC/PPCInstrInfo.cpp
1012

Could use ternary operator here.

1135

Same comment.

nemanjai added inline comments.May 2 2016, 2:51 PM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
36

:) I think I need the little red squiggly line in Vim.

lib/Target/PowerPC/PPCInstrInfo.cpp
1012

Will do. I noticed that the if is quite extraneous here after I posted the patch.

nemanjai added inline comments.May 3 2016, 12:55 PM
lib/Target/PowerPC/PPCInstrVSX.td
2161

I'll add to the readme that patterns can be added to emit lxvd2x and friends in cases where we happen to want the elements in the reverse order (i.e. something like the vec_xl use as well as if the load is followed by a vector_shuffle that will reverse the elements).

nemanjai edited edge metadata.May 3 2016, 12:56 PM
nemanjai added a subscriber: echristo.

Few inline comments.

Thanks!

-eric

lib/Target/PowerPC/PPCISelLowering.cpp
10541

This comment and the isISA3_0 don't quite match up. At least I'm assuming that there may be more isa3.0 processors other than power9? If not, then why the feature :)

It also seems like this could be factored in a different way since you replicate it a bunch of times.

lib/Target/PowerPC/PPCInstrVSX.td
2161

Eh? What's with the AddedComplexity here?

ehsan resigned from this revision.May 4 2016, 8:57 AM
ehsan removed a reviewer: ehsan.
nemanjai added inline comments.May 4 2016, 9:20 AM
lib/Target/PowerPC/PPCISelLowering.cpp
10541

I'll change the comment to:
Not needed on CPUs that implement ISA 3.0 since we have a load that lines things up correctly.

I'll define a local var as follows and use it in conditions:
bool NeedsSwapsForVSXMemOps = Subtarget.hasVSX() && Subtarget.isLittleEndian() && !Subtarget.isISA3_0();

lib/Target/PowerPC/PPCInstrVSX.td
2161

I need these patterns to be preferred over the ones that use LXVD2X when we are on a CPU that implements ISA 3.0. This is the hack that we use in order to favour VSX instructions over other choices, so I just took it a step further to favour specific VSX instructions.

amehsan added inline comments.May 4 2016, 10:24 PM
test/CodeGen/PowerPC/swaps-le-1.ll
4

What is grep here (toward the end of the line)?

nemanjai added inline comments.May 5 2016, 12:34 AM
test/CodeGen/PowerPC/swaps-le-1.ll
4

Nice catch. I have no idea how the stray grep made it in there nor do I understand why it doesn't make the test case fail. In any case, I'll remove it.

amehsan added inline comments.May 5 2016, 4:17 AM
test/CodeGen/PowerPC/swaps-le-1.ll
4

Did you try this test case individually and it did not fail or you run all tests? I just came across this PR that might explain it if you run all tests and did not see an error:

https://llvm.org/bugs/show_bug.cgi?id=27652

nemanjai updated this revision to Diff 56717.May 10 2016, 8:14 AM

Addressed the comments:

  • Removed the odd "grep" from the test case (it appears I had added that to the patch file accidentally - it does cause failures)
  • Used ternary operator where it makes sense
  • Updated comments to refer to ISA 3.0 rather than P9 for selecting whether swaps are needed
kbarton added inline comments.May 11 2016, 9:12 AM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
445

Can't you use RegNum instead of Op.getReg() here?

nemanjai marked an inline comment as done.May 11 2016, 1:12 PM
nemanjai added inline comments.
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
445

Yeah, that's a good point. When I defined RegNum, I guess I forgot to change this one. I'll change it on the commit if no further changes are requested.

amehsan added inline comments.May 18 2016, 1:10 PM
test/CodeGen/PowerPC/vsx-p9.ll
1

mtriple here?

nemanjai marked an inline comment as done.May 18 2016, 1:53 PM
nemanjai added inline comments.
test/CodeGen/PowerPC/vsx-p9.ll
1

Yeah, absolutely. Thanks. One of those things I meant to do and forgot - since I really wanted to add the BE testing as well (i.e. to confirm that the same load/store code is emitted).

cycheng added inline comments.May 19 2016, 8:54 PM
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
36

It's good to have this option : )

test/CodeGen/PowerPC/swaps-le-1.ll
105

; CHECK-P9-NOT: xxswapd

Looks like it has been covered by "-implicit-check-not xxswapd"?
Or this is intentional?

test/CodeGen/PowerPC/vsx-p9.ll
103

I'm curious about that we don't add pattern match of i128 type for lxvs in PPCInstrVSX.td, but why it finally gets matched to lxvx?

test/CodeGen/PowerPC/vsx_shuffle_le.ll
81

So looks like use 'lxvd2x' can be more benefit?
If it is true, then, of course we don't have to handle it in current milestone, just a todo in future.

nemanjai added inline comments.May 20 2016, 12:17 AM
test/CodeGen/PowerPC/swaps-le-1.ll
105

Good point. I initially started by adding a bunch of CHECK-NOT's and then realized that adding the implicit one is easier.

test/CodeGen/PowerPC/vsx-p9.ll
103

The legalization code takes care of this as far as I remember.

test/CodeGen/PowerPC/vsx_shuffle_le.ll
81

Yes, I looked into this. I think this can actually be handled rather cleanly in the swap optimization pass. For situations where we actually want the value loaded to be permuted in the register, we can use the lxvd2x instruction.
However, such situations should be reasonably rare that we can defer this work as a "nice-to-have".

nemanjai updated this revision to Diff 59063.May 31 2016, 8:05 AM

Addressed the minor comments that were posted on the last review.

This patch was functionally tested on the P9 simulator and the behaviour with these loads and stores is the same as the behaviour on a Power8 machine with the swapping loads and stores.

echristo added inline comments.Jul 6 2016, 7:01 PM
lib/Target/PowerPC/PPCISelLowering.cpp
10545

Would it make sense for this to be a subtarget feature given all of the load/store instruction changes for vectors? Though I guess you couldn't just use Subtarget.isISA3_0 anymore as the canonical "am I targeting something P9 or later".

nemanjai added inline comments.Jul 7 2016, 9:03 AM
lib/Target/PowerPC/PPCISelLowering.cpp
10545

I can certainly define something like HasNonPermutingMemOps in PPC.td and make it part of the Power9 processor definition. The only reason it hasn't been defined is that we kind of decided not to have too much granularity with Power9 since there are no optional features. But if you feel that it might be useful to be able to turn this off and turn on the rest of the Power9 instructions, I can certainly do that.

Couple more inline comments... I'm just coming up to speed on the new vector stuff so excuse any obvious questions :)

lib/Target/PowerPC/PPCISelLowering.cpp
10545

Actually, you should probably just make it a PPCSubtarget routine, don't worry about the feature aspect - as you say, we don't need to worry about separating it out.

That way you can use the simple query all over the place.

lib/Target/PowerPC/PPCInstrInfo.cpp
1009–1010

... including here I imagine.

lib/Target/PowerPC/PPCInstrVSX.td
2161

Can we just exclude the other patterns? We don't ever want them to show up do we?

kbarton requested changes to this revision.Jul 11 2016, 11:30 AM
kbarton edited edge metadata.

As Eric suggested, remove the old patterns to ensure LXVX/STXVX instructions are generated, instead of using AddedComplexity.

lib/Target/PowerPC/PPCInstrVSX.td
2161

I agree, it would be better to just use the LXVX and STXVX instructions when they are available.
So, we would need to remove the previous patterns for ISA3_0, but keep them for older ISAs.

This revision now requires changes to proceed.Jul 11 2016, 11:30 AM
nemanjai added inline comments.Jul 11 2016, 11:57 AM
lib/Target/PowerPC/PPCInstrVSX.td
2161

@kbarton @echristo
Can you just elaborate a bit on what you mean by removing the other patterns?
If I remove all the patterns that generate LXV[DW][24]X/STXV[WD][24]X, we have will likely have issues on Power8 (assuming that these patterns were needed). So perhaps what I can do is predicate all the patterns for the old instructions with IsPreISA3_0 (which would be !Subtarget.isISA3_0()). Maybe even needSwapsForVSXMemoryOps().Then I wouldn't need the AddedComplexity.

nemanjai updated this revision to Diff 63797.Jul 13 2016, 5:46 AM
nemanjai edited edge metadata.

Removed the additional "AddedComplexity" clause to favour new loads/stores over the old ones.
Predicated the old loads and stores on previous ISA levels (prior to 3.0).
Provided a subtarget method for querying whether the target needs swaps for VSX memory operations.

nemanjai added inline comments.Jul 13 2016, 5:47 AM
lib/Target/PowerPC/PPCInstrInfo.cpp
1009–1010

I prefer to leave this query as-is because I don't think we would gain anything by emitting PPC::STXVD2X on a big endian Power9 either.

OK, I think this is better.
However, I still find this confusing because there are several different checks being used here in various places: ISA3_0, P9Vector and hasVSX. If they can be used interchangeably, we should pick one and use it, instead of mixing them up. I'll add specific comments inline as examples.

Regarding the testing, if I understand the patch correctly, we should also be able to do: -mcpu=pwr9 -mattr=-P9Vector, and get the same codegen as we would expect with -mcpu=pwr8 (i.e., generate the additional swap instructions). If that's true, can you please add these runsteps as well to ensure we get the right codegen?

lib/Target/PowerPC/PPCISelLowering.cpp
10541

Here the comments refer to ISA3_0. Does this imply P9Vector? What about hasVSX()?

lib/Target/PowerPC/PPCInstrInfo.cpp
1009–1010

Here we check for ISA3_0. Again, does this imply P9Vector()? Are the LXVX and STXVX instructions guarded by ISA3_0 or P9Vector?

lib/Target/PowerPC/PPCInstrVSX.td
89

Here we're checking for P9Vector only, no ISA3_0.

lib/Target/PowerPC/PPCSubtarget.h
279 ↗(On Diff #63797)

Here we're checking hasVSX() and ISA3_0, but no mention of P9Vector.

kbarton requested changes to this revision.Jul 13 2016, 12:27 PM
kbarton edited edge metadata.

Sorry for another revision, but I think we need to cleanup the way different pieces of this are guarded. Plus, we should add some additional testing to cover cases with P9 but no LXVX/STXVX instructions.

This revision now requires changes to proceed.Jul 13 2016, 12:27 PM

Sorry for another revision, but I think we need to cleanup the way different pieces of this are guarded. Plus, we should add some additional testing to cover cases with P9 but no LXVX/STXVX instructions.

I'm happy to publish another review, that's fine. Particularly because you've definitely pointed out a bug - as it is now, we'd try to use LXVX/STXVX to reload/spill even with -mcpu=pwr9 -mattr=-power9-vector which would probably result in a crash.

lib/Target/PowerPC/PPCISelLowering.cpp
10541

If you don't mind, I'd like to keep the text in the comment as "ISA 3.0" because that is when the actual instructions were introduced. I prefer to keep general comments like this descriptive and the use of P9Vector is a kind of an implementation detail that isn't necessarily relevant.

Of course, if you feel strongly about it, I'll certainly change it.

lib/Target/PowerPC/PPCInstrInfo.cpp
1009–1010

This is a good point. You're absolutely right - we need to be checking P9Vector because that is the Predicate in the .td file. I'll change this.

1132–1133

Same here. I'll change this to P9Vector.

lib/Target/PowerPC/PPCInstrVSX.td
89

Yup. This is the right thing to do - it's the other uses that are wrong. We need to do it this way because the new instructions are guarded by P9Vector.

lib/Target/PowerPC/PPCSubtarget.h
279 ↗(On Diff #63797)

Yup. This will change as well.

nemanjai updated this revision to Diff 64187.Jul 15 2016, 1:22 PM
nemanjai edited edge metadata.

Fixed the way we query for whether the new instructions are available (to match the predicate in the .td file).
Added run lines for combination -mcpu=pwr9 -mattr=-power9-vector to the test cases.

kbarton added inline comments.Aug 31 2016, 9:27 AM
lib/Target/PowerPC/PPCInstrVSX.td
2159–2178

Yes, I don't think you can remove them altogether, but predicate them so they only appear on preISA3.
Ideally you would want an inequality, but I'm not sure if that is possible. If you use !Subtarget.isISA3_0, then you need to make sure to include isISA3_0 in all future subtargets as well.
As you said, that should eliminate the need for the AddedComplexity, which would be good.

kbarton accepted this revision.Sep 21 2016, 11:30 AM
kbarton edited edge metadata.

LGTM

This revision is now accepted and ready to land.Sep 21 2016, 11:30 AM
nemanjai closed this revision.Sep 22 2016, 3:02 AM

Committed revision 282143.