The new lxvx/stxvx instructions do not require the swaps to line the elements up correctly. In order to select them over the lxvd2x/lxvw4x instructions which require swaps, the corresponding patterns are given a higher AddedComplexity.
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp | ||
---|---|---|
36 | If there are objections to adding this, I'm happy to remove it. The reason I added it is that I find it frustrating that I can't specify FileCheck patterns that ensure that a VSX instruction's result is the correct operand of a VMX instruction and vice-versa. I could also wrap this in an #ifndef _NDEBUG block, but then we would have to ensure that test cases that use it somehow require LLVM Asserts. |
This looks fine to me. A couple of inline comments. I'll let someone else give final approval.
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp | ||
---|---|---|
36 | I'm agnostic about that; however, you should remove the extra "l" from "respectively." :) | |
lib/Target/PowerPC/PPCInstrInfo.cpp | ||
1012 | Could use ternary operator here. | |
1134 | Same comment. |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
2196 | I'll add to the readme that patterns can be added to emit lxvd2x and friends in cases where we happen to want the elements in the reverse order (i.e. something like the vec_xl use as well as if the load is followed by a vector_shuffle that will reverse the elements). |
Few inline comments.
Thanks!
-eric
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10580 | This comment and the isISA3_0 don't quite match up. At least I'm assuming that there may be more isa3.0 processors other than power9? If not, then why the feature :) It also seems like this could be factored in a different way since you replicate it a bunch of times. | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
2196 | Eh? What's with the AddedComplexity here? |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10580 | I'll change the comment to: I'll define a local var as follows and use it in conditions: | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
2196 | I need these patterns to be preferred over the ones that use LXVD2X when we are on a CPU that implements ISA 3.0. This is the hack that we use in order to favour VSX instructions over other choices, so I just took it a step further to favour specific VSX instructions. |
test/CodeGen/PowerPC/swaps-le-1.ll | ||
---|---|---|
4 | What is grep here (toward the end of the line)? |
test/CodeGen/PowerPC/swaps-le-1.ll | ||
---|---|---|
4 | Nice catch. I have no idea how the stray grep made it in there nor do I understand why it doesn't make the test case fail. In any case, I'll remove it. |
test/CodeGen/PowerPC/swaps-le-1.ll | ||
---|---|---|
4 | Did you try this test case individually and it did not fail or you run all tests? I just came across this PR that might explain it if you run all tests and did not see an error: |
Addressed the comments:
- Removed the odd "grep" from the test case (it appears I had added that to the patch file accidentally - it does cause failures)
- Used ternary operator where it makes sense
- Updated comments to refer to ISA 3.0 rather than P9 for selecting whether swaps are needed
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp | ||
---|---|---|
445 | Can't you use RegNum instead of Op.getReg() here? |
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp | ||
---|---|---|
445 | Yeah, that's a good point. When I defined RegNum, I guess I forgot to change this one. I'll change it on the commit if no further changes are requested. |
test/CodeGen/PowerPC/vsx-p9.ll | ||
---|---|---|
2 | mtriple here? |
test/CodeGen/PowerPC/vsx-p9.ll | ||
---|---|---|
2 | Yeah, absolutely. Thanks. One of those things I meant to do and forgot - since I really wanted to add the BE testing as well (i.e. to confirm that the same load/store code is emitted). |
lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp | ||
---|---|---|
36 | It's good to have this option : ) | |
test/CodeGen/PowerPC/swaps-le-1.ll | ||
110 | ; CHECK-P9-NOT: xxswapd Looks like it has been covered by "-implicit-check-not xxswapd"? | |
test/CodeGen/PowerPC/vsx-p9.ll | ||
104 | I'm curious about that we don't add pattern match of i128 type for lxvs in PPCInstrVSX.td, but why it finally gets matched to lxvx? | |
test/CodeGen/PowerPC/vsx_shuffle_le.ll | ||
85 | So looks like use 'lxvd2x' can be more benefit? |
test/CodeGen/PowerPC/swaps-le-1.ll | ||
---|---|---|
110 | Good point. I initially started by adding a bunch of CHECK-NOT's and then realized that adding the implicit one is easier. | |
test/CodeGen/PowerPC/vsx-p9.ll | ||
104 | The legalization code takes care of this as far as I remember. | |
test/CodeGen/PowerPC/vsx_shuffle_le.ll | ||
85 | Yes, I looked into this. I think this can actually be handled rather cleanly in the swap optimization pass. For situations where we actually want the value loaded to be permuted in the register, we can use the lxvd2x instruction. |
This patch was functionally tested on the P9 simulator and the behaviour with these loads and stores is the same as the behaviour on a Power8 machine with the swapping loads and stores.
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10584 | I can certainly define something like HasNonPermutingMemOps in PPC.td and make it part of the Power9 processor definition. The only reason it hasn't been defined is that we kind of decided not to have too much granularity with Power9 since there are no optional features. But if you feel that it might be useful to be able to turn this off and turn on the rest of the Power9 instructions, I can certainly do that. |
Couple more inline comments... I'm just coming up to speed on the new vector stuff so excuse any obvious questions :)
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10584 | Actually, you should probably just make it a PPCSubtarget routine, don't worry about the feature aspect - as you say, we don't need to worry about separating it out. That way you can use the simple query all over the place. | |
lib/Target/PowerPC/PPCInstrInfo.cpp | ||
1009–1010 | ... including here I imagine. | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
2196 | Can we just exclude the other patterns? We don't ever want them to show up do we? |
As Eric suggested, remove the old patterns to ensure LXVX/STXVX instructions are generated, instead of using AddedComplexity.
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
2196 | I agree, it would be better to just use the LXVX and STXVX instructions when they are available. |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
2196 | @kbarton @echristo |
Removed the additional "AddedComplexity" clause to favour new loads/stores over the old ones.
Predicated the old loads and stores on previous ISA levels (prior to 3.0).
Provided a subtarget method for querying whether the target needs swaps for VSX memory operations.
lib/Target/PowerPC/PPCInstrInfo.cpp | ||
---|---|---|
1009–1010 | I prefer to leave this query as-is because I don't think we would gain anything by emitting PPC::STXVD2X on a big endian Power9 either. |
OK, I think this is better.
However, I still find this confusing because there are several different checks being used here in various places: ISA3_0, P9Vector and hasVSX. If they can be used interchangeably, we should pick one and use it, instead of mixing them up. I'll add specific comments inline as examples.
Regarding the testing, if I understand the patch correctly, we should also be able to do: -mcpu=pwr9 -mattr=-P9Vector, and get the same codegen as we would expect with -mcpu=pwr8 (i.e., generate the additional swap instructions). If that's true, can you please add these runsteps as well to ensure we get the right codegen?
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10580 | Here the comments refer to ISA3_0. Does this imply P9Vector? What about hasVSX()? | |
lib/Target/PowerPC/PPCInstrInfo.cpp | ||
1009–1010 | Here we check for ISA3_0. Again, does this imply P9Vector()? Are the LXVX and STXVX instructions guarded by ISA3_0 or P9Vector? | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
95 | Here we're checking for P9Vector only, no ISA3_0. | |
lib/Target/PowerPC/PPCSubtarget.h | ||
279 | Here we're checking hasVSX() and ISA3_0, but no mention of P9Vector. |
Sorry for another revision, but I think we need to cleanup the way different pieces of this are guarded. Plus, we should add some additional testing to cover cases with P9 but no LXVX/STXVX instructions.
I'm happy to publish another review, that's fine. Particularly because you've definitely pointed out a bug - as it is now, we'd try to use LXVX/STXVX to reload/spill even with -mcpu=pwr9 -mattr=-power9-vector which would probably result in a crash.
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10580 | If you don't mind, I'd like to keep the text in the comment as "ISA 3.0" because that is when the actual instructions were introduced. I prefer to keep general comments like this descriptive and the use of P9Vector is a kind of an implementation detail that isn't necessarily relevant. Of course, if you feel strongly about it, I'll certainly change it. | |
lib/Target/PowerPC/PPCInstrInfo.cpp | ||
1009–1010 | This is a good point. You're absolutely right - we need to be checking P9Vector because that is the Predicate in the .td file. I'll change this. | |
1131–1132 | Same here. I'll change this to P9Vector. | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
95 | Yup. This is the right thing to do - it's the other uses that are wrong. We need to do it this way because the new instructions are guarded by P9Vector. | |
lib/Target/PowerPC/PPCSubtarget.h | ||
279 | Yup. This will change as well. |
Fixed the way we query for whether the new instructions are available (to match the predicate in the .td file).
Added run lines for combination -mcpu=pwr9 -mattr=-power9-vector to the test cases.
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
2196 | Yes, I don't think you can remove them altogether, but predicate them so they only appear on preISA3. |
If there are objections to adding this, I'm happy to remove it. The reason I added it is that I find it frustrating that I can't specify FileCheck patterns that ensure that a VSX instruction's result is the correct operand of a VMX instruction and vice-versa.
I could also wrap this in an
block, but then we would have to ensure that test cases that use it somehow require LLVM Asserts.