This is an archive of the discontinued LLVM Phabricator instance.

Scalar to vector conversions using direct moves
ClosedPublic

Authored by nemanjai on Jul 23 2015, 12:53 PM.

Details

Summary

We currently always build vectors from scalar non-constant values using stack operations which involves the load-hit-store hazard. This patch is the first in a series that will allow operations such as scalar to vector, extract vector element, etc. to be done using direct move instructions rather than memory operations.
Since this is just the first patch, the code in some cases is not yet optimal, but all scalar to vector operations involve fewer memory operations.

Diff Detail

Repository
rL LLVM

Event Timeline

nemanjai updated this revision to Diff 30516.Jul 23 2015, 12:53 PM
nemanjai retitled this revision from to Scalar to vector conversions using direct moves.
nemanjai updated this object.
nemanjai added reviewers: wschmidt, hfinkel, kbarton, seurer.
nemanjai set the repository for this revision to rL LLVM.
nemanjai added a subscriber: llvm-commits.
hfinkel added inline comments.Jul 25 2015, 10:38 AM
lib/Target/PowerPC/PPCISelLowering.cpp
7504

Just return the result of the DAG.getNode(...) call. Then you don't even need the { } around the body of the if.

That having been said, it does not seem like you have to do this at all. Just declare the relevant SCALAR_TO_VECTOR as Legal, and pattern match it in the usual way in the .td file.

lib/Target/PowerPC/PPCISelLowering.h
135

Why are you adding these? What generates them?

lib/Target/PowerPC/PPCInstrVSX.td
1198

Given that I don't see anything that actually generates this ISD nodes, I'm suspicions that you don't actually have tests for these.

1236

Maybe we should name this BE_LOW_BYTE, or something like that?

nemanjai added inline comments.Jul 27 2015, 9:50 AM
lib/Target/PowerPC/PPCISelLowering.cpp
7504

I agree with the first point and can do as you suggested.
However, I felt that this way is a little cleaner than doing it all in the .td file. If I try to use PPCmtvsrdVec in an output pattern, tblgen gives me the message:

Cannot use 'PPCmtvsrdVec' in an output pattern!

Of course, I could do away with that SDAG node altogether and provide the same patterns for scalar_to_vector as I did for the new nodes. I just felt that the node that explicitly names the move to VSR to be more descriptive and allow for a different implementation of scalar_to_vector should a need for that arise in the future.
And again, I will implement it however you would prefer.

lib/Target/PowerPC/PPCISelLowering.h
135

Thank you for catching this. It turns out that this is one of those events where I have tried a number of approaches to get the results I want until I settled on the approach that I plan to stick with. However, these SDAG nodes were missed in the re-re-re-factoring :).
The idea was to emit these SDAG nodes when we need scalar to vector conversions so that they can be matched, but since they will only match one instruction, they're kind of pointless.
I will remove them in the next review.

lib/Target/PowerPC/PPCInstrVSX.td
1198

Yes, sorry about this. See above. I will remove the SD nodes and the patterns from these instructions.

1236

The BE version of this move will put the value into the most significant byte in the VSR. Similarly with the halfword, word and doubleword versions. The LE versions will place it into the corresponding least significant element.
Maybe BE_MSB? BE_HI_BYTE? And correspondingly for the half, word and double?

1247

The naming is different with LE since what is happening is fundamentally different. The move is the same for the byte, halfword and word. The doubleword version is the same move as for BE. That's why the LE versions have the "MoveWord" portion, then the "CopyWord" portion (for the regclass copy) and finaly the LE_BHW to signify that this is a byte/half/word correctly aligned for the shift to element 0 on LE.
I can put a comment to this end in the code.

nemanjai updated this revision to Diff 30861.Jul 28 2015, 3:05 PM

Removed the custom code that isn't necessary.
Added a test case specifically for scalar_to_vector nodes.
Renamed the DAG's in PPCInstrVSX.td.
Reconciled the predicate for the v4f32 version (it was predicated on HasDirectMove in the .td file and on hasP8Vector() in the c++ code).

wschmidt edited edge metadata.Jul 29 2015, 11:39 AM

A few issues and questions...

lib/Target/PowerPC/PPCInstrVSX.td
1198

The comment at line 1195 is wrong. These are scalar conversions between single and double precision, right?

1245

BE_DWORD_0? This isn't an LE pattern, so it needs a BE designator, correct? Even though it's used in an LE pattern, it is LE doubleword 1 there, so naming it BE_DWORD_0 is more expressive.

1252

I find LE_BHW and LE_DBL to be pretty incomprehensible names. For consistency and understanding, I would name them LE_WORD_0 and LE_DWORD_0. Also, LE_CPW is arguably LE_WORD_1 and LE_CPD is LE_DWORD_1. This gives the more understandable:

dag LE_DWORD_1 = (v2i64 (COPY_TO_REGCLASS BE_DWORD_0, VSRC));

reinforcing that LE doubleword 1 is the same as BE doubleword 0.

I don't much care what you call LE_MVW so long as it is readable and implies moving an integer to a vector register. Longer meaningful names beat brief incomprehensible ones. Of course that's still better than long incomprehensible ones. ;)

test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
74 ↗(On Diff #30861)

I was confused by this at first, but now I get it. In all of these tests, the code generation that you are checking for is just for the %splat.splatinsert calculation, right? This is the part that translates into a scalar_to_vector node. I assume that the generated code follows this up with code to splat element 0 to the entire result register, correct?

wschmidt added inline comments.Jul 29 2015, 12:53 PM
lib/Target/PowerPC/PPCInstrVSX.td
1198

Withdrawn per online discussion.

nemanjai added inline comments.Jul 29 2015, 12:54 PM
lib/Target/PowerPC/PPCInstrVSX.td
1245

Fair enough, the value needs to be shifted/swapped for LE, so I'll rename it to BE_DWORD_0. And the subsequent swap will put it in LE doubleword 0.

1252

OK, I will rename them as you suggest:
LE_BHW -> LE_WORD_0
LE_CPW -> LE_WORD_1
LE_DBL -> LE_DWORD_0
LE_CPD -> LE_DWORD_1

test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
74 ↗(On Diff #30861)

Yes, the subsequent instruction will be a splat (VSX if we have one, VMX otherwise). I can add those to the check patterns but, I can't verify the correct register (because of the "off-by-32 relationship" between the VSX and VMX registers). I was actually hoping to use the FileCheck capability of checking line numbers (ranges), but that actually looks for the word line I think. So I can add the CHECK: vsplat[bhw] and just confirm the right element in the splat instruction if you'd like.

wschmidt added inline comments.Jul 29 2015, 1:10 PM
test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll
74 ↗(On Diff #30861)

No, that's ok. Just add a comment at the beginning of the file indicating that you are just checking the code generated for the insertelement. Alternatively, you can make this more obvious by putting the CHECK comment lines right after the insertelement in each test variant.

@hfinkel@anl.gov Hi Hal, if you get the chance, can you please have a look and let me know if you're ok with the updates? Thank you.

hfinkel accepted this revision.Aug 9 2015, 8:02 PM
hfinkel edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Aug 9 2015, 8:02 PM
nemanjai closed this revision.Aug 13 2015, 10:41 AM

Committed revision 244921.