We currently always build vectors from scalar non-constant values using stack operations which involves the load-hit-store hazard. This patch is the first in a series that will allow operations such as scalar to vector, extract vector element, etc. to be done using direct move instructions rather than memory operations.
Since this is just the first patch, the code in some cases is not yet optimal, but all scalar to vector operations involve fewer memory operations.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7504 | Just return the result of the DAG.getNode(...) call. Then you don't even need the { } around the body of the if. That having been said, it does not seem like you have to do this at all. Just declare the relevant SCALAR_TO_VECTOR as Legal, and pattern match it in the usual way in the .td file. | |
lib/Target/PowerPC/PPCISelLowering.h | ||
135 | Why are you adding these? What generates them? | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
1198 | Given that I don't see anything that actually generates this ISD nodes, I'm suspicions that you don't actually have tests for these. | |
1236 | Maybe we should name this BE_LOW_BYTE, or something like that? |
lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
7504 | I agree with the first point and can do as you suggested. Cannot use 'PPCmtvsrdVec' in an output pattern! Of course, I could do away with that SDAG node altogether and provide the same patterns for scalar_to_vector as I did for the new nodes. I just felt that the node that explicitly names the move to VSR to be more descriptive and allow for a different implementation of scalar_to_vector should a need for that arise in the future. | |
lib/Target/PowerPC/PPCISelLowering.h | ||
135 | Thank you for catching this. It turns out that this is one of those events where I have tried a number of approaches to get the results I want until I settled on the approach that I plan to stick with. However, these SDAG nodes were missed in the re-re-re-factoring :). | |
lib/Target/PowerPC/PPCInstrVSX.td | ||
1198 | Yes, sorry about this. See above. I will remove the SD nodes and the patterns from these instructions. | |
1236 | The BE version of this move will put the value into the most significant byte in the VSR. Similarly with the halfword, word and doubleword versions. The LE versions will place it into the corresponding least significant element. | |
1247 | The naming is different with LE since what is happening is fundamentally different. The move is the same for the byte, halfword and word. The doubleword version is the same move as for BE. That's why the LE versions have the "MoveWord" portion, then the "CopyWord" portion (for the regclass copy) and finaly the LE_BHW to signify that this is a byte/half/word correctly aligned for the shift to element 0 on LE. |
Removed the custom code that isn't necessary.
Added a test case specifically for scalar_to_vector nodes.
Renamed the DAG's in PPCInstrVSX.td.
Reconciled the predicate for the v4f32 version (it was predicated on HasDirectMove in the .td file and on hasP8Vector() in the c++ code).
A few issues and questions...
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
1198 | The comment at line 1195 is wrong. These are scalar conversions between single and double precision, right? | |
1245 | BE_DWORD_0? This isn't an LE pattern, so it needs a BE designator, correct? Even though it's used in an LE pattern, it is LE doubleword 1 there, so naming it BE_DWORD_0 is more expressive. | |
1252 | I find LE_BHW and LE_DBL to be pretty incomprehensible names. For consistency and understanding, I would name them LE_WORD_0 and LE_DWORD_0. Also, LE_CPW is arguably LE_WORD_1 and LE_CPD is LE_DWORD_1. This gives the more understandable: dag LE_DWORD_1 = (v2i64 (COPY_TO_REGCLASS BE_DWORD_0, VSRC)); reinforcing that LE doubleword 1 is the same as BE doubleword 0. I don't much care what you call LE_MVW so long as it is readable and implies moving an integer to a vector register. Longer meaningful names beat brief incomprehensible ones. Of course that's still better than long incomprehensible ones. ;) | |
test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll | ||
74 ↗ | (On Diff #30861) | I was confused by this at first, but now I get it. In all of these tests, the code generation that you are checking for is just for the %splat.splatinsert calculation, right? This is the part that translates into a scalar_to_vector node. I assume that the generated code follows this up with code to splat element 0 to the entire result register, correct? |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
1198 | Withdrawn per online discussion. |
lib/Target/PowerPC/PPCInstrVSX.td | ||
---|---|---|
1245 | Fair enough, the value needs to be shifted/swapped for LE, so I'll rename it to BE_DWORD_0. And the subsequent swap will put it in LE doubleword 0. | |
1252 | OK, I will rename them as you suggest: | |
test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll | ||
74 ↗ | (On Diff #30861) | Yes, the subsequent instruction will be a splat (VSX if we have one, VMX otherwise). I can add those to the check patterns but, I can't verify the correct register (because of the "off-by-32 relationship" between the VSX and VMX registers). I was actually hoping to use the FileCheck capability of checking line numbers (ranges), but that actually looks for the word line I think. So I can add the CHECK: vsplat[bhw] and just confirm the right element in the splat instruction if you'd like. |
test/CodeGen/PowerPC/p8-scalar_vector_conversions.ll | ||
---|---|---|
74 ↗ | (On Diff #30861) | No, that's ok. Just add a comment at the beginning of the file indicating that you are just checking the code generated for the insertelement. Alternatively, you can make this more obvious by putting the CHECK comment lines right after the insertelement in each test variant. |
@hfinkel@anl.gov Hi Hal, if you get the chance, can you please have a look and let me know if you're ok with the updates? Thank you.
Why are you adding these? What generates them?