This patch updates the P10 patterns with a load feeding into an insertelt to utilize the
refactored load and store infrastructure, as well as updating any tests that exhibit any
codegen changes. Furthermore, custom legalization is added for v4f32 on Power9 and
above to not only assist with adjusting the refactored load/stores for P10 vector insert,
but also it enables the utilization of direct moves.
Details
- Reviewers
nemanjai - Group Reviewers
Restricted Project - Commits
- rG0d6e64755acf: [PowerPC] Update P10 vector insert patterns to use refactored load/stores, and…
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
A minor nit about the description - you talk about implementing a DAG combine but you implement a custom legalization.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10786–10792 | This does a very good job explaining *what* is being done, which I don't think is particularly useful because the code is rather nicely self-documenting. What is missing is *why* this is being done. Note, this is also useful on Power9 because direct moves are cheaper than the alternative. So please pull this out and guard with Subtarget.hasP9Vector() (and of course change the respective tests). | |
10796 | This isn't necessary. Let the DAG handle this for you.
| |
10799 | It is not a good idea to insert an i32 into a v4f32 vector. You should bitcast V1 as well. |
Address Nemanja's comments:
- Pulling out the code and guarding it for P9 and above
- Update test cases affected by the change
- Update the handling to:
- Bitcast the vector
- Bitcast the scalar
- Produce the v4i32 vector insert
- Bitcast and return the result
LGTM other than the change to the comment.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp | ||
---|---|---|
10766–10771 | Please note that your description re. Power10 is temporal in nature. The "refactored load and store infrastructure..." is newly refactored now. In the future, it will just be how we select loads and stores (i.e. code has no memory of how it looked before). I think it would suffice to add something like this as a comment: // On targets with inexpensive direct moves (Power9 and up), a // (insert_vector_elt v4f32:$vec, (f32 load)) is always better as an // integer load since a single precision load will involve conversion // to double precision on the load followed by another conversion // to single precision. |
llvm/lib/Target/PowerPC/PPCInstrPrefix.td | ||
---|---|---|
2818–2819 | I think that you should be able to remove all of the patterns for load+insert with f32 since they will not make it to instruction selection (i.e. they're all being converted to i32). |
Please note that your description re. Power10 is temporal in nature. The "refactored load and store infrastructure..." is newly refactored now. In the future, it will just be how we select loads and stores (i.e. code has no memory of how it looked before). I think it would suffice to add something like this as a comment: