This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert.
ClosedPublic

Authored by amyk on Dec 13 2021, 6:40 PM.

Details

Summary

This patch updates the P10 patterns with a load feeding into an insertelt to utilize the
refactored load and store infrastructure, as well as updating any tests that exhibit any
codegen changes. Furthermore, custom legalization is added for v4f32 on Power9 and
above to not only assist with adjusting the refactored load/stores for P10 vector insert,
but also it enables the utilization of direct moves.

Diff Detail

Event Timeline

amyk created this revision.Dec 13 2021, 6:40 PM
amyk requested review of this revision.Dec 13 2021, 6:40 PM

A minor nit about the description - you talk about implementing a DAG combine but you implement a custom legalization.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10786–10792

This does a very good job explaining *what* is being done, which I don't think is particularly useful because the code is rather nicely self-documenting. What is missing is *why* this is being done.
We only do this with ISA 3.1 because on previous architectures, it is cheaper to do an lxsiwzx + a permute.

Note, this is also useful on Power9 because direct moves are cheaper than the alternative. So please pull this out and guard with Subtarget.hasP9Vector() (and of course change the respective tests).

10796

This isn't necessary. Let the DAG handle this for you.

  • Bitcast the vector
  • Bitcast the scalar
  • Produce a v4i32 insert
  • Bitcast and return the result
10799

It is not a good idea to insert an i32 into a v4f32 vector. You should bitcast V1 as well.

nemanjai requested changes to this revision.Jan 20 2022, 7:23 AM
This revision now requires changes to proceed.Jan 20 2022, 7:23 AM
amyk updated this revision to Diff 402091.Jan 21 2022, 1:17 PM
amyk retitled this revision from [PowerPC] Update P10 vector insert patterns to utilize the refactored load/store infrastructure. to [PowerPC] Update P10 vector insert patterns to use refactored load/stores, and update handling of v4f32 vector insert..
amyk edited the summary of this revision. (Show Details)

Address Nemanja's comments:

  • Pulling out the code and guarding it for P9 and above
  • Update test cases affected by the change
  • Update the handling to:
    • Bitcast the vector
    • Bitcast the scalar
    • Produce the v4i32 vector insert
    • Bitcast and return the result
nemanjai accepted this revision.Jan 28 2022, 5:29 AM

LGTM other than the change to the comment.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
10766–10771

Please note that your description re. Power10 is temporal in nature. The "refactored load and store infrastructure..." is newly refactored now. In the future, it will just be how we select loads and stores (i.e. code has no memory of how it looked before). I think it would suffice to add something like this as a comment:

// On targets with inexpensive direct moves (Power9 and up), a
// (insert_vector_elt v4f32:$vec, (f32 load)) is always better as an
// integer load since a single precision load will involve conversion
// to double precision on the load followed by another conversion
// to single precision.
This revision is now accepted and ready to land.Jan 28 2022, 5:29 AM
nemanjai added inline comments.Jan 28 2022, 5:40 AM
llvm/lib/Target/PowerPC/PPCInstrPrefix.td
2818–2819

I think that you should be able to remove all of the patterns for load+insert with f32 since they will not make it to instruction selection (i.e. they're all being converted to i32).

This revision was landed with ongoing or failed builds.Feb 1 2022, 6:48 AM
This revision was automatically updated to reflect the committed changes.
amyk marked an inline comment as done.Feb 1 2022, 6:49 AM