This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Simplify fp-to-int store optimization
ClosedPublic

Authored by qiucf on Jan 11 2023, 2:14 AM.

Details

Reviewers
nemanjai
shchenz
lei
Group Reviewers
Restricted Project
Commits
rG8064caf83fb1: [PowerPC] Simplify fp-to-int store optimization
Summary

On PowerPC VSX targets, fp-to-int will be transformed into xscv* + mfvsr. When the result is to be stored, mfvsr can be replaced by a direct store.

This patch simplifies the optimization by using existing fp-to-int code, which also helps for:

  1. Strict floating point support of this optimization (in another patch)
  2. Better codegen from CSE

Diff Detail

Event Timeline

qiucf created this revision.Jan 11 2023, 2:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2023, 2:14 AM
qiucf requested review of this revision.Jan 11 2023, 2:14 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 11 2023, 2:14 AM
shchenz added inline comments.Jan 11 2023, 11:49 PM
llvm/test/CodeGen/PowerPC/store_fptoi.ll
1032

From the case change, I am thinking common the 3 xscvdpsxds sounds like making more sense?

qiucf updated this revision to Diff 494157.Feb 1 2023, 9:18 PM
qiucf marked an inline comment as done.
qiucf retitled this revision from [PowerPC] Only optimize store of fptoint for single used to [PowerPC] Simplify fp-to-int store optimization.
qiucf edited the summary of this revision. (Show Details)
qiucf updated this revision to Diff 495004.Feb 5 2023, 10:46 PM
shchenz added inline comments.Apr 11 2023, 8:11 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8075

This change is not enough I think. You extend this function to be used before type legalizer and make this function depend on the caller that will not pass other illegal types except i8 and i16. How about simple types like i2 and i4?

We need a guard for the type in this function.

8076

And a case please for i16 or i8

8089

If we plan to support type MVT::f128, we may need to change the PPCISD node names to generic ones. These names are 1:1 map to PPC instructions and these instructions assume the types are f64.

14878

We may need to following this logic and consider about adding the users back to the work list.

llvm/lib/Target/PowerPC/PPCInstrP10.td
1257

These two patterns are same with below two patterns. We still need to distinguish f64 and f128 input for PPCstore_scal_int_from_vsr.

2208

Same like above, there are duplicated patterns now as f64 and f128 are not distinguished

If this patch can optimize the redundant xscvdpsxws away for case: https://godbolt.org/z/zWsaae4j4 ?

qiucf updated this revision to Diff 513057.Apr 12 2023, 9:38 PM
qiucf marked 3 inline comments as done.

If this patch can optimize the redundant xscvdpsxws away for case: https://godbolt.org/z/zWsaae4j4 ?

Yes, one less xscvdpsxws.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8075

combineStoreFPToInt already rejected illegal types. And codegen for i2 i4 are good:

xscvdpsxws 0, 1
mffprwz	3, 0
clrlwi	3, 3, 28
stb 3, 0(4)
stb 3, 0(5)
stb 3, 0(6)
blr
8089

Thanks for reminding about this. I think that deserves another patch. I propose renaming them into CONV_FP_(I|U)(32|64)

14878

Is that necessary? DAG combiner will add the users into worklist.

qiucf added inline comments.Apr 13 2023, 7:58 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14878

The root node to be returned is the ST_VSR_SCAL_INT, then FCTIDZ, then FP_EXTEND (if f32). The FP_EXTEND from f32 to f64 will disappear after ISel. The others are PPC-specific nodes so adding them into worklist manually will not help.

Yes, one less xscvdpsxws.

Good to hear that. This patch looks right to me. Some more comments

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
8075

OK. But if we assume caller will only pass certain types to this function, we may need to add an assertion here? Otherwise when there is a new caller for this function in future, it is not so obvious that this function has assumption for the types. For example, if we pass i2 or i4 or even pass i8 or i16 on a pwr8 target, we get llvm_unreachable("Unhandled FP_TO_INT type in custom expander!");?

8128

Should we use MVT::i64 for DestTy if the target is 64-bit? With this, we should still have accurate round result if the input float is between 2^31 and 2^63 -1 before truncating.

14878

OK, fair. Thanks for checking.

llvm/lib/Target/PowerPC/PPCInstrP10.td
1258

Can you explain why we need to copy to regclass VSRC? The definition of PSTXSDpc seems requiring RC to be VFRC for $src.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
3161

Compared with the lines at 4046, we seems missing some pattern here? This is for [HasVSX, NoP9Vector].

3311

This is for [HasVSX, HasP8Vector]

4046

This is for HasVSX, HasP9Vector.

Except the patterns for STXSIHX and STXSIBX(Power9 specific), all other patterns seems common for these three predicates: [HasVSX, HasP9Vector], [HasVSX, HasP8Vector], [HasVSX, NoP9Vector]. Can we try to refactor these patterns a little bit, there seems some redundancy with current implementation?

qiucf updated this revision to Diff 517436.Apr 26 2023, 7:25 PM
qiucf marked 2 inline comments as done.

Address comments and further simplify tablegen patterns.

qiucf marked 2 inline comments as done.Apr 26 2023, 7:28 PM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCInstrP10.td
1258

Thanks, I updated the copied classes to respective regclass of the instructions. This COPY_TO_REGCLASS is needed, otherewise we'll get a HW typeset is empty build failure.

llvm/lib/Target/PowerPC/PPCInstrVSX.td
3161

This is for pwr7/pwr8 only, other patterns are located at HasP8Vector HasVSX section, so that they can be shared with pwr9.

shchenz added inline comments.May 7 2023, 8:49 PM
llvm/lib/Target/PowerPC/PPCInstrP10.td
1258

Yeah, I know we need a COPY_TO_REGCLASS, my confusion is why the dest RC is VSRC, not the original VFRC as the $src for PSTXSDpc requires VFRC.

qiucf updated this revision to Diff 522488.May 16 2023, 1:21 AM
qiucf added inline comments.
llvm/lib/Target/PowerPC/PPCInstrP10.td
1258

I updated them accordingly, they should get missed in last revision.

As of why VSRC also passes, I think the classes will be automatically shortened.

shchenz accepted this revision as: shchenz.May 17 2023, 10:25 PM

LGTM. No more comments from me.

Please hold on some days in case other reviewers have concerns. Thanks very much for this improvement.

llvm/lib/Target/PowerPC/PPCInstrP10.td
1258

Thanks, yes, VFRC matchs what I thought.

This revision is now accepted and ready to land.May 17 2023, 10:25 PM

And look forward to your refactor patch related to the PPCISD nodes.

This revision was landed with ongoing or failed builds.May 23 2023, 1:50 AM
This revision was automatically updated to reflect the committed changes.
vitalybuka reopened this revision.May 24 2023, 10:00 AM
vitalybuka added a subscriber: vitalybuka.

Reverting to fix the bot

This revision is now accepted and ready to land.May 24 2023, 10:00 AM
qiucf closed this revision.Jun 5 2023, 1:08 AM