This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] custom lower `v2f64 fpext v2f32`
ClosedPublic

Authored by lei on Feb 6 2019, 3:46 PM.

Details

Summary

Reduces scalarization overhead via custom lowering of v2f64 fpext v2f32

eg. For the following IR

%0 = load <2 x float>, <2 x float>* %Ptr, align 8
%1 = fpext <2 x float> %0 to <2 x double>
ret <2 x double> %1

Pre custom lowering:

ld r3, 0(r3)
mtvsrd f0, r3
xxswapd vs34, vs0
xscvspdpn f0, vs0
xxsldwi vs1, vs34, vs34, 3
xscvspdpn f1, vs1
xxmrghd vs34, vs0, vs1

After custom lowering:

lfd f0, 0(r3)
xxmrghw vs0, vs0, vs0
xvcvspdp vs34, vs0

spec2017 improvements:

  • parest by 1.16%
  • blender by 1.24%.

spec2006 improvements:

  • mcf by 2%
  • xalancbmk by 1.29%

Diff Detail

Repository
rL LLVM

Event Timeline

lei created this revision.Feb 6 2019, 3:46 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 6 2019, 3:46 PM
saghir requested changes to this revision.Feb 7 2019, 2:30 PM
saghir added a subscriber: saghir.
saghir added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9526 ↗(On Diff #185655)

Formatting

9544 ↗(On Diff #185655)

Minor: DAG instead of dag.

llvm/lib/Target/PowerPC/PPCISelLowering.h
407 ↗(On Diff #185655)

expand maybe?

llvm/lib/Target/PowerPC/PPCInstrVSX.td
1089 ↗(On Diff #185655)

You may get rid of this new line.

llvm/test/CodeGen/PowerPC/reduce_scalarization.ll
7 ↗(On Diff #185655)

Do we need RUN on all these lines?

This revision now requires changes to proceed.Feb 7 2019, 2:30 PM
saghir added inline comments.Feb 7 2019, 5:51 PM
llvm/test/CodeGen/PowerPC/reduce_scalarization.ll
7 ↗(On Diff #185655)

Never mind - please ignore this comment.

lei requested review of this revision.Feb 13 2019, 5:21 AM
amyk added a subscriber: amyk.Feb 13 2019, 9:45 AM
amyk added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9553 ↗(On Diff #185655)

Spacing between newOp=DAG... into newOp = DAG... maybe?

llvm/lib/Target/PowerPC/PPCISelLowering.h
407 ↗(On Diff #185655)

I think maybe the correct word is extend since the line below is FP_EXTEND_LHW?

lei updated this revision to Diff 186710.Feb 13 2019, 11:42 AM
lei marked 5 inline comments as done.

Address review comments.

saghir accepted this revision.Feb 13 2019, 12:27 PM

LGTM

This revision is now accepted and ready to land.Feb 13 2019, 12:27 PM

I think approaching the problem this way is unnecessarily limited - and I say that knowing that I may have suggested a similar approach.
However, what we are actually looking to do is to convert a DAG such as:

t1: v2f32,ch = load(...)
t2: v2f32,ch = load(...)
# arbitrary operations on v2f32
tN: v2f64 = fp_extend...

into:

t1: v2f32,ch = load(...)
t2: v2f32,ch = load(...)
t3: v2f64 = fp_extend t1
t4: v2f64 = fp_extend t2
# widen all operations to v2f64

And then all we need is a custom load of two float values into a vector of two double values. I think the best way to do that would be to combine any occurrences of
(v2f64 fp_extend (v2f32 op (v2f32 op_input1), (v2f32 op_input2)...)) (i.e. as long as all its inputs are of type v2f32)
into
(v2f64 op (fp_extend op_input1) [, (fp_extend op_input2)...])

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9524 ↗(On Diff #186710)

// We only want to custom lower an extend from v2f32 to v2f64.

Talking about return values or parameters on SD Nodes seems quite unnatural.

9544 ↗(On Diff #186710)

You are not generating a new DAG, just a new node.

9568 ↗(On Diff #186710)

It is obvious that an llvm_unreachable should not be reached. A more descriptive comment is desired. This is equivalent to emitting a message when the compiler crashes to say "Compiler crashed".

llvm/lib/Target/PowerPC/PPCISelLowering.h
408 ↗(On Diff #186710)

I don't think you should use the abbreviation LHW in these as that seems to suggest a "half-word" and that is not the case. This is the "low half" of a VSR (which is itself a doubleword). I believe FP-EXTEND_LH and LD_VSX_LH should be adequate (the name of the node as you have it makes it seem as if it mimics an ISA mnemonic, which it does not).

lei updated this revision to Diff 194183.Apr 8 2019, 11:39 AM
lei marked 3 inline comments as done.

Update comments and renamed new ppc ISD nodes.

I tried the suggested approach to fpextend v2f32 loads to v2f64 and then adding a custom load of two float values into a vector of two double values. However this generated extra instructions for each load and we no longer have any of the performance gains seen with the original approach. With the new approach we see negligible performance changes for all 2017 spec benchmarks with the exception of omnetpp which showed a 1.7% performance degradation.

lei requested review of this revision.Apr 8 2019, 11:56 AM
nemanjai accepted this revision.May 3 2019, 5:51 AM

Other than a few minor nits that can be addressed on the commit, LGTM.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
9599 ↗(On Diff #194183)

In a subsequent patch, we should probably add FP unary operations here as well.

9619 ↗(On Diff #194183)

nit: naming convention (NewOp).

llvm/lib/Target/PowerPC/PPCInstrVSX.td
3307 ↗(On Diff #194183)

These should copy into VSRC rather than VRRC so as to avoid unnecessary copies. It is very important to get these into the right register class as the extra copies will certainly reduce the performance impact of this transformation.

This revision is now accepted and ready to land.May 3 2019, 5:51 AM
This revision was automatically updated to reflect the committed changes.
lei marked 2 inline comments as done.