This is an archive of the discontinued LLVM Phabricator instance.

[ppc] Legalize the load of MVT::v4i8 into VSX register
ClosedPublic

Authored by Carrot on Nov 15 2016, 3:54 PM.

Details

Summary

VSX has an instruction lxsiwax that can load 32bit value into VSX register. That patch makes it known to memory cost model, so the vectorization of the test case in pr30990 is beneficial.

Diff Detail

Repository
rL LLVM

Event Timeline

Carrot updated this revision to Diff 78092.Nov 15 2016, 3:54 PM
Carrot retitled this revision from to [ppc] Legalize the load of MVT::v4i8 into VSX register.
Carrot updated this object.
Carrot added a reviewer: hfinkel.
Carrot added a subscriber: llvm-commits.
nemanjai added inline comments.Nov 16 2016, 10:40 AM
lib/Target/PowerPC/PPCISelLowering.cpp
680 ↗(On Diff #78092)

Is there more to this patch? Taking a quick look in PPCISelLowering.cpp, I don't see which custom lowering function will be invoked that will actually handle this node. Of course, it's entirely possible that I just missed it.

Carrot added inline comments.Nov 16 2016, 8:15 PM
lib/Target/PowerPC/PPCISelLowering.cpp
680 ↗(On Diff #78092)

You didn't miss anything.
After vectorization following load instruction is generated
%wide.load = load <4 x i8>, <4 x i8>* %lsr.iv16, align 1, !tbaa !1

In lowering phase, it is translated to following SDNode
t5: v4i8,ch = load<LD4[%lsr.iv16](align=1)(tbaa=<0xa868c68>)> t0, t2, undef:i64

And after LegalizeTypes(),

    t48: i32,ch = load<LD4[%lsr.iv16](align=1)(tbaa=<0xa868c68>)> t0, t2, undef:i64
  t49: v4i32 = scalar_to_vector t48
t50: v16i8 = bitcast t49

And then it is directly translated to LXSIWAX.
So it is already correctly handled.

nemanjai added inline comments.Nov 17 2016, 6:29 AM
lib/Target/PowerPC/PPCISelLowering.cpp
680 ↗(On Diff #78092)

I see, legalization eliminates this illegal type and this patch just serves to provide a better answer when TTI checks if this is "LegalOrCustom" so that the vectorizer does not consider this too expensive to be profitable.
If that understanding is correct, could you please add a comment outlining this above this line so that it is clear why this is a Custom operation?

Carrot updated this revision to Diff 78424.Nov 17 2016, 3:02 PM
Carrot marked an inline comment as done.
amehsan added inline comments.
lib/Target/PowerPC/PPCISelLowering.cpp
681–683 ↗(On Diff #78424)

Now that we are fixing this for v4i8, is it possible to fix this for some other vector types if a similar fix makes sense for them? For example does it make sense to make a similar change for v4i16 or v2i16 as well?

Carrot updated this revision to Diff 79479.Nov 28 2016, 4:45 PM
Carrot marked an inline comment as done.
hfinkel edited edge metadata.Dec 1 2016, 2:29 PM

This is certainly a novel use of setLoadExtAction for an illegal type ;) -- I worry that this is a bit hacky and not a technique we can use universally regardless. My inclination is that putting logic into PPCTargetTransformInfo would be better. X86TargetTransformInfo certainly has its fair share of cost tables, etc. to handle various situations the generic cost logic doesn't get quite right. I'm certainly open to hearing other opinions.

Carrot updated this revision to Diff 80138.Dec 2 2016, 3:27 PM
Carrot edited edge metadata.

Add the memory cost computation to PPCTTIImpl::getMemoryOpCost.

hfinkel accepted this revision.Dec 2 2016, 3:51 PM
hfinkel edited edge metadata.

Please also directly add a cost-model test (e.g. to test/Analysis/CostModel/PowerPC/load_store.ll). With that, this LGTM.

This revision is now accepted and ready to land.Dec 2 2016, 3:51 PM
This revision was automatically updated to reflect the committed changes.