Page MenuHomePhabricator

[AArch64] Implement vector splitting on UADDV.
ClosedPublic

Authored by chatur01 on Oct 6 2015, 4:49 AM.

Diff Detail

Event Timeline

chatur01 retitled this revision from to [AArch64] Implement vector splitting on UADDV..
chatur01 updated this object.
chatur01 added reviewers: jmolloy, mcrosier, junbuml.
chatur01 added a subscriber: llvm-commits.
junbuml edited edge metadata.Oct 6 2015, 9:34 AM

Thanks Charlie for this patch.

lib/Target/AArch64/AArch64ISelLowering.cpp
9614

Do we need to split into Lo and Hi again here.
Is "std::tie(Lo, Hi) = DAG.SplitVectorOperand(N, 0);" not enough?

chatur01 edited edge metadata.

Thanks Jun, that was a mistake.

Similar splitting for [S|U][MINV|MAXV should be done as well.

Indeed, I wanted to get review on this approach first before adding the other operations. I'll update the patch tomorrow. Thank you very much for your review!

--Charlie.

jmolloy requested changes to this revision.Oct 7 2015, 2:46 AM
jmolloy edited edge metadata.

Echoing Jun, this needs to support SADDV, [SU]{MAX,MIN}V too (do we support FMINV and friends yet?)

Please also add a testcase that shows this works when more than one split is required; <16 x i32> for example.

This revision now requires changes to proceed.Oct 7 2015, 2:46 AM

My change for FMAXNMV and FMINNMV is in http://reviews.llvm.org/D13121.
James, this change need your feedback about FMINV/FMAXV supports.

Charlie, are you planing to support SADDV, [SU]{MAX,MIN}V as well ? Please let me know if you are not, then, I may extend it.

Charlie, are you planing to support SADDV, [SU]{MAX,MIN}V as well ? Please let me know if you are not, then, I may extend it

Yep, I am planning on adding support for those in this patch.

chatur01 edited edge metadata.

Thanks Jun & James for the reviews!

I've added support for SADDV, [SU]{MAX,MIN}V.

do we support FMINV and friends yet?

Doesn't look like it.

Please also add a testcase that shows this works when more than one split is required; <16 x i32> for example.

There's some missing patterns for ADDV that prevents me from testing this. I raised https://llvm.org/bugs/show_bug.cgi?id=25093

The SADDV / UADDV distinction looks a bit superficial to me, so ISD::ADD is used in both cases. We don't have a sign distinction in the architecture for ADDV, so I don't know why we have two nodes for them.

I'll update the commit title when it lands to reflect the wider scope.

jmolloy accepted this revision.Oct 7 2015, 9:59 AM
jmolloy edited edge metadata.

LGTM, thanks!

This revision is now accepted and ready to land.Oct 7 2015, 9:59 AM

Looks like PR25093 is another case for ADDV match.
For the test case for <16 x i32> here, I think we could also use below simple test case :

define i32 @test(<16 x i32>* %arr)  {
  %bin.rdx = load <16 x i32>, <16 x i32>* %arr

  %rdx.shuf0 = shufflevector <16 x i32> %bin.rdx, <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %bin.rdx0 = add <16 x i32> %bin.rdx, %rdx.shuf0

  %rdx.shuf = shufflevector <16 x i32> %bin.rdx0, <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef >
  %bin.rdx11 = add <16 x i32> %bin.rdx0, %rdx.shuf

  %rdx.shuf12 = shufflevector <16 x i32> %bin.rdx11, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef>
  %bin.rdx13 = add <16 x i32> %bin.rdx11, %rdx.shuf12

  %rdx.shuf13 = shufflevector <16 x i32> %bin.rdx13, <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef>
  %bin.rdx14 = add <16 x i32> %bin.rdx13, %rdx.shuf13

  %r = extractelement <16 x i32> %bin.rdx14, i32 0
  ret i32 %r
}

Please also see below simple test case for <16xi32> smaxv :

define i32 @foo(<16 x i32>* nocapture readonly %arr)  {
  %arr.load = load <16 x i32>, <16 x i32>* %arr
  %rdx.shuf = shufflevector <16 x i32> %arr.load, <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %rdx.minmax.cmp22 = icmp sgt <16 x i32> %arr.load, %rdx.shuf
  %rdx.minmax.select23 = select <16 x i1> %rdx.minmax.cmp22, <16 x i32> %arr.load, <16 x i32> %rdx.shuf
  %rdx.shuf24 = shufflevector <16 x i32> %rdx.minmax.select23, <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %rdx.minmax.cmp25 = icmp sgt <16 x i32> %rdx.minmax.select23, %rdx.shuf24
  %rdx.minmax.select26 = select <16 x i1> %rdx.minmax.cmp25, <16 x i32> %rdx.minmax.select23, <16 x i32> %rdx.shuf24
  %rdx.shuf27 = shufflevector <16 x i32> %rdx.minmax.select26, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %rdx.minmax.cmp28 = icmp sgt <16 x i32> %rdx.minmax.select26, %rdx.shuf27
  %rdx.minmax.select29 = select <16 x i1> %rdx.minmax.cmp28, <16 x i32> %rdx.minmax.select26, <16 x i32> %rdx.shuf27
  %rdx.shuf30 = shufflevector <16 x i32> %rdx.minmax.select29, <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %rdx.minmax.cmp31 = icmp sgt <16 x i32> %rdx.minmax.select29, %rdx.shuf30
  %rdx.minmax.cmp31.elt = extractelement <16 x i1> %rdx.minmax.cmp31, i32 0
  %rdx.minmax.select29.elt = extractelement <16 x i32> %rdx.minmax.select29, i32 0
  %rdx.shuf30.elt = extractelement <16 x i32> %rdx.minmax.select29, i32 1
  %r = select i1 %rdx.minmax.cmp31.elt, i32 %rdx.minmax.select29.elt, i32 %rdx.shuf30.elt
  ret i32 %r
}
chatur01 edited edge metadata.

Thanks Jun for your test cases. I have added some more to my patch.

chatur01 closed this revision.Oct 16 2015, 8:40 AM