Fixes PR25056.
Details
Diff Detail
Event Timeline
Thanks Charlie for this patch.
lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
9614 | Do we need to split into Lo and Hi again here. |
Indeed, I wanted to get review on this approach first before adding the other operations. I'll update the patch tomorrow. Thank you very much for your review!
--Charlie.
Echoing Jun, this needs to support SADDV, [SU]{MAX,MIN}V too (do we support FMINV and friends yet?)
Please also add a testcase that shows this works when more than one split is required; <16 x i32> for example.
My change for FMAXNMV and FMINNMV is in http://reviews.llvm.org/D13121.
James, this change need your feedback about FMINV/FMAXV supports.
Charlie, are you planing to support SADDV, [SU]{MAX,MIN}V as well ? Please let me know if you are not, then, I may extend it.
Charlie, are you planing to support SADDV, [SU]{MAX,MIN}V as well ? Please let me know if you are not, then, I may extend it
Yep, I am planning on adding support for those in this patch.
Thanks Jun & James for the reviews!
I've added support for SADDV, [SU]{MAX,MIN}V.
do we support FMINV and friends yet?
Doesn't look like it.
Please also add a testcase that shows this works when more than one split is required; <16 x i32> for example.
There's some missing patterns for ADDV that prevents me from testing this. I raised https://llvm.org/bugs/show_bug.cgi?id=25093
The SADDV / UADDV distinction looks a bit superficial to me, so ISD::ADD is used in both cases. We don't have a sign distinction in the architecture for ADDV, so I don't know why we have two nodes for them.
I'll update the commit title when it lands to reflect the wider scope.
Looks like PR25093 is another case for ADDV match.
For the test case for <16 x i32> here, I think we could also use below simple test case :
define i32 @test(<16 x i32>* %arr) { %bin.rdx = load <16 x i32>, <16 x i32>* %arr %rdx.shuf0 = shufflevector <16 x i32> %bin.rdx, <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> %bin.rdx0 = add <16 x i32> %bin.rdx, %rdx.shuf0 %rdx.shuf = shufflevector <16 x i32> %bin.rdx0, <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef > %bin.rdx11 = add <16 x i32> %bin.rdx0, %rdx.shuf %rdx.shuf12 = shufflevector <16 x i32> %bin.rdx11, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef> %bin.rdx13 = add <16 x i32> %bin.rdx11, %rdx.shuf12 %rdx.shuf13 = shufflevector <16 x i32> %bin.rdx13, <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef> %bin.rdx14 = add <16 x i32> %bin.rdx13, %rdx.shuf13 %r = extractelement <16 x i32> %bin.rdx14, i32 0 ret i32 %r }
Please also see below simple test case for <16xi32> smaxv :
define i32 @foo(<16 x i32>* nocapture readonly %arr) { %arr.load = load <16 x i32>, <16 x i32>* %arr %rdx.shuf = shufflevector <16 x i32> %arr.load, <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> %rdx.minmax.cmp22 = icmp sgt <16 x i32> %arr.load, %rdx.shuf %rdx.minmax.select23 = select <16 x i1> %rdx.minmax.cmp22, <16 x i32> %arr.load, <16 x i32> %rdx.shuf %rdx.shuf24 = shufflevector <16 x i32> %rdx.minmax.select23, <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> %rdx.minmax.cmp25 = icmp sgt <16 x i32> %rdx.minmax.select23, %rdx.shuf24 %rdx.minmax.select26 = select <16 x i1> %rdx.minmax.cmp25, <16 x i32> %rdx.minmax.select23, <16 x i32> %rdx.shuf24 %rdx.shuf27 = shufflevector <16 x i32> %rdx.minmax.select26, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> %rdx.minmax.cmp28 = icmp sgt <16 x i32> %rdx.minmax.select26, %rdx.shuf27 %rdx.minmax.select29 = select <16 x i1> %rdx.minmax.cmp28, <16 x i32> %rdx.minmax.select26, <16 x i32> %rdx.shuf27 %rdx.shuf30 = shufflevector <16 x i32> %rdx.minmax.select29, <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef> %rdx.minmax.cmp31 = icmp sgt <16 x i32> %rdx.minmax.select29, %rdx.shuf30 %rdx.minmax.cmp31.elt = extractelement <16 x i1> %rdx.minmax.cmp31, i32 0 %rdx.minmax.select29.elt = extractelement <16 x i32> %rdx.minmax.select29, i32 0 %rdx.shuf30.elt = extractelement <16 x i32> %rdx.minmax.select29, i32 1 %r = select i1 %rdx.minmax.cmp31.elt, i32 %rdx.minmax.select29.elt, i32 %rdx.shuf30.elt ret i32 %r }
Do we need to split into Lo and Hi again here.
Is "std::tie(Lo, Hi) = DAG.SplitVectorOperand(N, 0);" not enough?