This is an archive of the discontinued LLVM Phabricator instance.

Indeed, I wanted to get review on this approach first before adding the other operations. I'll update the patch tomorrow. Thank you very much for your review!

--Charlie.

Echoing Jun, this needs to support SADDV, [SU]{MAX,MIN}V too (do we support FMINV and friends yet?)

Please also add a testcase that shows this works when more than one split is required; <16 x i32> for example.

This revision now requires changes to proceed. Oct 7 2015, 2:46 AM

My change for FMAXNMV and FMINNMV is in http://reviews.llvm.org/D13121.
James, this change needs your feedback on FMINV/FMAXV support.

Charlie, are you planning to support SADDV and [SU]{MAX,MIN}V as well? If you are not, please let me know and I may extend it.

> Charlie, are you planning to support SADDV and [SU]{MAX,MIN}V as well? If you are not, please let me know and I may extend it.

Yep, I am planning on adding support for those in this patch.

Thanks Jun & James for the reviews!

I've added support for SADDV, [SU]{MAX,MIN}V.

> do we support FMINV and friends yet?

Doesn't look like it.

> Please also add a testcase that shows this works when more than one split is required; <16 x i32> for example.

There are some missing patterns for ADDV that prevent me from testing this. I raised https://llvm.org/bugs/show_bug.cgi?id=25093

The SADDV / UADDV distinction looks a bit superficial to me, so ISD::ADD is used in both cases. We don't have a sign distinction in the architecture for ADDV, so I don't know why we have two nodes for them.
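
One way to see why a single ISD::ADD can serve both cases: two's-complement addition produces the same bit pattern whether the lanes are read as signed or unsigned. The sketch below is plain C++ rather than LLVM code, and the function names are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Sum the lanes as unsigned 32-bit values; the result wraps modulo 2^32.
uint32_t reduceAddUnsigned(const std::vector<uint32_t> &lanes) {
  uint32_t sum = 0;
  for (uint32_t v : lanes)
    sum += v;
  return sum;
}

// Sum the same bit patterns read as signed lanes. The accumulator is 64 bits
// wide to avoid signed-overflow UB for short vectors; the final value is then
// truncated back to 32 bits.
uint32_t reduceAddSigned(const std::vector<uint32_t> &lanes) {
  int64_t sum = 0;
  for (uint32_t v : lanes) {
    int32_t s;
    std::memcpy(&s, &v, sizeof s); // reinterpret the bits as a signed lane
    sum += s;
  }
  return static_cast<uint32_t>(static_cast<uint64_t>(sum));
}
```

Both functions return the same 32-bit value for any input, which is the sense in which the sign distinction does not matter for the add reduction itself.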

I'll update the commit title when it lands to reflect the wider scope.

LGTM, thanks!

This revision is now accepted and ready to land. Oct 7 2015, 9:59 AM

It looks like PR25093 is another ADDV matching case.
For the <16 x i32> test case here, I think we could also use the simple test case below:

define i32 @test(<16 x i32>* %arr)  {
  %bin.rdx = load <16 x i32>, <16 x i32>* %arr

  %rdx.shuf0 = shufflevector <16 x i32> %bin.rdx, <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %bin.rdx0 = add <16 x i32> %bin.rdx, %rdx.shuf0

  %rdx.shuf = shufflevector <16 x i32> %bin.rdx0, <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef >
  %bin.rdx11 = add <16 x i32> %bin.rdx0, %rdx.shuf

  %rdx.shuf12 = shufflevector <16 x i32> %bin.rdx11, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef>
  %bin.rdx13 = add <16 x i32> %bin.rdx11, %rdx.shuf12

  %rdx.shuf13 = shufflevector <16 x i32> %bin.rdx13, <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef, i32 undef, i32 undef,i32 undef, i32 undef>
  %bin.rdx14 = add <16 x i32> %bin.rdx13, %rdx.shuf13

  %r = extractelement <16 x i32> %bin.rdx14, i32 0
  ret i32 %r
}

Please also see the simple test case below for a <16 x i32> smaxv:

define i32 @foo(<16 x i32>* nocapture readonly %arr)  {
  %arr.load = load <16 x i32>, <16 x i32>* %arr
  %rdx.shuf = shufflevector <16 x i32> %arr.load, <16 x i32> undef, <16 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %rdx.minmax.cmp22 = icmp sgt <16 x i32> %arr.load, %rdx.shuf
  %rdx.minmax.select23 = select <16 x i1> %rdx.minmax.cmp22, <16 x i32> %arr.load, <16 x i32> %rdx.shuf
  %rdx.shuf24 = shufflevector <16 x i32> %rdx.minmax.select23, <16 x i32> undef, <16 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %rdx.minmax.cmp25 = icmp sgt <16 x i32> %rdx.minmax.select23, %rdx.shuf24
  %rdx.minmax.select26 = select <16 x i1> %rdx.minmax.cmp25, <16 x i32> %rdx.minmax.select23, <16 x i32> %rdx.shuf24
  %rdx.shuf27 = shufflevector <16 x i32> %rdx.minmax.select26, <16 x i32> undef, <16 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %rdx.minmax.cmp28 = icmp sgt <16 x i32> %rdx.minmax.select26, %rdx.shuf27
  %rdx.minmax.select29 = select <16 x i1> %rdx.minmax.cmp28, <16 x i32> %rdx.minmax.select26, <16 x i32> %rdx.shuf27
  %rdx.shuf30 = shufflevector <16 x i32> %rdx.minmax.select29, <16 x i32> undef, <16 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
  %rdx.minmax.cmp31 = icmp sgt <16 x i32> %rdx.minmax.select29, %rdx.shuf30
  %rdx.minmax.cmp31.elt = extractelement <16 x i1> %rdx.minmax.cmp31, i32 0
  %rdx.minmax.select29.elt = extractelement <16 x i32> %rdx.minmax.select29, i32 0
  %rdx.shuf30.elt = extractelement <16 x i32> %rdx.minmax.select29, i32 1
  %r = select i1 %rdx.minmax.cmp31.elt, i32 %rdx.minmax.select29.elt, i32 %rdx.shuf30.elt
  ret i32 %r
}
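
The shufflevector/icmp/select chain above halves the active lanes at each step. As a rough scalar model (plain C++, not LLVM code; `shuffleReduceMax` is a hypothetical name), the same reduction can be written as below. It also illustrates why the max/min reductions, unlike the add, do need the signed/unsigned distinction: the same bits give different answers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Log2-step max reduction. Each step compares lane i with lane i + half,
// which is what the shuffle mask <half, half+1, ...> followed by
// icmp + select does in the IR. Lanes at index >= half are ignored
// afterwards (they are undef in the IR).
// Precondition (assumed): lanes is non-empty and its size is a power of two.
template <typename T>
T shuffleReduceMax(std::vector<T> lanes) {
  for (std::size_t half = lanes.size() / 2; half >= 1; half /= 2) {
    for (std::size_t i = 0; i < half; ++i)
      lanes[i] = std::max(lanes[i], lanes[i + half]);
  }
  return lanes[0];
}
```

Instantiating the template with `int32_t` models the `icmp sgt` form used in this test; instantiating it with `uint32_t` models the unsigned variant.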

Thanks Jun for your test cases. I have added some more to my patch.

chatur01 closed this revision. Oct 16 2015, 8:40 AM

Revision Contents

Path                                          Size
lib/Target/AArch64/AArch64ISelLowering.cpp    16 lines
test/CodeGen/AArch64/aarch64-addv.ll          24 lines

Diff 36634

lib/Target/AArch64/AArch64ISelLowering.cpp

...
   Op = SDValue(
       DAG.getMachineNode(TargetOpcode::INSERT_SUBREG, DL, MVT::f32,
                          DAG.getUNDEF(MVT::i32), Op,
                          DAG.getTargetConstant(AArch64::hsub, DL, MVT::i32)),
       0);
   Op = DAG.getNode(ISD::BITCAST, DL, MVT::i32, Op);
   Results.push_back(DAG.getNode(ISD::TRUNCATE, DL, MVT::i16, Op));
 }
 
+static void ReplaceUADDVResults(SDNode *N, SmallVectorImpl<SDValue> &Results,
+                                SelectionDAG &DAG) {
+  EVT LoVT, HiVT;
+  SDValue Lo, Hi;
+  SDLoc dl(N);
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(N->getValueType(0));
+  std::tie(Lo, Hi) = DAG.SplitVectorOperand(N, 0);
+  SDValue ResultAdd = DAG.getNode(ISD::ADD, dl, LoVT, Lo, Hi);
+  SDValue SplitUADDV = DAG.getNode(AArch64ISD::UADDV, dl, LoVT, ResultAdd);
+  Results.push_back(SplitUADDV);
+}

Inline comment (junbuml): Do we need to split into Lo and Hi again here? Is "std::tie(Lo, Hi) = DAG.SplitVectorOperand(N, 0);" not enough?

 void AArch64TargetLowering::ReplaceNodeResults(
     SDNode *N, SmallVectorImpl<SDValue> &Results, SelectionDAG &DAG) const {
   switch (N->getOpcode()) {
   default:
     llvm_unreachable("Don't know how to custom expand this");
   case ISD::BITCAST:
     ReplaceBITCASTResults(N, Results, DAG);
     return;
+  case AArch64ISD::UADDV:
+    ReplaceUADDVResults(N, Results, DAG);
+    return;
   case ISD::FP_TO_UINT:
   case ISD::FP_TO_SINT:
     assert(N->getValueType(0) == MVT::i128 && "unexpected illegal conversion");
     // Let normal code take care of it by not adding anything to Results.
     return;
   }
 }
...
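
As a rough scalar model of what the split in ReplaceUADDVResults achieves, the sketch below adds the low and high halves lane-wise and then reduces the narrower vector. It is plain C++, not LLVM code: `horizontalAdd` and `splitReduceAdd` are hypothetical names, the legal width of 4 lanes is an assumption standing in for a 128-bit vector of i32, and the loop assumes the legalizer revisits the narrower node when more than one split is required (the <16 x i32> case raised in review).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the across-lanes add on a legal-width vector.
uint32_t horizontalAdd(const std::vector<uint32_t> &lanes) {
  uint32_t sum = 0;
  for (uint32_t v : lanes)
    sum += v;
  return sum;
}

// While the vector is wider than the (assumed) legal width, split it into
// Lo and Hi halves and add them lane-wise; then reduce what is left.
uint32_t splitReduceAdd(std::vector<uint32_t> lanes,
                        std::size_t legalLanes = 4) {
  assert(legalLanes != 0 && !lanes.empty());
  while (lanes.size() > legalLanes) {
    assert(lanes.size() % 2 == 0 && "expected an even lane count to split");
    std::size_t half = lanes.size() / 2;
    for (std::size_t i = 0; i < half; ++i)
      lanes[i] += lanes[i + half]; // Lo + Hi
    lanes.resize(half);
  }
  return horizontalAdd(lanes);
}
```

A 16-lane input goes through two split steps (16 to 8, then 8 to 4) before the final reduction, and the result matches reducing all 16 lanes directly.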

test/CodeGen/AArch64/aarch64-addv.ll

...
 ; CHECK-LABEL: add_D
 ; CHECK-NOT: addv
   %bin.rdx = load <2 x i64>, <2 x i64>* %arr
   %rdx.shuf0 = shufflevector <2 x i64> %bin.rdx, <2 x i64> undef, <2 x i32> <i32 1, i32 undef>
   %bin.rdx0 = add <2 x i64> %bin.rdx, %rdx.shuf0
   %r = extractelement <2 x i64> %bin.rdx0, i32 0
   ret i64 %r
 }
 
+define i32 @PR25056(i8* noalias nocapture readonly %arg1, i8* noalias nocapture readonly %arg2) {
+; CHECK-LABEL: PR25056
+; CHECK: addv {{s[0-9]+}}, {{v[0-9]+}}.4s
+entry:
+  %0 = bitcast i8* %arg1 to <8 x i8>*
+  %1 = load <8 x i8>, <8 x i8>* %0, align 1
+  %2 = zext <8 x i8> %1 to <8 x i32>
+  %3 = bitcast i8* %arg2 to <8 x i8>*
+  %4 = load <8 x i8>, <8 x i8>* %3, align 1
+  %5 = zext <8 x i8> %4 to <8 x i32>
+  %6 = sub nsw <8 x i32> %2, %5
+  %7 = icmp slt <8 x i32> %6, zeroinitializer
+  %8 = sub nsw <8 x i32> zeroinitializer, %6
+  %9 = select <8 x i1> %7, <8 x i32> %8, <8 x i32> %6
+  %rdx.shuf = shufflevector <8 x i32> %9, <8 x i32> undef, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 undef, i32 undef, i32 undef, i32 undef>
+  %bin.rdx = add <8 x i32> %9, %rdx.shuf
+  %rdx.shuf1 = shufflevector <8 x i32> %bin.rdx, <8 x i32> undef, <8 x i32> <i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
+  %bin.rdx2 = add <8 x i32> %bin.rdx, %rdx.shuf1
+  %rdx.shuf3 = shufflevector <8 x i32> %bin.rdx2, <8 x i32> undef, <8 x i32> <i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
+  %bin.rdx4 = add <8 x i32> %bin.rdx2, %rdx.shuf3
+  %10 = extractelement <8 x i32> %bin.rdx4, i32 0
+  ret i32 %10
+}
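
For readers who prefer a scalar view, the PR25056 test computes a sum of absolute differences over eight bytes: zero-extend, subtract, take the absolute value via compare-and-select, then reduce with add. A reference version might look like the following; it is plain C++ with a hypothetical function name, written only to show what value the IR is expected to produce.

```cpp
#include <cstdint>

// Scalar reference for the <8 x i8> sum of absolute differences.
uint32_t sumAbsDiff8(const uint8_t *a, const uint8_t *b) {
  uint32_t sum = 0;
  for (int i = 0; i < 8; ++i) {
    // zext both bytes to 32 bits, then subtract (matches zext + sub nsw).
    int32_t d = static_cast<int32_t>(a[i]) - static_cast<int32_t>(b[i]);
    // icmp slt + select: negate the difference when it is negative.
    sum += static_cast<uint32_t>(d < 0 ? -d : d);
  }
  return sum;
}
```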


[AArch64] Implement vector splitting on UADDV. (Closed, Public)
