This is an archive of the discontinued LLVM Phabricator instance.


Differential D115739

[SVE][DAGCombiner] Enable ISD::ABDS and ISD::ABDU for SVE.
ClosedPublic

Authored by paulwalker-arm on Dec 14 2021, 8:58 AM.

Details

Reviewers

efriedma
sdesmalen
david-arm
dmgreen

Commits

rG6457f42bde82: [DAGCombiner] Extend ISD::ABDS/U combine to handle more cases.

Summary

Add the typical custom lowering and isel patterns to enable ABD
for scalable vectors.

The existing ABD combine doesn't quite work because for SVE only
a single scalable vector type per scalar integer type is legal (i.e.
for i32, <vscale x 4 x i32> is the only legal scalable vector type).

To account for this I've extended the combine for the case when the
extension of the input operands cannot be folded into the ABD. The
accompanying tests use legal and twice-the-size of legal types to
exercise both combines.
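
The two folds described above can be sanity-checked with a scalar model. The sketch below is not LLVM code: `abds8`, `abdu8`, `abds16` and `abdu16` are hypothetical helpers standing in for one lane of `ISD::ABDS`/`ISD::ABDU`, and it assumes ABD means the absolute difference computed without intermediate overflow and returned as an unsigned bit pattern. Under that assumption it checks every i8 input pair against both `zext(abd(x, y))` and `abd(ext(x), ext(y))`.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical scalar stand-ins for one lane of ISD::ABDS / ISD::ABDU.
// Operands are promoted to int before subtracting, so nothing overflows.
static uint8_t abds8(int8_t x, int8_t y) {
  return static_cast<uint8_t>(x > y ? x - y : y - x);
}
static uint8_t abdu8(uint8_t x, uint8_t y) {
  return static_cast<uint8_t>(x > y ? x - y : y - x);
}
static uint16_t abds16(int16_t x, int16_t y) {
  return static_cast<uint16_t>(x > y ? x - y : y - x);
}
static uint16_t abdu16(uint16_t x, uint16_t y) {
  return static_cast<uint16_t>(x > y ? x - y : y - x);
}

// abs(sext(x) - sext(y)) evaluated on an i16 lane.
static uint16_t absSubSext(int8_t x, int8_t y) {
  int16_t d = static_cast<int16_t>(int16_t{x} - int16_t{y});
  return static_cast<uint16_t>(std::abs(d));
}

// abs(zext(x) - zext(y)) evaluated on an i16 lane.
static uint16_t absSubZext(uint8_t x, uint8_t y) {
  int16_t d = static_cast<int16_t>(uint16_t{x} - uint16_t{y});
  return static_cast<uint16_t>(std::abs(d));
}

// Checks both signed folds for every pair of i8 inputs.
bool signedFoldsAgree() {
  for (int x = -128; x <= 127; ++x)
    for (int y = -128; y <= 127; ++y) {
      int8_t a = static_cast<int8_t>(x), b = static_cast<int8_t>(y);
      uint16_t expected = absSubSext(a, b);
      if (expected != uint16_t{abds8(a, b)}) return false; // zext(abds(x, y))
      if (expected != abds16(a, b)) return false; // abds(sext(x), sext(y))
    }
  return true;
}

// Checks both unsigned folds for every pair of i8 inputs.
bool unsignedFoldsAgree() {
  for (int x = 0; x <= 255; ++x)
    for (int y = 0; y <= 255; ++y) {
      uint8_t a = static_cast<uint8_t>(x), b = static_cast<uint8_t>(y);
      uint16_t expected = absSubZext(a, b);
      if (expected != uint16_t{abdu8(a, b)}) return false; // zext(abdu(x, y))
      if (expected != abdu16(a, b)) return false; // abdu(zext(x), zext(y))
    }
  return true;
}
```

Both checkers are expected to return true. The second form of each fold is the one that matters when the narrow type is not legal, as with the SVE promoted-operand tests.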

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

paulwalker-arm created this revision. Dec 14 2021, 8:58 AM

Herald added a reviewer: efriedma. Dec 14 2021, 8:58 AM

Herald added subscribers: ctetreau, ecnelises, psnobl and 2 others.

paulwalker-arm requested review of this revision. Dec 14 2021, 8:58 AM

Herald added a project: Restricted Project. Dec 14 2021, 8:58 AM

Herald added a subscriber: llvm-commits.

paulwalker-arm added reviewers: sdesmalen, david-arm, dmgreen. Dec 14 2021, 9:00 AM

Harbormaster completed remote builds in B139242: Diff 394280. Dec 14 2021, 10:00 AM

efriedma added inline comments. Dec 14 2021, 2:00 PM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9526	If I'm understanding correctly, this is just saying that if you know some number of leading sign bits of the operands, abs(a-b) is equivalent to abds(a,b) (and the equivalent for unsigned). This would be more clear if you explicitly checked for leading zero/sign bits, instead of implying them from a SIGN_EXTEND/ZERO_EXTEND opcode. I'd like to see non-SVE testcases for this.

paulwalker-arm added inline comments. Dec 17 2021, 10:16 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9526	@efriedma: I've confused myself several times when creating this patch, so perhaps I'm still misunderstanding something, but I don't think the checks relate to knowing whether leading sign bits exist but rather to looking for an indication as to how the operands should be interpreted. By this I mean that knowing a value is positive or negative is not enough to determine which of `ISD::ABDS` or `ISD::ABDU` will produce the same result as a plain `abs(sub())` sequence. I couldn't see a clear way to write non-SVE test cases. The nodes have very limited uses and, from what I can see, they don't have any of the typical legalisation plumbing either. The existing AArch64 uses don't suffer the same problem as SVE because there's typically a shorter legal vector type available that uses the same element type.
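
The point about interpretation can be illustrated with a single pair of bit patterns. This is a minimal scalar sketch, not LLVM code: `abdSigned` and `abdUnsigned` are hypothetical helpers that read the same 8 bits as a signed or an unsigned value before taking the absolute difference, which is the role the `sext`/`zext` opcode plays in the combine.

```cpp
#include <cstdint>

// Absolute difference of two 8-bit patterns read as signed values.
// The uint8_t -> int8_t cast assumes two's complement (guaranteed from C++20).
uint8_t abdSigned(uint8_t xBits, uint8_t yBits) {
  int x = static_cast<int8_t>(xBits);
  int y = static_cast<int8_t>(yBits);
  return static_cast<uint8_t>(x > y ? x - y : y - x);
}

// Absolute difference of the same two patterns read as unsigned values.
uint8_t abdUnsigned(uint8_t xBits, uint8_t yBits) {
  int x = xBits;
  int y = yBits;
  return static_cast<uint8_t>(x > y ? x - y : y - x);
}
```

For the patterns 0x80 and 0x7f the signed reading is |-128 - 127| = 255 while the unsigned reading is |128 - 127| = 1, so the bits alone do not say which node matches.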

Matt added a subscriber: Matt. Jan 4 2022, 9:22 AM

review ping

The, let's say, "standard" pattern for a uabd is trunc(abs(sub(zext(a), zext(b)))). We start the transform at the abs though, as it allows for more folding, and convert the whole thing to trunc(zext(abd(a, b))), which is folded to plain abd(a, b). Starting at the abd helps us fold more patterns, like those where (say) vecreduce(abs(sub(zext(a), zext(b)))) can be folded to vecreduce(zext(abd(a, b))), which becomes a udot(abd(a, b)) on AArch64 (for example, with the right types).

I would suspect that you could include the trunc in the SVE tests, which would remove the need for the target-independent parts of this patch, and it could test the legal types. (The existing ones might be good to keep around too if they would show inefficient codegen.) And if you wanted to continue with the target-independent part, it could be done in a separate patch, where it is probably worth adding some NEON AArch64 tests too.
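
The reduction pattern mentioned in this comment can be sketched in scalar form. The code below is an illustration only, with hypothetical names (`sadWide`, `sadNarrow`, `kA`, `kB`): the first function mirrors vecreduce(abs(sub(zext(a), zext(b)))) and the second mirrors vecreduce(zext(abd(a, b))), where the absolute difference stays in the 8-bit element type and only the accumulation is widened.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Reference form: widen both inputs, subtract, take abs, then accumulate.
uint32_t sadWide(const uint8_t *a, const uint8_t *b, std::size_t n) {
  uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    int32_t d = int32_t{a[i]} - int32_t{b[i]};
    sum += static_cast<uint32_t>(std::abs(d));
  }
  return sum;
}

// Folded form: unsigned absolute difference on i8, widened only when summing.
uint32_t sadNarrow(const uint8_t *a, const uint8_t *b, std::size_t n) {
  uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    uint8_t abd =
        static_cast<uint8_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    sum += abd;
  }
  return sum;
}

// Sample inputs (arbitrary values chosen to hit both extremes).
const uint8_t kA[4] = {0, 255, 7, 128};
const uint8_t kB[4] = {255, 0, 9, 127};
```

On the sample arrays the per-element differences are 255, 255, 2 and 1, so both forms sum to 513.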

Thanks @dmgreen. I've broken the ISel part out into D117873. I'll revisit this patch once the ISel is available. I took another look at the NEON side of things and it looks like I can use 32-bit vectors to exercise the new combine, so I'll add those tests when I rebase.

Rebase now that the isel side of things is on main.

Harbormaster completed remote builds in B145726: Diff 403231. Jan 27 2022, 1:48 AM

review ping

Herald added a subscriber: alextsao1999. Feb 8 2022, 4:30 AM

sdesmalen added inline comments. Feb 8 2022, 6:25 AM

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9523–9526	Is there a test for the case where VT1 != VT2 ?

There is a potential alternative for the "Smaller than legal" Neon types in D119075 (it's an MVE patch, but the AArch64 version I have looks very similar). I'm not sure it will handle all the types here (like i1's), but I would hope SVE could be handled in the same way, and some of the assembly will be more optimal as it will use the smaller type with a single extend. I haven't had enough time to update them since the weekend yet.

There is also D119072, which too is a little contentious, but it uses ComputeNumSignBits and computeKnownBits/countMinLeadingZeros for detecting the number of sign bits. (The contentious part is about it running from DemandedBits, as far as I understand, not the calls to computeKnownBits.) This patch is kind of doing that but just with zext/sext, and not creating the smaller type if possible.
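
Why a known number of sign bits can stand in for an explicit sext can be shown on a single i8 lane. The sketch below is a scalar illustration of the general idea only; it does not reproduce D119072 or any LLVM API, and `numSignBits`, `wrappingAbsSub` and `abds8` are hypothetical helpers. It assumes that at least two sign bits on each operand are enough to keep the wrapping subtraction from overflowing, and checks that assumption exhaustively.

```cpp
#include <cstdint>

// Number of leading bits (sign bit included) that equal the sign bit.
int numSignBits(int8_t v) {
  uint8_t u = static_cast<uint8_t>(v);
  unsigned sign = u >> 7;
  int n = 0;
  for (int bit = 7; bit >= 0 && ((u >> bit) & 1u) == sign; --bit)
    ++n;
  return n;
}

// abs(sub(a, b)) evaluated with wrapping i8 arithmetic.
uint8_t wrappingAbsSub(int8_t a, int8_t b) {
  uint8_t d = static_cast<uint8_t>(static_cast<uint8_t>(a) -
                                   static_cast<uint8_t>(b));
  return (d & 0x80) ? static_cast<uint8_t>(0u - d) : d;
}

// Overflow-free signed absolute difference, as an unsigned bit pattern.
uint8_t abds8(int8_t a, int8_t b) {
  return static_cast<uint8_t>(a > b ? a - b : b - a);
}

// True if abs(sub) matches abds for every pair with >= 2 sign bits each.
bool signBitsMakeAbsSubSafe() {
  for (int x = -128; x <= 127; ++x)
    for (int y = -128; y <= 127; ++y) {
      int8_t a = static_cast<int8_t>(x), b = static_cast<int8_t>(y);
      if (numSignBits(a) >= 2 && numSignBits(b) >= 2 &&
          wrappingAbsSub(a, b) != abds8(a, b))
        return false;
    }
  return true;
}
```

Without the sign-bit guarantee the two disagree: for 127 and -128 the wrapping form gives 1 while the overflow-free difference is 255.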

Rebase and adds extra tests requested by Sander.

Harbormaster completed remote builds in B148532: Diff 407224. Feb 9 2022, 1:52 PM

I don't know what you're trying to tell me here @dmgreen. As I see it, this is a generic issue that I've solved in a target independent and simple way. Does it cover all bases? No, but then nor does the original combine, hence this addition. If it turns out we need to support more cases then we'd just extend the combine and add the necessary tests.

No strong objection from me!

But like Eli said, the code is a little odd, and it doesn't seem to be producing optimal results. If you don't want to wait for me to get back to D119075, this sounds fine too.

paulwalker-arm added a child revision: D119830: [SVE] Add isel patterns for SABA/UABA. Feb 15 2022, 4:36 AM

It looks like the other patch still needs more work. Feel free to go with this in the meantime. LGTM

This revision is now accepted and ready to land. Feb 15 2022, 10:07 AM

This revision was landed with ongoing or failed builds. Feb 17 2022, 5:34 AM

Closed by commit rG6457f42bde82: [DAGCombiner] Extend ISD::ABDS/U combine to handle more cases. (authored by paulwalker-arm).

This revision was automatically updated to reflect the committed changes.

paulwalker-arm added a commit: rG6457f42bde82: [DAGCombiner] Extend ISD::ABDS/U combine to handle more cases.

Revision Contents

Path                                            Size
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp   25 lines
llvm/test/CodeGen/AArch64/neon-abd.ll           12 lines
llvm/test/CodeGen/AArch64/sve-abd.ll            96 lines
llvm/test/CodeGen/Thumb2/mve-vabdus.ll          12 lines

Diff 409610

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

[9,499 lines not shown; hunk context: static SDValue combineABSToABD(SDNode *N, SelectionDAG &DAG,]
   Op1 = AbsOp1.getOperand(1);

   unsigned Opc0 = Op0.getOpcode();
   // Check if the operands of the sub are (zero|sign)-extended.
   if (Opc0 != Op1.getOpcode() ||
       (Opc0 != ISD::ZERO_EXTEND && Opc0 != ISD::SIGN_EXTEND))
     return SDValue();

+  EVT VT = N->getValueType(0);
   EVT VT1 = Op0.getOperand(0).getValueType();
   EVT VT2 = Op1.getOperand(0).getValueType();
-  // Check if the operands are of same type and valid size.
   unsigned ABDOpcode = (Opc0 == ISD::SIGN_EXTEND) ? ISD::ABDS : ISD::ABDU;
-  if (VT1 != VT2 || !TLI.isOperationLegalOrCustom(ABDOpcode, VT1))
-    return SDValue();

-  Op0 = Op0.getOperand(0);
-  Op1 = Op1.getOperand(0);
-  SDValue ABD =
-      DAG.getNode(ABDOpcode, SDLoc(N), Op0->getValueType(0), Op0, Op1);
-  return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), N->getValueType(0), ABD);
+  // fold abs(sext(x) - sext(y)) -> zext(abds(x, y))
+  // fold abs(zext(x) - zext(y)) -> zext(abdu(x, y))
+  // NOTE: Extensions must be equivalent.
+  if (VT1 == VT2 && TLI.isOperationLegalOrCustom(ABDOpcode, VT1)) {
+    Op0 = Op0.getOperand(0);
+    Op1 = Op1.getOperand(0);
+    SDValue ABD = DAG.getNode(ABDOpcode, SDLoc(N), VT1, Op0, Op1);
+    return DAG.getNode(ISD::ZERO_EXTEND, SDLoc(N), VT, ABD);
+  }
+
+  // fold abs(sext(x) - sext(y)) -> abds(sext(x), sext(y))
+  // fold abs(zext(x) - zext(y)) -> abdu(zext(x), zext(y))
+  if (TLI.isOperationLegalOrCustom(ABDOpcode, VT))
+    return DAG.getNode(ABDOpcode, SDLoc(N), VT, Op0, Op1);
+
+  return SDValue();
 }

 SDValue DAGCombiner::visitABS(SDNode *N) {
   SDValue N0 = N->getOperand(0);
   EVT VT = N->getValueType(0);

   // fold (abs c1) -> c2
   if (DAG.isConstantIntBuildVectorOrConstantInt(N0))
[14,758 lines not shown]

llvm/test/CodeGen/AArch64/neon-abd.ll

[47 lines not shown]

 define <4 x i16> @sabd_4h_promoted_ops(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: sabd_4h_promoted_ops:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: shl v0.4h, v0.4h, #8
 ; CHECK-NEXT: shl v1.4h, v1.4h, #8
 ; CHECK-NEXT: sshr v0.4h, v0.4h, #8
 ; CHECK-NEXT: sshr v1.4h, v1.4h, #8
-; CHECK-NEXT: sub v0.4h, v0.4h, v1.4h
-; CHECK-NEXT: abs v0.4h, v0.4h
+; CHECK-NEXT: sabd v0.4h, v0.4h, v1.4h
 ; CHECK-NEXT: ret
   %a.sext = sext <4 x i8> %a to <4 x i16>
   %b.sext = sext <4 x i8> %b to <4 x i16>
   %sub = sub <4 x i16> %a.sext, %b.sext
   %abs = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %sub, i1 true)
   ret <4 x i16> %abs
 }

[37 lines not shown]

 define <2 x i32> @sabd_2s_promoted_ops(<2 x i16> %a, <2 x i16> %b) #0 {
 ; CHECK-LABEL: sabd_2s_promoted_ops:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: shl v0.2s, v0.2s, #16
 ; CHECK-NEXT: shl v1.2s, v1.2s, #16
 ; CHECK-NEXT: sshr v0.2s, v0.2s, #16
 ; CHECK-NEXT: sshr v1.2s, v1.2s, #16
-; CHECK-NEXT: sub v0.2s, v0.2s, v1.2s
-; CHECK-NEXT: abs v0.2s, v0.2s
+; CHECK-NEXT: sabd v0.2s, v0.2s, v1.2s
 ; CHECK-NEXT: ret
   %a.sext = sext <2 x i16> %a to <2 x i32>
   %b.sext = sext <2 x i16> %b to <2 x i32>
   %sub = sub <2 x i32> %a.sext, %b.sext
   %abs = call <2 x i32> @llvm.abs.v2i32(<2 x i32> %sub, i1 true)
   ret <2 x i32> %abs
 }

[108 lines not shown; hunk context: ; CHECK-NEXT: ret]
   ret <4 x i16> %trunc
 }

 define <4 x i16> @uabd_4h_promoted_ops(<4 x i8> %a, <4 x i8> %b) #0 {
 ; CHECK-LABEL: uabd_4h_promoted_ops:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: bic v0.4h, #255, lsl #8
 ; CHECK-NEXT: bic v1.4h, #255, lsl #8
-; CHECK-NEXT: sub v0.4h, v0.4h, v1.4h
-; CHECK-NEXT: abs v0.4h, v0.4h
+; CHECK-NEXT: uabd v0.4h, v0.4h, v1.4h
 ; CHECK-NEXT: ret
   %a.zext = zext <4 x i8> %a to <4 x i16>
   %b.zext = zext <4 x i8> %b to <4 x i16>
   %sub = sub <4 x i16> %a.zext, %b.zext
   %abs = call <4 x i16> @llvm.abs.v4i16(<4 x i16> %sub, i1 true)
   ret <4 x i16> %abs
 }

[36 lines not shown]
 }

 define <2 x i32> @uabd_2s_promoted_ops(<2 x i16> %a, <2 x i16> %b) #0 {
 ; CHECK-LABEL: uabd_2s_promoted_ops:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: movi d2, #0x00ffff0000ffff
 ; CHECK-NEXT: and v0.8b, v0.8b, v2.8b
 ; CHECK-NEXT: and v1.8b, v1.8b, v2.8b
-; CHECK-NEXT: sub v0.2s, v0.2s, v1.2s
-; CHECK-NEXT: abs v0.2s, v0.2s
+; CHECK-NEXT: uabd v0.2s, v0.2s, v1.2s
 ; CHECK-NEXT: ret
   %a.zext = zext <2 x i16> %a to <2 x i32>
   %b.zext = zext <2 x i16> %b to <2 x i32>
   %sub = sub <2 x i32> %a.zext, %b.zext
   %abs = call <2 x i32> @llvm.abs.v2i32(<2 x i32> %sub, i1 true)
   ret <2 x i32> %abs
 }

[81 lines not shown]

llvm/test/CodeGen/AArch64/sve-abd.ll

[18 lines not shown; hunk context: ; CHECK-NEXT: ret]
   %abs = call <vscale x 16 x i16> @llvm.abs.nxv16i16(<vscale x 16 x i16> %sub, i1 true)
   %trunc = trunc <vscale x 16 x i16> %abs to <vscale x 16 x i8>
   ret <vscale x 16 x i8> %trunc
 }

 define <vscale x 16 x i8> @sabd_b_promoted_ops(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b) #0 {
 ; CHECK-LABEL: sabd_b_promoted_ops:
 ; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p2.b
 ; CHECK-NEXT: mov z0.b, p0/z, #-1 // =0xffffffffffffffff
 ; CHECK-NEXT: mov z1.b, p1/z, #-1 // =0xffffffffffffffff
-; CHECK-NEXT: ptrue p2.b
-; CHECK-NEXT: sub z0.b, z0.b, z1.b
-; CHECK-NEXT: abs z0.b, p2/m, z0.b
+; CHECK-NEXT: sabd z0.b, p2/m, z0.b, z1.b
 ; CHECK-NEXT: ret
   %a.sext = sext <vscale x 16 x i1> %a to <vscale x 16 x i8>
   %b.sext = sext <vscale x 16 x i1> %b to <vscale x 16 x i8>
   %sub = sub <vscale x 16 x i8> %a.sext, %b.sext
   %abs = call <vscale x 16 x i8> @llvm.abs.nxv16i8(<vscale x 16 x i8> %sub, i1 true)
   ret <vscale x 16 x i8> %abs
 }

[12 lines not shown]
 }

 define <vscale x 8 x i16> @sabd_h_promoted_ops(<vscale x 8 x i8> %a, <vscale x 8 x i8> %b) #0 {
 ; CHECK-LABEL: sabd_h_promoted_ops:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: sxtb z0.h, p0/m, z0.h
 ; CHECK-NEXT: sxtb z1.h, p0/m, z1.h
-; CHECK-NEXT: sub z0.h, z0.h, z1.h
-; CHECK-NEXT: abs z0.h, p0/m, z0.h
+; CHECK-NEXT: sabd z0.h, p0/m, z0.h, z1.h
 ; CHECK-NEXT: ret
   %a.sext = sext <vscale x 8 x i8> %a to <vscale x 8 x i16>
   %b.sext = sext <vscale x 8 x i8> %b to <vscale x 8 x i16>
   %sub = sub <vscale x 8 x i16> %a.sext, %b.sext
   %abs = call <vscale x 8 x i16> @llvm.abs.nxv8i16(<vscale x 8 x i16> %sub, i1 true)
   ret <vscale x 8 x i16> %abs
 }

[12 lines not shown]
 }

 define <vscale x 4 x i32> @sabd_s_promoted_ops(<vscale x 4 x i16> %a, <vscale x 4 x i16> %b) #0 {
 ; CHECK-LABEL: sabd_s_promoted_ops:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.s
 ; CHECK-NEXT: sxth z0.s, p0/m, z0.s
 ; CHECK-NEXT: sxth z1.s, p0/m, z1.s
-; CHECK-NEXT: sub z0.s, z0.s, z1.s
-; CHECK-NEXT: abs z0.s, p0/m, z0.s
+; CHECK-NEXT: sabd z0.s, p0/m, z0.s, z1.s
 ; CHECK-NEXT: ret
   %a.sext = sext <vscale x 4 x i16> %a to <vscale x 4 x i32>
   %b.sext = sext <vscale x 4 x i16> %b to <vscale x 4 x i32>
   %sub = sub <vscale x 4 x i32> %a.sext, %b.sext
   %abs = call <vscale x 4 x i32> @llvm.abs.nxv4i32(<vscale x 4 x i32> %sub, i1 true)
   ret <vscale x 4 x i32> %abs
 }

[12 lines not shown]
 }

 define <vscale x 2 x i64> @sabd_d_promoted_ops(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) #0 {
 ; CHECK-LABEL: sabd_d_promoted_ops:
 ; CHECK: // %bb.0:
 ; CHECK-NEXT: ptrue p0.d
 ; CHECK-NEXT: sxtw z0.d, p0/m, z0.d
 ; CHECK-NEXT: sxtw z1.d, p0/m, z1.d
-; CHECK-NEXT: sub z0.d, z0.d, z1.d
-; CHECK-NEXT: abs z0.d, p0/m, z0.d
+; CHECK-NEXT: sabd z0.d, p0/m, z0.d, z1.d
 ; CHECK-NEXT: ret
   %a.sext = sext <vscale x 2 x i32> %a to <vscale x 2 x i64>
   %b.sext = sext <vscale x 2 x i32> %b to <vscale x 2 x i64>
   %sub = sub <vscale x 2 x i64> %a.sext, %b.sext
   %abs = call <vscale x 2 x i64> @llvm.abs.nxv2i64(<vscale x 2 x i64> %sub, i1 true)
   ret <vscale x 2 x i64> %abs
 }

[13 lines not shown; hunk context: ; CHECK-NEXT: ret]
   %abs = call <vscale x 16 x i16> @llvm.abs.nxv16i16(<vscale x 16 x i16> %sub, i1 true)
   %trunc = trunc <vscale x 16 x i16> %abs to <vscale x 16 x i8>
   ret <vscale x 16 x i8> %trunc
 }

 define <vscale x 16 x i8> @uabd_b_promoted_ops(<vscale x 16 x i1> %a, <vscale x 16 x i1> %b) #0 {
 ; CHECK-LABEL: uabd_b_promoted_ops:
 ; CHECK: // %bb.0:
-; CHECK-NEXT: mov z0.b, p0/z, #1 // =0x1
-; CHECK-NEXT: mov z1.b, p1/z, #-1 // =0xffffffffffffffff
 ; CHECK-NEXT: ptrue p2.b
-; CHECK-NEXT: add z0.b, z0.b, z1.b
-; CHECK-NEXT: abs z0.b, p2/m, z0.b
+; CHECK-NEXT: mov z0.b, p0/z, #1 // =0x1
+; CHECK-NEXT: mov z1.b, p1/z, #1 // =0x1
+; CHECK-NEXT: uabd z0.b, p2/m, z0.b, z1.b
 ; CHECK-NEXT: ret
   %a.zext = zext <vscale x 16 x i1> %a to <vscale x 16 x i8>
   %b.zext = zext <vscale x 16 x i1> %b to <vscale x 16 x i8>
   %sub = sub <vscale x 16 x i8> %a.zext, %b.zext
   %abs = call <vscale x 16 x i8> @llvm.abs.nxv16i8(<vscale x 16 x i8> %sub, i1 true)
   ret <vscale x 16 x i8> %abs
 }

[9 lines not shown; hunk context: ; CHECK-NEXT: ret]
   %abs = call <vscale x 8 x i32> @llvm.abs.nxv8i32(<vscale x 8 x i32> %sub, i1 true)
   %trunc = trunc <vscale x 8 x i32> %abs to <vscale x 8 x i16>
   ret <vscale x 8 x i16> %trunc
 }

 define <vscale x 8 x i16> @uabd_h_promoted_ops(<vscale x 8 x i8> %a, <vscale x 8 x i8> %b) #0 {
 ; CHECK-LABEL: uabd_h_promoted_ops:
 ; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.h
 ; CHECK-NEXT: and z0.h, z0.h, #0xff
 ; CHECK-NEXT: and z1.h, z1.h, #0xff
-; CHECK-NEXT: ptrue p0.h
-; CHECK-NEXT: sub z0.h, z0.h, z1.h
-; CHECK-NEXT: abs z0.h, p0/m, z0.h
+; CHECK-NEXT: uabd z0.h, p0/m, z0.h, z1.h
 ; CHECK-NEXT: ret
   %a.zext = zext <vscale x 8 x i8> %a to <vscale x 8 x i16>
   %b.zext = zext <vscale x 8 x i8> %b to <vscale x 8 x i16>
   %sub = sub <vscale x 8 x i16> %a.zext, %b.zext
   %abs = call <vscale x 8 x i16> @llvm.abs.nxv8i16(<vscale x 8 x i16> %sub, i1 true)
   ret <vscale x 8 x i16> %abs
 }

[9 lines not shown; hunk context: ; CHECK-NEXT: ret]
   %abs = call <vscale x 4 x i64> @llvm.abs.nxv4i64(<vscale x 4 x i64> %sub, i1 true)
   %trunc = trunc <vscale x 4 x i64> %abs to <vscale x 4 x i32>
   ret <vscale x 4 x i32> %trunc
 }

 define <vscale x 4 x i32> @uabd_s_promoted_ops(<vscale x 4 x i16> %a, <vscale x 4 x i16> %b) #0 {
 ; CHECK-LABEL: uabd_s_promoted_ops:
 ; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
 ; CHECK-NEXT: and z0.s, z0.s, #0xffff
 ; CHECK-NEXT: and z1.s, z1.s, #0xffff
-; CHECK-NEXT: ptrue p0.s
-; CHECK-NEXT: sub z0.s, z0.s, z1.s
-; CHECK-NEXT: abs z0.s, p0/m, z0.s
+; CHECK-NEXT: uabd z0.s, p0/m, z0.s, z1.s
 ; CHECK-NEXT: ret
   %a.zext = zext <vscale x 4 x i16> %a to <vscale x 4 x i32>
   %b.zext = zext <vscale x 4 x i16> %b to <vscale x 4 x i32>
   %sub = sub <vscale x 4 x i32> %a.zext, %b.zext
   %abs = call <vscale x 4 x i32> @llvm.abs.nxv4i32(<vscale x 4 x i32> %sub, i1 true)
   ret <vscale x 4 x i32> %abs
 }

[9 lines not shown; hunk context: ; CHECK-NEXT: ret]
   %abs = call <vscale x 2 x i128> @llvm.abs.nxv2i128(<vscale x 2 x i128> %sub, i1 true)
   %trunc = trunc <vscale x 2 x i128> %abs to <vscale x 2 x i64>
   ret <vscale x 2 x i64> %trunc
 }

 define <vscale x 2 x i64> @uabd_d_promoted_ops(<vscale x 2 x i32> %a, <vscale x 2 x i32> %b) #0 {
 ; CHECK-LABEL: uabd_d_promoted_ops:
 ; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.d
 ; CHECK-NEXT: and z0.d, z0.d, #0xffffffff
 ; CHECK-NEXT: and z1.d, z1.d, #0xffffffff
-; CHECK-NEXT: ptrue p0.d
-; CHECK-NEXT: sub z0.d, z0.d, z1.d
-; CHECK-NEXT: abs z0.d, p0/m, z0.d
+; CHECK-NEXT: uabd z0.d, p0/m, z0.d, z1.d
 ; CHECK-NEXT: ret
   %a.zext = zext <vscale x 2 x i32> %a to <vscale x 2 x i64>
   %b.zext = zext <vscale x 2 x i32> %b to <vscale x 2 x i64>
   %sub = sub <vscale x 2 x i64> %a.zext, %b.zext
   %abs = call <vscale x 2 x i64> @llvm.abs.nxv2i64(<vscale x 2 x i64> %sub, i1 true)
   ret <vscale x 2 x i64> %abs
 }

+; Test the situation where isLegal(ISD::ABD, typeof(%a)) returns true but %a and
+; %b have differing types.
+define <vscale x 4 x i32> @uabd_non_matching_extension(<vscale x 4 x i32> %a, <vscale x 4 x i8> %b) #0 {
+; CHECK-LABEL: uabd_non_matching_extension:
+; CHECK: // %bb.0:
+; CHECK-NEXT: and z1.s, z1.s, #0xff
+; CHECK-NEXT: uunpkhi z2.d, z0.s
+; CHECK-NEXT: uunpklo z0.d, z0.s
+; CHECK-NEXT: uunpkhi z3.d, z1.s
+; CHECK-NEXT: uunpklo z1.d, z1.s
+; CHECK-NEXT: ptrue p0.d
+; CHECK-NEXT: sub z0.d, z0.d, z1.d
+; CHECK-NEXT: sub z1.d, z2.d, z3.d
+; CHECK-NEXT: abs z1.d, p0/m, z1.d
+; CHECK-NEXT: abs z0.d, p0/m, z0.d
+; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %a.zext = zext <vscale x 4 x i32> %a to <vscale x 4 x i64>
+  %b.zext = zext <vscale x 4 x i8> %b to <vscale x 4 x i64>
+  %sub = sub <vscale x 4 x i64> %a.zext, %b.zext
+  %abs = call <vscale x 4 x i64> @llvm.abs.nxv4i64(<vscale x 4 x i64> %sub, i1 true)
+  %trunc = trunc <vscale x 4 x i64> %abs to <vscale x 4 x i32>
+  ret <vscale x 4 x i32> %trunc
+}
+
+; Test the situation where isLegal(ISD::ABD, typeof(%a.zext)) returns true but
+; %a and %b have differing types.
+define <vscale x 4 x i32> @uabd_non_matching_promoted_ops(<vscale x 4 x i8> %a, <vscale x 4 x i16> %b) #0 {
+; CHECK-LABEL: uabd_non_matching_promoted_ops:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: and z0.s, z0.s, #0xff
+; CHECK-NEXT: and z1.s, z1.s, #0xffff
+; CHECK-NEXT: uabd z0.s, p0/m, z0.s, z1.s
+; CHECK-NEXT: ret
+  %a.zext = zext <vscale x 4 x i8> %a to <vscale x 4 x i32>
+  %b.zext = zext <vscale x 4 x i16> %b to <vscale x 4 x i32>
+  %sub = sub <vscale x 4 x i32> %a.zext, %b.zext
+  %abs = call <vscale x 4 x i32> @llvm.abs.nxv4i32(<vscale x 4 x i32> %sub, i1 true)
+  ret <vscale x 4 x i32> %abs
+}
+
+; Test the situation where isLegal(ISD::ABD, typeof(%a)) returns true but %a and
+; %b are promoted differently.
+define <vscale x 4 x i32> @uabd_non_matching_promotion(<vscale x 4 x i8> %a, <vscale x 4 x i8> %b) #0 {
+; CHECK-LABEL: uabd_non_matching_promotion:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p0.s
+; CHECK-NEXT: and z0.s, z0.s, #0xff
+; CHECK-NEXT: sxtb z1.s, p0/m, z1.s
+; CHECK-NEXT: sub z0.s, z0.s, z1.s
+; CHECK-NEXT: abs z0.s, p0/m, z0.s
+; CHECK-NEXT: ret
+  %a.zext = zext <vscale x 4 x i8> %a to <vscale x 4 x i32>
+  %b.zext = sext <vscale x 4 x i8> %b to <vscale x 4 x i32>
+  %sub = sub <vscale x 4 x i32> %a.zext, %b.zext
+  %abs = call <vscale x 4 x i32> @llvm.abs.nxv4i32(<vscale x 4 x i32> %sub, i1 true)
+  ret <vscale x 4 x i32> %abs
+}
+
 declare <vscale x 16 x i8> @llvm.abs.nxv16i8(<vscale x 16 x i8>, i1)

 declare <vscale x 8 x i16> @llvm.abs.nxv8i16(<vscale x 8 x i16>, i1)
 declare <vscale x 16 x i16> @llvm.abs.nxv16i16(<vscale x 16 x i16>, i1)

 declare <vscale x 4 x i32> @llvm.abs.nxv4i32(<vscale x 4 x i32>, i1)
 declare <vscale x 8 x i32> @llvm.abs.nxv8i32(<vscale x 8 x i32>, i1)

 declare <vscale x 2 x i64> @llvm.abs.nxv2i64(<vscale x 2 x i64>, i1)
 declare <vscale x 4 x i64> @llvm.abs.nxv4i64(<vscale x 4 x i64>, i1)

 declare <vscale x 2 x i128> @llvm.abs.nxv2i128(<vscale x 2 x i128>, i1)

 attributes #0 = { "target-features"="+neon,+sve" }

llvm/test/CodeGen/Thumb2/mve-vabdus.ll

Show All 15 Lines	; CHECK-NEXT: bx lr
ret <16 x i8> %result		ret <16 x i8> %result
}		}

define arm_aapcs_vfpcc <8 x i8> @vabd_v8s8(<8 x i8> %src1, <8 x i8> %src2) {		define arm_aapcs_vfpcc <8 x i8> @vabd_v8s8(<8 x i8> %src1, <8 x i8> %src2) {
; CHECK-LABEL: vabd_v8s8:		; CHECK-LABEL: vabd_v8s8:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: vmovlb.s8 q1, q1		; CHECK-NEXT: vmovlb.s8 q1, q1
; CHECK-NEXT: vmovlb.s8 q0, q0		; CHECK-NEXT: vmovlb.s8 q0, q0
; CHECK-NEXT: vsub.i16 q0, q0, q1		; CHECK-NEXT: vabd.s16 q0, q0, q1
; CHECK-NEXT: vabs.s16 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%sextsrc1 = sext <8 x i8> %src1 to <8 x i16>		%sextsrc1 = sext <8 x i8> %src1 to <8 x i16>
%sextsrc2 = sext <8 x i8> %src2 to <8 x i16>		%sextsrc2 = sext <8 x i8> %src2 to <8 x i16>
%add1 = sub <8 x i16> %sextsrc1, %sextsrc2		%add1 = sub <8 x i16> %sextsrc1, %sextsrc2
%add2 = sub <8 x i16> zeroinitializer, %add1		%add2 = sub <8 x i16> zeroinitializer, %add1
%c = icmp sge <8 x i16> %add1, zeroinitializer		%c = icmp sge <8 x i16> %add1, zeroinitializer
%s = select <8 x i1> %c, <8 x i16> %add1, <8 x i16> %add2		%s = select <8 x i1> %c, <8 x i16> %add1, <8 x i16> %add2
%result = trunc <8 x i16> %s to <8 x i8>		%result = trunc <8 x i16> %s to <8 x i8>
[... 35 lines not shown ...]
; CHECK-NEXT: bx lr
ret <8 x i16> %result		ret <8 x i16> %result
}		}

define arm_aapcs_vfpcc <4 x i16> @vabd_v4s16(<4 x i16> %src1, <4 x i16> %src2) {		define arm_aapcs_vfpcc <4 x i16> @vabd_v4s16(<4 x i16> %src1, <4 x i16> %src2) {
; CHECK-LABEL: vabd_v4s16:		; CHECK-LABEL: vabd_v4s16:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: vmovlb.s16 q1, q1		; CHECK-NEXT: vmovlb.s16 q1, q1
; CHECK-NEXT: vmovlb.s16 q0, q0		; CHECK-NEXT: vmovlb.s16 q0, q0
; CHECK-NEXT: vsub.i32 q0, q0, q1		; CHECK-NEXT: vabd.s32 q0, q0, q1
; CHECK-NEXT: vabs.s32 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%sextsrc1 = sext <4 x i16> %src1 to <4 x i32>		%sextsrc1 = sext <4 x i16> %src1 to <4 x i32>
%sextsrc2 = sext <4 x i16> %src2 to <4 x i32>		%sextsrc2 = sext <4 x i16> %src2 to <4 x i32>
%add1 = sub <4 x i32> %sextsrc1, %sextsrc2		%add1 = sub <4 x i32> %sextsrc1, %sextsrc2
%add2 = sub <4 x i32> zeroinitializer, %add1		%add2 = sub <4 x i32> zeroinitializer, %add1
%c = icmp sge <4 x i32> %add1, zeroinitializer		%c = icmp sge <4 x i32> %add1, zeroinitializer
%s = select <4 x i1> %c, <4 x i32> %add1, <4 x i32> %add2		%s = select <4 x i1> %c, <4 x i32> %add1, <4 x i32> %add2
%result = trunc <4 x i32> %s to <4 x i16>		%result = trunc <4 x i32> %s to <4 x i16>
[... 66 lines not shown ...]
; CHECK-NEXT: bx lr
ret <16 x i8> %result		ret <16 x i8> %result
}		}

define arm_aapcs_vfpcc <8 x i8> @vabd_v8u8(<8 x i8> %src1, <8 x i8> %src2) {		define arm_aapcs_vfpcc <8 x i8> @vabd_v8u8(<8 x i8> %src1, <8 x i8> %src2) {
; CHECK-LABEL: vabd_v8u8:		; CHECK-LABEL: vabd_v8u8:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: vmovlb.u8 q1, q1		; CHECK-NEXT: vmovlb.u8 q1, q1
; CHECK-NEXT: vmovlb.u8 q0, q0		; CHECK-NEXT: vmovlb.u8 q0, q0
; CHECK-NEXT: vsub.i16 q0, q0, q1		; CHECK-NEXT: vabd.u16 q0, q0, q1
; CHECK-NEXT: vabs.s16 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%zextsrc1 = zext <8 x i8> %src1 to <8 x i16>		%zextsrc1 = zext <8 x i8> %src1 to <8 x i16>
%zextsrc2 = zext <8 x i8> %src2 to <8 x i16>		%zextsrc2 = zext <8 x i8> %src2 to <8 x i16>
%add1 = sub <8 x i16> %zextsrc1, %zextsrc2		%add1 = sub <8 x i16> %zextsrc1, %zextsrc2
%add2 = sub <8 x i16> zeroinitializer, %add1		%add2 = sub <8 x i16> zeroinitializer, %add1
%c = icmp sge <8 x i16> %add1, zeroinitializer		%c = icmp sge <8 x i16> %add1, zeroinitializer
%s = select <8 x i1> %c, <8 x i16> %add1, <8 x i16> %add2		%s = select <8 x i1> %c, <8 x i16> %add1, <8 x i16> %add2
%result = trunc <8 x i16> %s to <8 x i8>		%result = trunc <8 x i16> %s to <8 x i8>
[... 34 lines not shown ...]
; CHECK-NEXT: bx lr
ret <8 x i16> %result		ret <8 x i16> %result
}		}

define arm_aapcs_vfpcc <4 x i16> @vabd_v4u16(<4 x i16> %src1, <4 x i16> %src2) {		define arm_aapcs_vfpcc <4 x i16> @vabd_v4u16(<4 x i16> %src1, <4 x i16> %src2) {
; CHECK-LABEL: vabd_v4u16:		; CHECK-LABEL: vabd_v4u16:
; CHECK: @ %bb.0:		; CHECK: @ %bb.0:
; CHECK-NEXT: vmovlb.u16 q1, q1		; CHECK-NEXT: vmovlb.u16 q1, q1
; CHECK-NEXT: vmovlb.u16 q0, q0		; CHECK-NEXT: vmovlb.u16 q0, q0
; CHECK-NEXT: vsub.i32 q0, q0, q1		; CHECK-NEXT: vabd.u32 q0, q0, q1
; CHECK-NEXT: vabs.s32 q0, q0
; CHECK-NEXT: bx lr		; CHECK-NEXT: bx lr
%zextsrc1 = zext <4 x i16> %src1 to <4 x i32>		%zextsrc1 = zext <4 x i16> %src1 to <4 x i32>
%zextsrc2 = zext <4 x i16> %src2 to <4 x i32>		%zextsrc2 = zext <4 x i16> %src2 to <4 x i32>
%add1 = sub <4 x i32> %zextsrc1, %zextsrc2		%add1 = sub <4 x i32> %zextsrc1, %zextsrc2
%add2 = sub <4 x i32> zeroinitializer, %add1		%add2 = sub <4 x i32> zeroinitializer, %add1
%c = icmp sge <4 x i32> %add1, zeroinitializer		%c = icmp sge <4 x i32> %add1, zeroinitializer
%s = select <4 x i1> %c, <4 x i32> %add1, <4 x i32> %add2		%s = select <4 x i1> %c, <4 x i32> %add1, <4 x i32> %add2
%result = trunc <4 x i32> %s to <4 x i16>		%result = trunc <4 x i32> %s to <4 x i16>
[... 417 lines not shown ...]