This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/lib/Target/AArch64/
- AArch64ISelLowering.h
- AArch64ISelLowering.cpp

llvm/test/CodeGen/AArch64/
- aarch64-load-ext.ll
- arm64-vshift.ll
- neon-extload.ll
- sadd_sat_vec.ll
- ssub_sat_vec.ll
- uadd_sat_vec.ll
- usub_sat_vec.ll

Differential D104782

[AArch64] Custom lower <4 x i8> loads
Closed · Public

Authored by SjoerdMeijer on Jun 23 2021, 6:33 AM.

Details

Reviewers

dmgreen
efriedma
fhahn
zatrazz
t.p.northover

Commits

rG51e434fc2590: [AArch64] Custom lower <4 x i8> loads

Summary

This patch custom lowers extending <4 x i8> vector loads: a single 32-bit load is followed by two widening instructions (SSHLL for sign extension, USHLL for zero extension) that extend the result to a <4 x i32> vector. Previously, constructing the <4 x i32> vector took 4 byte loads and 4 moves, which was inefficient and expensive. With this improvement, SLP vectorisation may become profitable in more cases; see D103629.
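As a rough scalar model of what the new sequence computes (not the LLVM implementation itself), the sketch below reads four bytes with one 4-byte copy and then widens each lane in two steps, i8 to i16 and i16 to i32, mirroring the two SSHLL instructions. The function name `loadSExtV4I8` and the use of `std::array` are illustrative assumptions, not names from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of a sign-extending <4 x i8> -> <4 x i32> load.
std::array<int32_t, 4> loadSExtV4I8(const void *Ptr) {
  // One 4-byte copy stands in for the single `ldr s0, [x0]`.
  std::array<uint8_t, 4> Bytes;
  std::memcpy(Bytes.data(), Ptr, Bytes.size());

  std::array<int32_t, 4> Result;
  for (std::size_t I = 0; I < 4; ++I) {
    // First widening step (i8 -> i16), like `sshll v0.8h, v0.8b, #0`.
    int16_t Half = static_cast<int16_t>(static_cast<int8_t>(Bytes[I]));
    // Second widening step (i16 -> i32), like `sshll v0.4s, v0.4h, #0`.
    Result[I] = static_cast<int32_t>(Half);
  }
  return Result;
}
```

The zero-extending case would differ only in skipping the `int8_t` cast, so that the upper bits are filled with zeros as USHLL does.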

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

SjoerdMeijer created this revision. Jun 23 2021, 6:33 AM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Jun 23 2021, 6:33 AM

SjoerdMeijer requested review of this revision. Jun 23 2021, 6:33 AM

Herald added a project: Restricted Project. Jun 23 2021, 6:33 AM

Harbormaster completed remote builds in B110614: Diff 353947. Jun 23 2021, 6:33 AM

SjoerdMeijer added inline comments. Jun 23 2021, 6:35 AM

llvm/test/CodeGen/AArch64/neon-extload.ll
38	I am trying to remember how big-endian works in LLVM, but since I noticed these reverse here, this looked okay'ish to me, but I haven't tested BE. Any opinions on this welcome (while I look a bit more at this)....

fhahn added a reviewer: t.p.northover. Jun 23 2021, 7:40 AM

dmgreen added inline comments. Jun 23 2021, 7:50 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1129	What about v4i16 as well? And EXTLOAD (which is probably fine to treat as a ZEXTLOAD).
4485	It may be worth checking or asserting that the type is v4i32/v4i16. Also DL is more common.
4506	SIGN_EXTEND/ZERO_EXTEND do not need a second VT argument, I don't believe.
4510	ISD::SIGN_EXTEND > ExtType?

Missing testcases for load+ext to <4 x i16>.

llvm/test/CodeGen/AArch64/neon-extload.ll
38	Looks fine to me. The rev32 comes out of lowering the ISD::BITCAST.
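To make the role of `rev32` in the big-endian output concrete, here is a small scalar model. It assumes the usual description of `rev32 v0.8b, v0.8b` as reversing the bytes inside each 32-bit word of a 64-bit register. The helper name `rev32Bytes` is hypothetical, and the model only illustrates the byte permutation, not how the BITCAST is lowered in the backend.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Hypothetical model: reverse the byte order within each 32-bit word
// of an 8-byte vector register.
std::array<uint8_t, 8> rev32Bytes(std::array<uint8_t, 8> V) {
  for (std::size_t Word = 0; Word < 8; Word += 4) {
    std::swap(V[Word + 0], V[Word + 3]);
    std::swap(V[Word + 1], V[Word + 2]);
  }
  return V;
}
```

Applying the permutation twice gives back the original lane order, which is consistent with it being a pure reordering of the loaded bytes.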

SjoerdMeijer added inline comments. Jun 24 2021, 1:00 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1129	I thought the v4i16 cases were already handled, but I will double check and precommit some tests for them (the new test in this patch, extended with v4i16 cases) if we don't have them already. I did think about EXTLOAD but wasn't sure how to trigger it; I will look into this.
llvm/test/CodeGen/AArch64/neon-extload.ll
38	Thanks for confirming!
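The EXTLOAD point can be illustrated with a toy model: an any-extending load leaves the upper bits unspecified, so filling them with zeros is one valid choice. The enum and function names below are hypothetical stand-ins for the load extension kinds and extension opcodes, not LLVM's actual types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the extension kinds of a load.
enum class LoadExt { NonExt, AnyExt, SignExt, ZeroExt };
// Hypothetical stand-ins for the extension operation to emit.
enum class ExtOp { SignExtend, ZeroExtend };

// Returns false when the load is not an extending load; otherwise
// stores the chosen operation in Out. AnyExt is treated like ZeroExt.
bool pickExtOp(LoadExt Kind, ExtOp &Out) {
  switch (Kind) {
  case LoadExt::SignExt:
    Out = ExtOp::SignExtend;
    return true;
  case LoadExt::ZeroExt:
  case LoadExt::AnyExt:
    Out = ExtOp::ZeroExtend;
    return true;
  case LoadExt::NonExt:
    return false;
  }
  return false;
}

// Widens one byte to 32 bits using the chosen operation.
int32_t extendByte(uint8_t Byte, ExtOp Op) {
  if (Op == ExtOp::SignExtend)
    return static_cast<int32_t>(static_cast<int8_t>(Byte));
  return static_cast<int32_t>(Byte);
}
```

Because the low 8 bits are the same under either operation, a consumer that only relies on those bits cannot tell the difference, which is why zero extension is a safe default for the any-extend case.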

SjoerdMeijer mentioned this in rGc74aea466343: [AArch64] Precommit extending load tests for D104782. NFC. Jun 24 2021, 8:00 AM

Matt added a subscriber: Matt. Jun 24 2021, 10:56 AM

Address comments.

SjoerdMeijer added inline comments. Jun 24 2021, 11:41 AM

llvm/test/CodeGen/AArch64/neon-extload.ll
0	Ahh, just spotted that this is a regression. Will look into this.

LGTM

llvm/test/CodeGen/AArch64/neon-extload.ll
0	I'm not really concerned; IR-level optimizations should catch this.

This revision is now accepted and ready to land. Jun 24 2021, 11:56 AM

Fixed that regression by checking whether there is a single use that is a vector_extract_elt. But I can remove it if you think this is not necessary.

I don't really care either way...

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
4497	`Op.hasOneUse()` is very different from `Op->hasOneUse()`.
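The distinction matters for loads because a load node produces two results, the loaded value and the output chain. In the SelectionDAG API as commonly described, `SDValue::hasOneUse()` asks about uses of that one result, while `SDNode::hasOneUse()` (reached through `Op->`) asks about uses of the node across all of its results. The toy types below are hypothetical and only mimic that behaviour under this assumption; they are not LLVM's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a DAG node with several results.
struct ToyNode {
  // Each entry records which result number one user consumes.
  std::vector<unsigned> UsedResults;

  // Node-level query: exactly one use across all results.
  bool hasOneUse() const { return UsedResults.size() == 1; }

  // Counts uses of a single result number.
  bool hasNUsesOfValue(std::size_t N, unsigned ResNo) const {
    std::size_t Count = 0;
    for (unsigned R : UsedResults)
      if (R == ResNo)
        ++Count;
    return Count == N;
  }
};

// Hypothetical model of a (node, result number) pair.
struct ToyValue {
  ToyNode *Node;
  unsigned ResNo;

  // Value-level query: exactly one use of this particular result.
  bool hasOneUse() const { return Node->hasNUsesOfValue(1, ResNo); }
  ToyNode *operator->() const { return Node; }
};
```

With a load whose value has one user and whose chain has another, the value-level query succeeds while the node-level query fails, so the two spellings can take different branches in the lowering.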

Harbormaster completed remote builds in B110881: Diff 354327. Jun 24 2021, 12:58 PM

In D104782#2839334, @SjoerdMeijer wrote:

Fixed that regression by checking whether there is a single use that is a vector_extract_elt. But I can remove it if you think this is not necessary.

Yeah I wouldn't worry. There may be other ways to fix it if we need, to do with demanded elements of a load. But this code will likely not come up in practice.

llvm/test/CodeGen/AArch64/neon-extload.ll
0	I would perhaps remove this dot, as dots in function names are a little unusual.

Okay, will remove this before committing (and change that function name in that test).

Thanks for the suggestions and help with this work @efriedma and @dmgreen !

This revision was landed with ongoing or failed builds. Jun 25 2021, 1:54 AM

Closed by commit rG51e434fc2590: [AArch64] Custom lower <4 x i8> loads (authored by SjoerdMeijer).

This revision was automatically updated to reflect the committed changes.

SjoerdMeijer added a commit: rG51e434fc2590: [AArch64] Custom lower <4 x i8> loads.

@SjoerdMeijer Since this commit, test-suite::GCC-C-execute-pr60960.test has been failing on our AArch64 bots:
https://lab.llvm.org/buildbot/#/builders/185/builds/40

(we moved them around recently so I think we missed building this commit when it first landed)

https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Regression/C/gcc-c-torture/execute/pr60960.c

In D104782#2844177, @DavidSpickett wrote:

@SjoerdMeijer Since this commit, test-suite::GCC-C-execute-pr60960.test has been failing on our AArch64 bots:
https://lab.llvm.org/buildbot/#/builders/185/builds/40

(we moved them around recently so I think we missed building this commit when it first landed)

https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/Regression/C/gcc-c-torture/execute/pr60960.c

Owwww....... thanks for reporting. I am looking into this. I will do a first finger on the pulse before I revert this since I haven't heard any other complaints, but let me know if you prefer to first revert it.

There is something going on that I will have to look at tomorrow, so will revert this.

SjoerdMeijer added a reverting change: rG3a7cea2858ff: Revert "[AArch64] Custom lower <4 x i8> loads". Jun 28 2021, 9:45 AM

SjoerdMeijer mentioned this in D105110: [AArch64] Fix for custom lowering <4 x i8> loads. Jun 29 2021, 6:13 AM

SjoerdMeijer mentioned this in rGb062fff87adc: Recommit "[AArch64] Custom lower <4 x i8> loads". Jun 30 2021, 1:19 AM

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64ISelLowering.h	1 line
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp	43 lines
llvm/test/CodeGen/AArch64/	196 lines, 33 lines, 20 lines, 20 lines, 20 lines, 20 lines

Diff 354446

llvm/lib/Target/AArch64/AArch64ISelLowering.h

[...]
private:

SDValue LowerCallResult(SDValue Chain, SDValue InFlag,		SDValue LowerCallResult(SDValue Chain, SDValue InFlag,
CallingConv::ID CallConv, bool isVarArg,		CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins,		const SmallVectorImpl<ISD::InputArg> &Ins,
const SDLoc &DL, SelectionDAG &DAG,		const SDLoc &DL, SelectionDAG &DAG,
SmallVectorImpl<SDValue> &InVals, bool isThisReturn,		SmallVectorImpl<SDValue> &InVals, bool isThisReturn,
SDValue ThisVal) const;		SDValue ThisVal) const;

		SDValue LowerLOAD(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerSTORE(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerABS(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerABS(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerMGATHER(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerMGATHER(SDValue Op, SelectionDAG &DAG) const;
SDValue LowerMSCATTER(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerMSCATTER(SDValue Op, SelectionDAG &DAG) const;

SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;		SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

[...]
if (Subtarget->hasFullFP16()) {
setOperationAction(ISD::FROUNDEVEN, Ty, Legal);		setOperationAction(ISD::FROUNDEVEN, Ty, Legal);
}		}
}		}

if (Subtarget->hasSVE())		if (Subtarget->hasSVE())
setOperationAction(ISD::VSCALE, MVT::i32, Custom);		setOperationAction(ISD::VSCALE, MVT::i32, Custom);

setTruncStoreAction(MVT::v4i16, MVT::v4i8, Custom);		setTruncStoreAction(MVT::v4i16, MVT::v4i8, Custom);

		setLoadExtAction(ISD::EXTLOAD, MVT::v4i16, MVT::v4i8, Custom);
		setLoadExtAction(ISD::SEXTLOAD, MVT::v4i16, MVT::v4i8, Custom);
		setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i16, MVT::v4i8, Custom);
		setLoadExtAction(ISD::EXTLOAD, MVT::v4i32, MVT::v4i8, Custom);
		setLoadExtAction(ISD::SEXTLOAD, MVT::v4i32, MVT::v4i8, Custom);
		setLoadExtAction(ISD::ZEXTLOAD, MVT::v4i32, MVT::v4i8, Custom);
}		}

if (Subtarget->hasSVE()) {		if (Subtarget->hasSVE()) {
for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {		for (auto VT : {MVT::nxv16i8, MVT::nxv8i16, MVT::nxv4i32, MVT::nxv2i64}) {
setOperationAction(ISD::BITREVERSE, VT, Custom);		setOperationAction(ISD::BITREVERSE, VT, Custom);
setOperationAction(ISD::BSWAP, VT, Custom);		setOperationAction(ISD::BSWAP, VT, Custom);
setOperationAction(ISD::CTLZ, VT, Custom);		setOperationAction(ISD::CTLZ, VT, Custom);
setOperationAction(ISD::CTPOP, VT, Custom);		setOperationAction(ISD::CTPOP, VT, Custom);
[...]
SDValue Result = DAG.getMemIntrinsicNode(
{StoreNode->getChain(), Lo, Hi, StoreNode->getBasePtr()},		{StoreNode->getChain(), Lo, Hi, StoreNode->getBasePtr()},
StoreNode->getMemoryVT(), StoreNode->getMemOperand());		StoreNode->getMemoryVT(), StoreNode->getMemOperand());
return Result;		return Result;
}		}

return SDValue();		return SDValue();
}		}

		// Custom lowering for extending v4i8 vector loads.
		SDValue AArch64TargetLowering::LowerLOAD(SDValue Op,
		                                         SelectionDAG &DAG) const {
		  SDLoc DL(Op);
		  LoadSDNode *LoadNode = cast<LoadSDNode>(Op);
		  assert(LoadNode && "Expected custom lowering of a load node");
		  EVT VT = Op->getValueType(0);
		  assert((VT == MVT::v4i16 || VT == MVT::v4i32) && "Expected v4i16 or v4i32");

		  if (LoadNode->getMemoryVT() != MVT::v4i8)
		    return SDValue();

		  unsigned ExtType;
		  if (LoadNode->getExtensionType() == ISD::SEXTLOAD)
		    ExtType = ISD::SIGN_EXTEND;
		  else if (LoadNode->getExtensionType() == ISD::ZEXTLOAD ||
		           LoadNode->getExtensionType() == ISD::EXTLOAD)
		    ExtType = ISD::ZERO_EXTEND;
		  else
		    return SDValue();

		  SDValue Load = DAG.getLoad(MVT::f32, DL, DAG.getEntryNode(),
		                             LoadNode->getBasePtr(), MachinePointerInfo());
		  SDValue Chain = Load.getValue(1);
		  SDValue Vec = DAG.getNode(ISD::SCALAR_TO_VECTOR, DL, MVT::v2f32, Load);
		  SDValue BC = DAG.getNode(ISD::BITCAST, DL, MVT::v8i8, Vec);
		  SDValue Ext = DAG.getNode(ExtType, DL, MVT::v8i16, BC);
		  Ext = DAG.getNode(ISD::EXTRACT_SUBVECTOR, DL, MVT::v4i16, Ext,
		                    DAG.getConstant(0, DL, MVT::i64));
		  if (VT == MVT::v4i32)
		    Ext = DAG.getNode(ExtType, DL, MVT::v4i32, Ext);
		  return DAG.getMergeValues({Ext, Chain}, DL);
		}

// Generate SUBS and CSEL for integer abs.		// Generate SUBS and CSEL for integer abs.
SDValue AArch64TargetLowering::LowerABS(SDValue Op, SelectionDAG &DAG) const {		SDValue AArch64TargetLowering::LowerABS(SDValue Op, SelectionDAG &DAG) const {
MVT VT = Op.getSimpleValueType();		MVT VT = Op.getSimpleValueType();

if (VT.isVector())		if (VT.isVector())
return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABS_MERGE_PASSTHRU);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::ABS_MERGE_PASSTHRU);

SDLoc DL(Op);		SDLoc DL(Op);
[...]
SDValue AArch64TargetLowering::LowerOperation(SDValue Op,
}		}
case ISD::TRUNCATE:		case ISD::TRUNCATE:
return LowerTRUNCATE(Op, DAG);		return LowerTRUNCATE(Op, DAG);
case ISD::MLOAD:		case ISD::MLOAD:
return LowerFixedLengthVectorMLoadToSVE(Op, DAG);		return LowerFixedLengthVectorMLoadToSVE(Op, DAG);
case ISD::LOAD:		case ISD::LOAD:
if (useSVEForFixedLengthVectorVT(Op.getValueType()))		if (useSVEForFixedLengthVectorVT(Op.getValueType()))
return LowerFixedLengthVectorLoadToSVE(Op, DAG);		return LowerFixedLengthVectorLoadToSVE(Op, DAG);
llvm_unreachable("Unexpected request to lower ISD::LOAD");		return LowerLOAD(Op, DAG);
case ISD::ADD:		case ISD::ADD:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::ADD_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::ADD_PRED);
case ISD::AND:		case ISD::AND:
return LowerToScalableOp(Op, DAG);		return LowerToScalableOp(Op, DAG);
case ISD::SUB:		case ISD::SUB:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::SUB_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::SUB_PRED);
case ISD::FMAXIMUM:		case ISD::FMAXIMUM:
return LowerToPredicatedOp(Op, DAG, AArch64ISD::FMAX_PRED);		return LowerToPredicatedOp(Op, DAG, AArch64ISD::FMAX_PRED);
llvm/test/CodeGen/AArch64/aarch64-load-ext.ll

	[...]
	; CHECK-BE-NEXT: ret			; CHECK-BE-NEXT: ret
	%v2i8 = load <2 x i8>, <2 x i8>* %v2i8_ptr			%v2i8 = load <2 x i8>, <2 x i8>* %v2i8_ptr
	ret <2 x i8> %v2i8			ret <2 x i8> %v2i8
	}			}

	define <4 x i8> @test4(<4 x i8>* %v4i8_ptr) {			define <4 x i8> @test4(<4 x i8>* %v4i8_ptr) {
	; CHECK-LE-LABEL: test4:			; CHECK-LE-LABEL: test4:
	; CHECK-LE: // %bb.0:			; CHECK-LE: // %bb.0:
	; CHECK-LE-NEXT: ld1 { v0.b }[0], [x0]			; CHECK-LE-NEXT: ldr s0, [x0]
	; CHECK-LE-NEXT: add x8, x0, #1 // =1			; CHECK-LE-NEXT: ushll v0.8h, v0.8b, #0
	; CHECK-LE-NEXT: ld1 { v0.b }[2], [x8]
	; CHECK-LE-NEXT: add x8, x0, #2 // =2
	; CHECK-LE-NEXT: ld1 { v0.b }[4], [x8]
	; CHECK-LE-NEXT: add x8, x0, #3 // =3
	; CHECK-LE-NEXT: ld1 { v0.b }[6], [x8]
	; CHECK-LE-NEXT: // kill: def $d0 killed $d0 killed $q0			; CHECK-LE-NEXT: // kill: def $d0 killed $d0 killed $q0
	; CHECK-LE-NEXT: ret			; CHECK-LE-NEXT: ret
	;			;
	; CHECK-BE-LABEL: test4:			; CHECK-BE-LABEL: test4:
	; CHECK-BE: // %bb.0:			; CHECK-BE: // %bb.0:
	; CHECK-BE-NEXT: ld1 { v0.b }[0], [x0]			; CHECK-BE-NEXT: ldr s0, [x0]
	; CHECK-BE-NEXT: add x8, x0, #1 // =1			; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
	; CHECK-BE-NEXT: ld1 { v0.b }[2], [x8]			; CHECK-BE-NEXT: ushll v0.8h, v0.8b, #0
	; CHECK-BE-NEXT: add x8, x0, #2 // =2
	; CHECK-BE-NEXT: ld1 { v0.b }[4], [x8]
	; CHECK-BE-NEXT: add x8, x0, #3 // =3
	; CHECK-BE-NEXT: ld1 { v0.b }[6], [x8]
	; CHECK-BE-NEXT: rev64 v0.4h, v0.4h			; CHECK-BE-NEXT: rev64 v0.4h, v0.4h
	; CHECK-BE-NEXT: ret			; CHECK-BE-NEXT: ret
	%v4i8 = load <4 x i8>, <4 x i8>* %v4i8_ptr			%v4i8 = load <4 x i8>, <4 x i8>* %v4i8_ptr
	ret <4 x i8> %v4i8			ret <4 x i8> %v4i8
	}			}

				define <4 x i32> @fsext_v4i32(<4 x i8>* %a) {
				; CHECK-LE-LABEL: fsext_v4i32:
				; CHECK-LE: // %bb.0:
				; CHECK-LE-NEXT: ldr s0, [x0]
				; CHECK-LE-NEXT: sshll v0.8h, v0.8b, #0
				; CHECK-LE-NEXT: sshll v0.4s, v0.4h, #0
				; CHECK-LE-NEXT: ret
				;
				; CHECK-BE-LABEL: fsext_v4i32:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: ldr s0, [x0]
				; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
				; CHECK-BE-NEXT: sshll v0.8h, v0.8b, #0
				; CHECK-BE-NEXT: sshll v0.4s, v0.4h, #0
				; CHECK-BE-NEXT: rev64 v0.4s, v0.4s
				; CHECK-BE-NEXT: ext v0.16b, v0.16b, v0.16b, #8
				; CHECK-BE-NEXT: ret
				%x = load <4 x i8>, <4 x i8>* %a
				%y = sext <4 x i8> %x to <4 x i32>
				ret <4 x i32> %y
				}

				define <4 x i32> @fzext_v4i32(<4 x i8>* %a) {
				; CHECK-LE-LABEL: fzext_v4i32:
				; CHECK-LE: // %bb.0:
				; CHECK-LE-NEXT: ldr s0, [x0]
				; CHECK-LE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-LE-NEXT: ushll v0.4s, v0.4h, #0
				; CHECK-LE-NEXT: ret
				;
				; CHECK-BE-LABEL: fzext_v4i32:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: ldr s0, [x0]
				; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
				; CHECK-BE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-BE-NEXT: ushll v0.4s, v0.4h, #0
				; CHECK-BE-NEXT: rev64 v0.4s, v0.4s
				; CHECK-BE-NEXT: ext v0.16b, v0.16b, v0.16b, #8
				; CHECK-BE-NEXT: ret
				%x = load <4 x i8>, <4 x i8>* %a
				%y = zext <4 x i8> %x to <4 x i32>
				ret <4 x i32> %y
				}

				; TODO: This codegen could just be:
				; ldrb w0, [x0]
				;
				define i32 @loadExti32(<4 x i8>* %ref) {
				; CHECK-LE-LABEL: loadExti32:
				; CHECK-LE: // %bb.0:
				; CHECK-LE-NEXT: ldr s0, [x0]
				; CHECK-LE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-LE-NEXT: umov w8, v0.h[0]
				; CHECK-LE-NEXT: and w0, w8, #0xff
				; CHECK-LE-NEXT: ret
				;
				; CHECK-BE-LABEL: loadExti32:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: ldr s0, [x0]
				; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
				; CHECK-BE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-BE-NEXT: umov w8, v0.h[0]
				; CHECK-BE-NEXT: and w0, w8, #0xff
				; CHECK-BE-NEXT: ret
				%a = load <4 x i8>, <4 x i8>* %ref
				%vecext = extractelement <4 x i8> %a, i32 0
				%conv = zext i8 %vecext to i32
				ret i32 %conv
				}

				define <4 x i16> @fsext_v4i16(<4 x i8>* %a) {
				; CHECK-LE-LABEL: fsext_v4i16:
				; CHECK-LE: // %bb.0:
				; CHECK-LE-NEXT: ldr s0, [x0]
				; CHECK-LE-NEXT: sshll v0.8h, v0.8b, #0
				; CHECK-LE-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-LE-NEXT: ret
				;
				; CHECK-BE-LABEL: fsext_v4i16:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: ldr s0, [x0]
				; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
				; CHECK-BE-NEXT: sshll v0.8h, v0.8b, #0
				; CHECK-BE-NEXT: rev64 v0.4h, v0.4h
				; CHECK-BE-NEXT: ret
				%x = load <4 x i8>, <4 x i8>* %a
				%y = sext <4 x i8> %x to <4 x i16>
				ret <4 x i16> %y
				}

				define <4 x i16> @fzext_v4i16(<4 x i8>* %a) {
				; CHECK-LE-LABEL: fzext_v4i16:
				; CHECK-LE: // %bb.0:
				; CHECK-LE-NEXT: ldr s0, [x0]
				; CHECK-LE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-LE-NEXT: // kill: def $d0 killed $d0 killed $q0
				; CHECK-LE-NEXT: ret
				;
				; CHECK-BE-LABEL: fzext_v4i16:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: ldr s0, [x0]
				; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
				; CHECK-BE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-BE-NEXT: rev64 v0.4h, v0.4h
				; CHECK-BE-NEXT: ret
				%x = load <4 x i8>, <4 x i8>* %a
				%y = zext <4 x i8> %x to <4 x i16>
				ret <4 x i16> %y
				}

				define <4 x i16> @anyext_v4i16(<4 x i8>* %a, <4 x i8>* %b) {
				; CHECK-LE-LABEL: anyext_v4i16:
				; CHECK-LE: // %bb.0:
				; CHECK-LE-NEXT: ldr s0, [x0]
				; CHECK-LE-NEXT: ldr s1, [x1]
				; CHECK-LE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-LE-NEXT: ushll v1.8h, v1.8b, #0
				; CHECK-LE-NEXT: add v0.4h, v0.4h, v1.4h
				; CHECK-LE-NEXT: shl v0.4h, v0.4h, #8
				; CHECK-LE-NEXT: sshr v0.4h, v0.4h, #8
				; CHECK-LE-NEXT: ret
				;
				; CHECK-BE-LABEL: anyext_v4i16:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: ldr s0, [x0]
				; CHECK-BE-NEXT: ldr s1, [x1]
				; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
				; CHECK-BE-NEXT: rev32 v1.8b, v1.8b
				; CHECK-BE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-BE-NEXT: ushll v1.8h, v1.8b, #0
				; CHECK-BE-NEXT: add v0.4h, v0.4h, v1.4h
				; CHECK-BE-NEXT: shl v0.4h, v0.4h, #8
				; CHECK-BE-NEXT: sshr v0.4h, v0.4h, #8
				; CHECK-BE-NEXT: rev64 v0.4h, v0.4h
				; CHECK-BE-NEXT: ret
				%x = load <4 x i8>, <4 x i8>* %a, align 4
				%y = load <4 x i8>, <4 x i8>* %b, align 4
				%z = add <4 x i8> %x, %y
				%s = sext <4 x i8> %z to <4 x i16>
				ret <4 x i16> %s
				}

				define <4 x i32> @anyext_v4i32(<4 x i8>* %a, <4 x i8>* %b) {
				; CHECK-LE-LABEL: anyext_v4i32:
				; CHECK-LE: // %bb.0:
				; CHECK-LE-NEXT: ldr s0, [x0]
				; CHECK-LE-NEXT: ldr s1, [x1]
				; CHECK-LE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-LE-NEXT: ushll v1.8h, v1.8b, #0
				; CHECK-LE-NEXT: add v0.4h, v0.4h, v1.4h
				; CHECK-LE-NEXT: ushll v0.4s, v0.4h, #0
				; CHECK-LE-NEXT: shl v0.4s, v0.4s, #24
				; CHECK-LE-NEXT: sshr v0.4s, v0.4s, #24
				; CHECK-LE-NEXT: ret
				;
				; CHECK-BE-LABEL: anyext_v4i32:
				; CHECK-BE: // %bb.0:
				; CHECK-BE-NEXT: ldr s0, [x0]
				; CHECK-BE-NEXT: ldr s1, [x1]
				; CHECK-BE-NEXT: rev32 v0.8b, v0.8b
				; CHECK-BE-NEXT: rev32 v1.8b, v1.8b
				; CHECK-BE-NEXT: ushll v0.8h, v0.8b, #0
				; CHECK-BE-NEXT: ushll v1.8h, v1.8b, #0
				; CHECK-BE-NEXT: add v0.4h, v0.4h, v1.4h
				; CHECK-BE-NEXT: ushll v0.4s, v0.4h, #0
				; CHECK-BE-NEXT: shl v0.4s, v0.4s, #24
				; CHECK-BE-NEXT: sshr v0.4s, v0.4s, #24
				; CHECK-BE-NEXT: rev64 v0.4s, v0.4s
				; CHECK-BE-NEXT: ext v0.16b, v0.16b, v0.16b, #8
				; CHECK-BE-NEXT: ret
				%x = load <4 x i8>, <4 x i8>* %a, align 4
				%y = load <4 x i8>, <4 x i8>* %b, align 4
				%z = add <4 x i8> %x, %y
				%s = sext <4 x i8> %z to <4 x i32>
				ret <4 x i32> %s
				}

llvm/test/CodeGen/AArch64/arm64-vshift.ll

	[...]
	;CHECK: ushl.8h v0, v0, v0			;CHECK: ushl.8h v0, v0, v0
	%tmp1 = load <8 x i8>, <8 x i8>* %A			%tmp1 = load <8 x i8>, <8 x i8>* %A
	%tmp2 = zext <8 x i8> %tmp1 to <8 x i16>			%tmp2 = zext <8 x i8> %tmp1 to <8 x i16>
	%tmp3 = call <8 x i16> @llvm.aarch64.neon.ushl.v8i16(<8 x i16> %tmp2, <8 x i16> %tmp2)			%tmp3 = call <8 x i16> @llvm.aarch64.neon.ushl.v8i16(<8 x i16> %tmp2, <8 x i16> %tmp2)
	ret <8 x i16> %tmp3			ret <8 x i16> %tmp3
	}			}

	define <4 x i32> @neon.ushl8h_constant_shift_extend_not_2x(<4 x i8>* %A) nounwind {			define <4 x i32> @neon.ushl8h_constant_shift_extend_not_2x(<4 x i8>* %A) nounwind {
	;CHECK-LABEL: @neon.ushl8h_constant_shift_extend_not_2x			; CHECK-LABEL: neon.ushl8h_constant_shift_extend_not_2x:
	;CHECK-NOT: ushll.8h v0,			; CHECK: // %bb.0:
	;CHECK: ldrb w8, [x0]			; CHECK-NEXT: ldr s0, [x0]
	;CHECK: fmov s0, w8			; CHECK-NEXT: ushll.8h v0, v0, #0
	;CHECK: ldrb w8, [x0, #1]			; CHECK-NEXT: ushll.4s v0, v0, #1
	;CHECK: mov.s v0[1], w8			; CHECK-NEXT: ret
	;CHECK: ldrb w8, [x0, #2]
	;CHECK: mov.s v0[2], w8
	;CHECK: ldrb w8, [x0, #3]
	;CHECK: mov.s v0[3], w8
	;CHECK: shl.4s v0, v0, #1
	%tmp1 = load <4 x i8>, <4 x i8>* %A			%tmp1 = load <4 x i8>, <4 x i8>* %A
	%tmp2 = zext <4 x i8> %tmp1 to <4 x i32>			%tmp2 = zext <4 x i8> %tmp1 to <4 x i32>
	%tmp3 = call <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)			%tmp3 = call <4 x i32> @llvm.aarch64.neon.ushl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
	ret <4 x i32> %tmp3			ret <4 x i32> %tmp3
	}			}

	define <8 x i16> @neon.ushl8_noext_constant_shift(<8 x i16>* %A) nounwind {			define <8 x i16> @neon.ushl8_noext_constant_shift(<8 x i16>* %A) nounwind {
	; CHECK-LABEL: neon.ushl8_noext_constant_shift			; CHECK-LABEL: neon.ushl8_noext_constant_shift
	[...]
	;CHECK: sshll.8h v0, {{v[0-9]+}}, #1			;CHECK: sshll.8h v0, {{v[0-9]+}}, #1
	%tmp1 = load <8 x i8>, <8 x i8>* %A			%tmp1 = load <8 x i8>, <8 x i8>* %A
	%tmp2 = sext <8 x i8> %tmp1 to <8 x i16>			%tmp2 = sext <8 x i8> %tmp1 to <8 x i16>
	%tmp3 = call <8 x i16> @llvm.aarch64.neon.sshl.v8i16(<8 x i16> %tmp2, <8 x i16> <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>)			%tmp3 = call <8 x i16> @llvm.aarch64.neon.sshl.v8i16(<8 x i16> %tmp2, <8 x i16> <i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1, i16 1>)
	ret <8 x i16> %tmp3			ret <8 x i16> %tmp3
	}			}

	define <4 x i32> @neon.sshl4s_wrong_ext_constant_shift(<4 x i8>* %A) nounwind {			define <4 x i32> @neon.sshl4s_wrong_ext_constant_shift(<4 x i8>* %A) nounwind {
	;CHECK-LABEL: neon.sshl4s_wrong_ext_constant_shift			; CHECK-LABEL: neon.sshl4s_wrong_ext_constant_shift:
	;CHECK: ldrsb w8, [x0]			; CHECK: // %bb.0:
	;CHECK-NEXT: fmov s0, w8			; CHECK-NEXT: ldr s0, [x0]
	;CHECK-NEXT: ldrsb w8, [x0, #1]			; CHECK-NEXT: sshll.8h v0, v0, #0
	;CHECK-NEXT: mov.s v0[1], w8			; CHECK-NEXT: sshll.4s v0, v0, #1
	;CHECK-NEXT: ldrsb w8, [x0, #2]			; CHECK-NEXT: ret
	;CHECK-NEXT: mov.s v0[2], w8
	;CHECK-NEXT: ldrsb w8, [x0, #3]
	;CHECK-NEXT: mov.s v0[3], w8
	;CHECK-NEXT: shl.4s v0, v0, #1
	%tmp1 = load <4 x i8>, <4 x i8>* %A			%tmp1 = load <4 x i8>, <4 x i8>* %A
	%tmp2 = sext <4 x i8> %tmp1 to <4 x i32>			%tmp2 = sext <4 x i8> %tmp1 to <4 x i32>
	%tmp3 = call <4 x i32> @llvm.aarch64.neon.sshl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)			%tmp3 = call <4 x i32> @llvm.aarch64.neon.sshl.v4i32(<4 x i32> %tmp2, <4 x i32> <i32 1, i32 1, i32 1, i32 1>)
	ret <4 x i32> %tmp3			ret <4 x i32> %tmp3
	}			}

	define <4 x i32> @neon.sshll4s_constant_shift(<4 x i16>* %A) nounwind {			define <4 x i32> @neon.sshll4s_constant_shift(<4 x i16>* %A) nounwind {
	;CHECK-LABEL: neon.sshll4s_constant_shift			;CHECK-LABEL: neon.sshll4s_constant_shift
llvm/test/CodeGen/AArch64/neon-extload.ll

This file was deleted.

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64-none-linux-gnu -mattr=+neon | FileCheck %s --check-prefix=LE
	; RUN: llc < %s -verify-machineinstrs -mtriple=aarch64_be-none-linux-gnu -mattr=+neon | FileCheck %s --check-prefix=BE

	define <4 x i32> @fsext_v4i32(<4 x i8>* %a) {
	; LE-LABEL: fsext_v4i32:
	; LE: // %bb.0:
	; LE-NEXT: ldrsb w8, [x0]
	; LE-NEXT: ldrsb w9, [x0, #1]
	; LE-NEXT: ldrsb w10, [x0, #2]
	; LE-NEXT: ldrsb w11, [x0, #3]
	; LE-NEXT: fmov s0, w8
	; LE-NEXT: mov v0.s[1], w9
	; LE-NEXT: mov v0.s[2], w10
	; LE-NEXT: mov v0.s[3], w11
	; LE-NEXT: ret
	;
	; BE-LABEL: fsext_v4i32:
	; BE: // %bb.0:
	; BE-NEXT: ldrsb w8, [x0]
	; BE-NEXT: ldrsb w9, [x0, #1]
	; BE-NEXT: ldrsb w10, [x0, #2]
	; BE-NEXT: ldrsb w11, [x0, #3]
	; BE-NEXT: fmov s0, w8
	; BE-NEXT: mov v0.s[1], w9
	; BE-NEXT: mov v0.s[2], w10
	; BE-NEXT: mov v0.s[3], w11
	; BE-NEXT: rev64 v0.4s, v0.4s
	; BE-NEXT: ext v0.16b, v0.16b, v0.16b, #8
	; BE-NEXT: ret
	%x = load <4 x i8>, <4 x i8>* %a
	%y = sext <4 x i8> %x to <4 x i32>
	ret <4 x i32> %y
	}

	define <4 x i32> @fzext_v4i32(<4 x i8>* %a) {
	; LE-LABEL: fzext_v4i32:
	; LE: // %bb.0:
	; LE-NEXT: ldrb w8, [x0]
	; LE-NEXT: ldrb w9, [x0, #1]
	; LE-NEXT: ldrb w10, [x0, #2]
	; LE-NEXT: ldrb w11, [x0, #3]
	; LE-NEXT: fmov s0, w8
	; LE-NEXT: mov v0.s[1], w9
	; LE-NEXT: mov v0.s[2], w10
	; LE-NEXT: mov v0.s[3], w11
	; LE-NEXT: ret
	;
	; BE-LABEL: fzext_v4i32:
	; BE: // %bb.0:
	; BE-NEXT: ldrb w8, [x0]
	; BE-NEXT: ldrb w9, [x0, #1]
	; BE-NEXT: ldrb w10, [x0, #2]
	; BE-NEXT: ldrb w11, [x0, #3]
	; BE-NEXT: fmov s0, w8
	; BE-NEXT: mov v0.s[1], w9
	; BE-NEXT: mov v0.s[2], w10
	; BE-NEXT: mov v0.s[3], w11
	; BE-NEXT: rev64 v0.4s, v0.4s
	; BE-NEXT: ext v0.16b, v0.16b, v0.16b, #8
	; BE-NEXT: ret
	%x = load <4 x i8>, <4 x i8>* %a
	%y = zext <4 x i8> %x to <4 x i32>
	ret <4 x i32> %y
	}

	define i32 @loadExt.i32(<4 x i8>* %ref) {
	; CHECK-LABEL: loadExt.i32:
	; CHECK: ldrb
	; LE-LABEL: loadExt.i32:
	; LE: // %bb.0:
	; LE-NEXT: ldrb w0, [x0]
	; LE-NEXT: ret
	;
	; BE-LABEL: loadExt.i32:
	; BE: // %bb.0:
	; BE-NEXT: ldrb w0, [x0]
	; BE-NEXT: ret
	%a = load <4 x i8>, <4 x i8>* %ref
	%vecext = extractelement <4 x i8> %a, i32 0
	%conv = zext i8 %vecext to i32
	ret i32 %conv
	}

	define <4 x i16> @fsext_v4i16(<4 x i8>* %a) {
	; LE-LABEL: fsext_v4i16:
	; LE: // %bb.0:
	; LE-NEXT: ldrsb w8, [x0]
	; LE-NEXT: ldrsb w9, [x0, #1]
	; LE-NEXT: ldrsb w10, [x0, #2]
	; LE-NEXT: ldrsb w11, [x0, #3]
	; LE-NEXT: fmov s0, w8
	; LE-NEXT: mov v0.h[1], w9
	; LE-NEXT: mov v0.h[2], w10
	; LE-NEXT: mov v0.h[3], w11
	; LE-NEXT: // kill: def $d0 killed $d0 killed $q0
	; LE-NEXT: ret
	;
	; BE-LABEL: fsext_v4i16:
	; BE: // %bb.0:
	; BE-NEXT: ldrsb w8, [x0]
	; BE-NEXT: ldrsb w9, [x0, #1]
	; BE-NEXT: ldrsb w10, [x0, #2]
	; BE-NEXT: ldrsb w11, [x0, #3]
	; BE-NEXT: fmov s0, w8
	; BE-NEXT: mov v0.h[1], w9
	; BE-NEXT: mov v0.h[2], w10
	; BE-NEXT: mov v0.h[3], w11
	; BE-NEXT: rev64 v0.4h, v0.4h
	; BE-NEXT: ret
	%x = load <4 x i8>, <4 x i8>* %a
	%y = sext <4 x i8> %x to <4 x i16>
	ret <4 x i16> %y
	}

	define <4 x i16> @fzext_v4i16(<4 x i8>* %a) {
	; LE-LABEL: fzext_v4i16:
	; LE: // %bb.0:
	; LE-NEXT: ldrb w8, [x0]
	; LE-NEXT: ldrb w9, [x0, #1]
	; LE-NEXT: ldrb w10, [x0, #2]
	; LE-NEXT: ldrb w11, [x0, #3]
	; LE-NEXT: fmov s0, w8
	; LE-NEXT: mov v0.h[1], w9
	; LE-NEXT: mov v0.h[2], w10
	; LE-NEXT: mov v0.h[3], w11
	; LE-NEXT: // kill: def $d0 killed $d0 killed $q0
	; LE-NEXT: ret
	;
	; BE-LABEL: fzext_v4i16:
	; BE: // %bb.0:
	; BE-NEXT: ldrb w8, [x0]
	; BE-NEXT: ldrb w9, [x0, #1]
	; BE-NEXT: ldrb w10, [x0, #2]
	; BE-NEXT: ldrb w11, [x0, #3]
	; BE-NEXT: fmov s0, w8
	; BE-NEXT: mov v0.h[1], w9
	; BE-NEXT: mov v0.h[2], w10
	; BE-NEXT: mov v0.h[3], w11
	; BE-NEXT: rev64 v0.4h, v0.4h
	; BE-NEXT: ret
	%x = load <4 x i8>, <4 x i8>* %a
	%y = zext <4 x i8> %x to <4 x i16>
	ret <4 x i16> %y
	}

llvm/test/CodeGen/AArch64/sadd_sat_vec.ll

[...]
; CHECK-NEXT: ret
%z = call <8 x i8> @llvm.sadd.sat.v8i8(<8 x i8> %x, <8 x i8> %y)		%z = call <8 x i8> @llvm.sadd.sat.v8i8(<8 x i8> %x, <8 x i8> %y)
store <8 x i8> %z, <8 x i8>* %pz		store <8 x i8> %z, <8 x i8>* %pz
ret void		ret void
}		}

define void @v4i8(<4 x i8>* %px, <4 x i8>* %py, <4 x i8>* %pz) nounwind {		define void @v4i8(<4 x i8>* %px, <4 x i8>* %py, <4 x i8>* %pz) nounwind {
; CHECK-LABEL: v4i8:		; CHECK-LABEL: v4i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ldrsb w8, [x0]		; CHECK-NEXT: ldr s0, [x0]
; CHECK-NEXT: ldrsb w9, [x1]		; CHECK-NEXT: ldr s1, [x1]
; CHECK-NEXT: ldrsb w10, [x0, #1]		; CHECK-NEXT: sshll v0.8h, v0.8b, #0
; CHECK-NEXT: ldrsb w11, [x1, #1]		; CHECK-NEXT: sshll v1.8h, v1.8b, #0
; CHECK-NEXT: fmov s0, w8
; CHECK-NEXT: fmov s1, w9
; CHECK-NEXT: ldrsb w8, [x0, #2]
; CHECK-NEXT: ldrsb w9, [x1, #2]
; CHECK-NEXT: mov v0.h[1], w10
; CHECK-NEXT: mov v1.h[1], w11
; CHECK-NEXT: ldrsb w10, [x0, #3]
; CHECK-NEXT: ldrsb w11, [x1, #3]
; CHECK-NEXT: mov v0.h[2], w8
; CHECK-NEXT: mov v1.h[2], w9
; CHECK-NEXT: mov v0.h[3], w10
; CHECK-NEXT: mov v1.h[3], w11
; CHECK-NEXT: shl v1.4h, v1.4h, #8		; CHECK-NEXT: shl v1.4h, v1.4h, #8
; CHECK-NEXT: shl v0.4h, v0.4h, #8		; CHECK-NEXT: shl v0.4h, v0.4h, #8
; CHECK-NEXT: sqadd v0.4h, v0.4h, v1.4h		; CHECK-NEXT: sqadd v0.4h, v0.4h, v1.4h
; CHECK-NEXT: sshr v0.4h, v0.4h, #8		; CHECK-NEXT: sshr v0.4h, v0.4h, #8
; CHECK-NEXT: xtn v0.8b, v0.8h		; CHECK-NEXT: xtn v0.8b, v0.8h
; CHECK-NEXT: str s0, [x2]		; CHECK-NEXT: str s0, [x2]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%x = load <4 x i8>, <4 x i8>* %px		%x = load <4 x i8>, <4 x i8>* %px
llvm/test/CodeGen/AArch64/ssub_sat_vec.ll

[...]
; CHECK-NEXT: ret
%z = call <8 x i8> @llvm.ssub.sat.v8i8(<8 x i8> %x, <8 x i8> %y)		%z = call <8 x i8> @llvm.ssub.sat.v8i8(<8 x i8> %x, <8 x i8> %y)
store <8 x i8> %z, <8 x i8>* %pz		store <8 x i8> %z, <8 x i8>* %pz
ret void		ret void
}		}

define void @v4i8(<4 x i8>* %px, <4 x i8>* %py, <4 x i8>* %pz) nounwind {		define void @v4i8(<4 x i8>* %px, <4 x i8>* %py, <4 x i8>* %pz) nounwind {
; CHECK-LABEL: v4i8:		; CHECK-LABEL: v4i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ldrsb w8, [x0]		; CHECK-NEXT: ldr s0, [x0]
; CHECK-NEXT: ldrsb w9, [x1]		; CHECK-NEXT: ldr s1, [x1]
; CHECK-NEXT: ldrsb w10, [x0, #1]		; CHECK-NEXT: sshll v0.8h, v0.8b, #0
; CHECK-NEXT: ldrsb w11, [x1, #1]		; CHECK-NEXT: sshll v1.8h, v1.8b, #0
; CHECK-NEXT: fmov s0, w8
; CHECK-NEXT: fmov s1, w9
; CHECK-NEXT: ldrsb w8, [x0, #2]
; CHECK-NEXT: ldrsb w9, [x1, #2]
; CHECK-NEXT: mov v0.h[1], w10
; CHECK-NEXT: mov v1.h[1], w11
; CHECK-NEXT: ldrsb w10, [x0, #3]
; CHECK-NEXT: ldrsb w11, [x1, #3]
; CHECK-NEXT: mov v0.h[2], w8
; CHECK-NEXT: mov v1.h[2], w9
; CHECK-NEXT: mov v0.h[3], w10
; CHECK-NEXT: mov v1.h[3], w11
; CHECK-NEXT: shl v1.4h, v1.4h, #8		; CHECK-NEXT: shl v1.4h, v1.4h, #8
; CHECK-NEXT: shl v0.4h, v0.4h, #8		; CHECK-NEXT: shl v0.4h, v0.4h, #8
; CHECK-NEXT: sqsub v0.4h, v0.4h, v1.4h		; CHECK-NEXT: sqsub v0.4h, v0.4h, v1.4h
; CHECK-NEXT: sshr v0.4h, v0.4h, #8		; CHECK-NEXT: sshr v0.4h, v0.4h, #8
; CHECK-NEXT: xtn v0.8b, v0.8h		; CHECK-NEXT: xtn v0.8b, v0.8h
; CHECK-NEXT: str s0, [x2]		; CHECK-NEXT: str s0, [x2]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%x = load <4 x i8>, <4 x i8>* %px		%x = load <4 x i8>, <4 x i8>* %px
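The unchanged tail of this check block (`shl #8`, `sqsub`, `sshr #8`) performs the i8 saturating subtract in 16-bit lanes. The per-lane arithmetic can be sketched in Python as follows. This is an illustrative model, not code from the patch, and the helper names are hypothetical.

```python
def to_i16(x):
    """Wrap an integer to a signed 16-bit lane value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def ssub_sat_i8_lane(a, b):
    """Model one lane of: shl #8; sqsub; sshr #8."""
    wa = to_i16(a << 8)                          # shl v0.4h, v0.4h, #8
    wb = to_i16(b << 8)                          # shl v1.4h, v1.4h, #8
    diff = max(-0x8000, min(0x7FFF, wa - wb))    # sqsub saturates at i16 limits
    return diff >> 8                             # sshr: arithmetic shift right
```

Moving each value into the top byte of its lane makes the 16-bit saturation limits line up with the 8-bit ones, so the arithmetic shift back down yields the i8 saturated result.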

llvm/test/CodeGen/AArch64/uadd_sat_vec.ll

; CHECK-NEXT: ret
%z = call <8 x i8> @llvm.uadd.sat.v8i8(<8 x i8> %x, <8 x i8> %y)		%z = call <8 x i8> @llvm.uadd.sat.v8i8(<8 x i8> %x, <8 x i8> %y)
store <8 x i8> %z, <8 x i8>* %pz		store <8 x i8> %z, <8 x i8>* %pz
ret void		ret void
}		}

define void @v4i8(<4 x i8>* %px, <4 x i8>* %py, <4 x i8>* %pz) nounwind {		define void @v4i8(<4 x i8>* %px, <4 x i8>* %py, <4 x i8>* %pz) nounwind {
; CHECK-LABEL: v4i8:		; CHECK-LABEL: v4i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ldrb w8, [x0]		; CHECK-NEXT: ldr s0, [x0]
; CHECK-NEXT: ldrb w9, [x1]		; CHECK-NEXT: ldr s1, [x1]
; CHECK-NEXT: ldrb w10, [x0, #1]
; CHECK-NEXT: ldrb w11, [x1, #1]
; CHECK-NEXT: ldrb w12, [x0, #2]
; CHECK-NEXT: fmov s0, w8
; CHECK-NEXT: ldrb w8, [x1, #2]
; CHECK-NEXT: fmov s1, w9
; CHECK-NEXT: mov v0.h[1], w10
; CHECK-NEXT: ldrb w9, [x0, #3]
; CHECK-NEXT: ldrb w10, [x1, #3]
; CHECK-NEXT: mov v1.h[1], w11
; CHECK-NEXT: mov v0.h[2], w12
; CHECK-NEXT: mov v1.h[2], w8
; CHECK-NEXT: mov v0.h[3], w9
; CHECK-NEXT: mov v1.h[3], w10
; CHECK-NEXT: movi d2, #0xff00ff00ff00ff		; CHECK-NEXT: movi d2, #0xff00ff00ff00ff
		; CHECK-NEXT: ushll v0.8h, v0.8b, #0
		; CHECK-NEXT: ushll v1.8h, v1.8b, #0
; CHECK-NEXT: add v0.4h, v0.4h, v1.4h		; CHECK-NEXT: add v0.4h, v0.4h, v1.4h
; CHECK-NEXT: umin v0.4h, v0.4h, v2.4h		; CHECK-NEXT: umin v0.4h, v0.4h, v2.4h
; CHECK-NEXT: xtn v0.8b, v0.8h		; CHECK-NEXT: xtn v0.8b, v0.8h
; CHECK-NEXT: str s0, [x2]		; CHECK-NEXT: str s0, [x2]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%x = load <4 x i8>, <4 x i8>* %px		%x = load <4 x i8>, <4 x i8>* %px
%y = load <4 x i8>, <4 x i8>* %py		%y = load <4 x i8>, <4 x i8>* %py
%z = call <4 x i8> @llvm.uadd.sat.v4i8(<4 x i8> %x, <4 x i8> %y)		%z = call <4 x i8> @llvm.uadd.sat.v4i8(<4 x i8> %x, <4 x i8> %y)
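In the unsigned case the lanes are zero-extended by `ushll`, added as 16-bit values, and clamped by `umin` against the `movi` constant, whose 16-bit lanes are each `0x00ff`. A per-lane Python sketch of that sequence, with a hypothetical function name, is:

```python
def uadd_sat_i8_lane(a, b):
    """Model one lane of: ushll #0; add; umin with 0x00ff; xtn."""
    wa = a & 0xFF                 # ushll: zero-extend i8 to i16
    wb = b & 0xFF
    total = (wa + wb) & 0xFFFF    # add v0.4h, v0.4h, v1.4h (cannot overflow 16 bits)
    total = min(total, 0x00FF)    # umin against the movi lanes
    return total & 0xFF           # xtn: narrow back to 8 bits
```

Because two zero-extended bytes sum to at most 510, the 16-bit add never wraps and the `umin` alone provides the saturation.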

llvm/test/CodeGen/AArch64/usub_sat_vec.ll

; CHECK-NEXT: ret
%z = call <8 x i8> @llvm.usub.sat.v8i8(<8 x i8> %x, <8 x i8> %y)		%z = call <8 x i8> @llvm.usub.sat.v8i8(<8 x i8> %x, <8 x i8> %y)
store <8 x i8> %z, <8 x i8>* %pz		store <8 x i8> %z, <8 x i8>* %pz
ret void		ret void
}		}

define void @v4i8(<4 x i8>* %px, <4 x i8>* %py, <4 x i8>* %pz) nounwind {		define void @v4i8(<4 x i8>* %px, <4 x i8>* %py, <4 x i8>* %pz) nounwind {
; CHECK-LABEL: v4i8:		; CHECK-LABEL: v4i8:
; CHECK: // %bb.0:		; CHECK: // %bb.0:
; CHECK-NEXT: ldrb w8, [x0]		; CHECK-NEXT: ldr s0, [x0]
; CHECK-NEXT: ldrb w9, [x1]		; CHECK-NEXT: ldr s1, [x1]
; CHECK-NEXT: ldrb w10, [x0, #1]		; CHECK-NEXT: ushll v0.8h, v0.8b, #0
; CHECK-NEXT: ldrb w11, [x1, #1]		; CHECK-NEXT: ushll v1.8h, v1.8b, #0
; CHECK-NEXT: fmov s0, w8
; CHECK-NEXT: fmov s1, w9
; CHECK-NEXT: ldrb w8, [x0, #2]
; CHECK-NEXT: ldrb w9, [x1, #2]
; CHECK-NEXT: mov v0.h[1], w10
; CHECK-NEXT: mov v1.h[1], w11
; CHECK-NEXT: ldrb w10, [x0, #3]
; CHECK-NEXT: ldrb w11, [x1, #3]
; CHECK-NEXT: mov v0.h[2], w8
; CHECK-NEXT: mov v1.h[2], w9
; CHECK-NEXT: mov v0.h[3], w10
; CHECK-NEXT: mov v1.h[3], w11
; CHECK-NEXT: uqsub v0.4h, v0.4h, v1.4h		; CHECK-NEXT: uqsub v0.4h, v0.4h, v1.4h
; CHECK-NEXT: xtn v0.8b, v0.8h		; CHECK-NEXT: xtn v0.8b, v0.8h
; CHECK-NEXT: str s0, [x2]		; CHECK-NEXT: str s0, [x2]
; CHECK-NEXT: ret		; CHECK-NEXT: ret
%x = load <4 x i8>, <4 x i8>* %px		%x = load <4 x i8>, <4 x i8>* %px
%y = load <4 x i8>, <4 x i8>* %py		%y = load <4 x i8>, <4 x i8>* %py
%z = call <4 x i8> @llvm.usub.sat.v4i8(<4 x i8> %x, <4 x i8> %y)		%z = call <4 x i8> @llvm.usub.sat.v4i8(<4 x i8> %x, <4 x i8> %y)
store <4 x i8> %z, <4 x i8>* %pz		store <4 x i8> %z, <4 x i8>* %pz

Diff 354446