This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add custom lowering for v4i8 trunc store
ClosedPublic

Authored by zatrazz on Jun 19 2018, 12:38 PM.

Details

Summary

This patch adds a custom truncating-store lowering for the v4i8 vector type.
Since there is no v.4b register, v4i8 is promoted to v4i16 (v.4h), and the
default action for a v4i8 store is to extract each element and issue four
byte stores.

A better strategy is to extend the promoted v4i16 to v8i16 (with undef
elements) and extract and store the word lane that represents the v4i8
subvector. With this, the function:

define void @foo(<4 x i16> %x, i8* nocapture %p) {
  %0 = trunc <4 x i16> %x to <4 x i8>
  %1 = bitcast i8* %p to <4 x i8>*
  store <4 x i8> %0, <4 x i8>* %1, align 4
  ret void
}

can be optimized from:

umov    w8, v0.h[3]
umov    w9, v0.h[2]
umov    w10, v0.h[1]
umov    w11, v0.h[0]
strb    w8, [x0, #3]
strb    w9, [x0, #2]
strb    w10, [x0, #1]
strb    w11, [x0]
ret

to:

xtn     v0.8b, v0.8h
str     s0, [x0]
ret
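
In SelectionDAG terms, the lowering amounts to the sequence below. This is a
minimal sketch of the strategy described above, written in the style of
AArch64ISelLowering.cpp; the function name and exact node sequence are
assumptions, not the committed code:

// Sketch: lower a v4i16 -> v4i8 truncating store by widening to v8i16,
// truncating to v8i8 (xtn), and storing the low 32-bit lane (str s0).
static SDValue lowerV4I8TruncStore(StoreSDNode *ST, const SDLoc &DL,
                                   SelectionDAG &DAG) {
  SDValue Value = ST->getValue(); // the promoted v4i16 operand

  // Widen v4i16 to v8i16, filling the upper lanes with undef.
  SDValue Undef = DAG.getUNDEF(MVT::i16);
  SDValue UndefVec =
      DAG.getBuildVector(MVT::v4i16, DL, {Undef, Undef, Undef, Undef});
  SDValue Wide =
      DAG.getNode(ISD::CONCAT_VECTORS, DL, MVT::v8i16, Value, UndefVec);

  // Truncate to v8i8 (xtn); the low four lanes hold the v4i8 payload.
  SDValue Trunc = DAG.getNode(ISD::TRUNCATE, DL, MVT::v8i8, Wide);

  // Reinterpret as v2i32 and store lane 0 as a single 32-bit store.
  SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i32, Trunc);
  SDValue Lane0 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i32, Cast,
                              DAG.getConstant(0, DL, MVT::i64));
  return DAG.getStore(ST->getChain(), DL, Lane0, ST->getBasePtr(),
                      ST->getMemOperand());
}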

The patch also adjusts the memory operation cost for autovectorization, so
that the C code:

void foo (const int *src, int width, unsigned char *dst)
{
  for (int i = 0; i < width; i++)
     *dst++ = *src++;
}

can be vectorized to:

.LBB0_4:                                // %vector.body
                                        // =>This Inner Loop Header: Depth=1
      ldr     q0, [x0], #16
      subs    x12, x12, #4            // =4
      xtn     v0.4h, v0.4s
      xtn     v0.8b, v0.8h
      st1     { v0.s }[0], [x2], #4
      b.ne    .LBB0_4

instead of scalar byte operations.
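
For context, routing these stores to the custom lowering is a one-line hook
in the AArch64TargetLowering constructor; this is a sketch of the assumed
registration, not the exact committed diff:

// Mark v4i16 -> v4i8 truncating stores as Custom so the custom store
// entry point sees them instead of the default expansion into
// per-element byte stores.
setTruncStoreAction(MVT::v4i16, MVT::v4i8, Custom);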

Event Timeline

zatrazz created this revision.Jun 19 2018, 12:38 PM

I wonder if we should prefer to widen <2 x i8> and <4 x i8> to <8 x i8> instead of promoting to <4 x i16>. It would make stores like this a bit cheaper. Maybe an interesting experiment at some point (mostly just modifying AArch64TargetLowering::getPreferredVectorAction, I think, and seeing what happens to the generated code).
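
(For reference, the experiment would amount to overriding the hook roughly as
in this sketch; the shape and per-type choices are assumptions, not committed
code:

// Sketch: prefer widening small i8 vectors to v8i8 instead of
// promoting their elements to i16.
TargetLoweringBase::LegalizeTypeAction
AArch64TargetLowering::getPreferredVectorAction(EVT VT) const {
  if (VT == MVT::v2i8 || VT == MVT::v4i8)
    return TypeWidenVector; // widen to v8i8
  return TargetLoweringBase::getPreferredVectorAction(VT);
}
)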

Do we need similar handling to this patch for <2 x i16> or <2 x i8>?

lib/Target/AArch64/AArch64TargetTransformInfo.cpp
636–637

It looks like we still scalarize extloads?

test/CodeGen/AArch64/neon-truncStore-extLoad.ll
27

Why does this CHECK line have two possible lowerings?

rengolin added inline comments.Jun 19 2018, 1:33 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
2711

LowerSTORE is too generic a name for this specific function, but I get that it's the pattern in the custom lowering.

You can keep the generic name, as this will be the entry point for *all* custom store lowering, even if it only implements one type right now. But I'd add a longer comment explaining that, for now, this only lowers truncating vector stores, but that it would be the place to add *any* custom store lowering (vector or not, truncating or not).

2714

Assert that StoreNode is not null.

2715

Declare AS close to its usage to make it clear it's not leftover dead code.

test/CodeGen/AArch64/neon-truncStore-extLoad.ll
27

Looks like it's a copy of the pattern around it... weird...

What's the instruction actually generated?

I wonder if we should prefer to widen <2 x i8> and <4 x i8> to <8 x i8> instead of promoting to <4 x i16>. It would make stores like this a bit cheaper. Maybe an interesting experiment at some point (mostly just modifying AArch64TargetLowering::getPreferredVectorAction, I think, and seeing what happens to the generated code).

I tried your suggestion, but without further tuning of the vector lowering it does not yield much gain for vector store operations. The operation:

%0 = trunc <4 x i32> %a to <4 x i8>
store <4 x i8> %0, <4 x i8>* %p, align 4

is scalarized because LowerBUILD_VECTOR can't find a good pattern to use for it:

Custom lowering: t49: v8i8 = BUILD_VECTOR t37, t40, t43, t46, undef:i32, undef:i32, undef:i32, undef:i32
AArch64TargetLowering::ReconstructShuffle
Reshuffle failed: span too large for a VEXT to cope
LowerBUILD_VECTOR: alternatives failed, creating sequence of INSERT_VECTOR_ELT

Maybe if we handled v4i8 as v4i32 we could get better code generation, but it would also require more tuning in generic code. I do see better code generation for the v2i32-to-v2i8 truncating store, but I am not convinced that this vector type should be tuned.

Do we need similar handling to this patch for <2 x i16> or <2 x i8>?

The truncating stores from v2i16 to v2i8 and from v4i32 to v4i8 can indeed be optimized, but I think that is orthogonal to this optimization.

zatrazz added inline comments.Jun 20 2018, 11:52 AM
lib/Target/AArch64/AArch64ISelLowering.cpp
2711

Ack, I will add a comment.

2714

Ack.

2715

Ack.

test/CodeGen/AArch64/neon-truncStore-extLoad.ll
27

Indeed, I used the previous function (truncStore.v4i32) as a template, and with the current patch the instruction generated is 'str'.

I tried your suggestion, but without further tuning in vector lowering this does not yield much gain on a vector store operation.

Yes, it's not really an alternative to this patch, just something to explore if you care about the lowering of <4 x i8> in general.

zatrazz updated this revision to Diff 152134.Jun 20 2018, 12:50 PM

Updated patch based on previous comments.

rengolin added inline comments.Jun 20 2018, 1:11 PM
lib/Target/AArch64/AArch64ISelLowering.cpp
2717

Last nit: we usually add a message at the end; in this case, something saying these are not the droids we're looking for... e.g.:

assert(StoreNode && "Can only custom lower store nodes");

assert(VT.isVector() && "Can only custom lower vector store types");

zatrazz updated this revision to Diff 152266.Jun 21 2018, 5:49 AM

Updated patch based on the previous comment.

Thanks! I'm happy if Eli is happy. :)

efriedma added inline comments.Jun 21 2018, 11:27 AM
lib/Target/AArch64/AArch64TargetTransformInfo.cpp
636–637

I'm still not sure this change belongs in this patch, given that we still scalarize <4 x i8> loads.

test/CodeGen/AArch64/neon-truncStore-extLoad.ll
27

Please get rid of the "|" in all the patterns in this file.

zatrazz added inline comments.Jun 21 2018, 2:01 PM
lib/Target/AArch64/AArch64TargetTransformInfo.cpp
636–637

I will add this modification in another patch, then.

test/CodeGen/AArch64/neon-truncStore-extLoad.ll
27

Right (for some reason I thought I had already fixed it).

zatrazz updated this revision to Diff 152458.Jun 22 2018, 5:25 AM

Updated patch based on previous comments. Indeed, changing AArch64TTIImpl::getMemoryOpCost for both loads and stores was wrong, since <4 x i8> loads are still scalarized. I have changed it to adjust the cost for stores only.
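
(A store-only adjustment in AArch64TTIImpl::getMemoryOpCost could look roughly
like this fragment; the guard and the returned cost are assumptions, not the
committed change:

// Sketch: charge a small fixed cost for a v4i8 store, which now lowers
// to xtn + str instead of four element extracts plus four strb stores,
// returning early before the generic cost computation.
if (Opcode == Instruction::Store && Src->isVectorTy() &&
    Src->getVectorNumElements() == 4 && Src->getScalarSizeInBits() == 8)
  return 2; // xtn + str
)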

I also think widening <2 x i8> and <4 x i8> to <8 x i8> instead of promoting to <4 x i16> is a more comprehensive approach, and I will check what kind of adjustments would be required to make it profitable in all cases.

efriedma added inline comments.Jun 22 2018, 11:12 AM
test/CodeGen/AArch64/neon-truncStore-extLoad.ll
27

Still not fixed...?

zatrazz updated this revision to Diff 152528.Jun 22 2018, 11:43 AM

Updated patch based on previous comments.

This revision is now accepted and ready to land.Jun 26 2018, 11:04 AM
zatrazz closed this revision.Jun 27 2018, 7:04 AM