Details

Reviewers

samparker
dmgreen
SjoerdMeijer
t.p.northover
olista01
simon_tatham

Commits

rGf1cdd95a2fe7: [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection
rL371218: [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection

Summary

This patch sinks add/mul(shufflevector(insertelement(...), ...), ...) into the basic block in which they are used so that they can then be selected together. This is useful for various MVE instructions, such as vmla and others that take R registers.

Loop tests have been added to the vmla test file to make sure vmlas are generated in loops.
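
For orientation, the loop tests correspond roughly to a scalar multiply-accumulate loop like the one below. This is an assumed source-level reconstruction, not code from the patch; the function name mlaByScalar and its parameter names are hypothetical. After vectorisation the scalar X becomes the insertelement/shufflevector splat, which needs to sit next to the mul for a vmla with an R register operand to be selected.

```cpp
#include <cstdint>

// Hypothetical scalar form of the loops in the vmla tests (not from the patch).
// Each element of D accumulates the product of the matching element of S1 and
// the loop-invariant scalar X, which is the value a vectoriser would splat.
inline void mlaByScalar(const int32_t *S1, int32_t X, int32_t *D, int N) {
  for (int I = 0; I < N; ++I)
    D[I] += S1[I] * X;
}
```

Whether a given compiler vectorises this exact loop into the IR used by the tests depends on flags and target, so treat it only as an illustration of the shape being optimised.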

Diff Detail

Repository: rL LLVM

Event Timeline

samtebbs created this revision. Aug 15 2019, 8:14 AM

Herald added a project: Restricted Project. Aug 15 2019, 8:14 AM

Herald added subscribers: hiraditya, kristof.beyls, javed.absar.

samtebbs added a child revision: D66297: [ARM] Select vmla. Aug 15 2019, 8:17 AM

Looks useful. What happens if there are multiple uses of the splat? Do things get handled correctly then?

llvm/lib/Target/ARM/ARMISelLowering.cpp
14263 ↗	(On Diff #215402)	Do we need to check that this is an instruction, or will the match handle that for us?
14268 ↗	(On Diff #215402)	ShuffleOp1 isn't needed?

samtebbs updated this revision to Diff 215596. Aug 16 2019, 7:24 AM

samtebbs marked 2 inline comments as done.

samtebbs added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
14263 ↗	(On Diff #215402)	We don't in fact, will remove.
14268 ↗	(On Diff #215402)	It isn't :)

samtebbs updated this revision to Diff 216837. Aug 23 2019, 7:04 AM

samtebbs marked 2 inline comments as done.

dmgreen added inline comments. Aug 27 2019, 12:13 PM

llvm/lib/Target/ARM/ARMISelLowering.cpp
14317 ↗	(On Diff #216837)	Is it possible to come up with a way to not have to repeat which instructions can be sunk to? Possibly considering things like vsub, which can only be sunk if the operand is the second operand (although that is not your problem here, it will likely be someone's problem soon enough).
llvm/test/Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll
129 ↗	(On Diff #216837)	Can you change this to the first operand of the sub? There is technically an MVE VSUB instruction that takes a gpr as the second operand, and I presume eventually we will be looking at making that perform the same trick we have here for add and mul.
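
The operand-position rule raised here can be modelled outside LLVM. The following is a minimal sketch, assuming a toy ToyOpcode enum and a free function isSinkableOperand (both hypothetical names, not LLVM API). It mirrors the rule that add and mul accept the splat in either position, while sub accepts it only as operand 1.

```cpp
// Toy model of the operand-position rule; none of these names exist in LLVM.
enum class ToyOpcode { Add, Mul, Sub, Other };

// Returns true if a splat used as operand number OperandNo of an instruction
// with opcode Op could be folded into a vector-by-scalar instruction.
inline bool isSinkableOperand(ToyOpcode Op, unsigned OperandNo) {
  switch (Op) {
  case ToyOpcode::Add:
  case ToyOpcode::Mul:
    return true;           // commutative: either operand may be the scalar
  case ToyOpcode::Sub:
    return OperandNo == 1; // only the second operand may be the scalar
  default:
    return false;
  }
}
```

Keeping the rule in one predicate is one way to avoid repeating the list of sinkable instructions, which is the concern in the first comment above.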

samtebbs updated this revision to Diff 217668. Aug 28 2019, 9:15 AM

samtebbs marked 2 inline comments as done.

dmgreen added inline comments. Aug 28 2019, 10:47 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
14325 ↗	(On Diff #217668)	Can you add a comment saying that we want all uses to be sunk (else we'll just end up with the same value in both gpr and vector regs)? Also, if this loop is moved to after the match check below, we won't need to run the O(n) loop when the O(1) match fails. Also, negating the condition and returning early is apparently preferred in LLVM.
14327 ↗	(On Diff #217668)	I think this should be checking the operand of the use, which may not be the same as for the original instruction.
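
The two requests above (check every use, and check each use's own operand number) can be illustrated with a small standalone model. This is a sketch under stated assumptions: UserKind, SplatUse, useCanFold and allUsesCanFold are hypothetical names and do not correspond to LLVM types or functions.

```cpp
#include <vector>

// Toy stand-ins for an instruction opcode and a use of the splat value.
enum class UserKind { Add, Mul, Sub, Other };

struct SplatUse {
  UserKind Kind;      // opcode of the instruction using the splat
  unsigned OperandNo; // which operand of that user the splat occupies
};

// A single use can fold the scalar if the user is add/mul, or sub with the
// splat as its second operand.
inline bool useCanFold(const SplatUse &U) {
  switch (U.Kind) {
  case UserKind::Add:
  case UserKind::Mul:
    return true;
  case UserKind::Sub:
    return U.OperandNo == 1;
  default:
    return false;
  }
}

// Sinking is only worthwhile if every use can fold; otherwise the value would
// have to live in both a general-purpose and a vector register.
inline bool allUsesCanFold(const std::vector<SplatUse> &Uses) {
  for (const SplatUse &U : Uses)
    if (!useCanFold(U))
      return false; // early exit on the first use that blocks sinking
  return true;
}
```

Each use is judged by its own operand number rather than by the operand number of the instruction that triggered the query, which is the distinction the second comment draws.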

samtebbs marked 2 inline comments as done. Aug 29 2019, 2:30 AM

samtebbs added inline comments.

llvm/lib/Target/ARM/ARMISelLowering.cpp
14325 ↗	(On Diff #217668)	Agreed.
14327 ↗	(On Diff #217668)	The Op value here isn't actually used by the lambda and is only passed as a reference so that it can be set.

samtebbs updated this revision to Diff 217818. Aug 29 2019, 3:29 AM

samtebbs marked an inline comment as done.

samtebbs updated this revision to Diff 217821. Aug 29 2019, 3:33 AM

samtebbs marked an inline comment as done.

samparker added inline comments. Sep 2 2019, 7:35 AM

llvm/lib/Target/ARM/ARMISelLowering.cpp
14317 ↗	(On Diff #217821)	Hoist this condition and exit early if we don't have MVE.
14329 ↗	(On Diff #217821)	just cast is fine.
14332 ↗	(On Diff #217821)	cast is fine.

samtebbs updated this revision to Diff 218411. Sep 3 2019, 2:06 AM

samtebbs marked 3 inline comments as done.

LGTM. But please address comments before committing.

llvm/lib/Target/ARM/ARMISelLowering.cpp
14331 ↗	(On Diff #218411)	But still use llvm style cast... = cast<Instruction>
llvm/test/Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll
1 ↗	(On Diff #218411)	Since we have codegen support for vmla, could you also add an llc line here to test that vmlas are generated in loops?

This revision is now accepted and ready to land. Sep 3 2019, 2:41 AM

Yeah. This looks good.

We should wait to commit until we have the instructions that make use of it though. I think Oliver is working on a few patterns for VADD, VMUL and VSUB, which would cover the cases here.

Also, can you add a couple of tests for sub (on their own), checking that the first operand is not sunk but the second operand is?

llvm/lib/Target/ARM/ARMISelLowering.cpp
14326 ↗	(On Diff #218411)	Make sure you run clang-format.

samtebbs updated this revision to Diff 218671. Sep 4 2019, 5:59 AM

samtebbs marked 5 inline comments as done.

samtebbs edited the summary of this revision.

samtebbs added inline comments. Sep 4 2019, 7:17 AM

llvm/test/Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll
1 ↗	(On Diff #218411)	I've added loop tests to the vmla test file

Sure. LGTM

Very nice. Oliver went and put together a few patches for mul, add and sub, in https://reviews.llvm.org/D67268 and related. So we now (or soon will) produce the other instructions this can affect. This is good to go I think.

In D66295#1660956, @dmgreen wrote:

Very nice. Oliver went and put together a few patches for mul, add and sub, in https://reviews.llvm.org/D67268 and related. So we now (or soon will) produce the other instructions this can affect. This is good to go I think.

Thanks!

Closed by commit rL371218: [ARM] Sink add/mul(shufflevector(insertelement())) for MVE instruction selection (authored by samtebbs). Sep 6 2019, 9:00 AM

This revision was automatically updated to reflect the committed changes.

Diff 219123

llvm/trunk/lib/Target/ARM/ARMISelLowering.cpp


[14,334 lines not shown; enclosing context: static bool areExtractExts(Value *Ext1, Value *Ext2) {]
   return true;
 }

 /// Check if sinking \p I's operands to I's basic block is profitable, because
 /// the operands can be folded into a target instruction, e.g.
 /// sext/zext can be folded into vsubl.
 bool ARMTargetLowering::shouldSinkOperands(Instruction *I,
                                            SmallVectorImpl<Use *> &Ops) const {
-  if (!Subtarget->hasNEON() || !I->getType()->isVectorTy())
+  if (!I->getType()->isVectorTy())
     return false;

-  switch (I->getOpcode()) {
-  case Instruction::Sub:
-  case Instruction::Add: {
-    if (!areExtractExts(I->getOperand(0), I->getOperand(1)))
-      return false;
-    Ops.push_back(&I->getOperandUse(0));
-    Ops.push_back(&I->getOperandUse(1));
-    return true;
-  }
-  default:
-    return false;
-  }
-  return false;
+  if (Subtarget->hasNEON()) {
+    switch (I->getOpcode()) {
+    case Instruction::Sub:
+    case Instruction::Add: {
+      if (!areExtractExts(I->getOperand(0), I->getOperand(1)))
+        return false;
+      Ops.push_back(&I->getOperandUse(0));
+      Ops.push_back(&I->getOperandUse(1));
+      return true;
+    }
+    default:
+      return false;
+    }
+  }
+
+  if (!Subtarget->hasMVEIntegerOps())
+    return false;
+
+  auto IsSinker = [](Instruction *I, int Operand) {
+    switch (I->getOpcode()) {
+    case Instruction::Add:
+    case Instruction::Mul:
+      return true;
+    case Instruction::Sub:
+      return Operand == 1;
+    default:
+      return false;
+    }
+  };
+
+  int Op = 0;
+  if (!isa<ShuffleVectorInst>(I->getOperand(Op)))
+    Op = 1;
+  if (!IsSinker(I, Op))
+    return false;
+  if (!match(I->getOperand(Op),
+             m_ShuffleVector(m_InsertElement(m_Undef(), m_Value(), m_ZeroInt()),
+                             m_Undef(), m_Zero()))) {
+    return false;
+  }
+  Instruction *Shuffle = cast<Instruction>(I->getOperand(Op));
+  // All uses of the shuffle should be sunk to avoid duplicating it across gpr
+  // and vector registers
+  for (Use &U : Shuffle->uses()) {
+    Instruction *Insn = cast<Instruction>(U.getUser());
+    if (!IsSinker(Insn, U.getOperandNo()))
+      return false;
+  }
+  Ops.push_back(&Shuffle->getOperandUse(0));
+  Ops.push_back(&I->getOperandUse(Op));
+  return true;
 }

 bool ARMTargetLowering::isVectorLoadExtDesirable(SDValue ExtVal) const {
   EVT VT = ExtVal.getValueType();

   if (!isTypeLegal(VT))
     return false;

   // Don't create a loadext if we can fold the extension into a wide/long
[2,319 lines not shown]

llvm/trunk/test/CodeGen/Thumb2/mve-vmla.ll

				[72 lines not shown]
				entry:
				%0 = insertelement <16 x i8> undef, i8 %X, i32 0
				%1 = shufflevector <16 x i8> %0, <16 x i8> undef, <16 x i32> zeroinitializer
				%2 = mul nsw <16 x i8> %1, %B
				%3 = add nsw <16 x i8> %2, %A
				ret <16 x i8> %3
				}

				define void @vmla32_in_loop(i32* %s1, i32 %x, i32* %d, i32 %n) {
				; CHECK-LABEL: vmla32_in_loop:
				; CHECK: .LBB6_1: @ %vector.body
				; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: vldrw.u32 q0, [r0, #16]!
				; CHECK-NEXT: vldrw.u32 q1, [r2, #16]!
				; CHECK-NEXT: vmla.u32 q1, q0, r1
				; CHECK-NEXT: vstrw.32 q1, [r2]
				; CHECK-NEXT: le lr, .LBB6_1
				; CHECK-NEXT: @ %bb.2: @ %for.cond.cleanup
				; CHECK-NEXT: pop {r7, pc}
				entry:
				%cmp6 = icmp sgt i32 %n, 0
				br i1 %cmp6, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %for.body.preheader
				%n.vec = and i32 %n, -4
				%broadcast.splatinsert8 = insertelement <4 x i32> undef, i32 %x, i32 0
				%broadcast.splat9 = shufflevector <4 x i32> %broadcast.splatinsert8, <4 x i32> undef, <4 x i32> zeroinitializer
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%0 = getelementptr inbounds i32, i32* %s1, i32 %index
				%1 = bitcast i32* %0 to <4 x i32>*
				%wide.load = load <4 x i32>, <4 x i32>* %1, align 4
				%2 = mul nsw <4 x i32> %wide.load, %broadcast.splat9
				%3 = getelementptr inbounds i32, i32* %d, i32 %index
				%4 = bitcast i32* %3 to <4 x i32>*
				%wide.load10 = load <4 x i32>, <4 x i32>* %4, align 4
				%5 = add nsw <4 x i32> %wide.load10, %2
				%6 = bitcast i32* %3 to <4 x i32>*
				store <4 x i32> %5, <4 x i32>* %6, align 4
				%index.next = add i32 %index, 4
				%7 = icmp eq i32 %index.next, %n.vec
				br i1 %7, label %for.cond.cleanup, label %vector.body

				for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
				ret void
				}

				define void @vmla16_in_loop(i16* %s1, i16 %x, i16* %d, i32 %n) {
				; CHECK-LABEL: vmla16_in_loop:
				; CHECK: .LBB7_1: @ %vector.body
				; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: vldrh.u16 q0, [r0, #16]!
				; CHECK-NEXT: vldrh.u16 q1, [r2, #16]!
				; CHECK-NEXT: vmla.u16 q1, q0, r1
				; CHECK-NEXT: vstrh.16 q1, [r2]
				; CHECK-NEXT: le lr, .LBB7_1
				; CHECK-NEXT: @ %bb.2: @ %for.cond.cleanup
				; CHECK-NEXT: pop {r7, pc}
				entry:
				%cmp6 = icmp sgt i32 %n, 0
				br i1 %cmp6, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %for.body.preheader
				%n.vec = and i32 %n, -8
				%broadcast.splatinsert11 = insertelement <8 x i16> undef, i16 %x, i32 0
				%broadcast.splat12 = shufflevector <8 x i16> %broadcast.splatinsert11, <8 x i16> undef, <8 x i32> zeroinitializer
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%0 = getelementptr inbounds i16, i16* %s1, i32 %index
				%1 = bitcast i16* %0 to <8 x i16>*
				%wide.load = load <8 x i16>, <8 x i16>* %1, align 2
				%2 = mul <8 x i16> %wide.load, %broadcast.splat12
				%3 = getelementptr inbounds i16, i16* %d, i32 %index
				%4 = bitcast i16* %3 to <8 x i16>*
				%wide.load13 = load <8 x i16>, <8 x i16>* %4, align 2
				%5 = add <8 x i16> %2, %wide.load13
				%6 = bitcast i16* %3 to <8 x i16>*
				store <8 x i16> %5, <8 x i16>* %6, align 2
				%index.next = add i32 %index, 8
				%7 = icmp eq i32 %index.next, %n.vec
				br i1 %7, label %for.cond.cleanup, label %vector.body

				for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
				ret void
				}

				define void @vmla8_in_loop(i8* %s1, i8 %x, i8* %d, i32 %n) {
				; CHECK-LABEL: vmla8_in_loop:
				; CHECK: .LBB8_1: @ %vector.body
				; CHECK-NEXT: @ =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: vldrh.u16 q0, [r0, #8]!
				; CHECK-NEXT: vldrh.u16 q1, [r2, #8]!
				; CHECK-NEXT: vmla.u8 q1, q0, r1
				; CHECK-NEXT: vstrh.16 q1, [r2]
				; CHECK-NEXT: le lr, .LBB8_1
				; CHECK-NEXT: @ %bb.2: @ %for.cond.cleanup
				; CHECK-NEXT: pop {r7, pc}
				entry:
				%cmp6 = icmp sgt i32 %n, 0
				br i1 %cmp6, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %for.body.preheader
				%n.vec = and i32 %n, -8
				%broadcast.splatinsert11 = insertelement <16 x i8> undef, i8 %x, i32 0
				%broadcast.splat12 = shufflevector <16 x i8> %broadcast.splatinsert11, <16 x i8> undef, <16 x i32> zeroinitializer
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%0 = getelementptr inbounds i8, i8* %s1, i32 %index
				%1 = bitcast i8* %0 to <16 x i8>*
				%wide.load = load <16 x i8>, <16 x i8>* %1, align 2
				%2 = mul <16 x i8> %wide.load, %broadcast.splat12
				%3 = getelementptr inbounds i8, i8* %d, i32 %index
				%4 = bitcast i8* %3 to <16 x i8>*
				%wide.load13 = load <16 x i8>, <16 x i8>* %4, align 2
				%5 = add <16 x i8> %2, %wide.load13
				%6 = bitcast i8* %3 to <16 x i8>*
				store <16 x i8> %5, <16 x i8>* %6, align 2
				%index.next = add i32 %index, 8
				%7 = icmp eq i32 %index.next, %n.vec
				br i1 %7, label %for.cond.cleanup, label %vector.body

				for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
				ret void
				}

llvm/trunk/test/Transforms/CodeGenPrepare/ARM/sink-add-mul-shufflevector.ll

				; RUN: opt -mtriple=thumbv8.1m.main-arm-none-eabi -mattr=+mve.fp < %s -codegenprepare -S | FileCheck -check-prefix=CHECK %s

				define void @sink_add_mul(i32* %s1, i32 %x, i32* %d, i32 %n) {
				; CHECK-LABEL: @sink_add_mul(
				; CHECK: vector.ph:
				; CHECK-NOT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK-NOT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK: vector.body:
				; CHECK: [[TMP2:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> zeroinitializer
				;
				entry:
				%cmp6 = icmp sgt i32 %n, 0
				br i1 %cmp6, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %for.body.preheader
				%n.vec = and i32 %n, -4
				%broadcast.splatinsert8 = insertelement <4 x i32> undef, i32 %x, i32 0
				%broadcast.splat9 = shufflevector <4 x i32> %broadcast.splatinsert8, <4 x i32> undef, <4 x i32> zeroinitializer
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%0 = getelementptr inbounds i32, i32* %s1, i32 %index
				%1 = bitcast i32* %0 to <4 x i32>*
				%wide.load = load <4 x i32>, <4 x i32>* %1, align 4
				%2 = mul nsw <4 x i32> %wide.load, %broadcast.splat9
				%3 = getelementptr inbounds i32, i32* %d, i32 %index
				%4 = bitcast i32* %3 to <4 x i32>*
				%wide.load10 = load <4 x i32>, <4 x i32>* %4, align 4
				%5 = add nsw <4 x i32> %wide.load10, %2
				%6 = bitcast i32* %3 to <4 x i32>*
				store <4 x i32> %5, <4 x i32>* %6, align 4
				%index.next = add i32 %index, 4
				%7 = icmp eq i32 %index.next, %n.vec
				br i1 %7, label %for.cond.cleanup, label %vector.body

				for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
				ret void
				}

				define void @sink_add_mul_multiple(i32* %s1, i32* %s2, i32 %x, i32* %d, i32* %d2, i32 %n) {
				; CHECK-LABEL: @sink_add_mul_multiple(
				; CHECK: vector.ph:
				; CHECK-NOT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK-NOT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK: vector.body:
				; CHECK: [[TMP2:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK: [[TMP11:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> zeroinitializer
				;
				entry:
				%cmp13 = icmp sgt i32 %n, 0
				br i1 %cmp13, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %for.body.preheader
				%n.vec = and i32 %n, -4
				%broadcast.splatinsert15 = insertelement <4 x i32> undef, i32 %x, i32 0
				%broadcast.splat16 = shufflevector <4 x i32> %broadcast.splatinsert15, <4 x i32> undef, <4 x i32> zeroinitializer
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%0 = getelementptr inbounds i32, i32* %s1, i32 %index
				%1 = bitcast i32* %0 to <4 x i32>*
				%wide.load = load <4 x i32>, <4 x i32>* %1, align 4
				%2 = mul nsw <4 x i32> %wide.load, %broadcast.splat16
				%3 = getelementptr inbounds i32, i32* %d, i32 %index
				%4 = bitcast i32* %3 to <4 x i32>*
				%wide.load17 = load <4 x i32>, <4 x i32>* %4, align 4
				%5 = add nsw <4 x i32> %wide.load17, %2
				%6 = bitcast i32* %3 to <4 x i32>*
				store <4 x i32> %5, <4 x i32>* %6, align 4
				%7 = getelementptr inbounds i32, i32* %s2, i32 %index
				%8 = bitcast i32* %7 to <4 x i32>*
				%wide.load18 = load <4 x i32>, <4 x i32>* %8, align 4
				%9 = mul nsw <4 x i32> %wide.load18, %broadcast.splat16
				%10 = getelementptr inbounds i32, i32* %d2, i32 %index
				%11 = bitcast i32* %10 to <4 x i32>*
				%wide.load19 = load <4 x i32>, <4 x i32>* %11, align 4
				%12 = add nsw <4 x i32> %wide.load19, %9
				%13 = bitcast i32* %10 to <4 x i32>*
				store <4 x i32> %12, <4 x i32>* %13, align 4
				%index.next = add i32 %index, 4
				%14 = icmp eq i32 %index.next, %n.vec
				br i1 %14, label %for.cond.cleanup, label %vector.body

				for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
				ret void
				}


				define void @sink_add_sub_unsinkable(i32* %s1, i32* %s2, i32 %x, i32* %d, i32* %d2, i32 %n) {
				; CHECK-LABEL: @sink_add_sub_unsinkable(
				; CHECK-NEXT: entry:
				; CHECK-NEXT: [[CMP13:%.*]] = icmp sgt i32 [[N:%.*]], 0
				; CHECK-NEXT: br i1 [[CMP13]], label [[VECTOR_PH:%.*]], label [[FOR_COND_CLEANUP:%.*]]
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[N]], -4
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT15:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT16:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT15]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				;
				entry:
				%cmp13 = icmp sgt i32 %n, 0
				br i1 %cmp13, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %for.body.preheader
				%n.vec = and i32 %n, -4
				%broadcast.splatinsert15 = insertelement <4 x i32> undef, i32 %x, i32 0
				%broadcast.splat16 = shufflevector <4 x i32> %broadcast.splatinsert15, <4 x i32> undef, <4 x i32> zeroinitializer
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%0 = getelementptr inbounds i32, i32* %s1, i32 %index
				%1 = bitcast i32* %0 to <4 x i32>*
				%wide.load = load <4 x i32>, <4 x i32>* %1, align 4
				%2 = mul nsw <4 x i32> %wide.load, %broadcast.splat16
				%3 = getelementptr inbounds i32, i32* %d, i32 %index
				%4 = bitcast i32* %3 to <4 x i32>*
				%wide.load17 = load <4 x i32>, <4 x i32>* %4, align 4
				%5 = add nsw <4 x i32> %wide.load17, %2
				%6 = bitcast i32* %3 to <4 x i32>*
				store <4 x i32> %5, <4 x i32>* %6, align 4
				%7 = getelementptr inbounds i32, i32* %s2, i32 %index
				%8 = bitcast i32* %7 to <4 x i32>*
				%wide.load18 = load <4 x i32>, <4 x i32>* %8, align 4
				%9 = sub nsw <4 x i32> %broadcast.splat16, %wide.load18
				%10 = getelementptr inbounds i32, i32* %d2, i32 %index
				%11 = bitcast i32* %10 to <4 x i32>*
				%wide.load19 = load <4 x i32>, <4 x i32>* %11, align 4
				%12 = add nsw <4 x i32> %wide.load19, %9
				%13 = bitcast i32* %10 to <4 x i32>*
				store <4 x i32> %12, <4 x i32>* %13, align 4
				%index.next = add i32 %index, 4
				%14 = icmp eq i32 %index.next, %n.vec
				br i1 %14, label %for.cond.cleanup, label %vector.body

				for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
				ret void
				}

				define void @sink_sub(i32* %s1, i32 %x, i32* %d, i32 %n) {
				; CHECK-LABEL: @sink_sub(
				; CHECK: vector.ph:
				; CHECK-NOT: [[BROADCAST_SPLATINSERT8:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK-NOT: [[BROADCAST_SPLAT9:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT8]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK: vector.body:
				; CHECK: [[TMP2:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> zeroinitializer
				;
				entry:
				%cmp6 = icmp sgt i32 %n, 0
				br i1 %cmp6, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %for.body.preheader
				%n.vec = and i32 %n, -4
				%broadcast.splatinsert8 = insertelement <4 x i32> undef, i32 %x, i32 0
				%broadcast.splat9 = shufflevector <4 x i32> %broadcast.splatinsert8, <4 x i32> undef, <4 x i32> zeroinitializer
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%0 = getelementptr inbounds i32, i32* %s1, i32 %index
				%1 = bitcast i32* %0 to <4 x i32>*
				%wide.load = load <4 x i32>, <4 x i32>* %1, align 4
				%2 = sub nsw <4 x i32> %wide.load, %broadcast.splat9
				%3 = getelementptr inbounds i32, i32* %d, i32 %index
				%4 = bitcast i32* %3 to <4 x i32>*
				store <4 x i32> %2, <4 x i32>* %4, align 4
				%index.next = add i32 %index, 4
				%5 = icmp eq i32 %index.next, %n.vec
				br i1 %5, label %for.cond.cleanup, label %vector.body

				for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
				ret void
				}

				define void @sink_sub_unsinkable(i32* %s1, i32 %x, i32* %d, i32 %n) {
				entry:
				; CHECK-LABEL: @sink_sub_unsinkable(
				; CHECK: vector.ph:
				; CHECK-NEXT: [[N_VEC:%.*]] = and i32 [[N]], -4
				; CHECK-NEXT: [[BROADCAST_SPLATINSERT15:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK-NEXT: [[BROADCAST_SPLAT16:%.*]] = shufflevector <4 x i32> [[BROADCAST_SPLATINSERT15]], <4 x i32> undef, <4 x i32> zeroinitializer
				; CHECK-NEXT: br label [[VECTOR_BODY:%.*]]
				; CHECK: vector.body:
				; CHECK-NOT: [[TMP2:%.*]] = insertelement <4 x i32> undef, i32 [[X:%.*]], i32 0
				; CHECK-NOT: [[TMP3:%.*]] = shufflevector <4 x i32> [[TMP2]], <4 x i32> undef, <4 x i32> zeroinitializer
				;
				%cmp6 = icmp sgt i32 %n, 0
				br i1 %cmp6, label %vector.ph, label %for.cond.cleanup

				vector.ph: ; preds = %for.body.preheader
				%n.vec = and i32 %n, -4
				%broadcast.splatinsert8 = insertelement <4 x i32> undef, i32 %x, i32 0
				%broadcast.splat9 = shufflevector <4 x i32> %broadcast.splatinsert8, <4 x i32> undef, <4 x i32> zeroinitializer
				br label %vector.body

				vector.body: ; preds = %vector.body, %vector.ph
				%index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				%0 = getelementptr inbounds i32, i32* %s1, i32 %index
				%1 = bitcast i32* %0 to <4 x i32>*
				%wide.load = load <4 x i32>, <4 x i32>* %1, align 4
				%2 = sub nsw <4 x i32> %broadcast.splat9, %wide.load
				%3 = getelementptr inbounds i32, i32* %d, i32 %index
				%4 = bitcast i32* %3 to <4 x i32>*
				store <4 x i32> %2, <4 x i32>* %4, align 4
				%index.next = add i32 %index, 4
				%5 = icmp eq i32 %index.next, %n.vec
				br i1 %5, label %for.cond.cleanup, label %vector.body

				for.cond.cleanup: ; preds = %for.body, %middle.block, %entry
				ret void
				}

This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Sink add/mul(shufflevector(insertelement(...), ...), ...) for MVE instruction selection
ClosedPublic
