This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Combine concat_vectors of scalars into build_vector.
Closed · Public

Authored by ab on Apr 9 2015, 6:29 PM.


Details

Reviewers

RKSimon
chandlerc

Commits

rGc984b90c86a5: [CodeGen] Re-apply r234809 (concat of scalars), with an x86_mmx fix.
rG8ebcdb3bc30f: [CodeGen] Combine concat_vectors of scalars into build_vector.
rL235072: [CodeGen] Re-apply r234809 (concat of scalars), with an x86_mmx fix.
rL234809: [CodeGen] Combine concat_vectors of scalars into build_vector.

Summary

Replaces D8884 and D8885, which focused on the wrong issue.

Combine something like:

(v8i8 concat_vectors (v2i8 bitcast (i16)) x4)

into:

(v8i8 (bitcast (v4i16 BUILD_VECTOR (i16) x4)))

This only applies when the type of the concatenated vectors isn't legal. Also, I'm not sure whether bitcasting to an integer scalar (rather than something smarter, say, picking FP if all scalars are FP) is an issue.
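To make the matching step concrete, here is a minimal sketch outside of LLVM. OpKind, ConcatOperand, UndefScalar and matchConcatOfScalars are hypothetical stand-ins, not SelectionDAG API. The sketch assumes each concat operand is either a bitcast of a scalar, an undef, or something else. It looks through the bitcasts, maps undef operands to an undef scalar slot, and gives up on anything else or when the operand type is already legal.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical, simplified operand kinds; not LLVM's SDValue/ISD opcodes.
enum class OpKind { BitcastOfScalar, Undef, Other };

struct ConcatOperand {
  OpKind Kind;
  int ScalarId; // identifies the scalar fed to the bitcast; ignored otherwise
};

// Marker used in the result for an undef scalar slot.
constexpr int UndefScalar = -1;

// Returns the scalar operands of the would-be BUILD_VECTOR, or std::nullopt
// when the combine should not fire: the operand vector type is legal, or
// some operand is neither a bitcast scalar nor undef.
std::optional<std::vector<int>>
matchConcatOfScalars(const std::vector<ConcatOperand> &Ops,
                     bool OperandTypeIsLegal) {
  assert(!Ops.empty() && "CONCAT_VECTORS has at least one operand");

  // Legal sub-vector types are left alone.
  if (OperandTypeIsLegal)
    return std::nullopt;

  std::vector<int> Scalars;
  for (const ConcatOperand &Op : Ops) {
    switch (Op.Kind) {
    case OpKind::BitcastOfScalar:
      Scalars.push_back(Op.ScalarId); // look through the bitcast
      break;
    case OpKind::Undef:
      Scalars.push_back(UndefScalar); // becomes a scalar undef
      break;
    case OpKind::Other:
      return std::nullopt; // any other operand blocks the combine
    }
  }
  return Scalars;
}
```

For the example above, four (v2i8 bitcast (i16 x)) operands, with v2i8 not legal, yield {x, x, x, x}: the four i16 operands of the v4i16 BUILD_VECTOR that is then bitcast back to v8i8.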

Diff Detail

Repository: rL LLVM

Event Timeline

ab updated this revision to Diff 23553 (Apr 9 2015, 6:29 PM).

ab retitled this revision to [CodeGen] Combine concat_vectors of scalars into build_vector.

ab updated this object.

ab edited the test plan for this revision.

ab added reviewers: RKSimon, chandlerc.

ab mentioned this in D8884: [CodeGen] Combine shuffle from concat+bitcast scalar to avoid the smaller vector type.

ab added a subscriber: Unknown Object (MLST).

ab mentioned this in D8885: [CodeGen] Combine small-element shuffles of scalar_to_vector in terms of the wider scalar. (Apr 9 2015, 6:32 PM)

Looks promising, thank you. I agree that reducing unnecessary bitcasts (as you said, scalar float -> integer ... vector integer -> float) would make things easier for later optimizations.

Bitcast to FP or integer scalars depending on the most common type in the inputs.

The tests only cover 1 or 2 scalar inputs. More than 2 is more interesting, and is actually a good testcase for D8885, so I'll revive it and add those there.

-Ahmed
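The "most common type" choice described in this update could be sketched as a majority vote. The names are hypothetical, and breaking ties (including the empty case) toward integer is an assumption for illustration, not necessarily what this Diff does.

```cpp
#include <cassert>
#include <vector>

// Hypothetical scalar classification; not an LLVM type.
enum class ScalarClass { Integer, FP };

// Majority vote over the scalar inputs: FP only wins if strictly more
// inputs are floating point than integer. Ties fall back to integer.
ScalarClass pickMostCommonClass(const std::vector<ScalarClass> &Inputs) {
  unsigned NumFP = 0;
  unsigned NumInt = 0;
  for (ScalarClass C : Inputs) {
    if (C == ScalarClass::FP)
      ++NumFP;
    else
      ++NumInt;
  }
  return NumFP > NumInt ? ScalarClass::FP : ScalarClass::Integer;
}
```

The inline review comments that follow argue against a vote like this and suggest that any FP input should make the whole vector FP, which is the rule the committed code uses.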

A bunch of nit-picky comments, but feel free to submit this once they're addressed. None of this seems terribly fundamental or important, and the transform seems quite good.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11510 (On Diff #23638): Since this is essentially a new function, please use 'DL' to be consistent with the coding conventions.
11516–11524 (On Diff #23638): Are integer vectors the common case here? I would somewhat imagine they are, as they seem the most likely to be widened or narrowed, but I've no idea what tests you're looking at. The rest of my comments assume the default should be integer; invert them appropriately if that's not the case. I would construct a fresh UNDEF node here with the integer type, so that if we don't need to cast to floating point, we can just skip ahead.
11527–11528 (On Diff #23638): I'm not sure this is the right tactic. If we have a mixture of floating point and integer inputs, I think we should assume the entire vector is intended to be floating point. We don't form floating point vectors randomly anywhere, but the integers could come from generic loads or some such that had no typed operation. Does that make sense to you? It also simplifies the logic: you can track whether we see any non-floating-point operands and any floating point operands, and if we see both, bitcast all the inputs.
11532–11534 (On Diff #23638): Why std::for_each rather than a range-based for loop? I would also test the type of the op before bitcasting it, to avoid the (sadly non-trivial) machinery involved in constantly folding no-op bitcasts.
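The two suggestions above, tracking "any FP" and "any non-FP" and using a range-based loop that checks the type before bitcasting, can be combined in a small model. ScalarClass, ToyScalar and unifyScalarClasses are hypothetical stand-ins for SelectionDAG values. As in the committed diff, an undef input is assumed to start out with the integer type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; not SelectionDAG types.
enum class ScalarClass { Integer, FP };

struct ToyScalar {
  ScalarClass Class;
  bool IsUndef;
  bool IsBitcast; // set when the rewrite below wraps the value in a bitcast
};

// Any FP input makes the whole vector FP. Bitcasts are only needed when FP
// and non-FP inputs are mixed. The rewrite loop checks the type first, so no
// no-op bitcast is created, and undefs are simply retyped (not bitcast).
// Returns the chosen class; NumBitcasts receives the number of casts created.
ScalarClass unifyScalarClasses(std::vector<ToyScalar> &Ops,
                               unsigned &NumBitcasts) {
  bool AnyFP = false;
  bool AnyInteger = false;
  for (const ToyScalar &Op : Ops) {
    if (Op.Class == ScalarClass::FP)
      AnyFP = true;
    else
      AnyInteger = true;
  }

  NumBitcasts = 0;
  if (!AnyFP)
    return ScalarClass::Integer; // integer default: nothing to do
  if (!AnyInteger)
    return ScalarClass::FP; // already uniformly FP

  for (ToyScalar &Op : Ops) {
    if (Op.Class == ScalarClass::FP)
      continue; // already the target type: skip the no-op bitcast
    Op.Class = ScalarClass::FP;
    if (!Op.IsUndef) {
      Op.IsBitcast = true;
      ++NumBitcasts;
    }
  }
  return ScalarClass::FP;
}
```

The committed diff below takes a slightly different route for undefs: it builds the bitcast and then replaces a result that folded to UNDEF with ScalarUndef. Either way, operands that already have the target type are left untouched.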

This revision is now accepted and ready to land (Apr 10 2015, 8:10 PM).

ab added inline comments (Apr 13 2015, 11:34 AM).

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11527–11528 (On Diff #23638): I sat down and actually thought about this. I agree: if any of the inputs is floating point, all should be. But the real-world code I'm looking at doesn't involve floating point at all, so I may be agreeing for the wrong reasons. We can look at either:

- the vectors: floating point vectors are a strong hint that the result type should really be floating point as well, and I think this is what you're saying above.
- the scalars: I see only one case where a floating point scalar value is "unintentionally" bitcast to a vector, and that's a target with, say, illegal i64 but legal f64 loads. In that case it's also a hint that we should prefer FP, but that's more because the integer scalar type is illegal than anything else, really. When the IR does an "intentional" bitcast, the AArch64 "_mixed_" testcase shows another reason this is a good idea: moving the scalars to the vector/FP bank once is better (modulo micro-architectures) than doing all the insertions directly from the GPRs.

My conclusion is that if there's any floating point type anywhere (input or output), everything should be floating point. Thoughts? I'll commit something like that later, thanks for the review!
11532–11534 (On Diff #23638): Oof, not sure what happened there; my brain was clearly off when I wrote the patch.
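The "FP anywhere means FP everywhere" rule from ab's reply looks at two places: the vector element type and the bitcast scalars. Since CONCAT_VECTORS operands and its result share an element type, one parameter covers both vectors. The predicate below is a hypothetical sketch of that rule. As far as the committed diff below shows, combineConcatVectorOfScalars only inspects the scalar operands (AnyFP / AnyInteger), not the vector element type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification; not an LLVM type.
enum class ScalarClass { Integer, FP };

// Returns true if floating point types should be used throughout: either the
// concat's vector element type (shared by operands and result) is FP, or any
// of the scalars that were bitcast into the operands is FP.
bool preferFPEverywhere(ScalarClass VectorElt,
                        const std::vector<ScalarClass> &Scalars) {
  if (VectorElt == ScalarClass::FP)
    return true; // FP vectors: strong hint the result should be FP
  for (ScalarClass S : Scalars)
    if (S == ScalarClass::FP)
      return true; // a single FP scalar is enough to switch to FP
  return false;
}
```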

Closed by commit rL234809 (Apr 13 2015, 4:00 PM): [CodeGen] Combine concat_vectors of scalars into build_vector. (authored by ab)

This revision was automatically updated to reflect the committed changes.

Revision Contents

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp: 60 lines
llvm/trunk/test/CodeGen/AArch64/concat_vector-scalar-combine.ll: 125 lines

Diff 23703

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp


@@ ... @@ if (VecIn1.getNode()) {
       Ops[0] = VecIn1;
       Ops[1] = VecIn2;
       return DAG.getVectorShuffle(VT, dl, Ops[0], Ops[1], &Mask[0]);
     }
   }

   return SDValue();
 }

+static SDValue combineConcatVectorOfScalars(SDNode *N, SelectionDAG &DAG) {
+  const TargetLowering &TLI = DAG.getTargetLoweringInfo();
+  EVT OpVT = N->getOperand(0).getValueType();
+
+  // If the operands are legal vectors, leave them alone.
+  if (TLI.isTypeLegal(OpVT))
+    return SDValue();
+
+  SDLoc DL(N);
+  EVT VT = N->getValueType(0);
+  SmallVector<SDValue, 8> Ops;
+
+  EVT SVT = EVT::getIntegerVT(*DAG.getContext(), OpVT.getSizeInBits());
+  SDValue ScalarUndef = DAG.getNode(ISD::UNDEF, DL, SVT);
+
+  // Keep track of what we encounter.
+  bool AnyInteger = false;
+  bool AnyFP = false;
+  for (const SDValue &Op : N->ops()) {
+    if (ISD::BITCAST == Op.getOpcode() &&
+        !Op.getOperand(0).getValueType().isVector())
+      Ops.push_back(Op.getOperand(0));
+    else if (ISD::UNDEF == Op.getOpcode())
+      Ops.push_back(ScalarUndef);
+    else
+      return SDValue();
+
+    if (Ops.back().getValueType().isFloatingPoint())
+      AnyFP = true;
+    else
+      AnyInteger = true;
+  }
+
+  // If any of the operands is a floating point scalar bitcast to a vector,
+  // use floating point types throughout, and bitcast everything.
+  // Replace UNDEFs by another scalar UNDEF node, of the final desired type.
+  if (AnyFP) {
+    SVT = EVT::getFloatingPointVT(OpVT.getSizeInBits());
+    ScalarUndef = DAG.getNode(ISD::UNDEF, DL, SVT);
+    if (AnyInteger) {
+      for (SDValue &Op : Ops) {
+        if (Op.getValueType() != SVT) {
+          Op = DAG.getNode(ISD::BITCAST, DL, SVT, Op);
+          if (Op.getOpcode() == ISD::UNDEF)
+            Op = ScalarUndef;
+        }
+      }
+    }
+  }
+
+  EVT VecVT = EVT::getVectorVT(*DAG.getContext(), SVT,
+                               VT.getSizeInBits() / SVT.getSizeInBits());
+  return DAG.getNode(ISD::BITCAST, DL, VT,
+                     DAG.getNode(ISD::BUILD_VECTOR, DL, VecVT, Ops));
+}
+
 SDValue DAGCombiner::visitCONCAT_VECTORS(SDNode *N) {
   // TODO: Check to see if this is a CONCAT_VECTORS of a bunch of
   // EXTRACT_SUBVECTOR operations. If so, and if the EXTRACT_SUBVECTOR vector
   // inputs come from at most two distinct vectors, turn this into a shuffle
   // node.

   // If we only have one input vector, we don't need to do any concatenation.
   if (N->getNumOperands() == 1)
@@ ... @@ for (const SDValue &Op : N->ops()) {
       }
     }

     assert(VT.getVectorNumElements() == Opnds.size() &&
            "Concat vector type mismatch");
     return DAG.getNode(ISD::BUILD_VECTOR, SDLoc(N), VT, Opnds);
   }

+  // Fold CONCAT_VECTORS of only bitcast scalars (or undef) to BUILD_VECTOR.
+  if (SDValue V = combineConcatVectorOfScalars(N, DAG))
+    return V;
+
   // Type legalization of vectors and DAG canonicalization of SHUFFLE_VECTOR
   // nodes often generate nop CONCAT_VECTOR nodes.
   // Scan the CONCAT_VECTOR operands and look for a CONCAT operations that
   // place the incoming vectors at the exact same location.
   SDValue SingleSource = SDValue();
   unsigned PartNumElem = N->getOperand(0).getValueType().getVectorNumElements();

   for (unsigned i = 0, e = N->getNumOperands(); i != e; ++i) {
@@ ... @@
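The type arithmetic at the end of combineConcatVectorOfScalars can be checked by hand. SVT is a scalar as wide as one concat operand, and the BUILD_VECTOR holds VT.getSizeInBits() / SVT.getSizeInBits() of them. The sketch below reproduces that computation with a hypothetical ToyVT in place of LLVM's EVT.

```cpp
#include <cassert>
#include <string>

// Hypothetical, minimal stand-in for EVT: element kind, element width and
// element count (NumElts == 1 means a scalar).
struct ToyVT {
  bool IsFP;
  unsigned EltBits;
  unsigned NumElts;

  unsigned sizeInBits() const { return EltBits * NumElts; }

  // Renders names like "i16", "v4i16" or "v4f16".
  std::string str() const {
    std::string S = (IsFP ? "f" : "i") + std::to_string(EltBits);
    return NumElts == 1 ? S : "v" + std::to_string(NumElts) + S;
  }
};

// Each concat operand becomes one scalar of the operand's total width, and
// the BUILD_VECTOR has as many of those as fit in the concat result type.
ToyVT buildVectorTypeFor(ToyVT ConcatVT, ToyVT OpVT, bool UseFP) {
  ToyVT SVT{UseFP, OpVT.sizeInBits(), 1};
  assert(ConcatVT.sizeInBits() % SVT.sizeInBits() == 0 &&
         "concat result must be a whole number of operands");
  return ToyVT{UseFP, SVT.EltBits, ConcatVT.sizeInBits() / SVT.sizeInBits()};
}
```

For the first two AArch64 tests this gives v4i16 (v8i8 built from v2i8 pieces) and v4i32 (v8i16 built from v2i16 pieces), which matches the dup.4h and dup.4s in their CHECK lines.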

llvm/trunk/test/CodeGen/AArch64/concat_vector-scalar-combine.ll

				; RUN: llc < %s -mtriple aarch64-unknown-unknown -aarch64-neon-syntax=apple -asm-verbose=false | FileCheck %s

				target datalayout = "e-m:o-i64:64-i128:128-n32:64-S128"

				; Test the (concat_vectors (bitcast (scalar)), ..) pattern.

				define <8 x i8> @test_concat_scalar_v2i8_to_v8i8_dup(i32 %x) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalar_v2i8_to_v8i8_dup:
				; CHECK-NEXT: dup.4h v0, w0
				; CHECK-NEXT: ret
				%t = trunc i32 %x to i16
				%0 = bitcast i16 %t to <2 x i8>
				%1 = shufflevector <2 x i8> %0, <2 x i8> undef, <8 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>
				ret <8 x i8> %1
				}

				define <8 x i8> @test_concat_scalar_v4i8_to_v8i8_dup(i32 %x) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalar_v4i8_to_v8i8_dup:
				; CHECK-NEXT: dup.2s v0, w0
				; CHECK-NEXT: ret
				%0 = bitcast i32 %x to <4 x i8>
				%1 = shufflevector <4 x i8> %0, <4 x i8> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
				ret <8 x i8> %1
				}

				define <8 x i16> @test_concat_scalar_v2i16_to_v8i16_dup(i32 %x) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalar_v2i16_to_v8i16_dup:
				; CHECK-NEXT: dup.4s v0, w0
				; CHECK-NEXT: ret
				%0 = bitcast i32 %x to <2 x i16>
				%1 = shufflevector <2 x i16> %0, <2 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 2, i32 0, i32 1, i32 0, i32 1>
				ret <8 x i16> %1
				}
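In the three _dup tests above, every concat operand is the same bitcast scalar. In test_concat_scalar_v2i16_to_v8i16_dup, the mask chunk (2, 2) only reads the undef operand, so those lanes are presumably undef. The combine therefore produces a splat BUILD_VECTOR, and the CHECK lines show it selected as a single dup. A hypothetical helper that recognizes such a splat, treating undef lanes as wildcards, might look like this. It is an illustration, not the AArch64 lowering code.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical marker for an undef BUILD_VECTOR element.
constexpr int Undef = -1;

// Returns the single scalar that every defined element equals, or
// std::nullopt if two defined elements differ or every element is undef.
std::optional<int> getSplatScalar(const std::vector<int> &Elts) {
  std::optional<int> Splat;
  for (int E : Elts) {
    if (E == Undef)
      continue; // undef lanes can take any value
    if (Splat && *Splat != E)
      return std::nullopt;
    Splat = E;
  }
  return Splat;
}
```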

				define <8 x i8> @test_concat_scalars_2x_v2i8_to_v8i8(i32 %x, i32 %y) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalars_2x_v2i8_to_v8i8:
				; CHECK-NEXT: ins.h v0[0], w0
				; CHECK-NEXT: ins.h v0[1], w1
				; CHECK-NEXT: ins.h v0[3], w1
				; CHECK-NEXT: ret
				%tx = trunc i32 %x to i16
				%ty = trunc i32 %y to i16
				%bx = bitcast i16 %tx to <2 x i8>
				%by = bitcast i16 %ty to <2 x i8>
				%r = shufflevector <2 x i8> %bx, <2 x i8> %by, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 2, i32 3>
				ret <8 x i8> %r
				}
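The shuffle masks in these tests are chosen so that, when read in chunks as wide as the source vectors, each chunk copies one whole input or is undef. That is presumably what lets them reach the combine as a concat_vectors of the bitcast scalars. The hypothetical helper below decomposes a mask that way. It reports 0 or 1 for the source operand of each chunk, -1 for an all-undef chunk, or std::nullopt if the shuffle is not a plain concatenation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Mask values follow shufflevector: 0..W-1 select from the first operand,
// W..2W-1 from the second, and -1 stands for undef.
std::optional<std::vector<int>> maskAsConcat(const std::vector<int> &Mask,
                                             int SrcWidth) {
  assert(SrcWidth > 0 &&
         Mask.size() % static_cast<std::size_t>(SrcWidth) == 0);
  std::vector<int> Chunks;
  for (std::size_t Base = 0; Base < Mask.size();
       Base += static_cast<std::size_t>(SrcWidth)) {
    int Source = -1; // stays -1 if every lane in the chunk is undef
    for (int Lane = 0; Lane < SrcWidth; ++Lane) {
      int M = Mask[Base + static_cast<std::size_t>(Lane)];
      if (M == -1)
        continue; // undef lanes match any source
      // A whole-operand copy has M == Start + Lane, where Start is a
      // multiple of SrcWidth; Start / SrcWidth is the operand index.
      int Start = M - Lane;
      if (Start < 0 || Start % SrcWidth != 0)
        return std::nullopt;
      int S = Start / SrcWidth;
      if (S > 1 || (Source != -1 && Source != S))
        return std::nullopt;
      Source = S;
    }
    Chunks.push_back(Source);
  }
  return Chunks;
}
```

For the mask in test_concat_scalars_2x_v2i8_to_v8i8 it gives {0, 1, -1, 1}, i.e. (x, y, undef, y). That matches the three ins.h lanes in its CHECK lines, with lane 2 left alone.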

				define <8 x i8> @test_concat_scalars_2x_v4i8_to_v8i8_dup(i32 %x, i32 %y) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalars_2x_v4i8_to_v8i8_dup:
				; CHECK-NEXT: fmov s0, w1
				; CHECK-NEXT: ins.s v0[1], w0
				; CHECK-NEXT: ret
				%bx = bitcast i32 %x to <4 x i8>
				%by = bitcast i32 %y to <4 x i8>
				%r = shufflevector <4 x i8> %bx, <4 x i8> %by, <8 x i32> <i32 4, i32 5, i32 6, i32 7, i32 0, i32 1, i32 2, i32 3>
				ret <8 x i8> %r
				}

				define <8 x i16> @test_concat_scalars_2x_v2i16_to_v8i16_dup(i32 %x, i32 %y) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalars_2x_v2i16_to_v8i16_dup:
				; CHECK-NEXT: fmov s0, w0
				; CHECK-NEXT: ins.s v0[1], w1
				; CHECK-NEXT: ins.s v0[2], w1
				; CHECK-NEXT: ins.s v0[3], w0
				; CHECK-NEXT: ret
				%bx = bitcast i32 %x to <2 x i16>
				%by = bitcast i32 %y to <2 x i16>
				%r = shufflevector <2 x i16> %bx, <2 x i16> %by, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 2, i32 3, i32 0, i32 1>
				ret <8 x i16> %r
				}

				; Also make sure we minimize bitcasts.

				; This is a pretty artificial testcase: make sure we bitcast to floating-point
				; if any of the scalars is floating-point.
				define <8 x i8> @test_concat_scalars_mixed_2x_v2i8_to_v8i8(float %dummy, i32 %x, half %y) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalars_mixed_2x_v2i8_to_v8i8:
				; CHECK-NEXT: fmov s[[X:[0-9]+]], w0
				; CHECK-NEXT: ins.h v0[0], v[[X]][0]
				; CHECK-NEXT: ins.h v0[1], v1[0]
				; CHECK-NEXT: ins.h v0[2], v[[X]][0]
				; CHECK-NEXT: ins.h v0[3], v1[0]
				; CHECK-NEXT: ret
				%t = trunc i32 %x to i16
				%0 = bitcast i16 %t to <2 x i8>
				%y0 = bitcast half %y to <2 x i8>
				%1 = shufflevector <2 x i8> %0, <2 x i8> %y0, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
				ret <8 x i8> %1
				}

				define <2 x float> @test_concat_scalars_fp_2x_v2i8_to_v8i8(float %dummy, half %x, half %y) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalars_fp_2x_v2i8_to_v8i8:
				; CHECK-NEXT: ins.h v0[0], v1[0]
				; CHECK-NEXT: ins.h v0[1], v2[0]
				; CHECK-NEXT: ins.h v0[2], v1[0]
				; CHECK-NEXT: ins.h v0[3], v2[0]
				; CHECK-NEXT: ret
				%0 = bitcast half %x to <2 x i8>
				%y0 = bitcast half %y to <2 x i8>
				%1 = shufflevector <2 x i8> %0, <2 x i8> %y0, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3>
				%2 = bitcast <8 x i8> %1 to <2 x float>
				ret <2 x float> %2
				}

				define <4 x float> @test_concat_scalar_fp_v2i16_to_v16i8_dup(float %x) #0 {
				entry:
				; CHECK-LABEL: test_concat_scalar_fp_v2i16_to_v16i8_dup:
				; CHECK-NEXT: dup.4s v0, v0[0]
				; CHECK-NEXT: ret
				%0 = bitcast float %x to <2 x i16>
				%1 = shufflevector <2 x i16> %0, <2 x i16> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 2, i32 0, i32 1, i32 0, i32 1>
				%2 = bitcast <8 x i16> %1 to <4 x float>
				ret <4 x float> %2
				}

				attributes #0 = { nounwind }