This is an archive of the discontinued LLVM Phabricator instance.


Differential D100018

[WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR
Closed · Public

Authored by tlively on Apr 6 2021, 10:30 PM.


Details

Reviewers

aheejin
dschuff

Commits

rGf30c429da63a: [WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR

Summary

When lowering a BUILD_VECTOR SDNode, we choose among various possible vector
creation instructions in an attempt to minimize the total number of instructions
used. We previously considered using swizzles, consts, and splats, and this
patch adds shuffles as well. A common pattern that now lowers to shuffles is
when two 64-bit vectors are concatenated. Previously, concatenations generally
lowered to sequences of extract_lane and replace_lane instructions when they
could have been a single shuffle.
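
The selection described above can be sketched as a small standalone function. This is only an illustration, assuming the preference order introduced by this revision (swizzle, then shuffle, then const, then splat); `VectorInit`, `LaneCounts`, and `chooseInit` are hypothetical names, since the patch keeps the counts as local variables inside `LowerBUILD_VECTOR`.

```cpp
#include <cstddef>

// Hypothetical names for illustration; not part of the patch.
enum class VectorInit { Swizzle, Shuffle, Const, Splat };

struct LaneCounts {
  std::size_t Swizzle; // lanes a single swizzle could initialize
  std::size_t Shuffle; // lanes a single shuffle (up to two sources) could initialize
  std::size_t Const;   // lanes that hold constants
  std::size_t Splat;   // lanes equal to the most common lane value
};

// Pick the creation op that initializes the most lanes at once. On a tie the
// earlier candidate wins: swizzle, then shuffle, then const, then splat.
inline VectorInit chooseInit(const LaneCounts &C) {
  if (C.Swizzle >= C.Shuffle && C.Swizzle >= C.Const && C.Swizzle >= C.Splat)
    return VectorInit::Swizzle;
  if (C.Shuffle >= C.Const && C.Shuffle >= C.Splat)
    return VectorInit::Shuffle;
  if (C.Const >= C.Splat)
    return VectorInit::Const;
  return VectorInit::Splat;
}
```

For a concatenation of two 64-bit vectors into four lanes, every lane is an extract from one of two sources, so the shuffle count is 4 while the splat count is 1, and the shuffle is chosen. Lanes the chosen op cannot cover are then filled in with replace_lane.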

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tlively created this revision. Apr 6 2021, 10:30 PM

Herald added subscribers: wingo, ecnelises, sunfish and 3 others. Apr 6 2021, 10:30 PM

tlively requested review of this revision. Apr 6 2021, 10:30 PM

Herald added a project: Restricted Project. Apr 6 2021, 10:30 PM

Herald added a subscriber: llvm-commits.

tlively added a subscriber: steven-johnson. Apr 6 2021, 10:31 PM

Harbormaster completed remote builds in B97452: Diff 335723. Apr 6 2021, 11:11 PM

srj added a subscriber: srj. Apr 7 2021, 10:08 AM

This definitely improves codegen for Halide substantially.

dschuff added inline comments. Apr 7 2021, 5:10 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1695	Unrelated to this CL but what exactly is the difference between a swizzle and a shuffle?

Otherwise the code looks good though

This revision is now accepted and ready to land. Apr 7 2021, 5:16 PM

aheejin added inline comments. Apr 8 2021, 5:55 AM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1640	"The source vector must not have more lanes than the dest." Why? Shuffles can only include half of the elements anyway, no?
1695	Shuffle: https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#shuffling-using-immediate-indices
	Swizzle: https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices
1713–1731	Does this handle the case where it is most beneficial to use the same vector as both sources?
1737–1739	Can we have some more comments on why we prefer this order?
llvm/test/CodeGen/WebAssembly/simd-concat.ll
76	I might be mistaken, but isn't this a loss of data? v2i32 is a full 128-bit vector and the result of `shufflevector` contains the whole of both vectors, but the result of `i8x16.shuffle` contains only half of it.

tlively removed a subscriber: steven-johnson. Apr 8 2021, 11:20 AM

tlively added inline comments. Apr 8 2021, 5:11 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1640	If the source vector has more lanes than the destination, then those lanes would be narrower. The shufflevector SDNode uses indices based on the wider lanes of the destination type, so it cannot express the extension and use of smaller lanes in one of the source vectors. In contrast, if the source vector has fewer lanes than the destination, then those lanes would be wider. The extract_vector_elt operand to the BUILD_VECTOR node would therefore be doing an implicit truncate, and by scaling up the indices for the smaller destination lanes, the truncated portions of the wider source lanes can be correctly pulled in. I'll expand on this comment a bit.
1695	In a single sentence: Shuffling uses a static array of indices to draw lanes from two source vectors while swizzling uses the lanes of one source vector as indices into a second source vector.
1713–1731	The shuffle can draw an arbitrary number of lanes from each source, so it is never necessary to use the same source for both operands. That being said, if there is only one available source, ShuffleSrc2 will be set to `undef` here and will be later made a copy of the first operand, for lack of any better vector to use there.
1737–1739	It's more or less arbitrary, but now that I'm thinking about it, it would probably be better to prefer the simpler/smaller operations like splat over the more complex operations like shuffles and swizzles. I'll change this order in a follow-up PR and add a comment there.
llvm/test/CodeGen/WebAssembly/simd-concat.ll
76	v2i32 is only 64 bits, but it is represented in Wasm using the low 32 bits of each lane in an i64x2 vector. The i8x16.shuffle here pulls in those low 32 bits from each lane and leaves the unused high 32 bits.
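
The shuffle/swizzle distinction from the comment on line 1695 can be made concrete with a scalar emulation. This is a minimal sketch of the semantics as described in the linked SIMD proposal, not code from the patch; `V128`, `shuffle`, and `swizzle` are hypothetical names chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V128 = std::array<std::uint8_t, 16>;

// i8x16.shuffle: the 16 indices are immediates fixed at compile time. Each one
// is in [0, 32) and selects a byte from the concatenation of A (0-15) and
// B (16-31).
inline V128 shuffle(const V128 &A, const V128 &B,
                    const std::array<std::uint8_t, 16> &Imm) {
  V128 R{};
  for (std::size_t I = 0; I < 16; ++I) {
    assert(Imm[I] < 32 && "shuffle immediates must be in [0, 32)");
    R[I] = Imm[I] < 16 ? A[Imm[I]] : B[Imm[I] - 16];
  }
  return R;
}

// i8x16.swizzle: the indices are the lanes of a second runtime vector. There
// is only one data source, and an out-of-range index produces 0.
inline V128 swizzle(const V128 &Src, const V128 &Idx) {
  V128 R{};
  for (std::size_t I = 0; I < 16; ++I)
    R[I] = Idx[I] < 16 ? Src[Idx[I]] : 0;
  return R;
}
```

Calling `shuffle` with the immediates `0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27` from the `concat_v2i32` test picks the low 32 bits of each 64-bit lane of both inputs, which is the behavior described in the reply on line 76.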

Improve comment, fix name

Harbormaster completed remote builds in B97853: Diff 336277. Apr 8 2021, 6:18 PM

aheejin accepted this revision. Apr 9 2021, 6:10 AM

aheejin added inline comments.

llvm/test/CodeGen/WebAssembly/simd-concat.ll
76	Ah right, I mistook this for `v4i32`.. Sorry.

Closed by commit rGf30c429da63a: [WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR (authored by tlively). Apr 9 2021, 11:22 AM

This revision was automatically updated to reflect the committed changes.

tlively added a commit: rGf30c429da63a: [WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR.

Revision Contents

Path                                                       Size
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp    95 lines
llvm/test/CodeGen/WebAssembly/simd-build-vector.ll         16 lines
llvm/test/CodeGen/WebAssembly/simd-concat.ll               79 lines

Diff 336522

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

(1,598 lines in file; context: SDValue WebAssemblyTargetLowering::LowerBUILD_VECTOR(SDValue Op,)

   // BUILD_VECTORs are lowered to the instruction that initializes the highest
   // possible number of lanes at once followed by a sequence of replace_lane
   // instructions to individually initialize any remaining lanes.

   // TODO: Tune this. For example, lanewise swizzling is very expensive, so
   // swizzled lanes should be given greater weight.

-  // TODO: Investigate building vectors by shuffling together vectors built by
-  // separately specialized means.
+  // TODO: Investigate looping rather than always extracting/replacing specific
+  // lanes to fill gaps.

   auto IsConstant = [](const SDValue &V) {
     return V.getOpcode() == ISD::Constant || V.getOpcode() == ISD::ConstantFP;
   };

   // Returns the source vector and index vector pair if they exist. Checks for:
   //   (extract_vector_elt
   //     $src,
(14 lines not shown; context: auto GetSwizzleSrcs = [](size_t I, const SDValue &Lane) {)
     if (SwizzleSrc.getValueType() != MVT::v16i8 ||
         SwizzleIndices.getValueType() != MVT::v16i8 ||
         Index->getOperand(1)->getOpcode() != ISD::Constant ||
         Index->getConstantOperandVal(1) != I)
       return Bail;
     return std::make_pair(SwizzleSrc, SwizzleIndices);
   };

+  // If the lane is extracted from another vector at a constant index, return
+  // that vector. The source vector must not have more lanes than the dest
+  // because the shufflevector indices are in terms of the destination lanes and
+  // would not be able to address the smaller individual source lanes.
+  auto GetShuffleSrc = [&](const SDValue &Lane) {
+    if (Lane->getOpcode() != ISD::EXTRACT_VECTOR_ELT)
+      return SDValue();
+    if (!isa<ConstantSDNode>(Lane->getOperand(1).getNode()))
+      return SDValue();
+    if (Lane->getOperand(0).getValueType().getVectorNumElements() >
+        VecT.getVectorNumElements())
+      return SDValue();
+    return Lane->getOperand(0);
+  };
+
   using ValueEntry = std::pair<SDValue, size_t>;
   SmallVector<ValueEntry, 16> SplatValueCounts;

   using SwizzleEntry = std::pair<std::pair<SDValue, SDValue>, size_t>;
   SmallVector<SwizzleEntry, 16> SwizzleCounts;

+  using ShuffleEntry = std::pair<SDValue, size_t>;
+  SmallVector<ShuffleEntry, 16> ShuffleCounts;
+
   auto AddCount = [](auto &Counts, const auto &Val) {
     auto CountIt =
         llvm::find_if(Counts, [&Val](auto E) { return E.first == Val; });
     if (CountIt == Counts.end()) {
       Counts.emplace_back(Val, 1);
     } else {
       CountIt->second++;
     }
(12 lines not shown; context: SDValue WebAssemblyTargetLowering::LowerBUILD_VECTOR(SDValue Op,)
   // Count eligible lanes for each type of vector creation op
   for (size_t I = 0; I < Lanes; ++I) {
     const SDValue &Lane = Op->getOperand(I);
     if (Lane.isUndef())
       continue;

     AddCount(SplatValueCounts, Lane);

-    if (IsConstant(Lane)) {
+    if (IsConstant(Lane))
       NumConstantLanes++;
-    } else if (CanSwizzle) {
+    if (auto ShuffleSrc = GetShuffleSrc(Lane))
+      AddCount(ShuffleCounts, ShuffleSrc);
+    if (CanSwizzle) {
       auto SwizzleSrcs = GetSwizzleSrcs(I, Lane);
       if (SwizzleSrcs.first)
         AddCount(SwizzleCounts, SwizzleSrcs);
     }
   }

   SDValue SplatValue;
   size_t NumSplatLanes;
   std::tie(SplatValue, NumSplatLanes) = GetMostCommon(SplatValueCounts);

   SDValue SwizzleSrc;
   SDValue SwizzleIndices;
   size_t NumSwizzleLanes = 0;
   if (SwizzleCounts.size())
     std::forward_as_tuple(std::tie(SwizzleSrc, SwizzleIndices),
                           NumSwizzleLanes) = GetMostCommon(SwizzleCounts);

+  // Shuffles can draw from up to two vectors, so find the two most common
+  // sources.
+  SDValue ShuffleSrc1, ShuffleSrc2;
+  size_t NumShuffleLanes = 0;
+  if (ShuffleCounts.size()) {
+    std::tie(ShuffleSrc1, NumShuffleLanes) = GetMostCommon(ShuffleCounts);
+    ShuffleCounts.erase(std::remove_if(ShuffleCounts.begin(),
+                                       ShuffleCounts.end(),
+                                       [&](const auto &Pair) {
+                                         return Pair.first == ShuffleSrc1;
+                                       }),
+                        ShuffleCounts.end());
+  }
+  if (ShuffleCounts.size()) {
+    size_t AdditionalShuffleLanes;
+    std::tie(ShuffleSrc2, AdditionalShuffleLanes) =
+        GetMostCommon(ShuffleCounts);
+    NumShuffleLanes += AdditionalShuffleLanes;
+  }
+
   // Predicate returning true if the lane is properly initialized by the
   // original instruction
   std::function<bool(size_t, const SDValue &)> IsLaneConstructed;
   SDValue Result;
-  // Prefer swizzles over vector consts over splats
-  if (NumSwizzleLanes >= NumSplatLanes && NumSwizzleLanes >= NumConstantLanes) {
+  // Prefer swizzles over shuffles over vector consts over splats
+  if (NumSwizzleLanes >= NumShuffleLanes &&
+      NumSwizzleLanes >= NumConstantLanes && NumSwizzleLanes >= NumSplatLanes) {
     Result = DAG.getNode(WebAssemblyISD::SWIZZLE, DL, VecT, SwizzleSrc,
                          SwizzleIndices);
     auto Swizzled = std::make_pair(SwizzleSrc, SwizzleIndices);
     IsLaneConstructed = [&, Swizzled](size_t I, const SDValue &Lane) {
       return Swizzled == GetSwizzleSrcs(I, Lane);
     };
+  } else if (NumShuffleLanes >= NumConstantLanes &&
+             NumShuffleLanes >= NumSplatLanes) {
+    size_t DestLaneSize = VecT.getVectorElementType().getFixedSizeInBits() / 8;
+    size_t DestLaneCount = VecT.getVectorNumElements();
+    size_t Scale1 = 1;
+    size_t Scale2 = 1;
+    SDValue Src1 = ShuffleSrc1;
+    SDValue Src2 = ShuffleSrc2 ? ShuffleSrc2 : DAG.getUNDEF(VecT);
+    if (Src1.getValueType() != VecT) {
+      size_t LaneSize =
+          Src1.getValueType().getVectorElementType().getFixedSizeInBits() / 8;
+      assert(LaneSize > DestLaneSize);
+      Scale1 = LaneSize / DestLaneSize;
+      Src1 = DAG.getBitcast(VecT, Src1);
+    }
+    if (Src2.getValueType() != VecT) {
+      size_t LaneSize =
+          Src2.getValueType().getVectorElementType().getFixedSizeInBits() / 8;
+      assert(LaneSize > DestLaneSize);
+      Scale2 = LaneSize / DestLaneSize;
+      Src2 = DAG.getBitcast(VecT, Src2);
+    }
+
+    int Mask[16];
+    assert(DestLaneCount <= 16);
+    for (size_t I = 0; I < DestLaneCount; ++I) {
+      const SDValue &Lane = Op->getOperand(I);
+      SDValue Src = GetShuffleSrc(Lane);
+      if (Src == ShuffleSrc1) {
+        Mask[I] = Lane->getConstantOperandVal(1) * Scale1;
+      } else if (Src && Src == ShuffleSrc2) {
+        Mask[I] = DestLaneCount + Lane->getConstantOperandVal(1) * Scale2;
+      } else {
+        Mask[I] = -1;
+      }
+    }
+    ArrayRef<int> MaskRef(Mask, DestLaneCount);
+    Result = DAG.getVectorShuffle(VecT, DL, Src1, Src2, MaskRef);
+    IsLaneConstructed = [&](size_t, const SDValue &Lane) {
+      auto Src = GetShuffleSrc(Lane);
+      return Src == ShuffleSrc1 || (Src && Src == ShuffleSrc2);
+    };
   } else if (NumConstantLanes >= NumSplatLanes) {
     SmallVector<SDValue, 16> ConstLanes;
     for (const SDValue &Lane : Op->op_values()) {
       if (IsConstant(Lane)) {
         ConstLanes.push_back(Lane);
       } else if (LaneT.isFloatingPoint()) {
         ConstLanes.push_back(DAG.getConstantFP(0, DL, LaneT));
       } else {
(241 lines not shown)
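
The index scaling in the new shuffle branch can be tried outside of SelectionDAG. The sketch below is a simplified stand-in, not the patch itself: `LaneRef` and `buildLaneMask` are hypothetical, and each BUILD_VECTOR operand is reduced to which shuffle source it extracts from and at what constant index. Lane widths are given in bytes, as in the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one BUILD_VECTOR operand.
struct LaneRef {
  int Source;        // 0 = first shuffle source, 1 = second, -1 = neither
  std::size_t Index; // constant extract index into that source
};

// Build the lane-indexed shuffle mask. Source lanes may be wider than
// destination lanes, in which case the extract index is scaled up so that it
// addresses the low (truncated) part of the wider source lane.
inline std::vector<int> buildLaneMask(const std::vector<LaneRef> &Lanes,
                                      std::size_t DestLaneBytes,
                                      std::size_t Src1LaneBytes,
                                      std::size_t Src2LaneBytes) {
  assert(Src1LaneBytes >= DestLaneBytes && Src2LaneBytes >= DestLaneBytes);
  const std::size_t Scale1 = Src1LaneBytes / DestLaneBytes;
  const std::size_t Scale2 = Src2LaneBytes / DestLaneBytes;
  const std::size_t DestLaneCount = Lanes.size();
  std::vector<int> Mask;
  for (const LaneRef &L : Lanes) {
    if (L.Source == 0)
      Mask.push_back(static_cast<int>(L.Index * Scale1));
    else if (L.Source == 1)
      Mask.push_back(static_cast<int>(DestLaneCount + L.Index * Scale2));
    else
      Mask.push_back(-1); // not covered by the shuffle; filled in later
  }
  return Mask;
}
```

For `concat_v2i32`, assuming each v2i32 operand arrives with 8-byte lanes as described in the review discussion, the four 4-byte destination lanes get the mask `0, 2, 4, 6`: both scale factors are 2, and the second source is offset by the destination lane count of 4.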

llvm/test/CodeGen/WebAssembly/simd-build-vector.ll

(159 lines not shown)
 ; CHECK: return
 define <8 x i16> @swizzle_one_i16x8(<8 x i16> %src, <8 x i16> %mask) {
   %m0 = extractelement <8 x i16> %mask, i32 0
   %s0 = extractelement <8 x i16> %src, i16 %m0
   %v0 = insertelement <8 x i16> undef, i16 %s0, i32 0
   ret <8 x i16> %v0
 }

+; CHECK-LABEL: half_shuffle_i32x4:
+; CHECK-NEXT: .functype half_shuffle_i32x4 (v128) -> (v128)
+; CHECK: i8x16.shuffle $push[[L0:[0-9]+]]=, $0, $0, 0, 0, 0, 0, 8, 9, 10, 11, 0, 1, 2, 3, 0, 0, 0, 0
+; CHECK: i32x4.replace_lane
+; CHECK: i32x4.replace_lane
+; CHECK: return
+define <4 x i32> @half_shuffle_i32x4(<4 x i32> %src) {
+  %s0 = extractelement <4 x i32> %src, i32 0
+  %s2 = extractelement <4 x i32> %src, i32 2
+  %v0 = insertelement <4 x i32> undef, i32 0, i32 0
+  %v1 = insertelement <4 x i32> %v0, i32 %s2, i32 1
+  %v2 = insertelement <4 x i32> %v1, i32 %s0, i32 2
+  %v3 = insertelement <4 x i32> %v2, i32 3, i32 3
+  ret <4 x i32> %v3
+}
+
 ; CHECK-LABEL: mashup_swizzle_i8x16:
 ; CHECK-NEXT: .functype mashup_swizzle_i8x16 (v128, v128, i32) -> (v128)
 ; CHECK-NEXT: i8x16.swizzle $push[[L0:[0-9]+]]=, $0, $1
 ; CHECK: i8x16.replace_lane
 ; CHECK: i8x16.replace_lane
 ; CHECK: i8x16.replace_lane
 ; CHECK: i8x16.replace_lane
 ; CHECK: return
(96 lines not shown)

llvm/test/CodeGen/WebAssembly/simd-concat.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -verify-machineinstrs -mattr=+simd128 | FileCheck %s

				; Check that all varieties of vector concatenations get lowered to shuffles.

				target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
				target triple = "wasm32-unknown--wasm"

				define <16 x i8> @concat_v8i8(<8 x i8> %a, <8 x i8> %b) {
				; CHECK-LABEL: concat_v8i8:
				; CHECK: .functype concat_v8i8 (v128, v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shuffle 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30
				; CHECK-NEXT: # fallthrough-return
				%v = shufflevector <8 x i8> %a, <8 x i8> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				ret <16 x i8> %v
				}

				define <8 x i8> @concat_v4i8(<4 x i8> %a, <4 x i8> %b) {
				; CHECK-LABEL: concat_v4i8:
				; CHECK: .functype concat_v4i8 (v128, v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shuffle 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 20, 21, 24, 25, 28, 29
				; CHECK-NEXT: # fallthrough-return
				%v = shufflevector <4 x i8> %a, <4 x i8> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i8> %v
				}

				define <8 x i16> @concat_v4i16(<4 x i16> %a, <4 x i16> %b) {
				; CHECK-LABEL: concat_v4i16:
				; CHECK: .functype concat_v4i16 (v128, v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shuffle 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 20, 21, 24, 25, 28, 29
				; CHECK-NEXT: # fallthrough-return
				%v = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				ret <8 x i16> %v
				}

				define <4 x i8> @concat_v2i8(<2 x i8> %a, <2 x i8> %b) {
				; CHECK-LABEL: concat_v2i8:
				; CHECK: .functype concat_v2i8 (v128, v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27
				; CHECK-NEXT: # fallthrough-return
				%v = shufflevector <2 x i8> %a, <2 x i8> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				ret <4 x i8> %v
				}

				define <4 x i16> @concat_v2i16(<2 x i16> %a, <2 x i16> %b) {
				; CHECK-LABEL: concat_v2i16:
				; CHECK: .functype concat_v2i16 (v128, v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27
				; CHECK-NEXT: # fallthrough-return
				%v = shufflevector <2 x i16> %a, <2 x i16> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				ret <4 x i16> %v
				}

				define <4 x i32> @concat_v2i32(<2 x i32> %a, <2 x i32> %b) {
				; CHECK-LABEL: concat_v2i32:
				; CHECK: .functype concat_v2i32 (v128, v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27
				; CHECK-NEXT: # fallthrough-return
				%v = shufflevector <2 x i32> %a, <2 x i32> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
				ret <4 x i32> %v
				}
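
The byte immediates in the CHECK lines above follow from the lane-indexed shuffle masks once each lane index is expanded to its bytes. The sketch below shows that expansion; it is inferred from the expected output in these tests rather than copied from the backend, `expandToByteMask` is a hypothetical helper, and treating an undefined lane (-1) as byte 0 is an assumption based on the zeros in the `half_shuffle_i32x4` expectation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expand a lane-indexed mask into the 16 byte immediates of i8x16.shuffle.
// LaneBytes is the width of one destination lane in bytes.
inline std::array<std::uint8_t, 16>
expandToByteMask(const std::vector<int> &LaneMask, std::size_t LaneBytes) {
  assert(LaneMask.size() * LaneBytes == 16 && "mask must cover 128 bits");
  std::array<std::uint8_t, 16> Bytes{};
  for (std::size_t I = 0; I < LaneMask.size(); ++I) {
    for (std::size_t J = 0; J < LaneBytes; ++J) {
      // Assumption: undefined lanes (-1) are emitted as byte index 0.
      Bytes[I * LaneBytes + J] =
          LaneMask[I] < 0
              ? 0
              : static_cast<std::uint8_t>(LaneMask[I] * LaneBytes + J);
    }
  }
  return Bytes;
}
```

With 4-byte lanes, the lane mask `0, 2, 4, 6` expands to `0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27`, which matches the `concat_v2i32` expectation. With 2-byte lanes, `0, 2, 4, 6, 8, 10, 12, 14` expands to the `0, 1, 4, 5, ...` pattern seen in `concat_v4i16`.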