This is an archive of the discontinued LLVM Phabricator instance.


Differential D56633

[WebAssembly] Optimize BUILD_VECTOR lowering for size
Closed, Public

Authored by tlively on Jan 11 2019, 9:03 PM.

Details

Reviewers

aheejin

Commits

rG079816efb72c: [WebAssembly] Optimize BUILD_VECTOR lowering for size
rL352592: [WebAssembly] Optimize BUILD_VECTOR lowering for size

Summary

Implements custom lowering logic that finds the optimal value for the
initial splat of the vector and either uses it or uses v128.const if
it is available and if it would produce smaller code. This logic
replaces large TableGen ISEL patterns that would lower all non-splat
BUILD_VECTORs into a splat followed by a fixed number of replace_lane
instructions. This CL fixes PR39685.
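
As a rough illustration of the splat-selection step described in the summary, the following standalone sketch counts lane values and picks the most frequent one. It does not use LLVM types: `Lane` and `mostCommonLane` are hypothetical names, with `std::nullopt` standing in for an undef lane and plain `int` values standing in for operands.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a lane operand: nullopt models an undef lane.
using Lane = std::optional<int>;

// Returns the most frequent defined lane value together with its count.
// On ties the value seen first wins, matching std::max_element.
std::pair<int, std::size_t> mostCommonLane(const std::vector<Lane> &Lanes) {
  std::vector<std::pair<int, std::size_t>> Counts;
  for (const Lane &L : Lanes) {
    if (!L)
      continue; // undef lanes may take any value, so they are not counted
    auto It = std::find_if(Counts.begin(), Counts.end(),
                           [&](const auto &E) { return E.first == *L; });
    if (It == Counts.end())
      Counts.emplace_back(*L, 1);
    else
      ++It->second;
  }
  assert(!Counts.empty() && "all-undef vector");
  return *std::max_element(
      Counts.begin(), Counts.end(),
      [](const auto &A, const auto &B) { return A.second < B.second; });
}
```

For the lanes `<undef, 3, 3, 1>` this picks 3 with a count of 2, so only one lane would need a replace_lane after the splat.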

Diff Detail

Repository: rL LLVM

Event Timeline

tlively created this revision. Jan 11 2019, 9:03 PM

Herald added subscribers: llvm-commits, sunfish, jgravelle-google and 2 others. Jan 11 2019, 9:03 PM

Harbormaster completed remote builds in B26744: Diff 181422. Jan 11 2019, 9:03 PM

The tests are a WIP, but I thought I'd get this uploaded before they are done because there is a fair amount of relatively complex code to review. I manually verified that all the tests behave as expected.

Finish tests

Harbormaster completed remote builds in B26948: Diff 182184. Jan 16 2019, 4:24 PM

@aheejin, this should be good to go now.

Sorry for the delay! Some nits and questions.

How should undef elements be initialized? It looks like this patch doesn't care which value they are initialized with, whereas we initialize them with 0 in scalars. Is this OK?

lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1137 ↗	(On Diff #182184)	How about `V.getOpcode() == ISD::Constant || V.getOpcode() == ISD::ConstantFP`? The definitions of `ConstantSDNode` and `ConstantFPSDNode` seem to include `ISD::TargetConstant` and `ISD::TargetConstantFP`, whose definitions sound like we shouldn't do any optimizations on them.
1144 ↗	(On Diff #182184)	Variable names should start with an uppercase letter. I know it looks especially weird for loop index variables ;( But anyway, the same applies to all the other for loops too.
1144 ↗	(On Diff #182184)	How about `for (const SDValue &Op : Op.op_values())`? The same goes for the other for loops.
1147 ↗	(On Diff #182184)	This is not being used anywhere. Were you going to use it for something? Otherwise, can we delete it?
1169 ↗	(On Diff #182184)	Can all these byte calculations change if we take LEB encoding into account?
1173 ↗	(On Diff #182184)	In which case would you splat non-const arguments after using `v128.const`?
1174 ↗	(On Diff #182184)	For `replace_lane`, why is it 2? I guess 2 bytes for the opcode and a byte for the immediate lane index, so 3?
1184 ↗	(On Diff #182184)	Why is this `LaneConstBytes` rather than `LaneDynBytes`? Isn't `LaneDynBytes` the number of bytes needed for the `replace_lane` instruction?
test/CodeGen/WebAssembly/simd-build-vector.ll
116 ↗	(On Diff #182184)	Looking at the code above, I wonder how we ended up with a `local.get` from an unassigned local here for the all-undef case. Actually it works beautifully, given that unassigned locals are zero-initialized by default, and it takes even less space than `splat 0`. Maybe this could be an optimization in which we try to replace all `v128.const 0, ..., 0` and `splat 0` with `local.get n`, where n is an unassigned local. Just a thought and not relevant to this CL.

aheejin added inline comments. Jan 26 2019, 4:54 PM

test/CodeGen/WebAssembly/simd-build-vector.ll
116 ↗	(On Diff #182184)	Hmm, it seems the ExplicitLocals pass turns all unstackified registers into locals, so undefined registers become `local.get`s from unassigned locals. And I don't think the current SIMD backend would prefer `v128.const 0, ..., 0` over `splat 0` anyway, right? So sorry, never mind. :)

Address comments

Herald added a subscriber: hiraditya. Jan 28 2019, 6:36 PM

Harbormaster completed remote builds in B27419: Diff 184005. Jan 28 2019, 6:36 PM

All the inline comments disappeared because I switched to using the monorepo, which has different paths.

In D56633#1371855, @aheejin wrote:

How should undef elements be initialized? It looks like this patch doesn't care which value they are initialized with, whereas we initialize them with 0 in scalars. Is this OK?

Undef elements can be initialized with anything. We use zero here and elsewhere because there's no better default.

lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1169 ↗	(On Diff #182184)	Yes, LEB encoding can make any SIMD opcode arbitrarily long. I'm trying to use the smallest encoding in these calculations on the assumption that tools will always prefer the smallest encoding.
1173 ↗	(On Diff #182184)	After using `v128.const`, everything else would be a `replace_lane`, not a splat. This value is also used for the initial splat when `v128.const` is not used, though.
1174 ↗	(On Diff #182184)	Yes, good catch.
1184 ↗	(On Diff #182184)	I assume (for lack of better information) that all dynamic values are already on the stack, but all constants need to be materialized immediately before they are used. That means that for constant replace_lanes I count the replace_lane instruction and also the constant lane value.
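
To illustrate the LEB point in the reply to line 1169: an unsigned LEB128 value has one minimal encoding, but a producer may pad it with continuation bytes, so the same immediate can occupy more bytes. The following is a minimal sketch; `encodeULEB128` and its `PadTo` parameter are hypothetical helpers written for this example, not code from the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encode Value as unsigned LEB128. If PadTo is larger than the minimal
// length, the result is padded to exactly PadTo bytes and still decodes to
// the same value.
std::vector<std::uint8_t> encodeULEB128(std::uint64_t Value,
                                        std::size_t PadTo = 0) {
  std::vector<std::uint8_t> Out;
  do {
    std::uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    // Set the continuation bit if more payload or padding follows.
    if (Value != 0 || Out.size() + 1 < PadTo)
      Byte |= 0x80;
    Out.push_back(Byte);
  } while (Value != 0);
  // Emit padding bytes; only the last byte has the continuation bit clear.
  while (Out.size() < PadTo)
    Out.push_back(Out.size() + 1 < PadTo ? 0x80 : 0x00);
  return Out;
}
```

The size estimates in the patch correspond to the unpadded case, in line with the stated assumption that tools prefer the smallest encoding.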

I assume (for lack of better information) that all dynamic values are already on the stack, but all constants need to be materialized immediately before they are used. That means that for constant replace_lanes I count the replace_lane instruction and also the constant lane value.

I guess for dynamic splats and consts we should add bytes for the local.get opcode and its local number immediate? Whatever that value is, it most likely would not be on the top of the stack. For example, for dynamic replace_lanes, we start the code sequence with a splat or v128.const, and by the time we need the replace_lane, the result of the v128.const or splat, not the dynamic value, will be on the top of the stack. Also, when you begin the sequence with a splat of a dynamic value, that value might be on top of the stack, but more likely it's not.

We talked in person and I realized I forgot replace_lane also takes a vector argument. Thanks!
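
The byte accounting discussed in this thread can be sketched outside of LLVM as follows. The constants mirror those in the `LowerBUILD_VECTOR` code in Diff 184229; the `LaneStats` struct and the function names are hypothetical, and the numbers are estimates that assume minimal encodings and that dynamic lane values are already on the stack.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical summary of a BUILD_VECTOR's operands (undef lanes excluded).
struct LaneStats {
  std::size_t Lanes;      // number of lanes in the vector (must be nonzero)
  std::size_t NumConst;   // defined lanes holding constants
  std::size_t NumDynamic; // defined lanes holding non-constant values
  std::size_t NumCommon;  // occurrences of the most common value
  bool SplatIsConst;      // whether that most common value is a constant
};

constexpr std::size_t SplatBytes = 2;     // SIMD prefix and opcode
constexpr std::size_t ReplaceBytes = 3;   // SIMD prefix, opcode, lane index
constexpr std::size_t VecConstBytes = 18; // SIMD prefix, opcode, 128-bit value

// Estimated size of a scalar constant: one opcode byte plus the value.
std::size_t constBytes(const LaneStats &S) {
  return 1 + std::max<std::size_t>(4, 16 / S.Lanes);
}

// Initial v128.const, then one replace_lane per dynamic lane.
std::size_t constInitBytes(const LaneStats &S) {
  return VecConstBytes + S.NumDynamic * ReplaceBytes;
}

// Initial splat of the most common value, then replace_lanes for the rest.
std::size_t splatInitBytes(const LaneStats &S) {
  const std::size_t ReplaceConstBytes = ReplaceBytes + constBytes(S);
  if (S.SplatIsConst)
    return SplatBytes + constBytes(S) +
           (S.NumConst - S.NumCommon) * ReplaceConstBytes +
           S.NumDynamic * ReplaceBytes;
  return SplatBytes + S.NumConst * ReplaceConstBytes +
         (S.NumDynamic - S.NumCommon) * ReplaceBytes;
}

bool preferVecConst(const LaneStats &S) {
  return constInitBytes(S) < splatInitBytes(S);
}
```

With eight i16 lanes holding seven distinct constants and one dynamic value, the estimates are 21 bytes for `v128.const` versus 58 for a splat, so `v128.const` wins. If the seven constants are all equal, the splat path is estimated at 10 bytes and is preferred.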

This revision is now accepted and ready to land. Jan 29 2019, 5:36 PM

Closed by commit rL352592: [WebAssembly] Optimize BUILD_VECTOR lowering for size (authored by tlively). Jan 29 2019, 6:23 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Target/WebAssembly/WebAssemblyISelLowering.h	1 line
llvm/trunk/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp	110 lines
llvm/trunk/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td	112 lines
llvm/trunk/test/CodeGen/WebAssembly/simd-build-vector.ll	127 lines

Diff 184229

llvm/trunk/lib/Target/WebAssembly/WebAssemblyISelLowering.h

[...] private:
   SDValue LowerExternalSymbol(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerBR_JT(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerJumpTable(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVASTART(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerCopyToReg(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerINTRINSIC_WO_CHAIN(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerINTRINSIC_VOID(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSIGN_EXTEND_INREG(SDValue Op, SelectionDAG &DAG) const;
+  SDValue LowerBUILD_VECTOR(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerVECTOR_SHUFFLE(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerAccessVectorElement(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerShift(SDValue Op, SelectionDAG &DAG) const;
 };

 namespace WebAssembly {
 FastISel *createFastISel(FunctionLoweringInfo &funcInfo,
                          const TargetLibraryInfo *libInfo);
 } // end namespace WebAssembly

 } // end namespace llvm

 #endif

llvm/trunk/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

[...] WebAssemblyTargetLowering::WebAssemblyTargetLowering(

   // SIMD-specific configuration
   if (Subtarget->hasSIMD128()) {
     // Support saturating add for i8x16 and i16x8
     for (auto Op : {ISD::SADDSAT, ISD::UADDSAT})
       for (auto T : {MVT::v16i8, MVT::v8i16})
         setOperationAction(Op, T, Legal);

+    // Custom lower BUILD_VECTORs to minimize number of replace_lanes
+    for (auto T : {MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v4f32})
+      setOperationAction(ISD::BUILD_VECTOR, T, Custom);
+    if (Subtarget->hasUnimplementedSIMD128())
+      for (auto T : {MVT::v2i64, MVT::v2f64})
+        setOperationAction(ISD::BUILD_VECTOR, T, Custom);
+
     // We have custom shuffle lowering to expose the shuffle mask
     for (auto T : {MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v4f32})
       setOperationAction(ISD::VECTOR_SHUFFLE, T, Custom);
     if (Subtarget->hasUnimplementedSIMD128())
       for (auto T: {MVT::v2i64, MVT::v2f64})
         setOperationAction(ISD::VECTOR_SHUFFLE, T, Custom);

     // Custom lowering since wasm shifts must have a scalar shift amount
[...] case ISD::INTRINSIC_WO_CHAIN:
     return LowerINTRINSIC_WO_CHAIN(Op, DAG);
   case ISD::EXTRACT_VECTOR_ELT:
   case ISD::INSERT_VECTOR_ELT:
     return LowerAccessVectorElement(Op, DAG);
   case ISD::INTRINSIC_VOID:
     return LowerINTRINSIC_VOID(Op, DAG);
   case ISD::SIGN_EXTEND_INREG:
     return LowerSIGN_EXTEND_INREG(Op, DAG);
+  case ISD::BUILD_VECTOR:
+    return LowerBUILD_VECTOR(Op, DAG);
   case ISD::VECTOR_SHUFFLE:
     return LowerVECTOR_SHUFFLE(Op, DAG);
   case ISD::SHL:
   case ISD::SRA:
   case ISD::SRL:
     return LowerShift(Op, DAG);
   }
 }
[...] WebAssemblyTargetLowering::LowerSIGN_EXTEND_INREG(SDValue Op,
   // undo the expansion and select extract_lane_s instructions.
   assert(!Subtarget->hasSignExt() && Subtarget->hasSIMD128());
   if (Op.getOperand(0).getOpcode() == ISD::EXTRACT_VECTOR_ELT)
     return Op;
   // Otherwise expand
   return SDValue();
 }

+SDValue WebAssemblyTargetLowering::LowerBUILD_VECTOR(SDValue Op,
+                                                     SelectionDAG &DAG) const {
+  SDLoc DL(Op);
+  const EVT VecT = Op.getValueType();
+  const EVT LaneT = Op.getOperand(0).getValueType();
+  const size_t Lanes = Op.getNumOperands();
+  auto IsConstant = [](const SDValue &V) {
+    return V.getOpcode() == ISD::Constant || V.getOpcode() == ISD::ConstantFP;
+  };
+
+  // Find the most common operand, which is approximately the best to splat
+  using Entry = std::pair<SDValue, size_t>;
+  SmallVector<Entry, 16> ValueCounts;
+  size_t NumConst = 0, NumDynamic = 0;
+  for (const SDValue &Lane : Op->op_values()) {
+    if (Lane.isUndef()) {
+      continue;
+    } else if (IsConstant(Lane)) {
+      NumConst++;
+    } else {
+      NumDynamic++;
+    }
+    auto CountIt = std::find_if(ValueCounts.begin(), ValueCounts.end(),
+                                [&Lane](Entry A) { return A.first == Lane; });
+    if (CountIt == ValueCounts.end()) {
+      ValueCounts.emplace_back(Lane, 1);
+    } else {
+      CountIt->second++;
+    }
+  }
+  auto CommonIt =
+      std::max_element(ValueCounts.begin(), ValueCounts.end(),
+                       [](Entry A, Entry B) { return A.second < B.second; });
+  assert(CommonIt != ValueCounts.end() && "Unexpected all-undef build_vector");
+  SDValue SplatValue = CommonIt->first;
+  size_t NumCommon = CommonIt->second;
+
+  // If v128.const is available, consider using it instead of a splat
+  if (Subtarget->hasUnimplementedSIMD128()) {
+    // {i32,i64,f32,f64}.const opcode, and value
+    const size_t ConstBytes = 1 + std::max(size_t(4), 16 / Lanes);
+    // SIMD prefix and opcode
+    const size_t SplatBytes = 2;
+    const size_t SplatConstBytes = SplatBytes + ConstBytes;
+    // SIMD prefix, opcode, and lane index
+    const size_t ReplaceBytes = 3;
+    const size_t ReplaceConstBytes = ReplaceBytes + ConstBytes;
+    // SIMD prefix, v128.const opcode, and 128-bit value
+    const size_t VecConstBytes = 18;
+    // Initial v128.const and a replace_lane for each non-const operand
+    const size_t ConstInitBytes = VecConstBytes + NumDynamic * ReplaceBytes;
+    // Initial splat and all necessary replace_lanes
+    const size_t SplatInitBytes =
+        IsConstant(SplatValue)
+            // Initial constant splat
+            ? (SplatConstBytes +
+               // Constant replace_lanes
+               (NumConst - NumCommon) * ReplaceConstBytes +
+               // Dynamic replace_lanes
+               (NumDynamic * ReplaceBytes))
+            // Initial dynamic splat
+            : (SplatBytes +
+               // Constant replace_lanes
+               (NumConst * ReplaceConstBytes) +
+               // Dynamic replace_lanes
+               (NumDynamic - NumCommon) * ReplaceBytes);
+    if (ConstInitBytes < SplatInitBytes) {
+      // Create build_vector that will lower to initial v128.const
+      SmallVector<SDValue, 16> ConstLanes;
+      for (const SDValue &Lane : Op->op_values()) {
+        if (IsConstant(Lane)) {
+          ConstLanes.push_back(Lane);
+        } else if (LaneT.isFloatingPoint()) {
+          ConstLanes.push_back(DAG.getConstantFP(0, DL, LaneT));
+        } else {
+          ConstLanes.push_back(DAG.getConstant(0, DL, LaneT));
+        }
+      }
+      SDValue Result = DAG.getBuildVector(VecT, DL, ConstLanes);
+      // Add replace_lane instructions for non-const lanes
+      for (size_t I = 0; I < Lanes; ++I) {
+        const SDValue &Lane = Op->getOperand(I);
+        if (!Lane.isUndef() && !IsConstant(Lane))
+          Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VecT, Result, Lane,
+                               DAG.getConstant(I, DL, MVT::i32));
+      }
+      return Result;
+    }
+  }
+  // Use a splat for the initial vector
+  SDValue Result = DAG.getSplatBuildVector(VecT, DL, SplatValue);
+  // Add replace_lane instructions for other values
+  for (size_t I = 0; I < Lanes; ++I) {
+    const SDValue &Lane = Op->getOperand(I);
+    if (Lane != SplatValue)
+      Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VecT, Result, Lane,
+                           DAG.getConstant(I, DL, MVT::i32));
+  }
+  return Result;
+}

 SDValue
 WebAssemblyTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
                                                SelectionDAG &DAG) const {
   SDLoc DL(Op);
   ArrayRef<int> Mask = cast<ShuffleVectorSDNode>(Op.getNode())->getMask();
   MVT VecType = Op.getOperand(0).getSimpleValueType();
   assert(VecType.is128BitVector() && "Unexpected shuffle vector type");
   size_t LaneBytes = VecType.getVectorElementType().getSizeInBits() / 8;
[...]

llvm/trunk/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td

[...] def : Pat<(vector_insert (v4i32 V128:$vec), I32:$x, undef),
           (REPLACE_LANE_v4i32 V128:$vec, 0, I32:$x)>;
 def : Pat<(vector_insert (v2i64 V128:$vec), I64:$x, undef),
           (REPLACE_LANE_v2i64 V128:$vec, 0, I64:$x)>;
 def : Pat<(vector_insert (v4f32 V128:$vec), F32:$x, undef),
           (REPLACE_LANE_v4f32 V128:$vec, 0, F32:$x)>;
 def : Pat<(vector_insert (v2f64 V128:$vec), F64:$x, undef),
           (REPLACE_LANE_v2f64 V128:$vec, 0, F64:$x)>;

-// Arbitrary other BUILD_VECTOR patterns
-def : Pat<(v16i8 (build_vector
-            (i32 I32:$x0), (i32 I32:$x1), (i32 I32:$x2), (i32 I32:$x3),
-            (i32 I32:$x4), (i32 I32:$x5), (i32 I32:$x6), (i32 I32:$x7),
-            (i32 I32:$x8), (i32 I32:$x9), (i32 I32:$x10), (i32 I32:$x11),
-            (i32 I32:$x12), (i32 I32:$x13), (i32 I32:$x14), (i32 I32:$x15)
-          )),
-          (v16i8 (REPLACE_LANE_v16i8
-            (v16i8 (REPLACE_LANE_v16i8
-              (v16i8 (REPLACE_LANE_v16i8
-                (v16i8 (REPLACE_LANE_v16i8
-                  (v16i8 (REPLACE_LANE_v16i8
-                    (v16i8 (REPLACE_LANE_v16i8
-                      (v16i8 (REPLACE_LANE_v16i8
-                        (v16i8 (REPLACE_LANE_v16i8
-                          (v16i8 (REPLACE_LANE_v16i8
-                            (v16i8 (REPLACE_LANE_v16i8
-                              (v16i8 (REPLACE_LANE_v16i8
-                                (v16i8 (REPLACE_LANE_v16i8
-                                  (v16i8 (REPLACE_LANE_v16i8
-                                    (v16i8 (REPLACE_LANE_v16i8
-                                      (v16i8 (REPLACE_LANE_v16i8
-                                        (v16i8 (SPLAT_v16i8 (i32 I32:$x0))),
-                                        1, I32:$x1
-                                      )),
-                                      2, I32:$x2
-                                    )),
-                                    3, I32:$x3
-                                  )),
-                                  4, I32:$x4
-                                )),
-                                5, I32:$x5
-                              )),
-                              6, I32:$x6
-                            )),
-                            7, I32:$x7
-                          )),
-                          8, I32:$x8
-                        )),
-                        9, I32:$x9
-                      )),
-                      10, I32:$x10
-                    )),
-                    11, I32:$x11
-                  )),
-                  12, I32:$x12
-                )),
-                13, I32:$x13
-              )),
-              14, I32:$x14
-            )),
-            15, I32:$x15
-          ))>;
-def : Pat<(v8i16 (build_vector
-            (i32 I32:$x0), (i32 I32:$x1), (i32 I32:$x2), (i32 I32:$x3),
-            (i32 I32:$x4), (i32 I32:$x5), (i32 I32:$x6), (i32 I32:$x7)
-          )),
-          (v8i16 (REPLACE_LANE_v8i16
-            (v8i16 (REPLACE_LANE_v8i16
-              (v8i16 (REPLACE_LANE_v8i16
-                (v8i16 (REPLACE_LANE_v8i16
-                  (v8i16 (REPLACE_LANE_v8i16
-                    (v8i16 (REPLACE_LANE_v8i16
-                      (v8i16 (REPLACE_LANE_v8i16
-                        (v8i16 (SPLAT_v8i16 (i32 I32:$x0))),
-                        1, I32:$x1
-                      )),
-                      2, I32:$x2
-                    )),
-                    3, I32:$x3
-                  )),
-                  4, I32:$x4
-                )),
-                5, I32:$x5
-              )),
-              6, I32:$x6
-            )),
-            7, I32:$x7
-          ))>;
-def : Pat<(v4i32 (build_vector
-            (i32 I32:$x0), (i32 I32:$x1), (i32 I32:$x2), (i32 I32:$x3)
-          )),
-          (v4i32 (REPLACE_LANE_v4i32
-            (v4i32 (REPLACE_LANE_v4i32
-              (v4i32 (REPLACE_LANE_v4i32
-                (v4i32 (SPLAT_v4i32 (i32 I32:$x0))),
-                1, I32:$x1
-              )),
-              2, I32:$x2
-            )),
-            3, I32:$x3
-          ))>;
-def : Pat<(v2i64 (build_vector (i64 I64:$x0), (i64 I64:$x1))),
-          (v2i64 (REPLACE_LANE_v2i64
-            (v2i64 (SPLAT_v2i64 (i64 I64:$x0))), 1, I64:$x1))>;
-def : Pat<(v4f32 (build_vector
-            (f32 F32:$x0), (f32 F32:$x1), (f32 F32:$x2), (f32 F32:$x3)
-          )),
-          (v4f32 (REPLACE_LANE_v4f32
-            (v4f32 (REPLACE_LANE_v4f32
-              (v4f32 (REPLACE_LANE_v4f32
-                (v4f32 (SPLAT_v4f32 (f32 F32:$x0))),
-                1, F32:$x1
-              )),
-              2, F32:$x2
-            )),
-            3, F32:$x3
-          ))>;
-def : Pat<(v2f64 (build_vector (f64 F64:$x0), (f64 F64:$x1))),
-          (v2f64 (REPLACE_LANE_v2f64
-            (v2f64 (SPLAT_v2f64 (f64 F64:$x0))), 1, F64:$x1))>;
-
 //===----------------------------------------------------------------------===//
 // Comparisons
 //===----------------------------------------------------------------------===//

 multiclass SIMDCondition<ValueType vec_t, ValueType out_t, string vec,
                          string name, CondCode cond, bits<32> simdop> {
   defm _#vec_t :
     SIMD_I<(outs V128:$dst), (ins V128:$lhs, V128:$rhs), (outs), (ins),
[...]

llvm/trunk/test/CodeGen/WebAssembly/simd-build-vector.ll

				; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+unimplemented-simd128 | FileCheck %s

				; Test that the logic to choose between v128.const vector
				; initialization and splat vector initialization and to optimize the
				; choice of splat value works correctly.

				target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
				target triple = "wasm32-unknown-unknown"

				; CHECK-LABEL: same_const_one_replaced_i8x16:
				; CHECK-NEXT: .functype same_const_one_replaced_i8x16 (i32) -> (v128)
				; CHECK-NEXT: i32.const $push[[L0:[0-9]+]]=, 42
				; CHECK-NEXT: i16x8.splat $push[[L1:[0-9]+]]=, $pop[[L0]]
				; CHECK-NEXT: i16x8.replace_lane $push[[L2:[0-9]+]]=, $pop[[L1]], 5, $0
				; CHECK-NEXT: return $pop[[L2]]
				define <8 x i16> @same_const_one_replaced_i8x16(i16 %x) {
				%v = insertelement
				<8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>,
				i16 %x,
				i32 5
				ret <8 x i16> %v
				}

				; CHECK-LABEL: different_const_one_replaced_i8x16:
				; CHECK-NEXT: .functype different_const_one_replaced_i8x16 (i32) -> (v128)
				; CHECK-NEXT: v128.const $push[[L0:[0-9]+]]=, 1, 2, 3, 4, 5, 0, 7, 8
				; CHECK-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0
				; CHECK-NEXT: return $pop[[L1]]
				define <8 x i16> @different_const_one_replaced_i8x16(i16 %x) {
				%v = insertelement
				<8 x i16> <i16 1, i16 2, i16 3, i16 4, i16 5, i16 6, i16 7, i16 8>,
				i16 %x,
				i32 5
				ret <8 x i16> %v
				}

				; CHECK-LABEL: same_const_one_replaced_f32x4:
				; CHECK-NEXT: .functype same_const_one_replaced_f32x4 (f32) -> (v128)
				; CHECK-NEXT: f32.const $push[[L0:[0-9]+]]=, 0x1.5p5
				; CHECK-NEXT: f32x4.splat $push[[L1:[0-9]+]]=, $pop[[L0]]
				; CHECK-NEXT: f32x4.replace_lane $push[[L2:[0-9]+]]=, $pop[[L1]], 2, $0
				; CHECK-NEXT: return $pop[[L2]]
				define <4 x float> @same_const_one_replaced_f32x4(float %x) {
				%v = insertelement
				<4 x float> <float 42., float 42., float 42., float 42.>,
				float %x,
				i32 2
				ret <4 x float> %v
				}

				; CHECK-LABEL: different_const_one_replaced_f32x4:
				; CHECK-NEXT: .functype different_const_one_replaced_f32x4 (f32) -> (v128)
				; CHECK-NEXT: v128.const $push[[L0:[0-9]+]]=, 0x1p0, 0x1p1, 0x0p0, 0x1p2
				; CHECK-NEXT: f32x4.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 2, $0
				; CHECK-NEXT: return $pop[[L1]]
				define <4 x float> @different_const_one_replaced_f32x4(float %x) {
				%v = insertelement
				<4 x float> <float 1., float 2., float 3., float 4.>,
				float %x,
				i32 2
				ret <4 x float> %v
				}

				; CHECK-LABEL: splat_common_const_i32x4:
				; CHECK-NEXT: .functype splat_common_const_i32x4 () -> (v128)
				; CHECK-NEXT: i32.const $push[[L0:[0-9]+]]=, 3
				; CHECK-NEXT: i32x4.splat $push[[L1:[0-9]+]]=, $pop[[L0]]
				; CHECK-NEXT: i32.const $push[[L2:[0-9]+]]=, 1
				; CHECK-NEXT: i32x4.replace_lane $push[[L3:[0-9]+]]=, $pop[[L1]], 3, $pop[[L2]]
				; CHECK-NEXT: return $pop[[L3]]
				define <4 x i32> @splat_common_const_i32x4() {
				ret <4 x i32> <i32 undef, i32 3, i32 3, i32 1>
				}

				; CHECK-LABEL: splat_common_arg_i16x8:
				; CHECK-NEXT: .functype splat_common_arg_i16x8 (i32, i32, i32) -> (v128)
				; CHECK-NEXT: i16x8.splat $push[[L0:[0-9]+]]=, $2
				; CHECK-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 0, $1
				; CHECK-NEXT: i16x8.replace_lane $push[[L2:[0-9]+]]=, $pop[[L1]], 2, $0
				; CHECK-NEXT: i16x8.replace_lane $push[[L3:[0-9]+]]=, $pop[[L2]], 4, $1
				; CHECK-NEXT: i16x8.replace_lane $push[[L4:[0-9]+]]=, $pop[[L3]], 7, $1
				; CHECK-NEXT: return $pop[[L4]]
				define <8 x i16> @splat_common_arg_i16x8(i16 %a, i16 %b, i16 %c) {
				%v0 = insertelement <8 x i16> undef, i16 %b, i32 0
				%v1 = insertelement <8 x i16> %v0, i16 %c, i32 1
				%v2 = insertelement <8 x i16> %v1, i16 %a, i32 2
				%v3 = insertelement <8 x i16> %v2, i16 %c, i32 3
				%v4 = insertelement <8 x i16> %v3, i16 %b, i32 4
				%v5 = insertelement <8 x i16> %v4, i16 %c, i32 5
				%v6 = insertelement <8 x i16> %v5, i16 %c, i32 6
				%v7 = insertelement <8 x i16> %v6, i16 %b, i32 7
				ret <8 x i16> %v7
				}

				; CHECK-LABEL: undef_const_insert_f32x4:
				; CHECK-NEXT: .functype undef_const_insert_f32x4 () -> (v128)
				; CHECK-NEXT: f32.const $push[[L0:[0-9]+]]=, 0x1.5p5
				; CHECK-NEXT: f32x4.splat $push[[L1:[0-9]+]]=, $pop[[L0]]
				; CHECK-NEXT: return $pop[[L1]]
				define <4 x float> @undef_const_insert_f32x4() {
				%v = insertelement <4 x float> undef, float 42., i32 1
				ret <4 x float> %v
				}

				; CHECK-LABEL: undef_arg_insert_i32x4:
				; CHECK-NEXT: .functype undef_arg_insert_i32x4 (i32) -> (v128)
				; CHECK-NEXT: i32x4.splat $push[[L0:[0-9]+]]=, $0
				; CHECK-NEXT: return $pop[[L0]]
				define <4 x i32> @undef_arg_insert_i32x4(i32 %x) {
				%v = insertelement <4 x i32> undef, i32 %x, i32 3
				ret <4 x i32> %v
				}

				; CHECK-LABEL: all_undef_i8x16:
				; CHECK-NEXT: .functype all_undef_i8x16 () -> (v128)
				; CHECK-NEXT: return $0
				define <16 x i8> @all_undef_i8x16() {
				%v = insertelement <16 x i8> undef, i8 undef, i32 4
				ret <16 x i8> %v
				}

				; CHECK-LABEL: all_undef_f64x2:
				; CHECK-NEXT: .functype all_undef_f64x2 () -> (v128)
				; CHECK-NEXT: return $0
				define <2 x double> @all_undef_f64x2() {
				ret <2 x double> undef
				}