This is an archive of the discontinued LLVM Phabricator instance.

Differential D107502

[WebAssembly] Legalize vector types by widening
ClosedPublic

Authored by tlively on Aug 4 2021, 2:50 PM.

Details

Reviewers

aheejin
dschuff

Commits

rGb69374ca58d3: [WebAssembly] Legalize vector types by widening

Summary

The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.
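For illustration, the byte-level sketch below contrasts the two layouts for a `<4 x i16>` value in a 128-bit register. It is a hypothetical model, not LLVM code. `promoteV4I16` and `widenV4I16` are made-up names, bytes are little-endian as in WebAssembly, and bits that LLVM leaves undefined (the upper half of each promoted lane, the unused widened lanes) are zeroed here so the result is deterministic.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// The 16 bytes of a v128 register; byte 0 is the least significant (little-endian).
using V128 = std::array<uint8_t, 16>;

// Integer promotion: <4 x i16> is held as <4 x i32>, with each i16 in the
// low 16 bits of a 32-bit lane. The upper 16 bits are undefined in LLVM;
// this model zeroes them.
V128 promoteV4I16(const std::array<uint16_t, 4> &v) {
  V128 r{};
  for (int i = 0; i < 4; ++i) {
    r[4 * i] = static_cast<uint8_t>(v[i] & 0xff);
    r[4 * i + 1] = static_cast<uint8_t>(v[i] >> 8);
  }
  return r;
}

// Vector widening: <4 x i16> is held as <8 x i16>. The data fills lanes 0-3
// (the low 64 bits). Lanes 4-7 are unused and are zeroed in this model.
V128 widenV4I16(const std::array<uint16_t, 4> &v) {
  V128 r{};
  for (int i = 0; i < 4; ++i) {
    r[2 * i] = static_cast<uint8_t>(v[i] & 0xff);
    r[2 * i + 1] = static_cast<uint8_t>(v[i] >> 8);
  }
  return r;
}
```

For `{0x1111, 0x2222, 0x3333, 0x4444}`, lane 1 lands in bytes 4 and 5 under promotion but in bytes 2 and 3 under widening. The same IR value therefore occupies different bytes of the `v128`, so code compiled with the two strategies disagrees on how such vectors are passed. This is the ABI concern raised later in the review.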

Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tlively created this revision. Aug 4 2021, 2:50 PM

Herald added subscribers: wingo, ecnelises, sunfish and 3 others. Aug 4 2021, 2:50 PM

tlively requested review of this revision. Aug 4 2021, 2:50 PM

Herald added a project: Restricted Project. Aug 4 2021, 2:50 PM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B118007: Diff 364254. Aug 4 2021, 3:25 PM

Looks very nice! It looks like it will have a significant impact on performance. Are there code patterns that might suffer from this change? If not, I wonder why the default option in TargetLowering was set up that way. Maybe other architectures benefit more from integer promotion? Anyway, I feel I don't fully understand things yet, so I added some questions.

And why were the two tests deleted?

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
122–124	Why is this exception needed?
llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
199–205
- If what we want is to insert the element into index 0 of the new vector, why is the splat necessary?
- Why is only `load64_splat` necessary? What happens when we widen i8x4 into i32x4? In this case, do we need `load32_splat`? Does widening only happen by a factor of 2?
llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297	It looks like this actually changes the returned result compared to the previously generated code. Is that fine?
llvm/test/CodeGen/WebAssembly/simd-offset.ll
926	Now that we are doing this, are the narrowing store patterns added in D84377 necessary?

tlively added a subscriber: srj. Aug 17 2021, 8:19 PM

Remove obsolete narrowing store support
Widen vector types more selectively
Move getPreferredVectorAction impl from .h to .cpp

Switch from using load64_splat to load64_zero to load half-wide vectors

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
122–124	The target-independent code that figures out how to legalize types[1] tries the strategies in this order:

1. PromoteInteger (i.e. make the individual lane types wider)
2. WidenVector (i.e. keep the lane types but add more lanes)
3. Split or Scalarize

The preferred vector action says which strategy to start with, but if that fails, subsequent strategies will still be tried. The important part here is that strategies prior to the preferred strategy will not be tried. So if we set the preferred action to be WidenVector for all vector types, whenever that fails the vector will be split or scalarized instead of integer-promoted. Since we don't have any legal i1 vector types, this means that all i1 vectors would be scalarized without this exception, leading to terrible codegen.

Thinking on this more, it seems that we should actually be more selective and opt into WidenVector as the preferred action only for i8, i16, i32, i64, f32, and f64 vectors, since those are the only vectors for which WidenVector can possibly succeed.

[1] https://github.com/llvm/llvm-project/blob/a7ebc4d145892fd22442832549cb12c4b6920dea/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1403-L1499
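The fall-through order described above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the actual `TargetLoweringBase` code. The enums `Strategy` and `Elem` and the functions `legalize`, `prefersWiden` and `preferredFor` are hypothetical, split and scalarize are merged into one fallback, and types that do not opt into widening are assumed to start at PromoteInteger (typically what the default hook returns for power-of-two, multi-element vectors).

```cpp
#include <cassert>

// Hypothetical, simplified model of legalizing one illegal vector type.
enum class Strategy { PromoteInteger, WidenVector, SplitOrScalarize };
enum class Elem { I1, I8, I16, I32, I64, F32, F64 };

// Start at the preferred strategy and fall through to later ones on failure.
// Strategies *before* the preferred one are never attempted.
Strategy legalize(Strategy preferred, bool canPromote, bool canWiden) {
  switch (preferred) {
  case Strategy::PromoteInteger:
    if (canPromote)
      return Strategy::PromoteInteger;
    [[fallthrough]];
  case Strategy::WidenVector:
    if (canWiden)
      return Strategy::WidenVector;
    [[fallthrough]];
  case Strategy::SplitOrScalarize:
    break;
  }
  return Strategy::SplitOrScalarize;
}

// Mirrors the selective opt-in: in this model every lane type except i1
// (i.e. i8, i16, i32, i64, f32, f64) has a legal 128-bit vector type.
bool prefersWiden(Elem e) { return e != Elem::I1; }

Strategy preferredFor(Elem e) {
  return prefersWiden(e) ? Strategy::WidenVector : Strategy::PromoteInteger;
}
```

In the model, preferring WidenVector for an i1 vector that cannot be widened falls straight through to `SplitOrScalarize`, while leaving it at PromoteInteger lets promotion succeed. That is the reason for the exception.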
llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
199–205	> If what we want is to insert the element into index 0 of the new vector, why is the splat necessary?

Good point, I guess we could use v128.load64_zero here instead.

> Why is only `load64_splat` necessary? What happens when we widen i8x4 into i32x4? In this case, do we need `load32_splat`? Does widening only happen by a factor of 2?

We don't have any tests that cover those cases right now, but yes, I think we would need load32_splat or load32_zero in that case. Note that we could go even smaller and have e.g. v2i8. Unfortunately load16_zero does not exist, so for v2i8 load16_splat would probably be the best option. We could also use load16_lane, but that would require an extra zero input vector. Since we don't have test cases that depend on these additional patterns right now, I would like to address them in a follow-up.
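A byte-level sketch of why either instruction works for a widened half-wide vector: both place the 8 loaded bytes in the low 64 bits, and only the upper half differs, which the widened value never reads. `load64Zero`, `load64Splat` and `sameLowHalf` are hypothetical helpers that model the semantics of `v128.load64_zero` and `v128.load64_splat` on a little-endian byte array. They are not engine or LLVM code.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

using V128 = std::array<uint8_t, 16>;

// Models v128.load64_zero: 8 bytes into the low half, upper half zeroed.
V128 load64Zero(const uint8_t *mem) {
  V128 v{};  // value-initialization zeroes all 16 bytes
  std::memcpy(v.data(), mem, 8);
  return v;
}

// Models v128.load64_splat: the same 8 bytes fill both halves.
V128 load64Splat(const uint8_t *mem) {
  V128 v;
  std::memcpy(v.data(), mem, 8);
  std::memcpy(v.data() + 8, mem, 8);
  return v;
}

// A widened v8i8 / v4i16 / v2i32 value only uses the low 64 bits.
bool sameLowHalf(const V128 &a, const V128 &b) {
  return std::memcmp(a.data(), b.data(), 8) == 0;
}
```

The zero variant has the small advantage of leaving the unused lanes in a known state, which is one plausible reason to prefer it when only the low lanes matter.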
llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297	This is expected because the way we are representing these vector types has changed. It is an ABI break for vector code, but I hope that will be ok in practice. Perhaps it's worth mentioning in the LLVM release notes? @dschuff, do you have thoughts here? @srj, will that cause problems for Halide?
llvm/test/CodeGen/WebAssembly/simd-offset.ll
926	Nice catch! It looks like that can all be removed now.

In D107502#2927980, @aheejin wrote:

Looks very nice! It looks like it will have a significant impact on performance. Are there code patterns that might suffer from this change? If not, I wonder why the default option in TargetLowering was set up that way. Maybe other architectures benefit more from integer promotion? Anyway, I feel I don't fully understand things yet, so I added some questions.

Thanks! I haven't seen any code patterns that look worse after this change, but it's possible it exposes new missed opportunities where we don't have patterns in place yet, like with the scalar_to_vector stuff. I'm not sure why the defaults are set up this way, either.

And why were the two tests deleted?

I deleted the two regression tests because after this change the code paths that contained their relevant bugs are either no longer used at all (e.g. simd-nonconst-sext.ll) or have become well-tested by other tests (e.g. simd-scalar-to-vector.ll). In both cases, the regression tests no longer seemed useful to keep around.

tlively added a child revision: D108266: [WebAssembly] Pattern match SIMD convert_low and promote_low during ISel. Aug 17 2021, 9:13 PM

Harbormaster completed remote builds in B120043: Diff 367104. Aug 17 2021, 9:28 PM

dschuff added inline comments. Aug 18 2021, 2:40 PM

llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297	There are sort of 2 issues here. One is that our stable ABI is actually a C ABI, and not an LLVM ABI. I forget whether our C ABI (or other C ABIs) actually even defines the convention for vector types, but maybe it should (even if we declare it unstable). It's probably fair to say that we've not promised/stabilized a C vector ABI yet, so if this is a break of some part of the C ABI, I don't know that I'm too worried about it. The other question is whether we want to define and/or stabilize an LLVM ABI, which is of course broader than just C due to the richer (and simultaneously less rich) type system. My sense would be probably not, at least for everything (I don't know of any other platforms that do this, but I haven't thought about it recently). It probably would still be good to inform/consult stakeholders such as Halide (and Rust?) though.

tlively added inline comments. Aug 18 2021, 2:51 PM

llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297	This ABI change does affect the C ABI via the C vector extensions, but our documented C ABI does not say anything about those vector extensions at all, so it sounds like this is not a problem.

LGTM, thanks!

This revision is now accepted and ready to land. Aug 18 2021, 5:26 PM

dschuff added inline comments. Aug 18 2021, 5:51 PM

llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297	Probably it should say something, even if that something is "vector extensions are not considered part of the stable ABI".

Closed by commit rGb69374ca58d3: [WebAssembly] Legalize vector types by widening (authored by tlively). Aug 19 2021, 12:07 PM

This revision was automatically updated to reflect the committed changes.

tlively added a commit: rGb69374ca58d3: [WebAssembly] Legalize vector types by widening.

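Revision Contents

Path                                                        Size
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h       3 lines
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp     18 lines
llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td         91 lines
llvm/test/CodeGen/WebAssembly/simd-concat.ll                12 lines
llvm/test/CodeGen/WebAssembly/simd-extending.ll             34 lines
llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll  20 lines
llvm/test/CodeGen/WebAssembly/simd-nonconst-sext.ll         (deleted)
llvm/test/CodeGen/WebAssembly/simd-offset.ll                176 lines
llvm/test/CodeGen/WebAssembly/simd-scalar-to-vector.ll      (deleted)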

Diff 367583

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h

@@ 108 lines not shown @@ bool getTgtMemIntrinsic(IntrinsicInfo &Info, const CallInst &I,
 MachineFunction &MF,
 unsigned Intrinsic) const override;

 void computeKnownBitsForTargetNode(const SDValue Op, KnownBits &Known,
 const APInt &DemandedElts,
 const SelectionDAG &DAG,
 unsigned Depth) const override;

+TargetLoweringBase::LegalizeTypeAction
+getPreferredVectorAction(MVT VT) const override;

 SDValue LowerCall(CallLoweringInfo &CLI,
 SmallVectorImpl<SDValue> &InVals) const override;
 bool CanLowerReturn(CallingConv::ID CallConv, MachineFunction &MF,
 bool isVarArg,
 const SmallVectorImpl<ISD::OutputArg> &Outs,
 LLVMContext &Context) const override;
 SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
 const SmallVectorImpl<ISD::OutputArg> &Outs,
 const SmallVectorImpl<SDValue> &OutVals, const SDLoc &dl,
 SelectionDAG &DAG) const override;
 SDValue LowerFormalArguments(SDValue Chain, CallingConv::ID CallConv,
 bool IsVarArg,
 const SmallVectorImpl<ISD::InputArg> &Ins,
@@ 47 lines not shown @@

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

@@ 297 lines not shown @@ for (auto T : {MVT::v16i8, MVT::v8i16, MVT::v4i32, MVT::v2i64, MVT::v4f32,
 }
 }
 // But some vector extending loads are legal
 for (auto Ext : {ISD::EXTLOAD, ISD::SEXTLOAD, ISD::ZEXTLOAD}) {
 setLoadExtAction(Ext, MVT::v8i16, MVT::v8i8, Legal);
 setLoadExtAction(Ext, MVT::v4i32, MVT::v4i16, Legal);
 setLoadExtAction(Ext, MVT::v2i64, MVT::v2i32, Legal);
 }
-// And some truncating stores are legal as well
-setTruncStoreAction(MVT::v8i16, MVT::v8i8, Legal);
-setTruncStoreAction(MVT::v4i32, MVT::v4i16, Legal);
 }

 // Don't do anything clever with build_pairs
 setOperationAction(ISD::BUILD_PAIR, MVT::i64, Expand);

 // Trap lowers to wasm unreachable
 setOperationAction(ISD::TRAP, MVT::Other, Legal);
 setOperationAction(ISD::DEBUGTRAP, MVT::Other, Legal);
@@ 532 lines not shown @@ case Intrinsic::wasm_bitmask: {
 Known.Zero |= ZeroMask;
 break;
 }
 }
 }
 }
 }

+TargetLoweringBase::LegalizeTypeAction
+WebAssemblyTargetLowering::getPreferredVectorAction(MVT VT) const {
+  if (VT.isFixedLengthVector()) {
+    MVT EltVT = VT.getVectorElementType();
+    // We have legal vector types with these lane types, so widening the
+    // vector would let us use some of the lanes directly without having to
+    // extend or truncate values.
+    if (EltVT == MVT::i8 || EltVT == MVT::i16 || EltVT == MVT::i32 ||
+        EltVT == MVT::i64 || EltVT == MVT::f32 || EltVT == MVT::f64)
+      return TypeWidenVector;
+  }
+
+  return TargetLoweringBase::getPreferredVectorAction(VT);
+}

 //===----------------------------------------------------------------------===//
 // WebAssembly Lowering private implementation.
 //===----------------------------------------------------------------------===//

 //===----------------------------------------------------------------------===//
 // Lowering Code
 //===----------------------------------------------------------------------===//

@@ 1,612 lines not shown @@

llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td

@@ 190 lines not shown @@
 defvar inst = "LOAD"#vec.lane_bits#"_SPLAT";
 defm : LoadPatNoOffset<vec.vt, load_splat, inst>;
 defm : LoadPatImmOff<vec.vt, load_splat, regPlusImm, inst>;
 defm : LoadPatImmOff<vec.vt, load_splat, or_is_add, inst>;
 defm : LoadPatOffsetOnly<vec.vt, load_splat, inst>;
 defm : LoadPatGlobalAddrOffOnly<vec.vt, load_splat, inst>;
 }

 // Load and extend
 multiclass SIMDLoadExtend<Vec vec, string loadPat, bits<32> simdop> {
 defvar signed = vec.prefix#".load"#loadPat#"_s";
 defvar unsigned = vec.prefix#".load"#loadPat#"_u";
 let mayLoad = 1, UseNamedOperandTable = 1 in {
 defm LOAD_EXTEND_S_#vec#_A32 :
 SIMD_I<(outs V128:$dst),
 (ins P2Align:$p2align, offset32_op:$off, I32:$addr),
 (outs), (ins P2Align:$p2align, offset32_op:$off), [],
 signed#"\t$dst, ${off}(${addr})$p2align",
 signed#"\t$off$p2align", simdop>;
 defm LOAD_EXTEND_U_#vec#_A32 :
 SIMD_I<(outs V128:$dst),
 (ins P2Align:$p2align, offset32_op:$off, I32:$addr),
 (outs), (ins P2Align:$p2align, offset32_op:$off), [],
@@ 48 lines not shown @@ SIMD_I<(outs V128:$dst),
 name#"\t$dst, ${off}(${addr})$p2align",
 name#"\t$off$p2align", simdop>;
 } // mayLoad = 1, UseNamedOperandTable = 1
 }

 defm "" : SIMDLoadZero<I32x4, 0x5c>;
 defm "" : SIMDLoadZero<I64x2, 0x5d>;

+// Use load_zero to load scalars into vectors as well where possible.
+// TODO: i32, i16, and i8 scalars
+def load_scalar :
+  PatFrag<(ops node:$addr), (scalar_to_vector (i64 (load $addr)))>;
+defm : LoadPatNoOffset<v2i64, load_scalar, "LOAD_ZERO_I64x2">;
+defm : LoadPatImmOff<v2i64, load_scalar, regPlusImm, "LOAD_ZERO_I64x2">;
+defm : LoadPatImmOff<v2i64, load_scalar, or_is_add, "LOAD_ZERO_I64x2">;
+defm : LoadPatOffsetOnly<v2i64, load_scalar, "LOAD_ZERO_I64x2">;
+defm : LoadPatGlobalAddrOffOnly<v2i64, load_scalar, "LOAD_ZERO_I64x2">;

 // TODO: f32x4 and f64x2 as well
 foreach vec = [I32x4, I64x2] in {
 defvar inst = "LOAD_ZERO_"#vec;
 defvar pat = PatFrag<(ops node:$ptr),
 (vector_insert (vec.splat (vec.lane_vt 0)), (vec.lane_vt (load $ptr)), 0)>;
 defm : LoadPatNoOffset<vec.vt, pat, inst>;
 defm : LoadPatImmOff<vec.vt, pat, regPlusImm, inst>;
 defm : LoadPatImmOff<vec.vt, pat, or_is_add, inst>;
@@ 958 lines not shown @@ SIMD_I<(outs V128:$dst), (ins V128:$low, V128:$high), (outs), (ins),
 [(set (vec.split.vt V128:$dst), (vec.split.vt (int_wasm_narrow_unsigned
 (vec.vt V128:$low), (vec.vt V128:$high))))],
 name#"_u\t$dst, $low, $high", name#"_u", !add(baseInst, 1)>;
 }

 defm "" : SIMDNarrow<I16x8, 101>;
 defm "" : SIMDNarrow<I32x4, 133>;

-// Use narrowing operations for truncating stores. Since the narrowing
-// operations are saturating instead of truncating, we need to mask
-// the stored values first.
-def store_v8i8_trunc_v8i16 :
-OutPatFrag<(ops node:$val),
-(EXTRACT_LANE_I64x2
-(NARROW_U_I8x16
-(AND
-(CONST_V128_I16x8
-0x00ff, 0x00ff, 0x00ff, 0x00ff,
-0x00ff, 0x00ff, 0x00ff, 0x00ff),
-node:$val),
-$val), // Unused input
-0)>;
-
-def store_v4i16_trunc_v4i32 :
-OutPatFrag<(ops node:$val),
-(EXTRACT_LANE_I64x2
-(NARROW_U_I16x8
-(AND
-(CONST_V128_I32x4
-0x0000ffff, 0x0000ffff, 0x0000ffff, 0x0000ffff),
-node:$val),
-$val), // Unused input
-0)>;
-
-// Store patterns adapted from WebAssemblyInstrMemory.td
-multiclass NarrowingStorePatNoOffset<Vec vec, OutPatFrag out> {
-defvar node = !cast<PatFrag>("truncstorevi"#vec.split.lane_bits);
-def : Pat<(node vec.vt:$val, I32:$addr),
-(STORE_I64_A32 0, 0, $addr, (out $val))>,
-Requires<[HasAddr32]>;
-def : Pat<(node vec.vt:$val, I64:$addr),
-(STORE_I64_A64 0, 0, $addr, (out $val))>,
-Requires<[HasAddr64]>;
-}
-
-defm : NarrowingStorePatNoOffset<I16x8, store_v8i8_trunc_v8i16>;
-defm : NarrowingStorePatNoOffset<I32x4, store_v4i16_trunc_v4i32>;
-
-multiclass NarrowingStorePatImmOff<Vec vec, PatFrag operand, OutPatFrag out> {
-defvar node = !cast<PatFrag>("truncstorevi"#vec.split.lane_bits);
-def : Pat<(node vec.vt:$val, (operand I32:$addr, imm:$off)),
-(STORE_I64_A32 0, imm:$off, $addr, (out $val))>,
-Requires<[HasAddr32]>;
-def : Pat<(node vec.vt:$val, (operand I64:$addr, imm:$off)),
-(STORE_I64_A64 0, imm:$off, $addr, (out $val))>,
-Requires<[HasAddr64]>;
-}
-
-defm : NarrowingStorePatImmOff<I16x8, regPlusImm, store_v8i8_trunc_v8i16>;
-defm : NarrowingStorePatImmOff<I32x4, regPlusImm, store_v4i16_trunc_v4i32>;
-defm : NarrowingStorePatImmOff<I16x8, or_is_add, store_v8i8_trunc_v8i16>;
-defm : NarrowingStorePatImmOff<I32x4, or_is_add, store_v4i16_trunc_v4i32>;
-
-multiclass NarrowingStorePatOffsetOnly<Vec vec, OutPatFrag out> {
-defvar node = !cast<PatFrag>("truncstorevi"#vec.split.lane_bits);
-def : Pat<(node vec.vt:$val, imm:$off),
-(STORE_I64_A32 0, imm:$off, (CONST_I32 0), (out $val))>,
-Requires<[HasAddr32]>;
-def : Pat<(node vec.vt:$val, imm:$off),
-(STORE_I64_A64 0, imm:$off, (CONST_I64 0), (out $val))>,
-Requires<[HasAddr64]>;
-}
-
-defm : NarrowingStorePatOffsetOnly<I16x8, store_v8i8_trunc_v8i16>;
-defm : NarrowingStorePatOffsetOnly<I32x4, store_v4i16_trunc_v4i32>;
-
-multiclass NarrowingStorePatGlobalAddrOffOnly<Vec vec, OutPatFrag out> {
-defvar node = !cast<PatFrag>("truncstorevi"#vec.split.lane_bits);
-def : Pat<(node vec.vt:$val, (WebAssemblywrapper tglobaladdr:$off)),
-(STORE_I64_A32 0, tglobaladdr:$off, (CONST_I32 0), (out $val))>,
-Requires<[IsNotPIC, HasAddr32]>;
-def : Pat<(node vec.vt:$val, (WebAssemblywrapper tglobaladdr:$off)),
-(STORE_I64_A64 0, tglobaladdr:$off, (CONST_I64 0), (out $val))>,
-Requires<[IsNotPIC, HasAddr64]>;
-}
-
-defm : NarrowingStorePatGlobalAddrOffOnly<I16x8, store_v8i8_trunc_v8i16>;
-defm : NarrowingStorePatGlobalAddrOffOnly<I32x4, store_v4i16_trunc_v4i32>;
-
 // Bitcasts are nops
 // Matching bitcast t1 to t1 causes strange errors, so avoid repeating types
 foreach t1 = AllVecs in
 foreach t2 = AllVecs in
 if !ne(t1, t2) then
 def : Pat<(t1.vt (bitconvert (t2.vt V128:$v))), (t1.vt V128:$v)>;

 // Extended pairwise addition
@@ 25 lines not shown @@
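The removed fragments mask each lane before narrowing because, as their comment notes, the narrowing instructions saturate rather than truncate. Below is a one-lane sketch of that reasoning for `store_v8i8_trunc_v8i16`. `narrowU` models a single lane of `i8x16.narrow_i16x8_u` (signed i16 input, unsigned saturation to [0, 255]), and `truncViaNarrow` is a hypothetical name for the mask-then-narrow sequence.

```cpp
#include <cassert>
#include <cstdint>

// One lane of i8x16.narrow_i16x8_u: the i16 input is treated as signed
// and saturated to the unsigned range [0, 255].
uint8_t narrowU(int16_t x) {
  if (x < 0) return 0;
  if (x > 255) return 255;
  return static_cast<uint8_t>(x);
}

// Masking with 0x00ff first leaves a value in [0, 255], so saturation never
// triggers and the result equals plain truncation to the low 8 bits.
uint8_t truncViaNarrow(int16_t x) {
  return narrowU(static_cast<int16_t>(x & 0x00ff));
}
```

Under widening, a `<8 x i8>` value already sits in the low 64 bits of a `v16i8`, so this mask-and-narrow step is unnecessary. That is why the review dropped these patterns.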

llvm/test/CodeGen/WebAssembly/simd-concat.ll

 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
 ; RUN: llc < %s -verify-machineinstrs -mattr=+simd128 | FileCheck %s

 ; Check that all varieties of vector concatenations get lowered to shuffles.

 target triple = "wasm32-unknown--wasm"

 define <16 x i8> @concat_v8i8(<8 x i8> %a, <8 x i8> %b) {
 ; CHECK-LABEL: concat_v8i8:
 ; CHECK: .functype concat_v8i8 (v128, v128) -> (v128)
 ; CHECK-NEXT: # %bb.0:
 ; CHECK-NEXT: local.get 0
 ; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i8x16.shuffle 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30
+; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23
 ; CHECK-NEXT: # fallthrough-return
 %v = shufflevector <8 x i8> %a, <8 x i8> %b, <16 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
 ret <16 x i8> %v
 }

 define <8 x i8> @concat_v4i8(<4 x i8> %a, <4 x i8> %b) {
 ; CHECK-LABEL: concat_v4i8:
 ; CHECK: .functype concat_v4i8 (v128, v128) -> (v128)
 ; CHECK-NEXT: # %bb.0:
 ; CHECK-NEXT: local.get 0
 ; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i8x16.shuffle 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 20, 21, 24, 25, 28, 29
+; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 16, 17, 18, 19, 0, 0, 0, 0, 0, 0, 0, 0
 ; CHECK-NEXT: # fallthrough-return
 %v = shufflevector <4 x i8> %a, <4 x i8> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ret <8 x i8> %v
 }

 define <8 x i16> @concat_v4i16(<4 x i16> %a, <4 x i16> %b) {
 ; CHECK-LABEL: concat_v4i16:
 ; CHECK: .functype concat_v4i16 (v128, v128) -> (v128)
 ; CHECK-NEXT: # %bb.0:
 ; CHECK-NEXT: local.get 0
 ; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i8x16.shuffle 0, 1, 4, 5, 8, 9, 12, 13, 16, 17, 20, 21, 24, 25, 28, 29
+; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23
 ; CHECK-NEXT: # fallthrough-return
 %v = shufflevector <4 x i16> %a, <4 x i16> %b, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
 ret <8 x i16> %v
 }

 define <4 x i8> @concat_v2i8(<2 x i8> %a, <2 x i8> %b) {
 ; CHECK-LABEL: concat_v2i8:
 ; CHECK: .functype concat_v2i8 (v128, v128) -> (v128)
 ; CHECK-NEXT: # %bb.0:
 ; CHECK-NEXT: local.get 0
 ; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27
+; CHECK-NEXT: i8x16.shuffle 0, 1, 16, 17, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
 ; CHECK-NEXT: # fallthrough-return
 %v = shufflevector <2 x i8> %a, <2 x i8> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ret <4 x i8> %v
 }

 define <4 x i16> @concat_v2i16(<2 x i16> %a, <2 x i16> %b) {
 ; CHECK-LABEL: concat_v2i16:
 ; CHECK: .functype concat_v2i16 (v128, v128) -> (v128)
 ; CHECK-NEXT: # %bb.0:
 ; CHECK-NEXT: local.get 0
 ; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27
+; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 16, 17, 18, 19, 0, 0, 0, 0, 0, 0, 0, 0
 ; CHECK-NEXT: # fallthrough-return
 %v = shufflevector <2 x i16> %a, <2 x i16> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ret <4 x i16> %v
 }

 define <4 x i32> @concat_v2i32(<2 x i32> %a, <2 x i32> %b) {
 ; CHECK-LABEL: concat_v2i32:
 ; CHECK: .functype concat_v2i32 (v128, v128) -> (v128)
 ; CHECK-NEXT: # %bb.0:
 ; CHECK-NEXT: local.get 0
 ; CHECK-NEXT: local.get 1
-; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27
+; CHECK-NEXT: i8x16.shuffle 0, 1, 2, 3, 4, 5, 6, 7, 16, 17, 18, 19, 20, 21, 22, 23
 ; CHECK-NEXT: # fallthrough-return
 %v = shufflevector <2 x i32> %a, <2 x i32> %b, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
 ret <4 x i32> %v
 }
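The old and new shuffle masks in simd-concat.ll follow directly from the two layouts. The sketch below reconstructs both. It assumes little-endian lanes, that promoted inputs fill the whole 128-bit register, and that unused output bytes are don't-care lanes, which appear as 0 in the CHECK lines. `promotedConcatMask` and `widenedConcatMask` are hypothetical names. `n` is the lane count of each input and `laneBytes` is the original lane width in bytes.

```cpp
#include <array>
#include <cassert>

using Mask = std::array<int, 16>;

// Old lowering (integer promotion): each of the n input lanes fills 16/n
// bytes and each of the 2n output lanes fills 16/(2n) bytes. Output lane j
// takes the low bytes of input lane j of a (bytes 0-15) or lane j-n of b
// (bytes 16-31). Assumes n is a power of two between 2 and 8.
Mask promotedConcatMask(int n) {
  const int inW = 16 / n, outW = 16 / (2 * n);
  Mask mask{};
  for (int j = 0; j < 2 * n; ++j) {
    int src = j < n ? j * inW : 16 + (j - n) * inW;
    for (int b = 0; b < outW; ++b)
      mask[j * outW + b] = src + b;
  }
  return mask;
}

// New lowering (widening): each input keeps its live bytes at the bottom,
// so the result is a's live bytes followed by b's. Remaining bytes are
// don't-care and stay 0. Assumes n * laneBytes <= 8.
Mask widenedConcatMask(int n, int laneBytes) {
  const int used = n * laneBytes;
  Mask mask{};
  for (int i = 0; i < used; ++i) {
    mask[i] = i;             // byte i of a
    mask[used + i] = 16 + i; // byte i of b
  }
  return mask;
}
```

For example, `promotedConcatMask(4)` reproduces the old `concat_v4i8` mask and `widenedConcatMask(4, 1)` the new one.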

llvm/test/CodeGen/WebAssembly/simd-extending.ll

@@ 163 lines not shown @@
 ;; Also test that similar patterns with offsets not corresponding to
 ;; the low or high half are correctly expanded.

 define <8 x i16> @extend_lowish_i8x16_s(<16 x i8> %v) {
 ; CHECK-LABEL: extend_lowish_i8x16_s:
 ; CHECK: .functype extend_lowish_i8x16_s (v128) -> (v128)
 ; CHECK-NEXT: # %bb.0:
 ; CHECK-NEXT: local.get 0
-; CHECK-NEXT: i8x16.extract_lane_u 1
-; CHECK-NEXT: i16x8.splat
 ; CHECK-NEXT: local.get 0
-; CHECK-NEXT: i8x16.extract_lane_u 2
-; CHECK-NEXT: i16x8.replace_lane 1
-; CHECK-NEXT: local.get 0
-; CHECK-NEXT: i8x16.extract_lane_u 3
-; CHECK-NEXT: i16x8.replace_lane 2
-; CHECK-NEXT: local.get 0
-; CHECK-NEXT: i8x16.extract_lane_u 4
-; CHECK-NEXT: i16x8.replace_lane 3
-; CHECK-NEXT: local.get 0
-; CHECK-NEXT: i8x16.extract_lane_u 5
-; CHECK-NEXT: i16x8.replace_lane 4
-; CHECK-NEXT: local.get 0
-; CHECK-NEXT: i8x16.extract_lane_u 6
-; CHECK-NEXT: i16x8.replace_lane 5
-; CHECK-NEXT: local.get 0
-; CHECK-NEXT: i8x16.extract_lane_u 7
-; CHECK-NEXT: i16x8.replace_lane 6
-; CHECK-NEXT: local.get 0
-; CHECK-NEXT: i8x16.extract_lane_u 8
-; CHECK-NEXT: i16x8.replace_lane 7
+; CHECK-NEXT: i8x16.shuffle 1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0
 ; CHECK-NEXT: i32.const 8
 ; CHECK-NEXT: i16x8.shl
 ; CHECK-NEXT: i32.const 8
 ; CHECK-NEXT: i16x8.shr_s
 ; CHECK-NEXT: # fallthrough-return
 %lowish = shufflevector <16 x i8> %v, <16 x i8> undef,
 <8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
 %extended = sext <8 x i8> %lowish to <8 x i16>
 ret <8 x i16> %extended
 }

	define <4 x i32> @extend_lowish_i16x8_s(<8 x i16> %v) {			define <4 x i32> @extend_lowish_i16x8_s(<8 x i16> %v) {
	; CHECK-LABEL: extend_lowish_i16x8_s:			; CHECK-LABEL: extend_lowish_i16x8_s:
	; CHECK: .functype extend_lowish_i16x8_s (v128) -> (v128)			; CHECK: .functype extend_lowish_i16x8_s (v128) -> (v128)
	; CHECK-NEXT: # %bb.0:			; CHECK-NEXT: # %bb.0:
	; CHECK-NEXT: local.get 0			; CHECK-NEXT: local.get 0
	; CHECK-NEXT: i16x8.extract_lane_u 1
	; CHECK-NEXT: i32x4.splat
	; CHECK-NEXT: local.get 0
	; CHECK-NEXT: i16x8.extract_lane_u 2
	; CHECK-NEXT: i32x4.replace_lane 1
	; CHECK-NEXT: local.get 0
	; CHECK-NEXT: i16x8.extract_lane_u 3
	; CHECK-NEXT: i32x4.replace_lane 2
	; CHECK-NEXT: local.get 0			; CHECK-NEXT: local.get 0
	; CHECK-NEXT: i16x8.extract_lane_u 4			; CHECK-NEXT: i8x16.shuffle 2, 3, 0, 0, 4, 5, 0, 0, 6, 7, 0, 0, 8, 9, 0, 0
	; CHECK-NEXT: i32x4.replace_lane 3
	; CHECK-NEXT: i32.const 16			; CHECK-NEXT: i32.const 16
	; CHECK-NEXT: i32x4.shl			; CHECK-NEXT: i32x4.shl
	; CHECK-NEXT: i32.const 16			; CHECK-NEXT: i32.const 16
	; CHECK-NEXT: i32x4.shr_s			; CHECK-NEXT: i32x4.shr_s
	; CHECK-NEXT: # fallthrough-return			; CHECK-NEXT: # fallthrough-return
	%lowish = shufflevector <8 x i16> %v, <8 x i16> undef,			%lowish = shufflevector <8 x i16> %v, <8 x i16> undef,
	<4 x i32> <i32 1, i32 2, i32 3, i32 4>			<4 x i32> <i32 1, i32 2, i32 3, i32 4>
	%extended = sext <4 x i16> %lowish to <4 x i32>			%extended = sext <4 x i16> %lowish to <4 x i32>
	ret <4 x i32> %extended			ret <4 x i32> %extended
	}			}
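The new lowering of `extend_lowish_i8x16_s` replaces eight extract/replace pairs with one shuffle. A minimal Python model (helper names are hypothetical) shows why the shuffle followed by `i16x8.shl` and `i16x8.shr_s` by 8 is a sign extension: the shuffle places source byte k+1 in the low byte of i16 lane k (the high byte, taken from byte 0, is a don't-care), the left shift discards that high byte, and the arithmetic right shift copies the sign bit back down.

```python
# Model of the new extend_lowish_i8x16_s lowering: shuffle, then shl/shr_s by 8.

MASK = [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0, 7, 0, 8, 0]


def shuffle(a: bytes, b: bytes, mask: list) -> bytes:
    """i8x16.shuffle: result byte i is byte mask[i] of a ++ b."""
    both = a + b
    return bytes(both[m] for m in mask)


def to_signed16(x: int) -> int:
    """Reinterpret a 16-bit pattern as a signed i16."""
    return x - 0x10000 if x & 0x8000 else x


def extend_lowish_i8x16_s(v: bytes) -> list:
    shuffled = shuffle(v, v, MASK)
    result = []
    for i in range(8):
        lane = int.from_bytes(shuffled[2 * i:2 * i + 2], "little")
        lane = (lane << 8) & 0xFFFF                  # i16x8.shl 8: drop the don't-care high byte
        lane = (to_signed16(lane) >> 8) & 0xFFFF     # i16x8.shr_s 8: arithmetic shift right
        result.append(to_signed16(lane))
    return result


def reference_sext(v: bytes) -> list:
    """What the IR asks for: sext of bytes 1..8 to i16."""
    return [b - 256 if b >= 128 else b for b in v[1:9]]


v = bytes([0x00, 0x7F, 0x80, 0xFF, 0x01, 0x10, 0xF0, 0x42, 0x81] + [0] * 7)
print(extend_lowish_i8x16_s(v) == reference_sext(v))  # True
```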

llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll

[… 288 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret void		ret void
}		}

define <8 x i8> @load_ext_v8i16_a1(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_a1(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_a1:		; CHECK-LABEL: load_ext_v8i16_a1:
; CHECK: .functype load_ext_v8i16_a1 (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_a1 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.load8x8_u 0:p2align=0		; CHECK-NEXT: v128.load64_zero 0:p2align=0
		aheejin: It looks like this actually changes the result being returned compared to the previously generated code. Is that fine?
		tlively (Author): This is expected because the way we are representing these vector types has changed. It is an ABI break for vector code, but I hope that will be ok in practice. Perhaps it's worth mentioning in the LLVM release notes? @dschuff, do you have thoughts here? @srj, will that cause problems for Halide?
		dschuff: There are sort of 2 issues here. One is that our stable ABI is actually a C ABI, and not an LLVM ABI. I forget whether our C ABI (or other C ABIs) actually even defines the convention for vector types, but maybe it should (even if we declare it unstable). It's probably fair to say that we've not promised/stabilized a C vector ABI yet, so if this is a break of some part of the C ABI, I don't know that I'm too worried about it. The other question is whether we want to define and/or stabilize an LLVM ABI, which is of course broader than just C due to the richer (and simultaneously less rich) type system. My sense would be probably not, at least for everything (I don't know of any other platforms that do this, but I haven't thought about it recently). It probably would still be good to inform/consult stakeholders such as Halide (and Rust?) though.
		tlively (Author): This ABI change does affect the C ABI via the C vector extensions, but our documented C ABI does not say anything about those vector extensions at all, so it sounds like this is not a problem.
		dschuff: Probably it should say something, even if that something is "vector extensions are not considered part of the stable ABI".
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <8 x i8>, <8 x i8>* %p, align 1		%v = load <8 x i8>, <8 x i8>* %p, align 1
ret <8 x i8> %v		ret <8 x i8> %v
}		}
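aheejin's observation that the returned value changes can be illustrated with a small Python model of the old and new loads. The functions below are hypothetical stand-ins that model only the byte-level effect of `i16x8.load8x8_u` and `v128.load64_zero`. Both registers hold the same logical `<8 x i8>`, but the raw v128 bits differ, which is why this is an ABI break for vector code: code built for the old promoted layout would look for element i in byte 2*i.

```python
# Old vs new in-register layout of a <8 x i8> loaded from memory.

def load8x8_u(mem: bytes, addr: int) -> bytes:
    """i16x8.load8x8_u: zero-extend 8 bytes into eight i16 lanes."""
    return b"".join(bytes([x, 0]) for x in mem[addr:addr + 8])


def load64_zero(mem: bytes, addr: int) -> bytes:
    """v128.load64_zero: 8 bytes into the low half, high half zeroed."""
    return mem[addr:addr + 8] + bytes(8)


def lanes_promoted(v: bytes) -> list:
    """Old representation: element i is the low byte of i16 lane i."""
    return list(v[0::2])


def lanes_widened(v: bytes) -> list:
    """New representation: element i is byte lane i (lanes 8..15 unused)."""
    return list(v[:8])


mem = bytes([10, 20, 30, 40, 50, 60, 70, 80]) + bytes(8)
old, new = load8x8_u(mem, 0), load64_zero(mem, 0)
print(lanes_promoted(old) == lanes_widened(new))  # True: same <8 x i8> value
print(old == new)                                 # False: different v128 bits
```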

define <8 x i8> @load_ext_v8i16_a2(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_a2(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_a2:		; CHECK-LABEL: load_ext_v8i16_a2:
; CHECK: .functype load_ext_v8i16_a2 (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_a2 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.load8x8_u 0:p2align=1		; CHECK-NEXT: v128.load64_zero 0:p2align=1
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <8 x i8>, <8 x i8>* %p, align 2		%v = load <8 x i8>, <8 x i8>* %p, align 2
ret <8 x i8> %v		ret <8 x i8> %v
}		}

define <8 x i8> @load_ext_v8i16_a4(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_a4(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_a4:		; CHECK-LABEL: load_ext_v8i16_a4:
; CHECK: .functype load_ext_v8i16_a4 (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_a4 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.load8x8_u 0:p2align=2		; CHECK-NEXT: v128.load64_zero 0:p2align=2
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <8 x i8>, <8 x i8>* %p, align 4		%v = load <8 x i8>, <8 x i8>* %p, align 4
ret <8 x i8> %v		ret <8 x i8> %v
}		}

; 8 is the default alignment for v128 extending load so no attribute is needed.		; 8 is the default alignment for v128 extending load so no attribute is needed.
define <8 x i8> @load_ext_v8i16_a8(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_a8(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_a8:		; CHECK-LABEL: load_ext_v8i16_a8:
; CHECK: .functype load_ext_v8i16_a8 (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_a8 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.load8x8_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <8 x i8>, <8 x i8>* %p, align 8		%v = load <8 x i8>, <8 x i8>* %p, align 8
ret <8 x i8> %v		ret <8 x i8> %v
}		}

; 16 is greater than the default alignment so it is ignored.		; 16 is greater than the default alignment so it is ignored.
define <8 x i8> @load_ext_v8i16_a16(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_a16(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_a16:		; CHECK-LABEL: load_ext_v8i16_a16:
; CHECK: .functype load_ext_v8i16_a16 (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_a16 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.load8x8_u 0		; CHECK-NEXT: v128.load 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <8 x i8>, <8 x i8>* %p, align 16		%v = load <8 x i8>, <8 x i8>* %p, align 16
ret <8 x i8> %v		ret <8 x i8> %v
}		}
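The `:p2align=N` suffixes in the `load_ext_v8i16_aN` CHECK lines follow a simple rule: N is log2 of the alignment in bytes, an alignment above the instruction's natural alignment (8 bytes for the 8-byte `v128.load64_zero`) is clamped, and the natural alignment itself is the default and is not printed. The Python sketch below models that rule; `p2align_suffix` is a hypothetical name and this is an illustration, not LLVM's implementation. In the a16 case the new output uses a full `v128.load`, whose natural alignment is 16, so again no suffix appears.

```python
def p2align_suffix(align: int, natural: int) -> str:
    """Hypothetical model of the ':p2align=N' suffix printed in these tests.

    N is log2 of the alignment in bytes. Alignments above the instruction's
    natural alignment are clamped to it, and the natural alignment itself is
    the default, so no suffix is printed for it.
    """
    assert align > 0 and align & (align - 1) == 0, "alignment must be a power of two"
    effective = min(align, natural)
    if effective == natural:
        return ""
    return f":p2align={effective.bit_length() - 1}"


# v128.load64_zero accesses 8 bytes, so its natural alignment is 8.
for align in (1, 2, 4, 8, 16):
    print(align, repr(p2align_suffix(align, natural=8)))
```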

define <8 x i16> @load_sext_v8i16_a1(<8 x i8>* %p) {		define <8 x i16> @load_sext_v8i16_a1(<8 x i8>* %p) {
; CHECK-LABEL: load_sext_v8i16_a1:		; CHECK-LABEL: load_sext_v8i16_a1:
; CHECK: .functype load_sext_v8i16_a1 (i32) -> (v128)		; CHECK: .functype load_sext_v8i16_a1 (i32) -> (v128)
[… 279 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret void		ret void
}		}

define <4 x i16> @load_ext_v4i32_a1(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_a1(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_a1:		; CHECK-LABEL: load_ext_v4i32_a1:
; CHECK: .functype load_ext_v4i32_a1 (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_a1 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.load16x4_u 0:p2align=0		; CHECK-NEXT: v128.load64_zero 0:p2align=0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <4 x i16>, <4 x i16>* %p, align 1		%v = load <4 x i16>, <4 x i16>* %p, align 1
ret <4 x i16> %v		ret <4 x i16> %v
}		}

define <4 x i16> @load_ext_v4i32_a2(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_a2(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_a2:		; CHECK-LABEL: load_ext_v4i32_a2:
; CHECK: .functype load_ext_v4i32_a2 (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_a2 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.load16x4_u 0:p2align=1		; CHECK-NEXT: v128.load64_zero 0:p2align=1
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <4 x i16>, <4 x i16>* %p, align 2		%v = load <4 x i16>, <4 x i16>* %p, align 2
ret <4 x i16> %v		ret <4 x i16> %v
}		}

define <4 x i16> @load_ext_v4i32_a4(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_a4(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_a4:		; CHECK-LABEL: load_ext_v4i32_a4:
; CHECK: .functype load_ext_v4i32_a4 (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_a4 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.load16x4_u 0:p2align=2		; CHECK-NEXT: v128.load64_zero 0:p2align=2
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <4 x i16>, <4 x i16>* %p, align 4		%v = load <4 x i16>, <4 x i16>* %p, align 4
ret <4 x i16> %v		ret <4 x i16> %v
}		}

; 8 is the default alignment for v128 extending load so no attribute is needed.		; 8 is the default alignment for v128 extending load so no attribute is needed.
define <4 x i16> @load_ext_v4i32_a8(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_a8(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_a8:		; CHECK-LABEL: load_ext_v4i32_a8:
; CHECK: .functype load_ext_v4i32_a8 (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_a8 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.load16x4_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <4 x i16>, <4 x i16>* %p, align 8		%v = load <4 x i16>, <4 x i16>* %p, align 8
ret <4 x i16> %v		ret <4 x i16> %v
}		}

; 16 is greater than the default alignment so it is ignored.		; 16 is greater than the default alignment so it is ignored.
define <4 x i16> @load_ext_v4i32_a16(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_a16(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_a16:		; CHECK-LABEL: load_ext_v4i32_a16:
; CHECK: .functype load_ext_v4i32_a16 (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_a16 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.load16x4_u 0		; CHECK-NEXT: v128.load 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <4 x i16>, <4 x i16>* %p, align 16		%v = load <4 x i16>, <4 x i16>* %p, align 16
ret <4 x i16> %v		ret <4 x i16> %v
}		}

define <4 x i32> @load_sext_v4i32_a1(<4 x i16>* %p) {		define <4 x i32> @load_sext_v4i32_a1(<4 x i16>* %p) {
; CHECK-LABEL: load_sext_v4i32_a1:		; CHECK-LABEL: load_sext_v4i32_a1:
; CHECK: .functype load_sext_v4i32_a1 (i32) -> (v128)		; CHECK: .functype load_sext_v4i32_a1 (i32) -> (v128)
[… 833 lines not shown …]

llvm/test/CodeGen/WebAssembly/simd-nonconst-sext.ll

This file was deleted.

	; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -mattr=+simd128 | FileCheck %s

	; A regression test for a bug in the lowering of SIGN_EXTEND_INREG
	; with SIMD and without sign-ext where ISel would crash if the index
	; of the vector extract was not a constant.

	target triple = "wasm32"

	; CHECK-LABEL: foo:
	; CHECK-NEXT: .functype foo () -> (f32)
	; CHECK: i32x4.load16x4_u
	; CHECK: f32.convert_i32_s
	define float @foo() {
	%1 = load <4 x i16>, <4 x i16>* undef, align 8
	%2 = load i32, i32* undef, align 4
	%vecext = extractelement <4 x i16> %1, i32 %2
	%conv = sitofp i16 %vecext to float
	ret float %conv
	}

llvm/test/CodeGen/WebAssembly/simd-offset.ll

[… 396 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret <8 x i16> %v2		ret <8 x i16> %v2
}		}

define <8 x i8> @load_ext_v8i16(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16:		; CHECK-LABEL: load_ext_v8i16:
; CHECK: .functype load_ext_v8i16 (i32) -> (v128)		; CHECK: .functype load_ext_v8i16 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.load8x8_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <8 x i8>, <8 x i8>* %p		%v = load <8 x i8>, <8 x i8>* %p
ret <8 x i8> %v		ret <8 x i8> %v
}		}

define <8 x i16> @load_v8i16_with_folded_offset(<8 x i16>* %p) {		define <8 x i16> @load_v8i16_with_folded_offset(<8 x i16>* %p) {
; CHECK-LABEL: load_v8i16_with_folded_offset:		; CHECK-LABEL: load_v8i16_with_folded_offset:
; CHECK: .functype load_v8i16_with_folded_offset (i32) -> (v128)		; CHECK: .functype load_v8i16_with_folded_offset (i32) -> (v128)
[… 54 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret <8 x i16> %v2		ret <8 x i16> %v2
}		}

define <8 x i8> @load_ext_v8i16_with_folded_offset(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_with_folded_offset(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_with_folded_offset:		; CHECK-LABEL: load_ext_v8i16_with_folded_offset:
; CHECK: .functype load_ext_v8i16_with_folded_offset (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_with_folded_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.load8x8_u 16		; CHECK-NEXT: v128.load64_zero 16
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <8 x i8>* %p to i32		%q = ptrtoint <8 x i8>* %p to i32
%r = add nuw i32 %q, 16		%r = add nuw i32 %q, 16
%s = inttoptr i32 %r to <8 x i8>*		%s = inttoptr i32 %r to <8 x i8>*
%v = load <8 x i8>, <8 x i8>* %s		%v = load <8 x i8>, <8 x i8>* %s
ret <8 x i8> %v		ret <8 x i8> %v
}		}

[… 49 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret <8 x i16> %v2		ret <8 x i16> %v2
}		}

define <8 x i8> @load_ext_v8i16_with_folded_gep_offset(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_with_folded_gep_offset(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_with_folded_gep_offset:		; CHECK-LABEL: load_ext_v8i16_with_folded_gep_offset:
; CHECK: .functype load_ext_v8i16_with_folded_gep_offset (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_with_folded_gep_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.load8x8_u 8		; CHECK-NEXT: v128.load64_zero 8
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <8 x i8>, <8 x i8>* %p, i32 1		%s = getelementptr inbounds <8 x i8>, <8 x i8>* %p, i32 1
%v = load <8 x i8>, <8 x i8>* %s		%v = load <8 x i8>, <8 x i8>* %s
ret <8 x i8> %v		ret <8 x i8> %v
}		}

define <8 x i16> @load_v8i16_with_unfolded_gep_negative_offset(<8 x i16>* %p) {		define <8 x i16> @load_v8i16_with_unfolded_gep_negative_offset(<8 x i16>* %p) {
; CHECK-LABEL: load_v8i16_with_unfolded_gep_negative_offset:		; CHECK-LABEL: load_v8i16_with_unfolded_gep_negative_offset:
[… 57 lines not shown …]

define <8 x i8> @load_ext_v8i16_with_unfolded_gep_negative_offset(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_with_unfolded_gep_negative_offset(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_with_unfolded_gep_negative_offset:		; CHECK-LABEL: load_ext_v8i16_with_unfolded_gep_negative_offset:
; CHECK: .functype load_ext_v8i16_with_unfolded_gep_negative_offset (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_with_unfolded_gep_negative_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const -8		; CHECK-NEXT: i32.const -8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i16x8.load8x8_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <8 x i8>, <8 x i8>* %p, i32 -1		%s = getelementptr inbounds <8 x i8>, <8 x i8>* %p, i32 -1
%v = load <8 x i8>, <8 x i8>* %s		%v = load <8 x i8>, <8 x i8>* %s
ret <8 x i8> %v		ret <8 x i8> %v
}		}

define <8 x i16> @load_v8i16_with_unfolded_offset(<8 x i16>* %p) {		define <8 x i16> @load_v8i16_with_unfolded_offset(<8 x i16>* %p) {
; CHECK-LABEL: load_v8i16_with_unfolded_offset:		; CHECK-LABEL: load_v8i16_with_unfolded_offset:
[… 65 lines not shown …]

define <8 x i8> @load_ext_v8i16_with_unfolded_offset(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_with_unfolded_offset(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_with_unfolded_offset:		; CHECK-LABEL: load_ext_v8i16_with_unfolded_offset:
; CHECK: .functype load_ext_v8i16_with_unfolded_offset (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_with_unfolded_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const 16		; CHECK-NEXT: i32.const 16
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i16x8.load8x8_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <8 x i8>* %p to i32		%q = ptrtoint <8 x i8>* %p to i32
%r = add nsw i32 %q, 16		%r = add nsw i32 %q, 16
%s = inttoptr i32 %r to <8 x i8>*		%s = inttoptr i32 %r to <8 x i8>*
%v = load <8 x i8>, <8 x i8>* %s		%v = load <8 x i8>, <8 x i8>* %s
ret <8 x i8> %v		ret <8 x i8> %v
}		}
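In these tests the `add nuw` in `load_ext_v8i16_with_folded_offset` becomes the offset immediate of `v128.load64_zero 16`, while the `add nsw` in `load_ext_v8i16_with_unfolded_offset` stays an explicit `i32.add`. The Python sketch below illustrates the reason, based on the WebAssembly rule that the memarg offset is an unsigned value added to the address without 32-bit wraparound: if the add could wrap, folding it would change the effective address. `can_fold` is a hypothetical simplification, not the backend's actual folding logic. The GEP variants above follow the same pattern: a non-negative `inbounds` GEP offset is folded, while a negative or non-`inbounds` one is not.

```python
# Why an `add nuw` can be folded into a Wasm load offset but `add nsw` cannot.

U32 = 1 << 32


def ea_folded(base: int, offset: int) -> int:
    """Offset in the memarg: Wasm adds it to the i32 base without wrapping."""
    return base + offset


def ea_unfolded(base: int, offset: int) -> int:
    """Explicit i32.add, then offset 0: the add wraps modulo 2**32."""
    return (base + offset) % U32


def can_fold(offset: int, no_unsigned_wrap: bool) -> bool:
    """Hypothetical predicate: the memarg offset is unsigned, and folding only
    preserves the address when the add is known not to wrap."""
    return no_unsigned_wrap and offset >= 0


base = 0xFFFF_FFF8
print(hex(ea_folded(base, 16)), hex(ea_unfolded(base, 16)))  # 0x100000008 0x8
```

With a wrapping add the two addresses diverge (the folded one is out of bounds for any 32-bit memory and would trap), so the fold is only valid when `nuw` rules that case out.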

[… 59 lines not shown …]

define <8 x i8> @load_ext_v8i16_with_unfolded_gep_offset(<8 x i8>* %p) {		define <8 x i8> @load_ext_v8i16_with_unfolded_gep_offset(<8 x i8>* %p) {
; CHECK-LABEL: load_ext_v8i16_with_unfolded_gep_offset:		; CHECK-LABEL: load_ext_v8i16_with_unfolded_gep_offset:
; CHECK: .functype load_ext_v8i16_with_unfolded_gep_offset (i32) -> (v128)		; CHECK: .functype load_ext_v8i16_with_unfolded_gep_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const 8		; CHECK-NEXT: i32.const 8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i16x8.load8x8_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr <8 x i8>, <8 x i8>* %p, i32 1		%s = getelementptr <8 x i8>, <8 x i8>* %p, i32 1
%v = load <8 x i8>, <8 x i8>* %s		%v = load <8 x i8>, <8 x i8>* %s
ret <8 x i8> %v		ret <8 x i8> %v
}		}

define <8 x i16> @load_v8i16_from_numeric_address() {		define <8 x i16> @load_v8i16_from_numeric_address() {
; CHECK-LABEL: load_v8i16_from_numeric_address:		; CHECK-LABEL: load_v8i16_from_numeric_address:
[… 47 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret <8 x i16> %v2		ret <8 x i16> %v2
}		}

define <8 x i8> @load_ext_v8i16_from_numeric_address() {		define <8 x i8> @load_ext_v8i16_from_numeric_address() {
; CHECK-LABEL: load_ext_v8i16_from_numeric_address:		; CHECK-LABEL: load_ext_v8i16_from_numeric_address:
; CHECK: .functype load_ext_v8i16_from_numeric_address () -> (v128)		; CHECK: .functype load_ext_v8i16_from_numeric_address () -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: i16x8.load8x8_u 32		; CHECK-NEXT: v128.load64_zero 32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = inttoptr i32 32 to <8 x i8>*		%s = inttoptr i32 32 to <8 x i8>*
%v = load <8 x i8>, <8 x i8>* %s		%v = load <8 x i8>, <8 x i8>* %s
ret <8 x i8> %v		ret <8 x i8> %v
}		}

@gv_v8i16 = global <8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>		@gv_v8i16 = global <8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>
define <8 x i16> @load_v8i16_from_global_address() {		define <8 x i16> @load_v8i16_from_global_address() {
[… 46 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret <8 x i16> %v2		ret <8 x i16> %v2
}		}

define <8 x i8> @load_ext_v8i16_from_global_address() {		define <8 x i8> @load_ext_v8i16_from_global_address() {
; CHECK-LABEL: load_ext_v8i16_from_global_address:		; CHECK-LABEL: load_ext_v8i16_from_global_address:
; CHECK: .functype load_ext_v8i16_from_global_address () -> (v128)		; CHECK: .functype load_ext_v8i16_from_global_address () -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: i16x8.load8x8_u gv_v8i8		; CHECK-NEXT: v128.load64_zero gv_v8i8
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <8 x i8>, <8 x i8>* @gv_v8i8		%v = load <8 x i8>, <8 x i8>* @gv_v8i8
ret <8 x i8> %v		ret <8 x i8> %v
}		}


define void @store_v8i16(<8 x i16> %v, <8 x i16>* %p) {		define void @store_v8i16(<8 x i16> %v, <8 x i16>* %p) {
; CHECK-LABEL: store_v8i16:		; CHECK-LABEL: store_v8i16:
; CHECK: .functype store_v8i16 (v128, i32) -> ()		; CHECK: .functype store_v8i16 (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.store 0		; CHECK-NEXT: v128.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
store <8 x i16> %v , <8 x i16>* %p		store <8 x i16> %v , <8 x i16>* %p
ret void		ret void
}		}

define void @store_narrowing_v8i16(<8 x i8> %v, <8 x i8>* %p) {		define void @store_narrowing_v8i16(<8 x i8> %v, <8 x i8>* %p) {
; CHECK-LABEL: store_narrowing_v8i16:		; CHECK-LABEL: store_narrowing_v8i16:
; CHECK: .functype store_narrowing_v8i16 (v128, i32) -> ()		; CHECK: .functype store_narrowing_v8i16 (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: v128.const 255, 255, 255, 255, 255, 255, 255, 255
; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i8x16.narrow_i16x8_u		; CHECK-NEXT: v128.store64_lane 0, 0
		aheejin: Now that we are doing this, are the narrowing store patterns added in D84377 necessary?
		tlively (Author): Nice catch! It looks like that can all be removed now.
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
store <8 x i8> %v, <8 x i8>* %p		store <8 x i8> %v, <8 x i8>* %p
ret void		ret void
}		}
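The old and new columns of `store_narrowing_v8i16` write the same 8 bytes to memory. The Python sketch below models both sequences at the byte level; function names are hypothetical. The old sequence presumably needs the `v128.and` mask because the high byte of each promoted i16 lane is not guaranteed to be zero (modeled here with 0xAB); after masking, the saturating `i8x16.narrow_i16x8_u` is exact. With widening, the bytes are already contiguous, so storing i64 lane 0 with `v128.store64_lane` suffices.

```python
# Old vs new way of storing a <8 x i8> held in a v128.

def old_store_narrowing(v_promoted: bytes) -> bytes:
    """Old sequence: <8 x i8> held as <8 x i16> with unspecified high bytes."""
    # v128.and with 255 per i16 lane clears the high byte of each lane.
    lanes = [int.from_bytes(v_promoted[2 * i:2 * i + 2], "little") & 0xFF for i in range(8)]
    # i8x16.narrow_i16x8_u saturates signed i16 to [0, 255]; after masking it is exact.
    # Both operands are the same vector, so the 16 result bytes repeat the 8 lanes.
    narrowed = bytes(min(max(x, 0), 255) for x in lanes) * 2
    # i64x2.extract_lane 0 followed by i64.store writes the low 8 bytes.
    return narrowed[:8]


def new_store64_lane(v_widened: bytes) -> bytes:
    """New sequence: v128.store64_lane of i64 lane 0 writes bytes 0..7."""
    return v_widened[:8]


data = bytes([1, 2, 3, 250, 128, 0, 7, 255])
promoted = b"".join(bytes([x, 0xAB]) for x in data)  # 0xAB stands in for garbage high bytes
widened = data + bytes(8)
print(old_store_narrowing(promoted) == data == new_store64_lane(widened))  # True
```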

define void @store_v8i16_with_folded_offset(<8 x i16> %v, <8 x i16>* %p) {		define void @store_v8i16_with_folded_offset(<8 x i16> %v, <8 x i16>* %p) {
; CHECK-LABEL: store_v8i16_with_folded_offset:		; CHECK-LABEL: store_v8i16_with_folded_offset:
; CHECK: .functype store_v8i16_with_folded_offset (v128, i32) -> ()		; CHECK: .functype store_v8i16_with_folded_offset (v128, i32) -> ()
[… 9 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret void		ret void
}		}

define void @store_narrowing_v8i16_with_folded_offset(<8 x i8> %v, <8 x i8>* %p) {		define void @store_narrowing_v8i16_with_folded_offset(<8 x i8> %v, <8 x i8>* %p) {
; CHECK-LABEL: store_narrowing_v8i16_with_folded_offset:		; CHECK-LABEL: store_narrowing_v8i16_with_folded_offset:
; CHECK: .functype store_narrowing_v8i16_with_folded_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v8i16_with_folded_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: v128.const 255, 255, 255, 255, 255, 255, 255, 255		; CHECK-NEXT: i32.const 16
; CHECK-NEXT: local.get 0		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i8x16.narrow_i16x8_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 16
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <8 x i8>* %p to i32		%q = ptrtoint <8 x i8>* %p to i32
%r = add nuw i32 %q, 16		%r = add nuw i32 %q, 16
%s = inttoptr i32 %r to <8 x i8>*		%s = inttoptr i32 %r to <8 x i8>*
store <8 x i8> %v , <8 x i8>* %s		store <8 x i8> %v , <8 x i8>* %s
ret void		ret void
}		}

[… 10 lines not shown …]
; CHECK-NEXT: # fallthrough-return
ret void		ret void
}		}

define void @store_narrowing_v8i16_with_folded_gep_offset(<8 x i8> %v, <8 x i8>* %p) {		define void @store_narrowing_v8i16_with_folded_gep_offset(<8 x i8> %v, <8 x i8>* %p) {
; CHECK-LABEL: store_narrowing_v8i16_with_folded_gep_offset:		; CHECK-LABEL: store_narrowing_v8i16_with_folded_gep_offset:
; CHECK: .functype store_narrowing_v8i16_with_folded_gep_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v8i16_with_folded_gep_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: v128.const 255, 255, 255, 255, 255, 255, 255, 255		; CHECK-NEXT: i32.const 8
; CHECK-NEXT: local.get 0		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i8x16.narrow_i16x8_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 8
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <8 x i8>, <8 x i8>* %p, i32 1		%s = getelementptr inbounds <8 x i8>, <8 x i8>* %p, i32 1
store <8 x i8> %v , <8 x i8>* %s		store <8 x i8> %v , <8 x i8>* %s
ret void		ret void
}		}

define void @store_v8i16_with_unfolded_gep_negative_offset(<8 x i16> %v, <8 x i16>* %p) {		define void @store_v8i16_with_unfolded_gep_negative_offset(<8 x i16> %v, <8 x i16>* %p) {
; CHECK-LABEL: store_v8i16_with_unfolded_gep_negative_offset:		; CHECK-LABEL: store_v8i16_with_unfolded_gep_negative_offset:
[… 12 lines not shown …]

define void @store_narrowing_v8i16_with_unfolded_gep_negative_offset(<8 x i8> %v, <8 x i8>* %p) {		define void @store_narrowing_v8i16_with_unfolded_gep_negative_offset(<8 x i8> %v, <8 x i8>* %p) {
; CHECK-LABEL: store_narrowing_v8i16_with_unfolded_gep_negative_offset:		; CHECK-LABEL: store_narrowing_v8i16_with_unfolded_gep_negative_offset:
; CHECK: .functype store_narrowing_v8i16_with_unfolded_gep_negative_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v8i16_with_unfolded_gep_negative_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: i32.const -8		; CHECK-NEXT: i32.const -8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.const 255, 255, 255, 255, 255, 255, 255, 255
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i8x16.narrow_i16x8_u
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <8 x i8>, <8 x i8>* %p, i32 -1		%s = getelementptr inbounds <8 x i8>, <8 x i8>* %p, i32 -1
store <8 x i8> %v , <8 x i8>* %s		store <8 x i8> %v , <8 x i8>* %s
ret void		ret void
}		}

define void @store_v8i16_with_unfolded_offset(<8 x i16> %v, <8 x i16>* %p) {		define void @store_v8i16_with_unfolded_offset(<8 x i16> %v, <8 x i16>* %p) {
; CHECK-LABEL: store_v8i16_with_unfolded_offset:		; CHECK-LABEL: store_v8i16_with_unfolded_offset:
[… 14 lines not shown …]

define void @store_narrowing_v8i16_with_unfolded_offset(<8 x i8> %v, <8 x i8>* %p) {		define void @store_narrowing_v8i16_with_unfolded_offset(<8 x i8> %v, <8 x i8>* %p) {
; CHECK-LABEL: store_narrowing_v8i16_with_unfolded_offset:		; CHECK-LABEL: store_narrowing_v8i16_with_unfolded_offset:
; CHECK: .functype store_narrowing_v8i16_with_unfolded_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v8i16_with_unfolded_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: i32.const 16		; CHECK-NEXT: i32.const 16
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.const 255, 255, 255, 255, 255, 255, 255, 255
; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i8x16.narrow_i16x8_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <8 x i8>* %p to i32		%q = ptrtoint <8 x i8>* %p to i32
%r = add nsw i32 %q, 16		%r = add nsw i32 %q, 16
%s = inttoptr i32 %r to <8 x i8>*		%s = inttoptr i32 %r to <8 x i8>*
store <8 x i8> %v , <8 x i8>* %s		store <8 x i8> %v , <8 x i8>* %s
ret void		ret void
}		}

[… 14 lines not shown …]

define void @store_narrowing_v8i16_with_unfolded_gep_offset(<8 x i8> %v, <8 x i8>* %p) {		define void @store_narrowing_v8i16_with_unfolded_gep_offset(<8 x i8> %v, <8 x i8>* %p) {
; CHECK-LABEL: store_narrowing_v8i16_with_unfolded_gep_offset:		; CHECK-LABEL: store_narrowing_v8i16_with_unfolded_gep_offset:
; CHECK: .functype store_narrowing_v8i16_with_unfolded_gep_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v8i16_with_unfolded_gep_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: i32.const 8		; CHECK-NEXT: i32.const 8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.const 255, 255, 255, 255, 255, 255, 255, 255
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i8x16.narrow_i16x8_u
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr <8 x i8>, <8 x i8>* %p, i32 1		%s = getelementptr <8 x i8>, <8 x i8>* %p, i32 1
store <8 x i8> %v , <8 x i8>* %s		store <8 x i8> %v , <8 x i8>* %s
ret void		ret void
}		}

define void @store_v8i16_to_numeric_address(<8 x i16> %v) {		define void @store_v8i16_to_numeric_address(<8 x i16> %v) {
; CHECK-LABEL: store_v8i16_to_numeric_address:		; CHECK-LABEL: store_v8i16_to_numeric_address:
; CHECK: .functype store_v8i16_to_numeric_address (v128) -> ()		; CHECK: .functype store_v8i16_to_numeric_address (v128) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.store 32		; CHECK-NEXT: v128.store 32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = inttoptr i32 32 to <8 x i16>*		%s = inttoptr i32 32 to <8 x i16>*
store <8 x i16> %v , <8 x i16>* %s		store <8 x i16> %v , <8 x i16>* %s
ret void		ret void
}		}

define void @store_narrowing_v8i16_to_numeric_address(<8 x i8> %v, <8 x i8>* %p) {		define void @store_narrowing_v8i16_to_numeric_address(<8 x i8> %v, <8 x i8>* %p) {
; CHECK-LABEL: store_narrowing_v8i16_to_numeric_address:		; CHECK-LABEL: store_narrowing_v8i16_to_numeric_address:
; CHECK: .functype store_narrowing_v8i16_to_numeric_address (v128, i32) -> ()		; CHECK: .functype store_narrowing_v8i16_to_numeric_address (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 32
; CHECK-NEXT: v128.const 255, 255, 255, 255, 255, 255, 255, 255
; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i8x16.narrow_i16x8_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = inttoptr i32 32 to <8 x i8>*		%s = inttoptr i32 32 to <8 x i8>*
store <8 x i8> %v , <8 x i8>* %s		store <8 x i8> %v , <8 x i8>* %s
ret void		ret void
}		}

define void @store_v8i16_to_global_address(<8 x i16> %v) {		define void @store_v8i16_to_global_address(<8 x i16> %v) {
; CHECK-LABEL: store_v8i16_to_global_address:		; CHECK-LABEL: store_v8i16_to_global_address:
; CHECK: .functype store_v8i16_to_global_address (v128) -> ()		; CHECK: .functype store_v8i16_to_global_address (v128) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.store gv_v8i16		; CHECK-NEXT: v128.store gv_v8i16
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
store <8 x i16> %v , <8 x i16>* @gv_v8i16		store <8 x i16> %v , <8 x i16>* @gv_v8i16
ret void		ret void
}		}

define void @store_narrowing_v8i16_to_global_address(<8 x i8> %v) {		define void @store_narrowing_v8i16_to_global_address(<8 x i8> %v) {
; CHECK-LABEL: store_narrowing_v8i16_to_global_address:		; CHECK-LABEL: store_narrowing_v8i16_to_global_address:
; CHECK: .functype store_narrowing_v8i16_to_global_address (v128) -> ()		; CHECK: .functype store_narrowing_v8i16_to_global_address (v128) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const gv_v8i8
; CHECK-NEXT: v128.const 255, 255, 255, 255, 255, 255, 255, 255
; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i8x16.narrow_i16x8_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store gv_v8i8
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
store <8 x i8> %v , <8 x i8>* @gv_v8i8		store <8 x i8> %v , <8 x i8>* @gv_v8i8
ret void		ret void
}		}
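
The removed CHECK lines above show the old promotion-based narrowing store. It masks every lane, packs with `i8x16.narrow_i16x8_u` (or `i16x8.narrow_i32x4_u` for `<4 x i16>`), extracts the low 64 bits and stores them with `i64.store`. Under widening, the same store is a single `v128.store64_lane`.

The Python sketch below models both byte layouts, assuming Wasm's little-endian lane order. `narrow_u`, `store_promoted` and `store_widened` are hypothetical names that mimic instruction semantics; they are not LLVM code. The sketch also suggests why the old sequence needed the `v128.and`. The narrow instructions treat their inputs as signed and saturate them, so junk in the high bits of a promoted lane would be clamped rather than truncated.

```python
import struct

def narrow_u(a, b, in_bits):
    """Model of i8x16.narrow_i16x8_u (in_bits=16) and i16x8.narrow_i32x4_u
    (in_bits=32): each input lane is read as a signed integer and saturated
    to the unsigned range of half its width. Lanes of `a` fill the low half
    of the result and lanes of `b` fill the high half."""
    out_bits = in_bits // 2
    hi = (1 << out_bits) - 1
    sign = 1 << (in_bits - 1)

    def sat(x):
        s = x - (1 << in_bits) if x & sign else x  # reinterpret as signed
        return max(0, min(hi, s))

    fmt = "<%d%s" % (len(a) + len(b), "B" if out_bits == 8 else "H")
    return struct.pack(fmt, *(sat(x) for x in list(a) + list(b)))

def store_promoted(lanes, in_bits):
    """Old lowering: each element sits in the low bits of a wider lane whose
    high bits are unspecified."""
    mask = (1 << (in_bits // 2)) - 1
    masked = [x & mask for x in lanes]              # v128.const mask; v128.and
    packed = narrow_u(masked, lanes, in_bits)       # narrow(masked, original)
    (low64,) = struct.unpack_from("<Q", packed, 0)  # i64x2.extract_lane 0
    return struct.pack("<Q", low64)                 # i64.store writes 8 bytes

def store_widened(vec):
    """New lowering: the elements already occupy the low 8 bytes of the
    v128, and v128.store64_lane with lane 0 writes exactly those bytes."""
    return bytes(vec[:8])

# <8 x i8> <1, 2, 200, 255, 0, 17, 128, 99>
vals = [1, 2, 200, 255, 0, 17, 128, 99]
promoted = [v | 0xAB00 for v in vals]  # junk in the high byte of each i16 lane
widened = bytes(vals) + b"\xee" * 8    # junk in the unused high lanes
assert store_promoted(promoted, 16) == bytes(vals)
assert store_widened(widened) == bytes(vals)

# Without the mask, the saturating narrow clamps every junk lane to 0.
print(narrow_u(promoted, promoted, 16)[:8])  # b'\x00\x00\x00\x00\x00\x00\x00\x00'
```

Passing `in_bits=32` models the `<4 x i16>` stores in the next section, where the mask constant is 65535 instead of 255.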

; ==============================================================================		; ==============================================================================
; 4 x i32		; 4 x i32
; ==============================================================================		; ==============================================================================
(45 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret <4 x i32> %v2		ret <4 x i32> %v2
}		}

define <4 x i16> @load_ext_v4i32(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32:		; CHECK-LABEL: load_ext_v4i32:
; CHECK: .functype load_ext_v4i32 (i32) -> (v128)		; CHECK: .functype load_ext_v4i32 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.load16x4_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <4 x i16>, <4 x i16>* %p		%v = load <4 x i16>, <4 x i16>* %p
ret <4 x i16> %v		ret <4 x i16> %v
}		}
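
In the `load_ext_*` tests, `i32x4.load16x4_u` becomes `v128.load64_zero`, and `i64x2.load32x2_u` does the same in the `2 x i64` section. The minimal Python model below shows what each instruction leaves in the v128 for the same 8 bytes of memory. The helper names are illustrative, and only the `<4 x i16>` case is shown.

```python
import struct

def i32x4_load16x4_u(mem, addr):
    """i32x4.load16x4_u: read four little-endian u16 values and zero-extend
    each into its own 32-bit lane (the promoted layout)."""
    halves = struct.unpack_from("<4H", mem, addr)
    return struct.pack("<4I", *halves)

def v128_load64_zero(mem, addr):
    """v128.load64_zero: copy 8 bytes into the low half of the v128 and zero
    the high half (the widened layout)."""
    return bytes(mem[addr:addr + 8]) + bytes(8)

mem = bytearray(32)
struct.pack_into("<4H", mem, 8, 1, 0xFFFF, 300, 42)

promoted = i32x4_load16x4_u(mem, 8)
widened = v128_load64_zero(mem, 8)

# Promoted: element i lives in 32-bit lane i.
print(struct.unpack("<4I", promoted))  # (1, 65535, 300, 42)
# Widened: element i lives in 16-bit lane i, and lanes 4-7 are zero.
print(struct.unpack("<8H", widened))   # (1, 65535, 300, 42, 0, 0, 0, 0)
```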

define <4 x i32> @load_v4i32_with_folded_offset(<4 x i32>* %p) {		define <4 x i32> @load_v4i32_with_folded_offset(<4 x i32>* %p) {
; CHECK-LABEL: load_v4i32_with_folded_offset:		; CHECK-LABEL: load_v4i32_with_folded_offset:
; CHECK: .functype load_v4i32_with_folded_offset (i32) -> (v128)		; CHECK: .functype load_v4i32_with_folded_offset (i32) -> (v128)
(54 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret <4 x i32> %v2		ret <4 x i32> %v2
}		}

define <4 x i16> @load_ext_v4i32_with_folded_offset(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_with_folded_offset(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_with_folded_offset:		; CHECK-LABEL: load_ext_v4i32_with_folded_offset:
; CHECK: .functype load_ext_v4i32_with_folded_offset (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_with_folded_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.load16x4_u 16		; CHECK-NEXT: v128.load64_zero 16
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <4 x i16>* %p to i32		%q = ptrtoint <4 x i16>* %p to i32
%r = add nuw i32 %q, 16		%r = add nuw i32 %q, 16
%s = inttoptr i32 %r to <4 x i16>*		%s = inttoptr i32 %r to <4 x i16>*
%v = load <4 x i16>, <4 x i16>* %s		%v = load <4 x i16>, <4 x i16>* %s
ret <4 x i16> %v		ret <4 x i16> %v
}		}

(49 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret <4 x i32> %v2		ret <4 x i32> %v2
}		}

define <4 x i16> @load_ext_v4i32_with_folded_gep_offset(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_with_folded_gep_offset(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_with_folded_gep_offset:		; CHECK-LABEL: load_ext_v4i32_with_folded_gep_offset:
; CHECK: .functype load_ext_v4i32_with_folded_gep_offset (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_with_folded_gep_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32x4.load16x4_u 8		; CHECK-NEXT: v128.load64_zero 8
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <4 x i16>, <4 x i16>* %p, i32 1		%s = getelementptr inbounds <4 x i16>, <4 x i16>* %p, i32 1
%v = load <4 x i16>, <4 x i16>* %s		%v = load <4 x i16>, <4 x i16>* %s
ret <4 x i16> %v		ret <4 x i16> %v
}		}

define <4 x i32> @load_v4i32_with_unfolded_gep_negative_offset(<4 x i32>* %p) {		define <4 x i32> @load_v4i32_with_unfolded_gep_negative_offset(<4 x i32>* %p) {
; CHECK-LABEL: load_v4i32_with_unfolded_gep_negative_offset:		; CHECK-LABEL: load_v4i32_with_unfolded_gep_negative_offset:
(57 lines not shown)

define <4 x i16> @load_ext_v4i32_with_unfolded_gep_negative_offset(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_with_unfolded_gep_negative_offset(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_with_unfolded_gep_negative_offset:		; CHECK-LABEL: load_ext_v4i32_with_unfolded_gep_negative_offset:
; CHECK: .functype load_ext_v4i32_with_unfolded_gep_negative_offset (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_with_unfolded_gep_negative_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const -8		; CHECK-NEXT: i32.const -8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i32x4.load16x4_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <4 x i16>, <4 x i16>* %p, i32 -1		%s = getelementptr inbounds <4 x i16>, <4 x i16>* %p, i32 -1
%v = load <4 x i16>, <4 x i16>* %s		%v = load <4 x i16>, <4 x i16>* %s
ret <4 x i16> %v		ret <4 x i16> %v
}		}

define <4 x i32> @load_v4i32_with_unfolded_offset(<4 x i32>* %p) {		define <4 x i32> @load_v4i32_with_unfolded_offset(<4 x i32>* %p) {
; CHECK-LABEL: load_v4i32_with_unfolded_offset:		; CHECK-LABEL: load_v4i32_with_unfolded_offset:
(65 lines not shown)

define <4 x i16> @load_ext_v4i32_with_unfolded_offset(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_with_unfolded_offset(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_with_unfolded_offset:		; CHECK-LABEL: load_ext_v4i32_with_unfolded_offset:
; CHECK: .functype load_ext_v4i32_with_unfolded_offset (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_with_unfolded_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const 16		; CHECK-NEXT: i32.const 16
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i32x4.load16x4_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <4 x i16>* %p to i32		%q = ptrtoint <4 x i16>* %p to i32
%r = add nsw i32 %q, 16		%r = add nsw i32 %q, 16
%s = inttoptr i32 %r to <4 x i16>*		%s = inttoptr i32 %r to <4 x i16>*
%v = load <4 x i16>, <4 x i16>* %s		%v = load <4 x i16>, <4 x i16>* %s
ret <4 x i16> %v		ret <4 x i16> %v
}		}
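
Whether the constant lands in the memarg offset or in an explicit `i32.const`/`i32.add` pair follows from Wasm addressing rules. The effective address is the i32 base plus an unsigned offset, computed without wraparound, while `i32.add` wraps modulo 2^32.

The sketch below uses hypothetical helper names to show why folding is safe for `add nuw` and positive inbounds GEPs, but not for `add nsw`, plain GEPs, or negative offsets. This matches which tests in this file fold; it is not a description of the backend's actual check.

```python
MASK32 = 0xFFFFFFFF

def memarg_address(base, offset):
    """Wasm effective address for a load/store: the i32 base operand plus the
    unsigned memarg offset, computed without wrapping. A result past the end
    of memory traps instead of wrapping around."""
    assert 0 <= base <= MASK32 and 0 <= offset <= MASK32
    return base + offset

def i32_add(a, b):
    """i32.add wraps modulo 2**32."""
    return (a + b) & MASK32

# When the add cannot wrap (nuw), both forms compute the same address,
# so `add nuw %p, 16` can become `offset=16`.
assert memarg_address(0x1000, 16) == i32_add(0x1000, 16)

# `nsw` does not rule out unsigned wrap, and then the two forms diverge:
# the explicit add wraps to a small address, while a folded offset would
# produce an address past 4 GiB and trap.
base = 0xFFFFFFF8
print(i32_add(base, 16))         # 8
print(memarg_address(base, 16))  # 4294967304

# A negative displacement has no unsigned-offset encoding, so it stays as
# `i32.const -8; i32.add`.
print(i32_add(0x100, -8))        # 248
```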

(59 lines not shown)

define <4 x i16> @load_ext_v4i32_with_unfolded_gep_offset(<4 x i16>* %p) {		define <4 x i16> @load_ext_v4i32_with_unfolded_gep_offset(<4 x i16>* %p) {
; CHECK-LABEL: load_ext_v4i32_with_unfolded_gep_offset:		; CHECK-LABEL: load_ext_v4i32_with_unfolded_gep_offset:
; CHECK: .functype load_ext_v4i32_with_unfolded_gep_offset (i32) -> (v128)		; CHECK: .functype load_ext_v4i32_with_unfolded_gep_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const 8		; CHECK-NEXT: i32.const 8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i32x4.load16x4_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr <4 x i16>, <4 x i16>* %p, i32 1		%s = getelementptr <4 x i16>, <4 x i16>* %p, i32 1
%v = load <4 x i16>, <4 x i16>* %s		%v = load <4 x i16>, <4 x i16>* %s
ret <4 x i16> %v		ret <4 x i16> %v
}		}

define <4 x i32> @load_v4i32_from_numeric_address() {		define <4 x i32> @load_v4i32_from_numeric_address() {
; CHECK-LABEL: load_v4i32_from_numeric_address:		; CHECK-LABEL: load_v4i32_from_numeric_address:
(47 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret <4 x i32> %v2		ret <4 x i32> %v2
}		}

define <4 x i16> @load_ext_v4i32_from_numeric_address() {		define <4 x i16> @load_ext_v4i32_from_numeric_address() {
; CHECK-LABEL: load_ext_v4i32_from_numeric_address:		; CHECK-LABEL: load_ext_v4i32_from_numeric_address:
; CHECK: .functype load_ext_v4i32_from_numeric_address () -> (v128)		; CHECK: .functype load_ext_v4i32_from_numeric_address () -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: i32x4.load16x4_u 32		; CHECK-NEXT: v128.load64_zero 32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = inttoptr i32 32 to <4 x i16>*		%s = inttoptr i32 32 to <4 x i16>*
%v = load <4 x i16>, <4 x i16>* %s		%v = load <4 x i16>, <4 x i16>* %s
ret <4 x i16> %v		ret <4 x i16> %v
}		}

@gv_v4i32 = global <4 x i32> <i32 42, i32 42, i32 42, i32 42>		@gv_v4i32 = global <4 x i32> <i32 42, i32 42, i32 42, i32 42>
define <4 x i32> @load_v4i32_from_global_address() {		define <4 x i32> @load_v4i32_from_global_address() {
(46 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret <4 x i32> %v2		ret <4 x i32> %v2
}		}

define <4 x i16> @load_ext_v4i32_from_global_address() {		define <4 x i16> @load_ext_v4i32_from_global_address() {
; CHECK-LABEL: load_ext_v4i32_from_global_address:		; CHECK-LABEL: load_ext_v4i32_from_global_address:
; CHECK: .functype load_ext_v4i32_from_global_address () -> (v128)		; CHECK: .functype load_ext_v4i32_from_global_address () -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: i32x4.load16x4_u gv_v4i16		; CHECK-NEXT: v128.load64_zero gv_v4i16
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <4 x i16>, <4 x i16>* @gv_v4i16		%v = load <4 x i16>, <4 x i16>* @gv_v4i16
ret <4 x i16> %v		ret <4 x i16> %v
}		}

define void @store_v4i32(<4 x i32> %v, <4 x i32>* %p) {		define void @store_v4i32(<4 x i32> %v, <4 x i32>* %p) {
; CHECK-LABEL: store_v4i32:		; CHECK-LABEL: store_v4i32:
; CHECK: .functype store_v4i32 (v128, i32) -> ()		; CHECK: .functype store_v4i32 (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.store 0		; CHECK-NEXT: v128.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
store <4 x i32> %v , <4 x i32>* %p		store <4 x i32> %v , <4 x i32>* %p
ret void		ret void
}		}

define void @store_narrowing_v4i32(<4 x i16> %v, <4 x i16>* %p) {		define void @store_narrowing_v4i32(<4 x i16> %v, <4 x i16>* %p) {
; CHECK-LABEL: store_narrowing_v4i32:		; CHECK-LABEL: store_narrowing_v4i32:
; CHECK: .functype store_narrowing_v4i32 (v128, i32) -> ()		; CHECK: .functype store_narrowing_v4i32 (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: v128.const 65535, 65535, 65535, 65535
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.narrow_i32x4_u
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
store <4 x i16> %v , <4 x i16>* %p		store <4 x i16> %v , <4 x i16>* %p
ret void		ret void
}		}
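
With widening, a `<4 x i16>` value occupies lanes 0 to 3 of an `i16x8` vector, and lanes 4 to 7 hold unspecified data. For lane-wise arithmetic, the high lanes only ever produce unused results. Since `v128.store64_lane` with lane 0 writes only the low 8 bytes, no masking is needed before the store.

The small Python illustration below assumes the wrapping lane-wise semantics of `i16x8.add`. `i16x8_add` and `widen_v4i16` are hypothetical helpers.

```python
import struct

def i16x8_add(a, b):
    """i16x8.add: lane-wise 16-bit addition, wrapping modulo 2**16."""
    xs = struct.unpack("<8H", a)
    ys = struct.unpack("<8H", b)
    return struct.pack("<8H", *((x + y) & 0xFFFF for x, y in zip(xs, ys)))

def widen_v4i16(elems, junk):
    """Widened <4 x i16>: elements in lanes 0-3, arbitrary data in lanes 4-7."""
    return struct.pack("<8H", *elems, *junk)

x = widen_v4i16([1, 2, 3, 0xFFFF], [0xDEAD, 7, 0, 0x8000])
y = widen_v4i16([10, 20, 30, 2], [0xBEEF, 0xFFFF, 5, 0x8000])
s = i16x8_add(x, y)

# Only the low 8 bytes reach memory (v128.store64_lane, lane 0), so whatever
# was computed in lanes 4-7 never matters.
stored = s[:8]
print(struct.unpack("<4H", stored))  # (11, 22, 33, 1)
```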

define void @store_v4i32_with_folded_offset(<4 x i32> %v, <4 x i32>* %p) {		define void @store_v4i32_with_folded_offset(<4 x i32> %v, <4 x i32>* %p) {
; CHECK-LABEL: store_v4i32_with_folded_offset:		; CHECK-LABEL: store_v4i32_with_folded_offset:
; CHECK: .functype store_v4i32_with_folded_offset (v128, i32) -> ()		; CHECK: .functype store_v4i32_with_folded_offset (v128, i32) -> ()
(9 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret void		ret void
}		}

define void @store_narrowing_v4i32_with_folded_offset(<4 x i16> %v, <4 x i16>* %p) {		define void @store_narrowing_v4i32_with_folded_offset(<4 x i16> %v, <4 x i16>* %p) {
; CHECK-LABEL: store_narrowing_v4i32_with_folded_offset:		; CHECK-LABEL: store_narrowing_v4i32_with_folded_offset:
; CHECK: .functype store_narrowing_v4i32_with_folded_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v4i32_with_folded_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: v128.const 65535, 65535, 65535, 65535		; CHECK-NEXT: i32.const 16
; CHECK-NEXT: local.get 0		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.narrow_i32x4_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 16
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <4 x i16>* %p to i32		%q = ptrtoint <4 x i16>* %p to i32
%r = add nuw i32 %q, 16		%r = add nuw i32 %q, 16
%s = inttoptr i32 %r to <4 x i16>*		%s = inttoptr i32 %r to <4 x i16>*
store <4 x i16> %v , <4 x i16>* %s		store <4 x i16> %v , <4 x i16>* %s
ret void		ret void
}		}

(10 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret void		ret void
}		}

define void @store_narrowing_v4i32_with_folded_gep_offset(<4 x i16> %v, <4 x i16>* %p) {		define void @store_narrowing_v4i32_with_folded_gep_offset(<4 x i16> %v, <4 x i16>* %p) {
; CHECK-LABEL: store_narrowing_v4i32_with_folded_gep_offset:		; CHECK-LABEL: store_narrowing_v4i32_with_folded_gep_offset:
; CHECK: .functype store_narrowing_v4i32_with_folded_gep_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v4i32_with_folded_gep_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: v128.const 65535, 65535, 65535, 65535		; CHECK-NEXT: i32.const 8
; CHECK-NEXT: local.get 0		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.narrow_i32x4_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 8
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <4 x i16>, <4 x i16>* %p, i32 1		%s = getelementptr inbounds <4 x i16>, <4 x i16>* %p, i32 1
store <4 x i16> %v , <4 x i16>* %s		store <4 x i16> %v , <4 x i16>* %s
ret void		ret void
}		}

define void @store_v4i32_with_unfolded_gep_negative_offset(<4 x i32> %v, <4 x i32>* %p) {		define void @store_v4i32_with_unfolded_gep_negative_offset(<4 x i32> %v, <4 x i32>* %p) {
; CHECK-LABEL: store_v4i32_with_unfolded_gep_negative_offset:		; CHECK-LABEL: store_v4i32_with_unfolded_gep_negative_offset:
(12 lines not shown)

define void @store_narrowing_v4i32_with_unfolded_gep_negative_offset(<4 x i16> %v, <4 x i16>* %p) {		define void @store_narrowing_v4i32_with_unfolded_gep_negative_offset(<4 x i16> %v, <4 x i16>* %p) {
; CHECK-LABEL: store_narrowing_v4i32_with_unfolded_gep_negative_offset:		; CHECK-LABEL: store_narrowing_v4i32_with_unfolded_gep_negative_offset:
; CHECK: .functype store_narrowing_v4i32_with_unfolded_gep_negative_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v4i32_with_unfolded_gep_negative_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: i32.const -8		; CHECK-NEXT: i32.const -8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.const 65535, 65535, 65535, 65535
; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.narrow_i32x4_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <4 x i16>, <4 x i16>* %p, i32 -1		%s = getelementptr inbounds <4 x i16>, <4 x i16>* %p, i32 -1
store <4 x i16> %v , <4 x i16>* %s		store <4 x i16> %v , <4 x i16>* %s
ret void		ret void
}		}

define void @store_v4i32_with_unfolded_offset(<4 x i32> %v, <4 x i32>* %p) {		define void @store_v4i32_with_unfolded_offset(<4 x i32> %v, <4 x i32>* %p) {
; CHECK-LABEL: store_v4i32_with_unfolded_offset:		; CHECK-LABEL: store_v4i32_with_unfolded_offset:
(14 lines not shown)

define void @store_narrowing_v4i32_with_unfolded_offset(<4 x i16> %v, <4 x i16>* %p) {		define void @store_narrowing_v4i32_with_unfolded_offset(<4 x i16> %v, <4 x i16>* %p) {
; CHECK-LABEL: store_narrowing_v4i32_with_unfolded_offset:		; CHECK-LABEL: store_narrowing_v4i32_with_unfolded_offset:
; CHECK: .functype store_narrowing_v4i32_with_unfolded_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v4i32_with_unfolded_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: i32.const 16		; CHECK-NEXT: i32.const 16
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.const 65535, 65535, 65535, 65535
; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.narrow_i32x4_u		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <4 x i16>* %p to i32		%q = ptrtoint <4 x i16>* %p to i32
%r = add nsw i32 %q, 16		%r = add nsw i32 %q, 16
%s = inttoptr i32 %r to <4 x i16>*		%s = inttoptr i32 %r to <4 x i16>*
store <4 x i16> %v , <4 x i16>* %s		store <4 x i16> %v , <4 x i16>* %s
ret void		ret void
}		}

(14 lines not shown)

define void @store_narrowing_v4i32_with_unfolded_gep_offset(<4 x i16> %v, <4 x i16>* %p) {		define void @store_narrowing_v4i32_with_unfolded_gep_offset(<4 x i16> %v, <4 x i16>* %p) {
; CHECK-LABEL: store_narrowing_v4i32_with_unfolded_gep_offset:		; CHECK-LABEL: store_narrowing_v4i32_with_unfolded_gep_offset:
; CHECK: .functype store_narrowing_v4i32_with_unfolded_gep_offset (v128, i32) -> ()		; CHECK: .functype store_narrowing_v4i32_with_unfolded_gep_offset (v128, i32) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 1		; CHECK-NEXT: local.get 1
; CHECK-NEXT: i32.const 8		; CHECK-NEXT: i32.const 8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: v128.const 65535, 65535, 65535, 65535
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.narrow_i32x4_u
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr <4 x i16>, <4 x i16>* %p, i32 1		%s = getelementptr <4 x i16>, <4 x i16>* %p, i32 1
store <4 x i16> %v , <4 x i16>* %s		store <4 x i16> %v , <4 x i16>* %s
ret void		ret void
}		}

define void @store_v4i32_to_numeric_address(<4 x i32> %v) {		define void @store_v4i32_to_numeric_address(<4 x i32> %v) {
; CHECK-LABEL: store_v4i32_to_numeric_address:		; CHECK-LABEL: store_v4i32_to_numeric_address:
; CHECK: .functype store_v4i32_to_numeric_address (v128) -> ()		; CHECK: .functype store_v4i32_to_numeric_address (v128) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.store 32		; CHECK-NEXT: v128.store 32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = inttoptr i32 32 to <4 x i32>*		%s = inttoptr i32 32 to <4 x i32>*
store <4 x i32> %v , <4 x i32>* %s		store <4 x i32> %v , <4 x i32>* %s
ret void		ret void
}		}

define void @store_narrowing_v4i32_to_numeric_address(<4 x i16> %v) {		define void @store_narrowing_v4i32_to_numeric_address(<4 x i16> %v) {
; CHECK-LABEL: store_narrowing_v4i32_to_numeric_address:		; CHECK-LABEL: store_narrowing_v4i32_to_numeric_address:
; CHECK: .functype store_narrowing_v4i32_to_numeric_address (v128) -> ()		; CHECK: .functype store_narrowing_v4i32_to_numeric_address (v128) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 32
; CHECK-NEXT: v128.const 65535, 65535, 65535, 65535
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.narrow_i32x4_u
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store 32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = inttoptr i32 32 to <4 x i16>*		%s = inttoptr i32 32 to <4 x i16>*
store <4 x i16> %v , <4 x i16>* %s		store <4 x i16> %v , <4 x i16>* %s
ret void		ret void
}		}

define void @store_v4i32_to_global_address(<4 x i32> %v) {		define void @store_v4i32_to_global_address(<4 x i32> %v) {
; CHECK-LABEL: store_v4i32_to_global_address:		; CHECK-LABEL: store_v4i32_to_global_address:
; CHECK: .functype store_v4i32_to_global_address (v128) -> ()		; CHECK: .functype store_v4i32_to_global_address (v128) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.store gv_v4i32		; CHECK-NEXT: v128.store gv_v4i32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
store <4 x i32> %v , <4 x i32>* @gv_v4i32		store <4 x i32> %v , <4 x i32>* @gv_v4i32
ret void		ret void
}		}

define void @store_narrowing_v4i32_to_global_address(<4 x i16> %v) {		define void @store_narrowing_v4i32_to_global_address(<4 x i16> %v) {
; CHECK-LABEL: store_narrowing_v4i32_to_global_address:		; CHECK-LABEL: store_narrowing_v4i32_to_global_address:
; CHECK: .functype store_narrowing_v4i32_to_global_address (v128) -> ()		; CHECK: .functype store_narrowing_v4i32_to_global_address (v128) -> ()
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const gv_v4i16
; CHECK-NEXT: v128.const 65535, 65535, 65535, 65535
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: v128.and		; CHECK-NEXT: v128.store64_lane 0, 0
; CHECK-NEXT: local.get 0
; CHECK-NEXT: i16x8.narrow_i32x4_u
; CHECK-NEXT: i64x2.extract_lane 0
; CHECK-NEXT: i64.store gv_v4i16
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
store <4 x i16> %v , <4 x i16>* @gv_v4i16		store <4 x i16> %v , <4 x i16>* @gv_v4i16
ret void		ret void
}		}

; ==============================================================================		; ==============================================================================
; 2 x i64		; 2 x i64
; ==============================================================================		; ==============================================================================
(45 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret <2 x i64> %v2		ret <2 x i64> %v2
}		}

define <2 x i32> @load_ext_v2i64(<2 x i32>* %p) {		define <2 x i32> @load_ext_v2i64(<2 x i32>* %p) {
; CHECK-LABEL: load_ext_v2i64:		; CHECK-LABEL: load_ext_v2i64:
; CHECK: .functype load_ext_v2i64 (i32) -> (v128)		; CHECK: .functype load_ext_v2i64 (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i64x2.load32x2_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <2 x i32>, <2 x i32>* %p		%v = load <2 x i32>, <2 x i32>* %p
ret <2 x i32> %v		ret <2 x i32> %v
}		}

define <2 x i64> @load_v2i64_with_folded_offset(<2 x i64>* %p) {		define <2 x i64> @load_v2i64_with_folded_offset(<2 x i64>* %p) {
; CHECK-LABEL: load_v2i64_with_folded_offset:		; CHECK-LABEL: load_v2i64_with_folded_offset:
; CHECK: .functype load_v2i64_with_folded_offset (i32) -> (v128)		; CHECK: .functype load_v2i64_with_folded_offset (i32) -> (v128)
(54 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret <2 x i64> %v2		ret <2 x i64> %v2
}		}

define <2 x i32> @load_ext_v2i64_with_folded_offset(<2 x i32>* %p) {		define <2 x i32> @load_ext_v2i64_with_folded_offset(<2 x i32>* %p) {
; CHECK-LABEL: load_ext_v2i64_with_folded_offset:		; CHECK-LABEL: load_ext_v2i64_with_folded_offset:
; CHECK: .functype load_ext_v2i64_with_folded_offset (i32) -> (v128)		; CHECK: .functype load_ext_v2i64_with_folded_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i64x2.load32x2_u 16		; CHECK-NEXT: v128.load64_zero 16
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <2 x i32>* %p to i32		%q = ptrtoint <2 x i32>* %p to i32
%r = add nuw i32 %q, 16		%r = add nuw i32 %q, 16
%s = inttoptr i32 %r to <2 x i32>*		%s = inttoptr i32 %r to <2 x i32>*
%v = load <2 x i32>, <2 x i32>* %s		%v = load <2 x i32>, <2 x i32>* %s
ret <2 x i32> %v		ret <2 x i32> %v
}		}

(49 lines not shown)	; CHECK-NEXT: # fallthrough-return
ret <2 x i64> %v2		ret <2 x i64> %v2
}		}

define <2 x i32> @load_ext_v2i64_with_folded_gep_offset(<2 x i32>* %p) {		define <2 x i32> @load_ext_v2i64_with_folded_gep_offset(<2 x i32>* %p) {
; CHECK-LABEL: load_ext_v2i64_with_folded_gep_offset:		; CHECK-LABEL: load_ext_v2i64_with_folded_gep_offset:
; CHECK: .functype load_ext_v2i64_with_folded_gep_offset (i32) -> (v128)		; CHECK: .functype load_ext_v2i64_with_folded_gep_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i64x2.load32x2_u 8		; CHECK-NEXT: v128.load64_zero 8
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <2 x i32>, <2 x i32>* %p, i32 1		%s = getelementptr inbounds <2 x i32>, <2 x i32>* %p, i32 1
%v = load <2 x i32>, <2 x i32>* %s		%v = load <2 x i32>, <2 x i32>* %s
ret <2 x i32> %v		ret <2 x i32> %v
}		}

define <2 x i64> @load_v2i64_with_unfolded_gep_negative_offset(<2 x i64>* %p) {		define <2 x i64> @load_v2i64_with_unfolded_gep_negative_offset(<2 x i64>* %p) {
; CHECK-LABEL: load_v2i64_with_unfolded_gep_negative_offset:		; CHECK-LABEL: load_v2i64_with_unfolded_gep_negative_offset:
(57 lines not shown)

define <2 x i32> @load_ext_v2i64_with_unfolded_gep_negative_offset(<2 x i32>* %p) {		define <2 x i32> @load_ext_v2i64_with_unfolded_gep_negative_offset(<2 x i32>* %p) {
; CHECK-LABEL: load_ext_v2i64_with_unfolded_gep_negative_offset:		; CHECK-LABEL: load_ext_v2i64_with_unfolded_gep_negative_offset:
; CHECK: .functype load_ext_v2i64_with_unfolded_gep_negative_offset (i32) -> (v128)		; CHECK: .functype load_ext_v2i64_with_unfolded_gep_negative_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const -8		; CHECK-NEXT: i32.const -8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i64x2.load32x2_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr inbounds <2 x i32>, <2 x i32>* %p, i32 -1		%s = getelementptr inbounds <2 x i32>, <2 x i32>* %p, i32 -1
%v = load <2 x i32>, <2 x i32>* %s		%v = load <2 x i32>, <2 x i32>* %s
ret <2 x i32> %v		ret <2 x i32> %v
}		}

define <2 x i64> @load_v2i64_with_unfolded_offset(<2 x i64>* %p) {		define <2 x i64> @load_v2i64_with_unfolded_offset(<2 x i64>* %p) {
; CHECK-LABEL: load_v2i64_with_unfolded_offset:		; CHECK-LABEL: load_v2i64_with_unfolded_offset:
(65 lines not shown)

define <2 x i32> @load_ext_v2i64_with_unfolded_offset(<2 x i32>* %p) {		define <2 x i32> @load_ext_v2i64_with_unfolded_offset(<2 x i32>* %p) {
; CHECK-LABEL: load_ext_v2i64_with_unfolded_offset:		; CHECK-LABEL: load_ext_v2i64_with_unfolded_offset:
; CHECK: .functype load_ext_v2i64_with_unfolded_offset (i32) -> (v128)		; CHECK: .functype load_ext_v2i64_with_unfolded_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const 16		; CHECK-NEXT: i32.const 16
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i64x2.load32x2_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%q = ptrtoint <2 x i32>* %p to i32		%q = ptrtoint <2 x i32>* %p to i32
%r = add nsw i32 %q, 16		%r = add nsw i32 %q, 16
%s = inttoptr i32 %r to <2 x i32>*		%s = inttoptr i32 %r to <2 x i32>*
%v = load <2 x i32>, <2 x i32>* %s		%v = load <2 x i32>, <2 x i32>* %s
ret <2 x i32> %v		ret <2 x i32> %v
}		}


define <2 x i32> @load_ext_v2i64_with_unfolded_gep_offset(<2 x i32>* %p) {		define <2 x i32> @load_ext_v2i64_with_unfolded_gep_offset(<2 x i32>* %p) {
; CHECK-LABEL: load_ext_v2i64_with_unfolded_gep_offset:		; CHECK-LABEL: load_ext_v2i64_with_unfolded_gep_offset:
; CHECK: .functype load_ext_v2i64_with_unfolded_gep_offset (i32) -> (v128)		; CHECK: .functype load_ext_v2i64_with_unfolded_gep_offset (i32) -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: local.get 0		; CHECK-NEXT: local.get 0
; CHECK-NEXT: i32.const 8		; CHECK-NEXT: i32.const 8
; CHECK-NEXT: i32.add		; CHECK-NEXT: i32.add
; CHECK-NEXT: i64x2.load32x2_u 0		; CHECK-NEXT: v128.load64_zero 0
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = getelementptr <2 x i32>, <2 x i32>* %p, i32 1		%s = getelementptr <2 x i32>, <2 x i32>* %p, i32 1
%v = load <2 x i32>, <2 x i32>* %s		%v = load <2 x i32>, <2 x i32>* %s
ret <2 x i32> %v		ret <2 x i32> %v
}		}

define <2 x i64> @load_v2i64_from_numeric_address() {		define <2 x i64> @load_v2i64_from_numeric_address() {
; CHECK-LABEL: load_v2i64_from_numeric_address:		; CHECK-LABEL: load_v2i64_from_numeric_address:
		; CHECK-NEXT: # fallthrough-return
ret <2 x i64> %v2		ret <2 x i64> %v2
}		}

define <2 x i32> @load_ext_v2i64_from_numeric_address() {		define <2 x i32> @load_ext_v2i64_from_numeric_address() {
; CHECK-LABEL: load_ext_v2i64_from_numeric_address:		; CHECK-LABEL: load_ext_v2i64_from_numeric_address:
; CHECK: .functype load_ext_v2i64_from_numeric_address () -> (v128)		; CHECK: .functype load_ext_v2i64_from_numeric_address () -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: i64x2.load32x2_u 32		; CHECK-NEXT: v128.load64_zero 32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%s = inttoptr i32 32 to <2 x i32>*		%s = inttoptr i32 32 to <2 x i32>*
%v = load <2 x i32>, <2 x i32>* %s		%v = load <2 x i32>, <2 x i32>* %s
ret <2 x i32> %v		ret <2 x i32> %v
}		}

@gv_v2i64 = global <2 x i64> <i64 42, i64 42>		@gv_v2i64 = global <2 x i64> <i64 42, i64 42>
define <2 x i64> @load_v2i64_from_global_address() {		define <2 x i64> @load_v2i64_from_global_address() {
		; CHECK-NEXT: # fallthrough-return
ret <2 x i64> %v2		ret <2 x i64> %v2
}		}

define <2 x i32> @load_ext_v2i64_from_global_address() {		define <2 x i32> @load_ext_v2i64_from_global_address() {
; CHECK-LABEL: load_ext_v2i64_from_global_address:		; CHECK-LABEL: load_ext_v2i64_from_global_address:
; CHECK: .functype load_ext_v2i64_from_global_address () -> (v128)		; CHECK: .functype load_ext_v2i64_from_global_address () -> (v128)
; CHECK-NEXT: # %bb.0:		; CHECK-NEXT: # %bb.0:
; CHECK-NEXT: i32.const 0		; CHECK-NEXT: i32.const 0
; CHECK-NEXT: i64x2.load32x2_u gv_v2i32		; CHECK-NEXT: v128.load64_zero gv_v2i32
; CHECK-NEXT: # fallthrough-return		; CHECK-NEXT: # fallthrough-return
%v = load <2 x i32>, <2 x i32>* @gv_v2i32		%v = load <2 x i32>, <2 x i32>* @gv_v2i32
ret <2 x i32> %v		ret <2 x i32> %v
}		}

define void @store_v2i64(<2 x i64> %v, <2 x i64>* %p) {		define void @store_v2i64(<2 x i64> %v, <2 x i64>* %p) {
; CHECK-LABEL: store_v2i64:		; CHECK-LABEL: store_v2i64:
; CHECK: .functype store_v2i64 (v128, i32) -> ()		; CHECK: .functype store_v2i64 (v128, i32) -> ()

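The updated CHECK lines above replace `i64x2.load32x2_u` with `v128.load64_zero` for `<2 x i32>` loads. The following is a rough sketch of where the two source elements end up in the v128 under each instruction. It assumes little-endian linear memory and views the 128-bit result as four i32 lanes. The helper names are hypothetical models written for illustration, not LLVM or WebAssembly APIs.

```python
import struct


def load32x2_u_as_i32x4(mem, addr):
    """Hypothetical model of i64x2.load32x2_u (promotion-style legalization).

    Two u32 values are read and each is zero-extended into its own 64-bit
    lane, so when the v128 is viewed as i32x4 the data sits in lanes 0 and 2.
    """
    a, b = struct.unpack_from("<II", mem, addr)
    v128 = struct.pack("<QQ", a, b)  # two zero-extended 64-bit lanes
    return struct.unpack("<IIII", v128)


def load64_zero_as_i32x4(mem, addr):
    """Hypothetical model of v128.load64_zero (widening-style legalization).

    Eight bytes are copied into the low 64 bits and the high 64 bits are
    zeroed, so the two i32 values occupy lanes 0 and 1.
    """
    v128 = bytes(mem[addr:addr + 8]) + bytes(8)
    return struct.unpack("<IIII", v128)


# Linear memory holding the <2 x i32> value <7, 9> at byte offset 32,
# mirroring the numeric-address test above (i32.const 0 with offset 32).
mem = bytearray(64)
struct.pack_into("<ii", mem, 32, 7, 9)

print(load32x2_u_as_i32x4(mem, 32))   # (7, 0, 9, 0)
print(load64_zero_as_i32x4(mem, 32))  # (7, 9, 0, 0)
```

With widening, the payload stays in the low lanes and the high lanes can be ignored. With promotion, each element is spread into the low bits of a wider lane, which is what forces the extra extending and masking described in the summary.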
llvm/test/CodeGen/WebAssembly/simd-scalar-to-vector.ll

This file was deleted.

	; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s

	; Test that scalar_to_vector is lowered into a splat correctly.
	; This bugpoint-reduced code turns into the selection dag below.
	; TODO: find small test cases that produce scalar_to_vector dag nodes
	; to make this test more readable and comprehensive.

	; t0: ch = EntryToken
	; t32: i32,ch = load<(load 4 from `<2 x i16>* undef`, align 1)> t0, undef:i32, undef:i32
	; t33: v4i32 = scalar_to_vector t32
	; t34: v8i16 = bitcast t33
	; t51: i32 = extract_vector_elt t34, Constant:i32<0>
	; t52: ch = store<(store 2 into `<4 x i16>* undef`, align 1), trunc to i16> t32:1, t51, undef:i32, undef:i32
	; t50: i32 = extract_vector_elt t34, Constant:i32<1>
	; t53: ch = store<(store 2 into `<4 x i16>* undef` + 2, align 1), trunc to i16> t32:1, t50, undef:i32, undef:i32
	; t49: i32 = extract_vector_elt t34, Constant:i32<2>
	; t55: ch = store<(store 2 into `<4 x i16>* undef` + 4, align 1), trunc to i16> t32:1, t49, undef:i32, undef:i32
	; t48: i32 = extract_vector_elt t34, Constant:i32<3>
	; t57: ch = store<(store 2 into `<4 x i16>* undef` + 6, align 1), trunc to i16> t32:1, t48, undef:i32, undef:i32
	; t58: ch = TokenFactor t52, t53, t55, t57
	; t24: ch = WebAssemblyISD::RETURN t58

	target triple = "wasm32-unknown-unknown"

	; CHECK-LABEL: foo:
	; CHECK: i64x2.splat
	define void @foo() {
	entry:
	%a = load <2 x i16>, <2 x i16>* undef, align 1
	%b = shufflevector <2 x i16> %a, <2 x i16> undef, <8 x i32> <i32 0, i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef>
	%0 = bitcast <8 x i16> %b to <16 x i8>
	%shuffle.i214 = shufflevector <16 x i8> %0, <16 x i8> <i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 0, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef, i8 undef>, <16 x i32> <i32 0, i32 16, i32 1, i32 17, i32 2, i32 18, i32 3, i32 19, i32 4, i32 20, i32 5, i32 21, i32 6, i32 22, i32 7, i32 23>
	%1 = bitcast <16 x i8> %shuffle.i214 to <8 x i16>
	%add82 = add <8 x i16> %1, zeroinitializer
	%2 = select <8 x i1> undef, <8 x i16> undef, <8 x i16> %add82
	%3 = bitcast <8 x i16> %2 to <16 x i8>
	%shuffle.i204 = shufflevector <16 x i8> %3, <16 x i8> undef, <16 x i32> <i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14, i32 0, i32 2, i32 4, i32 6, i32 8, i32 10, i32 12, i32 14>
	%4 = bitcast <16 x i8> %shuffle.i204 to <8 x i16>
	%dst2.0.vec.extract = shufflevector <8 x i16> %4, <8 x i16> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
	store <4 x i16> %dst2.0.vec.extract, <4 x i16>* undef, align 1
	ret void
	}