This is an archive of the discontinued LLVM Phabricator instance.


Differential D84556

[WebAssembly] Remove intrinsics for SIMD widening ops
Closed, Public

Authored by tlively on Jul 24 2020, 1:54 PM.

Details

Reviewers

aheejin

Commits

rG11bb7eef4152: [WebAssembly] Remove intrinsics for SIMD widening ops

Summary

Instead, pattern match extends of extract_subvectors to generate
widening operations. Since extract_subvector is not a legal node, this
is implemented via a custom combine that recognizes extract_subvector
nodes before they are legalized. The combine produces custom ISD nodes
that are later pattern matched directly, just like the intrinsic was.
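
The decision the combine makes can be modelled outside of LLVM. The following is a minimal sketch in plain C, not the code from the patch: the enum and the function `select_widen_op` are hypothetical names, and the source type is identified only by its lane count (16 standing for i8x16, 8 for i16x8), which is an assumption made to keep the model small.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the four custom ISD nodes the summary mentions. */
enum widen_op {
  WIDEN_NONE, /* the combine does not fire */
  WIDEN_LOW_S,
  WIDEN_HIGH_S,
  WIDEN_LOW_U,
  WIDEN_HIGH_U
};

/* src_lanes: lane count of the 128-bit source (16 = i8x16, 8 = i16x8).
 * sub_lanes: lane count of the extracted subvector.
 * index:     constant starting lane of the extract_subvector.
 * is_sext:   true for a sign extend, false for a zero extend. */
static enum widen_op select_widen_op(unsigned src_lanes, unsigned sub_lanes,
                                     unsigned index, bool is_sext) {
  /* Only i8x16 and i16x8 sources are candidates in this model. */
  if (src_lanes != 16 && src_lanes != 8)
    return WIDEN_NONE;
  /* The extract must cover exactly half of the source lanes. */
  if (sub_lanes != src_lanes / 2)
    return WIDEN_NONE;
  /* The extract must start at the low half or at the high half. */
  if (index != 0 && index != src_lanes / 2)
    return WIDEN_NONE;

  bool is_low = index == 0;
  if (is_sext)
    return is_low ? WIDEN_LOW_S : WIDEN_HIGH_S;
  return is_low ? WIDEN_LOW_U : WIDEN_HIGH_U;
}
```

Any other starting index falls through to `WIDEN_NONE`, which corresponds to leaving the extend and the extract_subvector to the generic lowering.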

Also removes the clang builtins for these operations since the
instructions can now be generated from portable code sequences.
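
As a rough illustration of what such a portable sequence computes, here is a scalar reference model in plain C. It is a sketch under stated assumptions: the function names `widen_s_i16x8` and `widen_u_i16x8` are hypothetical, and arrays stand in for the 128-bit vector types; the header in this revision expresses the same idea with `__builtin_convertvector` on vector types instead.

```c
#include <stdint.h>

/* Sign-extend eight consecutive 8-bit lanes, starting at lane `first`
 * (0 for the low half, 8 for the high half), into eight 16-bit lanes. */
static void widen_s_i16x8(const int8_t src[16], unsigned first,
                          int16_t dst[8]) {
  for (unsigned i = 0; i < 8; ++i)
    dst[i] = (int16_t)src[first + i];
}

/* Zero-extend eight consecutive 8-bit lanes in the same way. */
static void widen_u_i16x8(const uint8_t src[16], unsigned first,
                          uint16_t dst[8]) {
  for (unsigned i = 0; i < 8; ++i)
    dst[i] = (uint16_t)src[first + i];
}
```

The lane count halves while the lane width doubles, so the result still fills a 128-bit vector.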

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tlively created this revision. Jul 24 2020, 1:54 PM

Herald added projects: Restricted Project, Restricted Project. Jul 24 2020, 1:54 PM

Herald added subscribers: llvm-commits, cfe-commits, sunfish and 4 others.

Harbormaster failed remote builds in B65634: Diff 280581! Jul 24 2020, 1:56 PM

srj added a subscriber: srj. Jul 24 2020, 2:24 PM

aheejin accepted this revision. Jul 24 2020, 7:05 PM

aheejin added inline comments.

llvm/test/CodeGen/WebAssembly/simd-widening.ll
114	It'd be clearer to say starting indices of these don't start with 0 or [lanecount - 1] so they can't be widened using `widen_low` or `widen_high` instructions. Question: Can we also widen these using shifts?

This revision is now accepted and ready to land. Jul 24 2020, 7:05 PM

tlively added inline comments. Jul 28 2020, 6:18 PM

llvm/test/CodeGen/WebAssembly/simd-widening.ll
114	Sure, since I didn't end up testing more patterns, I can make the comment more specific. Regarding shifts, I don't think it's possible to do widening with shifts because widening has to fundamentally change the number of lanes, which shifts can't do.

This revision was landed with ongoing or failed builds. Jul 28 2020, 6:26 PM

Closed by commit rG11bb7eef4152: [WebAssembly] Remove intrinsics for SIMD widening ops (authored by tlively).

This revision was automatically updated to reflect the committed changes.

tlively added a commit: rG11bb7eef4152: [WebAssembly] Remove intrinsics for SIMD widening ops.

aheejin added inline comments. Jul 28 2020, 6:55 PM

llvm/test/CodeGen/WebAssembly/simd-widening.ll
114	What I meant was, in case of i16x8->i32x4, the current code can widen i16x8 input vector with elements in the indices 0 to 4. If those elements are instead in 1 to 5, can we first shift that to 0~4 and widen it?

tlively added inline comments. Jul 28 2020, 7:59 PM

llvm/test/CodeGen/WebAssembly/simd-widening.ll
114	Oh gotcha. No, unfortunately I don't think that would work. The SIMD shift instructions shift bytes within lanes but they can't shift data into a different lane. Even if we used shifts on larger lanes to try to overcome that limitation, a 64x2 shift would still not be able to shift data from the high half of the vector to the low half or vice versa, which we would need to do to implement your suggestion.

aheejin added inline comments. Jul 28 2020, 8:16 PM

llvm/test/CodeGen/WebAssembly/simd-widening.ll
114	Ah right... I was confused about SIMD shifts. Thanks.
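
The lane-locality argument in this thread can be checked with a small scalar model. This is a sketch only: `struct v64x2` and `shr_u_64x2` are hypothetical names, and the model assumes the shift count is taken modulo the lane width.

```c
#include <stdint.h>

/* A 128-bit vector viewed as two 64-bit lanes; lane 0 is the low half. */
struct v64x2 {
  uint64_t lane[2];
};

/* Lane-wise logical right shift: each lane is shifted on its own, so bits
 * that leave one lane are dropped rather than carried into the other. */
static struct v64x2 shr_u_64x2(struct v64x2 v, unsigned count) {
  struct v64x2 r;
  count %= 64; /* assumed: count is reduced modulo the lane width */
  r.lane[0] = v.lane[0] >> count;
  r.lane[1] = v.lane[1] >> count;
  return r;
}
```

Moving i16x8 lanes 1 to 4 down to lanes 0 to 3 would require the 16 bits of lane 4, which sit at the bottom of the high 64-bit half, to arrive in lane 3 at the top of the low half. In this model a shift by 16 leaves the low lane unchanged when it starts out as zero, whatever the high lane holds.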

Revision Contents

Path	Size
clang/include/clang/Basic/BuiltinsWebAssembly.def	9 lines
clang/lib/CodeGen/CGBuiltin.cpp	34 lines
clang/lib/Headers/wasm_simd128.h	51 lines
clang/test/CodeGen/builtins-wasm.c	48 lines
llvm/include/llvm/IR/IntrinsicsWebAssembly.td	16 lines
llvm/lib/Target/WebAssembly/WebAssemblyISD.def	4 lines
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp	50 lines
llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td	14 lines
llvm/test/CodeGen/WebAssembly/simd-intrinsics.ll	80 lines
llvm/test/CodeGen/WebAssembly/simd-widening.ll	180 lines

Diff 281442

clang/include/clang/Basic/BuiltinsWebAssembly.def

	TARGET_BUILTIN(__builtin_wasm_trunc_saturate_s_i32x4_f32x4, "V4iV4f", "nc", "simd128")			TARGET_BUILTIN(__builtin_wasm_trunc_saturate_s_i32x4_f32x4, "V4iV4f", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_trunc_saturate_u_i32x4_f32x4, "V4iV4f", "nc", "simd128")			TARGET_BUILTIN(__builtin_wasm_trunc_saturate_u_i32x4_f32x4, "V4iV4f", "nc", "simd128")

	TARGET_BUILTIN(__builtin_wasm_narrow_s_i8x16_i16x8, "V16cV8sV8s", "nc", "simd128")			TARGET_BUILTIN(__builtin_wasm_narrow_s_i8x16_i16x8, "V16cV8sV8s", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_narrow_u_i8x16_i16x8, "V16cV8sV8s", "nc", "simd128")			TARGET_BUILTIN(__builtin_wasm_narrow_u_i8x16_i16x8, "V16cV8sV8s", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_narrow_s_i16x8_i32x4, "V8sV4iV4i", "nc", "simd128")			TARGET_BUILTIN(__builtin_wasm_narrow_s_i16x8_i32x4, "V8sV4iV4i", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_narrow_u_i16x8_i32x4, "V8sV4iV4i", "nc", "simd128")			TARGET_BUILTIN(__builtin_wasm_narrow_u_i16x8_i32x4, "V8sV4iV4i", "nc", "simd128")

	TARGET_BUILTIN(__builtin_wasm_widen_low_s_i16x8_i8x16, "V8sV16c", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_widen_high_s_i16x8_i8x16, "V8sV16c", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_widen_low_u_i16x8_i8x16, "V8sV16c", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_widen_high_u_i16x8_i8x16, "V8sV16c", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_widen_low_s_i32x4_i16x8, "V4iV8s", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_widen_high_s_i32x4_i16x8, "V4iV8s", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_widen_low_u_i32x4_i16x8, "V4iV8s", "nc", "simd128")
	TARGET_BUILTIN(__builtin_wasm_widen_high_u_i32x4_i16x8, "V4iV8s", "nc", "simd128")

	#undef BUILTIN			#undef BUILTIN
	#undef TARGET_BUILTIN			#undef TARGET_BUILTIN

clang/lib/CodeGen/CGBuiltin.cpp

case WebAssembly::BI__builtin_wasm_narrow_u_i16x8_i32x4:
break;		break;
default:		default:
llvm_unreachable("unexpected builtin ID");		llvm_unreachable("unexpected builtin ID");
}		}
Function *Callee =		Function *Callee =
CGM.getIntrinsic(IntNo, {ConvertType(E->getType()), Low->getType()});		CGM.getIntrinsic(IntNo, {ConvertType(E->getType()), Low->getType()});
return Builder.CreateCall(Callee, {Low, High});		return Builder.CreateCall(Callee, {Low, High});
}		}
case WebAssembly::BI__builtin_wasm_widen_low_s_i16x8_i8x16:
case WebAssembly::BI__builtin_wasm_widen_high_s_i16x8_i8x16:
case WebAssembly::BI__builtin_wasm_widen_low_u_i16x8_i8x16:
case WebAssembly::BI__builtin_wasm_widen_high_u_i16x8_i8x16:
case WebAssembly::BI__builtin_wasm_widen_low_s_i32x4_i16x8:
case WebAssembly::BI__builtin_wasm_widen_high_s_i32x4_i16x8:
case WebAssembly::BI__builtin_wasm_widen_low_u_i32x4_i16x8:
case WebAssembly::BI__builtin_wasm_widen_high_u_i32x4_i16x8: {
Value *Vec = EmitScalarExpr(E->getArg(0));
unsigned IntNo;
switch (BuiltinID) {
case WebAssembly::BI__builtin_wasm_widen_low_s_i16x8_i8x16:
case WebAssembly::BI__builtin_wasm_widen_low_s_i32x4_i16x8:
IntNo = Intrinsic::wasm_widen_low_signed;
break;
case WebAssembly::BI__builtin_wasm_widen_high_s_i16x8_i8x16:
case WebAssembly::BI__builtin_wasm_widen_high_s_i32x4_i16x8:
IntNo = Intrinsic::wasm_widen_high_signed;
break;
case WebAssembly::BI__builtin_wasm_widen_low_u_i16x8_i8x16:
case WebAssembly::BI__builtin_wasm_widen_low_u_i32x4_i16x8:
IntNo = Intrinsic::wasm_widen_low_unsigned;
break;
case WebAssembly::BI__builtin_wasm_widen_high_u_i16x8_i8x16:
case WebAssembly::BI__builtin_wasm_widen_high_u_i32x4_i16x8:
IntNo = Intrinsic::wasm_widen_high_unsigned;
break;
default:
llvm_unreachable("unexpected builtin ID");
}
Function *Callee =
CGM.getIntrinsic(IntNo, {ConvertType(E->getType()), Vec->getType()});
return Builder.CreateCall(Callee, Vec);
}
case WebAssembly::BI__builtin_wasm_shuffle_v8x16: {		case WebAssembly::BI__builtin_wasm_shuffle_v8x16: {
Value *Ops[18];		Value *Ops[18];
size_t OpIdx = 0;		size_t OpIdx = 0;
Ops[OpIdx++] = EmitScalarExpr(E->getArg(0));		Ops[OpIdx++] = EmitScalarExpr(E->getArg(0));
Ops[OpIdx++] = EmitScalarExpr(E->getArg(1));		Ops[OpIdx++] = EmitScalarExpr(E->getArg(1));
while (OpIdx < 18) {		while (OpIdx < 18) {
Optional<llvm::APSInt> LaneConst =		Optional<llvm::APSInt> LaneConst =
E->getArg(OpIdx)->getIntegerConstantExpr(getContext());		E->getArg(OpIdx)->getIntegerConstantExpr(getContext());

clang/lib/Headers/wasm_simd128.h

	typedef unsigned int __u32x4			typedef unsigned int __u32x4
	__attribute__((__vector_size__(16), __aligned__(16)));			__attribute__((__vector_size__(16), __aligned__(16)));
	typedef long long __i64x2 __attribute__((__vector_size__(16), __aligned__(16)));			typedef long long __i64x2 __attribute__((__vector_size__(16), __aligned__(16)));
	typedef unsigned long long __u64x2			typedef unsigned long long __u64x2
	__attribute__((__vector_size__(16), __aligned__(16)));			__attribute__((__vector_size__(16), __aligned__(16)));
	typedef float __f32x4 __attribute__((__vector_size__(16), __aligned__(16)));			typedef float __f32x4 __attribute__((__vector_size__(16), __aligned__(16)));
	typedef double __f64x2 __attribute__((__vector_size__(16), __aligned__(16)));			typedef double __f64x2 __attribute__((__vector_size__(16), __aligned__(16)));

				typedef signed char __i8x8 __attribute__((__vector_size__(8), __aligned__(8)));
				typedef unsigned char __u8x8
				__attribute__((__vector_size__(8), __aligned__(8)));
				typedef short __i16x4 __attribute__((__vector_size__(8), __aligned__(8)));
				typedef unsigned short __u16x4
				__attribute__((__vector_size__(8), __aligned__(8)));

	#define __DEFAULT_FN_ATTRS \			#define __DEFAULT_FN_ATTRS \
	__attribute__((__always_inline__, __nodebug__, __target__("simd128"), \			__attribute__((__always_inline__, __nodebug__, __target__("simd128"), \
	__min_vector_width__(128)))			__min_vector_width__(128)))

	#define __REQUIRE_CONSTANT(e) \			#define __REQUIRE_CONSTANT(e) \
	_Static_assert(__builtin_constant_p(e), "Expected constant")			_Static_assert(__builtin_constant_p(e), "Expected constant")

	static __inline__ v128_t __DEFAULT_FN_ATTRS wasm_v128_load(const void *__mem) {			static __inline__ v128_t __DEFAULT_FN_ATTRS wasm_v128_load(const void *__mem) {
	...
	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_u16x8_narrow_i32x4(v128_t __a, v128_t __b) {			wasm_u16x8_narrow_i32x4(v128_t __a, v128_t __b) {
	return (v128_t)__builtin_wasm_narrow_u_i16x8_i32x4((__i32x4)__a,			return (v128_t)__builtin_wasm_narrow_u_i16x8_i32x4((__i32x4)__a,
	(__i32x4)__b);			(__i32x4)__b);
	}			}

	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_i16x8_widen_low_i8x16(v128_t __a) {			wasm_i16x8_widen_low_i8x16(v128_t __a) {
	return (v128_t)__builtin_wasm_widen_low_s_i16x8_i8x16((__i8x16)__a);			return (v128_t) __builtin_convertvector(
				(__i8x8){((__i8x16)__a)[0], ((__i8x16)__a)[1], ((__i8x16)__a)[2],
				((__i8x16)__a)[3], ((__i8x16)__a)[4], ((__i8x16)__a)[5],
				((__i8x16)__a)[6], ((__i8x16)__a)[7]},
				__i16x8);
	}			}

	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_i16x8_widen_high_i8x16(v128_t __a) {			wasm_i16x8_widen_high_i8x16(v128_t __a) {
	return (v128_t)__builtin_wasm_widen_high_s_i16x8_i8x16((__i8x16)__a);			return (v128_t) __builtin_convertvector(
				(__i8x8){((__i8x16)__a)[8], ((__i8x16)__a)[9], ((__i8x16)__a)[10],
				((__i8x16)__a)[11], ((__i8x16)__a)[12], ((__i8x16)__a)[13],
				((__i8x16)__a)[14], ((__i8x16)__a)[15]},
				__i16x8);
	}			}

	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_i16x8_widen_low_u8x16(v128_t __a) {			wasm_i16x8_widen_low_u8x16(v128_t __a) {
	return (v128_t)__builtin_wasm_widen_low_u_i16x8_i8x16((__i8x16)__a);			return (v128_t) __builtin_convertvector(
				(__u8x8){((__u8x16)__a)[0], ((__u8x16)__a)[1], ((__u8x16)__a)[2],
				((__u8x16)__a)[3], ((__u8x16)__a)[4], ((__u8x16)__a)[5],
				((__u8x16)__a)[6], ((__u8x16)__a)[7]},
				__u16x8);
	}			}

	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_i16x8_widen_high_u8x16(v128_t __a) {			wasm_i16x8_widen_high_u8x16(v128_t __a) {
	return (v128_t)__builtin_wasm_widen_high_u_i16x8_i8x16((__i8x16)__a);			return (v128_t) __builtin_convertvector(
				(__u8x8){((__u8x16)__a)[8], ((__u8x16)__a)[9], ((__u8x16)__a)[10],
				((__u8x16)__a)[11], ((__u8x16)__a)[12], ((__u8x16)__a)[13],
				((__u8x16)__a)[14], ((__u8x16)__a)[15]},
				__u16x8);
	}			}

	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_i32x4_widen_low_i16x8(v128_t __a) {			wasm_i32x4_widen_low_i16x8(v128_t __a) {
	return (v128_t)__builtin_wasm_widen_low_s_i32x4_i16x8((__i16x8)__a);			return (v128_t) __builtin_convertvector(
				(__i16x4){((__i16x8)__a)[0], ((__i16x8)__a)[1], ((__i16x8)__a)[2],
				((__i16x8)__a)[3]},
				__i32x4);
	}			}

	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_i32x4_widen_high_i16x8(v128_t __a) {			wasm_i32x4_widen_high_i16x8(v128_t __a) {
	return (v128_t)__builtin_wasm_widen_high_s_i32x4_i16x8((__i16x8)__a);			return (v128_t) __builtin_convertvector(
				(__i16x4){((__i16x8)__a)[4], ((__i16x8)__a)[5], ((__i16x8)__a)[6],
				((__i16x8)__a)[7]},
				__i32x4);
	}			}

	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_i32x4_widen_low_u16x8(v128_t __a) {			wasm_i32x4_widen_low_u16x8(v128_t __a) {
	return (v128_t)__builtin_wasm_widen_low_u_i32x4_i16x8((__i16x8)__a);			return (v128_t) __builtin_convertvector(
				(__u16x4){((__u16x8)__a)[0], ((__u16x8)__a)[1], ((__u16x8)__a)[2],
				((__u16x8)__a)[3]},
				__u32x4);
	}			}

	static __inline__ v128_t __DEFAULT_FN_ATTRS			static __inline__ v128_t __DEFAULT_FN_ATTRS
	wasm_i32x4_widen_high_u16x8(v128_t __a) {			wasm_i32x4_widen_high_u16x8(v128_t __a) {
	return (v128_t)__builtin_wasm_widen_high_u_i32x4_i16x8((__i16x8)__a);			return (v128_t) __builtin_convertvector(
				(__u16x4){((__u16x8)__a)[4], ((__u16x8)__a)[5], ((__u16x8)__a)[6],
				((__u16x8)__a)[7]},
				__u32x4);
	}			}

	// Undefine helper macros			// Undefine helper macros
	#undef __DEFAULT_FN_ATTRS			#undef __DEFAULT_FN_ATTRS

	#endif // __WASM_SIMD128_H			#endif // __WASM_SIMD128_H

clang/test/CodeGen/builtins-wasm.c

	i16x8 narrow_u_i16x8_i32x4(i32x4 low, i32x4 high) {			i16x8 narrow_u_i16x8_i32x4(i32x4 low, i32x4 high) {
	return __builtin_wasm_narrow_u_i16x8_i32x4(low, high);			return __builtin_wasm_narrow_u_i16x8_i32x4(low, high);
	// WEBASSEMBLY: call <8 x i16> @llvm.wasm.narrow.unsigned.v8i16.v4i32(			// WEBASSEMBLY: call <8 x i16> @llvm.wasm.narrow.unsigned.v8i16.v4i32(
	// WEBASSEMBLY-SAME: <4 x i32> %low, <4 x i32> %high)			// WEBASSEMBLY-SAME: <4 x i32> %low, <4 x i32> %high)
	// WEBASSEMBLY: ret			// WEBASSEMBLY: ret
	}			}

	i16x8 widen_low_s_i16x8_i8x16(i8x16 v) {
	return __builtin_wasm_widen_low_s_i16x8_i8x16(v);
	// WEBASSEMBLY: call <8 x i16> @llvm.wasm.widen.low.signed.v8i16.v16i8(<16 x i8> %v)
	// WEBASSEMBLY: ret
	}

	i16x8 widen_high_s_i16x8_i8x16(i8x16 v) {
	return __builtin_wasm_widen_high_s_i16x8_i8x16(v);
	// WEBASSEMBLY: call <8 x i16> @llvm.wasm.widen.high.signed.v8i16.v16i8(<16 x i8> %v)
	// WEBASSEMBLY: ret
	}

	i16x8 widen_low_u_i16x8_i8x16(i8x16 v) {
	return __builtin_wasm_widen_low_u_i16x8_i8x16(v);
	// WEBASSEMBLY: call <8 x i16> @llvm.wasm.widen.low.unsigned.v8i16.v16i8(<16 x i8> %v)
	// WEBASSEMBLY: ret
	}

	i16x8 widen_high_u_i16x8_i8x16(i8x16 v) {
	return __builtin_wasm_widen_high_u_i16x8_i8x16(v);
	// WEBASSEMBLY: call <8 x i16> @llvm.wasm.widen.high.unsigned.v8i16.v16i8(<16 x i8> %v)
	// WEBASSEMBLY: ret
	}

	i32x4 widen_low_s_i32x4_i16x8(i16x8 v) {
	return __builtin_wasm_widen_low_s_i32x4_i16x8(v);
	// WEBASSEMBLY: call <4 x i32> @llvm.wasm.widen.low.signed.v4i32.v8i16(<8 x i16> %v)
	// WEBASSEMBLY: ret
	}

	i32x4 widen_high_s_i32x4_i16x8(i16x8 v) {
	return __builtin_wasm_widen_high_s_i32x4_i16x8(v);
	// WEBASSEMBLY: call <4 x i32> @llvm.wasm.widen.high.signed.v4i32.v8i16(<8 x i16> %v)
	// WEBASSEMBLY: ret
	}

	i32x4 widen_low_u_i32x4_i16x8(i16x8 v) {
	return __builtin_wasm_widen_low_u_i32x4_i16x8(v);
	// WEBASSEMBLY: call <4 x i32> @llvm.wasm.widen.low.unsigned.v4i32.v8i16(<8 x i16> %v)
	// WEBASSEMBLY: ret
	}

	i32x4 widen_high_u_i32x4_i16x8(i16x8 v) {
	return __builtin_wasm_widen_high_u_i32x4_i16x8(v);
	// WEBASSEMBLY: call <4 x i32> @llvm.wasm.widen.high.unsigned.v4i32.v8i16(<8 x i16> %v)
	// WEBASSEMBLY: ret
	}

	i8x16 swizzle_v8x16(i8x16 x, i8x16 y) {			i8x16 swizzle_v8x16(i8x16 x, i8x16 y) {
	return __builtin_wasm_swizzle_v8x16(x, y);			return __builtin_wasm_swizzle_v8x16(x, y);
	// WEBASSEMBLY: call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %x, <16 x i8> %y)			// WEBASSEMBLY: call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %x, <16 x i8> %y)
	}			}

	i8x16 shuffle(i8x16 x, i8x16 y) {			i8x16 shuffle(i8x16 x, i8x16 y) {
	return __builtin_wasm_shuffle_v8x16(x, y, 0, 1, 2, 3, 4, 5, 6, 7,			return __builtin_wasm_shuffle_v8x16(x, y, 0, 1, 2, 3, 4, 5, 6, 7,
	8, 9, 10, 11, 12, 13, 14, 15);			8, 9, 10, 11, 12, 13, 14, 15);
	// WEBASSEMBLY: call <16 x i8> @llvm.wasm.shuffle(<16 x i8> %x, <16 x i8> %y,			// WEBASSEMBLY: call <16 x i8> @llvm.wasm.shuffle(<16 x i8> %x, <16 x i8> %y,
	// WEBASSEMBLY-SAME: i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,			// WEBASSEMBLY-SAME: i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
	// WEBASSEMBLY-SAME: i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14,			// WEBASSEMBLY-SAME: i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14,
	// WEBASSEMBLY-SAME: i32 15			// WEBASSEMBLY-SAME: i32 15
	// WEBASSEMBLY-NEXT: ret			// WEBASSEMBLY-NEXT: ret
	}			}

llvm/include/llvm/IR/IntrinsicsWebAssembly.td

	def int_wasm_narrow_signed :			def int_wasm_narrow_signed :
	Intrinsic<[llvm_anyvector_ty],			Intrinsic<[llvm_anyvector_ty],
	[llvm_anyvector_ty, LLVMMatchType<1>],			[llvm_anyvector_ty, LLVMMatchType<1>],
	[IntrNoMem, IntrSpeculatable]>;			[IntrNoMem, IntrSpeculatable]>;
	def int_wasm_narrow_unsigned :			def int_wasm_narrow_unsigned :
	Intrinsic<[llvm_anyvector_ty],			Intrinsic<[llvm_anyvector_ty],
	[llvm_anyvector_ty, LLVMMatchType<1>],			[llvm_anyvector_ty, LLVMMatchType<1>],
	[IntrNoMem, IntrSpeculatable]>;			[IntrNoMem, IntrSpeculatable]>;
	def int_wasm_widen_low_signed :
	Intrinsic<[llvm_anyvector_ty],
	[llvm_anyvector_ty],
	[IntrNoMem, IntrSpeculatable]>;
	def int_wasm_widen_high_signed :
	Intrinsic<[llvm_anyvector_ty],
	[llvm_anyvector_ty],
	[IntrNoMem, IntrSpeculatable]>;
	def int_wasm_widen_low_unsigned :
	Intrinsic<[llvm_anyvector_ty],
	[llvm_anyvector_ty],
	[IntrNoMem, IntrSpeculatable]>;
	def int_wasm_widen_high_unsigned :
	Intrinsic<[llvm_anyvector_ty],
	[llvm_anyvector_ty],
	[IntrNoMem, IntrSpeculatable]>;

	// TODO: Replace these intrinsics with normal ISel patterns			// TODO: Replace these intrinsics with normal ISel patterns
	def int_wasm_pmin :			def int_wasm_pmin :
	Intrinsic<[llvm_anyvector_ty],			Intrinsic<[llvm_anyvector_ty],
	[LLVMMatchType<0>, LLVMMatchType<0>],			[LLVMMatchType<0>, LLVMMatchType<0>],
	[IntrNoMem, IntrSpeculatable]>;			[IntrNoMem, IntrSpeculatable]>;
	def int_wasm_pmax :			def int_wasm_pmax :
	Intrinsic<[llvm_anyvector_ty],			Intrinsic<[llvm_anyvector_ty],

llvm/lib/Target/WebAssembly/WebAssemblyISD.def

	HANDLE_NODETYPE(WrapperPIC)			HANDLE_NODETYPE(WrapperPIC)
	HANDLE_NODETYPE(BR_IF)			HANDLE_NODETYPE(BR_IF)
	HANDLE_NODETYPE(BR_TABLE)			HANDLE_NODETYPE(BR_TABLE)
	HANDLE_NODETYPE(SHUFFLE)			HANDLE_NODETYPE(SHUFFLE)
	HANDLE_NODETYPE(SWIZZLE)			HANDLE_NODETYPE(SWIZZLE)
	HANDLE_NODETYPE(VEC_SHL)			HANDLE_NODETYPE(VEC_SHL)
	HANDLE_NODETYPE(VEC_SHR_S)			HANDLE_NODETYPE(VEC_SHR_S)
	HANDLE_NODETYPE(VEC_SHR_U)			HANDLE_NODETYPE(VEC_SHR_U)
				HANDLE_NODETYPE(WIDEN_LOW_S)
				HANDLE_NODETYPE(WIDEN_LOW_U)
				HANDLE_NODETYPE(WIDEN_HIGH_S)
				HANDLE_NODETYPE(WIDEN_HIGH_U)
	HANDLE_NODETYPE(THROW)			HANDLE_NODETYPE(THROW)
	HANDLE_NODETYPE(MEMORY_COPY)			HANDLE_NODETYPE(MEMORY_COPY)
	HANDLE_NODETYPE(MEMORY_FILL)			HANDLE_NODETYPE(MEMORY_FILL)

	// Memory intrinsics			// Memory intrinsics
	HANDLE_MEM_NODETYPE(LOAD_SPLAT)			HANDLE_MEM_NODETYPE(LOAD_SPLAT)

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

if (Subtarget->hasSIMD128())
setOperationAction(Op, T, Expand);		setOperationAction(Op, T, Expand);
}		}

// SIMD-specific configuration		// SIMD-specific configuration
if (Subtarget->hasSIMD128()) {		if (Subtarget->hasSIMD128()) {
// Hoist bitcasts out of shuffles		// Hoist bitcasts out of shuffles
setTargetDAGCombine(ISD::VECTOR_SHUFFLE);		setTargetDAGCombine(ISD::VECTOR_SHUFFLE);

		// Combine extends of extract_subvectors into widening ops
		setTargetDAGCombine(ISD::SIGN_EXTEND);
		setTargetDAGCombine(ISD::ZERO_EXTEND);

// Support saturating add for i8x16 and i16x8		// Support saturating add for i8x16 and i16x8
for (auto Op : {ISD::SADDSAT, ISD::UADDSAT})		for (auto Op : {ISD::SADDSAT, ISD::UADDSAT})
for (auto T : {MVT::v16i8, MVT::v8i16})		for (auto T : {MVT::v16i8, MVT::v8i16})
setOperationAction(Op, T, Legal);		setOperationAction(Op, T, Legal);

// Support integer abs		// Support integer abs
for (auto T : {MVT::v16i8, MVT::v8i16, MVT::v4i32})		for (auto T : {MVT::v16i8, MVT::v8i16, MVT::v4i32})
setOperationAction(ISD::ABS, T, Legal);		setOperationAction(ISD::ABS, T, Legal);
...
performVECTOR_SHUFFLECombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
if (!SrcType.is128BitVector() ||		if (!SrcType.is128BitVector() ||
SrcType.getVectorNumElements() != DstType.getVectorNumElements())		SrcType.getVectorNumElements() != DstType.getVectorNumElements())
return SDValue();		return SDValue();
SDValue NewShuffle = DAG.getVectorShuffle(		SDValue NewShuffle = DAG.getVectorShuffle(
SrcType, SDLoc(N), CastOp, DAG.getUNDEF(SrcType), Shuffle->getMask());		SrcType, SDLoc(N), CastOp, DAG.getUNDEF(SrcType), Shuffle->getMask());
return DAG.getBitcast(DstType, NewShuffle);		return DAG.getBitcast(DstType, NewShuffle);
}		}

		static SDValue performVectorWidenCombine(SDNode *N,
		TargetLowering::DAGCombinerInfo &DCI) {
		auto &DAG = DCI.DAG;
		assert(N->getOpcode() == ISD::SIGN_EXTEND ||
		N->getOpcode() == ISD::ZERO_EXTEND);

		// Combine ({s,z}ext (extract_subvector src, i)) into a widening operation if
		// possible before the extract_subvector can be expanded.
		auto Extract = N->getOperand(0);
		if (Extract.getOpcode() != ISD::EXTRACT_SUBVECTOR)
		return SDValue();
		auto Source = Extract.getOperand(0);
		auto *IndexNode = dyn_cast<ConstantSDNode>(Extract.getOperand(1));
		if (IndexNode == nullptr)
		return SDValue();
		auto Index = IndexNode->getZExtValue();

		// Only v8i8 and v4i16 extracts can be widened, and only if the extracted
		// subvector is the low or high half of its source.
		EVT ResVT = N->getValueType(0);
		if (ResVT == MVT::v8i16) {
		if (Extract.getValueType() != MVT::v8i8 ||
		Source.getValueType() != MVT::v16i8 || (Index != 0 && Index != 8))
		return SDValue();
		} else if (ResVT == MVT::v4i32) {
		if (Extract.getValueType() != MVT::v4i16 ||
		Source.getValueType() != MVT::v8i16 || (Index != 0 && Index != 4))
		return SDValue();
		} else {
		return SDValue();
		}

		bool IsSext = N->getOpcode() == ISD::SIGN_EXTEND;
		bool IsLow = Index == 0;

		unsigned Op = IsSext ? (IsLow ? WebAssemblyISD::WIDEN_LOW_S
		: WebAssemblyISD::WIDEN_HIGH_S)
		: (IsLow ? WebAssemblyISD::WIDEN_LOW_U
		: WebAssemblyISD::WIDEN_HIGH_U);

		return DAG.getNode(Op, SDLoc(N), ResVT, Source);
		}

SDValue		SDValue
WebAssemblyTargetLowering::PerformDAGCombine(SDNode *N,		WebAssemblyTargetLowering::PerformDAGCombine(SDNode *N,
DAGCombinerInfo &DCI) const {		DAGCombinerInfo &DCI) const {
switch (N->getOpcode()) {		switch (N->getOpcode()) {
default:		default:
return SDValue();		return SDValue();
case ISD::VECTOR_SHUFFLE:		case ISD::VECTOR_SHUFFLE:
return performVECTOR_SHUFFLECombine(N, DCI);		return performVECTOR_SHUFFLECombine(N, DCI);
		case ISD::SIGN_EXTEND:
		case ISD::ZERO_EXTEND:
		return performVectorWidenCombine(N, DCI);
}		}
}		}

llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td

	// Lower llvm.wasm.trunc.saturate.* to saturating instructions			// Lower llvm.wasm.trunc.saturate.* to saturating instructions
	def : Pat<(v4i32 (int_wasm_trunc_saturate_signed (v4f32 V128:$src))),			def : Pat<(v4i32 (int_wasm_trunc_saturate_signed (v4f32 V128:$src))),
	(fp_to_sint_v4i32_v4f32 (v4f32 V128:$src))>;			(fp_to_sint_v4i32_v4f32 (v4f32 V128:$src))>;
	def : Pat<(v4i32 (int_wasm_trunc_saturate_unsigned (v4f32 V128:$src))),			def : Pat<(v4i32 (int_wasm_trunc_saturate_unsigned (v4f32 V128:$src))),
	(fp_to_uint_v4i32_v4f32 (v4f32 V128:$src))>;			(fp_to_uint_v4i32_v4f32 (v4f32 V128:$src))>;

	// Widening operations			// Widening operations
				def widen_t : SDTypeProfile<1, 1, [SDTCisVec<0>, SDTCisVec<1>]>;
				def widen_low_s : SDNode<"WebAssemblyISD::WIDEN_LOW_S", widen_t>;
				def widen_high_s : SDNode<"WebAssemblyISD::WIDEN_HIGH_S", widen_t>;
				def widen_low_u : SDNode<"WebAssemblyISD::WIDEN_LOW_U", widen_t>;
				def widen_high_u : SDNode<"WebAssemblyISD::WIDEN_HIGH_U", widen_t>;

	multiclass SIMDWiden<ValueType vec_t, string vec, ValueType arg_t, string arg,			multiclass SIMDWiden<ValueType vec_t, string vec, ValueType arg_t, string arg,
	bits<32> baseInst> {			bits<32> baseInst> {
	defm "" : SIMDConvert<vec_t, arg_t, int_wasm_widen_low_signed,			defm "" : SIMDConvert<vec_t, arg_t, widen_low_s,
	vec#".widen_low_"#arg#"_s", baseInst>;			vec#".widen_low_"#arg#"_s", baseInst>;
	defm "" : SIMDConvert<vec_t, arg_t, int_wasm_widen_high_signed,			defm "" : SIMDConvert<vec_t, arg_t, widen_high_s,
	vec#".widen_high_"#arg#"_s", !add(baseInst, 1)>;			vec#".widen_high_"#arg#"_s", !add(baseInst, 1)>;
	defm "" : SIMDConvert<vec_t, arg_t, int_wasm_widen_low_unsigned,			defm "" : SIMDConvert<vec_t, arg_t, widen_low_u,
	vec#".widen_low_"#arg#"_u", !add(baseInst, 2)>;			vec#".widen_low_"#arg#"_u", !add(baseInst, 2)>;
	defm "" : SIMDConvert<vec_t, arg_t, int_wasm_widen_high_unsigned,			defm "" : SIMDConvert<vec_t, arg_t, widen_high_u,
	vec#".widen_high_"#arg#"_u", !add(baseInst, 3)>;			vec#".widen_high_"#arg#"_u", !add(baseInst, 3)>;
	}			}

	defm "" : SIMDWiden<v8i16, "i16x8", v16i8, "i8x16", 135>;			defm "" : SIMDWiden<v8i16, "i16x8", v16i8, "i8x16", 135>;
	defm "" : SIMDWiden<v4i32, "i32x4", v8i16, "i16x8", 167>;			defm "" : SIMDWiden<v4i32, "i32x4", v8i16, "i16x8", 167>;

	// Narrowing operations			// Narrowing operations
	multiclass SIMDNarrow<ValueType vec_t, string vec, ValueType arg_t, string arg,			multiclass SIMDNarrow<ValueType vec_t, string vec, ValueType arg_t, string arg,

llvm/test/CodeGen/WebAssembly/simd-intrinsics.ll

	declare <8 x i16> @llvm.wasm.narrow.unsigned.v8i16.v4i32(<4 x i32>, <4 x i32>)			declare <8 x i16> @llvm.wasm.narrow.unsigned.v8i16.v4i32(<4 x i32>, <4 x i32>)
	define <8 x i16> @narrow_unsigned_v8i16(<4 x i32> %low, <4 x i32> %high) {			define <8 x i16> @narrow_unsigned_v8i16(<4 x i32> %low, <4 x i32> %high) {
	%a = call <8 x i16> @llvm.wasm.narrow.unsigned.v8i16.v4i32(			%a = call <8 x i16> @llvm.wasm.narrow.unsigned.v8i16.v4i32(
	<4 x i32> %low, <4 x i32> %high			<4 x i32> %low, <4 x i32> %high
	)			)
	ret <8 x i16> %a			ret <8 x i16> %a
	}			}

	; CHECK-LABEL: widen_low_signed_v8i16:
	; SIMD128-NEXT: .functype widen_low_signed_v8i16 (v128) -> (v128){{$}}
	; SIMD128-NEXT: i16x8.widen_low_i8x16_s $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <8 x i16> @llvm.wasm.widen.low.signed.v8i16.v16i8(<16 x i8>)
	define <8 x i16> @widen_low_signed_v8i16(<16 x i8> %v) {
	%a = call <8 x i16> @llvm.wasm.widen.low.signed.v8i16.v16i8(<16 x i8> %v)
	ret <8 x i16> %a
	}

	; CHECK-LABEL: widen_high_signed_v8i16:
	; SIMD128-NEXT: .functype widen_high_signed_v8i16 (v128) -> (v128){{$}}
	; SIMD128-NEXT: i16x8.widen_high_i8x16_s $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <8 x i16> @llvm.wasm.widen.high.signed.v8i16.v16i8(<16 x i8>)
	define <8 x i16> @widen_high_signed_v8i16(<16 x i8> %v) {
	%a = call <8 x i16> @llvm.wasm.widen.high.signed.v8i16.v16i8(<16 x i8> %v)
	ret <8 x i16> %a
	}

	; CHECK-LABEL: widen_low_unsigned_v8i16:
	; SIMD128-NEXT: .functype widen_low_unsigned_v8i16 (v128) -> (v128){{$}}
	; SIMD128-NEXT: i16x8.widen_low_i8x16_u $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <8 x i16> @llvm.wasm.widen.low.unsigned.v8i16.v16i8(<16 x i8>)
	define <8 x i16> @widen_low_unsigned_v8i16(<16 x i8> %v) {
	%a = call <8 x i16> @llvm.wasm.widen.low.unsigned.v8i16.v16i8(<16 x i8> %v)
	ret <8 x i16> %a
	}

	; CHECK-LABEL: widen_high_unsigned_v8i16:
	; SIMD128-NEXT: .functype widen_high_unsigned_v8i16 (v128) -> (v128){{$}}
	; SIMD128-NEXT: i16x8.widen_high_i8x16_u $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <8 x i16> @llvm.wasm.widen.high.unsigned.v8i16.v16i8(<16 x i8>)
	define <8 x i16> @widen_high_unsigned_v8i16(<16 x i8> %v) {
	%a = call <8 x i16> @llvm.wasm.widen.high.unsigned.v8i16.v16i8(<16 x i8> %v)
	ret <8 x i16> %a
	}

	; ==============================================================================			; ==============================================================================
	; 4 x i32			; 4 x i32
	; ==============================================================================			; ==============================================================================
	; CHECK-LABEL: dot:			; CHECK-LABEL: dot:
	; SIMD128-NEXT: .functype dot (v128, v128) -> (v128){{$}}			; SIMD128-NEXT: .functype dot (v128, v128) -> (v128){{$}}
	; SIMD128-NEXT: i32x4.dot_i16x8_s $push[[R:[0-9]+]]=, $0, $1{{$}}			; SIMD128-NEXT: i32x4.dot_i16x8_s $push[[R:[0-9]+]]=, $0, $1{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}			; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <4 x i32> @llvm.wasm.dot(<8 x i16>, <8 x i16>)			declare <4 x i32> @llvm.wasm.dot(<8 x i16>, <8 x i16>)
	...
	; SIMD128-NEXT: i32x4.trunc_sat_f32x4_u $push[[R:[0-9]+]]=, $0			; SIMD128-NEXT: i32x4.trunc_sat_f32x4_u $push[[R:[0-9]+]]=, $0
	; SIMD128-NEXT: return $pop[[R]]			; SIMD128-NEXT: return $pop[[R]]
	declare <4 x i32> @llvm.wasm.trunc.saturate.unsigned.v4i32.v4f32(<4 x float>)			declare <4 x i32> @llvm.wasm.trunc.saturate.unsigned.v4i32.v4f32(<4 x float>)
	define <4 x i32> @trunc_sat_u_v4i32(<4 x float> %x) {			define <4 x i32> @trunc_sat_u_v4i32(<4 x float> %x) {
	%a = call <4 x i32> @llvm.wasm.trunc.saturate.unsigned.v4i32.v4f32(<4 x float> %x)			%a = call <4 x i32> @llvm.wasm.trunc.saturate.unsigned.v4i32.v4f32(<4 x float> %x)
	ret <4 x i32> %a			ret <4 x i32> %a
	}			}

	; CHECK-LABEL: widen_low_signed_v4i32:
	; SIMD128-NEXT: .functype widen_low_signed_v4i32 (v128) -> (v128){{$}}
	; SIMD128-NEXT: i32x4.widen_low_i16x8_s $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <4 x i32> @llvm.wasm.widen.low.signed.v4i32.v8i16(<8 x i16>)
	define <4 x i32> @widen_low_signed_v4i32(<8 x i16> %v) {
	%a = call <4 x i32> @llvm.wasm.widen.low.signed.v4i32.v8i16(<8 x i16> %v)
	ret <4 x i32> %a
	}

	; CHECK-LABEL: widen_high_signed_v4i32:
	; SIMD128-NEXT: .functype widen_high_signed_v4i32 (v128) -> (v128){{$}}
	; SIMD128-NEXT: i32x4.widen_high_i16x8_s $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <4 x i32> @llvm.wasm.widen.high.signed.v4i32.v8i16(<8 x i16>)
	define <4 x i32> @widen_high_signed_v4i32(<8 x i16> %v) {
	%a = call <4 x i32> @llvm.wasm.widen.high.signed.v4i32.v8i16(<8 x i16> %v)
	ret <4 x i32> %a
	}

	; CHECK-LABEL: widen_low_unsigned_v4i32:
	; SIMD128-NEXT: .functype widen_low_unsigned_v4i32 (v128) -> (v128){{$}}
	; SIMD128-NEXT: i32x4.widen_low_i16x8_u $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <4 x i32> @llvm.wasm.widen.low.unsigned.v4i32.v8i16(<8 x i16>)
	define <4 x i32> @widen_low_unsigned_v4i32(<8 x i16> %v) {
	%a = call <4 x i32> @llvm.wasm.widen.low.unsigned.v4i32.v8i16(<8 x i16> %v)
	ret <4 x i32> %a
	}

	; CHECK-LABEL: widen_high_unsigned_v4i32:
	; SIMD128-NEXT: .functype widen_high_unsigned_v4i32 (v128) -> (v128){{$}}
	; SIMD128-NEXT: i32x4.widen_high_i16x8_u $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare <4 x i32> @llvm.wasm.widen.high.unsigned.v4i32.v8i16(<8 x i16>)
	define <4 x i32> @widen_high_unsigned_v4i32(<8 x i16> %v) {
	%a = call <4 x i32> @llvm.wasm.widen.high.unsigned.v4i32.v8i16(<8 x i16> %v)
	ret <4 x i32> %a
	}

	; ==============================================================================
	; 2 x i64
	; ==============================================================================
	; CHECK-LABEL: any_v2i64:
	; SIMD128-NEXT: .functype any_v2i64 (v128) -> (i32){{$}}
	; SIMD128-NEXT: i64x2.any_true $push[[R:[0-9]+]]=, $0{{$}}
	; SIMD128-NEXT: return $pop[[R]]{{$}}
	declare i32 @llvm.wasm.anytrue.v2i64(<2 x i64>)
	[... 224 unchanged lines not shown ...]

llvm/test/CodeGen/WebAssembly/simd-widening.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mattr=+simd128 | FileCheck %s

				;; Test that SIMD widening operations can be successfully selected

				target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
				target triple = "wasm32-unknown-unknown"

				define <8 x i16> @widen_low_i8x16_s(<16 x i8> %v) {
				; CHECK-LABEL: widen_low_i8x16_s:
				; CHECK: .functype widen_low_i8x16_s (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i16x8.widen_low_i8x16_s
				; CHECK-NEXT: # fallthrough-return
				%low = shufflevector <16 x i8> %v, <16 x i8> undef,
				<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				%widened = sext <8 x i8> %low to <8 x i16>
				ret <8 x i16> %widened
				}

				define <8 x i16> @widen_low_i8x16_u(<16 x i8> %v) {
				; CHECK-LABEL: widen_low_i8x16_u:
				; CHECK: .functype widen_low_i8x16_u (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i16x8.widen_low_i8x16_u
				; CHECK-NEXT: # fallthrough-return
				%low = shufflevector <16 x i8> %v, <16 x i8> undef,
				<8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
				%widened = zext <8 x i8> %low to <8 x i16>
				ret <8 x i16> %widened
				}

				define <8 x i16> @widen_high_i8x16_s(<16 x i8> %v) {
				; CHECK-LABEL: widen_high_i8x16_s:
				; CHECK: .functype widen_high_i8x16_s (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i16x8.widen_high_i8x16_s
				; CHECK-NEXT: # fallthrough-return
				%low = shufflevector <16 x i8> %v, <16 x i8> undef,
				<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				%widened = sext <8 x i8> %low to <8 x i16>
				ret <8 x i16> %widened
				}

				define <8 x i16> @widen_high_i8x16_u(<16 x i8> %v) {
				; CHECK-LABEL: widen_high_i8x16_u:
				; CHECK: .functype widen_high_i8x16_u (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i16x8.widen_high_i8x16_u
				; CHECK-NEXT: # fallthrough-return
				%low = shufflevector <16 x i8> %v, <16 x i8> undef,
				<8 x i32> <i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15>
				%widened = zext <8 x i8> %low to <8 x i16>
				ret <8 x i16> %widened
				}

				define <4 x i32> @widen_low_i16x8_s(<8 x i16> %v) {
				; CHECK-LABEL: widen_low_i16x8_s:
				; CHECK: .functype widen_low_i16x8_s (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i32x4.widen_low_i16x8_s
				; CHECK-NEXT: # fallthrough-return
				%low = shufflevector <8 x i16> %v, <8 x i16> undef,
				<4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%widened = sext <4 x i16> %low to <4 x i32>
				ret <4 x i32> %widened
				}

				define <4 x i32> @widen_low_i16x8_u(<8 x i16> %v) {
				; CHECK-LABEL: widen_low_i16x8_u:
				; CHECK: .functype widen_low_i16x8_u (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i32x4.widen_low_i16x8_u
				; CHECK-NEXT: # fallthrough-return
				%low = shufflevector <8 x i16> %v, <8 x i16> undef,
				<4 x i32> <i32 0, i32 1, i32 2, i32 3>
				%widened = zext <4 x i16> %low to <4 x i32>
				ret <4 x i32> %widened
				}

				define <4 x i32> @widen_high_i16x8_s(<8 x i16> %v) {
				; CHECK-LABEL: widen_high_i16x8_s:
				; CHECK: .functype widen_high_i16x8_s (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i32x4.widen_high_i16x8_s
				; CHECK-NEXT: # fallthrough-return
				%low = shufflevector <8 x i16> %v, <8 x i16> undef,
				<4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%widened = sext <4 x i16> %low to <4 x i32>
				ret <4 x i32> %widened
				}

				define <4 x i32> @widen_high_i16x8_u(<8 x i16> %v) {
				; CHECK-LABEL: widen_high_i16x8_u:
				; CHECK: .functype widen_high_i16x8_u (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i32x4.widen_high_i16x8_u
				; CHECK-NEXT: # fallthrough-return
				%low = shufflevector <8 x i16> %v, <8 x i16> undef,
				<4 x i32> <i32 4, i32 5, i32 6, i32 7>
				%widened = zext <4 x i16> %low to <4 x i32>
				ret <4 x i32> %widened
				}

				;; Also test that similar patterns with offsets not corresponding to
				;; the low or high half are correctly expanded.
				aheejin: It'd be clearer to say starting indices of these don't start with 0 or [lanecount - 1] so they can't be widened using `widen_low` or `widen_high` instructions. Question: Can we also widen these using shifts?
				tlively (author): Sure, since I didn't end up testing more patterns, I can make the comment more specific. Regarding shifts, I don't think it's possible to do widening with shifts because widening has to fundamentally change the number of lanes, which shifts can't do.
				aheejin: What I meant was, in case of i16x8->i32x4, the current code can widen i16x8 input vector with elements in the indices 0 to 4. If those elements are instead in 1 to 5, can we first shift that to 0~4 and widen it?
				tlively (author): Oh gotcha. No, unfortunately I don't think that would work. The SIMD shift instructions shift bytes within lanes but they can't shift data into a different lane. Even if we used shifts on larger lanes to try to overcome that limitation, a 64x2 shift would still not be able to shift data from the high half of the vector to the low half or vice versa, which we would need to do to implement your suggestion.
				aheejin: Ah right... I was confused about SIMD shifts. Thanks.
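
The point made in the thread above can be checked with a small scalar model. This is only a sketch under stated assumptions: the type aliases and function names below (`widen_low_s`, `widen_lanes_1_to_4_s`, `shl_i16x8`, `shr_u_i64x2`) are hypothetical helpers written for illustration, not LLVM or WebAssembly APIs, and the model assumes little-endian lane order and shift counts taken modulo the lane width.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using I16x8 = std::array<std::int16_t, 8>;
using I32x4 = std::array<std::int32_t, 4>;

// Model of i32x4.widen_low_i16x8_s: sign-extend lanes 0..3.
I32x4 widen_low_s(const I16x8& v) {
    I32x4 out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = v[i];
    return out;
}

// Hypothetical operation from the review question: sign-extend lanes 1..4.
I32x4 widen_lanes_1_to_4_s(const I16x8& v) {
    I32x4 out{};
    for (std::size_t i = 0; i < 4; ++i) out[i] = v[i + 1];
    return out;
}

// Model of i16x8.shl: each lane is shifted on its own, so bits shifted
// out of a lane are dropped instead of entering the neighboring lane.
I16x8 shl_i16x8(const I16x8& v, unsigned amount) {
    I16x8 out{};
    for (std::size_t i = 0; i < v.size(); ++i) {
        const unsigned bits = static_cast<std::uint16_t>(v[i]);
        out[i] = static_cast<std::int16_t>(
            static_cast<std::uint16_t>(bits << (amount % 16)));
    }
    return out;
}

// Model of i64x2.shr_u on the same 128 bits: the two 64-bit halves are
// shifted independently, so nothing moves from the high half to the low half.
I16x8 shr_u_i64x2(const I16x8& v, unsigned amount) {
    I16x8 out{};
    for (std::size_t half = 0; half < 2; ++half) {
        std::uint64_t bits = 0;
        for (std::size_t i = 0; i < 4; ++i)
            bits |= static_cast<std::uint64_t>(
                        static_cast<std::uint16_t>(v[half * 4 + i])) << (16 * i);
        bits >>= (amount % 64);
        for (std::size_t i = 0; i < 4; ++i)
            out[half * 4 + i] = static_cast<std::int16_t>(
                static_cast<std::uint16_t>(bits >> (16 * i)));
    }
    return out;
}

// Self-check: neither kind of shift followed by widen_low yields lanes 1..4.
void demo() {
    const I16x8 v{10, 20, 30, 40, 50, 60, 70, 80};
    const I32x4 wanted = widen_lanes_1_to_4_s(v);  // lanes 1..4 of v
    for (unsigned a = 0; a < 16; ++a)
        assert(widen_low_s(shl_i16x8(v, a)) != wanted);
    // The 64-bit shift moves lanes 1..3 down, but lane 4 sits in the
    // high half and cannot cross over, so the last lane becomes 0.
    assert((widen_low_s(shr_u_i64x2(v, 16)) == I32x4{20, 30, 40, 0}));
}
```

In this model the 64-bit shift gets three of the four lanes into place, which is why the suggestion looks plausible at first; the fourth lane is the one that would have to cross the boundary between the two halves.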

				define <8 x i16> @widen_lowish_i8x16_s(<16 x i8> %v) {
				; CHECK-LABEL: widen_lowish_i8x16_s:
				; CHECK: .functype widen_lowish_i8x16_s (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i8x16.extract_lane_u 1
				; CHECK-NEXT: i16x8.splat
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i8x16.extract_lane_u 2
				; CHECK-NEXT: i16x8.replace_lane 1
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i8x16.extract_lane_u 3
				; CHECK-NEXT: i16x8.replace_lane 2
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i8x16.extract_lane_u 4
				; CHECK-NEXT: i16x8.replace_lane 3
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i8x16.extract_lane_u 5
				; CHECK-NEXT: i16x8.replace_lane 4
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i8x16.extract_lane_u 6
				; CHECK-NEXT: i16x8.replace_lane 5
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i8x16.extract_lane_u 7
				; CHECK-NEXT: i16x8.replace_lane 6
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i8x16.extract_lane_u 8
				; CHECK-NEXT: i16x8.replace_lane 7
				; CHECK-NEXT: i32.const 8
				; CHECK-NEXT: i16x8.shl
				; CHECK-NEXT: i32.const 8
				; CHECK-NEXT: i16x8.shr_s
				; CHECK-NEXT: # fallthrough-return
				%lowish = shufflevector <16 x i8> %v, <16 x i8> undef,
				<8 x i32> <i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8>
				%widened = sext <8 x i8> %lowish to <8 x i16>
				ret <8 x i16> %widened
				}

				define <4 x i32> @widen_lowish_i16x8_s(<8 x i16> %v) {
				; CHECK-LABEL: widen_lowish_i16x8_s:
				; CHECK: .functype widen_lowish_i16x8_s (v128) -> (v128)
				; CHECK-NEXT: # %bb.0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i16x8.extract_lane_u 1
				; CHECK-NEXT: i32x4.splat
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i16x8.extract_lane_u 2
				; CHECK-NEXT: i32x4.replace_lane 1
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i16x8.extract_lane_u 3
				; CHECK-NEXT: i32x4.replace_lane 2
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i16x8.extract_lane_u 4
				; CHECK-NEXT: i32x4.replace_lane 3
				; CHECK-NEXT: i32.const 16
				; CHECK-NEXT: i32x4.shl
				; CHECK-NEXT: i32.const 16
				; CHECK-NEXT: i32x4.shr_s
				; CHECK-NEXT: # fallthrough-return
				%lowish = shufflevector <8 x i16> %v, <8 x i16> undef,
				<4 x i32> <i32 1, i32 2, i32 3, i32 4>
				%widened = sext <4 x i16> %lowish to <4 x i32>
				ret <4 x i32> %widened
				}
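
The expected output of the two `widen_lowish` tests above ends with a `shl` followed by a `shr_s` by the same amount. A scalar model of that tail may make the reason clearer: the lanes are gathered with `extract_lane_u`, which zero-extends, so the sign has to be restored afterwards by shifting the narrow value to the top of the wide lane and arithmetically shifting it back. The sketch below is an illustration only; `sext_via_shifts` and `widen_lanes_1_to_8_s` are hypothetical names, and the casts rely on two's-complement conversion and arithmetic right shift of negative values, which C++20 guarantees and earlier standards leave implementation-defined.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using I8x16 = std::array<std::int8_t, 16>;
using I16x8 = std::array<std::int16_t, 8>;

// Sign-extend one byte that was placed zero-extended into a 16-bit lane.
std::int16_t sext_via_shifts(std::uint8_t byte) {
    std::uint16_t lane = byte;                      // extract_lane_u result
    lane = static_cast<std::uint16_t>(lane << 8);   // i16x8.shl by 8
    // i16x8.shr_s by 8: the arithmetic shift replicates the sign bit.
    return static_cast<std::int16_t>(static_cast<std::int16_t>(lane) >> 8);
}

// Scalar model of the expansion checked in widen_lowish_i8x16_s:
// take lanes 1..8 of the input and sign-extend each to 16 bits.
I16x8 widen_lanes_1_to_8_s(const I8x16& v) {
    I16x8 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = sext_via_shifts(static_cast<std::uint8_t>(v[i + 1]));
    return out;
}

// Self-check against plain integer sign extension.
void demo() {
    const I8x16 v{-128, -1, 0, 1, 127, -2, 100, -100,
                  5, 6, 7, 8, 9, 10, 11, 12};
    const I16x8 got = widen_lanes_1_to_8_s(v);
    for (std::size_t i = 0; i < got.size(); ++i)
        assert(got[i] == static_cast<std::int16_t>(v[i + 1]));
}
```

The `i16x8->i32x4` test follows the same shape with a shift amount of 16 instead of 8.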

Revision Contents

Diff 281442

clang/include/clang/Basic/BuiltinsWebAssembly.def

clang/lib/CodeGen/CGBuiltin.cpp

clang/lib/Headers/wasm_simd128.h

clang/test/CodeGen/builtins-wasm.c

llvm/include/llvm/IR/IntrinsicsWebAssembly.td

llvm/lib/Target/WebAssembly/WebAssemblyISD.def

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td

llvm/test/CodeGen/WebAssembly/simd-intrinsics.ll

llvm/test/CodeGen/WebAssembly/simd-widening.ll
