This is an archive of the discontinued LLVM Phabricator instance.

Differential D66983

[WebAssembly] Add wasm-specific vector shuffle builtin and intrinsic
ClosedPublic

Authored by tlively on Aug 29 2019, 5:15 PM.

Details

Reviewers

aheejin
dschuff

Commits

rG8e3e56f2a367: [WebAssembly] Add wasm-specific vector shuffle builtin and intrinsic

Summary

Although using __builtin_shufflevector and the shufflevector
instruction works fine, they are not opaque to the optimizer. As a
result, DAGCombine can potentially reduce the number of shuffles and
change the shuffle masks. This is unexpected behavior for users of the
WebAssembly SIMD intrinsics who have crafted their shuffles to
optimize the code generated by engines. This patch solves the problem
by adding a new shuffle intrinsic that is opaque to the optimizers in
line with the decision of the WebAssembly SIMD contributors at
https://github.com/WebAssembly/simd/issues/196#issuecomment-622494748. In
the future we may implement custom DAG combines to properly optimize
shuffles and replace this solution.
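
As a rough illustration of what "reducing the number of shuffles and changing the shuffle masks" means, here is a minimal scalar sketch in C. It is not the DAGCombiner code. The helper names `shuffle_bytes` and `merge_masks` are hypothetical, and the only assumption taken from WebAssembly is that byte indices 0-15 select from the first operand and 16-31 from the second.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a 16-byte, two-input shuffle (hypothetical helper).
   Indices 0-15 pick bytes from a, indices 16-31 pick bytes from b.
   out must not alias a or b. */
static void shuffle_bytes(const uint8_t a[16], const uint8_t b[16],
                          const uint8_t mask[16], uint8_t out[16]) {
  for (int i = 0; i < 16; ++i) {
    assert(mask[i] < 32);
    out[i] = mask[i] < 16 ? a[mask[i]] : b[mask[i] - 16];
  }
}

/* Fold shuffle(shuffle(a, b, inner), b, outer) into a single mask over
   (a, b). This is the kind of rewrite an optimizer may perform: the result
   is the same, but the mask the user wrote is no longer the mask emitted. */
static void merge_masks(const uint8_t inner[16], const uint8_t outer[16],
                        uint8_t merged[16]) {
  for (int i = 0; i < 16; ++i) {
    /* Lanes taken from the intermediate vector are looked up through the
       inner mask; lanes taken directly from b keep their index. */
    merged[i] = outer[i] < 16 ? inner[outer[i]] : outer[i];
  }
}
```

The merged mask computes the same bytes as the two chained shuffles, which is why such a combine is legal. Whether the merged mask is as cheap for an engine to execute as the two original masks is a separate question, and that is the concern this revision addresses.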

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tlively created this revision. · Aug 29 2019, 5:15 PM

Herald added projects: Restricted Project, Restricted Project. · Aug 29 2019, 5:15 PM

Herald added subscribers: llvm-commits, cfe-commits, sunfish and 3 others.

Harbormaster completed remote builds in B37549: Diff 218001. · Aug 29 2019, 5:17 PM

Can you say what leads wasm users to maintain such an expectation when, for example, ARM users and x86 users don't?

I think the expectation is that if you use an intrinsics header that has an intrinsic for each machine instruction, that each intrinsic call should result in exactly one machine instruction with the same arguments you passed to it. If the vector operation is created some other way (e.g. autovectorization or even __builtin_shufflevector) then I agree that there doesn't necessarily have to be that expectation.

Oh, interesting I didn't notice that those are implementations of the target-specific intrinsics. I wonder if they do that so they can implement the intrinsics on hardware that doesn't support the corresponding instruction? In a situation other than that I think I'd be surprised if I used a builtin for a particular instruction and got something else.

x86 is aggressive about optimizing shuffles no matter where they came from. FWIW, InstCombine has a general rule that it's not supposed to create a shuffle mask that didn't already exist in the IR, except for special things like identity masks that would allow the shuffle to be removed. DAG combine is supposed to check with TargetLowering::isShuffleMaskLegal.

In D66983#1651977, @dschuff wrote:

Oh, interesting I didn't notice that those are implementations of the target-specific intrinsics. I wonder if they do that so they can implement the intrinsics on hardware that doesn't support the corresponding instruction? In a situation other than that I think I'd be surprised if I used a builtin for a particular instruction and got something else.

One of the reasons is a belief that LLVM's optimizations are desirable, and that if there are cases where LLVM's optimizations make code worse, it's a bug in LLVM which should be fixed. Do you have any such cases?

I should say, I myself am sympathetic to the argument that if you write _mm_shuffle_ps, you might really want SHUFPS, and not __builtin_shufflevector, but I'm not aware of any target in LLVM that works this way, which is something to consider.

In fact, the argument that _mm_shuffle_ps is SHUFPS seems like it ought to be stronger for x86 than wasm, because x86 has about 100 different shuffle instructions each with their own opinion, while wasm just has one shuffle instruction that just does everything.

tlively added a child revision: D67020: [WebAssembly] Add SIMD QFMA/QFMS. · Aug 30 2019, 2:21 PM

The context for this CL is https://github.com/emscripten-core/emscripten/issues/9340. The code that does the undesirable optimization is around llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18776. I think it is a reasonable assumption that what you put in is what you get out, especially if you're trying to trigger particular WebAssembly engine behavior that the backend should not really be reasoning about. Now perhaps users should not be reasoning about it either, but given that they are I think this patch is reasonable.

In D66983#1653226, @tlively wrote:

The code that does the undesirable optimization is around llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:18776.

Now line 18776 is a blank line :) You can take a permalink from the https://github.com/llvm/llvm-project repo.

Link to DAGCombiner.cpp code: https://github.com/llvm/llvm-project/blob/802aab5/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L19014

In D66983#1651981, @craig.topper wrote:

DAG combine is supposed to check with TargetLowering::isShuffleMaskLegal.

In @tlively's example, it is DAGCombine, and it does check isShuffleMaskLegal. However for wasm, it appears that's not enough -- in wasm, all shuffle masks are legal, because you can do any shuffle in a single wasm instruction. This makes it tricky, because the user here is aiming for an x86-like cost model, but the LLVM wasm backend doesn't have any x86-specific knowledge, so it just tells DAGCombine to form any shuffle it sees fit.

I wonder if it would make sense to introduce a counterpart to isShuffleMaskLegal which, instead of returning a bool, returns a cost value. Then we could teach the wasm backend about certain shuffle patterns which are known to be fast across multiple architectures, and teach DAGCombine to check whether the new shuffle it wants to create is actually cheaper than the one it's replacing. Thoughts?

tlively removed a child revision: D67020: [WebAssembly] Add SIMD QFMA/QFMS. · Aug 30 2019, 5:05 PM

@sunfish That sounds like a useful mechanism to me. I'd be happy to work on that next week. Is the consensus that we should not merge this change and instead pursue that idea?

Yeah I think that can be a more generalized solution too.

Abandoning in favor of @sunfish's idea for introducing a cost mechanism for shuffle masks in DAGCombiner.

tlively edited the summary of this revision. · Oct 17 2019, 10:32 AM

I'm re-opening this revision. After discussion on https://github.com/WebAssembly/simd/issues/118, there is clear consensus that we do not want to break WebAssembly's abstraction and consider underlying platforms, so shuffles should not be merged without some sort of modification to the spec to serve as a platform-agnostic indication that the merged shuffle will be better.

aheejin added inline comments. · Oct 18 2019, 4:11 PM

llvm/include/llvm/IR/IntrinsicsWebAssembly.td
143	i32 is bigger than `ImmLaneIdx32`. Should we model this into something smaller, like i8? What happens if we specify an index greater than 31? (I think this question also applies to other intrinsics and builtins. I don't think it matters a lot given that all integers are larger than lane types though.)
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1369	This looks rather straightforward... Can't we do this in TableGen?

This issue is still not resolved.

tlively edited the summary of this revision. · May 7 2020, 5:07 PM

Rebase and update intrinsics header

Rebase, update intrinsics header, and address comment

Ok, this is ready for review for real this time. The WebAssembly SIMD contributors have decided that this is an appropriate direction to go in, and we are leaving the door open for future improvements.

llvm/include/llvm/IR/IntrinsicsWebAssembly.td
143	It turns out that it would have been an ISel failure. I fixed this by replacing out-of-bounds indices with 0.
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1369	No, I don't know of a simple way to handle undef lanes in TableGen. I could look into using custom patterns and transform nodes, but in the end this code is probably simpler the way it is.
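
The handling described in the two replies above (undef lanes and out-of-bounds indices both become 0) can be sketched as a small scalar model in C. This is only an illustration of the rule, not the lowering code itself. `LANE_UNDEF` and `sanitize_shuffle_mask` are hypothetical names, and representing an undef operand as -1 is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an undef mask operand in this model. */
#define LANE_UNDEF (-1)

/* Replace undef or out-of-range lane indices with 0 and pass every other
   index through unchanged. Valid byte indices for a two-input 16-byte
   shuffle are 0..31. */
static void sanitize_shuffle_mask(const int in[16], int out[16]) {
  assert(in != NULL && out != NULL);
  for (size_t i = 0; i < 16; ++i) {
    /* The unsigned comparison also rejects negative values. */
    if (in[i] == LANE_UNDEF || (unsigned)in[i] >= 32u)
      out[i] = 0;
    else
      out[i] = in[i];
  }
}
```

With this model, a mask ending in 35 ends in 0 after sanitizing, and a mask of 1, fourteen undefs, 2 becomes 1, fourteen zeros, 2. Those are the same masks the two new tests in simd-intrinsics.ll check for.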

Harbormaster failed remote builds in B56123: Diff 262795! · May 7 2020, 7:00 PM

Harbormaster failed remote builds in B56124: Diff 262798! · May 7 2020, 7:32 PM

aheejin accepted this revision. · May 8 2020, 6:33 PM

This revision is now accepted and ready to land. · May 8 2020, 6:33 PM

Closed by commit rG8e3e56f2a367: [WebAssembly] Add wasm-specific vector shuffle builtin and intrinsic (authored by tlively). · May 11 2020, 10:12 AM

This revision was automatically updated to reflect the committed changes.

penzn added a reverting change: D140773: [WebAssembly] Use `shufflevector` for shuffle. · Dec 29 2022, 11:12 PM

Revision Contents

Path                                                       Size
clang/include/clang/Basic/BuiltinsWebAssembly.def          1 line
clang/lib/CodeGen/CGBuiltin.cpp                            14 lines
clang/lib/Headers/wasm_simd128.h                           28 lines
clang/test/CodeGen/builtins-wasm.c                         9 lines
llvm/include/llvm/IR/IntrinsicsWebAssembly.td              9 lines
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp    18 lines
llvm/test/CodeGen/WebAssembly/simd-intrinsics.ll           30 lines

Diff 263214

clang/include/clang/Basic/BuiltinsWebAssembly.def

[...]
  TARGET_BUILTIN(__builtin_wasm_min_u_i32x4, "V4iV4iV4i", "nc", "simd128")
  TARGET_BUILTIN(__builtin_wasm_max_s_i32x4, "V4iV4iV4i", "nc", "simd128")
  TARGET_BUILTIN(__builtin_wasm_max_u_i32x4, "V4iV4iV4i", "nc", "simd128")

  TARGET_BUILTIN(__builtin_wasm_avgr_u_i8x16, "V16cV16cV16c", "nc", "simd128")
  TARGET_BUILTIN(__builtin_wasm_avgr_u_i16x8, "V8sV8sV8s", "nc", "simd128")

  TARGET_BUILTIN(__builtin_wasm_bitselect, "V4iV4iV4iV4i", "nc", "simd128")
+ TARGET_BUILTIN(__builtin_wasm_shuffle_v8x16, "V16cV16cV16cIiIiIiIiIiIiIiIiIiIiIiIiIiIiIiIi", "nc", "simd128")

  TARGET_BUILTIN(__builtin_wasm_any_true_i8x16, "iV16c", "nc", "simd128")
  TARGET_BUILTIN(__builtin_wasm_any_true_i16x8, "iV8s", "nc", "simd128")
  TARGET_BUILTIN(__builtin_wasm_any_true_i32x4, "iV4i", "nc", "simd128")
  TARGET_BUILTIN(__builtin_wasm_any_true_i64x2, "iV2LLi", "nc", "unimplemented-simd128")
  TARGET_BUILTIN(__builtin_wasm_all_true_i8x16, "iV16c", "nc", "simd128")
  TARGET_BUILTIN(__builtin_wasm_all_true_i16x8, "iV8s", "nc", "simd128")
  TARGET_BUILTIN(__builtin_wasm_all_true_i32x4, "iV4i", "nc", "simd128")
[...]

clang/lib/CodeGen/CGBuiltin.cpp

[...] (context: case WebAssembly::BI__builtin_wasm_widen_high_u_i32x4_i16x8:)
      break;
    default:
      llvm_unreachable("unexpected builtin ID");
    }
    Function *Callee =
        CGM.getIntrinsic(IntNo, {ConvertType(E->getType()), Vec->getType()});
    return Builder.CreateCall(Callee, Vec);
  }
+ case WebAssembly::BI__builtin_wasm_shuffle_v8x16: {
+   Value *Ops[18];
+   size_t OpIdx = 0;
+   Ops[OpIdx++] = EmitScalarExpr(E->getArg(0));
+   Ops[OpIdx++] = EmitScalarExpr(E->getArg(1));
+   while (OpIdx < 18) {
+     llvm::APSInt LaneConst;
+     if (!E->getArg(OpIdx)->isIntegerConstantExpr(LaneConst, getContext()))
+       llvm_unreachable("Constant arg isn't actually constant?");
+     Ops[OpIdx++] = llvm::ConstantInt::get(getLLVMContext(), LaneConst);
+   }
+   Function *Callee = CGM.getIntrinsic(Intrinsic::wasm_shuffle);
+   return Builder.CreateCall(Callee, Ops);
+ }
  default:
    return nullptr;
  }
  }

  static std::pair<Intrinsic::ID, unsigned>
  getIntrinsicForHexagonNonGCCBuiltin(unsigned BuiltinID) {
    struct Info {
[...]

clang/lib/Headers/wasm_simd128.h

[...]
  static __inline__ v128_t __DEFAULT_FN_ATTRS
  wasm_f32x4_convert_u32x4(v128_t __a) {
    return (v128_t) __builtin_convertvector((__u32x4)__a, __f32x4);
  }

  #define wasm_v8x16_shuffle(__a, __b, __c0, __c1, __c2, __c3, __c4, __c5, __c6, \
                             __c7, __c8, __c9, __c10, __c11, __c12, __c13, \
                             __c14, __c15) \
-   ((v128_t)(__builtin_shufflevector( \
-       (__u8x16)(__a), (__u8x16)(__b), __c0, __c1, __c2, __c3, __c4, __c5, \
-       __c6, __c7, __c8, __c9, __c10, __c11, __c12, __c13, __c14, __c15)))
+   ((v128_t)__builtin_wasm_shuffle_v8x16( \
+       (__i8x16)(__a), (__i8x16)(__b), __c0, __c1, __c2, __c3, __c4, __c5, \
+       __c6, __c7, __c8, __c9, __c10, __c11, __c12, __c13, __c14, __c15))

  #define wasm_v16x8_shuffle(__a, __b, __c0, __c1, __c2, __c3, __c4, __c5, __c6, \
                             __c7) \
-   ((v128_t)(__builtin_shufflevector((__u16x8)(__a), (__u16x8)(__b), __c0, \
-                                     __c1, __c2, __c3, __c4, __c5, __c6, \
-                                     __c7)))
+   ((v128_t)__builtin_wasm_shuffle_v8x16( \
+       (__i8x16)(__a), (__i8x16)(__b), __c0 * 2, __c0 * 2 + 1, __c1 * 2, \
+       __c1 * 2 + 1, __c2 * 2, __c2 * 2 + 1, __c3 * 2, __c3 * 2 + 1, __c4 * 2, \
+       __c4 * 2 + 1, __c5 * 2, __c5 * 2 + 1, __c6 * 2, __c6 * 2 + 1, __c7 * 2, \
+       __c7 * 2 + 1))

  #define wasm_v32x4_shuffle(__a, __b, __c0, __c1, __c2, __c3) \
-   ((v128_t)(__builtin_shufflevector((__u32x4)(__a), (__u32x4)(__b), __c0, \
-                                     __c1, __c2, __c3)))
+   ((v128_t)__builtin_wasm_shuffle_v8x16( \
+       (__i8x16)(__a), (__i8x16)(__b), __c0 * 4, __c0 * 4 + 1, __c0 * 4 + 2, \
+       __c0 * 4 + 3, __c1 * 4, __c1 * 4 + 1, __c1 * 4 + 2, __c1 * 4 + 3, \
+       __c2 * 4, __c2 * 4 + 1, __c2 * 4 + 2, __c2 * 4 + 3, __c3 * 4, \
+       __c3 * 4 + 1, __c3 * 4 + 2, __c3 * 4 + 3))

  #define wasm_v64x2_shuffle(__a, __b, __c0, __c1) \
-   ((v128_t)( \
-       __builtin_shufflevector((__u64x2)(__a), (__u64x2)(__b), __c0, __c1)))
+   ((v128_t)__builtin_wasm_shuffle_v8x16( \
+       (__i8x16)(__a), (__i8x16)(__b), __c0 * 8, __c0 * 8 + 1, __c0 * 8 + 2, \
+       __c0 * 8 + 3, __c0 * 8 + 4, __c0 * 8 + 5, __c0 * 8 + 6, __c0 * 8 + 7, \
+       __c1 * 8, __c1 * 8 + 1, __c1 * 8 + 2, __c1 * 8 + 3, __c1 * 8 + 4, \
+       __c1 * 8 + 5, __c1 * 8 + 6, __c1 * 8 + 7))

  #ifdef __wasm_unimplemented_simd128__

  static __inline__ v128_t __DEFAULT_FN_ATTRS wasm_v8x16_swizzle(v128_t __a,
                                                                 v128_t __b) {
    return (v128_t)__builtin_wasm_swizzle_v8x16((__i8x16)__a, (__i8x16)__b);
  }

[...]
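
The wider-lane macros in this header all lower to the single byte shuffle by expanding each lane index `c` into the byte indices `c * w, c * w + 1, ..., c * w + (w - 1)`, where `w` is the lane width in bytes. A minimal C sketch of that arithmetic follows. The function name `expand_lane_indices` is hypothetical and exists only to make the expansion easy to check on a host machine; the header itself does this with macro arguments at compile time.

```c
#include <assert.h>

/* Expand lane indices for lanes of lane_bytes bytes (2, 4, or 8) into the
   16 byte indices expected by a v8x16 shuffle. lanes must hold
   16 / lane_bytes entries. */
static void expand_lane_indices(const int *lanes, int lane_bytes,
                                int bytes[16]) {
  assert(lane_bytes == 2 || lane_bytes == 4 || lane_bytes == 8);
  int num_lanes = 16 / lane_bytes;
  for (int l = 0; l < num_lanes; ++l) {
    for (int k = 0; k < lane_bytes; ++k) {
      /* Byte k of the selected lane. */
      bytes[l * lane_bytes + k] = lanes[l] * lane_bytes + k;
    }
  }
}
```

For example, the 32-bit lane selection 0, 4, 1, 5 (interleaving the low lanes of both inputs) expands to bytes 0-3, 16-19, 4-7, 20-23, since lane 4 is the first lane of the second operand and therefore starts at byte 16.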

clang/test/CodeGen/builtins-wasm.c

[...] (context: i32x4 widen_high_u_i32x4_i16x8(i16x8 v) {)
    return __builtin_wasm_widen_high_u_i32x4_i16x8(v);
    // WEBASSEMBLY: call <4 x i32> @llvm.wasm.widen.high.unsigned.v4i32.v8i16(<8 x i16> %v)
    // WEBASSEMBLY: ret
  }

  i8x16 swizzle_v8x16(i8x16 x, i8x16 y) {
    return __builtin_wasm_swizzle_v8x16(x, y);
    // WEBASSEMBLY: call <16 x i8> @llvm.wasm.swizzle(<16 x i8> %x, <16 x i8> %y)
+ }
+
+ i8x16 shuffle(i8x16 x, i8x16 y) {
+   return __builtin_wasm_shuffle_v8x16(x, y, 0, 1, 2, 3, 4, 5, 6, 7,
+                                       8, 9, 10, 11, 12, 13, 14, 15);
+   // WEBASSEMBLY: call <16 x i8> @llvm.wasm.shuffle(<16 x i8> %x, <16 x i8> %y,
+   // WEBASSEMBLY-SAME: i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
+   // WEBASSEMBLY-SAME: i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14,
+   // WEBASSEMBLY-SAME: i32 15
    // WEBASSEMBLY-NEXT: ret
  }

llvm/include/llvm/IR/IntrinsicsWebAssembly.td

[...]
  //===----------------------------------------------------------------------===//
  // SIMD intrinsics
  //===----------------------------------------------------------------------===//

  def int_wasm_swizzle :
    Intrinsic<[llvm_v16i8_ty],
              [llvm_v16i8_ty, llvm_v16i8_ty],
              [IntrNoMem, IntrSpeculatable]>;
+ def int_wasm_shuffle :
+   Intrinsic<[llvm_v16i8_ty],
+             [llvm_v16i8_ty, llvm_v16i8_ty, llvm_i32_ty, llvm_i32_ty,
+              llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty,
+              llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty,
+              llvm_i32_ty, llvm_i32_ty, llvm_i32_ty, llvm_i32_ty],
+             [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_sub_saturate_signed :
    Intrinsic<[llvm_anyvector_ty],
              [LLVMMatchType<0>, LLVMMatchType<0>],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_sub_saturate_unsigned :
    Intrinsic<[llvm_anyvector_ty],
              [LLVMMatchType<0>, LLVMMatchType<0>],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_avgr_unsigned :
    Intrinsic<[llvm_anyvector_ty],
              [LLVMMatchType<0>, LLVMMatchType<0>],
              [IntrNoMem, IntrSpeculatable]>;

  def int_wasm_bitselect :
    Intrinsic<[llvm_anyvector_ty],
              [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_anytrue :
    Intrinsic<[llvm_i32_ty],
              [llvm_anyvector_ty],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_alltrue :
    Intrinsic<[llvm_i32_ty],
              [llvm_anyvector_ty],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_bitmask :
    Intrinsic<[llvm_i32_ty],
              [llvm_anyvector_ty],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_qfma :
    Intrinsic<[llvm_anyvector_ty],
              [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_qfms :
    Intrinsic<[llvm_anyvector_ty],
              [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_dot :
    Intrinsic<[llvm_v4i32_ty],
[19 lines not shown; context: def int_wasm_widen_low_unsigned :]
    Intrinsic<[llvm_anyvector_ty],
              [llvm_anyvector_ty],
              [IntrNoMem, IntrSpeculatable]>;
  def int_wasm_widen_high_unsigned :
    Intrinsic<[llvm_anyvector_ty],
              [llvm_anyvector_ty],
              [IntrNoMem, IntrSpeculatable]>;


  //===----------------------------------------------------------------------===//
  // Bulk memory intrinsics
  //===----------------------------------------------------------------------===//

  def int_wasm_memory_init :
    Intrinsic<[],
              [llvm_i32_ty, llvm_i32_ty, llvm_ptr_ty, llvm_i32_ty, llvm_i32_ty],
              [IntrWriteMem, IntrInaccessibleMemOrArgMemOnly, WriteOnly<2>,
[26 lines not shown]

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

[...] (context: case Intrinsic::wasm_throw: {)
      return DAG.getNode(WebAssemblyISD::THROW, DL,
                         MVT::Other, // outchain type
                         {
                             Op.getOperand(0), // inchain
                             SymNode,          // exception symbol
                             Op.getOperand(3)  // thrown value
                         });
    }

+   case Intrinsic::wasm_shuffle: {
+     // Drop in-chain and replace undefs, but otherwise pass through unchanged
+     SDValue Ops[18];
+     size_t OpIdx = 0;
+     Ops[OpIdx++] = Op.getOperand(1);
+     Ops[OpIdx++] = Op.getOperand(2);
+     while (OpIdx < 18) {
+       const SDValue &MaskIdx = Op.getOperand(OpIdx + 1);
+       if (MaskIdx.isUndef() ||
+           cast<ConstantSDNode>(MaskIdx.getNode())->getZExtValue() >= 32) {
+         Ops[OpIdx++] = DAG.getConstant(0, DL, MVT::i32);
+       } else {
+         Ops[OpIdx++] = MaskIdx;
+       }
+     }
+     return DAG.getNode(WebAssemblyISD::SHUFFLE, DL, Op.getValueType(), Ops);
+   }
    }
  }

  SDValue
  WebAssemblyTargetLowering::LowerSIGN_EXTEND_INREG(SDValue Op,
                                                    SelectionDAG &DAG) const {
    SDLoc DL(Op);
    // If sign extension operations are disabled, allow sext_inreg only if operand
[...]

llvm/test/CodeGen/WebAssembly/simd-intrinsics.ll

[...]
  declare <16 x i8> @llvm.wasm.narrow.unsigned.v16i8.v8i16(<8 x i16>, <8 x i16>)
  define <16 x i8> @narrow_unsigned_v16i8(<8 x i16> %low, <8 x i16> %high) {
    %a = call <16 x i8> @llvm.wasm.narrow.unsigned.v16i8.v8i16(
      <8 x i16> %low, <8 x i16> %high
    )
    ret <16 x i8> %a
  }

+ ; CHECK-LABEL: shuffle_v16i8:
+ ; NO-SIMD128-NOT: v8x16
+ ; SIMD128-NEXT: .functype shuffle_v16i8 (v128, v128) -> (v128){{$}}
+ ; SIMD128-NEXT: v8x16.shuffle $push[[R:[0-9]+]]=, $0, $1,
+ ; SIMD128-SAME: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 0{{$}}
+ ; SIMD128-NEXT: return $pop[[R]]{{$}}
+ declare <16 x i8> @llvm.wasm.shuffle(
+   <16 x i8>, <16 x i8>, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32, i32,
+   i32, i32, i32, i32, i32)
+ define <16 x i8> @shuffle_v16i8(<16 x i8> %x, <16 x i8> %y) {
+   %res = call <16 x i8> @llvm.wasm.shuffle(<16 x i8> %x, <16 x i8> %y,
+     i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7,
+     i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 35)
+   ret <16 x i8> %res
+ }
+
+ ; CHECK-LABEL: shuffle_undef_v16i8:
+ ; NO-SIMD128-NOT: v8x16
+ ; SIMD128-NEXT: .functype shuffle_undef_v16i8 (v128, v128) -> (v128){{$}}
+ ; SIMD128-NEXT: v8x16.shuffle $push[[R:[0-9]+]]=, $0, $1,
+ ; SIMD128-SAME: 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2{{$}}
+ ; SIMD128-NEXT: return $pop[[R]]{{$}}
+ define <16 x i8> @shuffle_undef_v16i8(<16 x i8> %x, <16 x i8> %y) {
+   %res = call <16 x i8> @llvm.wasm.shuffle(<16 x i8> %x, <16 x i8> %y,
+     i32 1, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
+     i32 undef, i32 undef, i32 undef, i32 undef, i32 undef, i32 undef,
+     i32 undef, i32 undef, i32 undef, i32 2)
+   ret <16 x i8> %res
+ }
+
  ; ==============================================================================
  ; 8 x i16
  ; ==============================================================================
  ; CHECK-LABEL: add_sat_s_v8i16:
  ; SIMD128-NEXT: .functype add_sat_s_v8i16 (v128, v128) -> (v128){{$}}
  ; SIMD128-NEXT: i16x8.add_saturate_s $push[[R:[0-9]+]]=, $0, $1{{$}}
  ; SIMD128-NEXT: return $pop[[R]]{{$}}
  declare <8 x i16> @llvm.sadd.sat.v8i16(<8 x i16>, <8 x i16>)
[...]
