This is an archive of the discontinued LLVM Phabricator instance.

Differential D106721

[AArch64] Implement MSVC mulh and umulh builtins and corresponding IR level intrinsics
Closed, Public

Authored by mstorsjo on Jul 23 2021, 4:01 PM.

Details

Reviewers

rnk
STL_MSFT
efriedma
DavidSpickett

Commits

rGcc3affd8b020: [clang] [MSVC] Implement __mulh and __umulh builtins for aarch64

Summary

This seems to work, but does it need testing beyond the most trivial
happy-path cases, or any tests other than these? (I looked at the
existing tests for MSVC ARM64 intrinsics, but they are all very
sparsely tested.)

This should fix PR51128.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

mstorsjo created this revision. Jul 23 2021, 4:01 PM

Herald added subscribers: danielkiss, hiraditya, kristof.beyls. Jul 23 2021, 4:01 PM

mstorsjo requested review of this revision. Jul 23 2021, 4:01 PM

Herald added projects: Restricted Project, Restricted Project. Jul 23 2021, 4:01 PM

Do we need LLVM intrinsics for these? For the x86 equivalents, we just generate mul i128.

Harbormaster completed remote builds in B115968: Diff 361373. Jul 23 2021, 4:42 PM

In D106721#2901728, @efriedma wrote:

Do we need LLVM intrinsics for these? For the x86 equivalents, we just generate mul i128.

I worry that LLVM will end up generating a call to compiler-rt to implement i128 arithmetic, especially in unoptimized builds, and currently nothing autolinks clang_rt.builtins-${arch}.lib in an MSVC environment. We can do that; I filed an issue for it, and it just needs to get done.

I also worry that without the intrinsic, LLVM will more often than not fail to match the pattern here. The issues users filed about x86 rotate instructions come to mind: LLVM sometimes failed to produce the desired rot instructions.

We won't call compiler-rt for i128 multiply on aarch64. It's not worthwhile under any circumstances.

Pattern-matching 64/64->128 multiply in SelectionDAG legalization has been reliable in practice, as far as I know. And even if it misses due to weird behavior in instcombine or something like that, we only end up with one or two extra mla instructions, so not a big deal.
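
One way to read the "one or two extra mla instructions" remark is through the arithmetic of a generic 128-bit multiply split into 64-bit halves. The sketch below is an illustration only, not a description of what LLVM actually emits: the helper names (`mul128_high_generic`, `sext_high`, `check_generic_matches_direct`) are hypothetical, and it assumes a compiler that provides `__int128` (GCC or Clang on a 64-bit target). The high half of the product is the high half of `a_lo * b_lo` plus two cross terms, each of which is a single multiply-add. When the operands are zero-extended, both cross terms are zero; when they are sign-extended, the cross terms carry the sign correction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bits 64..127 of (a_hi:a_lo) * (b_hi:b_lo) mod 2^128. */
static uint64_t mul128_high_generic(uint64_t a_hi, uint64_t a_lo,
                                    uint64_t b_hi, uint64_t b_lo) {
    /* High half of the 64x64 -> 128 product of the low halves. */
    uint64_t high = (uint64_t)(((unsigned __int128)a_lo * b_lo) >> 64);
    /* Cross terms: each one is a 64-bit multiply-add, wrapping mod 2^64. */
    high += a_hi * b_lo;
    high += a_lo * b_hi;
    return high;
}

/* Upper half of a sign-extended 64-bit value: all ones if negative, else 0. */
static uint64_t sext_high(int64_t v) { return v < 0 ? UINT64_MAX : 0; }

/* Compare the generic form against a direct signed 128-bit multiply.
   Right-shifting a negative __int128 is implementation-defined in C;
   GCC and Clang perform an arithmetic shift. */
static void check_generic_matches_direct(int64_t a, int64_t b) {
    uint64_t generic = mul128_high_generic(sext_high(a), (uint64_t)a,
                                           sext_high(b), (uint64_t)b);
    uint64_t direct = (uint64_t)(((__int128)a * (__int128)b) >> 64);
    assert(generic == direct);
}
```

With zero upper halves the function reduces to the unsigned high multiply alone, which is the case the pattern match is meant to catch.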

In D106721#2902586, @efriedma wrote:

We won't call compiler-rt for i128 multiply on aarch64. It's not worthwhile under any circumstances.

Pattern-matching 64/64->128 multiply in SelectionDAG legalization has been reliable in practice, as far as I know. And even if it misses due to weird behavior in instcombine or something like that, we only end up with one or two extra mla instructions, so not a big deal.

Sounds reasonable to use i128 to me then.

Reimplemented the builtins by expanding to the corresponding i128 multiplication in IR, just like the corresponding existing __mulh and __umulh builtins on x86.
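
For readers who want the same computation at the source level, the expansion can be sketched in plain C. This is a minimal sketch under stated assumptions, not the patch itself: `portable_mulh`, `portable_umulh`, and `check_portable_mulh` are hypothetical names, and the code assumes a compiler that provides `__int128` (GCC or Clang on a 64-bit target). Each function widens both operands to 128 bits, multiplies, and keeps the upper 64 bits, mirroring the extend, multiply, shift, and truncate sequence the review settles on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for __mulh: high 64 bits of a signed 64x64 product.
   Right-shifting a negative __int128 is implementation-defined in C;
   GCC and Clang perform an arithmetic shift, which keeps the sign. */
static int64_t portable_mulh(int64_t a, int64_t b) {
    return (int64_t)(((__int128)a * (__int128)b) >> 64);
}

/* Hypothetical stand-in for __umulh: high 64 bits of an unsigned product. */
static uint64_t portable_umulh(uint64_t a, uint64_t b) {
    return (uint64_t)(((unsigned __int128)a * (unsigned __int128)b) >> 64);
}

/* A few hand-checked values. */
static void check_portable_mulh(void) {
    assert(portable_mulh(-1, 1) == -1);             /* product -1, high half all ones */
    assert(portable_mulh(INT64_MAX, 2) == 0);       /* 2^64 - 2 fits in the low half */
    assert(portable_mulh(INT64_MIN, INT64_MIN) == (INT64_C(1) << 62)); /* 2^126 */
    assert(portable_umulh(UINT64_MAX, UINT64_MAX) == 0xFFFFFFFFFFFFFFFEULL);
    assert(portable_umulh(UINT64_C(1) << 63, 2) == 1);  /* product is exactly 2^64 */
}
```

The widened product can never overflow 128 bits, which is why the no-wrap flags on the 128-bit multiply in the patch are safe.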

Harbormaster completed remote builds in B119859: Diff 366839. Aug 17 2021, 2:36 AM

lgtm

This revision is now accepted and ready to land. Aug 18 2021, 2:14 PM

This revision was landed with ongoing or failed builds. Aug 19 2021, 1:30 AM

Closed by commit rGcc3affd8b020: [clang] [MSVC] Implement __mulh and __umulh builtins for aarch64 (authored by mstorsjo).

This revision was automatically updated to reflect the committed changes.

mstorsjo added a commit: rGcc3affd8b020: [clang] [MSVC] Implement __mulh and __umulh builtins for aarch64.

Revision Contents

Path                                              Size
clang/include/clang/Basic/BuiltinsAArch64.def     3 lines
clang/lib/CodeGen/CGBuiltin.cpp                   23 lines
clang/lib/Headers/intrin.h                        3 lines
clang/test/CodeGen/arm64-microsoft-intrinsics.c   22 lines

Diff 367424

clang/include/clang/Basic/BuiltinsAArch64.def

 TARGET_HEADER_BUILTIN(_InterlockedDecrement64_rel, "LLiLLiD*", "nh", "intrin.h", ALL_MS_LANGUAGES, "")

 TARGET_HEADER_BUILTIN(_ReadWriteBarrier, "v", "nh", "intrin.h", ALL_MS_LANGUAGES, "")
 TARGET_HEADER_BUILTIN(__getReg, "ULLii", "nh", "intrin.h", ALL_MS_LANGUAGES, "")
 TARGET_HEADER_BUILTIN(_ReadStatusReg, "LLii", "nh", "intrin.h", ALL_MS_LANGUAGES, "")
 TARGET_HEADER_BUILTIN(_WriteStatusReg, "viLLi", "nh", "intrin.h", ALL_MS_LANGUAGES, "")
 TARGET_HEADER_BUILTIN(_AddressOfReturnAddress, "v*", "nh", "intrin.h", ALL_MS_LANGUAGES, "")

+TARGET_HEADER_BUILTIN(__mulh, "SLLiSLLiSLLi", "nh", "intrin.h", ALL_MS_LANGUAGES, "")
+TARGET_HEADER_BUILTIN(__umulh, "ULLiULLiULLi", "nh", "intrin.h", ALL_MS_LANGUAGES, "")

 #undef BUILTIN
 #undef LANGBUILTIN
 #undef TARGET_HEADER_BUILTIN

clang/lib/CodeGen/CGBuiltin.cpp

@@ if (BuiltinID == AArch64::BI_AddressOfReturnAddress) {
     return Builder.CreateCall(F);
   }

   if (BuiltinID == AArch64::BI__builtin_sponentry) {
     llvm::Function *F = CGM.getIntrinsic(Intrinsic::sponentry, AllocaInt8PtrTy);
     return Builder.CreateCall(F);
   }

+  if (BuiltinID == AArch64::BI__mulh || BuiltinID == AArch64::BI__umulh) {
+    llvm::Type *ResType = ConvertType(E->getType());
+    llvm::Type *Int128Ty = llvm::IntegerType::get(getLLVMContext(), 128);

+    bool IsSigned = BuiltinID == AArch64::BI__mulh;
+    Value *LHS =
+        Builder.CreateIntCast(EmitScalarExpr(E->getArg(0)), Int128Ty, IsSigned);
+    Value *RHS =
+        Builder.CreateIntCast(EmitScalarExpr(E->getArg(1)), Int128Ty, IsSigned);

+    Value *MulResult, *HigherBits;
+    if (IsSigned) {
+      MulResult = Builder.CreateNSWMul(LHS, RHS);
+      HigherBits = Builder.CreateAShr(MulResult, 64);
+    } else {
+      MulResult = Builder.CreateNUWMul(LHS, RHS);
+      HigherBits = Builder.CreateLShr(MulResult, 64);
+    }
+    HigherBits = Builder.CreateIntCast(HigherBits, ResType, IsSigned);

+    return HigherBits;
+  }

   // Handle MSVC intrinsics before argument evaluation to prevent double
   // evaluation.
   if (Optional<MSVCIntrin> MsvcIntId = translateAarch64ToMsvcIntrin(BuiltinID))
     return EmitMSVCBuiltinExpr(*MsvcIntId, E);

   // Find out if any arguments are required to be integer constant
   // expressions.
   unsigned ICEArguments = 0;

clang/lib/Headers/intrin.h

 unsigned __int64 __getReg(int);
 long _InterlockedAdd(long volatile *Addend, long Value);
 __int64 _ReadStatusReg(int);
 void _WriteStatusReg(int, __int64);

 unsigned short __cdecl _byteswap_ushort(unsigned short val);
 unsigned long __cdecl _byteswap_ulong (unsigned long val);
 unsigned __int64 __cdecl _byteswap_uint64(unsigned __int64 val);

+__int64 __mulh(__int64 __a, __int64 __b);
+unsigned __int64 __umulh(unsigned __int64 __a, unsigned __int64 __b);
 #endif

 /*----------------------------------------------------------------------------*\
 |* Privileged intrinsics
 \*----------------------------------------------------------------------------*/
 #if defined(__i386__) || defined(__x86_64__)
 static __inline__ unsigned __int64 __DEFAULT_FN_ATTRS
 __readmsr(unsigned long __register) {

clang/test/CodeGen/arm64-microsoft-intrinsics.c

 void check_ReadWriteBarrier() {
   _ReadWriteBarrier();
 }

 // CHECK-MSVC: fence syncscope("singlethread")
 // CHECK-LINUX: error: implicit declaration of function '_ReadWriteBarrier'

+long long check_mulh(long long a, long long b) {
+  return __mulh(a, b);
+}

+// CHECK-MSVC: %[[ARG1:.*]] = sext i64 {{.*}} to i128
+// CHECK-MSVC: %[[ARG2:.*]] = sext i64 {{.*}} to i128
+// CHECK-MSVC: %[[PROD:.*]] = mul nsw i128 %[[ARG1]], %[[ARG2]]
+// CHECK-MSVC: %[[HIGH:.*]] = ashr i128 %[[PROD]], 64
+// CHECK-MSVC: %[[RES:.*]] = trunc i128 %[[HIGH]] to i64
+// CHECK-LINUX: error: implicit declaration of function '__mulh'

+unsigned long long check_umulh(unsigned long long a, unsigned long long b) {
+  return __umulh(a, b);
+}

+// CHECK-MSVC: %[[ARG1:.*]] = zext i64 {{.*}} to i128
+// CHECK-MSVC: %[[ARG2:.*]] = zext i64 {{.*}} to i128
+// CHECK-MSVC: %[[PROD:.*]] = mul nuw i128 %[[ARG1]], %[[ARG2]]
+// CHECK-MSVC: %[[HIGH:.*]] = lshr i128 %[[PROD]], 64
+// CHECK-MSVC: %[[RES:.*]] = trunc i128 %[[HIGH]] to i64
+// CHECK-LINUX: error: implicit declaration of function '__umulh'

 unsigned __int64 check__getReg() {
   unsigned volatile __int64 reg;
   reg = __getReg(18);
   reg = __getReg(31);
   return reg;
 }

 // CHECK-MSVC: call i64 @llvm.read_register.i64(metadata ![[MD2:.*]])
 // CHECK-MSVC: call i64 @llvm.read_register.i64(metadata ![[MD3:.*]])
 // CHECK-MSVC: ![[MD2]] = !{!"x18"}
 // CHECK-MSVC: ![[MD3]] = !{!"sp"}