This is an archive of the discontinued LLVM Phabricator instance.

Differential D49606

[ms] Add shiftleft128 / shiftright128 intrinsics
ClosedPublic

Authored by thakis on Jul 20 2018, 9:32 AM.

Details

Reviewers

craig.topper
spatel

Summary

Carefully match the pattern matched by ISel so that this produces shld / shrd (unless Subtarget->isSHLDSlow() is true).

Thanks to Craig Topper for providing the LLVM IR pattern that gets successfully matched.

Fixes PR37755.

Event Timeline

thakis created this revision. Jul 20 2018, 9:32 AM

@spatel, should this ultimately use funnel shift?

Here are the IR patterns for this that work. Not sure if we can do this directly in C, we need a 128 bit type, but maybe we can emit it from CGBuiltin.cpp?

define i64 @__shiftleft128(i64 %x, i64 %y, i8 %amt) {
  %a = zext i64 %x to i128
  %b = zext i64 %y to i128
  %c = shl i128 %b, 64
  %d = or i128 %a, %c
  %amtmask = and i8 %amt, 63
  %e = zext i8 %amtmask to i128
  %f = shl i128 %d, %e
  %g = lshr i128 %f, 64
  %h = trunc i128 %g to i64
  ret i64 %h
}

define i64 @__shiftright128(i64 %x, i64 %y, i8 %amt) {
  %a = zext i64 %x to i128
  %b = zext i64 %y to i128
  %c = shl i128 %b, 64
  %d = or i128 %a, %c
  %amtmask = and i8 %amt, 63
  %e = zext i8 %amtmask to i128
  %f = lshr i128 %d, %e
  %g = trunc i128 %f to i64
  ret i64 %g
}

We have __int128. If you think hitting the pattern is preferable to inline asm, I can give that a try, either via C or via CGBuiltin.cpp.

I'd prefer the pattern over inline assembly. It'll give us more flexibility in the backend if we should be using some other instruction on different targets.

In D49606#1170278, @craig.topper wrote:

Here are the IR patterns for this that work. Not sure if we can do this directly in C, we need a 128 bit type, but maybe we can emit it from CGBuiltin.cpp?

I’m not at my dev machine, but this is exactly the definition of funnel shift, no? Unless that got reverted, adding/modifying clang builtins was the next step in the plan for those intrinsics. We probably need some backend work to match the variable shift version, but shift-by constant should already work.
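To make the funnel shift comparison concrete, here is a minimal C sketch of the same operation written without a 128-bit type. The names `fshl64` and `fshr64` are hypothetical and not part of intrin.h or this patch, and the argument order here is (hi, lo) whereas the intrinsics in this revision take (low, high). The explicit zero check is only there because shifting a 64-bit value by 64 is undefined in C.

```c
#include <stdint.h>

/* Hypothetical helper names for illustration; not from the patch. */

/* Funnel shift left: treat hi:lo as one 128-bit value, shift it left by
   (amt mod 64), and return the high 64 bits of the result. */
static inline uint64_t fshl64(uint64_t hi, uint64_t lo, unsigned amt) {
  amt &= 63;
  if (amt == 0)
    return hi; /* avoids the undefined shift lo >> 64 */
  return (hi << amt) | (lo >> (64 - amt));
}

/* Funnel shift right: treat hi:lo as one 128-bit value, shift it right by
   (amt mod 64), and return the low 64 bits of the result. */
static inline uint64_t fshr64(uint64_t hi, uint64_t lo, unsigned amt) {
  amt &= 63;
  if (amt == 0)
    return lo; /* avoids the undefined shift hi << 64 */
  return (lo >> amt) | (hi << (64 - amt));
}
```

Under these assumptions, `fshl64(h, l, d)` should agree with the `__shiftleft128(l, h, d)` IR pattern above, and `fshr64(h, l, d)` with `__shiftright128(l, h, d)`.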

Now with C builtins which get nicely optimized.

@spatel, yes, it's exactly funnel shift. I wasn't sure if we were ready for clang to create it yet or not. Can we let this go as is and change it to funnel shift once we have the variable case fixed in the backend?

Isn't implementing this in plain old C the nicest approach anyhow, even once funnel shift exists?

The only weird thing that I can really think of with the C version is that the 'and' on the shift amount might get hoisted out of a loop and not get dropped during isel.
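The hoisting concern can be pictured with a loop like the one below. This is only a sketch: `shl128_hi` is a hypothetical stand-in that mirrors the C pattern under review, `shift_words_left` is an invented caller, and `unsigned __int128` is assumed to be available (a GCC/Clang extension). Whether the optimizer actually hoists the mask in a given build is not verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring the C pattern from this review. */
static inline uint64_t shl128_hi(uint64_t lo, uint64_t hi, unsigned char d) {
  unsigned __int128 val = ((unsigned __int128)hi << 64) | lo;
  unsigned __int128 res = val << (d & 63); /* mask is adjacent to the shift here */
  return (uint64_t)(res >> 64);
}

/* Hypothetical caller: shift a multiword number left by (d mod 64) bits.
   words[0] is the least significant word. */
static void shift_words_left(uint64_t *words, size_t n, unsigned char d) {
  /* After inlining, (d & 63) is loop-invariant, so the optimizer may compute
     it once before the loop. The shift and its mask then live in different
     basic blocks, which is the situation described in the comment above. */
  for (size_t i = n; i-- > 1;)
    words[i] = shl128_hi(words[i - 1], words[i], d);
  if (n > 0)
    words[0] = shl128_hi(0, words[0], d);
}
```

The loop walks from the most significant word down so that each step still reads the unmodified lower neighbour.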

LGTM. I'm inclined to let this go in now since we have a requested use for it. We can change it to funnel shift once we're confident in the backend.

This revision is now accepted and ready to land. Jul 20 2018, 2:04 PM

In D49606#1170448, @thakis wrote:

Isn't implementing this in plain old C the nicest approach anyhow, even once funnel shift exists?

No, the primary reason for creating the intrinsic is that we can’t guarantee that we’ll recognize the pattern as ‘shld’ or ‘rotate’ in this C/IR form. That’s because the pattern can get split across basic blocks, so the backend can’t easily recognize it. Also, IIUC this patch only deals with the specific case of 64-bit values, so we still don’t have the functionality for other widths?
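For the point about other widths, the same concatenate-and-shift pattern can be written for 32-bit halves using a 64-bit intermediate. The functions below are purely illustrative: the names are hypothetical, nothing like them is added by this patch, and no claim is made about which instructions a backend would select for them.

```c
#include <stdint.h>

/* Hypothetical 32-bit analogues of the pattern; names are illustrative only. */

/* High 32 bits of (hi:lo << (d mod 32)). */
static inline uint32_t shiftleft64_hi32(uint32_t lo, uint32_t hi, unsigned char d) {
  uint64_t val = ((uint64_t)hi << 32) | lo;
  return (uint32_t)((val << (d & 31)) >> 32);
}

/* Low 32 bits of (hi:lo >> (d mod 32)). */
static inline uint32_t shiftright64_lo32(uint32_t lo, uint32_t hi, unsigned char d) {
  uint64_t val = ((uint64_t)hi << 32) | lo;
  return (uint32_t)(val >> (d & 31));
}
```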

I don’t object if this helps that one case, but we have made the investment in the IR intrinsics, so using them should be the general direction/goal.

r337619, thanks! The hoisting point is a good one; will rewrite using funnelshift once that's possible :-)

STL_MSFT mentioned this in D53912: [Headers] [MS] Add intrin0.h. Nov 2 2018, 2:51 PM

Revision Contents

Path                                Size
clang/lib/Headers/intrin.h          14 lines
clang/test/Headers/ms-intrin.cpp    2 lines

Diff 156596

clang/lib/Headers/intrin.h

 __halt(void) {
   __asm__ volatile ("hlt");
 }
 static __inline__ void __DEFAULT_FN_ATTRS
 __nop(void) {
   __asm__ volatile ("nop");
 }
 #endif
+#if defined(__x86_64__)
+static __inline__ unsigned __int64 __DEFAULT_FN_ATTRS
+__shiftleft128(unsigned __int64 __l, unsigned __int64 __h, unsigned char __d) {
+  unsigned __int128 __val = ((unsigned __int128)__h << 64) | __l;
+  unsigned __int128 __res = __val << (__d & 63);
+  return (unsigned __int64)(__res >> 64);
+}
+static __inline__ unsigned __int64 __DEFAULT_FN_ATTRS
+__shiftright128(unsigned __int64 __l, unsigned __int64 __h, unsigned char __d) {
+  unsigned __int128 __val = ((unsigned __int128)__h << 64) | __l;
+  unsigned __int128 __res = __val >> (__d & 63);
+  return (unsigned __int64)__res;
+}
+#endif

 /*----------------------------------------------------------------------------*\
 |* Privileged intrinsics
 \*----------------------------------------------------------------------------*/
 #if defined(__i386__) || defined(__x86_64__)
 static __inline__ unsigned __int64 __DEFAULT_FN_ATTRS
 __readmsr(unsigned long __register) {
   // Loads the contents of a 64-bit model specific register (MSR) specified in

clang/test/Headers/ms-intrin.cpp

 void f() {
   __movsb(0, 0, 0);
   __movsd(0, 0, 0);
   __movsw(0, 0, 0);

   __stosd(0, 0, 0);
   __stosw(0, 0, 0);

 #ifdef _M_X64
+  __shiftleft128(1, 2, 3);
+  __shiftright128(1, 2, 3);
   __movsq(0, 0, 0);
   __stosq(0, 0, 0);
 #endif

   int info[4];
   __cpuid(info, 0);
   __cpuidex(info, 0, 0);
   _xgetbv(0);
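The test above only checks that the calls compile. As a rough illustration of what the arguments (1, 2, 3) evaluate to, here is a portable model of the two functions. The `model_` names are hypothetical stand-ins, `uint64_t` replaces `unsigned __int64` so the sketch builds outside Microsoft-compatibility mode, and `unsigned __int128` is assumed to be available.

```c
#include <stdint.h>

/* Hypothetical portable models of the intrinsics added in this revision. */

static inline uint64_t model_shiftleft128(uint64_t l, uint64_t h, unsigned char d) {
  unsigned __int128 val = ((unsigned __int128)h << 64) | l;
  unsigned __int128 res = val << (d & 63); /* shift amount is taken mod 64 */
  return (uint64_t)(res >> 64);            /* high half of the result */
}

static inline uint64_t model_shiftright128(uint64_t l, uint64_t h, unsigned char d) {
  unsigned __int128 val = ((unsigned __int128)h << 64) | l;
  unsigned __int128 res = val >> (d & 63); /* shift amount is taken mod 64 */
  return (uint64_t)res;                    /* low half of the result */
}
```

With these models, `model_shiftleft128(1, 2, 3)` yields 16 and `model_shiftright128(1, 2, 3)` yields 0x4000000000000000. A shift amount of 67 gives the same results as 3 because of the `& 63` mask.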