This is an archive of the discontinued LLVM Phabricator instance.


Differential D33616

[MS] Fix _bittest* intrinsics for values bigger than 31
Accepted, Public

Authored by rnk on May 26 2017, 4:56 PM.


Details

Reviewers

hans
majnemer

Summary

These intrinsics are supposed to select to BT, BTS, etc instructions.
Those instructions actually perform a bitwise array indexing memory
operation that LLVM doesn't currently expose. This change implements
the shifting and array indexing in plain C.

Fixes PR33188

If we ever fix PR19164, then the array indexing should be pattern
matched to BT and BTS memory instructions as appropriate.
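
As a rough illustration of the "shifting and array indexing in plain C" that the summary describes, here is a minimal sketch. The helper names and the use of `uint32_t` are assumptions made for this example and do not come from intrin.h. The sketch also assumes a non-negative bit position, which sidesteps the signedness question raised in the review.

```c
#include <stdint.h>

/* Hypothetical helper, not part of intrin.h: test bit `pos` of a bit array
   stored in 32-bit words. Assumes the position is non-negative. */
static unsigned char sketch_bittest32(const uint32_t *base, uint32_t pos) {
  const uint32_t *word = base + (pos / 32); /* word that holds the bit */
  uint32_t bit = pos % 32;                  /* bit index inside that word */
  return (unsigned char)((*word >> bit) & 1u);
}

/* Hypothetical helper: set bit `pos` and return its previous value. */
static unsigned char sketch_bittestandset32(uint32_t *base, uint32_t pos) {
  uint32_t *word = base + (pos / 32);
  uint32_t mask = (uint32_t)1 << (pos % 32);
  unsigned char old = (unsigned char)((*word & mask) != 0);
  *word |= mask;
  return old;
}
```

With a two-word array, position 35 lands in word 1 at bit 3, which is the case the original single-word shift got wrong for positions above 31.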

Diff Detail

Build Status

Buildable 6825
Build 6825: arc lint + arc unit

Event Timeline

rnk created this revision. May 26 2017, 4:56 PM

lgtm

This revision is now accepted and ready to land. May 30 2017, 9:34 AM

Do you really want to be doing signed division here? Because it is signed, it won't turn into the simple bitshifts and masks that one might expect.

From looking in the Intel manual (Table 3-2, in 3.1.1.9 about Bit(BitBase,BitOffset)) it does sound like the bit offset can be negative *shudder*, so I suppose this is necessary and explains why the type is signed in the first place? Hopefully most of these will be inlined into a context where BitPos is constant or unsigned.

In D33616#768287, @hans wrote:

From looking in the Intel manual (Table 3-2, in 3.1.1.9 about Bit(BitBase,BitOffset)) it does sound like the bit offset can be negative *shudder*, so I suppose this is necessary and explains why the type is signed in the first place? Hopefully most of these will be inlined into a context where BitPos is constant or unsigned.

Indeed :/

LGTM

In D33616#768287, @hans wrote:

From looking in the Intel manual (Table 3-2, in 3.1.1.9 about Bit(BitBase,BitOffset)) it does sound like the bit offset can be negative *shudder*, so I suppose this is necessary and explains why the type is signed in the first place? Hopefully most of these will be inlined into a context where BitPos is constant or unsigned.

OK, but this code will produce the wrong answer in cases where _BitPos is negative, because signed division rounds towards zero rather than rounding down.
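
To make the rounding concern concrete, the sketch below contrasts the truncating split used in the diff (`/ 32` and `% 32`) with a floor-based split. The function names are hypothetical, and the floor version is only one possible way to express rounding down in portable C; it is not a fix adopted in this review.

```c
/* Truncating split, as written in the diff: since C99, '/' rounds toward
   zero and '%' takes the sign of the dividend. */
static long trunc_word(long pos) { return pos / 32; }
static long trunc_bit(long pos) { return pos % 32; }

/* Floor split (hypothetical alternative): the word index rounds down and
   the bit index always lands in the range 0..31. */
static long floor_word(long pos) {
  long q = pos / 32;
  if (pos % 32 < 0)
    --q; /* step down one word when truncation rounded up */
  return q;
}

static long floor_bit(long pos) {
  long r = pos % 32;
  if (r < 0)
    r += 32; /* wrap a negative remainder into 0..31 */
  return r;
}
```

For `pos = -1`, the truncating split yields word 0 and bit -1, and shifting by a negative count is undefined behavior in C. The floor split yields word -1 and bit 31, which addresses the top bit of the preceding word.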

majnemer added inline comments. May 30 2017, 7:57 PM

clang/lib/Headers/intrin.h
345–347	`_bittest` seems to expand to `(((unsigned char const *)_BitBase)[_BitPos >> 3] >> (_BitPos & 7)) & 1` on CL ARM: https://godbolt.org/g/Yc8rMH Perhaps these are the intended semantics?
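
The byte-indexed expression quoted in this comment can be written out as a small function. This is a sketch only: the helper name is hypothetical, and it reproduces the quoted expression rather than any confirmed MSVC implementation. For negative positions it relies on `>>` being an arithmetic shift on signed values, which is implementation-defined in C.

```c
/* Hypothetical helper mirroring the quoted expression: index the bit array
   byte by byte instead of word by word. */
static unsigned char sketch_bittest_bytes(const void *base, long pos) {
  const unsigned char *bytes = (const unsigned char *)base;
  /* pos >> 3 selects the byte, pos & 7 selects the bit inside that byte. */
  return (unsigned char)((bytes[pos >> 3] >> (pos & 7)) & 1);
}
```

On a little-endian target and for non-negative positions, this should select the same bit as a word-based split, since bit 35 is bit 3 of byte 4 in either view.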

Darn, I think Richard is right, the signed div/mod doesn't do the right thing for negative values. Honestly I need to rig up a test case for this, and then I'll come back to it.

What do you folks think is best:

- Add an LLVM intrinsic for this and use it
- Use inline assembly in intrin.h
- Emit an InlineAsm call directly in CGBuiltin.cpp
- Keep expanding this to plain C and wait for us to eventually pattern match this in LLVM CodeGen

The LLVM pattern match optimization feels increasingly difficult because of all the edge cases around negative indices, and that's pushing me back towards doing this as an instruction.

clang/lib/Headers/intrin.h
345–347	Personally I think we need to match the x86 behavior on x86.

My feeling is that there's not much value in adding an LLVM intrinsic for something that can be expressed directly in a handful of IR instructions; those instructions seem like the way to express this that would allow the most optimization potential, and the backend should be able to pattern match this to a BT instruction. Based on that, I'm reluctant to add clang builtins for this, partly because it looks like we'd need quite a lot of them and partly because (in the absence of an IR intrinsic) they'd just get lowered into the same instructions as the plain C code anyway.

clang/lib/Headers/intrin.h
398	I'm confused: why are there distinct ...32 and ...64 versions of these, if the bit number can leave the (d|q)word anyway? Are we supposed to number bits right-to-left within words on big endian targets? Or is this guaranteeing something about alignment or about being able to load a full 32/64 bits at the requested word offset?

Revision Contents

Path: clang/lib/Headers/intrin.h
Size: 24 lines

Diff 100502

clang/lib/Headers/intrin.h


	#endif			#endif

	/*----------------------------------------------------------------------------*\			/*----------------------------------------------------------------------------*\
	|* Bit Counting and Testing			|* Bit Counting and Testing
	\*----------------------------------------------------------------------------*/			\*----------------------------------------------------------------------------*/
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_bittest(long const *_BitBase, long _BitPos) {			_bittest(long const *_BitBase, long _BitPos) {
				_BitBase += (_BitPos / 32);
				_BitPos %= 32;
	return (*_BitBase >> _BitPos) & 1;			return (*_BitBase >> _BitPos) & 1;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_bittestandcomplement(long *_BitBase, long _BitPos) {			_bittestandcomplement(long *_BitBase, long _BitPos) {
				_BitBase += (_BitPos / 32);
				_BitPos %= 32;
	unsigned char _Res = (*_BitBase >> _BitPos) & 1;			unsigned char _Res = (*_BitBase >> _BitPos) & 1;
	*_BitBase = *_BitBase ^ (1 << _BitPos);			*_BitBase = *_BitBase ^ (1 << _BitPos);
	return _Res;			return _Res;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_bittestandreset(long *_BitBase, long _BitPos) {			_bittestandreset(long *_BitBase, long _BitPos) {
				_BitBase += (_BitPos / 32);
				_BitPos %= 32;
	unsigned char _Res = (*_BitBase >> _BitPos) & 1;			unsigned char _Res = (*_BitBase >> _BitPos) & 1;
	*_BitBase = *_BitBase & ~(1 << _BitPos);			*_BitBase = *_BitBase & ~(1 << _BitPos);
	return _Res;			return _Res;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_bittestandset(long *_BitBase, long _BitPos) {			_bittestandset(long *_BitBase, long _BitPos) {
				_BitBase += (_BitPos / 32);
				_BitPos %= 32;
	unsigned char _Res = (*_BitBase >> _BitPos) & 1;			unsigned char _Res = (*_BitBase >> _BitPos) & 1;
	*_BitBase = *_BitBase | (1 << _BitPos);			*_BitBase = *_BitBase | (1 << _BitPos);
	return _Res;			return _Res;
	}			}
	#if defined(__arm__) || defined(__aarch64__)			#if defined(__arm__) || defined(__aarch64__)
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_interlockedbittestandset_acq(long volatile *_BitBase, long _BitPos) {			_interlockedbittestandset_acq(long volatile *_BitBase, long _BitPos) {
				_BitBase += (_BitPos / 32);
				_BitPos %= 32;
	long _PrevVal = __atomic_fetch_or(_BitBase, 1l << _BitPos, __ATOMIC_ACQUIRE);			long _PrevVal = __atomic_fetch_or(_BitBase, 1l << _BitPos, __ATOMIC_ACQUIRE);
	return (_PrevVal >> _BitPos) & 1;			return (_PrevVal >> _BitPos) & 1;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_interlockedbittestandset_nf(long volatile *_BitBase, long _BitPos) {			_interlockedbittestandset_nf(long volatile *_BitBase, long _BitPos) {
				_BitBase += (_BitPos / 32);
				_BitPos %= 32;
	long _PrevVal = __atomic_fetch_or(_BitBase, 1l << _BitPos, __ATOMIC_RELAXED);			long _PrevVal = __atomic_fetch_or(_BitBase, 1l << _BitPos, __ATOMIC_RELAXED);
	return (_PrevVal >> _BitPos) & 1;			return (_PrevVal >> _BitPos) & 1;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_interlockedbittestandset_rel(long volatile *_BitBase, long _BitPos) {			_interlockedbittestandset_rel(long volatile *_BitBase, long _BitPos) {
				_BitBase += (_BitPos / 32);
				_BitPos %= 32;
	long _PrevVal = __atomic_fetch_or(_BitBase, 1l << _BitPos, __ATOMIC_RELEASE);			long _PrevVal = __atomic_fetch_or(_BitBase, 1l << _BitPos, __ATOMIC_RELEASE);
	return (_PrevVal >> _BitPos) & 1;			return (_PrevVal >> _BitPos) & 1;
	}			}
	#endif			#endif
	#ifdef __x86_64__			#ifdef __x86_64__
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_bittest64(__int64 const *_BitBase, __int64 _BitPos) {			_bittest64(__int64 const *_BitBase, __int64 _BitPos) {
				_BitBase += (_BitPos / 64);
				_BitPos %= 64;
	return (*_BitBase >> _BitPos) & 1;			return (*_BitBase >> _BitPos) & 1;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_bittestandcomplement64(__int64 *_BitBase, __int64 _BitPos) {			_bittestandcomplement64(__int64 *_BitBase, __int64 _BitPos) {
				_BitBase += (_BitPos / 64);
				_BitPos %= 64;
	unsigned char _Res = (*_BitBase >> _BitPos) & 1;			unsigned char _Res = (*_BitBase >> _BitPos) & 1;
	*_BitBase = *_BitBase ^ (1ll << _BitPos);			*_BitBase = *_BitBase ^ (1ll << _BitPos);
	return _Res;			return _Res;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_bittestandreset64(__int64 *_BitBase, __int64 _BitPos) {			_bittestandreset64(__int64 *_BitBase, __int64 _BitPos) {
				_BitBase += (_BitPos / 64);
				_BitPos %= 64;
	unsigned char _Res = (*_BitBase >> _BitPos) & 1;			unsigned char _Res = (*_BitBase >> _BitPos) & 1;
	*_BitBase = *_BitBase & ~(1ll << _BitPos);			*_BitBase = *_BitBase & ~(1ll << _BitPos);
	return _Res;			return _Res;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_bittestandset64(__int64 *_BitBase, __int64 _BitPos) {			_bittestandset64(__int64 *_BitBase, __int64 _BitPos) {
				_BitBase += (_BitPos / 64);
				_BitPos %= 64;
	unsigned char _Res = (*_BitBase >> _BitPos) & 1;			unsigned char _Res = (*_BitBase >> _BitPos) & 1;
	*_BitBase = *_BitBase | (1ll << _BitPos);			*_BitBase = *_BitBase | (1ll << _BitPos);
	return _Res;			return _Res;
	}			}
	static __inline__ unsigned char __DEFAULT_FN_ATTRS			static __inline__ unsigned char __DEFAULT_FN_ATTRS
	_interlockedbittestandset64(__int64 volatile *_BitBase, __int64 _BitPos) {			_interlockedbittestandset64(__int64 volatile *_BitBase, __int64 _BitPos) {
				_BitBase += (_BitPos / 64);
				_BitPos %= 64;
	long long _PrevVal =			long long _PrevVal =
	__atomic_fetch_or(_BitBase, 1ll << _BitPos, __ATOMIC_SEQ_CST);			__atomic_fetch_or(_BitBase, 1ll << _BitPos, __ATOMIC_SEQ_CST);
	return (_PrevVal >> _BitPos) & 1;			return (_PrevVal >> _BitPos) & 1;
	}			}
	#endif			#endif
	/*----------------------------------------------------------------------------*\			/*----------------------------------------------------------------------------*\
	|* Interlocked Exchange Add			|* Interlocked Exchange Add
	\*----------------------------------------------------------------------------*/			\*----------------------------------------------------------------------------*/