This is an archive of the discontinued LLVM Phabricator instance.

Differential D114129

[libc++] Fix `uniform_int_distribution` for 128-bit result type
ClosedPublic

Authored by fwolff on Nov 17 2021, 4:46 PM.

Details

Reviewers

• Quuxplusone
Mordante
ldionne

Group Reviewers

Restricted Project

Commits

rGb254c2e2c4aa: [libc++] Fix `uniform_int_distribution` for 128-bit result type

Summary

Fixes PR#51520. The problem is that uniform_int_distribution currently uses an unsigned integer with at most 64 bits internally, which is then cast to the desired result type. If the result type is int64_t, this produces a negative number when the most significant bit is set; if the result type is __int128_t, the value remains non-negative and is out of bounds for the example in PR#51520. (It also seems to work if the upper or lower bound is changed because this branch is then no longer taken, and proper rejection sampling takes place.)

The bigger issue here is probably that uniform_int_distribution can be instantiated with __int128_t but will silently produce incorrect results (only the lowest 64 bits can ever be set). libstdc++ also supports __int128_t as a result type, so I have simply extended the maximum width of the internal intermediate result type.
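To make the failure mode in the summary concrete, here is a minimal sketch (not libc++ code) of what happens when a 64-bit unsigned intermediate is converted to each result type. The helper names `as_int64` and `as_int128` are made up for illustration, and `__int128_t` is assumed to be available (GCC or Clang on a 64-bit target).

```cpp
#include <cstdint>

// Hypothetical helpers, not part of libc++: convert a 64-bit unsigned
// intermediate value to the distribution's result type.
inline std::int64_t as_int64(std::uint64_t u) {
    // Wraps modulo 2^64, so a set top bit yields a negative value
    // (implementation-defined before C++20, but universal in practice).
    return static_cast<std::int64_t>(u);
}

inline __int128_t as_int128(std::uint64_t u) {
    // Zero-extends: the result is never negative and bits 64..127 stay clear.
    return static_cast<__int128_t>(u);
}
```

With a 64-bit intermediate, the 128-bit result can therefore never leave the range [0, 2^64), regardless of the bounds requested.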

Diff Detail

Event Timeline

fwolff requested review of this revision. Nov 17 2021, 4:46 PM

fwolff created this revision.

Herald added 1 blocking reviewer(s): Restricted Project. Nov 17 2021, 4:46 PM

Herald added a subscriber: libcxx-commits.

Harbormaster completed remote builds in B134818: Diff 388060. Nov 17 2021, 5:05 PM

Is uniform_int_distribution really the only distribution with this kind of problem? What about, I dunno, binomial_distribution?

libcxx/include/__bits
55
libcxx/include/__random/uniform_int_distribution.h
295	Would this work if it were simply typedef typename conditional<sizeof(result_type) <= 4, uint32_t, typename make_unsigned<result_type>::type>::type _UIntType; ? That is, I think the old code was just assuming that for types larger than 32 bits, `typename make_unsigned<result_type>::type` was by definition `uint64_t`; now that that's false, we can just make the substitution.
libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp
29	I think it's worth verifying that `d.min() == 0 && d.max() == INT128_MAX` (however we spell that). Similarly on line 40. Shall we add a case where the constructor arguments to `uniform_int_distribution<__[u]int128_t>` are outside the range of int64/uint64? Style nit: Please use `int` not `auto` on lines 30 and 41, and `__int128_t` not `auto` on lines 31 and 42. Shall we check `uniform_int_distribution<__uint128_t>` while we're at it?
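On "however we spell that": one possible spelling of the 128-bit limits, as a sketch that does not depend on `std::numeric_limits` being specialized for `__int128_t`. The function names are hypothetical and `__int128_t`/`__uint128_t` are assumed to be available.

```cpp
#include <cstdint>

// All 128 bits set.
constexpr __uint128_t uint128_max() { return ~static_cast<__uint128_t>(0); }

// 2^127 - 1: drop the top bit of the unsigned maximum.
constexpr __int128_t int128_max() {
    return static_cast<__int128_t>(uint128_max() >> 1);
}

// -2^127, written so that no intermediate value overflows.
constexpr __int128_t int128_min() { return -int128_max() - 1; }
```

A test could then compare `d.max()` against `int128_max()` for a default-constructed distribution.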

Is uniform_int_distribution really the only distribution with this kind of problem? What about, I dunno, binomial_distribution?

binomial_distribution also seems to be affected, and maybe others as well. But I haven't looked into them deeply yet, and it's not obvious from looking at the code briefly what causes the problem there.

Also, is there a reason why uniform_int_distribution.h defines its own independent_bits_engine instead of using the one from the random header?

And, out of curiosity, because I haven't been able to figure it out by myself and must be overlooking something: Do you happen to know why the simple definition of independent_bits_engine (here, 26.4.4.2) was dropped in favor of the much more complicated one that made it into the standard?

fwolff marked 3 inline comments as done. Nov 18 2021, 1:06 PM

Harbormaster completed remote builds in B134968: Diff 388290. Nov 18 2021, 2:15 PM

binomial_distribution also seems to be affected, and maybe others as well. But I haven't looked into them deeply yet, and it's not obvious from looking at the code briefly what causes the problem there.

Also, is there a reason why uniform_int_distribution.h defines its own independent_bits_engine instead of using the one from the random header?

On both of those questions (but especially the second), I tentatively propose that we (I?) should granularize <random> into <__random/*.h> before landing this PR. (Which will merge-conflict horribly, no matter which way we do things. Unless people think we should just land this one ASAP. I wouldn't object to that, but I wouldn't necessarily encourage it either.)

And, out of curiosity, because I haven't been able to figure it out by myself and must be overlooking something: Do you happen to know why the simple definition of independent_bits_engine (here, 26.4.4.2) was dropped in favor of the much more complicated one that made it into the standard?

Way before my time; I wonder if @mclow.lists would remember. :) What little I've got: That engine was absent from N1932, which was discussed in Berlin 2006 (the meeting notes are not enlightening to me); N2079 had the "simple" version of random_bits_engine, with some comments indicating that getting the math exactly technically right on generate_canonical was harder than it looked; I don't know where N2079 was discussed (presumably Portland though); and then its successor N2111 had the "complicated" version of independent_bits_engine and was voted into the Standard in Portland 2006 (the meeting notes are just a straw poll on voting in the still-draft-numbered d2111, no recorded discussion).

In D114129#3141402, @Quuxplusone wrote:

On both of those questions (but especially the second), I tentatively propose that we (I?) should granularize <random> into <__random/*.h> before landing this PR. (Which will merge-conflict horribly, no matter which way we do things. Unless people think we should just land this one ASAP. I wouldn't object to that, but I wouldn't necessarily encourage it either.)

Sure. Do let me know if/how I can help (so that I'm not stepping on your toes).

Way before my time; I wonder if @mclow.lists would remember. :) What little I've got: That engine was absent from N1932, which was discussed in Berlin 2006 (the meeting notes are not enlightening to me); N2079 had the "simple" version of random_bits_engine, with some comments indicating that getting the math exactly technically right on generate_canonical was harder than it looked; I don't know where N2079 was discussed (presumably Portland though); and then its successor N2111 had the "complicated" version of independent_bits_engine and was voted into the Standard in Portland 2006 (the meeting notes are just a straw poll on voting in the still-draft-numbered d2111, no recorded discussion).

Thanks for digging up this information! I can only imagine that it must have something to do with some mathematical intricacy that I'm not seeing...

"Granularize the <random> header" is now D114281.

Rebased after D114281 was merged.

LGTM mod nits! Please wait for @Mordante or @ldionne to approve.
For the record, do we think that there is an implied "TODO" here to look into the situation for all other integer distributions, such as binomial_distribution?
And if so, are you (@fwolff) volunteering to tackle that?

libcxx/include/__bits
53	Nit: I personally prefer to see `(X)-1` rather than `(X)~0`, since I always mentally read the latter as `(X)~0u` (which would be wrong) and then correct myself. :) Alternatively, in this case I guess you could just use `~0uLL` and save yourself some parentheses.
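The difference behind this nit can be sketched as follows, assuming a platform where `unsigned` is 32 bits and `unsigned long long` is 64 bits. The constant names are illustrative only.

```cpp
// `~0` is the int value -1; converting it to a wider unsigned type
// sign-extends, so every bit ends up set.
constexpr unsigned long long from_int = (unsigned long long)~0;

// `~0u` is an unsigned int with all bits set; converting it zero-extends,
// so only the low 32 bits are set (the misreading mentioned above).
constexpr unsigned long long from_unsigned = (unsigned long long)~0u;

// The two spellings suggested in the review.
constexpr unsigned long long minus_one = (unsigned long long)-1;
constexpr unsigned long long tilde_ull = ~0uLL;
```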
libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp
82	Tiny nits throughout: `a <= n && n <= b` is easier on the eyes than `n >= a && n <= b`. https://quuxplusone.github.io/blog/2021/05/05/its-clamping-time/ I'd prefer not to see the extraneous `const` on lines 42, 43, 49, 58, 59, 65. I'd prefer `T(a, b)` over `T{a, b}` on lines 61, 75, and arguably 60, 74 (since we're not worried about most-vexing-parse, and our `T`s here are not sequences, arrays, or aggregates). Perhaps rename `all_in_range` to `all_in_64bit_range` or `all_in_narrower_range` or something like that, to indicate more clearly why we don't expect all the output to be "in range"?

Harbormaster completed remote builds in B135710: Diff 389301. Nov 23 2021, 3:40 PM

Thanks for looking into this issue!

libcxx/include/__bits
49	In the header `<bit>` there is `__countl_zero` which has 128-bit support. Can we use that function instead?
libcxx/include/__random/log2.h
23 ↗	(On Diff #389301)	Does this header do the same as `__bit_log2` in `<bit>`?
libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp
36	This feels flaky. If the test happens to produce 100 random values below `UINT64_MAX`, the test fails. I suggest selecting a specific random number engine with a fixed seed for this test. Then you can add a stable test to see whether values in the wanted range are produced.
52	Why do we need this test? Isn't `assert(n >= a && n <= b);` enough to test?

This revision now requires changes to proceed. Nov 24 2021, 11:17 AM

+1 it's a good idea to investigate the possibility of reusing __countl_zero and __bit_log2.
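For reference, the 128-bit leading-zero count under discussion can be composed from two 64-bit counts. The following is a sketch only: `clz128` is a hypothetical name, `__builtin_clzll` is the GCC/Clang builtin (undefined for an argument of 0), and `__uint128_t` is assumed to be available.

```cpp
#include <cassert>
#include <cstdint>

// Leading-zero count of a non-zero 128-bit value.
inline int clz128(__uint128_t x) {
    assert(x != 0 && "__builtin_clzll is undefined for 0");
    const std::uint64_t hi = static_cast<std::uint64_t>(x >> 64);
    const std::uint64_t lo = static_cast<std::uint64_t>(x);
    // If the upper half is empty, all 64 of its bits are leading zeros.
    return hi == 0 ? 64 + __builtin_clzll(lo) : __builtin_clzll(hi);
}
```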

libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp
36	This isn't literally flaky, since `default_random_engine` will be fixed at compile time and so will its default seed — we're not using `random_device` or anything. I agree its definition might vary between libc++/libstdc++ and MSVC (I'm not sure if it does in practice; I'm pretty sure off the top of my head that it's `minstd_rand0` for both libc++ and libstdc++). Note that engines' behaviors are defined by the standard, so `minstd_rand0{}()` has the same value on all implementations by definition. It's the distributions whose behaviors diverge across implementations (and unfortunately it's the distribution whose behavior we really care about here). The chance of 100 numbers all being clustered in one 18446744073709551616th of the PRNG's range (or even one-tenth of its range, as on line 42) strikes me as actually-far-more-than-astronomically unlikely. So I don't see a problem here. But, at the same time, it wouldn't hurt to `s/default_random_engine/minstd_rand0/` just to remove one superfluous degree of freedom.
52	I believe this illustrates my recent point about renaming this variable. :) We want to check that all the values are in the range `[a, b]`; but then we also want to check that some of the values are not in the narrower range `[INT64_MIN, INT64_MAX]` (because if they all happened to fall in that narrower range, that would likely indicate a truncation bug somewhere in libc++ code).
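The point that engine output is pinned down by the standard can be checked directly. A small sketch, with `nth_minstd_rand0` as a hypothetical helper: the standard specifies that the 10000th consecutive invocation of a default-constructed `std::minstd_rand0` produces 1043618065, and the first value follows from the default seed 1 and multiplier 16807.

```cpp
#include <random>

// Returns the n-th value (1-based, n >= 1) produced by a
// default-constructed std::minstd_rand0.
inline std::minstd_rand0::result_type nth_minstd_rand0(unsigned n) {
    std::minstd_rand0 e;  // default seed is 1
    e.discard(n - 1);     // skip the first n-1 values
    return e();
}
```

No comparable guarantee exists for what a distribution does with those engine values, which is why the test's expectations stay implementation-specific.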

Mordante added inline comments. Nov 25 2021, 10:04 AM

libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp
36	Fair point. In fact when it fails we might consider the distribution not to be uniform. I still think it's best to use a fixed random engine.
52	Indeed it illustrates your point. Kate Gregory's "Naming is hard, let's do better" talk of a few years ago directly comes to mind ;-)

This attempts to fix the remaining concerns/requests.

Regarding the other integer distributions: binomial_distribution and negative_binomial_distribution actually seem to work; discrete_distribution is not relevant here because nobody has over UINT64_MAX weights; and I haven't gotten poisson_distribution and geometric_distribution to work, but this might just be due to the extreme values of the parameters you'd have to use to get 128-bit results (and the latter just delegates to negative_binomial_distribution anyway).

fwolff marked 8 inline comments as done. Nov 25 2021, 4:46 PM

fwolff added inline comments.

libcxx/include/__random/log2.h
23 ↗	(On Diff #389301)	Yes, but `__bit_log2` is marked with `_LIBCPP_CONSTEXPR_AFTER_CXX11`, so it won't work for C++11 because we are using it to initialize `static const` members, or am I missing something?
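The constraint described here is why a template-recursive log2 is used. A simplified sketch of that technique follows, with hypothetical names (`log2_imp`, `floor_log2`, `uses_log2`), assuming a 64-bit `unsigned long long`, and omitting the special case for an input of 0 that the libc++ version has.

```cpp
#include <cstddef>

// floor(log2(X)) by template recursion: scan from bit R downwards until a
// set bit is found. Usable as a constant expression in C++11.
template <unsigned long long X, std::size_t R>
struct log2_imp {
    static const std::size_t value =
        (X & (1ULL << R)) ? R : log2_imp<X, R - 1>::value;
};

// Recursion stops at bit 0.
template <unsigned long long X>
struct log2_imp<X, 0> {
    static const std::size_t value = 0;
};

template <unsigned long long X>
struct floor_log2 {
    static const std::size_t value = log2_imp<X, 63>::value;
};

// The result can initialize a `static const` member, which is the use case
// mentioned in the comment above.
struct uses_log2 {
    static const std::size_t bits = floor_log2<1000>::value;  // 9
};
```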

Harbormaster completed remote builds in B136129: Diff 389887. Nov 25 2021, 5:37 PM

LGTM modulo one style issue. Please resolve this issue and rebase the patch to verify the CI is green before landing.

libcxx/include/__random/log2.h
23 ↗	(On Diff #389301)	Good point, too bad we can't remove the duplication.
libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp
30	Please replace `auto` with the type, here and in the remainder of this file. This is the typical style in LLVM/libc++.

This revision is now accepted and ready to land. Nov 28 2021, 4:40 AM

Quuxplusone accepted this revision. Nov 28 2021, 9:25 AM

Quuxplusone added inline comments.

libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp
30	(FWIW, as long as it's `auto x = TheSpecificType();` as in these cases, personally I'm fine with it. But even in these cases, `TheSpecificType x;` will be shorter.)

Thanks a lot for looking into this @fwolff.

The only issue I see with this patch is that the Standard says that instantiating any of the distributions with something that is not short, int, long, long long, unsigned short, unsigned int, unsigned long, or unsigned long long is undefined: http://eel.is/c++draft/rand#req.genl-1.5. Hence, this is technically an extension, and another valid approach would be to just static_assert(_IntType is one of the aforementioned types); from within the distributions. I'm curious to hear your thoughts on whether it's better to stick to the Standard or provide this as an extension (in which case we should probably annotate that it is an extension).

Requesting changes so we can have this discussion, however I'm fine with the patch if we decide to go for this approach.
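The `static_assert` alternative mentioned above could look roughly like the following sketch. Both `is_standard_rand_int` and `checked_param` are hypothetical names for illustration; this is not what the patch does, since the patch keeps `__int128_t` working as an extension.

```cpp
#include <type_traits>

// Hypothetical trait: true only for the integer types the Standard lists
// as valid IntType template arguments.
template <class T> struct is_standard_rand_int : std::false_type {};
template <> struct is_standard_rand_int<short> : std::true_type {};
template <> struct is_standard_rand_int<int> : std::true_type {};
template <> struct is_standard_rand_int<long> : std::true_type {};
template <> struct is_standard_rand_int<long long> : std::true_type {};
template <> struct is_standard_rand_int<unsigned short> : std::true_type {};
template <> struct is_standard_rand_int<unsigned int> : std::true_type {};
template <> struct is_standard_rand_int<unsigned long> : std::true_type {};
template <> struct is_standard_rand_int<unsigned long long> : std::true_type {};

// Hypothetical stand-in for a distribution's parameter type, showing where
// the check would live.
template <class IntType>
struct checked_param {
    static_assert(is_standard_rand_int<IntType>::value,
                  "IntType must be short, int, long, long long, "
                  "or an unsigned counterpart");
    IntType a;
    IntType b;
};
```

Supporting `__int128_t` as an extension would mean either adding it to such a trait or not asserting at all.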

libcxx/include/__random/log2.h
61–66 ↗	(On Diff #389887)	Or something like that -- the current formatting looks really weird with the `<` alone on its own line.

This revision now requires changes to proceed. Nov 29 2021, 8:44 AM

I prefer to allow this extension; I even wonder whether not allowing 128-bit values was intentional or an artifact of prohibiting (un)signed char. In general I prefer to allow __[u]int128_t to be used as a first-class citizen in libc++.

fwolff updated this revision to Diff 390780. Nov 30 2021, 12:19 PM

fwolff marked an inline comment as done.

In D114129#3158805, @Mordante wrote:

I prefer to allow this extension; I even wonder whether not allowing 128-bit values was intentional or an artifact of prohibiting (un)signed char. In general I prefer to allow __[u]int128_t to be used as a first-class citizen in libc++.

Yes, this is also how I feel about this.

In D114129#3158555, @ldionne wrote:

[...] or provide this as an extension (in which case we should probably annotate that it is an extension).

What does that mean exactly, annotating it as an extension?

libcxx/include/__random/log2.h
61–66 ↗	(On Diff #389887)	Fixed, although I followed the precedent of `_Working_result_type` from `__independent_bits_engine` here.

In D114129#3162314, @fwolff wrote:

In D114129#3158805, @Mordante wrote:

I prefer to allow this extension; I even wonder whether not allowing 128-bit values was intentional or an artifact of prohibiting (un)signed char. In general I prefer to allow __[u]int128_t to be used as a first-class citizen in libc++.

Yes, this is also how I feel about this.

Ok. In that case, can we add a comment somewhere in uniform_int_distribution that __int128_t is supported? Make sure the comment contains the word "extension", so that when we eventually get a formal way of documenting those, it's easy to find the existing extensions we support.

In D114129#3158555, @ldionne wrote:

[...] or provide this as an extension (in which case we should probably annotate that it is an extension).

What does that mean exactly, annotating it as an extension?

I just meant documenting it somewhere.

This revision is now accepted and ready to land. Nov 30 2021, 1:16 PM

Harbormaster completed remote builds in B136745: Diff 390780. Nov 30 2021, 2:40 PM

fwolff updated this revision to Diff 390849. Nov 30 2021, 4:07 PM

fwolff marked an inline comment as done.

In D114129#3162487, @ldionne wrote:

Ok. In that case, can we add a comment somewhere in uniform_int_distribution that __int128_t is supported?

Done, and thanks for reviewing this! Can you also commit it for me? You can use name and email as in commit dc1c27149f214ff099e99930226ae312b0cf1910 for attribution. Thanks!

Harbormaster completed remote builds in B136794: Diff 390849. Dec 1 2021, 12:01 AM

Closed by commit rGb254c2e2c4aa: [libc++] Fix `uniform_int_distribution` for 128-bit result type (authored by fwolff, committed by ldionne). Dec 1 2021, 8:03 AM

This revision was automatically updated to reflect the committed changes.

ldionne added a commit: rGb254c2e2c4aa: [libc++] Fix `uniform_int_distribution` for 128-bit result type.

Hi. I think the build issue we're seeing (https://ci.chromium.org/ui/p/fuchsia/builders/ci/clang_toolchain.ci.core.x64-debug-subbuild/b8829019800818975281/overview) is caused by this patch:

../../../../llvm-monorepo/llvm-build-1-master-fuchsia-toolchain/install/bin/../include/c++/v1/bit:142:5: error: static_assert failed due to requirement '__libcpp_is_unsigned_integer<int>::value' "__countl_zero requires an unsigned integer type"
    static_assert(__libcpp_is_unsigned_integer<_Tp>::value, "__countl_zero requires an unsigned integer type");
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../../../../llvm-monorepo/llvm-build-1-master-fuchsia-toolchain/install/bin/../include/c++/v1/__random/uniform_int_distribution.h:242:24: note: in instantiation of function template specialization 'std::__countl_zero<int>' requested here
    size_t __w = _Dt - __countl_zero(_Rp) - 1;
                       ^
../../../../llvm-monorepo/llvm-build-1-master-fuchsia-toolchain/install/bin/../include/c++/v1/__random/uniform_int_distribution.h:206:17: note: in instantiation of function template specialization 'std::uniform_int_distribution<bool>::operator()<std::linear_congruential_engine<unsigned int, 48271, 0, 2147483647>>' requested here
        {return (*this)(__g, __p_);}
                ^
../../src/tests/fidl/compatibility/compatibility_test.cc:592:43: note: in instantiation of function template specialization 'std::uniform_int_distribution<bool>::operator()<std::linear_congruential_engine<unsigned int, 48271, 0, 2147483647>>' requested here
  s->primitive_types.b = bool_distribution(rand_engine);
                                          ^

Is there something wrong on our end that's causing this to happen, or is this from something incorrect with the patch?

In D114129#3165399, @leonardchan wrote:

Is there something wrong on our end that's causing this to happen, or is this from something incorrect with the patch?

Could be; I will look into this (although technically you are not allowed to instantiate std::uniform_int_distribution with bool according to the standard, as @ldionne has noted above).

In D114129#3165399, @leonardchan wrote:

Is there something wrong on our end that's causing this to happen, or is this from something incorrect with the patch?

I believe there's something wrong with the patch; specifically I think this'll fix it:

diff --git a/libcxx/include/__random/uniform_int_distribution.h b/libcxx/include/__random/uniform_int_distribution.h
index 55b4761637f0..169d2ed112d0 100644
--- a/libcxx/include/__random/uniform_int_distribution.h
+++ b/libcxx/include/__random/uniform_int_distribution.h
@@ -230,8 +230,8 @@ typename uniform_int_distribution<_IntType>::result_type
 uniform_int_distribution<_IntType>::operator()(_URNG& __g, const param_type& __p)
 _LIBCPP_DISABLE_UBSAN_UNSIGNED_INTEGER_CHECK
 {
-    typedef typename conditional<sizeof(result_type) <= sizeof(uint32_t), uint32_t,
-                                 typename make_unsigned<result_type>::type>::type _UIntType;
+    typedef typename conditional<sizeof(result_type) <= sizeof(uint32_t), __identity<uint32_t>,
+                                 make_unsigned<result_type> >::type::type _UIntType;
     const _UIntType _Rp = _UIntType(__p.b()) - _UIntType(__p.a()) + _UIntType(1);
     if (_Rp == 1)
         return __p.a();

I'm working on a regression test now. Shockingly, we have no tests for uniform_int_distribution<T> for any T except int and now __{u,}int128_t — not even short, let alone bool.
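The proposed diff replaces an eager `conditional` with a lazy one. Here is a minimal sketch of that idiom using standard traits only. `identity_type` is a hypothetical stand-in for libc++'s internal `__identity`, and `working_uint` is an illustrative name. The sketch assumes the intent is to avoid naming `make_unsigned<result_type>::type` unless that branch is actually selected; it does not claim to reproduce the exact cause of the build failure quoted above.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for libc++'s internal __identity.
template <class T>
struct identity_type {
    typedef T type;
};

// Lazy selection: std::conditional chooses between two wrapper types, and
// only the chosen wrapper's nested ::type is looked up afterwards. For
// narrow types such as bool, std::make_unsigned<T>::type is never named.
template <class T>
struct working_uint {
    typedef typename std::conditional<
        sizeof(T) <= sizeof(std::uint32_t),
        identity_type<std::uint32_t>,
        std::make_unsigned<T> >::type::type type;
};
```

With the eager form, `typename make_unsigned<T>::type` appears as a template argument and must be valid for every `T`, including those for which the other branch is chosen.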

Quuxplusone mentioned this in D114920: [libc++] Explicitly reject `uniform_int_distribution<bool>` and `<char>`. Dec 1 2021, 5:01 PM

In D114129#3165399, @leonardchan wrote:

Is there something wrong on our end that's causing this to happen, or is this from something incorrect with the patch?

@leonardchan
It's technically not allowed to instantiate the distributions with bool. In that other patch, I am pushing so that we make it ill-formed and provide a diagnostic -- are you able to fix the code on your side?

arthur.j.odwyer mentioned this in rGa3255f219a86: [libc++] Explicitly reject `uniform_int_distribution<bool>` and `<char>`. Feb 28 2022, 11:59 AM

ldionne mentioned this in D125283: reverts "[libc++] Explicitly reject `uniform_int_distribution<bool>` and `<char>`.". May 24 2022, 12:18 PM

Revision Contents

Path	Size
libcxx/include/__bits	11 lines
libcxx/include/__random/uniform_int_distribution.h	41 lines
libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp	86 lines

Diff 388290

libcxx/include/__bits

[37 lines not shown]
 int __libcpp_clz(unsigned __x) _NOEXCEPT { return __builtin_clz(__x); }
 inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
 int __libcpp_clz(unsigned long __x) _NOEXCEPT { return __builtin_clzl(__x); }
 inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
 int __libcpp_clz(unsigned long long __x) _NOEXCEPT { return __builtin_clzll(__x); }
+#ifndef _LIBCPP_HAS_NO_INT128
+inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
+int __libcpp_clz(__uint128_t __x) _NOEXCEPT {
+  return ((__x >> 64) == 0)
+           ? (64 + __libcpp_clz((unsigned long long)(__x & ((unsigned long long)~0))))
+           : __libcpp_clz((unsigned long long)(__x >> 64));
+}
+#endif // _LIBCPP_HAS_NO_INT128
 inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
 int __libcpp_popcount(unsigned __x) _NOEXCEPT { return __builtin_popcount(__x); }
 inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
 int __libcpp_popcount(unsigned long __x) _NOEXCEPT { return __builtin_popcountl(__x); }
 inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR
[92 lines not shown]

Inline comment by Quuxplusone (Done):

   return (_Upper == 0) ? (64 + __libcpp_clz(_Lower)) : __libcpp_clz(_Upper);
 }
-#endif
+#endif // _LIBCPP_HAS_NO_INT128
 inline _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR

libcxx/include/__random/uniform_int_distribution.h

	Show All 22 Lines

	_LIBCPP_PUSH_MACROS			_LIBCPP_PUSH_MACROS
	#include <__undef_macros>			#include <__undef_macros>

	_LIBCPP_BEGIN_NAMESPACE_STD			_LIBCPP_BEGIN_NAMESPACE_STD

	// __independent_bits_engine			// __independent_bits_engine

				template <class _UIntType, _UIntType _Xp, size_t _Rp>
				struct __log2_imp;

	template <unsigned long long _Xp, size_t _Rp>			template <unsigned long long _Xp, size_t _Rp>
	struct __log2_imp			struct __log2_imp<unsigned long long, _Xp, _Rp>
	{			{
	static const size_t value = _Xp & ((unsigned long long)(1) << _Rp) ? _Rp			static const size_t value = _Xp & ((unsigned long long)(1) << _Rp) ? _Rp
	: __log2_imp<_Xp, _Rp - 1>::value;			: __log2_imp<unsigned long long, _Xp, _Rp - 1>::value;
	};			};

	template <unsigned long long _Xp>			template <unsigned long long _Xp>
	struct __log2_imp<_Xp, 0>			struct __log2_imp<unsigned long long, _Xp, 0>
	{			{
	static const size_t value = 0;			static const size_t value = 0;
	};			};

	template <size_t _Rp>			template <size_t _Rp>
	struct __log2_imp<0, _Rp>			struct __log2_imp<unsigned long long, 0, _Rp>
	{			{
	static const size_t value = _Rp + 1;			static const size_t value = _Rp + 1;
	};			};

				#ifndef _LIBCPP_HAS_NO_INT128

				template <__uint128_t _Xp, size_t _Rp>
				struct __log2_imp<__uint128_t, _Xp, _Rp>
				{
				static const size_t value = (_Xp >> 64)
				? (64 + __log2_imp<unsigned long long, (_Xp >> 64), 63>::value)
				: __log2_imp<unsigned long long, _Xp, 63>::value;
				};

				#endif // _LIBCPP_HAS_NO_INT128

	template <class _UIntType, _UIntType _Xp>			template <class _UIntType, _UIntType _Xp>
	struct __log2			struct __log2
	{			{
	static const size_t value = __log2_imp<_Xp,			static const size_t value = __log2_imp<
	sizeof(_UIntType) * __CHAR_BIT__ - 1>::value;			#ifndef _LIBCPP_HAS_NO_INT128
				typename conditional
				<
				sizeof(_UIntType) <= sizeof(unsigned long long),
				unsigned long long,
				__uint128_t
				>::type,
				#else
				unsigned long long,
				#endif // _LIBCPP_HAS_NO_INT128
				_Xp, sizeof(_UIntType) * __CHAR_BIT__ - 1>::value;
	};			};

	template<class _Engine, class _UIntType>			template<class _Engine, class _UIntType>
	class __independent_bits_engine			class __independent_bits_engine
	{			{
	public:			public:
	// types			// types
	typedef _UIntType result_type;			typedef _UIntType result_type;
	▲ Show 20 Lines • Show All 188 Lines • ▼ Show 20 Lines
	};			};

	template<class _IntType>			template<class _IntType>
	template<class _URNG>			template<class _URNG>
	typename uniform_int_distribution<_IntType>::result_type			typename uniform_int_distribution<_IntType>::result_type
	uniform_int_distribution<_IntType>::operator()(_URNG& __g, const param_type& __p)			uniform_int_distribution<_IntType>::operator()(_URNG& __g, const param_type& __p)
	_LIBCPP_DISABLE_UBSAN_UNSIGNED_INTEGER_CHECK			_LIBCPP_DISABLE_UBSAN_UNSIGNED_INTEGER_CHECK
	{			{
	typedef typename conditional<sizeof(result_type) <= sizeof(uint32_t),			typedef typename conditional<sizeof(result_type) <= sizeof(uint32_t), uint32_t,
	uint32_t, uint64_t>::type _UIntType;			typename make_unsigned<result_type>::type>::type _UIntType;
	const _UIntType _Rp = _UIntType(__p.b()) - _UIntType(__p.a()) + _UIntType(1);			const _UIntType _Rp = _UIntType(__p.b()) - _UIntType(__p.a()) + _UIntType(1);
	if (_Rp == 1)			if (_Rp == 1)
	return __p.a();			return __p.a();
	const size_t _Dt = numeric_limits<_UIntType>::digits;			const size_t _Dt = numeric_limits<_UIntType>::digits;
	typedef __independent_bits_engine<_URNG, _UIntType> _Eng;			typedef __independent_bits_engine<_URNG, _UIntType> _Eng;
	if (_Rp == 0)			if (_Rp == 0)
	return static_cast<result_type>(_Eng(__g, _Dt)());			return static_cast<result_type>(_Eng(__g, _Dt)());
	size_t __w = _Dt - __libcpp_clz(_Rp) - 1;			size_t __w = _Dt - __libcpp_clz(_Rp) - 1;
	if ((_Rp & (numeric_limits<_UIntType>::max() >> (_Dt - __w))) != 0)			if ((_Rp & (numeric_limits<_UIntType>::max() >> (_Dt - __w))) != 0)
	++__w;			++__w;
				QuuxplusoneUnsubmitted Done Reply Inline Actions Would this work if it were simply typedef typename conditional<sizeof(result_type) <= 4, uint32_t, typename make_unsigned<result_type>::type>::type _UIntType; ? That is, I think the old code was just assuming that for types larger than 32 bits, `typename make_unsigned<result_type>::type` was by definition `uint64_t`; now that that's false, we can just make the substitution. Quuxplusone: Would this work if it were simply ``` typedef typename conditional<sizeof(result_type) <= 4…
	_Eng __e(__g, __w);
	_UIntType __u;
	do
	{
	__u = __e();
	} while (__u >= _Rp);
	return static_cast<result_type>(__u + __p.a());
	}
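
To make the failure mode behind this hunk concrete, here is a minimal sketch of what happens when a 64-bit unsigned intermediate is cast to the result type. It is not libc++ code: `cast_from_u64` and `fits_in_int64` are hypothetical helper names, and the sketch assumes a GCC/Clang target that provides `__int128_t` and wraps out-of-range unsigned-to-signed conversions (implementation-defined before C++20).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not libc++ code): mimics the old final step, where a
// 64-bit unsigned intermediate is converted to the requested result type.
template <class Result>
Result cast_from_u64(std::uint64_t raw) {
    return static_cast<Result>(raw);
}

// True if the value lies in the signed 64-bit range [INT64_MIN, INT64_MAX].
inline bool fits_in_int64(__int128_t v) {
    return INT64_MIN <= v && v <= INT64_MAX;
}
```

With these helpers, `cast_from_u64<std::int64_t>(UINT64_MAX)` wraps to `-1` and stays inside the signed 64-bit range, whereas `cast_from_u64<__int128_t>(UINT64_MAX)` stays at 2^64 - 1, which is above `INT64_MAX` and still has no bit set above bit 63. That matches the two symptoms described in the summary: out-of-bounds results for the PR#51520 bounds, and only the lowest 64 bits ever being populated.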

libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/int128.pass.cpp

This file was added.

				//===----------------------------------------------------------------------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// UNSUPPORTED: c++03

				// <random>

				// template<class _IntType = int>
				// class uniform_int_distribution

				// template<class _URNG> result_type operator()(_URNG& g);

				#include <random>
				#include <cassert>

				#include "test_macros.h"

				int main(int, char**) {

				#ifndef _LIBCPP_HAS_NO_INT128

				// Test that values outside of the 64-bit range can be produced.
				{
				auto e = std::default_random_engine{};
				Quuxplusone:
				- I think it's worth verifying that `d.min() == 0 && d.max() == INT128_MAX` (however we spell that). Similarly on line 40.
				- Shall we add a case where the constructor arguments to `uniform_int_distribution<__[u]int128_t>` are outside the range of int64/uint64?
				- Style nit: Please use `int` not `auto` on lines 30 and 41, and `__int128_t` not `auto` on lines 31 and 42.
				- Shall we check `uniform_int_distribution<__uint128_t>` while we're at it?
				auto d = std::uniform_int_distribution<__int128_t>{};
				Mordante: Please replace `auto` with the type, here and in the remainder of this file. This is the typical style in LLVM/libc++.
				Quuxplusone: (FWIW, as long as it's `auto x = TheSpecificType();` as in these cases, personally I'm fine with it. But even in these cases, `TheSpecificType x;` will be shorter.)
				assert(d.min() == 0 && d.max() == std::numeric_limits<__int128_t>::max());
				bool all_in_range = true;
				for (int i = 0; i < 100; ++i) {
				const __int128_t n = d(e);
				all_in_range = all_in_range && (n <= UINT64_MAX);
				}
				Mordante: This feels flaky. If the test happens to produce 100 random values below `UINT64_MAX`, the test fails. I suggest selecting a specific random number engine with a fixed seed for this test. Then you can add a stable test to see whether values in the wanted range are produced.
				Quuxplusone: This isn't literally flaky, since `default_random_engine` will be fixed at compile time and so will its default seed; we're not using `random_device` or anything. I agree its definition might vary between libc++/libstdc++ and MSVC (I'm not sure if it does in practice; I'm pretty sure off the top of my head that it's `minstd_rand0` for both libc++ and libstdc++).
				Note that engines' behaviors are defined by the standard, so `minstd_rand0{}()` has the same value on all implementations by definition. It's the distributions whose behaviors diverge across implementations (and unfortunately it's the distribution whose behavior we really care about here).
				The chance of 100 numbers all being clustered in one 18446744073709551616th of the PRNG's range (or even one-tenth of its range, as on line 42) strikes me as actually-far-more-than-astronomically unlikely. So I don't see a problem here. But, at the same time, it wouldn't hurt to `s/default_random_engine/minstd_rand0/` just to remove one superfluous degree of freedom.
				Mordante: Fair point. In fact when it fails we might consider the distribution not to be uniform. I still think it's best to use a fixed random engine.
				assert(!all_in_range);
				}

				// Same test as above with min/max set and outside the 64-bit range.
				{
				const __int128_t a = ((__int128_t)INT64_MIN) * 10;
				const __int128_t b = ((__int128_t)INT64_MAX) * 10;
				auto e = std::default_random_engine{};
				auto d = std::uniform_int_distribution<__int128_t>{a, b};
				assert(d.min() == a && d.max() == b);
				bool all_in_range = true;
				for (int i = 0; i < 100; ++i) {
				const __int128_t n = d(e);
				assert(n >= a && n <= b);
				all_in_range = all_in_range && (n >= INT64_MIN) && (n <= (INT64_MAX));
				}
				Mordante: Why do we need this test? Isn't `assert(n >= a && n <= b);` enough to test?
				Quuxplusone: I believe this illustrates my recent point about renaming this variable. :) We want to check that all the values are in the range `[a, b]`; but then we also want to check that some of the values are not in the narrower range `[INT64_MIN, INT64_MAX]` (because if they all happened to fall in that narrower range, that would likely indicate a truncation bug somewhere in libc++ code).
				Mordante: Indeed it illustrates your point. Kate Gregory's "Naming is hard, let's do better" talk of a few years ago directly comes to mind ;-)
				assert(!all_in_range);
				}

				// Same test as above with __uint128_t.
				{
				const __uint128_t a = UINT64_MAX / 3;
				const __uint128_t b = ((__uint128_t)UINT64_MAX) * 10;
				auto e = std::default_random_engine{};
				auto d = std::uniform_int_distribution<__uint128_t>{a, b};
				assert(d.min() == a && d.max() == b);
				bool all_in_range = true;
				for (int i = 0; i < 100; ++i) {
				const __uint128_t n = d(e);
				assert(n >= a && n <= b);
				all_in_range = all_in_range && (n <= (UINT64_MAX));
				}
				assert(!all_in_range);
				}

				// Regression test for PR#51520:
				{
				auto e = std::default_random_engine{};
				auto d = std::uniform_int_distribution<__int128_t>{INT64_MIN, INT64_MAX};
				assert(d.min() == INT64_MIN && d.max() == INT64_MAX);
				for (int i = 0; i < 100; ++i) {
				const __int128_t n = d(e);
				assert((n >= INT64_MIN) && (n <= INT64_MAX));
				}
				}

				Quuxplusone: Tiny nits throughout:
				- `a <= n && n <= b` is easier on the eyes than `n >= a && n <= b`. https://quuxplusone.github.io/blog/2021/05/05/its-clamping-time/
				- I'd prefer not to see the extraneous `const` on lines 42, 43, 49, 58, 59, 65.
				- I'd prefer `T(a, b)` over `T{a, b}` on lines 61, 75, and arguably 60, 74 (since we're not worried about most-vexing-parse, and our `T`s here are not sequences, arrays, or aggregates).
				- Perhaps rename `all_in_range` to `all_in_64bit_range` or `all_in_narrower_range` or something like that, to indicate more clearly why we don't expect all the output to be "in range"?
				#endif // _LIBCPP_HAS_NO_INT128

				return 0;
				}
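
The two points the reviewers settle on above (engine output is fixed by the standard, and the test needs both an "in bounds" check and a "not all in the narrower range" check) can be sketched as follows. This is an illustrative analog, not the committed test: it deliberately uses `int64_t` bounds with `int32_t` as the narrower range so that it does not depend on `__int128` support, and `nth_minstd_rand0`, `RangeReport`, and `sample_wide_range` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Engines are fully specified: the standard requires the 10000th value of a
// default-constructed minstd_rand0 to be 1043618065 on every implementation.
inline std::uint_fast32_t nth_minstd_rand0(int n) {
    std::minstd_rand0 e;
    std::uint_fast32_t v = 0;
    for (int i = 0; i < n; ++i)
        v = e();
    return v;
}

// Hypothetical report type: one flag per property the test cares about.
struct RangeReport {
    bool all_in_bounds;          // every sample lies in [a, b]
    bool all_in_narrower_range;  // every sample lies in [INT32_MIN, INT32_MAX]
};

// Draws `count` samples from bounds ten times wider than the int32_t range.
inline RangeReport sample_wide_range(int count) {
    const std::int64_t a = std::int64_t(INT32_MIN) * 10;
    const std::int64_t b = std::int64_t(INT32_MAX) * 10;
    std::minstd_rand0 e;  // fixed engine with its default seed
    std::uniform_int_distribution<std::int64_t> d(a, b);
    RangeReport r{true, true};
    for (int i = 0; i < count; ++i) {
        const std::int64_t n = d(e);
        r.all_in_bounds = r.all_in_bounds && (a <= n && n <= b);
        r.all_in_narrower_range =
            r.all_in_narrower_range && (INT32_MIN <= n && n <= INT32_MAX);
    }
    return r;
}
```

A correct distribution should yield `all_in_bounds == true` and, with overwhelming probability, `all_in_narrower_range == false` for 100 samples. If every sample landed in the narrower range, that would point to truncation in the distribution rather than bad luck. The distribution's exact output is not specified by the standard, so the sketch asserts only on these properties, not on specific values.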


Revision Contents

Diff 388290
