Fixes PR#39209. Suppose we have a URBG that generates values in the range [0, 12), and we want a uniform distribution over the range [0, 3). The current implementation uses an independent bits engine to generate two independent bits from the URBG, i.e. a value in the range [0, 4), and then does rejection sampling. But this is not necessary: since the URBG is supposed to generate uniformly distributed values, each of the subranges [0, 4), [4, 8) and [8, 12) is equally probable, so we can get a uniform value in [0, 3) by generating a single value in [0, 12) and dividing it by 4 (= 12/3).
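For illustration only, here is a minimal, self-contained sketch of the reduction described above. The toy Urbg12 engine and all names are made up for this example and are not taken from the patch: because 12 is an exact multiple of 3, one draw from [0, 12) divided by 12/3 = 4 is already uniform over [0, 3), with no rejection sampling needed.

```cpp
#include <cstdint>
#include <iostream>
#include <random>

// Hypothetical toy URBG with range [0, 12), used only to illustrate the idea;
// this is a sketch, not the libc++ implementation.
struct Urbg12 {
  using result_type = std::uint32_t;
  static constexpr result_type min() { return 0; }
  static constexpr result_type max() { return 11; }
  std::mt19937 eng{42};
  result_type operator()() {
    // Rejection sampling so that every value in [0, 12) really is equally likely.
    for (;;) {
      std::uint32_t v = static_cast<std::uint32_t>(eng());
      if (v < (std::mt19937::max() / 12u) * 12u)
        return v % 12u;
    }
  }
};

int main() {
  Urbg12 g;
  constexpr std::uint32_t Rg = Urbg12::max() - Urbg12::min() + 1; // 12
  constexpr std::uint32_t Rp = 3;                                 // target range [0, 3)
  static_assert(Rg % Rp == 0, "the shortcut applies only when Rg is a multiple of Rp");
  std::uint32_t counts[Rp] = {};
  for (int i = 0; i < 12000; ++i)
    ++counts[g() / (Rg / Rp)]; // one URBG call per draw, no entropy wasted
  for (std::uint32_t c : counts)
    std::cout << c << ' '; // each bucket should come out close to 4000
  std::cout << '\n';
}
```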
Event Timeline
My initial reaction is that this change slows down the common case to cater to a truly pathological case; I don't think we should do it (but a benchmark could probably change my mind).
The new test needs to move from libcxx/test/std to libcxx/test/libcxx if it's testing non-mandated, libcxx-specific behavior.
libcxx/include/__random/uniform_int_distribution.h

Lines 244–245: (1) _Rg != 0 is trivially true, right?

libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/no_wasted_entropy.pass.cpp

Line 41: I'm pretty sure this is library UB. If this test case is deliberately doing weird stuff to test a corner case, then OK (but it needs to be more explicit about what corner case it's testing).
OK. This was just a suggestion because there was a bug report about this, and the check definitely gets optimized away for all "common" cases, but I understand your concern.
IMO correctness is more important than performance -- if our implementation is incorrect and we can fix it without introducing a bad performance regression (and that seems to be the case: the branch and the modulo don't seem to cost much compared to the rest of the algorithm, but I could be wrong), then we should do it.
(1) _Rg != 0 is trivially true, right?
(2) Should we be worried that _Rg % _Rp is a division by a non-constant value, on the hot path? I think this change needs at least some benchmark numbers.
(3) In non-pathological cases, _Rg is a power of two, so _Rg % _Rp == 0 is true only if _Rp is also a power of two. But if _Rp is a power of two, don't we already do the optimal number of calls to __g? So this helps only in the pathological cases where the URBG's range is not a power of two? Perhaps we could at least use if constexpr (or an equivalent) to keep the non-pathological case fast.
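Regarding (3), the following is a rough sketch of what such an if constexpr guard could look like. All names (__reduce, __rng_range, __is_power_of_two) are hypothetical and only mirror the double-underscore naming used in the review; this is not the actual libc++ code. The point is that the runtime __rg % __target_range check is only instantiated for engines whose range is not a power of two, so common engines keep their current fast path.

```cpp
#include <cstdint>
#include <iostream>
#include <random>

// Size of the URBG's range; assumes it fits in 64 bits and does not cover all
// 2^64 values (otherwise the +1 would overflow).
template <class _URNG>
constexpr std::uint64_t __rng_range() {
  return static_cast<std::uint64_t>(_URNG::max()) - _URNG::min() + 1;
}

constexpr bool __is_power_of_two(std::uint64_t __x) {
  return __x != 0 && (__x & (__x - 1)) == 0;
}

// Reduce draws from __g to a uniform value in [0, __target_range).
template <class _URNG>
std::uint64_t __reduce(_URNG& __g, std::uint64_t __target_range) {
  constexpr std::uint64_t __rg = __rng_range<_URNG>();
  if constexpr (!__is_power_of_two(__rg)) {
    // Only engines like minstd_rand (2147483646 possible values) ever reach
    // this check, so the non-constant modulo stays off the hot path of
    // power-of-two engines such as mt19937.
    if (__rg % __target_range == 0)
      return (static_cast<std::uint64_t>(__g()) - _URNG::min()) /
             (__rg / __target_range);
  }
  // Fallback: plain rejection sampling stands in here for the existing
  // independent-bits-engine path of the real implementation.
  const std::uint64_t __limit = __rg - __rg % __target_range;
  for (;;) {
    const std::uint64_t __v = static_cast<std::uint64_t>(__g()) - _URNG::min();
    if (__v < __limit)
      return __v % __target_range;
  }
}

int main() {
  std::minstd_rand g(1); // 2147483646 possible values: not a power of two
  // 2147483646 % 3 == 0, so a single draw yields a uniform value in [0, 3).
  std::cout << __reduce(g, 3) << '\n';
}
```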