This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Avoid rejection sampling in `uniform_int_distribution` if possible
Abandoned · Public

Authored by fwolff on Dec 5 2021, 1:41 PM.

Details

Reviewers
Quuxplusone
ldionne
Group Reviewers
Restricted Project
Summary

Fixes PR#39209. Suppose we have a URBG that generates values in the range [0, 12), and we want a uniform distribution over the range [0, 3). The current implementation uses an independent bits engine to generate two independent bits from the URBG, i.e. a value in the range [0, 4), and then does rejection sampling. But this is not necessary: since a URBG is supposed to generate uniformly distributed values, each of the subranges [0, 4), [4, 8) and [8, 12) is equally probable, so we can get a uniform value in [0, 3) by generating a single value in [0, 12) and dividing it by 4 (= 12/3).
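To make this concrete, here is a minimal sketch of the idea (my own illustration, not the patch itself; the generator type is hypothetical):

```cpp
#include <cstdint>
#include <iostream>
#include <random>

// Hypothetical URBG whose range [0, 12) is not a power of two.
struct TwelveRangeGen {
    using result_type = std::uint32_t;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return 11; }
    result_type operator()() {
        return std::uniform_int_distribution<result_type>(0, 11)(eng_);
    }
    std::mt19937 eng_;
};

int main() {
    TwelveRangeGen g;
    // Rg = 12, Rp = 3, and Rg % Rp == 0: each target value corresponds to
    // exactly Rg / Rp = 4 generator outputs ([0, 4) -> 0, [4, 8) -> 1,
    // [8, 12) -> 2), so a single draw plus a division is already uniform.
    for (int i = 0; i < 10; ++i)
        std::cout << g() / 4 << ' ';
    std::cout << '\n';
}
```

Because 12 % 3 == 0, every draw is accepted; rejection sampling is only needed when the generator's range is not an exact multiple of the target range.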

Diff Detail

Event Timeline

fwolff requested review of this revision. Dec 5 2021, 1:41 PM
fwolff created this revision.
Herald added a reviewer: Restricted Project. Dec 5 2021, 1:41 PM
Quuxplusone requested changes to this revision. Dec 5 2021, 3:49 PM

My initial reaction is that this change slows down the common case to cater to a really super pathological case; I don't think we should do it (but a benchmark could probably change my mind).
The new test needs to move from libcxx/test/std to libcxx/test/libcxx if it's testing non-mandated, libcxx-specific behavior.

libcxx/include/__random/uniform_int_distribution.h
244–245

(1) _Rg != 0 is trivially true, right?
(2) Should we be worried that _Rg % _Rp is a division by a non-constant value, on the hot path? I think this change needs at least some benchmark numbers.
(3) In non-pathological cases, _Rg is a power of two, so _Rg % _Rp == 0 is true only if _Rp is also a power of two. But if _Rp is a power of two, don't we already do the optimal number of calls to __g? So this helps only in the pathological cases where the URBG's range is not a power of two? Perhaps at least we could use if constexpr (or equivalent) to keep the non-pathological case fast; see the sketch below.
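To make point (3) concrete, here is a minimal gating sketch (my own illustration, not the patch; the double-underscore helper names just mimic libc++ style and are hypothetical). It assumes the generator's min()/max() are constexpr, as they are for the standard engines, and that the range fits in 64 bits:

```cpp
#include <cstdint>
#include <iostream>
#include <random>

// Hypothetical helper: true iff the generator's range Rg = max() - min() + 1
// is a power of two; computable at compile time for the standard engines.
template <class _Gen>
constexpr bool __gen_range_is_pow2() {
    std::uint64_t __rg = std::uint64_t(_Gen::max()) - std::uint64_t(_Gen::min()) + 1;
    return (__rg & (__rg - 1)) == 0;
}

// Hypothetical reduction of one draw to [0, __rp). Only generators whose
// range is NOT a power of two ever reach the new branch and modulo, so the
// common case compiles to the same code as before.
template <class _Gen>
std::uint64_t __draw_in_range(_Gen& __g, std::uint64_t __rp) {
    std::uint64_t __rg = std::uint64_t(_Gen::max()) - std::uint64_t(_Gen::min()) + 1;
    if constexpr (!__gen_range_is_pow2<_Gen>()) {
        if (__rg % __rp == 0)  // divisible: one draw, no rejection needed
            return (__g() - _Gen::min()) / (__rg / __rp);
    }
    // Fall through to the usual independent-bits + rejection sampling,
    // elided here; the modulo below is only a biased placeholder.
    return (__g() - _Gen::min()) % __rp;
}

// Hypothetical URBG with non-power-of-two range [0, 12).
struct Gen12 {
    using result_type = std::uint32_t;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return 11; }
    result_type operator()() {
        return std::uniform_int_distribution<result_type>(0, 11)(eng_);
    }
    std::mt19937 eng_;
};

int main() {
    Gen12 g;
    for (int i = 0; i < 10; ++i)
        std::cout << __draw_in_range(g, 3) << ' ';  // takes the division path
    std::cout << '\n';
}
```

With this gating, a power-of-two-range engine such as std::mt19937 never evaluates the new branch or the modulo at all.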

libcxx/test/std/numerics/rand/rand.dis/rand.dist.uni/rand.dist.uni.int/no_wasted_entropy.pass.cpp
41

I'm pretty sure this is library UB.
http://eel.is/c++draft/rand.req.urng#1 "A uniform random bit generator g of type G is a function object returning unsigned integer values such that each value in the range of possible results has (ideally) equal probability of being returned."

If this test case is deliberately doing weird stuff to test a corner case, then OK (but it needs to be more explicit about what corner case it's testing).
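For context, the corner case here is presumably a generator whose range is not a power of two. A plausible shape for such a test (my own reconstruction; the actual test body isn't reproduced in this archive) is a call-counting URBG:

```cpp
#include <cstdint>
#include <iostream>
#include <random>

// Hypothetical call-counting URBG with non-power-of-two range [0, 12).
// Its cycling output is deterministic rather than uniform, which is the
// kind of "weird stuff" that brushes against the rand.req.urng wording
// quoted above.
struct CountingGen {
    using result_type = std::uint32_t;
    static constexpr result_type min() { return 0; }
    static constexpr result_type max() { return 11; }
    result_type operator()() { ++calls; return next++ % 12; }
    std::uint64_t calls = 0;
    result_type next = 0;
};

int main() {
    CountingGen g;
    std::uniform_int_distribution<int> d(0, 2);  // target range [0, 3)
    for (int i = 0; i < 12; ++i)
        (void)d(g);
    // Under the proposed patch each draw should cost exactly one generator
    // call (12 total); the stock rejection-sampling path may consume more.
    std::cout << "generator calls for 12 draws: " << g.calls << '\n';
}
```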

This revision now requires changes to proceed. Dec 5 2021, 3:49 PM
fwolff abandoned this revision. Dec 6 2021, 11:48 AM

My initial reaction is that this change slows down the common case to cater to a really super pathological case; I don't think we should do it (but a benchmark could probably change my mind).

OK. This was just a suggestion because there was a bug report about this, and the check definitely gets optimized away for all "common" cases, but I understand your concern.

IMO correctness is more important than performance -- if our implementation is incorrect and we can fix it without introducing a bad performance regression (and that seems to be the case: the branch and the modulo don't seem to cost much compared to the rest of the algorithm, but I could be wrong), then we should do it.