This is an archive of the discontinued LLVM Phabricator instance.


Differential D74997

[libc++] Bugfix to std::binomial_distribution<int>
ClosedPublic

Authored by atmnpatel on Feb 21 2020, 3:54 PM.


Details

Reviewers

ldionne
mclow.lists
EricWF

Group Reviewers

Restricted Project

Commits

rG51b78a3e06d4: [libc++] Bugfix to std::binomial_distribution<int>

Summary

The current implementation of binomial_distribution is not guaranteed to terminate for certain extreme configurations of the engine and distribution. The cause is a mistake in how the algorithm from the reference paper was implemented.

The algorithm in the paper is guaranteed to terminate, but it contains redundant statements. The current implementation collapses that redundancy into a single while loop. In doing so it drops the paper's return condition for the case where no good sample can be produced from the particular uniform variate being used, and that omission causes the infinite loop.

This change guarantees termination by detecting that case and returning 0. The case is an iteration of the loop in which neither the downward nor the upward search has a term left to subtract. Returning 0 also differs from the paper: the value the paper returns in this case violates basic checks in at least some of the extreme cases where the current implementation fails to terminate. Returning 0 is satisfactory for the one extreme case known so far.

Since this only affects extreme cases in which the algorithm would otherwise never terminate, the behavior is expected to remain exactly the same for all non-extreme cases that already terminate.
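The sketch below reproduces the control flow of the patched loop outside libc++, as a minimal standalone version of the same mode-centred inversion search with the termination guard. It makes a few assumptions:

- The caller supplies the uniform variate u directly.
- The function computes the mode r0 = floor((t + 1) p), P(X = r0) and the odds ratio p / (1 - p) itself. These formulas are assumptions about what the libc++ parameters hold; they are not shown in this diff.
- binomial_mode_search is a hypothetical name. This is not libc++'s code and not the paper's exact algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical standalone sketch of the mode-centred search in the patched
// loop. `u` is a uniform variate supplied by the caller. The mode, P(mode)
// and odds-ratio formulas are illustrative assumptions, not libc++ code.
std::int64_t binomial_mode_search(std::int64_t t, double p, double u) {
    assert(t >= 0 && p >= 0.0 && p <= 1.0);
    if (t == 0 || p == 0.0)
        return 0;
    if (p == 1.0)
        return t;

    // Mode r0 = floor((t + 1) * p); clamp in case rounding pushes it past t.
    std::int64_t r0 = static_cast<std::int64_t>((t + 1) * p);
    if (r0 > t)
        r0 = t;
    // P(X = r0), computed through lgamma so large t does not overflow.
    const double pr = std::exp(std::lgamma(t + 1.0) - std::lgamma(r0 + 1.0) -
                               std::lgamma(static_cast<double>(t - r0) + 1.0) +
                               r0 * std::log(p) + (t - r0) * std::log1p(-p));
    const double odds = p / (1.0 - p);

    u -= pr;
    if (u < 0)
        return r0;
    double pu = pr, pd = pr;
    std::int64_t ru = r0, rd = r0;
    while (true) {
        bool exhausted = true;  // same role as __break in the patch
        if (rd >= 1) {
            // P(X = rd - 1) = P(X = rd) * rd / (odds * (t - rd + 1))
            pd *= rd / (odds * (t - rd + 1));
            u -= pd;
            exhausted = false;
            if (u < 0)
                return rd - 1;
        }
        if (rd != 0)
            --rd;
        ++ru;
        if (ru <= t) {
            // P(X = ru) = P(X = ru - 1) * (t - ru + 1) * odds / ru
            pu *= (t - ru + 1) * odds / ru;
            u -= pu;
            exhausted = false;
            if (u < 0)
                return ru;
        }
        // No term left on either side: u exceeds the accumulated mass, so
        // return 0 instead of looping forever (the patch's fallback).
        if (exhausted)
            return 0;
    }
}
```

For t = 10 and p = 0.5, the mode is 5 and P(X = 5) = 252/1024. So u = 0.1 yields 5, u = 0.3 yields 4, and u = 0.5 yields 6. An out-of-range u = 1.5 exhausts both directions and reaches the guard, which returns 0 instead of spinning.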

Fixes: https://bugs.llvm.org/show_bug.cgi?id=44847

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

atmnpatel created this revision. Feb 21 2020, 3:54 PM

Herald added a reviewer: EricWF. Feb 21 2020, 3:54 PM

Herald added a project: Restricted Project.

Herald added subscribers: libcxx-commits, dexonsmith.

Harbormaster completed remote builds in B47073: Diff 246030. Feb 21 2020, 4:02 PM

Is there any way to add a test for this? Perhaps something based on the reproducer in https://llvm.org/PR44847?

This revision now requires changes to proceed. Feb 25 2020, 4:04 PM

Yep, sorry, I forgot to add it before. It's a bit wordy and entirely based on the bug report.

Thanks for your contribution!

What email address do you want this committed with? In the future, please consider using arc diff to upload patches, since that automatically handles attribution.

Also, I'll admit I am torn on this change. On the one hand, I can confirm it fixes the issue reported in PR44847. On the other hand, I am not sure what it does to the algorithm itself. Our tests for <random> are not exactly great, and I neither understand nor have access to the algorithm that's implemented in binomial_distribution. So I can't say for sure that it doesn't have other side effects, such as making binomial_distribution no longer binomial. This is not a great place to be in.

Perhaps you can walk me through your thought process for deciding when the algorithm should break? @EricWF @mclow.lists have either of you fixed bugs like this in <random>? How do you do it?

Sorry, I was transitioning between machines while updating the commit and I couldn't figure out arc diff - I'd like it to be committed under a335pate@uwaterloo.ca.

I have access to the paper. I can confirm that, although the implementation is a convoluted rendering of the algorithm in the paper, it follows the algorithm as closely as it should. The most direct implementation of the algorithm is at least partially incorrect: I implemented it, and as far as I can tell it doesn't pass the most basic tests we have. The original algorithm as written is almost certainly erroneous, probably due to typographical mistakes.

The original algorithm consists of a few for-loops (with return conditions) plus a catch-all return condition. As I understand it, these have been merged rather cleverly into a single while loop that preserves their behavior. This edit makes the while loop terminate after an iteration in which neither branch does any work. That should correspond to the case in the original algorithm where it terminates because no good sample can be constructed from the initial uniform variate, and returns a fairly default value.
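Termination of the merged loop can be checked by replaying only its index bookkeeping. Take the worst case, in which u never goes negative. The guard then fires on the first iteration where rd has already reached 0 and ru has passed t, which is iteration max(r0, t - r0) + 1. Below is a minimal sketch; worst_case_iterations is a hypothetical helper, not part of libc++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: replays only the index bookkeeping of the patched
// while loop (rd walks down from the mode r0, ru walks up) and counts the
// iterations until the guard fires, i.e. the worst case in which u never
// drops below zero. The result equals max(r0, t - r0) + 1.
std::int64_t worst_case_iterations(std::int64_t t, std::int64_t r0) {
    assert(0 <= r0 && r0 <= t);
    std::int64_t rd = r0, ru = r0, iterations = 0;
    while (true) {
        ++iterations;
        bool exhausted = true;
        if (rd >= 1)
            exhausted = false;  // the loop would subtract P(X = rd - 1) here
        if (rd != 0)
            --rd;
        ++ru;
        if (ru <= t)
            exhausted = false;  // the loop would subtract P(X = ru) here
        if (exhausted)
            return iterations;
    }
}
```

For example, worst_case_iterations(10, 5) is 6 and worst_case_iterations(10, 0) is 11. Now take the PR44847 parameters, t = 128738942 and p ≈ 1.69e-8, and assume the mode is floor((t + 1) p) = 2. The same bound then gives 128738941 iterations. The patch guarantees termination, but the fallback path is linear in t.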

If we assume that the current implementation is in fact a functional implementation of the binomial_distribution, this edit will not change that fact, and will only return 0 in the cases where the current implementation will not terminate. Also I'd like to be really upfront about this - It returns 0 as opposed to following the paper in the catch-all return condition because if the paper is followed, the mean is nowhere near where it should be, and this is particularly evident in the case of the bug when the probability is set to be extremely small.

If instead we return 0, we end up with a better approximation to the binomial distribution in terms of mean, kurtosis, etc. Since this bug is really hard to reproduce, I can't say how generally this will hold, but it is empirically true given what we know now.

I did find newer papers on sampling from a binomial distribution, whose authors I could contact in case of confusion. However, we have an implementation that is known to work, and I wasn't sure it was appropriate to replace the sampling method entirely while claiming to fix a bug that is clearly caused by a missing termination guarantee. That said, I would be open to implementing a newer method for binomial_distribution if this seems like an unsatisfactory resolution.
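Both the reviewer's worry (is it still binomial?) and the reply rest on comparing sample moments with the binomial formulas, as the new test does. Below is a minimal sketch of such a check for a non-extreme configuration. It uses whatever std::binomial_distribution the standard library provides. The names check_binomial_moments and MomentCheck, the seed, and the tolerances are illustrative choices, not values from the patch.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Sample mean/variance of n draws and their relative errors against the
// binomial formulas E[X] = t*p and Var[X] = t*p*(1-p).
struct MomentCheck {
    double mean;
    double var;
    double mean_rel_err;
    double var_rel_err;
};

// Hypothetical helper; requires 0 < p < 1 so the expected moments are nonzero.
MomentCheck check_binomial_moments(int t, double p, int n, unsigned seed) {
    assert(t > 0 && p > 0.0 && p < 1.0 && n > 0);
    std::mt19937 g(seed);
    std::binomial_distribution<int> d(t, p);
    std::vector<int> u(static_cast<std::vector<int>::size_type>(n));
    for (int& v : u)
        v = d(g);

    double mean = 0;
    for (int v : u)
        mean += v;
    mean /= n;

    double var = 0;
    for (int v : u)
        var += (v - mean) * (v - mean);
    var /= n;

    const double x_mean = t * p;            // E[X]
    const double x_var = x_mean * (1 - p);  // Var[X]
    return {mean, var, std::abs((mean - x_mean) / x_mean),
            std::abs((var - x_var) / x_var)};
}
```

With t = 20, p = 0.3 and 200000 draws, the true mean is 6 and the standard error of the sample mean is about 0.0046. A 1% tolerance on the mean therefore leaves a wide margin.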

Could you please upload this diff with more context?
(https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface)

This revision now requires changes to proceed. Mar 11 2020, 8:00 PM

atmnpatel edited the summary of this revision. Mar 13 2020, 4:21 PM

Better? Or did I misunderstand the request? The paragraph about updating a patch on the Code Reviews page seems to be mainly about changes to the content of the patch, rather than changes to the commit message or the context provided.

In D74997#1918498, @atmnpatel wrote:

Sorry, I was transitioning between machines while updating the commit and I couldn't figure out arc diff - I'd like it to be committed under a335pate@uwaterloo.ca.

Ok, will do.

[...]

If we assume that the current implementation is in fact a functional implementation of the binomial_distribution, this edit will not change that fact, and will only return 0 in the cases where the current implementation will not terminate. Also I'd like to be really upfront about this - It returns 0 as opposed to following the paper in the catch-all return condition because if the paper is followed, the mean is nowhere near where it should be, and this is particularly evident in the case of the bug when the probability is set to be extremely small. [...]

Ok, reading your explanation, I believe I follow. Thanks for explaining. I'm fine with the change.

In D74997#1918581, @EricWF wrote:

Could you please upload this diff with more context?

@atmnpatel What Eric meant here is to upload a patch with diff context. For example, see https://reviews.llvm.org/D76288, where you can expand and collapse the code surrounding the changes; this revision instead says "Context not available". We prefer reviewing patches with diff context because it's easier, but this time around it's okay.

Thanks a lot for your contribution, I'm going to commit this for you now under a335pate@uwaterloo.ca.

This revision was not accepted when it landed; it landed in state Needs Revision. Mar 17 2020, 12:58 PM

Closed by commit rG51b78a3e06d4: [libc++] Bugfix to std::binomial_distribution<int> (authored by atmnpatel, committed by ldionne).

This revision was automatically updated to reflect the committed changes.

Herald added a reviewer: Restricted Project. Mar 17 2020, 12:58 PM

Revision Contents

libcxx/include/random (5 lines)

libcxx/test/std/numerics/rand/rand.dis/rand.dist.bern/rand.dist.bern.bin/eval.pass.cpp (141 lines)

Diff 250877

libcxx/include/random

[...] binomial_distribution<_IntType>::operator()(_URNG& __g, const param_type& __pr)
     if (__u < 0)
         return __pr.__r0_;
     double __pu = __pr.__pr_;
     double __pd = __pu;
     result_type __ru = __pr.__r0_;
     result_type __rd = __ru;
     while (true)
     {
+        bool __break = true;
         if (__rd >= 1)
         {
             __pd *= __rd / (__pr.__odds_ratio_ * (__pr.__t_ - __rd + 1));
             __u -= __pd;
+            __break = false;
             if (__u < 0)
                 return __rd - 1;
         }
         if ( __rd != 0 )
             --__rd;
         ++__ru;
         if (__ru <= __pr.__t_)
         {
             __pu *= (__pr.__t_ - __ru + 1) * __pr.__odds_ratio_ / __ru;
             __u -= __pu;
+            __break = false;
             if (__u < 0)
                 return __ru;
         }
+        if (__break)
+            return 0;
     }
 }

 template <class _CharT, class _Traits, class _IntType>
 basic_ostream<_CharT, _Traits>&
 operator<<(basic_ostream<_CharT, _Traits>& __os,
            const binomial_distribution<_IntType>& __x)
 {
[...]
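The two `*=` updates in this hunk step the binomial probability mass function outward from the mode. The derivation below sketches why they have that form. It writes q = 1 - p and θ = p / q; reading θ as the quantity `__odds_ratio_` holds is an assumption based on its name and on how it is used here.

```latex
P(X = r) = \binom{t}{r} p^{r} q^{t-r}, \qquad q = 1 - p, \qquad \theta = \frac{p}{q}

\frac{P(X = r + 1)}{P(X = r)} = \frac{t - r}{r + 1}\,\theta
\;\Longrightarrow\;
P(X = r_u) = P(X = r_u - 1)\,\frac{(t - r_u + 1)\,\theta}{r_u}

\frac{P(X = r - 1)}{P(X = r)} = \frac{r}{(t - r + 1)\,\theta}
\;\Longrightarrow\;
P(X = r_d - 1) = P(X = r_d)\,\frac{r_d}{\theta\,(t - r_d + 1)}
```

The search first subtracts P(X = r0) from u. It then alternately subtracts P(X = rd - 1) and P(X = ru), and returns the first index at which u goes negative. Mathematically the terms sum to 1, which exceeds u. In floating point, though, the accumulated mass can presumably stay below u in the extreme cases described in the summary. Once rd = 0 and ru > t no term is left, and that iteration is what the `__break` flag detects.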

libcxx/test/std/numerics/rand/rand.dis/rand.dist.bern/rand.dist.bern.bin/eval.pass.cpp

[...]
 // class binomial_distribution

 // template<class _URNG> result_type operator()(_URNG& g);

 #include <random>
 #include <numeric>
 #include <vector>
 #include <cassert>
+#include <sstream>

 #include "test_macros.h"

 template <class T>
 inline
 T
 sqr(T x)
 {
[...]
     // double x_skew = (1-2*d.p()) / std::sqrt(x_var);
     // double x_kurtosis = (1-6*d.p()*(1-d.p())) / x_var;
     assert(mean == x_mean);
     assert(var == x_var);
     // assert(skew == x_skew);
     // assert(kurtosis == x_kurtosis);
 }

+void
+test12()
+{
+    typedef std::binomial_distribution<> D;
+    typedef std::mt19937 G;
+
+    G g;
+    D d(128738942, 1.6941441471907126e-08);
+
+    // Engine state from the PR44847 reproducer (624 words). Adjacent string
+    // literals are concatenated with no separator, so each literal ends with
+    // a space to keep neighbouring numbers from fusing.
+    std::string state = "1740222423 1665913615 1140355283 124152834 434145240 "
+        "2553002688 4143320714 1810519474 447745536 1439409640 1596060396 1243637295 "
+        "452117361 734967774 3276935081 35650473 682607275 4208082251 3209082916 "
+        "638915489 4127185595 2859436515 309105096 837982734 796854873 4271538185 "
+        "2447193692 607594006 4035165093 4230150671 2567368782 1000242037 2469514821 "
+        "1843373462 1751084370 1033341643 3506396674 4169541123 1191187784 3479797390 "
+        "3785371742 1475391851 878730063 2661164420 63166678 4127393159 2797714867 "
+        "1295211604 2717051330 1009514623 1963164571 561646784 819612826 3340955171 "
+        "1338523647 1675643732 458583760 698472119 3233594836 2901754568 4222994242 "
+        "51167459 2501563254 2175997686 1673326467 3722097469 2183287831 2155925807 "
+        "1071447253 1857934241 320830903 1514449149 103571877 839083116 3893321384 "
+        "4236495022 766393502 2729490440 290181118 3191537542 1077578150 3066185245 "
+        "3193085445 3786728494 2938418649 3410121447 1453867071 698346001 3037921161 "
+        "839425565 2245305640 2806447261 3196149514 1071872132 2337761397 3632554165 "
+        "190093341 4248613644 2372806256 3290113603 3852853233 272818390 3168842643 "
+        "25788407 4197010683 3864965812 1635548247 2364439227 3344377087 4284620573 "
+        "3351117493 3398532219 2757166123 1127999905 2988564217 3707129726 3652489018 "
+        "4035370271 475801332 2109377392 2128345729 3920803035 4271338685 509459802 "
+        "4158256844 1850467175 1579214935 8921175 4068350958 2951987840 506827330 "
+        "3520651040 3359838267 1120109827 3917280670 2748947423 3672973280 1566164613 "
+        "2986317531 1204099196 3080678121 3574913280 4009316336 3034181160 1818230129 "
+        "3757769877 2464713972 2812294843 782960615 1228678223 2571358051 4260066020 "
+        "2439643840 3500737183 1433940923 1031851687 190066625 3777385171 4142770213 "
+        "3539275502 1622933657 425231043 3715607557 260333136 4198959706 358418 "
+        "1799817566 2839827743 437785672 2967249029 949856347 2081447702 1102224171 "
+        "479701434 1781895167 2965560025 264797633 2564778619 2515037023 1320978995 "
+        "780140943 2372404879 3823445604 2917613108 143505740 3507288260 2803553229 "
+        "4195962819 4072604717 3155823087 323755011 601944215 1840441037 2850820195 "
+        "915623058 3306124208 3069788039 1553985704 411632899 3200645375 2973968812 "
+        "4263574437 3360058162 144760024 845487010 3508028432 4091510967 3925394277 "
+        "71566492 3432433113 3266920114 3539050491 51719451 1245373835 1469278112 "
+        "3298302496 753088653 2942352102 1565378440 494477947 971879195 482756304 "
+        "2475493857 143180757 324876427 1610205542 1829295320 1937949038 3733336232 "
+        "2542145235 3636527510 2347609126 2343078538 2526896356 167862270 2299577281 "
+        "3382958264 1911078293 531208917 3588214476 1086101513 1838672874 2119663667 "
+        "491092052 2961424745 3048925589 486607333 3505822195 3888367 2949031946 "
+        "2684841832 433147539 2333660325 3142554719 435207743 3063000516 4043979879 "
+        "3290075088 1114755542 18368971 876637247 3352816011 1421909753 3339898083 "
+        "740553432 3682683666 2699730850 2861403632 1971653904 900380480 2635160544 "
+        "1318218867 411940 2141321523 2349820793 280562368 3816712514 3790707429 "
+        "1619023591 2858103376 1462886666 1723686126 3766879240 1918781537 2792938366 "
+        "166155425 803108075 722833545 220020495 880029214 2901984266 609985526 "
+        "1367283597 132804580 903066665 131582208 64374393 2006102725 3422930158 "
+        "4209296423 380263053 3978926691 3310851236 4245770487 4166043866 4080757525 "
+        "3329599259 258706185 2452129516 3191265966 2958285912 1070664670 921876197 "
+        "2421722823 2568477756 382467393 2196144533 213270233 116974426 2230947214 "
+        "2576421741 393776471 2796472698 3647710433 3264988906 726903864 817800486 "
+        "628224092 2707785007 3517963926 596980027 2466711387 3156540408 2517803670 "
+        "3408123552 4142066739 3779818910 2988899011 2732117432 3579427018 1513070048 "
+        "1566052861 20225341 997297613 3219855094 2777075639 1656025420 3670325076 "
+        "1469330501 3061438653 4264717436 1305791144 1237197751 2943926634 1566843825 "
+        "3359878993 4037226997 4044024653 3611863927 1375344610 211134383 2406252392 "
+        "1349912770 1023874273 1912665158 1762983936 407124872 2936278199 1821966634 "
+        "3337187112 2546090236 2594870585 823411965 126464686 4041388220 1686530706 "
+        "2780657745 1945569350 203691199 2532411242 1830339266 674003798 2192329968 "
+        "2425624005 2819484460 3743368462 2565769418 4179439526 216134386 2880090718 "
+        "623297558 3913067470 745959159 2499436157 373025119 3423124368 2522302278 "
+        "3719518513 999390119 159673547 228111094 3391079061 1761352720 2549048062 "
+        "1125219697 2052834337 3743842626 2433549637 3636723358 1860785315 2387664013 "
+        "581140755 11086848 4199179079 1180488689 2060816030 2550665319 1314472090 "
+        "1402807876 304522082 3382175195 4260677857 1724818219 2183493354 1004322779 "
+        "4166984056 889220724 553883566 582971548 2046113107 2080208105 3473121134 "
+        "1959681858 1840897428 2595714120 855065022 4191762128 3679914005 3623561445 "
+        "3437337182 3269107597 2019021510 2112281155 985458687 1364815423 514093990 "
+        "3711847302 704129707 3398127374 517373404 2646977457 3048605419 1372350917 "
+        "3831335422 2263542968 2283942504 2193996512 824623017 1707815852 3337156739 "
+        "2301398895 2077322758 193542893 869960695 3878520140 403616946 3228943765 "
+        "1037753596 2949947821 379992823 2251850209 1614533146 1704886337 108361232 "
+        "3840616436 2932809257 2375700648 596391307 4226846855 191943050 1271990524 "
+        "1335422537 3085696177 2030313449 3272604577 1148556450 1184357181 3558074012 "
+        "3259720214 2755915415 2720703536 31861322 1740307221 1860884298 3922103763 "
+        "4066872392 1756734488 394294796 2505236387 2456914682 639788702 52063410 "
+        "2855173018 3307964490 556762160 3624145788 3793504468 4252003358 3690335184 "
+        "688245281 2259823605 2617950220 1045718164 3091539813 1330130477 736722350 "
+        "3100437052 3900855736 4183439368 1735720081 2644768495 819274730 2364834023 "
+        "1393374098 981219339 3969251105 332522940 850159909 646738867 3413137687 "
+        "1646732884 80027487 2196948979 5295580 3530173036 767814907 2573204209 "
+        "491686200 1287955820 3095830596 2152743903 1738320986 1900678059 3613699900 "
+        "3076191184 2917243255 2236492002 3504114019 604643631 3324769580 4078090927 "
+        "1245379462 2026215662 1566278916 2832509655 3010562339 1269806412 835199342 "
+        "2561789927 163108895 524878390 833167775 3551760739 3008059185 1133970834 "
+        "26821616 2846321927 3803209991 581001826 3764614926 3893778555 617853085 "
+        "183809431 3510530944 350044681 429839558 1238110552 265276207 205294443 "
+        "3092821176 2003027316 2577165836 1277274629 87531073 63821123 354781812 "
+        "3700767000 3451421881 990626144 3763226681 3373715717 2360928651 2412110189 "
+        "3362121672 3080578947 1604861935 3186376735 2989392261 3550022914 756392571 "
+        "1580512570 2584626785 2727753459 730699388 3379402897 3050444856 2244390108 "
+        "3941150486 1990708800 2462735 195459645 761670582 1067695927 984662039 "
+        "2678082647 1839150009 2113552968 406021267 2193154754 720977131 2445722325 "
+        "2482507181 2062595810 1015226482";
+
+    std::istringstream iss(state);
+    iss >> g;
+    assert(!iss.fail());  // make sure the reproducer state was actually loaded
+
+    const int N = 1000000;
+    std::vector<D::result_type> u;
+    for (int i = 0; i < N; ++i)
+    {
+        D::result_type v = d(g);
+        assert(d.min() <= v && v <= d.max());
+        u.push_back(v);
+    }
+    double mean = std::accumulate(u.begin(), u.end(),
+                                  double(0)) / u.size();
+    double var = 0;
+    double skew = 0;
+    double kurtosis = 0;
+    for (unsigned i = 0; i < u.size(); ++i)
+    {
+        double dbl = (u[i] - mean);
+        double d2 = sqr(dbl);
+        var += d2;
+        skew += dbl * d2;
+        kurtosis += d2 * d2;
+    }
+    var /= u.size();
+    double dev = std::sqrt(var);
+    skew /= u.size() * dev * var;
+    kurtosis /= u.size() * var * var;
+    kurtosis -= 3;
+    double x_mean = d.t() * d.p();
+    double x_var = x_mean*(1-d.p());
+    double x_skew = (1-2*d.p()) / std::sqrt(x_var);
+    double x_kurtosis = (1-6*d.p()*(1-d.p())) / x_var;
+    assert(std::abs((mean - x_mean) / x_mean) < 0.01);
+    assert(std::abs((var - x_var) / x_var) < 0.01);
+    assert(std::abs((skew - x_skew) / x_skew) < 0.01);
+    assert(std::abs((kurtosis - x_kurtosis) / x_kurtosis) < 0.04);
+}

 int main(int, char**)
 {
     test1();
     test2();
     test3();
     test4();
     test5();
     test6();
     test7();
     test8();
     test9();
     test10();
     test11();
+    test12();

     return 0;
 }
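The test12 reproducer restores a saved mt19937 state from a string built out of many adjacent string literals. Adjacent literals are concatenated with no separator, so every literal must end (or begin) with a space. It is also worth checking that extraction actually succeeded. A fused or truncated state sets failbit, and the engine requirements say the engine is then left unchanged, so the test would silently run from the default seed. The sketch below illustrates both points; load_state and check_round_trip are hypothetical helpers.

```cpp
#include <cassert>
#include <random>
#include <sstream>
#include <string>

// Adjacent string literals are concatenated with no separator.
const std::string fused = "10 20" "30";       // "10 2030": 20 and 30 fused
const std::string separated = "10 20 " "30";  // "10 20 30"

// Hypothetical helper: restores an mt19937 from its textual state and reports
// whether extraction succeeded (bad or truncated input sets failbit).
bool load_state(const std::string& state, std::mt19937& g) {
    std::istringstream iss(state);
    iss >> g;
    return !iss.fail();
}

// Hypothetical check: serialize an engine, restore it into a fresh one and
// confirm that the two engines compare equal.
void check_round_trip() {
    std::mt19937 a(12345);
    a.discard(1000);
    std::ostringstream oss;
    oss << a;
    std::mt19937 b;
    const bool loaded = load_state(oss.str(), b);
    assert(loaded && a == b);
    (void)loaded;  // keep NDEBUG builds warning-free
}
```

A string such as "1 2 3" holds far fewer than the 624 words an mt19937 state needs, so load_state reports failure for it.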