Fixes PR#51520. The problem is that uniform_int_distribution currently uses an unsigned integer of at most 64 bits internally, which is then cast to the desired result type. If the result type is int64_t, the cast produces a negative number when the most significant bit is set. If the result type is __int128_t, the value stays non-negative and ends up out of bounds for the example in PR#51520. (It also seems to work when the upper or lower bound is changed, because this branch is then no longer taken and proper rejection sampling takes place.)
The bigger issue here is probably that uniform_int_distribution can be instantiated with __int128_t but will silently produce incorrect results (only the lowest 64 bits can ever be set). libstdc++ also supports __int128_t as a result type, so I have simply extended the maximum width of the internal intermediate result type.
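To make the cast behaviour concrete, here is a minimal sketch. It is not the libc++ implementation: `from_intermediate` and the constants are hypothetical names that only mimic "64-bit unsigned intermediate, then cast to the result type". It assumes a compiler that provides the `__int128_t` extension (GCC or Clang).

```cpp
#include <cstdint>

// Hypothetical helper (not libc++ code): converts a 64-bit unsigned
// intermediate value to the requested result type with a plain cast.
template <class Result>
constexpr Result from_intermediate(std::uint64_t raw) {
    return static_cast<Result>(raw);
}

// An intermediate value with only the most significant bit set.
constexpr std::uint64_t msb_only = std::uint64_t{1} << 63;

// For int64_t the value wraps to a negative number (guaranteed since
// C++20; implementation-defined before that, but GCC and Clang wrap).
constexpr std::int64_t as_i64 = from_intermediate<std::int64_t>(msb_only);

// For __int128_t (a GCC/Clang extension) the value stays at +2^63.
constexpr __int128_t as_i128 = from_intermediate<__int128_t>(msb_only);

// The largest value a 64-bit intermediate can ever yield is 2^64 - 1,
// so bits 64 and above of a 128-bit result are never set.
constexpr __int128_t max_reachable = from_intermediate<__int128_t>(UINT64_MAX);

static_assert(as_i64 < 0, "wraps to negative for int64_t");
static_assert(as_i128 > 0, "stays positive for __int128_t");
static_assert((max_reachable >> 64) == 0, "upper 64 bits unreachable");
```

The same bit pattern is therefore interpreted differently depending on the result type, which matches the asymmetry described above; widening the intermediate type removes it.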
Does this do the same thing as __bit_log2 in <bit>?