Use add_with_carry builtin to improve the performance of
addition and multiplication of UInt class. For 128-bit, it is as
fast as using __uint128_t.
Microbenchmark for addition:
https://quick-bench.com/q/-5a6xM4T8rIXBhqMTtLE-DD2h8w
Microbenchmark for multiplication:
https://quick-bench.com/q/P2muLAzJ_W-VqWCuxEJ0CU0bLDg
Microbenchmark for shift right:
https://quick-bench.com/q/N-jkKXaVsGQ4AAv3k8VpsVkua5Y
Microbenchmark for shift left:
https://quick-bench.com/q/5-RzwF8UdslC-zuhNajXtXdzLRM
is there a reason this is changed?