Prevent overflow from intermediate results when adding UInt<N> values.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libc/src/__support/CPP/UInt.h | ||
---|---|---|
74 | Hi, Tue! I was thinking about the implementation. I little bit worries about the performance. To many if here. I would like to propose you another solution. Of course it is up to you to accept it or not. The strongest point of the solution is slightly reduced number of arithmetic operations and only one "main" if. The second one triggered very rare. uint64_t carry = 0; for (size_t i = 0; i < WordCount; ++i) { // Will be wrapped if sum more than 2^(sizeof(x)) - 1 val[i] += x.val[i]; // If an overflow appears, the result is less than both of the initial // variables if (val[i] < x.val[i]) { // Add previous carry. Overflow is not possible. val[i] += carry; // Put 1 to the next digits. carry = 1; } else { val[i] += carry; // Likely no overflow. if (likely(val[i]) != 0) carry = 0; // else carry keeps value in case of carry = 0 it is simply 0 with // no overflow in case of 1 this made overflow and propagates next. } } return carry; } |
libc/src/__support/CPP/UInt.h | ||
---|---|---|
74 | Hi Kirill, thanks for thinking about improving the performance! I should have added some more background to this patch. The main reason I had to make it a bit complicated is that when using UInt<128> to replace __uint128_t (which we will need to do for targets without __uint128_t builtin supports), it failed some tests that check overflow flags, which are set by the intermediate computation such as val[i] += x.val[i] in your improvement or the previous implementation, while the overall __uint128_t addition is not overflowed. Of course this change will make another issue pop up, that is now we don't set overflow flag/trap when the real sum in __uint128_t is overflowed. That's why I left some todo for later patches. |
libc/src/__support/CPP/UInt.h | ||
---|---|---|
74 | Thank you for the reply. it was mistake, I should read the title more carefully. Your solution looks good and beautiful to solve the problem.
So you can get "true overflow" flag in the end or you can clear the flag in the end, so the behavior will be the same. |
Did you consider using builtins with overflow checking:
https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html
It is platform independent and leaves some optimisations opportunities to the compiler.
In general this is a good idea, but it can be two problems:
- It still can setup flags, which should be avoided.
- I'm worried is it already exists on this stage. I had a problem, that some builtins were not exists. See https://github.com/llvm/llvm-project/commit/27aca975b6b6e9d5c7516c091f954884b28650ae for example.
Thanks for your suggestion! We definitely want to use compiler builtins to improve the performance whenever they are available, and fall back to generic implementation otherwise.
As of now, this code is only used in a very limited way for arm32 platforms that do not have __uint128_t builtins, and we haven't finished setting up other requirements such as flags, environments or performance testings yet.
Once those are set, we will followup with performance enhancement using builtins, together with more comprehensive testings, including exception flags and performance, and making sure that the fallback works properly just as @orex mentioned.
libc/src/__support/CPP/UInt.h | ||
---|---|---|
74 | Thanks! We can definitely have followups with performance enhancement per architecture targets once this class is used in more places, together with more comprehensive testings. Re P.S.: Sorry I didn't dig in too much to see why it overflowed. It's possible that some compiler optimizations were too aggressive recognizing that the operations could be done in 32-bits and used such; or they recognized the pattern and reduced it to uint64_t adds with carry bit checks? Anyhow, I plan to use this safe implementation as our baseline and extend it to subtraction next. We definitely should have specialized versions with builtins / clear flags for x86_64 and aarch64 to improve the performance, as we are going to use more than 128-bit integers very soon. |
Hi, Tue!
I was thinking about the implementation. I little bit worries about the performance. To many if here. I would like to propose you another solution. Of course it is up to you to accept it or not. The strongest point of the solution is slightly reduced number of arithmetic operations and only one "main" if. The second one triggered very rare.