Add SaturatingMultiplyAdd convenience function template since A + (X * Y) comes up frequently when doing weighted arithmetic.
Details
Diff Detail
Event Timeline
include/llvm/Support/MathExtras.h | ||
---|---|---|
727 | Do we want types of X and Y to be same ? For instance, the weight can be less than 1. Also we may want to have an overloaded function that is used for common cases where one of the X or Y is constant: template <typename T, uint64_t Multiplier> it is probably just slightly more efficient so probably not in this patch. |
it is probably just slightly more efficient so probably not in this patch.
If we are looking to improve efficiency, the clearest improvement to make is to use __builtin_*_with_overflow which will dramatically reduce the cost since we can directly use the processor flags instead of using some explicit calculation.
Generally in LLVM for this kind of patch, we would add a couple motivating uses as well instead of as a separate patch. That helps show the context where it was intended to be used and that adding the helper is, in fact, a net win.
include/llvm/Support/MathExtras.h | ||
---|---|---|
724 | I think it's worth adding a note indicating that because all these values are unsigned, there is no distinction between a "fused" operation and non-fused from the perspective of whether saturation occurred. | |
unittests/Support/MathExtrasTest.cpp | ||
314 | I don't think we should bother to repeatedly check the case without the ResultOverflowed (except maybe one case to make sure that it compiles (i.e. that the default argument does in fact have a default)). |
include/llvm/Support/MathExtras.h | ||
---|---|---|
727 | X and Y are the multiplicands. Why would they need to be different types for just the unsigned arithmetic case? I guess the addend (A) could usefully be a negative number but that's inconsistent with SaturatingAdd() as it is currently defined. The pending weighted profile change (D15306) doesn't support weight < 1. |
include/llvm/Support/MathExtras.h | ||
---|---|---|
727 | Ok -- sounds reasonable. |
That would be great - and would really simplify the implementation of saturating arithmetic The only caveat is that the builtin_*_overflow intrinsics aren't supported by every compiler that hosts LLVM. For example, GCC 4.x doesn't have builtin_add_overflow() - GCC 5 is required (https://gcc.gnu.org/gcc-5/changes.html). So there would have to be fall-back implementations.
Yeah, it would require some ifdef's and would be extra work. Like I said, "if" we are looking to improve efficiency. I don't think this code at the moment has been measured to be a perf problem though. Also, for PS4, our shipping toolchain is currently compiled with MSVC so this would not affect us anyway (if we used clang-cl, then we could use the __builtin_*_overflow intrinsics, but while clang-cl would be great, I don't think we are planning to do that soon).
I think it's worth adding a note indicating that because all these values are unsigned, there is no distinction between a "fused" operation and non-fused from the perspective of whether saturation occurred.