This is an archive of the discontinued LLVM Phabricator instance.

[Support] Add saturating multiply-add support function
ClosedPublic

Authored by slingn on Dec 9 2015, 10:51 AM.

Details

Summary

Add SaturatingMultiplyAdd convenience function template since A + (X * Y) comes up frequently when doing weighted arithmetic.

Diff Detail

Event Timeline

slingn updated this revision to Diff 42319.Dec 9 2015, 10:51 AM
slingn retitled this revision from to [Support] Add saturating multiply-add support function.
slingn updated this object.
slingn added reviewers: davidxl, silvas.
slingn added a subscriber: llvm-commits.
davidxl added inline comments.Dec 9 2015, 2:29 PM
include/llvm/Support/MathExtras.h
727

Do we want types of X and Y to be same ? For instance, the weight can be less than 1.

Also we may want to have an overloaded function that is used for common cases where one of the X or Y is constant:

template <typename T, uint64_t Multiplier>
​typename std::enable_if<std::is_unsigned<T>::value, T>::type ​ SaturatingMultiplyAdd(T X, T A, bool *ResultOverflowed = nullptr) {
...
}

it is probably just slightly more efficient so probably not in this patch.

silvas edited edge metadata.Dec 9 2015, 7:09 PM

it is probably just slightly more efficient so probably not in this patch.

If we are looking to improve efficiency, the clearest improvement to make is to use __builtin_*_with_overflow which will dramatically reduce the cost since we can directly use the processor flags instead of using some explicit calculation.

silvas added a comment.Dec 9 2015, 7:25 PM

Generally in LLVM for this kind of patch, we would add a couple motivating uses as well instead of as a separate patch. That helps show the context where it was intended to be used and that adding the helper is, in fact, a net win.

include/llvm/Support/MathExtras.h
724

I think it's worth adding a note indicating that because all these values are unsigned, there is no distinction between a "fused" operation and non-fused from the perspective of whether saturation occurred.

unittests/Support/MathExtrasTest.cpp
314

I don't think we should bother to repeatedly check the case without the ResultOverflowed (except maybe one case to make sure that it compiles (i.e. that the default argument does in fact have a default)).

slingn marked 2 inline comments as done.Dec 10 2015, 10:17 AM
slingn added inline comments.
include/llvm/Support/MathExtras.h
727

X and Y are the multiplicands. Why would they need to be different types for just the unsigned arithmetic case? I guess the addend (A) could usefully be a negative number but that's inconsistent with SaturatingAdd() as it is currently defined.

The pending weighted profile change (D15306) doesn't support weight < 1.

davidxl added inline comments.Dec 10 2015, 10:23 AM
include/llvm/Support/MathExtras.h
727

Ok -- sounds reasonable.

it is probably just slightly more efficient so probably not in this patch.

If we are looking to improve efficiency, the clearest improvement to make is to use __builtin_*_with_overflow which will dramatically reduce the cost since we can directly use the processor flags instead of using some explicit calculation.

That would be great - and would really simplify the implementation of saturating arithmetic The only caveat is that the builtin_*_overflow intrinsics aren't supported by every compiler that hosts LLVM. For example, GCC 4.x doesn't have builtin_add_overflow() - GCC 5 is required (https://gcc.gnu.org/gcc-5/changes.html). So there would have to be fall-back implementations.

it is probably just slightly more efficient so probably not in this patch.

If we are looking to improve efficiency, the clearest improvement to make is to use __builtin_*_with_overflow which will dramatically reduce the cost since we can directly use the processor flags instead of using some explicit calculation.

That would be great - and would really simplify the implementation of saturating arithmetic The only caveat is that the builtin_*_overflow intrinsics aren't supported by every compiler that hosts LLVM. For example, GCC 4.x doesn't have builtin_add_overflow() - GCC 5 is required (https://gcc.gnu.org/gcc-5/changes.html). So there would have to be fall-back implementations.

Yeah, it would require some ifdef's and would be extra work. Like I said, "if" we are looking to improve efficiency. I don't think this code at the moment has been measured to be a perf problem though. Also, for PS4, our shipping toolchain is currently compiled with MSVC so this would not affect us anyway (if we used clang-cl, then we could use the __builtin_*_overflow intrinsics, but while clang-cl would be great, I don't think we are planning to do that soon).

slingn updated this revision to Diff 43064.Dec 16 2015, 2:22 PM
slingn edited edge metadata.

Updated for silvas comments.
-Apply SaturatingMultplyAdd() to weighted PGO merging.

slingn updated this revision to Diff 44669.Jan 12 2016, 1:42 PM

Update patch to apply cleanly to HEAD.

davidxl accepted this revision.Jan 12 2016, 1:47 PM
davidxl edited edge metadata.

Very nice cleanups .. LGTM.

This revision is now accepted and ready to land.Jan 12 2016, 1:47 PM
This revision was automatically updated to reflect the committed changes.