Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
| lib/Support/APInt.cpp | ||
|---|---|---|
| 1912 ↗ | (On Diff #195067) | In this form, this will only be correct if the bit-width is *exactly* 64-bit. For smaller bit-widths you'll have to consider a combination of the intrinsic overflow flag, and whether any bits beyond the bit-width are set. (It might make sense to also special-case <= 32 bits, in which case checking bits in the result of a normal multiply would be sufficient.) |
I think even if it doesn't address a specific performance problem, this is a good change (well, once it is correct...), because you don't have to treat mul_ov as an expensive operation anymore. E.g. after this change I think we could drop the code in https://github.com/llvm-mirror/llvm/blob/2b38b84a12ff96d6862b7927f92cbdae77653deb/lib/Analysis/ValueTracking.cpp#L4001-L4008 and the big comment in the function without being concerned about a possible perf impact.
Is this trying to address some performance problem? Any numbers?
I'm using a -DBUILD_SHARED_LIBS=on build so there is some indirection cost. Anyway, The performance improvement is obvious:
#include <cstdlib>
#include <llvm/ADT/APInt.h>
using namespace std;
#define FOR(i, a, b) for (remove_cv<remove_reference<decltype(b)>::type>::type i = (a); i < (b); i++)
#define REP(i, n) FOR(i, 0, n)
int main() {
const unsigned N = 64;
REP(i, 10000) {
unsigned a = rand(), b = rand();
REP(i, 40000) {
bool overflow;
llvm::APInt A(N, a), B(N, b);
auto C = A.umul_ov(B, overflow);
asm volatile("" : : "r"(C[0]), "r"(overflow) : "memory");
}
}
}Before (two udiv calls; actually one suffices): 12.8129 +- 0.0952 seconds time elapsed ( +- 0.74% )
After (leading zeros + mul + shift + plus): 3.6257 +- 0.0296 seconds time elapsed ( +- 0.82% )
| lib/Support/APInt.cpp | ||
|---|---|---|
| 1912 ↗ | (On Diff #195067) | Good catch. I forgot about that. Changed the purpose of this patch. |
I have also fuzzed it for some time with libFuzzer to give me more confidence that it is correct.
#include <llvm/ADT/APInt.h>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
using namespace llvm;
unsigned n;
uint64_t a, b;
memcpy(&n, Data, 1);
n = n%64+1;
memcpy(&a, Data+1, 8);
memcpy(&b, Data+9, 8);
bool overflow, overflow1;
APInt A(n, a), B(n, b), C = A.umul_ov(B, overflow), D = A * B;
if (C != D) __builtin_trap();
if (A != 0 && B != 0) overflow1 = D.udiv(A) != B;
else overflow1 = false;
if (overflow != overflow1) __builtin_trap();
return 0;
}LGTM. It might still make sense to optimize the single word case separately, but for the general case I don't think we can do much better than this.
| lib/Support/APInt.cpp | ||
|---|---|---|
| 1919 ↗ | (On Diff #195158) | Is there a reason to use operator*(RHS) over *this * RHS here? |