Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Support/APInt.cpp | ||
---|---|---|
1912 ↗ | (On Diff #195067) | In this form, this will only be correct if the bit-width is *exactly* 64-bit. For smaller bit-widths you'll have to consider a combination of the intrinsic overflow flag, and whether any bits beyond the bit-width are set. (It might make sense to also special-case <= 32 bits, in which case checking bits in the result of a normal multiply would be sufficient.) |
I think even if it doesn't address a specific performance problem, this is a good change (well, once it is correct...), because you don't have to treat mul_ov as an expensive operation anymore. E.g. after this change I think we could drop the code in https://github.com/llvm-mirror/llvm/blob/2b38b84a12ff96d6862b7927f92cbdae77653deb/lib/Analysis/ValueTracking.cpp#L4001-L4008 and the big comment in the function without being concerned about a possible perf impact.
Is this trying to address some performance problem? Any numbers?
I'm using a -DBUILD_SHARED_LIBS=on build so there is some indirection cost. Anyway, The performance improvement is obvious:
#include <cstdlib> #include <llvm/ADT/APInt.h> using namespace std; #define FOR(i, a, b) for (remove_cv<remove_reference<decltype(b)>::type>::type i = (a); i < (b); i++) #define REP(i, n) FOR(i, 0, n) int main() { const unsigned N = 64; REP(i, 10000) { unsigned a = rand(), b = rand(); REP(i, 40000) { bool overflow; llvm::APInt A(N, a), B(N, b); auto C = A.umul_ov(B, overflow); asm volatile("" : : "r"(C[0]), "r"(overflow) : "memory"); } } }
Before (two udiv calls; actually one suffices): 12.8129 +- 0.0952 seconds time elapsed ( +- 0.74% )
After (leading zeros + mul + shift + plus): 3.6257 +- 0.0296 seconds time elapsed ( +- 0.82% )
lib/Support/APInt.cpp | ||
---|---|---|
1912 ↗ | (On Diff #195067) | Good catch. I forgot about that. Changed the purpose of this patch. |
I have also fuzzed it for some time with libFuzzer to give me more confidence that it is correct.
#include <llvm/ADT/APInt.h> extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) { using namespace llvm; unsigned n; uint64_t a, b; memcpy(&n, Data, 1); n = n%64+1; memcpy(&a, Data+1, 8); memcpy(&b, Data+9, 8); bool overflow, overflow1; APInt A(n, a), B(n, b), C = A.umul_ov(B, overflow), D = A * B; if (C != D) __builtin_trap(); if (A != 0 && B != 0) overflow1 = D.udiv(A) != B; else overflow1 = false; if (overflow != overflow1) __builtin_trap(); return 0; }
LGTM. It might still make sense to optimize the single word case separately, but for the general case I don't think we can do much better than this.
lib/Support/APInt.cpp | ||
---|---|---|
1919 ↗ | (On Diff #195158) | Is there a reason to use operator*(RHS) over *this * RHS here? |