Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
| lib/Support/APInt.cpp | ||
|---|---|---|
| 1912 ↗ | (On Diff #195067) | In this form, this will only be correct if the bit-width is *exactly* 64-bit. For smaller bit-widths you'll have to consider a combination of the intrinsic overflow flag, and whether any bits beyond the bit-width are set. (It might make sense to also special-case <= 32 bits, in which case checking bits in the result of a normal multiply would be sufficient.) | 
I think even if it doesn't address a specific performance problem, this is a good change (well, once it is correct...), because you don't have to treat mul_ov as an expensive operation anymore. E.g. after this change I think we could drop the code in https://github.com/llvm-mirror/llvm/blob/2b38b84a12ff96d6862b7927f92cbdae77653deb/lib/Analysis/ValueTracking.cpp#L4001-L4008 and the big comment in the function without being concerned about a possible perf impact.
Is this trying to address some performance problem? Any numbers?
I'm using a -DBUILD_SHARED_LIBS=on build so there is some indirection cost. Anyway, The performance improvement is obvious:
#include <cstdlib>
#include <llvm/ADT/APInt.h>
using namespace std;
#define FOR(i, a, b) for (remove_cv<remove_reference<decltype(b)>::type>::type i = (a); i < (b); i++)
#define REP(i, n) FOR(i, 0, n)
int main() {
  const unsigned N = 64;
  REP(i, 10000) {
    unsigned a = rand(), b = rand();
    REP(i, 40000) {
      bool overflow;
      llvm::APInt A(N, a), B(N, b);
      auto C = A.umul_ov(B, overflow);
      asm volatile("" : : "r"(C[0]), "r"(overflow) : "memory");
    }
  }
}Before (two udiv calls; actually one suffices): 12.8129 +- 0.0952 seconds time elapsed  ( +-  0.74% )
After (leading zeros + mul + shift + plus): 3.6257 +- 0.0296 seconds time elapsed  ( +-  0.82% )
| lib/Support/APInt.cpp | ||
|---|---|---|
| 1912 ↗ | (On Diff #195067) | Good catch. I forgot about that. Changed the purpose of this patch. | 
I have also fuzzed it for some time with libFuzzer to give me more confidence that it is correct.
#include <llvm/ADT/APInt.h>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  using namespace llvm;
  unsigned n;
  uint64_t a, b;
  memcpy(&n, Data, 1);
  n = n%64+1;
  memcpy(&a, Data+1, 8);
  memcpy(&b, Data+9, 8);
  bool overflow, overflow1;
  APInt A(n, a), B(n, b), C = A.umul_ov(B, overflow), D = A * B;
  if (C != D) __builtin_trap();
  if (A != 0 && B != 0) overflow1 = D.udiv(A) != B;
  else overflow1 = false;
  if (overflow != overflow1) __builtin_trap();
  return 0;
}LGTM. It might still make sense to optimize the single word case separately, but for the general case I don't think we can do much better than this.
| lib/Support/APInt.cpp | ||
|---|---|---|
| 1919 ↗ | (On Diff #195158) | Is there a reason to use operator*(RHS) over *this * RHS here? |