⚙ D60669 [APInt] Optimize umul

MaskRay created this revision.Apr 14 2019, 7:49 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 14 2019, 7:49 AM

Herald added subscribers: llvm-commits, kristina, dexonsmith. · View Herald Transcript

Harbormaster completed remote builds in B30507: Diff 195067.Apr 14 2019, 7:50 AM

nikic added inline comments.Apr 14 2019, 8:34 AM

lib/Support/APInt.cpp
1912 ↗	(On Diff #195067)	In this form, this will only be correct if the bit-width is exactly 64-bit. For smaller bit-widths you'll have to consider a combination of the intrinsic overflow flag, and whether any bits beyond the bit-width are set. (It might make sense to also special-case <= 32 bits, in which case checking bits in the result of a normal multiply would be sufficient.)

Is this trying to address some performance problem? Any numbers?

In D60669#1465834, @lebedev.ri wrote:

Is this trying to address some performance problem? Any numbers?

I think even if it doesn't address a specific performance problem, this is a good change (well, once it is correct...), because you don't have to treat mul_ov as an expensive operation anymore. E.g. after this change I think we could drop the code in https://github.com/llvm-mirror/llvm/blob/2b38b84a12ff96d6862b7927f92cbdae77653deb/lib/Analysis/ValueTracking.cpp#L4001-L4008 and the big comment in the function without being concerned about a possible perf impact.

Repurpose this patch

Harbormaster completed remote builds in B30514: Diff 195091.Apr 14 2019, 7:56 PM

Is this trying to address some performance problem? Any numbers?

I'm using a -DBUILD_SHARED_LIBS=on build so there is some indirection cost. Anyway, The performance improvement is obvious:

#include <cstdlib>
#include <llvm/ADT/APInt.h>
using namespace std;

#define FOR(i, a, b) for (remove_cv<remove_reference<decltype(b)>::type>::type i = (a); i < (b); i++)
#define REP(i, n) FOR(i, 0, n)

int main() {
  const unsigned N = 64;
  REP(i, 10000) {
    unsigned a = rand(), b = rand();
    REP(i, 40000) {
      bool overflow;
      llvm::APInt A(N, a), B(N, b);
      auto C = A.umul_ov(B, overflow);
      asm volatile("" : : "r"(C[0]), "r"(overflow) : "memory");
    }
  }
}

Before (two udiv calls; actually one suffices): 12.8129 +- 0.0952 seconds time elapsed ( +- 0.74% )
After (leading zeros + mul + shift + plus): 3.6257 +- 0.0296 seconds time elapsed ( +- 0.82% )

lib/Support/APInt.cpp
1912 ↗	(On Diff #195067)	Good catch. I forgot about that. Changed the purpose of this patch.

I have also fuzzed it for some time with libFuzzer to give me more confidence that it is correct.

#include <llvm/ADT/APInt.h>

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
  using namespace llvm;
  unsigned n;
  uint64_t a, b;
  memcpy(&n, Data, 1);
  n = n%64+1;
  memcpy(&a, Data+1, 8);
  memcpy(&b, Data+9, 8);

  bool overflow, overflow1;
  APInt A(n, a), B(n, b), C = A.umul_ov(B, overflow), D = A * B;
  if (C != D) __builtin_trap();

  if (A != 0 && B != 0) overflow1 = D.udiv(A) != B;
  else overflow1 = false;
  if (overflow != overflow1) __builtin_trap();

  return 0;
}

Fold a lz test

Harbormaster completed remote builds in B30543: Diff 195158.Apr 15 2019, 5:58 AM

Ping :)

LGTM. It might still make sense to optimize the single word case separately, but for the general case I don't think we can do much better than this.

lib/Support/APInt.cpp
1919 ↗	(On Diff #195158)	Is there a reason to use `operator(RHS)` over `this * RHS` here?

This revision is now accepted and ready to land.Apr 18 2019, 12:43 PM

operator*(RHS) -> *this * RHS

MaskRay marked an inline comment as done.Apr 18 2019, 6:45 PM

Harbormaster completed remote builds in B30775: Diff 195854.Apr 18 2019, 6:47 PM

Closed by commit rL358730: [APInt] Optimize umul_ov (authored by MaskRay). · Explain WhyApr 18 2019, 7:04 PM

This revision was automatically updated to reflect the committed changes.

This is an archive of the discontinued LLVM Phabricator instance.

[APInt] Optimize umul_ov
ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 195855

llvm/trunk/lib/Support/APInt.cpp

llvm/trunk/unittests/ADT/APIntTest.cpp