This is an archive of the discontinued LLVM Phabricator instance.

[DO NOT SUBMIT] Do strlen() while computing xxhash
Needs ReviewPublic

Authored by ruiu on Apr 26 2018, 5:56 PM.

Details

Summary

I created this patch with the hope that computing a hash value while
calculating the string length is more efficient than doing the two in
separate passes. Unfortunately I couldn't find any performance improvement
on my machine with this patch, but you might find it interesting, so
I'm sharing it.

Event Timeline

ruiu created this revision.Apr 26 2018, 5:56 PM
ruiu added a comment.Apr 26 2018, 6:12 PM

Note that strlen_xxHash64 was actually faster than strlen() and xxHash64() for a microbenchmark. I didn't observe noticeable difference in real benchmarks with this patch though.

It is possible that xxhash is just too slow for use in a hash table. The experiment I did for pr37029 using hash_combine was still using strlen.

grimar added inline comments.Apr 27 2018, 2:23 AM
llvm/lib/Support/xxhash.cpp
182

I remember that when I experimented with all such stuff, at some point I also had the Hash out parameter passed by reference like H here.
And assigning multiple times to it in a loop, like you do, was slow for me I believe. Because when I changed the code to do the single assign after the loop, at the end of the function, profiler showed a much better result.

grimar added inline comments.Apr 28 2018, 2:34 AM
llvm/lib/Support/xxhash.cpp
149

I tried to build this patch with MSVS, but it complains about the absence of this function.

I had to use the following instead:

int psnip_builtin_ffsl(long v) {
  unsigned long r;
  if (_BitScanForward(&r, (unsigned long)v)) {
    return (int)(r + 1);
  }
  return 0;
}

(took it from https://github.com/nemequ/portable-snippets/tree/master/builtin).

But it does not work correctly it seems. I am testing on lld-speed-test\lld-speed-test\scylla,
here at some point Remaining becomes (-1). And then lines below (185, 186)

const char *End = Q + Remaining;
size_t Len = End - P;

handle it wrong and code ends up with
an infinite loop in splitStrings1 because strlen_xxHash64 returns -1 always.

grimar added inline comments.Apr 28 2018, 2:53 AM
llvm/lib/Support/xxhash.cpp
148

btw, if you pass P=="ab\0" to this method,

then R[0] will read the rest 5 bytes of memory from out of bounds, no?

ruiu added inline comments.Apr 28 2018, 10:56 AM
llvm/lib/Support/xxhash.cpp
148

That's intentional.

149

I'm not very familiar with these intrinsics of MSVC, but I believe you forgot to use the 64 bit version. Use this.

static inline int my_ffsl(uint64_t v) {
  uint64r_t R;
  _BitScanForward64(&R, V);
  return R;
}

Note that this should be compiled to a single BSF instruction on x86-64.

grimar added inline comments.Apr 29 2018, 6:20 PM
llvm/lib/Support/xxhash.cpp
148

It would be wrong to keep it in lib/Support with that probably I think.
We can probably assume that it is valid to read out of buffer for the particular LLD use case,
but it does not seem fine to do in general.

149

Yes, thanks. The final code I used was:

static inline int psnip_builtin_ffsl(uint64_t v) {
  unsigned long R;
  _BitScanForward64(&R, v);
  return R + 1;
}

And I was able to profile the patch finally. It was faster for me than the original code but
results I observe with D45571 are a bit better. I'll post the numbers to the D45571 thread for
completeness.

Note that this should be compiled to a single BSF instruction on x86-64.

Yes, it was:

...
      Remaining = (psnip_builtin_ffsl(X) >> 3) - 1;
00007FF6F219C604  bsf         rcx,r8  
00007FF6F219C608  inc         ecx  
00007FF6F219C60A  sar         ecx,3  
00007FF6F219C60D  dec         ecx  
  }