Page MenuHomePhabricator

Replace HashString algorithm with xxHash64
AbandonedPublic

Authored by scott.smith on Apr 25 2017, 3:44 PM.

Details

Summary

The previous algorithm processed one character at a time, which is very painful on a modern CPU. Replace it with xxHash64, which both already exists in the codebase and is fairly fast.

Diff Detail

Repository
rL LLVM

Event Timeline

scott.smith created this revision.Apr 25 2017, 3:44 PM
ruiu added a subscriber: ruiu.Apr 25 2017, 3:49 PM
ruiu added inline comments.
include/llvm/ADT/StringExtras.h
136

Rename Result Seed.

rename param

scott.smith marked an inline comment as done.Apr 25 2017, 4:03 PM
vsk edited edge metadata.Apr 25 2017, 4:05 PM

Thanks for working on this! In the future, please set llvm-commits as a subscriber on differential revisions, not as a reviewer. Also, please upload patches with context (e.g "git diff -U1000").

IMO, tests which break when the string hash function is changed are suspicious. Why is a dependence on a specific hash function being tested? It might make sense to fix the tests in prep commits (ideally no test updates would be needed with this patch). I'll look into fixing the llvm-{cov,profdata} issues.

include/llvm/ADT/StringExtras.h
138

Would xxh32 run significantly faster/slower than xxh64?

In D32509#737488, @vsk wrote:

Thanks for working on this! In the future, please set llvm-commits as a subscriber on differential revisions, not as a reviewer. Also, please upload patches with context (e.g "git diff -U1000").

The list is both a subscriber and a reviewer. Next time I'll only make it a subscriber.
I assumed I had no context because Phabricator couldn't map my patch to the repository. Your comment makes me think it doesn't even try (ReviewBoard does this automatically). I didn't realize I'm supposed to supply the context!

IMO, tests which break when the string hash function is changed are suspicious. Why is a dependence on a specific hash function being tested? It might make sense to fix the tests in prep commits (ideally no test updates would be needed with this patch). I'll look into fixing the llvm-{cov,profdata} issues.

That was my first thought too - I was worried the hash value was encoded in the output some how. But I think this stems from the use of StringMap and StringSet, which place into buckets by hash value. So it isn't that anything interesting changed, just that the order of iteration changed. None of the interesting contents changed, AFAICT.

include/llvm/ADT/StringExtras.h
138

It isn't already in the llvm code base. If we're going that route of adding a hash function, there are probably even better choices.

vsk added a subscriber: aprantl.Apr 25 2017, 4:30 PM
In D32509#737488, @vsk wrote:

Thanks for working on this! In the future, please set llvm-commits as a subscriber on differential revisions, not as a reviewer. Also, please upload patches with context (e.g "git diff -U1000").

The list is both a subscriber and a reviewer. Next time I'll only make it a subscriber.
I assumed I had no context because Phabricator couldn't map my patch to the repository. Your comment makes me think it doesn't even try (ReviewBoard does this automatically). I didn't realize I'm supposed to supply the context!

IMO, tests which break when the string hash function is changed are suspicious. Why is a dependence on a specific hash function being tested? It might make sense to fix the tests in prep commits (ideally no test updates would be needed with this patch). I'll look into fixing the llvm-{cov,profdata} issues.

That was my first thought too - I was worried the hash value was encoded in the output some how. But I think this stems from the use of StringMap and StringSet, which place into buckets by hash value. So it isn't that anything interesting changed, just that the order of iteration changed. None of the interesting contents changed, AFAICT.

Right, it looks like that's the only thing that's changed. My worry is that if we need to change the hash again, we'd have to go through another batch of test updates. It might be easier to convince people to not print things out in an order determined by a hash function now.

include/llvm/ADT/StringExtras.h
138

Ok. I think it's fine to switch to xxh64 for now.

test/DebugInfo/X86/gnu-public-names.ll
218

I'm not familiar with this test, but it's a bit concerning that more checks get added than removed. @aprantl, what are your thoughts on going forward with the test updates vs. making the output of llvm-dwarfdump not depend on the string hash function?

aprantl added inline comments.
include/llvm/ADT/StringExtras.h
134

For consistency: x86_64?

test/DebugInfo/X86/gnu-public-names.ll
218

DWARFDebugPubTable::dump() iterates over the elements of an llvm::Set. We should have separate change first that sorts the set.

@echristo: Is there anything besides LLVM tests that depends on llvm-dwarfdump's output ordering for the .gnu_pubnames section?

Updated to reflect the testing changes made by Vedant.

scott.smith marked 4 inline comments as done.
echristo edited edge metadata.Apr 26 2017, 9:59 AM

@echristo: Is there anything besides LLVM tests that depends on llvm-dwarfdump's output ordering for the .gnu_pubnames section?

Shouldn't be, no.

echristo accepted this revision.Apr 26 2017, 10:01 AM

This is fine with me.

test/DebugInfo/X86/gnu-public-names.ll
218

Agreed.

This revision is now accepted and ready to land.Apr 26 2017, 10:01 AM
ruiu added inline comments.Apr 26 2017, 10:04 AM
include/llvm/ADT/StringExtras.h
158–159

nit: in the LLVM coding style you want to put { at end of the previous line.

Addressed coding style Issue.

Can someone commit this for me? I don't have access. Thank you!

scott.smith marked an inline comment as done.Apr 26 2017, 10:11 AM
ruiu added a comment.Apr 26 2017, 10:11 AM

I can do that for you.

ruiu added a comment.Apr 26 2017, 11:20 AM

Scott,

I cannot submit this patch because there are a few tests failing with your patch. Here is a log. https://gist.github.com/rui314/ef9cc0845131eb6e5e1e00912aefa2b5

Can you take a look?

vsk added inline comments.Apr 26 2017, 11:21 AM
test/DebugInfo/Generic/accel-table-hash-collisions.ll
31

Might make these two 'CHECK-DAG' for some future-proofing.

35

Ditto.

Updated to include clang test changes.

This revision was automatically updated to reflect the committed changes.

Fixed clang-tools-extra build issues, but doesn't fix Mac lld issues (http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/7423/steps/test_lld/logs/stdio) as I do not have access to a Mac. Can someone look into it? It's probably just an identifier ordering issue.

vsk added a comment.Apr 26 2017, 5:52 PM

Fixed clang-tools-extra build issues, but doesn't fix Mac lld issues (http://lab.llvm.org:8011/builders/lld-x86_64-darwin13/builds/7423/steps/test_lld/logs/stdio) as I do not have access to a Mac. Can someone look into it? It's probably just an identifier ordering issue.

Is it necessary to have a mac to run the mach-o tests? At any rate, changing the 'CHECK'/'DCHECK' lines to 'CHECK-DAG'/'DCHECK-DAG' should do the trick.

In D32509#738971, @vsk wrote:

Is it necessary to have a mac to run the mach-o tests? At any rate, changing the 'CHECK'/'DCHECK' lines to 'CHECK-DAG'/'DCHECK-DAG' should do the trick.

AFAIK the test only failed on the mac buildbot. I'm of two minds on CHECK vs CHECK-DAG; it makes the test immune to hash order issues, but in some cases I believe it can make the test too permissive, since in the end it just makes sure the lines are present, not that they are grouped together in any meaningful way.

vsk added a comment.Apr 27 2017, 8:40 AM
In D32509#738971, @vsk wrote:

Is it necessary to have a mac to run the mach-o tests? At any rate, changing the 'CHECK'/'DCHECK' lines to 'CHECK-DAG'/'DCHECK-DAG' should do the trick.

AFAIK the test only failed on the mac buildbot. I'm of two minds on CHECK vs CHECK-DAG; it makes the test immune to hash order issues, but in some cases I believe it can make the test too permissive, since in the end it just makes sure the lines are present, not that they are grouped together in any meaningful way.

That's a fair point. My thinking was that since the output being tested relies on the hash function to for ordering, the test shouldn't be checking for a particular order. But, instead of making the tests more permissive, we could simply sort things before printing them.

scott.smith reopened this revision.Apr 27 2017, 10:33 AM

Reopening as this was reverted.

This revision is now accepted and ready to land.Apr 27 2017, 10:33 AM
scott.smith abandoned this revision.May 2 2017, 2:39 PM

I'll get much better performance by just replacing lldb::ConstString with a lock free hashtable utilizing its own hash function.

davide edited edge metadata.May 2 2017, 4:49 PM

I'll get much better performance by just replacing lldb::ConstString with a lock free hashtable utilizing its own hash function.

I can't really comment on your lldb specific solution as I'm not familiar with that code, but I think this would be a valuable improvement for llvm independently.

Thanks!

Davide