This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add CachedHashString.
ClosedPublic

Authored by jlebar on Oct 15 2016, 3:08 PM.

Details

Summary

This is like CachedHashStringRef, but owns its data.

This lets us use strings inside of DenseMaps.

Event Timeline

jlebar updated this revision to Diff 74773.Oct 15 2016, 3:08 PM
jlebar retitled this revision from to [ADT] Add CachedHashString..
jlebar updated this object.
jlebar added a reviewer: timshen.
jlebar added a subscriber: llvm-commits.
timshen added inline comments.Oct 17 2016, 1:45 PM
llvm/include/llvm/ADT/CachedHashString.h
9–10

CachedHashString is not like std::string, because it doesn't support operations like operator+=. :)

24

It's not used, is it?

92

It is unfortunate to implement some of the std::string functionalities from scratch. Is there a way to implement it in terms of std::string?

timshen added inline comments.Oct 17 2016, 1:50 PM
llvm/include/llvm/ADT/CachedHashString.h
92

More specifically, I'm asking for a member function string& val(); that returns the underlying std::string object (it needs to exist first), so that CachedHashString compose better with other APIs that don't care about hashing.

jlebar added inline comments.Oct 17 2016, 1:55 PM
llvm/include/llvm/ADT/CachedHashString.h
92

We could do this, but at the cost of more than doubling its storage.

Right now this class takes two words of space on 64-bit platforms. sizeof(std::string) is 32 bytes (with whatever stdlib gcc.godbolt.org uses), so already 4 words. Plus we'd need extra space for storing the two is-empty and is-tombstone bits. (We could also use this space for storing the hash code.) So our modified class would be 40 bytes, instead of 16.

That's pretty significant for a class that's meant to be stored in open-addressing hashtables.

I agree it's unfortunate -- if you have a better idea I'm all ears.

jlebar updated this revision to Diff 74921.Oct 17 2016, 3:22 PM
jlebar marked 2 inline comments as done.

Address Tim's comments.

Rafael, Tim and I were looking for a second opinion -- does this seem to you like a sane thing to do? The immediate problem I'm trying to solve here is to convert code that's using SmallSetVector<std::string>, which doesn't work after converting SmallSetVector to use DenseSet, because you can't store an std::string in a DenseSet.

Longer term, I hope to use something like this class to let us replace StringSet, StringMap, &co, but what we end up with in order to make that work may look quite different from this. I was just planning to change it as necessary.

rafael added inline comments.Oct 19 2016, 5:28 AM
llvm/include/llvm/ADT/CachedHashString.h
165

It is common practice to declare == out of class.

Do you still need if you declare a StringRef() operator?

jlebar updated this revision to Diff 75214.Oct 19 2016, 2:00 PM

Address Rafael's comments (get rid of explicit operators in favor of implicit
conversions to StringPiece).

jlebar marked an inline comment as done.Oct 19 2016, 2:02 PM

Thank you for the review, Rafael. I like using implicit casts -- this is better!

Couldn't CachedHashString just wrap a CachedHashStringRef?

Or even have a single implementation:

template<bool Owning>
class CachedHashStringRefBase {
  ...

  ~CachedHashStringRefBase() {
    if (owning) delete...
}
using CachedHashStringRef = CachedHashStringRefBase<false>;
using CachedHashString = CachedHashStringRefBase<true>;
rafael accepted this revision.Oct 19 2016, 3:21 PM
rafael edited edge metadata.

LGTM.

I am not c++ expert, but this is probably good enough for post commit review.

This revision is now accepted and ready to land.Oct 19 2016, 3:21 PM
timshen added inline comments.Oct 19 2016, 3:32 PM
llvm/include/llvm/ADT/CachedHashString.h
117

How do you feel about this:

if (this != Other) {
  ~CachedHashString();
  new (this) CachedHashString(Other);
}
return *this;

Likewise for the move assign.

dblaikie added inline comments.
llvm/include/llvm/ADT/CachedHashString.h
117

You can potentially just use the unified copy/move op roughly:

foo &operator=(foo f) {
  swap(*this, f);
}

(you can actually implement the assignment here rather than use a custom swap - then rely on std::swap to be cheap enough - or the other way around, whichever's better/worth it)

& then you do need to implement copy and move ctors, but you've already got those.

jlebar added inline comments.Oct 21 2016, 1:11 PM
llvm/include/llvm/ADT/CachedHashString.h
117

I like this, because the swap function is simple to write. It lets me avoid the problem of blowing up on

CachedHashString x;
x = x;

which I had before.

Thank you for the suggestion!

jlebar updated this revision to Diff 75471.Oct 21 2016, 1:11 PM
jlebar edited edge metadata.

Use swap to implement copy assignment and move.

jlebar marked 3 inline comments as done.Oct 21 2016, 1:12 PM
This revision was automatically updated to reflect the committed changes.

Couldn't CachedHashString just wrap a CachedHashStringRef?

Or even have a single implementation:

template<bool Owning>
class CachedHashStringRefBase {
  ...

  ~CachedHashStringRefBase() {
    if (owning) delete...
}
using CachedHashStringRef = CachedHashStringRefBase<false>;
using CachedHashString = CachedHashStringRefBase<true>;

Ping?

Couldn't CachedHashString just wrap a CachedHashStringRef?

Or even have a single implementation:

template<bool Owning>
class CachedHashStringRefBase {
  ...

  ~CachedHashStringRefBase() {
    if (owning) delete...
}
using CachedHashStringRef = CachedHashStringRefBase<false>;
using CachedHashString = CachedHashStringRefBase<true>;

Ping?

Sorry, forgot to respond to this.

I don't think this is a good idea, because the two classes do not have the same API. In particular, CachedHashString has a conversion to CachedHashStringRef. We would have to enable-if this away.

Also the bodies of about half of the functions would be different. At this point we would not be gaining much from this additional layer of indirection.