This is an archive of the discontinued LLVM Phabricator instance.

[ADT] Add CachedHashString.
ClosedPublic

Authored by jlebar on Oct 15 2016, 3:08 PM.

Details

Summary

This is like CachedHashStringRef, but owns its data.

This lets us use strings inside of DenseMaps.

Diff Detail

Repository
rL LLVM

Event Timeline

jlebar updated this revision to Diff 74773.Oct 15 2016, 3:08 PM
jlebar retitled this revision from to [ADT] Add CachedHashString..
jlebar updated this object.
jlebar added a reviewer: timshen.
jlebar added a subscriber: llvm-commits.
timshen added inline comments.Oct 17 2016, 1:45 PM
llvm/include/llvm/ADT/CachedHashString.h
11 ↗(On Diff #74773)

CachedHashString is not like std::string, because it doesn't support operations like operator+=. :)

24 ↗(On Diff #74773)

It's not used, is it?

94 ↗(On Diff #74773)

It is unfortunate to implement some of the std::string functionalities from scratch. Is there a way to implement it in terms of std::string?

timshen added inline comments.Oct 17 2016, 1:50 PM
llvm/include/llvm/ADT/CachedHashString.h
94 ↗(On Diff #74773)

More specifically, I'm asking for a member function string& val(); that returns the underlying std::string object (it needs to exist first), so that CachedHashString compose better with other APIs that don't care about hashing.

jlebar added inline comments.Oct 17 2016, 1:55 PM
llvm/include/llvm/ADT/CachedHashString.h
94 ↗(On Diff #74773)

We could do this, but at the cost of more than doubling its storage.

Right now this class takes two words of space on 64-bit platforms. sizeof(std::string) is 32 bytes (with whatever stdlib gcc.godbolt.org uses), so already 4 words. Plus we'd need extra space for storing the two is-empty and is-tombstone bits. (We could also use this space for storing the hash code.) So our modified class would be 40 bytes, instead of 16.

That's pretty significant for a class that's meant to be stored in open-addressing hashtables.

I agree it's unfortunate -- if you have a better idea I'm all ears.

jlebar updated this revision to Diff 74921.Oct 17 2016, 3:22 PM
jlebar marked 2 inline comments as done.

Address Tim's comments.

Rafael, Tim and I were looking for a second opinion -- does this seem to you like a sane thing to do? The immediate problem I'm trying to solve here is to convert code that's using SmallSetVector<std::string>, which doesn't work after converting SmallSetVector to use DenseSet, because you can't store an std::string in a DenseSet.

Longer term, I hope to use something like this class to let us replace StringSet, StringMap, &co, but what we end up with in order to make that work may look quite different from this. I was just planning to change it as necessary.

rafael added inline comments.Oct 19 2016, 5:28 AM
llvm/include/llvm/ADT/CachedHashString.h
166 ↗(On Diff #74921)

It is common practice to declare == out of class.

Do you still need if you declare a StringRef() operator?

jlebar updated this revision to Diff 75214.Oct 19 2016, 2:00 PM

Address Rafael's comments (get rid of explicit operators in favor of implicit
conversions to StringPiece).

jlebar marked an inline comment as done.Oct 19 2016, 2:02 PM

Thank you for the review, Rafael. I like using implicit casts -- this is better!

Couldn't CachedHashString just wrap a CachedHashStringRef?

Or even have a single implementation:

template<bool Owning>
class CachedHashStringRefBase {
  ...

  ~CachedHashStringRefBase() {
    if (owning) delete...
}
using CachedHashStringRef = CachedHashStringRefBase<false>;
using CachedHashString = CachedHashStringRefBase<true>;
rafael accepted this revision.Oct 19 2016, 3:21 PM
rafael edited edge metadata.

LGTM.

I am not c++ expert, but this is probably good enough for post commit review.

This revision is now accepted and ready to land.Oct 19 2016, 3:21 PM
timshen added inline comments.Oct 19 2016, 3:32 PM
llvm/include/llvm/ADT/CachedHashString.h
117 ↗(On Diff #75214)

How do you feel about this:

if (this != Other) {
  ~CachedHashString();
  new (this) CachedHashString(Other);
}
return *this;

Likewise for the move assign.

dblaikie added inline comments.
llvm/include/llvm/ADT/CachedHashString.h
117 ↗(On Diff #75214)

You can potentially just use the unified copy/move op roughly:

foo &operator=(foo f) {
  swap(*this, f);
}

(you can actually implement the assignment here rather than use a custom swap - then rely on std::swap to be cheap enough - or the other way around, whichever's better/worth it)

& then you do need to implement copy and move ctors, but you've already got those.

jlebar added inline comments.Oct 21 2016, 1:11 PM
llvm/include/llvm/ADT/CachedHashString.h
117 ↗(On Diff #75214)

I like this, because the swap function is simple to write. It lets me avoid the problem of blowing up on

CachedHashString x;
x = x;

which I had before.

Thank you for the suggestion!

jlebar updated this revision to Diff 75471.Oct 21 2016, 1:11 PM
jlebar edited edge metadata.

Use swap to implement copy assignment and move.

jlebar marked 3 inline comments as done.Oct 21 2016, 1:12 PM
This revision was automatically updated to reflect the committed changes.

Couldn't CachedHashString just wrap a CachedHashStringRef?

Or even have a single implementation:

template<bool Owning>
class CachedHashStringRefBase {
  ...

  ~CachedHashStringRefBase() {
    if (owning) delete...
}
using CachedHashStringRef = CachedHashStringRefBase<false>;
using CachedHashString = CachedHashStringRefBase<true>;

Ping?

Couldn't CachedHashString just wrap a CachedHashStringRef?

Or even have a single implementation:

template<bool Owning>
class CachedHashStringRefBase {
  ...

  ~CachedHashStringRefBase() {
    if (owning) delete...
}
using CachedHashStringRef = CachedHashStringRefBase<false>;
using CachedHashString = CachedHashStringRefBase<true>;

Ping?

Sorry, forgot to respond to this.

I don't think this is a good idea, because the two classes do not have the same API. In particular, CachedHashString has a conversion to CachedHashStringRef. We would have to enable-if this away.

Also the bodies of about half of the functions would be different. At this point we would not be gaining much from this additional layer of indirection.