Page MenuHomePhabricator

Add a copy constructor to StringMap
ClosedPublic

Authored by hfinkel on Mar 27 2016, 4:52 PM.

Details

Summary

I will soon propose a patch that will require that StringMap have a copy constructor (which it does not currently support). This adds one.

Diff Detail

Repository
rL LLVM

Event Timeline

hfinkel updated this revision to Diff 51755.Mar 27 2016, 4:52 PM
hfinkel retitled this revision from to Add a copy constructor to StringMap.
hfinkel updated this object.
hfinkel added a subscriber: llvm-commits.
mehdi_amini added inline comments.Mar 27 2016, 5:15 PM
include/llvm/ADT/StringMap.h
248 ↗(On Diff #51755)

May be explicit?

261 ↗(On Diff #51755)

I wonder if we really want to keep the tombstones or if it wouldn't be better to reinsert the elements?

272 ↗(On Diff #51755)

The hashes have to be inserted as well I think (after the buckets), I may miss where it happened?

hfinkel added inline comments.Mar 28 2016, 4:04 AM
include/llvm/ADT/StringMap.h
248 ↗(On Diff #51755)

I don't think that I can -- I need the copy-initialization support for:

CustomNameFuncs = TLI.CustomNameFuncs;

in D18513.

261 ↗(On Diff #51755)

I don't have a good feeling for this; it seems like this could mean one of two things:

  1. We start with the same number of buckets as the RHS; in this case keeping the tombstones vs. not just affects whether we write a 0 or -1 into the pointer array for the empty slot (although insertion will prefer that slot over further probing).
  2. Start with a minimal number of buckets. Potentially more memory efficient, but essentially leads to "rehashing" (which in this case means re-probing for all inputs).

What do you prefer?

272 ↗(On Diff #51755)

Yes; that's what the

HashTable[I] = RHSHashTable[I];

does.

mehdi_amini added inline comments.Mar 29 2016, 2:21 PM
include/llvm/ADT/StringMap.h
261 ↗(On Diff #51755)

It seems to me that when looking for an entry in the map, the search stops when it find an empty slot, but continue when there is a tombstone. If I understand your 1), it is just replacing tombstones with empty buckets, but that would break the search for existing entries that are after the tombstones.

The 2) is basically what I was thinking of, eventually allocating the same number of buckets, but indeed reinserting each entry (reusing the existing known hash). Not that I have a strong preference, I was rather wondering if you considered this and what is the rational for one or the other (and if there is a reason, I'd rather have it documented in the code for future reference).
I hope David could chime in with an opinion.

272 ↗(On Diff #51755)

Obviously...

hfinkel added inline comments.Mar 29 2016, 5:53 PM
include/llvm/ADT/StringMap.h
261 ↗(On Diff #51755)

It seems to me that when looking for an entry in the map, the search stops when it find an empty slot, but continue when there is a tombstone. If I understand your 1), it is just replacing tombstones with empty buckets, but that would break the search for existing entries that are after the tombstones.

I believe you're correct; scratch that option.

The 2) is basically what I was thinking of, eventually allocating the same number of buckets, but indeed reinserting each entry (reusing the existing known hash). Not that I have a strong preference, I was rather wondering if you considered this and what is the rational for one or the other (and if there is a reason, I'd rather have it documented in the code for future reference).

My thought was the following: Most StringMaps don't have many tombstones, because many use cases don't ever (or only rarely) delete things from their string maps. As a result, just copying the structure as-is (any tombstones included) will be faster than re-probing for all keys with little downside.

I hope David could chime in with an opinion.

mehdi_amini added inline comments.Mar 29 2016, 5:59 PM
include/llvm/ADT/StringMap.h
261 ↗(On Diff #51755)

Most StringMaps don't have many tombstones, because many use cases don't ever (or only rarely) delete things from their string maps.

That sounds a good argument to me!

mehdi_amini added inline comments.Mar 29 2016, 6:20 PM
include/llvm/ADT/StringMap.h
248 ↗(On Diff #51755)

The issue is only because the copy assignment operator is defined as:

StringMap &operator=(StringMap RHS) {
    StringMapImpl::swap(RHS);
    std::swap(Allocator, RHS.Allocator);
    return *this;
  }

I think we could change it to:

StringMap &operator=(const StringMap &RHS) {
    StringMap Tmp(RHS);
    StringMapImpl::swap(Tmp);
    std::swap(Allocator, Tmp.Allocator);
    return *this;
  }
StringMap &operator=(const StringMap &&RHS) {
    std::swap(Allocator, std::move(RHS.Allocator));
    StringMapImpl::swap(std::move(RHS));
    return *this;
  }

Now the only thing that it buys us is a protection against unexpected/unintended copy construction for a StringMap, is it compelling enough?

hfinkel added inline comments.Mar 29 2016, 8:49 PM
include/llvm/ADT/StringMap.h
248 ↗(On Diff #51755)

Now the only thing that it buys us is a protection against unexpected/unintended copy construction for a StringMap, is it compelling enough?

Personally, I don't see a reason to be so careful about copying a StringMap, more careful, that is, than for other similar data structures. We have non-explicit copy constructors on DenseMap, etc. Everyone needs to be just as careful about copying those. As a user of containers, I tend to find such restrictions annoying. Obviously, people have different opinions on this topic ;)

mehdi_amini added inline comments.Mar 29 2016, 8:56 PM
include/llvm/ADT/StringMap.h
94 ↗(On Diff #51755)

private -> public is a bit rough, could it be protected maybe?

hfinkel added inline comments.Mar 29 2016, 8:59 PM
include/llvm/ADT/StringMap.h
94 ↗(On Diff #51755)

Yes, it is (all of the functions in this block are protected).

mehdi_amini accepted this revision.Mar 29 2016, 9:01 PM
mehdi_amini edited edge metadata.

Thanks for all the answers. LGTM then.

include/llvm/ADT/StringMap.h
94 ↗(On Diff #51755)

OK nevermind then...

This revision is now accepted and ready to land.Mar 29 2016, 9:01 PM
This revision was automatically updated to reflect the committed changes.