This is an archive of the discontinued LLVM Phabricator instance.

RFC: fail-fast iterators for DenseMap
ClosedPublic

Authored by sanjoy on Feb 26 2015, 10:28 PM.

Details

Summary

This patch is an attempt at making DenseMapIterators "fail-fast".
Fail-fast iterators that have been invalidated due to insertion into
the host DenseMap deterministically trip an assert (in debug mode)
on access, instead of non-deterministically hitting memory corruption
issues.

(This exact same thing can be done for SmallPtrSets, but that is a
later change.)

This change is fairly rough at the moment, but I want to get some
early feedback on the approach and style before I spend too much time
on this.

This change, in its current form, has already helped me fix two
existing bugs:

http://reviews.llvm.org/rL230718
http://reviews.llvm.org/rL230719

Diff Detail

Repository
rL LLVM

Event Timeline

sanjoy updated this revision to Diff 20826.Feb 26 2015, 10:28 PM
sanjoy retitled this revision from to RFC: fail-fast iterators for DenseMap.
sanjoy updated this object.
sanjoy edited the test plan for this revision. (Show Details)
sanjoy added reviewers: ruiu, dexonsmith, dberlin.
sanjoy added a subscriber: Unknown Object (MLST).
ruiu edited edge metadata.Feb 26 2015, 10:40 PM

Is it possible to catch an error caused by an expression like m[x] = m[y]? Generally, the last statement v in auto &v = m[x]; m[newkey] = somevalue; v; is invalid because it uses an invalid reference.

In D7931#131003, @ruiu wrote:

Is it possible to catch an error caused by an expression like m[x] = m[y]?

Do you mean the expression m[x] = m[y] itself is invalid? Or that it is one more way to "clobber" a DenseMap and invalidate all iterators?

Generally, the last statement v in auto &v = m[x]; m[newkey] = somevalue; v; is invalid because it uses an invalid reference.

That's a harder problem. The way I'd catch in the snippet you mentioned would be by getting operator[] return a wrapper object which would, like the iterator, check its validity on access. It would also have to be implicitly castable to the DenseMaps range type. However, I don't think any approach will work with int &v = m[x]; m[y] = foo; use(v) since we lose any possibility of intercepting the use of v as soon as it gets cast to an int &.

chandlerc added inline comments.
include/llvm/ADT/ADTModCountTracker.h
16 ↗(On Diff #20826)

Heh. Pick a name and be consistent?

And please don't repeat 'ADT' in it. We don't do that anywhere else. =]

35–46 ↗(On Diff #20826)

I have to admit, I don't find the naming convention really useful here. Here is my suggested naming. Here are the key points:

  1. I would just use the term epoch. That's what you have.
  2. I would make it say debugging-only on the tin.
  3. I would hint heavily that it is intended to be used as base classes so that the empty base class optimization kicks in.
  4. I would handle the NDEBUG stuff at this layer. See more on why in my comment below.
template <> struct DebugEpochBaseImpl<true> {
  uint64_t Epoch;

  struct HandleBase {
    const uint64_t *Epoch;
    uint64_t InitialEpoch;

    ...

    bool hasEpochAdvanced() const { ... }
  };
};

#ifndef NDEBUG
typedef DebugEpochBaseImpl<true> DebugEpochBase;
#else
typedef DebugEpochBaseImpl<false> DebugEpochBase;
#endif

I'm not 100% on "hasEpochAdvanced" though. Maybe there is a better name there. But I would just name these as general Epoch utilities, and I would merge the parent into the containing template. I think handle or ref is a better model than "child".

include/llvm/ADT/DenseMap.h
52–56 ↗(On Diff #20826)

Hmm...

What do you think about making this be automatically handled by the epoch utility? That is, localize the NDEBUG behavior there. There is no getting around the fact that this makes data-structures ABI-incompatible between NDEBUG and !NDEBUG, so I'd probably just make it a conditional typedef of the implementation class template.

1013 ↗(On Diff #20826)

This still makes the iterator object larger even when the class is empty.

You need to arrange for this to be the base class of the iteartor so that the empty base optimization kicks in (or to be a base class of a "compressed pair" style member type, but that seems much more complicated than we need).

sanjoy updated this revision to Diff 20954.Feb 28 2015, 11:46 PM
sanjoy updated this object.
sanjoy edited edge metadata.

Address Chandler's review.

  • no "ADT" in the file or class names.
  • changed the vocabulary to "Handle" and "Host"
  • suffixed the class names with "Base" and change the DenseMap implementation to use inheritance over composition.
  • handled the NDEBUG-only-ness directly in the header file. Once this was locked down, I could not justify the additional complexity from the template specialization, so I removed that too.
sanjoy updated this revision to Diff 20965.Mar 1 2015, 2:13 PM

Address Duncan's review comments.

The advantage of setting a template argument via a typedef based on NDEBUG is that it ensures these types *mangle* differently which will give a hard error at link time if you mix code. With this implementation, it looks like you'll just get weird miscompiles if TUs have different values of NDEBUG.

I also think you should put the word "debug" in the name to indicate that this cannot be used for epoch-based data structures.

The word "host" doesn't seem to really add much for me at least.

How about: DebugEpochBase and DebugEpochBase::HandleBase?

sanjoy updated this revision to Diff 20979.Mar 1 2015, 10:18 PM

Address Chandler's comments:

  • change vocabulary to DebugEpochBase and DebugEpochBase::HandleBase
  • move the #ifdef to outside the classes since with (1) in place, #ifdef within the classes looks confusing.

Minor nits only. Feel free to submit, but probably submit when you can clean up substantial fallout... I expect this to cause chaos when it lands.

include/llvm/ADT/DenseMap.h
1014 ↗(On Diff #20979)

Is Epoch ever false here? If not, pass by reference?

include/llvm/ADT/EpochTracker.h
25–35 ↗(On Diff #20979)

Looking at what this will do with doxygen and other things, I think it would be better to sink the #ifdef stuff inside the class and method bodies. That will ensure code navigation and such always have a consistent target. A very minor point though.

56 ↗(On Diff #20979)

What do you think about calling this "next"? or "increment"? I don't feel strongly, but "bump" seems fairly imprecise...

chandlerc accepted this revision.Mar 2 2015, 9:49 AM
chandlerc added a reviewer: chandlerc.
This revision is now accepted and ready to land.Mar 2 2015, 9:49 AM

Thanks for the review! I'll land this sometime in the next couple of days.

include/llvm/ADT/DenseMap.h
1014 ↗(On Diff #20979)

Will do.

include/llvm/ADT/EpochTracker.h
25–35 ↗(On Diff #20979)

Frankly, I'd rather have an easy to understand .h file over easy to read doxygen, so I went against too many #ifdefs. I'll push the #ifdefs inside class bodies if people actually complain about bad doxygen. :)

56 ↗(On Diff #20979)

Will change to incrementEpoch

This revision was automatically updated to reflect the committed changes.