This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Refactor DWARFDie::iterator to prevent UB
AbandonedPublic

Authored by JDevlieghere on Jul 12 2018, 7:07 AM.

Details

Summary

The DWARFDie is a lightweight utility wrapper that stores a pointer to a compile unit and a debug info entry. Currently, its iterator (used for walking over its children) stores a DWARFDie and returns a const reference when dereferencing it. However, when the iterator is modified (by incrementing or decrementing it), this reference becomes invalid. This was happening when calling reverse on it, because the std::reverse_iterator is keeping a temporary copy of the iterator (see https://en.cppreference.com/w/cpp/iterator/reverse_iterator for a good illustration).

reference operator*() const {_Iter __tmp = current; return *--__tmp;}

When dereferencing the reverse iterator, we decrement and return a reference to a DWARFDie stored in the stack frame of this function, resulting in UB at runtime.

This patch changes the DWARFDie::iterator into something that looks like an iterator, but actually isn't. When dereferencing it, we return a copy of the DWARFDie instead. This means we can't have operator->, but at least we don't risk invalidating the iterator by modifying it.

Diff Detail

Event Timeline

JDevlieghere created this revision.Jul 12 2018, 7:07 AM
friss added a comment.Jul 12 2018, 8:32 AM

I suppose this works, but I'll defer to Dave on what's best here.

  • Continue using the iterator facade.
  • Explicitly remove operator->() to make our intentions clear.

@rsmith - Richard, I'd be curious for your opinion here. I've been casting through the runes of the standard for C++ iterator semantics/guarantees, and I can't find where the iterator requirements are that makes std::reverse_iterator's behavior valid (basically relying on the underlying sequence having indefinitely persistent storage - that a pointer/reference to an element is valid even after the iterator that pointer/reference was obtained from goes out of scope). And whether there's some other solution - a common/predefined type trait for saying "hey this isn't valid" or "this is the reverse iterator type for this sequence" (so llvm::reverse(Range) could use that trait to decide how to get reverse iterators from the forward ones) or if we should create such a trait (though I guess it'd still be problematic for any other iterator adapters having to somehow forward through/expose such details... :/)?

[forward.iterators]: "Two dereferenceable iterators a and b of type X offer the multi-pass guarantee if [...] the expression (void)++X(a), *a is equivalent to the expression *a." [reverse.iter.requirements]: "The template parameter Iterator shall meet all the requirements of a Bidirectional Iterator". So it's UB to use std::reverse_iterator with your iterator.

There isn't any standard way to handle this situation; libcxx has an internal hack for standard library types (https://reviews.llvm.org/rL300164).

[forward.iterators]: "Two dereferenceable iterators a and b of type X offer the multi-pass guarantee if [...] the expression (void)++X(a), *a is equivalent to the expression *a." [reverse.iter.requirements]: "The template parameter Iterator shall meet all the requirements of a Bidirectional Iterator". So it's UB to use std::reverse_iterator with your iterator.

There isn't any standard way to handle this situation; libcxx has an internal hack for standard library types (https://reviews.llvm.org/rL300164).

Hmm - thanks for the pointers/words, Eli! Though I'm not sure this applies in our case. I /think/ the DIE child iterator does meet this definition of multi-pass guarantee. Making a copy of the iterator and incrementing it does not affect the original iterator - which can still be dereferenced and yield the same value.

Ah, there it is: "If a and b are both dereferenceable, then a == b if and only if *a and *b are bound to the same object." - this is a feature of Forward Iterators that the DWARF iterator does not/cannot (practically) meet (because it returns a reference to an element stashed in the iterator - so though the iterators are ==, *a and *b are bound to different (though equivalent) objects).

Hrm. So technically it only meets the requirements of input iterator - which would cause it to fail to compile with llvm::reverse. That seems OK-ish. Yep, check-llvm still compiles if this is only an input iterator.

So we're still left with those general choices - coudl just say that llvm::reverse doesn't support these sort of iterators (like std::reverse_iterator doesn't) (& in this case, could have DWARFDie expose a "reverse children" accessor, "rchildren()" perhaps) or provide a trait that exposes a specific way to get a supported reverse iterator if an input_iterator can support it. I'm open to either, really. If we go the simpler route (rchildren) then if at some point in the future we find more and more cases like this, we can always add the trait/generalize things.

Oh, bother. I tried adding an llvm::reverse call over DWARFDie.children() and it still compiled, even though I made DWARFDie::iterator an input_iterator - so I guess std::reverse_iterator isn't checkingf the trait? :/ That's unfortunate.

labath added a subscriber: labath.Jul 17 2018, 2:04 AM

"this is the reverse iterator type for this sequence"

One way to do that would be to specialize std::reverse_iterator<DWARFDie::iterator> to do "the right thing".

I've put up https://reviews.llvm.org/D49679 as an alternative to this patch. Personally I don't like the caveat that you can't store the DWARFDie the iterators point to, but on the other hand I don't like returning by value either. Please let me know which you prefer.

JDevlieghere abandoned this revision.Aug 1 2018, 6:26 AM

Landed alternative from D49679