This is an archive of the discontinued LLVM Phabricator instance.

[ADT] ImmutableList no longer requires elements to be copy constructible
ClosedPublic

Authored by Szelethus on Jul 30 2018, 7:03 AM.

Details

Summary

I'm refactoring my Static Analyzer checker UninitializedObjectChecker, and I need to store lightweight polymorphic objects in llvm::ImmutableList. I wish to use std::unique_ptrs, but llvm::ImmutableList requires it's elements to be copy constructible for no good reason. This patch aims to fix this.

Diff Detail

Event Timeline

Szelethus created this revision.Jul 30 2018, 7:03 AM
Szelethus edited the summary of this revision. (Show Details)Jul 30 2018, 7:32 AM

Test coverage?

(& also, maybe rather than modifying concat's signature (& causing its
arguments to reverse in a way that might be confusing), just take the
argument by value and move into the destination?)

& does changing these signatures (of concat and add) not break any existing
code?

NoQ added a comment.Jul 30 2018, 2:03 PM

I strongly support argument order swap because it's consistent with other immutable data structures.

Not sure if i support turning add into some sort of emplace. I'd rather just restrict it to a move-constructor by accepting a single rvalue, because it'll make caller code easy to understand. Maybe add another emplace-like method if you need it. I think it's an important distinction to make.

Test coverage?
...
& does changing these signatures (of concat and add) not break any existing
code?

This is used only in clang and there's D49986 and there should be no functional change here.

Not sure why didn't we ever write unit tests for these data structures. But it should more or less work as tested by clang lit tests.

In D49985#1181018, @NoQ wrote:

I strongly support argument order swap because it's consistent with other immutable data structures.

Not sure I follow - which examples of argument order are you comparing here?

It looks like concat orders the arguments in the same way that the output would be (so concat(X, list) produces [X, list]) - so preserving that argument order seems like it improves/retains readability compared to switching them around so 'concat(list, X)' produces '[X, list]'.

Test coverage?
...
& does changing these signatures (of concat and add) not break any existing
code?

This is used only in clang and there's D49986 and there should be no functional change here.

There's a functional change as described by the patch description - this data structure is getting new features/functionality and that can be unit tested. Also, it's best to test it in LLVM so that it doesn't get broken by LLVM developers who may not be compiling/testing Clang.

Not sure why didn't we ever write unit tests for these data structures. But it should more or less work as tested by clang lit tests.

Yeah, often the older corners don't have unit tests - but when they're improved/changed, it's a good chance to correct that & start putting in unit tests. The overhead/work involved in making the first unit test isn't high enough that I feel like it's a great imposition to ask the next developer to start it.

NoQ added a comment.Jul 30 2018, 6:42 PM
In D49985#1181018, @NoQ wrote:

I strongly support argument order swap because it's consistent with other immutable data structures.

Not sure I follow - which examples of argument order are you comparing here?

ImmutableMap and ImmutableSet both have add(Collection, Item) argument order, but ImmutableList writes the same thing as add(Item, Collection).

It looks like concat orders the arguments in the same way that the output would be (so concat(X, list) produces [X, list]) - so preserving that argument order seems like it improves/retains readability compared to switching them around so 'concat(list, X)' produces '[X, list]'.

Yeah, i guess that might have been the motivation behind such inconsistency. I'll be fine with fixing the order for other data structures.

In D49985#1181018, @NoQ wrote:

Test coverage?
...
& does changing these signatures (of concat and add) not break any existing
code?

This is used only in clang and there's D49986 and there should be no functional change here.

There's a functional change as described by the patch description - this data structure is getting new features/functionality and that can be unit tested. Also, it's best to test it in LLVM so that it doesn't get broken by LLVM developers who may not be compiling/testing Clang.

Yup, that makes sense.

Szelethus updated this revision to Diff 158743.EditedAug 2 2018, 6:28 AM

Added a new emplace method, and the rest of the factory methods now take the data argument by value.
I also added a unittest file. It does contain code unrelated to this patch, but since the file didn't exist, I though it's okay to hit two birds with one stone.

On a somewhat unrelated note, is there a need for concat to be public? Also, to me, when I think if the word concatenation, I would first think that it would work similar to std::list<T>::append().

Szelethus updated this revision to Diff 158764.Aug 2 2018, 8:11 AM

Forgot -U9999.

I ran into a serious problem. It seems like ImmutableList doesn't run the destructor for std::unique_ptrs or std::shared_ptrs:

struct Obj {
  ~Obj() { llvm_unreachable(""); } // never called
};

using ObjRef = std::unique_ptr<Obj>; // same with std::shared_ptr

namespace llvm {

// Specializing FoldingSetTrait so we can store ObjRef objects in ImmutableList.
template <>
struct FoldingSetTrait<ObjRef> : public DefaultFoldingSetTrait<ObjRef> {

  static void Profile(const ObjRef &FN, FoldingSetNodeID &ID) {
    ID.AddPointer(FN.get());
  }
};

} // end of namespace llvm

TEST_F(ImmutableListTest, UniquePtrTest) {
  ImmutableList<ObjRef>::Factory f;
  ImmutableList<ObjRef> L = f.create(ObjRef(new Obj));
}

What is very interesting though, a simple wrapper works perfectly:

struct ObjRef {
  Obj *ptr;
  ObjRef(Obj *ptr) : ptr(ptr) {}
  ~ObjRef() { delete ptr; }
  Obj *get() const { return ptr; }
};
// llvm_unreachable is actually reached in Obj::~Obj().

I've been stuck on this issue for a while now. Do you know anything about why this could happen?

requires it's elements to be copy constructible for no good reason
It seems like ImmutableList doesn't run the destructor for std::unique_ptrs

Seems there was a reason: and ImmutableList and its members are assumed to be a POD (plain old datatype), which do not need destructors.

E.g. see the code at the bottom of ImmutableList.h:

template <typename T> struct isPodLike;
template <typename T>
struct isPodLike<ImmutableList<T>> { static const bool value = true; };

@NoQ might want to correct me here, but I'm not sure how achievable is your use case, or whether it makes sense.
The point of functional ImmutableList is that you can copy them by value at O(1) cost everywhere.
If your list contains unique_ptr you can no longer copy it at all without violating unique_ptr semantics.

Would it make more sense to store references in ImmutableList for your use case instead?

NoQ added a comment.Aug 6 2018, 11:11 AM

What is very interesting though, a simple wrapper works perfectly:

struct ObjRef {
  Obj *ptr;
  ObjRef(Obj *ptr) : ptr(ptr) {}
  ~ObjRef() { delete ptr; }
  Obj *get() const { return ptr; }
};
// llvm_unreachable is actually reached in Obj::~Obj().

I've been stuck on this issue for a while now. Do you know anything about why this could happen?

My guess is, in your wrapper code the destructor for the temporary ObjRef at the end of the full-expression f.create(ObjRef(new Obj)) deletes the object, and the copy of ObjRef within the immutable list now contains a dangling pointer to the object (from which it won't be deleted again because immutable list doesn't call destructors; your code also doesn't dereference the pointer).

Your custom wrapper doesn't define any reasonable copy/move constructors, so a default copy is used.

NoQ added a comment.Aug 6 2018, 11:46 AM

But generally, yeah, i guess the main problem with putting non-POD objects into the immutable list is that you cannot easily make the factory call destructors when it dies because the allocator within it is only good in providing chunks of memory, not tracking them.

Putting smart pointers into an immutable list doesn't make it non-POD or harder to copy, but we are still unable to recall what we need to destroy when time comes.

Szelethus added a comment.EditedAug 7 2018, 6:28 AM

Thanks for the quick and well detailed replies!

requires it's elements to be copy constructible for no good reason
It seems like ImmutableList doesn't run the destructor for std::unique_ptrs

Seems there was a reason: and ImmutableList and its members are assumed to be a POD (plain old datatype), which do not need destructors.

E.g. see the code at the bottom of ImmutableList.h:

template <typename T> struct isPodLike;
template <typename T>
struct isPodLike<ImmutableList<T>> { static const bool value = true; };

Oh wow. Thank you so much for pointing this out, I was stuck on this one for days. However, to me, it seems like, T doesn't need to be llvm::isPodLike type, just std::is_trivially_destructible.

Would it make more sense to store references in ImmutableList for your use case instead?

Yes, dynamic memory management has a way too great overhead, and since my algorithm relies on recursion, using references should be fine. Great idea!

george.karpenkov requested changes to this revision.Aug 10 2018, 11:06 AM

In any case, tests are great, I really appreciate those, but I think we have established that the overall direction here does not make too much sense (or would require much more invasive changes)

This revision now requires changes to proceed.Aug 10 2018, 11:06 AM
Szelethus edited the summary of this revision. (Show Details)
  • Moved tests to D50646
  • Changed ordering back to how it was
  • Added a static_assert, so only trivially destructible types can be stored in ImmutableList.
include/llvm/ADT/ImmutableList.h
172

Shouldn't we have && here as well for Head?

194–201

And for Data?

In any case, tests are great, I really appreciate those, but I think we have established that the overall direction here does not make too much sense (or would require much more invasive changes)

I still don't see why we wouldn't like to remove the copy constructability restriction, to me the changes for it seem relatively non-invasive. However, can we at least add static_assert(std::is_trivially_destructible<T>::value, ...) to maybe save a developer from falling in the same pit?

Re-uploaded with unittests.

include/llvm/ADT/ImmutableList.h
172

I intentionally left it like this, because one can just std::move to it. That does achieve the same thing, right?

include/llvm/ADT/ImmutableList.h
172

In my understanding, one creates an extra copy, and one does not, but my c++fu is not that strong for templates+references.
I'm pretty sure you want to use universal references here (https://isocpp.org/blog/2012/11/universal-references-in-c11-scott-meyers)

@NoQ might clarify

Polite ping :)

@Szelethus I'm still not sure why do you pass by value instead of using a universal reference.

NoQ added inline comments.Aug 21 2018, 12:42 PM
include/llvm/ADT/ImmutableList.h
172

I intentionally left it like this, because one can just std::move to it. That does achieve the same thing, right?

Nah, it doesn't achieve the same thing.

Reference is always a reference, passing it into a function is always free-of-charge, i.e. never invokes a constructor. Also std::move doesn't invoke move-constructor on its own, it simply marks the object as movable, i.e. move is simply an lvalue-to-xvalue-cast.

But initialization of a value-type argument is a non-elidable constructor, which will be copy-constructor if the object is not movable and move-constructor if the object is movable. So if you omit && for Head, you are losing the benefits of a free-of-charge pass-by-reference and instead have to deal with an actual constructor call.

Szelethus updated this revision to Diff 162665.Aug 27 2018, 5:37 AM

Now using universal references.

This revision is now accepted and ready to land.Aug 27 2018, 10:41 AM
This revision was automatically updated to reflect the committed changes.