This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Remove UB in std::list
AbandonedPublic

Authored by ldionne on Mar 30 2021, 3:31 PM.

Details

Reviewers
None
Group Reviewers
Restricted Project
Summary

We create list nodes by allocating them, but we never actually call the
constructor of the list node itself - we only construct its subobjects
one at a time. This patch fixes that by making sure that we construct
the list node itself. This is a different attempt to fix the underlying
issue behind D98750.

Diff Detail

Event Timeline

ldionne requested review of this revision.Mar 30 2021, 3:31 PM
ldionne created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2021, 3:31 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
Quuxplusone added inline comments.
libcxx/include/list
273

I wonder whether you want to have a char in this union too, as the first element. (Have I been cargo-culting that idiom around? Godbolt seems to indicate that maybe the rules changed between '03 and '11?)

Also, of course, you never call the destructor of _Tp anywhere anymore, so I really hope some tests fail on buildkite. ;)

1122

As I understand it, there is no UB here as of C++20: this is the main point of @rsmith 's P0593 "Implicit creation of objects for low-level object manipulation". When we get the memory for a node from the allocator, the allocator ultimately gets it from some operation that "implicitly creates objects." Essentially, the C++20 compiler is required to assume that malloc might be implemented as

void *malloc(size_t n) {
    __node *p = real_malloc(n);
    ::new (p) __node();
    p->__value_.~_Tp();
    return p;
}

(I mean, the compiler can't prove malloc isn't implemented that way!) So libc++'s code is allowed to assume that the implicitly created object was in fact created, and the compiler isn't allowed to optimize as-if the object wasn't created. Thus: no UB. (Compiler vendors are, technically, allowed to follow a different object model in C++03/11/14/17 mode; but I would be shocked.)

I describe this "compiler magic" also in "Trivially Relocatable", using std::list specifically as my example.

ldionne added inline comments.Mar 31 2021, 9:49 AM
libcxx/include/list
273

I'm not aware of the char-in-the-union idiom, can you explain what it does?

Also, we do call the destructor for _Tp in __list_imp<_Tp, _Alloc>::clear(). I haven't fully audited the code, but I don't think we called the destructor for __list_node anywhere previously, so adding the union doesn't change anything.

1122

From P0593:

A class S is an implicit-lifetime class if it is an aggregate ([dcl.aggr]) or has at least one trivial, non-deleted constructor and a trivial, non-deleted destructor.

__list_node is not an aggregate (it has a user-declared constructor). It doesn't have a trivial non-deleted constructor AFAICT either, so it's not an implicit-lifetime type. Hence, my understanding is that no object is implicitly created, and it's still UB. I could be wrong, but that's my understanding.

libcxx/include/list
273

Ah, I get it. Okay.
The char-in-the-union idiom is just union { char __dummy_; _Tp __value_; }; — put a trivial type as the first member of the union. libc++ does that in __optional_destruct_base and also in __union (in <variant>). But on slightly closer inspection, I actually can't figure out what the purpose of that dummy char is. I thought it was to make the union default-constructible and/or give it a non-deleted destructor, but that actually doesn't seem to be the case.
I may poke into this a bit more, but basically I don't know what the char-in-the-union idiom is for anymore! Hence, maybe I've been cargo-culting it into places where it's unnecessary.

1122

Hm. More likely I'm wrong.
We could easily give __list_node at least one trivial, non-deleted constructor, right?
I'm not sure if it's possible to give __list_node a trivial, non-deleted destructor. (I think this interacts with my confusion about the char-in-union thing. I don't know the union rules well.)

Hi, are there updates to this? Just wanted to mention here that it fixes the issue with missing debug info so it'd be great to have.

Hi, are there updates to this? Just wanted to mention here that it fixes the issue with missing debug info so it'd be great to have.

Sorry, I need to focus on Ranges right now and since this is a bit lower priority, it'll take a bit of time before I can go back to this. I'd very much welcome it if someone (perhaps you?) could pick it up. If so, I'll review it.

Yep, happy to pick this up and also port it to the other types-

About this patch, are there still concerns about it? IIUC the main concern is whether the original code was UB or not, which is asking whether __list_node is an implicit-lifetime type (or could be made to be one)?

@akhuang wrote:

Are there still concerns about [this patch]? IIUC the main concern is whether the original code was UB or not, which is asking whether __list_node is an implicit-lifetime type (or could be made to be one)?

I've been persuaded that __list_node is not (currently) an implicit-lifetime type, and therefore the original code does have UB. (I'm still not 100% sure, but I'm no longer arguing against the premise.) I'm also not sure if it's possible to make __list_node an implicit-lifetime type, but again I'm willing to assume it's not possible, and therefore (1) yes there's UB, and (2) yes this is a good direction to solve it.

@akhuang wrote:

Are there still concerns about [this patch]? IIUC the main concern is whether the original code was UB or not, which is asking whether __list_node is an implicit-lifetime type (or could be made to be one)?

I've been persuaded that __list_node is not (currently) an implicit-lifetime type, and therefore the original code does have UB. (I'm still not 100% sure, but I'm no longer arguing against the premise.) I'm also not sure if it's possible to make __list_node an implicit-lifetime type, but again I'm willing to assume it's not possible, and therefore (1) yes there's UB, and (2) yes this is a good direction to solve it.

Ah, ok. I don't know for sure about the implicit lifetime-ness of the type either, although I assumed that it was just not a trivially constructible class. Anyway, should we get more input on this or is it good?

For context, the debug info thing that this affects currently applies to classes that are not aggregate and don't have trivial default ctors - so it's assuming that those types of classes are not implicit lifetime and need to be constructed.

@akhuang As Arthur said, I do think this is the right way to solve the problem, but we have to find a way to make it work in C++03, which has different rules about what can appear in a union apparently (didn't know that until just now). As I said, I won't have much time to devote to this in the short term, so feel free to commandeer.

@akhuang As Arthur said, I do think this is the right way to solve the problem, but we have to find a way to make it work in C++03, which has different rules about what can appear in a union apparently (didn't know that until just now). As I said, I won't have much time to devote to this in the short term, so feel free to commandeer.

Ohh, I see, didn't realize there was an issue with c++03 --

Don't know of a good way to make it work in c++03 (aside from a suggestion to use a char buffer for __value_ and put reinterpret_casts everywhere). Overall the change seems less nice, but I just made a patch that does it here: https://reviews.llvm.org/D101206

Also not sure if this code is actually ok, but the tests seem to pass

ldionne abandoned this revision.Sep 6 2023, 10:49 AM

Abandoning in favour of completing D101206.

Herald added a project: Restricted Project. · View Herald TranscriptSep 6 2023, 10:49 AM