This is an archive of the discontinued LLVM Phabricator instance.

[libcxx] Attempt to fix undefined behavior in list, forward_list and __tree.
AbandonedPublic

Authored by EricWF on Jan 14 2015, 11:59 AM.

Details

Summary

I'm not 100% confident in this patch yet but because of the time crunch with the 3.6 release I'm posting it early.

The patch works by using static_cast on fancy class-like pointers and reinterpret_cast on raw pointers. It also changes it so that when the base node class is needed at a call site then it gets the base pointer directly instead of the derived pointer.

I'll do more work on this and actually convince myself of its correctness.

Diff Detail

Event Timeline

EricWF updated this revision to Diff 18172.Jan 14 2015, 11:59 AM
EricWF retitled this revision from to [libcxx] Attempt to fix undefined behavior in list, forward_list and __tree..
EricWF updated this object.
EricWF edited the test plan for this revision. (Show Details)
EricWF added a subscriber: Unknown Object (MLST).
rsmith edited edge metadata.Jan 14 2015, 2:01 PM

It seems to me that the right thing to do is to fix __tree and __tree_iterator to use the appropriate pointer type for all stored pointers. That is, __tree_node_base::__right_, __tree_iterator::__ptr_, __tree_const_iterator::__ptr_, and __tree::__begin_node_ should be pointers to __tree_end_node, because they might point to a __tree_end_node that is not a __tree_node_base.

As-is, the code *still* has undefined behavior in the case where the element type has a higher alignment requirement than that of a pointer, because you will potentially use a node_pointer to point to an element that is not suitably aligned to be represented as such a pointer value. This problem doesn't arise if you only use a pointer-to-T type to point at a T object.

include/memory
5452–5458

It seems strange to use static_cast here and reinterpret_cast below. Even assuming the reinterpret_cast approach works, don't we still have the same problem if the user uses a fancy pointer type?

awi added a subscriber: awi.Jan 20 2015, 5:45 AM

Strange, I can't see much difference in include/list. Only list::end() (with and without const) and __list_node_base::__self() have been changed and use the new _FancyPointerCaster<>, but all this does is replace the old static_casts with reinterpret_casts, if the allocater defines pointer to be a built-in pointer. If it is a fancy pointer, your patch doesn't change anything at all! How is that supposed to fix the bug?
I've just checked your patch with my two example files. As I expected, main.cpp works, with or without your patch, and main2.cpp doesn't work, with or without your patch. I'm sorry.

By the way: what is the purpose of _FancyPointerCaster<>? I don't quite understand what you need it for, because all it does is choose whether to use a static_cast or a reinterpret_cast.

The class __tree seems more complicated and uses three instead of two node classes: __tree_node and __tree_node_base work like their counterparts in list, and then there is __tree_end_node for the end node that is the parent of the tree's root node and the target of the end iterator. As far as I understand it, a __tree_node_base is never created: all regular nodes are __tree_nodes and the end node is a __tree_end_node. Therefore I don't expect any aliasing problems between __tree_node and __tree_node_base, only between __tree_node and __tree_end_node.
Your patch introduces a new function __tree::__end_node_pointer() that returns the end node as a pointer of that base class __tree_end_node, and it calls that function whenever it needs to access the end node. This looks good to me and should solve some strict aliasing problems.
Alas, not all of them! The end iterator is a __tree_iterator and thus points to the end node with a __tree_node_pointer. The operator--() casts these to __node_base_pointer first, but not to __end_node_ptr, so the very moment it accesses the __left_ field, it breaks the strict aliasing rule again (operator++() works the same way, but this is fine, because it would be an error in the application to call operator++() on the end node). The same is true for __tree_node_base::__parent_, whose type points to a __tree_node_base, although the parent of the root node is the end node. Note, that it's safe for __left_ and __right_ to point to __tree_node_base, because due to the structure of the tree they can never point to the end node: only __parent_ and the end iterator can, in addition to all functions that call __end_node() or __end_node_pointer().
Unfortunately, I cannot present you a counterexample this time: that bug seems to be much harder to trigger in __tree than in std::list.

Finally, I've had a look into std::forward_list. Similar to __tree, you have replaced some calls to __before_begin() with calls to a new function __before_begin_pointer(), which returns the __before_begin node as a __begin_node_pointer instead of __node_pointer.
In my opinion, this is more appropriate and may solve some strict aliasing problems, but not all of them. The function forward_list::before_begin() still returns a __forward_list_iterator<__node_pointer> that contains a pointer to a __forward_list_node, although it actually points to a __forward_begin_node instead. Dereferencing that iterator violates strict aliasing, as can be seen from the fourth attachment to the bug entry: It fails, with or without your patch.

EricWF abandoned this revision.Sep 15 2016, 3:29 PM

Abandoning. All issues have been addressed by other commits.

include/memory
5452–5458

Yeah, I couldn't find a better way to convert fancy pointers. Perhaps we need to cast to a void pointer as a go between.