This is an archive of the discontinued LLVM Phabricator instance.

[libc++][NFC] Add checks for lifetime issues in classic algorithms.
ClosedPublic

Authored by var-const on Jul 21 2022, 10:18 PM.

Diff Detail

Event Timeline

var-const created this revision.Jul 21 2022, 10:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2022, 10:18 PM
Herald added a subscriber: mgrang. · View Herald Transcript
var-const requested review of this revision.Jul 21 2022, 10:18 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 21 2022, 10:18 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript

Revert local testing changes.

This is a follow-up to https://reviews.llvm.org/D130197 and https://reviews.llvm.org/D130212. I have manually confirmed that these tests would have found the bug fixed by those two patches.

libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
10

Making this support C++03 would be pretty painful because it doesn't support initializer lists.

463

There needs to be a compromise between providing valid inputs for each algorithm and having a separate input for every single function. On one hand, the inputs have to satisfy the preconditions (or else we'd trigger undefined behavior) and exercise many (ideally all) the code paths in the algorithm (because issues are only found if they're triggered). On the other hand, having separate inputs for each algorithm is unmaintainable. The idea here is to split all algorithms into a few groups and provide a set of interesting inputs for each group (e.g. the default set, the sorted set, the partitioned set, etc.).

552

I don't know of a good way to pass a pointer to a template function, so unlike similar tests for range algorithms I'm using lambdas here.

558

I know it's a lot of TODOs, but I'd like to get some feedback before spending time on how to test algorithms with more complicated inputs. This patch already improves coverage significantly.

Rebase on main.

var-const updated this revision to Diff 446744.Jul 22 2022, 1:47 AM

Fix the CI.

I think this is good. But I'd like to clarify the idea behind it. IIUC, the idea is that

  • there is a global object pool of ptrs
  • the class's destructor removes this from the global pool
  • every operation assert this is inside the pool

But if the object destructor has been run, it is UB to call its member function. and since it is UB , those assert in theory is not guaranteed to run

I think this is good. But I'd like to clarify the idea behind it. IIUC, the idea is that

  • there is a global object pool of ptrs
  • the class's destructor removes this from the global pool
  • every operation assert this is inside the pool

But if the object destructor has been run, it is UB to call its member function. and since it is UB , those assert in theory is not guaranteed to run

Yes, it's true that this relies on undefined behavior. However, I think it's the pragmatic thing to do:

  • I can't think of a way to do this that avoids UB, because checking an object after its destructor has run is really the crux of the problem. constexpr checks are great in that regard but unfortunately won't give us full coverage as I mentioned in another comment. So it looks like our alternatives are either having checks that rely on UB or no checks at all (for some cases, like most of std::sort which is the motivating example);
  • I have manually confirmed that it works (in the sense that it does find actual lifetime issues). It would have found both the original bug that triggered the assertion in Chromium and the fact that the first patch with the fix still contained a dangling temporary;
  • I have deliberately written the lifetime checks so that they never dereference this (that's one of the reasons the cache is a static variable). In practice, I presume that member function calls translate to regular function calls with this as the first parameter (simplifying a little). Neither the function code nor the _value_ of the this pointer have a reason to become invalid after the destructor has run. Now, it's possible that some compiler optimization simply prevents the member function from being called since that is supposed to be impossible (for a valid program). I'm very skeptical this could happen in practice, and even if it does, it seems like the worst that could happen is that these tests would miss a lifetime bug. While that would be unfortunate, it's not significantly worse than the status quo which is no checks at all.

Of course, I'd be happy to rewrite this if there's a way to achieve the same or similar coverage without relying on undefined behavior. Unfortunately, I can't think of one -- if you have any ideas, I'm happy to discuss.

Fix the CI, rebase on main.

I think this is good. But I'd like to clarify the idea behind it. IIUC, the idea is that

  • there is a global object pool of ptrs
  • the class's destructor removes this from the global pool
  • every operation assert this is inside the pool

But if the object destructor has been run, it is UB to call its member function. and since it is UB , those assert in theory is not guaranteed to run

Yes, it's true that this relies on undefined behavior. However, I think it's the pragmatic thing to do:

  • I can't think of a way to do this that avoids UB, because checking an object after its destructor has run is really the crux of the problem. constexpr checks are great in that regard but unfortunately won't give us full coverage as I mentioned in another comment. So it looks like our alternatives are either having checks that rely on UB or no checks at all (for some cases, like most of std::sort which is the motivating example);
  • I have manually confirmed that it works (in the sense that it does find actual lifetime issues). It would have found both the original bug that triggered the assertion in Chromium and the fact that the first patch with the fix still contained a dangling temporary;
  • I have deliberately written the lifetime checks so that they never dereference this (that's one of the reasons the cache is a static variable). In practice, I presume that member function calls translate to regular function calls with this as the first parameter (simplifying a little). Neither the function code nor the _value_ of the this pointer have a reason to become invalid after the destructor has run. Now, it's possible that some compiler optimization simply prevents the member function from being called since that is supposed to be impossible (for a valid program). I'm very skeptical this could happen in practice, and even if it does, it seems like the worst that could happen is that these tests would miss a lifetime bug. While that would be unfortunate, it's not significantly worse than the status quo which is no checks at all.

Of course, I'd be happy to rewrite this if there's a way to achieve the same or similar coverage without relying on undefined behavior. Unfortunately, I can't think of one -- if you have any ideas, I'm happy to discuss.

Hi, thanks for the explanation. Undefined behavior is not something very reliable.
My original test has some coverage (would fail on gcc). I am trying to figure out why clang passes the test. See this reproduction example (it is basically my original test plus making Reference destructor to set _i to be nullptr
https://godbolt.org/z/Y3j4GGjxv
It is clear that the behavior of the original __iter_move is already wrong according to clang output. If the program behaves correctly, i should equal to 5 and the program should return 5. But the program return 139 which means something clearly wrong.
However, the assertion assert(i==5) isn't triggered because of the Undefined Behaviour. The compiler doesn't even bother assert as i is just rubbish value.

If we run the test with -fsantize=address, it would catch the issue
https://godbolt.org/z/q7Khc91T7

My original test has some coverage (would fail on gcc). I am trying to figure out why clang passes the test. See this reproduction example (it is basically my original test plus making Reference destructor to set _i to be nullptr

That's exactly my experience with compiler warnings and tools like Asan -- all of them are essentially "best effort" and aren't reliable ("reliable" in the sense "guaranteed to catch 100% of issues"). In fact, when I did the first patch to fix the Chromium issue, Clang could see the dangling temporary in the original version of __iter_move (that returned a dangling value_type) but not in the version from the patch (that returned a dangling reference), even though both were equally UB and essentially the same issue. You encountered a very similar problem where GCC sees a lifetime issue while Clang doesn't. I'm sure there exist counterexamples where Clang would catch something that GCC cannot spot. The point is, these warnings aren't meant to validate that the code is correct -- while their presence almost always indicates a problem, their absence doesn't guarantee there is no problem. Asan similarly cannot catch all issues. To be clear, all these tools are very helpful, but should be used for their intended purpose and not to verify that code is correct.

However, the assertion assert(i==5) isn't triggered because of the Undefined Behaviour. The compiler doesn't even bother assert as i is just rubbish value.

The assertion isn't triggered because the program segfaults on the line that calls iter_move. Unfortunately, Godbolt output doesn't make it very clear, but 139 isn't the value of i, it's the return code of a program upon receiving sigsegv (11, which is the code of the signal, + 128).

From the perspective of the standard, the Godbolt program might be equivalently undefined compared to this patch, but in practice there is a very important difference -- the program dereferences this after the destructor has been run, while this patch doesn't.

To be clear, I would love to do it another way that doesn't require this sort of "this is technically undefined but works in practice" analysis. Unfortunately, I don't really see it. constexpr would be the perfect solution if it were not for the coverage problems. I think that tooling works best as complimentary, not the exclusive means of detecting this sort of issue.

There are some possible mitigations we could do as well. We can make sure this test runs without optimizations, we may restrict it to a certain compiler version if need be, and we could add a fail.cpp test to make sure it actually fails in practice (when encountering memory problems). I think this is overkill, personally, but wouldn't object if you feel strongly about this.

My original test has some coverage (would fail on gcc). I am trying to figure out why clang passes the test. See this reproduction example (it is basically my original test plus making Reference destructor to set _i to be nullptr

That's exactly my experience with compiler warnings and tools like Asan -- all of them are essentially "best effort" and aren't reliable ("reliable" in the sense "guaranteed to catch 100% of issues"). In fact, when I did the first patch to fix the Chromium issue, Clang could see the dangling temporary in the original version of __iter_move (that returned a dangling value_type) but not in the version from the patch (that returned a dangling reference), even though both were equally UB and essentially the same issue. You encountered a very similar problem where GCC sees a lifetime issue while Clang doesn't. I'm sure there exist counterexamples where Clang would catch something that GCC cannot spot. The point is, these warnings aren't meant to validate that the code is correct -- while their presence almost always indicates a problem, their absence doesn't guarantee there is no problem. Asan similarly cannot catch all issues. To be clear, all these tools are very helpful, but should be used for their intended purpose and not to verify that code is correct.

However, the assertion assert(i==5) isn't triggered because of the Undefined Behaviour. The compiler doesn't even bother assert as i is just rubbish value.

The assertion isn't triggered because the program segfaults on the line that calls iter_move. Unfortunately, Godbolt output doesn't make it very clear, but 139 isn't the value of i, it's the return code of a program upon receiving sigsegv (11, which is the code of the signal, + 128).

From the perspective of the standard, the Godbolt program might be equivalently undefined compared to this patch, but in practice there is a very important difference -- the program dereferences this after the destructor has been run, while this patch doesn't.

To be clear, I would love to do it another way that doesn't require this sort of "this is technically undefined but works in practice" analysis. Unfortunately, I don't really see it. constexpr would be the perfect solution if it were not for the coverage problems. I think that tooling works best as complimentary, not the exclusive means of detecting this sort of issue.

There are some possible mitigations we could do as well. We can make sure this test runs without optimizations, we may restrict it to a certain compiler version if need be, and we could add a fail.cpp test to make sure it actually fails in practice (when encountering memory problems). I think this is overkill, personally, but wouldn't object if you feel strongly about this.

Thanks for the explanation. I don't think I have strong opinions on this. I just feel that it is a bit complicated but at the same time it also relies on UB. perhaps we could try a simpler approach, e.g. the destructor sets the ptr to nulltpr so that it can trigger segv as you explained.

Thanks for working on this!

In general I'm quite happy, but I would like to add test to catch "read after move" too.

I agree the current approach technically is UB, but I don't dislike it. I've seen this approach used before for life-time tracking.

libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
10

Can you add that as a comment in the test itself?

54
65

This can be noexcept since we only support C++11 and later.

526

This can be just constexpr.

552

You mean a pointer to std::any?

648
653

Do you expect test_all to grow? Otherwise it could be folded in main.

var-const marked 6 inline comments as done.

Address feedback.

libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
65

Done. There are a few leftovers from my attempt to make it work in C++03.

552

I think using any here would prevent running this test in pre-C++17 modes, right? (good idea, though)

653

Probably -- I have at least one addition in mind (I also have a personal preference for a very simple main).

LGMT

libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
140

optional: it might be worth setting rhs.v_ to nullptr, which would likely cause segv in case we are trying to deference the moved-from Reference

158

is this intended? why is the assignment inside an assert. I think in general it is not a good idea to make assert have side effects, as it makes program behaves differently with different compiler flags

var-const marked an inline comment as done.

Address feedback.

var-const marked an inline comment as done.Jul 26 2022, 12:11 PM
var-const added inline comments.
libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
158

Thanks for spotting, this was a copy-paste error.

ldionne accepted this revision.Jul 26 2022, 12:16 PM

This LGTM, and we should do the same for <ranges>.

libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
75
This revision is now accepted and ready to land.Jul 26 2022, 12:16 PM
huixie90 accepted this revision.Jul 26 2022, 1:15 PM
huixie90 added inline comments.
libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
208

Should be void as this class does not have operator->

420
Mordante accepted this revision.Jul 26 2022, 3:27 PM

Thanks for adding the use after move validation!
LGTM when the CI is green.

libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
552

sorry I meant std::any_of.

653

I prefer a small main too, but usually I fold these small helpers in main. But no objection against your approach.

huixie90 added inline comments.Jul 26 2022, 3:41 PM
libcxx/test/std/algorithms/robust_against_proxy_iterators_lifetime_bugs.pass.cpp
552

It is fine to use lambda here. it is not easy to pass std::any_of because it is

  1. a function template
  2. an overload set.

to solve 1, usually you need to explicit specify template arguments. &my_fun<intput1, input2>

to solve 2, you need to static_cast it to a function pointer.

combining 1 and 2 is lot of code which is not worth it.

I use BOOST_HOF_LIFT all the time to lift non-functional-friendly functions to an object that can be passed around but that require Boost.

var-const updated this revision to Diff 447869.Jul 26 2022, 4:14 PM
var-const marked 6 inline comments as done.

Address feedback and rebase.

This revision was landed with ongoing or failed builds.Jul 26 2022, 4:15 PM
This revision was automatically updated to reflect the committed changes.