This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Fix memory leaks when throwing inside std::vector constructors
ClosedPublic

Authored by philnik on Nov 23 2022, 12:54 PM.

Details

Summary

Fixes #58392

Diff Detail

Event Timeline

philnik created this revision.Nov 23 2022, 12:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2022, 12:54 PM
philnik requested review of this revision.Nov 23 2022, 12:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 23 2022, 12:54 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
philnik edited the summary of this revision. (Show Details)Nov 23 2022, 12:58 PM
philnik updated this revision to Diff 477668.Nov 23 2022, 6:28 PM

Try to fix CI

ldionne added inline comments.Nov 24 2022, 6:05 AM
libcxx/include/vector
297

In the bug report, you mention this was introduced when we removed the base class. Are you referring to b82da8b555603b172f826fdd5ea902eff3aeb7ad?

I looked at the commit and I still don't see how that introduced this issue (but I do acknowledge that this is a real issue, and a somewhat embarrassing one at that).

1070

This comment applies to all the constructors. If __vallocate() fails, we won't call the destructor, so we won't remove *this from the debug database.

That being said, the legacy debug mode is already broken pretty bad so I'm not sure we can or want to fix it.

libcxx/test/std/containers/sequences/vector/vector.cons/throwing.pass.cpp
12

Please add a link to the bug report.

93

Instead, I would list the signatures of the constructors you're testing. It will make it clearer what's going on at a glance.

ldionne added inline comments.Nov 24 2022, 6:10 AM
libcxx/include/vector
297

Oh crap, I see it now. It's from 12b55821a578929a7b03448a22c3a678aa649bd5.

And now I finally understand what the motivation for introducing __vector_base probably was when Howard wrote it. What a great lesson. I still think it was too subtle to not even have a comment, but there's definitely something to learn from this one.

Since we made similar changes to string and perhaps other types, we'll have to also look at those with this bug in mind.

ldionne added inline comments.Nov 24 2022, 6:19 AM
libcxx/include/vector
297

I looked at the changes we made in string and deque, and we didn't do anything similar.

Independently of any changes we might have made, I looked at the string implementation and I don't think it suffers from the same issue. Indeed, we sometimes allocate and then copy characters into the string, but char_traits is required not to throw anything, so I don't think the copy step can fail.

deque however seems to have the same problem (but not introduced by us). In deque::__append (which is called from various constructors), we sometimes seem to allocate and then copy over elements, the constructors of which may throw.

ldionne added inline comments.Nov 24 2022, 6:22 AM
libcxx/test/std/containers/sequences/vector/vector.cons/throwing.pass.cpp
20

I don't understand how this test is testing anything -- we're not asserting anything. Are we relying on the msan bot?

Instead, I would suggest that we have a counting allocator that ensures that everything that was allocated also was deallocated. That would make the checking explicit.

philnik updated this revision to Diff 477800.Nov 24 2022, 8:11 AM
philnik marked 4 inline comments as done.
  • Address comments
  • Try to fix CI
libcxx/include/vector
297

Either way, I'll create additional patches to test the constructors of deque and string, since we also removed the base classes there. (BTW libstdc++ has a comment explaining the rationale for their base classes)

1070

We could just make the transaction cover the whole constructor, that should fix the problem. Since the fix should be easy enough, I think we should do it.

libcxx/test/std/containers/sequences/vector/vector.cons/throwing.pass.cpp
20

ASan, but yes. An allocator doesn't really work, since we can't give it a proper state in all cases. I opted for checking new + delete calls.

In addition to this memory leak, it seems like we have a pre-existing bug that has been there for a long long time. In __construct_at_end (which is used in like 40 places), we fail to destroy the elements (in reverse order) in case one of the constructors throws. While it is not 100% clear to me that it's required in the constructors (I couldn't find the part of the spec that says so), I am pretty sure we're expected to do that. Let's fix this leak and then make a pass at our containers to see which ones suffer from this issue. I'm extremely surprised that we've been going for so long without this being noticed.

libcxx/include/vector
450

I would move the definition of the class here:

private:
  // add comment: Helper class to make it easier to rollback when we fail during one of vector's constructors
  struct __destroy_vector {
    operator() etc..
    vector& __vec;
  };

public:
  ~vector() { __vector_destroy()(*this); }
2488

We have the same problem here if the iterators throw inside std::copy -- we will end up leaking the memory allocated before the call to __construct_at_end.

libcxx/test/std/containers/sequences/vector/vector.cons/throwing.pass.cpp
2

Maybe rename to /exceptions.pass.cpp?

philnik updated this revision to Diff 478403.Nov 28 2022, 3:38 PM
philnik marked 6 inline comments as done.

Address comments

philnik updated this revision to Diff 478511.Nov 29 2022, 3:58 AM

Try to fix CI

@philnik Thanks a lot for fixing this! Just to double-check -- you have considered reintroducing the base class for the purpose of providing exception safety and decided that using transactions is better, right? (I presume it is, since you would probably need to move quite a few details related to memory management back into the base class for this to work)

libcxx/include/vector
1259

Optional: maybe add a helper function so that this can be:

auto __guard = __create_guard(*this);

The expression is not super verbose, but it's repeated _a lot_.

@philnik Thanks a lot for fixing this! Just to double-check -- you have considered reintroducing the base class for the purpose of providing exception safety [...]

FWIW my vote here would be for the current approach, which is more explicit. There's not that much duplication and it makes the code extremely explicit about what operations we're expecting to potentially fail.

ldionne accepted this revision.Dec 1 2022, 8:56 AM
ldionne added subscribers: joanahalili, hans, alexfh.

This LGTM with green CI. However, since we had some debug symbols regressions in previous vector patches, I would like to get a sense of whether this one might be a problem.

Pinging @dblaikie @hans @alexfh @joanahalili. Can one of you help us figure out whether that's going to be a problem for you folks? I think you had a semi-easy way of checking this. Also, I think it would be useful to get tests for this kind of stuff inside libc++ itself so we don't have to do this manually every time. We can't afford rebuilding all of Chromium on every patch, but perhaps if you can work with us to reproduce a smaller test that would be representative of what you see, we could start from that?

@philnik Let's ship this on next Monday unless we have an update regarding the code size. Thanks for fixing this.

Also, can you please make sure to file a bug report against libc++ regarding the pre-existing issue about exception rollback.

This revision is now accepted and ready to land.Dec 1 2022, 8:56 AM
philnik updated this revision to Diff 479435.Dec 1 2022, 2:37 PM

Try to fix CI

libcxx/include/vector
1259

TBH I don't think it's worth it. It's not like these are in any way complex.

I ran at least a quick check, just clang -O0 -g -gsplit-dwarf self-host, basically, and the differences seem small:

   FILE SIZE        VM SIZE                                                                                                                
--------------  --------------                                                                                                              
+0.2%  +811Ki  [ = ]       0    .debug_info.dwo                                                                                            
+0.2%  +612Ki  [ = ]       0    .debug_str.dwo                                                                                            
+0.1% +84.3Ki  [ = ]       0    .debug_str_offsets.dwo                                                                                     
+0.2% +17.2Ki  [ = ]       0    .debug_abbrev.dwo                                                                                          
+0.1% +11.2Ki  [ = ]       0    .debug_rnglists.dwo                                                                                        
+0.2% +1.50Mi  [ = ]       0    TOTAL
    FILE SIZE        VM SIZE                                                                                                                
 --------------  --------------                                                                                                              
 +0.3%  +860Ki  [ = ]       0    .gdb_index    
...
 +0.2% +70.8Ki  [ = ]       0    .debug_addr                                                                                                
 +0.3% +35.7Ki  [ = ]       0    .debug_rnglists                                                                                            
 +0.1% +1.53Ki  [ = ]       0    .debug_info                                                                                                
 +0.1% +1.45Ki  [ = ]       0    .debug_str                                                                                                 
 +0.0%    +142  [ = ]       0    .debug_loclists                                                                                            
 +0.0%     +92  [ = ]       0    .debug_str_offsets                                                                                         
 +0.0%     +89  [ = ]       0    .debug_line_str

So, that sounds OK to me (our reporting threshold is 1% overall file size, I think - then we drcill into the per section sizes to see what caused the problems) - though results for different programs and in different build modes might swing the numbers. I reached out to @joanahalili and co on our release team to see if they could run the wider release testing, but not sure if they'll have time - it's still a bit of a painful/long/unreliable process to run & there's work underway to streamline it.

This revision was landed with ongoing or failed builds.Dec 6 2022, 5:38 AM
This revision was automatically updated to reflect the committed changes.