Fixes #58392
Details
- Reviewers
ldionne Mordante var-const - Group Reviewers
Restricted Project - Commits
- rG8ff4d218a80b: [libc++] Fix memory leaks when throwing inside std::vector constructors
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
libcxx/include/vector | ||
---|---|---|
297 | In the bug report, you mention this was introduced when we removed the base class. Are you referring to b82da8b555603b172f826fdd5ea902eff3aeb7ad? I looked at the commit and I still don't see how that introduced this issue (but I do acknowledge that this is a real issue, and a somewhat embarrassing one at that). | |
1068 | This comment applies to all the constructors. If __vallocate() fails, we won't call the destructor, so we won't remove *this from the debug database. That being said, the legacy debug mode is already broken pretty bad so I'm not sure we can or want to fix it. | |
libcxx/test/std/containers/sequences/vector/vector.cons/throwing.pass.cpp | ||
11 ↗ | (On Diff #477668) | Please add a link to the bug report. |
92 ↗ | (On Diff #477668) | Instead, I would list the signatures of the constructors you're testing. It will make it clearer what's going on at a glance. |
libcxx/include/vector | ||
---|---|---|
297 | Oh crap, I see it now. It's from 12b55821a578929a7b03448a22c3a678aa649bd5. And now I finally understand what the motivation for introducing __vector_base probably was when Howard wrote it. What a great lesson. I still think it was too subtle to not even have a comment, but there's definitely something to learn from this one. Since we made similar changes to string and perhaps other types, we'll have to also look at those with this bug in mind. |
libcxx/include/vector | ||
---|---|---|
297 | I looked at the changes we made in string and deque, and we didn't do anything similar. Independently of any changes we might have made, I looked at the string implementation and I don't think it suffers from the same issue. Indeed, we sometimes allocate and then copy characters into the string, but char_traits is required not to throw anything, so I don't think the copy step can fail. deque however seems to have the same problem (but not introduced by us). In deque::__append (which is called from various constructors), we sometimes seem to allocate and then copy over elements, the constructors of which may throw. |
libcxx/test/std/containers/sequences/vector/vector.cons/throwing.pass.cpp | ||
---|---|---|
19 ↗ | (On Diff #477668) | I don't understand how this test is testing anything -- we're not asserting anything. Are we relying on the msan bot? Instead, I would suggest that we have a counting allocator that ensures that everything that was allocated also was deallocated. That would make the checking explicit. |
- Address comments
- Try to fix CI
libcxx/include/vector | ||
---|---|---|
297 | Either way, I'll create additional patches to test the constructors of deque and string, since we also removed the base classes there. (BTW libstdc++ has a comment explaining the rationale for their base classes) | |
1068 | We could just make the transaction cover the whole constructor, that should fix the problem. Since the fix should be easy enough, I think we should do it. | |
libcxx/test/std/containers/sequences/vector/vector.cons/throwing.pass.cpp | ||
19 ↗ | (On Diff #477668) | ASan, but yes. An allocator doesn't really work, since we can't give it a proper state in all cases. I opted for checking new + delete calls. |
In addition to this memory leak, it seems like we have a pre-existing bug that has been there for a long long time. In __construct_at_end (which is used in like 40 places), we fail to destroy the elements (in reverse order) in case one of the constructors throws. While it is not 100% clear to me that it's required in the constructors (I couldn't find the part of the spec that says so), I am pretty sure we're expected to do that. Let's fix this leak and then make a pass at our containers to see which ones suffer from this issue. I'm extremely surprised that we've been going for so long without this being noticed.
libcxx/include/vector | ||
---|---|---|
441–447 | I would move the definition of the class here: private: // add comment: Helper class to make it easier to rollback when we fail during one of vector's constructors struct __destroy_vector { operator() etc.. vector& __vec; }; public: ~vector() { __vector_destroy()(*this); } | |
2503 | We have the same problem here if the iterators throw inside std::copy -- we will end up leaking the memory allocated before the call to __construct_at_end. | |
libcxx/test/std/containers/sequences/vector/vector.cons/throwing.pass.cpp | ||
1 ↗ | (On Diff #477800) | Maybe rename to /exceptions.pass.cpp? |
@philnik Thanks a lot for fixing this! Just to double-check -- you have considered reintroducing the base class for the purpose of providing exception safety and decided that using transactions is better, right? (I presume it is, since you would probably need to move quite a few details related to memory management back into the base class for this to work)
libcxx/include/vector | ||
---|---|---|
1256 | Optional: maybe add a helper function so that this can be: auto __guard = __create_guard(*this); The expression is not super verbose, but it's repeated _a lot_. |
FWIW my vote here would be for the current approach, which is more explicit. There's not that much duplication and it makes the code extremely explicit about what operations we're expecting to potentially fail.
This LGTM with green CI. However, since we had some debug symbols regressions in previous vector patches, I would like to get a sense of whether this one might be a problem.
Pinging @dblaikie @hans @alexfh @joanahalili. Can one of you help us figure out whether that's going to be a problem for you folks? I think you had a semi-easy way of checking this. Also, I think it would be useful to get tests for this kind of stuff inside libc++ itself so we don't have to do this manually every time. We can't afford rebuilding all of Chromium on every patch, but perhaps if you can work with us to reproduce a smaller test that would be representative of what you see, we could start from that?
@philnik Let's ship this on next Monday unless we have an update regarding the code size. Thanks for fixing this.
Also, can you please make sure to file a bug report against libc++ regarding the pre-existing issue about exception rollback.
Try to fix CI
libcxx/include/vector | ||
---|---|---|
1256 | TBH I don't think it's worth it. It's not like these are in any way complex. |
I ran at least a quick check, just clang -O0 -g -gsplit-dwarf self-host, basically, and the differences seem small:
FILE SIZE VM SIZE -------------- -------------- +0.2% +811Ki [ = ] 0 .debug_info.dwo +0.2% +612Ki [ = ] 0 .debug_str.dwo +0.1% +84.3Ki [ = ] 0 .debug_str_offsets.dwo +0.2% +17.2Ki [ = ] 0 .debug_abbrev.dwo +0.1% +11.2Ki [ = ] 0 .debug_rnglists.dwo +0.2% +1.50Mi [ = ] 0 TOTAL
FILE SIZE VM SIZE -------------- -------------- +0.3% +860Ki [ = ] 0 .gdb_index ... +0.2% +70.8Ki [ = ] 0 .debug_addr +0.3% +35.7Ki [ = ] 0 .debug_rnglists +0.1% +1.53Ki [ = ] 0 .debug_info +0.1% +1.45Ki [ = ] 0 .debug_str +0.0% +142 [ = ] 0 .debug_loclists +0.0% +92 [ = ] 0 .debug_str_offsets +0.0% +89 [ = ] 0 .debug_line_str
So, that sounds OK to me (our reporting threshold is 1% overall file size, I think - then we drcill into the per section sizes to see what caused the problems) - though results for different programs and in different build modes might swing the numbers. I reached out to @joanahalili and co on our release team to see if they could run the wider release testing, but not sure if they'll have time - it's still a bit of a painful/long/unreliable process to run & there's work underway to streamline it.
In the bug report, you mention this was introduced when we removed the base class. Are you referring to b82da8b555603b172f826fdd5ea902eff3aeb7ad?
I looked at the commit and I still don't see how that introduced this issue (but I do acknowledge that this is a real issue, and a somewhat embarrassing one at that).