This is an archive of the discontinued LLVM Phabricator instance.

[ASan][libcxx] Annotating std::vector with all allocators
ClosedPublic

Authored by AdvenamTacet on Oct 26 2022, 6:27 AM.

Details

Summary

This revision is a part of a series of patches extending
AddressSanitizer C++ container overflow detection
capabilities by adding annotations, similar to those existing
in std::vector, to std::string and std::deque collections.
These changes allow ASan to detect cases when the instrumented
program accesses memory which is internally allocated by
the collection but is still not in-use (accesses before or
after the stored elements for std::deque, or between the size and
capacity bounds for std::string).

The motivation for the research and those changes was a bug,
found by Trail of Bits, in a real code where an out-of-bounds read
could happen as two strings were compared via a std::equals function
that took iter1_begin, iter1_end, iter2_begin iterators
(with a custom comparison function).
When object iter1 was longer than iter2, read out-of-bounds on iter2
could happen. Container sanitization would detect it.

In revision D132522, support for non-aligned memory buffers (sharing
first/last granule with other objects) was added, therefore the
check for standard allocator is not necessary anymore.
This patch removes the check in std::vector annotation member
function (__annotate_contiguous_container) to support
different allocators.

Additionally, this revision fixes unpoisoning in std::vector.
It guarantees that __alloc_traits::deallocate may access returned memory.
Originally suggested in D144155 revision.

If you have any questions, please email:

  • advenam.tacet@trailofbits.com
  • disconnect3d@trailofbits.com

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2022, 6:27 AM
AdvenamTacet requested review of this revision.Oct 26 2022, 6:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2022, 6:27 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
AdvenamTacet added a project: Restricted Project.
AdvenamTacet added a subscriber: Restricted Project.

Turning on old check for llvm without new annotation implementation.
(Check for _LIBCPP_CLANG_VER.)

For old llvm versions, it will work the same way.

Btw. I do not have a commiter access.

philnik added a subscriber: philnik.Nov 6 2022, 5:21 AM
philnik added inline comments.
libcxx/include/vector
746–762

Could you explain here why this is possible in LLVM16, but not before?

libcxx/test/libcxx/containers/sequences/vector/asan.pass.cpp
39

Could you instead add a test that checks this case?

EricWF added a subscriber: EricWF.Nov 6 2022, 2:40 PM
EricWF added inline comments.
libcxx/include/vector
746–762

Seconded.

Also, I really don't like having the condition be conditionally compiled. It smells weird, and it looks weird, and I think it'll lead to bugs.

Can you find another way to write this?

libcxx/test/libcxx/containers/sequences/vector/asan.pass.cpp
31–32

This seems like it's defeating the purpose of the test?

AdvenamTacet marked an inline comment as done.Nov 7 2022, 4:53 PM
AdvenamTacet added inline comments.
libcxx/include/vector
746–762

Vectors annotations rely on __sanitizer_annotate_contiguous_container, its implementation from LLVM15 (and earlier) requires that beginning of a containers data buffer is aligned to shadow granularity. Now, in effort to extend ASans capabilities, implementation of the function changed with commit rGdd1b7b797a116eed588fd752fbe61d34deeb24e4 and supports:

  • unaligned beginnings of buffers,
  • shared first/last granule (if memory granule is shared with a different object just after the end of unaligned end/before the unaligned beginning, memory of that object won't be poisoned).

Therefore, check for standard allocator is not necessary.

I don't know how to change the condition, requirements for the functions (__sanitizer_annotate_contiguous_container) arguments were weakened now and if older LLVM versions are supported, that check has to be still present for them. There are no new variables, just implementation changed. @EricWF Do you have an idea how it may be done in a different way?

libcxx/test/libcxx/containers/sequences/vector/asan.pass.cpp
31–32

It does change the purpose of the test, because with that patch, that access will be caught. But I will add a new test instead, as suggested.

39

Sure, I will add a new test and extend if around that one.

AdvenamTacet marked 2 inline comments as done.

With that change, new tests were introduced instead of modifying the old one.

List of changes:

  • Changed 16000 to 160000 in comparison with _LIBCPP_CLANG_VER
  • Withdrawn removal of a test, the test is now behind _LIBCPP_CLANG_VER < 160000 guard.
  • A tests for annotations with min_allocator added.
  • New allocator (returning not aligned buffers for types smaller than 8) added.
  • A new test for annotations on an unaligned buffer added.

Does it look good now?

AdvenamTacet added a reviewer: Restricted Project.Jan 20 2023, 6:35 PM

This update fixes incorrect version number and applies changes to new head (master).

Please, consider accepting those changes as that revision increases number of situations when ASan may be used (non-standard allocators in vector).

Unit test updated.

philnik added inline comments.Jan 22 2023, 7:25 PM
libcxx/include/vector
746–762

Please make this a comment in the code. Maybe also a // TODO LLVM18: Remove the special-casing, so we don't forget removing it.

libcxx/test/libcxx/containers/sequences/vector/asan.pass.cpp
32–41

Let's use TEST_CLANG_VER instead.

libcxx/test/support/min_allocator.h
446

This is UB in general. Please add a static_assert(alignof(T) == 1) to make sure nobody tries to instantiate it with a type where this produces UB.

AdvenamTacet marked 3 inline comments as done.

Fixes suggested in code review.

  • Comment "TODO LLVM18..." added.
  • Explained why special case is necessary and why the change is possible.
  • TEST_CLANG_VER used instead of _LIBCPP_CLANG_VER
  • Static assert for type size added.

Fixing issue with old C++.

error: 'static_assert' with no message is a C++17 extension

One more attempt to fix problems with older C++ (this time, C++03).

  • Fixed static_assert description.
  • static_assert is modified for C++03, as alignof was introduced in C++11.
philnik accepted this revision.Jan 23 2023, 10:29 PM

LGTM % nit.

libcxx/test/support/min_allocator.h
439–442

Sorry that I didn't say that before. We have TEST_ALIGNOF to support C++03. Then you don't need the special-casing here.

This revision is now accepted and ready to land.Jan 23 2023, 10:29 PM
AdvenamTacet added inline comments.Jan 24 2023, 3:56 AM
libcxx/test/support/min_allocator.h
439–442

Should I change it? I see that you accepted the revision.

I see that the revision is accepted, but based on the inline commit, I updated static_assert.
Now TEST_ALIGNOF is used instead of two cases with #if.

I do not have commiter permissions.

AdvenamTacet marked an inline comment as done.Jan 24 2023, 4:28 AM

I see that the revision is accepted, but based on the inline commit, I updated static_assert.
Now TEST_ALIGNOF is used instead of two cases with #if.

I do not have commiter permissions.

LGTM % nit means that it looks good when the remaining comments are addressed, which are usually just small changes where one can't do much wrong. Are the Name and E-Mail in the commit correct?

LGTM % nit means that it looks good when the remaining comments are addressed, which are usually just small changes where one can't do much wrong

Thank you for explaining. I realized that it means LGTM modulo something, but I wasn't sure if I should fix it or not.

Are the Name and E-Mail in the commit correct?

I don't see those fields here, but I believe so. locally it's correct (Advenam Tacet <advenam.tacet@trailofbits.com>) in git commit and I used arc to send it.

vitalybuka accepted this revision.Jan 24 2023, 3:53 PM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Jan 27 2023, 5:52 AM

Extending the check to non-standard allocators, appears to cause a false positive during copy-assignment of a vector with custom allocator in V8, see https://crbug.com/1410719#c6

hans added a comment.Jan 30 2023, 4:25 AM

Extending the check to non-standard allocators, appears to cause a false positive during copy-assignment of a vector with custom allocator in V8, see https://crbug.com/1410719#c6

Filed as https://github.com/llvm/llvm-project/issues/60384 with a stand-alone reproducer. I'll revert this until it can be fixed.

AdvenamTacet reopened this revision.Feb 19 2023, 9:05 PM
This revision is now accepted and ready to land.Feb 19 2023, 9:05 PM

This update should fix problems with zero allocators and add tests confirming it.
It adds fixes to std::vector annotations unpoisoning, originally suggested in D144155. It guarantees that __alloc_traits::deallocate may access returned memory. (Details below.)
It adds a new test allocator (safe_allocator), this allocator accesses memory during allocation and deallocation.
It extends existing and relevant tests to use that allocator as well. This should show that a similar problem to the one which happened won’t happen again.

Short description of the fix:
A call to __annotate_delete() in operator() (in __destroy_vector class) is moved after a call to clear(), because clear may poison memory (as container overflow - fc).
It also adds unpoisoning memory before passing it to the deallocator in __vdeallocate() and __copy_assign_alloc(const vector& __c, true_type).
Current implementation is not correct and works only because annotations are turned on for only the standard allocator, as well as its implementation does not access memory at all, it may change with implementation change. You can see details in D144155.

This update applies the patch on the newest commit from master branch.
This hopefuly fixes:

No testing is possible because we couldn't apply the patch.

Src: https://buildkite.com/llvm-project/diff-checks/builds/153926

The patch applied without a conflict.

Important update notes are in the previous message.

fix: test

Is there a way to send a quiet update? Without informing everyone following the revision?

Important update notes are in the previous message.

AdvenamTacet edited the summary of this revision. (Show Details)Feb 20 2023, 8:03 PM

This revision passed automatic tests. To confirm that it's working, I used those changes to compile a bunch of random harnesses (~45) from Chromium and I'm fuzzing using them right now. Everything works.
The problem with false positives was caused by incorrect use/missing use of __annotate_delete() and should be fixed with that revision. Just the fix to that problem was suggested in D144155 but moved here, look there for details.

I believe the patch is ready to be merged again. Please, let me know if there is anything else to do.

philnik accepted this revision.Feb 21 2023, 5:46 PM

A couple small things, but nothing major. LGTM with nits addressed.

libcxx/test/libcxx/containers/sequences/vector/asan.pass.cpp
32–41

Could you also add a TODO here?

libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_size_value.pass.cpp
100–111

Why are you replacing min_allocator here? Everywhere else you added a test.

libcxx/test/std/containers/sequences/vector/vector.special/swap.pass.cpp
181

Could you make the allocator a template parameter instead to avoid the duplication? The other cases are small enough that I don't really care, but would of course be a plus to deduplicate them as well.

libcxx/test/support/min_allocator.h
447

This static_cast shouldn't be necessary, right?

475

The compiler probably knows that after freeing the memory it's values can't be read, so this memset could probably get optimized away. Let's add a DoNotOptimize(p) just to be safe.

AdvenamTacet added inline comments.Feb 21 2023, 6:57 PM
libcxx/test/std/containers/sequences/vector/vector.modifiers/insert_iter_size_value.pass.cpp
100–111

I believe, someone pasted that test (with min_allocator) twice by mistake. Same test is just above.

AdvenamTacet marked 5 inline comments as done.

This update answers requests from the code review:

  • adds TODO,
  • removes static_cast,
  • adds DoNotOptimize.

I hope, this patch is ready now and may be upstreamed.

AdvenamTacet added inline comments.Feb 21 2023, 9:14 PM
libcxx/test/std/containers/sequences/vector/vector.special/swap.pass.cpp
181

Could you make the allocator a template parameter

I forgot to mention in the update note, I did that as well.

libcxx/test/support/min_allocator.h
447

It's not, fixed.

To make sure that after changes this diff still works as intended, I again compiled a bunch of Chromium harnesses and I don't get any false positives during fuzzing.
I believe, the patch is ready for upstream.

This revision was automatically updated to reflect the committed changes.
hans added a comment.Feb 27 2023, 7:11 AM

I'm afraid we did hit an issue with this in Chromium, but I think it points to a more fundamental problem with these annotations rather than this patch.

The issue is that we have some code using an arena allocator to allocate a bunch of objects, including vectors and their backing memory. Then instead of tracking the lifetimes of those objects, the arena is cleared, and the memory recycled for next time. That means the vector destructors don't run, so some memory is left in the arena with container annotations, causing errors when it's reused.

I've created a reproducer at https://bugs.chromium.org/p/chromium/issues/detail?id=1419798#c5

This could have happened before this patch too, if the user overloaded the global operator new and delete for example.

The question is what to do about it. I don't think there's an easy way for the allocator to remove the container annotations when recycling the memory?

I'm afraid we did hit an issue with this in Chromium, but I think it points to a more fundamental problem with these annotations rather than this patch.

The issue is that we have some code using an arena allocator to allocate a bunch of objects, including vectors and their backing memory. Then instead of tracking the lifetimes of those objects, the arena is cleared, and the memory recycled for next time. That means the vector destructors don't run, so some memory is left in the arena with container annotations, causing errors when it's reused.

I've created a reproducer at https://bugs.chromium.org/p/chromium/issues/detail?id=1419798#c5

This could have happened before this patch too, if the user overloaded the global operator new and delete for example.

The question is what to do about it. I don't think there's an easy way for the allocator to remove the container annotations when recycling the memory?

I think it's pretty clear that what you are doing is UB, so IMO we want to catch this generally. I don't think there is an easy way to unpoison the memory other than running the destructors. We could add an escape hatch through the allocator traits, but that would of course mean that you don't get the additional coverage for that allocator.

vitalybuka added a comment.EditedFeb 27 2023, 12:36 PM

I'm afraid we did hit an issue with this in Chromium, but I think it points to a more fundamental problem with these annotations rather than this patch.

The issue is that we have some code using an arena allocator to allocate a bunch of objects, including vectors and their backing memory. Then instead of tracking the lifetimes of those objects, the arena is cleared, and the memory recycled for next time. That means the vector destructors don't run, so some memory is left in the arena with container annotations, causing errors when it's reused.

I've created a reproducer at https://bugs.chromium.org/p/chromium/issues/detail?id=1419798#c5

This could have happened before this patch too, if the user overloaded the global operator new and delete for example.

The question is what to do about it. I don't think there's an easy way for the allocator to remove the container annotations when recycling the memory?

I think it's pretty clear that what you are doing is UB, so IMO we want to catch this generally. I don't think there is an easy way to unpoison the memory other than running the destructors. We could add an escape hatch through the allocator traits, but that would of course mean that you don't get the additional coverage for that allocator.

@hans If Chromium knowingly drops destructors, can you just upoison the entire range there, by arena allocator?

hans added a comment.Feb 27 2023, 1:39 PM

I think it's pretty clear that what you are doing is UB, so IMO we want to catch this generally. I don't think there is an easy way to unpoison the memory other than running the destructors. We could add an escape hatch through the allocator traits, but that would of course mean that you don't get the additional coverage for that allocator.

I'm not sure it is UB actually. (At least not the part that recycles the vector storage memory.) In any case it's not an unreasonable pattern; for example Clang does the same for its AST with ASTContext::Allocator -- the AST nodes' destructors never run, and one could imagine a scenario where the memory is re-used.

@hans If Chromium knowingly drops destructors, can you just upoison the entire range there, by arena allocator?

Yes, I think that would work for us. (You mean just with __asan_unpoison_memory_region right?)
I'll give it a try tomorrow.

I think it's pretty clear that what you are doing is UB, so IMO we want to catch this generally. I don't think there is an easy way to unpoison the memory other than running the destructors. We could add an escape hatch through the allocator traits, but that would of course mean that you don't get the additional coverage for that allocator.

I'm not sure it is UB actually. (At least not the part that recycles the vector storage memory.) In any case it's not an unreasonable pattern; for example Clang does the same for its AST with ASTContext::Allocator -- the AST nodes' destructors never run, and one could imagine a scenario where the memory is re-used.

Since you never run the destructor the object is still within it's lifetime, i.e. you create multiple objects within the same memory region, which is UB. I agree that it's not that crazy of a thing to do, just like comparing unrelated pointers is. Still, both are UB and are reasonable to be detected by a tool that is there to find UB. Maybe there should be a standards-blessed mechanism to destroy all objects within a region of memory, but we don't have that.

@hans If Chromium knowingly drops destructors, can you just upoison the entire range there, by arena allocator?

Yes, I think that would work for us. (You mean just with __asan_unpoison_memory_region right?)
I'll give it a try tomorrow.

I wasn't aware that there exists a magic way to remove any annotations within a region, interesting.

vitalybuka added a comment.EditedFeb 28 2023, 2:37 PM

I think it's pretty clear that what you are doing is UB, so IMO we want to catch this generally. I don't think there is an easy way to unpoison the memory other than running the destructors. We could add an escape hatch through the allocator traits, but that would of course mean that you don't get the additional coverage for that allocator.

I'm not sure it is UB actually. (At least not the part that recycles the vector storage memory.) In any case it's not an unreasonable pattern; for example Clang does the same for its AST with ASTContext::Allocator -- the AST nodes' destructors never run, and one could imagine a scenario where the memory is re-used.

@hans If Chromium knowingly drops destructors, can you just upoison the entire range there, by arena allocator?

Yes, I think that would work for us. (You mean just with __asan_unpoison_memory_region right?)
I'll give it a try tomorrow.

I think it's pretty clear that what you are doing is UB, so IMO we want to catch this generally. I don't think there is an easy way to unpoison the memory other than running the destructors. We could add an escape hatch through the allocator traits, but that would of course mean that you don't get the additional coverage for that allocator.

I'm not sure it is UB actually. (At least not the part that recycles the vector storage memory.) In any case it's not an unreasonable pattern; for example Clang does the same for its AST with ASTContext::Allocator -- the AST nodes' destructors never run, and one could imagine a scenario where the memory is re-used.

Since you never run the destructor the object is still within it's lifetime, i.e. you create multiple objects within the same memory region, which is UB. I agree that it's not that crazy of a thing to do, just like comparing unrelated pointers is. Still, both are UB and are reasonable to be detected by a tool that is there to find UB. Maybe there should be a standards-blessed mechanism to destroy all objects within a region of memory, but we don't have that.

@hans If Chromium knowingly drops destructors, can you just upoison the entire range there, by arena allocator?

Yes, I think that would work for us. (You mean just with __asan_unpoison_memory_region right?)
I'll give it a try tomorrow.

I wasn't aware that there exists a magic way to remove any annotations within a region, interesting.

I think it's pretty clear that what you are doing is UB, so IMO we want to catch this generally. I don't think there is an easy way to unpoison the memory other than running the destructors. We could add an escape hatch through the allocator traits, but that would of course mean that you don't get the additional coverage for that allocator.

I'm not sure it is UB actually. (At least not the part that recycles the vector storage memory.) In any case it's not an unreasonable pattern; for example Clang does the same for its AST with ASTContext::Allocator -- the AST nodes' destructors never run, and one could imagine a scenario where the memory is re-used.

Since you never run the destructor the object is still within it's lifetime, i.e. you create multiple objects within the same memory region, which is UB. I agree that it's not that crazy of a thing to do, just like comparing unrelated pointers is. Still, both are UB and are reasonable to be detected by a tool that is there to find UB. Maybe there should be a standards-blessed mechanism to destroy all objects within a region of memory, but we don't have that.

@hans If Chromium knowingly drops destructors, can you just upoison the entire range there, by arena allocator?

Yes, I think that would work for us. (You mean just with __asan_unpoison_memory_region right?)
I'll give it a try tomorrow.

I wasn't aware that there exists a magic way to remove any annotations within a region, interesting.

You should be able to achieve the same with sanitizer_annotate_contiguous_container() using special set of arguments.
something like
sanitizer_annotate_contiguous_container(alloc_begin, alloc_end, alloc_begin, alloc_end)
libc++ should use the __sanitizer_ api so we can support other sanitizers, e.g. HWASAN, or even MTE.

@hans For arena __sanitizer_annotate_contiguous_container looks like weird but probably better option for the same reason. But having ifdef per sanitizers seems fine as well.

hans added a comment.Mar 6 2023, 6:09 AM

We've hit another issue after this in Chromium: http://crbug.com/1420967. Also, Google internal testing hit a similar issue in MySQL. That means at least three false positives after this patch. At the same time this patch has revealed zero real issues in our code.

I understand the desire to make the behavior with custom allocators match that of standard allocators, but perhaps these stats indicate that the new behavior is doing more harm than good? As opposed to standard allocators, it seems that with custom allocators, it's common to have accesses to the memory that fall outside the container range.

I think we should consider backing this out, and leaving the container annotations to the standard allocators only.

We've hit another issue after this in Chromium: http://crbug.com/1420967. Also, Google internal testing hit a similar issue in MySQL. That means at least three false positives after this patch. At the same time this patch has revealed zero real issues in our code.

I understand the desire to make the behavior with custom allocators match that of standard allocators, but perhaps these stats indicate that the new behavior is doing more harm than good? As opposed to standard allocators, it seems that with custom allocators, it's common to have accesses to the memory that fall outside the container range.

I think we should consider backing this out, and leaving the container annotations to the standard allocators only.

3 issues doesn't seem like a lot of data to build on. I don't think that these should be described as "not real issues", since they are clearly UB, which is something ASan is supposed to flag. I guess these are three different custom allocators that try to access allocated memory for some reason? I'd be fine with an opt-out mechanism for these allocators, but I think we should enable poisoning by default. e.g. for the wink-out mechanism it makes sense to poison the memory and when winking-out the objects un-poison everything. This allows ASan to still detect any overflow issues and having (most of) the performance benefit. There are also super simple allocators out there for logging etc. which would benefit from this kind of poisoning.

hans added a comment.Mar 6 2023, 10:30 AM

We've hit another issue after this in Chromium: http://crbug.com/1420967. Also, Google internal testing hit a similar issue in MySQL. That means at least three false positives after this patch. At the same time this patch has revealed zero real issues in our code.

I understand the desire to make the behavior with custom allocators match that of standard allocators, but perhaps these stats indicate that the new behavior is doing more harm than good? As opposed to standard allocators, it seems that with custom allocators, it's common to have accesses to the memory that fall outside the container range.

I think we should consider backing this out, and leaving the container annotations to the standard allocators only.

3 issues doesn't seem like a lot of data to build on.

Combined with the fact that we've run tests for hundreds of millions of lines of code and found zero real issues, I think it's a good indication.

I don't think that these should be described as "not real issues", since they are clearly UB, which is something ASan is supposed to flag.

I don't think the UB situation is that clear. If an allocator has allocated a chunk of ints, it's perfectly well defined behavior for that allocator or other code to access the whole chunk, even if parts of it is being used by a vector.

Another indication that these are not real issues is that the fix for all of them is to remove the container annotations.

One way to think about the situation is that with the standard allocator, these annotations work because the std::vector is in practice in complete control of the memory. With custom allocators that's no longer true, since the standard doesn't say what those allocators might do, and then these checks become much less precise.

I guess these are three different custom allocators that try to access allocated memory for some reason? I'd be fine with an opt-out mechanism for these allocators, but I think we should enable poisoning by default. e.g. for the wink-out mechanism it makes sense to poison the memory and when winking-out the objects un-poison everything. This allows ASan to still detect any overflow issues and having (most of) the performance benefit. There are also super simple allocators out there for logging etc. which would benefit from this kind of poisoning.

I can also see the theoretical benefit of this, and part of me likes it, but with the issues that are showing up, I'm just not sure it works out in practice. Perhaps it should be opt-in instead?

ayzhao added a subscriber: ayzhao.Mar 6 2023, 5:34 PM

FYI this is causing (yet another) bug on Chrome/Chrome OS: https://crbug.com/1422033

The question is what to do about it. I don't think there's an easy way for the allocator to remove the container annotations when recycling the memory?

Depends on your implementation, but you can easily unpoison memory (there are macros for it in the ASan API, but it should be possible to also use interface functions (not recommended)), or turn off instrumentation in a function accessing poisoned area. At the same time, I'm happy to implement an easy way to turn off annotations for specific allocators. Below details.

  1. Unpoisoning:
  1. It's possible to access memory without checks by turning off instrumentation with __attribute__((__no_sanitize__("address"))).

we've run tests for hundreds of millions of lines of code and found zero real issues

with the issues that are showing up, I'm just not sure it works out in practice. Perhaps it should be opt-in instead?

I strongly believe, it shouldn't be opt-in. We would have to educate everyone about that possibility and they would have to maintain list of allocators. With opt-out, one may just react to visible errors. Also, if in "hundreds of millions of lines" only 3 allocators are problematic, turning off annotations for them (or unpoisoning memory in them) seems like a small effort compared to a potential benefit of finding a security bug early. I understand that you didn't find anything yet, but it's same logic as for default allocator, and implementations with custom allocators may have bugs as well.


If @philnik thinks that opt-out is ok, I'm happy to add something like:

template<class A>
struct libcxx_asan_allocator_annotate {
    const bool value = true;
}

and in vectors member function __annotate_contiguous_container add to the check && libcxx_asan_allocator_annotate<_Allocator>::value.

  • @philnik what file would be best for that class definition? Probably std::basic_string and std::deque should have access to it as well.
  • @hans do you have a minimal failing example? Would be good to add unit test and make sure that it solves problems you observe.

At the same time, if simply unpoisoning memory works for you, it is probably a better choice as that change increases chance of detecting bugs.


@ayzhao I'm not sure what exactly is happening in your case, but there is a write to memory.

Strangely enough, calling __asan_unpoison_memory_region(...) in deallocate(...) doesn't resolve this...

Is it the first thing you do in deallocator? Are you unpoisoning whole memory for sure?

I'm guessing the reason is because std::vector calls __annotate_delete() before deallocate()

It should leave memory poisoned in the very same way as before a constructor. Why do you think it's the reason?

Regarding the Standard terms, accessing destructed, but not released memory is well defined in basic.life#6 (in some limited ways, but exactly sufficient for the allocator needs):

after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. ... such a pointer refers to allocated storage ([basic.stc.dynamic.allocation]), and using the pointer as if the pointer were of type void* is well-defined. Indirection through such a pointer is permitted...

Since you never run the destructor the object is still within it's lifetime, i.e. you create multiple objects within the same memory region, which is UB.

Even that may (suprisingly) be well-defined (as the paragraph above mentions):

A program may end the lifetime of an object of class type without invoking the destructor, by reusing or releasing the storage as described above.
hans added a comment.Mar 8 2023, 7:05 AM

FYI this is causing (yet another) bug on Chrome/Chrome OS: https://crbug.com/1422033

We've root caused this to ChromeOS using a toolchain based on llvmorg-16-init-6711-g11897708c022, which is before the ASan runtime gained support for unaligned container memory in llvmorg-16-init-9006-gdd1b7b797a11, which this patch depends on. (https://crbug.com/1422033#c11 has a reproducer).

I see that the patch is already trying to handle this with the #if _LIBCPP_CLANG_VER >= 1600 check. However, that doesn't quite work since there were many revisions with that version number before the ASan change.

Should it be checking _LIBCPP_CLANG_VER >= 1700 instead, or is there some more precise way to handle this? (Does the ASan runtime have some way to check if it supports this for example?)

FYI this is causing (yet another) bug on Chrome/Chrome OS: https://crbug.com/1422033

We've root caused this to ChromeOS using a toolchain based on llvmorg-16-init-6711-g11897708c022, which is before the ASan runtime gained support for unaligned container memory in llvmorg-16-init-9006-gdd1b7b797a11, which this patch depends on. (https://crbug.com/1422033#c11 has a reproducer).

I see that the patch is already trying to handle this with the #if _LIBCPP_CLANG_VER >= 1600 check. However, that doesn't quite work since there were many revisions with that version number before the ASan change.

Should it be checking _LIBCPP_CLANG_VER >= 1700 instead, or is there some more precise way to handle this? (Does the ASan runtime have some way to check if it supports this for example?)

Afaik, this is against libc++ policy. libc++ needs to work with at least 1 or 2 last clang releases so this patch needs to accomodate it.
@ldionne to confirm.

FYI this is causing (yet another) bug on Chrome/Chrome OS: https://crbug.com/1422033

We've root caused this to ChromeOS using a toolchain based on llvmorg-16-init-6711-g11897708c022, which is before the ASan runtime gained support for unaligned container memory in llvmorg-16-init-9006-gdd1b7b797a11, which this patch depends on. (https://crbug.com/1422033#c11 has a reproducer).

I see that the patch is already trying to handle this with the #if _LIBCPP_CLANG_VER >= 1600 check. However, that doesn't quite work since there were many revisions with that version number before the ASan change.

Should it be checking _LIBCPP_CLANG_VER >= 1700 instead, or is there some more precise way to handle this? (Does the ASan runtime have some way to check if it supports this for example?)

Afaik, this is against libc++ policy. libc++ needs to work with at least 1 or 2 last clang releases so this patch needs to accomodate it.
@ldionne to confirm.

The last two official releases. It doesn't seem like ChromeOS uses that, given that LLVM16 isn't released yet. Anyways, I think we have to revert, since we clearly need some way to fix the annotation problems, and that will probably take some time to figure out.

hans added a comment.Mar 8 2023, 7:55 AM

The last two official releases. It doesn't seem like ChromeOS uses that, given that LLVM16 isn't released yet. Anyways, I think we have to revert, since we clearly need some way to fix the annotation problems, and that will probably take some time to figure out.

FWIW I think just checking _LIBCPP_CLANG_VER >= 1700 would be a good fix.

The last two official releases. It doesn't seem like ChromeOS uses that, given that LLVM16 isn't released yet. Anyways, I think we have to revert, since we clearly need some way to fix the annotation problems, and that will probably take some time to figure out.

FWIW I think just checking _LIBCPP_CLANG_VER >= 1700 would be a good fix.

I don't think that makes sense. We don't check that we have the version after the first one in which it's implemented anywhere else. Supporting this kind of stuff would be a nightmare, and 99% of people who don't use an official release are either on the bleeding edge, or are a vendor. If you are a vendor you probably have some downstream patches anyways, so having this too seems not that much of a problem.

hans added a comment.Mar 8 2023, 9:05 AM

The last two official releases. It doesn't seem like ChromeOS uses that, given that LLVM16 isn't released yet. Anyways, I think we have to revert, since we clearly need some way to fix the annotation problems, and that will probably take some time to figure out.

FWIW I think just checking _LIBCPP_CLANG_VER >= 1700 would be a good fix.

I don't think that makes sense. We don't check that we have the version after the first one in which it's implemented anywhere else. Supporting this kind of stuff would be a nightmare, and 99% of people who don't use an official release are either on the bleeding edge, or are a vendor. If you are a vendor you probably have some downstream patches anyways, so having this too seems not that much of a problem.

As we've seen, _LIBCPP_CLANG_VER >= 1600 is not sufficient to check that the ASan support is there, but _LIBCPP_CLANG_VER >= 1700 is sufficient. And since this patch (annotations for vector with custom allocator) is happening in the 17 release, there's no delay before they can be used. That's why I thought it would be a good fix.

An even nicer fix would of course be if ASan had some kind of feature checking mechanism or finer grained version we could check.

AdvenamTacet reopened this revision.Mar 8 2023, 10:23 AM

I'm reopening because of the revert.

  • Safe allocator and tests are moved to D145597, I will remove them from here.
  • A way to turn off annotations for a specific allocator will be added in a separate patch.
This revision is now accepted and ready to land.Mar 8 2023, 10:23 AM

This update removes tests already added in D145597.

This update includes change from D145628 and adds an easy way to turn off annotations for a specific allocator.
It also adds a test for that change, based on failing example showed above.

I still consider unpoisoning memory as a superior choice, but if it's impossible in some situations (like using allocator mantained by someone else), now it should be easy to turn off annotations for a specific allocator.

Test update.

This update tries to solve the problem with MacOS error (cannot test it locally).

It adds #include <new> to solve:

error: no matching 'operator new' function for non-allocating placement new expression

hans added a comment.Mar 9 2023, 2:47 AM

This update includes change from D145628 and adds an easy way to turn off annotations for a specific allocator.
It also adds a test for that change, based on failing example showed above.

I still consider unpoisoning memory as a superior choice, but if it's impossible in some situations (like using allocator mantained by someone else), now it should be easy to turn off annotations for a specific allocator.

Thanks! I added a comment on that. Besides allocators maintained by someone else, I think https://crbug.com/1420967 is also a good example of where unpoisoning isn't an option.

The current patch still has the _LIBCPP_CLANG_VER issue though.

I think https://crbug.com/1420967 is also a good example of where unpoisoning isn't an option.

If access to poisoned memory happens only in a few functions, it's possible to just turn off instrumentation in them with __attribute__((__no_sanitize__("address"))).
But if the buffer is modified, it may be better to turn off annotations for that allocator. Therefore I made D145628.

The current patch still has the _LIBCPP_CLANG_VER issue though.

Here I agree with @philnik that waiting until LLVM17 (given that LLVM16 isn't released yet) makes no sense and testing changes would be very difficult. Also, rGdd1b7b797a116eed588fd752fbe61d34deeb24e4 was accepted more than 4 moths ago, so it's a big time gap between support in compiler-rt and libc++.
I also don't see any other reasonable way to check support in compiler-rt than checking version number.

hans added a comment.Mar 16 2023, 4:03 AM

I think https://crbug.com/1420967 is also a good example of where unpoisoning isn't an option.

If access to poisoned memory happens only in a few functions, it's possible to just turn off instrumentation in them with __attribute__((__no_sanitize__("address"))).
But if the buffer is modified, it may be better to turn off annotations for that allocator. Therefore I made D145628.

Thanks, I think turning off annotations for this allocator is the best option. Hunting down all functions that access the memory would be hard, and also disabling instrumentation in them seems like a too big hammer.

The current patch still has the _LIBCPP_CLANG_VER issue though.

Here I agree with @philnik that waiting until LLVM17 (given that LLVM16 isn't released yet) makes no sense and testing changes would be very difficult. Also, rGdd1b7b797a116eed588fd752fbe61d34deeb24e4 was accepted more than 4 moths ago, so it's a big time gap between support in compiler-rt and libc++.
I also don't see any other reasonable way to check support in compiler-rt than checking version number.

There would be no need to wait since 17 is already the current version at HEAD.

Considering that this won't be cherry-picked back to release/16.x (right?), then whether we use _LIBCPP_CLANG_VER >= 1600 or _LIBCPP_CLANG_VER >= 1700 is irrelevant for all practical purposes, since libc++ gets released in sync with Clang and the next release this will get in will have Clang 17. I do agree however that if someone (in this case Chrome) is using a not-yet-released snapshot of trunk that happens to have _LIBCPP_CLANG_VER >= 1600 yet isn't something we've released as LLVM 16, then it's probably not reasonable for us to try to support that. Imagine coming up with that policy, I think it would mean that we can't ever rely on _LIBCPP_CLANG_VER >= 1600 during the libc++ 16 release, which would be pretty unfortunate.

So I have no strong opinion, either way is fine with me (even though the status quo is kind of unfortunate for Chrome until they bump their Clang version).

libcxx/test/libcxx/containers/sequences/vector/asan.pass.cpp
44–76

Since we're mostly interested in having good coverage for the Clang >= 16 behavior anyway, I would instead suggest keeping this test but removing the #if TEST_STD_VER >= 11 && TEST_CLANG_VER < 1600 test above. I don't really see much value in keeping it around, unless I missed something.

libcxx/test/libcxx/containers/sequences/vector/no_asan.pass.cpp
13 ↗(On Diff #503649)

Can you please add an english description of what this test does? It's also nice to have the link, however the test should ideally be understandable just by reading it (if e.g. the link goes down or something).

57 ↗(On Diff #503649)

Nitpick, but this is needed for freestanding.

hans added a comment.Mar 20 2023, 5:31 AM

I do agree however that if someone (in this case Chrome) is using a not-yet-released snapshot of trunk that happens to have _LIBCPP_CLANG_VER >= 1600 yet isn't something we've released as LLVM 16, then it's probably not reasonable for us to try to support that. Imagine coming up with that policy, I think it would mean that we can't ever rely on _LIBCPP_CLANG_VER >= 1600 during the libc++ 16 release, which would be pretty unfortunate.

(FWIW it's ChromeOS, not Chrome, that's using an older Clang version. And if necessary they can probably patch in the needed compiler-rt changes. Also, https://libcxx.llvm.org/ technically says that clang "16-git" is supported, which is what they are using ;)

I'll stop harping on about this, but I think gating functionality on _LIBCPP_CLANG_VER isn't great for the reason we've seen here. (And Eric predicted it in his "It smells weird, and it looks weird, and I think it'll lead to bugs." comment). I'm pleased to see that it's currently only used in two places in libc++, one of which is for a deprecation warning. As mentioned above, the best would be if compiler-rt exposed some kind of feature check for this, but if we can't have that, I think >= 17 is the way to go.

I do agree however that if someone (in this case Chrome) is using a not-yet-released snapshot of trunk that happens to have _LIBCPP_CLANG_VER >= 1600 yet isn't something we've released as LLVM 16, then it's probably not reasonable for us to try to support that. Imagine coming up with that policy, I think it would mean that we can't ever rely on _LIBCPP_CLANG_VER >= 1600 during the libc++ 16 release, which would be pretty unfortunate.

(FWIW it's ChromeOS, not Chrome, that's using an older Clang version. And if necessary they can probably patch in the needed compiler-rt changes. Also, https://libcxx.llvm.org/ technically says that clang "16-git" is supported, which is what they are using ;)

It's "16-git" as-in current trunk, not some random version from a few months ago.

I'll stop harping on about this, but I think gating functionality on _LIBCPP_CLANG_VER isn't great for the reason we've seen here. (And Eric predicted it in his "It smells weird, and it looks weird, and I think it'll lead to bugs." comment). I'm pleased to see that it's currently only used in two places in libc++, one of which is for a deprecation warning. As mentioned above, the best would be if compiler-rt exposed some kind of feature check for this, but if we can't have that, I think >= 17 is the way to go.

I don't think >= 17 is an option. You are basically asking us to keep the code around for another release because someone used some wacky unsupported configuration. Either live at head or use releases. They exist for a reason.

hans added a comment.Mar 20 2023, 6:47 AM

I don't think >= 17 is an option. You are basically asking us to keep the code around for another release because someone used some wacky unsupported configuration. Either live at head or use releases. They exist for a reason.

I'm just trying to be pragmatic. Given the nasty failure mode - ASan silently malfunctioning - using a check which actually guarantees that the required functionality is there seems reasonable to me.

AdvenamTacet marked 3 inline comments as done.

This update:

  • adds mains arguments,
  • adds tests description,
  • removes suggested test case.

I do agree however that if someone (in this case Chrome) is using a not-yet-released snapshot of trunk that happens to have _LIBCPP_CLANG_VER >= 1600 yet isn't something we've released as LLVM 16, then it's probably not reasonable for us to try to support that. Imagine coming up with that policy, I think it would mean that we can't ever rely on _LIBCPP_CLANG_VER >= 1600 during the libc++ 16 release, which would be pretty unfortunate.

(FWIW it's ChromeOS, not Chrome, that's using an older Clang version. And if necessary they can probably patch in the needed compiler-rt changes. Also, https://libcxx.llvm.org/ technically says that clang "16-git" is supported, which is what they are using ;)

It's "16-git" as-in current trunk, not some random version from a few months ago.

I'll stop harping on about this, but I think gating functionality on _LIBCPP_CLANG_VER isn't great for the reason we've seen here. (And Eric predicted it in his "It smells weird, and it looks weird, and I think it'll lead to bugs." comment). I'm pleased to see that it's currently only used in two places in libc++, one of which is for a deprecation warning. As mentioned above, the best would be if compiler-rt exposed some kind of feature check for this, but if we can't have that, I think >= 17 is the way to go.

I don't think >= 17 is an option. You are basically asking us to keep the code around for another release because someone used some wacky unsupported configuration. Either live at head or use releases. They exist for a reason.

809855b56f06dd7182685f88fbbc64111df9339a changed clang version in trunk to 16.
https://reviews.llvm.org/rGdd1b7b797a11 added the API that this patch is using.

This change means that any clang released in the range 809855b56f06dd7182685f88fbbc64111df9339a..dd1b7b797a11 is now BAD for building libc++.

$ git log 809855b56f06dd7182685f88fbbc64111df9339a..dd1b7b797a11 --oneline | wc -l
9006

IMO, this is an large number of commits and punishes people that made clang releases in this range.

I'd however like to steer out of the debate by cherry-picking https://reviews.llvm.org/rGdd1b7b797a11 to ChromeOS clang.

I was thinking about a solution for the silent error.

API for contiguous containers was extended on Oct 28 2022, - rGdd1b7b797a116eed588fd752fbe61d34deeb24e4.

API for double ended containers was commited on Nov 22 2022 - rG1c5ad6d2c01294a0decde43a88e9c27d7437d157.
And extended on Nov 29 - rGe1657e322902d973aea606d3557f938ce0e0f06c

If we first land D132092 (annotations for std::deque with default allocator only), there will be no silent error, as a missing function in compiler-rt causes an error.

That solutions holds water only if D132092 will be accepted, but I hope it will be. The pach is ready for review.

hans added a comment.Mar 23 2023, 6:53 AM

If we first land D132092 (annotations for std::deque with default allocator only), there will be no silent error, as a missing function in compiler-rt causes an error.

Yes, that is indeed much nicer.

This update removed #if checking compiler-rt to confirm support in API, the check was moved into __asan_annotate_container_with_allocator (check D145628) as suggested by @EricWF.

Additionally, test name is changed for one which (hopefuly) has a more precise description of the purpose.

This update:

  • adds use of Cpp17UnaryTypeTrait in a test case, same as in updated D145628.
ldionne accepted this revision.Mar 30 2023, 3:57 AM

In D145628, an FTM was added. This update introduces that change to this revision.

This update the base commit.

Rebase to HEAD of master.

This revision was landed with ongoing or failed builds.May 4 2023, 5:45 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.May 11 2023, 1:26 AM

Somewhat early heads-up: This is again causing various test failures in Chromium land. Several of us are traveling this week, so it'll likely be a while until we can provide more details. (No action needed at this point, just as an FYI.) Our upstream bug is https://bugs.chromium.org/p/chromium/issues/detail?id=1444659 .