Page MenuHomePhabricator

[2b/3][ASan][libcxx] std::basic_string annotations
Needs ReviewPublic

Authored by AdvenamTacet on Aug 26 2022, 4:04 PM.


Group Reviewers
Restricted Project

This commit is a part of patches extending AddressSanitizer C++
container overflow detection capabilities by adding annotations,
similar to those existing in std::vector, to std::string and
std::deque collections. These changes allow ASan to detect cases
when the instrumented program accesses memory which is internally
allocated by the collection but is still not in-use (accesses
before or after the stored elements for std::deque, or between
the size and capacity bounds for std::string).

Trail of Bits developed this as part of a research project where
we tried to find bugs using the oss-fuzz (using llvm-14 with our
modifications) harnesses. Now, we want to upstream the llvm-16
port of this work.

That commit adds ASan annotations to std::basic_string. Functions
from [1b/3] are necessary to verify if memory may be annotated.

Std::basic_string may be short or long. Short keeps content
inside objects memory, long keeps content in an external buffer.
Those changes correctly annotate both cases.

Long string case is very similar to std::vector, as content is kept
in an external memory block allocated by the object. Memory will be
annotated if and only if a standard allocator is used.

Short string is more tricky and will be only annotated if
__sanitizer_is_annotable returns true. Other implementation quirks
are not visible outside the implementation, but
alternate string layout may help with detecting overflows.
There are no false positives.

Alternate string layout is a different std::basic_string layout
(it changes the order of class variables).
It can be enabled by setting the XXX macro during libcxx/libcxxabi
compilation and cannot be changed later (it has to be set for all
libraries/objects compilation, so it is usually pre-chosen by given
project devs/linux distro etc.).

The commit also provides unit tests for std::basic_string ASan
annotatiotations with a new function is_string_asan_correct.

Structure of our patches:
[1a/3][ASan][compiler-rt] API for double ended containers
[1b/3][ASan][compiler-rt] API for annotating objects memory
[2a/3][ASan][libcxx] std::deque annotations
[2b/3][ASan][libcxx] std::basic_string annotations
[3/3][ASan] integration test for std::deque and std::basic_string annotations

1a -> 2a -

 --> 3

1b -> 2b -

If you have any questions, please email:


Diff Detail

Event Timeline

AdvenamTacet created this revision.Aug 26 2022, 4:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2022, 4:04 PM
AdvenamTacet requested review of this revision.Aug 26 2022, 4:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 26 2022, 4:04 PM
Herald added a reviewer: Restricted Project. · View Herald Transcript
This comment was removed by AdvenamTacet.

Code formatting and naming imporvements.

TODO comment removed.

philnik requested changes to this revision.Sep 9 2022, 9:31 AM
philnik added a subscriber: philnik.

Thanks for working on this! I haven't looked at any of the tests yet.


I don't think this should be marked _LIBCPP_INTERNAL_MEMORY_ACCESS. Or is the size/is_long byte also poisoned for some reason?


Two of these are redundant. I've uploaded D132951 to remove the differences. Also, would it be possible to annotate the special member functions of __rep instead? I think adding _LIBCPP_HIDE_FROM_ABI _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS __rep(const __rep&) = default; should do the trick.


This is now unnecessary.


Couldn't we check whether the allocation satisfies the asan requirements? It looks like the overhead of checking pointer alignment seems quite manageable:


Could you mention common_interface_defs.h here instead? I couldn't find any documentation generated by the comments in that header.


You can't check for non-_Uglified defines inside the headers. You should instead use _LIBCPP_HAS_NO_ASAN. You also want this to work outside our test environment, right?


The two characters really don't hurt anybody and makes it a lot nicer to read.


Could you make this a single if?


Could we use a common interface for this and the vector-equivalent? It would be nice if we could add a __debug_utils/address_sanitizer.h where we define __annotate_contiguous_container() and some CRTP empty base that defines the functions used here. AFAICT it should be possible to define all of these functions for the long string with the public API of both vector and string. Then we could simply add a function here with the same name to special-case the short string. I hope this would simplify the implementation here quite a bit. I'm not sure though, so tell me if there is some problem.


Why do you do this special-casing here?


When is __init called after poisoning the memory? __init should only ever be called by a constructor.


You're missing a few newlines.

This revision now requires changes to proceed.Sep 9 2022, 9:31 AM
This comment was removed by AdvenamTacet.
Herald added a project: Restricted Project. · View Herald TranscriptSep 19 2022, 6:37 PM
Herald added subscribers: Restricted Project, Enna1. · View Herald Transcript
This comment was removed by AdvenamTacet.

Previous push was designed for a different revision. Sorry for it.

API change from D132522 - no new function, annotating function changed.

Harbormaster completed remote builds in B188096: Diff 462067.
AdvenamTacet marked 2 inline comments as done.Sep 22 2022, 5:26 AM
AdvenamTacet added inline comments.

I think adding _LIBCPP_HIDE_FROM_ABI _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS rep(const rep&) = default; should do the trick.

No, because we can also write to poisoned memory.


We can and now (due to changes in D132522) we do. I hope the check should be safe to remove. I can do it, but then it will be different in std::vector. Do we want to do it now and not (for example) wait for another revision to remove it here and in std::vector?


I agree, but everywhere else those arguments are called beg, I'm happy to use begin, if inconsistency is not a problem. Is it?


This code has different behavior and we cannot really remove if from here.


Possible that it's possible for long string, not sure tho. But can we leave it for future improvements outside of that patch?
I will think about potential problems. Future patch may be for both std::basic_string and std::vector and maybe std::deque, if we make it more general.

philnik added inline comments.Sep 22 2022, 6:04 AM

Yes, and why would that break if we annotate special member functions instead?


I think we should fix it here now and update vector in a later patch.


I guess you mean in vector? IMO we should just use proper names here and update vector in another patch. That should remove the inconsistency. We also never use beg anywhere else in the code base AFAIK, so vector is actually inconsistent IMO.


Oh yeah you're right. But you changed the code now, so AFAICT it should now be return !__libcpp_is_constant_evaluated(). Also, could you move the #if into the function to avoid declaring it twice?