This is an archive of the discontinued LLVM Phabricator instance.

[2b/3][ASan][libcxx] std::basic_string annotations
AbandonedPublic

Authored by AdvenamTacet on Aug 26 2022, 4:04 PM.

Details

Reviewers
philnik
ldionne
Group Reviewers
Restricted Project
Summary

This revision is a part of a series of patches extending AddressSanitizer C++ container overflow detection capabilities by adding annotations, similar to those existing in std::vector, to std::string and std::deque collections. These changes allow ASan to detect cases when the instrumented program accesses memory which is internally allocated by the collection but is still not in-use (accesses before or after the stored elements for std::deque, or between the size and capacity bounds for std::string).

The motivation for the research and those changes was a bug, found by Trail of Bits, in a real code where an out-of-bounds read could happen as two strings were compared via a std::equals function that took iter1_begin, iter1_end, iter2_begin iterators (with a custom comparison function). When object iter1 was longer than iter2, read out-of-bounds on iter2 could happen. Container sanitization would detect it.

This revision adds annotations for std::basic_string. Long string is very similar to std::vector, and therefore works well with __sanitizer_annotate_contiguous_container from LLVM 15 and therefore annotations works if that implementation is compiled with previous LLVM.
However, only the standard allocator is supported.

As D132522 extended possibilities of __sanitizer_annotate_contiguous_container, now annotations may be supported with all allocators, but that support will be added in next patch. Only strings with default allocator are annotated with that patch.

There is also support for annotating objects using short string optimization.
This is limited functionality. If ASan is extended and (for example) objects on stack should not be annotated (note that not every short string is on stack), __annotate_short_string_check should be modified to return false in those situations.

If you have any questions, please email:

  • advenam.tacet@trailofbits.com
  • disconnect3d@trailofbits.com

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
AdvenamTacet marked 3 inline comments as not done.

This update should fix problems with short string annotations.

  • It adds _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS in relevant places.
  • It adds a check for an allocator* in __annotate_short_string_check.
  • It turns off string external templates, when compiled with ASan.

*Objects annotation cannot be supported with non-standard allocators, because some access memory inside the object.

libcxx/include/string
1831–1842

The solution seems to not work. Maybe it's inlined in an instrumented function and then it is instrumented as well (I didn't check the reason. I'm happy to test another solution, if you have an idea, how to make it work).

This update applies the patch to a newer commit.

fix: removed unnecesary test, which was added by mistake.

This update:

  • adds include for "wchar" support in tests,
  • removes a few automatic style changes outside of a new code
  • tries to fix issues with Windows/Mac

I'm using previous diff for fuzzing to confirm that it's working at the moment, therefore I believe the implementation is correct.
But there is issue with Windows and Apple tests, described below. I believe, the main issue is: constexpr evaluation hit maximum step limit; possible infinite loop? in a tests, which I didn't modify.

Failed Tests (1):
  apple-libc++-backdeployment.cfg.in :: std/strings/basic.string/string.modifiers/string_insert/size_pointer_size.pass.cpp

This update:

  • tries to solves an issue with wchars in a correct way (#ifndef TEST_HAS_NO_WIDE_CHARACTERS). Hopefully, this time everything will work.

Main problem:
I do not know how to fix the problem with Windows and Apple tests failing. I do not modify the failing test.
Below I post details, I will apprecieate every suggestion.

C:\ws\w32-1\llvm-project\libcxx-ci\libcxx\test\std\strings\basic.string\string.modifiers\string_insert\size_pointer_size.pass.cpp:403:17: error: static_assert expression is not an integral constant expression
  static_assert(test<std::basic_string<char, std::char_traits<char>, min_allocator<char>>>());
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
C:/ws/w32-1/llvm-project/libcxx-ci/build/clang-cl-static/include/c++/v1\string:1714:15: note: constexpr evaluation hit maximum step limit; possible infinite loop?
              std::construct_at(std::addressof(__begin[__i]));
              ^
C:/ws/w32-1/llvm-project/libcxx-ci/build/clang-cl-static/include/c++/v1\string:2542:5: note: in call to '__begin_lifetime({&{*new char[192]#1220}[0]}, 192)'
    __begin_lifetime(__p, __allocation.count);
    ^
C:/ws/w32-1/llvm-project/libcxx-ci/build/clang-cl-static/include/c++/v1\string:3067:7: note: in call to '&s_short->__grow_by_and_replace(95, 0, 16, 0, 0, 16, &{*new char[96]#1219}[0])'
      __grow_by_and_replace(__cap, 0, __sz, __pos, 0, __n, __s);
      ^
C:\ws\w32-1\llvm-project\libcxx-ci\libcxx\test\std\strings\basic.string\string.modifiers\string_insert\size_pointer_size.pass.cpp:384:13: note: in call to '&s_short->insert(0, &{*new char[96]#1219}[0], 16)'
    s_short.insert(0, s_short.data(), s_short.size());
            ^
C:\ws\w32-1\llvm-project\libcxx-ci\libcxx\test\std\strings\basic.string\string.modifiers\string_insert\size_pointer_size.pass.cpp:403:17: note: in call to 'test()'
  static_assert(test<std::basic_string<char, std::char_traits<char>, min_allocator<char>>>());

Therefore there is no attempt to fix it. I will apprecieate every suggestion.

This update:

  • Applies the patch to newest commit from master branch.
  • Fixes an issue with one test.
  • Removes remaining auto-changes to style outside of a new code.

On a general note: I don't have a problem with formatting the code around the changes, but let's do that in an NFC patch if you want to do that. The actual changes are quite hard to find sometimes.

libcxx/include/string
1831–1871

__zero doesn't exist anymore in trunk.

1888–1889

These functions seem to be unchanged. If that's the case, please remove the formatting changes. This patch is already quite large, no need to make it larger.

1924

It's not clear to me what is actually checks for. Is it to check whether to poison the short string? I don't really have a better name right now, maybe someone has a good idea.

1926

Shouldn't it be legal to just poison the first sizeof(__rep) bytes? If the allocator is empty, it isn't allowed to access the memory anyways, and otherwise it will be layed out after the internal string representation.

1951–1954

I don't think we should design the interface for this. It should just be __annotate_delete(); do-whatever-operation; __annotate_new(size). That makes the interface simpler and avoids any problems with the invariants not holding. I don't think any function is designed for this (other than the setters of course). Any performance-gains are probably non-existant to not-worth-the-effort. Taking a quick look through the code I also couldn't find a case where the invariants don't hold.

AdvenamTacet added inline comments.Feb 21 2023, 8:35 PM
libcxx/include/string
1924

Yes, short string is poisoned if and only if this function returns true.

1926

Good point, it actually should work. I will test it and think about some edge cases.

1951–1954

That decision was made because it was necessary, but a lot changed since that code was written. I will test if it's still necessary.

AdvenamTacet marked 4 inline comments as done.

This update is based on the feedback from code review:

  • removes __zero,
  • removes unnecessary functions,
  • partially removes too agressive auto refactor.

There are still requests in the code reveiw not fixed by that update.
I will try to finish it Tomorrow, as soon as possible.

The actual changes are quite hard to find sometimes.

Sorry, I used auto-refactor, which was too agressive, but I realized it much later.
I'm cleaning it!

AdvenamTacet added inline comments.Feb 21 2023, 10:15 PM
libcxx/include/string
980

I will fix order.

AdvenamTacet marked 4 inline comments as done.Feb 22 2023, 8:51 PM
AdvenamTacet added inline comments.
libcxx/include/string
1926

With minor changes, it works. I added it.

1951–1954

It is no longer necessary, I removed it.

This update:

  • adds support for annotating short strings with every allocator,
  • removes special-casing (template) from __annotate_* interface, makes it simpler,
  • cleans too aggressive style changes.

I believe, this patch and the prvious one, answers all points from the code review. If there is anything I should change, please, let me know.

This update applies changes to the newest commit from master and fixes a few syntax issues.

I do not see anything else to change or improve, if you think I should, let me know.
The only thing I still want to do is to test those changes by fuzzing with the newest diff.

philnik added inline comments.Feb 23 2023, 11:43 AM
libcxx/include/string
907–911

The __annotate_delete() shouldn't be necessary here, right? The function is already marked _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS, so it should be enough to call __annotate_delete() right before __default_init() if it's needed at all.

913

This seems to be redundant. __default_init() already calls __annotate_new(0).

1774–1775

What exactly is the problem here? The memory should never actually be accessed, right? __r_.second() should just return a reference to either

  1. memory outside the poisoned area, or
  2. a reference inside the poisoned area that will never actually be used in a meaningful way (because of EBO).
1836
1857–1869

This should be equivalent, or did I miss anything? Also, I think the + 1 for the old version is too much. __get_long_cap() returns the capacity including the null terminator. The __current_size argument should also be unnecessary if we assume the string meets the invariants. Assuming I didn't miss anything, this (in part) also applies to the other helpers.

2015

Nit: I think __old_size would be nicer. I know that's not how it's done elsewhere, but at least I'm not Hessian.

2517

What invariant doesn't hold after calling __grow_by?

2554–2556

I think this comments makes more sense above the function, since it's about it in general.

3159

Same question here: Which invariants are not met after calling __grow_by? (And maybe add a comment to __grow_by to mention that it fails to keep the invariants)

4230

These annotations shouldn't be necessary. string(__uninitialized_size_tag(), ...) already sets the correct annotations.

libcxx/test/std/strings/basic.string/string.asan/append.pass.cpp
1 ↗(On Diff #499775)

These tests are very libc++-specific, so we should just put them in test/libcxx/string/basic.string and then have one test per overload. I know this is quite a bit of refactoring work, but it makes verifying the tests and figuring out what is broken a lot easier when it's necessary. You can just mirror the naming inside test/std and add a .asan in-between, i.e. std/.../string_append/initalizer_list.pass.cpp would become libcxx/.../string_append/initializer_list.asan.pass.cpp.

I didn't take a super close look at the tests yet, but other than splitting them up they look quite good.

18 ↗(On Diff #499775)

We normally use class here.

26–64 ↗(On Diff #499775)

Maybe just use std::char_traits<T>::length(str)? This is currently technically UB, since you are adding overloads for a function from the standard without a user-defined type. Also, function template specializations are quite brittle. Just use overloads instead. You never call the char version, since the C version is a better match. (https://godbolt.org/z/W8qdeGqPd)

68 ↗(On Diff #499775)

Let's use using here instead. (Also works in C++03)

libcxx/test/std/strings/basic.string/string.asan/swap.pass.cpp
74–99 ↗(On Diff #499775)

It should be possible to simplify these instantiations to meta::for_each(meta::character_types(), TestClass()); (probably modulo most vexing parse).

philnik added inline comments.Feb 23 2023, 11:49 AM
libcxx/include/__string/extern_template_lists.h
30

Why is this required?

AdvenamTacet added inline comments.Feb 23 2023, 4:40 PM
libcxx/include/__string/extern_template_lists.h
30

We don't really want to use external templates here. Those we cannot control.

Without that casing, non-instrumented functions may be used. This leads to false positives. With that change, during compilation, a version with instrumentation is created. I never had a problem of missing instrumentation after that change.

Now when I look at it, && (_LIBCPP_CLANG_VER >= 1600) should be removed, same problem may happen with older LLVM.

libcxx/include/string
907–911

Unfortunately not. Sometimes instrumentation is turned on here. I tried that kind of solution first.

During las year I saw a lot of inconsistent behavior of ASan with variable initialization in constructors (... : var_name(value)) and turning off instrumentations (_LIBCPP_STRING_INTERNAL_MEMORY_ACCESS).

In that case, it's necessary with test_allocator (and probably some other of non-zero size, but I didn't confirm). Sometimes that kind of initialization is instrumented, even when function is not.

If we move it to the first line of the functions body (__r_ = std::move(__str.__r_);), then you are right.
And then, __annotate_delete() does not have to be called every time.

But with unpoisoning every time, _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS seems to be not necessary.

1857–1869

It's not same, but with new __sanitizer_annotate_contiguous_container should work.
data() returns address of content, but with old implementation address of buffer to annotate had to be aligned, therefore a byte with metadata before content had to be treated as char-in-use all the time.
Now it should work.

At the same time, thanks to _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS byte after content (Alternate String Layout) may be poisoned all the time (when string is short).
It has a nice feature of always having a poisoned separator between buffer and a next object, even when red zone is not possible (eg. arrays).

That's a nice suggestion. Thx!

2554–2556

Actually, it probably makes more sense to have it next to #define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS ..., as a few functions should have that comment. I will move it.

AdvenamTacet marked 6 inline comments as done.

This update implements part of requested improvements and fixes:

  • adds __annotate_new(0) to __default_init(),
  • fixes use of __annotate_delete() in variable initialization,
  • fixes asan_testing.h,
  • fixes comments.
libcxx/include/string
907–911

But __annotate_delete() should be only called if __str is short.

913

It didn't, but it actually can. I implemented it, makes code cleaner.

1774–1775

Right, fixed.

AdvenamTacet updated this revision to Diff 500202.EditedFeb 24 2023, 7:30 AM
AdvenamTacet marked 4 inline comments as done.

This update:

  • simplifies helpers,
  • class instead of typename,
  • fixes UB (strlen),
  • __old_size instead of __old_sz.
AdvenamTacet marked 3 inline comments as done.

This update:

  • removes one unnecessary call to __annotate_delete(),
  • typedef -> using,
  • moves 3 tests.

This update applies changes to the newest commit from master branch.

AdvenamTacet marked an inline comment as done.Feb 24 2023, 7:59 PM
AdvenamTacet added inline comments.
libcxx/test/std/strings/basic.string/string.asan/append.pass.cpp
1 ↗(On Diff #499775)

I mostly removed my tests and merged them with already existing tests. In just few cases I saw a value in leaving .asan. file.

AdvenamTacet marked 3 inline comments as done.

This update refactors test.
I kept only few of my tests and mostly added ASan checks into existing tests.
I also extended one or two tests.

I elieve, the revision is ready for upstream.

AdvenamTacet added inline comments.Feb 24 2023, 9:06 PM
libcxx/include/string
3159

It's not about strings invariants, but ASan annotations. After __growby, those are different than size (as __growby does not set size, but extends annotations), setting size just after __grow_by is enough to make it work and it's probably a nicer decision.

Double includes removed.

I think, all requests from code review are fixed.

philnik added inline comments.
libcxx/include/__string/extern_template_lists.h
30

IMO the right thing here would be to have an instrumented library, since the same problem exists for any other externally defined function, but I guess I'm OK with this for now. @ldionne are you OK with this?

31–33

This comment is wrong.

libcxx/include/string
907–911

I'm not a huge fan of this construct, but unfortunately I don't have a good suggestion right now. Let's at least add a comment explaining why it's done this way.

1857

Isn't this redundant? __annotate_short_string_check() already checks this.

3098–3099

The indentation is wrong.

3159

I think the annotations in __grow_by are wrong. Since __grow_by doesn't change the size of the string, it should annotate the memory to be __old_sz large, not that all the memory is available. That should fix the inconsistency here. Then you can just call __null_terminate_at before traits_type::assign and everything should be working properly. This would then also catch problems in a user-defined traits_type::assign.

philnik added inline comments.Feb 25 2023, 3:23 AM
libcxx/test/std/strings/basic.string/string.nonmembers/string_op+/string_string.pass.cpp
42

All these checks are libc++-specific, so we should mark them that way.

libcxx/test/support/asan_testing.h
38
51
54
AdvenamTacet marked 7 inline comments as done.

This update:

  • adds requested comments,
  • sets asserts to libc++ specific (LIBCPP_ASSERT),
  • applies small code fixes:
    • _LIBCPP_CLANG_VER -> TEST_CLANG_VER,
    • std::__libcpp_is_constant_evaluated() -> TEST_IS_CONSTANT_EVALUATED,
    • __is_string_short -> is_string_short,
    • removes incorrect comments,
    • fixes indentation.
AdvenamTacet marked an inline comment as done.Feb 25 2023, 4:05 PM
AdvenamTacet added inline comments.
libcxx/include/string
1857

But for long strings it also has to be checked, as every string is long during constant evaluation.

bool __is_long() const _NOEXCEPT {
    if (__libcpp_is_constant_evaluated())
        return true;

It has to be checked here and we cannot wait until the check in __annotate_contiguous_container, because during constant evaluation, next line would rise cannot perform pointer arithmetic on null pointer.

I don't want to remove it from __annotate_short_string_check(), because then returned values won't be correct.
And I don't want to remove it from __annotate_contiguous_container to make sure it can be simply called.
It also has no impact on performance.

Possibly it may be changed to __annotate_short_string_check() || (!__libcpp_is_constant_evaluated() && __is_long()), but I have a strong preference for current version, as that would suggest possibility of poisoning short string during constant evaluation.

AdvenamTacet marked an inline comment as done.Feb 25 2023, 4:27 PM

This update fixes problem with __grow_by:

  • it adds __set_long_size(__old_sz); to __grow_by and thanks to that, size value is always correct,
  • it changes how __grow_by annotates memory,
  • it adds new annotations, required after above change.
AdvenamTacet added inline comments.Feb 25 2023, 6:33 PM
libcxx/include/string
3159

Ugh... my original comment was correct, please ignore my previous response. Sorry for confusion.

Since __grow_by doesn't change the size of the string, it should annotate the memory to be __old_sz large, not that all the memory is available.

I agree, it makes more sense. I changed it. Now it works like that.

Btw. previous implementation didn't just unpoison whole memory. as __grow_by knows how many elements will be added, it unpoisoned exact number of bytes, so the buffer were ready.


The problem is with size value. In original implementation, __grow_by does not call __set_long_size, therefore if it's called in a short string, size value is no longer correct.

  • I added a call to __set_long_size inside __grow_by.
  • Fixed how it annotates buffers (1).
  • I made minimal changes to functions using __grow_by (now diff with master is much smaller).

The real problem was later in __annotate_increace(n) called by __null_terminate_at, which requires a correct size value.


I see that my last patch failed, but it should be easy to fix, I had to make a mistake with updating some value.
Somewhere inside ::assign(size_type __n, value_type __c).

$ "/usr/bin/python3.10" "/home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/libcxx/test/../utils/run.py" "--execdir" "/home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/build/generic-asan/test/std/strings/basic.string/string.modifiers/string_assign/Output/size_char.pass.cpp.dir" "--" "/home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/build/generic-asan/test/std/strings/basic.string/string.modifiers/string_assign/Output/size_char.pass.cpp.dir/t.tmp.exe"
# command stderr:
AddressSanitizer: CHECK failed: asan_poisoning.cpp:454 "((*(u8 *)MemToShadow(a))) == ((0))" (0x2, 0x0) (tid=354701)
    #0 0x5651a5546f51 in __asan::CheckUnwind() asan_rtl.cpp.o
    #1 0x5651a555ea32 in __sanitizer::CheckFailed(char const*, int, char const*, unsigned long long, unsigned long long) (/home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/build/generic-asan/test/std/strings/basic.string/string.modifiers/string_assign/Output/size_char.pass.cpp.dir/t.tmp.exe+0xdaa32) (BuildId: 05ded1d03f29cb11827d1b47b24d2ce97422315a)
    #2 0x5651a55402ca in __sanitizer_annotate_contiguous_container (/home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/build/generic-asan/test/std/strings/basic.string/string.modifiers/string_assign/Output/size_char.pass.cpp.dir/t.tmp.exe+0xbc2ca) (BuildId: 05ded1d03f29cb11827d1b47b24d2ce97422315a)
    #3 0x5651a557cfbb in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__annotate_contiguous_container[abi:v170000](void const*, void const*, void const*, void const*) const /home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/build/generic-asan/include/c++/v1/string:1841:11
    #4 0x5651a557f60a in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__annotate_shrink[abi:v170000](unsigned long) const /home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/build/generic-asan/include/c++/v1/string:1874:9
    #5 0x5651a557c32d in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::__null_terminate_at[abi:v170000](char*, unsigned long) /home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/build/generic-asan/include/c++/v1/string:2050:9
    #6 0x5651a557b48f in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>::assign(unsigned long, char) /home/libcxx-builder/.buildkite-agent/builds/356fcbab9cbd-1/llvm-project/libcxx-ci/build/generic-asan/include/c++/v1/string:2608:12

This update fixes issues mention above, the problem was missing if.
(__annotate_increase should be called only when size was increased).

I also refactored helpers, as every time __beg and __end are same, there
is no point to pass them every time.
Now those variables are set in __annotate_contiguous_container.

In next update I will remove support for different allocators and move them to
a new patch, similar to D136765. Here I want to confirm that everything works on CI.

AdvenamTacet edited the summary of this revision. (Show Details)Mar 8 2023, 6:40 PM
AdvenamTacet marked an inline comment as done.

This update applies the patch to newest commit from master branch.

Using std::addressof instead of &.
Patch applied to a newer commit from master.

This update removes support for all allocators, now annotations work with only default allocator.
It updates the test, maing sure that with other allocators memory is not poisoned.

It also adds a way to turn off annotations with _LIBCPP_ASAN_ANNOTATE_ONLY_LONG, in case someone has a specific case where short strings shouldn't be annotated.

Support for annotating all allocators will be added in a separate patch.

I believe, this patch is ready for upstreaming.

ldionne requested changes to this revision.Mar 17 2023, 2:52 PM

Could you give a really simple example of issues this allows catching? Like the "Hello world" for this new functionality. I think that would be useful for folks (at least for me) to understand concretely what we're gaining (I have no doubt we're gaining something, I just don't see it clearly in my mind).

I see a lot of added complexity in this patch. It's only a couple of ifs here and there, but it's in some of the trickiest code and most used code in the library. And that code is already very much in need of refactoring. Honestly, that makes me worried and from the maintainability and ease-to-validate-correctness perspective, this is only making things worse. It may still be worth doing if we're gaining something massive in return, but there's a clear tradeoff here. Also, this area is:

  1. Super tricky cause it touches critical and already crufty code that few people understand well
  2. In active development: constexpr-ification recently, hardening work, the debug mode that we gotta probably rip out, and now this

All in all, I guess I wish there had been a design document/RFC explaining this stack of changes before starting to try to get code changes into the library. If there's one, please point it out to me and apologies for my ignorance.

libcxx/include/__string/extern_template_lists.h
30

Using ASAN is an ABI affecting change, kind of like our debug mode. Doing this will only seem to make things work, but in reality it'll cause other problems. For example, if you take a string created inside the dylib (without ASAN) and you try to use it from your ASAN-compiled user program, I assume you'll run into issues, right?

The solution is (quite unfortunately) to have a different built library for that configuration, unless I missed something.

libcxx/include/string
1848

_LIBCPP_ASAN_ANNOTATE_ONLY_LONG adds yet another configuration option. We try not to add configuration options unless we absolutely need to, so I would remove it. If users need to specify whether to annotate short strings or not, IMO the feature is mis-designed. Users shouldn't even be aware of the fact that we have short strings and long strings and they are implemented differently.

This revision now requires changes to proceed.Mar 17 2023, 2:52 PM
AdvenamTacet marked an inline comment as done.

This update reverts part of a previous update and removes _LIBCPP_ASAN_ANNOTATE_ONLY_LONG.

AdvenamTacet added inline comments.Mar 20 2023, 12:19 PM
libcxx/include/__string/extern_template_lists.h
30

In general, ASan does accept false positives in situation when part of the program is compiled without ASan. That may happen with std::vector annotations already. But minimizing the risk of it is a good course of action.
I agree that a separate built is the goal solution. I think, it may work as a temporary solution and then be changed into a separate library built. What do you think?

But I understand if you disagree, just I'm also not sure what has to be changed here to add an ASan library build to llvm. What has to be done? Where should I look to learn it?
Please, let me know.

libcxx/include/string
1848

It's not necessary, I added it after discussion about problems ChromeOS had as another way to turn off annotations more granularly. However, I believe that D145628 already answers that problem. Removed it.

@ldionne When a program is compiled with ASan, these annotations allow it to detect all accesses to memory that has been allocated but is not currently in use by an object of std::basic_string type. The only exception are functions where instrumentation has been explicitly disabled.

String annotations already exist in Microsoft C++ Standard Library.
Link: https://devblogs.microsoft.com/cppblog/stdstring-now-supports-address-sanitizer/

It’s not the same as libc++ hardening. While all accesses with operator[] or iterators are also detected, particularly important are cases with raw pointers, such as when passing data to C functions (strcpy() etc.).

The simplest example may be:

std::string s = “abcdefg”;
char *ptr = &s[4];
s = “”;
char c = *ptr; // ASan error

A write to *ptr would be detected as well.

Examples of what is detected are:

  • incorrect use of C functions,
  • random access to that memory (*(char *)randint() = 'x'; will be detected, if by chance unused strings memory is accessed),
  • use after resize (as in the example above),
  • other BO etc.

A real world bug, which motivated that research was as follow (simplified):

bool custom_cmp(const char l, const char r) {
        std::cout << "DEBUG print: " << l << " " << r << std::endl;

        return l < 'a' || l == r;
}

void foo() {
  std::string str1 = "A VERY LONG STRING IS HERE";
  std::string str2 = "short string";
  const char* iter1_begin = str1.c_str();
  const char* iter1_end   = str1.c_str() + str1.size();
  const char* iter2_begin = str2.c_str();

  std::cout << (std::equal(iter1_begin, iter1_end, buf, custom_cmp)) << std::endl;
}

Out-of-bounds read could happen as two strings were compared via a std::equals function that took iter1_begin, iter1_end, iter2_begin iterators (with a custom comparison function). When object str1 was longer than str2, read out-of-bounds on iter2 could happen. Container sanitization would detect it.
I hope it shows when those annotations are applicable. Let me know if you want me to add a bunch of samples to the patch. I can add a bunch of failing unit tests with examples. Is it a good idea?

This implementation uses the same API as std::vector (function: __sanitizer_annotate_contiguous_container) and from high level is the same here and there.
Long std::basic_string is almost the same as std::vecotr, so logic is absolutely the same. The difference is support for short strings, but that logic is very similar as well.
Long string (with standard allocators) annotations work with old ASan API (does not require rG1c5ad6d2c01294a0decde43a88e9c27d7437d157, so long-only patch wouldn't create a problem for ChromeOS). Only short string annotations require new API rG1c5ad6d2c01294a0decde43a88e9c27d7437d157.
If you think it's a good idea, I can split this patch into two. One for long only, second enabling SSO annotations.

It is worth mentioning that the change, proposed by me here, has negligible impact on performance, because when binary is already compiled with ASan, every access to strings memory is instrumented either way. Code from that change is executed only when strings size is modified and it's proportional to the change. All functions are no-ops when compiled without ASan.


Examples are above, here I let myself to digress a little. One may argue that some out of bounds errors may be detected by ASan without those changes, and that's true. If out of bound reaches memory outside of allocated area, ASan detects the bug without those changes. In situations like that, those changes increase speed of detecting errors. Potentially significantly, but I will come back to it. It's also worth to remember that there are no poisoned areas between elements in arrays.

std::string s[1024];
// ^ There are NO redzones between elements.

And memory is not poisoned inside classes between variables (and at the beginning and the end):

class X {
std::string x;
// NO redzone
int a[3];
};

Therefore, not every access outside of objects memory is detected. (Eg. above, s[0] is short, without detecting an error, BO can write over rest of 1023 strings memory. Error will be detected when memory after s[1023] is reached.) With those changes OOB will be detected, unless size() == capacity() before OOB happenes (case when there is no allocated and not-in-use memory).


Because short strings are on stack, bugs affect not only heap. Bellow I show modified example from above, with standard comparator, which leaks stack memory (including canary). Assumptions:

  • user fully controls s[0],
  • user has access to result of std::equals,
  • it's super simplified.
bool result = false;
char x = 0;
size_t n = 0;

void user_input(std::string &s) {
  // Simulation of user providing input
  n += 1;

  if(result) {
    x = 0;
    std::cout << "Leaked " << s.size() << " bytes, with " << n << " requests." << std::endl;
    std::cout << "Leak: " << s << std::endl << std::endl;
    s.push_back('\0');
  } else {
    if(!s.empty()) s.pop_back();
    x += 1;
    s.push_back(x);
  }
}

void leak_cannary() {
  std::string s[3];
  s[2] = "secret data";
  for(int i = 0; i < 'z' * 100; ++i) {
    result = std::equal(s[0].begin(), s[0].end(), s[1].data());
    // User has access to the result.
    user_input(s[0]);
  }
}

Example result is: Leaked 169 bytes, with 12062 requests.

Chance of finding that bug with simple (random) harness and today ASan is close to zero, but it can be easily exploited.

Here I won't fully offtop how off-by-one on a buffer just before string may leak stack as well (same off-by-two on string with Alternate String Layout). If someone is interested, I'm happy to talk about ways to exploit SSO (connected with some bugs) outside of that revision, I was looking at it to create a CTF challenge.

This update should resolve the conflict, it applies changes on the newest commit.

This update is strictly related to refactor, which happened on the main branch.
No real changes included, newest commit from main as base.

This update turns off annotations for short strings.
It is a small change and I will create another revision simply turning on annotations for short strings.

I was thinkin a lot about it and I believe it's better to have one patch turning on annotations for long strings only (with the default allocator only) and a separate one, very simple, turning on annotations for short strings as well.
My reasoning:

  • Long strings with default allocator works with old ASan API. (Main reason.)
  • There is much smaller chance to see a similar carse to area allocators (something we should address) with only long strings.
  • If someone reports a problem, it's very helpful to see which change exactly caused the problem.

This update:

  • fixes problem with tests after setting _LIBCPP_SHORT_STRING_ANNOTATIONS_ALLOWED,
  • removes #if ... [code] #endif from the test.

This is a fix after the previous update.

Also, I created a revision to turn on short string annotations.
Check D147680 for details.

Now short strings are not poisoned at all with this revision.

This update tries to fix the merge conflict.

ldionne added inline comments.
libcxx/include/__string/extern_template_lists.h
30

I believe this means that -fsanitize=address would have to cause a different libc++.dylib to be linked. So for example, the driver would add e.g. -l c++asan instead of -l c++, and we would have to produce libc++asan.dylib in addition to libc++.dylib. That's not a small change though, since it seems like this might be the first time a sanitizer has an effect on the version of libc++ we have to link against. But it might be the right way to go (probably the only way to go, really). In practice, disabling extern templates like this is only going to give users the impression that things work and then they will run into issues.

CC @vitalybuka

libcxx/include/string
1847

Would __annotate_short_string_check be better called something like __short_string_is_annotated()?

2406

Isn't that a pre-existing problem that should be fixed in its own patch?

2524

I feel like this function would be a lot simpler if we instead called __annotate_contiguous_container directly, no?

2615

This whole set of changes feels kind of clunky. Why don't we annotate delete and then annotate new instead? Wouldn't that get rid of the special this check?

Thanks a lot for the write up, I think this makes sense. I still find this more intrusive than I'd like, but let's see how we can improve this in a few places.

AdvenamTacet added inline comments.Apr 19 2023, 2:00 AM
libcxx/include/string
2406

I created a separate revision with that change: D148693
I will remove it from this patch in the next update.

This update:

  • changes name of __annotate_short_string_check function to __asan_short_string_is_annotated,
  • removes the fix moved to D148693 revision,
  • simplifies a function entioned in a code review,
  • fixes use of & instead of std::addressof in libcxx/test/support/asan_testing.h.

Thank you for looking at it!

AdvenamTacet marked an inline comment as done.Apr 19 2023, 3:29 AM
AdvenamTacet added inline comments.
libcxx/include/string
1847

I changed it to __asan_short_string_is_annotated, but I don't fully like those names. I added asan so it is clear what kind of annotations are [not] there.

2524

I do not think so, and I don't like the idea of calling __annotate_contiguous_container directly, but I did improve the function a lot. Hope it's ok now.

2615

No, it wouldn't. Here we are working on __str and not *this. We cannot __annotate_delete __str, if it's long because we would have to annotate whole string every time (big performance hit and stil check).

We also shouldn't call __annotate_delete on *this and __str if those are the same objects.

So, the check would be just earlier. Or maybe I don't see something here?

AdvenamTacet added inline comments.May 14 2023, 9:16 PM
libcxx/include/__string/extern_template_lists.h
30

@vitalybuka do you have any suggestion how to implement it?

Btw. a separate dylib may help with issue described here: https://github.com/llvm/llvm-project/issues/62431 (not sure, tho).

This update:

  • fixes annotations after changes in D148693,
  • does rebase.

Now the revision is working, the only missing element is a separate dylib for ASan.

AdvenamTacet added inline comments.Jun 10 2023, 4:58 PM
libcxx/include/__string/extern_template_lists.h
30

Hey @ldionne,
could you tell me where I should look to modify it? I don't really know how to start working on that change and I believe, it's the only missing part after landing D148693. I would appreciate letting me know at what files I should look first.

Btw. it won't affect the issue mentioned above.

ldionne added inline comments.Jun 12 2023, 12:32 PM
libcxx/include/__string/extern_template_lists.h
30

I think we would want to start with a RFC on Discourse. This would be the first time that we introduce another shared library for libc++ and there are a few things to coordinate:

  1. We need to build this new version of the library (do we do that with multiple CMake invocations or do we do like compiler-rt and allow building multiple libraries from a single invocation?)
  2. We need to ship this library as part of the LLVM release and vendors need to also start shipping it with their own releases if they want -fsanitize=address to work
  3. Do we need to also provide a -fsanitize=address version of other components in the LLVM stack? For example, if we ship another library that uses libc++ and we want it to work with -fsanitize=address, we would likely need to provide a sanitized version of it or else if e.g. a std::string is created inside that component and passed to another component that was compiled with -fsanitize=address, then you'd get a mismatch.

So this is unfortunately kind of involved, but I think knowing how to do this would benefit us since we have similar needs for other efforts. For example, it would be conceivable to ship an unstable-ABI version of the library, and a hardened version as well, and doing that would follow the exact same steps.

vitalybuka added inline comments.Jun 12 2023, 1:55 PM
libcxx/include/__string/extern_template_lists.h
30

I believe this means that -fsanitize=address would have to cause a different libc++.dylib to be linked. So for example, the driver would add e.g. -l c++asan instead of -l c++, and we would have to produce libc++asan.dylib in addition to libc++.dylib. That's not a small change though, since it seems like this might be the first time a sanitizer has an effect on the version of libc++ we have to link against. But it might be the right way to go (probably the only way to go, really). In practice, disabling extern templates like this is only going to give users the impression that things work and then they will run into issues.

For msan, tsan, instrumented libc++ was requirement all the time.
For asan, hwasan it's highly recommended.

For all use-case I work, we do that on build system level.
Our sanitizer*bootstrap build bots do that as well.

It would be nice if libc++ is shipped with each version. Sometimes additional flags may affect behavior of sanitizer, but even version instrumented with default flags will be useful.

CC @vitalybuka

@vitalybuka do you have any suggestion how to implement it?

I don't have specific thoughts on how to implemented.
I guess libc++ could install all sanitized versions with some prefixes: asan/msan/tsan/hwasan
Then the driver will pick one to link.

Btw. a separate dylib may help with issue described here: https://github.com/llvm/llvm-project/issues/62431 (not sure, tho).

30

I believe this means that -fsanitize=address would have to cause a different libc++.dylib to be linked. So for example, the driver would add e.g. -l c++asan instead of -l c++, and we would have to produce libc++asan.dylib in addition to libc++.dylib. That's not a small change though, since it seems like this might be the first time a sanitizer has an effect on the version of libc++ we have to link against. But it might be the right way to go (probably the only way to go, really). In practice, disabling extern templates like this is only going to give users the impression that things work and then they will run into issues.

CC @vitalybuka

30

I think we would want to start with a RFC on Discourse. This would be the first time that we introduce another shared library for libc++ and there are a few things to coordinate:

  1. We need to build this new version of the library (do we do that with multiple CMake invocations or do we do like compiler-rt and allow building multiple libraries from a single invocation?)

Single invocation would be nice, as it's better scale on other sanitizers.

  1. We need to ship this library as part of the LLVM release and vendors need to also start shipping it with their own releases if they want -fsanitize=address to work

please include msan/hwasan/tsan versions into the plan

lsan/ubsan do not needed special libc++.
dfsan is close to msan, but it's very specific tool, so we can skip for now.

AdvenamTacet added inline comments.Jun 28 2023, 6:41 PM
libcxx/include/__string/extern_template_lists.h
30

Created a RFC: https://discourse.llvm.org/t/rfc-instrumented-versions-of-libc-for-different-sanitizers/71653

Who can I/should I ping there? Who can I ping from vendors?

AdvenamTacet added inline comments.Jul 14 2023, 9:04 PM
libcxx/include/string
3159

The issue described above is addressed in D148693 and solved by rG6a9c41fdd4ca834a46fd866278c90a35f2375333. This revision has to be updated. Now the size is correct after __grow_by_without_replace.

AdvenamTacet abandoned this revision.Dec 5 2023, 8:45 AM

This revision has been moved into GH: https://github.com/llvm/llvm-project/pull/72677