This is an archive of the discontinued LLVM Phabricator instance.

Hook Rtl* allocation functions directly on Windows
AbandonedPublic

Authored by cbezault on Mar 2 2021, 9:56 AM.

Details

Reviewers
None
Summary

This is the first of several planned PRs to upstream the changes to the ASan runtime which have been completed internally by Microsoft.We are shipping this and several other changes built on top of LLVM 9 in the Visual Studio toolset.et. Our intention is to upstream our changes,
and then move our development of ASan for Windows to LLVM latest/trunk.

This PR consists of three major changes.

  1. Drop the hook_rtl_allocators flag. All the Heap* functions are just thin wrappers for their Rtl* counterparts and directly hooking them makes everything more robust.
  2. Keep track of all the ASan allocated memory associated with each heap so that on RtlDestroyHeap We can free the memory appropriately.
  3. Heaps with flags we do not currently support with the ASan allocator get passed through to the real Rtl* functions. This is required for applications which require special heaps such as the CLR runtime which requires an executable memory heap. This is an area for future improvement.

Diff Detail

Event Timeline

cbezault created this revision.Mar 2 2021, 9:56 AM
cbezault requested review of this revision.Mar 2 2021, 9:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 2 2021, 9:56 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript
cbezault retitled this revision from Summary: This is the first of several planned PRs to upstream the changes to the ASan runtime which have been completed internally by Microsoft. We are shipping this and several other changes built on top of LLVM 9 in the Visual Studio toolset. to Move Heap* hooks to just hooking the Rtl* functions directly on Windows.Mar 2 2021, 10:01 AM
cbezault edited the summary of this revision. (Show Details)
cbezault edited the summary of this revision. (Show Details)Mar 2 2021, 10:03 AM
cbezault retitled this revision from Move Heap* hooks to just hooking the Rtl* functions directly on Windows to Hook Rtl* allocation functions directly on Windows.
cbezault planned changes to this revision.Mar 2 2021, 10:11 AM
cbezault updated this revision to Diff 327644.Mar 2 2021, 6:19 PM

Fixup headers.

cbezault planned changes to this revision.Mar 12 2021, 10:59 AM
jblazquez added a subscriber: jblazquez.EditedJul 19 2021, 12:29 PM

@cbezault , I think these changes can cause hangs on shutdown when linking against the static runtime.

You may want to check our discussion here: https://developercommunity.visualstudio.com/t/ASAN-hang-on-exit-in-__sanitizer::proc/1453688

Hi Javier,

This changeset is not what is currently being shipped with VS but I did fix
a deadlock a few weeks back that could be causing the hang you observe.
Try out the latest version of VS and let me know if your issue is still
present. I'm hoping to come back to this differential in the next couple of
weeks, so you can take a look at the code then.

Curtis

Hi Curtis, thanks for the quick reply.

I just rebuilt one of our apps with the latest MSVC toolset (14.30.30401, from VS2022 Preview 2.0), and the issue still occurs. Here's a callstack I just got:

MyApp.exe!__sanitizer::internal_sched_yield() Line 887
MyApp.exe!__sanitizer::StaticSpinMutex::LockSlow() Line 56
[Inline Frame] MyApp.exe!__sanitizer::StaticSpinMutex::Lock() Line 31
[Inline Frame] MyApp.exe!RecursiveScopedLock::{ctor}(__sanitizer::SpinMutex &) Line 31
[Inline Frame] MyApp.exe!AsanHeapHandle::{ctor}(AsanHeap & heap, void *) Line 602
MyApp.exe!GetAsanHeapHandle(void * heap, void * memory) Line 631
MyApp.exe!__asan_wrap_RtlFreeHeap(void * HeapHandle, unsigned long Flags, void * BaseAddress) Line 772
bcryptprimitives.dll!_TearDownRNGStatesCommon@0
bcryptprimitives.dll!_TearDownRNGStates@0
bcryptprimitives.dll!_MsProviderStop@0
bcryptprimitives.dll!_DllMain@12
ntdll.dll!_LdrxCallInitRoutine@16
ntdll.dll!LdrpCallInitRoutine()
ntdll.dll!_LdrShutdownProcess@0
ntdll.dll!RtlExitUserProcess()
kernel32.dll!_ExitProcessImplementation@4
MyApp.exe!exit_or_terminate_process(const unsigned int return_code) Line 143
MyApp.exe!common_exit(const int return_code, const _crt_exit_cleanup_mode cleanup_mode, const _crt_exit_return_mode return_mode) Line 280
MyApp.exe!exit(int return_code) Line 293
MyApp.exe!__scrt_common_main_seh() Line 310
kernel32.dll!@BaseThreadInitThunk@12
ntdll.dll!___RtlUserThreadStart@8
ntdll.dll!__RtlUserThreadStart@8

I have this issue in the debugger right now, and I could potentially share a full memory dump with you or assist in other ways in order to dig into the problem.

Please let me know how you'd like to proceed, and whether there's a better venue than reviews.llvm.org for this discussion.

Hi Javier,

A full memory dump could be useful, even more useful would be the
application which is suffering from the hang.
If you could open a new devcom ticket and just send me the link so I can
get it triaged more quickly that would probably be best.

Hi Curtis, I created a new ticket and attached a full memory dump + PDB for you: https://developercommunity.visualstudio.com/t/AddressSanitizer:-Hang-on-exit-inside-AS/1479116

I noticed a Lock call in RtlDestroyHeap that doesn't have a corresponding Unlock. Could that be it?

compiler-rt/lib/asan/asan_malloc_win.cpp
421

There is no Unlock call here. Could this be the cause of the hang?

Looks like I found the problem. It's pretty dumb :( . The issue is that at
the top of RtlDestroyHeap I destroy the instance of the AsanHeap associated
with the HeapHandle but anywhere that calls GetAsanHeap will just recreate
it, leading to the race condition. This is a pretty easy fix but I'm kind
of backlogged and won't be able to get something in for the next couple of
weeks.

I’m glad we found the problem.

We’re not blocked by this issue, since we have recently switched to linking against the upstream compiler-rt instead of using the MSVC-provided libraries, as a temporary workaround.

Looking forward to seeing the latest MSFT changes upstreamed.

Hi friends,

We've also run into this problem after upgrading our compilers and attempting to run our servers with ASan enabled. I was wondering if there was any update about the fix mentioned here? We're willing to upgrade to VS2022 Preview if it's been fixed there, but I assume that it wasn't fixed as part of the Preview 3 release.

We’re not blocked by this issue, since we have recently switched to linking against the upstream compiler-rt instead of using the MSVC-provided libraries, as a temporary workaround.

I was excited to hear this workaround is possible! Are you able to share information with me about how you accomplished this?
I'm able to compile the LLVM ASan binaries fine, and can swap out the MSVC-shipped libs for a local version at link-time, but my DLLs don't link cleanly when I do that. I think there must be some fundamental piece of this puzzle that I'm missing, as I get the same linker errors when I swap out the MSVC-shipped clang/vcasan libs from 14.29.30133 for the older copy MSVC ships with 14.28.29910. The linker errors all involve the asan_new_delete OBJ if that's any clue...

Thanks for anything either of you may be able to share, and thanks above all for the work on bringing ASan to Windows/MSVC. :)

Sorry for the spam, but I wanted to follow-up on my previous message since I've discovered what I was doing wrong. For anyone else who may run into this problem, I was trying to use /libpath with link.exe to have it skip the default libs and use the custom set I had built. For whatever reason, that doesn't work. The correct change to make, which does link correctly, was to use /inferasanlibs:no and then manually /wholearchive the ASAN libs as described on this page I hadn't discovered before: https://docs.microsoft.com/en-us/cpp/sanitizers/asan-building?view=msvc-160

Thanks again to everyone for your work on MSVC ASan :)

Hi Daniel,

The fix for the race condition is in PR review internally right now. We
should be moving to getting all our changes out in the open soon™.
For your second problem it sounds like you're running into an issue with
the way asan replaces the new and delete operators. If your program
provides user-defined new/delete operators in a header as opposed to in
their own standalone obj file you may run into the issue of multiple
defined symbols.

Curtis

rnk added a subscriber: rnk.Sep 17 2021, 3:47 PM

Hey, I just found this review today. Let me see what I can do to help get this review going.

I found my way here because I was looking at a different winasan patch: D109941

Hi Reid,

This DR isn't actually ready yet. We've made some extensive changes on our
side that we're getting ready to start breaking into digestible pieces for
review.

cbezault abandoned this revision.Jan 25 2022, 1:25 PM

Hi @cbezault, I see this review was abandoned. Is a new one coming? We're still hoping to go back to Microsoft's official ASan libraries as soon as possible, once the hang on exit issue is solved.

Hi Javier,

There is a fix in the pipeline for the hang on teardown issue.

Herald added a project: Restricted Project. · View Herald TranscriptMar 31 2022, 2:16 PM

Hi Javier,

There is a fix in the pipeline for the hang on teardown issue.

Hi @cbezault, is there any information about the studio version that will include this fix?

@poljak181 I'm no longer on the ASan team at Microsoft so I can't really comment on where things stand. The last thing I saw was that we were still running into issues where Rtl* functions were getting called during process teardown. The RAII lock destructors wouldn't get called before teardown so locks would be held by threads that no longer existed and then we would deadlock. Windows gets around this with the Heap locks by walking through all existing heaps on teardown and essentially unlocking all of the locks before proceeding to call Rtl* functions. I had proposed hooking the function that does that in order to effectively do the same thing with the ASan internal locks but I don't know if it was ever implemented. @stwish_msft

The process teardown issue (as well as a bunch of other deadlock issues resulting from this change) are going to be included in Visual Studio 17.2 Preview 3. We have plans to backport this to 16.11 as well. This is our feedback issue tracking this on DevCommunity: https://developercommunity.visualstudio.com/t/ASAN-hang-on-exit-in-__sanitizer::proc/1453688

Thanks for the update, Steve! Looking forward to switching back to MSFT's ASan libraries.