Page MenuHomePhabricator

Leak Sanitizer port to Windows
Needs ReviewPublic

Authored by clemenswasser on Dec 4 2021, 12:11 PM.

Details

Summary

Initial Leak Sanitizer port to Windows.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
vitalybuka added inline comments.Dec 6 2021, 12:58 PM
compiler-rt/lib/lsan/lsan_common.cpp
411

I guess do needs this as well on windows
The problem is that we have no MemoryMappingLayout implementation for Windows.
I guess it can be done with VirtualQueryEx
for now can you just ifdef only body of the ProcessRootRegion with TODO?

compiler-rt/lib/lsan/lsan_common_win.cpp
49

how this can work without ProcessGlobalRegions?

compiler-rt/lib/lsan/lsan_interceptors.cpp
506

Can you please just convert unneeded stuff from INTERCEPT_FUNCTION into LSAN_MAYBE_INTERCEPT
In a separate patch

compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_win.cpp
4

sanitizer_stoptheworld_test.cpp needs to be ported to windows, probably with std::
can you please move this work and sanitizer_stoptheworld_win.cpp into separate patch/patches

MyDeveloperDay added inline comments.Dec 7 2021, 12:02 AM
compiler-rt/lib/lsan/lsan.h
20

Sorry, is there a reason we can't follow the current style in this patch? sorry did I miss something? Its not about fighting clang-format its about ensuring we follow the style from the lines around it which are following the clang-format style. I don't need you to clang-format the whole file (although that would be great because this directory has very low clang-format status)


https://clang.llvm.org/docs/ClangFormattedStatus.html

All I'm asking is that we don't indent the preprocessor directives to a non LLVM style.

i.e.

#include "lsan_win.h"
#if !SANITIZER_WINDOWS
#endif

vs

#  include "lsan_win.h"
#  if !SANITIZER_WINDOWS
#  endif
MyDeveloperDay added inline comments.
compiler-rt/lib/lsan/lsan_common.h
48

it might not seem useful to individuals, but I think its the guidance, and a consistent style really help new people to understand what's going on.

https://llvm.org/docs/Contributing.html#how-to-submit-a-patch

I'm a little surprise the "llvm-premerge-checks" didn't highlight this @kuhnel

oh! gosh I'm sorry I didn't realize that lsan has this non standard LLVM style .clang-format, my apologies! D100238: [sanitizer] Set IndentPPDirectives: AfterHash in .clang-format

kind of surprised TBH, doesn't feel the indentation happens enough to warrant it, but that's up to you its not my backyard.

$ grep "#if" *.cpp | wc -l
461
llvm-project/compiler-rt/lib/sanitizer_common
$ grep "# if" *.cpp | wc -l
27
clemenswasser added inline comments.Dec 7 2021, 7:44 AM
compiler-rt/lib/lsan/lsan.h
20

Yes, this is caused by clang-format. What should I do about it?

20

@vitalybuka Should I create a patch, where I format the whole file or just these #includes?

MyDeveloperDay added inline comments.Dec 7 2021, 8:25 AM
compiler-rt/lib/lsan/lsan.h
20

You can ignore my comment, whilst I think the lsan .clang-format file is incorrect in terms of matching the LLVM style it is correct from the lsan subprojects point of view. I see now that you are doing what is in the local .clang-format file and its the rest of the file is incorrect. (you would handle that separately or not at all)

The fact that the .clang-format file doesn't follow the LLVM style is unfortunate, but as I expressed thats upto you all.

Split this patch up into multiple smaller ones:

clemenswasser marked 2 inline comments as done.
clemenswasser marked 4 inline comments as done.

Split this patch up into multiple smaller ones:

Please check "Edit Related Revisions" on the right of Phabricator. You can nicely chain this patches into "Stack"

Split this patch up into multiple smaller ones:

Implement ProcessGlobalRegions
Fix Interceptors for Windows

vitalybuka added inline comments.Dec 16 2021, 1:30 PM
compiler-rt/lib/lsan/lsan_allocator.h
52–53 ↗(On Diff #393750)
69 ↗(On Diff #393750)
compiler-rt/lib/lsan/lsan_common.cpp
248–253

same

clemenswasser marked an inline comment as done.
clemenswasser marked 2 inline comments as done.

Simplify Architecture checks

clemenswasser marked 3 inline comments as done.Dec 17 2021, 1:54 PM

@vitalybuka could you or someone else help me get the lsan tests running on Windows?
For example lets look at compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp (and basically all other tests, this serves as an easy example)
I imagine that the following defines a variable called LSAN_BASE :
// RUN: LSAN_BASE="use_stacks=0:use_registers=0"
This doesn't seem to work on Windows.
It gives the following error message:

[build] $ ":" "RUN: at line 2"
[build] $ "LSAN_BASE=use_stacks=0:use_registers=0"
[build] # command stderr:
[build] 'LSAN_BASE=use_stacks=0:use_registers=0': command not found

@vitalybuka could you or someone else help me get the lsan tests running on Windows?
For example lets look at compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp (and basically all other tests, this serves as an easy example)
I imagine that the following defines a variable called LSAN_BASE :
// RUN: LSAN_BASE="use_stacks=0:use_registers=0"
This doesn't seem to work on Windows.
It gives the following error message:

[build] $ ":" "RUN: at line 2"
[build] $ "LSAN_BASE=use_stacks=0:use_registers=0"
[build] # command stderr:
[build] 'LSAN_BASE=use_stacks=0:use_registers=0': command not found

Can you try to remove LSAN_BASE and just hardcode use_stacks=0:use_registers=0 ?
If it works don't mind if we do so for the rest (in a separate patch)

The problem is, that LSAN_BASE is not always use_stacks=0:use_registers=0.
Also there are many test cases, where LSAN_BASE gets used multiple times.
Removing it would result in bad repetition.

The problem is, that LSAN_BASE is not always use_stacks=0:use_registers=0.

Off-cause , I am asking to replace with correct value each test.

Also there are many test cases, where LSAN_BASE gets used multiple times.
Removing it would result in bad repetition.

I don't like LSAN_BASE even without Windows.
Usually when test fail, I have a single command line to reproduce or put into debugger for just the step. Here I need to cherry-pick two line: LSAN_BASE= and actual step
also in most case it used just twice, so it's not a big loss in repetition

@vitalybuka I have now removed all LSAN_BASE occurrences.
However the invocations of the compiled test executables don't seem to work either.
This doesn't work:
// RUN: %env_lsan_opts=use_stacks=0:use_registers=0 not %run %t foo 2>&1 | FileCheck %s --check-prefix=CHECK-do
It generates this error:

[build] $ ":" "RUN: at line 3"
[build] $ "env" "LSAN_OPTIONS=:detect_leaks=1:use_stacks=0:use_registers=0" "not" "E:\git\llvm-project\llvm\build\ninja_debug\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\leak_check_at_exit.cpp.tmp.exe" "foo"
[build] note: command had no output on stdout or stderr
[build] error: command failed with exit status: True

One of the problems seems to be that %t doesn't have a .exe ending on Windows.
I also tried changing %t to %t.exe, but that doesn't fix the problem and produces the same error.
I think the not command(?) is the part that fails?

When I run it on my own, it does work:

PS E:\git\llvm-project> & 'E:\git\llvm-project\llvm\build\ninja_debug\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\leak_check_at_exit.cpp.tmp.exe'
Test alloc: 000061A000000000.

=================================================================
==25768==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1337 byte(s) in 1 object(s) allocated from:
    #0 0x7ff6e2f745be in __asan_wrap_malloc E:\git\llvm-project\compiler-rt\lib\lsan\lsan_interceptors.cpp:75
    #1 0x7ff6e2f2aade in main E:\git\llvm-project\compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp:13:40
    #2 0x7ff6e2f79b5b in __scrt_common_main_seh d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #3 0x7fffde187033  (C:\WINDOWS\System32\KERNEL32.DLL+0x180017033)
    #4 0x7fffde8a2650  (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180052650)

SUMMARY: LeakSanitizer: 1337 byte(s) leaked in 1 allocation(s).

@vitalybuka I have now removed all LSAN_BASE occurrences.
However the invocations of the compiled test executables don't seem to work either.
This doesn't work:
// RUN: %env_lsan_opts=use_stacks=0:use_registers=0 not %run %t foo 2>&1 | FileCheck %s --check-prefix=CHECK-do
It generates this error:

[build] $ ":" "RUN: at line 3"
[build] $ "env" "LSAN_OPTIONS=:detect_leaks=1:use_stacks=0:use_registers=0" "not" "E:\git\llvm-project\llvm\build\ninja_debug\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\leak_check_at_exit.cpp.tmp.exe" "foo"
[build] note: command had no output on stdout or stderr
[build] error: command failed with exit status: True

does check-asan work for your setup?

@vitalybuka
No check-asan doesn't work for me. It just hangs forever and does absolutely nothing. No output, nothing showing up in Task Manager with high CPU usage or anything.
However check-clang does work. Is there some documentation on running check-asan on windows.
I also saw that none of the Windows buildbots check/test anything related to sanitizers. It seems to me that none of the sanitizer stuff gets ever tested on Windows?
Do the test for sanitizers_common/asan/ubsan/etc. even work on Windows?

@vitalybuka
No check-asan doesn't work for me. It just hangs forever and does absolutely nothing. No output, nothing showing up in Task Manager with high CPU usage or anything.
However check-clang does work. Is there some documentation on running check-asan on windows.
I also saw that none of the Windows buildbots check/test anything related to sanitizers. It seems to me that none of the sanitizer stuff gets ever tested on Windows?
Do the test for sanitizers_common/asan/ubsan/etc. even work on Windows?

This one runs asan on Windows, you can try to replicated using logs. https://lab.llvm.org/buildbot/#/builders/127
I rarely work on Windows, but i did similar setup some time ago.

rnk added a comment.Jan 6 2022, 2:39 PM

@vitalybuka
No check-asan doesn't work for me. It just hangs forever and does absolutely nothing. No output, nothing showing up in Task Manager with high CPU usage or anything.
However check-clang does work. Is there some documentation on running check-asan on windows.
I also saw that none of the Windows buildbots check/test anything related to sanitizers. It seems to me that none of the sanitizer stuff gets ever tested on Windows?
Do the test for sanitizers_common/asan/ubsan/etc. even work on Windows?

This one runs asan on Windows, you can try to replicated using logs. https://lab.llvm.org/buildbot/#/builders/127
I rarely work on Windows, but i did similar setup some time ago.

I will just confirm that check-asan is supposed to work, it worked for me a few months ago, and the linked buildbot tests it continuously.

However, WinASan is very sensitive to the OS and SDK version. The failure mode you described sounds plausible, something like a crash on startup that results in a message box that you can't see or something like that.

rnk added a subscriber: aganea.Jan 6 2022, 2:40 PM

See a recent change rG7cd109b92c72855937273a6c8ab19016fbe27d33 by @aganea, who presumably was able to run the ASan tests for it.

aganea added a comment.EditedJan 6 2022, 4:06 PM

It does work indeed. I am running under a "x64 Native Tools Command Prompt for VS 2019" prompt with MSVC 16.11.8.

D:\git\llvm-project>mkdir release && cd release
D:\git\llvm-project\release>cmake ../llvm -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lld;llvm;compiler-rt" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PDB=ON -DLLVM_ENABLE_ASSERTIONS=ON
...
D:\git\llvm-project\release>ninja check-asan
[0/1] Running the AddressSanitizer tests
-- Testing: 669 tests, 16 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..

Testing Time: 86.33s
  Unsupported      : 229
  Passed           : 426
  Expectedly Failed:  14

However you might need some Unix tools that don't come out-of-the-box on Windows.
You could first install the scoop package manager and then install any tool that you need:

D:\git\llvm-project>powershell
...
D:\git\llvm-project [main ↓9 +10 ~0 -0 !]> iwr -useb get.scoop.sh | iex
...
D:\git\llvm-project [main ↓9 +10 ~0 -0 !]> scoop install coreutils grep make sed unxutils sysinternals

And then run ninja check-asan again.

If you install sysinternals with scoop above, you can run procexp64 prior to ninja check-asan. That will give a much finer view on the running processes, as compared to the Task Manager. Let me know if I can help further.

@aganea Thanks for your help. I already did everything you wrote (I am however using the coreutils provided by Git).
check-asan works on my new computer. I still don't know why it didn't work on my older one...
I will try to get the lsan tests working (and hopefully passing 😉) on windows soon.

All the lsan tests, when run in lsan+asan mode, Deadlock in StopTheWord when creating the tracer thread...


-> check-lsan gets stuck, due to all the deadlocking tests :(

Could it be caused by asan intercepting the Win32 calls in `StopTheWorld?
@vitalybuka Could this be a problem? If so, how can I call the "real" functions, without being intercepted by asan?
Is there a way (for now), to only run the lsan tests in lsan only mode (no asan)?

clemenswasser edited the summary of this revision. (Show Details)

Updated the Tests.
The Problem with Lsan+Asan still exists. I have just disabled them for now.
I hope anyone can help me with getting the tests to run correctly?

GhisHD added a subscriber: GhisHD.Mar 16 2022, 5:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 5:02 PM
clemenswasser marked an inline comment as done.Apr 9 2022, 1:31 AM

Hi,
today I found some more time to play/progress with this patch. I'm generally optimistic that most of the basic lsan stuff should already be working on Windows.
However without some help of you, I'm unable to get the tests to properly work. The main Problem I looked into today is that llvm-lit.py seems to be somehow unable to capture the stdout/err of the tests. More precisely my Debugging showed that when I insert a print(procs[-1].stdout.read()) at TestRunner.py:790 I get the correctly captured stdout of a test e.g.:

=================================================================
==25028==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1337 byte(s) in 1 object(s) allocated from:
    #0 0x7ff77fc0488e in __asan_wrap_malloc C:\Users\cleme\dev\cpp\llvm-project\compiler-rt\lib\lsan\lsan_interceptors.cpp:75
    #1 0x7ff77fbbaafe in main C:\Users\cleme\dev\cpp\llvm-project\compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp:13:40
    #2 0x7ff77fc09e2b in invoke_main d:\a01\_work\43\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #3 0x7ff77fc09e2b in __scrt_common_main_seh d:\a01\_work\43\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #4 0x7ff8cb5654df  (C:\Windows\System32\KERNEL32.DLL+0x1800154df)
    #5 0x7ff8cc9e485a  (C:\Windows\SYSTEM32\ntdll.dll+0x18000485a)

SUMMARY: LeakSanitizer: 1337 byte(s) leaked in 1 allocation(s).

However further below at lines 816-829 this output is lost (proc.stdout.read() returns a empty string, which obviously causes any FileCheck afterwards to fail)... and I can't figure out why. The only possible explaination I can think of is that the captured stdout is lost during the next loopiteration over cmd.commands

I was wrong, the Problem was, that I didn't change the exit code, when a leak gets detected.
I now intercept ExitProcess and change the exit code in there, with this there are already 12 Tests passing on Windows.

clemenswasser edited the summary of this revision. (Show Details)Apr 10 2022, 8:51 AM

Since I am currently unable to intercept the ExitProcess/TerminateProcess in the ucrt (exit_or_terminate_process in Windows Kits\10\Source\10.0.22000.0\ucrt\startup\exit.cpp:143)
I inserted a call to Die in HandleLeaks (just like on Linux). This brings the passing Test up to 22

I always forget to change the line endings, sorry :(

I have now split the lsan TestCases chages into https://reviews.llvm.org/D124322
and managed to get the lsan+asan TestCases nearly to work. The current problem is,
that during the start of the RunThread in StopTheWorld, we are intercepting a call
to calloc done by the ucrt in ucrtbase!DllMainDispatch, which somehow results in a deadlock.
If someone has an idea, how we could avoid/fix this, I would be very happy to hear it.

clemenswasser marked an inline comment as done.Apr 23 2022, 2:04 AM
vitalybuka requested changes to this revision.Dec 7 2022, 1:30 PM

Is this still relevant?

This revision now requires changes to proceed.Dec 7 2022, 1:30 PM

@vitalybuka
I've been working on this on-off for the past months and have gotten the tests in the following state:

Unsupported: 41
Passed     : 19
Failed     : 48

There is still the bug with the tracer thread deadlocking during an allocation made in the Microsoft CRT thread setup code when using Asan+Lsan.
I hope I can get the failing tests passing.

You probably can try to finish pure lsan, and then get back to lsan+asan.
So if you continue to work please send smaller and ready pieces for review?

clemenswasser added a comment.EditedWed, Feb 1, 11:58 AM

@vitalybuka There seems to be a bug in MSVCs bit-field implementation, which causes the disable.c test to fail.
This reproducer (assert) passes with gcc and fails with MSVC (m.tag is 0xffffffff):

#include <stdint.h>
#include <assert.h>

enum ChunkTag {
  kDirectlyLeaked = 0,  // default
  kIndirectlyLeaked = 1,
  kReachable = 2,
  kIgnored = 3
};

struct ChunkMetadata {
  uint8_t allocated : 8;  // Must be first.
  ChunkTag tag : 2;
  uintptr_t requested_size : 54;
  uint32_t stack_trace_id;
};

int main() {
    ChunkMetadata m;
    m.tag = kIgnored;
    assert(m.tag == kIgnored);
}

Do you have a suggestion how I could fix this on MSVC

Do you have a suggestion how I could fix this on MSVC

@clemenswasser The bug is there for years, I'm surprised it hasn't been fixed yet. MSVC doesn't support different types with bitfields. You should do:

struct ChunkMetadata {
  uint64_t allocated : 8;  // Must be first.
  uint64_t tag : 2;
  uint64_t requested_size : 54;
  uint32_t stack_trace_id;
};

@aganea Thank you very much!

Wow, this fixed a lot of tests, down to only 17 lsan standalone tests failing :) (26 already passing)

luke957 removed a subscriber: luke957.Thu, Feb 2, 11:29 PM