Initial Leak Sanitizer port to Windows.
Details
Diff Detail
Event Timeline
You need to add some reviewers to this revision. check out the CODE_OWNERS.txt or what I sometimes do is to git log one of the files you are changing and see who is a major contributor.
LGTM in general, but it can be cut into smaller patches.
+ @rnk for Windows suff
compiler-rt/lib/lsan/lsan.h | ||
---|---|---|
20 | @clemenswasser Can you please a separate tiny patch which clang-format nearby lines | |
compiler-rt/lib/lsan/lsan_allocator.cpp | ||
30 | This also can go into a separate patch | |
compiler-rt/lib/lsan/lsan_common.cpp | ||
411 | I guess do needs this as well on windows | |
compiler-rt/lib/lsan/lsan_common.h | ||
49 | I am fine like this, fighting clang-format is not useful | |
compiler-rt/lib/lsan/lsan_common_win.cpp | ||
49 | how this can work without ProcessGlobalRegions? | |
compiler-rt/lib/lsan/lsan_interceptors.cpp | ||
505 | Can you please just convert unneeded stuff from INTERCEPT_FUNCTION into LSAN_MAYBE_INTERCEPT | |
compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_win.cpp | ||
5 | sanitizer_stoptheworld_test.cpp needs to be ported to windows, probably with std:: |
compiler-rt/lib/lsan/lsan.h | ||
---|---|---|
20 | Sorry, is there a reason we can't follow the current style in this patch? sorry did I miss something? Its not about fighting clang-format its about ensuring we follow the style from the lines around it which are following the clang-format style. I don't need you to clang-format the whole file (although that would be great because this directory has very low clang-format status) https://clang.llvm.org/docs/ClangFormattedStatus.html All I'm asking is that we don't indent the preprocessor directives to a non LLVM style. i.e. #include "lsan_win.h" #if !SANITIZER_WINDOWS #endif vs # include "lsan_win.h" # if !SANITIZER_WINDOWS # endif |
compiler-rt/lib/lsan/lsan_common.h | ||
---|---|---|
49 | it might not seem useful to individuals, but I think its the guidance, and a consistent style really help new people to understand what's going on. https://llvm.org/docs/Contributing.html#how-to-submit-a-patch I'm a little surprise the "llvm-premerge-checks" didn't highlight this @kuhnel |
oh! gosh I'm sorry I didn't realize that lsan has this non standard LLVM style .clang-format, my apologies! D100238: [sanitizer] Set IndentPPDirectives: AfterHash in .clang-format
kind of surprised TBH, doesn't feel the indentation happens enough to warrant it, but that's up to you its not my backyard.
$ grep "#if" *.cpp | wc -l 461 llvm-project/compiler-rt/lib/sanitizer_common $ grep "# if" *.cpp | wc -l 27
compiler-rt/lib/lsan/lsan.h | ||
---|---|---|
20 | Yes, this is caused by clang-format. What should I do about it? | |
20 | @vitalybuka Should I create a patch, where I format the whole file or just these #includes? |
compiler-rt/lib/lsan/lsan.h | ||
---|---|---|
20 | You can ignore my comment, whilst I think the lsan .clang-format file is incorrect in terms of matching the LLVM style it is correct from the lsan subprojects point of view. I see now that you are doing what is in the local .clang-format file and its the rest of the file is incorrect. (you would handle that separately or not at all) The fact that the .clang-format file doesn't follow the LLVM style is unfortunate, but as I expressed thats upto you all. |
Please check "Edit Related Revisions" on the right of Phabricator. You can nicely chain this patches into "Stack"
@vitalybuka could you or someone else help me get the lsan tests running on Windows?
For example lets look at compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp (and basically all other tests, this serves as an easy example)
I imagine that the following defines a variable called LSAN_BASE :
// RUN: LSAN_BASE="use_stacks=0:use_registers=0"
This doesn't seem to work on Windows.
It gives the following error message:
[build] $ ":" "RUN: at line 2" [build] $ "LSAN_BASE=use_stacks=0:use_registers=0" [build] # command stderr: [build] 'LSAN_BASE=use_stacks=0:use_registers=0': command not found
Can you try to remove LSAN_BASE and just hardcode use_stacks=0:use_registers=0 ?
If it works don't mind if we do so for the rest (in a separate patch)
The problem is, that LSAN_BASE is not always use_stacks=0:use_registers=0.
Also there are many test cases, where LSAN_BASE gets used multiple times.
Removing it would result in bad repetition.
Off-cause , I am asking to replace with correct value each test.
Also there are many test cases, where LSAN_BASE gets used multiple times.
Removing it would result in bad repetition.
I don't like LSAN_BASE even without Windows.
Usually when test fail, I have a single command line to reproduce or put into debugger for just the step. Here I need to cherry-pick two line: LSAN_BASE= and actual step
also in most case it used just twice, so it's not a big loss in repetition
@vitalybuka I have now removed all LSAN_BASE occurrences.
However the invocations of the compiled test executables don't seem to work either.
This doesn't work:
// RUN: %env_lsan_opts=use_stacks=0:use_registers=0 not %run %t foo 2>&1 | FileCheck %s --check-prefix=CHECK-do
It generates this error:
[build] $ ":" "RUN: at line 3" [build] $ "env" "LSAN_OPTIONS=:detect_leaks=1:use_stacks=0:use_registers=0" "not" "E:\git\llvm-project\llvm\build\ninja_debug\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\leak_check_at_exit.cpp.tmp.exe" "foo" [build] note: command had no output on stdout or stderr [build] error: command failed with exit status: True
One of the problems seems to be that %t doesn't have a .exe ending on Windows.
I also tried changing %t to %t.exe, but that doesn't fix the problem and produces the same error.
I think the not command(?) is the part that fails?
When I run it on my own, it does work:
PS E:\git\llvm-project> & 'E:\git\llvm-project\llvm\build\ninja_debug\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\leak_check_at_exit.cpp.tmp.exe' Test alloc: 000061A000000000. ================================================================= ==25768==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1337 byte(s) in 1 object(s) allocated from: #0 0x7ff6e2f745be in __asan_wrap_malloc E:\git\llvm-project\compiler-rt\lib\lsan\lsan_interceptors.cpp:75 #1 0x7ff6e2f2aade in main E:\git\llvm-project\compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp:13:40 #2 0x7ff6e2f79b5b in __scrt_common_main_seh d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 #3 0x7fffde187033 (C:\WINDOWS\System32\KERNEL32.DLL+0x180017033) #4 0x7fffde8a2650 (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180052650) SUMMARY: LeakSanitizer: 1337 byte(s) leaked in 1 allocation(s).
@vitalybuka
No check-asan doesn't work for me. It just hangs forever and does absolutely nothing. No output, nothing showing up in Task Manager with high CPU usage or anything.
However check-clang does work. Is there some documentation on running check-asan on windows.
I also saw that none of the Windows buildbots check/test anything related to sanitizers. It seems to me that none of the sanitizer stuff gets ever tested on Windows?
Do the test for sanitizers_common/asan/ubsan/etc. even work on Windows?
This one runs asan on Windows, you can try to replicated using logs. https://lab.llvm.org/buildbot/#/builders/127
I rarely work on Windows, but i did similar setup some time ago.
I will just confirm that check-asan is supposed to work, it worked for me a few months ago, and the linked buildbot tests it continuously.
However, WinASan is very sensitive to the OS and SDK version. The failure mode you described sounds plausible, something like a crash on startup that results in a message box that you can't see or something like that.
See a recent change rG7cd109b92c72855937273a6c8ab19016fbe27d33 by @aganea, who presumably was able to run the ASan tests for it.
It does work indeed. I am running under a "x64 Native Tools Command Prompt for VS 2019" prompt with MSVC 16.11.8.
D:\git\llvm-project>mkdir release && cd release D:\git\llvm-project\release>cmake ../llvm -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lld;llvm;compiler-rt" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PDB=ON -DLLVM_ENABLE_ASSERTIONS=ON ... D:\git\llvm-project\release>ninja check-asan [0/1] Running the AddressSanitizer tests -- Testing: 669 tests, 16 workers -- Testing: 0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. Testing Time: 86.33s Unsupported : 229 Passed : 426 Expectedly Failed: 14
However you might need some Unix tools that don't come out-of-the-box on Windows.
You could first install the scoop package manager and then install any tool that you need:
D:\git\llvm-project>powershell ... D:\git\llvm-project [main ↓9 +10 ~0 -0 !]> iwr -useb get.scoop.sh | iex ... D:\git\llvm-project [main ↓9 +10 ~0 -0 !]> scoop install coreutils grep make sed unxutils sysinternals
And then run ninja check-asan again.
If you install sysinternals with scoop above, you can run procexp64 prior to ninja check-asan. That will give a much finer view on the running processes, as compared to the Task Manager. Let me know if I can help further.
@aganea Thanks for your help. I already did everything you wrote (I am however using the coreutils provided by Git).
check-asan works on my new computer. I still don't know why it didn't work on my older one...
I will try to get the lsan tests working (and hopefully passing 😉) on windows soon.
All the lsan tests, when run in lsan+asan mode, Deadlock in StopTheWord when creating the tracer thread...
-> check-lsan gets stuck, due to all the deadlocking tests :(
Could it be caused by asan intercepting the Win32 calls in `StopTheWorld?
@vitalybuka Could this be a problem? If so, how can I call the "real" functions, without being intercepted by asan?
Is there a way (for now), to only run the lsan tests in lsan only mode (no asan)?
Updated the Tests.
The Problem with Lsan+Asan still exists. I have just disabled them for now.
I hope anyone can help me with getting the tests to run correctly?
Hi,
today I found some more time to play/progress with this patch. I'm generally optimistic that most of the basic lsan stuff should already be working on Windows.
However without some help of you, I'm unable to get the tests to properly work. The main Problem I looked into today is that llvm-lit.py seems to be somehow unable to capture the stdout/err of the tests. More precisely my Debugging showed that when I insert a print(procs[-1].stdout.read()) at TestRunner.py:790 I get the correctly captured stdout of a test e.g.:
================================================================= ==25028==ERROR: LeakSanitizer: detected memory leaks Direct leak of 1337 byte(s) in 1 object(s) allocated from: #0 0x7ff77fc0488e in __asan_wrap_malloc C:\Users\cleme\dev\cpp\llvm-project\compiler-rt\lib\lsan\lsan_interceptors.cpp:75 #1 0x7ff77fbbaafe in main C:\Users\cleme\dev\cpp\llvm-project\compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp:13:40 #2 0x7ff77fc09e2b in invoke_main d:\a01\_work\43\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78 #3 0x7ff77fc09e2b in __scrt_common_main_seh d:\a01\_work\43\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288 #4 0x7ff8cb5654df (C:\Windows\System32\KERNEL32.DLL+0x1800154df) #5 0x7ff8cc9e485a (C:\Windows\SYSTEM32\ntdll.dll+0x18000485a) SUMMARY: LeakSanitizer: 1337 byte(s) leaked in 1 allocation(s).
However further below at lines 816-829 this output is lost (proc.stdout.read() returns a empty string, which obviously causes any FileCheck afterwards to fail)... and I can't figure out why. The only possible explaination I can think of is that the captured stdout is lost during the next loopiteration over cmd.commands
I was wrong, the Problem was, that I didn't change the exit code, when a leak gets detected.
I now intercept ExitProcess and change the exit code in there, with this there are already 12 Tests passing on Windows.
Since I am currently unable to intercept the ExitProcess/TerminateProcess in the ucrt (exit_or_terminate_process in Windows Kits\10\Source\10.0.22000.0\ucrt\startup\exit.cpp:143)
I inserted a call to Die in HandleLeaks (just like on Linux). This brings the passing Test up to 22
I have now split the lsan TestCases chages into https://reviews.llvm.org/D124322
and managed to get the lsan+asan TestCases nearly to work. The current problem is,
that during the start of the RunThread in StopTheWorld, we are intercepting a call
to calloc done by the ucrt in ucrtbase!DllMainDispatch, which somehow results in a deadlock.
If someone has an idea, how we could avoid/fix this, I would be very happy to hear it.
@vitalybuka
I've been working on this on-off for the past months and have gotten the tests in the following state:
Unsupported: 41 Passed : 19 Failed : 48
There is still the bug with the tracer thread deadlocking during an allocation made in the Microsoft CRT thread setup code when using Asan+Lsan.
I hope I can get the failing tests passing.
You probably can try to finish pure lsan, and then get back to lsan+asan.
So if you continue to work please send smaller and ready pieces for review?
@vitalybuka There seems to be a bug in MSVCs bit-field implementation, which causes the disable.c test to fail.
This reproducer (assert) passes with gcc and fails with MSVC (m.tag is 0xffffffff):
#include <stdint.h> #include <assert.h> enum ChunkTag { kDirectlyLeaked = 0, // default kIndirectlyLeaked = 1, kReachable = 2, kIgnored = 3 }; struct ChunkMetadata { uint8_t allocated : 8; // Must be first. ChunkTag tag : 2; uintptr_t requested_size : 54; uint32_t stack_trace_id; }; int main() { ChunkMetadata m; m.tag = kIgnored; assert(m.tag == kIgnored); }
Do you have a suggestion how I could fix this on MSVC
@clemenswasser The bug is there for years, I'm surprised it hasn't been fixed yet. MSVC doesn't support different types with bitfields. You should do:
struct ChunkMetadata { uint64_t allocated : 8; // Must be first. uint64_t tag : 2; uint64_t requested_size : 54; uint32_t stack_trace_id; };
Wow, this fixed a lot of tests, down to only 17 lsan standalone tests failing :) (26 already passing)
I am currently unable to progress as all of my interceptors are not working.
The interceptors are failing, because the GetInstructionSize function fails.
For example, the CreateThread Win32 function starts on my Windows 11 installation with 4c 8b dc mov r11,rsp (in little-endian hex: 0xdc8b4c).
I get this exact value when using the Visual Studio Watch Window with the expression: 0x00FFFFFF & *(uint32_t*)address while debugging GetInstructionSize.
This was then verified by me again in the Visual Studio Disassembly View of CreateThread and a third time via this simple test program:
#include <Windows.h> #include <stdio.h> #include <string> int main() { auto* ptr = CreateThread; printf("CreateThread: %p\n", ptr); printf("CreateThread first bytes: %s\n", std::to_string(*(int*)ptr).c_str()); }
This exact instruction is correctly handled by the following switch in GetInstructionSize:
switch (0x00FFFFFF & *(u32*)address) { ... case 0xdc8b4c: // 4c 8b dc : mov r11, rsp return 3;
But GetInstructionSize never returns 3 as the switch condition reads 0xdc8bcc from address which is != 0xdc8b4c and instead encodes:
0: cc int3 1: 8b dc mov ebx,esp
I don't really understand why this would read this obviously wrong value when my similar test program reads it correctly.
Does anyone of you know why this might be happening and how I could fix this?
(BTW. Asan also intercepts CreateThreads and that seems to work...)
@clemenswasser We hit this once in a while, when Microsoft update their libraries. But I'm surprised about the CC at the beginning. How can this issue be reproduced? Can you show a disassembly dump from the VS debugger, around the address of the function that fails?
@aganea You can find my current changes on my D115103 branch. I am using the currently failing use_stacks.cpp test.
Something like this should make debugging of this issue possible:
> cd llvm-project > cmake -G Ninja -B build -D CMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS=clang;compiler-rt > cmake --build build -t check-lsan > cp build\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\use_stacks.cpp.tmp build\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\use_stacks.cpp.exe > devenv /debugexe build\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\use_stacks.cpp.exe
You should see that the LSAN_MAYBE_INTERCEPT_CREATE_THREAD in __lsan::InitializeInterceptors fails because of the wrong instruction read of the CreateThread function address in GetInstructionSize as described above.
clang-format: please reformat the code