This is an archive of the discontinued LLVM Phabricator instance.

Leak Sanitizer port to Windows
Needs RevisionPublic

Authored by clemenswasser on Dec 4 2021, 12:11 PM.

Details

Summary

Initial Leak Sanitizer port to Windows.

Diff Detail

Event Timeline

clemenswasser created this revision.Dec 4 2021, 12:11 PM
clemenswasser requested review of this revision.Dec 4 2021, 12:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2021, 12:11 PM
Herald added subscribers: Restricted Project, cfe-commits. · View Herald Transcript
clemenswasser edited the summary of this revision. (Show Details)Dec 4 2021, 12:36 PM

Changed to core.autocrlf=false

Move __lsan_init call inside of SANITIZER_WINDOWS ifdef

Apply clang-format

Use LF line endings

clemenswasser edited the summary of this revision. (Show Details)Dec 4 2021, 2:44 PM
MyDeveloperDay added inline comments.
compiler-rt/lib/lsan/lsan.h
20

clang-format?

compiler-rt/lib/lsan/lsan_common.h
49

you probably don't want to change this do you?

You need to add some reviewers to this revision. check out the CODE_OWNERS.txt or what I sometimes do is to git log one of the files you are changing and see who is a major contributor.

Remove wrong interceptors

vitalybuka added a subscriber: rnk.

LGTM in general, but it can be cut into smaller patches.

+ @rnk for Windows suff

compiler-rt/lib/lsan/lsan.h
20

@clemenswasser Can you please a separate tiny patch which clang-format nearby lines

compiler-rt/lib/lsan/lsan_allocator.cpp
30 ↗(On Diff #392017)

This also can go into a separate patch
BTW I would slightly prefer just UL -> ULL

compiler-rt/lib/lsan/lsan_common.cpp
503

I guess do needs this as well on windows
The problem is that we have no MemoryMappingLayout implementation for Windows.
I guess it can be done with VirtualQueryEx
for now can you just ifdef only body of the ProcessRootRegion with TODO?

compiler-rt/lib/lsan/lsan_common.h
49

I am fine like this, fighting clang-format is not useful
however it would be nice if you clang-format relevant block in a separate patch then then rebase this patch on top of it

compiler-rt/lib/lsan/lsan_common_win.cpp
49

how this can work without ProcessGlobalRegions?

compiler-rt/lib/lsan/lsan_interceptors.cpp
540

Can you please just convert unneeded stuff from INTERCEPT_FUNCTION into LSAN_MAYBE_INTERCEPT
In a separate patch

compiler-rt/lib/sanitizer_common/sanitizer_stoptheworld_win.cpp
4 ↗(On Diff #392017)

sanitizer_stoptheworld_test.cpp needs to be ported to windows, probably with std::
can you please move this work and sanitizer_stoptheworld_win.cpp into separate patch/patches

MyDeveloperDay added inline comments.Dec 7 2021, 12:02 AM
compiler-rt/lib/lsan/lsan.h
20

Sorry, is there a reason we can't follow the current style in this patch? sorry did I miss something? Its not about fighting clang-format its about ensuring we follow the style from the lines around it which are following the clang-format style. I don't need you to clang-format the whole file (although that would be great because this directory has very low clang-format status)


https://clang.llvm.org/docs/ClangFormattedStatus.html

All I'm asking is that we don't indent the preprocessor directives to a non LLVM style.

i.e.

#include "lsan_win.h"
#if !SANITIZER_WINDOWS
#endif

vs

#  include "lsan_win.h"
#  if !SANITIZER_WINDOWS
#  endif
MyDeveloperDay added inline comments.Dec 7 2021, 12:10 AM
compiler-rt/lib/lsan/lsan_common.h
49

it might not seem useful to individuals, but I think its the guidance, and a consistent style really help new people to understand what's going on.

https://llvm.org/docs/Contributing.html#how-to-submit-a-patch

I'm a little surprise the "llvm-premerge-checks" didn't highlight this @kuhnel

oh! gosh I'm sorry I didn't realize that lsan has this non standard LLVM style .clang-format, my apologies! D100238: [sanitizer] Set IndentPPDirectives: AfterHash in .clang-format

kind of surprised TBH, doesn't feel the indentation happens enough to warrant it, but that's up to you its not my backyard.

$ grep "#if" *.cpp | wc -l
461
llvm-project/compiler-rt/lib/sanitizer_common
$ grep "# if" *.cpp | wc -l
27
clemenswasser added inline comments.Dec 7 2021, 7:44 AM
compiler-rt/lib/lsan/lsan.h
20

Yes, this is caused by clang-format. What should I do about it?

20

@vitalybuka Should I create a patch, where I format the whole file or just these #includes?

MyDeveloperDay added inline comments.Dec 7 2021, 8:25 AM
compiler-rt/lib/lsan/lsan.h
20

You can ignore my comment, whilst I think the lsan .clang-format file is incorrect in terms of matching the LLVM style it is correct from the lsan subprojects point of view. I see now that you are doing what is in the local .clang-format file and its the rest of the file is incorrect. (you would handle that separately or not at all)

The fact that the .clang-format file doesn't follow the LLVM style is unfortunate, but as I expressed thats upto you all.

Split this patch up into multiple smaller ones:

clemenswasser marked 2 inline comments as done.
clemenswasser marked 4 inline comments as done.

Split this patch up into multiple smaller ones:

Please check "Edit Related Revisions" on the right of Phabricator. You can nicely chain this patches into "Stack"

Split this patch up into multiple smaller ones:

Implement ProcessGlobalRegions
Fix Interceptors for Windows

vitalybuka added inline comments.Dec 16 2021, 1:30 PM
compiler-rt/lib/lsan/lsan_allocator.h
52–54
68–69
compiler-rt/lib/lsan/lsan_common.cpp
249–254

same

clemenswasser marked an inline comment as done.
clemenswasser marked 2 inline comments as done.

Simplify Architecture checks

clemenswasser marked 3 inline comments as done.Dec 17 2021, 1:54 PM

@vitalybuka could you or someone else help me get the lsan tests running on Windows?
For example lets look at compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp (and basically all other tests, this serves as an easy example)
I imagine that the following defines a variable called LSAN_BASE :
// RUN: LSAN_BASE="use_stacks=0:use_registers=0"
This doesn't seem to work on Windows.
It gives the following error message:

[build] $ ":" "RUN: at line 2"
[build] $ "LSAN_BASE=use_stacks=0:use_registers=0"
[build] # command stderr:
[build] 'LSAN_BASE=use_stacks=0:use_registers=0': command not found

@vitalybuka could you or someone else help me get the lsan tests running on Windows?
For example lets look at compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp (and basically all other tests, this serves as an easy example)
I imagine that the following defines a variable called LSAN_BASE :
// RUN: LSAN_BASE="use_stacks=0:use_registers=0"
This doesn't seem to work on Windows.
It gives the following error message:

[build] $ ":" "RUN: at line 2"
[build] $ "LSAN_BASE=use_stacks=0:use_registers=0"
[build] # command stderr:
[build] 'LSAN_BASE=use_stacks=0:use_registers=0': command not found

Can you try to remove LSAN_BASE and just hardcode use_stacks=0:use_registers=0 ?
If it works don't mind if we do so for the rest (in a separate patch)

The problem is, that LSAN_BASE is not always use_stacks=0:use_registers=0.
Also there are many test cases, where LSAN_BASE gets used multiple times.
Removing it would result in bad repetition.

The problem is, that LSAN_BASE is not always use_stacks=0:use_registers=0.

Off-cause , I am asking to replace with correct value each test.

Also there are many test cases, where LSAN_BASE gets used multiple times.
Removing it would result in bad repetition.

I don't like LSAN_BASE even without Windows.
Usually when test fail, I have a single command line to reproduce or put into debugger for just the step. Here I need to cherry-pick two line: LSAN_BASE= and actual step
also in most case it used just twice, so it's not a big loss in repetition

@vitalybuka I have now removed all LSAN_BASE occurrences.
However the invocations of the compiled test executables don't seem to work either.
This doesn't work:
// RUN: %env_lsan_opts=use_stacks=0:use_registers=0 not %run %t foo 2>&1 | FileCheck %s --check-prefix=CHECK-do
It generates this error:

[build] $ ":" "RUN: at line 3"
[build] $ "env" "LSAN_OPTIONS=:detect_leaks=1:use_stacks=0:use_registers=0" "not" "E:\git\llvm-project\llvm\build\ninja_debug\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\leak_check_at_exit.cpp.tmp.exe" "foo"
[build] note: command had no output on stdout or stderr
[build] error: command failed with exit status: True

One of the problems seems to be that %t doesn't have a .exe ending on Windows.
I also tried changing %t to %t.exe, but that doesn't fix the problem and produces the same error.
I think the not command(?) is the part that fails?

When I run it on my own, it does work:

PS E:\git\llvm-project> & 'E:\git\llvm-project\llvm\build\ninja_debug\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\leak_check_at_exit.cpp.tmp.exe'
Test alloc: 000061A000000000.

=================================================================
==25768==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1337 byte(s) in 1 object(s) allocated from:
    #0 0x7ff6e2f745be in __asan_wrap_malloc E:\git\llvm-project\compiler-rt\lib\lsan\lsan_interceptors.cpp:75
    #1 0x7ff6e2f2aade in main E:\git\llvm-project\compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp:13:40
    #2 0x7ff6e2f79b5b in __scrt_common_main_seh d:\a01\_work\6\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #3 0x7fffde187033  (C:\WINDOWS\System32\KERNEL32.DLL+0x180017033)
    #4 0x7fffde8a2650  (C:\WINDOWS\SYSTEM32\ntdll.dll+0x180052650)

SUMMARY: LeakSanitizer: 1337 byte(s) leaked in 1 allocation(s).

@vitalybuka I have now removed all LSAN_BASE occurrences.
However the invocations of the compiled test executables don't seem to work either.
This doesn't work:
// RUN: %env_lsan_opts=use_stacks=0:use_registers=0 not %run %t foo 2>&1 | FileCheck %s --check-prefix=CHECK-do
It generates this error:

[build] $ ":" "RUN: at line 3"
[build] $ "env" "LSAN_OPTIONS=:detect_leaks=1:use_stacks=0:use_registers=0" "not" "E:\git\llvm-project\llvm\build\ninja_debug\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\leak_check_at_exit.cpp.tmp.exe" "foo"
[build] note: command had no output on stdout or stderr
[build] error: command failed with exit status: True

does check-asan work for your setup?

@vitalybuka
No check-asan doesn't work for me. It just hangs forever and does absolutely nothing. No output, nothing showing up in Task Manager with high CPU usage or anything.
However check-clang does work. Is there some documentation on running check-asan on windows.
I also saw that none of the Windows buildbots check/test anything related to sanitizers. It seems to me that none of the sanitizer stuff gets ever tested on Windows?
Do the test for sanitizers_common/asan/ubsan/etc. even work on Windows?

@vitalybuka
No check-asan doesn't work for me. It just hangs forever and does absolutely nothing. No output, nothing showing up in Task Manager with high CPU usage or anything.
However check-clang does work. Is there some documentation on running check-asan on windows.
I also saw that none of the Windows buildbots check/test anything related to sanitizers. It seems to me that none of the sanitizer stuff gets ever tested on Windows?
Do the test for sanitizers_common/asan/ubsan/etc. even work on Windows?

This one runs asan on Windows, you can try to replicated using logs. https://lab.llvm.org/buildbot/#/builders/127
I rarely work on Windows, but i did similar setup some time ago.

rnk added a comment.Jan 6 2022, 2:39 PM

@vitalybuka
No check-asan doesn't work for me. It just hangs forever and does absolutely nothing. No output, nothing showing up in Task Manager with high CPU usage or anything.
However check-clang does work. Is there some documentation on running check-asan on windows.
I also saw that none of the Windows buildbots check/test anything related to sanitizers. It seems to me that none of the sanitizer stuff gets ever tested on Windows?
Do the test for sanitizers_common/asan/ubsan/etc. even work on Windows?

This one runs asan on Windows, you can try to replicated using logs. https://lab.llvm.org/buildbot/#/builders/127
I rarely work on Windows, but i did similar setup some time ago.

I will just confirm that check-asan is supposed to work, it worked for me a few months ago, and the linked buildbot tests it continuously.

However, WinASan is very sensitive to the OS and SDK version. The failure mode you described sounds plausible, something like a crash on startup that results in a message box that you can't see or something like that.

rnk added a subscriber: aganea.Jan 6 2022, 2:40 PM

See a recent change rG7cd109b92c72855937273a6c8ab19016fbe27d33 by @aganea, who presumably was able to run the ASan tests for it.

aganea added a comment.EditedJan 6 2022, 4:06 PM

It does work indeed. I am running under a "x64 Native Tools Command Prompt for VS 2019" prompt with MSVC 16.11.8.

D:\git\llvm-project>mkdir release && cd release
D:\git\llvm-project\release>cmake ../llvm -G Ninja -DLLVM_ENABLE_PROJECTS="clang;lld;llvm;compiler-rt" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PDB=ON -DLLVM_ENABLE_ASSERTIONS=ON
...
D:\git\llvm-project\release>ninja check-asan
[0/1] Running the AddressSanitizer tests
-- Testing: 669 tests, 16 workers --
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90..

Testing Time: 86.33s
  Unsupported      : 229
  Passed           : 426
  Expectedly Failed:  14

However you might need some Unix tools that don't come out-of-the-box on Windows.
You could first install the scoop package manager and then install any tool that you need:

D:\git\llvm-project>powershell
...
D:\git\llvm-project [main ↓9 +10 ~0 -0 !]> iwr -useb get.scoop.sh | iex
...
D:\git\llvm-project [main ↓9 +10 ~0 -0 !]> scoop install coreutils grep make sed unxutils sysinternals

And then run ninja check-asan again.

If you install sysinternals with scoop above, you can run procexp64 prior to ninja check-asan. That will give a much finer view on the running processes, as compared to the Task Manager. Let me know if I can help further.

@aganea Thanks for your help. I already did everything you wrote (I am however using the coreutils provided by Git).
check-asan works on my new computer. I still don't know why it didn't work on my older one...
I will try to get the lsan tests working (and hopefully passing 😉) on windows soon.

All the lsan tests, when run in lsan+asan mode, Deadlock in StopTheWord when creating the tracer thread...


-> check-lsan gets stuck, due to all the deadlocking tests :(

Could it be caused by asan intercepting the Win32 calls in `StopTheWorld?
@vitalybuka Could this be a problem? If so, how can I call the "real" functions, without being intercepted by asan?
Is there a way (for now), to only run the lsan tests in lsan only mode (no asan)?

clemenswasser edited the summary of this revision. (Show Details)

Updated the Tests.
The Problem with Lsan+Asan still exists. I have just disabled them for now.
I hope anyone can help me with getting the tests to run correctly?

GhisHD added a subscriber: GhisHD.Mar 16 2022, 5:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2022, 5:02 PM
clemenswasser marked an inline comment as done.Apr 9 2022, 1:31 AM

Hi,
today I found some more time to play/progress with this patch. I'm generally optimistic that most of the basic lsan stuff should already be working on Windows.
However without some help of you, I'm unable to get the tests to properly work. The main Problem I looked into today is that llvm-lit.py seems to be somehow unable to capture the stdout/err of the tests. More precisely my Debugging showed that when I insert a print(procs[-1].stdout.read()) at TestRunner.py:790 I get the correctly captured stdout of a test e.g.:

=================================================================
==25028==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 1337 byte(s) in 1 object(s) allocated from:
    #0 0x7ff77fc0488e in __asan_wrap_malloc C:\Users\cleme\dev\cpp\llvm-project\compiler-rt\lib\lsan\lsan_interceptors.cpp:75
    #1 0x7ff77fbbaafe in main C:\Users\cleme\dev\cpp\llvm-project\compiler-rt\test\lsan\TestCases\leak_check_at_exit.cpp:13:40
    #2 0x7ff77fc09e2b in invoke_main d:\a01\_work\43\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:78
    #3 0x7ff77fc09e2b in __scrt_common_main_seh d:\a01\_work\43\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
    #4 0x7ff8cb5654df  (C:\Windows\System32\KERNEL32.DLL+0x1800154df)
    #5 0x7ff8cc9e485a  (C:\Windows\SYSTEM32\ntdll.dll+0x18000485a)

SUMMARY: LeakSanitizer: 1337 byte(s) leaked in 1 allocation(s).

However further below at lines 816-829 this output is lost (proc.stdout.read() returns a empty string, which obviously causes any FileCheck afterwards to fail)... and I can't figure out why. The only possible explaination I can think of is that the captured stdout is lost during the next loopiteration over cmd.commands

I was wrong, the Problem was, that I didn't change the exit code, when a leak gets detected.
I now intercept ExitProcess and change the exit code in there, with this there are already 12 Tests passing on Windows.

clemenswasser edited the summary of this revision. (Show Details)Apr 10 2022, 8:51 AM

Since I am currently unable to intercept the ExitProcess/TerminateProcess in the ucrt (exit_or_terminate_process in Windows Kits\10\Source\10.0.22000.0\ucrt\startup\exit.cpp:143)
I inserted a call to Die in HandleLeaks (just like on Linux). This brings the passing Test up to 22

I always forget to change the line endings, sorry :(

I have now split the lsan TestCases chages into https://reviews.llvm.org/D124322
and managed to get the lsan+asan TestCases nearly to work. The current problem is,
that during the start of the RunThread in StopTheWorld, we are intercepting a call
to calloc done by the ucrt in ucrtbase!DllMainDispatch, which somehow results in a deadlock.
If someone has an idea, how we could avoid/fix this, I would be very happy to hear it.

clemenswasser marked an inline comment as done.Apr 23 2022, 2:04 AM
vitalybuka requested changes to this revision.Dec 7 2022, 1:30 PM

Is this still relevant?

This revision now requires changes to proceed.Dec 7 2022, 1:30 PM

@vitalybuka
I've been working on this on-off for the past months and have gotten the tests in the following state:

Unsupported: 41
Passed     : 19
Failed     : 48

There is still the bug with the tracer thread deadlocking during an allocation made in the Microsoft CRT thread setup code when using Asan+Lsan.
I hope I can get the failing tests passing.

You probably can try to finish pure lsan, and then get back to lsan+asan.
So if you continue to work please send smaller and ready pieces for review?

clemenswasser added a comment.EditedFeb 1 2023, 11:58 AM

@vitalybuka There seems to be a bug in MSVCs bit-field implementation, which causes the disable.c test to fail.
This reproducer (assert) passes with gcc and fails with MSVC (m.tag is 0xffffffff):

#include <stdint.h>
#include <assert.h>

enum ChunkTag {
  kDirectlyLeaked = 0,  // default
  kIndirectlyLeaked = 1,
  kReachable = 2,
  kIgnored = 3
};

struct ChunkMetadata {
  uint8_t allocated : 8;  // Must be first.
  ChunkTag tag : 2;
  uintptr_t requested_size : 54;
  uint32_t stack_trace_id;
};

int main() {
    ChunkMetadata m;
    m.tag = kIgnored;
    assert(m.tag == kIgnored);
}

Do you have a suggestion how I could fix this on MSVC

Do you have a suggestion how I could fix this on MSVC

@clemenswasser The bug is there for years, I'm surprised it hasn't been fixed yet. MSVC doesn't support different types with bitfields. You should do:

struct ChunkMetadata {
  uint64_t allocated : 8;  // Must be first.
  uint64_t tag : 2;
  uint64_t requested_size : 54;
  uint32_t stack_trace_id;
};

@aganea Thank you very much!

Wow, this fixed a lot of tests, down to only 17 lsan standalone tests failing :) (26 already passing)

luke957 removed a subscriber: luke957.Feb 2 2023, 11:29 PM

I am currently unable to progress as all of my interceptors are not working.
The interceptors are failing, because the GetInstructionSize function fails.
For example, the CreateThread Win32 function starts on my Windows 11 installation with 4c 8b dc mov r11,rsp (in little-endian hex: 0xdc8b4c).
I get this exact value when using the Visual Studio Watch Window with the expression: 0x00FFFFFF & *(uint32_t*)address while debugging GetInstructionSize.
This was then verified by me again in the Visual Studio Disassembly View of CreateThread and a third time via this simple test program:

#include <Windows.h>
#include <stdio.h>
#include <string>
int main() {
  auto* ptr = CreateThread;
  printf("CreateThread: %p\n", ptr);
  printf("CreateThread first bytes: %s\n", std::to_string(*(int*)ptr).c_str());
}

This exact instruction is correctly handled by the following switch in GetInstructionSize:

switch (0x00FFFFFF & *(u32*)address) {
  ...
  case 0xdc8b4c:    // 4c 8b dc : mov r11, rsp
    return 3;

But GetInstructionSize never returns 3 as the switch condition reads 0xdc8bcc from address which is != 0xdc8b4c and instead encodes:

0:  cc                      int3
1:  8b dc                   mov    ebx,esp

I don't really understand why this would read this obviously wrong value when my similar test program reads it correctly.
Does anyone of you know why this might be happening and how I could fix this?
(BTW. Asan also intercepts CreateThreads and that seems to work...)

@clemenswasser We hit this once in a while, when Microsoft update their libraries. But I'm surprised about the CC at the beginning. How can this issue be reproduced? Can you show a disassembly dump from the VS debugger, around the address of the function that fails?

@aganea You can find my current changes on my D115103 branch. I am using the currently failing use_stacks.cpp test.
Something like this should make debugging of this issue possible:

> cd llvm-project
> cmake -G Ninja -B build -D CMAKE_BUILD_TYPE=Debug -DLLVM_ENABLE_PROJECTS=clang;compiler-rt
> cmake --build build -t check-lsan
> cp build\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\use_stacks.cpp.tmp build\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\use_stacks.cpp.exe
> devenv /debugexe build\projects\compiler-rt\test\lsan\X86_64LsanConfig\TestCases\Output\use_stacks.cpp.exe

You should see that the LSAN_MAYBE_INTERCEPT_CREATE_THREAD in __lsan::InitializeInterceptors fails because of the wrong instruction read of the CreateThread function address in GetInstructionSize as described above.

vitalybuka requested changes to this revision.Aug 25 2023, 3:16 PM
This revision now requires changes to proceed.Aug 25 2023, 3:16 PM