Page MenuHomePhabricator

PrintStackTrace: don't symbolize if LLVM_DISABLE_SYMBOLIZATION is set
ClosedPublic

Authored by MaskRay on Aug 18 2020, 2:12 PM.

Details

Summary

See http://lists.llvm.org/pipermail/llvm-dev/2017-June/113975.html for a related previous discussion.
Many tools install signal handlers to print stack traces and optionally
symbolize the addresses with an external program 'llvm-symbolizer' (when
searching for 'llvm-symbolizer', the directory containg the executable
is preferred over PATH).

'llvm-symbolizer' can be slow if the executable is large and/or if
llvm-symbolizer' itself is under-optimized. For example, my 'llvm-lto2' from a
-DCMAKE_BUILD_TYPE=Debug build is 443MiB. The 'llvm-symbolizer' from the same
build takes ~2s to symbolize it. (An optimized 'llvm-symbolizer' takes 0.34s).
A crashed clang may take more than 5s to symbolize a stack trace.

If a test file has several not --crash RUN lines. It can be very slow in a Debug build.
This patch makes not --crash set an environment variable to suppress symbolization.
This is similar to D33804 which uses a command line option.
I pick 'symbolization' instead of 'symbolication' because the former is
used much more commonly and its stem matches 'llvm-symbolizer'.

Also set LLVM_DISABLE_CRASH_REPORT=1, which is currently only applicable on
__APPLE__.

Diff Detail

Event Timeline

MaskRay created this revision.Aug 18 2020, 2:12 PM
MaskRay requested review of this revision.Aug 18 2020, 2:12 PM
MaskRay edited the summary of this revision. (Show Details)Aug 18 2020, 2:16 PM

Rather than adding a new option, could we generalize LLVM_DISABLE_CRASH_REPORT to work on all platforms & wire that up to not --crash as you've done, without adding another name for this functionality?

Actually it looks like if we removed the Apple-specific filtering of ENABLE_CRASH_OVERRIDE, then the handling in Signals.inc which also supports the runtime API parameter "DisableCrashReporting" would also do what I was proposing for unit tests (disabling crash reporting on gtest) - which has been the default on MacOS builds for 5 years now? I'm not sure I feel perfectly good about disabling crash reporting for all unit test failures, rather than only disabling it around death tests - but I think it's probably good to make the perf tradeoffs the same on different platforms, and that this has been the default for 5 years for a good chunk of the LLVM developers (those on mac) it's probably good enough to ship.

MaskRay added a comment.EditedAug 18 2020, 2:38 PM

Rather than adding a new option, could we generalize LLVM_DISABLE_CRASH_REPORT to work on all platforms & wire that up to not --crash as you've done, without adding another name for this functionality?

Actually it looks like if we removed the Apple-specific filtering of ENABLE_CRASH_OVERRIDE, then the handling in Signals.inc which also supports the runtime API parameter "DisableCrashReporting" would also do what I was proposing for unit tests (disabling crash reporting on gtest) - which has been the default on MacOS builds for 5 years now? I'm not sure I feel perfectly good about disabling crash reporting for all unit test failures, rather than only disabling it around death tests - but I think it's probably good to make the perf tradeoffs the same on different platforms, and that this has been the default for 5 years for a good chunk of the LLVM developers (those on mac) it's probably good enough to ship.

http://lists.llvm.org/pipermail/llvm-dev/2017-May/113292.html says the following. Your comment suggests the opposite. I am fine with either way.

I actually slightly prefer the name LLVM_DISABLE_DISABLE_SYMBOLIZATION over LLVM_DISABLE_CRASH_REPORT because the former appears to be more appropriate.

We have the LLVM_DISABLE_CRASH_REPORT environment variable, but that's more about whether we should do crash reporting or not.

It would be pretty reasonable to have another one to disable all this stuff. It would also be reasonable to have a cmake option that compiles this stuff away, since it basically never works on user machines that don't have debug info.

MaskRay edited the summary of this revision. (Show Details)Aug 18 2020, 2:49 PM
pete added a comment.Aug 18 2020, 4:22 PM

Rather than adding a new option, could we generalize LLVM_DISABLE_CRASH_REPORT to work on all platforms & wire that up to not --crash as you've done, without adding another name for this functionality?

Actually it looks like if we removed the Apple-specific filtering of ENABLE_CRASH_OVERRIDE, then the handling in Signals.inc which also supports the runtime API parameter "DisableCrashReporting" would also do what I was proposing for unit tests (disabling crash reporting on gtest) - which has been the default on MacOS builds for 5 years now? I'm not sure I feel perfectly good about disabling crash reporting for all unit test failures, rather than only disabling it around death tests - but I think it's probably good to make the perf tradeoffs the same on different platforms, and that this has been the default for 5 years for a good chunk of the LLVM developers (those on mac) it's probably good enough to ship.

Seems reasonable to me too.

The patch itself looks good to me. Some others have feedback to answer, but I don't see anything that worries me.

Rather than adding a new option, could we generalize LLVM_DISABLE_CRASH_REPORT to work on all platforms & wire that up to not --crash as you've done, without adding another name for this functionality?

Actually it looks like if we removed the Apple-specific filtering of ENABLE_CRASH_OVERRIDE, then the handling in Signals.inc which also supports the runtime API parameter "DisableCrashReporting" would also do what I was proposing for unit tests (disabling crash reporting on gtest) - which has been the default on MacOS builds for 5 years now? I'm not sure I feel perfectly good about disabling crash reporting for all unit test failures, rather than only disabling it around death tests - but I think it's probably good to make the perf tradeoffs the same on different platforms, and that this has been the default for 5 years for a good chunk of the LLVM developers (those on mac) it's probably good enough to ship.

http://lists.llvm.org/pipermail/llvm-dev/2017-May/113292.html says the following. Your comment suggests the opposite. I am fine with either way.

@rnk wrote:

I actually slightly prefer the name LLVM_DISABLE_DISABLE_SYMBOLIZATION over LLVM_DISABLE_CRASH_REPORT because the former appears to be more appropriate.

We have the LLVM_DISABLE_CRASH_REPORT environment variable, but that's more about whether we should do crash reporting or not.

It would be pretty reasonable to have another one to disable all this stuff. It would also be reasonable to have a cmake option that compiles this stuff away, since it basically never works on user machines that don't have debug info.

Ah, I see - I misunderstood what the existing codepath does. I guess/think it's about those sort of OS-level pop-up crash dialogs that some operating systems have? So that makes sense that it'd be disabled for all tests, even the ones that aren't intended to crash because otherwise it could stall the whole test run, potentially, with a blocking pop-up dialog.

& yeah, I'm onboard with the new/separate env variable name/functionality. Any chance there would be some way to disable this temporarily at runtime, though? So we could disable it during death tests (I think in the GTEST_DEATH_ macro, in the "DeathTest::EXECUTE_TEST" case in the switch there (in gtest-death-test-internal.h)) but not having to disable it for all unit test executions?

Ah, I see I'm to blame for the existing DisableSymbolication flag & indeed in the initial review an env variable was speculated as another possible direction. So perhaps if you're feeling inclined, in a follow-up patch, you could remove the flag and migrate it to use the env variable. (I seem to have experimented with that direction at least a little here: https://reviews.llvm.org/D33804?id=101121#inline-296440 )

Rather than adding a new option, could we generalize LLVM_DISABLE_CRASH_REPORT to work on all platforms & wire that up to not --crash as you've done, without adding another name for this functionality?

Actually it looks like if we removed the Apple-specific filtering of ENABLE_CRASH_OVERRIDE, then the handling in Signals.inc which also supports the runtime API parameter "DisableCrashReporting" would also do what I was proposing for unit tests (disabling crash reporting on gtest) - which has been the default on MacOS builds for 5 years now? I'm not sure I feel perfectly good about disabling crash reporting for all unit test failures, rather than only disabling it around death tests - but I think it's probably good to make the perf tradeoffs the same on different platforms, and that this has been the default for 5 years for a good chunk of the LLVM developers (those on mac) it's probably good enough to ship.

http://lists.llvm.org/pipermail/llvm-dev/2017-May/113292.html says the following. Your comment suggests the opposite. I am fine with either way.

@rnk wrote:

I actually slightly prefer the name LLVM_DISABLE_DISABLE_SYMBOLIZATION over LLVM_DISABLE_CRASH_REPORT because the former appears to be more appropriate.

We have the LLVM_DISABLE_CRASH_REPORT environment variable, but that's more about whether we should do crash reporting or not.

It would be pretty reasonable to have another one to disable all this stuff. It would also be reasonable to have a cmake option that compiles this stuff away, since it basically never works on user machines that don't have debug info.

Ah, I see - I misunderstood what the existing codepath does. I guess/think it's about those sort of OS-level pop-up crash dialogs that some operating systems have? So that makes sense that it'd be disabled for all tests, even the ones that aren't intended to crash because otherwise it could stall the whole test run, potentially, with a blocking pop-up dialog.

& yeah, I'm onboard with the new/separate env variable name/functionality. Any chance there would be some way to disable this temporarily at runtime, though? So we could disable it during death tests (I think in the GTEST_DEATH_ macro, in the "DeathTest::EXECUTE_TEST" case in the switch there (in gtest-death-test-internal.h)) but not having to disable it for all unit test executions?

I am very unfamiliar with gtest-death-test.cc, which additionally has some Windows related stuff for which I don't have a good way to test (I have no Windows machine). I also think lit tests take most of the time, so I do not intend to address it in this patch. Hope some Windows folks on the Reviewers: list can improve unit tests:)

Ah, I see I'm to blame for the existing DisableSymbolication flag & indeed in the initial review an env variable was speculated as another possible direction. So perhaps if you're feeling inclined, in a follow-up patch, you could remove the flag and migrate it to use the env variable. (I seem to have experimented with that direction at least a little here: https://reviews.llvm.org/D33804?id=101121#inline-296440 )

Removing DisableSymbolication in a follow-up sounds good.

dblaikie accepted this revision.Aug 19 2020, 4:12 PM

Rather than adding a new option, could we generalize LLVM_DISABLE_CRASH_REPORT to work on all platforms & wire that up to not --crash as you've done, without adding another name for this functionality?

Actually it looks like if we removed the Apple-specific filtering of ENABLE_CRASH_OVERRIDE, then the handling in Signals.inc which also supports the runtime API parameter "DisableCrashReporting" would also do what I was proposing for unit tests (disabling crash reporting on gtest) - which has been the default on MacOS builds for 5 years now? I'm not sure I feel perfectly good about disabling crash reporting for all unit test failures, rather than only disabling it around death tests - but I think it's probably good to make the perf tradeoffs the same on different platforms, and that this has been the default for 5 years for a good chunk of the LLVM developers (those on mac) it's probably good enough to ship.

http://lists.llvm.org/pipermail/llvm-dev/2017-May/113292.html says the following. Your comment suggests the opposite. I am fine with either way.

@rnk wrote:

I actually slightly prefer the name LLVM_DISABLE_DISABLE_SYMBOLIZATION over LLVM_DISABLE_CRASH_REPORT because the former appears to be more appropriate.

We have the LLVM_DISABLE_CRASH_REPORT environment variable, but that's more about whether we should do crash reporting or not.

It would be pretty reasonable to have another one to disable all this stuff. It would also be reasonable to have a cmake option that compiles this stuff away, since it basically never works on user machines that don't have debug info.

Ah, I see - I misunderstood what the existing codepath does. I guess/think it's about those sort of OS-level pop-up crash dialogs that some operating systems have? So that makes sense that it'd be disabled for all tests, even the ones that aren't intended to crash because otherwise it could stall the whole test run, potentially, with a blocking pop-up dialog.

& yeah, I'm onboard with the new/separate env variable name/functionality. Any chance there would be some way to disable this temporarily at runtime, though? So we could disable it during death tests (I think in the GTEST_DEATH_ macro, in the "DeathTest::EXECUTE_TEST" case in the switch there (in gtest-death-test-internal.h)) but not having to disable it for all unit test executions?

I am very unfamiliar with gtest-death-test.cc, which additionally has some Windows related stuff for which I don't have a good way to test (I have no Windows machine).

Not sure the Windows stuff has too much to do with what would be needed for this work. If there was, say, a global variable/function to call to disable symbolization, that could be called from DeathTest::ReturnSentinel's constructor.

I also think lit tests take most of the time, so I do not intend to address it in this patch.

It's pretty close - there are 530 "not --crash" RUN lines in the LLVM monorepo, and 606 EXPECT or ASSERT_DEATH tests.

I don't think it should be addressed in this patch - but if you're interested in the cost of symbolization on LLVM test execution, there's possibly more cost in unit tests (it'll depend on how big the stacks are - I guess the "not --crash" tests might crash deeper inside LLVM, but the gunit tests will have extra frames of gunit infrastructure calls, so it might be a toss up) than in "not --crash" tests.

Ah, I see I'm to blame for the existing DisableSymbolication flag & indeed in the initial review an env variable was speculated as another possible direction. So perhaps if you're feeling inclined, in a follow-up patch, you could remove the flag and migrate it to use the env variable. (I seem to have experimented with that direction at least a little here: https://reviews.llvm.org/D33804?id=101121#inline-296440 )

Removing DisableSymbolication in a follow-up sounds good.

This revision is now accepted and ready to land.Aug 19 2020, 4:12 PM
aganea added inline comments.Aug 20 2020, 6:10 AM
llvm/utils/not/not.cpp
37

setenv doesn't work on Windows, you would need to use SetEnvironmentVariableA. So you can do something like:

#ifdef _WIN32
#include <windows.h>
#endif

static void env(const char *Name, const char *Val) {
#ifdef LLVM_ON_UNIX
  setenv(Name, Val, 0);
#else
  SetEnvironmentVariableA(Name, Val);
#endif
}

int main(int argc, const char **argv) {
...
  env("LLVM_DISABLE_CRASH_REPORT", "1");
}

Testing now to see the impact of this change.

aganea added inline comments.Aug 20 2020, 6:21 AM
llvm/test/tools/not/disable-symbolization.test
2

This test also fails on Windows - I think lit doesn't like env alone.

******************** TEST 'LLVM :: tools/not/disable-symbolization.test' FAILED ********************
Script:
--
: 'RUN: at line 1';   not --crash env > D:\llvm-project\buildninjaRel\test\tools\not\Output\disable-symbolization.test.tmp || true
: 'RUN: at line 2';   d:\llvm-project\buildninjarel\bin\filecheck.exe D:\llvm-project\llvm\test\tools\not\disable-symbolization.test < D:\llvm-project\buildninjaRel\test\tools\not\Output\disable-symbolization.test.tmp
--
Exit Code: 127

Command Output (stdout):
--
$ ":" "RUN: at line 1"
$ "not" "--crash" "env"
# command stderr:
Error: 'env' requires a subcommand
error: command failed with exit status: 127

--

printenv instead?

MaskRay updated this revision to Diff 286824.Aug 20 2020, 8:15 AM

env -> printenv
Use SetEnvironmentVariableA on Windows

llvm/test/tools/not/disable-symbolization.test
2

Thanks! printenv is not specified by POSIX, so I did not use it. But I just checked: printenv appeared in 3.0BSD & Solaris has it as well, so the portability is good.

dblaikie added inline comments.Aug 20 2020, 3:40 PM
llvm/test/tools/not/disable-symbolization.test
2

Any ideas if this'll work on Windows?

llvm/utils/not/not.cpp
37–43

Rather than adding non-portable code, any chance of passing this environment variable into the child process creation in ExecuteAndWait?

MaskRay added inline comments.Aug 20 2020, 3:49 PM
llvm/utils/not/not.cpp
37–43

That requires to construct a vector<StringRef>. The complexity may not be lower.

dblaikie added inline comments.Aug 20 2020, 4:17 PM
llvm/utils/not/not.cpp
37–43

Yeah, I'm not expecting it to be less code - but more portable/keeping the portability layers down in the System library in one place, etc.

I don't see any measureable improvement in terms of execution time on a 36-core Windows with ninja check-llvm Release with assertions (164 sec both with & without the patch). However there's a measurable saving on a 6-core: 820 sec without patch -> 784 sec with patch, median times after a few runs.

llvm/test/tools/not/disable-symbolization.test
2

@dblaikie It does work, since a Unix-like shell is required anyway to run the lit tests on Windows, either MinGW, CygWin or GnuWin32 on the top of the regular cmd shell.

llvm/utils/not/not.cpp
37–43

+1 with David said.

MaskRay added inline comments.Aug 21 2020, 8:29 AM
llvm/utils/not/not.cpp
37–43

I checked about 16 uses of ExecuteAndWait - there is no instance passing an envp array. Moreover, I cannot find an API returning the current set environment variables. GetEnv can return the value of one environment variable but that is apparently insufficient.

In addition, dispatching on _WIN32 already exists in this file. I'd argue that setenv is more straightforward as the intention is very clearly expressed in the 2 lines of setenv. The #ifdef _WIN32 branch, even if the user is not familiar with it, the user can easily infer its intention as well. Getting the environment variable array and adding two more may be slightly less obvious, even if it does not use #ifdef

aganea accepted this revision.Aug 21 2020, 10:22 AM
aganea added inline comments.
llvm/utils/not/not.cpp
37–43

You're right, ExecuteAndWait doesn't inherit from the current environment if you give it a new environment block, I was under the false impression that it would. At some point, we should perhaps import lldb's Host::GetEnvironment() into LLVMSupport and use the Environment type as argument to ExecuteAndWait. But in the meanwhile, I would go ahead with your patch.

dblaikie accepted this revision.Aug 21 2020, 10:33 AM

yeah, fair enough - be good to push this environment variable setting down into System some time (& share it with Support/ProcessTest which also does some environment variable setting) & add some way to query the whole environment, as @aganea mentioned

zequanwu added inline comments.
llvm/test/tools/not/disable-symbolization.test
2

Hi, this test failed on Windows

******************** TEST 'LLVM :: tools/not/disable-symbolization.test' FAILED ********************
 Script:
 --
 : 'RUN: at line 1';   not --crash printenv > C:\b\s\w\ir\cache\builder\src\third_party\llvm-bootstrap\test\tools\not\Output\disable-symbolization.test.tmp || true
 : 'RUN: at line 2';   c:\b\s\w\ir\cache\builder\src\third_party\llvm-bootstrap\bin\filecheck.exe C:\b\s\w\ir\cache\builder\src\third_party\llvm\llvm\test\tools\not\disable-symbolization.test < C:\b\s\w\ir\cache\builder\src\third_party\llvm-bootstrap\test\tools\not\Output\disable-symbolization.test.tmp
 --
 Exit Code: 2
 
 Command Output (stdout):
 --
 $ ":" "RUN: at line 1"
 $ "not" "--crash" "printenv"
 # command stderr:
 error: unable to find `printenv' in PATH: no such file or directory
aganea added inline comments.Aug 24 2020, 4:38 PM
llvm/test/tools/not/disable-symbolization.test
2

@zequanwu Are you running from a cmd shell or a Unix-like shell? Do you have GnuWin32 installed and in PATH?

This is what I see from cmd.exe:

D:\llvm-project>which printenv
C:\Users\aganea\Downloads\GetGnuWin32\gnuwin32\bin\printenv.EXE

From a MinGW shell:

$ which printenv
/usr/bin/printenv
MaskRay added inline comments.Aug 24 2020, 4:38 PM
llvm/test/tools/not/disable-symbolization.test
2

@aganea If printenv is not always available, I'd like to use

UNSUPPORTED: system-windows

and switch back to env (POSIX).

zequanwu added inline comments.Aug 24 2020, 4:45 PM
llvm/test/tools/not/disable-symbolization.test
2

It's a building bot and running from cmd.exe

cmd.exe /C "cd /D C:\b\s\w\ir\cache\builder\src\third_party\llvm-bootstrap && C:\b\s\w\ir\cipd_bin_packages\cpython3\bin\python3.exe C:/b/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/./bin/llvm-lit.py -sv --param USE_Z3_SOLVER=0 C:/b/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/projects/compiler-rt/test/profile/Profile-x86_64 C:/b/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/tools/clang/test C:/b/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/tools/lld/test C:/b/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/utils/lit C:/b/s/w/ir/cache/builder/src/third_party/llvm-bootstrap/test"
 ninja: build stopped: subcommand failed.
aganea added inline comments.Aug 24 2020, 4:54 PM
llvm/test/tools/not/disable-symbolization.test
2

@MaskRay Ok, but I'd like to understand what is going on. This page mandates having GnuWin32 in PATH, and GnuWin32 hasn't been updated since 2010, so printenv should be there,
@zequanwu Is it a public bot? Would you have a link to the logs?

MaskRay added inline comments.Aug 24 2020, 5:00 PM
llvm/test/tools/not/disable-symbolization.test
2

Sent D86496

I'd want to use 'env' instead of 'printenv' as well, since the former is specified by POSIX.

aganea added inline comments.Aug 24 2020, 6:16 PM
llvm/test/tools/not/disable-symbolization.test
2

It looks like the issue @zequanwu was seeing is specific to how the Chromium build environment is set up, and should not occur on users' machines. The first thing that it does, is to download & install this package gnuwin-13.zip which is a largely slimmed down version of GnuWin32. See: https://chromium.googlesource.com/chromium/src/+/refs/heads/master/tools/clang/scripts/build.py#262
I suppose this zip archive was created empirically. If we wanted to re-enable this test on Windows, someone in charge with gnuwin-XX.zip (@hans ?) could add printenv.exe in the archive and that would solve the issue.

MaskRay added inline comments.Aug 24 2020, 6:25 PM
llvm/test/tools/not/disable-symbolization.test
2

Thanks for investigating this issue!

Meanwhile, I've committed D86496. I was surprised that GnuWin32 env does not implement the POSIX specified behavior

If no utility operand is specified, the resulting environment shall be written to the standard output, with one name= value pair per line.

MaskRay added inline comments.Aug 24 2020, 6:27 PM
llvm/test/tools/not/disable-symbolization.test
2
******************** TEST 'LLVM :: tools/not/disable-symbolization.test' FAILED ********************
Script:
--
: 'RUN: at line 2';   not --crash env > c:\b\slave\clang-x64-windows-msvc\build\stage1\test\tools\not\Output\disable-symbolization.test.tmp || true
: 'RUN: at line 3';   cmp -s c:\b\slave\clang-x64-windows-msvc\build\stage1\test\tools\not\Output\disable-symbolization.test.tmp /dev/null || c:\b\slave\clang-x64-windows-msvc\build\stage1\bin\filecheck.exe c:\b\slave\clang-x64-windows-msvc\llvm-project\llvm\test\tools\not\disable-symbolization.test < c:\b\slave\clang-x64-windows-msvc\build\stage1\test\tools\not\Output\disable-symbolization.test.tmp
--
Exit Code: 127

Command Output (stdout):
--
$ ":" "RUN: at line 2"
$ "not" "--crash" "env"
# command stderr:
Error: 'env' requires a subcommand
error: command failed with exit status: 127

OK, I will not waste more time on the problem as I don't have Windows to test. Hope @aganea can probably address the problem.

zequanwu added inline comments.Aug 24 2020, 7:05 PM
llvm/test/tools/not/disable-symbolization.test
2

Thanks for looking into it.