This is an archive of the discontinued LLVM Phabricator instance.

[Symbolize] Use the local MSVC C++ demangler instead of relying on dbghelp. NFC.
ClosedPublic

Authored by mstorsjo on Sep 27 2019, 5:51 AM.

Diff Detail

Event Timeline

mstorsjo created this revision.Sep 27 2019, 5:51 AM
rnk accepted this revision.Oct 3 2019, 5:10 PM

lgtm

This revision is now accepted and ready to land.Oct 3 2019, 5:10 PM
This revision was automatically updated to reflect the committed changes.
mstorsjo reopened this revision.Oct 4 2019, 12:26 AM
mstorsjo added a reviewer: thakis.
mstorsjo added a subscriber: thakis.

Had to revert this one, as it broke sanitizer tests: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/52441

So apparently it seems (again) like it would be useful if the llvm microsoft demangler could be configured to leave out parts of the output. In D68134 (in LLDB) we can consider changing the testcases to include the new demangler output, but there was a concern that there might be cases within LLDB that try to parse the demangled function names. And here it significantly changes the libfuzzer deduplication token output, which I think also isn't ideal.

@thakis - you brought it up earlier that adding options to it could be possible. Do you, who I think is familiar with that demangler, have time to make a PoC of it? Otherwise I might have a stab at it sometimes later...

This revision is now accepted and ready to land.Oct 4 2019, 12:26 AM
mstorsjo requested review of this revision.Oct 4 2019, 12:26 AM

Had to revert this one, as it broke sanitizer tests: http://lab.llvm.org:8011/builders/sanitizer-windows/builds/52441

So apparently it seems (again) like it would be useful if the llvm microsoft demangler could be configured to leave out parts of the output. In D68134 (in LLDB) we can consider changing the testcases to include the new demangler output, but there was a concern that there might be cases within LLDB that try to parse the demangled function names. And here it significantly changes the libfuzzer deduplication token output, which I think also isn't ideal.

@thakis - you brought it up earlier that adding options to it could be possible. Do you, who I think is familiar with that demangler, have time to make a PoC of it? Otherwise I might have a stab at it sometimes later...

That links is dead. Which customization is needed here? microsoftDemangle() already gets a Flags argument; we can just add more flags to it.

But again, maybe the tests are just overly strict here?

That links is dead. Which customization is needed here?

Ah crap, I should have quoted the relevant bits...

IIRC, it at least had extra public: at the start of the symbol, extra return type and calling convention specifiers. D68134 has a more concrete case of what needs to change in LLDB if the llvm demangler is used as-is there.

microsoftDemangle() already gets a Flags argument; we can just add more flags to it.

Oh, that's nice! Then this shoudln't probably be too big a deal to extend.

But again, maybe the tests are just overly strict here?

I think for this particular case, for the fuzzer sanitizer's backtrace disambiguation log, there might be a risk that something is expecting to process the log, which might not be ready to handle extra unexpected keywords. I might be overly cautious though.

In the LLDB case, a concern was voiced that some parts of LLDB might try to parse the demangled symbol names, and not expect those extra keywords there.

That links is dead. Which customization is needed here?

Ah crap, I should have quoted the relevant bits...

IIRC, it at least had extra public: at the start of the symbol, extra return type and calling convention specifiers. D68134 has a more concrete case of what needs to change in LLDB if the llvm demangler is used as-is there.

I guess ideally we would have individual flags for all the UnDecorateSymbolName flags used here: UNDNAME_NO_ACCESS_SPECIFIERS, UNDNAME_NO_ALLOCATION_LANGUAGE, UNDNAME_NO_THROW_SIGNATURES, UNDNAME_NO_MEMBER_TYPE, UNDNAME_NO_MS_KEYWORDS, UNDNAME_NO_FUNCTION_RETURNS.

thakis added a comment.Oct 9 2019, 5:52 AM

for the fuzzer sanitizer's backtrace disambiguation log, there might be a risk that something is expecting to process the log, which might not be ready to handle extra unexpected keywords. I might be overly cautious though.

In the LLDB case, a concern was voiced that some parts of LLDB might try to parse the demangled symbol names, and not expect those extra keywords there.

I think LLDB/win and fuzzers/win might have few enough clients that we can just trying to change it and see what breaks.

mstorsjo updated this revision to Diff 224970.Oct 15 2019, 2:00 AM

Using specific flags to microsoftDemangle, to avoid spurious changes in libfuzzer tests when llvm-symbolizer is used there. (This means that it can't use the nice unified demangle function frontend that handles both itanium and ms style, and which does the char*->std::string wrapping.)

@rnk - does this update of the patch seem ok to you?

rnk accepted this revision.Oct 16 2019, 1:22 PM

lgtm

This revision is now accepted and ready to land.Oct 16 2019, 1:22 PM
This revision was automatically updated to reflect the committed changes.

It seems like this still broke a buildbot. @rnk, as it's your bot, can you look into it before we revert this? The logs don't really show enough detail... http://lab.llvm.org:8011/builders/sanitizer-windows/builds/53114/steps/stage%201%20check/logs/stdio

rnk added a comment.Oct 17 2019, 10:58 AM

It seems like this still broke a buildbot. @rnk, as it's your bot, can you look into it before we revert this? The logs don't really show enough detail... http://lab.llvm.org:8011/builders/sanitizer-windows/builds/53114/steps/stage%201%20check/logs/stdio

Yep, done in r375147. The demangler difference is:

MS:   f(char,char *,char *)
LLVM: f(char, char *, char *)
In D68133#1713254, @rnk wrote:

It seems like this still broke a buildbot. @rnk, as it's your bot, can you look into it before we revert this? The logs don't really show enough detail... http://lab.llvm.org:8011/builders/sanitizer-windows/builds/53114/steps/stage%201%20check/logs/stdio

Yep, done in r375147. The demangler difference is:

MS:   f(char,char *,char *)
LLVM: f(char, char *, char *)

Awesome, thanks! I guess that should have been deducible from the buildbot log, but I overlooked the spacing.