This is an archive of the discontinued LLVM Phabricator instance.

[ASTMatchers] Output currently processing match and nodes on crash
ClosedPublic

Authored by njames93 on Feb 19 2022, 4:30 AM.

Details

Summary

Create a PrettyStackTraceEvent that will dump the current MatchCallback id as well as the BoundNodes if the 'run' method of a MatchCallback results in a crash.
The purpose of this is sometimes clang-tidy checks can crash in the check method. And in a large codebase with alot of checks enabled and in a release build, it can be near impossible to figure out which check as well as the source code that caused the crash. Without that information a reproducer is very hard to create.
This is a more generalised version of D118520 which has a nicer integration and should be useful to clients other than clang-tidy.

Diff Detail

Event Timeline

njames93 created this revision.Feb 19 2022, 4:30 AM
njames93 requested review of this revision.Feb 19 2022, 4:30 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptFeb 19 2022, 4:30 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I've reused the test cases from the old clang-tidy implementation of this, however this should eventually be moved into ASTMatchers tests.

njames93 updated this revision to Diff 410139.Feb 20 2022, 2:44 AM

Moved TraceReporter into the MatchASTVisitor class
Added some missing new lines
Fixed patch not being based off a commit in trunk

clang/lib/ASTMatchers/ASTMatchFinder.cpp
1541

Is there a precedent to instrument this function too. AFAIK this is only used when running 1 matcher on a predetermined node so doesn't have the same issues when debugging as below.

njames93 updated this revision to Diff 412772.Mar 3 2022, 10:51 AM

Moved tests into ASTMatchersInternal, makes sense in here and removes the dirty clang-tidy plugin hack.

Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 10:51 AM
njames93 updated this revision to Diff 412793.Mar 3 2022, 11:44 AM

Update test case

aaron.ballman added inline comments.Mar 3 2022, 1:37 PM
clang/lib/ASTMatchers/ASTMatchFinder.cpp
789

Should this be "<anonymous> : " instead?

796

Do we also need a match for TypeLoc matchers, or does the else cover that sufficiently well?

(Actually, should we handle all of the various matchers at: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/ASTMatchers/ASTMatchers.h#L141 rather than leaving it to an else? Then the else can become an unreachable so we know to update this interface?)

njames93 added inline comments.Mar 3 2022, 1:52 PM
clang/lib/ASTMatchers/ASTMatchFinder.cpp
789

Good catch

796

The else should be sufficient for most general cases, the only reason some are special cased is to improve the output, but I don't want there to be a burden to update this interface if new nodes are added.

aaron.ballman added inline comments.Mar 3 2022, 1:57 PM
clang/lib/ASTMatchers/ASTMatchFinder.cpp
796

I should verify: does this map hold arbitrary AST nodes, or does the map only hold the top-level classes in the AST matching hierarchy?

If it's arbitrary AST nodes, then yeah, I definitely agree the else here is fine. If it's top-level classes instead, we don't add those all that often and so it doesn't seem like a major burden to keep those in sync.

njames93 added inline comments.Mar 3 2022, 2:07 PM
clang/lib/ASTMatchers/ASTMatchFinder.cpp
796

It can hold any kind of node that can be stored in a DynTypedNode, so essentially it's any arbitrary AST node.

Essentially looks good to me now, thanks!

clang/lib/ASTMatchers/ASTMatchFinder.cpp
789

It'd probably be good to add some test coverage for these weird edge cases (unnamed structs, constructors or other special member functions without a name).

796

Thanks for clarifying that; the else look *beautiful* to me! :-D

njames93 updated this revision to Diff 416994.Mar 21 2022, 9:41 AM
njames93 marked 5 inline comments as done.

Rework testing infrastructure.
Add more test cases.

njames93 updated this revision to Diff 417041.Mar 21 2022, 11:48 AM

Remove <anonymous> from Decls that aren't NamedDecls

This revision is now accepted and ready to land.Mar 21 2022, 11:52 AM
This revision was landed with ongoing or failed builds.Mar 21 2022, 12:14 PM
This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Mar 21 2022, 1:05 PM

Looks like this breaks tests on Windows: http://45.33.8.238/win/54785/step_7.txt

Please take a look, and revert for now if takes a while to fix.

njames93 added a comment.EditedMar 21 2022, 3:22 PM

Looks like this breaks tests on Windows: http://45.33.8.238/win/54785/step_7.txt

Please take a look, and revert for now if takes a while to fix.

Seems to be that on windows we aren't getting the crash dump. I'll revert and investigate

Looks like this breaks tests on Windows: http://45.33.8.238/win/54785/step_7.txt

Please take a look, and revert for now if takes a while to fix.

I can't seem to reproduce the test failure on windows, are you opposed to relanding this with the test disabled on windows?

Looks like this breaks tests on Windows: http://45.33.8.238/win/54785/step_7.txt

Please take a look, and revert for now if takes a while to fix.

I can't seem to reproduce the test failure on windows, are you opposed to relanding this with the test disabled on windows?

I'd like to understand what's happening better rather than land with a disabled test. Not getting the crash dump in some circumstances could either be specific to the machine configuration (which would be pretty tricky for us to solve in a unit test) or it could be something functionally wrong with the changes (though I would be surprised, tbh).

I'd like to understand what's happening better rather than land with a disabled test. Not getting the crash dump in some circumstances could either be specific to the machine configuration (which would be pretty tricky for us to solve in a unit test) or it could be something functionally wrong with the changes (though I would be surprised, tbh).

From what it looks like on the buildbot, the testing infrastructure wasn't recieving any output from stderr.

I'd like to understand what's happening better rather than land with a disabled test. Not getting the crash dump in some circumstances could either be specific to the machine configuration (which would be pretty tricky for us to solve in a unit test) or it could be something functionally wrong with the changes (though I would be surprised, tbh).

From what it looks like on the buildbot, the testing infrastructure wasn't recieving any output from stderr.

@thakis -- is it possible your build bot is configured to disable generation of crash dumps?

@thakis -- is it possible your build bot is configured to disable generation of crash dumps?

So with a lot of trial and error, it seems that on certain windows configurations, crash dumps just aren't being emitted.
Best I can see so far is using clang to build the tests, results in ENABLE_BACKTRACES being enabled, but no crash dumps being emitted, causing the test failure.
I did change the test to throw an assert instead of a TRAP instruction, and the assert message was captured but no crash dump was reported, so the test infrastructure has no issue there.
I feel like a stop gap may be to disable the test when clang is the host compiler and windows is the platform.

@thakis -- is it possible your build bot is configured to disable generation of crash dumps?

So with a lot of trial and error, it seems that on certain windows configurations, crash dumps just aren't being emitted.
Best I can see so far is using clang to build the tests, results in ENABLE_BACKTRACES being enabled, but no crash dumps being emitted, causing the test failure.

From looking at CMake, I don't see anything that actually tests if backtraces are enabled by the OS; we just trust the user told us something valid and set ENABLE_BACKTRACES accordingly. So I'd still like to hear from @thakis about his bot configuration.

I did change the test to throw an assert instead of a TRAP instruction, and the assert message was captured but no crash dump was reported, so the test infrastructure has no issue there.

That's good at least.

I feel like a stop gap may be to disable the test when clang is the host compiler and windows is the platform.

I wonder if "clang is the host compiler" is just happenstance though and this will cause us to pass @thakis' bots but not other testing situations (other downstreams, for example). So this may work as a stopgap, but if we can get to the bottom of the issue, I think it would be valuable. e.g., perhaps the better solution is for cmake to run a configure test to see if crash dumps are actually generated (or enabled, if there's a WMI query or something we can use), and if not, warn the user and don't define ENABLE_BACKTRACES in that case.

That bot sets ENABLE_BACKTRACES=1.

It seems to work for clang-cl (foo.cc contains #pragma clang __debug parser_crash):

>out\gn\bin\clang-cl /c foo.cc
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: out\\gn\\bin\\clang-cl /c foo.cc
1.      foo.cc:1:23: at annotation token
 #0 0x00007ff6653b7395 (C:\src\llvm-project\out\gn\bin\clang-cl.exe+0x3997395)

I tried setting up a local cmake build, but cmake complains that it can't find py3 even though I have it installed. I also have today off, so I don't want to spend a lot of time debugging it.

I'd suggest you land this with the test disabled on Win, and I'll debug it on Windows next week.

That bot sets ENABLE_BACKTRACES=1.

It seems to work for clang-cl (foo.cc contains #pragma clang __debug parser_crash):

>out\gn\bin\clang-cl /c foo.cc
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.      Program arguments: out\\gn\\bin\\clang-cl /c foo.cc
1.      foo.cc:1:23: at annotation token
 #0 0x00007ff6653b7395 (C:\src\llvm-project\out\gn\bin\clang-cl.exe+0x3997395)

I tried setting up a local cmake build, but cmake complains that it can't find py3 even though I have it installed. I also have today off, so I don't want to spend a lot of time debugging it.

I'd suggest you land this with the test disabled on Win, and I'll debug it on Windows next week.

Okay, that works for me, this definitely sounds more involved than "bot configured wrong" but doesn't sound like "functionality is incorrect with the patch". Thanks for digging into this next week @thakis (and enjoy your day off!), and thanks for the discussion @njames93 while we figured out what to do next.

I've landed it and your build bot is happy, there was another failing buildbot so I'll keep an eye on it.

mgorny reopened this revision.Jul 21 2022, 3:42 AM

This change broke standalone build of clang. Please fix.

clang/unittests/ASTMatchers/ASTMatchersInternalTest.cpp
15

This is private header of LLVM and you can't use it in clang.

This revision is now accepted and ready to land.Jul 21 2022, 3:42 AM
njames93 closed this revision.Jul 22 2022, 2:36 AM

This change broke standalone build of clang. Please fix.

I've pushed rG251b5b864183e868ffc86522e320f91ab3c5a771 which should fix the build, but need to look into a way to re-enable the tests.

Thanks. The usual way of doing is to move the necessary bits to llvm-config.h.