This is an archive of the discontinued LLVM Phabricator instance.

[utils] Add a script that runs clang in LLDB and stops it when a specified diagnostic is emitted
Needs Review, Public

Authored by arphaman on Jul 31 2017, 4:57 AM.

Details

Reviewers
hintonda
Summary

This script is useful while working on and debugging Clang's diagnostics, as you don't have to figure out where the diagnostic is emitted before setting a breakpoint for the emission of that diagnostic.

Diff Detail

Repository
rL LLVM

Event Timeline

arphaman created this revision. Jul 31 2017, 4:57 AM
hintonda edited edge metadata. Jul 31 2017, 10:27 AM

Cool, I'll try to play with this later today.

BTW, since the call to DiagnosticsEngine::Report is delayed when using PartialDiagnostic, would it make sense to add them as well? Also, this doesn't seem to solve the initial problem of finding the Diag name in the first place.

Eventually, I'd like a scan-build-like tool that lets you rerun a particular compilation command (without needing to construct the -cc1 command yourself) and that adds breakpoints for all warning/error diagnostics seen before invoking lldb, perhaps even leveraging the compilation database, a la clang-tidy.

It might make sense to add a breakpoint at PartialDiagnostic(unsigned DiagID, StorageAllocator &Allocator), I'll check how that works.

I reckon it should be possible to have a script that could find the name of all emitted diagnostics. Let's say we'd like to run clang with -cc1 main.cpp.
If we run it in LLDB and put a breakpoint at DiagnosticsEngine::Report, we could add a command to that breakpoint that would print a unique string and the diagnostic id.
The script would capture the stdout from LLDB/Clang and replace each occurrence of the unique string and diagnostic id with the name of the diagnostic, by invoking some diagtool command that maps the id back to the diagnostic. If the same diagnostic is emitted more than once, the script could also number the occurrences, and we could then extend this script to stop at the Nth occurrence of a specific diagnostic.
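As a rough sketch of the post-processing half of that idea (not part of the patch): assume the breakpoint command prints a marker such as `@@DIAG:<id>@@`, and that an id-to-name table has already been obtained somehow. Both the marker format and the `annotate` helper are hypothetical names made up for this illustration.

```python
import re

# Hypothetical marker that the breakpoint command is assumed to print.
MARKER = re.compile(r"@@DIAG:(\d+)@@")

def annotate(output, id_to_name):
    """Replace each marker with the diagnostic name and an occurrence index."""
    seen = {}

    def repl(match):
        diag_id = int(match.group(1))
        name = id_to_name.get(diag_id, "<unknown id %d>" % diag_id)
        # Count occurrences per name so the Nth one can be targeted later.
        seen[name] = seen.get(name, 0) + 1
        return "[%s #%d]" % (name, seen[name])

    return MARKER.sub(repl, output)
```

The per-name counter is what would let a follow-up version stop at the Nth occurrence of a given diagnostic.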

This is a good example of how to script lldb, but it's predicated on knowing the diag name, which is great if you know the name.

However, this isn't my use case. I don't have the diag name, just a diagnostic message. In order to get the diag name associated with a specific diagnostic message, I have to grep the source, which was the original motivation behind D35175.

Here's how I currently do it:

  1. select a partial substring from the diagnostic (omitting variable/class names and anything that might be part of a %select{}, e.g., (public|private|protected), etc.)
  2. use grep to match a diagnostic definition in one of the diagnostic inc files generated by tblgen, i.e., tools/clang/include/clang/Basic/Diagnostic*.inc
  3. if one or more matches found, select the correct one(s), otherwise, adjust substring and go back to 2
  4. pass diag name(s) found in 3 to this tool
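Steps 1 to 3 could be scripted roughly as follows. This is only a sketch: it assumes each entry in the generated .inc files has the shape `DIAG(name, <other fields>, "message", ...)`, and `find_diag_names` is a hypothetical helper, not something in the patch.

```python
import re

# Assumed entry shape: DIAG(name, <fields without quotes>, "message", ...)
DIAG_ENTRY = re.compile(r'DIAG\((\w+),[^"]*"((?:[^"\\]|\\.)*)"')

def find_diag_names(inc_text, fragment):
    """Return the names of all diagnostics whose message contains fragment."""
    return [name
            for name, message in DIAG_ENTRY.findall(inc_text)
            if fragment in message]
```

The caller would read each Diagnostic*.inc file and pass its contents as `inc_text`; an empty result means the substring needs adjusting, as in step 3.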

However, once the diag name is known, it would be just as easy to find/grep the source to find the file/line numbers and where the diag name is seen. Then you could either look directly at the source, or use them to set breakpoints in lldb. This avoids issues concerning late calls to report and PartialDiagnostic locations.

Alternatively, one could do what John suggested (paraphrasing here) and munge the strings found in tools/clang/include/clang/Basic/Diagnostic*.inc to create regular expressions, then use flex to generate a lexer, or write one by hand, that could find the specific diag name matching the diagnostic message. Obviously, there are a lot of ways to do this.
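A minimal sketch of the string-to-regex step, using Python's `re` instead of flex. It only handles plain arguments (`%0`, `%s0`, `%ordinal0`, ...), non-nested `%select{...}N`, and `%%`; anything with nested braces, such as `%plural` or `%diff`, is left as literal text here. `diag_to_regex` is a hypothetical name.

```python
import re

# Non-nested %select{a|b}N, simple %N / %modifierN arguments, and literal %%.
TOKEN = re.compile(r"%select\{([^{}]*)\}\d+|%[a-z]*\d+|%%")

def diag_to_regex(fmt):
    """Turn a diagnostic format string into a regex matching rendered messages."""
    parts = []
    pos = 0
    for m in TOKEN.finditer(fmt):
        parts.append(re.escape(fmt[pos:m.start()]))  # literal text before token
        if m.group(0) == "%%":
            parts.append("%")
        elif m.group(1) is not None:
            # %select: any one of the listed alternatives.
            alts = "|".join(re.escape(a) for a in m.group(1).split("|"))
            parts.append("(?:%s)" % alts)
        else:
            parts.append(".*?")  # substituted argument: match anything
        pos = m.end()
    parts.append(re.escape(fmt[pos:]))
    return re.compile("^" + "".join(parts) + "$")
```

A format string that is only `%0` becomes `^.*?$` and matches every single-line message, so such entries would have to be filtered out or reported as ambiguous.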

Makes sense. I'll see if I can get somewhere with the regex idea.

Btw, I created a quick prototype with sed and found that the diag strings aren't unique -- which isn't surprising since there's no requirement for that. In fact, 23 of them are just "%0".
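For illustration, the same check can be sketched in Python rather than sed. It assumes the (name, message) pairs have already been extracted from the .inc files; `duplicated_messages` is a hypothetical helper.

```python
from collections import defaultdict

def duplicated_messages(entries):
    """Map each message shared by more than one diagnostic to its names."""
    groups = defaultdict(list)
    for name, message in entries:
        groups[message].append(name)
    return {msg: names for msg, names in groups.items() if len(names) > 1}
```

Each returned list is a set of candidate names for one message, so every name in it would need its own breakpoint.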

I suppose you could add a breakpoint for multiple values.