This is an archive of the discontinued LLVM Phabricator instance.

New lldb python module for adding diagnostic breakpoints
ClosedPublic

Authored by hintonda on Aug 4 2017, 7:21 PM.

Details

Summary

This module adds a breakpoint for calls to DiagnosticsEngine::Report which will fire each time a diagnostic, e.g., error, warning, etc., is hit, and add breakpoints for all locations where that diagnostic ID is used.

It is intended for use by developers and relies on diagtool being in the same directory of the executable begin debugged, e.g.:

$ lldb bin/clang-6.0
(lldb) target create "bin/clang-6.0"
Current executable set to 'bin/clang-6.0' (x86_64).
(lldb) command script import ../../llvm/tools/clang/utils/clangdiag.py
The "clangdiag" command has been installed, type "help clangdiag" or "clangdiag --help" for detailed help.
(lldb) clangdiag enable
(lldb) run <....>

Breakpoints get added as DiagnosticsEngine::Report is seen, so you will need to run it twice.

Event Timeline

hintonda created this revision.Aug 4 2017, 7:21 PM
hintonda added a subscriber: cfe-commits.
rjmccall edited reviewers, added: jasonmolenda, spyffe; removed: rjmccall.Aug 4 2017, 8:23 PM
rjmccall edited edge metadata.

Since this is fundamentally an LLDB script, I've tagged a couple of LLDB people to review it.

Jason, Sean: the idea here is to make it easier for clang developers to debug unexpected diagnostics by automatically setting breakpoints on the places that originate the observed diagnostics.

hintonda updated this revision to Diff 109949.Aug 6 2017, 9:47 PM
  • Use temp files instead of temp dir.
hintonda abandoned this revision.Sep 29 2017, 8:47 AM
hintonda reclaimed this revision.Oct 19 2017, 5:22 PM
hintonda edited the summary of this revision. (Show Details)
zturner edited edge metadata.Oct 19 2017, 5:40 PM
zturner added a subscriber: lldb-commits.

One possible reason for why this never got any traction is that lldb-commits wasn't added as a subscriber. While it's true that the tagged people should have chimed in, having the whole commits list will get some more visibility. I never saw this come to my inbox.

I think this would be most suitable in the lldb/examples folder.

I can't really review this thoroughly, because it relies on a bash script, and I use Windows where we bash isn't really a thing. My bash is rusty, but it looks like you're embedding a python script in the bash script? It might be good if this were just an lldb script command. Take a look at command script add in LLDB, and in the examples folder for some examples of existing commands that work this way. The nice thing about doing it this way is that you could just be inside LLDB and write (lldb) break-diag -Wcovered-switch, for example, which would be a much tighter integration with the debugger.

One possible reason for why this never got any traction is that lldb-commits wasn't added as a subscriber. While it's true that the tagged people should have chimed in, having the whole commits list will get some more visibility. I never saw this come to my inbox.

I think this would be most suitable in the lldb/examples folder.

I can't really review this thoroughly, because it relies on a bash script, and I use Windows where we bash isn't really a thing. My bash is rusty, but it looks like you're embedding a python script in the bash script? It might be good if this were just an lldb script command. Take a look at command script add in LLDB, and in the examples folder for some examples of existing commands that work this way. The nice thing about doing it this way is that you could just be inside LLDB and write (lldb) break-diag -Wcovered-switch, for example, which would be a much tighter integration with the debugger.

Thanks for taking a look.

I mainly work on *nix systems, hence the initial shell script, but if there's an interest, I'll be happy to convert it to a single python script as you suggest, and resubmit it as a patch to lldb.

Thanks again...

Please do convert to python. Just know that you can use "lldb -P" to get the python path that is needed in order to do "import lldb" in the python script. So you can try doing a "import lldb", and if that fails, catch the exception, run "lldb -P", add that path to the python path:

try:
    # Just try for LLDB in case the lldb module is already in the python search
    # paths
    import lldb
except ImportError:
    # We failed to import lldb automatically. Run lldb with the -P option so
    # it tells us the python path we should use.
    lldb_py_dirs = list()
    (status, output) = commands.getstatusoutput("lldb -P")
    dir = os.path.realpath(output)
    if status == 0 and os.path.isdir(dir):
        lldb_py_dirs.append(dir)
    success = False
    for lldb_py_dir in lldb_py_dirs:
        if os.path.exists(lldb_py_dir):
            if not (sys.path.__contains__(lldb_py_dir)):
                sys.path.append(lldb_py_dir)
                try:
                    import lldb
                except ImportError:
                    pass
                else:
                    success = True
                    break
    if not success:
        print("error: couldn't locate the 'lldb' module, please set "
              "PYTHONPATH correctly")
        sys.exit(1)

Please do convert to python. Just know that you can use "lldb -P" to get the python path that is needed in order to do "import lldb" in the python script. So you can try doing a "import lldb", and if that fails, catch the exception, run "lldb -P", add that path to the python path:

try:
    # Just try for LLDB in case the lldb module is already in the python search
    # paths
    import lldb
except ImportError:
    # We failed to import lldb automatically. Run lldb with the -P option so
    # it tells us the python path we should use.
    lldb_py_dirs = list()
    (status, output) = commands.getstatusoutput("lldb -P")
    dir = os.path.realpath(output)
    if status == 0 and os.path.isdir(dir):
        lldb_py_dirs.append(dir)
    success = False
    for lldb_py_dir in lldb_py_dirs:
        if os.path.exists(lldb_py_dir):
            if not (sys.path.__contains__(lldb_py_dir)):
                sys.path.append(lldb_py_dir)
                try:
                    import lldb
                except ImportError:
                    pass
                else:
                    success = True
                    break
    if not success:
        print("error: couldn't locate the 'lldb' module, please set "
              "PYTHONPATH correctly")
        sys.exit(1)

Is any of this really necessary? If you load this script via command script add (which is definitely better than having this be a post-processing script that someone has to manually run) then it is guaranteed to be in the path. Just import it, like the other examples in lldb/examples/python/jump.py for an example. The idea is to have it do the indexing when the command is loaded and save it to a global, and then each time it runs it uses the global index. This way it's invisible to the user, you just run bcd -Wcovered-switch or something without worrying about it.

clayborg edited edge metadata.Oct 20 2017, 9:39 AM

If you want to run the script from the command line, then it is necessary. If it is run from within LLDB it will just work. I like to have my LLDB python scripts work both ways.

This might be better implemented as a new command that gets installed and can be used within LLDB. See:

http://llvm.org/svn/llvm-project/lldb/trunk/examples/python/cmdtemplate.py

I can't see anything wrong with the SB API use here. I don't feel qualified to comment on the most effective workflow for an analysis I've never had to do, however.

jingham removed a subscriber: jingham.Oct 20 2017, 10:03 AM
jasonmolenda edited edge metadata.Oct 20 2017, 12:45 PM

Sorry for missing this back in August.

I think it'd be clearer to import your python once in the startup, like

-o "script import $module" \

Multiple imports are a no-op IIUC so it's harmless to re-import the module every time the breakpoint is hit (I'm guessing it's only hit once, for that matter), but I think it's clearer to have this on its own line.

For what it's worth, if you use breakpoint command add -F python-function, your function is passed in the frame as well as the SBBreakpointLocation. See 'help breakpoint add' for more information about this; I used this in a breakpoint python command I wrote a while back,

def handle_pthread_create_return(frame, bp_loc, dict):

global pthreads_being_created

thread_index_id = frame.GetThread().GetIndexID()

[...]

it might be a good idea to name the breakpoint you're adding and then you can delete it explicitly, in case the user is doing something unexpected with breakpoints. e.g.

% ~/k/svn/lldb/build/Debug/lldb /tmp/a.out
(lldb) target create "/tmp/a.out"
Current executable set to '/tmp/a.out' (x86_64).
(lldb) br s -n main -N worker
Breakpoint 1: where = a.out`main, address = 0x0000000100000ec0
(lldb) br del worker
1 breakpoints deleted; 0 breakpoint locations disabled.
(lldb) br li
No breakpoints currently set.
(lldb)

Thanks for all the feedback. I'll report back once I've addressed all your suggestions.

Thanks again...

hintonda updated this revision to Diff 120308.Oct 25 2017, 1:59 PM

Reimplement at a python module.

hintonda retitled this revision from Add new script to launch lldb and set breakpoints for diagnostics all diagnostics seen. to New lldb python module for adding diagnostic breakpoints.Oct 25 2017, 2:08 PM
hintonda edited the summary of this revision. (Show Details)

Looks good. A little bit of cleanup. Let me know what you think of the comments.

utils/clangdiag.py
17

I would rather just do:

import subprocess

Then later use:

subprocess.check_output(...)

It makes the code easier to read.

20

Remove? See inlined comment for line 92

21

I would avoid making this a global. It will keep the target alive since it has a strong reference count to the underlying lldb_private::Target.

52

Get the target from the frame instead of grabbing it from a global:

target = frame.thread.process.target

or without the shortcuts:

target = frame.GetThread().GetProcess().GetTarget()
66

Always grab the target and don't store it in a global. so just remove this line

75–78

Allow "diagtool" to be set from an option maybe? This would require the options to be passed into this function as an argument. If it doesn't get set, then modify the options to contain this default value? Then this error message can just complain about the actual path:

print('diagtool "%s" doesn't exist' % diagtool)
80

Add name to breakpoint? See inlined comment at line 92

87

remove and use:

target = debugger.GetSelectedTarget()
91–92

Another way to do this would be to give the breakpoints you create using "target.BreakpointCreateXXX" to have a name when you create them:

bp = target.BreakpointCreateByName('DiagnosticsEngine::Report')
bp.AddName("llvm::Diagnostic")

Then here you can iterate over all breakpoints in the target:

for bp in target.breakpoint_iter():
  if bp.MatchesName("llvm::Diagnostic"):
    target.BreakpointDelete(bp.GetID())

Then you don't need a global list?

Thanks for the quick feedback. I'll make all you suggested changes, but need to think a little more about diagtool.

utils/clangdiag.py
75–78

The problem is that diagtool contains a map of DiagID -> DiagEnums, and must be in sync. I wasn't sure how else to enforce the version you were using was built at the same time as the code you're trying to debug.

I can add the option, but then you might get bogus output.

jingham requested changes to this revision.Oct 25 2017, 2:32 PM

Use the form of the command that gets an SBExecutionContext, then you can avoid having to cache the target at all.

utils/clangdiag.py
98–100

If you use the form of the command function that takes an execution context:

def command_function(debugger, command, exe_ctx, result, internal_dict):

then you can grab the target from there when the command gets invoked and pass it to your enable & disable funcs. That way you won't have to rely on GetSelectedTarget. That's important, for instance, if you were running a debug session with two targets and you wanted to invoke your command in a breakpoint command of a breakpoint in target A. There's no guarantee when target A hits the breakpoint that A is the currently selected target (it won't get selected till it actually decides to stop.) But when the breakpoint runs its command, it sets the right target, & thread in the execution context that gets passed in.

This revision now requires changes to proceed.Oct 25 2017, 2:32 PM

Yes definitely use names for your breakpoints. It makes them easier to handle. Note starting a month or two ago you can set names to not get deleted except by an explicit gesture. That hasn't shown up in releases yet, but once you can rely on its being there, you can set the names to not get auto-deleted or disabled, and then if somebody does:

(lldb) break disable

not thinking they are affecting your utility, you won't get messed up by this.

utils/clangdiag.py
87

See my comment. Don't use selected targets, use the command form that takes an SBExecutionContext. That's been around for more than a couple of years now, so it's pretty safe to use.

91–92

If you name them you can just do:

target.BreakpointDeleteByName("llvm::Diagnostic")

That's even simpler.

clayborg added inline comments.Oct 25 2017, 2:45 PM
utils/clangdiag.py
62

Pass exe_ctx or target into this instead of the debugger (see Jim's comment on execution contexts below.

78

No need. just a suggestion. Is this information available in the main executable as any kind of debug info? If so you can read it from there. If it is always in a stand alone separate file, then what you have is ok.

87

Just pass the target or exe_ctx into this function then instead of "debugger".

92

nice!

100

Great idea. Forgot about the execution context variant. The "exe_ctx" is a lldb.SBExecutionContext object. You get your target using:

target = exe_ctx.GetTarget()
jingham added a comment.EditedOct 25 2017, 3:04 PM

Note, BTW, Enrico also added a form of the command add that allows you to use a Python class to wrap the callable. That was in early 2015 so it's probably safe to use as well by now. That's even more convenient, and obviates the need for globals at all. One instance of the class is made per debugger as the command gets loaded, so its ivars live for the life of the command - spanning executions.

There's an example of it here:

http://llvm.org/svn/llvm-project/lldb/trunk/examples/python/disassembly_mode.py

and it is mentioned in the Python Ref.

You don't by any means have to use this form, but if you are having fun playing around with this stuff, it makes it much more pleasant to write commands.

utils/clangdiag.py
100

Yeah, if I could think of a friendly way to do it, I would outlaw the older form. That was an oversight, and makes commands that aren't guaranteed to behave correctly in breakpoint callbacks.

I'll go make the docs a little stronger now that the exe_ctx form has been there long enough that it's generally safe to use.

hintonda updated this revision to Diff 120340.Oct 25 2017, 4:47 PM
hintonda edited edge metadata.
hintonda edited the summary of this revision. (Show Details)
  • Addressed comments.
hintonda added inline comments.Oct 25 2017, 4:50 PM
utils/clangdiag.py
84

Can't use iterator because it gets invalidated and not all breakpoints get removed. Also, target.BreakpointDeleteByName doesn't seem to exist, so iterated over SBBreakpointList instead.

Ack, my bad. I should remember not to rely on my memory...

I thought at one point I was going to do it that way, then got annoyed I'd have to have "BreakpointDisableByName" etc... So apparently I made:

SBTarget.FindBreakpointByName

that takes a name & an SBBreakpointList, and the list gets filled with the breakpoints that match that name.

hintonda updated this revision to Diff 120360.Oct 25 2017, 10:02 PM
  • Add diagtool option used to set arbitrary diagtool path.
clayborg requested changes to this revision.Oct 26 2017, 9:59 AM

Very close. Thanks for making the changes. Just a few nits.

utils/clangdiag.py
67

Is there only ever one of these? If so this is good.

118

should probably verify that this 'diagtool' with:

elif args.subcommands == 'diagtool':

and add an else that creates an error:

else:
    result.SetError("invalid subcommand '%s'" % (args.subcommands))
This revision now requires changes to proceed.Oct 26 2017, 9:59 AM

Thanks for the feedback (addressed below).

btw, where should this module go?

Since it's intended for clang, and clang based tool, developers, I put it in clang/utils, but Zackery suggested lldb/examples/python might be a better place. Please let me know if anyone has a preference.

utils/clangdiag.py
67

Currently, DiagnosticsEngine::Report only has two signatures, and both contain an unsigned DiagID parameter, so this is just a sanity check to make sure we are in the right place.

118

This is already handled by argparser, which will raise an exception if the first parameter isn't one of (enable, disable, daigtool). That's the benefit of using subparsers. So, by the time you get to this point, it must be "diagtool".

However, I think I should probably verify the path past actually exists. Plus, I think I could add a better exception to alert the user how to fix the problem if diagtool couldn't be found in the callback. Suggestions welcome.

Do I understand correctly that this will insert breakpoints on *all* clang diagnostics? That's not necessarily bad, but I was under the impression originally that it would let you pick the diagnostics you wanted to insert breakpoints on.

Also, What is the workflow for using the "clangdiag diagtool" subcommand? Would you have to do two steps, clangdiag enable and then clangdiag diagtool? If so, maybe it could just be clangdiag enable --diagtool=<path>

clayborg accepted this revision.Oct 26 2017, 10:53 AM
clayborg added inline comments.
utils/clangdiag.py
118

nice, I didn't realized that argparser would protect against that. Checking the file exists would be a good idea.

hintonda updated this revision to Diff 120459.Oct 26 2017, 11:33 AM
  • Enhance diagtool option to check, reset, print out current value.
clayborg accepted this revision.Oct 26 2017, 11:48 AM

Do I understand correctly that this will insert breakpoints on *all* clang diagnostics? That's not necessarily bad, but I was under the impression originally that it would let you pick the diagnostics you wanted to insert breakpoints on.

clangdiag enable sets a breakpoint in DiagnosticsEngine::Report and adds a callback. The callback passed the value of DiagID to diagtool to get the enum spelling of the DiagID, e.g., err_undeclared_var_use, then calls BreakpointCreateBySourceRegex with that string and and empty FileSpec to set breakpoints in every location that string is seen -- specifically, we use "diag::err_undeclared_var_use` for this case. Now, that might add a few more breakpoints than actually needed, but not many, and only rarely in places that were actually covered by this particular run.

Also, since we add the breakpoint after it was seen -- DiagnosticsEngine::Report is called later, and sometimes much later, you'll need to run the debugger again to actually hit all the breakpoints.

Also, What is the workflow for using the "clangdiag diagtool" subcommand? Would you have to do two steps, clangdiag enable and then clangdiag diagtool? If so, maybe it could just be clangdiag enable --diagtool=<path>

I put command script import /Users/dhinton/projects/llvm_project/llvm/tools/clang/utils/clangdiag.py in my ~/.lldbinit file, so here's an example of how I use it:

$ lldb bin/clang-6.0
The "clangdiag" command has been installed, type "help clangdiag" or "clangdiag --help" for detailed help.
(lldb) target create "bin/clang-6.0"
Current executable set to 'bin/clang-6.0' (x86_64).

(lldb) clangdiag diagtool
diagtool = /Users/dhinton/projects/llvm_project/build/Debug/bin/diagtool

(lldb) clangdiag diagtool /bad/path/xx
clangdiag: /bad/path/xx not found.
diagtool = /Users/dhinton/projects/llvm_project/build/Debug/bin/diagtool

(lldb) clangdiag diagtool /Users/dhinton/projects/monorepo/build/Debug/bin/diagtool
diagtool = /Users/dhinton/projects/monorepo/build/Debug/bin/diagtool

(lldb) clangdiag diagtool reset
diagtool = /Users/dhinton/projects/llvm_project/build/Debug/bin/diagtool

(lldb) clangdiag enable
(lldb) br l
Current breakpoints:
1: name = 'DiagnosticsEngine::Report', locations = 33
    Breakpoint commands (Python):
      return clangdiag.setDiagBreakpoint(frame, bp_loc, internal_dict)

  Names:
    clang::Diagnostic

  1.1: where = clang-6.0`clang::DiagnosticsEngine::Report(unsigned int) + 38 at Diagnostic.h:1215, address = clang-6.0[0x0000000100022cd6], unresolved, hit count = 0
  1.2: where = clang-6.0`clang::DiagnosticsEngine::Report(clang::SourceLocation, unsigned int) + 46 at Diagnostic.h:1207, address = clang-6.0[0x000000010002c6ce], unresolved, hit count = 0
<...snip...>

(lldb) clangdiag disable
(lldb) br l
No breakpoints currently set.

(lldb) clangdiag enable
(lldb) br l
Current breakpoints:
2: name = 'DiagnosticsEngine::Report', locations = 33
    Breakpoint commands (Python):
      return clangdiag.setDiagBreakpoint(frame, bp_loc, internal_dict)

  Names:
    clang::Diagnostic

  2.1: where = clang-6.0`clang::DiagnosticsEngine::Report(unsigned int) + 38 at Diagnostic.h:1215, address = clang-6.0[0x0000000100022cd6], unresolved, hit count = 0
  2.2: where = clang-6.0`clang::DiagnosticsEngine::Report(clang::SourceLocation, unsigned int) + 46 at Diagnostic.h:1207, address = clang-6.0[0x000000010002c6ce], unresolved, hit count = 0
<...snip...>

(lldb) run <...>
#### might hit one of the new breakpoints if they are seen more than once
(lldb) run 
#### should hit all the breakpoints for which diagnostics were produced

(lldb) br l
Current breakpoints:
2: name = 'DiagnosticsEngine::Report', locations = 33, resolved = 33, hit count = 5
    Breakpoint commands (Python):
      return clangdiag.setDiagBreakpoint(frame, bp_loc, internal_dict)

  Names:
    clang::Diagnostic
  2.1: where = clang-6.0`clang::DiagnosticsEngine::Report(unsigned int) + 38 at Diagnostic.h:1215, address = 0x0000000100022cd6, resolved, hit count = 0                                                                                        
  2.2: where = clang-6.0`clang::DiagnosticsEngine::Report(clang::SourceLocation, unsigned int) + 46 at Diagnostic.h:1207, address = 0x000000010002c6ce, resolved, hit count = 0
<...snip...>

3: source regex = "err_unknown_typename", exact_match = 0, locations = 6, resolved = 6, hit count = 2                                                                                                                                           Names:                                                                                                                                                                                                                                          clang::Diagnostic

  3.1: where = libclangSema.dylib`clang::Sema::DiagnoseUnknownTypeName(clang::IdentifierInfo*&, clang::SourceLocation, clang::Scope*, clang::CXXScopeSpec*, clang::OpaquePtr<clang::QualType>&, bool) + 1721 at SemaDecl.cpp:677, address = 0x0000000111553ee9, resolved, hit count = 0
<...snip...>

4: source regex = "warn_return_missing_expr", exact_match = 0, locations = 1, resolved = 1, hit count = 0
  Names:
    clang::Diagnostic

  4.1: where = libclangSema.dylib`clang::Sema::BuildReturnStmt(clang::SourceLocation, clang::Expr*) + 3149 at SemaStmt.cpp:3489, address = 0x0000000111db1c6d, resolved, hit count = 0
<...snip...>

5: source regex = "err_expected_lparen_after", exact_match = 0, locations = 39, resolved = 39, hit count = 0
  Names:
    clang::Diagnostic

  5.1: where = libclangParse.dylib`clang::Parser::ConsumeAndStoreFunctionPrologue(llvm::SmallVector<clang::Token, 4u>&) + 527 at ParseCXXInlineMethods.cpp:786, address = 0x0000000110fd74df, resolved, hit count = 0
<...snip...>

6: source regex = "err_undeclared_var_use", exact_match = 0, locations = 11, resolved = 11, hit count = 0
  Names:
    clang::Diagnostic

  6.1: where = libclangSema.dylib`clang::Sema::BuildCXXNestedNameSpecifier(clang::Scope*, clang::Sema::NestedNameSpecInfo&, bool, clang::CXXScopeSpec&, clang::NamedDecl*, bool, bool*, bool) + 5820 at SemaCXXScopeSpec.cpp:630, address = 0x0000000111440ddc, resolved, hit count = 0
<...snip...>

7: source regex = "err_unknown_typename", exact_match = 0, locations = 6, resolved = 6, hit count = 1
  Names:
    clang::Diagnostic

  7.1: where = libclangSema.dylib`clang::Sema::DiagnoseUnknownTypeName(clang::IdentifierInfo*&, clang::SourceLocation, clang::Scope*, clang::CXXScopeSpec*, clang::OpaquePtr<clang::QualType>&, bool) + 1721 at SemaDecl.cpp:677, address = 0x0000000111553ee9, resolved, hit count = 0
<...snip...>

(lldb) clangdiag enable
#### removes all existing breakpoints, then adds back one for DiagnosticsEngine::Report

(lldb) clangdiag disable
#### removes all clangdiag breakpoints
hintonda updated this revision to Diff 120492.Oct 26 2017, 2:38 PM
  • Remove debugging print statement, and enhance help message.

Is there a way to associate a particular diagtool variable to an exe_ctx?

Each lldb.SBValue has accessors for the stuff in an execution context:

``

lldb::SBTarget GetTarget();
lldb::SBProcess GetProcess();
lldb::SBThread GetThread();
lldb::SBFrame GetFrame();
You could keep a global map of process ID to diagtool if you want?

What are you thinking of using this for?
hintonda updated this revision to Diff 120518.Oct 26 2017, 5:36 PM
  • Maintain process id map for diagtool.
hintonda updated this revision to Diff 120538.Oct 26 2017, 9:48 PM
  • Add ability to add breakpoints matching -W<warning> warnings.
hintonda updated this revision to Diff 120628.Oct 27 2017, 9:28 AM
  • Add support for individual DiagID's, and print out number of breakpoints added.
hintonda updated this revision to Diff 120629.Oct 27 2017, 9:46 AM
  • Remove whitespace.
This revision was automatically updated to reflect the committed changes.