This is an archive of the discontinued LLVM Phabricator instance.

[lldb] make it easier to find LLDB's python
ClosedPublic

Authored by lawrence_danna on Nov 1 2021, 4:58 PM.

Details

Summary

It is surprisingly difficult to write a simple python script that
can reliably import lldb without failing, or crashing. I'm
currently resorting to convolutions like this:

    def find_lldb(may_reexec=False):
		if prefix := os.environ.get('LLDB_PYTHON_PREFIX'):
			if os.path.realpath(prefix) != os.path.realpath(sys.prefix):
				raise Exception("cannot import lldb.\n"
					f"  sys.prefix should be: {prefix}\n"
					f"  but it is: {sys.prefix}")
		else:
			line1, line2 = subprocess.run(
				['lldb', '-x', '-b', '-o', 'script print(sys.prefix)'],
				encoding='utf8', stdout=subprocess.PIPE,
				check=True).stdout.strip().splitlines()
			assert line1.strip() == '(lldb) script print(sys.prefix)'
			prefix = line2.strip()
			os.environ['LLDB_PYTHON_PREFIX'] = prefix

		if sys.prefix != prefix:
			if not may_reexec:
				raise Exception(
					"cannot import lldb.\n" +
					f"  This python, at {sys.prefix}\n"
					f"  does not math LLDB's python at {prefix}")
			os.environ['LLDB_PYTHON_PREFIX'] = prefix
			python_exe = os.path.join(prefix, 'bin', 'python3')
			os.execl(python_exe, python_exe, *sys.argv)

		lldb_path = subprocess.run(['lldb', '-P'],
			check=True, stdout=subprocess.PIPE,
				encoding='utf8').stdout.strip()

		sys.path = [lldb_path] + sys.path

This patch aims to replace all that with:

#!/usr/bin/env lldb-python
import lldb
...

... by adding the following features:

  • new command line option: --print-script-interpreter-info. This prints language-specific information about the script interpreter in JSON format.
  • new tool (unix only): lldb-python which finds python and exec's it.

Diff Detail

Event Timeline

lawrence_danna created this revision.Nov 1 2021, 4:58 PM
lawrence_danna requested review of this revision.Nov 1 2021, 4:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 1 2021, 4:58 PM
lawrence_danna edited the summary of this revision. (Show Details)Nov 1 2021, 4:59 PM

I feel like this puts too much Python specific logic in the driver compared to the convenience it brings. Even though we already leak python details (like --python-path), LLDB has always been designed to support several scripting languages. I'm not convinced this use case is compelling enough to make the situation worse.

FWIW crashlog.py has a good example of how to import LLDB. It's complicated slightly by us having to honor the LLDB_DEFAULT_PYTHON_VERSION, but otherwise it's pretty simple:

try:
    import lldb
except ImportError:
    lldb_python_path = subprocess.check_output(['xcrun', 'lldb', '-P']).decode("utf-8").strip()
    if os.path.exists(lldb_python_path) and not sys.path.__contains__(lldb_python_path):
        sys.path.append(lldb_python_path)
        import lldb

A potential alternative to this could be an lldb-python (python) script that reinvokes itself with the correct PYTHONPATH set?

Since you do need to match the lldb module to the python it was built against, this seems a useful addition.

The original --python-path was a bit unfortunate, we're supposed to be able to support multiple scripting interpreters (currently python & lua) and it would be annoying to have to add one of these variables for each scripting language we support. So it seems a shame to repeat that error.

Maybe it would be better to have:

--scripting-module-path {python, lua}

and

--scripting-executable-path {python, lua}

as the arguments to the driver (we'll have to keep the --python-path for compatibility.)

Seems like "where is your module" and "where is your executable" are also questions we should be asking the ScriptInterpreter base class, and not going straight to the Python interpreter.

I feel like this puts too much Python specific logic in the driver compared to the convenience it brings. Even though we already leak python details (like --python-path), LLDB has always been designed to support several scripting languages. I'm not convinced this use case is compelling enough to make the situation worse.

FWIW crashlog.py has a good example of how to import LLDB. It's complicated slightly by us having to honor the LLDB_DEFAULT_PYTHON_VERSION, but otherwise it's pretty simple:

try:
    import lldb
except ImportError:
    lldb_python_path = subprocess.check_output(['xcrun', 'lldb', '-P']).decode("utf-8").strip()
    if os.path.exists(lldb_python_path) and not sys.path.__contains__(lldb_python_path):
        sys.path.append(lldb_python_path)
        import lldb

A potential alternative to this could be an lldb-python (python) script that reinvokes itself with the correct PYTHONPATH set?

I don't think that fits exactly with what Larry's patch is doing. The lldb vrs. Python problem is two-fold, one problem is finding some lldb module somewhere, but the other is making sure that the python you pick (should you have multiple on your system) is the one that's going to work with the lldb module your lldb is going to find for you.

The solutions we have to this problem all assume you've ALREADY found the right python, now you just have to find some lldb module. OTOH lldb-python uses the Python library that lldb loads to find the right python executable and match it to the right lldb module.

This isn't 100% necessary: once you've found your lldb, you could just continue to use lldb, employing the script command to feed the Python interpreter. But I can see how you might want Python to be in control, and this solution makes that use case less error-prone.

But I do agree with Jonas that we should find a less python-specific way to implement this. I don't think you need to actually fill in the Lua parts, but whoever does that shouldn't have to copy everything that was done for python, a lot of this should be shareable.

FWIW crashlog.py has a good example of how to import LLDB. It's complicated slightly by us having to honor the LLDB_DEFAULT_PYTHON_VERSION, but otherwise it's pretty simple:

Unfortunately that is not enough to solve the problem, because if we run that script with the wrong version of python it will crash (see https://reviews.llvm.org/D112972)
Even if that bug is fixed, it will still raise an ImportError

A potential alternative to this could be an lldb-python (python) script that reinvokes itself with the correct PYTHONPATH set?

That approach would work, but I'd still need --python-prefix to find the right python. I'll update this diff tomorrow doing it that way.

change lldb-python into a script instead of a feature of Driver.cpp

jingham added a comment.EditedNov 2 2021, 10:07 AM

I still don't think adding python specific ways of getting at generic ScriptInterpreter features (e.g. the path to the lldb module for that script interpreter - not your fault but... - and the path to the script's native script runner binary) with language specific affordances is the right way to continue...

These should be "--interpreter-prefix python" and --interpreter-prefix lua" and there should be a ScriptInterpreter::GetInterpreterPrefix(Language) or whatever to fetch the paths. Even most of the work your lldb-python does could probably work for Lua with just a few changes (like PYTHONPATH-> whatever that is for Lua...)

I was talking to Jim about this offline and one potential solution would be to have a flag that asks the current script interpreter for some of this relevant information. What that means would be different for every language, so the output would have to be able to define its own keys and values. (JSON could be a good candidate for this?) For Python, this would include the python path and the prefix, while for lua this might be something totally different. Both could include a version.

So the way I imagine this working is something like this:

$ lldb -l python --print-script-interpreter-info
{ 
  "prefix": "/foo/bar",
  "path": "/baz/quux",
   "version": "3.9"
}
$ lldb -l lua --print-script-interpreter-info
{ 
   "version": "5.4"
}

From an SB API perspective, this would be a function in SBDebugger that would return a string or populate a buffer. The driver would then simply print that, and the Python script would parse it and do the right thing.

WDYT?

I would do this:

lldb --print-script-interpreter-info python

making the interpreter the argument for this option. That seems clearer than having to provide two flags. language is overloaded anyway, because it can be a source language or a script interpreter language, so it might be better to side-step that one.

@JDevlieghere

That approach would work great for my purposes, i'll update like that.

updated according to reviewer feedback.

lawrence_danna edited the summary of this revision. (Show Details)Nov 9 2021, 12:06 AM
lawrence_danna edited the summary of this revision. (Show Details)
lawrence_danna edited the summary of this revision. (Show Details)

clang-format

Rather than percolating up a JSON string, we should use StructuredData (and SBStructuredData at the SB API layer) and only convert it to JSON at the very last moment.

lldb/tools/driver/Driver.cpp
410

changed internal apis to use StructuredData instead of char*

A few small nits but I'm very happy with the approach. Thanks Larry!

lldb/include/lldb/API/SBHostOS.h
23 ↗(On Diff #386170)

Is still still relevant?

lldb/test/API/functionalities/paths/TestPaths.py
51
52

Might be nice to check that all the keys you expect for Python are in the structured data, either here or in a separate test.

lawrence_danna marked 3 inline comments as done.

cleanup

lldb/include/lldb/API/SBHostOS.h
23 ↗(On Diff #386170)

nope.

This revision is now accepted and ready to land.Nov 10 2021, 9:26 AM
This revision was automatically updated to reflect the committed changes.