This is an archive of the discontinued LLVM Phabricator instance.

[lldb/bindings] Add Python ScriptedProcess base class to lldb module
ClosedPublic

Authored by mib on Jan 29 2021, 5:19 PM.

Details

Summary

In order to facilitate the writting of Scripted Processes, this patch
introduces a ScriptedProcess python base class in the lldb module.

The base class holds the python interface with all the - abstract -
methods that need to be implemented by the inherited class but also some
methods that can be overwritten.

This patch also provides an example of a Scripted Process with the
MyScriptedProcess class.

rdar://65508855

Signed-off-by: Med Ismail Bennani <medismail.bennani@gmail.com>

Diff Detail

Event Timeline

mib requested review of this revision.Jan 29 2021, 5:19 PM
mib created this revision.
mib updated this revision to Diff 320250.Jan 29 2021, 5:32 PM

Fix typo in swig wrapper.

Can you add a test for this too? I assume I would look very similar to the example. Maybe have one that checks the base class and one that overrides it with some known values.

lldb/bindings/python/python-scripted-process.swig
1 ↗(On Diff #320250)

Does this have to be a SWIG file? I guess there's nothing really special about this file except that it should end up in the lldb module?

22 ↗(On Diff #320250)

I don't think this adds much. I think it would be more valuable to have pydoc style comments on the methods.

28 ↗(On Diff #320250)

Type annotations are Python 3 only. We've agreed to keep the bindings Python 2 compatible until at least after the 13 release is cut. (https://lists.llvm.org/pipermail/lldb-dev/2020-August/016388.html)

mib updated this revision to Diff 323484.Feb 12 2021, 2:42 PM
mib marked 3 inline comments as done.
mib edited the summary of this revision. (Show Details)

Address @JDevlieghere comments:

  • Instead of having the ScriptedProcess python base class in a SWIG file, I moved it to a python file in the example folder. There is a post-build event that copies it to the python lldb module under to lldb.plugins package.
  • Added pydoc documentation to the ScriptedProcess base class.
  • Removed all the typing hints that prevented the feature to be compatible with Python 2. I, however, kept them in the MyScriptProcess example.

I also added a first test that checks that the base class exists in the lldb python module.

Unfortunately I don't have the bandwidth to review this, though I feel that this should have a wider discussion, given that its destined to become a pretty big chunk of our public immutable api surface.
Some questions to seed that discussion:

get_num_memory_regions() -> int:
get_memory_region_at_index(idx: int) -> lldb.SBMemoryRegionInfo:

This means that the implementation needs to know the exact count of memory regions at any given moment. Elsewhere, we've handled this by having a single api like get_memory_region_containing_address. This permits (but doesn't force) the implementation to be lazy in calculating the memory regions, and the caller can still enumerate the memory regions by starting with get_memory_region_containing_address(0) and continuing by get_memory_region_containing_address(prev_region.base + prev_region+size). I think we should do the same here.

get_num_threads() -> int:
get_thread_at_index(idx: int) -> Dict:

This suffers from the same problem, though the solution is not that simple. But given that, not too long ago, we've (me&Jim) had a long discussion about how materializing all the threads all the time is prohibitively expensive, it might be good to elaborate on how exactly is this supposed to work (must the implementation always return all threads, or can it have threads disappear temporarily, etc.)

get_register_for_thread(tid:int) -> Dict:

I guess this should be at least get_registers_for_thread

read_memory_at_address(addr:int) -> lldb.SBData:

How much memory is this supposed to read?

can_debug() -> bool:

What's the use case for this?

mib added a comment.Feb 16 2021, 2:48 AM

Unfortunately I don't have the bandwidth to review this, though I feel that this should have a wider discussion, given that its destined to become a pretty big chunk of our public immutable api surface.

Thanks for taking the time to peek at this :) !

Some questions to seed that discussion:

get_num_memory_regions() -> int:
get_memory_region_at_index(idx: int) -> lldb.SBMemoryRegionInfo:

This means that the implementation needs to know the exact count of memory regions at any given moment. Elsewhere, we've handled this by having a single api like get_memory_region_containing_address. This permits (but doesn't force) the implementation to be lazy in calculating the memory regions, and the caller can still enumerate the memory regions by starting with get_memory_region_containing_address(0) and continuing by get_memory_region_containing_address(prev_region.base + prev_region+size). I think we should do the same here.

get_num_threads() -> int:
get_thread_at_index(idx: int) -> Dict:

This suffers from the same problem, though the solution is not that simple. But given that, not too long ago, we've (me&Jim) had a long discussion about how materializing all the threads all the time is prohibitively expensive, it might be good to elaborate on how exactly is this supposed to work (must the implementation always return all threads, or can it have threads disappear temporarily, etc.)

Sounds good! Regarding the threads, after talking to @jingham and @jasonmolenda, it sounds like XNU doesn't have a way to provide all its kernel threads at a single time, only the one loaded in each CPU core.
I changed the python interface to allow lazy loading, using the address to fetch a memory region or a thread ID to fetch a certain thread.

get_register_for_thread(tid:int) -> Dict:

I guess this should be at least get_registers_for_thread

Typo :p !

read_memory_at_address(addr:int) -> lldb.SBData:

How much memory is this supposed to read?

Right ... Originally, I didn't want to limit the users on how much data they should store in the SBData, that's why I didn't provided the size and checked the SBData size in the C++ interface.
In hindsight, this was a wrong design.

can_debug() -> bool:

What's the use case for this?

Indeed, that didn't serve much purpose. Got rid of it :)

mib updated this revision to Diff 323940.Feb 16 2021, 2:58 AM

Changed python API to allow lazy fetching for threads and memory regions.

JDevlieghere requested changes to this revision.Feb 16 2021, 10:49 AM

Unfortunately I don't have the bandwidth to review this, though I feel that this should have a wider discussion, given that its destined to become a pretty big chunk of our public immutable api surface.

When @mib and I discussed this offline, the idea was to start with an interface that matches the C++ class and iterate on it as we gain more experience through different use cases. Initially those uses cases consist of core file debugging with a scripted process and using it to write tests for things that have historically hard to test (@jasonmolenda has a better idea of what these are). Long term the scriptable process plugin would be like the OS plugins on steroids and would probably superseed them. I expect the XNU people (who are currently the main users of the OS plugins) to have valuable input on the interface as well. Until then the interface shouldn't be considered stable.

@mib Can you please add a disclaimed to the base class saying that the interface is subject to change?

This revision now requires changes to proceed.Feb 16 2021, 10:49 AM
mib updated this revision to Diff 324072.Feb 16 2021, 12:00 PM

Add disclaimer in the base class docstring saying that the interface is not stable.

JDevlieghere added inline comments.Feb 17 2021, 10:56 AM
lldb/examples/python/scripted_process/scripted_process.py
19

Doesn't the python documentation generator infer the attributes and methods from this file? If not, is it worth having to keep it in sync?

68–72

Personally I don't think these belongs in the base class:

  1. The corresponding methods don't return these lists.
  2. More importantly, if they're attributes then that becomes part of the interface. Now every implementation needs to populate e.g. the threads list while they might want to compute that lazily, like some of the OS plugin implementations do.
lldb/test/API/functionalities/scripted_process/TestScriptedProcess.py
28

None of the expect calls seems to be checking that the package exists and can be imported?

34

It's not clear to me what this is actually testing. The same is true for the other expect calls below. Maybe a comment or add a failure string to the expect calls.

mib updated this revision to Diff 324562.Feb 18 2021, 1:16 AM
mib marked 3 inline comments as done.

Address @JDevlieghere comments.

This LGTM. @labath are you fine with landing this and incrementally improving the interface?

mib marked an inline comment as done.Feb 18 2021, 4:35 PM
mib updated this revision to Diff 324809.Feb 18 2021, 4:37 PM

Remove repetitive documentation.

mib added a comment.Feb 22 2021, 1:39 PM

This LGTM. @labath are you fine with landing this and incrementally improving the interface?

@labath Any objection on the patches ? We would like to land these as soon as possible.

JDevlieghere accepted this revision.Mar 1 2021, 12:08 PM
This revision is now accepted and ready to land.Mar 1 2021, 12:08 PM
mib added a comment.Mar 1 2021, 12:13 PM

Hey @labath! I'm going to land this patch but I'm happy to iterate over it if you have more feedbacks.

Unfortunately I don't have the bandwidth to review this, though I feel that this should have a wider discussion, given that its destined to become a pretty big chunk of our public immutable api surface.

When @mib and I discussed this offline, the idea was to start with an interface that matches the C++ class and iterate on it as we gain more experience through different use cases. Initially those uses cases consist of core file debugging with a scripted process and using it to write tests for things that have historically hard to test (@jasonmolenda has a better idea of what these are). Long term the scriptable process plugin would be like the OS plugins on steroids and would probably superseed them. I expect the XNU people (who are currently the main users of the OS plugins) to have valuable input on the interface as well. Until then the interface shouldn't be considered stable.

@mib Can you please add a disclaimed to the base class saying that the interface is subject to change?

Making the api flexible alleviates a lot of my concerns. Thanks.