This is an archive of the discontinued LLVM Phabricator instance.

Implement GetMemoryRegions() for Linux and Mac OSX core files.
ClosedPublic

Authored by hhellyer on Jun 27 2016, 6:45 AM.

Details

Summary

This patch fills in the implementation of GetMemoryRegions() on the Linux and Mac OS core file implementations of lldb_private::Process (ProcessElfCore::GetMemoryRegions and ProcessMachCore::GetMemoryRegions.) The GetMemoryRegions API was added under: http://reviews.llvm.org/D20565

The patch re-uses the m_core_range_infos list that was recently added to implement GetMemoryRegionInfo in both ProcessElfCore and ProcessMachCore to ensure the returned regions match the regions returned by Process::GetMemoryRegionInfo(addr_t load_addr, MemoryRegionInfo &region_info).

Diff Detail

Event Timeline

hhellyer updated this revision to Diff 61951.Jun 27 2016, 6:45 AM
hhellyer retitled this revision from to Implement GetMemoryRegions() for Linux and Mac OSX core files..
hhellyer updated this object.
hhellyer added a reviewer: clayborg.
hhellyer added a subscriber: lldb-commits.
hhellyer updated this revision to Diff 61960.Jun 27 2016, 6:50 AM

Squashing local commits.

clayborg requested changes to this revision.Jun 27 2016, 10:35 AM
clayborg edited edge metadata.

We shouldn't need to implement this on all process subclasses. Process::GetMemoryRegions(...) could just call the Process::GetMemoryRegionInfo() and fill the stuff in. We would need to modify lldb_private::MemoryRegionInfo to support knowing when no memory exists for a given address. Why? Because Mac programs actually have a __PAGEZERO section that no read/write/execute, but it is actually mapped into memory, but we currently can't tell the difference between something that isn't mapped, and a section that is mapped with no read/write/exe. If we fix this, then lldb_private::Process subclasses only need to override Process::GetMemoryRegionInfo(...) and we can do the rest up in Process::GetMemoryRegions().

This revision now requires changes to proceed.Jun 27 2016, 10:35 AM

This of course would rely on the fact that each Process subclass the overrides GetMemoryRegionInfo() will follow the following rules:
1 - if no memory is mapped for a given address, set an extra "not_mapped" variable in MemoryRegionInfo and provide the next address that is mapped by setting the end address of the current MemoryRegionInfo correctly.
2 - set values correctly for mapped regions with the correct permissions
3 - For the last entry that is past all mapped entries, set the extra "not_mapped" and set the end address as LLDB_INVALID_ADDRESS

Since there’s a list of memory regions already created in the Process implementations for the core fles I thought it made sense just to use that directly, at least when reading from a core file. It meant there was no requirement to change those GetMemoryRegionInfo implementations if they didn’t behave quite the same. That said it is roughly what I planned to do for the live connections as the behaviour of the qMemoryRegionInfo query is more tightly specified and moving to that model would just push that implementation one layer higher.

I will push a revision which follows the approach you suggest, if we find that throws up it’s own problems then I can always revert. As I mentioned the remote connections should (hopefully) all work with that pattern so GetMemoryRegions implemented in fewer commits with that approach.

Is PAGEZERO actually included in Mac core dumps? I can see it as load command 0 when I run otool against an executable but when I look at a core dump from the same executable the first load command looks like load command 1 in the executable. There doesn’t seem to be a load command in the core for address 0. I might just be missing an flag on otool though.

Since there’s a list of memory regions already created in the Process implementations for the core fles I thought it made sense just to use that directly, at least when reading from a core file. It meant there was no requirement to change those GetMemoryRegionInfo implementations if they didn’t behave quite the same. That said it is roughly what I planned to do for the live connections as the behaviour of the qMemoryRegionInfo query is more tightly specified and moving to that model would just push that implementation one layer higher.

We should document the Process::GetMemoryRegionInfo() and what we expect of it so that all plug-ins implement it correctly.

I will push a revision which follows the approach you suggest, if we find that throws up it’s own problems then I can always revert. As I mentioned the remote connections should (hopefully) all work with that pattern so GetMemoryRegions implemented in fewer commits with that approach.

Indeed. Since we already expect a behavior from Process::GetMemoryRegionInfo(), I would say lets enforce these requirements by making Process::GetMemoryRegions() just use that API as required. The only time we should see an error from Process::GetMemoryRegionInfo() is when it isn't implemented.

Is PAGEZERO actually included in Mac core dumps? I can see it as load command 0 when I run otool against an executable but when I look at a core dump from the same executable the first load command looks like load command 1 in the executable. There doesn’t seem to be a load command in the core for address 0. I might just be missing an flag on otool though.

It might not be, but that brings up another thing I was thinking of is that we could start supplementing the Process::GetMemoryRegionInfo() info by also looking in any sections that were found in core file memory that might not be fully described by the core file itself. There is no requirements that if two sections are adjacent in a core file that they show up as different mapped regions, so you might find a mapped section of memory from 0x1000-0x8000 that is actually 0x1000-0x4000 for ".data" and 0x4000-0x8000 that is ".bss". The memory region was mapped read+write, but would it be nice to know about these different regions separately? We don't need to do this now, but this is one place where we could figure out that 0x0-0x100000000 is __PAGEZERO even though it isn't mapped into the current mach-o file's memory regions. Also there are different makers of core files: lldb, the kernel, possibly others. They don't always follow the same rules when making mach-o core files. So it is nice to be flexible.

labath added a subscriber: labath.Jun 29 2016, 4:51 AM

Please also add a test for this in TestLinuxCore.py.

There shouldn't be a need to add a new core file, you can just add an additional check to the do_test method that verifies the right memory regions (which you can pass in the same way as the pid). You can pass in bogus (empty) values for the i386 and s390x test cases, as they are not working now anyway.

We should document the Process::GetMemoryRegionInfo() and what we expect of it so that all plug-ins implement it correctly.

I'll add a full comment in Process.h and I'll make sure the description matches the behaviour of qMemoryRegionInfo.

I will push a revision which follows the approach you suggest, if we find that throws up it’s own problems then I can always revert. As I mentioned the remote connections should (hopefully) all work with that pattern so GetMemoryRegions implemented in fewer commits with that approach.

Indeed. Since we already expect a behavior from Process::GetMemoryRegionInfo(), I would say lets enforce these requirements by making Process::GetMemoryRegions() just use that API as required. The only time we should see an error from Process::GetMemoryRegionInfo() is when it isn't implemented.

Ok, I’ve looked at the implementation on a couple of Mac and Linux cores, at the moment it will return an error if the function is unimplemented or you request a load_addr past the last mapped memory range. After the change it will only return an error when unimplemented. As GetMemoryRegionInfo (on Mac and Linux cores) doesn't return an error if you ask for unmapped memory ranges in the middle of the address space this is probably more consistent as the MemoryRange will always be populated for any address that’s queried, there's nothing magic about addresses beyond the last mapped region it should behave the same for those. I’ll go through and check the behaviour elsewhere and update the patch. I will have to update the existing implementations to set the flag that says whether a memory region is mapped or not but that doesn't look complicated.

Is PAGEZERO actually included in Mac core dumps? I can see it as load command 0 when I run otool against an executable but when I look at a core dump from the same executable the first load command looks like load command 1 in the executable. There doesn’t seem to be a load command in the core for address 0. I might just be missing an flag on otool though.

It might not be, but that brings up another thing I was thinking of is that we could start supplementing the Process::GetMemoryRegionInfo() info by also looking in any sections that were found in core file memory that might not be fully described by the core file itself. There is no requirements that if two sections are adjacent in a core file that they show up as different mapped regions, so you might find a mapped section of memory from 0x1000-0x8000 that is actually 0x1000-0x4000 for ".data" and 0x4000-0x8000 that is ".bss". The memory region was mapped read+write, but would it be nice to know about these different regions separately? We don't need to do this now, but this is one place where we could figure out that 0x0-0x100000000 is __PAGEZERO even though it isn't mapped into the current mach-o file's memory regions. Also there are different makers of core files: lldb, the kernel, possibly others. They don't always follow the same rules when making mach-o core files. So it is nice to be flexible.

Another possibility, instead of splitting regions, is providing a way to iterate over all the sections a region contains. I'm not sure which is better. Do you want me to add a "backed" flag, possibly just for cores, that identifies whether there is any data for this region actually in the core dump. (For example a core might be truncated because of a ulimit or have omitted regions because of /proc/[pid]/coredump_filter settings on Linux.)

Please also add a test for this in TestLinuxCore.py.

There shouldn't be a need to add a new core file, you can just add an additional check to the do_test method that verifies the right memory regions (which you can pass in the same way as the pid). You can pass in bogus (empty) values for the i386 and s390x test cases, as they are not working now anyway.

Ok, I'll add that into this patch unless you prefer test updates as separate commits.

hhellyer updated this revision to Diff 62675.Jul 4 2016, 6:31 AM
hhellyer edited edge metadata.

I’ve added a test in TestLinuxCore.py. It found a bug which the change in RangeMap.h resolves.

It fills in the implementation of Process::GetMemoryRegions() and makes minor changes to support the mapped attribute in MemoryRegionInfo to the Mac and Linux process implementations. I also updated the documentation for Process::GetMemoryRegionInfo.

Windows will require more changes to support the API correctly, I’m currently getting an environment setup where I can work on those.

clayborg requested changes to this revision.Jul 5 2016, 11:39 AM
clayborg edited edge metadata.

It would be nice to export the is_mapped through the SBMemoryRegionInfo in the public API for completeness.

This revision now requires changes to proceed.Jul 5 2016, 11:39 AM
hhellyer updated this revision to Diff 62839.Jul 6 2016, 3:16 AM
hhellyer edited edge metadata.

Updated patch to export IsMapped in the C++ and Python APIs.
Updated the test case to verify all the regions returned in the list are mapped.

clayborg accepted this revision.Jul 6 2016, 10:13 AM
clayborg edited edge metadata.

Looks good.

This revision is now accepted and ready to land.Jul 6 2016, 10:13 AM
hhellyer closed this revision.Jul 7 2016, 1:28 AM