This is an archive of the discontinued LLVM Phabricator instance.

[lldb/gdb-remote] Add support for the qOffsets packet
ClosedPublic

Authored by labath on Feb 14 2020, 2:39 AM.

Details

Summary

This packet is necessary to make lldb work with the remote-gdb stub in
user mode qemu when running position-independent binaries. It reports
the relative position (load bias) of the loaded executable wrt. the
addresses in the file itself.

Lldb needs to know this information in order to correctly set the load
address of the executable. Normally, lldb would be able to find this out
on its own by following the breadcrumbs in the process auxiliary vector,
but we can't do this here because qemu does not support the
qXfer:auxv:read packet.

This patch does not implement full scope of the qOffsets packet (which
supports having different biases for code, data and bss sections). This
is because the relevant lldb interfaces (e.g., Module::SetLoadAddress)
do not support passing different values for different sections, and it's
not clear how would such a thing apply to typicall object files. And
qemu will always
(https://github.com/qemu/qemu/blob/master/linux-user/elfload.c#L2436)
return the same value for code and data offsets. In fact, even gdb
ignores the offset for the bss sections, and uses the "data" offset
instead. So, until the we need more of this packet, I think it's best
to stick to the simplest solution possible. This patch simply rejects
replies with non-uniform offsets.

Diff Detail

Event Timeline

labath created this revision.Feb 14 2020, 2:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 14 2020, 2:39 AM

This patch does not implement full scope of the qOffsets packet (which supports having different biases for code, data and bss sections). This is because the relevant lldb interfaces (e.g., Module::SetLoadAddress) do not support passing different values for different sections, and it's not clear how would such a thing apply to typical object files.

While Module::SetLoadAddress doesn't, the Target interfaces do in fact support this fully. You specify the section and the section load address. Look at the macOS dynamic loader, it does this. All shared libraries in the macOS dyld shared cache have all the the TEXT segments put next to each other so that the OS can load one large read + execute mapping, all the modifiable data sections in DATA are relocated as well along with other segments. The Module::SetLoadAddress() interface was put in Module to make it easy to slide full binaries.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp
3559

Too a bit to understand what this was doing. Might be worth a comment here or in the header that says something like:

We only handle offsets where all sections in a file are offset by the same amount. If any sections is slid by a different amount, we return no value.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.h
444

Might be nice to document that we only handle when a binary has all its sections slid by the same amount somewhere.

445

It would be nice to have two functions here:
1 - "llvm::SmallVector<llvm::StringRef> GetQOffsets();" which will return raw qOffsets calls results
2 - "llvm::Optional<lldb::addr_t> GetExecutableOffset();" which calls above function and then does it thing where it tries to get a single rigid slide for the entire executable binary or fails.

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
1116

We could fall back eventually to try and parse the full results of qOffsets directly here and try to slide. Doesn't need to be done now, but a TODO comment might be nice.

labath updated this revision to Diff 246196.Feb 24 2020, 6:29 AM
labath marked 4 inline comments as done.

Add some comments and documentation.

Thanks for the quick response, and sorry it took me a while to get back to this.

While Module::SetLoadAddress doesn't, the Target interfaces do in fact support this fully. You specify the section and the section load address. Look at the macOS dynamic loader, it does this. All shared libraries in the macOS dyld shared cache have all the the TEXT segments put next to each other so that the OS can load one large read + execute mapping, all the modifiable data sections in DATA are relocated as well along with other segments. The Module::SetLoadAddress() interface was put in Module to make it easy to slide full binaries.

Ah, thanks for explaing that. My understanding was that Module::SetLoadAddress was the "main" interface, and that the other functions were just there to support it. Having Module::SetLoadAddress be a utility function on top of those makes perfect sense now.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.h
445

I don't think that is really worth it -- supporting the full qOffsets packet is is somewhat tricky, and if anyone sets out to do that, the lack of a preexisting query function will be the least of his concerns.

The main thing I found tricky is knowing what exactly should the individual offset fields apply to. The packet response doesn't just list sections or anything. It specifically prescribes three fields: "Text", "Data", and "Bss". Looking at gdb sources, I got the impression that it maps "Text" to the ".text" section, but that seems too narrow to me -- I would expect that it should apply to all executable sections, but as I don't know of any system which would make use of this option, I cannot validate my hypothesis.

The second complication comes from the supposed difference in handling of "Text" vs "TextSeg" field. To implement that absolutely correctly, I'd need to know whether the object file uses segments or not. But knowing that is again tricky because our ObjectFile interface abstracts those differences away. I could again make some guesses at what would be a reasonable behavior, but I also could not check that this is right.

So I think it's better to leave this to someone who actually needs this behavior.

clayborg added inline comments.Feb 24 2020, 10:40 AM
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.h
445

So I am not advocating supporting the full QOffsets packet at all. Just splitting up the contents of GetExecutableOffset so that GetQOffsets() grabs the exact output and chops it up. No need to implement any true functionality. So the code would be identical to what you have, but the GetQOffsets() will just return the array of strings that contains the chopped up return value. Since this class is GDBRemoteCommunicationClient, and if someone new comes in and wants the results of QOffsets to do something, the code is already there. Right now is implemented, but hidden inside GetExecutableOffset. So no functionality, just have GetQOffsets() return the data from the packet. Let me know what you think. Not a blocker, I would just prefer that any packet that GDB remote protocol handles has a raw access function.

labath updated this revision to Diff 246397.Feb 25 2020, 2:40 AM

Fully parse the qOffsets response

labath marked an inline comment as done.Feb 25 2020, 2:49 AM
labath added inline comments.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationClient.h
445

Ok, I see what you mean. The new patch now has a function which extracts all the data from the qOffsets response. I was trying to avoid doing that (I could take some shortcuts if I assumed all offsets have to be same), but overall difference is not that big..

labath updated this revision to Diff 246401.Feb 25 2020, 2:54 AM

Update the unit test for the function rename (GetExecutableOffset->GetQOffsets)

clayborg accepted this revision.Feb 25 2020, 11:00 AM

Thanks for doing the changes, looks good.

This revision is now accepted and ready to land.Feb 25 2020, 11:00 AM
This revision was automatically updated to reflect the committed changes.