
[lldb] Note difference in vFile:pread/pwrite format for lldb
ClosedPublic

Authored by DavidSpickett on Oct 12 2020, 3:19 AM.

Details

Summary

https://sourceware.org/gdb/current/onlinedocs/gdb/Host-I_002fO-Packets.html

States that all numbers should be hexadecimal, but lldb
uses decimal in vFile:pread and vFile:pwrite.

lldb-server can accept either, since it ends up calling
strtoull, which detects the base being used.

Diff Detail

Event Timeline

DavidSpickett created this revision.Oct 12 2020, 3:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 12 2020, 3:19 AM
DavidSpickett requested review of this revision.Oct 12 2020, 3:19 AM

This is the compatibility issue I mentioned in the other review. I found it while investigating why remotely debugging lldb would work with lldb but not gdb. It turned out a >2GB file was causing that, but I noticed the different number formats in the logs along the way.

Source for the gdb side is binutils-gdb/gdbserver/hostio.cc::handle_pread, which calls require_int, which tries to convert from hex.

lldb-server side is in source/plugins/Process/gdb-remote/GDBRemoteCommunicationServerCommon.cpp::Handle_vFile_pRead
and for lldb in source/plugins/Process/gdb-remote/GDBRemoteCommunicationClient.cpp::ReadFile.

DavidSpickett retitled this revision from [lldb] Note different vFile:pread/pwrite format for lldb to [lldb] Note difference in vFile:pread/pwrite format for lldb.Oct 12 2020, 3:32 AM
labath accepted this revision.Oct 12 2020, 4:30 AM
labath added a subscriber: jasonmolenda.

Ah, interesting. We already know of one incompatibility in vFile packets -- it seems you have found a second one.

Making a note of this is a good idea. The other incompatibility is mentioned in Host/File.h, but mentioning it here is a good idea too. If it's not too much trouble, could you also mention the open flag mismatch issue here?

PS: If you're interested, it would also be very nice to fix these incompatibilities. The biggest trick is to ensure that things still work with old debugserver instances (well, lldb-server too, but we don't care about that as much as debugserver). Fixing the flag issue is tricky and would require adding some explicit negotiation about the format. However, for the hex/decimal encoding the path forward is clearer:

  • make sure consumers accept both encodings (sounds like lldb-server does -- debugserver would need to be checked too)
  • wait a while
  • change lldb client encoding
This revision is now accepted and ready to land.Oct 12 2020, 4:30 AM
DavidSpickett added inline comments.Oct 12 2020, 5:35 AM
lldb/docs/lldb-platform-packets.txt
10

Open flags are mentioned here and in the vFile:open description. Is that the one you mean?

labath added inline comments.Oct 12 2020, 6:45 AM
lldb/docs/lldb-platform-packets.txt
10

Oh goody. I didn't even realize we had that. Everything is fine then.

This revision was landed with ongoing or failed builds.Oct 12 2020, 7:09 AM
This revision was automatically updated to reflect the committed changes.

Thanks for adding the clarifying text. I wrote the doc based on the existing lldb-server implementation of these packets while writing a separate implementation of the platform packets, so I hadn't seen the gdb docs.

Given how often the '0x' prefix on base-16 numbers is dropped in the remote serial protocol, it makes me a little nervous to have a divergence here. I think we should switch lldb / lldb-server to sending base 16 with the 0x prefix, to make it clear what base it is. Even if our sending base 10 works with lldb-server/gdbserver/my platform impl, someone may write another platform implementation that doesn't use strtoul, and we could have an incompatibility.

What do you think?

DavidSpickett added a comment.EditedOct 13 2020, 2:16 AM

Given how often the '0x' prefix on base16 numbers is dropped in the remote serial protocol

Now you mention it, I thought I knew how strtoull worked, but I don't. (I assumed it made some best-effort attempt that would roll back if the digits didn't fit the assumed base.)

This is what a gdb pread looks like:

getpkt ("vFile:pread:6,3fff,9ab473c8");  [no ack sent]

So in fact it is doing exactly what you said and not including the 0x.

What do you think?

Yes, I think sending hex with the 0x prefix is the clearest indicator for implementers of their own lldb-servers. However, gdbserver doesn't expect the leading 0x (I don't see the spec mandating it either way), so it doesn't solve that side, if that is a goal.

So would it make more sense to change lldb to send unprefixed hex like gdb does, but update lldb-server to accept either (by trying base 10, then base 16)? Especially if implementers are going to prefer the packet specification over reading packet logs from lldb.

That's not perfect either: an older lldb-server with a new lldb client could happen to parse the number as base 10, and you'd get an unexpected result. (At least adding 0x would give you a hard error there.) Do we commit to that kind of compatibility?

Edit: this makes my comment in the doc inaccurate; lldb-server accepts either format *if* the hex carries the 0x prefix.

And I'm saying lldb-server rather than debugserver because, if I'm honest, I don't really know what the latter is. I assume this applies to both, though.

So would it make more sense to change lldb to send unprefixed hex like gdb does, but update lldb-server to accept either (by trying base 10, then base 16)?

That's not perfect either: an older lldb-server with a new lldb client could happen to parse the number as base 10, and you'd get an unexpected result. (At least adding 0x would give you a hard error there.) Do we commit to that kind of compatibility?

Hmm... interesting. Given that lldb-server already accepts a 0x encoding, I think that changing the client to send it that way would be a strict improvement (a no-op for lldb-server, and a hard error instead of something random for gdbserver).
Then we might also try to rectify the situation via some steps like:

  • if it's important for a new lldb-server to communicate with an old lldb, then wait a while
  • change lldb-server to parse everything as hex, regardless of the 0x prefix
  • wait a while (I'm guessing that having a new lldb communicate with an old lldb-server *is* important)
  • change lldb to drop the 0x prefix

And I'm saying lldb-server rather than debugserver because, if I'm honest, I don't really know what the latter is. I assume this applies to both, though.

debugserver is an Apple-only implementation of the remote stub. However, I think that may be a red herring, because in lldb land the vFile packets are handled by lldb-server platform, which is used even on Apple platforms (IIUC).

DavidSpickett added a comment.EditedOct 13 2020, 6:57 AM

Sounds like a plan. I've raised https://bugs.llvm.org/show_bug.cgi?id=47820 to track this. Tell me if I misstated anything there; otherwise I'll start on the first part.

debugserver is an Apple-only implementation of the remote stub. However, I think that may be a red herring, because in lldb land the vFile packets are handled by lldb-server platform, which is used even on Apple platforms (IIUC).

Got it, thanks for explaining.

And I'm saying lldb-server rather than debugserver because, if I'm honest, I don't really know what the latter is. I assume this applies to both, though.

debugserver is an Apple-only implementation of the remote stub. However, I think that may be a red herring, because in lldb land the vFile packets are handled by lldb-server platform, which is used even on Apple platforms (IIUC).

On the Darwin systems we're using a standalone "lldb-platform" I wrote a couple of years ago. I had a bit of trouble getting lldb-server to work with the way we communicate between iOS devices and Macs, and was annoyed enough that a simple agent was causing me so much headache that I wrote a freestanding implementation of the platform packets. It doesn't use any llvm/lldb, so it's never been clear whether it made sense to upstream. I wrote it with simplicity as the primary design goal and no thought given to performance, so I'm afraid upstreaming it would result in people improving the performance and reducing the simplicity, or integrating llvm/lldb classes, and that's not really what I wanted here.

(I think there's real value in having a separate implementation of the protocol packets as well; that's why I took the time to write the platform protocol doc after writing the server.)

In this case, my implementation probably does the wrong thing --

int file_fd;
if (ascii_to_i (file_fd_string, 10, file_fd) == false)
    return send_packet (m_fd, "E01");

int offset;
if (ascii_to_i (offset_string, 10, offset) == false)
    return send_packet (m_fd, "E01");

std::string bytes = binary_unescape_data (encoded_contents);

(ascii_to_i calls strtoul with that base parameter; I don't know what strtoul would do with "0x100" and base 10).

FWIW this is our 3rd protocol bug that I know of.

The A packet https://bugs.llvm.org/show_bug.cgi?id=42471 (another base 10 vs base 16 mixup)
The vFile:pread / vFile:pwrite packets https://bugs.llvm.org/show_bug.cgi?id=47820
The vFile:open flags, where lldb uses enum OpenOptions and gdb uses https://sourceware.org/gdb/onlinedocs/gdb/Open-Flags.html#Open-Flags

We've discussed how to transition away from the current buggy behavior with these before; I think Pavel and I have been converging on using the qSupported packet at the start of the connection. e.g. at the start of a session, lldb sends:

qSupported:xmlRegisters=i386,arm,mips,arc

and the remote stub replies:

qXfer:features:read+;PacketSize=20000;qEcho+

( https://www.sourceware.org/gdb/onlinedocs/gdb/General-Query-Packets.html#qSupported )

lldb can add stub features onto the qSupported packet, like

qSupported:xmlRegisters=i386,arm,mips,arc;a-packet-base16;vfilep-base16;vfileopen-flags-gdb-compat

and the response can include a-packet-base16+;vfilep-base16+;vfileopen-flags-gdb-compat+

After some undefined transition period :) we change lldb to assume the correct gdb-compatible behavior.

The biggest hurdle for changing the default is iOS/watchOS devices out in the field: we cannot update the old debugservers that run on them, so we often need lldb to communicate with debugservers that are 4-5 years old (I forget the exact OS matrix for what Xcode supports for debugging, but it goes back a ways).

Just to confirm one quick detail (I couldn't remember for sure and had to check): strtoul will stop parsing when it finds a character outside the range of the specified base. e.g. calling strtoul("0x100", nullptr, 10) will yield 0; strtoul("10f", nullptr, 10) will yield 10.

I think we should revert David's doc change to lldb-platform-packets.txt, which says that either base 10 or base 16 can be used. David, what do you think?

DavidSpickett added a comment.EditedOct 14 2020, 4:14 AM

After some undefined transition period :) we change lldb to assume the correct gdb-compatible behavior.

The biggest hurdle for changing the default is iOS/watchOS devices out in the field: we cannot update the old debugservers that run on them, so we often need lldb to communicate with debugservers that are 4-5 years old (I forget the exact OS matrix for what Xcode supports for debugging, but it goes back a ways).

OK, then I agree the qSupported idea makes the most sense.

Just to confirm one quick detail (I couldn't remember for sure and had to check): strtoul will stop parsing when it finds a character outside the range of the specified base. e.g. calling strtoul("0x100", nullptr, 10) will yield 0; strtoul("10f", nullptr, 10) will yield 10.

Yeah, what I wrote in the doc wasn't correct either. lldb-server ends up using a base of 0:

size_t count = packet.GetU64(SIZE_MAX);              // no base passed by the caller
uint64_t GetU64(uint64_t fail_value, int base = 0);  // so the default base of 0 applies
uint64_t StringExtractor::GetU64(uint64_t fail_value, int base) {
<...>
uint64_t result = ::strtoull(cstr, &end, base);      // base 0: auto-detect the base

So it'll accept decimal, hex *with* 0x, or octal. (Luckily I've never seen anyone zero-pad the numbers; best not to mention that.)

I think we should revert David's doc change to lldb-platform-packets.txt which says that either base10 or base16 can be used. David, what do you think.

I will revert it (https://reviews.llvm.org/D89383). I could correct it instead, but that opens the door to more options and will just make the qSupported idea more complicated. (I'll update the bug I raised too.)

Yeah, using qSupported also works. It just seems like it would be possible to handle this particular issue without adding the generic framework, so I was throwing that idea out there too.