This is an archive of the discontinued LLVM Phabricator instance.

Add jThreadsInfo support to lldb-server
ClosedPublic

Authored by labath on Jul 14 2015, 7:35 AM.

Details

Summary

This commit adds initial support for the jThreadsInfo packet to lldb-server. The current
implementation does not expedite inferior memory. It also does not send over all GPR, but only
the most important ones (PC, SP, FP, RA), as for some reasons writing the registers is incredibly
slow in lldb-server (I intend to investigate this further). Nevertheless, this implementation
speeds up stepping of a heavily multi-threaded inferior significantly (about 50%).

I have also added a description of the new packet to our protocol documentation (mostly taken
from Greg's earlier commit message).

Diff Detail

Repository
rL LLVM

Event Timeline

labath updated this revision to Diff 29676.Jul 14 2015, 7:35 AM
labath retitled this revision from to Add jThreadsInfo support to lldb-server.
labath updated this object.
labath added reviewers: clayborg, ovyalov, tberghammer.
labath added a subscriber: lldb-commits.
labath updated this revision to Diff 29677.Jul 14 2015, 8:21 AM
  • Rename a function to reflect it is not (yet) sending out all GPRs
tberghammer edited edge metadata.Jul 14 2015, 8:55 AM

Generally looks good, but please clarify the format of the jThreads packet

docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29677)

What are the format of the register numbers? In the stop replay packets the register number is 2 hex digit, but here it looks like as it is decimal. If they are different then please emphasize it (until we can make it uniform)

source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
473 ↗(On Diff #29677)

(nit): make_shared

506 ↗(On Diff #29677)

(nit): make_shared

2717 ↗(On Diff #29677)

(nit): make_shared

2761 ↗(On Diff #29677)

(nit): make_shared

470–472 ↗(On Diff #29676)

The register numbers have uint32_t types

labath updated this revision to Diff 29679.Jul 14 2015, 9:23 AM
labath marked 5 inline comments as done.
labath edited edge metadata.
  • Address review comments
docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29677)

I have added wording to confirm that they are decimal. This is not consistent with the stop-reply packet, but it is consistent with (most of) our other uses of json, where we write the numbers in decimal format. (JSON does not support hex notation, and the quotes are there because dictionary keys must be strings.) Not ideal, but that's what it is...

tberghammer accepted this revision.Jul 14 2015, 9:36 AM
tberghammer edited edge metadata.

Looks good

docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29679)

I would prefer to use hex numbers to be consistent with the stop reply packet but that change is out of scope for this change. The missing support for hex isn't an issue here because of the quotes.

source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
499–501 ↗(On Diff #29679)

(nit): You can use std::to_string(regindex)

This revision is now accepted and ready to land.Jul 14 2015, 9:36 AM
jasonmolenda added inline comments.
docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29679)

I agree with Tamas, the number-base assumptions in gdb-remote numbers are a nightmare; at this point even decimal can be misconstrued as hex. I'd rather we just make it explicitly hex -- so instead of "14":, "0xe":.

I'm not a huge fan of sending back the registers as a series of bytes in target-native-endian order like this,

+ "5":"d8f8bf5fff7f0000",

I'd push for sending this back as a native JSON number object, so "0x5":140734799804632 but this would cause problems if we try to transfer a 16-byte (or larger) vector register value - lldb's JSON parser is only designed to handle 64-bit integer values, as are most others I would expect - so maybe we have to live with this. I'd argue for transposing to big-endian order but when we send the expedited memory contents we need to send those in target native endian order.

tl;dr: At least let's make the register numbers base 16 and prefixed with "0x".

jasonmolenda added inline comments.Jul 14 2015, 12:20 PM
docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29679)

Thinking about the value part of the register set, we can allow for both types of returns. e.g. both of these could be allowed:

"0xe":"d8f8bf5fff7f0000"
"0xe":140734799804632

If the value is a string, it is a series of hex bytes in target-native-endian order. If the value is an integer, it is an integer.

Once these are in a StructuredData object, lldb can detect whether the value is a string or integer and decode as appropriate.

debugserver could return integers for 64-bits-and-less register values, strings of raw bytes for vector regs.

labath added inline comments.Jul 14 2015, 1:41 PM
docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29679)

I agree with Tamas, the number-base assumptions in gdb-remote numbers are a nightmare; at this point even decimal can be misconstrued as hex. I'd rather we just make it explicitly hex -- so instead of "14":, "0xe":.

I understand the sentiment. We could potentially implement it this way. How do you think this should be handled from a protocol specification language standpoint? I am asking because including a specific base prefix can lead people to believe they can pass any base number they want (decimal, octal?). Or should we say the number MUST be hex and MUST have 0x prefix?

I'd push for sending this back as a native JSON number object, so "0x5":140734799804632 but this would cause problems if we try to transfer a 16-byte (or larger) vector register value - lldb's JSON parser is only designed to handle 64-bit integer values, as are most others I would expect - so maybe we have to live with this. I'd argue for transposing to big-endian order but when we send the expedited memory contents we need to send those in target native endian order.

I would keep the register endianness as is, precisely because of vector registers. Since they contain multiple values, the whole register does not have a specific endianness, only the individual values do. If you now go around reversing the register byte-by-byte, you will produce something very odd. And reversing the individual parts sounds very fragile.

Also, I think the memory read is a completely separate issue. I think of the memory reads as a sequence of bytes -- it doesn't have endianness in the same way that an char[] doesn't have endianness until you know what is actually stored in there.

If the value is a string, it is a series of hex bytes in target-native-endian order. If the value is an integer, it is an integer.

I think increasing the number of formats will only add more confusion here. Sticking with one format, whatever it may be, is a much better option in my view.

tberghammer added inline comments.Jul 14 2015, 2:16 PM
docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29679)

My primary opinion is we should keep the format as consistent with the stop reply packet as possible so both information can be treated just as a list of "key:value" pairs with the same list of keys. From the readers perspective it isn't trivial to understand but I prefer consistency over local readability.

I don't like the idea of sending the register values as a number because if somebody just reads the packet then a decimal number usually don't mean anything and it isn't easier to parse it in the code. For me it is quite natural that we send the register values as hex data without the "0x" prefix but probably because we do it in LLDB quite a lot.

If we are changing the format of this packet I would like to see 2 more changes.

  • Instead of a list of thread infos send back an object where the keys are the thread ids (as string) and the value is the current data content (without thread id)
  • Move out the "memory" section from the thread specific object to the top level scope (e.g. to the same level where we have the tids as keys if we make the other change) because that info is valid at process level. It might also help in decreasing the amount of memory we are transferring with eliminating some redundancy.
ovyalov added inline comments.Jul 14 2015, 7:56 PM
source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
470 ↗(On Diff #29679)

s/constexpr/const for compatibility with Visual Studio?

471 ↗(On Diff #29679)

What do you think about having #ifdef/#else branching to cover both of them - full set of registers and pruned?
I'd rather keep new functionality and optimizations in separate CLs - for example, if later, down the road we'll realize that have to send a full set of registers then only a CL that enables pruned set of registers should be reverted.

521 ↗(On Diff #29679)

No need to have 'break' here.

labath added inline comments.Jul 15 2015, 10:35 AM
docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29679)

The two changes seem reasonable, and in fact I was considering making them along with some other changes, but I would not want to make them a part of this patch. I propose to keep the packet as is, and I will float an email to lldb-dev later with more thoughts.

source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
470 ↗(On Diff #29679)

I was not aware of this limitation. Will do.

471 ↗(On Diff #29679)

Sure, sounds reasonable. I'll put the full register set in this patch, and I will submit a new one with the reduced set shortly.

labath updated this revision to Diff 29798.Jul 15 2015, 11:00 AM
labath marked 2 inline comments as done.
labath edited edge metadata.
  • Address review comments
  • Send full register set.
clayborg accepted this revision.Jul 15 2015, 11:43 AM
clayborg edited edge metadata.

Very nice. Glad to see it speeds things up as well!

ovyalov accepted this revision.Jul 15 2015, 1:30 PM
ovyalov edited edge metadata.

LGTM

tberghammer added inline comments.Jul 16 2015, 3:04 AM
docs/lldb-gdb-remote.txt
1555–1575 ↗(On Diff #29798)

I agree that it shouldn't be part of this patch

This revision was automatically updated to reflect the committed changes.