
[lldb-server] Add setting to force 'g' packet use
Needs Revision · Public

Authored by guiandrade on Jun 5 2019, 1:38 PM.

Details

Summary

Following up on https://reviews.llvm.org/D62221, this change introduces
the setting plugin.process.gdb-remote.try-to-use-g-packet. When it is enabled
and the server supports 'g' packets, those will be used regardless
of whether 'p' packets are also supported.

Using 'g' packets can improve performance by reducing the number of packets
exchanged between client and server when a large number of registers
need to be fetched.
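
For readers skimming this, a rough sketch of the difference (packet framing and checksums omitted; register numbers and values are made up): 'p' fetches one register per round trip, while 'g' returns the whole register file in a single reply.

  p0  ->  0102030405060708
  p1  ->  0000000000000000
  ...one round trip per register, versus:
  g   ->  <hex-encoded contents of every register, concatenated>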

Event Timeline

guiandrade created this revision.Jun 5 2019, 1:38 PM

Personally, I'm wondering whether we even need a setting for this. Running the speed test command locally (i.e. the lowest latency we can possibly get), I get results like:

(lldb) process plugin packet speed-test --count 100000 --max-receive 2048
Testing sending 100000 packets of various sizes:
...
qSpeedTest(send=      4, recv=      0) in 0.788656771 s for 126797.88 packets/s (0.007886 ms per packet) with standard deviation of 0.001604 ms
qSpeedTest(send=      4, recv=      4) in 0.770005465 s for 129869.21 packets/s (0.007700 ms per packet) with standard deviation of 0.002120 ms
qSpeedTest(send=      4, recv=      8) in 0.895460367 s for 111674.40 packets/s (0.008954 ms per packet) with standard deviation of 0.002048 ms
qSpeedTest(send=      4, recv=     16) in 0.910367966 s for 109845.70 packets/s (0.009103 ms per packet) with standard deviation of 0.001886 ms
qSpeedTest(send=      4, recv=     32) in 0.889442623 s for 112429.96 packets/s (0.008894 ms per packet) with standard deviation of 0.001705 ms
qSpeedTest(send=      4, recv=     64) in 0.945124865 s for 105806.12 packets/s (0.009451 ms per packet) with standard deviation of 0.002349 ms
qSpeedTest(send=      4, recv=    128) in 0.759995580 s for 131579.72 packets/s (0.007599 ms per packet) with standard deviation of 0.002971 ms
qSpeedTest(send=      4, recv=    256) in 0.847535312 s for 117989.19 packets/s (0.008475 ms per packet) with standard deviation of 0.002629 ms
qSpeedTest(send=      4, recv=    512) in 1.022377729 s for  97811.21 packets/s (0.010223 ms per packet) with standard deviation of 0.003248 ms
qSpeedTest(send=      4, recv=   1024) in 1.436389208 s for  69619.02 packets/s (0.014363 ms per packet) with standard deviation of 0.000688 ms
qSpeedTest(send=      4, recv=   2048) in 1.910557270 s for  52340.75 packets/s (0.019105 ms per packet) with standard deviation of 0.001194 ms
...

Now, if we assume that the p response is about 16 bytes long and the g response is about 2048, we find that one g packet costs approximately as much as two p packets. And this is a local link, where it pretty much doesn't matter what we do, as we're pumping 50k packets per second even in the slowest case. On links with non-negligible latency, the difference between the two packets should be even smaller (though it might be nice to verify that).
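
To spell out the arithmetic from the table above: a 16-byte reply costs about 0.009103 ms per packet and a 2048-byte reply about 0.019105 ms, a ratio of roughly 2.1, which is where the "one g is worth about two p packets" estimate comes from.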

Combining this with the fact that we normally expedite all general purpose registers (and so there won't be *any* packets if one is only reading those), I would say that enabling this unconditionally is a pretty safe thing to do.
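
For context, "expedite" here refers to the stop-reply packet, which already carries the values of selected registers, so the client needs no read packets at all for those. A sketch, with the thread id and register numbers/values invented:

  T05thread:1c8f;06:10e0ffffff7f0000;07:a0dfffffff7f0000;10:c02a2b2c2d000000;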

Greg, Jason, what do you think?

There are so many different GDB remote servers that it is hard to say what will work for most people. It might be worth testing this with the setting set to true and false on a complex program single step on mac, linux, and android, and verifying that stepping speeds don't regress at all. Android tends to have hundreds of threads, which usually makes it the slowest of all when stepping. If we don't see regressions, then I am fine with no setting. Every scenario I know of (macOS, watchOS, linux, android) benefits from fewer packets with larger content, so I am fine just enabling it. It might also be good to check what GDB does by default; if they ran into issues, their comments might tell us why they don't use it.

So I am ok with no setting as long as perf doesn't regress on complex steps and as long as GDB uses 'g' packets by default.

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
117

I would change the setting to just be "use-g-packet".

> Might be worth testing this with the setting set to true and false on a complex program single step on mac, linux and android and verifying that stepping speeds don't regress at all. [...]

How can we run those tests? Is there an automated way to do that?

> So I am ok with no setting as long as perf doesn't regress on complex steps and as long as GDB uses 'g' packets by default.

I think they first try to use 'g' packets - https://github.com/bminor/binutils-gdb/blob/4fa0265edea0940b865678d93749e224547dd36a/gdb/remote.c#L8130 (I'm not familiar with that code at all, so I may be wrong)

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
1707

note to self: fix wrong method call

1823

note to self: fix wrong method call

Just checking the status of this change. Are we still considering enabling 'g' packets unconditionally, or are we better off using the setting for safety?

labath added a comment.Jul 1 2019, 6:26 AM

I think it will be pretty hard for a single person to gather all of these stats for all targets. So perhaps the best solution is to have the setting for the g packet, defaulting to true. After a while, if nobody notices a regression, we can just remove it...

guiandrade updated this revision to Diff 208838.Jul 9 2019, 4:26 PM
guiandrade marked 3 inline comments as done.

This rebases the change, renames *try_to_use_g_packet* to *use_g_packet*, defaults the setting to true, and modifies the condition of the read_all_registers_at_once bool in ThreadGDBRemote::CreateRegisterContextForFrame to guarantee that the server supports the 'g' packet.
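
With this diff applied, the behavior could be toggled with something like the following (setting name derived from the rename above; worth double-checking against the registered property):

  (lldb) settings set plugin.process.gdb-remote.use-g-packet false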

guiandrade added inline comments.Jul 9 2019, 4:28 PM
lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
159

Should we default to true here as well?

clayborg added inline comments.Jul 10 2019, 11:38 AM
lldb/source/Plugins/Process/gdb-remote/ThreadGDBRemote.h
103 ↗(On Diff #208838)

Not a big deal, but it might be easier to store this in ProcessGDBRemote and access it there?

guiandrade marked an inline comment as done.Jul 10 2019, 2:03 PM

Sorry for the delay. The patch looks fine to me. As far as testing goes, there's a "gdb-client" test suite in test/testcases/functionalities/gdb_remote_client/, which should allow you to mock gdb-server responses and verify that the right packets are being sent.
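
To make that concrete, here is a rough, untested sketch of such a test, assuming the gdbclientutils helpers that suite provides (MockGDBServerResponder, GDBRemoteTestBase, connect, assertPacketLogContains); the class name, hook signatures, and register payloads are illustrative and should be checked against the framework:

from gdbclientutils import *

class TestUseGPacket(GDBRemoteTestBase):
    def test_reads_registers_with_g(self):
        class MyResponder(MockGDBServerResponder):
            def readRegisters(self):
                # Reply to a 'g' packet: every register, hex-encoded
                # (payload is illustrative, not a real register block).
                return "20" * 16

            def readRegister(self, regnum):
                # Reply to a 'p' packet; with g-packet use forced on,
                # the client should not need to send any of these.
                return "2000000000000000"

        self.server.responder = MyResponder()
        target = self.dbg.CreateTarget("")
        process = self.connect(target)
        # Force a register read, then check which packet was used.
        process.GetThreadAtIndex(0).GetFrameAtIndex(0).GetRegisters()
        self.assertPacketLogContains(["g"])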

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
118

I'd add a trailing comma here to avoid changing this line next time we add a setting.

guiandrade marked an inline comment as done.

Adds a test to make sure the client is using 'g' by default.

labath accepted this revision.Thu, Jul 25, 11:37 PM

LGTM. Thanks.

This revision is now accepted and ready to land.Thu, Jul 25, 11:37 PM

Thanks, labath@

Could anyone please merge this for me?

Unfortunately this patch has raced with the latest tablegen changes, and it no longer applies cleanly. Could you please rebase on the current trunk?

Adding support for tablegen-generated properties introduced in https://reviews.llvm.org/D65185

labath requested changes to this revision.Tue, Jul 30, 2:53 AM

Hm... this patch spectacularly fails for me with about 100 failures. Are you sure you ran the test suite with this change?

I took a brief look at x86-64-ymm-write.test as it was easiest to debug. It looks like the test fails with: error: Failed to write register 'ymm0' with value '{0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f}'

After enabling the gdb-remote log, I see that lldb now attempts to implement the "register write" command via the G packet, and lldb-server does not support that. If we want this new setting to control register writing as well, then I guess we'll need to add appropriate support in lldb-server. However, in that case we might also need to fix register writing in lldb in this scenario, as I did not see lldb issuing a g command before G, which leads me to believe that some of the G data was garbage.

Due to the potential need to issue a g packet before we write a single register (i.e., issue two large packets instead of a single small packet), I don't think we should tie the usage of g and G together. I haven't looked at what it would take to implement that though...
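
To make the concern concrete (payloads invented): writing one register via 'P' is a single small packet, while routing the write through 'G' forces a read-modify-write of the entire register block:

  P10=0102030405060708                       (write one register)

  g   ->  <entire register block>            (read everything first...)
  G<entire block with one field changed>     (...then write everything back)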

This revision now requires changes to proceed.Tue, Jul 30, 2:53 AM

Oh, sorry about that. I was relying on ninja check, which runs okay for me.

> ninja check
[0/1] Running the LLVM regression tests
Testing Time: 583.96s
  Expected Passes    : 32141
  Expected Failures  : 147
  Unsupported Tests  : 438

How can I invoke those other tests? (in case it's relevant, here's my cmake command cmake ../llvm -G Ninja -DLLVM_ENABLE_PROJECTS='lldb;clang;libcxx')

About not tying G and g together, maybe we could remove this branch?

> How can I invoke those other tests? (in case it's relevant, here's my cmake command cmake ../llvm -G Ninja -DLLVM_ENABLE_PROJECTS='lldb;clang;libcxx')

LLDB tests can be run with the "check-lldb" target. There's also "check-all" which would run *all* tests, but that's usually overkill.
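
With the ninja generator from your cmake command, that would be:

> ninja check-lldb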

> About not tying G and g together, maybe we could remove this branch?

That would be fine for lldb-server, but I am not sure how that would play with other stubs. As I understand it, the main reason this read-all-at-once logic was introduced was to handle stubs which did not implement the p/P packets, so we may not be able to just start sending P unconditionally. @clayborg or @jasonmolenda can probably say more about that...