This is an archive of the discontinued LLVM Phabricator instance.

[lldb-server] Add setting to force 'g' packet use
ClosedPublic

Authored by guiandrade on Jun 5 2019, 1:38 PM.

Details

Summary

Following up on https://reviews.llvm.org/D62221, this change introduces
the setting plugin.process.gdb-remote.use-g-packet-for-reading.
When it is on, 'g' packets are used for reading registers.

Using 'g' packets can improve performance by reducing the number of packets
exchanged between client and server when a large number of registers need
to be fetched.
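
For example, the setting can be flipped from the lldb command line (assuming
the standard settings machinery):

(lldb) settings set plugin.process.gdb-remote.use-g-packet-for-reading true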

Event Timeline

guiandrade created this revision.Jun 5 2019, 1:38 PM

Personally, I'm wondering whether we even need a setting for this. Running the speed test command locally (i.e. the lowest latency we can possibly get), I get results like:

(lldb) process plugin packet speed-test --count 100000 --max-receive 2048
Testing sending 100000 packets of various sizes:
...
qSpeedTest(send=      4, recv=      0) in 0.788656771 s for 126797.88 packets/s (0.007886 ms per packet) with standard deviation of 0.001604 ms
qSpeedTest(send=      4, recv=      4) in 0.770005465 s for 129869.21 packets/s (0.007700 ms per packet) with standard deviation of 0.002120 ms
qSpeedTest(send=      4, recv=      8) in 0.895460367 s for 111674.40 packets/s (0.008954 ms per packet) with standard deviation of 0.002048 ms
qSpeedTest(send=      4, recv=     16) in 0.910367966 s for 109845.70 packets/s (0.009103 ms per packet) with standard deviation of 0.001886 ms
qSpeedTest(send=      4, recv=     32) in 0.889442623 s for 112429.96 packets/s (0.008894 ms per packet) with standard deviation of 0.001705 ms
qSpeedTest(send=      4, recv=     64) in 0.945124865 s for 105806.12 packets/s (0.009451 ms per packet) with standard deviation of 0.002349 ms
qSpeedTest(send=      4, recv=    128) in 0.759995580 s for 131579.72 packets/s (0.007599 ms per packet) with standard deviation of 0.002971 ms
qSpeedTest(send=      4, recv=    256) in 0.847535312 s for 117989.19 packets/s (0.008475 ms per packet) with standard deviation of 0.002629 ms
qSpeedTest(send=      4, recv=    512) in 1.022377729 s for  97811.21 packets/s (0.010223 ms per packet) with standard deviation of 0.003248 ms
qSpeedTest(send=      4, recv=   1024) in 1.436389208 s for  69619.02 packets/s (0.014363 ms per packet) with standard deviation of 0.000688 ms
qSpeedTest(send=      4, recv=   2048) in 1.910557270 s for  52340.75 packets/s (0.019105 ms per packet) with standard deviation of 0.001194 ms
...

Now, if we assume that the p response is about 16 bytes long and the g response is 2048, we find that one g packet costs roughly as much as two p packets. And this is a local link, where it pretty much doesn't matter what we do, as we're pumping 50k packets per second even in the slowest case. On links with non-negligible latency, the difference between the two packets should be even smaller (though it might be nice to verify that).
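
As a back-of-envelope check of that ratio, in Python, using the per-packet times measured above (the 16-byte and 2048-byte response sizes are the assumptions stated in the previous paragraph):

  # Per-packet round-trip times from the qSpeedTest run above, in ms.
  p_time = 0.009103  # recv=16, roughly the size of a 'p' response
  g_time = 0.019105  # recv=2048, roughly the size of a 'g' response

  # Number of 'p' round trips that cost the same as a single 'g' packet.
  print(g_time / p_time)  # ~2.1: 'g' pays off once you read more than two registers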

Combining this with the fact that we normally expedite all general purpose registers (and so there won't be *any* packets if one is only reading those), I would say that enabling this unconditionally is a pretty safe thing to do.

Greg, Jason, what do you think?

There are so many different GDB remote servers that it is hard to say what will work for most people. It might be worth testing this with the setting set to true and false on a complex-program single step on macOS, Linux, and Android, and verifying that stepping speeds don't regress at all. Android tends to have hundreds of threads, which usually makes it the slowest of all when stepping. If we don't see regressions, then I am fine with no setting. Every scenario I know of benefits from fewer packets with larger content (macOS, watchOS, Linux, Android), so I am fine just enabling it. Maybe it would be good to check what GDB does by default, as its comments might tell us why they don't use it if they ever ran into issues.

So I am ok with no setting as long as perf doesn't regress on complex steps and as long as GDB uses 'g' packets by default.

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
118

I would change the setting to just be "use-g-packet".

> There are so many different GDB remote servers that it is hard to say what will work for most people. It might be worth testing this with the setting set to true and false on a complex-program single step on macOS, Linux, and Android, and verifying that stepping speeds don't regress at all. Android tends to have hundreds of threads, which usually makes it the slowest of all when stepping. If we don't see regressions, then I am fine with no setting. Every scenario I know of benefits from fewer packets with larger content (macOS, watchOS, Linux, Android), so I am fine just enabling it. Maybe it would be good to check what GDB does by default, as its comments might tell us why they don't use it if they ever ran into issues.

How can we run those tests? Is there an automated way to do that?

> So I am ok with no setting as long as perf doesn't regress on complex steps and as long as GDB uses 'g' packets by default.

I think they first try to use 'g' packets - https://github.com/bminor/binutils-gdb/blob/4fa0265edea0940b865678d93749e224547dd36a/gdb/remote.c#L8130 (I'm not familiar with that code at all, so I may be wrong)

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
1670

note to self: fix wrong method call

1786

note to self: fix wrong method call

Just checking the status of this change. Are we still considering enabling 'g' packets unconditionally, or are we better off using the setting for safety?

labath added a comment.Jul 1 2019, 6:26 AM

I think it will be pretty hard for a single person to gather all of these stats for all targets. So perhaps the best solution is to have the setting for the g packet, defaulting to true. After a while, if nobody notices a regression, we can just remove it...

guiandrade updated this revision to Diff 208838.Jul 9 2019, 4:26 PM
guiandrade marked 3 inline comments as done.

This rebases the change, renames *try_to_use_g_packet* to *use_g_packet*, defaults the setting to true, and modifies the condition of the read_all_registers_at_once bool in ThreadGDBRemote::CreateRegisterContextForFrame to guarantee the server supports the 'g' packet.

guiandrade added inline comments.Jul 9 2019, 4:28 PM
lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
160

Should we default to true here as well?

clayborg added inline comments.Jul 10 2019, 11:38 AM
lldb/source/Plugins/Process/gdb-remote/ThreadGDBRemote.h
103 ↗(On Diff #208838)

Not a big deal, but might be easier to store this in ProcessGDBRemote and access it there?

guiandrade marked an inline comment as done.Jul 10 2019, 2:03 PM

Sorry for the delay. The patch looks fine to me. As far as testing goes, there's a "gdb-client" test suite in test/testcases/functionalities/gdb_remote_client/, which should allow you to mock gdb-server responses and verify that the right packets are being sent.
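
A minimal sketch of such a test, using the mock-server utilities from that suite (the responder hooks and register data here are illustrative assumptions; see gdbclientutils.py for the real interface):

  from gdbclientutils import GDBRemoteTestBase, MockGDBServerResponder

  class TestReadRegistersWithG(GDBRemoteTestBase):

      class Responder(MockGDBServerResponder):
          def readRegisters(self):
              # Reply to the 'g' packet with a fake all-registers blob
              # (hypothetical layout: 16 registers of 8 bytes each).
              return "20" * 8 * 16

          def readRegister(self, regnum):
              # Reply to individual 'p' packets.
              return "2020202020202020"

      def test_g_packet_is_used_by_default(self):
          self.server.responder = self.Responder()
          target = self.createTarget("a.yaml")  # hypothetical test binary
          process = self.connect(target)
          process.GetThreadAtIndex(0).GetFrameAtIndex(0).GetRegisters()
          self.assertPacketLogContains(["g"])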

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
119

I'd add a trailing comma here to avoid changing this line next time we add a setting.

guiandrade marked an inline comment as done.

Adds a test to make sure the client is using 'g' by default.

labath accepted this revision.Jul 25 2019, 11:37 PM

LGTM. Thanks.

This revision is now accepted and ready to land.Jul 25 2019, 11:37 PM

Thanks, labath@

Could anyone please merge this for me?

Unfortunately this patch has raced with the latest tablegen changes, and it no longer applies cleanly. Could you please rebase on the current trunk?

Adding support for tablegen-generated properties introduced in https://reviews.llvm.org/D65185

labath requested changes to this revision.Jul 30 2019, 2:53 AM

Hm... this patch fails spectacularly for me, with about 100 failures. Are you sure you ran the test suite with this change?

I took a brief look at x86-64-ymm-write.test as it was easiest to debug. It looks like the test fails with: error: Failed to write register 'ymm0' with value '{0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07 0x08 0x09 0x0a 0x0b 0x0c 0x0d 0x0e 0x0f 0x10 0x11 0x12 0x13 0x14 0x15 0x16 0x17 0x18 0x19 0x1a 0x1b 0x1c 0x1d 0x1e 0x1f}'

After enabling the gdb-remote log, I see that lldb now attempts to implement the "register write" command via the G packet, and lldb-server does not support that. If we want this new setting to control register writing as well, then I guess we'll need to add appropriate support in lldb-server. However, in that case we might also need to fix register writing in lldb in this scenario, as I did not see lldb issuing a g command before G, which leads me to believe that some of the G data was garbage.

Due to the potential need to issue a g packet before we write a single register (i.e., issue two large packets instead of a single small packet), I don't think we should tie the usage of g and G together. I haven't looked at what it would take to implement that though...
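
To illustrate why the two shouldn't be tied together: writing a single register through G is inherently a read-modify-write of the whole register block, roughly like this (a Python sketch; send_g_packet and send_G_packet are hypothetical helpers):

  def write_register_via_G(offset, size, value_bytes):
      # Without a prior 'g' read, everything outside [offset, offset+size)
      # would be sent as stale or uninitialized data -- the "garbage" above.
      block = bytearray(send_g_packet())       # large packet: read all registers
      block[offset:offset + size] = value_bytes
      send_G_packet(bytes(block))              # large packet: write all registers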

This revision now requires changes to proceed.Jul 30 2019, 2:53 AM

Oh, sorry about that. I was relying on ninja check, which runs okay for me.

> ninja check
[0/1] Running the LLVM regression tests
Testing Time: 583.96s
  Expected Passes    : 32141
  Expected Failures  : 147
  Unsupported Tests  : 438

How can I invoke those other tests? (In case it's relevant, here's my cmake command: cmake ../llvm -G Ninja -DLLVM_ENABLE_PROJECTS='lldb;clang;libcxx')

About not tying G and g together, maybe we could remove this branch?

> How can I invoke those other tests? (In case it's relevant, here's my cmake command: cmake ../llvm -G Ninja -DLLVM_ENABLE_PROJECTS='lldb;clang;libcxx')

LLDB tests can be run with the "check-lldb" target. There's also "check-all" which would run *all* tests, but that's usually overkill.

> About not tying G and g together, maybe we could remove this branch?

That would be fine for lldb-server, but I am not sure how that would play with other stubs. As I understand it, the main reason this read-all-at-once logic was introduced was to handle stubs which did not implement the p/P packets, so we may not be able to just start sending P unconditionally. @clayborg or @jasonmolenda can probably say more about that...

I'm trying to write a prototype that dissociates the use of 'g' and 'G' packets, and I think I found an issue with LLDB's current code and Intel MPX registers.

Repro steps:

Using https://github.com/llvm/llvm-project/blob/3cabfb344b820586bb7c43212340f64246cde8eb/lldb/lit/Register/Inputs/x86-ymm-read.cpp, run:

g++ -g x86-ymm-read.cpp && lldb a.out -o 'r' -o 'reg read bnd0' -o 'reg read ymm12' -o 'reg read bnd0'

The register bnd0 initially reads as {0x0000000000000000 0x0000000000000000} and then as {0x131211100f0e0d0c 0x1b1a191817161514}.

I think that's because the intervals [reg_info->byte_offset, reg_info->byte_offset + reg_info->byte_size) for bnd0 and ymm12 intersect, which causes data in https://github.com/llvm-mirror/lldb/blob/2f97c81e9ed17f6ad759265abfa3f8f49dd7b826/source/Plugins/Process/gdb-remote/GDBRemoteRegisterContext.h#L115 to get overwritten.
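
A small sketch of the suspected overlap (the offsets below are hypothetical, chosen only to reproduce the shape of the values printed above):

  # Each register caches its bytes at [byte_offset, byte_offset + byte_size)
  # inside one shared buffer (m_reg_data in GDBRemoteRegisterContext.h).
  ymm12 = {"offset": 0, "size": 32}   # hypothetical offsets
  bnd0  = {"offset": 12, "size": 16}  # lands inside ymm12's range

  def overlaps(a, b):
      return a["offset"] < b["offset"] + b["size"] and \
             b["offset"] < a["offset"] + a["size"]

  # Caching ymm12's test pattern into the shared buffer then clobbers
  # whatever bnd0 had previously cached there.
  print(overlaps(ymm12, bnd0))  # True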

Could someone please confirm that this is an issue?

Thanks!

Yeah, that is a problem, though it's not really caused by you, and I am not even sure if your planned change makes it worse (it looks like it *might*, but I'd have to dig deeper to tell for sure). If you found this issue by running the test suite, then I guess it does.

The problem here is that there is a lot of confusion in the code about what the "offset" field in the register info struct is supposed to mean. In some places it is used to specify the offset in the operating system register context struct (struct user on linux), and elsewhere the position in the gdb register packet string. In an ideal world, the same number could be used for both, but unfortunately ymm registers use this weird discontiguous layout, which is not expressible in the RegisterInfo struct. If you want to fix that, I think the only reasonable way would be to change the RegisterInfo offset for BND registers to mean the gdb-remote offset, and then change any code which uses it as the OS context offset to do something else (the only place doing something like that should be RegisterContextLinux_x86_64.cpp).

(As a side note, the register infos in lldb badly need some TLC. I am preparing a patch which should make things slightly better, though it will not help with this issue in any way.)

Thank you for looking into this, @labath

I'd like to fix that, but I'm not sure I understand the code well enough, and it's not clear to me what the solution would look like.

> I think the only reasonable way would be to change the RegisterInfo offset for BND registers to mean the gdb-remote offset

Would that involve hard-coding an extra offset at RegisterInfos_x86_64.h:28 and RegisterInfos_i386.h:32?

> change any code which uses it as the OS context offset to do something else (the only place doing something like that should be RegisterContextLinux_x86_64.cpp)

Would that be NativeRegisterContextLinux_x86_64.cpp? If yes, the idea would be to somehow remove that extra factor before the ptrace calls?

Thanks!

labath added a comment.Oct 8 2019, 8:12 AM

> Would that involve hard-coding an extra offset at RegisterInfos_x86_64.h:28 and RegisterInfos_i386.h:32?

I would very much like to avoid adding new members to the RegisterInfo struct. As this stuff is specific to the x86 linux context, I'd like to keep it there, if necessary via a side table of "ptrace" offsets for each of the registers (though I am hoping there's some better way to achieve that).

> Would that be NativeRegisterContextLinux_x86_64.cpp? If yes, the idea would be to somehow remove that extra factor before the ptrace calls?

Yes, I meant NativeRegisterContextLinux_x86_64. And yes, the idea is to remove that factor, or recompute the ptrace offset in some other way. This could be a side table, a switch on the register set type, or whatever is simplest.
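
A sketch of the side-table idea, in Python for brevity (the real change would be C++ in NativeRegisterContextLinux_x86_64; all offsets below are hypothetical placeholders):

  # RegisterInfo.byte_offset keeps meaning "offset in the gdb-remote 'g'
  # payload"; the ptrace offset is recovered from a per-arch side table.
  PTRACE_OFFSETS = {
      # register index -> offset into the OS register context
      0: 80,     # e.g. a GPR, where the two layouts happen to agree
      60: 1024,  # e.g. a BND register, where they diverge
  }

  def get_ptrace_offset(reg_index):
      # A switch on the register set would work equally well; the point is
      # that the ptrace offset is no longer derived from the gdb offset.
      return PTRACE_OFFSETS[reg_index]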

Dissociating the use of g/G packets and trying to address the AVX/MPX offset bug

I updated this change with an initial attempt to implement that idea to see if we're on the same page. Please let me know what you think. Thank you!

Yeah, I think that's the best we can do right now. Could you just drop the gdb_remote_offset argument from GetPtraceOffset? It can be recovered from the index argument, and I don't want people to think that they _have to_ implement the ptrace offset computation via the gdb offset. I mean, ideally, I hope the two things will be the same everywhere else, and x86 is just weird, but conceptually, I'd like the individual architectures to be free to compute the ptrace offset any way they like. Also, could you put that into a separate patch, as it is functionally independent of this one.

guiandrade edited the summary of this revision. (Show Details)

Moving MPX fix to a parent change

Properly separating the MPX fix from this change (I guess)

Hey @labath,

Have you had a chance to take a look at the last revision of this?

Tks!

I'm sorry, this dropped off my radar. The code looks fine, but it could use some more tests. For instance, one test where you set the setting to the non-default value and then check that lldb does _not_ use the g packet.

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemoteProperties.td
23

Having a capital G in the setting name would be somewhat unprecedented (though I wouldn't really be opposed to it), but I think that at least the description should say 'G'.

Adding tests

guiandrade marked an inline comment as done.Oct 28 2019, 4:40 PM

> I'm sorry, this dropped off my radar. The code looks fine, but it could use some more tests. For instance, one test where you set the setting to the non-default value and then check that lldb does _not_ use the g packet.

Yeah, more tests would be useful. They made me notice an issue, btw. I was simply sending a 'g' packet and checking whether the server replied back nicely; however, IIUC with 'G' packets it isn't so simple, because we also need to send the register data in the same packet. I'm not sure how I could do that, and I'm also afraid that check could get too expensive. Do you have any idea what could be a nice way to handle that?

Thank you!

> Yeah, more tests would be useful. They made me notice an issue, btw. I was simply sending a 'g' packet and checking whether the server replied back nicely; however, IIUC with 'G' packets it isn't so simple, because we also need to send the register data in the same packet. I'm not sure how I could do that, and I'm also afraid that check could get too expensive. Do you have any idea what could be a nice way to handle that?

fwiw I can't imagine a stub that supports g but not G.

> Yeah, more tests would be useful. They made me notice an issue, btw. I was simply sending a 'g' packet and checking whether the server replied back nicely; however, IIUC with 'G' packets it isn't so simple, because we also need to send the register data in the same packet. I'm not sure how I could do that, and I'm also afraid that check could get too expensive. Do you have any idea what could be a nice way to handle that?

Well... I'd expect that a stub which does not support the G packet at all would return an "unsupported" response no matter what data you send it, whereas a stub which supports the G packet would return an "error" response to an incorrect G packet. But yeah, testing for the availability of packets which mutate the state is a bit tricky...
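
In other words, a probe along these lines could distinguish the two cases, though it mutates target state, which is exactly what makes it tricky (connection.send_packet is a hypothetical helper):

  def stub_supports_G(connection):
      # A deliberately malformed/truncated 'G' payload: a stub without 'G'
      # replies with an empty packet ("unsupported"), while one that parses
      # it should reply with an Exx error.
      reply = connection.send_packet("G00")
      return reply != ""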

> fwiw I can't imagine a stub that supports g but not G.

Well... lldb-server is in that bucket right now. :) One could definitely argue that this behavior is wrong and we shouldn't support it. However, that still leaves the question of what exactly the use-g-packet setting should do. In the previous version of this patch, it controlled both g and G. I think that was mostly an accident, because it was implemented by hooking into the code which was trying to support stubs that did not understand the p/P packets. But as the reason we're introducing it is performance, I think it'd make sense to separate these out: writing is a much less common operation, and I expect wanting to write _all_ registers would be very rare. OTOH, wanting to _read_ all registers is not that uncommon, particularly as our stubs now expedite the most common registers, which means that attempting to read one of the non-expedited registers is a pretty good indication that more than one will be read.

So, one possibility would be to just have one setting (use-g-packet-for-reading), but make it so that it does not control the usage of the G packet -- the logic for writing could stay as if (p_supported) send_P() else send_G(), while for reading we'd have something like if (p_supported && !g_requested) send_p() else send_g().
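
A Python-style restatement of that proposal (send_p/send_g/send_P/send_G, extract, and patch are hypothetical stand-ins for the real packet plumbing):

  def read_register(reg, p_supported, use_g_for_reading):
      if p_supported and not use_g_for_reading:
          return send_p(reg)             # one small packet per register
      return extract(send_g(), reg)      # one large packet, caches the rest

  def write_register(reg, value, p_supported):
      if p_supported:                    # stubs with 'p' are assumed to have 'P'
          return send_P(reg, value)
      # Fallback for stubs without p/P: read-modify-write the whole block.
      return send_G(patch(send_g(), reg, value))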

guiandrade edited the summary of this revision. (Show Details)

Moving to a single setting, use-g-packet-for-reading, that forces the use of 'g' packets for reading but does not force 'G' for writing. The latter only ends up being used if 'p' packets aren't supported (on the assumption that 'P' won't be either).

labath accepted this revision.Oct 30 2019, 5:05 AM

This looks fine to me. Thanks for your patience. Do you still need someone to commit this for you?

This revision is now accepted and ready to land.Oct 30 2019, 5:05 AM

> This looks fine to me. Thanks for your patience. Do you still need someone to commit this for you?

Np. Yes, I do. Could you please do it for me?

Thanks!

This revision was automatically updated to reflect the committed changes.