This commit picks up where https://reviews.llvm.org/D47879 left, and implements vAttachWait in lldb-server, so --waitfor can be used on Linux
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Hi guys.
I recently ran into an issue where I couldn't use --waitfor on Linux. I found out that there's a patch that adds the functionality which was only missing a test (https://reviews.llvm.org/D47879), so I decided to pick up from where it left.
@labath I added a test based on TestGdbRemoteAttach, as you suggested in the lldb-dev thread, however I couldn't get it working yet. Currently it fails on line 46 (context = self.expect_gdbremote_sequence) with a "Conenction reset by peer" exception. I'm pretty new to this part of LLDB, so I'm not sure on how to get this properly working. Could you give me some pointers?
Thanks for picking this up.
I'm not sure if this is the (only) reason for the failure, but I have a feeling you're encoding the packet incorrectly -- the application name should be hex-encoded. Try using lldbgdbserverutils.gdbremote_hex_encode_string instead.
Hi @labath. I see! I changed it to lldbgdbserverutils.gdbremote_hex_encode_string. Looking at the logs, I found that the checksum I inserted was wrong, so I've corrected that as well (is there a way I can calculate the checksum on the test, instead of hard-coding it in the string?). Despite these changes, I still get the same error (ConnectionResetError: [Errno 104] Connection reset by peer on line 46).
I did test the --waitfor functionality on a program with an infinite loop, but could it be that the test executable is exiting too quickly?
Hi @labath. Ok, I believe the test is passing now. Thank you for all the help today!
Question: the original author left a TODO:
// TODO: Make the polling interval configurable std::chrono::milliseconds waitfor_interval = std::chrono::seconds(1);
Is this a lot of work? Is it worth it to do this together in this patch?
Patch looks good, just a question of what timeout to use by default and possible adding an option to lldb-server in case users want faster or slower polling.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp | ||
---|---|---|
344 | We probably want this to be quite a bit shorter than 1 second. Why? Because if you don't find the process right away, 1 second is a long time for the system to wait and your program can execute quite a few instructions in 1 second. By the time you attach, you might have missed your breakpoint at main. Nothing will guarantee hitting a breakpoint, but I would suggest making a smaller timeout, maybe like 1 millisecond. It would be good to benchmark this a bit by waiting for a process and not launching a new process to see how much CPU lldb-server takes up with a shorter timeout. Adding a command line option might be nice is someone could specify "--wait-poll-msec 0" to constantly poll the system as fast as possible in case they do need to catch the process as early as possible. |
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp | ||
---|---|---|
344 |
It looks like there already is an option like what you're describing on debugserver: "waitfor-interval". As well as another related flag: "waitfor-duration". I'll try implementing them in this patch as well. |
Adds support to 'vAttachOrWait', as well as the 'waitfor-interval' and 'waitfor-duration' flags.
@clayborg I've added support for the 'waitfor-interval' and 'waitfor-duration' flags. Yesterday I thought they existed in macOS, but now I'm not so sure, as I couldn't find them on "Options.td". They were added in 2009, so maybe they did exist but were dropped later. Or I just didn't look at the right place.
A couple of questions:
- About the default polling interval, I've set it to 1ms for now, but I found that (in my system) this keeps a core at 100%. 10ms keeps still keeps it at around 80%, and 100ms keeps it at around 35%. Maybe 100ms is a good balance?
- On "Options.td", besides the "process attach" command, there's also a "platform process attach". I haven't touched it since I'm not familiar with the platform command. Should I add these flags to the platform counterpart as well?
Question: why does lldb queries if vAttachOrWait is supported, but not vAttachWait? Does it make sense to keep this query? Or should I remove it?
"vAttachOrWait" allows you to attach to a process by name even if it already exists, where "vAttachWait" will ignore any current existing processes with a matching name. LLDB will fall back to using vAttachWait if the users requested to NOT ignore existing processes since some support for waiting for a process by name is better than nothing. So we need to check for vAttachOrWait so we can use it if needed to allow attaching to an existing process if the user requested it. So since this is very similar code to what you have it just doesn't make an ignore list like you did in this patch. So it might be nice to add support for vAttachOrWait, along with the query packet, in this patch if you have the time? Let me know if you have any questions.
So it might be nice to add support for vAttachOrWait, along with the query packet, in this patch if you have the time?
Hi @clayborg, I saw that implementing vAttachOrWait was already 90% done, so I did just that :) The current patch as-is supports vAttachOrWait. My question was more on the lines of if keeping the query for vAttachOrWait made sense now that it's implemented on Linux. Are there be other targets where it isn't implemented?
I also left a comment with a couple of questions earlier, but I think it got hidden by my last comment. I'll repost it here:
@clayborg I've added support for the 'waitfor-interval' and 'waitfor-duration' flags. Yesterday I thought they existed in macOS, but now I'm not so sure, as I couldn't find them on "Options.td". They were added in 2009, so maybe they did exist but were dropped later. Or I just didn't look at the right place.
A couple of questions:
- About the default polling interval, I've set it to 1ms for now, but I found that (in my system) this keeps a core at 100%. 10ms keeps still keeps it at around 80%, and 100ms keeps it at around 35%. Maybe 100ms is a good balance?
- On "Options.td", besides the "process attach" command, there's also a "platform process attach". I haven't touched it since I'm not familiar with the platform command. Should I add these flags to the platform counterpart as well?
Could you confirm for me that 'waitfor-interval' and 'waitfor-duration' don't exist on macOS? If I'm sending a different message format than what's done on macOS that'd be an issue.
Best to not change anything if we don't know. Yes there are other GDB servers out there that may not implement vAttachOrWait, so I would vote to not change anything.
I also left a comment with a couple of questions earlier, but I think it got hidden by my last comment. I'll repost it here:
@clayborg I've added support for the 'waitfor-interval' and 'waitfor-duration' flags. Yesterday I thought they existed in macOS, but now I'm not so sure, as I couldn't find them on "Options.td". They were added in 2009, so maybe they did exist but were dropped later. Or I just didn't look at the right place.
A couple of questions:
- About the default polling interval, I've set it to 1ms for now, but I found that (in my system) this keeps a core at 100%. 10ms keeps still keeps it at around 80%, and 100ms keeps it at around 35%. Maybe 100ms is a good balance?
debugserver in the llvm sources uses 1ms as its default. I would vote to be as quick as possible so we can catch the process as soon as possible. The option --waitfor-interval specifies the time in microseconds, so we will want to mirror that. So even though it does peg one core at 100%, I think it might be worth it to catch the process sooner than later.
- On "Options.td", besides the "process attach" command, there's also a "platform process attach". I haven't touched it since I'm not familiar with the platform command. Should I add these flags to the platform counterpart as well?
Just "process attach" is fine to start with. The platform layer doesn't always have to use a GDB server, so adding those options there could end up being hard for some platforms to implement. In fact I would mind if we don't even add these options to "process attach" and just always use the defaults, because as you say, if someone selects or replaces the lldb-server with an older one, then we don't want the launching of the lldb-server to fail because it doesn't recognize an option that is being passed to it. The better way to fix this would be to make a new attach wait packet that takes these as arguments to the packet, but we don't need to do that.
Could you confirm for me that 'waitfor-interval' and 'waitfor-duration' don't exist on macOS? If I'm sending a different message format than what's done on macOS that'd be an issue.
Here lies the problem that I mentioned above. I would like to avoid having to launch lldb-server with any arguments so that we continue to work with older lldb-servers.
So maybe we just rely on defaults for now and avoid having to add any new arguments to "process attach" and to the launching of the lldb-server? Let me know your thoughts.
Another option would be to have lldb-server check for environment variables for default values for --waitfor-interval and --waitfor-duration. If they are set, they become the new default values. Of course a user can launch the lldb-server manually with the options set a different way and then attach with "process connect ..." if required. But this would provide an alternate way for users to control the polling and timeout. I would just hate for us to add these new flags and have some people not be able to debug because their GDB server gets launched (because they replaced the lldb-server) and doesn't support the flags.
Here lies the problem that I mentioned above. I would like to avoid having to launch lldb-server with any arguments so that we continue to work with older lldb-servers.
So maybe we just rely on defaults for now and avoid having to add any new arguments to "process attach" and to the launching of the lldb-server? Let me know your thoughts.
Sorry, I don't think I fully understand this point. Do you mean we shouldn't add 'waitfor-interval' and 'waitfor-duration' since older lldb-server versions might not recognize the packet format?
I would vote to:
- modify lldb-server
- add '--waitfor-interval' and '--waitfor-duration' options to lldb-server for people that want to launch lldb-server manually
- add vAttachWait, vAttachOrWait and the query packet to check if vAttachOrWait is supported
- no modifications to lldb itself
If I understood what you were asking before when you said:
On "Options.td", besides the "process attach" command, there's also a "platform process attach". I haven't touched it since I'm not familiar with the platform command. Should I add these flags to the platform counterpart as well?
It sounded like you were going to modify "process attach" to take options that would allow people to specify the '--waitfor-interval' and '--waitfor-duration' and then when we spawned lldb-server, pass these same options down to lldb-server or debugserver on Darwin. I was thinking it would be nice to avoid this part of the change as it then requires any lldb-server that LLDB uses to support these options. Lets say someone has a lldb-server already installed on their system, only updates the lldb binary and then tries to debug something with "process attach --waitfor". It would launch the lldb-server possibly with the '--waitfor-interval' options and it would exit with a non zero status saying "I don't know that option". Do we currently add or remove options when launching lldb-server based on any "process attach" arguments? If we do, then my suggestion doesn't make sense. If we don't, then it does make sense.
I think I get your point. If we pass the extra options in the packet, the validation on older lldb-server versions will reject the message.
Another option would be to have lldb-server check for environment variables for default values for --waitfor-interval and --waitfor-duration. If they are set, they become the new default values. Of course a user can launch the lldb-server manually with the options set a different way and then attach with "process connect ..." if required. But this would provide an alternate way for users to control the polling and timeout.
I'll defer to you guys, since you're a lot more knowledgeable on lldb than I am :)
I do have another possible solution we could consider: can we create a new message, similar to qVAttachOrWaitSupported, that queries if these extra options are supported or not. Something like qVAttachWaitExtraOptionsSupported. We might even consider returning something akin to a "version number", so if we support more options in the future we can bump up the number.
Let me know what you think (again, I'm fine doing it either way).
Yeah, I was thinking it might be nice to have a better attach + wait packet.
We have added new JSON packets in the past that are more flexible and allows us to specify arguments using a JSON dictionary. Currently the vAttachWait or vAttachOrWait packets have a hard coded single process name argument that is appended as hex ASCII so for "a.out":
vAttachWait:612E6F7574 vAttachOrWait:612E6F7574
But we could make a new "jAttachWait" that could be something like:
jAttachWait:{"executable":"a.out", "waitfor-interval-usec": 1000, "waitfor-duration-sec": 20}
Then the question becomes where the values for "waitfor-interval-usec" and "waitfor-duration-sec" come from. Since "process attach" has to work for any process plug-in (there are many in "lldb/source/Plugins/Process"), if we add any options to "process attach", each of the plug-ins in "lldb/source/Plugins/Process" would need to handle those options. It might be better to add new settings for the GDB remote plug-in (AKA: ProcessGDBRemote). There are global settings already for this plug-in:
(lldb) settings show ... plugin.process.gdb-remote.packet-timeout (unsigned) = 5 plugin.process.gdb-remote.target-definition-file (file) = plugin.process.gdb-remote.use-g-packet-for-reading (boolean) = false plugin.process.gdb-remote.use-libraries-svr4 (boolean) = true
We would easily add new settings to lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemoteProperties.td:
plugin.process.gdb-remote.waitfor-interval-usec (unsigned) = 1000 plugin.process.gdb-remote.waitfor-duration-sec (unsigned) = 20
And then use these in the new JSON packet as arguments.
Ok, that makes sense!
Let me summarize the work that needs to be done, correct me if I get something wrong:
- Change back vAttachWait and vAttachOrWait to the original format of sending only the process name.
- Add default values for "waitfor-interval-usec" and "waitfor-duration-sec" on ProcessGDBRemoteProperties.td.
- Add a new jAttachWait packet type, which sends a json which will read the values saved on the settings. Question: would we need a jAttachOrWait variant? Or could we send something like "ignore-existing" in jAttachWait?
- Add a qJAttachWaitSupported packet. If jAttachWait isn't supported, fallback to vAttachWait (we should also probably warn the user that their custom settings won't be used in this case).
- Add a test for jAttachWait.
Question: would we also keep the waitfor-interval and waitfor-duration command line options as well, and when they're missing, default to the settings? Or do we forego them completely?
I don't think it's a good idea to stuff all of this into a single patch. Can we go back to the version which just implements the basic vAttachWait packet (we can bikeshed on what the default interval should be)? I believe new commands/options/packets should be done in separate patches...
I don't think it's a good idea to stuff all of this into a single patch. Can we go back to the version which just implements the basic vAttachWait packet (we can bikeshed on what the default interval should be)? I believe new commands/options/packets should be done in separate patches...
Ok. I can open other patches after this one is merged.
Thanks. Feel free to create the other patches even before that (the downside is lost productivity due to more rebasing).
Thanks. This talk of all these packets has made me realize I did not completely understand the purpose of this packet. To really test the exclusion functionality of this packet, I guess we should be launching two instances of the inferior (one before sending the packet, one after), and then checking we attach to the right one. It looks like it should be easy to add to the existing test. Other than that (and some inline simplifications) this looks fine to me.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp | ||
---|---|---|
360 | ||
362–364 | ||
374–377 | You may need to #include "llvm/ADT/STLExtras.h" to get this... | |
389–391 | ||
393 | ||
395 |
Changes test to launch a process before and after we ask for the attach. Minor code fixes as well.
LGTM. Nice and simple. We can do another patch for vAttachOrWait as it will be very simple modifications and very similar to this patch. I am find avoiding all of the polling interval and duration settings for now.
looks good to me too. When you get around to the wait times & intervals I'd argue for not doing that as a GDBRemote specific addition, as Greg was suggesting above. There's nothing gdb-remote specific about how long you want the debug agent to wait around for some process to show up.
As long as we are find passing these arguments to _any_ process subclass I am find doing these as options.
I can't see why we wouldn't be, but I'm also not sure why we would need to change any Process API's. Wouldn't we just add the interval and timeout to the ProcessAttachInfo?
Looks good, modulo the inline comment.
Do you have commit access or you need someone to commit this for you?
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp | ||
---|---|---|
350 | Why posix? I'd expect native here... |
Looks good, modulo the inline comment.
Ok! Can I correct the issue pointed out on the comment on the next (vAttachOrWait) patch?
Do you have commit access or you need someone to commit this for you?
I don't, if one of you could do it that'd be great :)
We probably want this to be quite a bit shorter than 1 second. Why? Because if you don't find the process right away, 1 second is a long time for the system to wait and your program can execute quite a few instructions in 1 second. By the time you attach, you might have missed your breakpoint at main. Nothing will guarantee hitting a breakpoint, but I would suggest making a smaller timeout, maybe like 1 millisecond. It would be good to benchmark this a bit by waiting for a process and not launching a new process to see how much CPU lldb-server takes up with a shorter timeout.
Adding a command line option might be nice is someone could specify "--wait-poll-msec 0" to constantly poll the system as fast as possible in case they do need to catch the process as early as possible.