This is an archive of the discontinued LLVM Phabricator instance.

[lldb] Generic base for testing gdb-remote behavior
ClosedPublic

Authored by owenpshaw on Jan 17 2018, 12:52 PM.

Details

Summary

Adds new utilities that make it easier to write test cases for lldb acting as a client over a gdb-remote connection.

  • A GDBRemoteTestBase class that starts a mock GDB server and provides an easy way to check client packets
  • A MockGDBServer that, via MockGDBServerResponder, can be made to issue server responses that test client behavior.
  • Utility functions for handling common data encoding/decoding
  • Utility functions for creating dummy targets from YAML files

Split from the review at https://reviews.llvm.org/D42145, which was a new feature that necessitated the new testing capabilities.

Diff Detail

Repository
rL LLVM

Event Timeline

owenpshaw created this revision.Jan 17 2018, 12:52 PM

Looks fine to me. Wait for Pavel to give it the OK since he did more of the lldb-server testing stuff.

My main comment is about making sure the tearDown story is sufficiently robust. I want to be sure we don't introduce flakyness here.

packages/Python/lldbsuite/test/functionalities/gdb_remote_client/TestGDBRemoteClient.py
12 ↗(On Diff #130253)

Could you put some basic assertion here? I guess at this point we can expect to see QStartNoAckMode and/or some basic query packets (?)

packages/Python/lldbsuite/test/functionalities/gdb_remote_client/gdbclientutils.py
25 ↗(On Diff #130253)

call this yaml2obj? We should be able to use this function to create a Mach-O file as well...

164–167 ↗(On Diff #130253)

Maybe at least return valid hex digits here?

246–248 ↗(On Diff #130253)

The concurrent access to self._client here makes me very uneasy. If this were C, it would definitely be a race, but I don't think python can do much to avoid a fd-reassignment race anyway. The exception handling code below only reaffirms my suspicion. I think we should make this safer to avoid test flakyness down the line.

I think a client-initiated shutdown should be sufficient. SBProcess.Kill() should be enough to make sure the client socket is closed. After that, you just need to join the server thread (which should exit after it detects a connection drop).

What do you think?

249–250 ↗(On Diff #130253)

I don't think calling shutdown on a socket in listen mode is expected (i.e., just delete this comment).

280 ↗(On Diff #130253)

Could you try break the _receive function down into simpler chunks. The nested while loops make this quite hard to follow. I suggest a design that would involve functions like:

getFullPacket # checks whether accumulated input contains full packet and returns it
getPayload # verifies checksum, unframes, and unescapes

and making sure all sending happens in _handlePacket only.

431 ↗(On Diff #130253)

Add a message that the lines below are the last 10 packets received.

Updated with suggestions

  • Using process.Kill() to close client connection
  • Refactored server._receive
  • renamed elf-specific functions to be generic

Thank you for making the changes. The parsing code looks much cleaner now. I just have one more round of tiny remarks.

packages/Python/lldbsuite/test/functionalities/gdb_remote_client/gdbclientutils.py
241 ↗(On Diff #130472)

You've deleted closing of the listening socket altogether (I just wanted you to delete the comment). The socket itself needs to be closed, of course, I just don't think calling shutdown on it is needed.

400 ↗(On Diff #130472)

There's already a TestBase.process() that should do what you need.

Oh, and please add yaml2obj as a dependency of the check-lldb target in cmake.

owenpshaw updated this revision to Diff 130643.Jan 19 2018, 9:59 AM
  • Updated to use TestBase.process() instead of a new member
  • Added yaml2obj dependency. It it okay to be an unconditional dependency?
  • I can put the socket close() back in, but had removed it because python should close it automatically on garbage collection, so it didn't seem to matter either way
  • Added yaml2obj dependency. It it okay to be an unconditional dependency?

Yeah, it probably needs to be conditional (if(TARGET yaml2obj)), because standalone builds will not have that target. OTOH, standalone builds will probably also have an issue with finding yaml2obj. However, I don't think it should be up to you to make standalone builds work. cc'ing Kamil and Michal, as they are the ones who use those builds.

What I am not sure about is whether we need to make any changes to accomodate the XCode build. Jim, Davide: What would it take to make a test like this work from xcode?

  • I can put the socket close() back in, but had removed it because python should close it automatically on garbage collection, so it didn't seem to matter either way

Yes, please put it back. There's no telling when GC will happen, so we shouldn't be holding onto relatively scarce os resources waiting for that...

yaml2obj looks like a program from LLVM and we build and install it in pkgsrc.

owenpshaw updated this revision to Diff 130920.Jan 22 2018, 9:58 AM

Added back socket close()

It looks like the yaml2obj target hasn't been defined yet when the check-lldb target is defined, so if(TARGET yaml2obj) always returns false. Is there another way to do the conditional dependency? I'm assuming we don't want to reorder the includes in llvm/tools/CMakeLists.txt.

It looks like the yaml2obj target hasn't been defined yet when the check-lldb target is defined, so if(TARGET yaml2obj) always returns false. Is there another way to do the conditional dependency? I'm assuming we don't want to reorder the includes in llvm/tools/CMakeLists.txt.

Ah, I see. We can then probably gate this on IF(NOT LLDB_BUILT_STANDALONE).

In the standalone build, we put yaml2obj into $PREFIX/bin/, so into the default PATH of the toolchain.

  • Updated condition for yaml2obj dependency, now based on LLDB_BUILT_STANDALONE
  • Find yaml2obj in the same directory as clang
  • Move yaml2obj code into lldbtest.py so it's available to all tests

Is the findBuiltClang function the right thing to use here? I can't find anything else that uses it.

labath accepted this revision.Jan 25 2018, 4:15 AM

This is great. Thank you for doing this. I think we're ready to wrap this up.

This revision is now accepted and ready to land.Jan 25 2018, 4:15 AM
clayborg accepted this revision.Jan 25 2018, 6:37 AM

Thanks for iterating on this. This will be great.

Thanks for the help! Someone with commit access will need to land it.

labath added inline comments.Jan 26 2018, 7:59 AM
packages/Python/lldbsuite/test/lldbtest.py
1588 ↗(On Diff #131310)

I had to modify these to llvm-build/Release+Asserts/x86_64/bin/clang to be able to find the binary. I presume this is a change in xcode project layout. Can anyone confirm that?

The xcode build uses cmake to build llvm & clang. It passes a build flavor, by passing -DCMAKE_BUILD_TYPE=Debug for instance, but other than that it lets the llvm build figure out how to lay itself out.

Moreover, there are a bunch of defines that lldb sets up to find includes and some other bits from the clang build directory. The all assume that the LLVM_BUILD_ARCH is the last part of the path before you get to the actual good stuff. The uses of those defines was last touched on 02/17 (and that actually didn't change substance, it looks like it was just a line-ending change.) So I can't see how you would either get Release+Asserts/x86_64/Release+Asserts as a build layout or how the Xcode project would have dealt successfully with that.

Thanks for the clarification. It seems nobody was calling that function, so maybe it never actually worked.

I'll try committing this for Owen with the clang paths modified and check whether the bots like it.

This revision was automatically updated to reflect the committed changes.
labath added inline comments.Feb 1 2018, 3:35 AM
lldb/trunk/packages/Python/lldbsuite/test/functionalities/gdb_remote_client/gdbclientutils.py
335–337

Which servers were you observing here? I tried lldb-server and gdbserver and they don't send the ack back (It seems like a bad idea anyway, as it could lead to an infinite ack ping-pong).

I have removed this code as it was causing flakyness on the bots. If there is a server that does this, and we care about supporting it, then we can add a new targeted test which emulates this behavior, but then we need to also fix the client to handle this situation better.

owenpshaw added inline comments.Feb 1 2018, 9:57 AM
lldb/trunk/packages/Python/lldbsuite/test/functionalities/gdb_remote_client/gdbclientutils.py
335–337

Just took another look and I misunderstood what was going on. The server does send an opening +, but not in response to the client.

I was observing the packets from openocd, where lldb sends a + and then receives a + from openocd, and assumed one was a response to the other.

(lldb) log enable gdb-remote packets
(lldb) gdb-remote 3333
                 <   1> send packet: +
                 history[1] tid=0x0307 <   1> send packet: +
                 <   1> read packet: +
                 <  19> send packet: $QStartNoAckMode#b0
                 <   1> read packet: +
                 <   6> read packet: $OK#9a
                 <   1> send packet: +

Can't find any docs about these opening acks, but looking at openocd code, it just always sends a + at the start of a connection. Should the test server do the same?

I am pretty sure that the ack that gets sent should happen in response to getting an ACK. Note that LLDB send an ACK and then waits for an ACK. This helps us see if anything is there. So yes, this code can always just respond to an unsolicited ACK with an ACK.

I am pretty sure that the ack that gets sent should happen in response to getting an ACK. Note that LLDB send an ACK and then waits for an ACK. This helps us see if anything is there. So yes, this code can always just respond to an unsolicited ACK with an ACK.

LLDB sends an ACK, and then waits for anything. That "anything" can also be "nothing" (see GDBRemoteCommunicationClient::HandshakeWithServer). lldb-server currently does not respond to the initial ACK, which is good, because the length of the initial client wait (10ms) would mean this reply-ack would often get missed, and the connection would get out of sync (as this code demonstrated).

If we really want to have our servers send an initial ack, then we need to fix the client.