This is an archive of the discontinued LLVM Phabricator instance.

Create a generic handler for Xfer packets
ClosedPublic

Authored by aadsm on May 27 2019, 3:12 PM.

Details

Summary

This is the first of a few patches I have to improve the performance of dynamic module loading on Android.

In this first diff I'll describe the context of my main motivation and will then link to it in the other diffs to avoid repeating myself.

Motivation

I have a few scenarios where opening a specific feature on an Android app takes around 40s when lldb is attached to it. The reason for that is because 40 modules are dynamicly loaded at that point in time and each one of them is taking ~1s.

The problem

To learn about new modules we have a breakpoint on a linker function that is called twice whenever a module is loaded. One time just before it's loaded (so lldb can check which modules are loaded) and another right after it's loaded (so lldb can check again which ones are loaded and calculate the diference).
It's figuring out which modules are loaded that is taking quite some time. This is currently done by traversing the linked list of loaded shared libraries that the linker maintains in memory. Each item in the linked list requires its own x packet sent to the gdb server (this is android so the network also plays a part). In my scenario there are 400+ loaded libraries and even though we read 0x800 worth of bytes at a time we still make ~180 requests that end up taking 150-200ms.
We also do this twice, once before the module is loaded (state = eAdd) and another right after (state = eConsistent) which easly adds up to ~400ms per module.

A solution

Implement xfer:libraries-svr4 in lldb-server:
I noticed in the code that loads the new modules that it had support for the xfer:libraries-svr4 packet (added ~4 years ago to support the ds2 debug server) but we didn't support it in lldb-server. This single packet returns an xml list of all the loaded modules by the process. The advantage is that there's no more need to make 180 requests to read the linked list. Additionally this new requests takes around 10ms.

More efficient usage of the xfer:libraries-svr4 packet in lldb:
When xfer:libraries-svr4 is available the Process class has a LoadModules function that requests this packet and then loads or unloads modules based on the current list of loaded modules by the process.
This is the function that is used by the DYLDRendezvous class to get the list of loaded modules before and after the module is loaded. However, this is really not needed since the LoadModules function already loaded or unloaded the modules accordingly. I changed this strategy to call LoadModules only once (after the process has loaded the module).

Bugs
I found a few issues in lldb while implementing this and have submitted independent patches for them.

I tried to devide this into multiple logical patches to make it easier to review and discuss.

Tests

I wanted to put these set of diffs up before having all the tests up and running to start having them reviewed from a techical point of view. I'm also having some trouble making the tests running on linux so I need more time to make that happen.

This diff

The xfer packages follow the same protocol, they are requested with xfer:<object>:<read|write>:<annex>:<offset,length> and a return that starts with l or m depending if the offset and length covers the entire data or not. Before implementing the xfer:libraries-svr4 I refactored the xfer:auxv to generically handle xfer packets so we can easly add new ones.

The overall structure of the function ends up being:

  • Parse the packet into its components: object, offset etc.
  • Depending on the object do its own logic to generate the data.
  • Return the data based on its size, the requested offset and length.

Event Timeline

aadsm created this revision.May 27 2019, 3:12 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2019, 3:12 PM
aadsm edited the summary of this revision. (Show Details)May 27 2019, 3:13 PM
aadsm edited the summary of this revision. (Show Details)

Thank you for working on this. This can speed up the handling of shared library loading by a lot. Also, thank you for splitting the work up into logical pieces. Please find my comments inline.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2751–2771

I think it would be simpler to just forget about the StringExtractor here, and do something like

SmallVector<StringRef, 4> fields;
StringRef(packet.GetStringRef()).split(fields, 3);
if (fields.size() != 4) return "Malformed packet";
// After this, "object" is in fields[0], "action" is in fields[1], annex is in fields[2], and the rest is in fields[3]
std::string buffer_key = fields[0] + fields[2];
...
2771

Given that this function is going to grow, it would be good to split it in smaller chunks. Maybe move this code into something like ErrorOr<xfer_map::iterator> ReadObject(StringRef object)?

2773

I think we should just remove this ifdef. NetBSD and linux are the only platforms supported by lldb-server right now. And anyway, any implementation which does not support auxv reading can just return an error from process->GetAuxvData...

If we want to differentiate between SendErrorResponse and SendUnimplementedResponse we can just develop a convention that a specific error code (llvm::errc::not_supported) means "unimplemented".

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.h
85–86

This could be an llvm::StringMap. Also, it should be enough to use unique_ptr here. You just need to make sure that you don't copy the value out of the map. I recommend a pattern like

iter = map.find("foo");
if (iter == end()) {
  auto buffer_or_error = getFoo();
  if (buffer_or_error)
    iter = map.insert("foo", std::move(*buffer_or_error))
  else
    report(buffer_or_error.getError());
}
StringRef buffer = iter->second->getBuffer();

do_stuff(buffer);

if (done)
  map.erase(iter);

Thanks for doing this work! I'd love to see a faster LLDB. :)

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2751–2771

I agree, that would make it simpler.

2771

+1

You could have smaller methods like "ParseAuxvPacket" and "ReadObject"

aadsm marked an inline comment as done.May 28 2019, 8:05 PM
aadsm added inline comments.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2771

I'm actually trying to return an llvm::Expected so I can return a createStringError that will have a message and an error code (as far as I can see ErrorOronly allows to return an error code).
However, I can't figure out how to get both the error code and error message from the takeError() function. I found llvm::errorToErrorCode and llvm::toString but they both require to pass ownership of the error to them so I can only use one of them.
Is the only way to use the handleErrors function that will give me access to the underlying ErrorInfo where I can then call convertToErrorCode and getMessage on it? Sounds overly complicated for something that's probably simpler than this.

labath added inline comments.May 29 2019, 12:33 AM
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2771

Yeah, using Expected<T> is a even better idea.

handleErrors is the right way to do this kind of thing. If you're using Expected then the best way to handle this situation would be to define a custom error type to mean "unimplemented". Then you could do something like:

Expected<T> t = getT();
if (!t) {
 ??? result;
  handleAllErrors(t.takeError(),
    [&] (UnimplementedError &e) { result = SendUnimplementedResponse(e.message()); },
   // TODO: We should have a SendErrorResponse version that takes a llvm::Error
    [&] (std::unique_ptr<ErrorInfo> e) { result = SendErrorResponse(Status(Error(std::move(e)));
  );
  return result;
}
do_stuff(*e);

Ideally this code wouldn't even live inside the packet handler function, but we would have a separate function for that, and packet handlers would just return Expected<Packet>. However, that's for another patch...

One time just before it's loaded (so lldb can check which modules are loaded) and another right after it's loaded (so lldb can check again which ones are loaded and calculate the difference).

There is on NetBSD and on a selection of other OSs: _rtld_debug_state integrated as a part of ELF dynamic loader.

Is there something like that on Android that could be reused?

One time just before it's loaded (so lldb can check which modules are loaded) and another right after it's loaded (so lldb can check again which ones are loaded and calculate the difference).

There is on NetBSD and on a selection of other OSs: _rtld_debug_state integrated as a part of ELF dynamic loader.

Is there something like that on Android that could be reused?

Yes, there is, and it's being used now. The question here is what do we do *after* we hit the dynamic loader breakpoint...

One time just before it's loaded (so lldb can check which modules are loaded) and another right after it's loaded (so lldb can check again which ones are loaded and calculate the difference).

There is on NetBSD and on a selection of other OSs: _rtld_debug_state integrated as a part of ELF dynamic loader.

Is there something like that on Android that could be reused?

Yes, there is, and it's being used now. The question here is what do we do *after* we hit the dynamic loader breakpoint...

First I will need to make research of it locally as it was probably not used in some time.

aadsm updated this revision to Diff 202323.May 30 2019, 3:36 PM
  • Introduce better error handling by creating 2 new error classes, one for generic packet errors that contain a number and another for unimplemented features.
  • The logic for reading the xfer object was moved into its own function.
  • Removes the define for linux || netbsd for the auxv xfer object.
  • Switched from shared pointer to a unique pointer to store the memory buffers in the map

Thanks for the update. The actual code looks mostly good, but I have some comments on the error handling infrastructure. I am sorry that this is taking a bit long, but I am trying to make sure it's not unnecessarily overcomplicated (in the past we've generally not paid much attention to the error codes, and it doesn't look like we're about to start now), but instead it makes good use of our existing features (like error strings), and is generally unobtrusive so the actual code stands out (not needing to both log and create an error object when something happens helps that). Since you're building this as a general tool that can be used for future packets (or refactors of old ones), I believe it's worth the trouble.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServer.cpp
118

I am sorry, but I just remembered there's one more case to think about here. The error might actually be a ErrorList and contain multiple errors. It's not a commonly used feature (in fact, it might go away at some point), but it exists and if such an error happens to make it's way here it will cause us to send multiple error messages and completely mess up the packet synchronization.

So we should at least guard against that with assert(!error.isA<ErrorList>()); or implement this in a way that guarantees only one response would be sent. One way to do that would be to just send one of the errors based on some semi-arbitrary priority list. Maybe something like:

std::unique_ptr<PacketError> PE;
std::unique_ptr<PacketUnimplementedError> UE;
...
llvm::handleAllErrors(std::move(error),
  [&] (std::unique_ptr<PacketError> E) { PE = std::move(E); },
  ...);
if (PE)
  return SendErrorResponse(...);
if (UE)
  return SendUnimplementedError(...);
...
125

I'm a bit confused. What does this call exactly? It looks to me like this will use the implicit (we should probably make it explicit) std::unique_ptr<ErrorInfoBase> constructor of Error to call this method again and cause infinite recursion.

I'd suggest doing something like SendErrorResponse(Status(Error(std::move(e))). The advantage of this is that the Status overload knows how to send an error as string (we have an protocol extension for that), and so will provide a better error message on the client side.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServer.h
83

You need to define static char ID here too. Otherwise the dynamic type detection magic will not work..

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2758–2762

In the spirit of the comment below, here I'd just do something like return createStringError(inconvertibleErrorCode(), "No process");

2770

I am wondering whether we actually need the PacketError class. Such a class would be useful if we actually wanted to start providing meaningful error codes to the other side (as we would give us tighter control over the allocation of error codes). However, here you're just taking random numbers and sticking them into the PacketError class, so I am not sure that adds any value.

So, what I'd do here is just delete the PacketError class, and just do a return llvm::errorCodeToError(ec) here. I'd also delete the log message as the error message will implicitly end up in the packet log via the error message extension (if you implement my suggestion SendErrorResponse).

2793–2794

A slightly more efficient (and shorter) way to concatenate this would be (xfer_object + xfer_action + xfer_annex).str(). "Adding" two StringRefs produces an llvm::Twine (https://llvm.org/doxygen/classllvm_1_1Twine.html), which can then be converted to a string by calling .str()

2811

unused variable.

2849

since you already have the iterator around, it's more efficient to use that for the erase operation.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.h
85–86

What about the StringMap part ? :)

aadsm marked 4 inline comments as done.May 31 2019, 7:01 AM
aadsm added inline comments.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServer.cpp
125

ah, not sure what I was thinking here. I remember writing SendErrorResponse(Status(Error(std::move(e))) (or something to that effect) but somehow ended up with this recursion..
I need to force these errors to happen to make sure everything makes sense.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServer.h
83

Ah, I guess it would only match against the StringError because it's inheriting that one?

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2770

I thought it would be nice to have a little abstraction layer around the packet errors overall. My purpose with the PacketError is to make it more obvious in the code that the number given will be sent back as an error number.
I didn't realize the numbers we were using were meaningless today though (now that I think of it this ec.value is really whatever GetAuxvData returns). I searched at the time and found a few different numbers being used: 9, 4, 10, etc. I guess these numbers are just lies then :D.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.h
85–86

completely forgot about that :D

labath added inline comments.May 31 2019, 7:15 AM
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServer.cpp
125

It doesn't look like it should be too hard to instantiate GDBRemoteCommunicationServer from a unit test.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServer.h
83

I think it would match against *all* StringErrors, even those that are not PacketUnimplementedErrors.

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2770

Yeah, the only way you can assign meaning to these numbers today is if you open the source code and search for the occurrences of the given number. :)
That will be even easier if we switch to using strings. :)

aadsm marked an inline comment as done.May 31 2019, 3:30 PM
aadsm added inline comments.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2770

But we can't return strings on the protocol though, it will have to be a E NN.
I'm going with your suggestions but how about this in a future patch:

Have a base PacketError(num, message) and then subclass that one with the different errors we have like NoProcessAvailablePacketError() that would code their own error number and message (or maybe we just need to have static functions like PacketError::createNoProcessAvailableError()?).

Then, on the client side we could add an Optional<PacketError> GetResponseError() to StringExtractorGDBRemote that would create the right packet error given the number, so we can print a descriptive error message on the lldb terminal. (maybe GDBResponseError instead of PacketError...)

aadsm marked an inline comment as done.May 31 2019, 5:05 PM
aadsm added inline comments.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2770

Nevermind what I said about sending strings, I just noticed m_send_error_strings.

aadsm marked an inline comment as done.May 31 2019, 5:51 PM
aadsm added inline comments.
lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2766

@labath, I was thinking about the auto rule and I'm not sure how it would play here. I remember having to check the GetAuvData to know what this was returning. However, it's quite verbose but I have found instances like this in the codebase:

ErrorOr<std::unique_ptr<WritableMemoryBuffer>> buf =
      llvm::WritableMemoryBuffer::getNewMemBuffer(auxv_size);
aadsm updated this revision to Diff 202640.Jun 2 2019, 9:40 PM
aadsm marked an inline comment as done.

Address comments and add tests

aadsm marked an inline comment as done.Jun 2 2019, 9:43 PM
aadsm added inline comments.
lldb/unittests/tools/lldb-server/tests/CommunicationServerTest.cpp
40–42 ↗(On Diff #202640)

Is there a better way to do this? I found the copy function but it accepts a char *, not a const char *.

labath added a comment.Jun 3 2019, 2:28 AM

Thanks for writing the test, I have a bunch of more comments on the test itself, but hopefully this will be the last iteration. :)

lldb/source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2766

Yeah, for more complex types like this, the readability question gets a bit fuzzier. Right now, I think I'd spell this type out fully, but even I'm not fully consistent about that (I'm pretty sure I originally wrote the auto line on the LHS). Overall, I wouldn't worry too much about this particular case.

lldb/unittests/tools/lldb-server/tests/CommunicationServerTest.cpp
1 ↗(On Diff #202640)

Given the current layout of the code, a better place for this would be unittests/Process/gdb-remote, because that's where the class it is testing lives, and it is also the place where the "client" unit tests are located. The "client" unit tests are more similar to this than the "lldb-server" "unit" tests, because the latter test the lldb-server as a whole, whereas the client tests do mocking and other unittest-y stuff.

(Also, please name it GDBRemoteCommunicationServerTest, based on the class-under-test name)

40–42 ↗(On Diff #202640)

I'm not sure which copy function you meant, but I'd just do something like packets.emplace_back(static_cast<const char *>(dst), dst_len)

47 ↗(On Diff #202640)

You could just make this class take a std::vector<std::string> &, and make the MockServer class responsible for owning the packet array and handing it out via GetPackets as llvm::ArrayRef<std::string>.

Shared pointers are used much more frequently than they need to be in lldb. If you look at the llvm codebase, you'll see that it uses shared ownership extremely rarely. For simple types like this it doesn't matter, but once you start having objects with non-trivial destructors, and cross-referencing them via shared pointers, it becomes hard to reason about what are the possible orders destruction of various things. That's why it's better to just not use them at all, if possible.

50 ↗(On Diff #202640)
59–69 ↗(On Diff #202640)

It looks like you could replace these by using GDBRemoteCommunicationServer::SendErrorResponse

81–82 ↗(On Diff #202640)

EXPECT_THAT(server.GetPackets(), testing::ElementsAre("$E42#ab"));

Besides being a one-liner, this will give you better error messages if we ever have more than one packet here (it will print out the packets instead of just saying "1 != 2").

aadsm updated this revision to Diff 202852.Jun 3 2019, 10:14 PM
aadsm marked 2 inline comments as done.

Move test to a better place, address other comments related to tests.

labath accepted this revision.Jun 3 2019, 11:08 PM

Looks good to me now. Thanks for your patience.

lldb/unittests/Process/gdb-remote/GDBRemoteCommunicationServerTest.cpp
1–2 ↗(On Diff #202852)

This line wrapped accidentally.

lldb/unittests/Process/gdb-remote/GDBRemoteTestUtils.h
28 ↗(On Diff #202852)

A reference would be slightly better, as it is implicitly non-null.

This revision is now accepted and ready to land.Jun 3 2019, 11:08 PM
aadsm updated this revision to Diff 203235.Jun 5 2019, 1:05 PM
aadsm marked an inline comment as done.

Address final 2 comments

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2019, 1:58 PM